# Introduction to Pandas

Pandas is a high-level data manipulation tool developed by Wes McKinney.
It is built on the NumPy package, and its key data structure is the DataFrame.

----

## Pandas DataFrames

DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.

Use the following import convention:

In [3]:
>>> import pandas as pd

### Pandas Data Structures

#### Series
A one-dimensional labeled array capable of holding any data type.

In [6]:
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
>>> print(s)


a    3
b   -5
c    7
d    4
dtype: int64
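
As a small sketch of what the index labels are for, the values in a Series like the one above can be selected by label as well as by integer position. The variable names `by_label`, `by_position`, and `subset` are illustrative only.

```python
import pandas as pd

# Same Series as above: four integers labeled 'a' through 'd'
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

by_label = s.loc['b']         # select a single value by its index label
by_position = s.iloc[0]       # select a single value by integer position
subset = s.loc[['a', 'c']]    # select several labels at once (returns a Series)

print(by_label)          # -5
print(by_position)       # 3
print(subset.tolist())   # [3, 7]
```

`.loc` always works with labels and `.iloc` always works with positions, which keeps the two kinds of lookup from being confused.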


#### DataFrame
A two-dimensional labeled data structure with columns of potentially different types.

In [7]:
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
>>> print(df)

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasilia   207847528


Note that the leftmost column (0, 1, 2) is the index, and Country, Capital, and Population are the columns.
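
To look at the row labels and column names directly, one option is to inspect the `index` and `columns` attributes of the DataFrame. The sketch below rebuilds the same `df` so that it runs on its own; `capitals` is an illustrative name.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

print(list(df.index))      # [0, 1, 2]
print(list(df.columns))    # ['Country', 'Capital', 'Population']
print(df.shape)            # (3, 3)

# Each column is itself a Series that shares the DataFrame's index
capitals = df['Capital']
print(capitals.loc[1])     # New Delhi
```

Because no index was passed to `pd.DataFrame`, pandas assigns the default integer index starting at 0.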

#### Asking For Help

help(pd.Series.loc)

### I/O
#### Read and Write to CSV

pd.read_csv('file.csv', header=None, nrows=5)

df.to_csv('myDataFrame.csv')
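
A minimal round-trip sketch: `to_csv` is a method of the DataFrame, while `read_csv` is a top-level pandas function. To avoid touching the disk, this example assumes an in-memory `io.StringIO` buffer in place of a file name; a path such as `'myDataFrame.csv'` would be passed the same way.

```python
import io
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India'],
                   'Capital': ['Brussels', 'New Delhi']})

# Write to an in-memory text buffer; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind and read the same text back into a new DataFrame
buffer.seek(0)
df_back = pd.read_csv(buffer)

# Rewind again and read only the first data row with nrows
buffer.seek(0)
first_row = pd.read_csv(buffer, nrows=1)

print(df_back)
print(len(first_row))
```

Without `index=False`, the row labels would be written as an extra unnamed column and read back as data.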

#### Read multiple sheets from the same file

xlsx = pd.ExcelFile('file.xls')

df = pd.read_excel(xlsx,  'Sheet1')

#### Read and Write to Excel

pd.read_excel('file.xlsx')

df.to_excel('dir/myDataFrame.xlsx',  sheet_name='Sheet1')

#### Read and Write to SQL Query or Database Table

(read_sql() is a convenience wrapper around read_sql_table() and read_sql_query())

from sqlalchemy import create_engine

engine = create_engine('sqlite:///:memory:')

pd.read_sql('SELECT * FROM my_table;', engine)

pd.read_sql_table('my_table', engine)

pd.read_sql_query('SELECT * FROM my_table;', engine)


df.to_sql('myDf', engine)
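
The calls above can be combined into a small write-then-read sketch. As an assumption made to keep the example dependency-free, it uses a standard-library `sqlite3` connection rather than a SQLAlchemy engine; pandas accepts such a connection for `to_sql` and `read_sql_query`, whereas `read_sql_table` needs SQLAlchemy. The table name `my_table` and the variable `large` are illustrative.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

# Assumption: a plain in-memory SQLite connection instead of a SQLAlchemy engine
conn = sqlite3.connect(':memory:')

# to_sql is a DataFrame method; index=False skips writing the row labels
df.to_sql('my_table', conn, index=False)

# Run a query and get the result back as a DataFrame
large = pd.read_sql_query(
    'SELECT Country FROM my_table '
    'WHERE Population > 100000000 ORDER BY Population DESC;',
    conn,
)
print(large['Country'].tolist())   # ['India', 'Brazil']

conn.close()
```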