<p style="font-family: Arial; font-size:1.75em;color:purple; font-style:bold">Pandas</p>

The main data structures *pandas* provides are *Series* and *DataFrames*.

pandas can ingest data from a variety of sources.
Some of the functions for doing so are:

1. read_csv (csv - comma-separated values)
input: path to the csv file
output: Pandas DataFrame object containing contents of the file
   
2. read_json (json - JavaScript object notation)
input: path to a json file or a valid json string
output: Pandas DataFrame or Series object containing contents of the file
   
3. read_html
input: URL, a file, or a raw HTML string
output: a list of pandas DataFrames
   
4. read_sql_query (SQL - Structured Query Language)
input: SQL query and a database connection
output: Pandas DataFrame object containing the result set of the query
(and many more functions for other formats)
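
As a rough illustration of three of the readers listed above, the sketch below feeds each one a small in-memory source instead of a real file. The sample data, the `scores` table and the variable names are made up for this example; `read_html` is left out because it needs an extra HTML parser to be installed.

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts a path or any file-like object; StringIO stands in for a file.
csv_text = "name,score\nalice,90\nbob,85\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# read_json also accepts a path or a file-like object.
json_text = '[{"name": "alice", "score": 90}, {"name": "bob", "score": 85}]'
json_df = pd.read_json(io.StringIO(json_text))

# read_sql_query needs a query and an open database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("alice", 90), ("bob", 85)])
sql_df = pd.read_sql_query("SELECT name, score FROM scores ORDER BY name", conn)
conn.close()

print(csv_df)
print(json_df)
print(sql_df)
```

All three calls return a DataFrame with the same two columns, so the rest of an analysis does not need to care where the data came from.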

Additional Recommended Resources:
* *pandas* Documentation: http://pandas.pydata.org/pandas-docs/stable/
* *Python for Data Analysis* by Wes McKinney
* *Python Data Science Handbook* by Jake VanderPlas

In [6]:
import pandas as pd

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold">
pandas Series</p>
A *pandas Series* is a one-dimensional labeled array.

In [None]:
ser = pd.Series([100, 'foo', 300, 'bar', 500], ['tom', 'bob', 'nancy', 'dan', 'eric'])
ser

In [None]:
ser.index

In [None]:
ser.loc[['nancy','bob']]

In [None]:
# Series are ordered, so elements can also be selected by integer position

ser.iloc[[4, 3, 1]]

In [None]:
ser.iloc[2]

In [None]:
'bob' in ser
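
One detail worth spelling out: the `in` operator on a Series tests the index labels, not the stored values. The sketch below rebuilds the same `ser` and contrasts the two kinds of membership test; the variable names are chosen only for this illustration.

```python
import pandas as pd

ser = pd.Series([100, 'foo', 300, 'bar', 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

label_found = 'bob' in ser            # 'bob' is an index label
value_as_label = 'foo' in ser         # 'foo' is a value, not a label
value_found = 'foo' in ser.values     # membership among the values
value_found_isin = bool(ser.isin(['foo']).any())  # same check via Series.isin

print(label_found, value_as_label, value_found, value_found_isin)
```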

In [None]:
ser * 2

In [None]:
ser[['nancy', 'eric']] ** 2

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold">
pandas DataFrame</p>
*pandas DataFrame* is a 2-dimensional labeled data structure.

<p style="font-family: Arial; font-size:1.1em;color:#2462C0; font-style:bold">
Create DataFrame from dictionary of pandas Series</p>

In [7]:
d = {'one' : pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
     'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill', 'dancy'])}

In [9]:
df = pd.DataFrame(d)
print(df)
df

          one     two
apple   100.0   111.0
ball    200.0   222.0
cerill    NaN   333.0
clock   300.0     NaN
dancy     NaN  4444.0


          one     two
apple   100.0   111.0
ball    200.0   222.0
cerill    NaN   333.0
clock   300.0     NaN
dancy     NaN  4444.0


In [None]:
df.index

In [None]:
df.columns

In [10]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'])

         one     two
dancy    NaN  4444.0
ball   200.0   222.0
apple  100.0   111.0


In [11]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])

          two  five
dancy  4444.0   NaN
ball    222.0   NaN
apple   111.0   NaN


<p style="font-family: Arial; font-size:1.1em;color:#2462C0; font-style:bold">
Create DataFrame from list of Python dictionaries</p>

In [12]:
data = [{'alex': 1, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]

In [13]:
pd.DataFrame(data)

   alex  alice  dora  ema  joe
0   1.0    NaN   NaN  NaN  2.0
1   NaN   20.0  10.0  5.0  NaN


In [14]:
pd.DataFrame(data, index=['orange', 'red'])

        alex  alice  dora  ema  joe
orange   1.0    NaN   NaN  NaN  2.0
red      NaN   20.0  10.0  5.0  NaN


In [15]:
pd.DataFrame(data, columns=['joe', 'dora','alice'])

   joe  dora  alice
0  2.0   NaN    NaN
1  NaN  10.0   20.0


<p style="font-family: Arial; font-size:1.1em;color:#2462C0; font-style:bold">
Basic DataFrame operations</p>

In [16]:
df

          one     two
apple   100.0   111.0
ball    200.0   222.0
cerill    NaN   333.0
clock   300.0     NaN
dancy     NaN  4444.0


In [17]:
df['one']

apple     100.0
ball      200.0
cerill      NaN
clock     300.0
dancy       NaN
Name: one, dtype: float64

In [18]:
df['three'] = df['one'] * df['two']
df

          one     two    three
apple   100.0   111.0  11100.0
ball    200.0   222.0  44400.0
cerill    NaN   333.0      NaN
clock   300.0     NaN      NaN
dancy     NaN  4444.0      NaN


In [19]:
df['flag'] = df['one'] > 250
df

          one     two    three   flag
apple   100.0   111.0  11100.0  False
ball    200.0   222.0  44400.0  False
cerill    NaN   333.0      NaN  False
clock   300.0     NaN      NaN   True
dancy     NaN  4444.0      NaN  False
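
Note that the rows where `one` is NaN get `False` in the `flag` column, because any ordering comparison against NaN evaluates to False. If "missing" needs to stay distinguishable from "not greater than 250", one possible approach, sketched here on a standalone copy of the `one` column, is to convert the result to pandas' nullable `boolean` dtype and mask the missing rows. The names `flag_nullable` and `missing` are illustrative only.

```python
import numpy as np
import pandas as pd

one = pd.Series([100., 200., np.nan, 300., np.nan],
                index=['apple', 'ball', 'cerill', 'clock', 'dancy'])

flag = one > 250        # NaN > 250 evaluates to False
missing = one.isna()    # True where the original value is missing

# Nullable boolean dtype: keep missing inputs as <NA> instead of False.
flag_nullable = flag.astype('boolean').mask(missing, pd.NA)

print(flag)
print(flag_nullable)
```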


In [20]:
three = df.pop('three')

In [21]:
three

apple     11100.0
ball      44400.0
cerill        NaN
clock         NaN
dancy         NaN
Name: three, dtype: float64

In [22]:
# column 'three' has been removed from df by pop()
df

          one     two   flag
apple   100.0   111.0  False
ball    200.0   222.0  False
cerill    NaN   333.0  False
clock   300.0     NaN   True
dancy     NaN  4444.0  False


In [23]:
del df['two']

In [24]:
# column 'two' has been deleted from df in place
df

          one   flag
apple   100.0  False
ball    200.0  False
cerill    NaN  False
clock   300.0   True
dancy     NaN  False
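
Both `pop` and `del` modify the DataFrame in place. When the original should stay intact, `DataFrame.drop` is one alternative, since it returns a new DataFrame by default. The sketch below compares the three on a small throwaway frame whose contents are made up for this illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'one': [1., 2.], 'two': [3., 4.], 'three': [5., 6.]},
                       index=['apple', 'ball'])

# drop returns a new DataFrame; df_demo still has all three columns.
trimmed = df_demo.drop(columns=['two'])
columns_after_drop = list(df_demo.columns)

# pop removes the column in place and returns it as a Series.
popped = df_demo.pop('three')

# del removes the column in place and returns nothing.
del df_demo['two']

print(trimmed)
print(popped)
print(df_demo)
```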


In [25]:
df.insert(2, 'copy_of_one', df['one'])
df

          one   flag  copy_of_one
apple   100.0  False        100.0
ball    200.0  False        200.0
cerill    NaN  False          NaN
clock   300.0   True        300.0
dancy     NaN  False          NaN


In [26]:
df['one_upper_half'] = df['one'][:2]
df

          one   flag  copy_of_one  one_upper_half
apple   100.0  False        100.0           100.0
ball    200.0  False        200.0           200.0
cerill    NaN  False          NaN             NaN
clock   300.0   True        300.0             NaN
dancy     NaN  False          NaN             NaN
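
The NaN rows in `one_upper_half` appear because assigning a Series to a column aligns on index labels: rows whose label is absent from the assigned Series are filled with NaN. The sketch below shows the same behaviour on a small made-up frame, using a deliberately shuffled Series to make clear that labels, not positions, decide where values land.

```python
import pandas as pd

df_demo = pd.DataFrame({'one': [100., 200., 300.]},
                       index=['apple', 'ball', 'clock'])

# Values are matched by label, so the order of the right-hand Series is irrelevant.
shuffled = pd.Series([3., 1.], index=['clock', 'apple'])
df_demo['by_label'] = shuffled       # 'ball' has no match and becomes NaN

# A plain list has no labels, so it is assigned by position.
df_demo['by_position'] = [3., 1., 2.]

print(df_demo)
```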
