# <font color=purple>Pandas</font>
_pandas_ is a Python library for data analysis. It offers a number of data exploration, cleaning and transformation operations that are critical in working with data in Python.

_pandas_ builds upon _numpy_ (and can optionally use _scipy_ for some operations), providing easy-to-use data structures and data manipulation functions with integrated indexing.

The main data structures _pandas_ provides are _Series_ and _DataFrames_. After a brief introduction to these two data structures and to data ingestion, this notebook covers the following key features of _pandas_:

- Generating descriptive statistics on data.
- Data cleaning using built-in pandas functions.
- Frequent data operations for subsetting, filtering, insertion, deletion and aggregation of data.
- Merging multiple datasets using dataframes.
- Working with timestamps and time-series data.


In [2]:
import pandas as pd

## <font color = blue>Introduction to pandas Data structures</font>
**pandas** has two main data structures, namely Series and DataFrames.

## <font color = purple>pandas Series</font>
A pandas **Series** is a one-dimensional labeled array.

In [3]:
ser = pd.Series(data = [100, 'foo', 300, 'bar', 500], index=['tom','bob','nancy', 'dan', 'eric'])

In [4]:
ser

tom      100
bob      foo
nancy    300
dan      bar
eric     500
dtype: object

In [5]:
ser.index

Index(['tom', 'bob', 'nancy', 'dan', 'eric'], dtype='object')

In [6]:
ser['nancy']

300

In [7]:
ser.loc[['nancy', 'bob']] # loc selects by index label

nancy    300
bob      foo
dtype: object

In [8]:
ser.iloc[[4, 3, 1]] # select by position; plain [] with integers on a label index is deprecated in recent pandas

eric    500
dan     bar
bob     foo
dtype: object

In [9]:
ser.iloc[2] # iloc selects by integer position

300
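
The difference between `loc` and `iloc` is easiest to see when the labels are themselves integers. The following is a minimal sketch, assuming a small hypothetical Series `s` that is not part of this notebook's data:

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions 0, 1, 2.
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s.loc[20]      # looks up the label 20
by_position = s.iloc[2]   # looks up the third element, whatever its label
print(by_label, by_position)
```

Using `loc` and `iloc` explicitly avoids any ambiguity about whether an integer means a label or a position.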

In [10]:
'bob' in ser

True

In [11]:
ser * 2

tom         200
bob      foofoo
nancy       600
dan      barbar
eric       1000
dtype: object

In [12]:
ser[['nancy','eric']] ** 2

nancy     90000
eric     250000
dtype: object

## <font color = purple>pandas DataFrames</font>
A pandas **DataFrame** is a 2-dimensional labeled data structure.

### Create DataFrame from dictionary of pandas Series


In [13]:
d = {'one' : pd.Series([100., 200., 300.], index = ['apple', 'ball', 'clock']),
     'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill', 'dency'])} 

In [14]:
df = pd.DataFrame(d)

df # print in table format

          one     two
apple   100.0   111.0
ball    200.0   222.0
cerill    NaN   333.0
clock   300.0     NaN
dency     NaN  4444.0
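
The missing values in this table come from index alignment: the two Series do not share all their labels, so the DataFrame uses the union of both indexes and fills the gaps with NaN. A small sketch of that behaviour, rebuilding the same two Series under the assumed names `one` and `two`:

```python
import pandas as pd

one = pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock'])
two = pd.Series([111., 222., 333., 4444.],
                index=['apple', 'ball', 'cerill', 'dency'])

# The resulting row index is the union of the two Series indexes.
union = one.index.union(two.index)
frame = pd.DataFrame({'one': one, 'two': two})

print(list(union))
print(frame.isna().sum())   # number of NaN entries per column
```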


In [15]:
df.index

Index(['apple', 'ball', 'cerill', 'clock', 'dency'], dtype='object')

In [16]:
df.columns

Index(['one', 'two'], dtype='object')

In [17]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'])

         one    two
dancy    NaN    NaN
ball   200.0  222.0
apple  100.0  111.0


In [18]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'one'])

         two    one
dancy    NaN    NaN
ball   222.0  200.0
apple  111.0  100.0


### Create DataFrame from list of Python dictionaries

In [18]:
data = [{'alex' : 1, 'joe' : 2}, {'ema' : 5, 'dora' : 10, 'alice' : 20}]

In [19]:
pd.DataFrame(data)

   alex  joe  ema  dora  alice
0   1.0  2.0  NaN   NaN    NaN
1   NaN  NaN  5.0  10.0   20.0


In [20]:
pd.DataFrame(data, index=['orange', 'red'])

        alex  joe  ema  dora  alice
orange   1.0  2.0  NaN   NaN    NaN
red      NaN  NaN  5.0  10.0   20.0


In [21]:
pd.DataFrame(data, columns=['joe', 'dora', 'alice'])

   joe  dora  alice
0  2.0   NaN    NaN
1  NaN  10.0   20.0


### Basic DataFrame operations

In [22]:
df

          one     two
apple   100.0   111.0
ball    200.0   222.0
cerill    NaN   333.0
clock   300.0     NaN
dency     NaN  4444.0


In [23]:
df['one']

apple     100.0
ball      200.0
cerill      NaN
clock     300.0
dency       NaN
Name: one, dtype: float64

In [24]:
df['three'] = df['one'] * df['two']
df

          one     two    three
apple   100.0   111.0  11100.0
ball    200.0   222.0  44400.0
cerill    NaN   333.0      NaN
clock   300.0     NaN      NaN
dency     NaN  4444.0      NaN


In [25]:
df['flag'] = df['one'] > 250
df

          one     two    three   flag
apple   100.0   111.0  11100.0  False
ball    200.0   222.0  44400.0  False
cerill    NaN   333.0      NaN  False
clock   300.0     NaN      NaN   True
dency     NaN  4444.0      NaN  False
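
Note that the rows where `one` is NaN are flagged `False`: a comparison against a missing value evaluates to `False` rather than staying missing. One possible way to keep "missing" separate from "not above the threshold" is to track the missing rows with `isna()`. The sketch below assumes a standalone copy of the `one` column; the name `missing` is introduced only for illustration:

```python
import numpy as np
import pandas as pd

one = pd.Series([100., 200., np.nan, 300., np.nan],
                index=['apple', 'ball', 'cerill', 'clock', 'dency'])

flag = one > 250        # NaN > 250 evaluates to False
missing = one.isna()    # True only where the value is absent

print(int(flag.sum()), int(missing.sum()))
```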


In [26]:
three = df.pop('three')
three

apple     11100.0
ball      44400.0
cerill        NaN
clock         NaN
dency         NaN
Name: three, dtype: float64

In [27]:
df

          one     two   flag
apple   100.0   111.0  False
ball    200.0   222.0  False
cerill    NaN   333.0  False
clock   300.0     NaN   True
dency     NaN  4444.0  False


In [28]:
del df['two']

In [29]:
df

          one   flag
apple   100.0  False
ball    200.0  False
cerill    NaN  False
clock   300.0   True
dency     NaN  False
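
Both `pop` and `del` modify the DataFrame in place. When the original should stay untouched, `DataFrame.drop` returns a new DataFrame instead. A minimal sketch, assuming a small hypothetical frame rather than the `df` used above:

```python
import pandas as pd

# Hypothetical two-column frame, introduced only for illustration.
frame = pd.DataFrame({'one': [1., 2.], 'two': [3., 4.]})

# drop returns a new DataFrame; frame itself keeps both columns.
trimmed = frame.drop(columns=['two'])

print(list(frame.columns), list(trimmed.columns))
```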


In [30]:
df.insert(2, 'copy_of_one', df['one'])
df

          one   flag  copy_of_one
apple   100.0  False        100.0
ball    200.0  False        200.0
cerill    NaN  False          NaN
clock   300.0   True        300.0
dency     NaN  False          NaN


In [31]:
df['one_upper_half'] = df['one'][:2]
df

          one   flag  copy_of_one  one_upper_half
apple   100.0  False        100.0           100.0
ball    200.0  False        200.0           200.0
cerill    NaN  False          NaN             NaN
clock   300.0   True        300.0             NaN
dency     NaN  False          NaN             NaN
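
The shorter slice fits into the DataFrame because column assignment aligns a Series on its index labels, and rows without a matching label receive NaN. The sketch below illustrates this with a hypothetical frame and a hypothetical Series `extra` whose labels are in a different order:

```python
import pandas as pd

frame = pd.DataFrame({'one': [100., 200., 300.]},
                     index=['apple', 'ball', 'clock'])

# Hypothetical Series listed in a different order and lacking 'ball'.
extra = pd.Series([3., 1.], index=['clock', 'apple'])

# Values are placed by label, not by position; 'ball' becomes NaN.
frame['extra'] = extra
print(frame)
```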
