# Pandas

In [5]:
import pandas as pd

In [6]:
import numpy as np

> Pandas provides three data structures, all of which are built on top of the NumPy array, and all of which are value-mutable

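- ***Series (1D)*** : labeled, homogeneous array of immutable size
- ***DataFrames (2D)*** : labeled, heterogeneously typed, size-mutable tabular data structures
- ***Panels (3D)*** : labeled, size-mutable array (deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a `MultiIndex` is the usual replacement)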

## Series

- A one-dimensional array structure that stores homogeneous data
- All elements of a Series are value-mutable, while the Series itself is size-immutable
- A Series can be built from several kinds of input, such as an ndarray, list, constant, another Series, or dict
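
As a minimal sketch of the points above, a Series can also be built from a list with explicit labels, or from a single constant repeated over an index. The variable names here are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Series from a list, with explicit labels instead of the default 0..n-1
from_list = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# Series from a constant: the value is repeated once per label
from_constant = pd.Series(5, index=['x', 'y', 'z'])

# Values are mutable: an existing element can be overwritten in place
from_list['y'] = 99

print(from_list)
print(from_constant)
```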

In [3]:
series = pd.Series()
series

Series([], dtype: object)

> Series from ndarray

In [7]:
arr = np.array([10,20,30,40,50])
arr

array([10, 20, 30, 40, 50])

In [8]:
series1 = pd.Series(arr)
series1

0    10
1    20
2    30
3    40
4    50
dtype: int32

> Series from Python dictionary

In [10]:
data = {'a' : 10, 'b' : 20, 'c' : 30}
data

{'a': 10, 'b': 20, 'c': 30}

In [11]:
series2 = pd.Series(data)
series2

a    10
b    20
c    30
dtype: int64

In [12]:
series1[1:4]

1    20
2    30
3    40
dtype: int32
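
The slice above is positional. A label-indexed Series can also be sliced by label, and the two forms treat the stop value differently. The following is a small sketch with an illustrative Series `s`, not taken from the notebook.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Positional slice: the stop position is excluded, so only position 0 is kept
by_position = s.iloc[0:1]

# Label slice: the stop label is included, so both 'a' and 'b' are kept
by_label = s.loc['a':'b']

print(by_position)
print(by_label)
```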

---

## DataFrames

-  A 2D data structure in which data is aligned in a tabular fashion, in rows and columns
-  Can be created with the constructor `pandas.DataFrame(data, index, columns, dtype, copy)`
-  Input data can be of several kinds, such as an ndarray, list, constant, Series, or dict
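
A short sketch of the constructor with both `index` and `columns` supplied. The row data and the label names (`r1`, `num`, and so on) are made up for illustration.

```python
import pandas as pd

# Each inner list is one row; the labels are supplied separately
rows = [[1, 'x'], [2, 'y']]
df = pd.DataFrame(rows, index=['r1', 'r2'], columns=['num', 'letter'])

# Columns may hold different types, which dtypes reports per column
print(df)
print(df.dtypes)
```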

> Creating a DataFrame from a Python list

In [13]:
list1 = [10,20,30,40,50]
table = pd.DataFrame(list1)
table

|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |


> Creating a DataFrame from a list of dictionaries

In [14]:
data = [{'a' : 1, 'b' : 2}, {'a' : 3, 'b' : 4, 'c' : 5}]
data

[{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 5}]

In [19]:
table = pd.DataFrame(data)
table

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | NaN |
| 1 | 3 | 4 | 5.0 |


- `NaN` (Not a Number) is stored wherever data is missing
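
Missing entries can be located with `isna()` and replaced with `fillna()`. The sketch below rebuilds the same list of dictionaries under the illustrative name `records`; filling with `0` is only an example choice, not a recommendation from the notebook.

```python
import pandas as pd

records = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 5}]
df = pd.DataFrame(records)

# Boolean mask: True wherever a value is missing
missing = df.isna()

# New DataFrame with missing values replaced; df itself is left unchanged
filled = df.fillna(0)

print(missing)
print(filled)
```

Because `NaN` is a floating-point value, column `c` is stored as `float64`, which is why `5` is displayed as `5.0`.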

> Adding indexes

In [23]:
table2 = pd.DataFrame(data, index =['first' , 'second'])
table2

|   | a | b | c |
|---|---|---|---|
| first | 1 | 2 | NaN |
| second | 3 | 4 | 5.0 |


> Converting dictionary of series into a DataFrame

In [24]:
data1 = {'one' : pd.Series([1,2,3], index = ['a', 'b', 'c']),
         'two' : pd.Series([1,2,3,4], index = ['a', 'b', 'c', 'd'])}
data1

{'one': a    1
 b    2
 c    3
 dtype: int64,
 'two': a    1
 b    2
 c    3
 d    4
 dtype: int64}

In [25]:
table3 = pd.DataFrame(data1)
table3

|   | one | two |
|---|---|---|
| a | 1.0 | 1 |
| b | 2.0 | 2 |
| c | 3.0 | 3 |
| d | NaN | 4 |


- The resulting index is the union of the indexes of all the Series passed
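
The union can be checked directly with `Index.union`. This sketch repeats the two Series under the illustrative names `one` and `two`.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
two = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Union of the two label sets
union_index = one.index.union(two.index)

# The DataFrame aligns both Series on that union; 'one' has no value at 'd'
df = pd.DataFrame({'one': one, 'two': two})

print(list(union_index))
print(df)
```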

> Addition of columns: 

In [26]:
table3['three'] = pd.Series([10,20,30], index = ['a', 'b', 'c'])
table3

|   | one | two | three |
|---|---|---|---|
| a | 1.0 | 1 | 10.0 |
| b | 2.0 | 2 | 20.0 |
| c | 3.0 | 3 | 30.0 |
| d | NaN | 4 | NaN |


> Deleting table columns 

In [27]:
del table3['two']
table3

|   | one | three |
|---|---|---|
| a | 1.0 | 10.0 |
| b | 2.0 | 20.0 |
| c | 3.0 | 30.0 |
| d | NaN | NaN |


In [28]:
table3.pop('three')
table3

|   | one |
|---|---|
| a | 1.0 |
| b | 2.0 |
| c | 3.0 |
| d | NaN |
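
Both `del` and `pop` modify the DataFrame in place, and `pop` additionally returns the removed column. A non-mutating alternative is `drop(columns=...)`, which returns a new DataFrame. The sketch below uses a small illustrative frame rather than `table3`.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop removes the column in place and returns it as a Series
popped = df.pop('three')

# drop returns a new DataFrame; df keeps its remaining columns
trimmed = df.drop(columns=['two'])

print(popped)
print(df.columns.tolist())
print(trimmed.columns.tolist())
```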


> Selecting and adding rows

- Selecting a row by label: use the `loc[]` indexer, passing the *row label*

In [29]:
table3.loc['c']

one    3.0
Name: c, dtype: float64

- Selecting a row by position: use the `iloc[]` indexer, passing the integer row position

In [35]:
table3.iloc[2]

one    3.0
Name: c, dtype: float64
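
Both indexers also accept lists, and `loc` can take a column label as a second argument to pick out a single cell. A minimal sketch on an illustrative one-column frame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])

by_labels = df.loc[['a', 'c']]      # rows labelled 'a' and 'c'
by_positions = df.iloc[[0, 2]]      # the same rows, addressed by position
single_cell = df.loc['b', 'one']    # one value: row label, then column label

print(by_labels)
print(single_cell)
```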

In [32]:
data2 = {'one':pd.Series([1,2,3], index = ['a', 'b', 'c']),
        'two':pd.Series([1,2,3,4], index = ['a', 'b', 'c', 'd'])}
data2

{'one': a    1
 b    2
 c    3
 dtype: int64,
 'two': a    1
 b    2
 c    3
 d    4
 dtype: int64}

In [34]:
table4 = pd.DataFrame(data2)
table4

|   | one | two |
|---|---|---|
| a | 1.0 | 1 |
| b | 2.0 | 2 |
| c | 3.0 | 3 |
| d | NaN | 4 |


In [37]:
table4['three'] = pd.Series([10,20,30], index = ['a', 'b', 'c'])
table4

|   | one | two | three |
|---|---|---|---|
| a | 1.0 | 1 | 10.0 |
| b | 2.0 | 2 | 20.0 |
| c | 3.0 | 3 | 30.0 |
| d | NaN | 4 | NaN |


> Using the `concat` function to add multiple rows

In [41]:
row = pd.DataFrame([[11,13],[17,19]], columns = ['two', 'three'])
table5= pd.concat([table4,row])
table5

|   | one | two | three |
|---|---|---|---|
| a | 1.0 | 1 | 10.0 |
| b | 2.0 | 2 | 20.0 |
| c | 3.0 | 3 | 30.0 |
| d | NaN | 4 | NaN |
| 0 | NaN | 11 | 13.0 |
| 1 | NaN | 17 | 19.0 |
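
By default `concat` keeps each input's own row labels, which is why the result mixes `a`..`d` with `0` and `1`. If a fresh 0..n-1 index is preferred, `ignore_index=True` can be passed. The frames below are small illustrative stand-ins for `table4` and `row`.

```python
import pandas as pd

base = pd.DataFrame({'two': [1, 2], 'three': [10.0, 20.0]}, index=['a', 'b'])
extra = pd.DataFrame([[11, 13], [17, 19]], columns=['two', 'three'])

# Original labels are preserved: a, b, 0, 1
kept = pd.concat([base, extra])

# Labels are discarded and rows are renumbered: 0, 1, 2, 3
renumbered = pd.concat([base, extra], ignore_index=True)

print(kept)
print(renumbered)
```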


> Using `drop` to remove rows by passing the labels of the rows to remove

In [42]:
table6 = table5.drop('a')
table6

|   | one | two | three |
|---|---|---|---|
| b | 2.0 | 2 | 20.0 |
| c | 3.0 | 3 | 30.0 |
| d | NaN | 4 | NaN |
| 0 | NaN | 11 | 13.0 |
| 1 | NaN | 17 | 19.0 |
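
`drop` also accepts a list of labels, and it returns a new DataFrame rather than changing the original. A brief sketch on an illustrative frame with the same kind of mixed index:

```python
import pandas as pd

df = pd.DataFrame({'two': [1, 2, 11, 17]}, index=['a', 'b', 0, 1])

# Remove two rows at once; df itself still has all four rows afterwards
smaller = df.drop(['a', 0])

print(smaller)
print(len(df))
```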


---

## File reading

In [58]:
table_csv = pd.read_csv('./data/Cars2015.csv')
table_csv

|   | Make | Model | Type | LowPrice | HighPrice | Drive | CityMPG | HwyMPG | FuelCap | Length | Width | Wheelbase | Height | UTurn | Weight | Acc030 | Acc060 | QtrMile | PageNum | Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chevrolet | Spark | Hatchback | 12.270 | 25.560 | FWD | 30 | 39 | 9.0 | 145 | 63 | 94 | 61 | 34 | 2345 | 4.4 | 12.8 | 19.4 | 123 | Small |
| 1 | Hyundai | Accent | Hatchback | 14.745 | 17.495 | FWD | 28 | 37 | 11.4 | 172 | 67 | 101 | 57 | 37 | 2550 | 3.7 | 10.3 | 17.8 | 148 | Small |
| 2 | Kia | Rio | Sedan | 13.990 | 18.290 | FWD | 28 | 36 | 11.3 | 172 | 68 | 101 | 57 | 37 | 2575 | 3.5 | 9.5 | 17.3 | 163 | Small |
| 3 | Mitsubishi | Mirage | Hatchback | 12.995 | 15.395 | FWD | 37 | 44 | 9.2 | 149 | 66 | 97 | 59 | 32 | 2085 | 4.4 | 12.1 | 19.0 | 188 | Small |
| 4 | Nissan | Versa Note | Hatchback | 14.180 | 17.960 | FWD | 31 | 40 | 10.9 | 164 | 67 | 102 | 61 | 37 | 2470 | 4.0 | 10.9 | 18.2 | 196 | Small |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 105 | Toyoto | Sequioa | 7Pass | 44.395 | 64.320 | RWD | 12 | 18 | 26.4 | 205 | 80 | 122 | 75 | 42 | 6025 | 2.7 | 7.1 | 15.6 | 214 | Large |
| 106 | Nissan | Pathfinder | 7Pass | 29.510 | 43.100 | FWD | 19 | 25 | 19.5 | 192 | 73 | 112 | 72 | 40 | 4505 | 3.2 | 7.7 | 16.0 | 193 | Midsized |
| 107 | Acura | MDX | 7Pass | 42.865 | 57.080 | FWD | 18 | 27 | 19.5 | 194 | 77 | 111 | 68 | 40 | 4200 | 3.0 | 7.2 | 15.6 | 98 | Midsized |
| 108 | Hyundai | Santa Fe | 7Pass | 30.150 | 36.000 | FWD | 18 | 24 | 19.0 | 193 | 74 | 110 | 67 | 39 | 4210 | 3.0 | 7.6 | 16.1 | 151 | Midsized |
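
`read_csv` accepts any file-like object as well as a path, which makes it easy to try out without the data file. The sketch below feeds it an in-memory string holding a few columns of the first three rows shown above. The name `csv_text` and the use of `io.StringIO` are illustrative choices, not part of the original notebook.

```python
import io
import pandas as pd

# A small in-memory stand-in for ./data/Cars2015.csv
csv_text = """Make,Model,CityMPG,HwyMPG
Chevrolet,Spark,30,39
Hyundai,Accent,28,37
Kia,Rio,28,36
"""

cars = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows, useful for a quick look at a large file
first_two = cars.head(2)

print(first_two)
print(cars.dtypes)
```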
