# The Beginning of Learning Pandas

In [45]:
import pandas as pd
import numpy as np

## Series

A Series is a one-dimensional array-like object containing a sequence of **values** and an associated array of labels, called its **index**.

In [2]:
obj = pd.Series([3,4,5,6])

In [3]:
obj

0    3
1    4
2    5
3    6
dtype: int64

In [4]:
obj.values

array([3, 4, 5, 6], dtype=int64)

In [5]:
obj.index

RangeIndex(start=0, stop=4, step=1)

Another way to think about a Series is as a fixed-length, ordered **dict**, since it maps index labels to data values.
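The dict analogy can be sketched as follows. The `sdata` dictionary and its numbers are made up for illustration; they are not part of the notebook above.

```python
import pandas as pd

# Hypothetical figures, used only to illustrate the dict-like behaviour.
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000}
s = pd.Series(sdata)           # dict keys become the index labels

# Membership tests look at the index labels, not at the values.
has_label = 'Ohio' in s        # True
has_value = 35000 in s         # False

# Lookup by label works like a dict lookup.
texas = s['Texas']
print(has_label, has_value, texas)
```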

## DataFrame

A DataFrame represents a rectangular table of data and contains an ordered collection of columns.

In [6]:
data = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],
        'year':[2000,2001,2002,2001,2002,2003],
        'pop':[1.5,1.7,3.6,2.4,2.9,3.2]}
frame = pd.DataFrame(data)

In [7]:
frame

|   | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |


In [8]:
frame.head()

|   | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |


Passing `columns` arranges the DataFrame's columns in the given order

In [9]:
pd.DataFrame(data,columns = ['year', 'state', 'pop'])

|   | year | state | pop |
|---|---|---|---|
| 0 | 2000 | Ohio | 1.5 |
| 1 | 2001 | Ohio | 1.7 |
| 2 | 2002 | Ohio | 3.6 |
| 3 | 2001 | Nevada | 2.4 |
| 4 | 2002 | Nevada | 2.9 |
| 5 | 2003 | Nevada | 3.2 |


A column can be retrieved as a Series

In [10]:
frame['state']

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

In [11]:
frame.state

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

A row can also be retrieved as a Series

In [12]:
frame.loc[1]

state    Ohio
year     2001
pop       1.7
Name: 1, dtype: object

Columns can be modified by assignment, and assigning to a column that doesn't exist creates a new column.

In [13]:
frame['eastern'] = frame.state == 'Ohio'

In [14]:
frame

|   | state | year | pop | eastern |
|---|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 | True |
| 1 | Ohio | 2001 | 1.7 | True |
| 2 | Ohio | 2002 | 3.6 | True |
| 3 | Nevada | 2001 | 2.4 | False |
| 4 | Nevada | 2002 | 2.9 | False |
| 5 | Nevada | 2003 | 3.2 | False |


In [15]:
frame['eastern'] = 'True'

In [16]:
frame

|   | state | year | pop | eastern |
|---|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 | True |
| 1 | Ohio | 2001 | 1.7 | True |
| 2 | Ohio | 2002 | 3.6 | True |
| 3 | Nevada | 2001 | 2.4 | True |
| 4 | Nevada | 2002 | 2.9 | True |
| 5 | Nevada | 2003 | 3.2 | True |
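The two assignments above differ in a subtle way: the first stores booleans, while the second broadcasts the string `'True'` to every row, even though both print the same. The following is a minimal sketch on a smaller, made-up frame (the two-row data is an assumption for brevity):

```python
import pandas as pd

# Small illustrative frame, not the one used in the cells above.
frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'year': [2000, 2001]})

# A comparison yields a boolean Series, so the new column has dtype bool.
frame['eastern'] = frame['state'] == 'Ohio'
bool_dtype = frame['eastern'].dtype

# A scalar is broadcast to every row; here the scalar is a string.
frame['eastern'] = 'True'
all_strings = frame['eastern'].map(lambda v: isinstance(v, str)).all()

# Broadcasting the boolean True instead gives a real boolean column.
frame['eastern'] = True
scalar_bool_dtype = frame['eastern'].dtype

print(bool_dtype, all_strings, scalar_bool_dtype)
```

If the column is meant to be used as a mask later, the boolean form is the one that works for filtering.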


In [17]:
del frame['eastern']

In [27]:
frame.columns

Index(['state', 'year', 'pop'], dtype='object')

In [28]:
frame.values

array([['Ohio', 2000, 1.5],
       ['Ohio', 2001, 1.7],
       ['Ohio', 2002, 3.6],
       ['Nevada', 2001, 2.4],
       ['Nevada', 2002, 2.9],
       ['Nevada', 2003, 3.2]], dtype=object)

## Index Objects

### pandas Index objects are responsible for holding the axis labels and other metadata
>Index objects are immutable, which makes it safer to share them among data structures

In [29]:
obj = pd.Series(range(3),index = ['a','b','c'])
index = obj.index
index

Index(['a', 'b', 'c'], dtype='object')

In [30]:
index[1] = 'd'

TypeError: Index does not support mutable operations
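Because an Index cannot be changed in place, a modified set of labels has to be built as a new object. One possible sketch, using `Index.delete` and `Index.insert` (other approaches would work equally well):

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

# Item assignment raises TypeError because Index is immutable.
try:
    index[1] = 'd'
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Build a new Index instead; the original stays untouched.
new_index = index.delete(1).insert(1, 'd')
print(list(new_index))   # ['a', 'd', 'c']
print(list(index))       # ['a', 'b', 'c']
```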

Unlike a Python set, a pandas Index can contain duplicate labels. It also provides methods such as append, delete, and drop, among others, each of which returns a new Index.

In [31]:
dup_labels = pd.Index(['foo','foo','bar','bar'])
dup_labels

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

In [32]:
dup_labels.append(pd.Index(['ku']))

Index(['foo', 'foo', 'bar', 'bar', 'ku'], dtype='object')
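One consequence of duplicate labels is that selecting by such a label returns every matching entry rather than a single value. A short sketch, where the Series `s` and its values are introduced only for illustration:

```python
import pandas as pd

dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])

# is_unique reports whether any label is repeated.
unique_flag = dup_labels.is_unique

# Selecting a duplicated label yields a Series with all matching rows.
s = pd.Series([1, 2, 3, 4], index=dup_labels)
foo_rows = s['foo']

# unique() drops the repeats and keeps first-seen order.
distinct = list(dup_labels.unique())

print(unique_flag, list(foo_rows), distinct)
```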

## Essential functionality
### Reindexing

In [33]:
obj.reindex(['c' ,'b', 'a'])

c    2
b    1
a    0
dtype: int64
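The cell above only reorders existing labels. When the new index contains a label that is not present, `reindex` introduces a missing value for it. A small sketch, with the extra label `'d'` chosen purely for illustration:

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])

# 'd' is not in the original index, so its value becomes NaN
# and the result is upcast to a floating-point dtype.
with_missing = obj.reindex(['c', 'b', 'a', 'd'])
print(with_missing)

# fill_value supplies a replacement for labels that are absent.
filled = obj.reindex(['c', 'b', 'a', 'd'], fill_value=0)
print(filled)
```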

### Dropping entries from an axis

In [34]:
obj.drop('c')

a    0
b    1
dtype: int64

In [35]:
frame

|   | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |


You can drop values from the columns by passing **axis = 1 or axis = 'columns'**

In [36]:
frame.drop('year', axis = 1)

|   | state | pop |
|---|---|---|
| 0 | Ohio | 1.5 |
| 1 | Ohio | 1.7 |
| 2 | Ohio | 3.6 |
| 3 | Nevada | 2.4 |
| 4 | Nevada | 2.9 |
| 5 | Nevada | 3.2 |


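Many functions that change the size or shape of a Series or DataFrame, such as drop, return a new object by default but can also modify the object in place (by passing `inplace=True`) without returning a new one.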
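The difference between the default behaviour and in-place modification can be seen in a short sketch. It rebuilds the same three-element Series used earlier so that it runs on its own:

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])

# Default: drop returns a new Series and leaves obj unchanged.
new_obj = obj.drop('c')
labels_after_default = list(obj.index)     # still ['a', 'b', 'c']

# inplace=True modifies obj itself and returns None.
result = obj.drop('c', inplace=True)
labels_after_inplace = list(obj.index)     # now ['a', 'b']

print(labels_after_default, result, labels_after_inplace)
```

In-place operations discard the dropped data, so the default form is usually the safer choice while exploring.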

In [38]:
obj

a    0
b    1
c    2
dtype: int64

In [39]:
obj['b':'c'] = 3

In [40]:
obj

a    0
b    3
c    3
dtype: int64

Slicing with labels behaves differently from normal Python slicing: the endpoint is inclusive, which is why both 'b' and 'c' were set to 3 above.
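To make the contrast concrete, here is a minimal sketch comparing a label slice with a positional slice on a fresh copy of the same Series:

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])

# Label slice: both endpoints 'b' and 'c' are included.
by_label = list(obj['b':'c'])

# Positional slice: the stop position is excluded, as in plain Python.
by_position = list(obj.iloc[1:2])

print(by_label)      # [1, 2]
print(by_position)   # [1]
```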

In [46]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [47]:
data[['three','one']]

|   | three | one |
|---|---|---|
| Ohio | 2 | 0 |
| Colorado | 6 | 4 |
| Utah | 10 | 8 |
| New York | 14 | 12 |


In [48]:
data[:2]

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |


Row selection: the slice syntax data[:2] is provided as a convenience.  
Column selection: pass a single element or a list to the [] operator.

In [50]:
data < 5

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | True | True | True | True |
| Colorado | True | False | False | False |
| Utah | False | False | False | False |
| New York | False | False | False | False |
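A boolean result like the one above is typically used as a mask. The sketch below shows two common uses, filtering rows by a condition on one column and overwriting matching cells. The `rows` and `masked` names are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A boolean Series selects the rows where the condition holds.
rows = data[data['three'] > 5]
print(rows)

# A boolean DataFrame selects individual cells for assignment.
masked = data.copy()          # work on a copy to keep data intact
masked[masked < 5] = 0
print(masked)
```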


### Selection with loc and iloc

>loc is for **axis** labels and iloc is for **integer** position

In [51]:
data.loc['Colorado',['two','three']]

two      5
three    6
Name: Colorado, dtype: int32

In [52]:
data.iloc[[1,2],[3,0,1]]

|   | four | one | two |
|---|---|---|---|
| Colorado | 7 | 4 | 5 |
| Utah | 11 | 8 | 9 |
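The same selection can be written with either indexer, and both accept slices. The sketch below rebuilds the DataFrame so that it runs on its own; note that the `loc` slice includes its endpoint while the `iloc` slice does not.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Equivalent selections: by label with loc, by position with iloc.
by_label = data.loc[['Colorado', 'Utah'], ['four', 'one', 'two']]
by_position = data.iloc[[1, 2], [3, 0, 1]]
same = by_label.equals(by_position)

# Slices: loc includes 'Utah'; iloc stops before position 2.
label_slice = data.loc[:'Utah', 'two'].tolist()    # [1, 5, 9]
position_slice = data.iloc[:2, 1].tolist()         # [1, 5]

print(same, label_slice, position_slice)
```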
