## The Beginning of Learning Pandas

In [6]:
import pandas as pd

### Series

A Series is a one-dimensional, array-like object containing a sequence of values and an associated array of labels, called its index.

In [7]:
obj = pd.Series([3,4,5,6])

In [8]:
obj

0    3
1    4
2    5
3    6
dtype: int64

In [11]:
obj.values

array([3, 4, 5, 6], dtype=int64)

In [12]:
obj.index

RangeIndex(start=0, stop=4, step=1)

Another way to think about a Series is as a fixed-length, ordered dict: a mapping of index labels to data values.
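The dict analogy can be sketched as follows. This is a minimal illustration, and the state names and numbers in sdata are made-up values, not data from this notebook.

```python
import pandas as pd

# Illustrative values only; the keys become the index labels.
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000}
s = pd.Series(sdata)

# Membership tests look at the index labels, just like dict keys.
has_ohio = 'Ohio' in s      # a label, so this is found
has_value = 35000 in s      # a value, not a label, so this is not found

# Label lookup works like a dict key lookup.
texas = s['Texas']
```

As with a dict, the in operator checks the labels rather than the stored values.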

### DataFrame

A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can hold a different value type.

In [13]:
data = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],
        'year':[2000,2001,2002,2001,2002,2003],
        'pop':[1.5,1.7,3.6,2.4,2.9,3.2]}
frame = pd.DataFrame(data)

In [14]:
frame

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2


In [15]:
frame.head()

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9


Passing columns arranges the DataFrame's columns in the specified order:

In [16]:
pd.DataFrame(data,columns = ['year', 'state', 'pop'])

   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
5  2003  Nevada  3.2
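A related behavior worth knowing: if the columns list names a column that is not a key in the dict, pandas creates it filled with missing values. The sketch below uses a shortened copy of the data, and 'debt' is a hypothetical column name chosen only for illustration.

```python
import pandas as pd

# Shortened copy of the notebook's data, for illustration.
data = {'state': ['Ohio', 'Nevada'],
        'year': [2000, 2001],
        'pop': [1.5, 2.4]}

# 'debt' is a hypothetical column that does not exist in data.
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])

# The requested-but-missing column is present and entirely NaN.
all_missing = frame2['debt'].isna().all()
```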


A column can be retrieved as a Series

In [18]:
frame['state']

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

In [19]:
frame.state

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object
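Attribute access is convenient but only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute. In this notebook's data the 'pop' column is such a collision, because DataFrame already has a pop method. A small sketch with a shortened frame:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

by_key = frame['pop']   # bracket notation always returns the column
by_attr = frame.pop     # attribute access finds the DataFrame.pop method instead

is_series = isinstance(by_key, pd.Series)
is_method = callable(by_attr) and not isinstance(by_attr, pd.Series)
```

Bracket notation is therefore the safer default for arbitrary column names.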

A row can also be retrieved as a Series

In [22]:
frame.loc[1]

state    Ohio
year     2001
pop       1.7
Name: 1, dtype: object
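With the default integer index, the label 1 and the position 1 happen to coincide. The sketch below, which assumes a hypothetical string index ('a', 'b', 'c') not used elsewhere in this notebook, shows the difference between label-based loc and position-based iloc.

```python
import pandas as pd

# The string index labels are an assumption made for this example.
frame = pd.DataFrame({'state': ['Ohio', 'Nevada', 'Nevada'],
                      'year': [2000, 2001, 2002]},
                     index=['a', 'b', 'c'])

row_by_label = frame.loc['b']      # select by index label
row_by_position = frame.iloc[1]    # select by integer position

# Both refer to the second row here.
same = row_by_label.equals(row_by_position)
```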

Columns can be modified by assignment, and assigning to a column that doesn't exist creates a new column.

In [23]:
frame['eastern'] = frame.state == 'Ohio'

In [24]:
frame

    state  year  pop  eastern
0    Ohio  2000  1.5     True
1    Ohio  2001  1.7     True
2    Ohio  2002  3.6     True
3  Nevada  2001  2.4    False
4  Nevada  2002  2.9    False
5  Nevada  2003  3.2    False


In [25]:
frame['eastern'] = 'True'

In [26]:
frame

    state  year  pop  eastern
0    Ohio  2000  1.5     True
1    Ohio  2001  1.7     True
2    Ohio  2002  3.6     True
3  Nevada  2001  2.4     True
4  Nevada  2002  2.9     True
5  Nevada  2003  3.2     True
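Assigning a scalar broadcasts it to every row. Note that the assignment above uses the string 'True', not the boolean True; the two print identically but are stored differently. A short sketch, where flag_str and flag_bool are hypothetical column names:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

frame['flag_str'] = 'True'   # string scalar, broadcast to every row
frame['flag_bool'] = True    # boolean scalar, broadcast to every row

first_str = frame['flag_str'].iloc[0]
bool_dtype = frame['flag_bool'].dtype   # a boolean dtype
```

Only the boolean column can be used directly as a mask, for example frame[frame['flag_bool']].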


In [27]:
del frame['eastern']

In [28]:
frame.columns

Index(['state', 'year', 'pop'], dtype='object')

In [29]:
frame.values

array([['Ohio', 2000, 1.5],
       ['Ohio', 2001, 1.7],
       ['Ohio', 2002, 3.6],
       ['Nevada', 2001, 2.4],
       ['Nevada', 2002, 2.9],
       ['Nevada', 2003, 3.2]], dtype=object)
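The dtype=object above is a consequence of the columns having different types: the array needs one dtype that can hold all of them. A sketch on a shortened frame, showing that selecting only the numeric columns gives a numeric array instead:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                      'year': [2000, 2001],
                      'pop': [1.5, 2.4]})

mixed = frame.values                      # strings + numbers -> object array
numeric = frame[['year', 'pop']].values   # int + float -> float array
```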

### Index Objects

pandas Index objects are responsible for holding the axis labels and other metadata.
Index objects are immutable, which makes it safer to share them among data structures.

In [31]:
obj = pd.Series(range(3),index = ['a','b','c'])
index = obj.index
index

Index(['a', 'b', 'c'], dtype='object')

In [32]:
index[1] = 'd'

TypeError: Index does not support mutable operations
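Because an Index cannot be modified in place, changing a label means building a new Index. The sketch below catches the error shown above and then uses delete and insert, each of which returns a new Index; this is one possible approach, not the only one.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

# Item assignment on an Index raises TypeError.
try:
    index[1] = 'd'
    mutated = True
except TypeError:
    mutated = False

# delete() and insert() return new Index objects; the original is untouched.
new_index = index.delete(1).insert(1, 'd')
```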

Unlike a Python set, a pandas Index can contain duplicate labels. An Index also provides methods such as append, delete, and drop, each of which returns a new Index.

In [34]:
dup_labels = pd.Index(['foo','foo','bar','bar'])
dup_labels

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

In [40]:
dup_labels.append(pd.Index(['ku']))

Index(['foo', 'foo', 'bar', 'bar', 'ku'], dtype='object')
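Duplicate labels affect selection: looking up a duplicated label returns every matching entry. A minimal sketch, where the Series values are made up for illustration:

```python
import pandas as pd

dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
unique = dup_labels.is_unique          # False, labels repeat

# Illustrative values attached to the duplicated labels.
s = pd.Series([1, 2, 3, 4], index=dup_labels)
foo_rows = s['foo']                    # a Series with both 'foo' entries

# append() returns a new Index and leaves dup_labels as it was.
appended = dup_labels.append(pd.Index(['ku']))
```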

## Essential functionality
### Reindexing

In [44]:
obj.reindex(['c' ,'b', 'a'])

c    2
b    1
a    0
dtype: int64
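Reindexing does more than reorder. If a requested label is not present, reindex introduces a missing value for it, or the value given by fill_value. A small sketch using the same obj, with 'd' as a label that does not exist in it:

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])

# 'd' is not in obj, so it becomes NaN and the dtype becomes float.
with_nan = obj.reindex(['a', 'b', 'c', 'd'])

# fill_value supplies a replacement for missing labels.
filled = obj.reindex(['a', 'b', 'c', 'd'], fill_value=0)
```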

### Dropping entries from an axis

In [45]:
obj.drop('c')

a    0
b    1
dtype: int64

In [46]:
frame

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2


You can drop values from the columns by passing axis = 1 or axis = 'columns'

In [48]:
frame.drop('year', axis = 1)

    state  pop
0    Ohio  1.5
1    Ohio  1.7
2    Ohio  3.6
3  Nevada  2.4
4  Nevada  2.9
5  Nevada  3.2
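The equivalent spellings can be compared side by side. This sketch uses a shortened frame and also shows the columns keyword, which drop accepts as an alternative to axis:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                      'year': [2000, 2001],
                      'pop': [1.5, 2.4]})

by_axis_number = frame.drop('year', axis=1)
by_axis_name = frame.drop('year', axis='columns')
by_keyword = frame.drop(columns='year')

# Without an axis argument, drop removes rows by index label.
without_first_row = frame.drop(0)
```

All three column forms return a new DataFrame and leave frame unchanged.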


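Many functions that modify the size or shape of a Series or DataFrame, like drop, can manipulate the object in place (by passing inplace=True) instead of returning a new object. By default they return a new object, so obj is unchanged by the earlier drop: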

In [49]:
obj

a    0
b    1
c    2
dtype: int64
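The difference between the default behavior and inplace=True can be sketched as follows. The name work is a hypothetical copy made so that the original obj stays intact.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])

# Default: drop returns a new Series and obj keeps all its labels.
returned = obj.drop('c')
still_there = 'c' in obj

# inplace=True modifies the object itself and returns None.
work = obj.copy()
result = work.drop('c', inplace=True)
```

Since inplace=True discards the dropped data, it is worth using with care.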