# Pandas examples

First, load the pandas library.

In [2]:
import pandas as pd

Example of how to construct a data frame

In [61]:
df = pd.DataFrame({ 'A' : 1.,
                    'B' : pd.Timestamp('20160630'),
                    'C' : pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D' : pd.Series([1, 2, 1, 2], dtype='int32'),
                    'E' : pd.Categorical(["test", "train", "test", "train"]),
                    'F' : 'foo'})
df

Unnamed: 0,A,B,C,D,E,F
0,1.0,2016-06-30,1.0,1,test,foo
1,1.0,2016-06-30,1.0,2,train,foo
2,1.0,2016-06-30,1.0,1,test,foo
3,1.0,2016-06-30,1.0,2,train,foo


Get column names

In [5]:
var_names = list(df.columns)
var_names

['A', 'B', 'C', 'D', 'E', 'F']

### Selecting cuts of data
Select only the column called A, then only the columns A and C

In [60]:
df['A']

0    1.0
1    1.0
2    1.0
3    1.0
Name: A, dtype: float64

In [7]:
df[['A','C']]

Unnamed: 0,A,C
0,1.0,1.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0
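The two selections above return different types: a single label gives a Series, while a list of labels gives a DataFrame, even when the list has only one entry. A minimal sketch, using a small hypothetical frame `demo` that stands in for `df`:

```python
import pandas as pd

# Hypothetical stand-in for df, with only columns A and C
demo = pd.DataFrame({'A': [1.0, 1.0], 'C': [1.0, 1.0]})

single = demo['A']       # one label -> Series
as_frame = demo[['A']]   # list with one label -> DataFrame with one column

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
```

The list form is useful when later code expects a DataFrame, for example when passing the selection to pd.merge.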


Select data by position with iloc, using the format [row, column].

In Python, positions begin at 0, so column 0 is the first column.

In [17]:
df.iloc[:,0]

0    1.0
1    1.0
2    1.0
3    1.0
Name: A, dtype: float64

In [41]:
df.iloc[0,:]

A                      1
B    2016-06-30 00:00:00
C                      1
D                      1
E                   test
F                    foo
Name: 0, dtype: object
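Position-based selection also accepts slices and negative positions. The sketch below uses a small hypothetical frame `demo` (not the `df` built earlier) to show a few common patterns:

```python
import pandas as pd

# Hypothetical stand-in frame with three columns
demo = pd.DataFrame({'A': [1.0, 1.0, 1.0, 1.0],
                     'D': [1, 2, 1, 2],
                     'F': ['foo'] * 4})

first_two_rows = demo.iloc[0:2, :]  # rows 0 and 1; the slice end is exclusive
last_col = demo.iloc[:, -1]         # negative positions count from the end
cell = demo.iloc[1, 1]              # single value: second row, second column

print(first_two_rows.shape)  # (2, 3)
print(last_col.name)         # F
print(cell)                  # 2
```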

Select the rows where 'D' equals 2, then display only columns D, E, and F for those rows

In [35]:
df.loc[df.D==2]

Unnamed: 0,A,B,C,D,E,F
1,1.0,2016-06-30,1.0,2,train,foo
3,1.0,2016-06-30,1.0,2,train,foo


In [36]:
df.loc[df.D==2,['D','E','F']]

Unnamed: 0,D,E,F
1,2,train,foo
3,2,train,foo
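The expression inside loc is a boolean Series with one True/False value per row, so several conditions can be combined with & (and) or | (or). A minimal sketch, assuming a hypothetical frame `demo`; each condition needs its own parentheses because & binds more tightly than ==:

```python
import pandas as pd

# Hypothetical stand-in frame
demo = pd.DataFrame({'D': [1, 2, 1, 2],
                     'E': ['test', 'train', 'test', 'train'],
                     'H': [10, 20, 30, 40]})

mask = demo['D'] == 2   # boolean Series, one entry per row
print(mask.tolist())    # [False, True, False, True]

# Rows where D is 2 AND H is greater than 20, keeping only columns D and H
both = demo.loc[(demo['D'] == 2) & (demo['H'] > 20), ['D', 'H']]
print(both.index.tolist())  # [3]
```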


### Joining
Create a second data frame to use in the joins

In [54]:
df_to_join = pd.DataFrame({ 'E' : pd.Categorical(["test", "train"]),
                            'G' : pd.Categorical(["test", "train"]),
                            'H' : pd.Series([10, 20], dtype='int32')})
df_to_join

Unnamed: 0,E,G,H
0,test,test,10
1,train,train,20


If the join column has the same name in both data sets

In [50]:
pd.merge(df, df_to_join, on='E')

Unnamed: 0,A,B,C,D,E,F,H
0,1.0,2016-06-30,1.0,1,test,foo,10
1,1.0,2016-06-30,1.0,1,test,foo,10
2,1.0,2016-06-30,1.0,2,train,foo,20
3,1.0,2016-06-30,1.0,2,train,foo,20


If the join columns have different names

In [57]:
pd.merge(df, df_to_join, how='left', left_on='E', right_on='G')

Unnamed: 0,A,B,C,D,E_x,F,E_y,G,H
0,1.0,2016-06-30,1.0,1,test,foo,test,test,10
1,1.0,2016-06-30,1.0,2,train,foo,train,train,20
2,1.0,2016-06-30,1.0,1,test,foo,test,test,10
3,1.0,2016-06-30,1.0,2,train,foo,train,train,20
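The E_x and E_y columns in the output above appear because both frames contain a column named E. pandas appends the default suffixes _x (left) and _y (right) to keep them apart. The suffixes parameter of pd.merge lets you choose clearer names. A minimal sketch with two hypothetical frames, `left` and `right`:

```python
import pandas as pd

# Hypothetical frames: both have a column E, but the join uses E on the left and G on the right
left = pd.DataFrame({'E': ['test', 'train'], 'D': [1, 2]})
right = pd.DataFrame({'E': ['test', 'train'],
                      'G': ['test', 'train'],
                      'H': [10, 20]})

# Default suffixes: the overlapping column E becomes E_x and E_y
default = pd.merge(left, right, how='left', left_on='E', right_on='G')
print(list(default.columns))

# Custom suffixes make the origin of each column explicit
named = pd.merge(left, right, how='left', left_on='E', right_on='G',
                 suffixes=('_left', '_right'))
print(list(named.columns))
```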


'how' can be 'left', 'right', 'inner', or 'outer'; the default is 'inner'
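The join types only differ when some keys are missing from one side, which is not the case in the examples above. The sketch below uses two hypothetical frames, `left` and `right`, whose keys only partly overlap, and collects the keys that survive each join type:

```python
import pandas as pd

# Hypothetical frames: 'a' exists only on the left, 'd' only on the right
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

keys_by_how = {}
for how in ['left', 'right', 'inner', 'outer']:
    result = pd.merge(left, right, on='key', how=how)
    keys_by_how[how] = sorted(result['key'])
    print(how, keys_by_how[how])

# Rows without a match on the other side get NaN in the missing columns
outer = pd.merge(left, right, on='key', how='outer')
print(outer)
```

A left join keeps every key from the left frame, a right join every key from the right frame, an inner join only the keys present in both, and an outer join the union of the two.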