# pandas
*(from the pandas webpage)* 

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
* A fast and efficient DataFrame object for data manipulation with integrated indexing;
* Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
* Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
* Flexible reshaping and pivoting of data sets;
* Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;


## Installing pandas
```
$ pip install pandas
```

```python
import pandas as pd
import numpy as np
```


Build a DataFrame one column at a time; the scalar `1` is broadcast to every row:

```python
df = pd.DataFrame()
df['score'] = np.abs(np.random.randn(10))
df['type'] = 1
df['name'] = ["John", "Lisa"] * 5
df['date'] = pd.date_range('20130101', periods=10)
df
```

|   | score | type | name | date |
|---|---|---|---|---|
| 0 | 3.082851 | 1 | John | 2013-01-01 |
| 1 | 0.264913 | 1 | Lisa | 2013-01-02 |
| 2 | 0.461223 | 1 | John | 2013-01-03 |
| 3 | 1.325031 | 1 | Lisa | 2013-01-04 |
| 4 | 1.83076 | 1 | John | 2013-01-05 |
| 5 | 0.686858 | 1 | Lisa | 2013-01-06 |
| 6 | 0.385308 | 1 | John | 2013-01-07 |
| 7 | 0.476707 | 1 | Lisa | 2013-01-08 |
| 8 | 1.144944 | 1 | John | 2013-01-09 |
| 9 | 2.253562 | 1 | Lisa | 2013-01-10 |
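A scalar is broadcast to every row, but a list must have exactly one value per row. The sketch below, assuming a reasonably recent pandas (the wording of the error message varies by version), shows the mismatch being rejected rather than padded:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': np.abs(np.random.randn(10))})
df['type'] = 1  # a scalar is broadcast to all 10 rows

mismatch_error = None
try:
    # 8 values for 10 rows: pandas raises instead of padding with NaN
    df['name'] = ["John", "Lisa"] * 4
except ValueError as exc:
    mismatch_error = exc

print(df['type'].tolist())
print(mismatch_error)
```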


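The same kind of frame can be built in a single call from a dict of columns:

```python
df2 = pd.DataFrame(
    {'score': np.abs(np.random.randn(10)),
     'type': 1,
     'name': ["John", "Lisa"] * 5,
     'date': pd.date_range('20130101', periods=10)},
    # Fix the column order; older pandas sorted dict keys alphabetically,
    # which is the order shown in all outputs below.
    columns=['date', 'name', 'score', 'type'],
)
df2.dtypes
```

```
date     datetime64[ns]
name             object
score           float64
type              int64
dtype: object
```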
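Because the scores come from `np.random.randn`, every run produces different numbers, which is why the tables in this notebook do not always agree. A minimal sketch of a repeatable version, assuming NumPy 1.17+ (where `np.random.default_rng` exists); the helper `make_frame` and the seed values are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd

def make_frame(seed):
    """Hypothetical helper: build a df2-style frame from a seeded generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {'date': pd.date_range('20130101', periods=10),
         'name': ["John", "Lisa"] * 5,
         'score': np.abs(rng.standard_normal(10)),
         'type': 1},
    )

a = make_frame(0)
b = make_frame(0)
pd.testing.assert_frame_equal(a, b)  # same seed, identical frame
print(a.head(3))
```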

## Viewing data
View the first and last rows of the frame (five rows each by default):

```python
df2.head()
```

|   | date | name | score | type |
|---|---|---|---|---|
| 0 | 2013-01-01 | John | 0.325451 | 1 |
| 1 | 2013-01-02 | Lisa | 0.522735 | 1 |
| 2 | 2013-01-03 | John | 0.414834 | 1 |
| 3 | 2013-01-04 | Lisa | 0.125078 | 1 |
| 4 | 2013-01-05 | John | 0.629536 | 1 |


```python
df2.tail()
```

|   | date | name | score | type |
|---|---|---|---|---|
| 5 | 2013-01-06 | Lisa | 0.642163 | 1 |
| 6 | 2013-01-07 | John | 0.856121 | 1 |
| 7 | 2013-01-08 | Lisa | 0.531495 | 1 |
| 8 | 2013-01-09 | John | 1.461605 | 1 |
| 9 | 2013-01-10 | Lisa | 0.026065 | 1 |
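Both methods take an explicit row count, and `head` also accepts a negative count, meaning "all rows except the last n". A short sketch on a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa"] * 5, 'score': range(10)})

first_three = df.head(3)         # rows 0-2
last_two = df.tail(2)            # rows 8-9
all_but_last_two = df.head(-2)   # negative n drops rows from the end

print(first_three.index.tolist())
print(last_two.index.tolist())
print(all_but_last_two.index.tolist())
```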


Display the index, the columns, and the underlying NumPy data:

```python
df2.index
```

```
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
```

```python
df2.columns
```

```
Index([u'date', u'name', u'score', u'type'], dtype='object')
```

```python
df2.values
```

```
array([[Timestamp('2013-01-01 00:00:00'), 'John', 0.3254510502498552, 1],
       [Timestamp('2013-01-02 00:00:00'), 'Lisa', 0.5227345297003183, 1],
       [Timestamp('2013-01-03 00:00:00'), 'John', 0.4148343096133987, 1],
       [Timestamp('2013-01-04 00:00:00'), 'Lisa', 0.12507802760152817, 1],
       [Timestamp('2013-01-05 00:00:00'), 'John', 0.6295359525128219, 1],
       [Timestamp('2013-01-06 00:00:00'), 'Lisa', 0.6421627378787833, 1],
       [Timestamp('2013-01-07 00:00:00'), 'John', 0.8561214745238536, 1],
       [Timestamp('2013-01-08 00:00:00'), 'Lisa', 0.5314945459587054, 1],
       [Timestamp('2013-01-09 00:00:00'), 'John', 1.4616045018386614, 1],
       [Timestamp('2013-01-10 00:00:00'), 'Lisa', 0.026064645359444586, 1]], dtype=object)
```
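Since pandas 0.24, `DataFrame.to_numpy()` is the recommended way to get this array. Because `df2` mixes dates, strings and numbers, the result falls back to `dtype=object`; selecting only the numeric columns yields a numeric array instead, with the integer `type` column upcast to float. A sketch on a small stand-in frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa"], 'score': [0.5, 1.5], 'type': [1, 1]})

mixed = df.to_numpy()                       # mixed column types -> object array
numeric = df[['score', 'type']].to_numpy()  # float64 + int64 -> float64 array

print(mixed.dtype, numeric.dtype)
print(numeric)
```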

Transposing your data

```python
df2.T
```

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| date | 2013-01-01 00:00:00 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00 | 2013-01-04 00:00:00 | 2013-01-05 00:00:00 | 2013-01-06 00:00:00 | 2013-01-07 00:00:00 | 2013-01-08 00:00:00 | 2013-01-09 00:00:00 | 2013-01-10 00:00:00 |
| name | John | Lisa | John | Lisa | John | Lisa | John | Lisa | John | Lisa |
| score | 0.3254511 | 0.5227345 | 0.4148343 | 0.125078 | 0.629536 | 0.6421627 | 0.8561215 | 0.5314945 | 1.461605 | 0.02606465 |
| type | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
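Transposing turns each original row into a column. A row of `df2` holds a date, a string and two numbers, so every column of the transposed frame ends up with the `object` dtype. A quick check on a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa"], 'score': [0.5, 1.5], 'type': [1, 1]})
t = df.T

print(t.shape)   # (2, 3) becomes (3, 2)
print(t.dtypes)  # each new column mixes str, float and int -> object
```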


Sort by the column labels (`axis=1`), in descending order:

```python
df2.sort_index(axis=1, ascending=False)
```

|   | type | score | name | date |
|---|---|---|---|---|
| 0 | 1 | 0.325451 | John | 2013-01-01 |
| 1 | 1 | 0.522735 | Lisa | 2013-01-02 |
| 2 | 1 | 0.414834 | John | 2013-01-03 |
| 3 | 1 | 0.125078 | Lisa | 2013-01-04 |
| 4 | 1 | 0.629536 | John | 2013-01-05 |
| 5 | 1 | 0.642163 | Lisa | 2013-01-06 |
| 6 | 1 | 0.856121 | John | 2013-01-07 |
| 7 | 1 | 0.531495 | Lisa | 2013-01-08 |
| 8 | 1 | 1.461605 | John | 2013-01-09 |
| 9 | 1 | 0.026065 | Lisa | 2013-01-10 |


Sort the rows by column values:

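```python
# DataFrame.sort() has been removed from pandas; sort_values() replaces it
df2.sort_values(by='score')
```

|   | date | name | score | type |
|---|---|---|---|---|
| 9 | 2013-01-10 | Lisa | 0.026065 | 1 |
| 3 | 2013-01-04 | Lisa | 0.125078 | 1 |
| 0 | 2013-01-01 | John | 0.325451 | 1 |
| 2 | 2013-01-03 | John | 0.414834 | 1 |
| 1 | 2013-01-02 | Lisa | 0.522735 | 1 |
| 7 | 2013-01-08 | Lisa | 0.531495 | 1 |
| 4 | 2013-01-05 | John | 0.629536 | 1 |
| 5 | 2013-01-06 | Lisa | 0.642163 | 1 |
| 6 | 2013-01-07 | John | 0.856121 | 1 |
| 8 | 2013-01-09 | John | 1.461605 | 1 |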


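Sort by `name` first, then by `score` within each name:

```python
df2.sort_values(['name', 'score'])
```

|   | date | name | score | type |
|---|---|---|---|---|
| 0 | 2013-01-01 | John | 0.325451 | 1 |
| 2 | 2013-01-03 | John | 0.414834 | 1 |
| 4 | 2013-01-05 | John | 0.629536 | 1 |
| 6 | 2013-01-07 | John | 0.856121 | 1 |
| 8 | 2013-01-09 | John | 1.461605 | 1 |
| 9 | 2013-01-10 | Lisa | 0.026065 | 1 |
| 3 | 2013-01-04 | Lisa | 0.125078 | 1 |
| 1 | 2013-01-02 | Lisa | 0.522735 | 1 |
| 7 | 2013-01-08 | Lisa | 0.531495 | 1 |
| 5 | 2013-01-06 | Lisa | 0.642163 | 1 |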
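`ascending` also accepts one flag per sort key, so each column can be sorted in its own direction. A sketch on deterministic, made-up scores (names A to Z, and within each name the highest score first):

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa"] * 3,
                   'score': [3, 1, 2, 6, 5, 4]})

# name ascending, score descending within each name
ordered = df.sort_values(['name', 'score'], ascending=[True, False])
print(ordered)
```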


## Selection

Selecting a single column returns a `Series`:

```python
df2['name']
```

```
0    John
1    Lisa
2    John
3    Lisa
4    John
5    Lisa
6    John
7    Lisa
8    John
9    Lisa
Name: name, dtype: object
```
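Passing a list of labels instead of a single label returns a `DataFrame`, even when the list has only one entry. A short sketch on a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa"], 'score': [0.5, 1.5], 'type': [1, 1]})

one_col = df['name']               # a single label -> Series
two_cols = df[['name', 'score']]   # a list of labels -> DataFrame

print(type(one_col).__name__, type(two_cols).__name__)
print(two_cols.columns.tolist())
```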

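Slicing with `[]` selects rows by position. (The tables from here on come from a later run of the `df2` cell, so the random scores differ from those above.)

```python
df2[0:2]
```

|   | date | name | score | type |
|---|---|---|---|---|
| 0 | 2013-01-01 | John | 0.895692 | 1 |
| 1 | 2013-01-02 | Lisa | 1.036331 | 1 |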


Use the `date` column as the index, so that rows can be selected by date:

```python
df2.index = df2['date']
```
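A common equivalent is `set_index`; with `drop=False` the `date` column is kept as a regular column as well, which matches the tables that follow. A sketch comparing the two approaches on a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('20130101', periods=3),
                   'name': ["John", "Lisa", "John"]})

by_assignment = df.copy()
by_assignment.index = by_assignment['date']       # the approach used above

by_set_index = df.set_index('date', drop=False)   # also keeps the 'date' column

print(by_set_index.index.name)
print(by_assignment.equals(by_set_index))
```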

`iloc` selects by integer position; the end of the slice is excluded:

```python
df2.iloc[0:3]
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-01 | 2013-01-01 | John | 0.895692 | 1 |
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-03 | 2013-01-03 | John | 0.14457 | 1 |


`loc` selects by label. With a date index, date strings can be used as labels, and both ends of the slice are included:

```python
df2.loc['2013-01-01':'2013-01-03']
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-01 | 2013-01-01 | John | 0.895692 | 1 |
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-03 | 2013-01-03 | John | 0.14457 | 1 |


`iloc` also takes a range of column positions:

```python
df2.iloc[0:3, 1:3]
```

| date | name | score |
|---|---|---|
| 2013-01-01 | John | 0.895692 |
| 2013-01-02 | Lisa | 1.036331 |
| 2013-01-03 | John | 0.14457 |
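The difference between positional and label-based endpoints is easy to trip over. A sketch on a small, made-up date-indexed frame; `.at` is used for a single cell so the lookup is an exact `Timestamp` match:

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa", "John", "Lisa"],
                   'score': [0.9, 1.0, 0.1, 0.7]},
                  index=pd.date_range('20130101', periods=4, name='date'))

by_position = df.iloc[0:2]                     # positions 0 and 1; position 2 is excluded
by_label = df.loc['2013-01-01':'2013-01-03']   # the end label 2013-01-03 is included
cell = df.at[pd.Timestamp('2013-01-03'), 'score']  # one row label + one column label

print(len(by_position), len(by_label), cell)
```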


### Boolean indexing

Keep only the rows where a condition holds:

```python
df2[df2['score'] > 1.0]
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-05 | 2013-01-05 | John | 1.346207 | 1 |
| 2013-01-07 | 2013-01-07 | John | 1.027406 | 1 |
| 2013-01-08 | 2013-01-08 | Lisa | 2.584891 | 1 |


The boolean mask can be stored in a variable and reused:

```python
high_scores = df2['score'] > 1.0
df2[high_scores]
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-05 | 2013-01-05 | John | 1.346207 | 1 |
| 2013-01-07 | 2013-01-07 | John | 1.027406 | 1 |
| 2013-01-08 | 2013-01-08 | Lisa | 2.584891 | 1 |


Combine conditions with `&` (and). Each comparison needs its own parentheses, because `&` binds more tightly than `>` and `==`:

```python
df2[(df2['score'] > 1.0) & (df2['name'] == 'Lisa')]
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-08 | 2013-01-08 | Lisa | 2.584891 | 1 |
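The same filters can also be written as a string with `DataFrame.query`, where columns are referenced by name and `and`/`or` replace `&`/`|`. A sketch on a small, made-up frame comparing both spellings:

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa", "John", "Lisa"],
                   'score': [0.9, 1.0, 1.3, 2.6]})

# Bracket form: each comparison wrapped in parentheses
both = df[(df['score'] > 1.0) & (df['name'] == 'Lisa')]
either = df[(df['score'] > 1.0) | (df['name'] == 'Lisa')]

# query() form: the expression is a string evaluated against the columns
both_q = df.query("score > 1.0 and name == 'Lisa'")
either_q = df.query("score > 1.0 or name == 'Lisa'")

print(both.equals(both_q), either.equals(either_q))
```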


Use `|` (or) to keep rows that satisfy either condition:

```python
df2[(df2['score'] > 1.0) | (df2['name'] == 'Lisa')]
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-04 | 2013-01-04 | Lisa | 0.715584 | 1 |
| 2013-01-05 | 2013-01-05 | John | 1.346207 | 1 |
| 2013-01-06 | 2013-01-06 | Lisa | 0.553321 | 1 |
| 2013-01-07 | 2013-01-07 | John | 1.027406 | 1 |
| 2013-01-08 | 2013-01-08 | Lisa | 2.584891 | 1 |
| 2013-01-10 | 2013-01-10 | Lisa | 0.353887 | 1 |


`isin` keeps the rows whose value appears in a list; since both names are listed here, every row matches:

```python
df2[df2['name'].isin(['Lisa', 'John'])]
```

| date | date | name | score | type |
|---|---|---|---|---|
| 2013-01-01 | 2013-01-01 | John | 0.895692 | 1 |
| 2013-01-02 | 2013-01-02 | Lisa | 1.036331 | 1 |
| 2013-01-03 | 2013-01-03 | John | 0.14457 | 1 |
| 2013-01-04 | 2013-01-04 | Lisa | 0.715584 | 1 |
| 2013-01-05 | 2013-01-05 | John | 1.346207 | 1 |
| 2013-01-06 | 2013-01-06 | Lisa | 0.553321 | 1 |
| 2013-01-07 | 2013-01-07 | John | 1.027406 | 1 |
| 2013-01-08 | 2013-01-08 | Lisa | 2.584891 | 1 |
| 2013-01-09 | 2013-01-09 | John | 0.958808 | 1 |
| 2013-01-10 | 2013-01-10 | Lisa | 0.353887 | 1 |
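`isin` is more telling when the list is a strict subset, and prefixing the mask with `~` inverts it to keep the rows whose value is *not* in the list. A sketch on a made-up frame (the extra name "Mary" is purely illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ["John", "Lisa", "Mary", "John"],
                   'score': [0.9, 1.0, 0.4, 1.3]})

wanted = ['John', 'Mary']
kept = df[df['name'].isin(wanted)]       # rows 0, 2, 3
dropped = df[~df['name'].isin(wanted)]   # row 1 only

print(kept['name'].tolist(), dropped['name'].tolist())
```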


## Some statistics

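The mean, standard deviation, and standard error of the mean of the scores:

```python
print(df2['score'].mean())
print(df2['score'].std())
print(df2['score'].sem())  # standard error of the mean
```

```
0.961669666062
0.671758211126
0.212428598408
```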
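pandas' `std()` is the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to `ddof=0`; `sem()` is the sample standard deviation divided by the square root of the number of values. For the output above, 0.671758 / √10 ≈ 0.212429. A quick check on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 1.5, 2.0, 4.0])

sample_std = s.std()                    # ddof=1 by default in pandas
population_std = np.std(s.to_numpy())  # ddof=0 by default in NumPy
sem = s.sem()                           # sample std / sqrt(n)

print(sample_std, population_std, sem)
```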


Group by name and by 3-day bins of the date index, then compute the mean and sum of the scores in each group:

```python
# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) bins the DatetimeIndex
df2.groupby(['name', pd.Grouper(freq='3D')]).agg({'score': ['mean', 'sum']})
```

| name | date | score (mean) | score (sum) |
|---|---|---|---|
| John | 2013-01-01 | 0.520131 | 1.040262 |
| John | 2013-01-04 | 1.346207 | 1.346207 |
| John | 2013-01-07 | 0.993107 | 1.986214 |
| Lisa | 2013-01-01 | 1.036331 | 1.036331 |
| Lisa | 2013-01-04 | 0.634453 | 1.268906 |
| Lisa | 2013-01-07 | 2.584891 | 2.584891 |
| Lisa | 2013-01-10 | 0.353887 | 0.353887 |
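Since pandas 0.25, `agg` also supports *named aggregation*, which produces flat column names instead of the two-level `score`/`mean`, `score`/`sum` header. A sketch on deterministic scores, assuming the dates sit in a regular column (hence `key='date'`) rather than in the index; the output names `mean_score` and `total_score` are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('20130101', periods=10),
                   'name': ["John", "Lisa"] * 5,
                   'score': [float(i) for i in range(1, 11)]})

summary = df.groupby(['name', pd.Grouper(key='date', freq='3D')]).agg(
    mean_score=('score', 'mean'),   # new column = (input column, function)
    total_score=('score', 'sum'),
)
print(summary)
```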
