# Pandas Basics

## Index access

In [12]:
import pandas as pd

df = pd.read_json('data/nwinners.json')
df.head()

,born_in,category,country,date_of_birth,gender,link,name,place_of_birth,place_of_death,text,year
0,,Physics,Austria,25 April 1900,male,http://en.wikipedia.org/wiki/Wolfgang_Pauli,Wolfgang Pauli,Vienna,Zurich,"Wolfgang Pauli , Physics, 1945",1945
1,Austria,Chemistry,,3 December 1900,male,http://en.wikipedia.org/wiki/Richard_Kuhn,Richard Kuhn *,Vienna,Heidelberg,"Richard Kuhn *, Chemistry, 1938",1938
2,,Physiology or Medicine,Australia,27 January 1903,male,http://en.wikipedia.org/wiki/John_Eccles_(neur...,John Carew Eccles,Melbourne,Locarno,"John Carew Eccles , Physiology or Medicine, 1963",1963
3,,Physiology or Medicine,Australia,3 September 1899,male,http://en.wikipedia.org/wiki/Frank_Macfarlane_...,Sir Frank Macfarlane Burnet,Traralgon,Melbourne,"Sir Frank Macfarlane Burnet , Physiology or Me...",1960
4,,Physiology or Medicine,Australia,24 September 1898,male,http://en.wikipedia.org/wiki/Howard_Florey,Howard Florey,Adelaide,Oxford,"Howard Florey , Physiology or Medicine, 1945",1945


Initially, the columns of a Pandas DataFrame are labeled by its columns property, which is a Pandas Index instance. The rows also start out with a single numeric index, which is accessed through the index property. (Pandas supports other kinds of index as well, including string or datetime indices and multiple index levels, if necessary.)

In [13]:
df.columns

Index([u'born_in', u'category', u'country', u'date_of_birth', u'gender',
       u'link', u'name', u'place_of_birth', u'place_of_death', u'text',
       u'year'],
      dtype='object')

In [14]:
df.index

Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068],
           dtype='int64', length=1069)
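
As a minimal sketch of these two properties, the following builds a tiny stand-in DataFrame by hand instead of loading `data/nwinners.json`. The three rows are copied from outputs shown in this notebook; the variable names are illustrative only.

```python
import pandas as pd

# Three-row stand-in for the winners data (values taken from the outputs in this notebook)
toy = pd.DataFrame({
    'name': ['Wolfgang Pauli', 'Richard Kuhn *', 'Albert Einstein'],
    'category': ['Physics', 'Chemistry', 'Physics'],
    'year': [1945, 1938, 1921],
})

column_labels = list(toy.columns)  # column labels, in the order they were given
row_labels = list(toy.index)       # default integer row labels: 0, 1, 2

print(column_labels)
print(row_labels)
```

Both properties are index objects, which is why they are wrapped in `list` here to get plain Python lists.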

Often, to aid selection, a column of the dataframe is set as the index with the set_index method. Rows can then be selected by label with loc:

In [15]:
df = df.set_index('name')
df.loc['Albert Einstein']

name,born_in,category,country,date_of_birth,gender,link,place_of_birth,place_of_death,text,year
Albert Einstein,,Physics,Switzerland,14 March 1879,male,http://en.wikipedia.org/wiki/Albert_Einstein,Ulm,Princeton,"Albert Einstein , born in Germany , Physics, ...",1921
Albert Einstein,,Physics,Germany,14 March 1879,male,http://en.wikipedia.org/wiki/Albert_Einstein,Ulm,Princeton,"Albert Einstein , Physics, 1921",1921
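
The output above contains two rows because the name index is not unique. A small sketch of that behaviour, assuming a hand-built stand-in DataFrame (values copied from this notebook's outputs, variable names illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['Albert Einstein', 'Albert Einstein', 'Wolfgang Pauli'],
    'country': ['Switzerland', 'Germany', 'Austria'],
    'year': [1921, 1921, 1945],
})

# set_index returns a new DataFrame; toy keeps 'name' as an ordinary column
by_name = toy.set_index('name')

einstein = by_name.loc['Albert Einstein']  # label appears twice: DataFrame with both rows
pauli = by_name.loc['Wolfgang Pauli']      # label appears once: a single Series

print(type(einstein).__name__, len(einstein))
print(type(pauli).__name__, pauli['country'])
```

Code that expects a single row from `loc` may therefore need to handle the duplicate-label case.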


In [16]:
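df.reset_index()  # returns a new DataFrame; df itself keeps 'name' as its index
pass  # a trailing statement suppresses the cell's output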

In [17]:
df.iloc[2]

born_in                                                            
category                                     Physiology or Medicine
country                                                   Australia
date_of_birth                                       27 January 1903
gender                                                         male
link              http://en.wikipedia.org/wiki/John_Eccles_(neur...
place_of_birth                                            Melbourne
place_of_death                                              Locarno
text               John Carew Eccles , Physiology or Medicine, 1963
year                                                           1963
Name: John Carew Eccles, dtype: object
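
The two cells above can be sketched on a small stand-in DataFrame (values copied from this notebook's outputs, variable names illustrative). It shows that `reset_index` returns a new DataFrame rather than modifying the original, and that `iloc` selects by position whatever the index labels are.

```python
import pandas as pd

toy = pd.DataFrame(
    {'year': [1945, 1938, 1963]},
    index=pd.Index(
        ['Wolfgang Pauli', 'Richard Kuhn *', 'John Carew Eccles'], name='name'
    ),
)

restored = toy.reset_index()  # new DataFrame: 'name' is a column again, rows are 0..2
third = toy.iloc[2]           # row at position 2, regardless of the string labels

print(list(restored.columns))
print(third.name, third['year'])
```

To keep the reset result, assign it to a variable as done here with `restored`.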

You can get a column with dot notation or with conventional array access by keyword string:

In [18]:
gender_col = df['gender']  # or df.gender
gender_col.head()

name
Wolfgang Pauli                 male
Richard Kuhn *                 male
John Carew Eccles              male
Sir Frank Macfarlane Burnet    male
Howard Florey                  male
Name: gender, dtype: object
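
A short sketch comparing the two access styles on a stand-in DataFrame. The `place of birth` label with spaces is a hypothetical column name, added only to show a case where dot notation cannot be written.

```python
import pandas as pd

toy = pd.DataFrame({
    'gender': ['male', 'male'],
    'date_of_birth': ['25 April 1900', '3 December 1900'],
    'place of birth': ['Vienna', 'Vienna'],  # hypothetical label containing spaces
})

# Bracket and dot access return the same column
same = toy['gender'].equals(toy.gender)

# A label that is not a valid Python identifier needs bracket access
spaced = toy['place of birth']

print(same)
print(list(spaced))
```

Bracket access works for any column label, so it is the safer default when column names are not known in advance.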

## Grouping

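To split the rows into groups by the values of a column, use groupby. It returns a GroupBy object rather than a dataframe; the get_group method of that object then returns a new, filtered dataframe for a single group.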

In [19]:
df = df.groupby('category')  # note: df now holds a GroupBy object, not a DataFrame
df.groups.keys()

[u'',
 u'Physiology or Medicine',
 u'Literature',
 u'Economics',
 u'Peace',
 u'Chemistry',
 u'Physics']

In [20]:
phy_group = df.get_group('Physics')
phy_group.head()

name,born_in,category,country,date_of_birth,gender,link,place_of_birth,place_of_death,text,year
Wolfgang Pauli,,Physics,Austria,25 April 1900,male,http://en.wikipedia.org/wiki/Wolfgang_Pauli,Vienna,Zurich,"Wolfgang Pauli , Physics, 1945",1945
Arthur H. Compton,,Physics,United States,10 September 1892,male,http://en.wikipedia.org/wiki/Arthur_H._Compton,Wooster,Berkeley,"Arthur H. Compton , Physics, 1927",1927
Robert A. Millikan,,Physics,United States,22 March 1868,male,http://en.wikipedia.org/wiki/Robert_A._Millikan,Morrison,San Marino,"Robert A. Millikan , Physics, 1923",1923
Albert A. Michelson,,Physics,United States,19 December 1852,male,http://en.wikipedia.org/wiki/Albert_A._Michelson,Province of Posen,Pasadena,"Albert A. Michelson , born in then Germany, n...",1907
Ernest Lawrence,,Physics,United States,8 August 1901,male,http://en.wikipedia.org/wiki/Ernest_Lawrence,Canton,Palo Alto,"Ernest Lawrence , Physics, 1939",1939
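
The grouping steps can be sketched on a stand-in DataFrame (values copied from this notebook's outputs). The variable names are illustrative, and the GroupBy object is kept under its own name here so the original DataFrame stays available.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['Wolfgang Pauli', 'Richard Kuhn *', 'Ernest Lawrence'],
    'category': ['Physics', 'Chemistry', 'Physics'],
    'year': [1945, 1938, 1939],
})

by_category = toy.groupby('category')       # GroupBy object; toy is left unchanged
group_names = sorted(by_category.groups)    # the distinct category values
physics = by_category.get_group('Physics')  # plain DataFrame holding the Physics rows
counts = by_category.size()                 # number of rows per category, as a Series

print(group_names)
print(list(physics['name']))
print(counts['Physics'])
```

Keeping the GroupBy object separate avoids having to reload the data before further DataFrame operations.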


Another way to select a subset of rows is with a boolean mask (as in NumPy and R):

In [21]:
df = pd.read_json('data/nwinners.json')  # reload: df was rebound to the GroupBy object above
phy_group = df[df.category == 'Physics']
phy_group.head()

,born_in,category,country,date_of_birth,gender,link,name,place_of_birth,place_of_death,text,year
0,,Physics,Austria,25 April 1900,male,http://en.wikipedia.org/wiki/Wolfgang_Pauli,Wolfgang Pauli,Vienna,Zurich,"Wolfgang Pauli , Physics, 1945",1945
11,,Physics,United States,10 September 1892,male,http://en.wikipedia.org/wiki/Arthur_H._Compton,Arthur H. Compton,Wooster,Berkeley,"Arthur H. Compton , Physics, 1927",1927
13,,Physics,United States,22 March 1868,male,http://en.wikipedia.org/wiki/Robert_A._Millikan,Robert A. Millikan,Morrison,San Marino,"Robert A. Millikan , Physics, 1923",1923
25,,Physics,United States,19 December 1852,male,http://en.wikipedia.org/wiki/Albert_A._Michelson,Albert A. Michelson,Province of Posen,Pasadena,"Albert A. Michelson , born in then Germany, n...",1907
29,,Physics,United States,8 August 1901,male,http://en.wikipedia.org/wiki/Ernest_Lawrence,Ernest Lawrence,Canton,Palo Alto,"Ernest Lawrence , Physics, 1939",1939
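
A minimal sketch of how the mask works, again on a stand-in DataFrame (values copied from this notebook's outputs, variable names illustrative). The comparison produces a boolean Series, and indexing with it keeps only the rows marked True. The combined condition on `year` is an added illustration, not part of the original cell.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['Wolfgang Pauli', 'Richard Kuhn *', 'Ernest Lawrence'],
    'category': ['Physics', 'Chemistry', 'Physics'],
    'year': [1945, 1938, 1939],
})

is_physics = toy['category'] == 'Physics'  # boolean Series, one value per row
physics = toy[is_physics]                  # keeps the rows where the mask is True

# Masks combine with & and |; each condition needs its own parentheses
late_physics = toy[is_physics & (toy['year'] > 1940)]

print(list(is_physics))
print(list(physics.index))
print(list(late_physics['name']))
```

The filtered result keeps the original row labels (0 and 2 here), which matches the non-consecutive index values in the output above.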
