# Working with Indexes

In [31]:
import pandas as pd

In [32]:
bond = pd.read_csv("~/Projects/DataAnalysisPandas/Data/jamesbond.csv")
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


## The .set_index() and .reset_index() Methods
By default, pandas assigns a numeric index starting at 0. We'll call the .set_index() and .reset_index() methods to change the index of a DataFrame.

In [33]:
# set index column 
bond.set_index('Film',inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [34]:
# .reset_index() replaces the current index with the default numeric one.
# The drop parameter controls what happens to the old index:
#   drop=False (the default) moves it back into the DataFrame as a regular column;
#   drop=True discards it.
# .set_index() does the opposite: it moves a column out of the DataFrame and into the index.
bond.reset_index(drop = False, inplace=True)
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
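The round trip above can also be written without `inplace=True`, which makes the effect of the `drop` parameter easier to compare side by side. The following is a minimal sketch; it assumes a three-row frame built by hand from the rows shown above rather than the full CSV, and the variable names are illustrative only.

```python
import pandas as pd

# A few rows copied from the output above (assumption: the full CSV is not needed here).
bond = pd.DataFrame({
    "Film": ["Dr. No", "From Russia with Love", "Goldfinger"],
    "Year": [1962, 1963, 1964],
    "Actor": ["Sean Connery", "Sean Connery", "Sean Connery"],
})

# Without inplace=True, set_index() returns a new DataFrame and leaves `bond` untouched.
by_film = bond.set_index("Film")

# drop=False (the default) moves the index back into an ordinary column.
restored = by_film.reset_index()

# drop=True discards the index values instead of restoring them.
discarded = by_film.reset_index(drop=True)

print(list(restored.columns))
print(list(discarded.columns))
```

Returning new objects instead of mutating in place keeps the original frame available, which can be convenient while experimenting in a notebook.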


## Retrieve Rows by Index Label with .loc[] 
One or more rows can be extracted from a DataFrame based on index position or index labels. We'll use the .loc[] method to retrieve rows based on index label.

In [35]:
# set film column to index 
bond.set_index('Film',inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [36]:
# sort index
bond.sort_index(inplace=True)

In [37]:
# .loc[] returns a Series when the label matches exactly one row, and a DataFrame when it matches several.
# Looking up a single label that is not in the index raises a KeyError.
bond.loc['Goldfinger']

Year                         1964
Actor                Sean Connery
Director             Guy Hamilton
Box Office                  820.4
Budget                       18.6
Bond Actor Salary             3.2
Name: Goldfinger, dtype: object

In [38]:
bond.loc['Casino Royale']

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
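The three outcomes of a single-label lookup (one match, several matches, no match) can be checked in one place. This is a small sketch on a hand-built frame that reuses rows from the outputs above; `'Gold Test'` is simply a label that does not exist.

```python
import pandas as pd

bond = pd.DataFrame(
    {
        "Year": [2006, 1967, 1964],
        "Actor": ["Daniel Craig", "David Niven", "Sean Connery"],
    },
    index=pd.Index(["Casino Royale", "Casino Royale", "Goldfinger"], name="Film"),
)

unique_hit = bond.loc["Goldfinger"]        # label occurs once  -> Series
duplicate_hit = bond.loc["Casino Royale"]  # label occurs twice -> DataFrame

try:
    bond.loc["Gold Test"]                  # label is not in the index
except KeyError:
    missing_raises = True
else:
    missing_raises = False

print(type(unique_hit).__name__, type(duplicate_hit).__name__, missing_raises)
```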


In [39]:
# extract multiple sequential rows (slice the DataFrame)
# remember that label-based slices include both the first and the last label
bond.loc['GoldenEye':'Moonraker']

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,


In [40]:
# extract multiple non-sequential rows
bond.loc[['Octopussy','Moonraker']]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Octopussy,1983,Roger Moore,John Glen,373.8,53.9,7.8
Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,


In [41]:
# be very careful when extracting multiple labels: in this pandas version, a label that
# does not exist is added to the result as a row of NaN values (with a warning);
# newer pandas versions raise a KeyError instead
bond.loc[['Octopussy','Moonraker', 'Gold Test']]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike


Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Octopussy,1983.0,Roger Moore,John Glen,373.8,53.9,7.8
Moonraker,1979.0,Roger Moore,Lewis Gilbert,535.0,91.5,
Gold Test,,,,,,
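The warning above suggests .reindex() as the replacement for list lookups that may contain missing labels. Here is a sketch of two ways to handle that case, on a two-row frame built by hand from the rows above. One caveat: .reindex() requires unique index labels, so on the full dataset (which contains 'Casino Royale' twice) the filtering approach is likely the safer of the two.

```python
import pandas as pd

bond = pd.DataFrame(
    {"Year": [1983, 1979], "Actor": ["Roger Moore", "Roger Moore"]},
    index=pd.Index(["Octopussy", "Moonraker"], name="Film"),
)

wanted = ["Octopussy", "Moonraker", "Gold Test"]

# Option 1: keep only the requested labels that actually exist in the index.
present = bond.loc[bond.index.isin(wanted)]

# Option 2: reindex() keeps every requested label and fills missing rows with NaN.
padded = bond.reindex(wanted)

print(present)
print(padded)
```

As in the output above, the `Year` column in the padded result is converted to floats, because an integer column cannot hold NaN.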


## Retrieve Rows by Index Position with .iloc[]
One or more rows can be extracted from a DataFrame based on index position or index labels. We'll use the .iloc[] method to retrieve rows based on index position.

In [42]:
bond.reset_index(inplace=True)
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
1,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
2,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
3,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
4,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [43]:
bond.iloc[15]

Film                 On Her Majesty's Secret Service
Year                                            1969
Actor                                 George Lazenby
Director                               Peter R. Hunt
Box Office                                     291.5
Budget                                          37.3
Bond Actor Salary                                0.6
Name: 15, dtype: object

In [45]:
# remember that with integer positions the upper bound is exclusive
bond.iloc[1:4]

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
1,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
2,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
3,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8


In [46]:
bond.set_index('Film', inplace=True)
bond.sort_index(inplace=True)
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [47]:
# even though you don't see them, rows under a string index still have integer positions
bond.iloc[:4]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
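The difference between positional and label-based slicing is easy to mix up, so here is a short side-by-side sketch. It assumes a four-row frame with unique film titles (unlike the full dataset), which lets `Index.get_loc()` return a single integer position.

```python
import pandas as pd

films = ["A View to a Kill", "Casino Royale", "Diamonds Are Forever", "Die Another Day"]
bond = pd.DataFrame(
    {"Year": [1985, 2006, 1971, 2002]},
    index=pd.Index(films, name="Film"),
)

# Positional slice: rows at positions 1 and 2; position 3 is excluded.
by_position = bond.iloc[1:3]

# Label slice: both end labels are included, so this returns three rows.
by_label = bond.loc["Casino Royale":"Die Another Day"]

# get_loc() translates a label into its integer position.
position = bond.index.get_loc("Diamonds Are Forever")

print(len(by_position), len(by_label), position)
```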


## The Catch-All .ix[] Method
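The .ix[] method retrieves DataFrame rows based on either index label or index position, combining the behavior of the .loc[] and .iloc[] methods. Note that .ix[] was deprecated in pandas 0.20 and removed in pandas 1.0, so the cells in this section only run on older versions; on current versions, use .loc[] or .iloc[] instead.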

In [48]:
# works like the .loc[] method
bond.ix['GoldenEye']

Year                            1995
Actor                 Pierce Brosnan
Director             Martin Campbell
Box Office                     518.5
Budget                          76.9
Bond Actor Salary                5.1
Name: GoldenEye, dtype: object

In [49]:
# works like the .iloc[] method
bond.ix[10]

Year                           1989
Actor                Timothy Dalton
Director                  John Glen
Box Office                    250.9
Budget                         56.7
Bond Actor Salary               7.9
Name: Licence to Kill, dtype: object
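On pandas 1.0 and later, where .ix[] no longer exists, the two lookups above can be written with the explicit indexers. This is a sketch on a three-row frame built by hand, so the positional example uses position 2 rather than 10.

```python
import pandas as pd

bond = pd.DataFrame(
    {
        "Year": [1995, 1964, 1989],
        "Actor": ["Pierce Brosnan", "Sean Connery", "Timothy Dalton"],
    },
    index=pd.Index(["GoldenEye", "Goldfinger", "Licence to Kill"], name="Film"),
)

# Label lookup: replaces bond.ix['GoldenEye']
by_label = bond.loc["GoldenEye"]

# Position lookup: replaces bond.ix[10] (position 2 in this small frame)
by_position = bond.iloc[2]

print(by_label["Actor"], "|", by_position.name)
```

Choosing the indexer explicitly also removes the ambiguity .ix[] had when the index itself contained integers.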

## Second Arguments to .loc[], .iloc[], and .ix[] Methods 
The .loc[], .iloc[], and .ix[] methods can take second arguments to specify the column(s) that should be extracted. We'll practice extracting movies from our dataset with this syntax.

In [50]:
# using the .loc[] method
bond.loc['Moonraker', ['Actor','Budget','Year']]

Actor     Roger Moore
Budget           91.5
Year             1979
Name: Moonraker, dtype: object

In [53]:
# using the .iloc[] method
bond.iloc[14, 2:5]

Director      John Glen
Box Office        373.8
Budget             53.9
Name: Octopussy, dtype: object

In [54]:
# using the .ix[] method (mix and match the .loc and .iloc)
bond.ix[20, ['Budget', 'Year']]

Budget    27.7
Year      1974
Name: The Man with the Golden Gun, dtype: object
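Mixing a row position with column labels, as the last cell does, needs a small translation step when .ix[] is not available. Below are two possible approaches, sketched on a two-row frame built from values shown above. The first converts the row position to a label and assumes that label is unique in the index; the second converts the column labels to positions.

```python
import pandas as pd

bond = pd.DataFrame(
    {
        "Year": [1979, 1974],
        "Actor": ["Roger Moore", "Roger Moore"],
        "Budget": [91.5, 27.7],
    },
    index=pd.Index(["Moonraker", "The Man with the Golden Gun"], name="Film"),
)

# Approach 1: turn the row position into a label, then use .loc[].
row_label = bond.index[1]
via_loc = bond.loc[row_label, ["Budget", "Year"]]

# Approach 2: turn the column labels into positions, then use .iloc[].
col_positions = bond.columns.get_indexer(["Budget", "Year"])
via_iloc = bond.iloc[1, col_positions]

print(via_loc)
print(via_iloc)
```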

## Set New Values for a Specific Cell or Row 
We can assign a new value to one cell in a DataFrame. We first select the cell by using the .ix[] method with a row and column argument, then set its value with the assignment operator (=).

In [56]:
# set a new actor value for the Dr. No movie
bond.ix['Dr. No','Actor'] = 'Adam'
bond.ix['Dr. No']

Year                          1962
Actor                         Adam
Director             Terence Young
Box Office                   448.8
Budget                           7
Bond Actor Salary              0.6
Name: Dr. No, dtype: object

In [58]:
# set new values for multiple columns
bond.ix['Dr. No',['Box Office','Budget']] = [25, 1000000]
bond.ix['Dr. No']

Year                          1962
Actor                         Adam
Director             Terence Young
Box Office                      25
Budget                       1e+06
Bond Actor Salary              0.6
Name: Dr. No, dtype: object
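The same assignments work with .loc[], which is the form to use on pandas versions without .ix[]. A minimal sketch follows, on a two-row frame built by hand; the replacement values are the same placeholder values used in the cells above.

```python
import pandas as pd

bond = pd.DataFrame(
    {
        "Actor": ["Sean Connery", "Sean Connery"],
        "Box Office": [448.8, 543.8],
        "Budget": [7.0, 12.6],
    },
    index=pd.Index(["Dr. No", "From Russia with Love"], name="Film"),
)

# Set a single cell by row label and column label.
bond.loc["Dr. No", "Actor"] = "Adam"

# Set several columns of one row at once; values are matched to columns in order.
bond.loc["Dr. No", ["Box Office", "Budget"]] = [25.0, 1_000_000.0]

# .at[] is a scalar-only accessor for reading or writing exactly one cell.
print(bond.at["Dr. No", "Actor"], bond.at["Dr. No", "Budget"])
```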

## Set Multiple Values in DataFrame
We can assign a new value to multiple cells in a DataFrame. We'll use the .ix[] method to extract a subset from a DataFrame, then reassign all column values in that subset.

In [60]:
# a boolean comparison produces a mask: a Series of True/False values, one per row
# the mask is not a copy or view of the DataFrame; passing it to an indexer selects the matching rows
# best practice when assigning multiple values: select and assign in a single indexing operation
extract = bond['Actor'] == 'Sean Connery'
bond.ix[extract]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


In [62]:
# assign a new value to multiple rows
bond.ix[extract, 'Actor'] = 'Sir Sean Connery'
bond.ix[extract]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
From Russia with Love,1963,Sir Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sir Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sir Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sir Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sir Sean Connery,Lewis Gilbert,514.2,59.9,4.4
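The mask-based assignment carries over directly to .loc[]. The sketch below uses a three-row frame built by hand; note that the mask is computed once, before the assignment, so it keeps selecting the same rows even after the `Actor` values have changed.

```python
import pandas as pd

bond = pd.DataFrame(
    {
        "Actor": ["Sean Connery", "Roger Moore", "Sean Connery"],
        "Year": [1964, 1979, 1965],
    },
    index=pd.Index(["Goldfinger", "Moonraker", "Thunderball"], name="Film"),
)

# Boolean mask: True for every row whose Actor is Sean Connery.
is_connery = bond["Actor"] == "Sean Connery"

# Select rows with the mask and assign to one column in a single operation.
bond.loc[is_connery, "Actor"] = "Sir Sean Connery"

print(bond.loc[is_connery])
```

Doing the row selection and the column assignment in one indexing call avoids the chained form `bond[mask]['Actor'] = ...`, which may assign to a temporary copy instead of the original frame.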


## Rename Index Labels or Columns in a DataFrame
Call the .rename() method on a DataFrame to change the names of the index labels or column names. The method takes a Python dictionary in which each key is a current name and each value is the new name. We'll also discuss an alternative syntax (the .columns attribute) for changing the column names.
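Both approaches can be sketched on a small frame built from rows shown earlier. The new names used here ('Release Year', 'Gross', 'Doctor No') are made up for illustration and do not come from the dataset.

```python
import pandas as pd

bond = pd.DataFrame(
    {"Year": [1962, 1964], "Box Office": [448.8, 820.4]},
    index=pd.Index(["Dr. No", "Goldfinger"], name="Film"),
)

# Rename selected columns: keys are current names, values are new names.
renamed_cols = bond.rename(columns={"Year": "Release Year", "Box Office": "Gross"})

# Rename selected index labels with the same dictionary convention.
renamed_rows = bond.rename(index={"Dr. No": "Doctor No"})

# Alternative: overwrite the .columns attribute with a full list of names.
# The list must contain one name per column, in the existing column order.
relabeled = bond.copy()
relabeled.columns = ["Release Year", "Gross"]

print(list(renamed_cols.columns), list(renamed_rows.index), list(relabeled.columns))
```

.rename() only touches the names listed in the dictionary, whereas assigning to .columns replaces every column name at once.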