
# Index objects and labeled data

## pandas Data Structures

 - Key building blocks
	 - Indexes: Sequence of labels
	 - Series: 1D array with Index
	 - DataFrames: 2D array with Series as columns

 - Indexes
	 - Immutable (like dictionary keys)
	 - Homogeneous in data type (like NumPy arrays)
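
A minimal sketch of the two Index properties listed above. The names `idx` and `mixed` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Illustrative index; the labels are chosen only for this sketch
idx = pd.Index(['Mon', 'Tue', 'Wed'])

# Immutable: assigning to a single position raises TypeError
try:
    idx[0] = 'Monday'
    mutated = True
except TypeError:
    mutated = False

# Homogeneous: one dtype describes every label, so mixed
# values fall back to the generic object dtype
mixed = pd.Index([1, 'two', 3.0])

print(mutated)
print(mixed.dtype)
```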


In [1]:
import pandas as pd 
prices = [10.70, 10.86, 10.74, 10.71, 10.79]
shares = pd.Series(prices)
shares 

0    10.70
1    10.86
2    10.74
3    10.71
4    10.79
dtype: float64

## Creating an index

In [3]:
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
shares = pd.Series(prices, index = days)
shares 

Mon     10.70
Tue     10.86
Wed     10.74
Thur    10.71
Fri     10.79
dtype: float64
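
Once a Series has labels, values can be looked up by label instead of by position. A short sketch using the same `prices` and `days`; the names `wed_price` and `midweek` are illustrative.

```python
import pandas as pd

prices = [10.70, 10.86, 10.74, 10.71, 10.79]
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
shares = pd.Series(prices, index=days)

# Look up a single value by its label
wed_price = shares['Wed']

# Label slices with .loc include both endpoints
midweek = shares.loc['Tue':'Thur']

print(wed_price)
print(midweek)
```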

## Examining an index

In [4]:
shares.index 

Index(['Mon', 'Tue', 'Wed', 'Thur', 'Fri'], dtype='object')

In [5]:
shares.index[2] 

'Wed'

In [6]:
shares.index[:2]

Index(['Mon', 'Tue'], dtype='object')

In [7]:
shares.index[-2:]

Index(['Thur', 'Fri'], dtype='object')

In [10]:
print(shares.index.name)

None


## Modifying index name

In [11]:
shares.index.name = 'weekday'
shares 

weekday
Mon     10.70
Tue     10.86
Wed     10.74
Thur    10.71
Fri     10.79
dtype: float64

In [13]:
# Index does not support mutable operations
shares.index[2] = 'Wednesday' 

TypeError: Index does not support mutable operations

In [14]:
# Index does not support mutable operations
shares.index[:4] = ['Monday', 'Tuesday', 'Wednesday', 'Thursday']

TypeError: Index does not support mutable operations

In [15]:
shares.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'] 
shares 

Monday       10.70
Tuesday      10.86
Wednesday    10.74
Thursday     10.71
Friday       10.79
dtype: float64
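
Replacing the whole index works, but when only one label needs to change, `Series.rename` with a mapping may be more convenient. This is a sketch of an alternative, not what the notebook above does; `renamed` is an illustrative name.

```python
import pandas as pd

prices = [10.70, 10.86, 10.74, 10.71, 10.79]
shares = pd.Series(prices, index=['Mon', 'Tue', 'Wed', 'Thur', 'Fri'])

# rename returns a new Series; labels missing from the mapping are kept
renamed = shares.rename(index={'Wed': 'Wednesday'})

print(renamed.index.tolist())
print(shares.index.tolist())  # the original Series is unchanged
```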

## Gapminder data

In [16]:
region = pd.read_csv('gapminder_tidy.csv')
region.head()

       Country  Year  fertility    life  population  child_mortality     gdp      region
0  Afghanistan  1964      7.671  33.639  10474903.0            339.7  1182.0  South Asia
1  Afghanistan  1965      7.671  34.152  10697983.0            334.1  1182.0  South Asia
2  Afghanistan  1966      7.671  34.662  10927724.0            328.7  1168.0  South Asia
3  Afghanistan  1967      7.671  35.170  11163656.0            323.3  1173.0  South Asia
4  Afghanistan  1968      7.671  35.674  11411022.0            318.1  1187.0  South Asia


In [17]:
region.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10111 entries, 0 to 10110
Data columns (total 8 columns):
Country            10111 non-null object
Year               10111 non-null int64
fertility          10100 non-null float64
life               10111 non-null float64
population         10108 non-null float64
child_mortality    9210 non-null float64
gdp                9000 non-null float64
region             10111 non-null object
dtypes: float64(5), int64(1), object(2)
memory usage: 632.1+ KB


## Assigning the index

In [18]:
region.index = region['region'] 
region.head()

                Country  Year  fertility    life  population  child_mortality     gdp      region
region
South Asia  Afghanistan  1964      7.671  33.639  10474903.0            339.7  1182.0  South Asia
South Asia  Afghanistan  1965      7.671  34.152  10697983.0            334.1  1182.0  South Asia
South Asia  Afghanistan  1966      7.671  34.662  10927724.0            328.7  1168.0  South Asia
South Asia  Afghanistan  1967      7.671  35.170  11163656.0            323.3  1173.0  South Asia
South Asia  Afghanistan  1968      7.671  35.674  11411022.0            318.1  1187.0  South Asia


## Removing extra column

In [None]:
del region['region'] 

In [25]:
region.head() 

                Country  Year  fertility    life  population  child_mortality     gdp
region
South Asia  Afghanistan  1964      7.671  33.639  10474903.0            339.7  1182.0
South Asia  Afghanistan  1965      7.671  34.152  10697983.0            334.1  1182.0
South Asia  Afghanistan  1966      7.671  34.662  10927724.0            328.7  1168.0
South Asia  Afghanistan  1967      7.671  35.170  11163656.0            323.3  1173.0
South Asia  Afghanistan  1968      7.671  35.674  11411022.0            318.1  1187.0
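
The two steps above (assign the column as the index, then delete the column) can usually be combined with `DataFrame.set_index`. A minimal sketch on a two-row stand-in for the gapminder data; `df` and `by_region` are illustrative names.

```python
import pandas as pd

# Two rows copied from the head() output above, as a small stand-in
df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan'],
    'Year': [1964, 1965],
    'life': [33.639, 34.152],
    'region': ['South Asia', 'South Asia'],
})

# set_index moves the column into the index and, by default,
# drops it from the columns in the same step
by_region = df.set_index('region')

print(by_region)
```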


## Examining index & columns

In [27]:
region.index.unique() 

Index(['South Asia', 'Europe & Central Asia', 'Middle East & North Africa',
       'Sub-Saharan Africa', 'America', 'East Asia & Pacific'],
      dtype='object', name='region')

In [28]:
region.index.name 

'region'

In [29]:
type(region.index) 

pandas.core.indexes.base.Index

In [30]:
region.columns 

Index(['Country', 'Year', 'fertility', 'life', 'population', 'child_mortality',
       'gdp'],
      dtype='object')

## read_csv with index_col

In [31]:
df_region = pd.read_csv('gapminder_tidy.csv', index_col = 'region')
df_region.head() 

                Country  Year  fertility    life  population  child_mortality     gdp
region
South Asia  Afghanistan  1964      7.671  33.639  10474903.0            339.7  1182.0
South Asia  Afghanistan  1965      7.671  34.152  10697983.0            334.1  1182.0
South Asia  Afghanistan  1966      7.671  34.662  10927724.0            328.7  1168.0
South Asia  Afghanistan  1967      7.671  35.170  11163656.0            323.3  1173.0
South Asia  Afghanistan  1968      7.671  35.674  11411022.0            318.1  1187.0


In [55]:
month_sales = pd.read_csv('month_sales.csv', index_col='Unnamed: 0')
month_sales

     eggs  salt  spam
APR    77  87.0    20
FEB   110  50.0    31
JAN    47  12.0    17
JUN   205  60.0    55
MAR   221  89.0    72
MAY   132   0.0    52
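
`index_col` accepts either a column name or a column position. The sketch below uses an inline string in place of `month_sales.csv`; the assumed file layout (an empty header for the first column) is a guess based on the `'Unnamed: 0'` name used above.

```python
import io
import pandas as pd

# Inline stand-in for month_sales.csv (assumed layout): the first
# column has an empty header, which pandas reads as 'Unnamed: 0'
csv_text = """,eggs,salt,spam
APR,77,87.0,20
FEB,110,50.0,31
JAN,47,12.0,17
"""

# Select the index column by name ...
by_name = pd.read_csv(io.StringIO(csv_text), index_col='Unnamed: 0')

# ... or by position
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(by_position)
```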


## Changing index of a DataFrame

In [56]:
# Create the list of new indexes: new_idx
new_idx = [x.upper() for x in month_sales.index]

# Assign new_idx to month_sales.index
month_sales.index = new_idx

# Print the month_sales DataFrame
print(month_sales)

     eggs  salt  spam
APR    77  87.0    20
FEB   110  50.0    31
JAN    47  12.0    17
JUN   205  60.0    55
MAR   221  89.0    72
MAY   132   0.0    52


# Hierarchical indexing

In [75]:
sales = pd.read_csv('eggs_sales.csv', index_col=['Unnamed: 0', 'Unnamed: 1'])
# A MultiIndex takes one name per level, set through .names (plural)
sales.index.names = ['state', 'month']
sales

             eggs  salt  spam
state month
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   0.0    52
      2       205  60.0    55
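
When the data starts out flat, a MultiIndex can also be built by passing a list of columns to `set_index`. A sketch under the assumption that the state and month values are available as ordinary columns; `flat` is an illustrative name.

```python
import pandas as pd

# Flat version of the sales table shown above
flat = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs': [47, 110, 221, 77, 132, 205],
    'salt': [12.0, 50.0, 89.0, 87.0, 0.0, 60.0],
    'spam': [17, 31, 72, 20, 52, 55],
})

# A list of column names produces one index level per column
sales = flat.set_index(['state', 'month'])

print(sales.index.names)
print(sales.index.nlevels)
```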


## Extracting data with a MultiIndex

In [77]:
# Print sales.loc[['CA', 'TX']]
print(sales.loc[['CA', 'TX']])

             eggs  salt  spam
state month
CA    1        47  12.0    17
      2       110  50.0    31
TX    1       132   0.0    52
      2       205  60.0    55


In [78]:
# Print sales['CA':'TX']
print(sales['CA':'TX'])

             eggs  salt  spam
state month
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   0.0    52
      2       205  60.0    55


## Setting & sorting a MultiIndex

In [80]:
sales = sales.sort_index()
sales 

             eggs  salt  spam
state month
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   0.0    52
      2       205  60.0    55
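
The file above already happens to be in sorted order, so `sort_index()` changes nothing visible. The sketch below enters the same egg counts out of order to show the effect; `unsorted` and `ordered` are illustrative names.

```python
import pandas as pd

# Same egg counts as above, deliberately entered out of order
unsorted = pd.DataFrame(
    {'eggs': [132, 47, 77, 205, 221, 110]},
    index=pd.MultiIndex.from_tuples(
        [('TX', 1), ('CA', 1), ('NY', 2), ('TX', 2), ('NY', 1), ('CA', 2)],
        names=['state', 'month'],
    ),
)

print(unsorted.index.is_monotonic_increasing)  # False

# sort_index orders by the outer level first, then the inner level
ordered = unsorted.sort_index()
print(ordered.index.is_monotonic_increasing)   # True

# Label slicing across the outer level relies on this sorted order
print(ordered.loc['CA':'NY'])
```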


## Indexing multiple levels of a MultiIndex

In [82]:
# Look up data for NY in month 1 in sales: NY_month1
# A full (state, month) tuple selects a single row, returned as a Series
NY_month1 = sales.loc[('NY', 1), :]
NY_month1

eggs    221.0
salt     89.0
spam     72.0
Name: (NY, 1), dtype: float64


In [83]:
# Look up data for CA and TX in month 2: CA_TX_month2
# The list selects several outer labels; the tuple pairs it with the inner label
CA_TX_month2 = sales.loc[(['CA', 'TX'], 2), :]
CA_TX_month2

             eggs  salt  spam
state month
CA    2       110  50.0    31
TX    2       205  60.0    55


In [84]:
# Access the inner month index and look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
all_month2

             eggs  salt  spam
state month
CA    2       110  50.0    31
NY    2        77  87.0    20
TX    2       205  60.0    55
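
`slice(None)` can be hard to read inside a tuple. Two alternatives that should give the same month-2 rows are `pd.IndexSlice` and `DataFrame.xs`. This is a sketch on a one-column copy of the sales data, assuming the levels are named `state` and `month`.

```python
import pandas as pd

sales = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205]},
    index=pd.MultiIndex.from_product(
        [['CA', 'NY', 'TX'], [1, 2]], names=['state', 'month']
    ),
)

# pd.IndexSlice allows ':' where slice(None) would otherwise be needed
idx = pd.IndexSlice
all_month2 = sales.loc[idx[:, 2], :]

# xs selects on a named level and drops that level from the result
month2_by_state = sales.xs(2, level='month')

print(all_month2)
print(month2_by_state)
```

Note that `xs` returns a frame indexed by `state` only, while the `.loc` version keeps both levels.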


## Changing index name labels

In [60]:
# Assign the string 'MONTHS' to month_sales.index.name
month_sales.index.name = 'MONTHS'

# Print the month_sales DataFrame
print(month_sales)

PRODUCTS  eggs  salt  spam
MONTHS                    
APR         77  87.0    20
FEB        110  50.0    31
JAN         47  12.0    17
JUN        205  60.0    55
MAR        221  89.0    72
MAY        132   0.0    52


In [59]:
# Assign the string 'PRODUCTS' to month_sales.columns.name
month_sales.columns.name = 'PRODUCTS'

# Print the month_sales DataFrame again
print(month_sales)

PRODUCTS  eggs  salt  spam
MONTHS                    
APR         77  87.0    20
FEB        110  50.0    31
JAN         47  12.0    17
JUN        205  60.0    55
MAR        221  89.0    72
MAY        132   0.0    52
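
Both names can also be set in one call with `DataFrame.rename_axis`, which returns a new DataFrame instead of modifying the existing one. A sketch on a three-row copy of the data; `labeled` is an illustrative name.

```python
import pandas as pd

month_sales = pd.DataFrame(
    {'eggs': [77, 110, 47], 'salt': [87.0, 50.0, 12.0], 'spam': [20, 31, 17]},
    index=['APR', 'FEB', 'JAN'],
)

# rename_axis names the row axis and the column axis together
# and leaves month_sales itself untouched
labeled = month_sales.rename_axis(index='MONTHS', columns='PRODUCTS')

print(labeled)
```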


## Building an index, then a DataFrame

In [62]:
# Generate the list of months: months
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

# Assign months to month_sales.index
month_sales.index = months

# Print the modified month_sales DataFrame
print(month_sales)

PRODUCTS  eggs  salt  spam
Jan         77  87.0    20
Feb        110  50.0    31
Mar         47  12.0    17
Apr        205  60.0    55
May        221  89.0    72
Jun        132   0.0    52
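
Assigning a list to `.index` relabels rows by position, so the row that was `APR` (77 eggs) is now labeled `Jan`. If the intent is to keep each row attached to its own month, one possible approach is to transform the existing labels and then reorder them. This is a sketch of that alternative, not the notebook's method; `relabeled` and `calendar` are illustrative names.

```python
import pandas as pd

month_sales = pd.DataFrame(
    {'eggs': [77, 110, 47, 205, 221, 132]},
    index=['APR', 'FEB', 'JAN', 'JUN', 'MAR', 'MAY'],
)

# Title-case each existing label so every row keeps its own month
relabeled = month_sales.copy()
relabeled.index = month_sales.index.str.title()

# Then put the rows into calendar order by label
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
calendar = relabeled.reindex(months)

print(calendar)
```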


## Extracting data with a MultiIndex

In [94]:
sales.loc[['CA', 'TX']]

             eggs  salt  spam
state month
CA    1        47  12.0    17
      2       110  50.0    31
TX    1       132   0.0    52
      2       205  60.0    55


In [93]:
sales['CA':'TX']

             eggs  salt  spam
state month
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   0.0    52
      2       205  60.0    55


In [92]:
sales['TX':'CA':-1]

             eggs  salt  spam
state month
TX    2       205  60.0    55
      1       132   0.0    52
NY    2        77  87.0    20
      1       221  89.0    72
CA    2       110  50.0    31
      1        47  12.0    17
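
The slices above use plain `[]` on the DataFrame. A more explicit form, sketched here on a one-column copy of the data, is `.loc` for label slices and `sort_index(ascending=False)` for the reversed order. The names `subset` and `reversed_sales` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205]},
    index=pd.MultiIndex.from_product(
        [['CA', 'NY', 'TX'], [1, 2]], names=['state', 'month']
    ),
)

# Label slice on the outer level; both endpoints are included
subset = sales.loc['CA':'NY']

# Descending sort on every level reverses the row order
reversed_sales = sales.sort_index(ascending=False)

print(subset)
print(reversed_sales)
```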
