# Index objects and labeled data


## Index values and names


## Changing index of a DataFrame


As you saw in the previous exercise, indexes are immutable objects. This means that if you want to change the index of a DataFrame, you need to replace the whole index at once. You will do this now, using a list comprehension to create the new index.

A list comprehension is a succinct way to generate a list in one line. For example, the following list comprehension generates a list that contains the cubes of all integers from 0 to 9: `cubes = [i**3 for i in range(10)]`. This is equivalent to the following code:

cubes = []
for i in range(10):
    cubes.append(i**3)

Before getting started, print the `sales` DataFrame and verify that its index consists of month abbreviations containing lowercase characters (for example, `'Jan'`).
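A minimal sketch of the immutability rule, using a hypothetical three-row DataFrame `df` rather than the exercise data: assigning to a single element of an Index raises `TypeError`, so the only way to change labels is to assign a complete new index. The last step shows the `Index.str` accessor as a vectorized alternative to the list comprehension.

```python
import pandas as pd

# Hypothetical toy data, not the exercise's sales table
df = pd.DataFrame({'eggs': [47, 110, 221]}, index=['Jan', 'Feb', 'Mar'])

# Index objects are immutable: item assignment raises TypeError
try:
    df.index[0] = 'JAN'
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False

# Replace the whole index instead, here with a list comprehension
df.index = [month.upper() for month in df.index]
print(list(df.index))  # ['JAN', 'FEB', 'MAR']

# Vectorized alternative for string labels: the Index.str accessor
df.index = df.index.str.lower()
print(list(df.index))  # ['jan', 'feb', 'mar']
```

The list comprehension works for any transformation you can write in Python; the `.str` accessor is shorter when the transformation is a standard string method.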

In [6]:
import pandas as pd

sales_dict = {'eggs': {'Apr': 77,
  'Feb': 110,
  'Jan': 47,
  'Jun': 205,
  'Mar': 221,
  'May': 132},
 'salt': {'Apr': 87.0,
  'Feb': 50.0,
  'Jan': 12.0,
  'Jun': 60.0,
  'Mar': 89.0,
  'May': float('nan')},  # a real missing value, not the string 'nan'
 'spam': {'Apr': 20, 'Feb': 31, 'Jan': 17, 'Jun': 55, 'Mar': 72, 'May': 52}}

sales = pd.DataFrame(sales_dict)

In [7]:
# Create the list of new indexes: new_idx
new_idx = [month.upper() for month in sales.index]

# Assign new_idx to sales.index
sales.index = new_idx

# Print the sales DataFrame
print(sales)

     eggs  salt  spam
APR    77  87.0    20
FEB   110  50.0    31
JAN    47  12.0    17
JUN   205  60.0    55
MAR   221  89.0    72
MAY   132   NaN    52


## Changing index name labels


In [8]:
# Assign the string 'MONTHS' to sales.index.name
sales.index.name = 'MONTHS'

# Print the sales DataFrame
print(sales)

# Assign the string 'PRODUCTS' to sales.columns.name
sales.columns.name = 'PRODUCTS'

# Print the sales DataFrame again
print(sales)

        eggs  salt  spam
MONTHS                  
APR       77  87.0    20
FEB      110  50.0    31
JAN       47  12.0    17
JUN      205  60.0    55
MAR      221  89.0    72
MAY      132   NaN    52
PRODUCTS  eggs  salt  spam
MONTHS                    
APR         77  87.0    20
FEB        110  50.0    31
JAN         47  12.0    17
JUN        205  60.0    55
MAR        221  89.0    72
MAY        132   NaN    52


## Building an index, then a DataFrame


In [None]:
# Put the rows in calendar order, then drop the month labels,
# leaving a default integer index to be replaced below
sales = sales.loc[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN']].reset_index(drop=True)

In [22]:
# Generate the list of months: months
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

# Assign months to sales.index
sales.index = months

# Print the modified sales DataFrame
print(sales)

PRODUCTS  eggs  salt  spam
Jan         47  12.0    17
Feb        110  50.0    31
Mar        221  89.0    72
Apr         77  87.0    20
May        132   NaN    52
Jun        205  60.0    55
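The same kind of table can also be built in one step by constructing the index first, with its name, and passing it to the `DataFrame` constructor. This is a sketch assuming the six months of eggs/salt/spam values used throughout this notebook, in calendar order; `monthly` is a hypothetical variable name. `rename_axis` is used for the column label instead of assigning to `.columns.name`, because it returns a new object rather than modifying one in place.

```python
import pandas as pd

# Assumed values: the notebook's figures, listed in calendar order
data = {'eggs': [47, 110, 221, 77, 132, 205],
        'salt': [12.0, 50.0, 89.0, 87.0, float('nan'), 60.0],
        'spam': [17, 31, 72, 20, 52, 55]}

# Build the index first (including its name), then the DataFrame
months = pd.Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'], name='MONTHS')
monthly = pd.DataFrame(data, index=months)

# rename_axis returns a copy whose column axis is labeled 'PRODUCTS'
monthly = monthly.rename_axis('PRODUCTS', axis='columns')
print(monthly.index.name, monthly.columns.name)  # MONTHS PRODUCTS
```

Passing `index=` at construction time avoids the separate step of overwriting a default integer index.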


# Hierarchical indexing


## Extracting data with a MultiIndex


In [29]:
filename = 'datasets/sales.csv'

sales = pd.read_csv(filename, index_col=[0,1])

In [30]:
# Print sales.loc[['CA', 'TX']]
print(sales.loc[['CA', 'TX']])

# Print sales.loc['CA':'TX'] (a label slice includes both endpoints)
print(sales.loc['CA':'TX'])

             eggs  salt  spam
state month                  
CA    1        47  12.0    17
      2       110  50.0    31
TX    1       132   NaN    52
      2       205  60.0    55
             eggs  salt  spam
state month                  
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   NaN    52
      2       205  60.0    55
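The CSV file is not included here, so this sketch rebuilds an in-memory table with the same rows as the printed output above (an assumption about the file's contents) and shows three selection forms side by side. Slicing a MultiIndex by label works reliably only when the index is sorted, which holds for these rows.

```python
import pandas as pd

# Assumed contents of datasets/sales.csv, rebuilt in memory
index = pd.MultiIndex.from_tuples(
    [('CA', 1), ('CA', 2), ('NY', 1), ('NY', 2), ('TX', 1), ('TX', 2)],
    names=['state', 'month'])
sales = pd.DataFrame({'eggs': [47, 110, 221, 77, 132, 205],
                      'salt': [12.0, 50.0, 89.0, 87.0, float('nan'), 60.0],
                      'spam': [17, 31, 72, 20, 52, 55]}, index=index)

# A list of outer labels picks exactly those groups
picked = sales.loc[['CA', 'TX']]

# A label slice on the outer level includes both endpoints
sliced = sales.loc['CA':'TX']

# A full (state, month) tuple selects a single row, returned as a Series
one_row = sales.loc[('NY', 2)]

print(len(picked), len(sliced))  # 4 6
```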


## Setting & sorting a MultiIndex


In [31]:
sales = sales.reset_index()

In [35]:
# Set the index to be the columns ['state', 'month']: sales
sales = sales.set_index(['state', 'month'])

# Sort the MultiIndex: sales
sales = sales.sort_index()

# Print the sales DataFrame
print(sales)

             eggs  salt  spam
state month                  
CA    1        47  12.0    17
      2       110  50.0    31
NY    1       221  89.0    72
      2        77  87.0    20
TX    1       132   NaN    52
      2       205  60.0    55
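A small sketch, on hypothetical out-of-order rows, of why the sort step matters: `set_index` keeps rows in their original order, while `sort_index` reorders them by `state` first and then by `month`. The `is_monotonic_increasing` property confirms the result.

```python
import pandas as pd

# Hypothetical long-format rows arriving out of order
df = pd.DataFrame({'state': ['TX', 'CA', 'NY', 'CA'],
                   'month': [2, 2, 1, 1],
                   'eggs': [205, 110, 221, 47]})

# Turn the two key columns into a MultiIndex; row order is unchanged
df = df.set_index(['state', 'month'])
print(df.index.is_monotonic_increasing)  # False

# Sort by the outer level first, then by the inner level
df = df.sort_index()
print(df.index.is_monotonic_increasing)  # True
```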


## Using .loc[] with nonunique indexes


In [36]:
sales = sales.reset_index()

In [39]:
# Set the index to the column 'state': sales
sales = sales.set_index(['state'])

# Print the sales DataFrame
print(sales)

# Access the data from 'NY'
print(sales.loc['NY'])

       month  eggs  salt  spam
state                         
CA         1    47  12.0    17
CA         2   110  50.0    31
NY         1   221  89.0    72
NY         2    77  87.0    20
TX         1   132   NaN    52
TX         2   205  60.0    55
       month  eggs  salt  spam
state                         
NY         1   221  89.0    72
NY         2    77  87.0    20
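Whether `.loc` returns one row or several depends on how often the label occurs, which you can check in advance with `index.is_unique`. A sketch on assumed data in which `TX` appears only once:

```python
import pandas as pd

# Assumed data: 'state' repeats, so the index is not unique
sales = pd.DataFrame({'state': ['CA', 'CA', 'NY', 'NY', 'TX'],
                      'month': [1, 2, 1, 2, 1],
                      'eggs': [47, 110, 221, 77, 132]}).set_index('state')
print(sales.index.is_unique)  # False

# A repeated label returns every matching row as a DataFrame
ny = sales.loc['NY']
print(type(ny).__name__, len(ny))  # DataFrame 2

# A label that occurs once returns that row as a Series
tx = sales.loc['TX']
print(type(tx).__name__)  # Series
```

Code that consumes the result of `.loc` on a nonunique index should therefore be ready for either type.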


## Indexing multiple levels of a MultiIndex


In [41]:
sales = sales.reset_index().set_index(['state', 'month'])

In [85]:
# Look up data for NY in month 1: NY_month1
NY_month1 = sales.loc[('NY', 1), :]

# Look up data for CA and TX in month 2: CA_TX_month2
CA_TX_month2 = sales.loc[(['CA', 'TX'], 2), :]

# Look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
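`pd.IndexSlice` offers a more readable spelling of the `slice(None)` tuple in the last lookup, and `DataFrame.xs` selects a cross-section by level name. A sketch on an in-memory copy of the eggs column (an assumption, since the CSV is not included here):

```python
import pandas as pd

# Assumed data: the eggs column rebuilt with the same (state, month) keys
index = pd.MultiIndex.from_product([['CA', 'NY', 'TX'], [1, 2]],
                                   names=['state', 'month'])
sales = pd.DataFrame({'eggs': [47, 110, 221, 77, 132, 205]}, index=index)

# Equivalent to sales.loc[(slice(None), 2), :]
idx = pd.IndexSlice
all_month2 = sales.loc[idx[:, 2], :]

# xs selects month 2 by level name and drops that level from the result
month2 = sales.xs(2, level='month')
print(month2.index.tolist())  # ['CA', 'NY', 'TX']
```

`.loc` with `IndexSlice` keeps both index levels, while `xs` drops the selected level by default, which is convenient when the month is no longer needed.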