# Advanced indexing


## 1. Changing index of a DataFrame
As you saw in the previous exercise, indexes are immutable objects. To change the index of a DataFrame, you therefore have to replace the whole index rather than modify individual labels. You will do this now, using a list comprehension to create the new index.

A list comprehension is a succinct way to generate a list in one line. For example, the following list comprehension generates a list that contains the cubes of all numbers from 0 to 9: `cubes = [i**3 for i in range(10)]`. This is equivalent to the following code:

```python
cubes = []
for i in range(10):
    cubes.append(i**3)
```
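
The two ideas above, immutable index labels and building a replacement index with a list comprehension, can be sketched together. This is a minimal illustration on a small hypothetical DataFrame (`demo`, which is not the exercise's `sales` data): assigning to a single label should raise a `TypeError`, while assigning a whole new list to `.index` works.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration
demo = pd.DataFrame({"eggs": [47, 110]}, index=["Jan", "Feb"])

# Index objects are immutable: changing one label in place fails
try:
    demo.index[0] = "JAN"
    in_place_failed = False
except TypeError:
    in_place_failed = True

# Replacing the whole index at once is allowed
demo.index = [label.upper() for label in demo.index]

print(in_place_failed)   # True
print(list(demo.index))  # ['JAN', 'FEB']
```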

In [1]:
# Import required packages
import pandas as pd

In [36]:
# Import data
sales = pd.read_csv("data/sales/sales.csv", index_col = "month")
sales

| month | eggs | salt | spam |
|-------|------|------|------|
| Jan   | 47   | 12.0 | 17   |
| Feb   | 110  | 50.0 | 31   |
| Mar   | 221  | 89.0 | 72   |
| Apr   | 77   | 87.0 | 20   |
| May   | 132  | NaN  | 52   |
| Jun   | 205  | 60.0 | 55   |
Jun,205,60.0,55


In [37]:
# Create the list of new indexes: new_idx
new_idx = [x.upper() for x in sales.index]

# Assign new_idx to sales.index
sales.index = new_idx

# Print the sales DataFrame
sales

|     | eggs | salt | spam |
|-----|------|------|------|
| JAN | 47   | 12.0 | 17   |
| FEB | 110  | 50.0 | 31   |
| MAR | 221  | 89.0 | 72   |
| APR | 77   | 87.0 | 20   |
| MAY | 132  | NaN  | 52   |
| JUN | 205  | 60.0 | 55   |


Well done! Notice the DataFrame's new index!



## 2. Changing index name labels
Notice that in the previous exercise, the index was not labeled with a name. In this exercise, you will set its name to 'MONTHS'.

Similarly, if all the columns are related in some way, you can provide a label for the set of columns.

In [38]:
# Assign the string 'MONTHS' to sales.index.name
sales.index.name = "MONTHS"

# Print the sales DataFrame
sales

| MONTHS | eggs | salt | spam |
|--------|------|------|------|
| JAN    | 47   | 12.0 | 17   |
| FEB    | 110  | 50.0 | 31   |
| MAR    | 221  | 89.0 | 72   |
| APR    | 77   | 87.0 | 20   |
| MAY    | 132  | NaN  | 52   |
| JUN    | 205  | 60.0 | 55   |


In [39]:
# Assign the string 'PRODUCTS' to sales.columns.name 
sales.columns.name = "PRODUCTS"

# Print the sales dataframe again
sales

| PRODUCTS | eggs | salt | spam |
|----------|------|------|------|
| MONTHS   |      |      |      |
| JAN      | 47   | 12.0 | 17   |
| FEB      | 110  | 50.0 | 31   |
| MAR      | 221  | 89.0 | 72   |
| APR      | 77   | 87.0 | 20   |
| MAY      | 132  | NaN  | 52   |
| JUN      | 205  | 60.0 | 55   |


Wonderful work! Notice how in the first DataFrame only the index has a label, while in the second DataFrame both the index and the columns have labels.
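
As a side note, both labels can also be set in a single call. The sketch below uses `DataFrame.rename_axis` on a small hypothetical frame (`demo`); it returns a new DataFrame rather than modifying the original, which may be preferable when chaining operations.

```python
import pandas as pd

# Hypothetical frame used only for illustration
demo = pd.DataFrame(
    {"eggs": [47, 110], "salt": [12.0, 50.0]},
    index=["JAN", "FEB"],
)

# Label the row index and the column index in one step
labeled = demo.rename_axis(index="MONTHS", columns="PRODUCTS")

print(labeled.index.name)    # MONTHS
print(labeled.columns.name)  # PRODUCTS
print(demo.index.name)       # None (the original is left untouched)
```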



## 3. Building an index, then a DataFrame
You can also build the DataFrame and index independently, and then put them together. If you take this route, be careful, as any mistakes in generating the DataFrame or the index can cause the data and the index to be aligned incorrectly.

In this exercise, the sales DataFrame has been provided for you without the month index. Your job is to build this index separately and then assign it to the sales DataFrame.
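
The alignment risk mentioned above can be sketched on a small hypothetical frame (`demo`). pandas rejects an index of the wrong length, but it cannot detect labels that have the right length and the wrong order, so that kind of mistake goes through silently.

```python
import pandas as pd

# Hypothetical frame: the rows are meant to be Jan, Feb, Mar in that order
demo = pd.DataFrame({"eggs": [47, 110, 221]})

# Too few labels: pandas refuses the assignment
try:
    demo.index = ["Jan", "Feb"]
    mismatch_caught = False
except ValueError:
    mismatch_caught = True

# Right length but wrong order: accepted, so "Jan" now points at Feb's row
demo.index = ["Feb", "Jan", "Mar"]

print(mismatch_caught)          # True
print(demo.loc["Jan", "eggs"])  # 110, not the intended 47
```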

In [40]:
# Preparing dataframe for this exercise
sales.reset_index(inplace=True)
sales.drop("MONTHS", axis = "columns", inplace=True)
sales.columns.name = None

In [41]:
sales

|   | eggs | salt | spam |
|---|------|------|------|
| 0 | 47   | 12.0 | 17   |
| 1 | 110  | 50.0 | 31   |
| 2 | 221  | 89.0 | 72   |
| 3 | 77   | 87.0 | 20   |
| 4 | 132  | NaN  | 52   |
| 5 | 205  | 60.0 | 55   |


In [42]:
# Generate the list of months: months
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

# Assign months to sales.index
sales.index = months

# Print the modified sales DataFrame
sales

|     | eggs | salt | spam |
|-----|------|------|------|
| Jan | 47   | 12.0 | 17   |
| Feb | 110  | 50.0 | 31   |
| Mar | 221  | 89.0 | 72   |
| Apr | 77   | 87.0 | 20   |
| May | 132  | NaN  | 52   |
| Jun | 205  | 60.0 | 55   |


Excellent work! You're getting the hang of working with indexes. You'll now move on to learning about hierarchical indexes!



## 4. Extracting data with a MultiIndex
The sales DataFrame you have been working with has been extended to include state information as well.

Extracting elements from the outermost level of a MultiIndex works just as it does for a single-level Index: you can use the `.loc[]` accessor.

In [189]:
# Importing all columns except month for this exercise
sales = pd.read_csv("data/sales/sales.csv", usecols=list(range(1,4)))
sales

|   | eggs | salt | spam |
|---|------|------|------|
| 0 | 47   | 12.0 | 17   |
| 1 | 110  | 50.0 | 31   |
| 2 | 221  | 89.0 | 72   |
| 3 | 77   | 87.0 | 20   |
| 4 | 132  | NaN  | 52   |
| 5 | 205  | 60.0 | 55   |


In [190]:
# Printing type of imported dataframe index
type(sales.index)

pandas.core.indexes.range.RangeIndex

In [191]:
# Define the index levels and their names
month = [1, 2, 1, 2, 1, 2]
state = ["CA", "CA", "NY", "NY", "TX", "TX"]
index_names = ["state", "month"]

In [192]:
# Setting Multi-level index
sales.index = [state, month]
sales.index.names = index_names
sales

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 1     | 47   | 12.0 | 17   |
| CA    | 2     | 110  | 50.0 | 31   |
| NY    | 1     | 221  | 89.0 | 72   |
| NY    | 2     | 77   | 87.0 | 20   |
| TX    | 1     | 132  | NaN  | 52   |
| TX    | 2     | 205  | 60.0 | 55   |


In [193]:
# Print sales.loc[['CA', 'TX']]
sales.loc[["CA", "TX"]]

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 1     | 47   | 12.0 | 17   |
| CA    | 2     | 110  | 50.0 | 31   |
| TX    | 1     | 132  | NaN  | 52   |
| TX    | 2     | 205  | 60.0 | 55   |


In [194]:
# Print sales['CA':'TX']
sales['CA':'TX']

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 1     | 47   | 12.0 | 17   |
| CA    | 2     | 110  | 50.0 | 31   |
| NY    | 1     | 221  | 89.0 | 72   |
| NY    | 2     | 77   | 87.0 | 20   |
| TX    | 1     | 132  | NaN  | 52   |
| TX    | 2     | 205  | 60.0 | 55   |


In [195]:
# Printing the multilevel index
sales.index

MultiIndex(levels=[['CA', 'NY', 'TX'], [1, 2]],
           codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['state', 'month'])

In [196]:
# Printing type of index.name
type(sales.index.name)

NoneType

In [197]:
# Printing type of index.names
type(sales.index.names)

pandas.core.indexes.frozen.FrozenList

Well done! Notice how New York is excluded by the first operation, which selects the listed labels only, and included in the second one, which slices every label from 'CA' through 'TX'.
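
The difference between the two lookups can be reproduced in a self-contained sketch. The frame below (`demo`) is a hypothetical reduction of the sales data to a single column; a list of labels picks exactly those outer-level values, while a label slice includes both endpoints and everything between them.

```python
import pandas as pd

# Hypothetical one-column version of the sales data
index = pd.MultiIndex.from_arrays(
    [["CA", "CA", "NY", "NY", "TX", "TX"], [1, 2, 1, 2, 1, 2]],
    names=["state", "month"],
)
demo = pd.DataFrame({"eggs": [47, 110, 221, 77, 132, 205]}, index=index)

picked = demo.loc[["CA", "TX"]]  # list: only the named states
ranged = demo.loc["CA":"TX"]     # slice: inclusive range of states

picked_states = list(picked.index.get_level_values("state").unique())
ranged_states = list(ranged.index.get_level_values("state").unique())

print(picked_states)  # ['CA', 'TX']
print(ranged_states)  # ['CA', 'NY', 'TX']
```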



## 5. Setting & sorting a MultiIndex
In the previous exercise, the MultiIndex was created and sorted for you. Now, you're going to do this yourself! With a MultiIndex, you should always ensure the index is sorted. You can skip this only if you know the data is already sorted on the index fields.

In [198]:
# Preparing the dataframe for this exercise
sales.reset_index(inplace = True)
sales

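|   | state | month | eggs | salt | spam |
|---|-------|-------|------|------|------|
| 0 | CA    | 1     | 47   | 12.0 | 17   |
| 1 | CA    | 2     | 110  | 50.0 | 31   |
| 2 | NY    | 1     | 221  | 89.0 | 72   |
| 3 | NY    | 2     | 77   | 87.0 | 20   |
| 4 | TX    | 1     | 132  | NaN  | 52   |
| 5 | TX    | 2     | 205  | 60.0 | 55   |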


In [199]:
# Set the index to be the columns ['state', 'month']: sales
sales = sales.set_index(["state", "month"])

# Sort the MultiIndex: sales
sales = sales.sort_index()

# Print the sales DataFrame
sales

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 1     | 47   | 12.0 | 17   |
| CA    | 2     | 110  | 50.0 | 31   |
| NY    | 1     | 221  | 89.0 | 72   |
| NY    | 2     | 77   | 87.0 | 20   |
| TX    | 1     | 132  | NaN  | 52   |
| TX    | 2     | 205  | 60.0 | 55   |


Great work! Take a look at the sorted MultiIndex!
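
Because the sales data happens to arrive in sorted order, the effect of `sort_index()` is not visible above. The following sketch uses a deliberately shuffled, hypothetical copy of the `eggs` column (`shuffled`) and checks the index with `is_monotonic_increasing` before and after sorting; label slicing on a MultiIndex relies on this ordering.

```python
import pandas as pd

# Hypothetical shuffled copy of part of the sales data
shuffled = pd.DataFrame(
    {
        "state": ["TX", "CA", "NY", "CA", "TX", "NY"],
        "month": [2, 1, 2, 2, 1, 1],
        "eggs": [205, 47, 77, 110, 132, 221],
    }
).set_index(["state", "month"])

before = shuffled.index.is_monotonic_increasing  # unsorted index
ordered = shuffled.sort_index()                  # sort by state, then month
after = ordered.index.is_monotonic_increasing    # sorted index

print(before, after)

# Label slicing on the outer level works on the sorted frame
ca_to_ny = ordered.loc["CA":"NY"]
print(ca_to_ny)
```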


## 6. Using `.loc[]` with nonunique indexes
It is always preferable to have a meaningful index that uniquely identifies each row. Even though pandas does not require unique index values in DataFrames, it works better if the index values are indeed unique. To see an example of this, you will index your sales data by `'state'` in this exercise.

In [200]:
# Preparing the dataframe for this exercise
sales.reset_index(inplace = True)

In [201]:
# Set the index to the column 'state': sales
sales = sales.set_index("state")

# Print the sales DataFrame
sales

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 1     | 47   | 12.0 | 17   |
| CA    | 2     | 110  | 50.0 | 31   |
| NY    | 1     | 221  | 89.0 | 72   |
| NY    | 2     | 77   | 87.0 | 20   |
| TX    | 1     | 132  | NaN  | 52   |
| TX    | 2     | 205  | 60.0 | 55   |


In [202]:
# Access the data from 'NY'
sales.loc["NY"]

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| NY    | 1     | 221  | 89.0 | 72   |
| NY    | 2     | 77   | 87.0 | 20   |


Fantastic work! Here, because the index is nonunique, `.loc["NY"]` returns two rows.
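
One practical consequence of a nonunique index is that the return type of `.loc[]` depends on how often the label occurs. The sketch below, on a small hypothetical frame (`demo`), checks uniqueness with `Index.is_unique` and compares a label that occurs once with one that occurs twice.

```python
import pandas as pd

# Hypothetical frame in which 'NY' appears twice and 'CA' once
demo = pd.DataFrame(
    {"state": ["CA", "NY", "NY"], "eggs": [47, 221, 77]}
).set_index("state")

print(demo.index.is_unique)  # False

ca = demo.loc["CA"]  # label occurs once: a Series (one row)
ny = demo.loc["NY"]  # label occurs twice: a DataFrame (two rows)

print(type(ca).__name__, type(ny).__name__)
```

Passing the label inside a list, as in `demo.loc[["CA"]]`, returns a DataFrame in both cases, which can make downstream code more predictable.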



## 7. Indexing multiple levels of a MultiIndex
Looking up indexed data is fast and efficient, and you have already seen that lookups based on the outermost level of a MultiIndex work just like lookups on DataFrames that have a single-level Index.

Looking up data based on inner levels of a MultiIndex can be a bit trickier. In this exercise, you will use your sales DataFrame to do some increasingly complex lookups.

The trickiest of these lookups is when you want to select on an inner level of the index while keeping every value of the outer level(s). In this case, you need to use `slice(None)` in place of the usual `:` for the outermost level(s), or use `pd.IndexSlice`. You can refer to the pandas documentation for more details. For example, assuming a `stocks` DataFrame indexed by symbol and date, the following code extracts rows from all Symbols for the dates Oct. 3rd through 4th inclusive:

`stocks.loc[(slice(None), slice('2016-10-03', '2016-10-04')), :]`

Pay particular attention to the tuple `(slice(None), slice('2016-10-03', '2016-10-04'))`.

In [203]:
# Preparing the dataframe for this exercise
sales.reset_index(inplace = True)
sales.set_index(["state", "month"], inplace = True)
sales

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 1     | 47   | 12.0 | 17   |
| CA    | 2     | 110  | 50.0 | 31   |
| NY    | 1     | 221  | 89.0 | 72   |
| NY    | 2     | 77   | 87.0 | 20   |
| TX    | 1     | 132  | NaN  | 52   |
| TX    | 2     | 205  | 60.0 | 55   |


In [205]:
# Look up data for NY in month 1: NY_month1 - Method 1
NY_month1 = sales.loc["NY", 1]
NY_month1

eggs    221.0
salt     89.0
spam     72.0
Name: (NY, 1), dtype: float64

In [206]:
# Look up data for NY in month 1: NY_month1 - Method 2
NY_month1 = sales.loc[("NY", 1), :]
NY_month1

eggs    221.0
salt     89.0
spam     72.0
Name: (NY, 1), dtype: float64

### Note: This does not work

```python
# Look up data for CA and TX in month 2: CA_TX_month2
sales.loc[["CA", "TX"], 2]

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [2] of <class 'int'>
```

In [207]:
# Look up data for CA and TX in month 2: CA_TX_month2
CA_TX_month2 = sales.loc[(["CA", "TX"], 2),:]
CA_TX_month2

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 2     | 110  | 50.0 | 31   |
| TX    | 2     | 205  | 60.0 | 55   |


### Note: This does not work either
```python
# Access the inner month index and look up data for all states in month 2: all_month2
all_month2 = sales.loc[(:, 2), :]

  File "<ipython-input-209-403fc6e779be>", line 2
    all_month2 = sales.loc[(:, 2), :]
                            ^
SyntaxError: invalid syntax
```

In [208]:
# Access the inner month index and look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
all_month2

| state | month | eggs | salt | spam |
|-------|-------|------|------|------|
| CA    | 2     | 110  | 50.0 | 31   |
| NY    | 2     | 77   | 87.0 | 20   |
| TX    | 2     | 205  | 60.0 | 55   |
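
The `slice(None)` lookup above can also be written with `pd.IndexSlice`, which this section mentions but does not demonstrate. The sketch below, on a hypothetical one-column version of the sales data (`demo`), shows the two spellings side by side, plus `DataFrame.xs` as a third option; all three should select the same rows.

```python
import pandas as pd

# Hypothetical one-column version of the sales data
index = pd.MultiIndex.from_arrays(
    [["CA", "CA", "NY", "NY", "TX", "TX"], [1, 2, 1, 2, 1, 2]],
    names=["state", "month"],
)
demo = pd.DataFrame({"eggs": [47, 110, 221, 77, 132, 205]}, index=index)

# Spelling 1: explicit slice(None) for "all states"
with_slice = demo.loc[(slice(None), 2), :]

# Spelling 2: pd.IndexSlice allows the familiar ':' syntax
idx = pd.IndexSlice
with_index_slice = demo.loc[idx[:, 2], :]

# Spelling 3: cross-section on the named level, keeping both index levels
with_xs = demo.xs(2, level="month", drop_level=False)

print(with_index_slice)
```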


Well done! You have now mastered how to work with indexes.

