## Working with hierarchical (multi-level) indices in Pandas 

### Import the pandas library 

In [1]:
import pandas as pd

### Load the CSV file from this [link](https://raw.githubusercontent.com/Prajwalk09/Data-Analysis-with-Pandas-and-Python/refs/heads/main/MultiIndex/bigmac.csv) into a DataFrame named `data`, parsing the `'Date'` column as datetime objects during the import

In [3]:
url = "https://raw.githubusercontent.com/Prajwalk09/Data-Analysis-with-Pandas-and-Python/refs/heads/main/MultiIndex/bigmac.csv"
data = pd.read_csv(url, parse_dates = ['Date'])
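
To see what `parse_dates` changes without downloading anything, here is a minimal sketch that reads a small in-memory CSV twice. The rows in `csv_text` and the names `plain` and `parsed` are made up for illustration; only the column layout follows the file above.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same column layout as bigmac.csv
csv_text = """Date,Country,Price in US Dollars
2016-01-01,Argentina,2.39
2016-01-01,Australia,3.74
2015-07-01,Brazil,4.28
"""

# Without parse_dates, the Date column is read as plain strings
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the same column is converted to datetime64 on import
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(plain["Date"].dtype)
print(parsed["Date"].dtype)
```

Parsing at import time means the `Date` values can later be sorted and compared as real dates rather than as text.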

### Display the first 5 rows of the DataFrame 

In [4]:
data.head()

,Date,Country,Price in US Dollars
0,2016-01-01,Argentina,2.39
1,2016-01-01,Australia,3.74
2,2016-01-01,Brazil,3.35
3,2016-01-01,Britain,4.22
4,2016-01-01,Canada,4.14


### Set the `Date` and `Country` columns as a MultiIndex for the `data` DataFrame
<span style="color:green; font-weight:bold; font-size:14px;">Make sure that the changes are made inplace</span> 

In [5]:
data.set_index(keys = ['Date', 'Country'], inplace = True)
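
As a rough illustration of what the `inplace` flag does here, the sketch below builds a tiny DataFrame by hand and sets the same two columns as the index both ways. The frame `df` and its values are hypothetical sample data, not the real dataset.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2015-07-01"]),
    "Country": ["Argentina", "Australia", "Argentina"],
    "Price in US Dollars": [2.39, 3.74, 3.07],
})

# By default set_index returns a new DataFrame and leaves df untouched
indexed = df.set_index(["Date", "Country"])

# With inplace=True the call returns None and modifies df directly
result = df.set_index(["Date", "Country"], inplace=True)

print(type(indexed.index).__name__)
print(list(indexed.index.names))
print(result)
```

Because the in-place call returns `None`, assigning its result back to `data` would discard the DataFrame.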

### Display the first 5 rows of the DataFrame after the previous operation 

In [6]:
data.head()

,,Price in US Dollars
Date,Country,
2016-01-01,Argentina,2.39
2016-01-01,Australia,3.74
2016-01-01,Brazil,3.35
2016-01-01,Britain,4.22
2016-01-01,Canada,4.14


### Sort the index of the `data` DataFrame in ascending order for the first level and descending order for the second level
<span style="color:red; font-weight:bold; font-size:14px;">Do not make the changes inplace</span>

In [7]:
data.sort_index(ascending = [True, False])

,,Price in US Dollars
Date,Country,
2010-01-01,Uruguay,3.32
2010-01-01,United States,3.58
2010-01-01,Ukraine,1.83
2010-01-01,UAE,2.99
2010-01-01,Turkey,3.83
...,...,...
2016-01-01,Brazil,3.35
2016-01-01,Belgium,4.25
2016-01-01,Austria,3.76
2016-01-01,Australia,3.74
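
The per-level sort order is easier to follow on a handful of rows. The sketch below assumes a small hand-built frame called `prices` (hypothetical sample values, with dates kept as ISO strings for brevity) and passes one boolean per index level to `ascending`.

```python
import pandas as pd

# Hypothetical four-row frame with a deliberately unsorted MultiIndex
idx = pd.MultiIndex.from_tuples(
    [("2016-01-01", "Brazil"), ("2010-01-01", "Chile"),
     ("2016-01-01", "Canada"), ("2010-01-01", "Austria")],
    names=["Date", "Country"],
)
prices = pd.DataFrame({"Price in US Dollars": [3.35, 3.18, 4.14, 3.38]}, index=idx)

# One flag per level: Date ascending, Country descending within each date
sorted_prices = prices.sort_index(ascending=[True, False])

print(sorted_prices.index.tolist())
```

Since `inplace` is not set, `prices` keeps its original row order and only `sorted_prices` is sorted.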


### Sort the `data` DataFrame based on its MultiIndex in ascending order on both levels
<span style="color:green; font-weight:bold; font-size:14px;">Make sure that the changes are made inplace</span> 

In [9]:
data.sort_index(ascending = [True, True], inplace = True)

### Understanding `get_level_values()` and Levels in a MultiIndex

#### What is a Level in a MultiIndex?
A **level** in a MultiIndex represents one layer of the hierarchical indexing in a pandas DataFrame. Each row in the DataFrame has an index that is defined by the combination of values across all levels of the MultiIndex. For example, in a MultiIndex with levels `['Date', 'Country']`, the rows are uniquely identified by a combination of both `Date` and `Country`. <br>
<span style="color:blue">Note that `get_level_values()`, introduced below, is a method of the index (`data.index`) and cannot be called on the DataFrame itself.</span>
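
A quick way to inspect the levels of a MultiIndex is through its `nlevels`, `names`, and `levels` attributes. The sketch below uses a small hand-built index named `idx` (a hypothetical stand-in for `data.index`, with dates kept as strings).

```python
import pandas as pd

# Hypothetical two-level index: two dates crossed with two countries
idx = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Argentina"), ("2010-01-01", "Australia"),
     ("2016-01-01", "Argentina"), ("2016-01-01", "Australia")],
    names=["Date", "Country"],
)

print(idx.nlevels)               # how many levels the index has
print(list(idx.names))           # the name of each level
print(idx.levels[0].tolist())    # distinct values stored for level 0
print(idx.levels[1].tolist())    # distinct values stored for level 1
print(idx.is_unique)             # True when no (Date, Country) pair repeats
```

Each row label is the tuple of one value per level, so the same date can appear many times as long as it is paired with a different country.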

#### Purpose of `get_level_values()`
The `get_level_values()` method allows you to extract values from a specific level of a MultiIndex.

- **Input:** It can take:
  - The **name** of the level (e.g., `'Country'`).
  - The **position** of the level (e.g., `1`).

- **Output:** It returns the values from the specified level as a pandas Index object.

#### Why Use It?
- Simplifies working with complex MultiIndex DataFrames.
- Allows you to focus on specific levels for analysis or data manipulation.
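
The points above can be sketched on a small example. The frame `prices` below is hand-built for illustration (its four values are copied from outputs elsewhere in this notebook, with dates kept as strings); it shows a lookup by level name, a lookup by position, and one possible use of the result as a row filter.

```python
import pandas as pd

# Small illustrative frame; a stand-in for the full `data` DataFrame
idx = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Argentina"), ("2010-01-01", "Australia"),
     ("2016-01-01", "Argentina"), ("2016-01-01", "Australia")],
    names=["Date", "Country"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.84, 3.98, 2.39, 3.74]}, index=idx)

# Select a level by name or by position; both return one value per row
by_name = prices.index.get_level_values("Date")
by_position = prices.index.get_level_values(1)

# The returned Index can be compared elementwise to build a boolean row filter
is_australia = by_position == "Australia"
australia = prices[is_australia]

print(by_name.tolist())
print(by_position.name)
print(australia)
```

Note that the result keeps repeated values, so its length matches the number of rows rather than the number of distinct labels.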

### Retrieve all values from the 'Date' level of the MultiIndex DataFrame

In [10]:
data.index.get_level_values(level = 'Date')

DatetimeIndex(['2010-01-01', '2010-01-01', '2010-01-01', '2010-01-01',
               '2010-01-01', '2010-01-01', '2010-01-01', '2010-01-01',
               '2010-01-01', '2010-01-01',
               ...
               '2016-01-01', '2016-01-01', '2016-01-01', '2016-01-01',
               '2016-01-01', '2016-01-01', '2016-01-01', '2016-01-01',
               '2016-01-01', '2016-01-01'],
              dtype='datetime64[ns]', name='Date', length=652, freq=None)

### Retrieve all values from level 1 (`Country`) of the MultiIndex DataFrame

In [11]:
data.index.get_level_values(level = 1)

Index(['Argentina', 'Australia', 'Brazil', 'Britain', 'Canada', 'Chile',
       'China', 'Colombia', 'Costa Rica', 'Czech Republic',
       ...
       'Switzerland', 'Taiwan', 'Thailand', 'Turkey', 'UAE', 'Ukraine',
       'United States', 'Uruguay', 'Venezuela', 'Vietnam'],
      dtype='object', name='Country', length=652)

### Rename level 0 of the MultiIndex to "Dates"
<span style="color:red; font-weight:bold; font-size:14px;">Do not make the changes inplace</span> 

In [12]:
data.index.rename(names = 'Dates', level = 0)

MultiIndex([('2010-01-01',      'Argentina'),
            ('2010-01-01',      'Australia'),
            ('2010-01-01',         'Brazil'),
            ('2010-01-01',        'Britain'),
            ('2010-01-01',         'Canada'),
            ('2010-01-01',          'Chile'),
            ('2010-01-01',          'China'),
            ('2010-01-01',       'Colombia'),
            ('2010-01-01',     'Costa Rica'),
            ('2010-01-01', 'Czech Republic'),
            ...
            ('2016-01-01',    'Switzerland'),
            ('2016-01-01',         'Taiwan'),
            ('2016-01-01',       'Thailand'),
            ('2016-01-01',         'Turkey'),
            ('2016-01-01',            'UAE'),
            ('2016-01-01',        'Ukraine'),
            ('2016-01-01',  'United States'),
            ('2016-01-01',        'Uruguay'),
            ('2016-01-01',      'Venezuela'),
            ('2016-01-01',        'Vietnam')],
           names=['Dates', 'Country'], length=652)
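
Because the call above is not in place, it only returns a renamed copy of the index; the DataFrame keeps its old level names unless the result is assigned back. A minimal sketch of that behaviour, using a small hand-built frame `df` (hypothetical) and two alternative spellings, `Index.set_names` and `DataFrame.rename_axis`:

```python
import pandas as pd

# Small illustrative frame; a stand-in for the full `data` DataFrame
idx = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Argentina"), ("2010-01-01", "Australia")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [1.84, 3.98]}, index=idx)

# Returns a new MultiIndex with level 0 renamed; df.index is not modified
renamed = df.index.set_names("Dates", level=0)

# rename_axis returns a new DataFrame whose index level is renamed
via_axis = df.rename_axis(index={"Date": "Dates"})

print(list(renamed.names))
print(list(via_axis.index.names))
print(list(df.index.names))   # original names are still in place
```

Which spelling to use is a matter of taste; `rename_axis` is convenient when the rename is part of a method chain on the DataFrame.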

### Rename the level "Date" of the MultiIndex to "Dates"
<span style="color:green; font-weight:bold; font-size:14px;">Make sure that the changes are made inplace</span>

In [14]:
data.index.rename(names = 'Dates', level = 'Date', inplace = True)

### Display the first 5 rows of the DataFrame 

In [15]:
data.head()

,,Price in US Dollars
Dates,Country,
2010-01-01,Argentina,1.84
2010-01-01,Australia,3.98
2010-01-01,Brazil,4.76
2010-01-01,Britain,3.67
2010-01-01,Canada,3.97


### Retrieve the value for 'Price in US Dollars' on '2010-01-01' for 'Australia' in the `data` DataFrame using MultiIndex

In [16]:
data.loc[('2010-01-01', 'Australia'), 'Price in US Dollars']

3.98
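
In the lookup above, the tuple supplies one label per index level and the second argument picks the column. The sketch below repeats the idea on a small hand-built frame `df` (an illustrative stand-in whose four values are copied from the outputs in this notebook) and uses explicit `pd.Timestamp` keys, which is an assumption made here to keep the example unambiguous.

```python
import pandas as pd

# Small illustrative frame with a datetime level, sorted like `data`
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2010-01-01"), "Argentina"),
     (pd.Timestamp("2010-01-01"), "Australia"),
     (pd.Timestamp("2016-01-01"), "Argentina"),
     (pd.Timestamp("2016-01-01"), "Australia")],
    names=["Dates", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [1.84, 3.98, 2.39, 3.74]}, index=idx)

# Full key (one label per level) plus a column label returns a single value
price = df.loc[(pd.Timestamp("2010-01-01"), "Australia"), "Price in US Dollars"]

# A key for the first level only returns every row under that date,
# indexed by the remaining level
by_date = df.loc[pd.Timestamp("2010-01-01")]

print(price)
print(by_date)
```

Passing the row key as a tuple matters: written as two separate arguments, the second label would be interpreted as a column name instead of a second index level.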

### Retrieve the 'Price in US Dollars' values for every row from ('2010-01-01', 'Australia') through ('2010-01-01', 'Ukraine'), both ends included, in the `data` DataFrame using MultiIndex

In [17]:
data.loc[('2010-01-01', 'Australia'):('2010-01-01', 'Ukraine'), 'Price in US Dollars']

Dates       Country       
2010-01-01  Australia         3.98
            Brazil            4.76
            Britain           3.67
            Canada            3.97
            Chile             3.18
            China             1.83
            Colombia          3.91
            Costa Rica        3.52
            Czech Republic    3.71
            Denmark           5.99
            Egypt             2.38
            Euro area         4.84
            Hong Kong         1.91
            Hungary           3.86
            Indonesia         2.24
            Israel            3.99
            Japan             3.50
            Latvia            3.09
            Lithuania         2.87
            Malaysia          2.08
            Mexico            2.50
            New Zealand       3.61
            Norway            7.02
            Pakistan          2.42
            Peru              2.81
            Philippines       2.21
            Poland            2.86
            Russia          