# MULTI-INDEX IN PANDAS 

# 📚 1. Module Dataset



In [3]:
import pandas as pd

In [22]:
# parse_dates converts the Date column to datetime; date_format states its layout (pandas 2.0+)
bigmac = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d")
bigmac.head()

|   | Date | Country | Price in US Dollars |
|---|---|---|---|
| 0 | 2000-04-01 | Argentina | 2.5 |
| 1 | 2000-04-01 | Australia | 1.541667 |
| 2 | 2000-04-01 | Brazil | 1.648045 |
| 3 | 2000-04-01 | Canada | 1.938776 |
| 4 | 2000-04-01 | Switzerland | 3.470588 |


In [12]:
# check the data types of the columns
bigmac.dtypes

Date                   datetime64[ns]
Country                        object
Price in US Dollars           float64
dtype: object

In [5]:
# show a summary: row count, non-null counts per column, dtypes, and memory usage
bigmac.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1386 entries, 0 to 1385
Data columns (total 3 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Date                 1386 non-null   datetime64[ns]
 1   Country              1386 non-null   object        
 2   Price in US Dollars  1386 non-null   float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 32.6+ KB
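`bigmac.csv` is not reproduced here, so the sketch below assumes a tiny inline CSV (read through `io.StringIO`) with the same three columns and a few rows copied from the `bigmac.head()` output above. It checks that `parse_dates` produces a datetime column. It omits `date_format`, which needs pandas 2.0 or later; ISO-formatted dates like these are parsed without it.

```python
import io

import pandas as pd

# Stand-in for bigmac.csv: a few rows copied from the bigmac.head() output above
csv_text = """Date,Country,Price in US Dollars
2000-04-01,Argentina,2.5
2000-04-01,Australia,1.541667
2000-04-01,Brazil,1.648045
"""

# parse_dates converts the listed column(s) from strings to datetimes while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample.dtypes)
# The Date column now has a datetime dtype
print(pd.api.types.is_datetime64_any_dtype(sample["Date"]))  # True
```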


# 📚 2. Create a MultiIndex



There are several ways to create a MultiIndex DataFrame:
1. Using the `set_index` method
2. Using the `index_col` parameter in `read_csv`
3. Using the `MultiIndex.from_arrays()` method
4. Using the `MultiIndex.from_tuples()` method
5. Using the `MultiIndex.from_product()` method

#### 1. Using the `set_index` method
____


In [7]:
data = {
    'Region': ['North', 'North', 'South', 'South', 'West', 'West'],
    'State': ['NY', 'NY', 'TX', 'TX', 'CA', 'CA'],
    'City': ['Albany', 'Buffalo', 'Houston', 'Dallas', 'Los Angeles', 'San Francisco'],
    'Sales': [2500, 3000, 4000, 3500, 4500, 5000]
}

# Create DataFrame
df = pd.DataFrame(data)
df

|   | Region | State | City | Sales |
|---|---|---|---|---|
| 0 | North | NY | Albany | 2500 |
| 1 | North | NY | Buffalo | 3000 |
| 2 | South | TX | Houston | 4000 |
| 3 | South | TX | Dallas | 3500 |
| 4 | West | CA | Los Angeles | 4500 |
| 5 | West | CA | San Francisco | 5000 |


Example 1: Create a MultiIndex DataFrame using the `set_index` method.

In [23]:
# Set MultiIndex
df_multi_index = df.set_index(['Region', 'State', 'City'])

In [19]:
# Display MultiIndex DataFrame
df_multi_index

| Region | State | City | Sales |
|---|---|---|---|
| North | NY | Albany | 2500 |
| North | NY | Buffalo | 3000 |
| South | TX | Houston | 4000 |
| South | TX | Dallas | 3500 |
| West | CA | Los Angeles | 4500 |
| West | CA | San Francisco | 5000 |


In [24]:
# check the index of the DataFrame
df_multi_index.index.to_list()

[('North', 'NY', 'Albany'),
 ('North', 'NY', 'Buffalo'),
 ('South', 'TX', 'Houston'),
 ('South', 'TX', 'Dallas'),
 ('West', 'CA', 'Los Angeles'),
 ('West', 'CA', 'San Francisco')]
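To confirm what `set_index` built, you can inspect the index's `nlevels` and `names` attributes. The minimal sketch below rebuilds the Region/State/City example and also shows that `set_index` returns a new DataFrame, leaving the original `df` with its default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "State": ["NY", "NY", "TX", "TX", "CA", "CA"],
    "City": ["Albany", "Buffalo", "Houston", "Dallas", "Los Angeles", "San Francisco"],
    "Sales": [2500, 3000, 4000, 3500, 4500, 5000],
})

df_multi_index = df.set_index(["Region", "State", "City"])

# The three columns became the three levels of the row index
print(df_multi_index.index.nlevels)        # 3
print(list(df_multi_index.index.names))    # ['Region', 'State', 'City']

# Only Sales remains as a regular column
print(list(df_multi_index.columns))        # ['Sales']

# set_index returned a new DataFrame; df itself still has a RangeIndex
print(type(df.index).__name__)             # RangeIndex
```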

Example 2: Create a MultiIndex for the `bigmac` dataset



In [25]:
bigmac_index = bigmac.set_index(keys=["Date", "Country"])

In [26]:
bigmac_index

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.500000 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Canada | 1.938776 |
| 2000-04-01 | Switzerland | 3.470588 |
| ... | ... | ... |
| 2020-07-01 | Ukraine | 2.174714 |
| 2020-07-01 | Uruguay | 4.327418 |
| 2020-07-01 | United States | 5.710000 |
| 2020-07-01 | Vietnam | 2.847282 |


#### 2. Using the `index_col` parameter in `read_csv`
____

Use this approach to build a MultiIndex DataFrame directly while reading a CSV file.
The example below passes a list of column names to the `index_col` parameter of `read_csv`, using the `bigmac` dataset.

Example 1: Create a MultiIndex for the `bigmac` dataset

In [27]:
# create a MultiIndex DataFrame from the Date and Country columns with index_col, then sort it with sort_index()
bigmac = pd.read_csv("bigmac.csv", 
                     parse_dates=["Date"], 
                     date_format="%Y-%m-%d", 
                     index_col=["Date", "Country"]).sort_index()


bigmac.head(10)

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |
| 2000-04-01 | Chile | 2.451362 |
| 2000-04-01 | China | 1.195652 |
| 2000-04-01 | Czech Republic | 1.390537 |
| 2000-04-01 | Denmark | 3.078358 |
| 2000-04-01 | Euro area | 2.3808 |
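As a self-contained check of the same `read_csv` call, the sketch below assumes a three-row inline CSV (values copied from the outputs above, deliberately out of order) in place of `bigmac.csv`. It confirms that each name in `index_col` becomes one level and that `sort_index()` leaves the index in ascending order.

```python
import io

import pandas as pd

# Stand-in rows for bigmac.csv (values from the outputs above), deliberately unsorted
csv_text = """Date,Country,Price in US Dollars
2000-04-01,Canada,1.938776
2000-04-01,Argentina,2.5
2000-04-01,Brazil,1.648045
"""

# index_col accepts a list of column names; each one becomes a level of the MultiIndex
prices = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],
    index_col=["Date", "Country"],
).sort_index()

print(list(prices.index.names))              # ['Date', 'Country']
print(prices.index.is_monotonic_increasing)  # True after sort_index()
print(prices)
```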


#### 3. Using the `MultiIndex.from_arrays()` method
____

In [28]:
arrays = [
    ['North', 'North', 'South', 'South', 'West', 'West'],
    ['NY', 'NY', 'TX', 'TX', 'CA', 'CA'],
    ['Albany', 'Buffalo', 'Houston', 'Dallas', 'Los Angeles', 'San Francisco']
]

index = pd.MultiIndex.from_arrays(arrays, names=('Region', 'State', 'City'))

df = pd.DataFrame({'Population': [100000, 200000, 150000, 180000, 400000, 300000]}, index=index)


In [29]:
print(arrays)

[['North', 'North', 'South', 'South', 'West', 'West'], ['NY', 'NY', 'TX', 'TX', 'CA', 'CA'], ['Albany', 'Buffalo', 'Houston', 'Dallas', 'Los Angeles', 'San Francisco']]


In [30]:
print(index)

MultiIndex([('North', 'NY',        'Albany'),
            ('North', 'NY',       'Buffalo'),
            ('South', 'TX',       'Houston'),
            ('South', 'TX',        'Dallas'),
            ( 'West', 'CA',   'Los Angeles'),
            ( 'West', 'CA', 'San Francisco')],
           names=['Region', 'State', 'City'])


In [32]:
df

| Region | State | City | Population |
|---|---|---|---|
| North | NY | Albany | 100000 |
| North | NY | Buffalo | 200000 |
| South | TX | Houston | 150000 |
| South | TX | Dallas | 180000 |
| West | CA | Los Angeles | 400000 |
| West | CA | San Francisco | 300000 |


#### 4. Using the `MultiIndex.from_tuples()` method
___

In [36]:
tuples = [
    ('North', 'NY', 'Albany'),
    ('North', 'NY', 'Buffalo'),
    ('South', 'TX', 'Houston'),
    ('South', 'TX', 'Dallas'),
    ('West', 'CA', 'Los Angeles'),
    ('West', 'CA', 'San Francisco')
]

index = pd.MultiIndex.from_tuples(tuples, names=('Region', 'State', 'City'))

df = pd.DataFrame({'Population': [100000, 200000, 150000, 180000, 400000, 300000]}, index=index)
df


| Region | State | City | Population |
|---|---|---|---|
| North | NY | Albany | 100000 |
| North | NY | Buffalo | 200000 |
| South | TX | Houston | 150000 |
| South | TX | Dallas | 180000 |
| West | CA | Los Angeles | 400000 |
| West | CA | San Francisco | 300000 |


#### 5. Using the `MultiIndex.from_product()` method
____

In [35]:
regions = ['North', 'South']
states = ['NY', 'TX']
cities = ['City1', 'City2']

index = pd.MultiIndex.from_product([regions, states, cities], names=('Region', 'State', 'City'))

df = pd.DataFrame({'Population': range(1, 9)}, index=index)
df


| Region | State | City | Population |
|---|---|---|---|
| North | NY | City1 | 1 |
| North | NY | City2 | 2 |
| North | TX | City1 | 3 |
| North | TX | City2 | 4 |
| South | NY | City1 | 5 |
| South | NY | City2 | 6 |
| South | TX | City1 | 7 |
| South | TX | City2 | 8 |
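The three `MultiIndex` constructors differ only in the shape of their input: one list per level (`from_arrays`), one tuple per row (`from_tuples`), or one list of distinct values per level (`from_product`). As a sketch, the code below builds the same two-level index all three ways from a subset of the Region/State labels used above and checks that the results are equal. `from_product` only fits when every combination of labels occurs, which is why a 2 × 2 grid is used here.

```python
import pandas as pd

names = ["Region", "State"]

# Column-wise input: one list per level
from_arrays = pd.MultiIndex.from_arrays(
    [["North", "North", "South", "South"], ["NY", "TX", "NY", "TX"]], names=names
)

# Row-wise input: one tuple per row label
from_tuples = pd.MultiIndex.from_tuples(
    [("North", "NY"), ("North", "TX"), ("South", "NY"), ("South", "TX")], names=names
)

# Cartesian product: every Region paired with every State
from_product = pd.MultiIndex.from_product([["North", "South"], ["NY", "TX"]], names=names)

print(from_arrays.equals(from_tuples))   # True
print(from_arrays.equals(from_product))  # True
```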


# 📚 3. Extract Index Level Values



Use the `index.get_level_values()` method to extract the values of a single level of a MultiIndex.



- The method returns an **Index** containing that level's value for every row, so values repeat (for the `Date` level, it is a `DatetimeIndex`).
- Call it on the **MultiIndex** (`df.index`), not on the **DataFrame** itself.
- Pass either the level's integer position or its name.

In [89]:
bigmac3 = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d", index_col=["Date", "Country"]).sort_index()
bigmac3.head(30)

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |
| 2000-04-01 | Chile | 2.451362 |
| 2000-04-01 | China | 1.195652 |
| 2000-04-01 | Czech Republic | 1.390537 |
| 2000-04-01 | Denmark | 3.078358 |
| 2000-04-01 | Euro area | 2.3808 |


In [90]:
# check the index of the DataFrame
bigmac3.index

MultiIndex([('2000-04-01',            'Argentina'),
            ('2000-04-01',            'Australia'),
            ('2000-04-01',               'Brazil'),
            ('2000-04-01',              'Britain'),
            ('2000-04-01',               'Canada'),
            ('2000-04-01',                'Chile'),
            ('2000-04-01',                'China'),
            ('2000-04-01',       'Czech Republic'),
            ('2000-04-01',              'Denmark'),
            ('2000-04-01',            'Euro area'),
            ...
            ('2020-07-01',               'Sweden'),
            ('2020-07-01',          'Switzerland'),
            ('2020-07-01',               'Taiwan'),
            ('2020-07-01',             'Thailand'),
            ('2020-07-01',               'Turkey'),
            ('2020-07-01',              'Ukraine'),
            ('2020-07-01', 'United Arab Emirates'),
            ('2020-07-01',        'United States'),
            ('2020-07-01',              'Uruguay

In [91]:
# extract the values of one level with get_level_values
# the argument must match a level name; here the levels are named after the original Date and Country columns
bigmac3.index.get_level_values("Date")

DatetimeIndex(['2000-04-01', '2000-04-01', '2000-04-01', '2000-04-01',
               '2000-04-01', '2000-04-01', '2000-04-01', '2000-04-01',
               '2000-04-01', '2000-04-01',
               ...
               '2020-07-01', '2020-07-01', '2020-07-01', '2020-07-01',
               '2020-07-01', '2020-07-01', '2020-07-01', '2020-07-01',
               '2020-07-01', '2020-07-01'],
              dtype='datetime64[ns]', name='Date', length=1386, freq=None)

In [85]:
# extract the index using the get_level_values method
bigmac3.index.get_level_values("Country")

Index(['Argentina', 'Australia', 'Brazil', 'Britain', 'Canada', 'Chile',
       'China', 'Czech Republic', 'Denmark', 'Euro area',
       ...
       'Sweden', 'Switzerland', 'Taiwan', 'Thailand', 'Turkey', 'Ukraine',
       'United Arab Emirates', 'United States', 'Uruguay', 'Vietnam'],
      dtype='object', name='Country', length=1386)

In [71]:
# Alternatively, pass the level's integer position: 'Date' is level 0
# the comparison is elementwise, so it returns one Boolean per row
bigmac3.index.get_level_values(0) == bigmac3.index.get_level_values("Date")

array([ True,  True,  True, ...,  True,  True,  True])

In [86]:
bigmac3.index.get_level_values(1) == bigmac3.index.get_level_values("Country")

array([ True,  True,  True, ...,  True,  True,  True])
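Because `get_level_values` returns an ordinary `Index`, the usual `Index` methods apply to it; for example, `.unique()` lists each country once instead of once per date, and `.all()` collapses the elementwise comparison above into a single answer. The sketch assumes a small stand-in index (two dates and two countries from the outputs above) rather than `bigmac3`.

```python
import pandas as pd

# Stand-in for bigmac3.index: every combination of two dates and two countries
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2000-04-01", "2020-07-01"]), ["Argentina", "Brazil"]],
    names=["Date", "Country"],
)

# Level values repeat once per row of the MultiIndex
countries = index.get_level_values("Country")
print(list(countries))  # ['Argentina', 'Brazil', 'Argentina', 'Brazil']

# Name and position select the same level
print(countries.equals(index.get_level_values(1)))  # True

# unique() gives each label once
print(list(countries.unique()))  # ['Argentina', 'Brazil']

# .all() turns the elementwise comparison into a single True/False
print((index.get_level_values(0) == index.get_level_values("Date")).all())  # True
```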

# 📚 4. Rename Index Levels




____
To rename one or more levels of a MultiIndex, use the `set_names` method. This allows you to give meaningful names to different index levels, making your DataFrame easier to read and work with.

- You can rename a specific level by using the `names` parameter along with the `level` argument to target a particular index layer.
- If you want to rename all levels at once, pass a list of new names to the `names` parameter.
- Since `set_names` returns a modified copy of the index, you need to assign it back to the DataFrame to apply the changes permanently.

In [12]:
bigmac4 = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d", index_col=["Date", "Country"]).sort_index()
bigmac4.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [13]:
# rename level 0 (the first level); set_names returns a new MultiIndex and leaves bigmac4 unchanged
bigmac4.index.set_names(names="Teslim_Date", level=0)

MultiIndex([('2000-04-01',            'Argentina'),
            ('2000-04-01',            'Australia'),
            ('2000-04-01',               'Brazil'),
            ('2000-04-01',              'Britain'),
            ('2000-04-01',               'Canada'),
            ('2000-04-01',                'Chile'),
            ('2000-04-01',                'China'),
            ('2000-04-01',       'Czech Republic'),
            ('2000-04-01',              'Denmark'),
            ('2000-04-01',            'Euro area'),
            ...
            ('2020-07-01',               'Sweden'),
            ('2020-07-01',          'Switzerland'),
            ('2020-07-01',               'Taiwan'),
            ('2020-07-01',             'Thailand'),
            ('2020-07-01',               'Turkey'),
            ('2020-07-01',              'Ukraine'),
            ('2020-07-01', 'United Arab Emirates'),
            ('2020-07-01',        'United States'),
            ('2020-07-01',              'Uruguay

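The full output (truncated above) ends with `names=['Teslim_Date', 'Country'], length=1386)`, so the returned index carries the new name for level 0. Because `set_names` returns a copy, `bigmac4` itself keeps the old name until the result is assigned back, as in the next cell.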

In [14]:
# reassign the index to the DataFrame
bigmac4.index = bigmac4.index.set_names(names="Teslim_Date", level=0)
bigmac4.head()

| Teslim_Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [16]:
# rename level 1 (the second level); again, this only returns a new MultiIndex
bigmac4.index.set_names(names="List_country", level=1)

MultiIndex([('2000-04-01',            'Argentina'),
            ('2000-04-01',            'Australia'),
            ('2000-04-01',               'Brazil'),
            ('2000-04-01',              'Britain'),
            ('2000-04-01',               'Canada'),
            ('2000-04-01',                'Chile'),
            ('2000-04-01',                'China'),
            ('2000-04-01',       'Czech Republic'),
            ('2000-04-01',              'Denmark'),
            ('2000-04-01',            'Euro area'),
            ...
            ('2020-07-01',               'Sweden'),
            ('2020-07-01',          'Switzerland'),
            ('2020-07-01',               'Taiwan'),
            ('2020-07-01',             'Thailand'),
            ('2020-07-01',               'Turkey'),
            ('2020-07-01',              'Ukraine'),
            ('2020-07-01', 'United Arab Emirates'),
            ('2020-07-01',        'United States'),
            ('2020-07-01',              'Uruguay

In [17]:
# reassign the index to the DataFrame
bigmac4.index = bigmac4.index.set_names(names="List_country", level=1)
bigmac4.head()

| Teslim_Date | List_country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [20]:
# rename every level at once by passing one name per level
bigmac4.index.set_names(names=["Time", "Location"])

MultiIndex([('2000-04-01',            'Argentina'),
            ('2000-04-01',            'Australia'),
            ('2000-04-01',               'Brazil'),
            ('2000-04-01',              'Britain'),
            ('2000-04-01',               'Canada'),
            ('2000-04-01',                'Chile'),
            ('2000-04-01',                'China'),
            ('2000-04-01',       'Czech Republic'),
            ('2000-04-01',              'Denmark'),
            ('2000-04-01',            'Euro area'),
            ...
            ('2020-07-01',               'Sweden'),
            ('2020-07-01',          'Switzerland'),
            ('2020-07-01',               'Taiwan'),
            ('2020-07-01',             'Thailand'),
            ('2020-07-01',               'Turkey'),
            ('2020-07-01',              'Ukraine'),
            ('2020-07-01', 'United Arab Emirates'),
            ('2020-07-01',        'United States'),
            ('2020-07-01',              'Uruguay
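The sketch below, on a small stand-in DataFrame rather than `bigmac4` (dates kept as strings for brevity), puts the three points from the list at the start of this section side by side: `level` targets one level, a list renames every level, and the DataFrame's index is unchanged until the result is assigned back.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Price in US Dollars": [2.5, 1.541667]},
    index=pd.MultiIndex.from_tuples(
        [("2000-04-01", "Argentina"), ("2000-04-01", "Australia")],
        names=["Date", "Country"],
    ),
)

# Rename a single level by position (a level name such as "Date" also works)
one_level = frame.index.set_names("Teslim_Date", level=0)
print(list(one_level.names))  # ['Teslim_Date', 'Country']

# Rename every level at once: one name per level, in order
all_levels = frame.index.set_names(["Time", "Location"])
print(list(all_levels.names))  # ['Time', 'Location']

# set_names returned new Index objects; the DataFrame is untouched so far
print(list(frame.index.names))  # ['Date', 'Country']

# Assign back to make the change stick
frame.index = all_levels
print(list(frame.index.names))  # ['Time', 'Location']
```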

# 📚 5. The sort_index Method on a MultiIndex DataFrame




___
- Using the `sort_index` method, we can sort by all levels or by specific levels of the **MultiIndex**.
- To apply a different sort order to different levels, pass a list of Booleans to the `ascending` parameter, one per level.

In [4]:
bigmac = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d", index_col=["Date", "Country"])
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Canada | 1.938776 |
| 2000-04-01 | Switzerland | 3.470588 |


In [6]:
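# only the last expression in a notebook cell is displayed, so the output below comes from the final call
bigmac.sort_index()                         # every level ascending (the default)
bigmac.sort_index(ascending=True)           # the same as the default
bigmac.sort_index(ascending=False)          # every level descending

bigmac.sort_index(ascending=[True, False])  # Date ascending, Country descending
bigmac.sort_index(ascending=[False, True])  # Date descending, Country ascending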

| Date | Country | Price in US Dollars |
|---|---|---|
| 2020-07-01 | Argentina | 3.509232 |
| 2020-07-01 | Australia | 4.578450 |
| 2020-07-01 | Azerbaijan | 2.324897 |
| 2020-07-01 | Bahrain | 3.713035 |
| 2020-07-01 | Brazil | 3.913528 |
| ... | ... | ... |
| 2000-04-01 | Sweden | 2.714932 |
| 2000-04-01 | Switzerland | 3.470588 |
| 2000-04-01 | Taiwan | 2.287582 |
| 2000-04-01 | Thailand | 1.447368 |


In [7]:
bigmac.sort_index()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.500000 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002000 |
| 2000-04-01 | Canada | 1.938776 |
| ... | ... | ... |
| 2020-07-01 | Ukraine | 2.174714 |
| 2020-07-01 | United Arab Emirates | 4.015846 |
| 2020-07-01 | United States | 5.710000 |
| 2020-07-01 | Uruguay | 4.327418 |


In [8]:
bigmac.sort_index(ascending=True)

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.500000 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002000 |
| 2000-04-01 | Canada | 1.938776 |
| ... | ... | ... |
| 2020-07-01 | Ukraine | 2.174714 |
| 2020-07-01 | United Arab Emirates | 4.015846 |
| 2020-07-01 | United States | 5.710000 |
| 2020-07-01 | Uruguay | 4.327418 |


In [20]:
bigmac.sort_index(ascending=False)

| Date | Country | Price in US Dollars |
|---|---|---|
| 2020-07-01 | Vietnam | 2.847282 |
| 2020-07-01 | Uruguay | 4.327418 |
| 2020-07-01 | United States | 5.710000 |
| 2020-07-01 | United Arab Emirates | 4.015846 |
| 2020-07-01 | Ukraine | 2.174714 |
| ... | ... | ... |
| 2000-04-01 | Canada | 1.938776 |
| 2000-04-01 | Britain | 3.002000 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Australia | 1.541667 |


In [None]:
# sort the index in ascending order for the first level and descending order for the second level
bigmac.sort_index(ascending=[True, False])

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | United States | 2.510000 |
| 2000-04-01 | Thailand | 1.447368 |
| 2000-04-01 | Taiwan | 2.287582 |
| 2000-04-01 | Switzerland | 3.470588 |
| 2000-04-01 | Sweden | 2.714932 |
| ... | ... | ... |
| 2020-07-01 | Brazil | 3.913528 |
| 2020-07-01 | Bahrain | 3.713035 |
| 2020-07-01 | Azerbaijan | 2.324897 |
| 2020-07-01 | Australia | 4.578450 |


In [None]:
# sort the index in descending order for the first level and ascending order for the second level
bigmac.sort_index(ascending=[False, True])

| Date | Country | Price in US Dollars |
|---|---|---|
| 2020-07-01 | Argentina | 3.509232 |
| 2020-07-01 | Australia | 4.578450 |
| 2020-07-01 | Azerbaijan | 2.324897 |
| 2020-07-01 | Bahrain | 3.713035 |
| 2020-07-01 | Brazil | 3.913528 |
| ... | ... | ... |
| 2000-04-01 | Sweden | 2.714932 |
| 2000-04-01 | Switzerland | 3.470588 |
| 2000-04-01 | Taiwan | 2.287582 |
| 2000-04-01 | Thailand | 1.447368 |
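`sort_index` also accepts a `level` parameter to sort by one level first. The sketch below uses a hypothetical four-row frame with dates and prices copied from the outputs above (dates kept as ISO strings, which sort the same way as datetimes). It checks the row order that a mixed `ascending` list produces and shows sorting by the `Country` level, where the remaining `Date` level breaks ties.

```python
import pandas as pd

prices = pd.DataFrame(
    {"Price in US Dollars": [3.913528, 2.5, 3.509232, 1.648045]},
    index=pd.MultiIndex.from_tuples(
        [
            ("2020-07-01", "Brazil"),
            ("2000-04-01", "Argentina"),
            ("2020-07-01", "Argentina"),
            ("2000-04-01", "Brazil"),
        ],
        names=["Date", "Country"],
    ),
)

# One Boolean per level: Date descending, Country ascending within each date
newest_first = prices.sort_index(ascending=[False, True])
print(list(newest_first.index))

# Sort by the Country level first; the remaining Date level then sorts ascending
by_country = prices.sort_index(level="Country")
print(list(by_country.index))
```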


# 📚 6. Extract Rows from a MultiIndex DataFrame




___
- A **tuple** is an immutable sequence: it cannot be modified after creation.
- Create a tuple with a comma between elements. The community convention is to wrap the elements in parentheses.
- The `iloc` and `loc` accessors are available to extract rows by index position or label.
- For the `loc` accessor, pass a tuple to hold the labels from the index levels.

In [2]:
bigmac = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d", index_col=["Date", "Country"]).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


To create a tuple, use the following syntax:
- `('value1', 'value2')`

A tuple with a single element needs a trailing comma; without it, the parentheses only group an expression:
- `('value1',)`
- `('value2',)`

In [26]:
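1,              # a one-element tuple: (1,)
1, 2            # a two-element tuple: (1, 2)
(1, 2)          # the same tuple, written with the conventional parentheses
type((1, 2))    # only the last expression in a cell is displayed

# type([1, 2])  # for comparison, this would return list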

tuple

Using the `iloc` accessor, we can extract rows from a MultiIndex DataFrame by integer position; the index labels play no role.

In [10]:
# access the first row of the DataFrame
bigmac.iloc[0]

Price in US Dollars    2.5
Name: (2000-04-01 00:00:00, Argentina), dtype: float64

In [11]:
# access the third row (position 2) of the DataFrame
bigmac.iloc[2]

Price in US Dollars    1.648045
Name: (2000-04-01 00:00:00, Brazil), dtype: float64

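As the output shows, `iloc` selects rows by position exactly as it does on a DataFrame with a single-level index. The `Name` of the returned Series is the row's full MultiIndex label, a `(Date, Country)` tuple.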

Using the `loc` accessor, we can extract rows by label: pass a single label to select on the outermost level, or a tuple with one label per level to select a specific row.

In [4]:
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [5]:
# check the index of the DataFrame
bigmac.index

MultiIndex([('2000-04-01',            'Argentina'),
            ('2000-04-01',            'Australia'),
            ('2000-04-01',               'Brazil'),
            ('2000-04-01',              'Britain'),
            ('2000-04-01',               'Canada'),
            ('2000-04-01',                'Chile'),
            ('2000-04-01',                'China'),
            ('2000-04-01',       'Czech Republic'),
            ('2000-04-01',              'Denmark'),
            ('2000-04-01',            'Euro area'),
            ...
            ('2020-07-01',               'Sweden'),
            ('2020-07-01',          'Switzerland'),
            ('2020-07-01',               'Taiwan'),
            ('2020-07-01',             'Thailand'),
            ('2020-07-01',               'Turkey'),
            ('2020-07-01',              'Ukraine'),
            ('2020-07-01', 'United Arab Emirates'),
            ('2020-07-01',        'United States'),
            ('2020-07-01',              'Uruguay

In [6]:
# 'Date' is the name of an index level, not a label inside it, so this raises an error
# bigmac.loc['Date']

In [7]:
# select every row dated 2000-04-01; the Date level is dropped from the result
bigmac.loc["2000-04-01"]

| Country | Price in US Dollars |
|---|---|
| Argentina | 2.5 |
| Australia | 1.541667 |
| Brazil | 1.648045 |
| Britain | 3.002 |
| Canada | 1.938776 |
| Chile | 2.451362 |
| China | 1.195652 |
| Czech Republic | 1.390537 |
| Denmark | 3.078358 |
| Euro area | 2.3808 |


In [16]:
# the same outer-level selection for the latest date, 2020-07-01
bigmac.loc['2020-07-01']

| Country | Price in US Dollars |
|---|---|
| Argentina | 3.509232 |
| Australia | 4.57845 |
| Azerbaijan | 2.324897 |
| Bahrain | 3.713035 |
| Brazil | 3.913528 |
| Britain | 4.277332 |
| Canada | 5.076741 |
| Chile | 3.478482 |
| China | 3.098451 |
| Colombia | 3.29001 |


In [8]:
# select the 'Price in US Dollars' column for every row dated 2020-07-01; the result is a Series indexed by Country
bigmac.loc['2020-07-01', 'Price in US Dollars']

Country
Argentina               3.509232
Australia               4.578450
Azerbaijan              2.324897
Bahrain                 3.713035
Brazil                  3.913528
Britain                 4.277332
Canada                  5.076741
Chile                   3.478482
China                   3.098451
Colombia                3.290010
Costa Rica              4.039015
Croatia                 3.319652
Czech Republic          3.801242
Denmark                 4.580607
Egypt                   2.633229
Euro area               4.786138
Guatemala               3.249708
Honduras                3.523594
Hong Kong               2.644837
Hungary                 2.890378
India                   2.526680
Indonesia               2.355386
Israel                  4.946606
Japan                   3.635516
Jordan                  3.244006
Kuwait                  3.736192
Lebanon                 5.952381
Malaysia                2.342047
Mexico                  2.228561
Moldova                 2.752562
Ne

In [9]:
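# bigmac.loc[("2020-07-01", "Price in US Dollars")] is identical to bigmac.loc["2020-07-01", "Price in US Dollars"]:
# Python passes the same tuple to loc in both cases
bigmac.loc[("2020-07-01", "Price in US Dollars")]
# a full (Date, Country) tuple selects a single row; only this last result is displayed
bigmac.loc[("2000-04-01", "Canada")]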

Price in US Dollars    1.938776
Name: (2000-04-01 00:00:00, Canada), dtype: float64
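A few more `loc` patterns, sketched on a small stand-in frame (prices copied from the outputs above, dates kept as strings for brevity). Writing `loc[(date, country), :]` makes it explicit that the tuple is a row label; a single outer label selects a whole date; and `DataFrame.xs` with `level=` selects on an inner level, which a plain `loc` label cannot do. `xs` is not used elsewhere in this notebook; it is offered here as one alternative.

```python
import pandas as pd

prices = pd.DataFrame(
    {"Price in US Dollars": [2.5, 1.938776, 3.509232, 5.076741]},
    index=pd.MultiIndex.from_tuples(
        [
            ("2000-04-01", "Argentina"),
            ("2000-04-01", "Canada"),
            ("2020-07-01", "Argentina"),
            ("2020-07-01", "Canada"),
        ],
        names=["Date", "Country"],
    ),
)

# Full row label as a tuple, then ":" for all columns -> one row, as a Series
row = prices.loc[("2000-04-01", "Canada"), :]
print(row["Price in US Dollars"])  # 1.938776

# Tuple row label plus a column label -> a single value
print(prices.loc[("2020-07-01", "Canada"), "Price in US Dollars"])  # 5.076741

# Outer-level label only -> every Country for that Date (the Date level is dropped)
print(prices.loc["2020-07-01"])

# Inner-level selection: every Date for one Country
print(prices.xs("Canada", level="Country"))
```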

In [51]:
# slice between two (Date, Country) labels; unlike Python list slicing, both endpoints are included
start = ("2000-04-01", "Hungary")
end = ("2000-04-01", "Poland")
bigmac.loc[start:end]

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Hungary | 1.215054 |
| 2000-04-01 | Indonesia | 1.825047 |
| 2000-04-01 | Israel | 3.580247 |
| 2000-04-01 | Japan | 2.773585 |
| 2000-04-01 | Malaysia | 1.189474 |
| 2000-04-01 | Mexico | 2.221041 |
| 2000-04-01 | New Zealand | 1.691542 |
| 2000-04-01 | Poland | 1.27907 |


In [52]:
bigmac.loc[("2019-07-09", "Hungary"):]

| Date | Country | Price in US Dollars |
|---|---|---|
| 2019-07-09 | Hungary | 3.097667 |
| 2019-07-09 | India | 2.669413 |
| 2019-07-09 | Indonesia | 2.264685 |
| 2019-07-09 | Israel | 4.766311 |
| 2019-07-09 | Japan | 3.585712 |
| ... | ... | ... |
| 2020-07-01 | Ukraine | 2.174714 |
| 2020-07-01 | United Arab Emirates | 4.015846 |
| 2020-07-01 | United States | 5.710000 |
| 2020-07-01 | Uruguay | 4.327418 |


In [53]:
bigmac.loc[("2012-01-01", "Brazil"): ("2013-07-01", "Turkey")]

| Date | Country | Price in US Dollars |
|---|---|---|
| 2012-01-01 | Brazil | 5.678670 |
| 2012-01-01 | Britain | 3.823395 |
| 2012-01-01 | Canada | 4.632940 |
| 2012-01-01 | Chile | 4.050983 |
| 2012-01-01 | China | 2.438445 |
| ... | ... | ... |
| 2013-07-01 | Sweden | 6.156874 |
| 2013-07-01 | Switzerland | 6.719041 |
| 2013-07-01 | Taiwan | 2.630834 |
| 2013-07-01 | Thailand | 2.845723 |


In [54]:
bigmac.loc[("2012-01-01", "Brazil"): ("2013-07-01", "Turkey"), "Price in US Dollars"]

Date        Country    
2012-01-01  Brazil         5.678670
            Britain        3.823395
            Canada         4.632940
            Chile          4.050983
            China          2.438445
                             ...   
2013-07-01  Sweden         6.156874
            Switzerland    6.719041
            Taiwan         2.630834
            Thailand       2.845723
            Turkey         4.342384
Name: Price in US Dollars, Length: 160, dtype: float64
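Two details of tuple slicing are worth checking in isolation: both endpoints are included, and label slicing on a MultiIndex expects a sorted index (pandas may raise an `UnsortedIndexError` otherwise), which is why every `read_csv` call in this section ends with `.sort_index()`. A minimal sketch on stand-in rows copied from the Hungary-to-Poland output above:

```python
import pandas as pd

prices = pd.DataFrame(
    {"Price in US Dollars": [1.215054, 1.825047, 3.580247, 2.773585, 1.27907]},
    index=pd.MultiIndex.from_tuples(
        [
            ("2000-04-01", "Hungary"),
            ("2000-04-01", "Indonesia"),
            ("2000-04-01", "Israel"),
            ("2000-04-01", "Japan"),
            ("2000-04-01", "Poland"),
        ],
        names=["Date", "Country"],
    ),
).sort_index()  # label slicing on a MultiIndex expects a sorted index

subset = prices.loc[("2000-04-01", "Indonesia"):("2000-04-01", "Japan")]

# Both endpoints are included
print(list(subset.index.get_level_values("Country")))  # ['Indonesia', 'Israel', 'Japan']
```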

# 📚 7. The transpose Method




___
- The `transpose` method swaps the rows and columns of the **DataFrame**: the row index becomes the column index, and vice versa. The `T` attribute is a shorthand for it.

In [4]:
bigmac = pd.read_csv("bigmac.csv", parse_dates=["Date"], date_format="%Y-%m-%d", index_col=["Date", "Country"]).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [5]:
start = ("2018-01-01", "China")
end = ("2018-01-01", "Denmark")

bigmac.loc[start:end].transpose()

| Date | 2018-01-01 | 2018-01-01 | 2018-01-01 | 2018-01-01 | 2018-01-01 |
|---|---|---|---|---|---|
| **Country** | China | Colombia | Costa Rica | Czech Republic | Denmark |
| **Price in US Dollars** | 3.171642 | 3.832468 | 4.027932 | 3.807779 | 4.93202 |
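To make the swap concrete, the sketch below transposes a two-row stand-in for the slice above (prices copied from the output, dates as strings). The two-level row index turns into two-level columns, and transposing again restores the original.

```python
import pandas as pd

prices = pd.DataFrame(
    {"Price in US Dollars": [3.171642, 3.832468]},
    index=pd.MultiIndex.from_tuples(
        [("2018-01-01", "China"), ("2018-01-01", "Colombia")],
        names=["Date", "Country"],
    ),
)

flipped = prices.transpose()

# The row MultiIndex is now the column index; the old column label is the only row
print(flipped.columns.nlevels)  # 2
print(list(flipped.index))      # ['Price in US Dollars']

# .T is shorthand for transpose(); transposing twice restores the original
print(prices.T.equals(flipped))  # True
print(flipped.T.equals(prices))  # True
```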


# 📚 8. The stack Method




____
- The `stack` method moves the column index to the row index.
- When the columns have a single level, pandas returns a **MultiIndex Series**.
- Think of it like "stacking" the column labels as a new, innermost level of the row **MultiIndex**.

The `stack` method reshapes a DataFrame by pivoting its column labels into the row index. The result has fewer columns (none, in the single-level case) and a deeper row index.

When `stack` is applied to a DataFrame:
- It takes the column labels and moves them into the row index, creating (or extending) a MultiIndex.
- The result is a MultiIndex Series: the original row index levels come first, and the former column labels form the new innermost level. In the example below, `year` and `country` are followed by a level holding `Population` and `GDP`.
- This is useful for restructuring wide-format data into a longer format for processing or analysis, especially with time-series or panel data.

In [7]:
world = pd.read_csv("worldstats.csv", index_col=["year", "country"]).sort_index()
world.head()

| year | country | Population | GDP |
|---|---|---|---|
| 1960 | Afghanistan | 8994793.0 | 537777800.0 |
| 1960 | Algeria | 11124892.0 | 2723638000.0 |
| 1960 | Australia | 10276477.0 | 18567590000.0 |
| 1960 | Austria | 7047539.0 | 6592694000.0 |
| 1960 | Bahamas, The | 109526.0 | 169802300.0 |


In [75]:
# reshape from wide to long: the Population and GDP column labels become a new innermost row index level
world.stack()

year  country                
1960  Afghanistan  Population    8.994793e+06
                   GDP           5.377778e+08
      Algeria      Population    1.112489e+07
                   GDP           2.723638e+09
      Australia    Population    1.027648e+07
                                     ...     
2015  World        GDP           7.343364e+13
      Zambia       Population    1.621177e+07
                   GDP           2.120156e+10
      Zimbabwe     Population    1.560275e+07
                   GDP           1.389294e+10
Length: 22422, dtype: float64

In [8]:
# stack() returns a Series rather than a DataFrame; confirm it with type()
type(world.stack())

pandas.core.series.Series

In [9]:
# convert the stacked Series to a one-column DataFrame with to_frame(); the column is named 0 because the Series has no name
stacked_world = world.stack().to_frame()

stacked_world

| year | country |   | 0 |
|---|---|---|---|
| 1960 | Afghanistan | Population | 8.994793e+06 |
| 1960 | Afghanistan | GDP | 5.377778e+08 |
| 1960 | Algeria | Population | 1.112489e+07 |
| 1960 | Algeria | GDP | 2.723638e+09 |
| 1960 | Australia | Population | 1.027648e+07 |
| ... | ... | ... | ... |
| 2015 | World | GDP | 7.343364e+13 |
| 2015 | Zambia | Population | 1.621177e+07 |
| 2015 | Zambia | GDP | 2.120156e+10 |
| 2015 | Zimbabwe | Population | 1.560275e+07 |


In [10]:
# check the index of the DataFrame
stacked_world.index

MultiIndex([(1960,        'Afghanistan', 'Population'),
            (1960,        'Afghanistan',        'GDP'),
            (1960,            'Algeria', 'Population'),
            (1960,            'Algeria',        'GDP'),
            (1960,          'Australia', 'Population'),
            (1960,          'Australia',        'GDP'),
            (1960,            'Austria', 'Population'),
            (1960,            'Austria',        'GDP'),
            (1960,       'Bahamas, The', 'Population'),
            (1960,       'Bahamas, The',        'GDP'),
            ...
            (2015,            'Vietnam', 'Population'),
            (2015,            'Vietnam',        'GDP'),
            (2015, 'West Bank and Gaza', 'Population'),
            (2015, 'West Bank and Gaza',        'GDP'),
            (2015,              'World', 'Population'),
            (2015,              'World',        'GDP'),
            (2015,             'Zambia', 'Population'),
            (2015,             '

# 📚 9. The unstack Method


<div style="font-family: Avenir, sans-serif; font-size: 16px; line-height: 1.6; color: white; background-color: #333; padding: 10px; border-radius: 5px;">


</div>


____
- The `unstack` method moves a row index to the column index (the inverse of the `stack` method).
- By default, the `unstack` method will move the innermost index.
- We can customize the moved index with the `level` parameter.
- The `level` parameter accepts the level's index position or its name. It can also accept a list of positions/names.
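The bullets above can be tried on a small stand-in for `world`. This sketch builds a three-level Series with made-up values (the levels mirror `year`, `country`, and the unnamed Population/GDP level) and compares the default, name-based, and position-based calls:

```python
import pandas as pd

# hypothetical stand-in for the stacked `world` Series (values are made up)
idx = pd.MultiIndex.from_product(
    [[1960, 1961], ["A", "B"], ["Population", "GDP"]],
    names=["year", "country", None],
)
s = pd.Series(range(8), index=idx, dtype="float64")

inner = s.unstack()                # default: innermost level -> columns
by_name = s.unstack(level="year")  # move the 'year' level, chosen by name
by_pos = s.unstack(level=0)        # the same level, chosen by position
back = inner.stack()               # stack reverses the default unstack

print(inner)
print(by_name)
```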

In [19]:
world = pd.read_csv("worldstats.csv", index_col=["year", "country"]).sort_index().stack()
world.head()

year  country                
1960  Afghanistan  Population    8.994793e+06
                   GDP           5.377778e+08
      Algeria      Population    1.112489e+07
                   GDP           2.723638e+09
      Australia    Population    1.027648e+07
dtype: float64

Because `world` is now a stacked Series, calling `unstack` reverses the `stack` operation: with no arguments, it moves the innermost row index level (the unnamed level holding `Population` and `GDP`) to the column index.

In [17]:
# unstack without a level argument
# the default is the innermost level, which holds the Population and GDP labels
world.unstack()

Unnamed: 0_level_0,Unnamed: 1_level_0,Population,GDP
year,country,Unnamed: 2_level_1,Unnamed: 3_level_1
1960,Afghanistan,8.994793e+06,5.377778e+08
1960,Algeria,1.112489e+07,2.723638e+09
1960,Australia,1.027648e+07,1.856759e+10
1960,Austria,7.047539e+06,6.592694e+09
1960,"Bahamas, The",1.095260e+05,1.698023e+08
...,...,...,...
2015,Vietnam,9.170380e+07,1.935994e+11
2015,West Bank and Gaza,4.422143e+06,1.267740e+10
2015,World,7.346633e+09,7.343364e+13
2015,Zambia,1.621177e+07,2.120156e+10


Because `world` is a Series, the result is a DataFrame whose column index is a regular single-level `Index` built from the former innermost row level: `Population` and `GDP` become the column labels, while `year` and `country` stay in the row MultiIndex.

In [18]:
# check the columns of the DataFrame
world.unstack().columns

Index(['Population', 'GDP'], dtype='object')

____

Sometimes, you may want to unstack a different row index level. In such cases, you can use the `level` parameter to specify the index level to move to the column index. The `level` parameter accepts either the index position or the level name.

In [22]:
world.head()

year  country                
1960  Afghanistan  Population    8.994793e+06
                   GDP           5.377778e+08
      Algeria      Population    1.112489e+07
                   GDP           2.723638e+09
      Australia    Population    1.027648e+07
dtype: float64

Looking at the output above, we might want to unstack the `year` level instead of the innermost one. To do this, pass that level's name to the `level` parameter.

In [23]:
world.unstack(level="year")

Unnamed: 0_level_0,year,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Afghanistan,Population,8.994793e+06,9.164945e+06,9.343772e+06,9.531555e+06,9.728645e+06,9.935358e+06,1.014884e+07,1.036860e+07,1.059979e+07,1.084951e+07,...,2.518362e+07,2.587754e+07,2.652874e+07,2.720729e+07,2.796221e+07,2.880917e+07,2.972680e+07,3.068250e+07,3.162751e+07,3.252656e+07
Afghanistan,GDP,5.377778e+08,5.488889e+08,5.466667e+08,7.511112e+08,8.000000e+08,1.006667e+09,1.400000e+09,1.673333e+09,1.373333e+09,1.408889e+09,...,7.057598e+09,9.843842e+09,1.019053e+10,1.248694e+10,1.593680e+10,1.793024e+10,2.053654e+10,2.004633e+10,2.005019e+10,1.919944e+10
Albania,Population,,,,,,,,,,,...,2.992547e+06,2.970017e+06,2.947314e+06,2.927519e+06,2.913021e+06,2.904780e+06,2.900247e+06,2.896652e+06,2.893654e+06,2.889167e+06
Albania,GDP,,,,,,,,,,,...,8.992642e+09,1.070101e+10,1.288135e+10,1.204421e+10,1.192695e+10,1.289087e+10,1.231978e+10,1.278103e+10,1.327796e+10,1.145560e+10
Algeria,Population,1.112489e+07,1.140486e+07,1.169015e+07,1.198513e+07,1.229597e+07,1.262695e+07,1.298027e+07,1.335420e+07,1.374438e+07,1.414444e+07,...,3.374933e+07,3.426197e+07,3.481106e+07,3.540179e+07,3.603616e+07,3.671713e+07,3.743943e+07,3.818614e+07,3.893433e+07,3.966652e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Yemen, Rep.",GDP,,,,,,,,,,,...,1.908173e+10,2.563367e+10,3.039720e+10,2.845950e+10,3.090675e+10,3.107886e+10,3.207477e+10,3.595450e+10,,
Zambia,Population,3.049586e+06,3.142848e+06,3.240664e+06,3.342894e+06,3.449266e+06,3.559687e+06,3.674088e+06,3.792864e+06,3.916928e+06,4.047479e+06,...,1.238151e+07,1.273868e+07,1.311458e+07,1.350785e+07,1.391744e+07,1.434353e+07,1.478658e+07,1.524609e+07,1.572134e+07,1.621177e+07
Zambia,GDP,6.987397e+08,6.823597e+08,6.792797e+08,7.043397e+08,8.226397e+08,1.061200e+09,1.239000e+09,1.340639e+09,1.573739e+09,1.926399e+09,...,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026555e+10,2.345952e+10,2.550306e+10,2.804552e+10,2.713464e+10,2.120156e+10
Zimbabwe,Population,3.752390e+06,3.876638e+06,4.006262e+06,4.140804e+06,4.279561e+06,4.422132e+06,4.568320e+06,4.718612e+06,4.874113e+06,5.036321e+06,...,1.312794e+07,1.329780e+07,1.349546e+07,1.372100e+07,1.397390e+07,1.425559e+07,1.456548e+07,1.489809e+07,1.524586e+07,1.560275e+07


In [24]:
# check the columns of the DataFrame
world.unstack(level="year").columns

Index([1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
       1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995,
       1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
       2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015],
      dtype='int64', name='year')

Instead of the level name, you can also pass the level's position to the `level` parameter. This is useful when you don't know a level's name, or when the level has no name at all.

In [20]:
world.unstack(level=0)

Unnamed: 0_level_0,year,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Afghanistan,Population,8.994793e+06,9.164945e+06,9.343772e+06,9.531555e+06,9.728645e+06,9.935358e+06,1.014884e+07,1.036860e+07,1.059979e+07,1.084951e+07,...,2.518362e+07,2.587754e+07,2.652874e+07,2.720729e+07,2.796221e+07,2.880917e+07,2.972680e+07,3.068250e+07,3.162751e+07,3.252656e+07
Afghanistan,GDP,5.377778e+08,5.488889e+08,5.466667e+08,7.511112e+08,8.000000e+08,1.006667e+09,1.400000e+09,1.673333e+09,1.373333e+09,1.408889e+09,...,7.057598e+09,9.843842e+09,1.019053e+10,1.248694e+10,1.593680e+10,1.793024e+10,2.053654e+10,2.004633e+10,2.005019e+10,1.919944e+10
Albania,Population,,,,,,,,,,,...,2.992547e+06,2.970017e+06,2.947314e+06,2.927519e+06,2.913021e+06,2.904780e+06,2.900247e+06,2.896652e+06,2.893654e+06,2.889167e+06
Albania,GDP,,,,,,,,,,,...,8.992642e+09,1.070101e+10,1.288135e+10,1.204421e+10,1.192695e+10,1.289087e+10,1.231978e+10,1.278103e+10,1.327796e+10,1.145560e+10
Algeria,Population,1.112489e+07,1.140486e+07,1.169015e+07,1.198513e+07,1.229597e+07,1.262695e+07,1.298027e+07,1.335420e+07,1.374438e+07,1.414444e+07,...,3.374933e+07,3.426197e+07,3.481106e+07,3.540179e+07,3.603616e+07,3.671713e+07,3.743943e+07,3.818614e+07,3.893433e+07,3.966652e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Yemen, Rep.",GDP,,,,,,,,,,,...,1.908173e+10,2.563367e+10,3.039720e+10,2.845950e+10,3.090675e+10,3.107886e+10,3.207477e+10,3.595450e+10,,
Zambia,Population,3.049586e+06,3.142848e+06,3.240664e+06,3.342894e+06,3.449266e+06,3.559687e+06,3.674088e+06,3.792864e+06,3.916928e+06,4.047479e+06,...,1.238151e+07,1.273868e+07,1.311458e+07,1.350785e+07,1.391744e+07,1.434353e+07,1.478658e+07,1.524609e+07,1.572134e+07,1.621177e+07
Zambia,GDP,6.987397e+08,6.823597e+08,6.792797e+08,7.043397e+08,8.226397e+08,1.061200e+09,1.239000e+09,1.340639e+09,1.573739e+09,1.926399e+09,...,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026555e+10,2.345952e+10,2.550306e+10,2.804552e+10,2.713464e+10,2.120156e+10
Zimbabwe,Population,3.752390e+06,3.876638e+06,4.006262e+06,4.140804e+06,4.279561e+06,4.422132e+06,4.568320e+06,4.718612e+06,4.874113e+06,5.036321e+06,...,1.312794e+07,1.329780e+07,1.349546e+07,1.372100e+07,1.397390e+07,1.425559e+07,1.456548e+07,1.489809e+07,1.524586e+07,1.560275e+07


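Therefore, for the three levels of `world`:

- `level=0` is the same as `level="year"`
- `level=1` is the same as `level="country"`
- `level=2` is the innermost level, which holds the `Population` and `GDP` labels; it has no name, so it is referenced by its position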
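To check which position corresponds to which name, inspect `index.names`. A sketch on a hypothetical miniature of `world` (made-up values, same level layout):

```python
import pandas as pd

# hypothetical miniature of `world`: same three levels, made-up values
idx = pd.MultiIndex.from_product(
    [[1960], ["A", "B"], ["Population", "GDP"]],
    names=["year", "country", None],
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# position -> name lookup; the innermost level has no name (None)
for position, name in enumerate(s.index.names):
    print(position, name)

# the unnamed innermost level, addressed by its position
wide = s.unstack(level=2)
```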

In [37]:
a = world.unstack(level=2)
a

Unnamed: 0_level_0,Unnamed: 1_level_0,Population,GDP
year,country,Unnamed: 2_level_1,Unnamed: 3_level_1
1960,Afghanistan,8.994793e+06,5.377778e+08
1960,Algeria,1.112489e+07,2.723638e+09
1960,Australia,1.027648e+07,1.856759e+10
1960,Austria,7.047539e+06,6.592694e+09
1960,"Bahamas, The",1.095260e+05,1.698023e+08
...,...,...,...
2015,Vietnam,9.170380e+07,1.935994e+11
2015,West Bank and Gaza,4.422143e+06,1.267740e+10
2015,World,7.346633e+09,7.343364e+13
2015,Zambia,1.621177e+07,2.120156e+10


In [38]:
a.columns

Index(['Population', 'GDP'], dtype='object')

In some cases, you may want to unstack multiple index levels at once. To do this, pass a list of index positions or names to the `level` parameter. This will move all specified index levels to the column index.

In [29]:
world.unstack(level=['year', 'country'])

year,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,...,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015
country,Afghanistan,Algeria,Australia,Austria,"Bahamas, The",Bangladesh,Belgium,Belize,Benin,Bermuda,...,United Kingdom,United States,Upper middle income,Uruguay,Uzbekistan,Vietnam,West Bank and Gaza,World,Zambia,Zimbabwe
Population,8994793.0,11124890.0,10276480.0,7047539.0,109526.0,48200700.0,9153489.0,92068.0,2431620.0,44400.0,...,65138230.0,321418800.0,2550326000.0,3431555.0,31299500.0,91703800.0,4422143.0,7346633000.0,16211770.0,15602750.0
GDP,537777800.0,2723638000.0,18567590000.0,6592694000.0,169802300.0,4274894000.0,11658720000.0,28072480.0,226195600.0,84466650.0,...,2848755000000.0,17947000000000.0,19732880000000.0,53442700000.0,66732800000.0,193599400000.0,12677400000.0,73433640000000.0,21201560000.0,13892940000.0


Just as with a single level, you can pass positions instead of names when you don't know the level names. The code below gives the same result as the code above.

In [28]:
world.unstack(level=[0, 1])

year,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,...,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015
country,Afghanistan,Algeria,Australia,Austria,"Bahamas, The",Bangladesh,Belgium,Belize,Benin,Bermuda,...,United Kingdom,United States,Upper middle income,Uruguay,Uzbekistan,Vietnam,West Bank and Gaza,World,Zambia,Zimbabwe
Population,8994793.0,11124890.0,10276480.0,7047539.0,109526.0,48200700.0,9153489.0,92068.0,2431620.0,44400.0,...,65138230.0,321418800.0,2550326000.0,3431555.0,31299500.0,91703800.0,4422143.0,7346633000.0,16211770.0,15602750.0
GDP,537777800.0,2723638000.0,18567590000.0,6592694000.0,169802300.0,4274894000.0,11658720000.0,28072480.0,226195600.0,84466650.0,...,2848755000000.0,17947000000000.0,19732880000000.0,53442700000.0,66732800000.0,193599400000.0,12677400000.0,73433640000000.0,21201560000.0,13892940000.0


The order of the list determines the order of the new column levels: the first entry becomes the outermost column level.

In [42]:
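# level 1 ("country") is listed twice here, so country ends up as two identical column levels
# (compare with [1, 0] in the next cell)
world.unstack(level=[1, 1])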

Unnamed: 0_level_0,country,Afghanistan,Algeria,Australia,Austria,"Bahamas, The",Bangladesh,Belgium,Belize,Benin,Bermuda,...,Faroe Islands,San Marino,Fragile and conflict affected situations,Kosovo,Montenegro,Timor-Leste,Sao Tome and Principe,South Sudan,Myanmar,Somalia
Unnamed: 0_level_1,country,Afghanistan,Algeria,Australia,Austria,"Bahamas, The",Bangladesh,Belgium,Belize,Benin,Bermuda,...,Faroe Islands,San Marino,Fragile and conflict affected situations,Kosovo,Montenegro,Timor-Leste,Sao Tome and Principe,South Sudan,Myanmar,Somalia
year,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
1960,Population,8.994793e+06,1.112489e+07,1.027648e+07,7.047539e+06,1.095260e+05,4.820070e+07,9.153489e+06,9.206800e+04,2.431620e+06,4.440000e+04,...,,,,,,,,,,
1960,GDP,5.377778e+08,2.723638e+09,1.856759e+10,6.592694e+09,1.698023e+08,4.274894e+09,1.165872e+10,2.807248e+07,2.261956e+08,8.446665e+07,...,,,,,,,,,,
1961,Population,9.164945e+06,1.140486e+07,1.048300e+07,7.086299e+06,1.151080e+05,4.959361e+07,9.183948e+06,9.470100e+04,2.466002e+06,4.550000e+04,...,,,,,,,,,,
1961,GDP,5.488889e+08,2.434767e+09,1.963938e+10,7.311750e+09,1.900962e+08,4.817580e+09,1.240015e+10,2.996500e+07,2.356682e+08,8.924999e+07,...,,,,,,,,,,
1962,Population,9.343772e+06,1.169015e+07,1.074200e+07,7.129864e+06,1.210830e+05,5.103060e+07,9.220578e+06,9.738900e+04,2.503232e+06,4.660000e+04,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2013,GDP,2.004633e+10,2.097035e+11,1.563951e+12,4.286986e+11,8.431750e+09,1.499905e+11,5.213705e+11,1.625828e+09,9.110801e+09,5.573710e+09,...,2.613459e+09,,7.864439e+11,7.073022e+09,4.464498e+09,1.319000e+09,3.056329e+08,1.325764e+10,5.865265e+10,5.352000e+09
2014,Population,3.162751e+07,3.893433e+07,2.346409e+07,8.541575e+06,3.830540e+05,1.590775e+08,1.123121e+07,3.517060e+05,1.059848e+07,,...,,,4.743954e+08,1.812771e+06,6.218100e+05,1.212107e+06,1.863420e+05,1.191118e+07,5.343716e+07,1.051757e+07
2014,GDP,2.005019e+10,2.135185e+11,1.454675e+12,4.368875e+11,8.510500e+09,1.728855e+11,5.312348e+11,1.717862e+09,9.575357e+09,,...,,,7.796821e+11,7.384901e+09,4.587742e+09,1.371173e+09,3.374135e+08,1.328208e+10,6.433004e+10,5.707000e+09
2015,Population,3.252656e+07,3.966652e+07,2.378117e+07,8.611088e+06,3.880190e+05,1.609956e+08,1.128572e+07,3.592870e+05,1.087983e+07,,...,,,4.856092e+08,1.797151e+06,6.223880e+05,1.245015e+06,,1.233981e+07,5.389715e+07,1.078710e+07


In [43]:
world.unstack(level=[1, 0])

country,Afghanistan,Algeria,Australia,Austria,"Bahamas, The",Bangladesh,Belgium,Belize,Benin,Bermuda,...,United Kingdom,United States,Upper middle income,Uruguay,Uzbekistan,Vietnam,West Bank and Gaza,World,Zambia,Zimbabwe
year,1960,1960,1960,1960,1960,1960,1960,1960,1960,1960,...,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015
Population,8994793.0,11124890.0,10276480.0,7047539.0,109526.0,48200700.0,9153489.0,92068.0,2431620.0,44400.0,...,65138230.0,321418800.0,2550326000.0,3431555.0,31299500.0,91703800.0,4422143.0,7346633000.0,16211770.0,15602750.0
GDP,537777800.0,2723638000.0,18567590000.0,6592694000.0,169802300.0,4274894000.0,11658720000.0,28072480.0,226195600.0,84466650.0,...,2848755000000.0,17947000000000.0,19732880000000.0,53442700000.0,66732800000.0,193599400000.0,12677400000.0,73433640000000.0,21201560000.0,13892940000.0
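Comparing the `[0, 1]` and `[1, 0]` calls, the order of the list becomes the order of the column levels. A small sketch with made-up data confirms this through `columns.names`:

```python
import pandas as pd

# hypothetical stand-in for `world` (values are made up)
idx = pd.MultiIndex.from_product(
    [[1960, 1961], ["A", "B"], ["Population", "GDP"]],
    names=["year", "country", None],
)
s = pd.Series(range(8), index=idx, dtype="float64")

year_first = s.unstack(level=[0, 1])     # column levels: year, country
country_first = s.unstack(level=[1, 0])  # column levels: country, year

print(year_first.columns.names)
print(country_first.columns.names)
```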


___

In [33]:
world

year  country                
1960  Afghanistan  Population    8.994793e+06
                   GDP           5.377778e+08
      Algeria      Population    1.112489e+07
                   GDP           2.723638e+09
      Australia    Population    1.027648e+07
                                     ...     
2015  World        GDP           7.343364e+13
      Zambia       Population    1.621177e+07
                   GDP           2.120156e+10
      Zimbabwe     Population    1.560275e+07
                   GDP           1.389294e+10
Length: 22422, dtype: float64

As with Python list indexing, levels can also be counted from the end with negative positions. With three levels:

- `-1` is the same as `2` (the innermost level)
- `-2` is the same as `1` (`country`)
- `-3` is the same as `0` (`year`)
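One way to check this mapping is to compare each negative position with its positive counterpart on a small hypothetical Series; `nlevels + neg` gives the positive position:

```python
import pandas as pd

# hypothetical stand-in for `world` (values are made up)
idx = pd.MultiIndex.from_product(
    [[1960, 1961], ["A", "B"], ["Population", "GDP"]],
    names=["year", "country", None],
)
s = pd.Series(range(8), index=idx, dtype="float64")

# with 3 levels, negative position -k refers to position 3 - k
for neg in (-1, -2, -3):
    pos = s.index.nlevels + neg
    assert s.unstack(level=neg).equals(s.unstack(level=pos))
```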

In [40]:
# same as year index
world.unstack(level=-3)

Unnamed: 0_level_0,year,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Afghanistan,Population,8.994793e+06,9.164945e+06,9.343772e+06,9.531555e+06,9.728645e+06,9.935358e+06,1.014884e+07,1.036860e+07,1.059979e+07,1.084951e+07,...,2.518362e+07,2.587754e+07,2.652874e+07,2.720729e+07,2.796221e+07,2.880917e+07,2.972680e+07,3.068250e+07,3.162751e+07,3.252656e+07
Afghanistan,GDP,5.377778e+08,5.488889e+08,5.466667e+08,7.511112e+08,8.000000e+08,1.006667e+09,1.400000e+09,1.673333e+09,1.373333e+09,1.408889e+09,...,7.057598e+09,9.843842e+09,1.019053e+10,1.248694e+10,1.593680e+10,1.793024e+10,2.053654e+10,2.004633e+10,2.005019e+10,1.919944e+10
Albania,Population,,,,,,,,,,,...,2.992547e+06,2.970017e+06,2.947314e+06,2.927519e+06,2.913021e+06,2.904780e+06,2.900247e+06,2.896652e+06,2.893654e+06,2.889167e+06
Albania,GDP,,,,,,,,,,,...,8.992642e+09,1.070101e+10,1.288135e+10,1.204421e+10,1.192695e+10,1.289087e+10,1.231978e+10,1.278103e+10,1.327796e+10,1.145560e+10
Algeria,Population,1.112489e+07,1.140486e+07,1.169015e+07,1.198513e+07,1.229597e+07,1.262695e+07,1.298027e+07,1.335420e+07,1.374438e+07,1.414444e+07,...,3.374933e+07,3.426197e+07,3.481106e+07,3.540179e+07,3.603616e+07,3.671713e+07,3.743943e+07,3.818614e+07,3.893433e+07,3.966652e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Yemen, Rep.",GDP,,,,,,,,,,,...,1.908173e+10,2.563367e+10,3.039720e+10,2.845950e+10,3.090675e+10,3.107886e+10,3.207477e+10,3.595450e+10,,
Zambia,Population,3.049586e+06,3.142848e+06,3.240664e+06,3.342894e+06,3.449266e+06,3.559687e+06,3.674088e+06,3.792864e+06,3.916928e+06,4.047479e+06,...,1.238151e+07,1.273868e+07,1.311458e+07,1.350785e+07,1.391744e+07,1.434353e+07,1.478658e+07,1.524609e+07,1.572134e+07,1.621177e+07
Zambia,GDP,6.987397e+08,6.823597e+08,6.792797e+08,7.043397e+08,8.226397e+08,1.061200e+09,1.239000e+09,1.340639e+09,1.573739e+09,1.926399e+09,...,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026555e+10,2.345952e+10,2.550306e+10,2.804552e+10,2.713464e+10,2.120156e+10
Zimbabwe,Population,3.752390e+06,3.876638e+06,4.006262e+06,4.140804e+06,4.279561e+06,4.422132e+06,4.568320e+06,4.718612e+06,4.874113e+06,5.036321e+06,...,1.312794e+07,1.329780e+07,1.349546e+07,1.372100e+07,1.397390e+07,1.425559e+07,1.456548e+07,1.489809e+07,1.524586e+07,1.560275e+07


In [41]:
# same as level=2 (the innermost Population/GDP level)
world.unstack(level=-1)

Unnamed: 0_level_0,Unnamed: 1_level_0,Population,GDP
year,country,Unnamed: 2_level_1,Unnamed: 3_level_1
1960,Afghanistan,8.994793e+06,5.377778e+08
1960,Algeria,1.112489e+07,2.723638e+09
1960,Australia,1.027648e+07,1.856759e+10
1960,Austria,7.047539e+06,6.592694e+09
1960,"Bahamas, The",1.095260e+05,1.698023e+08
...,...,...,...
2015,Vietnam,9.170380e+07,1.935994e+11
2015,West Bank and Gaza,4.422143e+06,1.267740e+10
2015,World,7.346633e+09,7.343364e+13
2015,Zambia,1.621177e+07,2.120156e+10


In [None]:
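# three equivalent ways to move the country level to the columns
world.unstack(level="country")
world.unstack(level=-2)
world.unstack(level=1)

# country as the outer column level, year as the inner one
world.unstack([1, 0])
world.unstack(["country", "year"])

# year as the outer column level, country as the inner one
world.unstack([0, 1])
world.unstack(["year", "country"])

# sort the resulting columns
world.unstack(["year", "country"]).sort_index(axis=1)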

# 📚 10. The pivot Method


<div style="font-family: Avenir, sans-serif; font-size: 16px; line-height: 1.6; color: white; background-color: #333; padding: 10px; border-radius: 5px;">


</div>

- The `pivot` method transforms data from a long (tall) format to a wide format.  
- Consider how the data expands when adding more entries:  
  - A long format grows vertically (down).  
  - A wide format expands horizontally (out).  
- The `index` parameter defines the row labels in the pivoted **DataFrame**.  
- The `columns` parameter determines which column's values become the new column headers.  
- The `values` parameter specifies the data to populate the pivoted **DataFrame**, filling the corresponding index-column intersections.

The syntax for the `pivot` method is as follows (in pandas 2.x these arguments must be passed as keywords):

```python
df.pivot(index=..., columns=..., values=...)
```

In [44]:
sales = pd.read_csv("salesmen.csv")
sales

Unnamed: 0,Date,Salesman,Revenue
0,1/1/2025,Sharon,7172
1,1/2/2025,Sharon,6362
2,1/3/2025,Sharon,5982
3,1/4/2025,Sharon,7917
4,1/5/2025,Sharon,7837
...,...,...,...
1820,12/27/2025,Oscar,835
1821,12/28/2025,Oscar,3073
1822,12/29/2025,Oscar,6424
1823,12/30/2025,Oscar,7088


- The index parameter specifies the row labels in the pivoted DataFrame. In this case, the 'Date' column will be the row index.
- The columns parameter determines which column's values become the new column headers. Here, each unique value in the 'Salesman' column becomes a column header.
- The values parameter specifies the data to populate the pivoted DataFrame, filling the corresponding index-column intersections. In this case, the 'Revenue' column will be the data values in the pivoted DataFrame.
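Those three parameters can be seen on a tiny hypothetical long table (the dates, names, and numbers below are made up and much smaller than salesmen.csv):

```python
import pandas as pd

# hypothetical long-format sales data: 2 dates x 2 salesmen
long_sales = pd.DataFrame({
    "Date": ["1/1/2025", "1/1/2025", "1/2/2025", "1/2/2025"],
    "Salesman": ["Sharon", "Oscar", "Sharon", "Oscar"],
    "Revenue": [100, 200, 300, 400],
})

# one row per Date, one column per Salesman, Revenue in the cells
wide = long_sales.pivot(index="Date", columns="Salesman", values="Revenue")
print(wide)
```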

In [45]:
#           Sharon   Oscar  Salesman 1  Salesman 2   New Salesman
# Date
# 1/1/2025	 7172	 1864
# 1/2/2025	 7543	 7105
# 1/3/2025	 1053	 6851

sales.pivot(index="Date", columns="Salesman", values="Revenue")

Salesman,Alexander,Dave,Oscar,Ronald,Sharon
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1/1/2025,4430,1864,5250,2639,7172
1/10/2025,301,7105,7663,8267,7543
1/11/2025,9489,6851,8888,1340,1053
1/12/2025,8719,7147,3092,279,4362
1/13/2025,2349,6160,6139,7540,6812
...,...,...,...,...,...
9/5/2025,2439,211,7743,4252,992
9/6/2025,7585,7293,5072,1112,556
9/7/2025,6669,9774,5230,3608,6499
9/8/2025,3058,8194,7755,5762,9621


# 📚 11. The melt Method


<div style="font-family: Avenir, sans-serif; font-size: 16px; line-height: 1.6; color: white; background-color: #333; padding: 10px; border-radius: 5px;">


</div>


- The `melt` method is the inverse of the `pivot` method.
- It takes a 'wide' dataset and converts it to a 'tall' dataset.
- The `melt` method is ideal when you have multiple columns storing the *same* data point.
- Ask yourself whether the column's values are a *type* of the column header. If they're not, the data is likely stored in a wide format.
- The `id_vars` parameter accepts the column(s) whose values will be repeated for every melted column.
- The `var_name` parameter sets the name of the new column for the varying values (the former column names).
- The `value_name` parameter sets the new name of the values column (holding the values from the original **DataFrame**).

The `melt` method is the inverse of the `pivot` method. It transforms a wide-format DataFrame into a long-format DataFrame. This operation is particularly useful when you have multiple columns storing the same data point.

When to use the `melt` method:
- If you have multiple columns storing the same data point, the data is likely stored in a wide format.
- Ask yourself whether the column's values are a type of the column header. If they're not, the data is likely stored in a wide format.

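The syntax for the `melt` method is as follows (pass these as keywords, since the second positional parameter of `melt` is `value_vars`):

```python
df.melt(id_vars=..., var_name=..., value_name=...)
```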

The `id_vars` parameter accepts the column whose values will be repeated for every melted column. In the example below, each 'Salesman' value is repeated for each of the quarter columns (Q1 to Q4).

The `var_name` parameter sets the name of the new column that holds the former column names. Here, the new column will be named 'Quarter'.

The `value_name` parameter sets the new name of the values column, which will hold the values from the original DataFrame. In this case, the new column will be named 'Revenue'.

If only `id_vars` is given, `melt` treats every column not listed in `id_vars` as a varying value (the optional `value_vars` parameter restricts this to specific columns). This is useful when you want to melt all columns except a few.
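A sketch of that behaviour on a made-up table shaped like quarters.csv, contrasting the default with an explicit `value_vars` list:

```python
import pandas as pd

# hypothetical wide table shaped like quarters.csv (numbers are made up)
wide = pd.DataFrame({
    "Salesman": ["Boris", "Piers"],
    "Q1": [10, 20],
    "Q2": [30, 40],
    "Q3": [50, 60],
})

# value_vars omitted: every column except id_vars is melted
all_q = wide.melt(id_vars="Salesman", var_name="Quarter", value_name="Revenue")

# value_vars given: only the listed columns are melted
first_half = wide.melt(
    id_vars="Salesman",
    value_vars=["Q1", "Q2"],
    var_name="Quarter",
    value_name="Revenue",
)
print(all_q)
```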

In [46]:
quarters = pd.read_csv("quarters.csv")
quarters

Unnamed: 0,Salesman,Q1,Q2,Q3,Q4
0,Boris,602908,233879,354479,32704
1,Piers,43790,514863,297151,544493
2,Tommy,392668,113579,430882,247231
3,Travis,834663,266785,749238,570524
4,Cindy,580935,411379,110390,651572
5,Rob,656644,70803,375948,321388
6,Mike,486141,600753,742716,404995
7,Stacy,479662,742806,770712,2501
8,Alexandra,992673,879183,37945,293710


In [47]:
quarters.melt(id_vars="Salesman")

Unnamed: 0,Salesman,variable,value
0,Boris,Q1,602908
1,Piers,Q1,43790
2,Tommy,Q1,392668
3,Travis,Q1,834663
4,Cindy,Q1,580935
5,Rob,Q1,656644
6,Mike,Q1,486141
7,Stacy,Q1,479662
8,Alexandra,Q1,992673
9,Boris,Q2,233879


By default, the new columns are named `variable` and `value`, as in the output above. Pass `var_name` and `value_name` to give them more descriptive names. `id_vars` can also be a list when more than one column should be kept as an identifier.

In [48]:
quarters.melt(id_vars="Salesman", var_name="Quarter", value_name="Revenue")

Unnamed: 0,Salesman,Quarter,Revenue
0,Boris,Q1,602908
1,Piers,Q1,43790
2,Tommy,Q1,392668
3,Travis,Q1,834663
4,Cindy,Q1,580935
5,Rob,Q1,656644
6,Mike,Q1,486141
7,Stacy,Q1,479662
8,Alexandra,Q1,992673
9,Boris,Q2,233879
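Since `melt` is the inverse of `pivot`, pivoting the tall result should recover the wide table. A sketch with made-up numbers; the `reset_index` and `columns.name` cleanup steps are one way (an assumption, not the only one) to restore the original shape:

```python
import pandas as pd

# hypothetical wide table (numbers are made up)
wide = pd.DataFrame({
    "Salesman": ["Boris", "Piers"],
    "Q1": [10, 20],
    "Q2": [30, 40],
})

tall = wide.melt(id_vars="Salesman", var_name="Quarter", value_name="Revenue")

# pivot reverses melt: Salesman back to rows, Quarter back to columns
back = tall.pivot(index="Salesman", columns="Quarter", values="Revenue")
back = back.reset_index()   # move Salesman from the index to a column
back.columns.name = None    # drop the leftover 'Quarter' axis name
print(back)
```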


# 📚 12. The pivot_table Method


<div style="font-family: Avenir, sans-serif; font-size: 16px; line-height: 1.6; color: white; background-color: #333; padding: 10px; border-radius: 5px;">


</div>


- The `pivot_table` method operates similarly to the Pivot Table feature in Excel.
- A pivot table is a table whose values are aggregations of groups of values from another table.
- The `values` parameter accepts the numeric column whose values will be aggregated.
- The `aggfunc` parameter declares the aggregation function (the default is mean/average).
- The `index` parameter sets the index labels of the pivot table. MultiIndexes are permitted.
- The `columns` parameter sets the column labels of the pivot table. MultiIndexes are permitted.

The syntax for the `pivot_table` method is as follows:

```python
pivot_table(values, index, columns, aggfunc)
```

where:
- The `values` parameter specifies the numeric column whose values will be aggregated. In the examples below, it is the 'Spend' column.
- The `index` parameter sets the row labels of the pivot table, for example 'Gender', 'City', or 'Item'.
- The `columns` parameter sets the column labels of the pivot table, for example 'City', or 'Gender' and 'City' together.
- The `aggfunc` parameter declares the aggregation function. The default is the mean; later examples use 'sum', 'std', 'count', 'max', and 'min'.
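One practical difference from `pivot`: `pivot` refuses duplicate index/column pairs, while `pivot_table` aggregates them. A sketch with hypothetical orders in which the same person buys the same item twice:

```python
import pandas as pd

# hypothetical orders: Wanda buys a Burger twice
orders = pd.DataFrame({
    "Name": ["Wanda", "Wanda", "Eric"],
    "Item": ["Burger", "Burger", "Sushi"],
    "Spend": [10.0, 20.0, 30.0],
})

pivot_failed = False
try:
    orders.pivot(index="Name", columns="Item", values="Spend")
except ValueError as err:  # duplicate (Name, Item) pairs cannot be reshaped
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table averages the duplicates instead (mean is the default)
table = orders.pivot_table(values="Spend", index="Name", columns="Item")
print(table)
```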

In [49]:
foods = pd.read_csv("foods.csv")
foods.head()

Unnamed: 0,First Name,Gender,City,Frequency,Item,Spend
0,Wanda,Female,Stamford,Weekly,Burger,15.66
1,Eric,Male,Stamford,Daily,Chalupa,10.56
2,Charles,Male,New York,Never,Sushi,42.14
3,Anna,Female,Philadelphia,Once,Ice Cream,11.01
4,Deborah,Female,Philadelphia,Daily,Chalupa,23.49


The first step is to create a pivot table with the `pivot_table` method. Pass `values` to choose the numeric column and `index` to choose the grouping column. Because no `aggfunc` is given, each cell is the mean of `Spend` for that group; the mean is the default aggregation function.
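Because the mean is the default, this kind of pivot table computes the same numbers as a `groupby` followed by `mean`. A sketch on a few made-up rows:

```python
import pandas as pd

# hypothetical rows shaped like foods.csv (numbers are made up)
foods_demo = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Spend": [10.0, 20.0, 30.0, 50.0],
})

# no aggfunc: pivot_table averages Spend within each Gender group
by_pivot = foods_demo.pivot_table(values="Spend", index="Gender")

# the same numbers via groupby
by_group = foods_demo.groupby("Gender")["Spend"].mean()
print(by_pivot)
```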



In [50]:
foods.pivot_table(values="Spend", index="Gender")

Unnamed: 0_level_0,Spend
Gender,Unnamed: 1_level_1
Female,50.709629
Male,49.397623


We can set the `aggfunc` parameter to choose a different aggregation function (the default is the mean). The cell below uses the standard deviation, and later cells use `"count"` and `"sum"`. Common choices include `"sum"`, `"mean"`, `"median"`, `"std"`, `"var"`, `"min"`, `"max"`, `"first"`, `"last"`, `"count"`, and `"nunique"`; a custom function can also be passed.
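`aggfunc` also accepts a list of function names, producing one column per function under a two-level column index. A sketch with hypothetical rows:

```python
import pandas as pd

# hypothetical rows shaped like foods.csv (numbers are made up)
foods_demo = pd.DataFrame({
    "City": ["Stamford", "Stamford", "New York"],
    "Spend": [10.0, 30.0, 5.0],
})

# a list of aggregation names gives one column per function
summary = foods_demo.pivot_table(
    values="Spend", index="City", aggfunc=["sum", "mean", "count"]
)
print(summary)
```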

In [52]:
foods.pivot_table(values="Spend", index="Gender", aggfunc="std")

Unnamed: 0_level_0,Spend
Gender,Unnamed: 1_level_1
Female,27.665336
Male,27.807075


In [54]:
foods.pivot_table(values="Spend", index="City", aggfunc="count")

Unnamed: 0_level_0,Spend
City,Unnamed: 1_level_1
New York,313
Philadelphia,359
Stamford,328


In [55]:
foods.pivot_table(values="Spend", index="Gender", aggfunc="sum")

Unnamed: 0_level_0,Spend
Gender,Unnamed: 1_level_1
Female,25963.33
Male,24106.04


In [60]:

foods.pivot_table(values="Spend", index="Item", aggfunc="sum").round(2)

Unnamed: 0_level_0,Spend
Item,Unnamed: 1_level_1
Burger,7765.73
Burrito,8270.44
Chalupa,7644.52
Donut,8758.76
Ice Cream,8886.99
Sushi,8742.93


___

Passing a list to `index` creates a MultiIndex on the rows, with one level per column in the list. The values are then aggregated for every combination, here every Gender and Item pair.

In [61]:
foods.pivot_table(values="Spend", index=["Gender", "Item"], aggfunc="sum")

Unnamed: 0_level_0,Unnamed: 1_level_0,Spend
Gender,Item,Unnamed: 2_level_1
Female,Burger,4094.3
Female,Burrito,4257.82
Female,Chalupa,4152.26
Female,Donut,4743.0
Female,Ice Cream,4032.87
Female,Sushi,4683.08
Male,Burger,3671.43
Male,Burrito,4012.62
Male,Chalupa,3492.26
Male,Donut,4015.76


In [None]:
# Gender and Item on the rows, City on the columns
foods.pivot_table(values="Spend", index=["Gender", "Item"], columns="City", aggfunc="sum")

# Item on the rows, Gender and City as a two-level column index
foods.pivot_table(values="Spend", index="Item", columns=["Gender", "City"], aggfunc="sum")

# the same layout with other aggregation functions
foods.pivot_table(values="Spend", index="Item", columns=["Gender", "City"], aggfunc="mean")
foods.pivot_table(values="Spend", index="Item", columns=["Gender", "City"], aggfunc="count")
foods.pivot_table(values="Spend", index="Item", columns=["Gender", "City"], aggfunc="max")

# only the last expression in a cell is displayed (the min table below)
foods.pivot_table(values="Spend", index="Item", columns=["Gender", "City"], aggfunc="min")

Gender,Female,Female,Female,Male,Male,Male
City,New York,Philadelphia,Stamford,New York,Philadelphia,Stamford
Item,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Burger,2.25,1.97,6.24,5.43,1.71,2.83
Burrito,1.02,1.04,1.18,15.9,8.58,3.64
Chalupa,1.96,9.35,9.09,11.61,1.94,10.56
Donut,3.15,2.13,1.68,1.49,1.26,6.63
Ice Cream,13.39,7.61,8.8,14.06,4.89,3.43
Sushi,2.52,11.68,8.2,3.28,2.01,32.15
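With a list passed to `columns`, each column label is a `(Gender, City)` tuple. A sketch on made-up rows shows how to select a single column with a tuple, or every city under one gender with just the outer label:

```python
import pandas as pd

# hypothetical rows shaped like foods.csv (numbers are made up)
foods_demo = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "City": ["Stamford", "New York", "Stamford", "Stamford"],
    "Item": ["Burger", "Burger", "Burger", "Sushi"],
    "Spend": [5.0, 7.0, 3.0, 9.0],
})

table = foods_demo.pivot_table(
    values="Spend", index="Item", columns=["Gender", "City"], aggfunc="min"
)

# each column is a (Gender, City) tuple
female_stamford = table[("Female", "Stamford")]
# the outer label alone selects every City under 'Male'
all_male = table["Male"]
print(table)
```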
