# `MultiIndex`

In [1]:
import numpy as np
import pandas as pd

### `A multi index object is an index object with multiple levels.`

- To create a MultiIndex while importing a CSV file, pass a list of column names to the `index_col` parameter of `pd.read_csv()`. Each listed column becomes one level of the row index.

In [2]:
bigmac_url = 'https://raw.githubusercontent.com/sameerjha462000/datasets/main/bigmac.csv'
bigmac = pd.read_csv(bigmac_url, index_col=['Date', 'Country'])
bigmac

                          Price in US Dollars
Date       Country
2000-04-01 Argentina                 2.500000
           Australia                 1.541667
           Brazil                    1.648045
           Canada                    1.938776
           Switzerland               3.470588
...                                       ...
2020-07-01 Ukraine                   2.174714
           Uruguay                   4.327418
           United States             5.710000
           Vietnam                   2.847282


In [3]:
type(bigmac.index) # This is a multi index

pandas.core.indexes.multi.MultiIndex
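The cells above need network access. Below is a minimal offline sketch of the same idea. It assumes a hypothetical three-row extract of the CSV (same column names, values copied from the output above) held in a string and read through `io.StringIO`:

```python
import io

import pandas as pd

# Hypothetical three-row extract of bigmac.csv (same column names as the real file).
csv_text = """Date,Country,Price in US Dollars
2000-04-01,Argentina,2.5
2000-04-01,Australia,1.541667
2020-07-01,Vietnam,2.847282
"""

# A list passed to index_col turns each named column into one index level.
bigmac_small = pd.read_csv(io.StringIO(csv_text), index_col=["Date", "Country"])

print(type(bigmac_small.index).__name__)  # MultiIndex
print(bigmac_small.index.nlevels)         # 2
print(list(bigmac_small.index.names))     # ['Date', 'Country']

# A full row label is a (Date, Country) tuple.
print(bigmac_small.loc[("2000-04-01", "Argentina"), "Price in US Dollars"])  # 2.5
```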

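- We can also create a MultiIndex from an existing DataFrame by passing a list of column names to the `set_index()` method. Note that `set_index()` returns a new DataFrame; `bigmac` itself is unchanged unless the result is reassigned.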

In [4]:
bigmac_url = 'https://raw.githubusercontent.com/sameerjha462000/datasets/main/bigmac.csv'
bigmac = pd.read_csv(bigmac_url)
bigmac

            Date        Country  Price in US Dollars
0     2000-04-01      Argentina             2.500000
1     2000-04-01      Australia             1.541667
2     2000-04-01         Brazil             1.648045
3     2000-04-01         Canada             1.938776
4     2000-04-01    Switzerland             3.470588
...          ...            ...                  ...
1381  2020-07-01        Ukraine             2.174714
1382  2020-07-01        Uruguay             4.327418
1383  2020-07-01  United States             5.710000
1384  2020-07-01        Vietnam             2.847282


In [5]:
bigmac.set_index(keys = ['Date', 'Country'])

                          Price in US Dollars
Date       Country
2000-04-01 Argentina                 2.500000
           Australia                 1.541667
           Brazil                    1.648045
           Canada                    1.938776
           Switzerland               3.470588
...                                       ...
2020-07-01 Ukraine                   2.174714
           Uruguay                   4.327418
           United States             5.710000
           Vietnam                   2.847282
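Because `set_index()` returns a new object, the original frame keeps its columns, and `reset_index()` undoes the operation. A small sketch on a hypothetical three-row stand-in for the Big Mac data (values copied from the output above):

```python
import pandas as pd

# Small stand-in for the raw Big Mac data (values copied from the output above).
flat = pd.DataFrame({
    "Date": ["2000-04-01", "2000-04-01", "2020-07-01"],
    "Country": ["Argentina", "Australia", "Vietnam"],
    "Price in US Dollars": [2.5, 1.541667, 2.847282],
})

# set_index() returns a new DataFrame; `flat` keeps its columns.
indexed = flat.set_index(["Date", "Country"])

# reset_index() moves every index level back into ordinary columns.
restored = indexed.reset_index()

print(list(indexed.index.names))  # ['Date', 'Country']
print(list(flat.columns))         # ['Date', 'Country', 'Price in US Dollars']
print(restored.equals(flat))      # True
```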


# `creating a MultiIndex object`

In [6]:
names = ['Abhishek', 'Daniyaal']
subjects = ['Phy', 'Chem', 'Math']

m1 = pd.MultiIndex.from_product([names, subjects])
m1

MultiIndex([('Abhishek',  'Phy'),
            ('Abhishek', 'Chem'),
            ('Abhishek', 'Math'),
            ('Daniyaal',  'Phy'),
            ('Daniyaal', 'Chem'),
            ('Daniyaal', 'Math')],
           )

In [7]:
index_values = []

for name in names:
    for subject in subjects:
        index_values.append((name, subject))
        
index_values

[('Abhishek', 'Phy'),
 ('Abhishek', 'Chem'),
 ('Abhishek', 'Math'),
 ('Daniyaal', 'Phy'),
 ('Daniyaal', 'Chem'),
 ('Daniyaal', 'Math')]

In [8]:
m2 = pd.MultiIndex.from_tuples(index_values)
m2

MultiIndex([('Abhishek',  'Phy'),
            ('Abhishek', 'Chem'),
            ('Abhishek', 'Math'),
            ('Daniyaal',  'Phy'),
            ('Daniyaal', 'Chem'),
            ('Daniyaal', 'Math')],
           )

In [9]:
m1.__class__.__name__, m2.__class__.__name__

('MultiIndex', 'MultiIndex')
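A third constructor, `pd.MultiIndex.from_arrays()`, takes one list per level that is already aligned row by row. All three constructors also accept a `names=` keyword. A sketch; the level names `student` and `subject` are illustrative choices, not part of the original data:

```python
import pandas as pd

names = ["Abhishek", "Daniyaal"]
subjects = ["Phy", "Chem", "Math"]

# from_arrays(): one list per level, aligned position by position.
level0 = [name for name in names for _ in subjects]  # each name repeated per subject
level1 = subjects * len(names)                        # subjects cycled per name
m3 = pd.MultiIndex.from_arrays([level0, level1], names=["student", "subject"])

# The same names= keyword works with from_product() (and from_tuples()).
m4 = pd.MultiIndex.from_product([names, subjects], names=["student", "subject"])

print(m3.equals(m4))    # True
print(list(m3.names))   # ['student', 'subject']
```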

## `levels inside a MultiIndex object`

In [10]:
names = ['Abhishek', 'Daniyaal']
subjects = ['Phy', 'Chem', 'Math']

m1 = pd.MultiIndex.from_product([names, subjects])
m1

MultiIndex([('Abhishek',  'Phy'),
            ('Abhishek', 'Chem'),
            ('Abhishek', 'Math'),
            ('Daniyaal',  'Phy'),
            ('Daniyaal', 'Chem'),
            ('Daniyaal', 'Math')],
           )

In [11]:
m1.levels

FrozenList([['Abhishek', 'Daniyaal'], ['Chem', 'Math', 'Phy']])

In [12]:
m1.levels[0]

Index(['Abhishek', 'Daniyaal'], dtype='object')
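Note that `m1.levels` stores only the unique labels of each level (sorted, which is why `Chem` comes before `Phy`). The row-by-row layout lives in `m1.codes`, which holds positions into those levels, and `get_level_values()` rebuilds the full labels for one level:

```python
import pandas as pd

m1 = pd.MultiIndex.from_product([["Abhishek", "Daniyaal"], ["Phy", "Chem", "Math"]])

# levels: unique labels per level; codes: per-row positions into those labels.
print(list(m1.levels[1]))  # ['Chem', 'Math', 'Phy']
print(m1.codes[1])         # [2 0 1 2 0 1]  -> Phy, Chem, Math, Phy, Chem, Math

# get_level_values() returns the row-aligned labels of a single level.
print(list(m1.get_level_values(1)))  # ['Phy', 'Chem', 'Math', 'Phy', 'Chem', 'Math']
```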

## `stack() and unstack()`

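    The stack() method pivots a column level (by default the innermost one) into the row index, making the data longer.
    
    The unstack() method does the reverse: it pivots a row-index level (by default the innermost one) into the columns, making the data wider.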

In [13]:
index_val = [('cse',2019),('cse',2020),('cse',2021),('cse',2022),('ece',2019),('ece',2020),('ece',2021),('ece',2022)]
multi_index = pd.MultiIndex.from_tuples(index_val)
multi_index

MultiIndex([('cse', 2019),
            ('cse', 2020),
            ('cse', 2021),
            ('cse', 2022),
            ('ece', 2019),
            ('ece', 2020),
            ('ece', 2021),
            ('ece', 2022)],
           )

In [14]:
s = pd.Series([1,2,3,4,5,6,7,8],index=multi_index)
s

cse  2019    1
     2020    2
     2021    3
     2022    4
ece  2019    5
     2020    6
     2021    7
     2022    8
dtype: int64

In [15]:
temp = s.unstack() # moves the innermost row level (the years) into the columns
temp

     2019  2020  2021  2022
cse     1     2     3     4
ece     5     6     7     8


In [16]:
temp.stack() # moves column index to row index

cse  2019    1
     2020    2
     2021    3
     2022    4
ece  2019    5
     2020    6
     2021    7
     2022    8
dtype: int64
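Both methods accept a `level` argument to choose which level moves. As a sketch, `s.unstack(level=0)` pivots the outer level (the branch) instead of the years:

```python
import pandas as pd

multi_index = pd.MultiIndex.from_tuples(
    [(branch, year) for branch in ["cse", "ece"] for year in [2019, 2020, 2021, 2022]]
)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=multi_index)

# Default unstack() would move the years; level=0 moves the branches instead.
by_branch = s.unstack(level=0)
print(by_branch)
```

The result has the years as rows and `cse`/`ece` as columns.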

In [17]:
branch_df1 = pd.DataFrame(
    [
        [1,2],
        [3,4],
        [5,6],
        [7,8],
        [9,10],
        [11,12],
        [13,14],
        [15,16],
    ],
    index = multi_index,
    columns = ['avg_package','students']
)

branch_df1

          avg_package  students
cse 2019            1         2
    2020            3         4
    2021            5         6
    2022            7         8
ece 2019            9        10
    2020           11        12
    2021           13        14
    2022           15        16


In [18]:
temp = branch_df1.unstack()
temp

     avg_package                    students
            2019  2020  2021  2022      2019  2020  2021  2022
cse            1     3     5     7         2     4     6     8
ece            9    11    13    15        10    12    14    16


In [19]:
temp.unstack() # the row index is no longer a MultiIndex, so this returns a Series with every level in the row index

avg_package  2019  cse     1
                   ece     9
             2020  cse     3
                   ece    11
             2021  cse     5
                   ece    13
             2022  cse     7
                   ece    15
students     2019  cse     2
                   ece    10
             2020  cse     4
                   ece    12
             2021  cse     6
                   ece    14
             2022  cse     8
                   ece    16
dtype: int64
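The output above is a Series, not a wider DataFrame: once the row index has a single level, `unstack()` has nothing left to pivot into the columns. A small check of that behaviour, rebuilding `branch_df1` with the same values:

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
branch_df1 = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]],
    index=multi_index,
    columns=["avg_package", "students"],
)

wide = branch_df1.unstack()   # row index is now a plain Index (cse, ece)
long_s = wide.unstack()       # so the result is a Series...

# ...whose index holds the column levels first and the old row labels innermost.
print(type(long_s).__name__)                  # Series
print(long_s.index.nlevels)                   # 3
print(long_s[("avg_package", 2020, "ece")])   # 11
```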

In [20]:
branch_df2 = pd.DataFrame(
    [
        [1,2,0,0],
        [3,4,0,0],
        [5,6,0,0],
        [7,8,0,0],
    ],
    index = [2019,2020,2021,2022],
    columns = pd.MultiIndex.from_product([['delhi','mumbai'],['avg_package','students']])
)

branch_df2

            delhi                 mumbai
      avg_package  students  avg_package  students
2019            1         2            0         0
2020            3         4            0         0
2021            5         6            0         0
2022            7         8            0         0


In [21]:
temp = branch_df2.stack() # moves the innermost column level (avg_package/students) into the row index
temp

                  delhi  mumbai
2019 avg_package      1       0
     students         2       0
2020 avg_package      3       0
     students         4       0
2021 avg_package      5       0
     students         6       0
2022 avg_package      7       0
     students         8       0


In [22]:
temp.stack() # stacks the remaining column level (the city), giving a Series with a 3-level index

2019  avg_package  delhi     1
                   mumbai    0
      students     delhi     2
                   mumbai    0
2020  avg_package  delhi     3
                   mumbai    0
      students     delhi     4
                   mumbai    0
2021  avg_package  delhi     5
                   mumbai    0
      students     delhi     6
                   mumbai    0
2022  avg_package  delhi     7
                   mumbai    0
      students     delhi     8
                   mumbai    0
dtype: int64
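To stack the outer column level (the city) instead of the inner one, pass `level=0`. A sketch that rebuilds `branch_df2`; depending on your pandas version, `stack()` may emit a FutureWarning about its newer implementation, which does not change the result for this data:

```python
import pandas as pd

branch_df2 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0]],
    index=[2019, 2020, 2021, 2022],
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# level=0 moves the city into the rows; avg_package/students stay as columns.
by_city = branch_df2.stack(level=0)
print(by_city)
```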

In [23]:
branch_df3 = pd.DataFrame(
    [
        [1,2,0,0],
        [3,4,0,0],
        [5,6,0,0],
        [7,8,0,0],
        [9,10,0,0],
        [11,12,0,0],
        [13,14,0,0],
        [15,16,0,0],
    ],
    index = multi_index,
    columns = pd.MultiIndex.from_product([['delhi','mumbai'],['avg_package','students']])
)

branch_df3

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [24]:
temp = branch_df3.unstack()
temp

           delhi                                                     mumbai
     avg_package                    students                    avg_package                    students
            2019  2020  2021  2022      2019  2020  2021  2022         2019  2020  2021  2022      2019  2020  2021  2022
cse            1     3     5     7         2     4     6     8            0     0     0     0         0     0     0     0
ece            9    11    13    15        10    12    14    16            0     0     0     0         0     0     0     0
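Here the years moved into the columns because `unstack()` takes the innermost row level by default. As a sketch, passing `level=0` on the same data moves the branch instead, so the years stay as rows:

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
branch_df3 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
     [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0]],
    index=multi_index,
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# Pivot the branch level (level 0) into the columns instead of the year level.
by_year = branch_df3.unstack(level=0)

print(by_year.index.tolist())   # [2019, 2020, 2021, 2022]
print(by_year.columns.nlevels)  # 3  (city, metric, branch)
```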


In [25]:
temp.unstack() # the row index is a plain Index, so this returns a Series with a 4-level index

delhi   avg_package  2019  cse     1
                           ece     9
                     2020  cse     3
                           ece    11
                     2021  cse     5
                           ece    13
                     2022  cse     7
                           ece    15
        students     2019  cse     2
                           ece    10
                     2020  cse     4
                           ece    12
                     2021  cse     6
                           ece    14
                     2022  cse     8
                           ece    16
mumbai  avg_package  2019  cse     0
                           ece     0
                     2020  cse     0
                           ece     0
                     2021  cse     0
                           ece     0
                     2022  cse     0
                           ece     0
        students     2019  cse     0
                           ece     0
                     2020  cse     0
                           ece     0
                     2021  cse     0
                           ece     0
                     2022  cse     0
                           ece     0
dtype: int64

In [26]:
temp.stack() # moves the year level back into the rows, recovering the layout of branch_df3

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [27]:
temp.stack().stack() # also moves avg_package/students into the rows; only the city level remains as columns

                      delhi  mumbai
cse 2019 avg_package      1       0
         students         2       0
    2020 avg_package      3       0
         students         4       0
    2021 avg_package      5       0
         students         6       0
    2022 avg_package      7       0
         students         8       0
ece 2019 avg_package      9       0
         students        10       0
    2020 avg_package     11       0
         students        12       0
    2021 avg_package     13       0
         students        14       0
    2022 avg_package     15       0
         students        16       0


# `working with MultiIndex dataframes`

In [28]:
branch_df3

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [29]:
branch_df3.head()

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0


In [30]:
branch_df3.tail()

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [31]:
branch_df3.sample(5)

                delhi                 mumbai
          avg_package  students  avg_package  students
ece 2019            9        10            0         0
    2021           13        14            0         0
cse 2022            7         8            0         0
ece 2020           11        12            0         0
cse 2020            3         4            0         0


In [32]:
branch_df3

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [33]:
branch_df3.sort_values(by = ('delhi', 'avg_package'), ascending = False)

                delhi                 mumbai
          avg_package  students  avg_package  students
ece 2022           15        16            0         0
    2021           13        14            0         0
    2020           11        12            0         0
    2019            9        10            0         0
cse 2022            7         8            0         0
    2021            5         6            0         0
    2020            3         4            0         0
    2019            1         2            0         0
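`sort_values()` sorts by a column, which for MultiIndex columns is addressed with a tuple. To sort by an index level instead, `sort_index(level=...)` can be used. A sketch on a rebuilt `branch_df3` that sorts by year first (by default the remaining level, the branch, breaks ties):

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
branch_df3 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
     [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0]],
    index=multi_index,
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# Sort rows by the year level (level 1), then by branch.
year_first = branch_df3.sort_index(level=1)
print(year_first.index.tolist()[:4])
# [('cse', 2019), ('ece', 2019), ('cse', 2020), ('ece', 2020)]
```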


In [34]:
branch_df3.isnull().sum()

delhi   avg_package    0
        students       0
mumbai  avg_package    0
        students       0
dtype: int64

In [35]:
branch_df3.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 8 entries, ('cse', 2019) to ('ece', 2022)
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   (delhi, avg_package)   8 non-null      int64
 1   (delhi, students)      8 non-null      int64
 2   (mumbai, avg_package)  8 non-null      int64
 3   (mumbai, students)     8 non-null      int64
dtypes: int64(4)
memory usage: 632.0+ bytes


In [36]:
branch_df3.describe()

             delhi                  mumbai
       avg_package   students  avg_package  students
count     8.000000   8.000000          8.0       8.0
mean      8.000000   9.000000          0.0       0.0
std       4.898979   4.898979          0.0       0.0
min       1.000000   2.000000          0.0       0.0
25%       4.500000   5.500000          0.0       0.0
50%       8.000000   9.000000          0.0       0.0
75%      11.500000  12.500000          0.0       0.0
max      15.000000  16.000000          0.0       0.0
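`describe()` summarizes each column over all rows. To aggregate per value of one index level instead (for example per branch), `groupby(level=...)` can be used. A sketch on a rebuilt `branch_df3`:

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
branch_df3 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
     [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0]],
    index=multi_index,
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# Sum every column separately for each branch (index level 0).
totals = branch_df3.groupby(level=0).sum()
print(totals)
# cse: delhi avg_package 1+3+5+7 = 16, students 2+4+6+8 = 20
# ece: delhi avg_package 9+11+13+15 = 48, students 10+12+14+16 = 52
```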


# `Accessing rows and columns`

In [37]:
branch_df3

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [38]:
branch_df3['delhi']

          avg_package  students
cse 2019            1         2
    2020            3         4
    2021            5         6
    2022            7         8
ece 2019            9        10
    2020           11        12
    2021           13        14
    2022           15        16


In [39]:
branch_df3[('delhi', 'avg_package')]

cse  2019     1
     2020     3
     2021     5
     2022     7
ece  2019     9
     2020    11
     2021    13
     2022    15
Name: (delhi, avg_package), dtype: int64
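Selecting by the outer column label (`'delhi'`) is direct. To select by an inner column label across all cities, `xs()` with `axis=1` and a `level` can be used. A sketch on a rebuilt `branch_df3`:

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
branch_df3 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
     [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0]],
    index=multi_index,
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# Cross-section on column level 1: every "students" column, one per city.
students = branch_df3.xs("students", axis=1, level=1)
print(students)  # columns: delhi, mumbai
```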

In [40]:
branch_df3.head()

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0


In [41]:
branch_df3.iloc[1]

delhi   avg_package    3
        students       4
mumbai  avg_package    0
        students       0
Name: (cse, 2020), dtype: int64

In [42]:
branch_df3.iloc[1:4]

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0


In [43]:
branch_df3.iloc[[1,4,3]]

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2020            3         4            0         0
ece 2019            9        10            0         0
cse 2022            7         8            0         0


In [44]:
branch_df3.iloc[[1,4,3], 1::]

             delhi       mumbai
          students  avg_package  students
cse 2020         4            0         0
ece 2019        10            0         0
cse 2022         8            0         0


In [45]:
branch_df3.iloc[[1,4,3], ::2]

                delhi       mumbai
          avg_package  avg_package
cse 2020            3            0
ece 2019            9            0
cse 2022            7            0


In [46]:
branch_df3.iloc[[1,4,3], [0, 3, 1]]

                delhi    mumbai     delhi
          avg_package  students  students
cse 2020            3         0         4
ece 2019            9         0        10
cse 2022            7         0         8


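### `The loc accessor works like iloc, but with labels instead of positions`

- With a MultiIndex, a full row or column label is a tuple such as `('cse', 2019)`, and unlike positional slices, label slices include their end point.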

In [47]:
branch_df3

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2020            3         4            0         0
    2021            5         6            0         0
    2022            7         8            0         0
ece 2019            9        10            0         0
    2020           11        12            0         0
    2021           13        14            0         0
    2022           15        16            0         0


In [48]:
branch_df3.loc[('cse', 2019)]

delhi   avg_package    1
        students       2
mumbai  avg_package    0
        students       0
Name: (cse, 2019), dtype: int64

In [49]:
branch_df3.loc[('cse', 2019) : ('ece', 2021) : 2]

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2019            1         2            0         0
    2021            5         6            0         0
ece 2019            9        10            0         0
    2021           13        14            0         0


In [50]:
branch_df3.loc[[('cse', 2020), ('ece', 2022), ('cse', 2019)]]

                delhi                 mumbai
          avg_package  students  avg_package  students
cse 2020            3         4            0         0
ece 2022           15        16            0         0
cse 2019            1         2            0         0


In [51]:
branch_df3.loc[('cse', 2019), [('delhi', 'avg_package'), ('mumbai', 'avg_package')]]

delhi   avg_package    1
mumbai  avg_package    0
Name: (cse, 2019), dtype: int64

In [52]:
branch_df3.loc[('cse', 2019):('ece', 2021):2, ('delhi', 'avg_package')::3]

                delhi    mumbai
          avg_package  students
cse 2019            1         0
    2021            5         0
ece 2019            9         0
    2021           13         0


In [53]:
branch_df3.loc[[('cse', 2020), ('ece', 2022), ('cse', 2019)], [('delhi', 'avg_package'), ('mumbai', 'avg_package')]]

                delhi       mumbai
          avg_package  avg_package
cse 2020            3            0
ece 2022           15            0
cse 2019            1            0
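Two more `loc` patterns that are specific to a MultiIndex are partial indexing (giving only the outer label) and slicing an inner level with `pd.IndexSlice`. A sketch on a rebuilt `branch_df3`. Slicing an inner level needs a sorted index, so `sort_index()` is called first as a precaution:

```python
import pandas as pd

multi_index = pd.MultiIndex.from_product([["cse", "ece"], [2019, 2020, 2021, 2022]])
branch_df3 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0],
     [9, 10, 0, 0], [11, 12, 0, 0], [13, 14, 0, 0], [15, 16, 0, 0]],
    index=multi_index,
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# Partial indexing: only the outer label, so the branch level is dropped.
cse_rows = branch_df3.loc["cse"]
print(cse_rows.index.tolist())  # [2019, 2020, 2021, 2022]

# IndexSlice: every branch (:) but only the years 2020 to 2021 (inclusive).
idx = pd.IndexSlice
mid_years = branch_df3.sort_index().loc[idx[:, 2020:2021], :]
print(mid_years.index.tolist())
# [('cse', 2020), ('cse', 2021), ('ece', 2020), ('ece', 2021)]
```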


# `swaplevel`

In [54]:
df = pd.DataFrame(
    {"Grade": ["A", "B", "A", "C"]},
    index=[
        ["Final exam", "Final exam", "Coursework", "Coursework"],
        ["History", "Geography", "History", "Geography"],
        ["January", "February", "March", "April"],
    ],
)
df

                               Grade
Final exam History   January       A
           Geography February      B
Coursework History   March         A
           Geography April         C


    In the following example, we swap levels of the row index, which is what swaplevel() does by default (axis=0); pass axis=1 to swap levels of the column index instead. Called without i and j (their defaults are i=-2 and j=-1), swaplevel() swaps the last two levels.

In [55]:
df.swaplevel()

                               Grade
Final exam January  History       A
           February Geography     B
Coursework March    History       A
           April    Geography     C


    Supplying one argument sets i, the level that is swapped with the last level (j defaults to -1). For example, swaplevel(0) swaps the first level with the last one.

In [56]:
df.swaplevel(0)

                               Grade
January  History   Final exam      A
February Geography Final exam      B
March    History   Coursework      A
April    Geography Coursework      C


    We can also choose both levels explicitly by supplying values for i and j. For example, swaplevel(0, 1) swaps the first and second levels.

In [57]:
df.swaplevel(0, 1)

                               Grade
History   Final exam January       A
Geography Final exam February      B
History   Coursework March         A
Geography Coursework April         C
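The same method swaps column levels when given `axis=1`. `reorder_levels()` sets an arbitrary order for all levels at once. A sketch that rebuilds `branch_df2` and `df` from above. Since `swaplevel()` only swaps labels and does not re-sort, `sort_index(axis=1)` is used afterwards to group the columns by their new outer level:

```python
import pandas as pd

branch_df2 = pd.DataFrame(
    [[1, 2, 0, 0], [3, 4, 0, 0], [5, 6, 0, 0], [7, 8, 0, 0]],
    index=[2019, 2020, 2021, 2022],
    columns=pd.MultiIndex.from_product([["delhi", "mumbai"], ["avg_package", "students"]]),
)

# Swap the two column levels: (city, metric) -> (metric, city).
swapped = branch_df2.swaplevel(axis=1)
print(swapped.columns.tolist())
# [('avg_package', 'delhi'), ('students', 'delhi'), ('avg_package', 'mumbai'), ('students', 'mumbai')]

# Group the columns by the new outer level.
regrouped = swapped.sort_index(axis=1)

df = pd.DataFrame(
    {"Grade": ["A", "B", "A", "C"]},
    index=[
        ["Final exam", "Final exam", "Coursework", "Coursework"],
        ["History", "Geography", "History", "Geography"],
        ["January", "February", "March", "April"],
    ],
)

# reorder_levels(): month first, then exam type, then subject.
print(df.reorder_levels([2, 0, 1]).index[0])  # ('January', 'Final exam', 'History')
```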
