# MultiIndex

In [1]:
import pandas as pd

## This Module's Dataset

In [3]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d')
bigmac.head()

| | Date | Country | Price in US Dollars |
|---|---|---|---|
| 0 | 2000-04-01 | Argentina | 2.5 |
| 1 | 2000-04-01 | Australia | 1.541667 |
| 2 | 2000-04-01 | Brazil | 1.648045 |
| 3 | 2000-04-01 | Canada | 1.938776 |
| 4 | 2000-04-01 | Switzerland | 3.470588 |


In [4]:
bigmac.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1386 entries, 0 to 1385
Data columns (total 3 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Date                 1386 non-null   datetime64[ns]
 1   Country              1386 non-null   object        
 2   Price in US Dollars  1386 non-null   float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 32.6+ KB


## Create a MultiIndex
- A **MultiIndex** is an index with multiple levels or layers.
- Pass the `set_index` method a list of column names to create a **MultiIndex DataFrame**.
- The order of the list's values will determine the order of the levels.
- Alternatively, we can pass the `read_csv` function's `index_col` parameter a list of columns.
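Both routes can be compared on a tiny table. This is a sketch under assumptions: the four rows below are hypothetical values shaped like `bigmac.csv`, and `io.StringIO` stands in for the file on disk so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical rows shaped like bigmac.csv (values are made up for illustration).
csv_text = """Date,Country,Price in US Dollars
2000-04-01,Argentina,2.50
2000-04-01,Brazil,1.65
2001-04-01,Argentina,2.55
2001-04-01,Brazil,1.64
"""

# Option 1: read a flat DataFrame, then promote two columns with set_index.
flat = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
by_set_index = flat.set_index(["Date", "Country"])

# Option 2: build the MultiIndex while reading, via index_col.
by_index_col = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["Date"], index_col=["Date", "Country"]
)

print(by_set_index.index.nlevels)         # 2
print(list(by_set_index.index.names))     # ['Date', 'Country']
print(by_set_index.equals(by_index_col))  # True

# The order of the list decides the order of the levels.
swapped = flat.set_index(["Country", "Date"])
print(list(swapped.index.names))          # ['Country', 'Date']
```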

In [5]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d')
bigmac.head()

| | Date | Country | Price in US Dollars |
|---|---|---|---|
| 0 | 2000-04-01 | Argentina | 2.5 |
| 1 | 2000-04-01 | Australia | 1.541667 |
| 2 | 2000-04-01 | Brazil | 1.648045 |
| 3 | 2000-04-01 | Canada | 1.938776 |
| 4 | 2000-04-01 | Switzerland | 3.470588 |


In [9]:
# Neither Date nor Country works as an index on its own, since each column contains duplicate values,
# but the combination of the two is a good row identifier

bigmac.set_index(keys=['Date', 'Country']).head()
bigmac.set_index(keys=['Country', 'Date']).head()  # the list order sets the level order

| Country | Date | Price in US Dollars |
|---|---|---|
| Argentina | 2000-04-01 | 2.5 |
| Australia | 2000-04-01 | 1.541667 |
| Brazil | 2000-04-01 | 1.648045 |
| Canada | 2000-04-01 | 1.938776 |
| Switzerland | 2000-04-01 | 3.470588 |


In [10]:
# The level order is a choice, but a common recommendation is to put the column with the fewest
# unique values in the first (outermost) level, which keeps the outer level small and groups related rows

bigmac.nunique()  # Date has fewer unique values than Country, so it makes the better first level

Date                     33
Country                  57
Price in US Dollars    1350
dtype: int64

In [11]:
bigmac.set_index(keys= ['Date', 'Country'], inplace= True)

# Alternatively, build the MultiIndex while reading the file, via read_csv's index_col parameter
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [15]:
bigmac.index[0]

(Timestamp('2000-04-01 00:00:00'), 'Argentina')

## Extract Index Level Values
- The `get_level_values` method extracts an **Index** with the values from one level in the **MultiIndex**.
- Invoke the `get_level_values` method on the **MultiIndex** (`df.index`), not on the **DataFrame** itself.
- The method expects either the level's index position or its name.
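A minimal sketch of `get_level_values`, assuming a hand-built (Date, Country) index with made-up prices rather than the real dataset. It shows that the level name and the level position select the same values, and that the result keeps one value per row, duplicates included.

```python
import pandas as pd

# Hypothetical two-level (Date, Country) index with made-up prices.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2000-04-01"), "Argentina"),
        (pd.Timestamp("2000-04-01"), "Brazil"),
        (pd.Timestamp("2001-04-01"), "Argentina"),
    ],
    names=["Date", "Country"],
)
prices = pd.DataFrame({"Price in US Dollars": [2.50, 1.65, 2.55]}, index=index)

# Each entry of the MultiIndex is a tuple holding one label per level.
print(prices.index[0])  # (Timestamp('2000-04-01 00:00:00'), 'Argentina')

# get_level_values is called on the index, by level name or by position.
countries_by_name = prices.index.get_level_values("Country")
countries_by_pos = prices.index.get_level_values(1)
print(countries_by_name.equals(countries_by_pos))  # True
print(list(countries_by_name))  # ['Argentina', 'Brazil', 'Argentina']

# One value per row is returned; chain .unique() to get the distinct labels.
print(list(countries_by_name.unique()))  # ['Argentina', 'Brazil']
```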

In [16]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [22]:
bigmac.index.get_level_values('Date')
bigmac.index.get_level_values(0)  # same level by position: the first element of each tuple

bigmac.index.get_level_values('Country')
bigmac.index.get_level_values(1)  # same level by position: the second element of each tuple

bigmac.index  # a MultiIndex whose entries are (Date, Country) tuples

MultiIndex([('2000-04-01',            'Argentina'),
            ('2000-04-01',            'Australia'),
            ('2000-04-01',               'Brazil'),
            ('2000-04-01',              'Britain'),
            ('2000-04-01',               'Canada'),
            ('2000-04-01',                'Chile'),
            ('2000-04-01',                'China'),
            ('2000-04-01',       'Czech Republic'),
            ('2000-04-01',              'Denmark'),
            ('2000-04-01',            'Euro area'),
            ...
            ('2020-07-01',               'Sweden'),
            ('2020-07-01',          'Switzerland'),
            ('2020-07-01',               'Taiwan'),
            ('2020-07-01',             'Thailand'),
            ('2020-07-01',               'Turkey'),
            ('2020-07-01',              'Ukraine'),
            ('2020-07-01', 'United Arab Emirates'),
            ('2020-07-01',        'United States'),
            ('2020-07-01',              'Uruguay

## Rename Index Levels
- Invoke the `set_names` method on the **MultiIndex** to change one or more level names.
- Use the `names` and `level` parameters to rename the level at a given position.
- Alternatively, pass `names` a list of strings to overwrite *all* level names.
- The `set_names` method returns a copy, so assign the result back to the index (or pass `inplace=True`) to alter the **DataFrame**.
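A short sketch of the copy-versus-assign behaviour, using a hypothetical two-row frame with made-up prices. The new names `Calendar` and `Location` mirror the cells below.

```python
import pandas as pd

# Hypothetical frame with a (Date, Country) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("2000-04-01", "Argentina"), ("2000-04-01", "Brazil")],
    names=["Date", "Country"],
)
prices = pd.DataFrame({"Price in US Dollars": [2.50, 1.65]}, index=index)

# set_names returns a new index; the DataFrame keeps its old names for now.
renamed = prices.index.set_names("Calendar", level=0)
names_before_assign = list(prices.index.names)
print(list(renamed.names))    # ['Calendar', 'Country']
print(names_before_assign)    # ['Date', 'Country']

# Assigning the new index back makes the change stick.
prices.index = renamed

# Passing a list (without level) overwrites every level name at once.
prices.index = prices.index.set_names(["Calendar", "Location"])
print(list(prices.index.names))  # ['Calendar', 'Location']
```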

In [26]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [27]:
bigmac.index.set_names('Calendar', level=0)  # returns a new Index; the DataFrame keeps 'Date' unless the result is assigned back
bigmac.index.set_names('Location', level=1, inplace=True)

bigmac.head()

| Date | Location | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [29]:
bigmac.index.set_names(['Calendar', 'Country'], inplace= True)
bigmac.head()

| Calendar | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


## The sort_index Method on a MultiIndex DataFrame
- Using the `sort_index` method, we can target all levels or specific levels of the **MultiIndex**.
- To apply a different sort order to each level, pass the `ascending` parameter a list of Booleans, one per level.
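A sketch of per-level sort orders on a hypothetical four-row frame (dates are kept as ISO strings for brevity; they sort the same way as real dates). It also shows `level=`, which sorts by one named level first and, by default, uses the remaining levels to break ties.

```python
import pandas as pd

# Hypothetical unsorted (Date, Country) frame with made-up prices.
index = pd.MultiIndex.from_tuples(
    [
        ("2001-04-01", "Brazil"),
        ("2000-04-01", "Argentina"),
        ("2001-04-01", "Argentina"),
        ("2000-04-01", "Brazil"),
    ],
    names=["Date", "Country"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.64, 2.50, 2.55, 1.65]}, index=index)

# One Boolean per level: Date descending, Country ascending.
mixed = prices.sort_index(ascending=[False, True])
print(mixed.index.tolist())
# [('2001-04-01', 'Argentina'), ('2001-04-01', 'Brazil'),
#  ('2000-04-01', 'Argentina'), ('2000-04-01', 'Brazil')]

# level= sorts by the Country level first.
by_country = prices.sort_index(level="Country")
print(by_country.index.get_level_values("Country").tolist())
# ['Argentina', 'Argentina', 'Brazil', 'Brazil']
```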

In [30]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [34]:
# By default, sort_index sorts every index level in ascending order, but this can be changed
bigmac.sort_index()
bigmac.sort_index(ascending=True)  # same result as the default
bigmac.sort_index(ascending=False)  # every level in descending order

bigmac.sort_index(ascending=[True, False])  # Date ascending, Country descending
bigmac.sort_index(ascending=[False, True])  # Date descending, Country ascending

| Date | Country | Price in US Dollars |
|---|---|---|
| 2020-07-01 | Argentina | 3.509232 |
| 2020-07-01 | Australia | 4.578450 |
| 2020-07-01 | Azerbaijan | 2.324897 |
| 2020-07-01 | Bahrain | 3.713035 |
| 2020-07-01 | Brazil | 3.913528 |
| ... | ... | ... |
| 2000-04-01 | Sweden | 2.714932 |
| 2000-04-01 | Switzerland | 3.470588 |
| 2000-04-01 | Taiwan | 2.287582 |
| 2000-04-01 | Thailand | 1.447368 |


## Extract Rows from a MultiIndex DataFrame
- A **tuple** is an immutable sequence. It cannot be modified after creation.
- Create a tuple with a comma between elements. The community convention is to wrap the elements in parentheses.
- The `iloc` and `loc` accessors are available to extract rows by index position or label.
- For the `loc` accessor, pass a tuple to hold the labels from the index levels.
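The same ideas on a hypothetical five-row frame, as a sketch rather than the dataset itself: a tuple key for one row, a bare outer label for a whole date, and a tuple-to-tuple slice. The frame is sorted first because label slicing on a MultiIndex generally expects a sorted index; pandas may raise an `UnsortedIndexError` otherwise.

```python
import pandas as pd

# Hypothetical (Date, Country) frame with made-up prices.
index = pd.MultiIndex.from_tuples(
    [
        ("2000-04-01", "Argentina"),
        ("2000-04-01", "Brazil"),
        ("2000-04-01", "Canada"),
        ("2001-04-01", "Argentina"),
        ("2001-04-01", "Brazil"),
    ],
    names=["Date", "Country"],
)
prices = pd.DataFrame(
    {"Price in US Dollars": [2.50, 1.65, 1.94, 2.55, 1.64]}, index=index
).sort_index()  # label slicing on a MultiIndex expects a sorted index

# The comma makes the tuple; the parentheses are the readable convention.
key = ("2000-04-01", "Brazil")
print(type(key).__name__)  # tuple

# iloc selects by position and ignores the labels.
print(prices.iloc[1]["Price in US Dollars"])  # 1.65

# loc with a full tuple picks one row; adding a column label returns a scalar.
print(prices.loc[key, "Price in US Dollars"])  # 1.65

# loc with only the outer label returns every row for that date.
print(prices.loc["2000-04-01"].index.tolist())  # ['Argentina', 'Brazil', 'Canada']

# A slice between two tuples includes both endpoints.
window = prices.loc[("2000-04-01", "Brazil"):("2001-04-01", "Argentina")]
print(len(window))  # 3
```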

In [38]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [68]:
bigmac.dtypes

Price in US Dollars    float64
dtype: object

In [49]:
1,    # a one-element tuple: (1,)
1,2   # a two-element tuple: (1, 2)

type((1,2))

tuple

In [53]:
bigmac.iloc[ 2 ]

Price in US Dollars    1.648045
Name: (2000-04-01 00:00:00, Brazil), dtype: float64

In [66]:
bigmac.loc['2000-04-01'].head()  # only the outer level: a DataFrame indexed by Country

bigmac.loc['2000-04-01', 'Canada']  # read as one (Date, Country) row label: a Series for that row

bigmac.loc['2000-04-01', 'Price in US Dollars']  # not a row label, so read as (row, column): the prices for that date

bigmac.loc[ ('2000-04-01', 'Canada'), 'Price in US Dollars' ]  # an explicit tuple removes the ambiguity: a single value

bigmac.loc[ ('2000-04-01', 'Hungary'):('2000-04-01', 'Poland') ]  # a slice between tuples includes both ends

# it's the same loc accessor as before, but each row label now carries one value per level

bigmac.loc[ ('2019-07-04', 'Hungary'): ]  # from ('2019-07-04', 'Hungary') through the last row
bigmac.loc[ :('2019-07-04', 'Hungary') ]  # from the first row up to ('2019-07-04', 'Hungary')

bigmac.loc[ ('2012-06-20', 'Brazil'): ('2013-04-14', 'Turkey'), 'Price in US Dollars']

Date        Country      
2012-07-01  Argentina        4.160964
            Australia        4.680156
            Brazil           4.935974
            Britain          4.162371
            Canada           5.022316
                               ...   
2013-01-01  Turkey           4.777385
            UAE              3.267040
            Ukraine          2.333006
            United States    4.367396
            Uruguay          5.446058
Name: Price in US Dollars, Length: 82, dtype: float64

## Practice Exercises (Generated with ChatGPT)

In [77]:
#1) Write a query to extract all rows for the country 'USA' on the date '2010-01-01'

bigmac.loc[('2010-01-01', 'United States')]

Price in US Dollars    3.58
Name: (2010-01-01 00:00:00, United States), dtype: float64

In [87]:
bigmac.loc[[('2010-01-01', 'Argentina'), ('2010-07-01', 'Argentina')] ]

| Date | Country | Price in US Dollars |
|---|---|---|
| 2010-01-01 | Argentina | 1.842833 |
| 2010-07-01 | Argentina | 3.558945 |


In [95]:
#2) Compute the average price for each country between '2010-01-01' and '2010-07-01'

bigmac.loc['2010-01-01':'2010-07-01'].groupby('Country')['Price in US Dollars'].agg(MeanPrice='mean', TotalPrice= 'sum').head()

| Country | MeanPrice | TotalPrice |
|---|---|---|
| Argentina | 2.700889 | 5.401778 |
| Australia | 3.907394 | 7.814788 |
| Brazil | 4.832747 | 9.665493 |
| Britain | 3.577438 | 7.154876 |
| Canada | 3.987171 | 7.974341 |


In [107]:
#3) Identify the country with the maximum price on '2010-01-01'.

bigmac.loc['2010-01-01'].sort_values(by= 'Price in US Dollars', ascending= False)
bigmac.loc['2010-01-01'].idxmax()

Price in US Dollars    Norway
dtype: object

In [109]:
bigmac.index.get_level_values(1).unique()

Index(['Argentina', 'Australia', 'Brazil', 'Britain', 'Canada', 'Chile',
       'China', 'Czech Republic', 'Denmark', 'Euro area', 'Hong Kong',
       'Hungary', 'Indonesia', 'Israel', 'Japan', 'Malaysia', 'Mexico',
       'New Zealand', 'Poland', 'Russia', 'Singapore', 'South Africa',
       'South Korea', 'Sweden', 'Switzerland', 'Taiwan', 'Thailand',
       'United States', 'Philippines', 'Norway', 'Peru', 'Turkey', 'Egypt',
       'Colombia', 'Costa Rica', 'Pakistan', 'Saudi Arabia', 'Sri Lanka',
       'Ukraine', 'Uruguay', 'UAE', 'India', 'Vietnam', 'Azerbaijan',
       'Bahrain', 'Croatia', 'Guatemala', 'Honduras', 'Jordan', 'Kuwait',
       'Lebanon', 'Moldova', 'Nicaragua', 'Oman', 'Qatar', 'Romania',
       'United Arab Emirates'],
      dtype='object', name='Country')

In [116]:
#4) Extract all rows for the countries ['USA', 'Canada', 'UAE'] on the dates ['2010-01-01', '2010-07-01']

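bigmac.loc[
    [
        ('2010-01-01', 'United States'),
        ('2010-01-01', 'Canada'),
        ('2010-07-01', 'United States'),
        ('2010-07-01', 'Canada'),
        # The United Arab Emirates pairs are left out: list-based .loc raises a KeyError
        # if any requested (Date, Country) pair is missing from the index
        # ('2010-01-01', 'United Arab Emirates'),
        # ('2010-07-01', 'United Arab Emirates'),
    ]
]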

| Date | Country | Price in US Dollars |
|---|---|---|
| 2010-01-01 | United States | 3.58 |
| 2010-01-01 | Canada | 3.973765 |
| 2010-07-01 | United States | 3.733333 |
| 2010-07-01 | Canada | 4.000576 |


In [136]:
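#5) Compute the 3-day rolling average of prices for the country 'USA'.
# Note: rolling(window=3) averages 3 consecutive observations, not 3 calendar days (the surveys are months apart)
bigmac.loc[ bigmac.index.get_level_values('Country') == 'United States' ].rolling(window=3).mean().head()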


| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | United States | NaN |
| 2001-04-01 | United States | NaN |
| 2002-04-01 | United States | 2.513333 |
| 2003-04-01 | United States | 2.58 |
| 2004-05-01 | United States | 2.7 |


In [132]:
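#6) Extract all rows where the price exceeds $1,000 for the country 'Brazil'
# No Big Mac price in this dataset comes anywhere near $1,000, so $1 is used as the threshold below
bigmac.loc[ (bigmac.index.get_level_values('Country').isin(['Brazil'])) & (bigmac['Price in US Dollars'] > 1) ].head()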

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Brazil | 1.648045 |
| 2001-04-01 | Brazil | 1.643836 |
| 2002-04-01 | Brazil | 1.538462 |
| 2003-04-01 | Brazil | 1.482085 |
| 2004-05-01 | Brazil | 1.698113 |


## The transpose Method
- The `transpose` method swaps the rows and columns of the **DataFrame** (the `T` attribute is a shorthand for it).
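A minimal sketch, assuming a hypothetical two-row frame with made-up prices: after `transpose`, the (Date, Country) row MultiIndex becomes a column MultiIndex, and transposing again restores the original layout.

```python
import pandas as pd

# Hypothetical frame with a (Date, Country) row MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("2010-01-01", "China"), ("2010-01-01", "Denmark")],
    names=["Date", "Country"],
)
prices = pd.DataFrame({"Price in US Dollars": [1.83, 5.99]}, index=index)

flipped = prices.transpose()  # .T does the same thing

# The row MultiIndex is now the column MultiIndex.
print(flipped.shape)                # (1, 2)
print(list(flipped.columns.names))  # ['Date', 'Country']
print(flipped.equals(prices.T))     # True

# Transposing twice gives back the original frame.
print(flipped.transpose().equals(prices))  # True
```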

In [1]:
import pandas as pd

In [2]:
bigmac= pd.read_csv('bigmac.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col=['Date', 'Country']).sort_index()
bigmac.head()

| Date | Country | Price in US Dollars |
|---|---|---|
| 2000-04-01 | Argentina | 2.5 |
| 2000-04-01 | Australia | 1.541667 |
| 2000-04-01 | Brazil | 1.648045 |
| 2000-04-01 | Britain | 3.002 |
| 2000-04-01 | Canada | 1.938776 |


In [6]:
# pandas supports a MultiIndex on the columns as well as on the rows; transposing moves the row MultiIndex onto the columns

start= ('2010-01-01', 'China')
end= ('2010-01-01', 'Denmark')

bigmac.loc[start:end].transpose()

| Date | 2010-01-01 | 2010-01-01 | 2010-01-01 | 2010-01-01 | 2010-01-01 |
|---|---|---|---|---|---|
| **Country** | China | Colombia | Costa Rica | Czech Republic | Denmark |
| Price in US Dollars | 1.830912 | 3.91296 | 3.51889 | 3.714432 | 5.993859 |


## The stack Method
- The `stack` method moves the column index to the row index.
- Pandas will return a **MultiIndex Series**.
- Think of it like "stacking" index levels for a **MultiIndex**.
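A minimal sketch of `stack` on a hypothetical two-row, two-column frame shaped like `worldstats.csv` (the numbers are made up). Stacking adds the column labels as a third, innermost index level and returns a Series.

```python
import pandas as pd

# Hypothetical wide frame: two measurements per (year, country) row.
index = pd.MultiIndex.from_tuples(
    [(1960, "Afghanistan"), (1960, "Algeria")], names=["year", "country"]
)
world = pd.DataFrame(
    {"Population": [9.0e6, 11.1e6], "GDP": [5.4e8, 2.7e9]}, index=index
)

stacked = world.stack()  # column labels become the innermost row level

print(type(stacked).__name__)  # Series
print(stacked.index.nlevels)   # 3
print(stacked.loc[(1960, "Algeria", "GDP")])  # 2700000000.0

# to_frame turns the Series back into a one-column DataFrame.
print(stacked.to_frame("Value").columns.tolist())  # ['Value']
```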

In [8]:
pd.read_csv('worldstats.csv').nunique()

country         252
year             56
Population    11067
GDP           11065
dtype: int64

In [15]:
# Based on the previous result, year (fewer unique values) is a good candidate for the outermost level of the MultiIndex, with country nested inside it

world= pd.read_csv('worldstats.csv', index_col= ['year', 'country']).sort_index(ascending=[True, True])
world.head()

| year | country | Population | GDP |
|---|---|---|---|
| 1960 | Afghanistan | 8994793.0 | 537777800.0 |
| 1960 | Algeria | 11124892.0 | 2723638000.0 |
| 1960 | Australia | 10276477.0 | 18567590000.0 |
| 1960 | Austria | 7047539.0 | 6592694000.0 |
| 1960 | Bahamas, The | 109526.0 | 169802300.0 |


In [23]:
world.stack()
# stack moves the column labels (Population, GDP) into a new innermost row level, so the result is a Series instead of a DataFrame

year  country                
1960  Afghanistan  Population    8.994793e+06
                   GDP           5.377778e+08
      Algeria      Population    1.112489e+07
                   GDP           2.723638e+09
      Australia    Population    1.027648e+07
                                     ...     
2015  World        GDP           7.343364e+13
      Zambia       Population    1.621177e+07
                   GDP           2.120156e+10
      Zimbabwe     Population    1.560275e+07
                   GDP           1.389294e+10
Length: 22422, dtype: float64

In [34]:
world.stack().loc[1960].to_frame('Numero')  # 1960 selects the outer level; note that (1960) without a comma is not a tuple
# the to_frame method converts a Series to a DataFrame

| country | | Numero |
|---|---|---|
| Afghanistan | Population | 8.994793e+06 |
| Afghanistan | GDP | 5.377778e+08 |
| Algeria | Population | 1.112489e+07 |
| Algeria | GDP | 2.723638e+09 |
| Australia | Population | 1.027648e+07 |
| ... | ... | ... |
| World | GDP | 1.364643e+12 |
| Zambia | Population | 3.049586e+06 |
| Zambia | GDP | 6.987397e+08 |
| Zimbabwe | Population | 3.752390e+06 |


## The unstack Method
- The `unstack` method moves a row index to the column index (the inverse of the `stack` method).
- By default, the `unstack` method will move the innermost index.
- We can customize the moved index with the `level` parameter.
- The `level` parameter accepts the level's index position or its name. It can also accept a list of positions/names.
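A sketch of the three ways to call `unstack`, assuming a hypothetical stacked Series with the same (year, country, measure) levels as `world` but only four made-up values.

```python
import pandas as pd

# Hypothetical stacked Series: (year, country, measure) -> value.
index = pd.MultiIndex.from_tuples(
    [
        (1960, "Afghanistan", "Population"),
        (1960, "Afghanistan", "GDP"),
        (1961, "Afghanistan", "Population"),
        (1961, "Afghanistan", "GDP"),
    ],
    names=["year", "country", None],
)
stacked = pd.Series([9.0e6, 5.4e8, 9.2e6, 5.5e8], index=index)

# Default: the innermost level (Population/GDP) moves to the columns.
wide = stacked.unstack()
print(sorted(wide.columns))  # ['GDP', 'Population']

# level= accepts a name or a position; here the years become the columns.
by_year = stacked.unstack(level="year")
print(by_year.columns.tolist())  # [1960, 1961]
print(by_year.equals(stacked.unstack(level=0)))  # True

# A list moves several levels at once, producing MultiIndex columns.
print(stacked.unstack([0, 1]).columns.nlevels)  # 2
```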

In [35]:
world= pd.read_csv('worldstats.csv', index_col= ['year', 'country']).sort_index(ascending=[True, True]).stack()
world.head()

year  country                
1960  Afghanistan  Population    8.994793e+06
                   GDP           5.377778e+08
      Algeria      Population    1.112489e+07
                   GDP           2.723638e+09
      Australia    Population    1.027648e+07
dtype: float64

In [41]:
world.unstack()
world.unstack().unstack().head()
# world.unstack().unstack().columns  # the columns are now a MultiIndex

| | Population | Population | Population | Population | Population | Population | Population | Population | Population | Population | ... | GDP | GDP | GDP | GDP | GDP | GDP | GDP | GDP | GDP | GDP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **country** | Afghanistan | Albania | Algeria | Andorra | Angola | Antigua and Barbuda | Arab World | Argentina | Armenia | Aruba | ... | Uzbekistan | Vanuatu | Venezuela, RB | Vietnam | Virgin Islands (U.S.) | West Bank and Gaza | World | Yemen, Rep. | Zambia | Zimbabwe |
| **year** | | | | | | | | | | | | | | | | | | | | | |
| 1960 | 8994793.0 | NaN | 11124892.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | 8607600000.0 | NaN | 24200000.0 | NaN | 1364643000000.0 | NaN | 698739700.0 | 1052990000.0 |
| 1961 | 9164945.0 | NaN | 11404859.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | 8923367000.0 | NaN | 25700000.0 | NaN | 1420440000000.0 | NaN | 682359700.0 | 1096647000.0 |
| 1962 | 9343772.0 | NaN | 11690152.0 | NaN | NaN | NaN | NaN | 21287682.0 | NaN | NaN | ... | NaN | NaN | 9873398000.0 | NaN | 36900000.0 | NaN | 1524573000000.0 | NaN | 679279700.0 | 1117602000.0 |
| 1963 | 9531555.0 | NaN | 11985130.0 | NaN | NaN | NaN | NaN | 21621845.0 | NaN | NaN | ... | NaN | NaN | 10663380000.0 | NaN | 41400000.0 | NaN | 1638187000000.0 | NaN | 704339700.0 | 1159512000.0 |
| 1964 | 9728645.0 | NaN | 12295973.0 | NaN | NaN | NaN | NaN | 21953926.0 | NaN | NaN | ... | NaN | NaN | 9113581000.0 | NaN | 53800000.0 | NaN | 1799675000000.0 | NaN | 822639700.0 | 1217138000.0 |


In [47]:
world.unstack(level=1).head()
world.unstack(level='country').head()
world.unstack(level=0).head()
world.unstack(level='year').head()

| | year | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | ... | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **country** | | | | | | | | | | | | | | | | | | | | | | |
| Afghanistan | Population | 8994793.0 | 9164945.0 | 9343772.0 | 9531555.0 | 9728645.0 | 9935358.0 | 10148840.0 | 10368600.0 | 10599790.0 | 10849510.0 | ... | 25183620.0 | 25877540.0 | 26528740.0 | 27207290.0 | 27962210.0 | 28809170.0 | 29726800.0 | 30682500.0 | 31627510.0 | 32526560.0 |
| Afghanistan | GDP | 537777800.0 | 548888900.0 | 546666700.0 | 751111200.0 | 800000000.0 | 1006667000.0 | 1400000000.0 | 1673333000.0 | 1373333000.0 | 1408889000.0 | ... | 7057598000.0 | 9843842000.0 | 10190530000.0 | 12486940000.0 | 15936800000.0 | 17930240000.0 | 20536540000.0 | 20046330000.0 | 20050190000.0 | 19199440000.0 |
| Albania | Population | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2992547.0 | 2970017.0 | 2947314.0 | 2927519.0 | 2913021.0 | 2904780.0 | 2900247.0 | 2896652.0 | 2893654.0 | 2889167.0 |
| Albania | GDP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 8992642000.0 | 10701010000.0 | 12881350000.0 | 12044210000.0 | 11926950000.0 | 12890870000.0 | 12319780000.0 | 12781030000.0 | 13277960000.0 | 11455600000.0 |
| Algeria | Population | 11124890.0 | 11404860.0 | 11690150.0 | 11985130.0 | 12295970.0 | 12626950.0 | 12980270.0 | 13354200.0 | 13744380.0 | 14144440.0 | ... | 33749330.0 | 34261970.0 | 34811060.0 | 35401790.0 | 36036160.0 | 36717130.0 | 37439430.0 | 38186140.0 | 38934330.0 | 39666520.0 |


In [53]:
world.unstack([0,1]).head()  # moves every listed level (year, then country) to the columns
world.unstack([1,0]).head().sort_index()  # same levels, in country-then-year order

| country | Afghanistan | Algeria | Australia | Austria | Bahamas, The | Bangladesh | Belgium | Belize | Benin | Bermuda | ... | United Kingdom | United States | Upper middle income | Uruguay | Uzbekistan | Vietnam | West Bank and Gaza | World | Zambia | Zimbabwe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **year** | 1960 | 1960 | 1960 | 1960 | 1960 | 1960 | 1960 | 1960 | 1960 | 1960 | ... | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 |
| GDP | 537777800.0 | 2723638000.0 | 18567590000.0 | 6592694000.0 | 169802300.0 | 4274894000.0 | 11658720000.0 | 28072480.0 | 226195600.0 | 84466650.0 | ... | 2848755000000.0 | 17947000000000.0 | 19732880000000.0 | 53442700000.0 | 66732800000.0 | 193599400000.0 | 12677400000.0 | 73433640000000.0 | 21201560000.0 | 13892940000.0 |
| Population | 8994793.0 | 11124890.0 | 10276480.0 | 7047539.0 | 109526.0 | 48200700.0 | 9153489.0 | 92068.0 | 2431620.0 | 44400.0 | ... | 65138230.0 | 321418800.0 | 2550326000.0 | 3431555.0 | 31299500.0 | 91703800.0 | 4422143.0 | 7346633000.0 | 16211770.0 | 15602750.0 |


## The pivot Method
- The `pivot` method reshapes data from a tall format to a wide format.
- Ask yourself which direction the data will expand in if you add more entries.
- A tall/long format expands down. A wide format expands out.
- The `index` parameter sets the row index (the row labels) of the pivoted **DataFrame**.
- The `columns` parameter sets the column whose values will become the column labels of the pivoted **DataFrame**.
- The `values` parameter sets the column that fills the pivoted **DataFrame**. Pandas places each value at the intersection of its index and column labels.
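A minimal sketch with four hypothetical rows shaped like `salesmen.csv` (the revenues are made up). Note that `pivot` needs each (index, columns) pair to appear at most once; duplicate pairs make it raise a `ValueError`, which is the case `pivot_table` handles by aggregating.

```python
import pandas as pd

# Hypothetical tall data: one row per (Date, Salesman) pair.
sales = pd.DataFrame(
    {
        "Date": ["2025-01-01", "2025-01-01", "2025-01-02", "2025-01-02"],
        "Salesman": ["Sharon", "Dave", "Sharon", "Dave"],
        "Revenue": [7172, 1864, 6362, 8278],
    }
)

# Dates become row labels, salesmen become columns, revenues fill the cells.
wide = sales.pivot(index="Date", columns="Salesman", values="Revenue")

print(wide.columns.tolist())           # ['Dave', 'Sharon']
print(wide.loc["2025-01-02", "Dave"])  # 8278
```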

In [9]:
sales= pd.read_csv('salesmen.csv', parse_dates=['Date'], date_format='%m/%d/%Y')
sales.head()

| | Date | Salesman | Revenue |
|---|---|---|---|
| 0 | 2025-01-01 | Sharon | 7172 |
| 1 | 2025-01-02 | Sharon | 6362 |
| 2 | 2025-01-03 | Sharon | 5982 |
| 3 | 2025-01-04 | Sharon | 7917 |
| 4 | 2025-01-05 | Sharon | 7837 |


In [14]:
sales.pivot(index='Date', columns='Salesman', values='Revenue')
# index: the column whose values become the row labels of the pivoted DataFrame
# columns: the column whose unique values become the new column labels
# values: the column whose values fill the cells of the pivoted DataFrame

| Salesman | Alexander | Dave | Oscar | Ronald | Sharon |
|---|---|---|---|---|---|
| **Date** | | | | | |
| 2025-01-01 | 4430 | 1864 | 5250 | 2639 | 7172 |
| 2025-01-02 | 8026 | 8278 | 8661 | 4951 | 6362 |
| 2025-01-03 | 5188 | 4226 | 7075 | 2703 | 5982 |
| 2025-01-04 | 3144 | 3868 | 2524 | 4258 | 7917 |
| 2025-01-05 | 938 | 2287 | 2793 | 7771 | 7837 |
| ... | ... | ... | ... | ... | ... |
| 2025-12-27 | 6666 | 2843 | 835 | 2981 | 2045 |
| 2025-12-28 | 1243 | 8888 | 3073 | 6129 | 100 |
| 2025-12-29 | 3498 | 9490 | 6424 | 7662 | 4115 |
| 2025-12-30 | 8858 | 3594 | 7088 | 2570 | 2577 |


## The melt Method
- The `melt` method is the inverse of the `pivot` method.
- It takes a 'wide' dataset and converts it to a 'tall' dataset.
- The `melt` method is ideal when you have multiple columns storing the *same* data point.
- Ask yourself whether the column's values are a *type* of the column header. If they're not, the data is likely stored in a wide format.
- The `id_vars` parameter accepts the column(s) whose values will be repeated for every melted column.
- The `var_name` parameter sets the name of the new column holding the varying values (the former column names).
- The `value_name` parameter sets the name of the values column (holding the values from the original **DataFrame**).

In [None]:
world= pd.read_csv('worldstats.csv', index_col= ['year', 'country']).sort_index(ascending=[True, True]).stack()
world.head()
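As a standalone sketch of `melt`, assuming a hypothetical wide table with one revenue column per salesman (made-up numbers), the call below turns it back into a tall table, and `pivot` then reverses the step.

```python
import pandas as pd

# Hypothetical wide data: the Dave and Sharon columns store the same kind of value (revenue).
wide = pd.DataFrame(
    {
        "Date": ["2025-01-01", "2025-01-02"],
        "Dave": [1864, 8278],
        "Sharon": [7172, 6362],
    }
)

# Date is repeated for every melted column; former headers go to Salesman.
tall = wide.melt(id_vars="Date", var_name="Salesman", value_name="Revenue")
print(tall.columns.tolist())     # ['Date', 'Salesman', 'Revenue']
print(tall["Salesman"].tolist()) # ['Dave', 'Dave', 'Sharon', 'Sharon']

# pivot undoes the melt.
back = tall.pivot(index="Date", columns="Salesman", values="Revenue")
print(back.loc["2025-01-01", "Sharon"])  # 7172
```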

## The pivot_table Method
- The `pivot_table` method operates similarly to the Pivot Table feature in Excel.
- A pivot table is a table whose values are aggregations of groups of values from another table.
- The `values` parameter accepts the numeric column whose values will be aggregated.
- The `aggfunc` parameter declares the aggregation function (the default is mean/average).
- The `index` parameter sets the index labels of the pivot table. MultiIndexes are permitted.
- The `columns` parameter sets the column labels of the pivot table. MultiIndexes are permitted.

In [None]:
world= pd.read_csv('worldstats.csv', index_col= ['year', 'country']).sort_index(ascending=[True, True]).stack()
world.head()
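A standalone sketch of `pivot_table`, assuming a small hypothetical sales table (the `Region` column and all numbers are made up) in which (Region, Salesman) pairs repeat, so plain `pivot` would fail. The first call relies on the default mean; the second passes `aggfunc='sum'`.

```python
import pandas as pd

# Hypothetical sales with a repeated (East, Dave) pair.
sales = pd.DataFrame(
    {
        "Region": ["East", "East", "East", "West", "West"],
        "Salesman": ["Dave", "Dave", "Sharon", "Dave", "Sharon"],
        "Revenue": [100, 301, 200, 400, 500],
    }
)

# Default aggfunc: the mean of each (Region, Salesman) group.
mean_table = sales.pivot_table(values="Revenue", index="Region", columns="Salesman")
print(mean_table.loc["East", "Dave"])  # 200.5

# Any aggregation name (or function) can be passed to aggfunc.
sum_table = sales.pivot_table(
    values="Revenue", index="Region", columns="Salesman", aggfunc="sum"
)
print(sum_table.loc["East", "Dave"])  # the East/Dave total: 100 + 301
```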