In [1]:
import pandas as pd

In [9]:
big = pd.read_csv("bigmac.csv", parse_dates=["Date"])

In [10]:
big.shape

(652, 3)

In [11]:
big.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 652 entries, 0 to 651
Data columns (total 3 columns):
Date                   652 non-null datetime64[ns]
Country                652 non-null object
Price in US Dollars    652 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 15.4+ KB


In [20]:
big.set_index(["Date", "Country"], inplace=True)

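The idea behind ordering MultiIndex levels is that the field(s) with fewer unique values should be placed first, as the outer level, before the others.<br>
This gives a cleaner view and groups related rows together. Here Date has 12 unique values and Country has 58, so Date goes first.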
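
As a minimal sketch of this ordering rule, the snippet below counts unique values per column and builds the index from fewest to most. The frame `toy` and its rows are illustrative stand-ins, not the contents of bigmac.csv.

```python
import pandas as pd

# Hypothetical sample rows shaped like the Big Mac data
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2015-07-01", "2015-07-01"]),
    "Country": ["Argentina", "Brazil", "Argentina", "Canada"],
    "Price in US Dollars": [2.39, 3.35, 3.07, 4.54],
})

# Order the candidate index columns from fewest to most unique values
levels = toy[["Date", "Country"]].nunique().sort_values().index.tolist()

# Build the MultiIndex in that order and sort it for faster lookups
indexed = toy.set_index(levels).sort_index()
```

Assigning the result to a new name, rather than using inplace=True, keeps the original frame available for comparison.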

In [12]:
big.nunique()

Date                    12
Country                 58
Price in US Dollars    330
dtype: int64

### Extracting Rows from a MultiIndex DataFrame

In [21]:
big.head()

Date,Country,Price in US Dollars
2016-01-01,Argentina,2.39
2016-01-01,Australia,3.74
2016-01-01,Brazil,3.35
2016-01-01,Britain,4.22
2016-01-01,Canada,4.14


We will use the .loc[] indexer. For a MultiIndex, pass it a tuple containing the outer and inner index values of the row you want to extract

In [30]:
big.sort_index(inplace=True)

In [37]:
big.loc[("2016-01-01", "Argentina")]

Price in US Dollars    2.39
Name: (2016-01-01 00:00:00, Argentina), dtype: float64

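To get the value of a specific column for that row, pass the column name as a second argument. As the output shows, the result here is a one-element Series rather than a bare number: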

In [38]:
big.loc[("2016-01-01", "Argentina"), "Price in US Dollars"]

Date        Country  
2016-01-01  Argentina    2.39
Name: Price in US Dollars, dtype: float64
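
The three common lookup shapes can be sketched on a small made-up frame. The rows are illustrative, and the keys use pd.Timestamp instead of date strings, on the assumption that an exact key for every level is what yields a single scalar.

```python
import pandas as pd

# Hypothetical sample rows shaped like the Big Mac data
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2015-07-01"]),
    "Country": ["Argentina", "Brazil", "Argentina"],
    "Price in US Dollars": [2.39, 3.35, 3.07],
}).set_index(["Date", "Country"]).sort_index()

day = pd.Timestamp("2016-01-01")

# Outer level only: every country recorded on that date
one_day = toy.loc[day]

# Full tuple: one row, returned as a Series of its column values
row = toy.loc[(day, "Argentina")]

# Full tuple plus a column label: a single value
price = toy.loc[(day, "Argentina"), "Price in US Dollars"]
```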

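The __.transpose()__ method swaps the rows and columns: the row index (here the Date and Country levels) becomes the column labels, and the original columns become rows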

In [41]:
big.transpose()

Date,2010-01-01,2010-01-01,2010-01-01,2010-01-01,2010-01-01,2010-01-01,2010-01-01,2010-01-01,2010-01-01,2010-01-01,...,2016-01-01,2016-01-01,2016-01-01,2016-01-01,2016-01-01,2016-01-01,2016-01-01,2016-01-01,2016-01-01,2016-01-01
Country,Argentina,Australia,Brazil,Britain,Canada,Chile,China,Colombia,Costa Rica,Czech Republic,...,Switzerland,Taiwan,Thailand,Turkey,UAE,Ukraine,United States,Uruguay,Venezuela,Vietnam
Price in US Dollars,1.84,3.98,4.76,3.67,3.97,3.18,1.83,3.91,3.52,3.71,...,6.44,2.08,3.09,3.41,3.54,1.54,4.93,3.74,0.66,2.67
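
A small sketch of the same operation on two made-up rows makes the shape change easier to see. The frame `toy` is illustrative only.

```python
import pandas as pd

# Hypothetical two-row frame with a (Date, Country) MultiIndex
toy = pd.DataFrame(
    {"Price in US Dollars": [2.39, 3.35]},
    index=pd.MultiIndex.from_tuples(
        [("2016-01-01", "Argentina"), ("2016-01-01", "Brazil")],
        names=["Date", "Country"],
    ),
)

# Rows become columns: the 2 x 1 frame turns into a 1 x 2 frame,
# and the MultiIndex moves from the rows to the columns
flipped = toy.transpose()
```

The level names travel with the index, so `flipped.columns` carries the Date and Country levels.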


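The __.swaplevel()__ method swaps the order of two levels in a MultiIndex (by default, the two innermost levels). The columns here are a plain single-level Index, so only the row index has levels to swap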

In [43]:
big.columns

Index(['Price in US Dollars'], dtype='object')
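
The cell above only inspects the columns. A minimal sketch of swaplevel itself, on made-up rows, might look like this:

```python
import pandas as pd

# Hypothetical rows with Date as the outer level and Country as the inner level
toy = pd.DataFrame(
    {"Price in US Dollars": [2.39, 3.35, 3.07]},
    index=pd.MultiIndex.from_tuples(
        [("2016-01-01", "Argentina"), ("2016-01-01", "Brazil"), ("2015-07-01", "Argentina")],
        names=["Date", "Country"],
    ),
)

# With no arguments, swaplevel exchanges the two innermost levels
swapped = toy.swaplevel()

# Swapping does not re-sort the rows, so sort again before doing lookups
swapped = swapped.sort_index()
```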

The __.to_frame()__ method converts a Series object into a single-column DataFrame object
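
For instance, selecting a single column returns a Series, and to_frame turns it back into a DataFrame. The values below are illustrative.

```python
import pandas as pd

# Hypothetical Series, as you would get from selecting one column
prices = pd.Series(
    [2.39, 3.35],
    index=["Argentina", "Brazil"],
    name="Price in US Dollars",
)

# The Series name becomes the column label of the new DataFrame
frame = prices.to_frame()
```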


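The __.stack()__ method moves the column labels from the x-axis into the innermost level of the row index (y-axis); __.unstack()__ does the reverse, moving an index level from the y-axis to the x-axis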
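
A short sketch of the round trip, using a tiny wide table built by hand (the frame `wide` is hypothetical; newer pandas versions may print a FutureWarning about the stack implementation):

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per salesman
wide = pd.DataFrame(
    {"Bob": [7172, 6362], "Dave": [1864, 8278]},
    index=pd.Index(["2016-01-01", "2016-01-02"], name="Date"),
)
wide.columns.name = "Salesman"

# stack: column labels become the innermost row index level
long = wide.stack()

# unstack: the innermost row index level goes back to the columns
back = long.unstack()
```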

### The pivot Method

In [50]:
sales = pd.read_csv("salesmen.csv", parse_dates=["Date"])

In [56]:
sales.sort_values("Salesman").head(3)

,Date,Salesman,Revenue
0,2016-01-01,Bob,7172
249,2016-09-06,Bob,556
248,2016-09-05,Bob,992


In [58]:
sales.shape

(1830, 3)

.pivot(index=" ", columns=" ", values=" ")
- The _index_ param sets the field whose values become the row index
- The _columns_ param sets the field whose __unique values__ become the new column headers
- The _values_ param sets the field whose records fill the cells under those new columns <br>

For example:

In [83]:
sales.pivot(index="Date", columns="Salesman", values="Revenue").head()

Date,Bob,Dave,Jeb,Oscar,Ronald
2016-01-01,7172,1864,4430,5250,2639
2016-01-02,6362,8278,8026,8661,4951
2016-01-03,5982,4226,5188,7075,2703
2016-01-04,7917,3868,3144,2524,4258
2016-01-05,7837,2287,938,2793,7771
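
The same reshaping can be sketched on a few hand-typed rows. The frame below is an illustrative stand-in for salesmen.csv. The sketch also shows one limitation worth knowing: pivot does not aggregate, so a repeated index/column pair raises a ValueError.

```python
import pandas as pd

# Hypothetical long-format rows: one row per (Date, Salesman)
long = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"]),
    "Salesman": ["Bob", "Dave", "Bob", "Dave"],
    "Revenue": [7172, 1864, 6362, 8278],
})

# One row per date, one column per salesman, Revenue in the cells
wide = long.pivot(index="Date", columns="Salesman", values="Revenue")

# Repeat one (Date, Salesman) pair to show that pivot cannot reshape duplicates
dupes = pd.concat([long, long.iloc[[0]]], ignore_index=True)
try:
    dupes.pivot(index="Date", columns="Salesman", values="Revenue")
    pivot_failed = False
except ValueError:
    pivot_failed = True
```

When duplicates are expected, pivot_table with an aggregate function is the usual alternative.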


### The pivot_table() Method

In [62]:
food = pd.read_csv("foods.csv")

In [63]:
food.head(3)

,First Name,Gender,City,Frequency,Item,Spend
0,Wanda,Female,Stamford,Weekly,Burger,15.66
1,Eric,Male,Stamford,Daily,Chalupa,10.56
2,Charles,Male,New York,Never,Sushi,42.14


In [67]:
#food.pivot(index='Spend', columns="First Name", values="Item")
food.nunique()

First Name    198
Gender          2
City            3
Frequency       8
Item            6
Spend         950
dtype: int64

food.pivot_table(values=" ", index=" ", columns=" ", aggfunc=" ")
- The _values_ param states which column to aggregate
- The _index_ param states which column(s) to use as the row index
- The _columns_ param sets the field whose unique values become the new column headers
- The _aggfunc_ param states which aggregate function to apply, such as mean, median, sum or count <br>

For example:

In [69]:
food.pivot_table(values="Spend", index="First Name", aggfunc="sum").head()

First Name,Spend
Aaron,284.35
Adam,51.45
Alan,239.99
Albert,271.94
Alice,489.77


The above shows the total amount each person has spent

To break each person's total down by item, pass a list of columns as the index:

In [76]:
food.pivot_table(values="Spend", index=["First Name","Item"], aggfunc="sum").head()

First Name,Item,Spend
Aaron,Burrito,95.07
Aaron,Chalupa,51.43
Aaron,Donut,48.0
Aaron,Ice Cream,89.85
Adam,Burrito,51.45


We can also split the results by City by adding the _columns_ param, and change the aggregate function. For example, the largest single spend for each Gender and Item in each City:

In [82]:
food.pivot_table(values="Spend", index=["Gender","Item"],columns=["City"], aggfunc="max")

Gender,Item,New York,Philadelphia,Stamford
Female,Burger,98.96,97.79,85.06
Female,Burrito,92.25,96.79,99.21
Female,Chalupa,98.43,99.29,98.78
Female,Donut,95.63,96.52,91.75
Female,Ice Cream,97.83,88.14,97.44
Female,Sushi,99.51,99.02,95.43
Male,Burger,90.32,99.68,97.2
Male,Burrito,98.04,93.27,95.07
Male,Chalupa,96.44,98.4,99.87
Male,Donut,86.7,93.12,99.26
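
To see how the params interact, here is a small sketch on a hand-made frame. The rows are illustrative, not taken from foods.csv. Combinations that never occur in the data come back as NaN, which the fill_value param can replace.

```python
import pandas as pd

# Hypothetical purchases; (Female, Burger, Stamford) appears twice
toy_food = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Female"],
    "City": ["Stamford", "Stamford", "New York", "Stamford", "New York"],
    "Item": ["Burger", "Burger", "Sushi", "Burger", "Sushi"],
    "Spend": [15.66, 20.00, 42.14, 10.56, 30.00],
})

# Rows: Gender and Item; columns: one per City; cells: largest single Spend
table = toy_food.pivot_table(
    values="Spend", index=["Gender", "Item"], columns="City", aggfunc="max"
)

# Same table, with missing combinations shown as 0 instead of NaN
filled = toy_food.pivot_table(
    values="Spend", index=["Gender", "Item"], columns="City",
    aggfunc="max", fill_value=0,
)
```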
