<img src="Images\Slide1.PNG" />

In [None]:
import pandas as pd

<img src="Images\Slide2.PNG" />

In [None]:
bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"]) #notice we parsed the dates
bigmac.head(3)

In [None]:
bigmac.info()
#bigmac.dtypes

<img src="Images\Slide3.PNG" />

<img src="Images\Slide4.PNG" />

In [None]:
#bigmac.set_index(keys = ["Date", "Country"])
#bigmac.set_index(keys = ["Country", "Date"])
#bigmac["Country"].value_counts()

In [None]:
bigmac.set_index(keys = ["Date", "Country"], inplace = True)
bigmac.head(3)

<div class = "alert alert-block alert-info">
Now let's sort the index. Notice how pandas sorts by both levels: first by Date, then by Country within each date.


In [None]:
bigmac.sort_index(inplace = True) # inplace = True modifies bigmac directly instead of returning a sorted copy

<div class = "alert alert-block alert-info">
By the way, sort_index() can also sort each level in a different direction, or sort by a single level of our choice

In [None]:
bigmac.sort_index(ascending = [False, True]) 
bigmac.sort_index(level = 1, ascending = True) # levels start at 0
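To see what these two calls do without needing bigmac.csv, here is a minimal sketch on a tiny hand-made frame. The `toy` DataFrame and its prices are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for bigmac.csv (the prices are invented)
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2016-01-01", "2015-07-01", "2016-01-01", "2015-07-01"]
        ),
        "Country": ["Brazil", "Brazil", "Argentina", "Argentina"],
        "Price in US Dollars": [3.35, 4.28, 2.39, 3.07],
    }
).set_index(["Date", "Country"])

# One direction per level: dates descending, countries ascending within each date
mixed = toy.sort_index(ascending=[False, True])

# Sort by level 1 (Country); levels are numbered from 0
by_country = toy.sort_index(level=1, ascending=True)

print(mixed)
print(by_country)
```

Neither call uses inplace = True, so `toy` itself stays in its original order.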

In [None]:
bigmac.head(3)

<div class = "alert alert-block alert-info">
Now let's check out the index attributes - look at what we get when we check the index and its type

In [None]:
#bigmac.index
#bigmac.index.names
type(bigmac.index)

<div class = "alert alert-block alert-info">
And what if we try to access the first entry of the index?

In [None]:
bigmac.index[0]
#Notice it's a tuple consisting of a timestamp and a string

<img src="Images\Slide5.PNG" />

<div class = "alert alert-block alert-info">
The next method we'll learn is get_level_values, which gives us all the index values at the level we choose (by position or by name)

In [None]:
#bigmac.index.get_level_values(0)
#bigmac.index.get_level_values("Date")
#bigmac.index.get_level_values(1)
#bigmac.index.get_level_values("Country")
#bigmac.index.get_level_values(2) #raises an IndexError - we only have levels 0 and 1
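Here is a small self-contained sketch of the same ideas, assuming a hand-built two-level index (the `idx` object below is hypothetical, not the real bigmac index):

```python
import pandas as pd

# Hypothetical two-level index with the same level names as bigmac
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2016-01-01"), "Brazil"),
        (pd.Timestamp("2016-01-01"), "China"),
    ],
    names=["Date", "Country"],
)

first = idx[0]  # one entry of a MultiIndex is a tuple: (Timestamp, str)

dates = idx.get_level_values("Date")   # same result as get_level_values(0)
countries = idx.get_level_values(1)    # same result as get_level_values("Country")

# Asking for a level that does not exist raises an IndexError
try:
    idx.get_level_values(2)
    raised = False
except IndexError:
    raised = True

print(first, list(countries), raised)
```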

<div class = "alert alert-block alert-info">
We can also use the .set_names method to change the names of our index levels

In [None]:
bigmac.index.set_names(["Zman", "Location"], inplace = True)

In [None]:
bigmac.head(3)
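As a side note, set_names can also return a renamed copy instead of working in place, and it can rename a single level. A minimal sketch, assuming a hand-built index and made-up new names:

```python
import pandas as pd

# Hypothetical index with the same level names as bigmac
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2016-01-01"), "Brazil")],
    names=["Date", "Country"],
)

# Without inplace = True we get a new index back and idx is left untouched
renamed = idx.set_names(["When", "Where"])

# Rename only one level by passing its position (or its current name)
partial = idx.set_names("Place", level=1)

print(list(idx.names), list(renamed.names), list(partial.names))
```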

In [None]:
#now let's return things to normal
bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)

<img src="Images\Slide6.PNG" />

<div class = "alert alert-block alert-info">
Next, let's see how we can extract rows from a MultiIndex. <br>
We'll do this using the .loc and .iloc indexers, passing .loc a tuple with one value per index level

In [None]:
bigmac.loc[("2010-01-01")] #Here we filtered only on the first index level - without a comma, ("2010-01-01") is just a string, not a tuple
bigmac.loc[("2010-01-01", "Brazil")] #Here we filtered on both levels, but not on columns
bigmac.loc[("2010-01-01", "Brazil"), "Price in US Dollars"] #Here we filtered on both levels and on a column - this would be more useful if we had more columns...

In [None]:
#we can also use iloc, but it is purely positional: it ignores the MultiIndex levels and just counts rows, so filtering by level is harder
bigmac.iloc[0, 0]
bigmac.iloc[4,0]
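The sketch below repeats these lookups on a tiny hand-made frame so the results are easy to check by eye. The `toy` data is invented, and it uses a `pd.Timestamp` key rather than a date string just to keep the example explicit.

```python
import pandas as pd

# Hypothetical 4-row frame: 2 dates x 2 countries (prices are invented)
toy = pd.DataFrame(
    {"Price in US Dollars": [4.0, 2.5, 5.0, 3.0]},
    index=pd.MultiIndex.from_product(
        [pd.to_datetime(["2010-01-01", "2016-01-01"]), ["Brazil", "China"]],
        names=["Date", "Country"],
    ),
)

ts = pd.Timestamp("2010-01-01")

one_date = toy.loc[ts]                 # DataFrame: every country on that date
one_row = toy.loc[(ts, "Brazil")]      # Series: the single matching row
one_cell = toy.loc[(ts, "Brazil"), "Price in US Dollars"]  # scalar

# iloc only knows positions: row 0, column 0 is the same cell
same_cell = toy.iloc[0, 0]

print(one_date)
print(one_cell, same_cell)
```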


In [None]:
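bigmac.loc[("2016-01-01", "China"), bigmac.columns[0]] #.ix used to allow mixing labels and positions like this, but it was deprecated in pandas 0.20 and removed in 1.0, so we look up the column label by position instead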

<img src="Images\Slide7.PNG" />

## The `.transpose()`  and `.swaplevel()` Methods

<div class = "alert alert-block alert-info">
Next, we'll learn about the transpose method - which basically flips the axes of the DataFrame, so rows become columns and columns become rows - let's take a look:

In [None]:
bigmac.head(3)

In [None]:
bigmac.transpose()

In [None]:
bigmac = bigmac.transpose()

<div class = "alert alert-block alert-info">
This can sometimes be VERY useful - for example when you have a series and want to turn it into a list of features. <br>
Now let's observe how we extract from our transposed dataframe:

In [None]:
bigmac.loc["Price in US Dollars", ("2016-01-01", "Denmark")]
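A minimal sketch of what transposing does to a MultiIndex, assuming a small invented frame: the two-level row index becomes a two-level column index, so the tuple moves to the column position of .loc.

```python
import pandas as pd

# Hypothetical 4-row frame: 2 dates x 2 countries (prices are invented)
toy = pd.DataFrame(
    {"Price in US Dollars": [4.0, 2.5, 5.0, 3.0]},
    index=pd.MultiIndex.from_product(
        [pd.to_datetime(["2010-01-01", "2016-01-01"]), ["Brazil", "China"]],
        names=["Date", "Country"],
    ),
)

flipped = toy.transpose()   # shape goes from (4, 1) to (1, 4)

# Row label first, then the (Date, Country) tuple as the column label
cell = flipped.loc["Price in US Dollars", (pd.Timestamp("2016-01-01"), "China")]

print(flipped.shape, cell)
```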

In [None]:
#now let's put things back to normal
bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)

In [None]:
bigmac.swaplevel() #swaps the two index levels (Country becomes level 0, Date level 1) and returns a new DataFrame
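A quick sketch of swaplevel on an invented frame. Note that only the order of the levels changes; the rows stay in the order they were in, so you may want to call sort_index() afterwards.

```python
import pandas as pd

# Hypothetical 4-row frame: 2 dates x 2 countries (prices are invented)
toy = pd.DataFrame(
    {"Price in US Dollars": [4.0, 2.5, 5.0, 3.0]},
    index=pd.MultiIndex.from_product(
        [pd.to_datetime(["2010-01-01", "2016-01-01"]), ["Brazil", "China"]],
        names=["Date", "Country"],
    ),
)

swapped = toy.swaplevel()            # by default swaps the two innermost levels
resorted = swapped.sort_index()      # optional: sort by Country, then Date

print(list(swapped.index.names))
print(resorted)
```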

## Pivot, Stack, Unstack, Melt

In [None]:
pd.read_csv("worldstats.csv", index_col = ["country", "year"])
world = pd.read_csv("worldstats.csv", index_col = ["country", "year"])
world.head(3)

In [None]:
world.stack()

In [None]:
#To prove the column names became an index level - we'll turn the series into a Dataframe
world.stack().to_frame()

<div class = "alert alert-block alert-info">
Our column is called 0 because the stacked series has no name, so pandas falls back to its default. <br>
<br>
Next, let's look at unstack, which does, as you guessed, exactly the opposite.

In [None]:
s = world.stack()
s.head(3)

In [None]:
s.unstack()
#s.unstack().unstack()
#s.unstack().unstack().unstack()
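Here is a small sketch of the stack / unstack round trip that does not need worldstats.csv. The `world` frame, its column names and its numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for worldstats.csv (numbers are invented)
world = pd.DataFrame(
    {
        "GDP": [1.0, 1.5, 2.0, 2.5],
        "Population": [100.0, 110.0, 200.0, 210.0],
    },
    index=pd.MultiIndex.from_product(
        [["A", "B"], [2000, 2001]], names=["country", "year"]
    ),
)

s = world.stack()          # column names become a third, innermost index level
as_frame = s.to_frame()    # the single column is named 0 by default

roundtrip = s.unstack()                # innermost level goes back to columns
by_country = s.unstack("country")      # or pick a level by name (or position)

print(s)
print(by_country)
```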

<div class = "alert alert-block alert-info">
We can also give unstack a level number, if we don't want it to choose the innermost level by default <br>

In [None]:
s.unstack(0)
#s.unstack(1)
#s.unstack(2)
#s.unstack(-1)
#s.unstack(-2)

<div class = "alert alert-block alert-info">
We can also unstack by level name.

In [None]:
s.unstack(level = ["year", "country"])

<div class = "alert alert-block alert-info">
And last, pandas sometimes creates cells with no values in order to keep the DataFrame rectangular. <br>
The fill_value parameter allows us to replace the default NaN with a value of our choice

In [None]:
s.unstack(1)

In [None]:
s = s.unstack("year", fill_value = 0)

In [None]:
s.head()
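To make the effect of fill_value easy to see, here is a minimal sketch with an invented series in which one country / year combination is missing:

```python
import pandas as pd

# Hypothetical series: country B has no value for 2001
s_toy = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.MultiIndex.from_tuples(
        [("A", 2000), ("A", 2001), ("B", 2000)], names=["country", "year"]
    ),
)

with_nan = s_toy.unstack("year")               # the missing cell becomes NaN
filled = s_toy.unstack("year", fill_value=0)   # the missing cell becomes 0

print(with_nan)
print(filled)
```

Keep in mind that a fill value of 0 is only appropriate when "missing" really means zero in your data.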

<div class = "alert alert-block alert-info">
Next we'll learn about the pivot method. <br>
But first, a quick reminder of what a pivot is:

![slide1](pivot_excel.png)

<div class = "alert alert-block alert-info">
Pivot is extremely simple in pandas - <br>
Let's start by loading a new dataframe. This dataset contains the yearly revenue of 5 salesmen of a fictional company.

In [None]:
pd.read_csv("salesmen.csv")

In [None]:
sales = pd.read_csv("salesmen.csv", parse_dates = ["Date"]) #Notice we parsed the dates
sales["Salesman"] = sales["Salesman"].astype("category") #Notice we changed this column to "category" - why?
sales.head(3)

In [None]:
sales.pivot(index = "Date", columns="Salesman", values="Revenue")
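The same reshaping on a tiny invented dataset (the names and numbers below are not from salesmen.csv). The sketch also shows, as an aside, that pivot refuses to reshape when an index / column pair appears more than once.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, Salesman)
sales_toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"]
        ),
        "Salesman": ["Bob", "Dave", "Bob", "Dave"],
        "Revenue": [100, 200, 150, 250],
    }
)

# One row per date, one column per salesman
wide = sales_toy.pivot(index="Date", columns="Salesman", values="Revenue")

# A duplicated (Date, Salesman) pair makes the reshape ambiguous
dupes = pd.concat([sales_toy, sales_toy.iloc[[0]]])
try:
    dupes.pivot(index="Date", columns="Salesman", values="Revenue")
    raised = False
except ValueError:
    raised = True

print(wide)
print("duplicates rejected:", raised)
```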

<div class = "alert alert-block alert-info">
Wow - that makes much more sense! <br>
Notice that this DF is 1/5 as long as our previous DF <br>
<br>
Next, let's look at pivot_table. A pivot table, much like in Excel, allows us to pivot and combine that with an aggregation function. <br>
Let's import a new dataset of customers in a fictional restaurant, and what they like to order

In [None]:
foods = pd.read_csv("foods.csv")
foods.head(3)

In [None]:
#foods.pivot_table(values="Spend", index="Gender", aggfunc="sum")
#foods.pivot_table(values="Spend", index="Item", aggfunc="sum")
#foods.pivot_table(values="Spend", index=["Gender", "Item"], aggfunc="sum")
#foods.pivot_table(values="Spend", index=["Gender", "Item"], aggfunc="mean")
#foods.pivot_table(values="Spend", index=["Gender", "Item"], columns= "City", aggfunc="mean")
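A minimal sketch of two of these calls on five invented rows, small enough to verify the aggregates by hand (the data is not from foods.csv):

```python
import pandas as pd

# Hypothetical orders (values are invented)
foods_toy = pd.DataFrame(
    {
        "Gender": ["F", "F", "M", "M", "F"],
        "Item": ["Burger", "Burger", "Burger", "Salad", "Salad"],
        "City": ["NY", "LA", "NY", "NY", "NY"],
        "Spend": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# Total spend per gender: F = 10 + 20 + 50, M = 30 + 40
total = foods_toy.pivot_table(values="Spend", index="Gender", aggfunc="sum")

# Mean spend per (Gender, Item), with one column per city;
# combinations that never occur show up as NaN
mean_by_city = foods_toy.pivot_table(
    values="Spend", index=["Gender", "Item"], columns="City", aggfunc="mean"
)

print(total)
print(mean_by_city)
```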

<div class = "alert alert-block alert-info">
Pop quiz - What will the dataframes produced by the following 2 lines look like?

In [None]:
#foods.pivot_table(values="Spend", index=["Gender", "Item"], columns= "City", aggfunc="min")
#foods.pivot_table(values="Spend", index=["City", "Item"], columns= "Frequency", aggfunc="mean")

<div class = "alert alert-block alert-info">
Lastly, we'll learn the pd.melt function, which "melts" several columns into two: one holding the old column names and one holding their values. <br>
Melt UNPIVOTS a dataframe, and we can choose which columns to keep as identifiers <br>
It'll make more sense when you see it - let's start with importing a very small dataset of 9 salesmen and how much they sold each quarter

In [None]:
sales = pd.read_csv("quarters.csv")
sales

In [None]:
pd.melt(sales, id_vars="Salesman")
#pd.melt(sales)

In [None]:
#We can also change the name of our new variable and value columns
pd.melt(sales, id_vars = "Salesman", var_name = "Quarter", value_name = "Revenue")
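To check the claim that melt is the reverse of pivot, here is a short sketch on an invented two-salesman table (not the real quarters.csv):

```python
import pandas as pd

# Hypothetical wide table: one column per quarter
quarters_toy = pd.DataFrame(
    {"Salesman": ["Bob", "Dave"], "Q1": [1, 2], "Q2": [3, 4]}
)

# Wide -> long: one row per (Salesman, Quarter)
long = pd.melt(
    quarters_toy, id_vars="Salesman", var_name="Quarter", value_name="Revenue"
)

# Long -> wide again with pivot
back = long.pivot(index="Salesman", columns="Quarter", values="Revenue")

print(long)
print(back)
```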

<img src="tweet.jpg" width = '500'/>