# ADVANCED PANDAS: DATA RESHAPING & PIVOTING

## Course Outline:
- Introduction to Data Wrangling
    - Case Study: Data Preprocessing for Absolute Beginners
- Data Cleaning & Preparation
    - Data Cleaning (Missing & Duplicated Data)
    - String Manipulation (Regular Expression)
    - Data Transformation
- Merging, Joining, and Concatenating Data
    - concat()
    - merge()
    - join()
- Aggregation and Grouping
    - groupby()
- ***Reshaping and Pivoting***
    - ***pivot_table()***
    - ***melt()***

##### Importing Libraries & Datasets

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

In [None]:
titanic = sns.load_dataset('titanic')
titanic

==========

# Reshaping & Pivoting

## Reshaping Data
- Why?
    - Tidy datasets
    - Readability (Human vs. Statistics)
    - Simplicity
    - Summary Statistics for Multi-Level Indexing 
- What?
    - Wide vs. Long Format
- Transformation
    - Long to Wide Using *pivot() & pivot_table()* Functions
    - Wide to Long Using *melt()* Function
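
As a rough illustration of the two formats listed above, the sketch below builds a tiny wide table and converts it to long format and back. The `student`/`subject`/`score` data is made up for this example and is not part of the course datasets.

```python
import pandas as pd

# Hypothetical toy data: one row per student, one column per subject (wide format).
wide = pd.DataFrame({
    "student": ["Ann", "Ben"],
    "math": [90, 75],
    "physics": [85, 80],
})

# Wide -> long: each (student, subject) pair becomes its own row.
long = wide.melt(id_vars="student", var_name="subject", value_name="score")

# Long -> wide: spread the subjects back out into columns.
wide_again = long.pivot(index="student", columns="subject", values="score")

print(long)
print(wide_again)
```

The long table has one observation per row, which is the shape most grouping and plotting tools expect. The wide table is usually easier for a person to scan.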

##### Long-to-Wide Transformation Example

In [None]:
from IPython.display import Image
Image("data/pivot-example.png")

##### Wide-to-Long Transformation Example

In [None]:
from IPython.display import Image
Image("data/melt-example.png")

==========

### Long-to-Wide Transformation

In [None]:
from IPython.display import Image
Image("data/pivot.png")

##### Reshaping with pivot() function

In [None]:
titanic.head()

In [None]:
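# pivot() only reshapes; it cannot aggregate. The titanic data has many rows
# for each (sex, pclass) pair, so this call raises
# "ValueError: Index contains duplicate entries, cannot reshape".
pd.pivot(titanic, index='sex', columns='pclass', values='fare')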

##### Reshaping with pivot_table()
pivot() cannot aggregate, so it fails whenever the same index/column pair occurs more than once. pivot_table() handles such duplicates by aggregating them (mean by default), so we will use it instead.
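
The difference can be sketched on a small hand-made frame (the `tickets` data below is hypothetical, chosen so that one `(sex, pclass)` pair appears twice):

```python
import pandas as pd

# Hypothetical toy data with two rows for the same (sex, pclass) pair.
tickets = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "pclass": [1, 1, 1, 2],
    "fare": [80.0, 100.0, 50.0, 20.0],
})

# pivot() only reshapes; it cannot decide what to do with duplicate pairs.
try:
    tickets.pivot(index="sex", columns="pclass", values="fare")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() aggregates duplicates (mean by default), so it succeeds.
table = tickets.pivot_table(index="sex", columns="pclass", values="fare")
print(table)
```

Here the two first-class female fares are averaged into a single cell, and the missing `("female", 2)` combination is left as NaN.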

In [None]:
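# Grouping data using index in a Pivot Table
# Recent pandas versions (2.x) raise an error when asked to average text or
# categorical columns, so restrict values to the numeric and boolean columns.
value_cols = titanic.select_dtypes(['number', 'bool']).columns
pd.pivot_table(data=titanic, index=['sex'], values=value_cols)

# Equivalent:
# titanic.pivot_table(index='sex', values=value_cols)
# titanic.groupby('sex')[value_cols].mean()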

In [None]:
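# Pivot Table with multi-index
# (numeric and boolean columns only; 'pclass' is excluded because it is an index key)
value_cols = titanic.select_dtypes(['number', 'bool']).columns.drop('pclass')
pd.pivot_table(titanic, index=['sex','pclass'], values=value_cols)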

In [None]:
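# Aggregation functions in Pivot Tables: a dict maps each column to its own function
# (string names avoid the FutureWarning that recent pandas emits for np.mean / np.sum)
pd.pivot_table(titanic,index=['sex','pclass'],aggfunc={'age':'mean','survived':'sum'})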

In [None]:
# margins=True appends an 'All' row holding the grand total
pd.pivot_table(titanic,index=['sex','pclass'],values=['survived'], aggfunc='sum', margins=True)

In [None]:
# columns= spreads the pclass values across the column axis
pd.pivot_table(titanic,index=['sex'],columns=['pclass'],values=['survived'],aggfunc='sum')

In [None]:
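# Mean age per (sex, survived, pclass) and embark town;
# empty cells are filled with the overall mean age
pd.pivot_table(titanic,
               index=['sex','survived','pclass'],
               columns=['embark_town'],
               values=['age'],
               aggfunc='mean',
               fill_value=titanic['age'].mean(),
               margins=True)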
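
To make the effect of `fill_value` and `margins` easier to see, here is a minimal sketch on a hand-made frame. The `passengers` data is hypothetical and deliberately has no row for the `("male", 2)` combination.

```python
import pandas as pd

# Hypothetical toy data; no row exists for ("male", 2).
passengers = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "pclass": [1, 2, 1, 1],
    "survived": [1, 1, 0, 1],
})

table = pd.pivot_table(
    passengers,
    index="sex",
    columns="pclass",
    values="survived",
    aggfunc="sum",
    fill_value=0,   # replaces the missing ("male", 2) cell
    margins=True,   # appends an "All" row and column with totals
)
print(table)
```

Without `fill_value`, the `("male", 2)` cell would be NaN. The `All` row and column are totals computed with the same `aggfunc`.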

==========

### Wide-to-Long Transformation

In [None]:
from IPython.display import Image
Image("data/melt.png")

##### Undoing Pivoting Using melt() function

In [None]:
df = pd.DataFrame({'first': {0: 'John', 1: 'Mary'},
                   'last': {0: 'Doe', 1: 'Bo'},
                   'height': {0: 5.5, 1: 6.0},
                   'weight': {0: 120, 1: 135}})
df

In [None]:
df.melt(id_vars=['first','last'], value_vars=['height','weight'],var_name='W/H',value_name='Results')
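
The arguments of `melt()` can be varied as in the sketch below. It rebuilds the same small frame so that it runs on its own; the names `measure` and `result` are arbitrary choices for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [120, 135],
})

# Without value_vars, every non-id column is melted; the new columns
# get the default names "variable" and "value".
default_melt = df.melt(id_vars=["first", "last"])

# Restricting value_vars keeps only the listed columns in the long result.
height_only = df.melt(id_vars=["first", "last"], value_vars=["height"],
                      var_name="measure", value_name="result")

print(default_melt)
print(height_only)
```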

==========

### Stacking & Unstacking

##### Reshaping with stack() and unstack() on a Multi-Index

In [None]:
from IPython.display import Image
Image("data/stack.png")

In [None]:
from IPython.display import Image
Image("data/unstack.png")

In [None]:
titanic.head()

In [None]:
# Multi-Indexing
titanic.set_index(['sex','pclass'])['fare']

In [None]:
titanic.set_index(['sex','pclass'])['fare'].index

In [None]:
titanic_pivoted = titanic.pivot_table(index='sex', columns='pclass', values='fare')
titanic_pivoted

# titanic.groupby(['sex','pclass'])['fare'].mean().unstack()
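
The equivalence between `pivot_table()` and `groupby()` followed by `unstack()` can be checked on a small hand-made frame (the `tickets` data below is hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the titanic fares.
tickets = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male"],
    "pclass": [1, 2, 1, 1, 2],
    "fare": [80.0, 20.0, 60.0, 40.0, 10.0],
})

# Route 1: pivot_table() with its default aggregation (mean).
via_pivot = tickets.pivot_table(index="sex", columns="pclass", values="fare")

# Route 2: group on both keys, average, then move pclass to the columns.
via_groupby = tickets.groupby(["sex", "pclass"])["fare"].mean().unstack()

print(via_pivot)
print((via_pivot == via_groupby).all().all())
```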

In [None]:
titanic_pivoted.stack()

In [None]:
titanic_pivoted.unstack()
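
What `stack()` and `unstack()` do to the index levels is sketched below on a hand-made table shaped like `titanic_pivoted`. The fare numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical toy table: fare by sex (rows) and pclass (columns).
fares = pd.DataFrame(
    {1: [100.0, 70.0], 2: [22.0, 20.0]},
    index=pd.Index(["female", "male"], name="sex"),
)
fares.columns.name = "pclass"

# stack(): columns become the innermost index level,
# giving a Series indexed by (sex, pclass).
stacked = fares.stack()

# unstack(): the innermost index level moves back to the columns,
# which restores the original layout.
restored = stacked.unstack()

# On a DataFrame with a single-level index, unstack() also returns a Series,
# but with the column level outermost: (pclass, sex).
unstacked = fares.unstack()

print(stacked)
print(unstacked)
```

Both calls hold the same four values. Only the order of the index levels differs.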

In [None]:
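# numeric_only=True skips text and categorical columns, which cannot be summed meaningfully
titanic_gender = titanic.groupby('sex').sum(numeric_only=True)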
titanic_gender

In [None]:
titanic_gender.unstack()

In [None]:
titanic_gender.stack()

==========

# THANK YOU!