# Manipulate Fortune Data

Fortune output is usually laid out as a wide report rather than in a format that is easy to reuse. Before we can join the data to another source, pivot it along a different dimension, or upload it to another database, we first need to unpivot it into a long format. Excel offers no convenient way to do this.
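To make "unpivot" concrete, here is a minimal sketch on made-up numbers. The category names, the `FY17`/`FY18` columns and the `targets` table are hypothetical and only illustrate the change in shape. `pd.melt` turns one column per period into one row per (category, period) pair. The long result can then be joined to another table with a single `merge` on the shared `period` key.

```python
import pandas as pd

# Hypothetical wide table: one row per category, one column per period
wide = pd.DataFrame({
    "category": ["Retail", "FBA"],
    "FY17": [100.0, 250.0],
    "FY18": [120.0, 300.0],
})

# Unpivot: each (category, period) pair becomes its own row
long = pd.melt(wide, id_vars=["category"], var_name="period", value_name="value")
print(long)

# Hypothetical second source keyed by period; in long format the join is one merge call
targets = pd.DataFrame({"period": ["FY17", "FY18"], "target": [300.0, 400.0]})
joined = long.merge(targets, on="period", how="left")
print(joined)
```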

# Tutorial Overview
In this module, we will:
1. import an Excel file produced by Fortune
2. remove the empty columns and rows
3. transform the data from wide to long format (unpivot)
4. export the result to CSV

```python
import os

import pandas as pd

# Read the sheet without a header row, skipping its first 29 rows
path = os.path.join(os.path.dirname(os.getcwd()), 'Support_Files', 'fortune_data.xlsx')
df = pd.read_excel(path, skiprows=29, header=None)

id_cols = ['parent_category', 'row_num', 'child_category']  # names for the columns with missing headers
df = df.astype({col: object for col in df.columns[1:4]})  # let these columns hold text labels
df.iloc[0, 1:4] = 'category'
df.iloc[1, 1:4] = id_cols

df.columns = pd.MultiIndex.from_arrays(df.iloc[:2].values)  # create multiindex column names from the first two rows
df = df.iloc[2:].infer_objects()  # remove the two header rows and restore numeric dtypes

df = df.dropna(axis=1, how='all')  # remove any columns with only null values
df = df.dropna(axis=0, how='all')  # remove any rows with only null values
df = df.ffill()  # forward-fill remaining null values, such as blank category labels

df = pd.melt(df, id_vars=[('category', col) for col in id_cols])  # unpivot data
df.columns = id_cols + ['period', 'metric', 'value']

df.to_csv('fortune_output.csv', index=False)  # output to CSV
df.to_clipboard()  # optional: copy to the clipboard for pasting into Excel
df.head()
```

| | parent_category | row_num | child_category | period | metric | value |
|---|---|---|---|---|---|---|
| 0 | GMS | 30.0 | Total GMS | FY17 | Actuals | 2037570000.0 |
| 1 | GMS | 31.0 | Amazon Retail | FY17 | Actuals | 107990900.0 |
| 2 | GMS | 32.0 | FBA | FY17 | Actuals | 1793968000.0 |
| 3 | GMS | 33.0 | MFN | FY17 | Actuals | 47429410.0 |
| 4 | Units | 35.0 | Total Units Served | FY17 | Actuals | 156563400.0 |
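The header handling is the least obvious part of the cell. The first two rows become a two-level `MultiIndex`, and `pd.melt` then produces one output column per header level (here `period` and `metric`). The sketch below replays those steps on a small hand-built frame so they can be run without the workbook. Its layout is an assumption, not the real Fortune export: an empty first column, a period row, a metric row, a parent label left blank under what would be a merged cell, and a blank spacer row. Unlike the tutorial cell, it forward-fills only the three label columns.

```python
import pandas as pd

# Hypothetical stand-in for the sheet after skiprows (not the real Fortune layout):
# row 0 holds periods, row 1 holds metrics, column 0 is empty,
# and one blank spacer row separates the GMS and Units blocks.
raw = pd.DataFrame(
    [
        [None, None, None, None, "FY17", "FY17"],
        [None, None, None, None, "Actuals", "Plan"],
        [None, "GMS", 30, "Total GMS", 10.0, 11.0],
        [None, None, 31, "FBA", 6.0, 7.0],  # parent label blank, as under a merged cell
        [None, None, None, None, None, None],  # spacer row
        [None, "Units", 35, "Total Units", 3.0, None],  # Plan value genuinely missing
    ],
    dtype=object,
)

id_cols = ["parent_category", "row_num", "child_category"]
raw.iloc[0, 1:4] = "category"
raw.iloc[1, 1:4] = id_cols

# Promote the two header rows to a two-level column index, then drop them
df = raw.iloc[2:].infer_objects()
df.columns = pd.MultiIndex.from_arrays(raw.iloc[:2].values)

# Remove all-empty columns and rows
df = df.dropna(axis=1, how="all").dropna(axis=0, how="all").copy()

# Forward-fill only the label columns, leaving missing values as NaN
id_tuples = [("category", col) for col in id_cols]
for col in id_tuples:
    df[col] = df[col].ffill()

# Each header level becomes its own column in the long output
long = pd.melt(df, id_vars=id_tuples)
long.columns = id_cols + ["period", "metric", "value"]
print(long)
```

Restricting the forward fill to the label columns is a judgment call. In this toy data the missing `Units`/`Plan` value stays `NaN` instead of inheriting `7.0` from the row above. Filling the whole frame, as the tutorial does, is only safe if a blank value cell in the real export means "same as above", which the tutorial does not say.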


# Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.
* Describe three situations in which Pandas would be a better choice than working in Excel directly.

# Further Reading
This section provides more resources on the topic if you are looking to go deeper.

## Books
* Python for Data Analysis, by William McKinney. http://shop.oreilly.com/product/0636920023784.do

## APIs
* Pandas. https://pandas.pydata.org/

## Articles
* Getting started with Pandas in 5 minutes, on Medium. https://medium.com/bhavaniravi/python-pandas-tutorial-92018da85a33
* My Pandas Cheat Sheet, on Towards Data Science. https://towardsdatascience.com/my-python-pandas-cheat-sheet-746b11e44368

# Summary

In this tutorial, you worked with data from Fortune using Pandas. Specifically, you learned:
* How to import an Excel file produced by Fortune
* How to remove empty columns and rows
* How to transform the data from wide to long format (unpivot)
* How to export the result to CSV

# Next

In the next section, you will use Pandas to work with additional datasets. 