# 2021: Week 3

January 20, 2021

This week is the third in the Starter Challenges series we are running to kick off 2021.  

This challenge introduces the two main ways of reshaping data: pivoting and aggregation. We used aggregation last week, so this week builds on that technique. You will also work with Unions, where data sets with similar structures are stacked on top of each other.

As with all of our January 2021 challenges, we will share links to videos and articles that give more technical support if you need it.

This week's challenge sees us looking at the Accessory Sales at our Bike Store.
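You can preview the three techniques named in this introduction in pandas on a toy data set before opening the real workbook. This is a minimal sketch. The two store tables and their numbers are made up for illustration. `pd.concat`, `DataFrame.melt` and `groupby` are the pandas counterparts chosen here for the union, pivot and aggregation steps.

```python
import pandas as pd

# Two toy store tables with the same structure (made-up numbers)
leeds = pd.DataFrame({"Date": ["2021-01-21", "2021-02-21"],
                      "New - Bags": [3, 5],
                      "Existing - Bags": [1, 2]})
york = pd.DataFrame({"Date": ["2021-01-21", "2021-02-21"],
                     "New - Bags": [4, 0],
                     "Existing - Bags": [6, 2]})

# Union: stack the tables and record which store each row came from
union = pd.concat([leeds.assign(Store="Leeds"), york.assign(Store="York")],
                  ignore_index=True)

# Pivot (columns to rows): one row per Date, Store and former column header
long = union.melt(id_vars=["Date", "Store"], var_name="Header", value_name="Sold")

# Aggregation: collapse the rows to one total per store
totals = long.groupby("Store", as_index=False)["Sold"].sum()
print(totals)
```

Each step changes the row count predictably. The union keeps 4 rows. The pivot doubles them to 8, one per value column. The aggregation then collapses them to one row per store.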

# Input

One Excel workbook with 5 tabs (worksheets) of data:
* London
* Leeds
* York
* Manchester
* Birmingham

<img src='https://1.bp.blogspot.com/-SkE1ID3HbKk/YABuumsbvQI/AAAAAAAACHM/GeuEChI9z-Qhc7n6NwEej6fTXSlC-Bs9wCLcBGAsYHQ/w640-h180/Screenshot%2B2021-01-14%2Bat%2B16.17.49.png'>

The structure of the data is the same in each worksheet. 

# Requirements
* Input the data source by pulling together all the tables (help)
* Pivot 'New' columns and 'Existing' columns (help)
* Split the former column headers to form: (help)
    * Customer Type
    * Product
* Rename the measure created by the Pivot as 'Products Sold'
* Create a Store column from the data
* Remove any unnecessary data fields
* Turn Date into Quarter (help)
* Aggregate to form two separate outputs of the number of products sold by: (help)
    1. Product, Quarter
    2. Store, Customer Type, Product
* Output each data set as a csv file (help)

# Output
Product Quarter Output

<img src='https://1.bp.blogspot.com/-CBupnRR1gq0/YABt-8HOz5I/AAAAAAAACHA/5wL8p38ZlNsTkBWFfVqTv1sn2Ju4-MoiwCLcBGAsYHQ/w400-h290/Screenshot%2B2021-01-14%2Bat%2B16.14.27.png'>

3 Data Fields:
* Product
* Quarter
* Products Sold

16 rows (17 rows including headers)

Store Customer Product Output

<img src='https://1.bp.blogspot.com/-kfx7Rw9syhE/YABuI15OKmI/AAAAAAAACHE/RpH8HUPChOYT8qE26JL9MvAYaZhW32IxwCLcBGAsYHQ/w640-h364/Screenshot%2B2021-01-14%2Bat%2B16.15.20.png'>

4 Data Fields:
* Store
* Customer Type
* Product
* Products Sold

40 rows (41 rows including headers)

```python
import pandas as pd
```

```python
# Use a name that does not shadow Python's built-in input()
input_path = 'PD 2021 Wk 3 Input.xlsx'
excel = pd.ExcelFile(input_path)
print(excel.sheet_names)
```

```
['Manchester', 'London', 'Leeds', 'York', 'Birmingham']
```

```python
# Input the data source by pulling together all the tables (a union)
# and create a Store column from each sheet name
mini_dfs = []
for sheet in excel.sheet_names:
    temp_df = excel.parse(sheet_name=sheet)
    temp_df['Store'] = sheet
    mini_dfs.append(temp_df)
# ignore_index=True gives the stacked table a fresh 0..n-1 index
df = pd.concat(mini_dfs, axis=0, ignore_index=True)
print(df.head(5))
df.info()
```

```
        Date  New - Saddles  New - Mudguards  New - Wheels  New - Bags  \
0 2021-01-21           13.0             42.0          19.0        38.0   
1 2021-02-21            1.0              9.0          14.0         6.0   
2 2021-03-21            8.0             22.0           6.0        35.0   
3 2021-04-21            3.0              9.0           8.0        16.0   
4 2021-05-21            2.0              8.0           5.0        34.0   

   Existing - Saddles  Existing - Mudguards  Existing - Wheels  \
0                17.0                  48.0               19.0   
1                 2.0                   4.0               19.0   
2                 0.0                  48.0               17.0   
3                18.0                  50.0               18.0   
4                17.0                   3.0               12.0   

   Existing - Bags       Store  
0             13.0  Manchester  
1             24.0  Manchester  
2             16.0  Manchester  
3             25.0  Manche
```
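Here is a possible alternative to the loop above, sketched under the assumption that every sheet has identical columns. `pd.read_excel(path, sheet_name=None)` returns a dict that maps sheet names to DataFrames. Calling `pd.concat` on a dict uses the keys as an outer index level, which can become the Store column. A hand-built dict stands in for the workbook so the sketch runs on its own. Its values are hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None): {sheet name: DataFrame}
sheets = {
    "Leeds": pd.DataFrame({"Date": pd.to_datetime(["2021-01-21"]),
                           "New - Bags": [3], "Existing - Bags": [1]}),
    "York": pd.DataFrame({"Date": pd.to_datetime(["2021-01-21"]),
                          "New - Bags": [4], "Existing - Bags": [6]}),
}

# Make the "same structure in each worksheet" assumption explicit
first_columns = next(iter(sheets.values())).columns
assert all(frame.columns.equals(first_columns) for frame in sheets.values())

# concat on a dict turns the keys into an outer index level named Store;
# reset_index then moves Store into a regular column and drops the old row index
df = (pd.concat(sheets, names=["Store", "Row"])
        .reset_index(level="Store")
        .reset_index(drop=True))
print(df)
```

The assertion makes the structure assumption explicit. A tab with a renamed column then fails loudly instead of silently producing a column of missing values after the union.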

```python
# Pivot 'New' columns and 'Existing' columns into rows.
# Exclude both id columns: df.columns[1:] would also include the Store column added above.
columns_to_pivot = df.columns.drop(['Date', 'Store'])
print(columns_to_pivot)
df_pivot = pd.melt(df, id_vars=['Date', 'Store'], value_vars=columns_to_pivot,
                   var_name='Pivot_name', value_name='Pivot_value')
df_pivot['Pivot_value'] = df_pivot['Pivot_value'].astype(int)
print(df_pivot.head(5))
df_pivot.info()
```

```
Index(['New - Saddles', 'New - Mudguards', 'New - Wheels', 'New - Bags',
       'Existing - Saddles', 'Existing - Mudguards', 'Existing - Wheels',
       'Existing - Bags'],
      dtype='object')
        Date       Store     Pivot_name  Pivot_value
0 2021-01-21  Manchester  New - Saddles           13
1 2021-02-21  Manchester  New - Saddles            1
2 2021-03-21  Manchester  New - Saddles            8
3 2021-04-21  Manchester  New - Saddles            3
4 2021-05-21  Manchester  New - Saddles            2
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480 entries, 0 to 479
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Date         480 non-null    datetime64[ns]
 1   Store        480 non-null    object        
 2   Pivot_name   480 non-null    object        
 3   Pivot_value  480 non-null    int32         
dtypes: datetime64[ns](1), int32(1), object(2)
memory usage: 13.2+ KB
```
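Selecting `value_vars` by position is fragile here, because the Store column appended during the union ends up among `df.columns[1:]`. If `value_vars` is omitted, `melt` unpivots every column not listed in `id_vars`. The sketch below is an alternative, not what the notebook does. It uses a made-up two-row table with the workbook's column naming. It also names the measure 'Products Sold' directly through `value_name`, which would make the later rename unnecessary.

```python
import pandas as pd

# Toy wide table shaped like one store's worksheet after the union (made-up values)
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-21", "2021-02-21"]),
    "Store": ["Leeds", "Leeds"],
    "New - Saddles": [13, 1],
    "Existing - Saddles": [17, 2],
})

# No value_vars: every column not in id_vars is unpivoted, so Store stays an id column
long = wide.melt(id_vars=["Date", "Store"],
                 var_name="Pivot_name",
                 value_name="Products Sold")  # names the measure directly
print(long)
```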


```python
# Split the former column headers to form Customer Type and Product
df_pivot[['Customer Type', 'Product']] = df_pivot['Pivot_name'].str.split(' - ', expand=True)
df_pivot.drop(columns='Pivot_name', inplace=True)
print(df_pivot.head(5))
df_pivot.info()
```

```
        Date       Store  Pivot_value Customer Type  Product
0 2021-01-21  Manchester           13           New  Saddles
1 2021-02-21  Manchester            1           New  Saddles
2 2021-03-21  Manchester            8           New  Saddles
3 2021-04-21  Manchester            3           New  Saddles
4 2021-05-21  Manchester            2           New  Saddles
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480 entries, 0 to 479
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           480 non-null    datetime64[ns]
 1   Store          480 non-null    object        
 2   Pivot_value    480 non-null    int32         
 3   Customer Type  480 non-null    object        
 4   Product        480 non-null    object        
dtypes: datetime64[ns](1), int32(1), object(3)
memory usage: 17.0+ KB
```
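`str.split(' - ', expand=True)` returns one column per piece. The two-column assignment above therefore only works while every header contains exactly one ' - '. A slightly more defensive sketch passes `n=1`, which splits on the first separator only. The third header below, 'New - Bags - Large', is hypothetical and is included only to show the effect.

```python
import pandas as pd

# Column headers as they appear after the pivot; the last one is hypothetical
headers = pd.Series(["New - Saddles", "Existing - Mudguards", "New - Bags - Large"])

# n=1 splits on the first " - " only, so there are always exactly two columns
parts = headers.str.split(" - ", n=1, expand=True)
parts.columns = ["Customer Type", "Product"]
print(parts)
```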


```python
# Rename the measure created by the Pivot as 'Products Sold'
df_pivot.rename(columns={'Pivot_value': 'Products Sold'}, inplace=True)
df_pivot.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480 entries, 0 to 479
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           480 non-null    datetime64[ns]
 1   Store          480 non-null    object        
 2   Products Sold  480 non-null    int32         
 3   Customer Type  480 non-null    object        
 4   Product        480 non-null    object        
dtypes: datetime64[ns](1), int32(1), object(3)
memory usage: 17.0+ KB
```


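```python
# Remove any unnecessary data fields:
# Pivot_name was already dropped after the split, and Date is dropped
# in the next step once Quarter has been derived from it
```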

```python
# Turn Date into Quarter, then drop Date as it is no longer needed
df_pivot['Quarter'] = df_pivot['Date'].dt.quarter
df_pivot.drop(columns='Date', inplace=True)
df_pivot.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480 entries, 0 to 479
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Store          480 non-null    object
 1   Products Sold  480 non-null    int32 
 2   Customer Type  480 non-null    object
 3   Product        480 non-null    object
 4   Quarter        480 non-null    int64 
dtypes: int32(1), int64(1), object(3)
memory usage: 17.0+ KB
```
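`dt.quarter` returns only the quarter number (1 to 4). That appears sufficient here, since the dates shown are all in 2021. If the data spanned several years, one option is `dt.to_period('Q')`, which keeps the year in the label. Here is a small sketch on three sample dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-01-21", "2021-05-21", "2021-12-21"]))

# Quarter number only (1-4), as used in the cell above
quarter_number = dates.dt.quarter

# Quarter with its year, e.g. "2021Q1", which stays unambiguous across years
quarter_label = dates.dt.to_period("Q").astype(str)
print(quarter_number.tolist(), quarter_label.tolist())
```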


```python
# Aggregate to form two separate outputs of the number of products sold
# Output 1: Product, Quarter
# Selecting the column before summing keeps the required field name 'Products Sold'
output1 = df_pivot.groupby(['Product', 'Quarter'], as_index=False)['Products Sold'].sum()
print(output1.head(5))
```

```
     Product  Quarter  Products Sold
0       Bags        1            683
1       Bags        2            593
2       Bags        3            564
3       Bags        4            541
4  Mudguards        1           1006
```

```python
# Output 2: Store, Customer Type, Product
output2 = df_pivot.groupby(['Store', 'Customer Type', 'Product'], as_index=False)['Products Sold'].sum()
print(output2.head(5))
```

```
        Store Customer Type    Product  Products Sold
0  Birmingham      Existing       Bags            218
1  Birmingham      Existing  Mudguards            266
2  Birmingham      Existing    Saddles            185
3  Birmingham      Existing     Wheels             78
4  Birmingham           New       Bags            312
```
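The cells above do not cover the final requirement, outputting each data set as a CSV file. Here is a minimal sketch, assuming `output1` and `output2` as built above. The one-row frames are stand-ins that copy values from the printed outputs. The file names are hypothetical. Writing to a temporary directory keeps the example self-contained. `index=False` stops pandas from writing its row index as an extra unnamed column.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-ins for output1 and output2; each row copies values printed above
output1 = pd.DataFrame({"Product": ["Bags"], "Quarter": [1], "Products Sold": [683]})
output2 = pd.DataFrame({"Store": ["Birmingham"], "Customer Type": ["Existing"],
                        "Product": ["Bags"], "Products Sold": [218]})

# Hypothetical output folder and file names
out_dir = Path(tempfile.mkdtemp())
path1 = out_dir / "output1.csv"
path2 = out_dir / "output2.csv"

# index=False writes only the data fields, matching the required column lists
output1.to_csv(path1, index=False)
output2.to_csv(path2, index=False)

print(path1.read_text())
print(path2.read_text())
```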
