### Prepping Data Challenge: Pareto Parameters (Week 13)

This challenge builds a classic Pareto analysis: how many customers make up 80% of our Sales?

### Requirements
 - Input the data
 - Aggregate the data to the total sales for each customer
 - Calculate the percent of total sales each customer represents
 - Calculate the running total of sales across customers
    - Order by the percent of total in a descending order
    - Round to 2 decimal places
 - Create a parameter that will allow the user to decide the percentage of sales they wish to filter to
 - Output the data, including the parameter in the output name
 - Create a second output that describes the result in plain English
   - e.g. 50% of Customers account for 80% of Sales 

In [1]:
import pandas as pd

In [2]:
# Input the data.
df = pd.read_csv('WK13-Pareto Input.csv')

In [3]:
df.head()

Unnamed: 0,Customer ID,First Name,Surname,Order ID,Sales
0,247,Kevina,Teresse,1087,699.192
1,562,Harry,O'Malley,1088,91.056
2,345,Kristofor,Sprigin,1089,3.928
3,394,Russell,Poore,1090,21.376
4,439,Kelsey,Fardo,1092,697.074


In [4]:
# Aggregate the data to the total sales for each customer
df = df.groupby(['Customer ID','First Name','Surname'], as_index=False)['Sales'].sum()
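
To see what this aggregation does, here is a minimal sketch on made-up orders (the names and figures are hypothetical, not taken from the input file). It assumes the same column layout as the challenge data.

```python
import pandas as pd

# Hypothetical orders: customer 1 placed two orders, customer 2 placed one
orders = pd.DataFrame({
    'Customer ID': [1, 2, 1],
    'First Name': ['Ann', 'Bob', 'Ann'],
    'Surname': ['Lee', 'Ray', 'Lee'],
    'Order ID': [10, 11, 12],
    'Sales': [100.0, 40.0, 60.0],
})

# as_index=False keeps the grouping keys as ordinary columns
per_customer = orders.groupby(
    ['Customer ID', 'First Name', 'Surname'], as_index=False
)['Sales'].sum()
```

The result has one row per customer, and `Order ID` is dropped because it is neither a grouping key nor the aggregated column.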

In [5]:
# Calculate the percent of total sales each customer represents
df["% of Total"] = df['Sales']/df['Sales'].sum()*100
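
A quick sanity check for this step is that the percentages add up to 100. The sketch below uses made-up data (three hypothetical customers, not from the input file).

```python
import pandas as pd

# Hypothetical toy data, not from the challenge input
toy = pd.DataFrame({'Customer ID': [1, 2, 3], 'Sales': [50.0, 30.0, 20.0]})
toy['% of Total'] = toy['Sales'] / toy['Sales'].sum() * 100

# Each value is a percentage, so the column should sum to 100 (up to float error)
total_pct = toy['% of Total'].sum()
```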

In [6]:
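# Calculate the running total of sales across customers:
# order by the percent of total in descending order and round to 2 decimal places.
# The cumulative sum runs over the sorted rows; the assignment aligns on the index,
# so df itself keeps its original row order.
df['Running % of Total Sales'] = df.sort_values('Sales', ascending=False)['% of Total'].cumsum().round(2)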
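
The running total relies on pandas aligning an assigned Series by index label. A small sketch on hypothetical data (the column name `Running %` is shortened for illustration) shows the effect:

```python
import pandas as pd

# Hypothetical toy data: rows are deliberately not in descending order
toy = pd.DataFrame({'Customer ID': [1, 2, 3], '% of Total': [20.0, 50.0, 30.0]})

# Cumulative sum over the rows sorted from largest to smallest share
running = toy.sort_values('% of Total', ascending=False)['% of Total'].cumsum()

# Assignment aligns on the index label, so each row receives its own running total
toy['Running %'] = running

# Sorting afterwards shows the running total in Pareto order
pareto = toy.sort_values('% of Total', ascending=False).reset_index(drop=True)
```

In the unsorted frame the running values look out of order; sorting by share puts them in increasing order.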

In [9]:
# Create a parameter that lets the user decide the percentage of sales to filter to
df['Total Customers'] = len(df)
select_percent = int(input('Enter a percent value: '))

# Keep the customers whose running share of sales is within the selected percent
df_2 = df[df['Running % of Total Sales'] <= select_percent]
print(f'{round(len(df_2) * 100 / len(df))}% of Customers account '
      f'for {select_percent}% of Sales')
df_2

Enter a percent value: 50
22% of Customers account for 50% of Sales


Unnamed: 0,Customer ID,First Name,Surname,Sales,% of Total,Running % of Total Sales,Total Customers
0,1,Thurston,Thresher,5563.560,0.279539,32.85,690
4,6,Shanan,Normanville,7755.620,0.389678,18.88,690
5,7,Vernon,Tomaskov,14473.571,0.727219,2.94,690
12,16,Salem,Gerriets,6106.880,0.306838,29.04,690
21,27,Kristian,Blofeld,4510.797,0.226643,44.44,690
...,...,...,...,...,...,...,...
672,774,Shalom,Harraway,5278.826,0.265233,36.39,690
675,777,Samaria,Ettles,6442.254,0.323689,25.60,690
685,787,Holmes,Yakebovitch,6134.038,0.308203,28.12,690
686,788,Malissa,Keech,6160.102,0.309512,27.81,690
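
Note that a `<=` filter leaves out the customer whose sales push the running total past the selected percent, so the kept customers can account for slightly less than that percent. Below is a minimal sketch of a variant that includes the crossing customer. The helper `customers_to_reach` and the running totals are hypothetical, and the helper assumes the running percentages are in Pareto (ascending) order.

```python
import pandas as pd

def customers_to_reach(running_pct: pd.Series, target: float) -> int:
    """Return how many customers are needed for the running % to reach `target`.

    Assumes `running_pct` is sorted ascending (customers in Pareto order).
    """
    # Rows strictly below the target, plus the one customer that crosses it
    below = int((running_pct < target).sum())
    return min(below + 1, len(running_pct))

# Hypothetical running totals for four customers
running = pd.Series([40.0, 70.0, 90.0, 100.0])
n_within = int((running <= 50).sum())        # the filter used in this notebook
n_reach = customers_to_reach(running, 50)    # includes the crossing customer
```

Which reading is wanted depends on how the question is phrased; the notebook keeps the stricter `<=` version.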


In [10]:
df_2 = df_2.rename(columns={'Surname':'Last Name'})

In [11]:
df_2 = df_2[['Customer ID','First Name','Last Name','Sales',"% of Total","Running % of Total Sales"]]

In [12]:
df_2.head()

Unnamed: 0,Customer ID,First Name,Last Name,Sales,% of Total,Running % of Total Sales
0,1,Thurston,Thresher,5563.56,0.279539,32.85
4,6,Shanan,Normanville,7755.62,0.389678,18.88
5,7,Vernon,Tomaskov,14473.571,0.727219,2.94
12,16,Salem,Gerriets,6106.88,0.306838,29.04
21,27,Kristian,Blofeld,4510.797,0.226643,44.44


In [13]:
# Output the data, including the parameter in the output name
df_2.to_csv(f'wk13-output-{select_percent}.csv', index=False)
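
The requirements also ask for a second output that describes the result in plain English. One possible sketch follows; the helper `describe_result`, the column name `Outcome`, and the stand-in counts are all assumptions made for illustration.

```python
import pandas as pd

def describe_result(n_selected: int, n_total: int, percent: int) -> str:
    """Build the plain-English summary sentence (hypothetical helper)."""
    share = round(n_selected * 100 / n_total)
    return f'{share}% of Customers account for {percent}% of Sales'

# Stand-in numbers; in the notebook these would be len(df_2), len(df) and select_percent
sentence = describe_result(150, 690, 50)
summary = pd.DataFrame({'Outcome': [sentence]})

# With no path, to_csv returns the CSV text; pass a file name to write it to disk
csv_text = summary.to_csv(index=False)
```

Passing a file name that contains the parameter to `to_csv` would keep the second output consistent with the first.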