# **Caltech - Machine Learning Course**

## LAB:  Australian Apparel Ltd

### NOTEBOOK:  Data Wrangling
### Project Statement
AAL, established in 2000, is a well-known brand in Australia, particularly recognized for its clothing business. It has opened branches in various states, metropolises, and tier-1 and tier-2 cities across the country.
The brand caters to all age groups, from kids to the elderly.
Currently experiencing a surge in business, AAL is actively pursuing expansion opportunities. To facilitate informed investment decisions, the CEO has assigned the responsibility to the head of AAL’s sales and marketing (S&M) department. The specific tasks include:
1)	Identify the states that are generating the highest revenues.
2)	Develop sales programs for states with lower revenues. The head of sales and marketing has requested your assistance with this task.
Analyze the sales data of the company for the fourth quarter in Australia, examining it on a state-by-state basis. Provide insights to assist the company in making data-driven decisions for the upcoming year.

### Notebook Objective
Data analysis
a.	Perform descriptive statistical analysis on the data in the Sales and Unit columns. Utilize techniques such as mean, median, mode, and standard deviation for this analysis.
b.	Identify the group with the highest sales and the group with the lowest sales based on the data provided.
c.	Identify the group with the highest and lowest sales based on the data provided.
d.	Generate weekly, monthly, and quarterly reports to document and present the results of the analysis conducted.
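
Objective (a) names mean, median, mode, and standard deviation. A minimal sketch of how those could be computed for the Sales and Unit columns follows. The helper name `describe_columns` and the sample values are illustrative assumptions, not part of the original notebook or dataset.

```python
import pandas as pd

def describe_columns(data, columns=("Sales", "Unit")):
    """Return mean, median, mode, and standard deviation for each column."""
    rows = {}
    for col in columns:
        series = data[col]
        rows[col] = {
            "mean": series.mean(),
            "median": series.median(),
            # mode() can return several values; keep the smallest as a single summary
            "mode": series.mode().iloc[0],
            # pandas uses the sample standard deviation (ddof=1) by default
            "std": series.std(),
        }
    # One row per column, one field per statistic
    return pd.DataFrame(rows).T

# Tiny made-up frame for illustration only; not taken from the AAL data
sample = pd.DataFrame({
    "Unit": [2, 4, 4, 6],
    "Sales": [5000, 10000, 10000, 15000],
})
stats = describe_columns(sample)
print(stats)
```

With the real data, the same call would be `describe_columns(df)`, assuming the cleaned CSV exposes columns named `Sales` and `Unit` as the objective suggests.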


<hr/>

### A.  Descriptive Statistical Analysis on Sales and Unit


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
def load_data(file_path):
    return pd.read_csv(file_path)

# Load the data
df = load_data('../data/AusApparalSales4thQrt2020cln.csv')

# Convert 'Date' to DateTime and set as index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

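# Generate Weekly Sales Report
# Note: 'mean' averages over the individual records in each week, so the column
# labelled 'Average Daily Sales' is a per-record average and is identical to
# 'Average Transaction Value' (sum / count) computed below.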
weekly_sales = df.resample('W')['Sales'].agg(['sum', 'mean', 'count'])
weekly_sales.columns = ['Total Sales', 'Average Daily Sales', 'Number of Transactions']
weekly_sales['Average Transaction Value'] = weekly_sales['Total Sales'] / weekly_sales['Number of Transactions']

print("Weekly Sales Report:")
print(weekly_sales.describe())

# Generate Monthly Sales Report
monthly_sales = df.resample('ME')['Sales'].agg(['sum', 'mean', 'count'])
monthly_sales.columns = ['Total Sales', 'Average Daily Sales', 'Number of Transactions']
monthly_sales['Average Transaction Value'] = monthly_sales['Total Sales'] / monthly_sales['Number of Transactions']

print("Monthly Sales Report:")
print(monthly_sales.describe())

# Generate Quarterly Sales Report
quarterly_sales = df.resample('QE')['Sales'].agg(['sum', 'mean', 'count'])
quarterly_sales.columns = ['Total Sales', 'Average Daily Sales', 'Number of Transactions']
quarterly_sales['Average Transaction Value'] = quarterly_sales['Total Sales'] / quarterly_sales['Number of Transactions']

print("Quarterly Sales Report:")
print(quarterly_sales.describe())

Weekly Sales Report:
        Total Sales  Average Daily Sales  Number of Transactions  \
count  1.400000e+01            14.000000               14.000000   
mean   2.430732e+07         45385.183835              540.000000   
std    5.893363e+06          7262.360668              107.846044   
min    1.379250e+07         35484.693878              252.000000   
25%    2.112750e+07         38211.982710              588.000000   
50%    2.422375e+07         45454.931973              588.000000   
75%    2.896750e+07         52805.059524              588.000000   
max    3.177000e+07         54732.142857              588.000000   

       Average Transaction Value  
count                  14.000000  
mean                45385.183835  
std                  7262.360668  
min                 35484.693878  
25%                 38211.982710  
50%                 45454.931973  
75%                 52805.059524  
max                 54732.142857  
Monthly Sales Report:
        Total Sales  Average 

##### Sales Data Analysis Findings by Date:
**Weekly Summary**
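- The data spans 14 weekly bins covering 3 months (1 quarter).
- Total weekly sales average $24.3M.
- Weekly sales range from $13.8M to $31.8M. The minimum is the week ending 2021-01-03, a partial week with 252 transactions against 588 in a full week.
- The average transaction value is about $45.4K, ranging from $35.5K to $54.7K.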
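
Because the weekly bins at the edges of the quarter do not cover seven days, weekly totals are easier to compare once each bin is annotated with the number of days it covers. The sketch below is one way to do that and to derive a true per-day average. The function name `weekly_report_with_coverage` and the sample records are assumptions made for illustration.

```python
import pandas as pd

def weekly_report_with_coverage(data):
    """Weekly totals plus the number of calendar days with data in each bin."""
    daily_totals = data["Sales"].resample("D").sum()
    daily_counts = data["Sales"].resample("D").count()
    # Keep only the days that have at least one record
    active_days = daily_counts[daily_counts > 0]
    report = pd.DataFrame({
        "Total Sales": daily_totals.resample("W").sum(),
        "Days Covered": active_days.resample("W").count(),
    })
    # Average over days, not over individual records
    report["Average Daily Sales"] = report["Total Sales"] / report["Days Covered"]
    report["Partial Week"] = report["Days Covered"] < 7
    return report

# Made-up records: two sales of 100 per day from Thu 2020-10-01 to Sun 2020-10-11
dates = pd.date_range("2020-10-01", "2020-10-11", freq="D").repeat(2)
sample = pd.DataFrame({"Sales": 100}, index=dates)
report = weekly_report_with_coverage(sample)
print(report)
```

In this toy data the first bin covers four days and is flagged as partial, while the per-day average is the same in both weeks. Applied to `df`, partial bins could be excluded or reported separately before comparing weekly totals.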

**Monthly Summary**
- Transaction volume is constant at 2,520 per month.
- Monthly sales average $113.4M and range from $90.68M to $135.3M, possibly reflecting seasonality.

**Quarterly Summary**
- Total sales for the quarter were $340.3M.
- The quarter's 7,560 transactions are split evenly at 2,520 per month.
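
The month-to-month swing noted in the Monthly Summary can be quantified as a percentage change. The following is a minimal sketch, assuming a DataFrame indexed by date with a `Sales` column. The function name `month_over_month` and the sample values are hypothetical.

```python
import pandas as pd

def month_over_month(data):
    """Monthly sales totals with the percentage change from the previous month."""
    # Group by calendar-month period rather than a resample alias
    totals = data["Sales"].groupby(data.index.to_period("M")).sum()
    return pd.DataFrame({
        "Total Sales": totals,
        "Change vs Prior Month (%)": totals.pct_change() * 100,
    })

# Made-up records for illustration only
sample = pd.DataFrame(
    {"Sales": [100, 100, 80, 80, 120, 120]},
    index=pd.to_datetime([
        "2020-10-05", "2020-10-20",
        "2020-11-05", "2020-11-20",
        "2020-12-05", "2020-12-20",
    ]),
)
mom = month_over_month(sample)
print(mom)
```

The first month has no prior month to compare against, so its change is reported as NaN.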

<hr/>

### B.  Sales Data Analysis by Groups


In [2]:
def group_performance_over_time(data, period):
    return data.groupby([pd.Grouper(freq=period), 'Group'])['Sales'].sum().unstack()

# Weekly Group Performance
weekly_group_performance = group_performance_over_time(df, 'W')
print("Weekly Group Performance:")
print(weekly_group_performance)

# Monthly Group Performance
monthly_group_performance = group_performance_over_time(df, 'ME')
print("\nMonthly Group Performance:")
print(monthly_group_performance)

# Quarterly Group Performance
quarterly_group_performance = group_performance_over_time(df, 'QE')
print("\nQuarterly Group Performance:")
print(quarterly_group_performance)

Weekly Group Performance:
Group          Kids      Men  Seniors    Women
Date                                          
2020-10-04  3690000  3730000  3782500  3842500
2020-10-11  7020000  6807500  6737500  6437500
2020-10-18  6707500  6710000  6470000  6752500
2020-10-25  6525000  6872500  6757500  6660000
2020-11-01  5490000  5507500  5540000  5270000
2020-11-08  5125000  5335000  4962500  5442500
2020-11-15  5315000  5242500  5397500  5217500
2020-11-22  5302500  5025000  5200000  5585000
2020-11-29  5577500  5482500  5302500  5115000
2020-12-06  7362500  7505000  7217500  7537500
2020-12-13  8030000  7802500  7805000  7887500
2020-12-20  7765000  7990000  7772500  8127500
2020-12-27  7630000  8337500  7685000  8117500
2021-01-03  3532500  3402500  3407500  3450000

Monthly Group Performance:
Group           Kids       Men   Seniors     Women
Date                                              
2020-10-31  28635000  28885000  28565000  28205000
2020-11-30  22882500  22615000  22322500 

##### Sales Data Analysis Findings by Group:
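- Sales rise sharply across all groups in December, likely due to holiday shopping: combined weekly sales climb from about $21M in November to roughly $30M to $32M in the full December weeks.
- Sales are spread almost evenly across groups. Summing the weekly figures gives Men the highest total (about $85.8M) and Seniors the lowest (about $84.0M), a gap of roughly 2%.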
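
To name the highest and lowest groups directly rather than reading them off the tables, the totals can be ranked and expressed as shares. This is a sketch under the assumption that the data has `Group` and `Sales` columns. The helper `rank_groups` and the sample rows are illustrative only.

```python
import pandas as pd

def rank_groups(data, group_col="Group", value_col="Sales"):
    """Return per-group totals and shares, plus the highest and lowest group."""
    totals = data.groupby(group_col)[value_col].sum().sort_values(ascending=False)
    summary = pd.DataFrame({
        "Total Sales": totals,
        "Share (%)": totals / totals.sum() * 100,
    })
    return summary, totals.idxmax(), totals.idxmin()

# Made-up rows for illustration only; not taken from the AAL data
sample = pd.DataFrame({
    "Group": ["Kids", "Kids", "Men", "Men", "Seniors", "Women"],
    "Sales": [100, 50, 200, 100, 50, 100],
})
summary, highest, lowest = rank_groups(sample)
print(summary)
print("Highest:", highest, "| Lowest:", lowest)
```

The same helper could rank another categorical column by passing a different `group_col`, for example a state column if the dataset has one.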
