# COMP41680/COMP4760 Class Test

This test involves analysing product sales data from different retail stores. **Complete all three tasks. All tasks carry equal marks.**

The data is stored in CSV format, with the following fields:

- *date*: the date on which the product was sold
- *category*: high-level category type of the product
- *location*: store location where the product was sold
- *amount*: the sale value of the product, in euros
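As a quick sanity check of the four fields listed above, the following sketch reads a tiny in-memory CSV with the same column names. The three rows, including the category and location values and the date format, are made up purely for illustration and are not taken from `retail.csv`.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the real file may use other
# category names, locations, and date formats.
sample_csv = io.StringIO(
    "date,category,location,amount\n"
    "2020-01-05,Toys,Dublin,19.99\n"
    "2020-01-06,Books,Cork,7.50\n"
    "2020-02-01,Toys,Cork,12.00\n"
)

# parse_dates converts the date column to datetime64 while reading
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# Inspect the column types and count missing values per column
print(sample.dtypes)
print(sample.isna().sum())
```

The same two checks can be run on the real DataFrame once it is loaded, to confirm that *amount* is numeric and that no field has unexpected gaps.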

In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

## Task 1: Data Loading and Initial Characterisation

**(a)** The data is available at the following URL:

http://mlg.ucd.ie/modules/python/test/retail.csv   
Alternatively, simply load the local file `retail.csv`.
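Since the data can come from either the URL or a local copy, one possible approach is to try the URL first and fall back to the local file. This is only a sketch: the helper name `load_retail` and its parameters are hypothetical, and it assumes that a failed download surfaces as an `OSError` (which is the case for `urllib` errors and for missing files).

```python
import pandas as pd

DATA_URL = "http://mlg.ucd.ie/modules/python/test/retail.csv"
LOCAL_PATH = "retail.csv"


def load_retail(primary=DATA_URL, fallback=LOCAL_PATH):
    """Read the CSV from `primary`; on an I/O error, read `fallback` instead.

    Hypothetical helper for illustration, not part of the test itself.
    """
    try:
        return pd.read_csv(primary)
    except OSError:
        # Covers network failures (urllib's URLError) and missing files
        return pd.read_csv(fallback)
```

Calling `df = load_retail()` would then give the same DataFrame whichever source is reachable.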

In [None]:
# pandas and matplotlib are already imported in the first cell;
# seaborn is needed below for the colour palettes and the heatmap
import seaborn as sns

# Load the retail dataset from CSV
df = pd.read_csv('retail.csv')
print(df.head())

In [None]:
plt.figure(figsize=(8, 6))
plt.hist(df['amount'], bins=30, color='teal', edgecolor='black')
plt.title('Distribution of Sale Amount Values')
plt.xlabel('Sale Amount (€)')
plt.ylabel('Frequency')
plt.show()

**(b)** 

In [None]:
# Sales by category
category_sales = df['category'].value_counts()
colors = sns.color_palette('Set2', len(category_sales))
category_sales.plot(kind='bar', color=colors)
plt.title('Number of Sales by Category')
plt.ylabel('Count')
plt.show()

# Sales by location
location_sales = df['location'].value_counts()
colors = sns.color_palette('coolwarm', len(location_sales))
location_sales.plot(kind='bar', color=colors)
plt.title('Number of Sales by Location')
plt.ylabel('Count')
plt.show()

In [None]:
total_sales_category = df.groupby('category')['amount'].sum()
colors = sns.color_palette('husl', len(total_sales_category))
total_sales_category.plot(kind='bar', color=colors)
plt.title('Total Sales by Product Category')
plt.ylabel('Total Sales (€)')
plt.show()

**(c)** 

In [None]:
total_sales_location = df.groupby('location')['amount'].sum()
colors = sns.color_palette('viridis', len(total_sales_location))
total_sales_location.plot(kind='bar', color=colors)
plt.title('Total Sales by Location')
plt.ylabel('Total Sales (€)')
plt.show()

In [None]:
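# Total sales for each (location, category) pair
pivot_table = df.pivot_table(values='amount', index='location', columns='category', aggfunc='sum')
# fmt='.0f' shows whole euros; the default format abbreviates large totals to scientific notation
sns.heatmap(pivot_table, annot=True, fmt='.0f', cmap='YlGnBu')
plt.title('Total Sales (€) by Location and Category')
plt.show()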

## Task 2: Analysis of Feature Associations

In [None]:
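# Convert the date strings to datetimes so they can be grouped by month
df['date'] = pd.to_datetime(df['date'])
# Sum sales per (month, category), then pivot categories into columns
monthly_sales = df.groupby([df['date'].dt.to_period('M'), 'category'])['amount'].sum().unstack()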
monthly_sales.plot(figsize=(10, 6), colormap='tab10')
plt.title('Monthly Sales by Category')
plt.ylabel('Sales (€)')
plt.show()

**(a)** 

In [None]:
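# 3-month rolling mean per category; the first two months are NaN because the window is not yet full
rolling_avg = monthly_sales.rolling(3).mean()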
rolling_avg.plot(figsize=(10, 6), linestyle='--', colormap='plasma')
plt.title('3-Month Rolling Average Sales')
plt.ylabel('Sales (€)')
plt.show()

**(b)**  


**(c)**  

## Task 3: Time Series Analysis

**(a)** 

**(b)**  