# Title: Data Aggregation
<b>Problem Statement:</b> Analyzing Sales Performance by Region in a Retail Company<br>
<b>Dataset:</b> "Retail_Sales_Data.csv"<br>
<b>Description:</b> The dataset contains information about sales transactions in a retail company. It
includes attributes such as transaction date, product category, quantity sold, and sales
amount. The goal is to perform data aggregation to analyze the sales performance by region
and identify the top-performing regions.<br>

<b>Tasks to Perform:</b>
1. Import the "Retail_Sales_Data.csv" dataset.
2. Explore the dataset to understand its structure and content.
3. Identify the relevant variables for aggregating sales data, such as region, sales
amount, and product category.
4. Group the sales data by region and calculate the total sales amount for each region.
5. Create bar plots or pie charts to visualize the sales distribution by region.
6. Identify the top-performing regions based on the highest sales amount.
7. Group the sales data by region and product category to calculate the total sales
amount for each combination.
8. Create stacked bar plots or grouped bar plots to compare the sales amounts across
different regions and product categories.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Task 1. Import the "Retail_Sales_Data.csv" dataset

In [None]:
file_path = 'datasets/customer_shopping_data.csv'  # Used here in place of "Retail_Sales_Data.csv"; adjust the path to your copy
sales_data = pd.read_csv(file_path)

## Task 2. Explore the dataset

In [None]:
sales_data.head() # View first few rows to understand structure

In [None]:
sales_data.info() # Check data types and null values
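
Beyond `head()` and `info()`, a few one-line summaries can help with exploration. The following is a minimal sketch on a small made-up DataFrame (`sample` and its values are hypothetical, only the column names mirror the ones used in this notebook); the same calls should apply to `sales_data`.

```python
import pandas as pd

# Hypothetical stand-in for sales_data; the values are invented for illustration.
sample = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall B", "Mall A", "Mall C"],
    "category": ["Clothing", "Shoes", "Books", "Clothing"],
    "price": [100.0, 250.0, None, 80.0],
})

missing_per_column = sample.isna().sum()          # missing values in each column
unique_malls = sample["shopping_mall"].nunique()  # number of distinct malls
price_summary = sample["price"].describe()        # count, mean, std, min, quartiles, max

print(missing_per_column)
print("Distinct malls:", unique_malls)
print(price_summary)
```

Missing values matter here because `sum()` skips them silently during aggregation, so it is worth knowing how many there are before grouping.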

## Task 3. Identify the relevant variables

In [None]:
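# In this dataset, shopping_mall serves as the region, price as the sales amount,
# and category as the product category.
relevant_columns = ["shopping_mall", "price", "category"]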
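
Before aggregating, it may be worth confirming that the chosen columns actually exist and then working on a narrower frame. This is a sketch under assumptions: `sample` is a hypothetical stand-in for `sales_data`, and the `quantity` column is included only to show an unused column being dropped.

```python
import pandas as pd

relevant_columns = ["shopping_mall", "price", "category"]

# Hypothetical stand-in for sales_data; the values are invented for illustration.
sample = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall B"],
    "category": ["Clothing", "Shoes"],
    "quantity": [2, 1],
    "price": [100.0, 250.0],
})

# Names listed as relevant but absent from the data (should be empty).
missing_columns = [col for col in relevant_columns if col not in sample.columns]
if missing_columns:
    raise KeyError(f"Columns not found in the dataset: {missing_columns}")

# Keep only the columns needed for the aggregation steps.
subset = sample[relevant_columns]
print(subset)
```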

## Task 4. Group the sales data by region and calculate the total sales amount for each region

In [None]:
sales_by_region = sales_data.groupby('shopping_mall')['price'].sum().reset_index()
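
To see what the `groupby(...).sum()` step produces, here is a small worked example on made-up numbers (the `toy` frame is hypothetical). It also adds a percentage share per mall, which is the same quantity a pie chart displays.

```python
import pandas as pd

# Hypothetical transactions; the values are invented for illustration.
toy = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall B", "Mall A", "Mall B", "Mall C"],
    "price": [100.0, 30.0, 50.0, 20.0, 50.0],
})

# One row per mall, with the prices of its transactions added up.
toy_totals = toy.groupby("shopping_mall")["price"].sum().reset_index()

# Each mall's share of the overall total, in percent.
toy_totals["share_pct"] = 100 * toy_totals["price"] / toy_totals["price"].sum()

print(toy_totals)
```

Mall A's two transactions (100 and 50) collapse into a single total of 150, which is 60% of the overall 250.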

## Task 5. Create bar plots or pie charts to visualize the sales distribution by region

In [None]:
# Bar plot
plt.figure(figsize=(10, 6))
sns.barplot(data=sales_by_region, x='shopping_mall', y='price')
plt.title('Total Sales by Shopping Mall')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Pie chart
plt.figure(figsize=(8, 8))
plt.pie(sales_by_region['price'], labels=sales_by_region['shopping_mall'], autopct='%1.1f%%', startangle=140)
plt.title('Sales Distribution by Shopping Mall')
plt.show()

## Task 6. Identify the top-performing shopping mall based on the highest sales amount

In [None]:
top_regions = sales_by_region.sort_values(by='price', ascending=False)  # Highest total first
print("Shopping Malls Ranked by Total Sales:")
print(top_regions)
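
If only the leaders are needed rather than the full ranking, `nlargest` and `idxmax` are possible alternatives to sorting. The sketch below uses a made-up totals table (`totals` and `top_n` are hypothetical names and values).

```python
import pandas as pd

# Hypothetical per-mall totals; the values are invented for illustration.
totals = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall B", "Mall C", "Mall D"],
    "price": [150.0, 50.0, 300.0, 120.0],
})

top_n = 2  # how many malls to report (an arbitrary choice)

# The top_n rows with the largest totals, already ordered from highest to lowest.
top_malls = totals.nlargest(top_n, "price")

# The single best mall: look up the row label of the maximum total.
best_mall = totals.loc[totals["price"].idxmax(), "shopping_mall"]

print(top_malls)
print("Best mall:", best_mall)
```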

## Task 7. Group the sales data by region and product category to calculate the total sales amount for each combination

In [None]:
sales_by_region_product = sales_data.groupby(['shopping_mall', 'category'])['price'].sum().reset_index()
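
Grouping by two keys yields one row per (mall, category) combination that occurs in the data. As an illustration on made-up numbers (the `toy` frame is hypothetical), the sketch below also picks the best-selling category within each mall.

```python
import pandas as pd

# Hypothetical transactions; the values are invented for illustration.
toy = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall A", "Mall A", "Mall B", "Mall B"],
    "category": ["Clothing", "Shoes", "Clothing", "Shoes", "Books"],
    "price": [100.0, 40.0, 60.0, 90.0, 30.0],
})

# Total price for every (mall, category) pair present in the data.
combo_totals = toy.groupby(["shopping_mall", "category"])["price"].sum().reset_index()

# For each mall, find the row holding its largest category total.
best_row_labels = combo_totals.groupby("shopping_mall")["price"].idxmax()
best_per_mall = combo_totals.loc[best_row_labels]

print(combo_totals)
print(best_per_mall)
```

Note that combinations with no transactions (for example Mall B and Clothing here) do not appear in the grouped result at all.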

## Task 8. Create stacked bar plots or grouped bar plots to compare the sales amounts across regions and product categories


In [None]:
# Grouped bar plot
plt.figure(figsize=(12, 6))
sns.barplot(data=sales_by_region_product, x='shopping_mall', y='price', hue='category')
plt.title('Sales by Shopping Mall and Category')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Stacked bar plot (optional - requires pivoting data)
pivot_data = sales_by_region_product.pivot(index='shopping_mall', columns='category', values='price')
pivot_data.plot(kind='bar', stacked=True, figsize=(12, 6))
plt.title('Stacked Sales by Shopping Mall and Category')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
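
One detail of the pivoting step: `pivot` leaves `NaN` wherever a mall has no sales in a category. A possible alternative, sketched here on made-up numbers (the `toy` frame is hypothetical), is `pivot_table` with `fill_value=0`, which aggregates and fills the gaps in one call.

```python
import pandas as pd

# Hypothetical transactions; the values are invented for illustration.
toy = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall A", "Mall A", "Mall B", "Mall B"],
    "category": ["Clothing", "Shoes", "Clothing", "Shoes", "Books"],
    "price": [100.0, 40.0, 60.0, 90.0, 30.0],
})

# Long format first, then pivot: absent combinations become NaN.
combo_totals = toy.groupby(["shopping_mall", "category"])["price"].sum().reset_index()
wide_with_gaps = combo_totals.pivot(index="shopping_mall", columns="category", values="price")

# pivot_table sums directly from the raw rows and fills absent combinations with 0.
wide = toy.pivot_table(index="shopping_mall", columns="category",
                       values="price", aggfunc="sum", fill_value=0)

# Row sums give the full height of each stacked bar.
bar_heights = wide.sum(axis=1)

print(wide_with_gaps)
print(wide)
print(bar_heights)
```

The resulting `wide` frame can be passed to `.plot(kind='bar', stacked=True)` in the same way as `pivot_data`.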