## Data Aggregation
Problem Statement: Analyzing Sales Performance by Region in a Retail Company
Dataset: "Retail_Sales_Data.csv"
Description: The dataset contains information about sales transactions in a retail company. It 
includes attributes such as transaction date, product category, quantity sold, and sales 
amount. The goal is to perform data aggregation to analyze the sales performance by region 
and identify the top-performing regions.
Tasks to Perform:
1. Import the "Retail_Sales_Data.csv" dataset.
2. Explore the dataset to understand its structure and content.
3. Identify the relevant variables for aggregating sales data, such as region, sales 
amount, and product category.
4. Group the sales data by region and calculate the total sales amount for each region.
5. Create bar plots or pie charts to visualize the sales distribution by region.
6. Identify the top-performing regions based on the highest sales amount.
7. Group the sales data by region and product category to calculate the total sales 
amount for each combination.
8. Create stacked bar plots or grouped bar plots to compare the sales amounts across 
different regions and product categories.

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Load the dataset (this notebook reads a local export; adjust the path to your copy of the data)
df = pd.read_csv("customer_shopping_data - customer_shopping_data.csv")
df.head()

In [None]:
# Check the columns and data types in the dataset
df.info()


In [None]:
df.columns
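
Before aggregating, it can help to check for missing values and duplicate rows, because `groupby` drops rows whose key is missing by default and `sum` skips missing amounts. The following is a minimal sketch on a small made-up frame; the column names mirror the ones used in this notebook, while the values and the names `sample`, `missing_per_column`, and `duplicate_rows` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (made-up values, not taken from the dataset)
sample = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall B", None, "Mall A"],
    "quantity": [2, 3, 1, 2],
    "price": [100.0, np.nan, 50.0, 100.0],
})

# Count missing values in each column
missing_per_column = sample.isna().sum()

# Count rows that are exact copies of an earlier row
duplicate_rows = sample.duplicated().sum()

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
```

The same two calls can be run on `df` directly once the real dataset is loaded.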

In [None]:
# Calculate total sales amount per invoice
df['total_sales_amount'] = df['quantity'] * df['price']

# Group the data by 'shopping_mall' (used here as the region) and calculate the total sales amount for each region
region_sales = df.groupby('shopping_mall')['total_sales_amount'].sum().reset_index()
region_sales
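
To make the regions easier to compare, the per-region totals can be sorted and expressed as a share of overall sales. This is a minimal sketch on a small made-up frame: the column names follow the notebook, but the values, the mall labels, and the `sales_share_pct` column are illustrative assumptions rather than figures from the dataset.

```python
import pandas as pd

# Tiny illustrative frame (made-up values, not taken from the dataset)
sample = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall A", "Mall B", "Mall C"],
    "quantity": [2, 1, 3, 5],
    "price": [100.0, 50.0, 20.0, 10.0],
})
sample["total_sales_amount"] = sample["quantity"] * sample["price"]

# Total sales per region, sorted from highest to lowest
sample_region_sales = (
    sample.groupby("shopping_mall", as_index=False)["total_sales_amount"]
    .sum()
    .sort_values("total_sales_amount", ascending=False, ignore_index=True)
)

# Each region's share of overall sales, in percent
sample_region_sales["sales_share_pct"] = (
    100
    * sample_region_sales["total_sales_amount"]
    / sample_region_sales["total_sales_amount"].sum()
)

print(sample_region_sales)
```

Sorting before plotting also puts the bars in descending order, which tends to make a bar chart easier to read.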


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
# Bar plot for sales distribution by region
plt.figure(figsize=(10, 6))
plt.bar(region_sales['shopping_mall'], region_sales['total_sales_amount'], color=['skyblue','lightgreen','coral'])
plt.xlabel('Region (shopping mall)')
plt.ylabel('Total sales amount')
plt.title('Total sales amount by region')
plt.xticks(rotation=45)
plt.show()


In [None]:
# Pie chart for sales distribution by region
plt.figure(figsize=(8, 8))
plt.pie(region_sales['total_sales_amount'], labels=region_sales['shopping_mall'], autopct='%1.1f%%', startangle=140, colors=['skyblue', 'lightgreen', 'coral'])
plt.title('Sales Distribution by Region')
plt.show()

In [None]:
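# Display the top-performing regions: sort by total sales (highest first) and keep the top 5
top_regions = region_sales.sort_values('total_sales_amount', ascending=False).head()
top_regions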


In [None]:
# Group the data by 'shopping_mall' (region) and 'category' (product category) to calculate the total sales amount for each combination
region_category_sales = df.groupby(['shopping_mall', 'category'])['total_sales_amount'].sum().unstack().fillna(0)
region_category_sales


In [None]:
# df is already a DataFrame, so no conversion is needed; preview it with the new 'total_sales_amount' column
df.head()

In [None]:
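# Grouped bar plot for sales comparison by region and product category.
# Aggregate first: sns.barplot plots the mean of y by default, so passing raw rows would not show totals.
plot_data = df.groupby(['shopping_mall', 'category'], as_index=False)['total_sales_amount'].sum()
sns.barplot(x='shopping_mall', y='total_sales_amount', hue='category', data=plot_data, palette='pastel')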

# Set plot labels and title
plt.xlabel('Region')
plt.ylabel('Total Sales Amount')
plt.title('Total Sales Amount by Region and Product Category')
plt.xticks(rotation=45)

plt.show()
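
Task 8 also mentions stacked bar plots. One possible approach, sketched here under assumptions, is to pivot the totals into a region-by-category table and let pandas draw the stacked bars. The sample values, mall labels, and category labels below are made up for illustration; on the real data the same `groupby(...).sum().unstack()` table could be plotted instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny illustrative frame (made-up values, not taken from the dataset)
sample = pd.DataFrame({
    "shopping_mall": ["Mall A", "Mall A", "Mall B", "Mall B", "Mall C"],
    "category": ["Clothing", "Shoes", "Clothing", "Books", "Shoes"],
    "total_sales_amount": [250.0, 100.0, 60.0, 40.0, 50.0],
})

# Rows are regions, columns are categories; missing combinations become 0
pivot = (
    sample.groupby(["shopping_mall", "category"])["total_sales_amount"]
    .sum()
    .unstack(fill_value=0)
)

# stacked=True places each category's bar on top of the previous one
ax = pivot.plot(kind="bar", stacked=True, figsize=(10, 6))
ax.set_xlabel("Region (shopping mall)")
ax.set_ylabel("Total sales amount")
ax.set_title("Total sales amount by region and product category (stacked)")
ax.legend(title="Category")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

In a stacked plot the full bar height equals the region's total, so it shows both the overall ranking and the category mix, whereas a grouped plot makes individual categories easier to compare across regions.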