Tasks:

1. Sales Trends:

- Create a line chart to visualize monthly sales trends over the two years.
- Compare sales trends for 2010 and 2011 using a dual-axis plot or overlaid line charts.

2. Top Products:

- Use a bar chart to show the top 10 products by quantity sold.
- Create a pie chart to visualize the contribution of the top 5 products to total sales.

3. Customer Segmentation:

- Use a scatter plot to analyze the relationship between the number of purchases and total sales for each customer.
- Create a heatmap to show sales distribution across different months and days of the week.

4. Revenue by Category:

- If there are product categories, create a bar chart to show revenue by category.
- Use a stacked bar chart to compare category-wise sales across the two years.

5. Geographical Analysis:

- If the dataset contains geographical data (e.g., country or region), create a map or bar chart to visualize sales distribution by location.

6. Correlation Analysis:

- Use a heatmap to visualize correlations between numerical columns such as Quantity, Price, and total sales.

7. Quarterly Analysis:

- Create a grouped bar chart to compare quarterly sales for 2010 and 2011.
- Highlight the quarter with the highest sales for each year.

8. Anomaly Visualization:

- Use a line chart to highlight anomalies in sales trends (e.g., missing data in December 2011).
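
Most of the tasks above reduce to an aggregation followed by a single plot call. As a minimal sketch of the month-by-weekday table behind the heatmap in task 3, the block below uses a few made-up rows and the column names Month, DayOfWeek and TotalPrice from the cleaning cell further down. The helper name month_weekday_sales and the toy values are assumptions for illustration, not part of the dataset or the original notebook.

```python
import pandas as pd

# Weekday order for the heatmap columns; dt.day_name() returns these English names.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def month_weekday_sales(frame):
    """Sum TotalPrice into a Month x DayOfWeek table (hypothetical helper)."""
    table = frame.pivot_table(index="Month", columns="DayOfWeek",
                              values="TotalPrice", aggfunc="sum", fill_value=0)
    # Put the columns in Monday..Sunday order; weekdays with no sales become 0.
    return table.reindex(columns=WEEKDAYS, fill_value=0)


# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2010-01-04", "2010-01-04", "2010-01-05", "2010-02-01"]),
    "TotalPrice": [10.0, 5.0, 7.5, 20.0],
})
toy["Month"] = toy["InvoiceDate"].dt.month
toy["DayOfWeek"] = toy["InvoiceDate"].dt.day_name()

heat = month_weekday_sales(toy)
print(heat)
```

With seaborn imported as sns, a table like this can be passed straight to sns.heatmap(heat) to draw the chart.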

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df1 = pd.read_excel('C:/Users/prash/Desktop/data/online_retail_II.xlsx', sheet_name='Year 2009-2010')
df2 = pd.read_excel('C:/Users/prash/Desktop/data/online_retail_II.xlsx', sheet_name='Year 2010-2011')
df = pd.concat([df1, df2], ignore_index=True)

# Data cleaning: convert negative quantities and prices to positive values
df['Quantity'] = df['Quantity'].abs()
df['Price'] = df['Price'].abs()

# Adding revenue column
df['TotalPrice'] = df['Quantity'] * df['Price']

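# Remove rows that are not product sales (fees, postage, manual adjustments);
# these descriptions were identified during product analysis
scanlist = ['Manual', 'AMAZON FEE', 'DOTCOM POSTAGE', 'Adjust bad debt', 'POSTAGE']
# .copy() avoids SettingWithCopyWarning when columns are added to the filtered frame
df = df[~df['Description'].isin(scanlist)].copy()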

# Ensure the date column is in datetime format
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])

# Extract year, month, quarter, week number, and day of the week
df['Year'] = df['InvoiceDate'].dt.year
df['Month'] = df['InvoiceDate'].dt.month
df['Quarter'] = df['InvoiceDate'].dt.quarter
df['WeekNumber'] = df['InvoiceDate'].dt.isocalendar().week  # Week number
df['DayOfWeek'] = df['InvoiceDate'].dt.day_name()  # Day of the week as a string

# Drop duplicates
df = df.drop_duplicates()

# Remove rows whose total price is zero (no price could be assigned)
df = df[df['TotalPrice'] != 0]
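
With Year, Month and TotalPrice in place, the monthly trend comparison from task 1 can be built from a single groupby. The block below is a rough sketch on made-up rows rather than the real file. The helper names monthly_sales_by_year and plot_monthly_sales are hypothetical, and the plot helper relies on pandas' built-in plotting, which needs matplotlib installed.

```python
import pandas as pd


def monthly_sales_by_year(frame):
    """Return a Month x Year table of summed TotalPrice (hypothetical helper)."""
    return (frame.groupby(["Month", "Year"])["TotalPrice"]
                 .sum()
                 .unstack("Year"))


def plot_monthly_sales(monthly):
    """Draw one line per year on shared axes (overlaid line charts)."""
    ax = monthly.plot(marker="o", title="Monthly sales by year")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales (TotalPrice)")
    return ax


# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2011, 2011],
    "Month": [1, 1, 2, 1, 2],
    "TotalPrice": [100.0, 50.0, 80.0, 120.0, 90.0],
})

monthly = monthly_sales_by_year(toy)
print(monthly)
```

On the cleaned frame the same calls would be plot_monthly_sales(monthly_sales_by_year(df)). A month with no rows in a given year shows up as NaN in the table and as a gap in that year's line, which is also one way to surface the missing-data anomalies from task 8.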