# Electronics Retailer Sales Analysis

#### Table of Contents

- [Data Pre-Processing](#data-pre-processing)
- [Analysis](#analysis)
  - [Time Series Analysis](#time-series-analysis)
    - [Yearly Revenue & Year over Year Growth](#yearly-revenue--year-over-year-growth)
    - [Monthly Revenue over Time](#monthly-revenue-over-time)
    - [Revenue Month over Month Growth Rate](#revenue-month-over-month-growth-rate)
  - [Product Analysis](#product-analysis)
    - [Top Performing Products](#top-performing-products)
    - [Percentage of Total Revenue by Category](#percentage-of-total-revenue-by-category)
  - [Customer Analysis](#customer-analysis)
    - [Average Order Value by Customer Segment (Purchase Frequency)](#average-order-value-by-customer-segment-purchase-frequency)
    - [Percentage of Total Revenue by Region](#percentage-of-total-revenue-by-region)
    - [Top Performing Cities by Revenue](#top-performing-cities-by-revenue)
    - [New vs. Returning Customers Over Time](#new-vs-returning-customers-over-time)
    - [Customer Retention Rate by Region](#customer-retention-rate-by-region)
- [Data Export](#data-export)

### Importing Python Libraries

In [1]:
import pandas as pd   # Data manipulation
import numpy as np   # Mathematical operations
import plotly.express as px   # Interactive visualizations
import plotly.io as pio   # HTML Exporting

### Importing Datasets

In [2]:
sales = pd.read_csv('Cleaned Data/sales_cleaned.csv', parse_dates=['order_date', 'delivery_date'])
products = pd.read_csv('Cleaned Data/products_cleaned.csv')
customers = pd.read_csv('Cleaned Data/customers_cleaned.csv', parse_dates=['birthdate'])
stores = pd.read_csv('Cleaned Data/stores_cleaned.csv', parse_dates=['open_date'])
exchrates = pd.read_csv('Cleaned Data/ExchangeRates_cleaned.csv', parse_dates=['date'])

# Data Pre-Processing

### Merging Datasets

In [3]:
sales = (
    sales
    .merge(products, on='product_key', how='left')
    .merge(customers, on='customer_key', how='left')
    .merge(stores, on='store_key', how='left')
    .merge(
        exchrates,
        left_on=['currency_code', 'order_date'],
        right_on=['currency_code', 'date'],
        how='left'
    )
    .drop(columns=['currency_code', 'date'])   # Unnecessary columns
    .rename(columns={
        'state_x': 'customer_state',
        'country_x': 'customer_country',
        'state_y': 'store_state',
        'country_y': 'store_country'
    })
)
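
A left merge silently duplicates sales rows if a lookup table contains repeated keys, which would inflate every revenue figure downstream. The sketch below shows one possible guard on made-up toy frames (`toy_sales` and `toy_products` are hypothetical and not part of the dataset). `validate='many_to_one'` is a standard `DataFrame.merge` argument that raises `MergeError` when the right-hand keys are not unique.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical values)
toy_sales = pd.DataFrame({
    'order_number': [1, 1, 2],
    'product_key': [10, 11, 10],
    'quantity': [2, 1, 3],
})
toy_products = pd.DataFrame({
    'product_key': [10, 11],
    'unit_price': [5.0, 20.0],
})

# many_to_one: many sales rows may point at one product row, never the reverse
toy_merged = toy_sales.merge(
    toy_products, on='product_key', how='left', validate='many_to_one'
)

# A left merge against unique keys must keep the row count unchanged
rows_preserved = len(toy_merged) == len(toy_sales)
print('Row count preserved:', rows_preserved)

# Duplicated lookup keys are rejected instead of silently adding rows
dup_products = pd.concat([toy_products, toy_products.iloc[[0]]])
try:
    toy_sales.merge(dup_products, on='product_key', how='left', validate='many_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print('Duplicate keys detected:', duplicate_detected)
```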

### Data Validation

In [4]:
# Checking if any products do not have sale data
unsold_products = products[~products['product_key'].isin(sales['product_key'])]

print(
    'There are {} unsold products; they will be omitted from the sales analysis.'
    .format(len(unsold_products))
)

There are 25 unsold products; they will be omitted from the sales analysis.


In [5]:
print('Number of records:', len(sales))
print('Date range: {} to {}'.format(sales['order_date'].min(), sales['order_date'].max()))
print('Unique products:', sales['product_key'].nunique())
print('Unique customers:', sales['customer_key'].nunique())
print('Unique stores:', sales['store_key'].nunique())

Number of records: 62884
Date range: 2016-01-01 00:00:00 to 2021-02-20 00:00:00
Unique products: 2492
Unique customers: 11887
Unique stores: 58


### Adding Core Metrics

In [6]:
sales = sales.assign(
    # Revenue and profit metrics
    revenue=sales['unit_price'] * sales['quantity'],
    profit=(sales['unit_price'] - sales['unit_cost']) * sales['quantity'],
    profit_margin=lambda df: df['profit'] / df['revenue'],
    
    # Channel classification
    channel=np.where(sales['store_key'] == 0, 'Online', 'In-Store'),
    
    # Temporary columns for aggregations and plotting
    year=sales['order_date'].dt.year,
    month=sales['order_date'].dt.to_period('M').dt.to_timestamp()
)

# Analysis

## Time Series Analysis

### Yearly Revenue & Year over Year Growth

In [7]:
# Aggregating revenue by purchase channel and year
yearly_channel = (
    sales
    .groupby(['year', 'channel'])['revenue']
    .sum()
    .reset_index()
)

# Aggregating revenue by year
yearly_total = (
    sales
    .groupby('year')['revenue']
    .sum()
    .reset_index(name='total_revenue')
    # YoY change
    .assign(pct_change=lambda df: df['total_revenue'].pct_change() * 100)
)

# Merging and preparing for plotting
yearly_channel = (
    yearly_channel
    .merge(yearly_total, on='year')
    .query('year != 2021')  # Excluding incomplete year
    .assign(
        # YoY change annotation (show on topmost bar, exclude starting year)
        yoy_label=lambda df: np.where(
            (df['channel'] == 'Online') & df['pct_change'].notna(),
            df['pct_change'].round(1).astype(str) + '%',
            ''
        )
    )
)

# Creating bar chart
fig = px.bar(
    yearly_channel,
    x='year',
    y='revenue',
    color='channel',
    text='yoy_label',
    title='Annual Revenue<br><sub>with YoY Growth Rate</sub>',
    labels={
        'year': '',
        'revenue': 'Revenue',
        'channel': 'Channel'
    },
    custom_data=['total_revenue'],
    template='plotly_white'
)

# Formatting layout
fig.update_layout(
    barmode='stack',
    yaxis_tickprefix='$',
    height=600,
    width=1200,
    title_x=0.5,
    legend_title_text=''
)

# Formatting tooltip
fig.update_traces(
    textposition='outside',
    hovertemplate=(
        '<b>%{x}</b><br>'
        '%{fullData.name} Revenue: <b>$%{y:,.0f}</b><br>'
        'Total Revenue: <b>$%{customdata[0]:,.0f}</b><br>'
        '<extra></extra>'
    )
)

fig.show()

#fig.write_image("Static Charts/annual_revenue.png", scale=2)
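
The YoY labels rely on `Series.pct_change`, which compares each row with the one before it and leaves the first row as `NaN`. A small sketch on made-up figures (the revenue values and the `toy_yearly` frame are illustrative, not taken from the dataset) shows why the starting year gets an empty label:

```python
import pandas as pd

toy_yearly = pd.DataFrame({
    'year': [2016, 2017, 2018],
    'total_revenue': [100.0, 150.0, 120.0],  # made-up values
})

# Percentage change relative to the previous row; the first row has no predecessor
toy_yearly['yoy_pct'] = (toy_yearly['total_revenue'].pct_change() * 100).round(1)

# Blank label where no growth rate exists, otherwise "<value>%"
toy_yearly['yoy_label'] = toy_yearly['yoy_pct'].map(
    lambda v: '' if pd.isna(v) else f'{v}%'
)
print(toy_yearly)
```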

### Monthly Revenue over Time

In [8]:
monthly_rev = (
    sales
    .groupby(['month', 'channel'])['revenue']
    .sum()
    .reset_index()
)

monthly_total = (
    sales
    .groupby('month')['revenue']
    .sum()
    .reset_index(name='total_revenue')
)

monthly_rev = monthly_rev.merge(monthly_total, on='month', how='left')

fig = px.area(
    monthly_rev,
    x='month',
    y='revenue',
    color='channel',
    title='Monthly Revenue Over Time',
    custom_data=['total_revenue'],
    template='plotly_white'
)

fig.update_layout(
    yaxis=dict(tickformat='$,.2s', title='Revenue'),
    xaxis=dict(title='', showgrid=False),
    hovermode='x unified',
    height=500,
    width=1500,
    title_x=0.5,
    legend=dict(orientation='h', y=1.1, x=0.5, xanchor='center', title_text='')
)

fig.update_traces(
    hovertemplate='%{fullData.name} Revenue: $%{y:,.0f}<extra></extra>',
)

fig.show()

#fig.write_html("monthly_rev_time.html", include_plotlyjs="cdn")
#fig.write_image("Static Charts/monthly_rev_time.png", scale=2)

### Revenue Month over Month Growth Rate

In [9]:
# Calculating monthly revenue growth rates
monthly_rates = (
    sales
    .groupby('month')['revenue']
    .sum()
    .reset_index()
    .assign(
        pct_change=lambda df: (df['revenue'].pct_change() * 100).round(1),
        Year=lambda df: df['month'].dt.year,
        Month=lambda df: df['month'].dt.strftime('%b')
    )
)


months = [
    'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
]

# Pivoting data for heatmap
monthly_change = (
    monthly_rates
    .pivot(index='Month', columns='Year', values='pct_change')
    .reindex(months)
)

# Creating heatmap
fig = px.imshow(
    monthly_change,
    range_color=[-100, 100],
    color_continuous_scale='RdYlGn',
    aspect='auto'
)

fig.update_layout(
    title=dict(
        text='Month-over-Month Revenue Growth Rate<br><sub>% Change from Previous Month</sub>',
        x=0.5,
        xanchor='center'
    ),
    coloraxis_colorbar=dict(
        title='Growth Rate',
        ticksuffix='%',
        len=0.7,
        thickness=15
    ),
    xaxis=dict(side='top', showgrid=False, title=None),
    yaxis=dict(showgrid=False, title=None),
    height=600,
    width=1200,
    margin=dict(t=100),
    template='plotly_white'
)

# Adding annotations
labels = np.where(
    monthly_change.notna(),
    monthly_change.astype(str) + '%',
    ''
)

fig.update_traces(
    text=labels,
    texttemplate='%{text}',
    hoverinfo='skip',
    hovertemplate=None
)

fig.show()

#fig.write_image("Static Charts/mom_growth_rate.png", scale=2)
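
`pivot` sorts its index, so month abbreviations come out alphabetically unless they are reindexed into calendar order. A minimal sketch with made-up growth rates (`toy_rates` and `calendar_order` are hypothetical names) illustrates the effect of the `reindex` step:

```python
import pandas as pd

# Made-up growth rates for two months across two years
toy_rates = pd.DataFrame({
    'Year': [2016, 2016, 2017, 2017],
    'Month': ['Feb', 'Jan', 'Feb', 'Jan'],
    'pct_change': [5.0, None, -3.0, 12.0],
})

calendar_order = ['Jan', 'Feb']

# Months become rows, years become columns
toy_grid = toy_rates.pivot(index='Month', columns='Year', values='pct_change')
print(toy_grid.index.tolist())   # alphabetical order straight after the pivot

# Reindexing restores calendar order for the heatmap rows
toy_grid = toy_grid.reindex(calendar_order)
print(toy_grid.index.tolist())
```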

## Product Analysis

### Top Performing Products

In [10]:
product_rev = (
    sales
    .groupby(['category', 'subcategory', 'product_name'])
    .agg({
        'revenue': 'sum', 
        'profit': 'sum', 
        'profit_margin': 'mean'
    })
    .reset_index()
    .query('revenue > 100000')
    .sort_values('revenue', ascending=False)
)

fig = px.treemap(
    product_rev,
    path=['category', 'subcategory', 'product_name'],
    values='revenue',
    color='profit_margin',
    color_continuous_scale='RdYlGn',
    color_continuous_midpoint=product_rev['profit_margin'].median(),
    title='Top Performing Products<br><sub>Size = Revenue<br>Color = Profit Margin</sub>',
    custom_data=['profit']
)

fig.update_traces(
    hovertemplate=(
        '<b>%{label}</b><br>'
        'Revenue: $%{value:,.0f}<br>'
        'Profit: $%{customdata[0]:,.0f}<br>'
        'Profit Margin: %{color:.1%}'
        '<extra></extra>'
    ),
    textinfo='label',
    marker=dict(line=dict(width=2, color='white'))
)

fig.update_layout(
    margin=dict(t=100, l=2, r=50, b=2),
    title=dict(
        font_size=18,
        x=0.5,
        xanchor='center',
        y=0.95,
        yanchor='top'
    ),
    coloraxis_colorbar=dict(
        title=dict(
            text="Profit Margin",
            font=dict(size=12)
        ),
        tickformat=".0%",
        len=0.5,
        thickness=12,
        tickfont=dict(size=10),
    ),
    width=1500, height=600,
)

fig.show()

#fig.write_html("top_products_treemap.html", include_plotlyjs="cdn")
#fig.write_image("Static Charts/top_products_treemap.png", scale=2)
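
The treemap colors each product by the mean of its row-level profit margins. When a product sells at different prices across orders, that unweighted mean can differ from the revenue-weighted margin (total profit divided by total revenue). The sketch below contrasts the two on made-up numbers; `toy_lines` and the column `weighted_margin` are hypothetical and only illustrate the difference.

```python
import pandas as pd

# Two made-up line items for one product sold at different prices
toy_lines = pd.DataFrame({
    'product_name': ['Widget', 'Widget'],
    'revenue': [100.0, 900.0],
    'profit': [50.0, 90.0],
})
toy_lines['profit_margin'] = toy_lines['profit'] / toy_lines['revenue']

toy_margins = (
    toy_lines
    .groupby('product_name')
    .agg(
        revenue=('revenue', 'sum'),
        profit=('profit', 'sum'),
        mean_row_margin=('profit_margin', 'mean'),   # unweighted mean of row margins
    )
    # Revenue-weighted alternative: total profit over total revenue
    .assign(weighted_margin=lambda df: df['profit'] / df['revenue'])
)
print(toy_margins)
```

Which of the two is appropriate depends on whether every line item or every dollar of revenue should count equally.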

### Percentage of Total Revenue by Category

In [11]:
cat_revenue = (
    sales
    .groupby('category')
    .agg(total_revenue=('revenue', 'sum'))
    .assign(revenue_pct=lambda df: 100 * df['total_revenue'] / df['total_revenue'].sum())
    .sort_values('revenue_pct')
    .reset_index()
)

fig = px.bar(
    cat_revenue,
    x='revenue_pct',
    y='category',
    title='Percentage of Total Revenue by Category',
    labels={'revenue_pct': '% of Revenue', 'category': ''},
    text=cat_revenue['revenue_pct'].round(1).astype(str) + '%',
    template='plotly_white'
)

fig.update_traces(
    textposition='outside',
    hoverinfo='skip',
    hovertemplate=None
)

fig.update_layout(
    margin=dict(t=50, l=25, r=25, b=100),
    xaxis_ticksuffix='%',
    title=dict(x=0.5, xanchor='center'),
    width=1200
)

fig.show()

#fig.write_image("Static Charts/total_rev_category.png", scale=2)

## Customer Analysis

### Average Order Value by Customer Segment (Purchase Frequency)

In [12]:
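customer_segments = (
    sales
    .groupby('customer_key')
    .agg(
        order_count=('order_number', 'nunique'),
        total_revenue=('revenue', 'sum')
    )
    .reset_index()
    .assign(
        # Average order value = customer revenue / number of distinct orders
        # (a plain mean of 'revenue' would average line items, not orders)
        avg_order_value=lambda df: df['total_revenue'] / df['order_count'],
        segment=lambda df: pd.cut(
            df['order_count'],
            bins=[0, 1, 3, np.inf],   # Open-ended top bin
            labels=['One-Time Buyer', 'Occasional (2-3 orders)', 'Loyal (4+ orders)']
        )
    )
)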

fig = px.box(
    customer_segments,
    x='segment',
    y='avg_order_value',
    title='Average Order Value by Customer Segment<br><sub>(Log Scale)</sub>',
    color='segment',
    points='outliers',
    template='plotly_white'
)

fig.update_layout(
    xaxis_title='',
    yaxis=dict(
        title='Average Order Value',
        type='log',
        tickprefix='$',
        dtick=1,
        tickformat=',.0f'
    ),
    height=700,
    width=1200,
    title_x=0.5,
    showlegend=False
)

fig.update_traces(
    marker=dict(size=3, opacity=0.5),
    hovertemplate=(
        '<b>%{x}</b><br>' +
        'Avg Order Value: <b>$%{y:,.2f}</b>' +
        '<extra></extra>'
    )
)

fig.show()

#fig.write_html("segmented_aov.html", include_plotlyjs="cdn")
#fig.write_image("Static Charts/segmented_aov.png", scale=2)
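
`pd.cut` uses right-closed intervals by default, so an edge list starting `[0, 1, 3, ...]` puts exactly one order in the first bin and two or three orders in the second. A minimal sketch with made-up order counts (`toy_counts` is hypothetical, and the top edge is left open-ended here as an assumption):

```python
import pandas as pd

toy_counts = pd.Series([1, 2, 3, 4, 12], name='order_count')

toy_segments = pd.cut(
    toy_counts,
    bins=[0, 1, 3, float('inf')],   # intervals (0, 1], (1, 3], (3, inf]
    labels=['One-Time Buyer', 'Occasional (2-3 orders)', 'Loyal (4+ orders)'],
)
print(pd.DataFrame({'order_count': toy_counts, 'segment': toy_segments}))
```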

### Percentage of Total Revenue by Region

In [13]:
country_rev = (
    sales
    .groupby('store_country')['revenue']
    .sum()
    .reset_index()
    .assign(pct=lambda df: (df['revenue'] / df['revenue'].sum() * 100).round(2))
)

fig = px.bar(
    country_rev.sort_values('pct'),
    x='pct',
    y='store_country',
    orientation='h',
    title='Percentage of Total Revenue by Region',
    labels={'pct': '% of Revenue', 'store_country': ''},
    text_auto=True,
    template='plotly_white'
)

fig.update_traces(
    textposition='outside',
    hoverinfo='skip',
    hovertemplate=None
)

fig.update_layout(
    margin=dict(t=50, l=25, r=25, b=100),
    xaxis_ticksuffix='%',
    title=dict(x=0.5, xanchor='center'),
    width=1200
)

fig.show()

#fig.write_image("Static Charts/total_rev_region.png", scale=2)

### Top Performing Cities by Revenue

In [14]:
cities_rev = (
    sales
    .groupby('city')['revenue']
    .sum()
    .reset_index()
    .nlargest(20, 'revenue')
)

fig = px.bar(
    cities_rev.sort_values('revenue'),
    x='revenue',
    y='city',
    title='Top 20 Performing Cities by Revenue',
    text='revenue',
)

fig.update_traces(
    texttemplate='$%{x:,.0f}',
    hoverinfo='skip',
    hovertemplate=None
)

fig.update_layout(
    yaxis_title=None,
    xaxis=dict(title='Revenue', tickprefix='$'),
    title=dict(x=0.5, xanchor='center'),
    template='plotly_white',
    height=700,
    bargap=0.15,
    margin=dict(t=50, l=25, r=25, b=100),
    width=1200
)

fig.show()

#fig.write_image("Static Charts/top_cities.png", scale=2)

### New vs. Returning Customers Over Time

In [15]:
# Finding first purchase month per customer
first_purchase = (
    sales
    .groupby('customer_key')['order_date'].min()
    .dt.to_period('M')
)

sales['month'] = sales['order_date'].dt.to_period('M')

# Classifying customer type
sales['customer_type'] = np.where(
    sales['month'] == sales['customer_key'].map(first_purchase),
    'New', 
    'Returning'
)

# Counting unique customers per month
monthly_counts = (
    sales
    .groupby(['month', 'customer_type'])['customer_key']
    .nunique()
    .reset_index()
    .assign(month=lambda df: df['month'].dt.to_timestamp())
)

fig = px.area(
    monthly_counts,
    x='month',
    y='customer_key',
    color='customer_type',
    title='New vs Returning Customers Over Time',
    labels={'month': ''},
    category_orders={'customer_type': ['New', 'Returning']},
    template='plotly_white'
)

fig.update_layout(
    height=500,
    width=1500,
    title_x=0.5,
    xaxis_showgrid=False,
    yaxis=dict(
        title='Customers',
        tickformat=',d'
    ),
    hovermode='x unified',
    legend=dict(
        orientation='h',
        x=0.5,
        xanchor='center',
        y=1.0,
        title_text=''
    )
)

fig.update_traces(
    hovertemplate='%{y:,} %{fullData.name} customers<extra></extra>',
    line=dict(width=1)
)

fig.show()

#fig.write_html("new_returning_time.html", include_plotlyjs="cdn")
#fig.write_image("Static Charts/new_returning_time.png", scale=2)
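
The classification labels a customer as New in the calendar month of their first order and as Returning in every later month. One consequence is that a repeat order within the first month still counts as New. A small sketch on made-up orders (`toy_orders` and `first_month` are hypothetical names) walks through the same logic:

```python
import numpy as np
import pandas as pd

toy_orders = pd.DataFrame({
    'customer_key': [1, 1, 2, 1, 2],
    'order_date': pd.to_datetime(
        ['2016-01-05', '2016-01-20', '2016-02-03', '2016-03-11', '2016-03-15']
    ),
})

toy_orders['month'] = toy_orders['order_date'].dt.to_period('M')

# First purchase month per customer
first_month = toy_orders.groupby('customer_key')['order_date'].min().dt.to_period('M')

# New if the order falls in the customer's first purchase month
toy_orders['customer_type'] = np.where(
    toy_orders['month'] == toy_orders['customer_key'].map(first_month),
    'New',
    'Returning',
)

# Unique customers per month and type, one column per type
toy_counts = (
    toy_orders
    .groupby(['month', 'customer_type'])['customer_key']
    .nunique()
    .unstack(fill_value=0)
)
print(toy_counts)
```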

### Customer Retention Rate by Region

In [16]:
unique_orders = sales.groupby(['store_country', 'customer_key'])['order_number'].nunique().reset_index()

# Calculating retention rate per country
retention = (
    unique_orders
    .groupby('store_country')
    .agg(
        unique_customers=('customer_key', 'nunique'),
        returned_customers=('order_number', lambda x: (x > 1).sum())
    )
    .reset_index()
    .assign(retention_rate=lambda df: (df['returned_customers'] / df['unique_customers']) * 100)
)

fig = px.bar(
    retention.sort_values('retention_rate'),
    x='retention_rate',
    y='store_country',
    title='Customer Retention Rate by Region<br><sub>Percentage of Customers with 2+ Purchases</sub>',
    text='retention_rate',
    custom_data=['unique_customers', 'returned_customers'],
    template='plotly_white'
)

fig.update_layout(
    yaxis=dict(title=None, showgrid=False),
    xaxis=dict(title='Retention Rate (%)', ticksuffix='%'),
    height=500,
    width=1200,
    title_x=0.5,
)

fig.update_traces(
    texttemplate='%{text:.1f}%',
    textposition='outside',
    hovertemplate=(
        '<b>%{y}</b><br>' +
        'Retention Rate: <b>%{x:.1f}%</b><br>' +
        'Returning Customers: <b>%{customdata[1]:,}</b><br>' +
        'Total Customers: <b>%{customdata[0]:,}</b>' +
        '<extra></extra>'
    )
)

fig.show()

#fig.write_image("Static Charts/retention_rate_region.png", scale=2)
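
Retention here means the share of a country's customers with more than one distinct order. Counting distinct `order_number` values matters because one order can span several line items. A minimal sketch on made-up rows (the country codes and keys in `toy_sales` are illustrative only):

```python
import pandas as pd

toy_sales = pd.DataFrame({
    'store_country': ['US', 'US', 'US', 'US', 'UK', 'UK'],
    'customer_key':  [1, 1, 2, 3, 4, 4],
    # Customer 4 has a single order with two line items
    'order_number':  [100, 101, 102, 103, 200, 200],
})

# Distinct orders per customer within each country
orders_per_customer = (
    toy_sales
    .groupby(['store_country', 'customer_key'])['order_number']
    .nunique()
    .reset_index(name='order_count')
)

toy_retention = (
    orders_per_customer
    .groupby('store_country')
    .agg(
        unique_customers=('customer_key', 'nunique'),
        returned_customers=('order_count', lambda s: (s > 1).sum()),
    )
    .assign(
        retention_rate=lambda df: df['returned_customers'] / df['unique_customers'] * 100
    )
)
print(toy_retention)
```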

# Data Export

In [17]:
sales.drop(columns=['month', 'year'], inplace=True)

sales.to_csv('Cleaned Data/merged_sales_data.csv', index=False)