<a href="https://colab.research.google.com/github/AnamHJ24/datascience-python-challenges/blob/main/notebooks/Day2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Day 2 - Amazon

You are a Product Analyst on the **Amazon** Sponsored Advertising team investigating
sponsored product ad engagement across electronics categories. Your team wants to
understand CTR variations to optimize targeted advertising strategies.

In [1]:
# import required libraries
import pandas as pd
import numpy as np

# import data files
url1 = 'https://raw.githubusercontent.com/AnamHJ24/datascience-python-challenges/refs/heads/main/Data/Day2_dim_prod.txt'
url2 = 'https://raw.githubusercontent.com/AnamHJ24/datascience-python-challenges/refs/heads/main/Data/Day2_fct_ad.txt'

dim_product = pd.read_csv(url1)
fct_ad_performance = pd.read_csv(url2)

dim_product.head()

Unnamed: 0,product_id,product_name,product_category
0,1,Smart TV,Home Electronics
1,2,Wireless Earbuds,Electronics & Gadgets
2,3,Refrigerator,Electronics Appliances
3,4,Bestselling Novel,Books
4,5,Designer Jeans,Fashion
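
The `Unnamed: 0` column in the preview suggests that the file stores a saved row index as its first, unnamed column. A minimal sketch, assuming that is the case: passing `index_col=0` to `pd.read_csv` uses that column as the index instead of keeping it as data. The inline `sample_csv` text is copied from the preview and stands in for the remote file; the names `sample_csv` and `sample_dim_product` are illustrative only.

```python
import io
import pandas as pd

# Inline sample mirroring the preview above; it stands in for the remote file.
sample_csv = io.StringIO(
    ",product_id,product_name,product_category\n"
    "0,1,Smart TV,Home Electronics\n"
    "1,2,Wireless Earbuds,Electronics & Gadgets\n"
    "2,3,Refrigerator,Electronics Appliances\n"
)

# index_col=0 treats the first (unnamed) column as the row index,
# so no 'Unnamed: 0' column appears in the resulting dataframe.
sample_dim_product = pd.read_csv(sample_csv, index_col=0)
print(sample_dim_product.columns.tolist())
```

The solutions below never use `Unnamed: 0`, so leaving it in place does not change the results; dropping it mainly keeps the merged dataframes tidier.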


In [2]:
fct_ad_performance.head()

Unnamed: 0,ad_id,clicks,product_id,impressions,recorded_date
0,101,10,1,200,2024-10-02
1,102,15,1,300,2024-10-12
2,103,20,2,250,2024-10-05
3,104,18,2,230,2024-10-20
4,105,5,3,150,2024-10-15


## Question 1
For October 2024, what is the average click-through rate (CTR) of sponsored product ads in each product
category whose name contains the substring 'Electronics'? This analysis will help determine
which electronics-related categories are performing best.
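
The substring condition can be checked with `Series.str.contains`. The sketch below uses the five categories shown in the `dim_product` preview plus one hypothetical missing value (`None`), added only to illustrate `na=False`; `regex=False` asks for a literal substring match rather than a regular expression.

```python
import pandas as pd

# Categories from the dim_product preview, plus a hypothetical missing value.
categories = pd.Series([
    "Home Electronics",
    "Electronics & Gadgets",
    "Electronics Appliances",
    "Books",
    "Fashion",
    None,  # hypothetical: not present in the preview
])

# Literal, case-sensitive substring match; missing values count as "no match".
mask = categories.str.contains("Electronics", regex=False, na=False)
matched = categories[mask].tolist()
print(matched)
```

Without `na=False`, a missing category would produce a missing value in the mask, which cannot be used directly for boolean indexing.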

## Solution

In [None]:
# Convert required column to datetime
fct_ad_performance['recorded_date'] = pd.to_datetime(fct_ad_performance['recorded_date'])

# Filter dataframe for October 2024
oct_df = fct_ad_performance[
    (fct_ad_performance['recorded_date'].dt.year == 2024) &
    (fct_ad_performance['recorded_date'].dt.month == 10)]

# Filter to select rows containing 'Electronics'
electronics_df = dim_product[dim_product['product_category'].str.contains('Electronics')]

# Merge filtered dataframes and calculate CTR for each ad
merged_df = pd.merge(oct_df, electronics_df, on = 'product_id')
merged_df['CTR'] = merged_df['clicks'] / merged_df['impressions']

# Calculate average CTR by product category
avg_ctr = merged_df.groupby('product_category')['CTR'].mean()
print("The average click-through rate is:\n", avg_ctr.to_string())


The average click-through rate is:
 
 product_category
Electronics & Gadgets      0.079130
Electronics Accessories    0.100000
Electronics Appliances     0.050000
Electronics Gadgets        0.072500
Home Electronics           0.066667
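
The year-and-month filter used in the solution above can also be written as a single comparison on monthly periods. This is an alternative sketch, not the approach the solution takes; the four dates are hypothetical and chosen to sit on either side of the month boundaries.

```python
import pandas as pd

# Hypothetical dates around the October 2024 boundaries.
dates = pd.to_datetime(
    pd.Series(["2024-09-30", "2024-10-02", "2024-10-31", "2024-11-01"])
)

# Convert each timestamp to its calendar month and compare with October 2024.
is_oct_2024 = dates.dt.to_period("M") == pd.Period("2024-10", freq="M")
print(is_oct_2024.tolist())
```

Both forms select the same rows; the period comparison just keeps the year and month in one expression.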


## Question 2
Which product categories have a CTR greater than the aggregated overall average CTR for sponsored
product ads during October 2024? This analysis will identify high-performing categories for further
optimization. For this question, we want to calculate CTR for each ad, then get the average across ads
by product category & overall.
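
Averaging per-ad CTRs is not the same as dividing total clicks by total impressions, which is why the question spells out the method. A small sketch using ads 103 and 104 from the `fct_ad_performance` preview; the variable names are illustrative.

```python
import pandas as pd

# Ads 103 and 104 from the fct_ad_performance preview (both product_id 2).
ads = pd.DataFrame({
    "ad_id": [103, 104],
    "clicks": [20, 18],
    "impressions": [250, 230],
})

# Method asked for here: CTR per ad, then the mean across ads.
per_ad_mean = (ads["clicks"] / ads["impressions"]).mean()

# Alternative (not used here): pooled clicks over pooled impressions.
pooled = ads["clicks"].sum() / ads["impressions"].sum()

print(f"mean of per-ad CTRs: {per_ad_mean:.6f}")
print(f"pooled CTR:          {pooled:.6f}")
```

The per-ad mean gives every ad equal weight, while the pooled ratio weights ads by their impressions, so the two diverge more as impression counts become more uneven.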


## Solution

In [None]:
# Calculate CTR for October data
oct_df = oct_df.copy()
oct_df['CTR'] = oct_df['clicks'] / oct_df['impressions']
merged_df = pd.merge(oct_df, dim_product, on = 'product_id')

# Group by Product Categories
category_ctr = merged_df.groupby('product_category')['CTR'].mean().reset_index()
category_ctr.columns = ['Product category', 'Average CTR']
overall_cat_ctr = merged_df['CTR'].mean()

# Keep categories above the overall average, sorted by CTR (highest first)
high_performing = category_ctr[category_ctr['Average CTR'] > overall_cat_ctr]
high_performing = high_performing.sort_values('Average CTR', ascending = False)
print("\nHIGH PERFORMING PRODUCT CATEGORIES (Above Average CTR)")
print(f"\nOverall Average CTR: {overall_cat_ctr:.2%}\n")
print(high_performing.to_string(index=False))


HIGH PERFORMING PRODUCT CATEGORIES (Above Average CTR)

Overall Average CTR: 6.80%

       Product category  Average CTR
                  Books      0.10000
Electronics Accessories      0.10000
  Electronics & Gadgets      0.07913
    Electronics Gadgets      0.07250
                Kitchen      0.07000


## Question 3
For the product categories identified in the previous question, what is the percentage difference
between their CTR and the overall average CTR for October 2024? This analysis will quantify the
performance gap to recommend specific categories for targeted advertising optimization.
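
The percentage difference here is the relative gap `(category CTR - overall CTR) / overall CTR * 100`. A worked sketch with a hypothetical helper `pct_difference`, using a category CTR of 10% and the overall average as printed in Question 2 (6.80%, which is rounded to two decimals):

```python
def pct_difference(value: float, benchmark: float) -> float:
    """Relative difference of `value` from `benchmark`, in percent."""
    return (value - benchmark) / benchmark * 100


# 0.068 is the rounded overall CTR from the Question 2 output,
# not the exact mean, so this is an approximation.
example = pct_difference(0.10, 0.068)
print(f"{example:.2f}%")
```

Because the benchmark in this sketch is the rounded 6.80%, the result can differ slightly from a figure computed with the unrounded overall mean.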

## Solution

In [None]:
# Calculate Percentage difference
high_performing = high_performing.copy()
high_performing['Percentage difference'] = (high_performing['Average CTR'] - overall_cat_ctr) / overall_cat_ctr * 100

# Sort Values
high_performing_categories = high_performing.sort_values('Percentage difference', ascending = False)
print("\nTOP PERFORMING PRODUCT CATEGORIES (vs. Average CTR)")
print(f"\nBenchmark (Overall Average CTR): {overall_cat_ctr:.2%}\n")

# Display formatted table
formatted_df = high_performing_categories.copy()
formatted_df['Average CTR'] = formatted_df['Average CTR'].map('{:.2%}'.format)
formatted_df['Percentage difference'] = formatted_df['Percentage difference'].map('{:.2f}%'.format)

print(formatted_df[['Product category', 'Average CTR', 'Percentage difference']]
      .to_string(index=False))



TOP PERFORMING PRODUCT CATEGORIES (vs. Average CTR)

Benchmark (Overall Average CTR): 6.80%

       Product category Average CTR Percentage difference
                  Books      10.00%                46.96%
Electronics Accessories      10.00%                46.96%
  Electronics & Gadgets       7.91%                16.29%
    Electronics Gadgets       7.25%                 6.55%
                Kitchen       7.00%                 2.87%
