
# Day 5 - Nintendo
You are a Product Analyst working with the **Nintendo Switch 2** pre-sales team to analyze
regional pre-order patterns and customer segmentation. Your team needs to understand how
different demographics influence pre-sale volumes across regions. You will leverage historical
pre-sale transaction data to extract meaningful insights that can guide marketing strategies.


In [1]:
# Import required libraries
import pandas as pd
import numpy as np

# Import data file
url = "https://raw.githubusercontent.com/AnamHJ24/datascience-python-challenges/refs/heads/main/Data/Day5.txt"
pre_sale_data = pd.read_csv(url)
pre_sale_data.head()

Unnamed: 0,region,customer_id,pre_order_date,demographic_group,pre_order_quantity
0,North America,C001,2024-07-02,Gamer,1
1,Europe,C002,2024-07-03,Casual,2
2,Asia,C003,2024-07-04,Tech Enthusiast,1
3,Latin America,C004,2024-07-05,Family,3
4,Oceania,C005,2024-07-06,Student,2
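The `Unnamed: 0` column in the preview suggests that the file stores the row index as a first column with an empty header. As a minimal sketch, assuming that layout, the index can be absorbed and the dates parsed at load time. The inline `csv_text` below is a hypothetical two-row stand-in built from the preview, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: two rows copied from the preview,
# with the empty first header that pandas displays as "Unnamed: 0"
csv_text = """,region,customer_id,pre_order_date,demographic_group,pre_order_quantity
0,North America,C001,2024-07-02,Gamer,1
1,Europe,C002,2024-07-03,Casual,2
"""

# index_col=0 uses the first column as the row index instead of a data column;
# parse_dates converts pre_order_date to datetime64 while reading
sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=["pre_order_date"],
)
print(sample.dtypes)
```

The solutions in this notebook keep the default load, so the outputs shown here still include `Unnamed: 0`.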


## Question 1
What percentage of records have missing values in at least one column? Handle the missing values so
that we have a clean dataset to work with.

## Solution

In [5]:
# Count the rows with a missing value in at least one column
missing_rows = pre_sale_data.isna().any(axis=1).sum()

# Calculate percentage
missing_rows_percent = (missing_rows / len(pre_sale_data)) * 100
print("The percentage of records with missing values is:", round(missing_rows_percent, 2), "%\n")

# Drop rows with missing values
pre_sale_data_cleaned = pre_sale_data.dropna()

The percentage of records with missing values is: 6.67 %
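To see how the row-wise missing-value check behaves, here is a small sketch on a hypothetical four-row frame (`toy` is made up for illustration and is not the challenge data). It relies on the mean of a boolean Series being the fraction of `True` values, which gives the same percentage as dividing the count by the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the challenge dataset
toy = pd.DataFrame({
    "region": ["Asia", "Europe", None, "Oceania"],
    "pre_order_quantity": [1, np.nan, 2, 3],
})

# True for every row that has a missing value in at least one column
has_missing = toy.isna().any(axis=1)

# The mean of a boolean Series is the fraction of True values
missing_percent = has_missing.mean() * 100

# Keep only the rows that are complete in every column
toy_cleaned = toy.dropna()

print(f"{missing_percent:.2f}% of rows have a missing value")
print(len(toy_cleaned), "rows remain after dropna()")
```

Dropping rows is the simplest way to handle missing values. Whether it is appropriate depends on how many rows are lost and on whether the gaps are random.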



## Question 2
Using the cleaned data, calculate the total pre-sale orders per month for each region and demographic
group.

## Solution

In [12]:
import calendar
# Convert required columns to datetime
pre_sale_data_cleaned = pre_sale_data_cleaned.copy()
pre_sale_data_cleaned['pre_order_date'] = pd.to_datetime(pre_sale_data_cleaned['pre_order_date'])

# Find total pre-sale orders
month_key = pre_sale_data_cleaned['pre_order_date'].dt.month
grouped_data = pre_sale_data_cleaned.groupby(
    [month_key, 'region', 'demographic_group'])['pre_order_quantity'].sum()
# Replace month numbers with month names in the first index level
grouped_data.index = grouped_data.index.set_levels(
    [calendar.month_name[month] for month in grouped_data.index.levels[0]], level=0)
print("Total pre-sale orders per month for each region and demographic group:\n")
print(grouped_data.unstack().fillna(0))

Total pre-sale orders per month for each region and demographic group:

demographic_group             Casual  Family  Gamer  Student  Tech Enthusiast
pre_order_date region                                                        
July           Asia              4.0     4.0    2.0      3.0              1.0
               Europe            2.0     4.0    2.0      7.0              0.0
               Latin America     3.0     3.0    1.0      4.0              2.0
               North America     1.0     3.0    8.0      0.0              1.0
               Oceania           0.0     3.0    5.0      2.0              4.0
August         Asia              8.0     0.0    4.0      3.0              0.0
               Europe            2.0     4.0    4.0      0.0              0.0
               Latin America     4.0     5.0    0.0      0.0              5.0
               North America     0.0     4.0    2.0     11.0              2.0
               Oceania           0.0     1.0    0.0      6.0              3.0
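Grouping on `dt.month` alone would merge the same calendar month from different years if the data ever spanned more than one year, and calling `fillna(0)` after `unstack()` is why the totals print as floats. The sketch below shows one possible variation on hypothetical toy data: it groups on a `YYYY-MM` string and passes `fill_value=0` to `unstack`, which keeps the totals as integers. The `toy` frame and the `month` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the challenge dataset
toy = pd.DataFrame({
    "pre_order_date": pd.to_datetime(
        ["2024-07-02", "2024-07-15", "2024-08-01", "2024-08-20"]
    ),
    "region": ["Asia", "Asia", "Asia", "Europe"],
    "demographic_group": ["Gamer", "Gamer", "Casual", "Gamer"],
    "pre_order_quantity": [1, 2, 3, 4],
})

monthly = (
    # A "YYYY-MM" key keeps different years apart and sorts chronologically
    toy.assign(month=toy["pre_order_date"].dt.strftime("%Y-%m"))
       .groupby(["month", "region", "demographic_group"])["pre_order_quantity"]
       .sum()
       # fill_value=0 fills absent combinations without switching to floats
       .unstack(fill_value=0)
)
print(monthly)
```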

## Question 3
Predict the total pre-sales quantity for each region for September 2024. Assume that the growth rate
from August to September is the same as the growth rate from July to August in each region.

## Solution

In [16]:
# Extract month and year so the data can be filtered by period
pre_sale_data_cleaned['month'] = pre_sale_data_cleaned['pre_order_date'].dt.month
pre_sale_data_cleaned['year'] = pre_sale_data_cleaned['pre_order_date'].dt.year

# Filter for July and August 2024 data
summer_data = pre_sale_data_cleaned[
    pre_sale_data_cleaned['month'].isin([7, 8])
    & (pre_sale_data_cleaned['year'] == 2024)]

# Calculate monthly totals
monthly_totals = summer_data.groupby(['region','month'])['pre_order_quantity'].sum().unstack()
monthly_totals.columns = ['July', 'August']

# Calculate growth rate from July to August for each region
monthly_totals['growth_rate'] = (monthly_totals['August'] - monthly_totals['July']) / monthly_totals['July']

# Calculate September predictions
monthly_totals['September_prediction'] = monthly_totals['August'] * (1 + monthly_totals['growth_rate'])

# Round to whole numbers since we can't have partial orders
monthly_totals['September_prediction'] = monthly_totals['September_prediction'].round().astype(int)

# Final output with predictions
result = monthly_totals[['July', 'August','growth_rate', 'September_prediction']]
result = result.rename(columns={
    'growth_rate': 'July-Aug Growth Rate',
    'September_prediction': 'September 2024 Forecast'})
print(result.reset_index())

          region  July  August  July-Aug Growth Rate  September 2024 Forecast
0           Asia    14      15              0.071429                       16
1         Europe    15      10             -0.333333                        7
2  Latin America    13      14              0.076923                       15
3  North America    13      19              0.461538                       28
4        Oceania    14      10             -0.285714                        7
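As a quick check of the arithmetic, the forecast can be recomputed from the July and August totals printed above. With growth rate g = (August - July) / July, the prediction August × (1 + g) simplifies to August² / July. The sketch below hard-codes the totals from the output rather than loading the data. The variable names are illustrative.

```python
import pandas as pd

# July and August totals copied from the output above
totals = pd.DataFrame(
    {"July": [14, 15, 13, 13, 14], "August": [15, 10, 14, 19, 10]},
    index=["Asia", "Europe", "Latin America", "North America", "Oceania"],
)

# Growth rate from July to August, assumed to repeat from August to September
growth_rate = (totals["August"] - totals["July"]) / totals["July"]
forecast = (totals["August"] * (1 + growth_rate)).round().astype(int)

# Equivalent closed form: August * (1 + g) == August**2 / July
closed_form = (totals["August"] ** 2 / totals["July"]).round().astype(int)

print(forecast)
```

A region with a July total of zero would make the growth rate undefined (division by zero), so that case would need separate handling before the integer conversion.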
