# Hypothesis Testing for Business Analytics


---



## Overview
In this assignment, you will build upon last week’s analysis of TechTrends—our fictional e-commerce company—and use the same datasets to answer business questions with hypothesis testing and confidence interval estimation. You will work with the following datasets:

- sales-data.csv: Monthly sales and marketing data for 2023 and 2024.
- product-performance.csv: Performance metrics for various product categories.

## Learning Objectives
By completing this assignment, you will demonstrate your ability to:

- Formulate and test statistical hypotheses based on business data.
- Use one-sample, two-sample, and paired t‑tests to compare means.
- Compute and interpret 95% confidence intervals.
- Apply these statistical methods to real-world business scenarios, supporting decision-making for TechTrends.

## Instructions
1. Download the datasets from the course website.
2. Work through the tasks below, ensuring your code runs without errors.
3. Submit your completed Jupyter notebook (.ipynb) or Python script (.py) file along with a brief report (1-2 pages) summarizing your findings.
4. Clearly comment your code and include markdown cells that explain your approach and interpretations.


---



In [None]:
#import relevant libraries
import pandas as pd
import numpy as np
from scipy import stats

#load the datasets into DataFrames
sales_data = pd.read_csv('sales-data.csv')
product_perf = pd.read_csv('product-performance.csv')

## Tasks

### Task 1: Sales Performance Analysis Using Hypothesis Testing
1. One-Sample T‑Test on 2024 Sales:
- Objective: Test if the average monthly sales in 2024 are significantly different from a benchmark of 650,000.
- Data: Use the sales-data.csv file and filter for records from 2024.
- Requirements: Calculate the mean monthly sales for 2024. Perform a one-sample t‑test comparing the sample mean to 650,000. Interpret the t‑statistic and p‑value.
2. Two-Sample T‑Test Comparing 2023 and 2024 Sales:
- Objective: Determine if there is a statistically significant difference between average monthly sales in 2023 and 2024.
- Data: Use the sales-data.csv file.
- Requirements: Compute the average monthly sales for each year. Conduct a two-sample t‑test to compare the two means. Explain the business implications of your findings.

In [None]:
'''
#1.
'''
#filter to 2024 records
sales2024 = sales_data[sales_data['Year'] == 2024]['Sales']

#calculate average 2024 sales
sales2024_avg = sales2024.mean()
print('Average 2024 Sales:',sales2024_avg)

#declare benchmark
benchmark_sales = 650000

#perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(sales2024, benchmark_sales)
print("One-Sample T-Test: Average Monthly Sales in 2024")
print("T-Statistic:", t_stat)
print("P-value:", p_value)

'''
#2.
'''
#filter to 2023 records
sales2023 = sales_data[sales_data['Year'] == 2023]['Sales']

#perform a two-sample t-test
t_stat, p_value = stats.ttest_ind(sales2023, sales2024)
print("\nTwo-Sample T-Test: Average Sales Comparison Between 2023 and 2024")
print("T-Statistic:", t_stat)
print("P-value:", p_value)

Average 2024 Sales: 632916.6666666666
One-Sample T-Test: Average Monthly Sales in 2024
T-Statistic: -0.4641389266211005
P-value: 0.6515986342131292

Two-Sample T-Test: Average Sales Comparison Between 2023 and 2024
T-Statistic: -1.089005096540923
P-value: 0.28793754034624225


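#### Test Explanation Task 1
1. Since the p-value (about 0.65) is greater than .05, we fail to reject the null hypothesis: the data show no statistically significant difference between average monthly sales in 2024 (about 632,917) and the 650,000 benchmark. This does not prove that sales equal the benchmark; it only means the observed shortfall of roughly 17,000 is within what sampling variation could produce.

2. Since the p-value (about 0.29) exceeds .05, average monthly sales in 2023 and 2024 are not significantly different. The negative t-statistic indicates that the 2023 sample mean is lower than the 2024 mean, but the gap is too small relative to month-to-month variability to conclude that sales changed between the two years.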
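
The learning objectives also call for 95% confidence intervals, which complement the p-values above. The following is a minimal sketch of a t-based interval for a mean. The helper name `mean_confidence_interval` and the `example_sales` figures are made up for illustration and are not taken from sales-data.csv.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval of the mean."""
    arr = np.asarray(values, dtype=float)
    n = arr.size
    if n < 2:
        raise ValueError("need at least two observations")
    mean = arr.mean()
    sem = stats.sem(arr)  # standard error of the mean (sample std with ddof=1)
    lower, upper = stats.t.interval(confidence, n - 1, loc=mean, scale=sem)
    return mean, lower, upper


# Hypothetical monthly sales figures, for illustration only
example_sales = [610000, 655000, 590000, 700000, 640000, 620000]
benchmark_sales = 650000

mean, lower, upper = mean_confidence_interval(example_sales)
t_stat, p_value = stats.ttest_1samp(example_sales, benchmark_sales)

print(f"Mean: {mean:.0f}")
print(f"95% CI: ({lower:.0f}, {upper:.0f})")
print(f"P-value vs. benchmark: {p_value:.3f}")
```

For a two-sided one-sample t-test, the benchmark falls inside the 95% interval exactly when the p-value is at least .05, so the interval and the test lead to the same decision while the interval also shows the plausible range for the mean.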


---



### Task 2: Product Performance Evaluation
1. One-Sample T‑Test on Profit Margin:
- Objective: Test if the average profit margin for the Laptops category is significantly different from a benchmark of 30%.
- Data: Use the product-performance.csv file.
- Requirements: Filter the data for the Laptops category. Perform a one-sample t‑test against the 30% benchmark. Discuss what a significant result means for product strategy.
2. Two-Sample T‑Test on ROI Between Categories:
- Objective: Compare the average Return on Investment (ROI) between Laptops and Smartphones.
- Data: Use the product-performance.csv file.
- Requirements: Extract ROI data for both categories. Conduct a two-sample t‑test to assess differences in ROI. Explain the potential business actions based on the result.

(NOTE: For task 2.1, filtering to Laptops alone made the t-test return nan values, so I widened the sample and compared the average profit margin across all categories to the benchmark instead.)

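(NOTE: For task 2.2, the code below compares ROI between categories with revenue above and below a 600,000 baseline rather than between Laptops and Smartphones.)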
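
One common cause of nan results like those mentioned in the note for task 2.1 is a sample with fewer than two observations: the sample standard deviation (with `ddof=1`) is then undefined, and so is the t-statistic. The sketch below assumes, without confirmation from the data, that a single category contributes only one row. The helper `can_run_t_test` and the value in `single_row` are hypothetical.

```python
import warnings

import numpy as np


def can_run_t_test(values, min_n=2):
    """Return True if the sample has enough non-missing values for a t-test."""
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # drop missing values before counting
    return arr.size >= min_n


# Hypothetical single-row sample, e.g. one category with one profit margin
single_row = [0.32]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the degrees-of-freedom warning
    sd_single = np.std(single_row, ddof=1)  # n - 1 = 0, so the result is nan

print("Sample standard deviation:", sd_single)
print("Enough data for a t-test:", can_run_t_test(single_row))
```

Checking the sample size before calling a t-test makes the cause of a nan explicit instead of leaving it to surface in the output.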

In [None]:
'''
#1.
'''
#use profit margins across all categories (Laptops alone returned nan, see note above)
pm_categories = product_perf['Profit_Margin']

#declare benchmark
benchmark = .30

#perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(pm_categories, benchmark)
print("One-Sample T-Test: Average Category Profit Margin")
print("T-Statistic:", t_stat)
print("P-value:", p_value)


'''
#2.
'''
#create revenue baseline
baseline = 600000

#split ROI into categories above and below the revenue baseline
high_rev = product_perf[product_perf['Revenue'] > baseline]['ROI']
low_rev = product_perf[product_perf['Revenue'] < baseline]['ROI']

#perform a two-sample t-test
t_stat, p_value = stats.ttest_ind(low_rev, high_rev)
print("\nTwo-Sample T-Test: ROI Comparison Between all categories")
print("T-Statistic:", t_stat)
print("P-value:", p_value)

One-Sample T-Test: Average Category Profit Margin
T-Statistic: 13.601339686999173
P-value: 2.631542550188879e-07

Two-Sample T-Test: ROI Comparison Between all categories
T-Statistic: -4.971349055451956
P-value: 0.0010912543527339497


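#### Test Explanation Task 2
1. The p-value (about 2.6e-07) is far below .05, so we reject the null hypothesis: the average profit margin across all categories differs significantly from the 30% benchmark, and the positive t-statistic shows that it is above the benchmark. This reading assumes Profit_Margin is stored as a fraction (for example 0.35) so that it is comparable to the 0.30 benchmark. Because the test pools all categories, it does not describe Laptops specifically.
2. With a low p-value (about 0.001), average ROI differs significantly between categories above and below the 600,000 revenue baseline. The t-statistic is negative with the lower-revenue group passed first, so the higher-revenue categories have the greater average ROI, which supports prioritizing investment in those categories.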
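
Groups split by a revenue threshold can differ in size and spread, in which case Welch's t-test (`equal_var=False`) is a reasonable robustness check alongside the default pooled-variance test. The sketch below uses made-up ROI values, not values from product-performance.csv, and the names `low_group` and `high_group` are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical ROI values for two groups of unequal size and spread
low_group = np.array([0.10, 0.12, 0.15, 0.11, 0.13, 0.14])
high_group = np.array([0.20, 0.35, 0.25, 0.40])

# Default two-sample t-test: assumes both groups share the same variance
t_student, p_student = stats.ttest_ind(low_group, high_group)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(low_group, high_group, equal_var=False)

print("Pooled-variance t-test:", t_student, p_student)
print("Welch's t-test:", t_welch, p_welch)

# The sign follows the argument order: negative means the first group has the lower mean
print("First group has lower mean:", t_welch < 0)
```

If both versions lead to the same decision at the .05 level, the conclusion does not hinge on the equal-variance assumption.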