# PSET 8: T-Tests and ANOVA
This notebook analyzes customer feedback and sales data using two-sample t-tests and one-way ANOVA. All tests use a 5% significance level.

In [2]:
import pandas as pd
import numpy as np
from scipy import stats


df_feedback = pd.read_csv("customer_feedback.csv")
df_sales = pd.read_csv("sales_data.csv")
df_feedback['date'] = pd.to_datetime(df_feedback['date'])
df_sales['date'] = pd.to_datetime(df_sales['date'])

## 1. Feedback Analysis
Compare customer satisfaction between iOS and Android users.

In [5]:

ios_scores = df_feedback[df_feedback['product'] == 'iOS']['feedback_score'].to_numpy()
android_scores = df_feedback[df_feedback['product'] == 'Android']['feedback_score'].to_numpy()

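# Heuristic: treat the variances as equal when the sample standard deviations
# agree to within about 10%; otherwise ttest_ind runs Welch's t-test.
std_ios = np.std(ios_scores, ddof=1)
std_android = np.std(android_scores, ddof=1)
equal_var = np.isclose(std_ios, std_android, rtol=0.1)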

t_stat, p_val = stats.ttest_ind(ios_scores, android_scores, equal_var=equal_var)
t_stat, p_val

(1.9033888211703986, 0.05756609365982318)

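**Interpretation:** The test gives t ≈ 1.90 and p ≈ 0.0576, just above the 0.05 threshold, so we fail to reject the null hypothesis of equal mean satisfaction. The difference between iOS and Android users is not statistically significant at the 5% level, although the result is borderline.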
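The `np.isclose` check on the standard deviations is a rule of thumb rather than a formal test. As a rough sketch of one alternative, Levene's test (`scipy.stats.levene`) can decide whether to pool the variances. The helper name `choose_equal_var`, the 0.05 cutoff, and the synthetic samples below are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np
from scipy import stats


def choose_equal_var(a, b, alpha=0.05):
    """Return True when Levene's test does not reject equal variances."""
    _, p_levene = stats.levene(a, b)
    return bool(p_levene >= alpha)


# Synthetic stand-ins for two groups of scores (not the notebook's data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=4.0, scale=1.0, size=200)
group_b = rng.normal(loc=4.0, scale=3.0, size=200)

# The spreads differ a lot here, so the helper should select Welch's t-test.
equal_var = choose_equal_var(group_a, group_b)
t_stat, p_val = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(equal_var, t_stat, p_val)
```

A simpler option that many analysts prefer is to always pass `equal_var=False`, since Welch's test behaves well even when the variances happen to be equal.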

## 2. Sales Campaign Analysis
Compare sales before and after the March 2023 campaign.

In [9]:

before = df_sales[df_sales['date'] < '2023-03-01']['sales'].to_numpy()
after = df_sales[df_sales['date'] >= '2023-03-01']['sales'].to_numpy()

equal_var = np.isclose(np.std(before, ddof=1), np.std(after, ddof=1), rtol=0.1)
t_stat, p_val = stats.ttest_ind(before, after, equal_var=equal_var)
t_stat, p_val


(0.27045080178405995, 0.7870335279675489)

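**Interpretation:** With t ≈ 0.27 and p ≈ 0.787, we fail to reject the null hypothesis of equal mean sales before and after 1 March 2023. The data show no statistically significant change in sales associated with the campaign.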
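The variance check followed by `stats.ttest_ind` recurs in several sections of this notebook. One possible way to avoid repeating it is a small helper, sketched below. The name `compare_means` and its return signature are hypothetical, and the toy arrays are illustrative only.

```python
import numpy as np
from scipy import stats


def compare_means(a, b, rtol=0.1):
    """Two-sample t-test using the notebook's 10% std-dev heuristic.

    Returns (t_stat, p_val, equal_var) so the caller can see which
    variant of the test (Student or Welch) was actually run.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    equal_var = bool(np.isclose(np.std(a, ddof=1), np.std(b, ddof=1), rtol=rtol))
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=equal_var)
    return t_stat, p_val, equal_var


# Toy data: identical spread, means differing by exactly 1.
t_demo, p_demo, pooled = compare_means([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_demo, p_demo, pooled)
```

With such a helper, each section would reduce to a single call such as `compare_means(before, after)`.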

## 3. Seasonal Sales Analysis
Compare summer (Jun-Aug) vs. winter (Dec-Feb) sales.

In [13]:

summer = df_sales[df_sales['date'].dt.month.isin([6, 7, 8])]['sales'].to_numpy()
winter = df_sales[df_sales['date'].dt.month.isin([12, 1, 2])]['sales'].to_numpy()

equal_var = np.isclose(np.std(summer, ddof=1), np.std(winter, ddof=1), rtol=0.1)
t_stat, p_val = stats.ttest_ind(summer, winter, equal_var=equal_var)
t_stat, p_val


(0.09956961638905915, 0.9207644588060664)

**Interpretation:** With t ≈ 0.10 and p ≈ 0.921, there is no statistically significant difference in mean sales between summer and winter.
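A large p-value says the data are compatible with no difference, but not how large a difference could still be plausible. A confidence interval for the difference in means makes that explicit. The sketch below computes a Welch-style interval by hand using the Welch-Satterthwaite degrees of freedom; `welch_ci` is a hypothetical helper and the input arrays are toy values, not the notebook's sales data.

```python
import numpy as np
from scipy import stats


def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard errors of each sample mean.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, df)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se


ci_low, ci_high = welch_ci([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(ci_low, ci_high)
```

An interval that contains zero agrees with a non-significant test, and its width shows how precisely the difference has been estimated.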

## 4. Feedback Consistency (ANOVA)
Use one-way ANOVA to test whether mean feedback scores differ across January, May, September, and December.

In [17]:

months = [1, 5, 9, 12]
feedback_groups = [df_feedback[df_feedback['date'].dt.month == m]['feedback_score'].to_numpy() for m in months]

f_stat, p_val = stats.f_oneway(*feedback_groups)
f_stat, p_val


(0.3146823675455494, 0.8147473590881886)

**Interpretation:** With F ≈ 0.31 and p ≈ 0.815, there is no statistically significant difference in mean feedback scores across the four months.
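To make the F statistic less of a black box, the sketch below recomputes it from the between-group and within-group sums of squares and compares the result with `stats.f_oneway`. The function name `one_way_f` and the toy groups are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def one_way_f(groups):
    """One-way ANOVA F statistic and p-value computed from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    k = len(groups)        # number of groups
    n = all_vals.size      # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_val = stats.f.sf(f_stat, k - 1, n - k)
    return f_stat, p_val


toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_manual, p_manual = one_way_f(toy_groups)
f_scipy, p_scipy = stats.f_oneway(*toy_groups)
print(f_manual, f_scipy)
```

A small F, as in the result above, means the group means vary little relative to the spread within each group.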

## 5. Feedback and Sales Correlation
Test whether months with above-median average feedback have higher total sales than months with below-median feedback.

In [21]:

df_feedback['month'] = df_feedback['date'].dt.to_period('M')
df_sales['month'] = df_sales['date'].dt.to_period('M')

avg_feedback = df_feedback.groupby('month')['feedback_score'].mean().reset_index()
monthly_sales = df_sales.groupby('month')['sales'].sum().reset_index()

merged = pd.merge(avg_feedback, monthly_sales, on='month')
median_score = merged['feedback_score'].median()
high = merged[merged['feedback_score'] >= median_score]['sales'].to_numpy()
low = merged[merged['feedback_score'] < median_score]['sales'].to_numpy()

equal_var = np.isclose(np.std(high, ddof=1), np.std(low, ddof=1), rtol=0.1)
t_stat, p_val = stats.ttest_ind(high, low, equal_var=equal_var)
t_stat, p_val


(-1.275671553542551, 0.22835205852512985)

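**Interpretation:** With t ≈ -1.28 and p ≈ 0.228, mean monthly sales do not differ significantly between high-feedback and low-feedback months. The negative t statistic means the high-feedback months had lower mean sales in this sample, but the gap is not statistically significant. Note that a median split compares group means; it does not measure correlation directly.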
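Because the median split discards information about how far each month sits from the median, a direct correlation test may be a useful complement. The sketch below applies `stats.pearsonr` to a small hypothetical frame, `merged_demo`, that mimics the columns of `merged`; the numbers are made up for illustration and are not results from this notebook.

```python
import pandas as pd
from scipy import stats

# Hypothetical monthly table with the same column names as `merged`.
merged_demo = pd.DataFrame({
    "feedback_score": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sales": [2.0, 4.0, 5.0, 4.0, 5.0],
})

# Pearson's r measures linear association; the p-value tests r == 0.
r, p_corr = stats.pearsonr(merged_demo["feedback_score"], merged_demo["sales"])
print(r, p_corr)
```

On the real data, the same call on `merged['feedback_score']` and `merged['sales']` would test the linear relationship without choosing a cutoff.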