# Hypothesis Testing

## Step 0: Load Data

```python
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Load the cleaned data
df = pd.read_feather('../data/spotify_2000_2020.feather')

# Add derived columns
# is_hit: 1 if popularity is above 80, else 0
df['is_hit'] = (df['popularity'] > 80).astype(int)

# high_danceability: 1 if danceability is strictly above the 75th percentile (top quartile), else 0
dance_threshold = df['danceability'].quantile(0.75)
df['high_danceability'] = (df['danceability'] > dance_threshold).astype(int)
```
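Because the split uses a strict `>` against `quantile(0.75)`, songs that tie exactly at the threshold fall into the lower group, so the "high" group can hold somewhat less than 25% of rows. A minimal sketch on hypothetical toy values (not the Spotify data) where several songs tie at the cutoff:

```python
import pandas as pd

# Hypothetical toy data (not the Spotify dataset): three songs tie at 0.70
toy = pd.DataFrame({"danceability": [0.30, 0.50, 0.50, 0.60, 0.70, 0.70, 0.70, 0.90]})

# Same rule as above: strictly greater than the 75th percentile
threshold = toy["danceability"].quantile(0.75)
toy["high_danceability"] = (toy["danceability"] > threshold).astype(int)

print(f"threshold = {threshold:.2f}")
print(f"share in high group = {toy['high_danceability'].mean():.3f}")
```

Here the threshold lands exactly on the tied value 0.70, so only one of the eight songs (12.5%) is flagged as high danceability.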

## Step 1: Define Hypothesis
### Hypothesis: Does High Danceability Increase Hit Probability?

We want to test whether songs with high danceability (top quartile) are significantly more likely to be hits (popularity above 80).

- **Null Hypothesis (H₀):**  
  P(hit | high danceability) = P(hit | not high danceability)

- **Alternative Hypothesis (H₁):**  
  P(hit | high danceability) > P(hit | not high danceability)

We'll conduct a **one-tailed two-proportion z-test** at the 0.05 significance level to compare the hit rates between the two groups.
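For reference, the textbook pooled form of the statistic is below, with $x_A, x_B$ the hit counts and $n_A, n_B$ the group sizes (notation introduced here for illustration). Under H₀ both groups share one hit rate, estimated by the pooled proportion $\hat p$; at the 0.05 level the one-tailed test rejects H₀ when $z > 1.645$.

$$
\hat p_A = \frac{x_A}{n_A}, \qquad \hat p_B = \frac{x_B}{n_B}, \qquad \hat p = \frac{x_A + x_B}{n_A + n_B}
$$

$$
z = \frac{\hat p_A - \hat p_B}{\sqrt{\hat p\,(1 - \hat p)\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}
$$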


## Step 2: Compute Group Proportions + Counts

```python
# Group A: High Danceability
group_a = df[df['high_danceability'] == 1]
hits_a = group_a['is_hit'].sum()
n_a = group_a.shape[0]

# Group B: Not High Danceability
group_b = df[df['high_danceability'] == 0]
hits_b = group_b['is_hit'].sum()
n_b = group_b.shape[0]

# Compute hit proportions and print them
p_a = hits_a / n_a
p_b = hits_b / n_b

print(f"Group A (High Danceability): {hits_a}/{n_a} → p = {p_a:.4f}")
print(f"Group B (Not High Danceability): {hits_b}/{n_b} → p = {p_b:.4f}")
```

```
Group A (High Danceability): 224/10333 → p = 0.0217
Group B (Not High Danceability): 211/31323 → p = 0.0067
```
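The same hits, group sizes, and hit rates can also be obtained in a single `groupby` call. A minimal sketch on a tiny hypothetical frame (not the real data) that has the two 0/1 columns defined in Step 0:

```python
import pandas as pd

# Hypothetical toy frame with the two derived 0/1 columns
toy = pd.DataFrame({
    "high_danceability": [1, 1, 1, 0, 0, 0, 0, 0],
    "is_hit":            [1, 0, 0, 1, 0, 0, 0, 0],
})

# Per group: sum = number of hits, count = number of songs, mean = hit rate
summary = toy.groupby("high_danceability")["is_hit"].agg(["sum", "count", "mean"])
print(summary)
```

On the full `df`, the row for `high_danceability == 1` corresponds to Group A and the row for `0` to Group B.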


### Hit Rate Comparison Between Groups

- Group A (High Danceability): 224 hits out of 10,333 songs → p = 0.0217

- Group B (Not High Danceability): 211 hits out of 31,323 songs → p = 0.0067

- High-danceability songs have a hit rate roughly three times that of the other songs (about 2.2% vs. 0.7%). We now need to test whether this difference is statistically significant.
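The gap can be expressed in both absolute and relative terms directly from the counts. A minimal sketch; the counts are hard-coded from the printed output so the snippet runs on its own:

```python
# Counts reported in Step 2
hits_a, n_a = 224, 10_333   # Group A: high danceability
hits_b, n_b = 211, 31_323   # Group B: not high danceability

p_a = hits_a / n_a
p_b = hits_b / n_b

diff = p_a - p_b    # absolute difference in hit rates
ratio = p_a / p_b   # relative hit rate (risk ratio)

print(f"difference = {diff:.4f} ({diff * 100:.2f} percentage points)")
print(f"ratio      = {ratio:.2f}")
```

The absolute gap is small (about 1.5 percentage points) even though the relative gap is large, which is typical when the baseline rate is low.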

## Step 3: Run the Two-Proportion Z-Test
### When to Use a Two-Proportion Z-Test
Use it when:

- You have two independent groups

- You're comparing the proportion of successes (in our case, `is_hit == 1`) between them

- The sample sizes are large enough for the normal approximation to hold (a common rule of thumb is at least about 10 successes and 10 failures in each group)
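The sample-size condition can be checked mechanically. A minimal sketch; `normal_approx_ok` is a hypothetical helper, and the minimum of 10 is a common rule of thumb (some texts use 5), not a hard requirement:

```python
def normal_approx_ok(hits: int, n: int, minimum: int = 10) -> bool:
    """Rule of thumb: both successes and failures should reach `minimum`."""
    failures = n - hits
    return hits >= minimum and failures >= minimum


# Counts reported in Step 2
groups = {
    "A (high danceability)": (224, 10_333),
    "B (not high danceability)": (211, 31_323),
}

for name, (hits, n) in groups.items():
    print(f"Group {name}: successes={hits}, failures={n - hits}, ok={normal_approx_ok(hits, n)}")
```

Both groups clear the threshold by a wide margin, so the normal approximation is reasonable here.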

```python
from statsmodels.stats.proportion import proportions_ztest

# Success counts and sample sizes, ordered [group A, group B]
successes = np.array([hits_a, hits_b])
samples = np.array([n_a, n_b])

# One-tailed z-test: alternative='larger' tests H1: p_A > p_B
stat, pval = proportions_ztest(count=successes, nobs=samples, alternative='larger')

print(f"Z-statistic: {stat:.4f}")
print(f"p-value: {pval:.4f}")
```

```
Z-statistic: 12.9564
p-value: 0.0000
```
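As a cross-check, the statistic can be computed by hand from the counts. A minimal sketch assuming the pooled-variance formula, which to our understanding is what `proportions_ztest` uses by default for two samples. `norm.sf` gives the upper-tail area, and printing in scientific notation shows how small the p-value actually is, since the `:.4f` format rounds it to 0.0000:

```python
import numpy as np
from scipy.stats import norm

# Counts reported in Step 2
hits_a, n_a = 224, 10_333
hits_b, n_b = 211, 31_323

p_a, p_b = hits_a / n_a, hits_b / n_b
p_pool = (hits_a + hits_b) / (n_a + n_b)  # pooled hit rate under H0

# Standard error under H0 (both groups share p_pool)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

# One-tailed p-value for H1: p_A > p_B (upper tail of the standard normal)
p_one_tailed = norm.sf(z)

print(f"z = {z:.4f}")
print(f"one-tailed p = {p_one_tailed:.2e}")
```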


### Hypothesis Test Result: Two-Proportion Z-Test
Result:

- Z-statistic: 12.9564

- p-value: < 0.0001 (printed as 0.0000 after rounding to four decimal places)

Interpretation:
Since the p-value is far below 0.05, we reject the null hypothesis.
There is strong statistical evidence that songs with high danceability are more likely to be hits than songs outside the top danceability quartile.
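A p-value this small says the difference is very unlikely to be zero, but not how large it is. A minimal sketch of a 95% Wald confidence interval for the difference in hit rates, using the Step 2 counts; the Wald interval is one common choice assumed here, not part of the original analysis:

```python
import numpy as np
from scipy.stats import norm

# Counts reported in Step 2
hits_a, n_a = 224, 10_333
hits_b, n_b = 211, 31_323

p_a, p_b = hits_a / n_a, hits_b / n_b
diff = p_a - p_b

# Unpooled standard error: each group keeps its own estimated rate
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_crit = norm.ppf(0.975)  # two-sided 95% interval, about 1.96

lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"difference = {diff:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval stays well above zero, consistent with the test result, and puts the gap at roughly 1.2 to 1.8 percentage points.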

This supports a positive association between danceability and song success. Because the comparison is observational, it does not by itself show that danceability causes hits; that question motivates the causal inference or Bayesian analysis planned for later steps.