# 💿 Spotify Tracks
## 🎵 Hypothesis 1: Audio Features vs Popularity

| Field         | Description |
|---------------|-------------|
| Author:       | Robert Steven Elliott |
| Course:       | Code Institute – Data Analytics with AI Bootcamp |
| Project Type: | Hackathon 2 |
| Date:         | December 2025 |

$$
\begin{aligned}
H_{0} &: \text{There is no statistically significant relationship between a track's danceability or energy and its popularity score.} \\
H_{1} &: \text{Tracks with higher danceability and energy have significantly higher popularity scores than tracks with lower values.}
\end{aligned}
$$

### Import Libraries

In [1]:
import sys
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr

### Project Paths

In [2]:
PROJECT_ROOT = Path.cwd().parent
sys.path.append(str(PROJECT_ROOT))
DATA_DIR = PROJECT_ROOT / "data" / "clean"

INPUT_CSV = DATA_DIR / "spotify_clean.csv"


### Load Custom Libraries

In [3]:
from utils.data_processing import load_data
from utils.visualisation import plot_scatter

### Load Dataset

In [4]:
df = load_data(INPUT_CSV)
df.head()

Unnamed: 0,artists,name,popularity,duration_ms,explicit,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,genre,artist_primary
0,sam smith;kim petras,unholy,100,156943,False,0.714,0.472,2,-7.375,1,0.0864,0.013,5e-06,0.266,0.238,131.121,4,pop,sam smith
1,bizarrap;quevedo,"quevedo: bzrp music sessions, vol. 52",99,198937,False,0.621,0.782,2,-5.548,1,0.044,0.0125,0.033,0.23,0.55,128.033,4,hip-hop,bizarrap
2,manuel turizo,la bachata,98,162637,False,0.835,0.679,7,-5.329,0,0.0364,0.583,2e-06,0.218,0.85,124.98,4,reggaeton,manuel turizo
3,david guetta;bebe rexha,i'm good,98,175238,True,0.561,0.965,7,-3.673,0,0.0343,0.00383,7e-06,0.371,0.304,128.04,4,edm,david guetta
4,bad bunny;chencho corleone,me porto bonito,97,178567,True,0.911,0.712,1,-5.105,0,0.0817,0.0901,2.7e-05,0.0933,0.425,92.005,4,reggae,bad bunny


### Take Random Sample

In [5]:
df_sample = df.sample(n=1000, random_state=42)

### Statistical Analysis
#### Pearson Correlation

In [6]:
# Pearson correlations
r_dance, p_dance = pearsonr(df["danceability"], df["popularity"])
r_energy, p_energy = pearsonr(df["energy"], df["popularity"])

print(f"Danceability vs Popularity: r={r_dance:.3f}, p={p_dance:.4f}")
print(f"Energy vs Popularity: r={r_energy:.3f}, p={p_energy:.4f}")

Danceability vs Popularity: r=0.104, p=0.0000
Energy vs Popularity: r=0.009, p=0.0141


##### What this means:

**Danceability**

- The correlation is positive but weak.
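- The very small p-value (printed as 0.0000, i.e. p < 0.0001) indicates the relationship is statistically significant.
- However, the effect size is small (r² ≈ 0.011, so roughly 1% of the variance in popularity is shared with danceability), meaning:
    - Danceability is associated with popularity, but only slightly. A correlation alone does not show that it causes popularity.
    - It is not a strong predictor on its own.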

✅ Statistically significant

⚠️ Practically weak

**Energy**

- The correlation coefficient is close to zero.
- Despite the p-value being below 0.05, the effect size is negligible.
- This suggests:
    - With a large dataset, even tiny relationships can appear significant.
    - Energy alone does not meaningfully explain popularity.

⚠️ Statistically significant but not practically meaningful
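The point about large datasets can be made concrete with a short sketch. It converts the two reported correlations into shared variance (r²) and then estimates the p-value that r = 0.009 would produce at several sample sizes. The helper `approx_p_value` and the sample sizes are illustrative assumptions, not values taken from the dataset, and the helper uses the Fisher z normal approximation rather than the exact test that `pearsonr` performs.

```python
import math

def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson r from n observations.

    Uses the Fisher z-transformation with a normal approximation, not the
    exact test that scipy.stats.pearsonr performs.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Correlation coefficients reported in the output above
reported = {"danceability": 0.104, "energy": 0.009}

for feature, r in reported.items():
    shared = r ** 2  # proportion of variance shared with popularity
    print(f"{feature}: r={r:.3f}, r^2={shared:.6f} ({shared:.3%})")

# Hypothetical sample sizes (assumptions for illustration only)
for n in (1_000, 10_000, 100_000):
    print(f"r=0.009, n={n:>7,}: p~{approx_p_value(0.009, n):.4f}")
```

The same near-zero correlation is far from significant at a thousand rows and crosses the 0.05 threshold only once the row count is very large, which is why the effect size should be read alongside the p-value.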

##### Popularity vs Danceability

In [7]:
fig = plot_scatter(
    df_sample,
    x_col="danceability",
    y_col="popularity",
    title="Danceability vs Popularity with Regression Line",
    xlabel="Danceability",
    ylabel="Popularity",
    trend=True
)
fig.show()

##### Popularity vs Energy

In [8]:
fig = plot_scatter(
    df_sample,
    x_col="energy",
    y_col="popularity",
    title="Energy vs Popularity with Regression Line",
    xlabel="Energy",
    ylabel="Popularity",
    trend=True
)
fig.show()

### 🧪 Hypothesis Decision


Null Hypothesis ($H_{0}$)

There is no statistically significant relationship between danceability or energy and popularity.

#### Decision

- Danceability: ❌ Reject H₀ (weak but significant relationship)
- Energy: ⚠️ Reject H₀ statistically (p = 0.0141), but treat the effect as practically negligible (r ≈ 0)
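The decision logic can be written down as a small rule that checks significance and effect size separately. This is a minimal sketch: the function `classify` and the 0.10 cutoff (Cohen's conventional threshold for a "small" correlation) are assumptions added for illustration, while the 0.05 significance level and the r and p values come from the analysis above.

```python
ALPHA = 0.05       # significance level used in the interpretation above
MIN_EFFECT = 0.10  # assumed cutoff: Cohen's convention for a "small" correlation

def classify(r: float, p: float) -> str:
    """Combine statistical significance and effect size into one verdict."""
    significant = p < ALPHA
    meaningful = abs(r) >= MIN_EFFECT
    if significant and meaningful:
        return "reject H0: significant with at least a small effect"
    if significant:
        return "reject H0 statistically, but the effect is negligible"
    return "fail to reject H0"

# (r, p) pairs as printed by the Pearson cell; p=0.0000 means p < 0.0001
results = {
    "danceability": (0.104, 0.0000),
    "energy": (0.009, 0.0141),
}

for feature, (r, p) in results.items():
    print(f"{feature}: {classify(r, p)}")
```

Keeping the two checks separate makes it explicit that energy passes the significance test while failing the effect-size test.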


### 🧠 Conclusion
Hypothesis 1 is partially supported. Danceability shows a weak but statistically significant positive correlation with popularity (r = 0.104), so more danceable tracks tend to be marginally more popular. Energy shows a negligible correlation (r = 0.009) despite reaching statistical significance, so it is not a meaningful standalone predictor. Neither feature explains popularity on its own. Given the weak effect sizes, popularity is likely driven by a combination of musical, social, and industry factors, such as marketing, artist reputation, and listener trends, rather than by individual audio characteristics.