## Spotify Popularity Predictor (Gradio Demo)

### Introduction

This notebook builds a quick Gradio UI that lets you choose a trained model (Random Forest, tuned XGBoost, or Linear Regression), set key audio features (energy, danceability, valence, loudness, tempo, explicit flag, macro-genre), and get back a predicted Spotify popularity score. Models are downloaded from the [Hugging Face repo](https://huggingface.co/YShutko/spotify-popularity-models/tree/main), and genre options come from the [cleaned dataset](https://github.com/YShutko/CI_spotify_track_analysis/blob/3c87c87abea4c3f4959ce5005da1cfe5ec366491/data/spotify_cleaned_data.csv).

---

First, all necessary libraries are imported.

In [33]:
import pandas as pd
import joblib
import gradio as gr
from huggingface_hub import hf_hub_download

Next, the models are loaded from the Hugging Face repo. The user can switch between them from a dropdown menu in the UI.

In [None]:
REPO = "YShutko/spotify-popularity-models"   # Hugging Face repo ID

# Load models from Hugging Face Hub
model_files = {
    "Random Forest": "random_forest_model.pkl",
    "XGBoost (Tuned)": "xgb_model_best.pkl",
    "Linear Regression": "linear_regression_model.pkl"
}

# Dictionary to hold loaded models
loaded_models = {}

# Download and load each model
for model_name, file_name in model_files.items():
    model_path = hf_hub_download(
        repo_id=REPO,
        filename=file_name,
        token=None   # no token is needed while the repo is public
    )
    loaded_models[model_name] = joblib.load(model_path)

print("All models loaded successfully!")

All models loaded successfully!
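
The loop above stops at the first download or unpickling error. A minimal sketch of a more forgiving loader follows, assuming it is acceptable to start the demo with only the models that loaded. The names `load_models`, `fetch`, and `load` are hypothetical: `fetch` and `load` stand in for `hf_hub_download` and `joblib.load` so the example runs offline.

```python
def load_models(model_files, fetch, load):
    """Load every model that can be loaded; collect failures instead of raising.

    `fetch(filename) -> local path` and `load(path) -> model` are hypothetical
    stand-ins for `hf_hub_download` and `joblib.load`.
    """
    loaded, failed = {}, {}
    for name, filename in model_files.items():
        try:
            loaded[name] = load(fetch(filename))
        except Exception as exc:  # network error, missing file, unpickling error, ...
            failed[name] = f"{type(exc).__name__}: {exc}"
    return loaded, failed


# Offline demonstration with fake callables (no network access needed).
def fake_fetch(filename):
    if filename == "missing.pkl":
        raise FileNotFoundError(filename)
    return f"/tmp/cache/{filename}"


def fake_load(path):
    return {"loaded_from": path}


demo_files = {"Random Forest": "random_forest_model.pkl", "Broken": "missing.pkl"}
loaded_demo, failed_demo = load_models(demo_files, fake_fetch, fake_load)
print("loaded:", sorted(loaded_demo))
print("failed:", failed_demo)
```

In the notebook itself, the real callables could be passed in, for example `lambda f: hf_hub_download(repo_id=REPO, filename=f)` and `joblib.load`, and the dropdown could then be built from the keys of the returned `loaded` dictionary.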


Load the cleaned dataset to get the genre options for the UI.

In [None]:
df = pd.read_csv("../data/spotify_cleaned_data.csv") # Load cleaned dataset to get genre options


Detailed Spotify genres (125+ categories) are grouped into a smaller set of broad macro-genres such as Pop, Rock, Hip-Hop/Rap, Electronic, Jazz/Blues, etc.

Why we add this:
* The dataset contains many very specific, niche genres, which makes one-hot encoding explode into hundreds of sparse columns.
* Grouping genres into macro-genres reduces dimensionality and makes the model more stable.
* Macro-genres capture the general musical category, which strongly influences listener behavior and popularity.
* This makes it easier for the model to learn general patterns like:
    * Pop & K-Pop → tend to be more popular
    * Jazz, Metal → often niche
    * Electronic/Dance → high energy & loudness patterns

Result:
Cleaner inputs, fewer sparse features, and improved model performance and interpretability.

In [36]:
# Macro-genre from track_genre
def map_macro_genre(g):
    g = str(g).lower()
    if "pop" in g:
        return "Pop"
    elif "rock" in g:
        return "Rock"
    elif "hip hop" in g or "rap" in g or "trap" in g:
        return "Hip-Hop/Rap"
    elif "r&b" in g or "soul" in g:
        return "R&B/Soul"
    elif "electro" in g or "techno" in g or "house" in g or "edm" in g or "dance" in g:
        return "Electronic/Dance"
    elif "metal" in g or "hardcore" in g:
        return "Metal/Hardcore"
    elif "jazz" in g or "blues" in g:
        return "Jazz/Blues"
    elif "classical" in g or "orchestra" in g or "piano" in g:
        return "Classical"
    elif "latin" in g or "reggaeton" in g or "sertanejo" in g or "samba" in g:
        return "Latin"
    elif "country" in g:
        return "Country"
    elif "folk" in g or "singer-songwriter" in g:
        return "Folk"
    elif "indie" in g or "alternative" in g:
        return "Indie/Alternative"
    else:
        return "Other"

df["macro_genre"] = df["track_genre"].apply(map_macro_genre)
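
Because the mapping relies on substring checks in a fixed order, the first matching rule wins. The sketch below restates the same rules as an ordered table so the behaviour can be checked on a few genre strings. The table-driven form and the name `map_macro_genre_table` are illustrative, not the function used to train the models, and the sample strings are made up for the demonstration rather than taken from the dataset.

```python
# Ordered (keywords, label) rules mirroring the if/elif chain; first match wins.
MACRO_GENRE_RULES = [
    (("pop",), "Pop"),
    (("rock",), "Rock"),
    (("hip hop", "rap", "trap"), "Hip-Hop/Rap"),
    (("r&b", "soul"), "R&B/Soul"),
    (("electro", "techno", "house", "edm", "dance"), "Electronic/Dance"),
    (("metal", "hardcore"), "Metal/Hardcore"),
    (("jazz", "blues"), "Jazz/Blues"),
    (("classical", "orchestra", "piano"), "Classical"),
    (("latin", "reggaeton", "sertanejo", "samba"), "Latin"),
    (("country",), "Country"),
    (("folk", "singer-songwriter"), "Folk"),
    (("indie", "alternative"), "Indie/Alternative"),
]


def map_macro_genre_table(genre):
    """Return the label of the first rule with a keyword contained in the genre string."""
    g = str(genre).lower()
    for keywords, label in MACRO_GENRE_RULES:
        if any(keyword in g for keyword in keywords):
            return label
    return "Other"


# Illustrative inputs only.
samples = ["k-pop", "pop-rock", "indie-pop", "hip-hop", "hip hop", "death-metal", "ambient"]
results = {s: map_macro_genre_table(s) for s in samples}
for genre, label in results.items():
    print(f"{genre:12s} -> {label}")
```

Two things stand out when running this. Compound names such as `pop-rock` or `indie-pop` go to Pop because that rule is checked first. A hyphenated `hip-hop` does not contain the substring `hip hop`, so it falls through to Other. If the dataset uses the hyphenated spelling, it may be worth checking how those tracks were labelled, keeping in mind that the saved models were trained with the mapping as written.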

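Next, a prediction function builds a single-row input from the UI values and returns the selected model's prediction. Features that the UI does not expose (artist popularity, instrumentalness, acousticness, liveness, speechiness, duration) are held at fixed placeholder values.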

In [37]:
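def predict_popularity_api(model_name, energy, danceability, valence, loudness, tempo, explicit, genre):
    """Predict popularity for one track with the model chosen in the UI."""
    # Build a single-row DataFrame from the UI inputs.
    sample = pd.DataFrame([{
        "energy": energy,
        "danceability": danceability,
        "valence": valence,
        "loudness": loudness,
        "tempo": tempo,
        "explicit": explicit,
        "macro_genre": genre,
        # Not exposed in the UI: fixed placeholder value.
        "artist_popularity": 50,
        # Interaction terms derived from the UI inputs.
        "loudness_danceability": loudness * danceability,
        "energy_valence": energy * valence,
        # Not exposed in the UI: fixed placeholder values.
        "instrumentalness": 0,
        "acousticness": 0,
        "liveness": 0,
        "speechiness": 0,
        "duration_min": 3
    }])
    # Look up the model selected in the dropdown.
    model = loaded_models[model_name]

    prediction = model.predict(sample)[0]
    return round(float(prediction), 2)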
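
To make the split between user inputs, derived interaction terms, and fixed placeholders easier to inspect, the row can be assembled as a plain dictionary first. This is only a sketch: `build_feature_row` and `FIXED_DEFAULTS` are hypothetical names, the constants are copied from the function above, and the input values are arbitrary.

```python
# Placeholder values for features the UI does not expose (copied from the function above).
FIXED_DEFAULTS = {
    "artist_popularity": 50,
    "instrumentalness": 0,
    "acousticness": 0,
    "liveness": 0,
    "speechiness": 0,
    "duration_min": 3,
}


def build_feature_row(energy, danceability, valence, loudness, tempo, explicit, genre):
    """Return one model-input row as a plain dict (hypothetical helper)."""
    row = {
        "energy": energy,
        "danceability": danceability,
        "valence": valence,
        "loudness": loudness,
        "tempo": tempo,
        "explicit": explicit,
        "macro_genre": genre,
        # Interaction terms are computed, never entered by the user.
        "loudness_danceability": loudness * danceability,
        "energy_valence": energy * valence,
    }
    row.update(FIXED_DEFAULTS)
    return row


# Arbitrary example values.
row = build_feature_row(0.8, 0.5, 0.5, -6, 120, False, "Pop")
for name, value in row.items():
    print(f"{name:24s} {value}")
```

Wrapping the result as `pd.DataFrame([row])` would give a single-row frame like the one built in the function above. Because every prediction assumes an artist popularity of 50 and a three-minute track, the returned scores are best read as relative comparisons between slider settings.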


The Gradio interface wires these pieces together: a dropdown to pick which saved model to run, sliders and dropdowns for the key features (energy, danceability, valence, loudness, tempo, explicit flag, macro-genre), and a single numeric output showing the predicted popularity. It launches with a title and description, and requests a shareable link via `share=True`.

In [38]:
interface = gr.Interface(
    fn=predict_popularity_api,
    inputs=[
        gr.Dropdown(list(model_files.keys()), label="Choose Model"),
        gr.Slider(0, 1, step=0.01, label="Energy"),
        gr.Slider(0, 1, step=0.01, label="Danceability"),
        gr.Slider(0, 1, step=0.01, label="Valence"),
        gr.Slider(-60, 0, step=1, label="Loudness"),
        gr.Slider(50, 200, step=1, label="Tempo"),
        gr.Dropdown([True, False], label="Explicit"),
        gr.Dropdown(sorted(df["macro_genre"].unique()), label="Macro Genre"),
    ],
    outputs="number",
    title="Spotify Popularity Predictor (Multiple Models)",
    description="Select a model and adjust the sliders to predict track popularity."
)

interface.launch(share=True)

* Running on local URL:  http://127.0.0.1:7865

Could not create share link. Missing file: C:\Users\julia\.cache\huggingface\gradio\frpc\frpc_windows_arm64_v0.3. 

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps: 

1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.3/frpc_windows_arm64.exe
2. Rename the downloaded file to: frpc_windows_arm64_v0.3
3. Move the file to this location: C:\Users\julia\.cache\huggingface\gradio\frpc
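
The sliders in the interface definition constrain values in the browser, but a direct Python call to `predict_popularity_api` is not constrained by them. A minimal validation sketch follows, assuming the slider bounds are the intended valid ranges. The names `FEATURE_RANGES` and `validate_inputs` are hypothetical, and the genre set in the example is a made-up subset.

```python
# Allowed ranges copied from the slider definitions in the interface.
FEATURE_RANGES = {
    "energy": (0.0, 1.0),
    "danceability": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "loudness": (-60.0, 0.0),
    "tempo": (50.0, 200.0),
}


def validate_inputs(values, allowed_genres):
    """Return a list of readable problems; an empty list means the input is valid."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = values.get(name)
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} is outside [{low}, {high}]")
    genre = values.get("genre")
    if genre not in allowed_genres:
        problems.append(f"genre: {genre!r} is not a known macro-genre")
    return problems


# Made-up subset of macro-genres for the demonstration.
genres = {"Pop", "Rock", "Other"}
good = {"energy": 0.8, "danceability": 0.6, "valence": 0.5,
        "loudness": -6, "tempo": 120, "genre": "Pop"}
bad = {"energy": 1.4, "danceability": 0.6, "valence": 0.5,
       "loudness": -6, "tempo": "fast", "genre": "Polka"}

good_problems = validate_inputs(good, genres)
bad_problems = validate_inputs(bad, genres)
print(good_problems)
for problem in bad_problems:
    print(problem)
```

A check like this could run at the top of the prediction function, which could then raise an error with the collected messages instead of passing out-of-range values to the model.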




---

### Conclusion

- Gradio wraps the trained models (Random Forest, tuned XGBoost, Linear Regression) so stakeholders can test popularity predictions without a notebook.
- The tree-based models (Random Forest, tuned XGBoost) are the strongest; before sharing the demo, check that the dropdown choice actually selects the matching loaded model.
- Use this demo for quick feedback loops, then add input validation/logging if you plan 