## Widget for in-notebook prediction

#### Introduction

In the [previous notebook](https://github.com/YShutko/CI_spotify_track_analysis/blob/84eff468d0f63f960026b5c8bb2dd31972dd798f/notebooks/ml_models.ipynb), three models (Linear Regression, Random Forest and XGBoost) were developed and uploaded to a [Hugging Face repo](https://huggingface.co/YShutko/spotify-popularity-models/tree/main). In this notebook, a widget for song popularity prediction is developed.

---

First, all necessary libraries are imported.

In [7]:
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
import ipywidgets as widgets
from IPython.display import display


Below, one of the models is loaded from the Hugging Face repo. The user can switch to a different model by editing `MODEL_FILE`.

In [None]:
# Constants
REPO = "YShutko/spotify-popularity-models" # Hugging Face repo name

MODEL_FILE = "xgb_model_best.pkl"     # or "linear_regression_model.pkl"; see the repo file list for the Random Forest model

model_path = hf_hub_download(     # Download model from Hugging Face repo
    repo_id=REPO,
    filename=MODEL_FILE,
    token=None  # required only if repo or file is private
)

model = joblib.load(model_path)   # Load the model

print("Model loaded successfully:", MODEL_FILE)


Model loaded successfully: xgb_model_best.pkl


Next, the dataset is loaded to obtain feature ranges and the list of genres for the widget.

In [None]:
df = pd.read_csv("../data/spotify_cleaned_data.csv") # Load dataset to get feature ranges
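
One way to turn observed feature values into slider limits is to round the minimum down and the maximum up to the slider step. The sketch below is only an illustration: `slider_bounds` is a hypothetical helper that is not part of the notebook, and the sample tempo and loudness values are made up rather than taken from the dataset.

```python
import math


def slider_bounds(values, step):
    """Return slider limits that cover `values`, rounded outward to `step`.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    lo = math.floor(min(values) / step) * step
    hi = math.ceil(max(values) / step) * step
    # round() removes floating-point noise when `step` is fractional
    return {"min": round(lo, 10), "max": round(hi, 10), "step": step}


# Made-up sample values, not taken from the dataset
tempo_values = [61.3, 118.0, 174.9]
loudness_values = [-23.7, -11.0, -5.2]

print(slider_bounds(tempo_values, step=1))     # {'min': 61, 'max': 175, 'step': 1}
print(slider_bounds(loudness_values, step=1))  # {'min': -24, 'max': -5, 'step': 1}
```

In the notebook, the values could come from a column such as `df["tempo"]`, and the resulting dictionary could be unpacked into `widgets.FloatSlider(**bounds, value=...)`, assuming the column is numeric and has no missing values.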

Detailed Spotify genres (125+ categories) are grouped into a smaller set of broad macro-genres such as Pop, Rock, Hip-Hop/Rap, Electronic, Jazz/Blues, etc.

Why we add this:
* The dataset contains many very specific, niche genres, which makes one-hot encoding explode into hundreds of sparse columns.
* Grouping genres into macro-genres reduces dimensionality and makes the model more stable.
* Macro-genres capture the general musical category, which strongly influences listener behavior and popularity.
* This makes it easier for the model to learn general patterns like:
    * Pop & K-Pop → tend to be more popular
    * Jazz, Metal → often niche
    * Electronic/Dance → high energy & loudness patterns

Result:
Cleaner inputs, fewer sparse features, and improved model performance and interpretability.

In [None]:
# Macro-genre from track_genre

def map_macro_genre(g):
    g = str(g).lower()
    if "pop" in g:
        return "Pop"
    elif "rock" in g:
        return "Rock"
    elif "hip hop" in g or "rap" in g or "trap" in g:
        return "Hip-Hop/Rap"
    elif "r&b" in g or "soul" in g:
        return "R&B/Soul"
    elif "electro" in g or "techno" in g or "house" in g or "edm" in g or "dance" in g:
        return "Electronic/Dance"
    elif "metal" in g or "hardcore" in g:
        return "Metal/Hardcore"
    elif "jazz" in g or "blues" in g:
        return "Jazz/Blues"
    elif "classical" in g or "orchestra" in g or "piano" in g:
        return "Classical"
    elif "latin" in g or "reggaeton" in g or "sertanejo" in g or "samba" in g:
        return "Latin"
    elif "country" in g:
        return "Country"
    elif "folk" in g or "singer-songwriter" in g:
        return "Folk"
    elif "indie" in g or "alternative" in g:
        return "Indie/Alternative"
    else:
        return "Other"

df["macro_genre"] = df["track_genre"].apply(map_macro_genre)
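
Because the mapping relies on substring checks evaluated in order, it can help to try a few genre strings by hand. The sketch below restates the same rules as an ordered table so it can run on its own. `MACRO_GENRE_RULES` and `map_macro_genre_table` are hypothetical names introduced here, and the sample genre strings are illustrative rather than confirmed dataset values.

```python
# Ordered (keywords, label) rules; the first rule with a matching substring wins.
# Same keywords and order as map_macro_genre above.
MACRO_GENRE_RULES = [
    (("pop",), "Pop"),
    (("rock",), "Rock"),
    (("hip hop", "rap", "trap"), "Hip-Hop/Rap"),
    (("r&b", "soul"), "R&B/Soul"),
    (("electro", "techno", "house", "edm", "dance"), "Electronic/Dance"),
    (("metal", "hardcore"), "Metal/Hardcore"),
    (("jazz", "blues"), "Jazz/Blues"),
    (("classical", "orchestra", "piano"), "Classical"),
    (("latin", "reggaeton", "sertanejo", "samba"), "Latin"),
    (("country",), "Country"),
    (("folk", "singer-songwriter"), "Folk"),
    (("indie", "alternative"), "Indie/Alternative"),
]


def map_macro_genre_table(genre):
    """Table-driven equivalent of map_macro_genre (illustrative sketch)."""
    g = str(genre).lower()
    for keywords, label in MACRO_GENRE_RULES:
        if any(keyword in g for keyword in keywords):
            return label
    return "Other"


# Illustrative genre strings (assumed, not read from the dataset)
for sample in ["k-pop", "pop-rock", "hard-rock", "deep-house", "hip-hop", "ambient"]:
    print(sample, "->", map_macro_genre_table(sample))
```

Two consequences of the rule order are visible here: a string such as "pop-rock" maps to Pop because the "pop" rule comes first, and a hyphenated "hip-hop" falls through to Other because the keyword is "hip hop" with a space. Whether that matters depends on how genres are spelled in the cleaned dataset.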

Next, a prediction function is defined. Features that the widget does not expose are set to fixed default values.

In [None]:
# Prediction function
def predict_popularity(energy, danceability, valence, loudness, tempo, explicit, genre):
    
    sample = pd.DataFrame([{            # Create a sample DataFrame for prediction
        "energy": energy,
        "danceability": danceability,
        "valence": valence,
        "loudness": loudness,
        "tempo": tempo,
        "explicit": explicit,
        "macro_genre": genre,
        "artist_popularity": 50,  # default
        "loudness_danceability": loudness * danceability,
        "energy_valence": energy * valence,
        "instrumentalness": 0,
        "acousticness": 0,
        "liveness": 0,
        "speechiness": 0,
        "duration_min": 3
    }])

    prediction = model.predict(sample)[0]   # Make prediction
    print(f" Predicted Popularity: {prediction:.1f}") # Display prediction
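
The input row combines the widget values with fixed defaults and two interaction terms. The sketch below isolates that assembly step as a plain dictionary, without pandas or the model, so the derived values are easy to inspect. `build_feature_row` is a hypothetical helper that is not part of the notebook.

```python
def build_feature_row(energy, danceability, valence, loudness, tempo,
                      explicit, genre):
    """Assemble one model input row as a dict (hypothetical helper)."""
    return {
        # Values supplied by the user through the widget
        "energy": energy,
        "danceability": danceability,
        "valence": valence,
        "loudness": loudness,
        "tempo": tempo,
        "explicit": explicit,
        "macro_genre": genre,
        # Fixed defaults for features the widget does not expose
        "artist_popularity": 50,
        "instrumentalness": 0,
        "acousticness": 0,
        "liveness": 0,
        "speechiness": 0,
        "duration_min": 3,
        # Interaction terms derived from the inputs
        "loudness_danceability": loudness * danceability,
        "energy_valence": energy * valence,
    }


row = build_feature_row(0.5, 0.5, 0.5, -10, 120, False, "Pop")
print(row["loudness_danceability"])  # -5.0
print(row["energy_valence"])         # 0.25
```

Wrapping the dictionary as `pd.DataFrame([row])` gives a single-row frame like the `sample` built in `predict_popularity`. Because the defaults are constants, the prediction reflects a track with average artist popularity and a three-minute duration, not the properties of any specific song.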


Finally, a user interface is built so that popularity can be predicted for different feature values.

In [12]:
#UI with ipywidgets

widgets.interact(
    predict_popularity,
    energy=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.5),
    danceability=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.5),
    valence=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.5),
    loudness=widgets.FloatSlider(min=-60, max=0, step=1, value=-10),
    tempo=widgets.FloatSlider(min=50, max=200, step=1, value=120),
    explicit=widgets.Dropdown(options=[True, False]),
    genre=widgets.Dropdown(options=sorted(df["macro_genre"].unique()))
)



---

### Conclusion

In this notebook, a widget for interactive popularity predictions was developed.