Here is an example of how to use scikit-learn to train a random forest regressor that predicts a song's popularity, measured here by its Spotify stream count (the 'Stream' column), from its other features.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Load the dataset
df = pd.read_csv('Spotify_Youtube.csv')
df = df.dropna()

# Columns that are not used as input features (includes the target, 'Stream')
drop_cols = ['Track', 'Artist', 'Url_spotify', 'Album', 'Album_type', 'Uri',
             'Stream', 'Url_youtube', 'Title', 'Channel', 'Views', 'Likes',
             'Comments', 'Description', 'Licensed', 'official_video']
X = df.drop(drop_cols, axis=1)
y = df['Stream']

# Split the dataset into training and testing sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest regressor
rf = RandomForestRegressor(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

# Evaluate the model on the testing set
y_pred = rf.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('MSE:', mse)

MSE: 4.231945800074944e+16


In this example, we use the RandomForestRegressor class from scikit-learn to train a random forest model with 100 trees (n_estimators=100). We then evaluate the model on the held-out testing set, which is 20% of the rows, using mean squared error (MSE) as the performance metric.
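
Because MSE is expressed in squared units of the target, the value printed above is hard to interpret directly. As a minimal sketch, the square root of the MSE gives the root mean squared error (RMSE), which is in the same units as 'Stream'. The `mse` value is copied from the output above, and the name `rmse` is introduced here only for illustration.

```python
import math

# MSE printed by the evaluation cell above (in squared stream counts)
mse = 4.231945800074944e+16

# The square root brings the error back to plain stream counts
rmse = math.sqrt(mse)
print(f'RMSE: {rmse:.3e}')
```

This works out to roughly 2.06 × 10^8 streams. Whether that is large or small depends on the spread of 'Stream' in the dataset, which is not shown here.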

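Note that we drop the text and identifier columns (such as 'Track', 'Artist', and 'Description') from the dataset, as they cannot be used directly as input features to the model. The drop list also removes the YouTube columns 'Views', 'Likes', 'Comments', 'Licensed', and 'official_video', so they are not used as features either. The 'Stream' column is the target variable we are trying to predict, which is why it is removed from the features and passed separately.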
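
As an alternative to listing every dropped column by hand, the numeric columns can be selected by dtype. The sketch below is only an illustration on a tiny made-up DataFrame: the rows and the names `toy`, `exclude`, and `X_toy` are hypothetical, and this is not how the cell above builds its features.

```python
import pandas as pd

# Tiny made-up frame standing in for the real dataset (hypothetical values)
toy = pd.DataFrame({
    'Track': ['a', 'b', 'c'],
    'Danceability': [0.5, 0.7, 0.9],
    'Energy': [0.4, 0.6, 0.8],
    'Views': [100, 200, 300],
    'Licensed': [True, False, True],
    'Stream': [1000, 2000, 3000],
})

# Keep numeric columns only; text and boolean columns are left out
numeric = toy.select_dtypes(include='number')

# Remove the target and any numeric columns we choose not to use as features
exclude = ['Stream', 'Views']
X_toy = numeric.drop(columns=exclude)
y_toy = toy['Stream']

print(list(X_toy.columns))
```

With this approach the exclusion list only needs to name the target and any numeric columns that should be kept out of the model.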

In [2]:
import joblib

# Save the model to disk
filename = 'predict_music.sav'
joblib.dump(rf, filename)


['predict_music.sav']
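
A saved model can be restored later with joblib.load. The following is a minimal round-trip sketch. It assumes a toy model trained on made-up data in a temporary directory, not the Spotify model above, and the names `toy_model.sav`, `model`, and `restored` are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the trained model (made-up data, not the Spotify dataset)
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0.0, 10.0, 20.0, 30.0]
model = RandomForestRegressor(n_estimators=10, random_state=1).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_model.sav')
    joblib.dump(model, path)       # save the fitted model to disk
    restored = joblib.load(path)   # load it back into memory

# The restored model should reproduce the original predictions
same = list(model.predict(X_toy)) == list(restored.predict(X_toy))
print('Predictions match:', same)
```

In the notebook above, the equivalent call would be joblib.load('predict_music.sav'). A pickled model should be loaded with a scikit-learn version compatible with the one that saved it, and only from a source you trust.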