TikTok Video Popularity Prediction

This notebook is a concise prototype for classifying TikTok videos as popular or not popular. It loads a tabular dataset of TikTok videos, defines a binary target from the like count (a video counts as popular when its likes exceed the dataset median), engineers a hashtag-count feature, trains a Random Forest classifier, and evaluates it on a held-out test set. Comments throughout explain each step.

In [None]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report


In [None]:
# Load dataset
DATA_FILE = 'tiktok_video_performance.csv'  # Update if needed
df = pd.read_csv(DATA_FILE)
print(df.head())
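
The later cells read the columns `Likes`, `Comments`, `Shares`, `Views`, and `Hashtags`, so a quick check right after loading can catch a mismatched file early. The following is a minimal sketch, not part of the original notebook: the helper name `missing_columns` and the small stand-in frame are hypothetical, and the values are made up so the snippet runs without the CSV.

```python
import pandas as pd

# Columns the later cells read; taken from the code in this notebook.
REQUIRED_COLUMNS = ['Likes', 'Comments', 'Shares', 'Views', 'Hashtags']


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]


# Tiny stand-in frame with made-up values (hypothetical data, not the real CSV).
sample = pd.DataFrame({
    'Likes': [10, 20],
    'Views': [100, 200],
    'Hashtags': ['#a #b', None],
})

print(missing_columns(sample))  # ['Comments', 'Shares']
```

On the real data, calling `missing_columns(df)` should return an empty list if the file matches what the remaining cells expect.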


In [None]:
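# Define target variable: a video is labelled popular (1) when its Likes
# are strictly above the median of the whole dataset, otherwise 0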
median_likes = df['Likes'].median()
df['is_popular'] = (df['Likes'] > median_likes).astype(int)
print('Median Likes threshold:', median_likes)
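
A median split is often assumed to give two equal classes, but the strict `>` comparison places every video tied at the median in the negative class. The sketch below illustrates this with made-up like counts (hypothetical values, not taken from the dataset); on the real data the same check would be `df['is_popular'].mean()`.

```python
import pandas as pd

# Made-up like counts with several ties at the median (hypothetical data).
likes = pd.Series([5, 5, 5, 5, 9, 12])

threshold = likes.median()                # middle two values are 5 and 5
labels = (likes > threshold).astype(int)  # ties at the threshold become 0
share_popular = labels.mean()             # fraction of rows labelled 1

print('threshold:', threshold)
print('share labelled popular:', round(share_popular, 3))
```

If the share is far from 0.5 on the real data, accuracy alone may be misleading, and the per-class precision and recall in the classification report deserve a closer look.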


In [None]:
# Feature engineering and model training
# Calculate number of hashtags by splitting the Hashtags column (string) on whitespace
df['num_hashtags'] = df['Hashtags'].fillna('').apply(lambda x: len(str(x).split()))
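# Select features. Note: Comments, Shares, and Views are engagement metrics
# that accumulate together with Likes, so they are not known before a video
# is published. The model therefore measures how well these metrics separate
# popular from unpopular videos rather than forecasting popularity in advance.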
X = df[['Comments', 'Shares', 'Views', 'num_hashtags']].fillna(0)
y = df['is_popular']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the model
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print classification report to evaluate the model
print(classification_report(y_test, y_pred))
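
A classification report is easier to interpret next to a trivial baseline and a view of which features the forest relies on. The following is a minimal sketch under stated assumptions: it uses synthetic data generated with NumPy (not the TikTok dataset), placeholder feature names `f0` to `f3`, and scikit-learn's `DummyClassifier` as the baseline, none of which appear in the original notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 4 numeric features, label driven by the first two.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 4))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

# stratify keeps the class ratio the same in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
forest_acc = accuracy_score(y_te, forest.predict(X_te))
print(f'baseline accuracy: {baseline_acc:.2f}')
print(f'forest accuracy:   {forest_acc:.2f}')

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(['f0', 'f1', 'f2', 'f3'], forest.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f'{name}: {value:.3f}')
```

In the notebook itself, the same pattern would apply to `X_train`, `X_test`, `y_train`, and `y_test`, with `X.columns` supplying the feature names. If one engagement metric dominates the importances, that is a hint the model is mostly reading a quantity that moves together with likes.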
