# TRAIN AND SAVE THE MODEL

### Model Architecture

**Pre-trained Vision Model + Regressor**

**Pre-trained Vision Model:**

- CLIP
- ResNet50
- …

**Regressor:**

- XGBoost
- LightGBM
- MLP
- …

The vision model produces an embedding for each image. The regressor takes that embedding as input and outputs a score.

Only the regressor head is trained; the pre-trained vision model stays frozen.

## ✅ Why CLIP ResNet-50 + LightGBM?

### 🔍 CLIP ResNet-50 (RN50)
- **Pretrained** on massive datasets → captures general visual features well
- **ResNet backbone** is lighter and faster than ViT → ideal for CPU/MacBook M3
- No fine-tuning needed → use as a frozen image encoder
- Extracts 1024D embeddings → perfect for downstream ML models

### ⚡ LightGBM Regressor
- Extremely **fast and efficient on CPU**
- Handles **small to medium datasets** well
- Works great with **numerical features** like image embeddings
- Easy to **train, save, and interpret**

### 🧩 Architecture Summary
YouTube Thumbnail → CLIP-RN50 → 1024D Embedding → LightGBM → Performance Score


✅ Lightweight, fast, and effective pipeline for thumbnail scoring on local machines.


# Create Dataset

#### The thumbnails and their metadata are already stored. We compute an embedding for each image and label it with a score function to build the training set for the regressor.

### Load CLIP ResNet50 Model

In [1]:
import open_clip
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms('RN50', pretrained='openai')

model.eval()
model.to(device)



CLIP(
  (visual): ModifiedResNet(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act1): ReLU(inplace=True)
    (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act2): ReLU(inplace=True)
    (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act3): ReLU(inplace=True)
    (avgpool): AvgPool2d(kernel_size=2, stride=2, padding=0)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act1): ReLU(inplace=True)
        (

### Load Metadata

#### We need a scoring function to label thumbnails using their metadata

In [2]:
import numpy as np
import pandas as pd
from pandas import DataFrame

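def score_thumbnails(metadata: DataFrame) -> DataFrame:
    """Label each thumbnail with log(view_count / average_view_count).

    A score of 0 means the video matched its average_view_count baseline;
    positive scores are above it, negative below. Adds the column in place.
    """
    # Note: a view_count of 0 yields -inf here
    metadata['score'] = np.log(metadata['view_count'] / metadata['average_view_count'])

    return metadata.sort_values(by='score', ascending=False).reset_index(drop=True)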
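
The scoring rule can be checked on a few hand-made rows. The sketch below is a variant of the function above, not the notebook's own code: the name `score_thumbnails_safe` and the toy values are made up for illustration, and dropping rows with non-positive counts is an assumption about how one might avoid `-inf` scores.

```python
import numpy as np
import pandas as pd


def score_thumbnails_safe(metadata: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant: score only rows where the log ratio is finite."""
    # Keep rows whose counts are strictly positive so np.log never sees 0
    valid = (metadata["view_count"] > 0) & (metadata["average_view_count"] > 0)
    scored = metadata.loc[valid].copy()  # copy so the caller's frame is untouched
    scored["score"] = np.log(scored["view_count"] / scored["average_view_count"])
    return scored.sort_values(by="score", ascending=False).reset_index(drop=True)


# Toy metadata (illustrative values only)
toy = pd.DataFrame({
    "video_id": ["a", "b", "c", "d"],
    "view_count": [20000, 5000, 0, 10000],
    "average_view_count": [10000, 10000, 10000, 10000],
})

scored = score_thumbnails_safe(toy)
# Row "a" has twice the baseline (score log 2), "d" matches it (score 0),
# "b" has half of it (score -log 2), and "c" is dropped for having 0 views.
print(scored[["video_id", "score"]])
```

Because the score is a log ratio, doubling and halving the baseline give scores of equal size and opposite sign, which keeps the label distribution roughly symmetric around 0.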


In [3]:
# Load metadata
df = pd.read_csv("data/raw/total.csv")

# Score thumbnails and sort them by score (highest first)
df = score_thumbnails(df)

# Save the scored metadata
df.to_csv("data/raw/scored_metadata.csv", index=False)


### 📊 Rescale Scores Based on Percentiles
To make the log-based scores more intuitive and comparable, we map the score column onto a new scale between 0.5 and 10 using percentile bins. Each bin covers an equal share of the thumbnails and is assigned one value on the new scale.


In [4]:
# Load CSV
df = pd.read_csv("data/raw/scored_metadata.csv")

# Define number of bins (e.g., 20 bins between 0.5 and 10)
n_bins = 20
bin_edges = np.percentile(df['score'], np.linspace(0, 100, n_bins + 1))
bin_labels = np.round(np.linspace(0.5, 10, n_bins), 2)

# Digitize score into bins
df['scaled_score'] = pd.cut(df['score'], bins=bin_edges, labels=bin_labels, include_lowest=True)

# pd.cut returns a categorical column; convert it to float so it can be used as a regression target
df['scaled_score'] = df['scaled_score'].astype(float)

# Save updated CSV
df.to_csv("data/raw/scored_metadata_scaled.csv", index=False)

print("Updated scores saved to data/raw/scored_metadata_scaled.csv")

Updated scores saved to data/raw/scored_metadata_scaled.csv
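
The percentile mapping can be reproduced with NumPy alone, which makes the bin logic easier to inspect. This is a minimal sketch under stated assumptions: `percentile_rescale` is a hypothetical helper, the toy scores are made up, and `np.searchsorted` on the interior edges is used here as a stand-in for the right-closed intervals that `pd.cut` builds.

```python
import numpy as np


def percentile_rescale(scores, n_bins=20, low=0.5, high=10.0):
    """Map each score to the label of its percentile bin (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    # n_bins + 1 edges at evenly spaced percentiles, as in the cell above
    edges = np.percentile(scores, np.linspace(0, 100, n_bins + 1))
    labels = np.round(np.linspace(low, high, n_bins), 2)
    # Count the interior edges strictly below each score; a score equal to an
    # edge lands in the lower bin, matching right-closed intervals
    idx = np.searchsorted(edges[1:-1], scores, side="left")
    return labels[idx]


# Toy scores: 100 distinct values split into 4 bins labelled 1, 2, 3, 4
toy_scores = np.arange(1, 101)
scaled = percentile_rescale(toy_scores, n_bins=4, low=1.0, high=4.0)
values, counts = np.unique(scaled, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

Unlike `pd.cut`, this version does not raise when many identical scores produce duplicate percentile edges; some bins simply stay empty. Whether that behavior is preferable depends on the data.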


### Extract Image Embeddings

In [5]:
import os
from PIL import Image
from tqdm import tqdm

# Define the folder containing thumbnail images
image_folder = "src/data_scraping/thumbnails"

# Parameters
batch_size = 32
image_tensors = []
scores = []
X_train = []

# Preprocess every thumbnail found on disk.
# A score is appended only when its image loads, so tensors and scores stay aligned.
for _, row in df.iterrows():
    image_path = os.path.join(image_folder, f"{row['video_id']}.jpg")
    try:
        image = Image.open(image_path).convert("RGB")
        image_tensor = preprocess(image)
        image_tensors.append(image_tensor)
        scores.append(row['scaled_score'])
    except Exception as e:
        # Missing or unreadable file: report it and move on
        print(f"Skipping {row['video_id']}: {e}")

# Batch processing
for i in tqdm(range(0, len(image_tensors), batch_size)):
    batch = image_tensors[i:i+batch_size]
    batch_tensor = torch.stack(batch).to(device)

    with torch.no_grad():
        batch_embeddings = model.encode_image(batch_tensor).cpu().numpy()

    X_train.extend(batch_embeddings)

# Convert to final arrays (one score per embedded image)
X_train = np.array(X_train)
y_train = np.array(scores)

print("✅ Batch processing complete")
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)

Skipping Qx18v00TVqI: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/Qx18v00TVqI.jpg'
Skipping pk9924dUdz4: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/pk9924dUdz4.jpg'
Skipping RP7dF9vXqZY: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/RP7dF9vXqZY.jpg'
Skipping 7tTzSeWGY90: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/7tTzSeWGY90.jpg'
Skipping behMZUjf_Rw: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/behMZUjf_Rw.jpg'
Skipping K_ovEr89zyg: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/K_ovEr89zyg.jpg'
Skipping VnupfBL9DJ8: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/VnupfBL9DJ8.jpg'
Skipping sn7_X2zjfLA: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/sn7_X2zjfLA.jpg'
Skipping wPTOq3opOys: [Errno 2] No such file or directory: 'src/data_scraping/thumbnails/wPTOq3opOys.jpg'
Skipping _tPb-ZrMxw0: [Errno 2] No such file o

100%|██████████| 94/94 [00:25<00:00,  3.63it/s]

✅ Batch processing complete
X_train shape: (2996, 1024)
y_train shape: (2996,)
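
The cell above preprocesses every image before encoding any of them, so all tensors sit in memory at once (a 3×224×224 float32 tensor is about 0.6 MB, so close to 3,000 of them take well over 1 GB). One possible alternative is to batch the image paths and load each batch only when it is encoded. The sketch below shows just the batching step; `iter_batches` and the placeholder paths are hypothetical, and the commented lines show how the notebook's `preprocess`, `model`, and `device` might plug in.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder paths standing in for the thumbnails that exist on disk
paths = [f"thumb_{i}.jpg" for i in range(5)]
batches = list(iter_batches(paths, batch_size=2))
print(batches)

# With the notebook's objects, each batch could be handled roughly like this:
#   tensors = [preprocess(Image.open(p).convert("RGB")) for p in batch]
#   with torch.no_grad():
#       emb = model.encode_image(torch.stack(tensors).to(device)).cpu().numpy()
```

With 2,996 items and a batch size of 32 this yields 94 batches, the same count the progress bar reports.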





### Train LightGBM

In [6]:
import lightgbm as lgb
import joblib

lgb_model = lgb.LGBMRegressor()

In [7]:

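# Fit the regressor on the CLIP embeddings, using the scaled scores as targets
lgb_model.fit(X_train, y_train)
# Save the trained model (joblib pickles the scikit-learn style estimator)
joblib.dump(lgb_model, "model.lgb")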

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009208 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 261120
[LightGBM] [Info] Number of data points in the train set: 2996, number of used features: 1024
[LightGBM] [Info] Start training from score 5.252003


['model.lgb']
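
The model above is fit on every available sample, so nothing in the notebook shows how well it generalizes. A simple check, sketched here as an assumption rather than as part of the original workflow, is to hold out a fraction of the rows and compare the model's error with a baseline that always predicts the training mean. `holdout_split` and `rmse` are hypothetical helpers, and the labels are stand-ins.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) from a shuffled index range."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Stand-in labels on the 0.5 to 10 scale (the notebook would use y_train)
y = np.linspace(0.5, 10, 20)
train_idx, test_idx = holdout_split(len(y), test_fraction=0.25, seed=0)

# Baseline: predict the mean of the training labels for every held-out row
baseline_pred = np.full(len(test_idx), y[train_idx].mean())
baseline_rmse = rmse(y[test_idx], baseline_pred)
print("baseline RMSE:", baseline_rmse)

# With the notebook's arrays, the model's error could be computed roughly as:
#   lgb_model.fit(X_train[train_idx], y_train[train_idx])
#   model_rmse = rmse(y_train[test_idx], lgb_model.predict(X_train[test_idx]))
```

A regressor that does not beat the mean baseline on held-out rows is not extracting useful signal from the embeddings.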

### Predict Thumbnail Score

In [8]:
import numpy as np
from PIL import Image
import joblib

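# Load the trained model
regressor: lgb.LGBMRegressor = joblib.load("model.lgb")

# --- Feature extraction ---
def extract_image_embedding(image_path):
    """Return the CLIP image embedding as a (1, 1024) NumPy array."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
    return image_features.cpu().numpy()

# --- Inference function ---
def predict_thumbnail_score(image_path):
    embedding = extract_image_embedding(image_path)
    score = regressor.predict(embedding)[0]
    # NOTE: np.exp() inverts the log in score_thumbnails(), so est_ratio is an
    # estimate of view_count / average_view_count (despite the key name below)
    # only if the regressor was fit on the raw log `score`. The model trained
    # above was fit on `scaled_score` (0.5 to 10); in that case `score` is
    # already the scaled score and est_ratio has no direct meaning.
    est_ratio = np.exp(score)
    return {
        "log_score": round(score, 4),
        "views_per_subscriber_est": round(est_ratio, 2)
    }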
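
When the regressor has been trained on `scaled_score`, a prediction is easiest to read as a percentile band, because each of the 20 labels (0.5, 1.0, ..., 10.0) covers 5% of the training thumbnails. The sketch below snaps a prediction to the nearest label and returns that band. The helper name is hypothetical and the defaults mirror the rescaling cell; it does not apply to a model trained on the raw log score.

```python
def scaled_score_to_percentile_band(scaled_score, n_bins=20, low=0.5, high=10.0):
    """Return the (lower, upper) percentile band of the nearest bin label."""
    step = (high - low) / (n_bins - 1)      # spacing between labels (0.5 here)
    k = round((scaled_score - low) / step)  # index of the nearest label
    k = min(max(k, 0), n_bins - 1)          # clamp predictions outside the range
    width = 100 / n_bins                    # each bin covers 5 percentile points
    return (k * width, (k + 1) * width)


# A prediction close to 5.0 falls in the tenth bin, just below the median
print(scaled_score_to_percentile_band(4.97))  # (45.0, 50.0)
print(scaled_score_to_percentile_band(10.0))  # (95.0, 100.0)
```

Predictions from a regressor are continuous, so snapping to the nearest label is only an approximation of where a thumbnail would rank.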

In [10]:

image_path = "noktali_virgul_podcast_thumbnails/kq38Urs5QEU.jpg"  # Replace with your own image
result = predict_thumbnail_score(image_path)
print("Predicted:", result)

Predicted: {'log_score': np.float64(-0.9701), 'views_per_subscriber_est': np.float64(0.38)}




In [9]:

image_path = "noktali_virgul_podcast_thumbnails/kq38Urs5QEU.jpg"  # Replace with your own image
result = predict_thumbnail_score(image_path)
print("Predicted:", result)

Predicted: {'log_score': np.float64(4.9671), 'views_per_subscriber_est': np.float64(143.61)}


