<a href="https://colab.research.google.com/github/RecSys-lab/Popcorn/blob/main/examples/colab/popcorn_tool.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🍿 Popcorn Framework**
### **From Data Feed to Recommender List Generation**

🎬 Popcorn Framework: [link](https://github.com/RecSys-lab/Popcorn)

## **[Step 1] Clone Popcorn Movie Recommender Tool**

Clone the framework into the Colab workspace (`/content`) and install it for experiments.

⚠️ You might see a *"Restart Session"* warning during the first run in Google Colab due to library version mismatches. This is expected: accept the restart, re-run this cell, and continue.

In [1]:
# Clone the repo
!git clone https://github.com/RecSys-lab/Popcorn.git

# Install the required library
%cd Popcorn
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/Popcorn')

# Go back to the root
%cd ..

fatal: destination path 'Popcorn' already exists and is not an empty directory.
/content/Popcorn
Obtaining file:///content/Popcorn
  Preparing metadata (setup.py) ... done
Installing collected packages: Popcorn
  Attempting uninstall: Popcorn
    Found existing installation: Popcorn 1.6.0
    Uninstalling Popcorn-1.6.0:
      Successfully uninstalled Popcorn-1.6.0
  Running setup.py develop for Popcorn
Successfully installed Popcorn-1.6.0
/content
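
The `fatal: destination path 'Popcorn' already exists` line in the output above shows up when the cell is re-run after a session restart; the install step still proceeds, as the rest of the output shows. As a minimal sketch of a re-run-safe alternative, the clone can be skipped when the folder is already present. The helper name `clone_if_missing` and its `run` parameter are illustrative assumptions, not part of Popcorn.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest, run=subprocess.run):
    """Clone repo_url into dest unless dest already exists.

    Returns True when a clone was attempted and False when it was skipped.
    The `run` parameter is a hypothetical hook that makes the helper easy to test.
    """
    if os.path.isdir(dest):
        print(f"'{dest}' already exists, skipping clone.")
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status
    run(["git", "clone", repo_url, dest], check=True)
    return True
```

In Colab this could be called as `clone_if_missing("https://github.com/RecSys-lab/Popcorn.git", "/content/Popcorn")` before the `pip install -e .` step.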


## 🚀 **[Step 2] Use the Framework**

### *1. Load Configurations and Imports*

In [2]:
import os
import json
import pandas as pd
from popcorn.utils import readConfigs

# Start the Framework
print("Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...\n")

# Read the configuration file
configs = readConfigs("Popcorn/popcorn/config/config.yml")
# Report an error if the configuration could not be read
if not configs:
    print("Error reading the configuration file!")

Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...

- Reading the framework's configuration file ...
- Configuration file loaded successfully!
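
The override cell later in this notebook indexes into the `general`, `datasets`, `modalities`, `setup`, and `recommender` sections of `configs`. A small sanity check before overriding can make a malformed file easier to diagnose. The sketch below assumes `configs` behaves like a nested dictionary; `missing_sections` and `REQUIRED_SECTIONS` are hypothetical names introduced here, not Popcorn APIs.

```python
# Top-level sections that the override cell in this notebook reads or writes
REQUIRED_SECTIONS = ("general", "datasets", "modalities", "setup", "recommender")


def missing_sections(configs, required=REQUIRED_SECTIONS):
    """Return the required top-level keys that are absent from configs."""
    if not configs:
        # An empty or None result means nothing was loaded at all
        return list(required)
    return [name for name in required if name not in configs]


# Stand-in dictionary for illustration; the real one comes from readConfigs
example = {"general": {}, "datasets": {}, "setup": {}}
print(missing_sections(example))  # ['modalities', 'recommender']
```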


In [31]:
# @title ⚙️ Configurations
# @markdown ⚙️ Choose your desired configurations:

textual_modality = "OpenAI (raw)"  # @param ['None', 'OpenAI (raw)', 'OpenAI (augmented)', 'Llama (raw)', 'Llama (augmented)', 'ST (raw)', 'ST (augmented)']
audio_modality = "None"  # @param ['None', 'MMTF (i-vec)', 'MMTF (BLF)']
visual_modality = "Popcorn (Trailer - VGG19)"  # @param ['None', 'MMTF (CNN)', 'MMTF (AVF)', 'Popcorn (Trailer - VGG19)', 'Popcorn (Trailer - Incp3)', 'Popcorn (Full-Movie - VGG19)', 'Popcorn (Full-Movie - Incp3)', 'Popcorn (Shot - VGG19)', 'Popcorn (Shot - Incp3)']
movielens = "1m"  # @param ['100k', '1m', '25m']
split_method = 'random' # @param ['random', 'temporal', 'per_user']
recom_model = 'vbpr' # @param ['cf', 'vbpr', 'amr', 'vmf']
fusion_methods = 'CCA + PCA' # @param ['CCA', 'PCA', 'CCA + PCA']
cca_components = 40 # @param {"type":"slider","min":4,"max":64,"step":4}
pca_components = 0.8 # @param {"type":"slider","min":0.4,"max":0.99,"step":0.01}
top_n = 2 # @param ["2","5","10","15","20","25","30","50"] {"type":"raw"}
test_ratio = 0.2 # @param {"type":"slider","min":0.1,"max":0.9,"step":0.1}
n_epochs = 1 # @param {"type":"slider","min":1,"max":50,"step":1}

In [32]:
# Override Configurations

# MovieLens Dataset
configs["datasets"]["unimodal"]["movielens"]["version"] = movielens
configs["datasets"]["unimodal"]["movielens"]["download_path"] = "/content/MovieLens"

# Poison-RAG-Plus Dataset (LLM-Augmented Text)
configs["datasets"]["unimodal"]["poison_rag_plus"]["augmented"] = "(augmented)" in textual_modality
configs["datasets"]["unimodal"]["poison_rag_plus"]["llm"] = "openai" if "OpenAI" in textual_modality else "st" if "ST" in textual_modality else "llama" # 'openai' | 'st' | 'llama'

# Popcorn-Visual Dataset (Full-Movies)
configs["datasets"]["multimodal"]["popcorn"]["cnns"] = ["incp3"] if visual_modality.find("Incp3") != -1 else ["vgg19"] # 'incp3' | 'vgg19'
configs["datasets"]["multimodal"]["popcorn"]["agg_embedding_sources"] = ["movie_trailers_agg"] if visual_modality.find("Trailer") != -1 else ["full_movies_agg"] if visual_modality.find("Full-Movie") != -1 else ["movie_shots_agg"] # 'full_movies_agg' | 'movie_shots_agg' | 'movie_trailers_agg'

# MMTF-14K Dataset (Full-Movies)
configs["datasets"]["multimodal"]["mmtf"]["visual_variant"] = "cnn" if visual_modality.find("(CNN)") != -1 else "avf" # 'avf' | 'cnn'
configs["datasets"]["multimodal"]["mmtf"]["audio_variant"] = "ivec" if audio_modality.find("i-vec") != -1 else "blf" # 'blf' | 'ivec'
configs["datasets"]["multimodal"]["mmtf"]["audio_blf_pca"] = 0.95

# Modalities
# [Supported]: 'audio_mmtf' | 'visual_mmtf' | 'text_rag_plus' | 'visual_popcorn'
configs["modalities"]['selected'] = []
if textual_modality != "None":
  configs["modalities"]['selected'].append("text_rag_plus")
if audio_modality != "None":
  configs["modalities"]['selected'].append("audio_mmtf")
if visual_modality != "None":
  if visual_modality.find("MMTF") != -1:
    configs["modalities"]['selected'].append("visual_mmtf")
  else:
    configs["modalities"]['selected'].append("visual_popcorn")

configs["modalities"]["fusion_methods"]['selected'] = ["concat"] # 'concat' | 'cca' | 'pca'
if fusion_methods.find("CCA") != -1:
  configs["modalities"]["fusion_methods"]['selected'].append("cca")
if fusion_methods.find("PCA") != -1:
  configs["modalities"]["fusion_methods"]['selected'].append("pca")

configs["modalities"]["fusion_methods"]['cca_components'] = cca_components
configs["modalities"]["fusion_methods"]['pca_variance'] = pca_components
configs["modalities"]["fusion_methods"]['cca_reg'] = 0.01 # CCA Lambda, can be modified
configs["modalities"]["fusion_methods"]['pca_reg'] = 0.01 # PCA Lambda, can be modified

# Setup
configs["setup"]['k_core'] = 10 # K-Core Filter, can be modified
configs["setup"]["n_epochs"] = n_epochs
configs["setup"]["use_gpu"] = False # Recommended to keep it False in Google Colab, due to torch conflicts
configs["setup"]["use_parallel"] = True
configs["setup"]["is_fast_prototype"] = (n_epochs == 1)
configs["setup"]['split']['mode'] = split_method # 'random' | 'temporal' | 'per_user'
configs["setup"]['split']['test_ratio'] = test_ratio

# Recommendation
configs["setup"]["model_choice"] = recom_model # 'cf' | 'vbpr' | 'amr' | 'vmf'
configs["recommender"]["top_n"] = top_n # 2 | 5 | 10 | 15 | 20 | 25 | 30 | 50
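
The label-to-identifier mapping in the override cell above can be read more easily as two pure functions. This is a minimal sketch that mirrors the cell's logic for the modality and fusion lists only; `select_modalities` and `select_fusion` are hypothetical helper names, not part of Popcorn.

```python
def select_modalities(textual, audio, visual):
    """Map the form labels to modality identifiers, as the override cell does."""
    selected = []
    if textual != "None":
        selected.append("text_rag_plus")
    if audio != "None":
        selected.append("audio_mmtf")
    if visual != "None":
        # Any MMTF visual variant maps to 'visual_mmtf'; everything else is Popcorn
        selected.append("visual_mmtf" if "MMTF" in visual else "visual_popcorn")
    return selected


def select_fusion(fusion_label):
    """'concat' is always kept; 'cca' and 'pca' are added when the label names them."""
    methods = ["concat"]
    if "CCA" in fusion_label:
        methods.append("cca")
    if "PCA" in fusion_label:
        methods.append("pca")
    return methods


print(select_modalities("OpenAI (raw)", "None", "Popcorn (Trailer - VGG19)"))
# ['text_rag_plus', 'visual_popcorn']
print(select_fusion("CCA + PCA"))
# ['concat', 'cca', 'pca']
```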

### *2. The Rest of the Procedures*
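
The output folder used in this step encodes the fusion settings in its name by converting each value to a string and dropping the decimal point. A minimal sketch of that naming scheme follows; `build_output_path` is a hypothetical helper introduced for illustration, not a Popcorn function.

```python
def build_output_path(cca_components, pca_variance, cca_reg, pca_reg):
    """Encode the fusion settings in a folder name by dropping decimal points."""

    def strip_dot(value):
        # 0.8 -> '08', 0.01 -> '001'
        return str(value).replace(".", "")

    return (
        f"output_pca{strip_dot(pca_variance)}_plmb{strip_dot(pca_reg)}"
        f"_cca{cca_components}_clmb{strip_dot(cca_reg)}"
    )


print(build_output_path(cca_components=40, pca_variance=0.8, cca_reg=0.01, pca_reg=0.01))
# output_pca08_plmb001_cca40_clmb001
```

Note that very small floats print in scientific notation in Python (for example `1e-05`), so such values would appear in the folder name in that form.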

In [33]:
from popcorn.recommenders.reclist import generateLists
from popcorn.optimizers.grid_search import gridSearch
from popcorn.recommenders.assembler import assembleModality

# Build an output path whose name encodes the fusion settings
cca = str(configs["modalities"]["fusion_methods"]['cca_components'])
lmbp = str(configs["modalities"]["fusion_methods"]['pca_reg']).replace(".", "")
lmbc = str(configs["modalities"]["fusion_methods"]['cca_reg']).replace(".", "")
pca = str(configs["modalities"]["fusion_methods"]['pca_variance']).replace(".", "")
configs["general"]["output_path"] = f"output_pca{pca}_plmb{lmbp}_cca{cca}_clmb{lmbc}"

# -------------------------------------------------
# Assemble Modalities and Get Train/Test/Modalities
# ------------------------------------------------
trainDF, testDF, trainSet, modalitiesDict, genreDict = assembleModality(configs)
print("- Modalities assembled successfully!")
print(f"- Training set size: {trainDF.shape}, Testing set size: {testDF.shape}")
print(f"- Available modalities: {list(modalitiesDict.keys())}")

# -----------------------------------
# Apply GridSearch and Get Parameters
# -----------------------------------
finalModels = gridSearch(configs, trainDF, trainSet, modalitiesDict)
print("\n✔ Grid search completed successfully!")
print(f"- Final models after HPO: {list(finalModels.keys())}")

# -----------------------------
# Generate Recommendation Lists
# -----------------------------
generateLists(configs, trainDF, trainSet, testDF, genreDict, finalModels)


- Downloading the MovieLens-1m dataset ...
- Creating the download path '/content/MovieLens/ml-1m' ...
- Fetching data from 'https://files.grouplens.org/datasets/movielens/ml-1m.zip' ...
- Download completed and the dataset is saved as a 'zip' file!
- Extracting the dataset files inside '/content/MovieLens/ml-1m' ...
- Dataset extracted to '/content/MovieLens/ml-1m' successfully!
- Removing the zip file '/content/MovieLens/ml-1m/ml-1m.zip' ...
- Zip file removed successfully!

- Loading 'MovieLens-1m' data from '/content/MovieLens/ml-1m/ml-1m' ...
- Items (movies) have been loaded. Number of rows: 3,883
- Users have been loaded. Number of rows: 6,040
- Ratings have been loaded. Number of rows: 1,000,209

- Preparing to save the genres DataFrame in 'output_pca08_plmb001_cca40_clmb001' ...
- Genres DataFrame saved to 'output_pca08_plmb001_cca40_clmb001/item_genre_ml-1m.csv'!
- Genre dictionary loaded: 
{'1': ['Animation', "Children's", 'Comedy'], '2': ['Adventure', "Children's", 'Fantas

Epoch 1/1:   0%|          | 0/499 [00:00<?, ?it/s]

Optimization finished!
-- Fitting '{'k': 32, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.6420 ...
Optimization finished!
-- Fitting '{'k': 32, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5554 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5554 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5520 ...
Optimization finished!
-- Fitting '{'k': 128, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.6420 ...
-- Found the best case: 'VBPR' in 'text', item '{'k': 32, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' (0.6420), done in 44.6s.
- Starting GridSearch procedure (seed: 42, useGPU: False, parallelHPO: True)...
-- HPO 'V

Epoch 1/1:   0%|          | 0/499 [00:00<?, ?it/s]

Optimization finished!
-- Fitting '{'k': 32, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5023 ...
Optimization finished!
-- Fitting '{'k': 32, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.2407 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5180 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.4947 ...
Optimization finished!
-- Fitting '{'k': 128, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.6516 ...
-- Found the best case: 'VBPR' in 'visual', item '{'k': 128, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' (0.6516), done in 40.4s.
- Starting GridSearch procedure (seed: 42, useGPU: False, parallelHPO: True)...
-- HPO

Epoch 1/1:   0%|          | 0/499 [00:00<?, ?it/s]

Optimization finished!
-- Fitting '{'k': 32, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.2840 ...
Optimization finished!
-- Fitting '{'k': 32, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5252 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5023 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5554 ...
Optimization finished!
-- Fitting '{'k': 128, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.2502 ...
-- Found the best case: 'VBPR' in 'cca_40', item '{'k': 64, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' (0.5554), done in 30.0s.
- Starting GridSearch procedure (seed: 42, useGPU: False, parallelHPO: True)...
-- HPO

Epoch 1/1:   0%|          | 0/499 [00:00<?, ?it/s]

Optimization finished!
-- Fitting '{'k': 32, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.2407 ...
Optimization finished!
-- Fitting '{'k': 32, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.5213 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.2574 ...
Optimization finished!
-- Fitting '{'k': 64, 'k2': 16, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.3147 ...
Optimization finished!
-- Fitting '{'k': 128, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' to get 0.2345 ...
-- Found the best case: 'VBPR' in 'pca_80', item '{'k': 32, 'k2': 8, 'learning_rate': 0.001, 'lambda_w': 0.01, 'lambda_b': 0.01, 'n_epochs': 1}' (0.5213), done in 27.2s.
- HPO done! Kept 4 configs.


Epoch 1/1:   0%|          | 0/623 [00:00<?, ?it/s]

Optimization finished!


Epoch 1/1:   0%|          | 0/623 [00:00<?, ?it/s]

Optimization finished!


Epoch 1/1:   0%|          | 0/623 [00:00<?, ?it/s]

Optimization finished!


Epoch 1/1:   0%|          | 0/623 [00:00<?, ?it/s]

Optimization finished!
- Re-fit finished for 'VBPR' with variant 'pca_80'.
- Grid search done! Kept 4 final models.

✔ Grid search completed successfully!
- Final models after HPO: [('VBPR', 'text'), ('VBPR', 'visual'), ('VBPR', 'cca_40'), ('VBPR', 'pca_80')]
- Generating recommendation lists ...
- Recommendation lists saved have been saved in 'output_pca08_plmb001_cca40_clmb001/reclist_ml1m_vbpr_openai_raw_top2.csv'! Samples:
   user_id                                              train  \
0        1                  [608, 1193, 1961, 2028, 527, 150]   
1        2  [480, 1954, 356, 165, 1193, 2571, 2028, 589, 1...   
2        3                                       [1961, 2167]   

                   gt rec_VBPR_text  CR_VBPR_text  ND_VBPR_text  CB_VBPR_text  \
0              [1721]    [589, 318]           0.0           0.0      5.115571   
1  [1690, 3147, 3578]    [858, 608]           0.0           0.0      5.576981   
2         [480, 1266]   [2028, 589]           0.0           0.0