<a href="https://colab.research.google.com/github/RecSys-lab/Popcorn/blob/main/examples/colab/modality_data_fusion_all_popcorn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🍿 Popcorn Framework in Google Colab**
### **Modality Fusion of Poison-RAG-Plus (Text), MMTF-14K (Audio), and Popcorn (Visual)**

🎬 Popcorn Framework: [link](https://github.com/RecSys-lab/Popcorn)

## **[Step 1] Clone Popcorn Movie Recommender Tool**

Clone the framework into the Colab session's working directory (`/content`) and install it for the experiments.

⚠️ You might see a *"Restart Session"* warning during the first run in Google Colab due to library version mismatches. This is expected! Accept the restart, re-run this cell, and continue!

```python
# Clone the repo
!git clone https://github.com/RecSys-lab/Popcorn.git

# Install the required library
%cd Popcorn
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/Popcorn')

# Go back to the root
%cd ..
```

```
fatal: destination path 'Popcorn' already exists and is not an empty directory.
/content/Popcorn
Obtaining file:///content/Popcorn
  Preparing metadata (setup.py) ... done
Installing collected packages: Popcorn
  Attempting uninstall: Popcorn
    Found existing installation: Popcorn 1.6.0
    Uninstalling Popcorn-1.6.0:
      Successfully uninstalled Popcorn-1.6.0
  Running setup.py develop for Popcorn
Successfully installed Popcorn-1.6.0
/content
```
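The `fatal: destination path 'Popcorn' already exists` line above is harmless: it only appears when the cell is re-run in a session where the repository was already cloned. If you prefer a quiet re-run, a guard like the one below could stand in for the `!git clone` line. `cloneIfMissing` is a hypothetical helper (not part of Popcorn), and the sketch assumes `git` is on the `PATH`, which the cell above already relies on.

```python
import os
import subprocess


def cloneIfMissing(repoUrl: str, destDir: str) -> bool:
    """Clone `repoUrl` into `destDir` unless that directory already exists.

    Returns True if a clone was performed and False if it was skipped.
    """
    if os.path.isdir(destDir):
        # Skip: `git clone` would fail here with "destination path already exists"
        print(f"- '{destDir}' already exists, skipping clone.")
        return False
    # check=True raises CalledProcessError if git reports a failure
    subprocess.run(["git", "clone", repoUrl, destDir], check=True)
    return True


# Usage in Colab (same URL and location as the cell above):
# cloneIfMissing("https://github.com/RecSys-lab/Popcorn.git", "/content/Popcorn")
```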


## 🚀 **[Step 2] Use the Framework**

### *1. Load Configurations and Imports*

```python
import os
import json
import pandas as pd
from popcorn.utils import readConfigs

# Start the Framework
print("Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...\n")

# Read the configuration file
configs = readConfigs("Popcorn/popcorn/config/config.yml")
# Report an error if the configuration could not be read
if not configs:
    print("Error reading the configuration file!")
```

```
Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...

- Reading the framework's configuration file ...
- Configuration file loaded successfully!
```


### *2. Load Poison-RAG-Plus (Text)*

```python
from popcorn.datasets.poison_rag_plus.loader import loadPoisonRagPlus

# Load Dataset
textDF = loadPoisonRagPlus(configs)

if textDF is not None:
  print(f"- textDF shape: {textDF.shape}")
```


```
- Preparing the 'Poison-RAG-Plus' dataset with 'llama'-driven enriched embeddings ...
-- Loading data from 'llama_enriched_description_part1.csv.gz' ...
-- Loading data from 'llama_enriched_description_part2.csv.gz' ...
-- Loading data from 'llama_enriched_description_part3.csv.gz' ...
-- Loading data from 'llama_enriched_description_part4.csv.gz' ...
-- Loading data from 'llama_enriched_description_part5.csv.gz' ...
- Finished loading 4 parts of textual enriched data with 1,606 items!
- textDF shape: (1606, 2)
```
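The loader reads the enriched text embeddings from several gzipped CSV parts and combines them into one table of 1,606 items. The general pattern can be illustrated with the standard library alone. In this sketch `loadTextParts` is hypothetical, and the `item_id`/`text` column names and the one-row-per-item layout are assumptions, not a description of how `loadPoisonRagPlus` actually stores or merges its data.

```python
import csv
import gzip


def loadTextParts(partPaths):
    """Read gzipped CSV parts and merge them into one {item_id: text} mapping.

    Assumes each part has a header row with 'item_id' and 'text' columns.
    If an item appears in more than one part, the first occurrence wins.
    """
    merged = {}
    for path in partPaths:
        # "rt" opens the gzip stream in text mode so the csv module can parse it
        with gzip.open(path, "rt", newline="") as handle:
            for row in csv.DictReader(handle):
                merged.setdefault(row["item_id"], row["text"])
    # The part count comes from the list of files actually read
    print(f"- Finished loading {len(partPaths)} parts with {len(merged):,} items!")
    return merged
```

Deriving the reported part count from the list of files actually read keeps such a summary consistent with the per-file log lines.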


### *3. Load MMTF-14K (Audio)*

```python
from IPython.display import display
from popcorn.datasets.mmtf14k.helper_audio import loadAudioFusedDF

# Configurations override (optional)
configs["datasets"]["multimodal"]["mmtf"]["audio_variant"] = "ivec" # 'blf' | 'ivec'

# Load Audio
audioDF = loadAudioFusedDF(configs)
if audioDF is not None:
  print(f"- audioDF shape: {audioDF.shape}")
  # Preview only when loading succeeded; calling .head() on None would raise
  display(audioDF.head(5))
```

```
- Fetching MMTF-14K audio data for variant 'ivec' ...
- Fetched 1,807 audio items using 'i-vector' features.
- audioDF shape: (1807, 2)
```


| | item_id | audio |
|---|---|---|
| 0 | 1500 | [-0.013718624, -0.0067672646, 0.010239853, 0.0... |
| 1 | 367 | [0.02349284, 0.013962358, 0.008394235, -0.0101... |
| 2 | 84152 | [-0.016936503, -0.020135608, -8.170488e-05, -0... |
| 3 | 3717 | [0.0849043, -0.0054050665, -0.008795045, 0.060... |
| 4 | 3238 | [0.011163599, 0.008414111, 0.0021271762, -0.03... |
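The override above edits one nested key of the `configs` dictionary in place. A misspelled final key would silently add a new, unused entry, so when several overrides are needed a checked helper can help. `setConfigValue` below is a hypothetical sketch, not a Popcorn function; the allowed values in the usage line mirror the `'blf' | 'ivec'` comment in the cell.

```python
def setConfigValue(configs, dottedPath, value, allowed=None):
    """Overwrite an existing nested key such as 'datasets.multimodal.mmtf.audio_variant'.

    Raises KeyError for unknown keys and ValueError for values outside `allowed`.
    """
    *parents, leaf = dottedPath.split(".")
    node = configs
    for key in parents:
        node = node[key]  # raises KeyError for a misspelled section name
    if leaf not in node:
        # Refuse to create a new key: it would never be read by the framework
        raise KeyError(f"Unknown configuration key '{dottedPath}'")
    if allowed is not None and value not in allowed:
        raise ValueError(f"'{value}' is not one of {sorted(allowed)}")
    node[leaf] = value


# Usage (hypothetical), equivalent to the manual override in the cell above:
# setConfigValue(configs, "datasets.multimodal.mmtf.audio_variant", "ivec", allowed={"blf", "ivec"})
```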


### *4. Load Popcorn (Visual)*

```python
from popcorn.utils import convertStrToListCol
from popcorn.datasets.popcorn.helper_embedding_agg import (
    loadAggEmbeddings,
    generateMovieAggEmbeddingUrl,
)

# Variables (modifiable)
urlList = []
cnn = "incp3" # 'incp3' | 'vgg19'
aggType = "full_movies_agg" # 'full_movies_agg' | 'movie_shots_agg' | 'movie_trailers_agg'
sampleCommonItemIds = ["1203", "1206", "4993", "2329"]

# Build the aggregated-embedding URL for each sample item ID
for itemId in sampleCommonItemIds:
  url = generateMovieAggEmbeddingUrl(aggType, cnn, itemId)
  urlList.append(url)

# Get Max and Mean embeddings
dfAggEmbedsMax, dfAggEmbedsMean = loadAggEmbeddings(urlList)

# Use only the max-aggregated embeddings as the visual modality
visualDF = dfAggEmbedsMax

# Convert the stringified embeddings in the 'visual' column into lists
visualDF["visual"] = convertStrToListCol(visualDF, "visual")
print(
    f"- Loaded {len(visualDF)} sample records of aggregated features (shape: {visualDF.shape})!"
)

visualDF.head(5)
```

```
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000001203.json' ...
- JSON data loaded successfully!
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000001206.json' ...
- JSON data loaded successfully!
- Loading aggregated features (50%) ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000004993.json' ...
- JSON data loaded successfully!
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000002329.json' ...
- JSON data loaded successfully!
- Loading aggregated features (100%) ...
- Loaded 4 sample records of aggregated features (shape: (4, 2))!
```

| | item_id | visual |
|---|---|---|
| 0 | 1203 | [1.780925, 1.143189, 1.430709, 1.287412, 0.942... |
| 1 | 1206 | [3.349655, 2.344749, 4.013632, 1.785706, 2.177... |
| 2 | 4993 | [2.033981, 2.562521, 2.866126, 2.117148, 2.722... |
| 3 | 2329 | [2.441656, 1.727284, 1.52837, 1.703405, 1.9716... |
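Judging by the log lines above, each movie's aggregated embedding is a JSON file on Hugging Face whose path combines the aggregation type, the CNN, and the item ID zero-padded to 10 digits. The sketch below reproduces that pattern and parses a stringified embedding into floats. `buildAggEmbeddingUrl` and `parseEmbedding` are hypothetical stand-ins inferred from the printed URLs and the `convertStrToListCol` call; they are not the implementations of `generateMovieAggEmbeddingUrl` or `convertStrToListCol`, which may differ.

```python
import json

# Base URL and path layout as observed in the log lines above (an assumption,
# not a documented contract of the dataset)
BASE_URL = "https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main"


def buildAggEmbeddingUrl(aggType: str, cnn: str, itemId: str) -> str:
    # File names zero-pad the item ID to 10 digits, e.g. 1203 -> 0000001203
    return f"{BASE_URL}/{aggType}/{cnn}/{int(itemId):010d}.json"


def parseEmbedding(value):
    # A string such as "[1.78, 1.14]" is decoded as a JSON array;
    # anything already list-like is returned as a plain list
    if isinstance(value, str):
        return [float(x) for x in json.loads(value)]
    return list(value)
```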


### *5. Fuse Poison-RAG-Plus (Text), MMTF-14K (Audio), and Popcorn (Visual) Datasets*

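```python
from IPython.display import display
from popcorn.modalities.fuse_all import createMultimodalDF

# Modalities
modalitiesDict = {
  "text": textDF,
  "audio": audioDF,
  "visual": visualDF,
}

# Fuse the modalities on their shared item IDs
fusedDF, keep = createMultimodalDF(modalitiesDict)
if fusedDF is None:
  print("\n- [Error] Failed to create fused DataFrame!")
else:
  # Preview only when fusion succeeded; calling .head() on None would raise
  display(fusedDF.head(10))
```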

```
- Creating multimodal DataFrame from unimodal DataFrames ...
- Found 4 common items across modalities ...
- Created a Fused DataFrame ((4, 4)) with modalities: ['text', 'audio', 'visual'] ...
  item_id                                               text  \
0    1203  [0.8071289, -0.9667969, 0.9116211, -0.30932617...   
1    4993  [-0.120666504, -0.6845703, -0.6201172, -0.4853...   
2    2329  [0.7705078, -1.0449219, 0.11956787, -0.3933105...   

                                               audio  \
0  [0.032335542, 0.0278966, -0.016505657, -0.0345...   
1  [0.013244278, -0.03695688, -0.0012119992, 0.03...   
2  [0.047993265, 0.023684833, -0.0042071952, -0.0...   

                                              visual  
0  [1.780925, 1.143189, 1.430709, 1.287412, 0.942...  
1  [2.033981, 2.562521, 2.866126, 2.117148, 2.722...  
2  [2.441656, 1.727284, 1.52837, 1.703405, 1.9716...  
- Final fused DataFrame has 4 items after combining all modalities ...
```


| | item_id | text | audio | visual | all |
|---|---|---|---|---|---|
| 0 | 1203 | [0.8071289, -0.9667969, 0.9116211, -0.30932617... | [0.032335542, 0.0278966, -0.016505657, -0.0345... | [1.780925, 1.143189, 1.430709, 1.287412, 0.942... | [0.80712890625, -0.966796875, 0.91162109375, -... |
| 1 | 4993 | [-0.120666504, -0.6845703, -0.6201172, -0.4853... | [0.013244278, -0.03695688, -0.0012119992, 0.03... | [2.033981, 2.562521, 2.866126, 2.117148, 2.722... | [-0.12066650390625, -0.6845703125, -0.62011718... |
| 2 | 2329 | [0.7705078, -1.0449219, 0.11956787, -0.3933105... | [0.047993265, 0.023684833, -0.0042071952, -0.0... | [2.441656, 1.727284, 1.52837, 1.703405, 1.9716... | [0.7705078125, -1.044921875, 0.11956787109375,... |
| 3 | 1206 | [0.50390625, -1.0478516, 0.6040039, 0.13806152... | [0.07518825, 0.006744052, -0.031470243, 0.0426... | [3.349655, 2.344749, 4.013632, 1.785706, 2.177... | [0.50390625, -1.0478515625, 0.60400390625, 0.1... |
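The fused table keeps only the items present in every modality (4 here, from 1,606 text, 1,807 audio, and 4 visual items), and the leading values of the `all` column match the `text` column, which suggests the vectors are concatenated starting with text. A minimal, dependency-free sketch of that inner-join-and-concatenate idea follows; `fuseModalities` is hypothetical and illustrates the technique rather than the implementation of `createMultimodalDF`.

```python
def fuseModalities(modalities):
    """Keep items present in every modality and concatenate their embeddings.

    `modalities` maps a modality name to {item_id: embedding}. Item IDs must share
    one type across modalities ("1203" and 1203 would not match).
    """
    if not modalities:
        return {}
    names = list(modalities)  # dict insertion order, e.g. text, audio, visual
    commonIds = set.intersection(*(set(modalities[name]) for name in names))
    fused = {}
    for itemId in sorted(commonIds):  # sorted only to make the output order deterministic
        row = {name: list(modalities[name][itemId]) for name in names}
        # 'all' is the per-item concatenation in modality order
        row["all"] = [value for name in names for value in modalities[name][itemId]]
        fused[itemId] = row
    return fused


# Toy usage with made-up IDs and values
toy = fuseModalities({
    "text": {"a": [0.1, 0.2], "b": [0.3, 0.4]},
    "audio": {"a": [0.5], "c": [0.6]},
    "visual": {"a": [1.0, 2.0], "b": [3.0, 4.0]},
})
print(toy["a"]["all"])  # [0.1, 0.2, 0.5, 1.0, 2.0]
```

The displayed values suggest the three modalities live on quite different scales (audio values near 0.01, visual values around 1 to 4), so whether to normalize each block before concatenating is a design choice worth checking.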
