<a href="https://colab.research.google.com/github/alitourani/Iranis-dataset/blob/master/examples/colab/load_popcorn_dataset_embedding_agg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🍿 Popcorn Framework in Google Colab**
### **Load Popcorn Dataset - Frame/Shot Aggregated Embedding Functions**

🤗 Dataset in HF: [link](https://huggingface.co/datasets/alitourani/Popcorn_Dataset)

🌐 Dataset Web-Page: [link](https://recsys-lab.github.io/popcorn_dataset/)

🎬 Popcorn Framework: [link](https://github.com/RecSys-lab/Popcorn)

## **[Step 1] Clone Popcorn Movie Recommender Tool**

Clone the framework into the Colab workspace (`/content`) and install it for the experiments.

⚠️ You might see a *"Restart Session"* warning during the first run in Google Colab due to library version mismatches. This is expected: accept the restart, re-run this cell, and continue.

In [1]:
# Clone the repo
!git clone https://github.com/RecSys-lab/Popcorn.git

# Install the required library
%cd Popcorn
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/Popcorn')

# Go back to the root
%cd ..

fatal: destination path 'Popcorn' already exists and is not an empty directory.
/content/Popcorn
Obtaining file:///content/Popcorn
  Preparing metadata (setup.py) ... done
Installing collected packages: Popcorn
  Attempting uninstall: Popcorn
    Found existing installation: Popcorn 1.6.0
    Uninstalling Popcorn-1.6.0:
      Successfully uninstalled Popcorn-1.6.0
  Running setup.py develop for Popcorn
Successfully installed Popcorn-1.6.0
/content
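
The `fatal: destination path 'Popcorn' already exists` line in the output above is what `git clone` reports when the cell is re-run after the repository has already been cloned, for example after the session restart. A minimal sketch of a guard that skips the clone in that case is shown below. `clone_if_missing` is a hypothetical helper, not part of Popcorn, and it assumes `git` is available on the PATH.

```python
import os
import subprocess


def clone_if_missing(repo_url: str, dest: str) -> bool:
    """Clone repo_url into dest unless dest already exists.

    Returns True if a clone was performed, False if it was skipped.
    """
    if os.path.isdir(dest):
        # Re-running the cell: the folder is already there, so skip the clone
        return False
    # check=True raises CalledProcessError if git exits with an error
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True
```

In this notebook it could be called as `clone_if_missing("https://github.com/RecSys-lab/Popcorn.git", "/content/Popcorn")` before the `pip install -e .` step.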


## 🚀 **[Step 2] Use the Framework**

### *1. Load Configurations and Imports*

In [2]:
from popcorn.utils import readConfigs
from popcorn.datasets.popcorn.utils import RAW_DATA_URL
from popcorn.datasets.popcorn.helper_embedding_agg import (
    loadAggEmbeddings,
    fetchMovieAggEmbeddings,
    generateAllAggEmbeddingUrls,
    generateMovieAggEmbeddingUrl,
    generateMoviesAggEmbeddingUrls,
)

# Start the Framework
print("Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...\n")

# Read the configuration file
configs = readConfigs("Popcorn/popcorn/config/config.yml")
# If the file could not be read, report the error
if not configs:
    print("Error reading the configuration file!")

Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...

- Reading the framework's configuration file ...
- Configuration file loaded successfully!


### *2. Override the Configurations (Optional)*

In [3]:
# Which CNNs to use? ('incp3' | 'vgg19')
cnns = configs["datasets"]["multimodal"]["popcorn"]["cnns"]

# What is the Dataset name?
datasetName = configs["datasets"]["multimodal"]["popcorn"]["name"]

# Which Agg Embedding Sources to Use? ('full_movies_agg' | 'movie_shots_agg' | 'movie_trailers_agg')
aggEmbeddings = configs["datasets"]["multimodal"]["popcorn"]["agg_embedding_sources"]

print(
    f"- Preparing to fetch the aggregated files of '{datasetName}' dataset from '{RAW_DATA_URL}' ..."
)

- Preparing to fetch the aggregated files of 'Popcorn-visual' dataset from 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/' ...
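
The cell above only reads the values from `configs`. To actually override them, one option is to assign new values to the same keys before they are read. The sketch below uses a stand-in dictionary that mirrors only the keys accessed in this notebook (the real `config.yml` presumably holds more); the values are the ones printed above or listed in the cell's comments.

```python
# Stand-in for the dictionary returned by readConfigs (assumption: only the
# keys accessed in this notebook are reproduced here)
configs = {
    "datasets": {
        "multimodal": {
            "popcorn": {
                "name": "Popcorn-visual",
                "cnns": ["incp3"],
                "agg_embedding_sources": ["full_movies_agg"],
            }
        }
    }
}

# Shortcut to the nested section; it references the same dict, so changes
# made through it are visible in configs as well
popcornCfg = configs["datasets"]["multimodal"]["popcorn"]

# Override: use both CNNs and add the trailer-based aggregated embeddings
popcornCfg["cnns"] = ["incp3", "vgg19"]
popcornCfg["agg_embedding_sources"] = ["full_movies_agg", "movie_trailers_agg"]

cnns = popcornCfg["cnns"]
aggEmbeddings = popcornCfg["agg_embedding_sources"]
print(f"- CNNs: {cnns}, sources: {aggEmbeddings}")
```

Because the override mutates `configs` in place, functions that take `configs` as an argument would see the new values too.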


### *3. Test Generating Sample URLs to Aggregated Embeddings*

In [4]:
print(f"[Func-1] Generating a sample address to aggregated features ...")
givenMovieId = 6
givenCnn, givenEmbedding = cnns[0], aggEmbeddings[0]
aggEmbeddingUrl = generateMovieAggEmbeddingUrl(givenEmbedding, givenCnn, givenMovieId)
print(
    f"- URL for aggregated features of movie '#{givenMovieId}' extracted by CNN '{givenCnn}' from source '{givenEmbedding}': {aggEmbeddingUrl}"
)

[Func-1] Generating a sample address to aggregated features ...
- URL for aggregated features of movie '#6' extracted by CNN 'incp3' from source 'full_movies_agg': https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000006.json
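
The URL printed above suggests a simple layout: the base URL, then the embedding source, the CNN name, and the movie ID zero-padded to ten digits with a `.json` extension. The sketch below reproduces that pattern as inferred from the output only. `build_agg_url` is a hypothetical stand-in and may differ from what `generateMovieAggEmbeddingUrl` does internally.

```python
# Base URL as printed by the notebook for RAW_DATA_URL
RAW_DATA_URL = "https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/"


def build_agg_url(source: str, cnn: str, movie_id: int) -> str:
    """Compose <base>/<source>/<cnn>/<10-digit movie id>.json (assumed layout)."""
    # :010d pads the integer with leading zeros to a width of 10
    return f"{RAW_DATA_URL}{source}/{cnn}/{movie_id:010d}.json"


print(build_agg_url("full_movies_agg", "incp3", 6))
```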


### *4. Test Fetching Aggregated Features of a Movie*

In [7]:
print(f"[Func-2] Fetching aggregated features of a movie ...")
aggEmbeddingList = fetchMovieAggEmbeddings(givenEmbedding, givenCnn, givenMovieId)
print(f"- Fetched {len(aggEmbeddingList)} aggregated features! Sample embeddings:")
print(f"-- 'Max': {aggEmbeddingList[0]['Max'][:5]}")
print(f"-- 'Mean': {aggEmbeddingList[0]['Mean'][:5]}")

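# Generate URLs for several movies and sources at once
# (note: this reassigns `aggEmbeddings`, replacing the value read from the configs)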
givenMovieIds = [6, 50]
aggEmbeddings = ["full_movies_agg", "movie_trailers_agg"]
print(f"[Func-3] Generating all addresses to aggregated features for movies {givenMovieIds} extracted by CNNs {cnns} from sources {aggEmbeddings} ...")
allAggEmbeddingUrls = generateMoviesAggEmbeddingUrls(aggEmbeddings, cnns, givenMovieIds)
print(f"- Generated {len(allAggEmbeddingUrls)} variants of aggregated feature addresses!")
print(f"-- Keys: {list(allAggEmbeddingUrls.keys())}")
print(f"-- Sample addresses for '{aggEmbeddings[0]}' and '{cnns[0]}': {allAggEmbeddingUrls[aggEmbeddings[0]][cnns[0]]}")

[Func-2] Fetching aggregated features of a movie ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000006.json' ...
- JSON data loaded successfully!
- Fetched 1 aggregated features! Sample embeddings:
-- 'Max': [2.20092, 2.158851, 1.767559, 1.588628, 1.921421]
-- 'Mean': [0.336484, 0.330648, 0.251412, 0.326983, 0.262982]
[Func-3] Generating all addresses to aggregated features for movies [6, 50] extracted by CNNs ['incp3'] from sources ['full_movies_agg', 'movie_trailers_agg'] ...
- Generated 2 variants of aggregated feature addresses!
-- Keys: ['full_movies_agg', 'movie_trailers_agg']
-- Sample addresses for 'full_movies_agg' and 'incp3': ['https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000006.json', 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000050.json']
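
The `[Func-3]` output shows a nested mapping: the outer keys are embedding sources, the inner keys are CNN names, and each leaf is a list of URLs, one per movie. The sketch below builds a dictionary of that shape. `build_url_map` is hypothetical and not the framework's implementation, and it assumes that every source follows the URL layout seen in the printed `full_movies_agg` addresses.

```python
# Base URL as printed by the notebook
BASE = "https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/"


def build_url_map(sources, cnns, movie_ids):
    """Return {source: {cnn: [url, ...]}} with one URL per movie ID."""
    return {
        source: {
            cnn: [f"{BASE}{source}/{cnn}/{mid:010d}.json" for mid in movie_ids]
            for cnn in cnns
        }
        for source in sources
    }


url_map = build_url_map(["full_movies_agg", "movie_trailers_agg"], ["incp3"], [6, 50])
print(list(url_map.keys()))
print(len(url_map["full_movies_agg"]["incp3"]))
```

Indexing by source first and CNN second matches how the notebook reads the result, e.g. `allAggEmbeddingUrls[aggEmbeddings[0]][cnns[0]]`.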


### *5. Test Generating All URLs of Aggregated Embeddings*

In [8]:
print(f"[Func-4] Generating all addresses to aggregated features based on the configuration file ...")
aggEmbeddingUrlDict = generateAllAggEmbeddingUrls(configs)
if not aggEmbeddingUrlDict:
    print("- Error in generating the addresses! Continuing ...")
else:
    print(f"- Generated {len(aggEmbeddingUrlDict)} variants of aggregated feature addresses!")
    print(f"- Sample addresses for '{aggEmbeddings[0]}' and '{cnns[0]}': {aggEmbeddingUrlDict[aggEmbeddings[0]][cnns[0]][:2]}")

[Func-4] Generating all addresses to aggregated features based on the configuration file ...
- Fetching URL from 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/resolve/main/stats.json' ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/resolve/main/stats.json' ...
- JSON data loaded successfully!
- Fetching all movie IDs ...
- Generating a list of addresses to fetch the aggregated features of all movies (CNNs: ['incp3'], Embeddings: ['full_movies_agg']) ...
- Generated 274 aggregated feature addresses, e.g., https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000006.json ...
- Generated 1 variants of aggregated feature addresses!
- Sample addresses for 'full_movies_agg' and 'incp3': ['https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000006.json', 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/

### *6. Test Loading Aggregated Features into a DataFrame*

In [9]:
print(f"[Func-5] Loading aggregated features into a DataFrame ...")
if not aggEmbeddingUrlDict:
    print("- Error in loading the aggregated features! Skipping ...")
else:
    # Take a few samples of the generated addresses
    aggEmbeddingUrlList = aggEmbeddingUrlDict[aggEmbeddings[0]][cnns[0]][:5]
    # Load aggregated features into a DataFrame
    dfAggEmbedsMax, dfAggEmbedsMean = loadAggEmbeddings(aggEmbeddingUrlList)
    print(f"- Loaded {len(dfAggEmbedsMax)} sample records of aggregated features! Check the first 3 records:")
    print(f"- The loaded DataFrame (Max):\n{dfAggEmbedsMax.head(3)}")
    print(f"- The loaded DataFrame (Mean):\n{dfAggEmbedsMean.head(3)}")

[Func-5] Loading aggregated features into a DataFrame ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000006.json' ...
- JSON data loaded successfully!
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000050.json' ...
- JSON data loaded successfully!
- Loading aggregated features (40%) ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000111.json' ...
- JSON data loaded successfully!
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/incp3/0000000150.json' ...
- JSON data loaded successfully!
- Loading aggregated features (80%) ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/raw/main/full_movies_agg/in