<a href="https://colab.research.google.com/github/RecSys-lab/popcorn_dataset/blob/main/examples/load_popcorn_metadata.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🍿 Popcorn Framework in Google Colab**
### **Load Popcorn Dataset - Metadata-driven Functions**

🤗 Dataset in HF: [link](https://huggingface.co/datasets/alitourani/Popcorn_Dataset)

🌐 Dataset Web-Page: [link](https://recsys-lab.github.io/popcorn_dataset/)

🎬 Popcorn Framework: [link](https://github.com/RecSys-lab/Popcorn)

## **[Step 1] Clone Popcorn Movie Recommender Tool**

Clone the framework into the Colab runtime (`/content`) and prepare it for experiments.

⚠️ You might see a *"Restart Session"* warning during the first run in Google Colab due to library version mismatches. This is expected: accept the restart, re-run this cell, and continue.

In [1]:
# Clone the repo
!git clone https://github.com/RecSys-lab/Popcorn.git

# Install the required library
%cd Popcorn
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/Popcorn')

# Go back to the root
%cd ..

fatal: destination path 'Popcorn' already exists and is not an empty directory.
/content/Popcorn
Obtaining file:///content/Popcorn
  Preparing metadata (setup.py) ... done
Installing collected packages: Popcorn
  Attempting uninstall: Popcorn
    Found existing installation: Popcorn 1.6.0
    Uninstalling Popcorn-1.6.0:
      Successfully uninstalled Popcorn-1.6.0
  Running setup.py develop for Popcorn
Successfully installed Popcorn-1.6.0
/content
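
The `fatal: destination path 'Popcorn' already exists` line in the output above shows up whenever the cell is re-run, for example after the session restart. It is harmless here, since the install step still succeeds. If you prefer a clean log, the clone can be skipped when the folder is already present. The following is a minimal sketch; `clone_if_missing` is a hypothetical helper and not part of Popcorn.

```python
import os
import subprocess

REPO_URL = "https://github.com/RecSys-lab/Popcorn.git"


def clone_if_missing(repo_url, dest, run=subprocess.run):
    """Clone repo_url into dest unless dest already exists.

    Returns True if a clone was started, False if it was skipped.
    The `run` parameter exists only so the function can be tested offline.
    """
    if os.path.isdir(dest):
        print(f"- '{dest}' already exists, skipping the clone.")
        return False
    run(["git", "clone", repo_url, dest], check=True)
    return True
```

In the notebook this would be called as `clone_if_missing(REPO_URL, "Popcorn")` in place of the `!git clone` line.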


## 🚀 **[Step 2] Use the Framework**

### *1. Load Configurations and Imports*

In [3]:
import os
import json
import pandas as pd
from popcorn.utils import readConfigs, loadJsonFromUrl
from popcorn.datasets.popcorn.utils import METADATA_URL
from popcorn.datasets.popcorn.helper_metadata import (
    countMovies,
    fetchMovieById,
    fetchAllMovieIds,
    fetchRandomMovie,
    fetchRandomMovies,
    fetchMoviesByGenre,
    getAvgGenrePerMovie,
    fetchYearsOccurrences,
    fetchGenresOccurrences,
)

# Start the Framework
print("Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...\n")

# Read the configuration file
configs = readConfigs("Popcorn/popcorn/config/config.yml")
# Report an error if the configuration file could not be read
if not configs:
    print("Error reading the configuration file!")

Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...

- Reading the framework's configuration file ...
- Configuration file loaded successfully!


### *2. Load the Dataset Metadata File*

In [4]:
# Load Popcorn Dataset metadata
datasetName = configs["datasets"]["multimodal"]["popcorn"]["name"]
print(f"- Loading the '{datasetName}' dataset metadata from '{METADATA_URL}' ...")
jsonData = loadJsonFromUrl(METADATA_URL)
if jsonData is None:
    print("- Error in loading the Popcorn dataset metadata! Exiting ...")

- Loading the 'Popcorn-visual' dataset metadata from 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/resolve/main/stats.json' ...
- Loading JSON data from the given URL 'https://huggingface.co/datasets/alitourani/Popcorn_Dataset/resolve/main/stats.json' ...
- JSON data loaded successfully!
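
The cell above relies on `loadJsonFromUrl` returning `None` on failure. The helper's internals are not shown in this notebook, so the block below is only a standard-library sketch of the same contract (parsed JSON on success, `None` otherwise). The name `load_json_from_url` and the `timeout` parameter are assumptions made for illustration.

```python
import json
import urllib.request


def load_json_from_url(url, timeout=30):
    """Return the parsed JSON document at `url`, or None if loading fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as err:
        # OSError covers network and file errors (URLError is a subclass);
        # ValueError covers malformed URLs, bad encodings, and invalid JSON.
        print(f"- Could not load JSON from '{url}': {err}")
        return None
```

Returning `None` instead of raising keeps the caller's `if jsonData is None:` check meaningful, at the cost of hiding the exception type from the caller.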


### *3. Count the Number of Movies in the Dataset*

In [5]:
print("[Func-1] Counting the number of movies in the dataset ...")
moviesCount = countMovies(jsonData)
if moviesCount == -1:
    print("- Error in counting the number of movies!")
else:
    print(f"- Number of movies in the dataset (from metadata): {moviesCount}")

[Func-1] Counting the number of movies in the dataset ...
- Number of movies in the dataset (from metadata): 274


### *4. Fetch All Movie-IDs in the Dataset*

In [7]:
print("[Func-2] Fetching all movie IDs in the dataset ...")
movieIds = fetchAllMovieIds(jsonData)
if not movieIds:
    print("- Error in fetching movie IDs!")
else:
    print(
        f"- {len(movieIds)} Movie-IDs have been fetched successfully. Sample IDs: {movieIds[:5]}"
    )

[Func-2] Fetching all movie IDs in the dataset ...
- 274 Movie-IDs have been fetched successfully. Sample IDs: ['0000000006', '0000000050', '0000000111', '0000000150', '0000000165']


### *5. Fetch One or More Random Movies from the Dataset*

In [8]:
# One Movie
print("[Func-3a] Fetching a random movie from the dataset ...")
randomMovie = fetchRandomMovie(jsonData)
if not randomMovie:
    print("- Error in fetching a random movie!")
else:
    print(f"- Random movie fetched successfully: {randomMovie}")

# Multiple Movies
randomMovieCount = 6
print(f"\n[Func-3b] Fetching {randomMovieCount} random movies from the dataset ...")
randomMovies = fetchRandomMovies(jsonData, randomMovieCount)
if not randomMovies:
    print("- Error in fetching random movies!")
else:
    print(f"- Random movies fetched successfully: {randomMovies}")

[Func-3a] Fetching a random movie from the dataset ...
- Random movie fetched successfully: {'id': '0000006863', 'title': 'School of Rock', 'year': 2003, 'genres': ['Comedy', 'Musical']}

[Func-3b] Fetching 6 random movies from the dataset ...
- Random movies fetched successfully: [{'id': '0000000912', 'title': 'Casablanca', 'year': 1942, 'genres': ['Drama', 'Romance']}, {'id': '0000097752', 'title': 'Cloud Atlas', 'year': 2012, 'genres': ['Drama', 'Sci-Fi', 'IMAX']}, {'id': '0000152061', 'title': 'Triple 9', 'year': 2016, 'genres': ['Crime', 'Drama']}, {'id': '0000072731', 'title': 'Lovely Bones, The', 'year': 2009, 'genres': ['Crime', 'Drama', 'Fantasy', 'Horror', 'Thriller']}, {'id': '0000158966', 'title': 'Captain Fantastic', 'year': 2016, 'genres': ['Drama']}, {'id': '0000197203', 'title': 'Triple Frontier', 'year': 2019, 'genres': ['Crime', 'Thriller']}]
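
The movies printed above will differ on every run. Whether `fetchRandomMovie` and `fetchRandomMovies` accept a seed is not shown in this notebook, so the sketch below uses only the standard library to illustrate reproducible sampling over the sample IDs from step 4. `pick_random` is a hypothetical helper.

```python
import random

# Sample IDs copied from the output of step 4.
movie_ids = ["0000000006", "0000000050", "0000000111", "0000000150", "0000000165"]


def pick_random(items, count, seed=None):
    """Return `count` distinct items; the same seed always yields the same picks."""
    rng = random.Random(seed)           # private generator, global state untouched
    count = min(count, len(items))      # never ask for more items than exist
    return rng.sample(items, count)


first = pick_random(movie_ids, 3, seed=42)
second = pick_random(movie_ids, 3, seed=42)
print(first == second)
```

A private `random.Random` instance is used so that seeding here does not affect other code that draws from the global generator.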


### *6. Fetch a Movie by a Given Movie-ID*

In [9]:
# Successful Fetch (Existing Movie-ID)
print("[Func-4] Fetching a movie by a given ID ...")
givenMovieId = 6
movieById = fetchMovieById(jsonData, givenMovieId)
if not movieById:
    print(f"- Error in fetching movie by ID '{givenMovieId}'!")
else:
    print(f"- Movie fetched successfully by ID '{givenMovieId}': {movieById}")

# Unsuccessful Fetch (Non-Existing Movie-ID)
unsuccessfulMovieId = 999999
movieById = fetchMovieById(jsonData, unsuccessfulMovieId)
if not movieById:
    print(f"- Error in fetching movie by ID '{unsuccessfulMovieId}'!")
else:
    print(
        f"- Movie fetched successfully by ID '{unsuccessfulMovieId}': {movieById}"
    )

[Func-4] Fetching a movie by a given ID ...
- Movie fetched successfully by ID '6': {'id': '0000000006', 'title': 'Heat', 'year': 1995, 'genres': ['Action', 'Crime', 'Thriller']}
- [Warn] No movie found with the given ID '0000999999'. Returning an empty dictionary as movie ...
- Error in fetching movie by ID '999999'!
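
Note that the warning above reports the integer `999999` as `'0000999999'`, and the successful lookup of `6` returns the record with ID `'0000000006'`. This suggests that IDs are zero-padded to ten characters before the lookup. The block below sketches that normalization on two records copied from the outputs in this notebook; `normalize_movie_id` and `find_movie` are hypothetical names, not Popcorn functions.

```python
def normalize_movie_id(movie_id):
    """Zero-pad an integer or string ID to the 10-character form seen in the metadata."""
    return str(movie_id).strip().zfill(10)


# Records copied from the outputs above.
movies = [
    {"id": "0000000006", "title": "Heat", "year": 1995,
     "genres": ["Action", "Crime", "Thriller"]},
    {"id": "0000000912", "title": "Casablanca", "year": 1942,
     "genres": ["Drama", "Romance"]},
]


def find_movie(records, movie_id):
    """Return the record whose 'id' matches, or an empty dict if there is none."""
    key = normalize_movie_id(movie_id)
    return next((m for m in records if m["id"] == key), {})


print(normalize_movie_id(6))               # 0000000006
print(find_movie(movies, 6).get("title"))  # Heat
print(find_movie(movies, 999999))          # {}
```

Returning an empty dict for a missing ID mirrors the behavior described in the warning and lets the caller test the result with `if not movieById:`.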


### *7. Fetch Movies by a Given Genre*

In [11]:
print("[Func-5] Fetching movies by a given genre ...")
givenGenre = "Romance"
moviesByGenre = fetchMoviesByGenre(jsonData, givenGenre)
if not moviesByGenre:
    print(f"- Error in fetching movies by genre '{givenGenre}'!")
else:
    print(
        f"- Sample movies fetched successfully by genre '{givenGenre}': {list(moviesByGenre.values())[:5]}"
    )

[Func-5] Fetching movies by a given genre ...
- Sample movies fetched successfully by genre 'Romance': [{'id': '0000000266', 'title': 'Legends of the Fall', 'year': 1994, 'genres': ['Drama', 'Romance', 'War', 'Western']}, {'id': '0000000356', 'title': 'Forrest Gump', 'year': 1994, 'genres': ['Comedy', 'Drama', 'Romance', 'War']}, {'id': '0000000912', 'title': 'Casablanca', 'year': 1942, 'genres': ['Drama', 'Romance']}, {'id': '0000001339', 'title': "Dracula (Bram Stoker's Dracula)", 'year': 1992, 'genres': ['Fantasy', 'Horror', 'Romance', 'Thriller']}, {'id': '0000001704', 'title': 'Good Will Hunting', 'year': 1997, 'genres': ['Drama', 'Romance']}]
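
Because the cell above calls `.values()` on the result, `fetchMoviesByGenre` appears to return a dictionary of movie records. A minimal sketch of such a filter follows, assuming the result is keyed by movie ID (the keys are never printed in this notebook). The case-insensitive match is a design choice of this sketch and `movies_by_genre` is a hypothetical name.

```python
# Records copied from the outputs above.
movies = [
    {"id": "0000000006", "title": "Heat", "year": 1995,
     "genres": ["Action", "Crime", "Thriller"]},
    {"id": "0000000356", "title": "Forrest Gump", "year": 1994,
     "genres": ["Comedy", "Drama", "Romance", "War"]},
    {"id": "0000000912", "title": "Casablanca", "year": 1942,
     "genres": ["Drama", "Romance"]},
]


def movies_by_genre(records, genre):
    """Return {movie_id: record} for every record tagged with `genre` (case-insensitive)."""
    wanted = genre.strip().lower()
    return {
        m["id"]: m
        for m in records
        if wanted in (g.lower() for g in m["genres"])
    }


romance = movies_by_genre(movies, "Romance")
print([m["title"] for m in romance.values()])  # ['Forrest Gump', 'Casablanca']
```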


### *8. Classify Release Years by Count*

In [12]:
print("[Func-6] Classifying years by count ...")
yearsFreq = fetchYearsOccurrences(jsonData)
if not yearsFreq:
    print("- Error in classifying years!")
else:
    print(f"- Movies per year (based on metadata): {yearsFreq}")

[Func-6] Classifying years by count ...
- Movies per year (based on metadata): {1995: 4, 1976: 2, 1994: 7, 1993: 3, 1990: 4, 1991: 1, 1996: 2, 1972: 1, 1954: 1, 1942: 1, 1988: 2, 1992: 4, 1975: 2, 1986: 2, 1966: 1, 1957: 1, 1971: 1, 1979: 2, 1974: 2, 1984: 2, 1997: 7, 1973: 3, 1989: 1, 1998: 3, 1982: 1, 1985: 1, 1999: 4, 2000: 5, 1987: 1, 2001: 4, 1983: 2, 2002: 9, 2003: 10, 1978: 1, 2004: 7, 2005: 3, 2006: 4, 2007: 5, 2008: 9, 2009: 5, 2010: 10, 2011: 9, 2012: 14, 2013: 14, 2014: 22, 2015: 20, 2016: 17, 2018: 13, 2019: 10, 2017: 15}
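
The dictionary above is not ordered by year, which makes it hard to read. The sketch below copies the printed values, sorts them chronologically, and runs two quick checks: the counts should add up to the 274 movies reported in step 3, and the busiest year can be read off directly.

```python
# Values copied from the output above.
years_freq = {
    1995: 4, 1976: 2, 1994: 7, 1993: 3, 1990: 4, 1991: 1, 1996: 2, 1972: 1,
    1954: 1, 1942: 1, 1988: 2, 1992: 4, 1975: 2, 1986: 2, 1966: 1, 1957: 1,
    1971: 1, 1979: 2, 1974: 2, 1984: 2, 1997: 7, 1973: 3, 1989: 1, 1998: 3,
    1982: 1, 1985: 1, 1999: 4, 2000: 5, 1987: 1, 2001: 4, 1983: 2, 2002: 9,
    2003: 10, 1978: 1, 2004: 7, 2005: 3, 2006: 4, 2007: 5, 2008: 9, 2009: 5,
    2010: 10, 2011: 9, 2012: 14, 2013: 14, 2014: 22, 2015: 20, 2016: 17,
    2018: 13, 2019: 10, 2017: 15,
}

# Sort chronologically; dicts preserve insertion order in Python 3.7+.
by_year = dict(sorted(years_freq.items()))

total = sum(years_freq.values())
peak_year = max(years_freq, key=years_freq.get)

print(f"Earliest year: {next(iter(by_year))}")                         # 1942
print(f"Total movies: {total}")                                        # 274
print(f"Peak year: {peak_year} ({years_freq[peak_year]} movies)")      # 2014 (22 movies)
```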


### *9. Classify Movies by Genre*

In [13]:
print("[Func-7] Classifying movies by genre ...")
genresFreq = fetchGenresOccurrences(jsonData)
if not genresFreq:
    print("- Error in classifying movies by genre!")
else:
    print(f"- Movies per genre (based on metadata): {genresFreq}")

[Func-7] Classifying movies by genre ...
- Movies per genre (based on metadata): {'Action': 142, 'Crime': 80, 'Thriller': 119, 'Mystery': 26, 'Drama': 129, 'Adventure': 56, 'IMAX': 24, 'Comedy': 29, 'Romance': 18, 'War': 18, 'Western': 10, 'Sci-Fi': 57, 'Children': 1, 'Horror': 39, 'Film-Noir': 1, 'Fantasy': 21, 'Musical': 1}
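
To see which genres dominate, the printed dictionary can be ranked by count. The sketch below copies the values from the output above. Note that the total counts genre tags rather than movies, because one movie can carry several genres.

```python
# Values copied from the output above.
genres_freq = {
    "Action": 142, "Crime": 80, "Thriller": 119, "Mystery": 26, "Drama": 129,
    "Adventure": 56, "IMAX": 24, "Comedy": 29, "Romance": 18, "War": 18,
    "Western": 10, "Sci-Fi": 57, "Children": 1, "Horror": 39, "Film-Noir": 1,
    "Fantasy": 21, "Musical": 1,
}

# Rank genres from most to least frequent.
ranked = sorted(genres_freq.items(), key=lambda item: item[1], reverse=True)

for genre, count in ranked[:3]:
    print(f"{genre}: {count}")

total_tags = sum(genres_freq.values())
print(f"Total genre tags: {total_tags}")  # 771
```

The three most frequent genres in this output are Action (142), Drama (129), and Thriller (119).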


### *10. Calculate the Average Number of Genres per Movie*

In [14]:
print("[Func-8] Calculating the average genre per movie ...")
avgGenres = getAvgGenrePerMovie(jsonData)
if avgGenres == 0.0:
    print("- Error in calculating the average genre per movie!")
else:
    print(f"- Average genre per movie (based on metadata): {avgGenres}")

[Func-8] Calculating the average genre per movie ...
- Average genre per movie (based on metadata): 2.814
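
The value above is the average number of genre tags per movie. It can be cross-checked against the earlier outputs: the per-genre counts printed in step 9 sum to 771 tags, and step 3 reports 274 movies, so 771 / 274 ≈ 2.814. The sketch below shows the same computation on three records copied from this notebook. `avg_genres_per_movie` is a hypothetical name, and rounding to three decimals is an assumption based on the printed value.

```python
# Records copied from the outputs above.
movies = [
    {"id": "0000000006", "title": "Heat", "genres": ["Action", "Crime", "Thriller"]},
    {"id": "0000000912", "title": "Casablanca", "genres": ["Drama", "Romance"]},
    {"id": "0000006863", "title": "School of Rock", "genres": ["Comedy", "Musical"]},
]


def avg_genres_per_movie(records, digits=3):
    """Return the mean number of genres per record, or 0.0 for an empty input."""
    if not records:
        return 0.0
    total_tags = sum(len(m["genres"]) for m in records)
    return round(total_tags / len(records), digits)


print(avg_genres_per_movie(movies))  # 2.333  (7 tags over 3 movies)
print(round(771 / 274, 3))           # 2.814  (cross-check with steps 3 and 9)
```

Returning `0.0` for an empty input matches the `avgGenres == 0.0` error check used in the cell above.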
