<a href="https://colab.research.google.com/github/RecSys-lab/Popcorn/blob/main/examples/colab/load_movielens_genres.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🍿 Popcorn Framework in Google Colab**
### **Load MovieLens Genres (100K, 1M, 25M)**

🎬 Popcorn Framework: [link](https://github.com/RecSys-lab/Popcorn)

## **[Step 1] Clone Popcorn Movie Recommender Tool**

Clone the framework into the Colab runtime (`/content`) and install it so it is ready for experiments.

⚠️ You might see a *"Restart Session"* warning during the first run in Google Colab due to library version mismatches. This is expected! Accept the restart, re-run this cell, and continue.

In [1]:
# Clone the repo
!git clone https://github.com/RecSys-lab/Popcorn.git

# Install the required library
%cd Popcorn
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/Popcorn')

# Go back to the root
%cd ..

fatal: destination path 'Popcorn' already exists and is not an empty directory.
/content/Popcorn
Obtaining file:///content/Popcorn
  Preparing metadata (setup.py) ... done
Installing collected packages: Popcorn
  Attempting uninstall: Popcorn
    Found existing installation: Popcorn 1.6.0
    Uninstalling Popcorn-1.6.0:
      Successfully uninstalled Popcorn-1.6.0
  Running setup.py develop for Popcorn
Successfully installed Popcorn-1.6.0
/content
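
The `fatal: destination path 'Popcorn' already exists` message above appears when the cell is re-run after a restart. The following is a minimal sketch, not part of Popcorn, of how the clone and path steps could be made safe to repeat. The helper names `clone_command` and `ensure_repo_on_path` are hypothetical.

```python
import os
import sys


def clone_command(repo_url, dest_dir):
    """Return the git command to run, or None when dest_dir already exists.

    Hypothetical helper: it only decides whether a clone is needed and
    leaves the actual execution to the caller.
    """
    if os.path.isdir(dest_dir):
        return None
    return ["git", "clone", repo_url, dest_dir]


def ensure_repo_on_path(repo_dir, path_list=None):
    """Append repo_dir to the import path only if it is not already listed."""
    path_list = sys.path if path_list is None else path_list
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir in path_list:
        return False
    path_list.append(repo_dir)
    return True


# An existing directory (here: the current one) means no clone is needed.
print(clone_command("https://github.com/RecSys-lab/Popcorn.git", os.getcwd()))
```

In a notebook, the returned command could be passed to `subprocess.run` when it is not `None`, so re-running the cell skips the clone instead of printing an error.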


## 🚀 **[Step 2] Use the Framework**

### *1. Load Configurations and Imports*

In [2]:
import os
import json
import pandas as pd
from popcorn.utils import readConfigs

# Start the Framework
print("Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...\n")

# Read the configuration file
configs = readConfigs("Popcorn/popcorn/config/config.yml")
# Report an error if the configuration could not be read
if not configs:
    print("Error reading the configuration file!")

# Override (optional)
configs["datasets"]["unimodal"]["movielens"]["version"] = "100k" # '100k' | '1m' | '25m'
configs["datasets"]["unimodal"]["movielens"]["download_path"] = "/content/MovieLens"

Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...

- Reading the framework's configuration file ...
- Configuration file loaded successfully!
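
The override in the cell above writes directly into the nested configuration dictionary. As a rough sketch, assuming `configs` is a plain nested `dict` with the keys shown in that cell, a small wrapper could validate the version string before applying it. `set_movielens_options` and `ALLOWED_VERSIONS` are hypothetical names, not part of Popcorn.

```python
# Versions mentioned in the override comment of this notebook.
ALLOWED_VERSIONS = ("100k", "1m", "25m")


def set_movielens_options(configs, version, download_path):
    """Validate and apply the MovieLens overrides on a nested config dict."""
    version = version.lower()
    if version not in ALLOWED_VERSIONS:
        raise ValueError(f"Unsupported MovieLens version: {version!r}")
    # setdefault creates any missing intermediate level instead of raising KeyError.
    section = (
        configs.setdefault("datasets", {})
        .setdefault("unimodal", {})
        .setdefault("movielens", {})
    )
    section["version"] = version
    section["download_path"] = download_path
    return configs


demo = set_movielens_options({}, "1M", "/content/MovieLens")
print(demo["datasets"]["unimodal"]["movielens"])
```

Rejecting an unknown version early gives a clearer error than a failed download later in the notebook.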


### *2. Download MovieLens Dataset Variants*

In [3]:
from popcorn.datasets.movielens.downloader import downloadMovieLens

# Variables
mlVersion = configs["datasets"]["unimodal"]["movielens"]["version"]
downloadPath = configs["datasets"]["unimodal"]["movielens"]["download_path"]

# Download MovieLens dataset
downloadMovieLens(mlVersion, downloadPath)


- Downloading the MovieLens-100k dataset ...
- Creating the download path '/content/MovieLens/ml-100k' ...
- Fetching data from 'https://files.grouplens.org/datasets/movielens/ml-100k.zip' ...
- Download completed and the dataset is saved as a 'zip' file!
- Extracting the dataset files inside '/content/MovieLens/ml-100k' ...
- Dataset extracted to '/content/MovieLens/ml-100k' successfully!
- Removing the zip file '/content/MovieLens/ml-100k/ml-100k.zip' ...
- Zip file removed successfully!


True
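
The log above shows how the download URL, the target folder, and the temporary zip file relate to the chosen version. The sketch below reproduces that naming for illustration only. It assumes the `1m` and `25m` variants follow the same `ml-<version>` pattern as the `100k` run shown here, and `movielens_paths` is a hypothetical helper rather than Popcorn's implementation.

```python
import os

# Base URL taken from the download log of the 100k run.
BASE_URL = "https://files.grouplens.org/datasets/movielens"


def movielens_paths(version, download_path):
    """Derive the URL, target folder, and zip location for one version."""
    name = f"ml-{version}"
    target_dir = os.path.join(download_path, name)
    return {
        "url": f"{BASE_URL}/{name}.zip",
        "target_dir": target_dir,
        "zip_file": os.path.join(target_dir, f"{name}.zip"),
    }


paths = movielens_paths("100k", "/content/MovieLens")
print(paths["url"])
```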

### *3. Load MovieLens Dataset Movies*

In [4]:
from popcorn.datasets.movielens.loader import loadMovieLens
from popcorn.datasets.utils import printTextualDatasetStats

# Load MovieLens
itemsDF, usersDF, ratingsDF = loadMovieLens(configs)
if itemsDF is None:
    print("Error in loading the MovieLens dataset! Exiting ...")
else:
    print(f"\n- ItemsDF (shape: {itemsDF.shape}): \n{itemsDF.head()}")
    printTextualDatasetStats(ratingsDF)


- Downloading the MovieLens-100k dataset ...
- The download path '/content/MovieLens/ml-100k' already exists! Skipping the download ...

- Loading 'MovieLens-100k' data from '/content/MovieLens/ml-100k/ml-100k' ...
- Items (movies) have been loaded. Number of rows: 1,682
- Users have been loaded. Number of rows: 943
- Ratings have been loaded. Number of rows: 100,000

- ItemsDF (shape: (1682, 3)): 
  item_id              title                           genres
0       1   Toy Story (1995)  [Animation, Children's, Comedy]
1       2   GoldenEye (1995)    [Action, Adventure, Thriller]
2       3  Four Rooms (1995)                       [Thriller]
3       4  Get Shorty (1995)          [Action, Comedy, Drama]
4       5     Copycat (1995)         [Crime, Drama, Thriller]
--------------------------
- The Dataset Overview:
-- Total Interactions: 100000
-- |U|: 943
-- |I|: 1682
-- |R|/|U|: 106.0445
-- |R|/|I|: 59.4530
-- |R|/(|U|*|I|): 0.0630
--------------------------
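
The ratios in the overview follow directly from the three counts. As a quick check, the sketch below recomputes them from the numbers printed above. `interaction_stats` is a hypothetical helper, and how `printTextualDatasetStats` derives its counts internally is not shown in this notebook.

```python
def interaction_stats(n_ratings, n_users, n_items):
    """Compute average interactions per user, per item, and matrix density."""
    return {
        "ratings_per_user": n_ratings / n_users,
        "ratings_per_item": n_ratings / n_items,
        # Share of the user-item matrix that actually holds a rating.
        "density": n_ratings / (n_users * n_items),
    }


stats = interaction_stats(n_ratings=100_000, n_users=943, n_items=1_682)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A density of about 0.063 means roughly 6.3% of all possible user-item pairs have a rating.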


### *4. Load Main and Unique Genres*

In [5]:
from popcorn.datasets.movielens.helper_genres import getMainGenres, getAllGenres

print("Getting main and all unique genres ...")
print(f"- Main Genres: {getMainGenres()}")
print(f"- All Genres: {getAllGenres()}")

Getting main and all unique genres ...
- Main Genres: ['Action', 'Comedy', 'Drama', 'Horror']
- All Genres: ['unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']


### *5. Load Genres Dictionary and Save Its DataFrame*

In [7]:
from popcorn.datasets.movielens.helper_genres import getGenreDict

# Override (optional)
configs["general"]["output_path"] = "" # Empty (default)

print("Getting genre dictionary and saving genres DataFrame ...")

genreDict = getGenreDict(itemsDF, configs, saveOutput=True)
print(f"- Genre Dictionary (sample): \n{dict(list(genreDict.items())[:3])}")

Getting genre dictionary and saving genres DataFrame ...

- Preparing to save the genres DataFrame in 'outputs' ...
- Genres DataFrame saved to 'outputs/item_genre_ml-100k.csv'!
- Genre Dictionary (sample): 
{'1': ['Animation', "Children's", 'Comedy'], '2': ['Action', 'Adventure', 'Thriller'], '3': ['Thriller']}
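
The sample above suggests the dictionary maps each item ID, as a string, to its list of genres. The sketch below shows one way such a mapping could be built and serialized. The CSV layout (an `item_id` column and a `|`-joined `genres` column) is an assumption for illustration and may differ from the file Popcorn writes. `build_genre_dict` and `genre_dict_to_csv` are hypothetical names.

```python
import csv
import io


def build_genre_dict(items):
    """Map str(item_id) -> list of genres from (item_id, genres) pairs."""
    return {str(item_id): list(genres) for item_id, genres in items}


def genre_dict_to_csv(genre_dict, sep="|"):
    """Serialize the mapping as CSV text with one row per item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item_id", "genres"])
    for item_id, genres in genre_dict.items():
        writer.writerow([item_id, sep.join(genres)])
    return buffer.getvalue()


sample_items = [
    (1, ["Animation", "Children's", "Comedy"]),
    (2, ["Action", "Adventure", "Thriller"]),
    (3, ["Thriller"]),
]
sample_dict = build_genre_dict(sample_items)
csv_text = genre_dict_to_csv(sample_dict)
print(csv_text)
```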


### *6. Binarize Genres as a new DataFrame*

In [9]:
from popcorn.datasets.movielens.helper_genres import binarizeGenres

print("Binarizing genres as a new ItemsDF ...")
itemsDF_binGenre = binarizeGenres(itemsDF)
print(f"- ItemsDF with binarized genres: \n{itemsDF_binGenre.head(10)}")

Binarizing genres as a new ItemsDF ...
- ItemsDF with binarized genres: 
  item_id  isAction  isComedy  isDrama  isHorror
0       1         0         1        0         0
1       2         1         0        0         0
2       3         0         0        0         0
3       4         1         1        1         0
4       5         0         0        1         0
5       6         0         0        1         0
6       7         0         0        1         0
7       8         0         1        1         0
8       9         0         0        1         0
9      10         0         0        1         0
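
Judging by the output above, each movie receives one 0/1 flag per main genre, and genres outside the main list are ignored. The sketch below illustrates that idea on plain Python data. It is not the `binarizeGenres` implementation, and `binarize_genres` and `MAIN_GENRES` are hypothetical names.

```python
# Main genres as printed by getMainGenres() earlier in this notebook.
MAIN_GENRES = ["Action", "Comedy", "Drama", "Horror"]


def binarize_genres(genre_dict, main_genres=MAIN_GENRES):
    """Turn {item_id: [genres]} into rows with one is<Genre> flag per main genre."""
    rows = []
    for item_id, genres in genre_dict.items():
        row = {"item_id": item_id}
        for genre in main_genres:
            # 1 if the movie carries this main genre, 0 otherwise.
            row[f"is{genre}"] = int(genre in genres)
        rows.append(row)
    return rows


sample = {
    "1": ["Animation", "Children's", "Comedy"],
    "2": ["Action", "Adventure", "Thriller"],
    "3": ["Thriller"],
    "4": ["Action", "Comedy", "Drama"],
}
binary_rows = binarize_genres(sample)
for row in binary_rows:
    print(row)
```

A movie such as item 3, whose only genre is Thriller, ends up with all flags set to 0, which matches the third row of the output above.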


### *7. Augment Original ItemsDF with Binarized Genres*

In [11]:
from popcorn.datasets.movielens.helper_movies import augmentMoviesWithBinarizedGenres

print("Augmenting original ItemsDF with binarized genres ...")

itemsDF_augmented = augmentMoviesWithBinarizedGenres(itemsDF, itemsDF_binGenre)
print(f"- Augmented ItemsDF: \n{itemsDF_augmented.head(10)}")

Augmenting original ItemsDF with binarized genres ...
- Augmented ItemsDF: 
  item_id                                              title  \
0       1                                   Toy Story (1995)   
1       2                                   GoldenEye (1995)   
2       3                                  Four Rooms (1995)   
3       4                                  Get Shorty (1995)   
4       5                                     Copycat (1995)   
5       6  Shanghai Triad (Yao a yao yao dao waipo qiao) ...   
6       7                              Twelve Monkeys (1995)   
7       8                                        Babe (1995)   
8       9                            Dead Man Walking (1995)   
9      10                                 Richard III (1995)   

                            genres  isAction  isComedy  isDrama  isHorror  
0  [Animation, Children's, Comedy]         0         1        0         0  
1    [Action, Adventure, Thriller]         1         0        0    