<a href="https://colab.research.google.com/github/RecSys-lab/Popcorn/blob/main/examples/colab/experiment_kcore_ml-25m.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🍿 Popcorn Framework in Google Colab**
### **Apply K-Core Filtering - MovieLens 25M**

🎬 Popcorn Framework: [link](https://github.com/RecSys-lab/Popcorn)

## **[Step 1] Clone Popcorn Movie Recommender Tool**

Clone the framework into the Colab workspace (`/content`) and install it so it is ready for experiments.

⚠️ You might see a *"Restart Session"* warning during the first run in Google Colab due to library version mismatches. This is expected: accept the restart, re-run this cell, and continue.

In [1]:
# Clone the repo
!git clone https://github.com/RecSys-lab/Popcorn.git

# Install the framework in editable mode
%cd Popcorn
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/Popcorn')

# Go back to the root
%cd ..

fatal: destination path 'Popcorn' already exists and is not an empty directory.
/content/Popcorn
Obtaining file:///content/Popcorn
  Preparing metadata (setup.py) ... done
Installing collected packages: Popcorn
  Attempting uninstall: Popcorn
    Found existing installation: Popcorn 1.6.0
    Uninstalling Popcorn-1.6.0:
      Successfully uninstalled Popcorn-1.6.0
  Running setup.py develop for Popcorn
Successfully installed Popcorn-1.6.0
/content
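
The `fatal: destination path 'Popcorn' already exists` message above appears because the cell was re-run after the repository had already been cloned. It is harmless here, but one way to avoid it is to clone only when the folder is missing. The sketch below is an illustration, not part of Popcorn: `clone_if_missing` is a hypothetical helper, and it assumes `git` is available on the runtime (as it is in Colab).

```python
import os
import subprocess

def clone_if_missing(repo_url, dest, run=subprocess.run):
    """Clone repo_url into dest unless dest already exists (hypothetical helper).

    Returns True if a clone was started, False if it was skipped.
    The `run` parameter exists only so the command runner can be swapped out.
    """
    if os.path.isdir(dest):
        print(f"- '{dest}' already exists, skipping the clone ...")
        return False
    # Equivalent to `!git clone <repo_url> <dest>` in a Colab cell
    run(["git", "clone", repo_url, dest], check=True)
    return True
```

In the cell above, this would be called as `clone_if_missing("https://github.com/RecSys-lab/Popcorn.git", "Popcorn")` in place of the `!git clone` line.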


## 🚀 **[Step 2] Use the Framework**

### *1. Load Configurations and Imports*

In [2]:
import os
import json
import pandas as pd
from popcorn.utils import readConfigs

# Start the Framework
print("Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...\n")

# Read the configuration file
configs = readConfigs("Popcorn/popcorn/config/config.yml")
# Report an error if the configuration could not be read
if not configs:
    print("Error reading the configuration file!")

Welcome to 'Popcorn' 🍿! Starting the framework for your movie recommendation ...

- Reading the framework's configuration file ...
- Configuration file loaded successfully!


### *2. Download the MovieLens Dataset*

In [3]:
from popcorn.datasets.movielens.downloader import downloadMovieLens

# Override (optional)
configs["datasets"]["unimodal"]["movielens"]["version"] = "25m" # '100k' | '1m' | '25m'
configs["datasets"]["unimodal"]["movielens"]["download_path"] = "/content/MovieLens"

# Variables
mlVersion = configs["datasets"]["unimodal"]["movielens"]["version"]
downloadPath = configs["datasets"]["unimodal"]["movielens"]["download_path"]

# Download MovieLens dataset
downloadMovieLens(mlVersion, downloadPath)


- Downloading the MovieLens-25m dataset ...
- Creating the download path '/content/MovieLens/ml-25m' ...
- Fetching data from 'https://files.grouplens.org/datasets/movielens/ml-25m.zip' ...
- Download completed and the dataset is saved as a 'zip' file!
- Extracting the dataset files inside '/content/MovieLens/ml-25m' ...
- Dataset extracted to '/content/MovieLens/ml-25m' successfully!
- Removing the zip file '/content/MovieLens/ml-25m/ml-25m.zip' ...
- Zip file removed successfully!


True

### *3. Load the MovieLens Dataset*

In [4]:
from popcorn.datasets.movielens.loader import loadMovieLens
from popcorn.datasets.utils import printTextualDatasetStats

# Load MovieLens
itemsDF, usersDF, ratingsDF = loadMovieLens(configs)
if itemsDF is None:
    print("Error in loading the MovieLens dataset! Exiting ...")
else:
    print(f"\n- ItemsDF (shape: {itemsDF.shape}): \n{itemsDF.head()}")
    print(f"\n- RatingsDF original row count: {len(ratingsDF):,}")
    printTextualDatasetStats(ratingsDF)


- Downloading the MovieLens-25m dataset ...
- The download path '/content/MovieLens/ml-25m' already exists! Skipping the download ...

- Loading 'MovieLens-25m' data from '/content/MovieLens/ml-25m/ml-25m' ...
- Items (movies) have been loaded. Number of rows: 62,423
- [Note] MovieLens-25M does not provide user metadata! Skipping user data loading ...
- Ratings have been loaded. Number of rows: 25,000,095

- ItemsDF (shape: (62423, 3)): 
  item_id                               title  \
0       1                    Toy Story (1995)   
1       2                      Jumanji (1995)   
2       3             Grumpier Old Men (1995)   
3       4            Waiting to Exhale (1995)   
4       5  Father of the Bride Part II (1995)   

                                              genres  
0  [Adventure, Animation, Children, Comedy, Fantasy]  
1                     [Adventure, Children, Fantasy]  
2                                  [Comedy, Romance]  
3                           [Comedy, Drama
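
To make the kind of summary printed for a ratings table more concrete, here is a minimal sketch that counts users, items, and interactions and derives the interaction density. It is not the implementation of `printTextualDatasetStats`, whose exact output is not shown here. `interaction_stats` is a hypothetical helper, and the `user_id` column name is an assumption (only `item_id` is visible in the output above).

```python
import pandas as pd

def interaction_stats(ratings, user_col="user_id", item_col="item_id"):
    """Summarize a ratings table (hypothetical helper; column names are assumptions)."""
    n_users = ratings[user_col].nunique()
    n_items = ratings[item_col].nunique()
    n_interactions = len(ratings)
    # Density: share of the user-item matrix that actually holds a rating
    cells = n_users * n_items
    density = n_interactions / cells if cells else 0.0
    return {
        "users": n_users,
        "items": n_items,
        "interactions": n_interactions,
        "density": density,
    }

# Toy data: 2 users, 2 items, 3 ratings
toy = pd.DataFrame({"user_id": [1, 1, 2], "item_id": [10, 20, 10]})
stats = interaction_stats(toy)
print(stats)
```

On the real data, the same function could be applied to `ratingsDF` if its columns follow these names.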

### *4. Apply K-Core 10*

In [5]:
from popcorn.datasets.utils import applyKcore

# Variables
K_CORE = configs["setup"]["k_core"] # Originally accessible via this config

# Before filtering
print(f"- Before {K_CORE}-core filtering row count: {len(ratingsDF):,}")

# Override to K_CORE=10
K_CORE = 10

# Apply K-Core
ratingsDF_filtered = applyKcore(ratingsDF, K_CORE)
print(f"- After {K_CORE}-core filtering row count: {len(ratingsDF_filtered):,}")

- Before 10-core filtering row count: 25,000,095
- Applying 10-core filtering ...
- After 10-core filtering row count: 24,890,566
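
For intuition about what a k-core filter does, the sketch below shows one common formulation: repeatedly drop users and items with fewer than `k` interactions until every remaining user and item has at least `k`. This is an illustration under assumptions, not Popcorn's `applyKcore`. The real function may differ (for example, it might filter only users or run a single pass), and `k_core_sketch` and the `user_id` column name are hypothetical.

```python
import pandas as pd

def k_core_sketch(ratings, k, user_col="user_id", item_col="item_id"):
    """Iteratively keep only users and items with at least k interactions.

    Hypothetical illustration; not the Popcorn implementation.
    """
    filtered = ratings
    while True:
        # Per-row interaction counts of the row's user and the row's item
        user_counts = filtered[user_col].map(filtered[user_col].value_counts())
        item_counts = filtered[item_col].map(filtered[item_col].value_counts())
        keep = (user_counts >= k) & (item_counts >= k)
        if keep.all():
            # Nothing left to drop: the table is a k-core (possibly empty)
            return filtered.reset_index(drop=True)
        filtered = filtered[keep]

# Toy data: user 3 and item 30 each appear only once
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": [10, 20, 10, 20, 30],
})
toy_core = k_core_sketch(toy, k=2)
print(f"- Rows kept in the toy 2-core: {len(toy_core)}")
```

The loop matters because removing a sparse item can push a user below the threshold, which in turn can affect other items, so a single pass is not always enough under this formulation.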


### *5. Apply K-Core 20*

In [6]:
from popcorn.datasets.utils import applyKcore

# Override to K_CORE=20 first, so the printed label matches the filter being applied
K_CORE = 20

# Before filtering
print(f"- Before {K_CORE}-core filtering row count: {len(ratingsDF):,}")

# Apply K-Core
ratingsDF_filtered = applyKcore(ratingsDF, K_CORE)
print(f"- After {K_CORE}-core filtering row count: {len(ratingsDF_filtered):,}")

- Before 20-core filtering row count: 25,000,095
- Applying 20-core filtering ...
- After 20-core filtering row count: 24,808,109


### *6. Apply K-Core 40*

In [7]:
from popcorn.datasets.utils import applyKcore

# Override to K_CORE=40 first, so the printed label matches the filter being applied
K_CORE = 40

# Before filtering
print(f"- Before {K_CORE}-core filtering row count: {len(ratingsDF):,}")

# Apply K-Core
ratingsDF_filtered = applyKcore(ratingsDF, K_CORE)
print(f"- After {K_CORE}-core filtering row count: {len(ratingsDF_filtered):,}")

- Before 40-core filtering row count: 25,000,095
- Applying 40-core filtering ...
- After 40-core filtering row count: 23,389,788


### *7. Apply K-Core 60*

In [8]:
from popcorn.datasets.utils import applyKcore

# Override to K_CORE=60 first, so the printed label matches the filter being applied
K_CORE = 60

# Before filtering
print(f"- Before {K_CORE}-core filtering row count: {len(ratingsDF):,}")

# Apply K-Core
ratingsDF_filtered = applyKcore(ratingsDF, K_CORE)
print(f"- After {K_CORE}-core filtering row count: {len(ratingsDF_filtered):,}")

- Before 60-core filtering row count: 25,000,095
- Applying 60-core filtering ...
- After 60-core filtering row count: 22,078,626
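
To compare the four settings side by side, the row counts reported in the outputs above can be turned into removed-row counts and retention percentages. The sketch below uses only those reported numbers. `retention` is a hypothetical helper introduced for this comparison.

```python
# Row counts copied from the outputs above
ORIGINAL_ROWS = 25_000_095
rows_after = {10: 24_890_566, 20: 24_808_109, 40: 23_389_788, 60: 22_078_626}

def retention(original, kept):
    """Return (rows removed, percentage of rows kept). Hypothetical helper."""
    removed = original - kept
    return removed, 100.0 * kept / original

for k, kept in rows_after.items():
    removed, pct = retention(ORIGINAL_ROWS, kept)
    print(f"- {k}-core: kept {kept:,} rows ({pct:.2f}%), removed {removed:,}")
```

The same loop structure could also drive the filtering itself, by calling `applyKcore(ratingsDF, k)` for each `k` instead of repeating the cell four times.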
