<a href="https://colab.research.google.com/github/RecSys-lab/MoViFex/blob/main/examples/download_trailers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MoViFex Framework - Trailer Downloader for `MoViFex` Dataset**

🎬 MoViFex Dataset: [link](https://huggingface.co/datasets/alitourani/MoViFex_Dataset/tree/main)

🎬 Framework: [link](https://github.com/RecSys-lab/MoViFex)


# [Step 1] - Load the Framework

Clone the framework into the Colab workspace (`/content`) and install it so it is ready for experiments.

In [1]:
# Clone the repo
!git clone https://github.com/RecSys-lab/MoViFex.git

# Install the framework and its dependencies (editable mode)
%cd MoViFex
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/MoViFex')

Cloning into 'MoViFex'...
remote: Enumerating objects: 710, done.
remote: Counting objects: 100% (286/286), done.
remote: Compressing objects: 100% (195/195), done.
remote: Total 710 (delta 144), reused 213 (delta 85), pack-reused 424 (from 1)
Receiving objects: 100% (710/710), 227.83 KiB | 4.65 MiB/s, done.
Resolving deltas: 100% (361/361), done.
/content/MoViFex
Obtaining file:///content/MoViFex
  Preparing metadata (setup.py) ... done
Collecting pytube>=15.0 (from MoViFex==1.0.0)
  Downloading pytube-15.0.0-py3-none-any.whl.metadata (5.0 kB)
Collecting scipy>=1.14.1 (from MoViFex==1.0.0)
  Downloading scipy-1.15.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Downloading pytube-15.0.0-py3-none-any.whl (57 kB)

# [Step 2] - Use the Framework 🚀

Import the framework and define some variables to work with it.

In [2]:
import os
import json
import movifex

# Mirrors the framework's `config.yml` file (sections `pipelines/movie_trailers` and `datasets/visual_dataset/movifex`)
configs = {
    "pipelines": {
        "movie_trailers": {
            "name": "Trailer-fetcher",
            "download_path": "/"
        }
    },
    "datasets": {
        "visual_dataset":{
            "movifex": {
                "name": "MoViFex-visual",
                "path_metadata": "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/resolve/main/stats.json"
            }
        }
    }
}

# Shortcuts to the pipeline and dataset sections of the configuration
cfgPipeline = configs['pipelines']['movie_trailers']
cfgDatasets = configs['datasets']['visual_dataset']['movifex']

# Dataset name and metadata URL used in the tests below
datasetName = cfgDatasets['name']
jsonFilePath = cfgDatasets['path_metadata']
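The configuration above sets `download_path` to `/`, the filesystem root. A minimal sketch of pointing it at a dedicated folder instead, assuming the pipeline only reads `download_path` from the dictionary it receives. The helper name `withDownloadDir` and the folder `/content/trailers` are illustrative and not part of MoViFex.

```python
import copy
import os

def withDownloadDir(cfg, downloadDir):
    """Return a copy of a pipeline config whose download folder exists.

    Hypothetical helper for illustration, not part of MoViFex.
    """
    updated = copy.deepcopy(cfg)              # leave the caller's dict untouched
    os.makedirs(downloadDir, exist_ok=True)   # create the folder if it is missing
    updated["download_path"] = downloadDir
    return updated

# Example values mirroring the `movie_trailers` section above
examplePipeline = {"name": "Trailer-fetcher", "download_path": "/"}
```

In this notebook it could be applied as `cfgPipeline = withDownloadDir(cfgPipeline, "/content/trailers")` before running the downloader.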

**Test I. Loading the Corresponding Dataset Metadata**

- ⚙️ Function: `loadJsonFromUrl`

In [3]:
from movifex.utils import loadJsonFromUrl

print(f"- Fetching the list of movies from '{datasetName}' dataset ...")
jsonData = loadJsonFromUrl(jsonFilePath)

print(jsonData)

- Fetching the list of movies from 'MoViFex-visual' dataset ...
[{'id': '0000000006', 'title': 'Heat', 'year': 1995, 'genres': ['Action', 'Crime', 'Thriller']}, {'id': '0000000050', 'title': 'Usual Suspects, The', 'year': 1995, 'genres': ['Crime', 'Mystery', 'Thriller']}, {'id': '0000000111', 'title': 'Taxi Driver', 'year': 1976, 'genres': ['Crime', 'Drama', 'Thriller']}, {'id': '0000000150', 'title': 'Apollo 13', 'year': 1995, 'genres': ['Adventure', 'Drama', 'IMAX']}, {'id': '0000000165', 'title': 'Die Hard: With a Vengeance', 'year': 1995, 'genres': ['Action', 'Crime', 'Thriller']}, {'id': '0000000231', 'title': 'Dumb & Dumber (Dumb and Dumber)', 'year': 1994, 'genres': ['Adventure', 'Comedy']}, {'id': '0000000266', 'title': 'Legends of the Fall', 'year': 1994, 'genres': ['Drama', 'Romance', 'War', 'Western']}, {'id': '0000000293', 'title': 'Léon: The Professional (a.k.a. The Professional) (Léon)', 'year': 1994, 'genres': ['Action', 'Crime', 'Drama', 'Thriller']}, {'id': '00000002
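For readers who want to see what fetching such a metadata file involves, here is a minimal standard-library sketch. It is not the implementation of `loadJsonFromUrl`; the function name `fetchJson` and the `timeout` default are assumptions made for illustration.

```python
import json
import urllib.request

def fetchJson(url, timeout=30):
    """Download a JSON document and parse it (illustrative, not MoViFex code)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        rawBytes = response.read()
    # Decode explicitly as UTF-8 so accented titles are read correctly
    return json.loads(rawBytes.decode("utf-8"))
```

Calling `fetchJson(jsonFilePath)` would be expected to return a list of movie dictionaries like the one printed above, provided the URL is reachable.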

**Test II. Prepare a List of Proper Information for Trailer Finding**

- ⚙️ Function: `filterMovieList`

In [6]:
from movifex.pipelines.downloaders.utils import filterMovieList

print("- Preparing data to make proper queries for movie finding ...")
filteredMovies = filterMovieList(jsonData)

print(filteredMovies)

- Preparing data to make proper queries for movie finding ...
- Prepared 274 movies data from the JSON data to query YouTube ...

[{'id': '0000000006', 'title': 'Heat', 'year': 1995}, {'id': '0000000050', 'title': 'Usual Suspects, The', 'year': 1995}, {'id': '0000000111', 'title': 'Taxi Driver', 'year': 1976}, {'id': '0000000150', 'title': 'Apollo 13', 'year': 1995}, {'id': '0000000165', 'title': 'Die Hard: With a Vengeance', 'year': 1995}, {'id': '0000000231', 'title': 'Dumb & Dumber (Dumb and Dumber)', 'year': 1994}, {'id': '0000000266', 'title': 'Legends of the Fall', 'year': 1994}, {'id': '0000000293', 'title': 'Léon: The Professional (a.k.a. The Professional) (Léon)', 'year': 1994}, {'id': '0000000296', 'title': 'Pulp Fiction', 'year': 1994}, {'id': '0000000318', 'title': 'Shawshank Redemption, The', 'year': 1994}, {'id': '0000000356', 'title': 'Forrest Gump', 'year': 1994}, {'id': '0000000420', 'title': 'Beverly Hills Cop III', 'year': 1994}, {'id': '0000000480', 'title': 'Jura
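Judging by the output, the filtered records keep only `id`, `title`, and `year` and drop `genres`. A rough sketch of that kind of projection follows; `keepQueryFields` is a hypothetical name, and the actual `filterMovieList` may apply further checks not shown here.

```python
def keepQueryFields(movies, fields=("id", "title", "year")):
    """Keep only the given keys of each movie record (hypothetical helper)."""
    return [
        {key: movie[key] for key in fields if key in movie}
        for movie in movies
    ]

# One record in the same shape as the metadata printed in Test I
sample = [{"id": "0000000006", "title": "Heat", "year": 1995,
           "genres": ["Action", "Crime", "Thriller"]}]
print(keepQueryFields(sample))
# → [{'id': '0000000006', 'title': 'Heat', 'year': 1995}]
```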

**Test III. Search and Download the Trailers**

- ⚙️ Function: `downloadMovieTrailers`

In [9]:
from movifex.pipelines.downloaders.movieTrailerDownloader import downloadMovieTrailers

# Optional: keep only the first three movies for a quick test run
filteredMovies = filteredMovies[:3]

downloadMovieTrailers(cfgPipeline, filteredMovies)

- Running the Trailer-fetcher ...
-- Generating the YouTube links for the given movies ...
-- Generated 3 YouTube links for the movies!

- Downloading the results in / ...
- Downloading the trailer of 'Heat' from https://youtube.com/watch?v=14oNcFxiVaQ ...

HTTPError: HTTP Error 403: Forbidden
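The run above stops at the first trailer with `HTTP Error 403: Forbidden`. A minimal sketch of processing movies one at a time so that a failed request is recorded rather than aborting the whole batch, assuming the error surfaces as `urllib.error.HTTPError`. The helper `downloadEach` and its `downloadOne` callback are hypothetical and not part of MoViFex.

```python
from urllib.error import HTTPError

def downloadEach(movies, downloadOne):
    """Call downloadOne(movie) per movie and collect failures instead of aborting.

    Hypothetical helper; `downloadOne` is any callable that downloads one movie.
    """
    succeeded, failed = [], []
    for movie in movies:
        try:
            downloadOne(movie)
            succeeded.append(movie["id"])
        except HTTPError as error:
            # Record the movie id and HTTP status code, then continue
            failed.append((movie["id"], error.code))
    return succeeded, failed
```

In this notebook it might be used as `downloadEach(filteredMovies, lambda m: downloadMovieTrailers(cfgPipeline, [m]))`, on the assumption that `downloadMovieTrailers` accepts a single-element list and lets the exception propagate. This only reports which trailers failed; it does not resolve the 403 response itself.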