<a href="https://colab.research.google.com/github/RecSys-lab/movifex_dataset/blob/main/examples/load_movifex_visuals_aggregated.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MoViFex Framework - Process `MoViFex` Dataset Aggregated Visual Features**

🎬 Dataset: [link](https://huggingface.co/datasets/alitourani/MoViFex_Dataset/tree/main)

🎬 Framework: [link](https://github.com/RecSys-lab/MoViFex)


# [Step 1] - Load the Framework

Clone the framework into the Colab runtime (`/content`), install it, and add it to the Python path so it is ready for experiments.

In [None]:
# Clone the repo
!git clone https://github.com/RecSys-lab/MoViFex.git

# Install the framework (editable mode) together with its dependencies
%cd MoViFex
!pip install -e .

# Add the repository to the Python path
import sys
sys.path.append('/content/MoViFex')

Cloning into 'MoViFex'...
remote: Enumerating objects: 693, done.
remote: Counting objects: 100% (269/269), done.
remote: Compressing objects: 100% (185/185), done.
remote: Total 693 (delta 135), reused 202 (delta 78), pack-reused 424 (from 1)
Receiving objects: 100% (693/693), 204.58 KiB | 2.53 MiB/s, done.
Resolving deltas: 100% (352/352), done.
/content/MoViFex
Obtaining file:///content/MoViFex
  Preparing metadata (setup.py) ... done
Collecting pytube>=15.0 (from MoViFex==1.0.0)
  Downloading pytube-15.0.0-py3-none-any.whl.metadata (5.0 kB)
Collecting scipy>=1.14.1 (from MoViFex==1.0.0)
  Downloading scipy-1.15.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.0/62.0 kB 2.0 MB/s eta 0:00:00
Downloading pytube-15.0.0-py3-none-any.whl (57 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 2.5 MB/s eta

# [Step 2] - Use the Framework 🚀

Import the framework and define some variables to work with it.

In [None]:
import os
import json
import movifex

# Similar to the `config.yml` file in the framework - section `datasets/visual_dataset/movifex`
configs = {
    "name": "MoViFex-visual",
    "path_metadata": "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/resolve/main/stats.json",
    "path_raw": "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/",
    "feature_sources": ["full_movies", "movie_shots", "movie_trailers"],
    "agg_feature_sources": ["full_movies_agg", "movie_shots_agg", "movie_trailers_agg"],
    "feature_models": ["incp3", "vgg19"],
    "aggregation_models": ["Max", "Mean"]
}

# Variables
datasetName = configs['name']
datasetRawFilesUrl = configs['path_raw']
featureModels = configs['feature_models']
featureSources = configs['feature_sources']
aggFeatureSources = configs['agg_feature_sources']

# Other variables
givenMovieId = 6
givenModel = featureModels[0]
givenAggFeatureSource = aggFeatureSources[0]

**Test I. Fetching the Aggregated Features of a Movie**

- ⚙️ Function: `fetchAggregatedFeatures`

In [None]:
from movifex.datasets.movifex.helper_visualfeats_agg import fetchAggregatedFeatures

print(f"- Fetching aggregated features of the movie #{givenMovieId} (type: '{givenAggFeatureSource}', CNN: '{givenModel}') ...")
aggFeatures = fetchAggregatedFeatures(datasetRawFilesUrl, givenAggFeatureSource, givenModel, givenMovieId)
print(f"- The fetched aggregated features (list):\n {aggFeatures}")

- Fetching aggregated features of the movie #6 (type: 'full_movies_agg', CNN: 'incp3') ...
- The fetched aggregated features (list):
 [{'Max': [2.20092, 2.158851, 1.767559, 1.588628, 1.921421, 2.108246, 2.462245, 2.750559, 1.875503, 1.388787, 2.324629, 1.593678, 3.246272, 2.145216, 3.805069, 2.274425, 2.567641, 1.710589, 2.210374, 3.113759, 2.995106, 1.749763, 3.59445, 2.464502, 1.757488, 1.673122, 1.737465, 1.462482, 1.586424, 4.188165, 2.141298, 2.574938, 2.710921, 2.467631, 2.123704, 2.56505, 2.031943, 2.507104, 2.108845, 3.665492, 1.672623, 1.583344, 1.839345, 1.684337, 2.196849, 3.613509, 1.880924, 4.747502, 2.523645, 2.702798, 2.281293, 1.67635, 2.639477, 1.737069, 1.944016, 2.11997, 1.94974, 1.880355, 2.501177, 2.017897, 2.238228, 4.91618, 3.273538, 1.549966, 2.679488, 2.998206, 1.946417, 2.327894, 2.74513, 1.779103, 2.05818, 1.688125, 2.443502, 2.067371, 2.068713, 1.334956, 2.668175, 2.480188, 2.075522, 2.391256, 2.039112, 1.861594, 2.085006, 2.102111, 1.781374, 2.073073, 3.121
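The fetched value is a list whose first element is a dictionary keyed by aggregation name (only `'Max'` is visible before the output is cut off). Below is a minimal sketch for turning such a list into NumPy vectors. It assumes each element is a dictionary mapping an aggregation name (e.g. `'Max'` or `'Mean'`, as listed in `aggregation_models`) to a list of floats, and it works whether the aggregations sit in one dictionary or in separate ones. `toAggregationVectors` is a hypothetical helper, not part of MoViFex, and the input values are toy numbers.

```python
import numpy as np

def toAggregationVectors(aggFeatures):
    """Hypothetical helper: map each aggregation name to a 1-D float array.

    Assumes `aggFeatures` looks like [{'Max': [...], 'Mean': [...]}] or
    [{'Max': [...]}, {'Mean': [...]}]; if a name repeats, the last one wins.
    """
    vectors = {}
    for entry in aggFeatures:
        for aggName, values in entry.items():
            vectors[aggName] = np.asarray(values, dtype=np.float32)
    return vectors

# Toy input with the assumed structure (not real MoViFex values)
sample = [{'Max': [2.0, 1.5, 3.0]}, {'Mean': [0.3, 0.2, 0.4]}]
vectors = toAggregationVectors(sample)
print({name: vec.shape for name, vec in vectors.items()})
```

If the assumed structure holds, `toAggregationVectors(aggFeatures)['Max']` would return the max-pooled vector of movie #6 as a single array, which is easier to feed into similarity or recommendation code than a nested Python list.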

**Test II. Generating Addresses of Aggregated Features**

- ⚙️ Function: `generatedAggFeatureAddresses`

In [None]:
from movifex.datasets.movifex.helper_visualfeats_agg import generatedAggFeatureAddresses

print(f"- Generating a list of addresses of aggregated features ...")
aggFeatureAddresses = generatedAggFeatureAddresses(configs)
print(f"- Samples of the generated addresses: {aggFeatureAddresses['full_movies_agg']['incp3'][:2]}")

- Generating addresses for aggregated features ...
- Fetching URL from 'https://huggingface.co/datasets/alitourani/MoViFex_Dataset/resolve/main/stats.json' ...
- Fetching all movie IDs ...
- Found 274 movie IDs ...
- Generating a list of addresses to fetch the aggregated features ...
- Generated 1644 aggregated feature addresses, e.g., https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/full_movies_agg/incp3/0000000006.json ...
- Samples of the generated addresses: ['https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/full_movies_agg/incp3/0000000006.json', 'https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/full_movies_agg/incp3/0000000050.json']
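The sample addresses follow a regular pattern: `path_raw`, then the aggregated feature source, then the CNN model, then the movie ID zero-padded to ten digits with a `.json` suffix. With 3 sources and 2 models there are 6 addresses per movie, consistent with 274 × 6 = 1644 in the log. The sketch below rebuilds that pattern from a config dictionary. The helper name `buildAggFeatureAddresses` is hypothetical, and the URL pattern is inferred from the printed output rather than taken from the framework's code.

```python
def buildAggFeatureAddresses(configs, movieIds):
    """Hypothetical re-creation of the URL pattern seen in the log above.

    Returns {aggSource: {model: [url, ...]}}, one URL per movie ID, mirroring
    how `aggFeatureAddresses['full_movies_agg']['incp3']` is indexed.
    """
    rawUrl = configs["path_raw"]
    addresses = {}
    for aggSource in configs["agg_feature_sources"]:
        addresses[aggSource] = {}
        for model in configs["feature_models"]:
            # Movie IDs are zero-padded to 10 digits, e.g. 6 -> '0000000006'
            addresses[aggSource][model] = [
                f"{rawUrl}{aggSource}/{model}/{movieId:010d}.json" for movieId in movieIds
            ]
    return addresses

# Subset of the notebook's `configs`, repeated so the block runs on its own
demoConfigs = {
    "path_raw": "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/",
    "agg_feature_sources": ["full_movies_agg", "movie_shots_agg", "movie_trailers_agg"],
    "feature_models": ["incp3", "vgg19"],
}
demo = buildAggFeatureAddresses(demoConfigs, [6, 50])
print(demo["full_movies_agg"]["incp3"][0])
total = sum(len(urls) for models in demo.values() for urls in models.values())
print(total)  # 2 movies x 3 sources x 2 models
```

In the notebook, `buildAggFeatureAddresses(configs, [6, 50])['full_movies_agg']['incp3']` should reproduce the two sample addresses printed above.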


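**Test III. Loading Aggregated Features into DataFrames**

- ⚙️ Function: `loadAggregatedFeaturesIntoDataFrame`

In [None]: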
from movifex.datasets.movifex.helper_visualfeats_agg import loadAggregatedFeaturesIntoDataFrame

print(f"- Loading a sample aggregated feature into a DataFrame (type: 'full_movies_agg', CNN: 'incp3') ...")
tmpVisualDFMax, tmpVisualDFMean = loadAggregatedFeaturesIntoDataFrame(aggFeatureAddresses['full_movies_agg']['incp3'])
print(f"- Loaded {len(tmpVisualDFMax)} records for 'Max' aggregated features! Check the first 3 records:")

print(f"\n- The loaded DataFrame (Max):")
print(tmpVisualDFMax.head(3))

print(f"\n- The loaded DataFrame (Mean):")
print(tmpVisualDFMean.head(3))

- Loading a sample aggregated feature into a DataFrame (type: 'full_movies_agg', CNN: 'incp3') ...
- Loading aggregated features (36%) ...
- Loading aggregated features (72%) ...
- Loaded 274 records for 'Max' aggregated features! Check the first 3 records:

- The loaded DataFrame (Max):
   itemId                                          embedding
0       6  2.20092,2.158851,1.767559,1.588628,1.921421,2....
1      50  2.608933,2.313115,1.61709,2.527633,1.283107,2....
2     111  2.064346,1.855269,1.985471,2.009896,1.377788,1...

- The loaded DataFrame (Mean):
   itemId                                          embedding
0       6  0.336484,0.330648,0.251412,0.326983,0.262982,0...
1      50  0.318389,0.303116,0.239968,0.279309,0.263945,0...
2     111  0.593401,0.312364,0.406187,0.336265,0.338023,0...
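In both DataFrames the `embedding` column appears to store each vector as a single comma-separated string rather than a list. Below is a minimal sketch, assuming that string format, for parsing the column into a NumPy matrix and finding each movie's nearest neighbor by cosine similarity. `embeddingMatrix` and `cosineSimilarityMatrix` are hypothetical helpers, and the toy DataFrame (made-up IDs and vectors) stands in for `tmpVisualDFMax` or `tmpVisualDFMean`.

```python
import numpy as np
import pandas as pd

def embeddingMatrix(df, column="embedding"):
    """Parse comma-separated embedding strings into an (n_items, dim) float matrix."""
    vectors = [[float(v) for v in s.split(",")] for s in df[column]]
    return np.asarray(vectors, dtype=np.float64)

def cosineSimilarityMatrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

# Toy stand-in for the loaded DataFrames (not real MoViFex data)
toyDF = pd.DataFrame({"itemId": [1, 2, 3],
                      "embedding": ["1.0,0.0,0.0", "0.0,1.0,0.0", "1.0,1.0,0.0"]})
X = embeddingMatrix(toyDF)
sim = cosineSimilarityMatrix(X)

# Exclude self-matches, then pick the most similar other item for each row
np.fill_diagonal(sim, -np.inf)
ids = toyDF["itemId"].tolist()
nearest = {ids[i]: ids[int(np.argmax(sim[i]))] for i in range(len(ids))}
print(nearest)
```

Applied to the notebook's data, `embeddingMatrix(tmpVisualDFMax)` would yield a 274-row matrix whose rows line up with `tmpVisualDFMax['itemId']`, so the same nearest-neighbor lookup can compare movies by their max-aggregated visual features.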
