<a href="https://colab.research.google.com/github/RecSys-lab/movifex_dataset/blob/main/examples/embeddings_processes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Reading the Dataset Instances**

🎬 Dataset: [link](https://huggingface.co/datasets/alitourani/MoViFex_Dataset)


In [None]:
# Importing libraries
import json
import random
import requests
from collections import Counter

# Variables
datasetPathRaw = "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/"
datasetUrl = "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/resolve/main"
datasetFolders = ["full_movies", "movie_shots", "movie_trailers"]
models = ["incp3", "vgg19"]

# Changeable variables
movieId = 6
packetId = 1

## I. Working with Packets

The `packets` hold the visual features of a movie and are stored under the `[datasetFolders]/[models]/[movieId]` path, where `movieId` is zero-padded to ten digits. Each path can contain several packets, such as `packet0001.json` to `packet0009.json`, which need to be downloaded and combined into a single list.
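
The download-and-combine step described above can be tried offline with a small sketch. It is only an illustration under stated assumptions, not the notebook's implementation: `combine_packets`, `fake_fetch`, and the in-memory `fake_store` are hypothetical stand-ins for a real HTTP fetch, the `max_packets` cap is an assumed safeguard, and the feature values are placeholders.

```python
from typing import Callable, Optional

def combine_packets(fetch: Callable[[int], Optional[list]], max_packets: int = 9999) -> list:
    """Concatenate packets 1, 2, 3, ... until `fetch` reports a missing packet."""
    combined = []
    for packet_id in range(1, max_packets + 1):
        frames = fetch(packet_id)
        if frames is None:  # the first missing packet ends the sequence
            break
        combined.extend(frames)
    return combined

# Hypothetical in-memory stand-in for the remote packet files (placeholder values)
fake_store = {
    1: [{"frameId": "frame0000000", "features": [0.1, 0.2]}],
    2: [{"frameId": "frame0000001", "features": [0.3, 0.4]}],
}

def fake_fetch(packet_id: int) -> Optional[list]:
    """Return the packet's frame list, or None when the packet does not exist."""
    return fake_store.get(packet_id)

all_frames = combine_packets(fake_fetch)
print(len(all_frames))
```

Passing the fetcher in as a callable keeps the combining logic separate from the network code, so it can be checked without contacting the dataset host.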

In [None]:
# Generate packet addresses
def packetAddressGenerator(datasetUrl: str, gFeature: str, gModel: str, gMovieId, gPacketId):
  """
  Generates the address of a packet file based on the given parameters.

  Parameters:
      datasetUrl (str): The base URL of the dataset (expected to end with "/").
      gFeature (str): The dataset folder (e.g., "full_movies", "movie_trailers").
      gModel (str): The model used for feature extraction (e.g., "incp3", "vgg19").
      gMovieId (int): The ID of the movie.
      gPacketId (int): The ID of the packet.

  Returns:
      packetAddress (str): The URL address of the packet file.
  """
  # Standardize variables: zero-pad the movie ID to 10 digits and the packet ID to 4 digits
  gMovieId = f"{int(gMovieId):010d}"
  gPacketId = str(gPacketId).zfill(4)
  # Create address
  packetAddress = f"{datasetUrl}{gFeature}/{gModel}/{gMovieId}/packet{gPacketId}.json"
  return packetAddress

# Run
packetAddress = packetAddressGenerator(datasetPathRaw, datasetFolders[2],
                                       models[0], movieId, packetId)

print(f'A generated URL to access Packet#{packetId} of Movie with ID "{movieId}":')
print(packetAddress)

A generated URL to access Packet#1 of Movie with ID "6":
https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/movie_trailers/incp3/0000000006/packet0001.json
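
The same URL pattern applies to every folder and model listed in the variables cell. As a sketch, the first-packet address of each combination could be enumerated as below. Here `build_packet_url` is a hypothetical helper that mirrors the zero-padding shown in the generated URL, and it is an assumption that a given movie has files under every combination.

```python
from itertools import product

BASE_URL = "https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/"
FOLDERS = ["full_movies", "movie_shots", "movie_trailers"]
MODELS = ["incp3", "vgg19"]

def build_packet_url(base_url: str, folder: str, model: str, movie_id: int, packet_id: int) -> str:
    """Hypothetical helper: 10-digit movie ID, 4-digit packet ID."""
    return f"{base_url}{folder}/{model}/{int(movie_id):010d}/packet{int(packet_id):04d}.json"

# One URL per (folder, model) pair for movie 6, packet 1
urls = {
    (folder, model): build_packet_url(BASE_URL, folder, model, 6, 1)
    for folder, model in product(FOLDERS, MODELS)
}

for key, url in urls.items():
    print(key, url)
```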


In [None]:
# Fetching JSON data from a given address
def loadJsonFromUrl(jsonUrl: str):
    """
    Load `json` data from a given URL and return it.

    Parameters:
        jsonUrl (str): The root address to load JSON data from.

    Returns:
        dict | list | None: The parsed JSON data, or None if loading fails.
    """
    try:
        # Load JSON data from the URL (the timeout avoids hanging on a stalled connection)
        response = requests.get(jsonUrl, timeout=30)
        response.raise_for_status()  # Raise an error for bad status codes
        data = response.json()  # Parse JSON data
        print("JSON data loaded successfully!\n")
        return data
    except json.JSONDecodeError as e:
        # Checked first: recent versions of requests raise a JSON error that is also a RequestException
        print(f"Error parsing JSON data: {e}\n")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data from {jsonUrl}: {e}\n")
        return None

# Run
jsonData = loadJsonFromUrl(packetAddress)

# Print
print('Accessed features from the generated URL:')
print(jsonData)

JSON data loaded successfully!

Accessed features from the generated URL:
[{'frameId': 'frame0000000', 'features': [0.231173, 0.036643, 0.472224, 0.011103, 0.417098, 0.0, 2.010293, 0.13191, 0.0, 0.207225, 0.008761, 0.0213, 0.457245, 2.488863, 1.301304, 2.8569, 0.2144, 0.055559, 0.0, 1.032197, 0.0, 0.027317, 0.071072, 0.043609, 0.117077, 0.002889, 0.098192, 0.003712, 0.221458, 0.317585, 0.939607, 0.213456, 0.002987, 0.087288, 0.114077, 0.011182, 0.052336, 0.110334, 0.043944, 1.218956, 0.380524, 0.06979, 0.314063, 0.007497, 0.0, 0.683982, 0.019638, 0.611484, 0.017711, 0.720402, 0.0, 0.052615, 0.908476, 0.053268, 0.038771, 0.019052, 0.05914, 0.036392, 0.022864, 1.251429, 0.093743, 4.622915, 0.650956, 0.250475, 0.018524, 0.093694, 0.004467, 0.27943, 1.40802, 0.1203, 0.173363, 0.000163, 1.020205, 0.055078, 0.432004, 0.122776, 2.708302, 0.065505, 1.441574, 0.005215, 0.025978, 0.082456, 0.019946, 0.366754, 0.0, 0.043339, 0.153319, 0.126986, 0.007196, 0.002379, 0.0, 0.171208, 0.010258, 0.0071,
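
As the printed output shows, a packet is a list of records, each with a `frameId` and a `features` vector. A minimal sketch for indexing such records by frame follows, assuming every record carries both keys. `index_by_frame` is a hypothetical helper, and the sample keeps only the first three values of the vector printed above.

```python
def index_by_frame(packet: list) -> dict:
    """Map each frameId to its feature vector."""
    return {record["frameId"]: record["features"] for record in packet}

# Truncated sample: the first three feature values of frame0000000 from the output above
sample_packet = [
    {"frameId": "frame0000000", "features": [0.231173, 0.036643, 0.472224]},
]

frame_features = index_by_frame(sample_packet)
print(frame_features["frame0000000"])       # feature vector of one frame
print(len(frame_features["frame0000000"]))  # vector length (3 only because the sample is truncated)
```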

In [None]:
# Fetch all packet[x].json files existing for a movie and combine them together
def fetchAllPackets(datasetUrl: str, gFeature: str, gModel: str, gMovieId):
  """
  Fetches all packets of a movie from the dataset.

  Parameters:
      datasetUrl (str): The base URL of the dataset.
      gFeature (str): The dataset folder (e.g., "full_movies", "movie_trailers").
      gModel (str): The model used for feature extraction.
      gMovieId (int): The ID of the movie.

  Returns:
      moviePackets (list): A list of all packets of the movie.
  """
  # Variables
  counter = 0
  moviePackets = []
  # Loop over all possible files
  while True:
    counter += 1
    # Generate packet address
    packetAddress = packetAddressGenerator(datasetUrl, gFeature, gModel, gMovieId, counter)
    print(f'Generated packet address: {packetAddress}')
    # Fetch JSON data
    jsonData = loadJsonFromUrl(packetAddress)
    if jsonData:
      print('Fetched JSON from the address ...')
      moviePackets += jsonData
    else:
      # Any failed request (including the first missing packet) ends the loop
      print('No JSON data found at the address ...')
      break
  # Return
  return moviePackets

# Run
moviePackets = fetchAllPackets(datasetPathRaw, datasetFolders[2], models[0], movieId)

print(f'\nAll packets of Movie "{movieId}" (variant: {datasetFolders[2]}, model: {models[0]})')
print(moviePackets)

Generated packet address: https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/movie_trailers/incp3/0000000006/packet0001.json
JSON data loaded successfully!

Fetched JSON from the address ...
Generated packet address: https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/movie_trailers/incp3/0000000006/packet0002.json
JSON data loaded successfully!

Fetched JSON from the address ...
Generated packet address: https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/movie_trailers/incp3/0000000006/packet0003.json
JSON data loaded successfully!

Fetched JSON from the address ...
Generated packet address: https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/movie_trailers/incp3/0000000006/packet0004.json
JSON data loaded successfully!

Fetched JSON from the address ...
Generated packet address: https://huggingface.co/datasets/alitourani/MoViFex_Dataset/raw/main/movie_trailers/incp3/0000000006/packet0005.json
JSON data loaded successfu