![video decomposition clustering](https://raw.githubusercontent.com/ARVEST-APP/ml-notebooks/refs/heads/main/docs/images/notebooks/video-decomposiiton-clustering.png)

In this notebook, we shall start learning the [Arvest](https://arvest.app/en) API and its [python package](https://github.com/ARVEST-APP/arvest-api) by taking some videos that we have stored online, decomposing them into images, and using k-means clustering to create some interactive interfaces that we can upload and use in Arvest. 

# 0. Setup

Let's begin by installing and importing all of the different components we will need.

In [None]:
print("Installing and importing packages...")

# Uninstall and reinstall packages for a clean environment
!pip uninstall -q -y arvestapi
!pip uninstall -q -y arvesttools
!pip uninstall -q -y jhutils
!pip uninstall -q -y iiif_prezi3
!pip uninstall -q -y dvt
!pip install -q --disable-pip-version-check git+https://github.com/ARVEST-APP/arvest-api.git
!pip install -q --disable-pip-version-check git+https://github.com/ARVEST-APP/arvest-api-tools.git
!pip install -q --disable-pip-version-check git+https://github.com/jdchart/jh-py-utils.git
!pip install -q --disable-pip-version-check git+https://github.com/iiif-prezi/iiif-prezi3.git
!pip install -q --disable-pip-version-check git+https://github.com/distant-viewing/dvt.git
!pip install -q --disable-pip-version-check opencv-python
!pip install -q --disable-pip-version-check scikit-learn
!pip install -q --disable-pip-version-check matplotlib

# Import packages
import arvestapi
import arvesttools.manifest_creation
from jhutils.local_files import read_json, write_json, get_image_info, collect_files
import jhutils.online_files
from jhutils.misc import print_progress_bar, slugify
import os
import dvt
import iiif_prezi3
import shutil
import requests
import json
import cv2
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from PIL import Image
import random
import mimetypes

TEMP_FOLDER = os.path.join(os.getcwd(), "_TEMP")
os.makedirs(TEMP_FOLDER, exist_ok=True)

print("👍 Ready!")

Installing and importing packages...
👍 Ready!


### If you're following a workshop session...

If you're currently in a workshop - hello! 👋

If so, set `IM_IN_A_WORKSHOP` to `True` and run this cell to download all of the data into your Colab session, so that you don't have to run the time-consuming processing steps yourself.

In [10]:
IM_IN_A_WORKSHOP = False

if IM_IN_A_WORKSHOP:
    # Download and unpack zip:
    jhutils.online_files.download_zip("", os.path.join(os.getcwd(), "data"))

# 1. Connect to Arvest

First, we need to "connect" to Arvest using the Arvest API package. For this, we need our user email and password, which we give to an instance of the `arvestapi.Arvest()` class. For convenience, we've saved ours in a JSON file, `login_private.json` (whose path is stored in `LOGIN_DATA`), and read the credentials from it.

In [11]:
# First, let's connect to our Arvest account:
LOGIN_DATA = os.path.join(os.getcwd(), "login_private.json")
credentials = read_json(LOGIN_DATA)

ar = arvestapi.Arvest(credentials["email"], credentials["password"])
print(f"👍 Successfully connected to Arvest with \"{ar.profile.name}\"")

👍 Successfully connected to Arvest with "Jacob"
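If you don't have a credentials file yet, here is a minimal sketch for creating one. It only assumes what the cell above reads: a JSON object with `"email"` and `"password"` keys. The helper `write_login_template()` is hypothetical, the placeholder values must be replaced with your own, and the file should be kept out of version control.

```python
import json
import os


def write_login_template(path):
    """Create a login JSON file with placeholder credentials (hypothetical helper).

    Returns True if a file was written, False if one already existed,
    so existing credentials are never overwritten.
    """
    if os.path.exists(path):
        return False
    template = {"email": "you@example.com", "password": "change-me"}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2)
    return True
```

In the notebook you would call `write_login_template(LOGIN_DATA)` once, then edit the file by hand before connecting.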


# 2. Get videos
Next, we need some sources to process! In this example, we shall be looking at productions of Pina Bausch's [_Café Müller_](https://en.wikipedia.org/wiki/Caf%C3%A9_M%C3%BCller). The videos are listed in the `VIDEO_URLS` dictionary below; it currently holds a single video, but you can add further entries (e.g. `"video_2"`) to compare several productions:

In [12]:
VIDEO_URLS = {
    "video_1" : "https://youtu.be/ONtu1t0h1gQ?si=yfKV38rF5RAmxW7L"
}

### Add the videos to Arvest:

We can first use the Arvest API to add these videos to our Arvest account using the `add_media()` function. Let's do that now!

The `add_media()` function returns a representation of the object in Arvest, which lets us modify properties such as its `title`, `description` and `metadata`.

In [None]:
for video_name in VIDEO_URLS:
    
    # Create the media item in Arvest:
    added_media = ar.add_media(path = VIDEO_URLS[video_name])

    # Update the media item:
    added_media.update_title(video_name)
    added_media.update_description("A performance of Pina Bausch's Café Müller")

    # We can also update the item's metadata:
    item_metadata = added_media.get_metadata()
    item_metadata["identifier"] = f"&&WORKSHOP-API-CONTENT-{video_name}"
    added_media.update_metadata(item_metadata)

### Download the videos for processing

In order to process the videos we will need to be able to access them locally, which means that we will need to download them into our session. 

ℹ️ This step will be skipped if you downloaded the workshop elements; however, you can still run the cell.

In [None]:
LOCAL_VIDEO_PATHS = {}

if not IM_IN_A_WORKSHOP:
    for video_name in VIDEO_URLS:
        path = jhutils.online_files.download(VIDEO_URLS[video_name], dir = os.path.join(os.getcwd(), "data", "videos"))
        LOCAL_VIDEO_PATHS[video_name] = path
else:
    LOCAL_VIDEO_PATHS = {} # ADD THIS

print("👍 Downloaded videos")

# 3. Extract images

We're going to sample one frame every `INTERVAL` seconds of video (1 second in the cell below; a larger value such as 5 gives a sparser sample) to get a good idea of the visual composition of each video. To do this, we first need to extract those frames as images.

ℹ️ This step will be skipped if you downloaded the workshop elements; however, you can still run the cell.

In [8]:
LOCAL_IMAGE_PATHS = {}
INTERVAL = 1

if not IM_IN_A_WORKSHOP:
    for i, video_name in enumerate(LOCAL_VIDEO_PATHS):
        print_progress_bar(i + 1, len(LOCAL_VIDEO_PATHS), f"(treating {video_name})...")

        # Open the video file with OpenCV and get its properties
        cap = cv2.VideoCapture(LOCAL_VIDEO_PATHS[video_name])
        if not cap.isOpened():
            print(f"⚠️ Could not open {LOCAL_VIDEO_PATHS[video_name]}, skipping it.")
            continue
        fps = cap.get(cv2.CAP_PROP_FPS)
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

        # Output folder:
        out_folder = os.path.join(os.getcwd(), "data", "images", video_name)
        os.makedirs(out_folder, exist_ok=True)
        LOCAL_IMAGE_PATHS[video_name] = out_folder

        # For iteration (step at least one frame, so the loop always advances
        # even if OpenCV reports an fps of 0):
        frame_interval = max(1, int(fps * INTERVAL))
        frame_num = 0
        saved_frame_count = 0

        while frame_num < total_frames:
            # Get frame
            cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
            ret, frame = cap.read()
            if not ret:
                break
            
            # Save image
            output_path = os.path.join(out_folder, f"frame_{saved_frame_count:04d}.jpg")
            cv2.imwrite(output_path, frame)

            saved_frame_count += 1
            frame_num += frame_interval

        cap.release()
else:
    LOCAL_IMAGE_PATHS = {} # ADD THIS

print("🏞️ Finished extracting images!")

|██████████████████████████████████████████████████| 100.0% Complete (treating video_1)...
🏞️ Finished extracting images!
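The extraction loop seeks to frame 0, then steps forward by `int(fps * INTERVAL)` frames each time. A small pure-Python sketch of the same arithmetic (the function name `sampled_frames()` is ours) makes it easy to check how many images a given `INTERVAL` will produce, and at which timestamps, before running the slower OpenCV loop.

```python
def sampled_frames(total_frames, fps, interval):
    """Return (frame_index, timestamp_seconds) pairs for one frame every `interval` seconds.

    Starts at frame 0 and steps by int(fps * interval) frames, never by less than one.
    """
    step = max(1, int(fps * interval))
    return [(frame, frame / fps) for frame in range(0, total_frames, step)]


# e.g. a 10-second clip at 25 fps, sampled once per second:
samples = sampled_frames(total_frames=250, fps=25.0, interval=1)
print(len(samples))   # 10
print(samples[:3])    # [(0, 0.0), (25, 1.0), (50, 2.0)]
```

Note that the step is truncated to a whole number of frames, so at 29.97 fps a 1-second interval actually samples every 29 frames.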


# 4. Process embeddings
Next, we can use the Distant Viewing Toolkit (`dvt`) to map each image into an embedding space. The first time you use `dvt`, it will download the model onto your computer. We'll save the embedding data for each video as a **numpy file** (`.npy`) in its image folder, so that we don't have to run this step again.

In [13]:
for video_name in LOCAL_IMAGE_PATHS:
    corpus = collect_files(LOCAL_IMAGE_PATHS[video_name], ["jpg"])
    embedding_file = os.path.join(LOCAL_IMAGE_PATHS[video_name], "_embeddings.npy")

    # Instance of dvt AnnoEmbed class:
    embedder = dvt.AnnoEmbed()

    embeddings = []
    for i, image_file in enumerate(corpus):
        print_progress_bar(i + 1, len(corpus), f"Treating {os.path.basename(image_file)}")

        # OpenCV loads images as BGR, so convert to RGB before embedding
        image_as_np = cv2.imread(image_file)
        image_as_np = cv2.cvtColor(image_as_np, cv2.COLOR_BGR2RGB)

        embeddings.append(embedder.run(image_as_np)["embedding"])

    # Stack the per-image vectors into one (n_images, n_dimensions) array
    embedding_list = np.vstack(embeddings)
    print(embedding_list)
    np.save(embedding_file, embedding_list)

|██████████████████████████████████████████████████| 100.0% Complete Treating frame_0031.jpg
[[-0.03379547 -0.21241432 -0.11296684 ...  0.24943791 -0.13283801
   0.02285932]
 [ 0.11780826 -0.12091831 -0.14023873 ...  0.12563564 -0.1249859
  -0.01937349]
 [ 0.38491026 -0.08492123 -0.05013636 ...  1.1693214  -0.04501805
   0.2564414 ]
 ...
 [ 0.09809799 -0.11525958 -0.11219667 ...  0.08334374 -0.10009596
   0.02966112]
 [-0.12422692 -0.14223471 -0.13878013 ...  0.45014828 -0.11285977
   0.17033623]
 [-0.070604   -0.1521217  -0.15332255 ...  0.45926207 -0.07086968
   0.06665158]]
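As written, the embedding cell recomputes every embedding each time it runs, even though it saves them to `_embeddings.npy`. Here is a minimal sketch of actually reusing that cache. The helper `load_or_compute_embeddings()` is hypothetical, and `embed` is any function mapping an image path to a 1-D vector (in the notebook it would wrap `dvt.AnnoEmbed().run(...)["embedding"]`).

```python
import os

import numpy as np


def load_or_compute_embeddings(image_files, embedding_file, embed):
    """Return an (n_images, n_dims) array, reusing `embedding_file` if it exists.

    The cache is only trusted if it has one row per image file.
    """
    if os.path.isfile(embedding_file):
        cached = np.load(embedding_file)
        if cached.shape[0] == len(image_files):
            return cached
    # np.vstack turns a list of 1-D vectors into a 2-D array in one step
    embeddings = np.vstack([embed(path) for path in image_files])
    np.save(embedding_file, embeddings)
    return embeddings
```

Checking only the row count is a deliberately simple rule: if you re-extract frames with a different `INTERVAL`, the count usually changes and the cache is rebuilt, but deleting the `.npy` file is the safest way to force a refresh.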


# 5. Dimensionality reduction and clustering
Now that we have our embedding data, we can use dimensionality reduction to crunch all of these dimensions down to 2, so that the images can be projected into a 2-dimensional space. Here we use [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding); PCA and UMAP are common alternatives. Note that we also do some pre- and post-processing; the full process is: standardisation -> dimensionality reduction -> normalization.

In [14]:
normalized = []

for i, video_name in enumerate(LOCAL_IMAGE_PATHS):
    corpus = collect_files(LOCAL_IMAGE_PATHS[video_name], ["jpg"])
    embedding_file = os.path.join(LOCAL_IMAGE_PATHS[video_name], "_embeddings.npy")

    embedding_list = np.load(embedding_file)
    standardized = StandardScaler().fit_transform(embedding_list)

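    # t-SNE requires perplexity < number of images, so cap it for short videos
    perplexity = min(50, len(embedding_list) - 1)
    # `max_iter` replaced the older `n_iter` argument in recent scikit-learn versions
    tsne = TSNE(n_components = 2, perplexity = perplexity, learning_rate = 200, max_iter = 5000)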
    reduced = tsne.fit_transform(standardized)

    normalized.append(MinMaxScaler((0, 1)).fit_transform(reduced))

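To make the three stages explicit, here is a minimal NumPy sketch of standardisation -> reduction -> normalization. It swaps in PCA (computed with an SVD) for t-SNE because PCA is deterministic and needs no tuning; it preserves global variance rather than the local neighbourhoods t-SNE emphasises, so the resulting layout will differ. The function name `project_2d()` is ours.

```python
import numpy as np


def project_2d(embeddings):
    """Standardise -> reduce to 2-D with PCA -> min-max normalise each axis to [0, 1]."""
    X = np.asarray(embeddings, dtype=float)

    # 1. Standardisation: zero mean, unit variance per dimension (like StandardScaler)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant dimensions unscaled instead of dividing by zero
    X = (X - X.mean(axis=0)) / std

    # 2. Dimensionality reduction: project onto the first two principal components
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    reduced = X @ vt[:2].T

    # 3. Normalization: rescale each axis to [0, 1] (like MinMaxScaler((0, 1)))
    lo, hi = reduced.min(axis=0), reduced.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (reduced - lo) / span
```

For example, `project_2d(np.load(embedding_file))` returns one `(x, y)` pair per frame, in the same `[0, 1]` range as the entries of `normalized`.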

If we like, we can visualize the data in a scatter plot:

In [None]:
TO_DISPLAY = 0

transposed = np.transpose(normalized[TO_DISPLAY])
plt.scatter(transposed[0], transposed[1])
plt.show()

## Clustering (optional)

Next we could perform some clustering on this data using [K-Means](https://en.wikipedia.org/wiki/K-means_clustering). We won't be using this data for our visualisation, but it is something that could potentially be useful.

In [None]:
NUM_CLUSTERS = 6

kmeans = KMeans(n_clusters = NUM_CLUSTERS, random_state = 0, n_init = "auto")
# Cluster the 2D points of the video selected with TO_DISPLAY:
clusters = kmeans.fit(normalized[TO_DISPLAY]).labels_

# Create a random colour map for visualisation:
colour_map = {}
used = []
for item in clusters:
    if item not in used:
        colour_map[str(item)] = (random.random(), random.random(), random.random())
        used.append(item)

We can visualize the clusters in a scatter plot like this:

In [None]:
transposed = np.transpose(normalized[TO_DISPLAY])
col = []
for item in clusters:
    col.append(colour_map[str(item)])

plt.scatter(transposed[0], transposed[1], c = col)
plt.show()
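Under the hood, k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until nothing changes. Here is a minimal NumPy sketch of that loop (our own `kmeans()` function, not scikit-learn's implementation). It uses a deterministic farthest-first initialisation as a simple stand-in for scikit-learn's k-means++ seeding.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's-algorithm k-means on an (n, d) array; returns (labels, centroids)."""
    X = np.asarray(points, dtype=float)

    # Farthest-first initialisation: start at the first point, then repeatedly
    # add the point farthest from all centroids chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dists)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Running `kmeans(normalized[TO_DISPLAY], NUM_CLUSTERS)` should give groupings similar to scikit-learn's, though cluster numbers (and therefore colours) may be permuted.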

# 6. Export to Arvest
Finally, we shall export the results of our analysis to an image file, and create an annotated (and therefore interactive) IIIF Manifest that can be consulted in [Arvest](https://arvest.app/en). First, we shall create a high-res PNG file that projects the corresponding images into the 2D space of the dimensionality reduction. We shall also keep track of the coordinates so that we can create our annotations later.

In [None]:
IMAGE_PATH = os.path.join(os.getcwd(), "visualization-image.png")
COORDINATES_PATH = os.path.join(os.getcwd(), "visualization-coordinates.json")

WIDTH = 5000
HEIGHT = 5000
PADDING = 100
IMAGE_ZOOM = 0.1

def scale(val, old_min, old_max, new_min, new_max):
    return new_min + (((val - old_min) * (new_max - new_min)) / (old_max - old_min))

# Function for adding each image to the main image:
def add_image(full_image, coordinates_list, image_path, coordinates):
  this_img = Image.open(image_path)

  w = int(this_img.width * IMAGE_ZOOM)
  h = int(this_img.height * IMAGE_ZOOM)
  x = int(scale(int(int(float(coordinates[0]) * WIDTH) - (w * 0.5)), 0, WIDTH, PADDING, WIDTH - (PADDING * 2)))
  y = int(scale(int(int(float(coordinates[1]) * HEIGHT) - (h * 0.5)), 0, HEIGHT, PADDING, HEIGHT - (PADDING * 2)))

  this_img = this_img.resize((w, h))
  full_image.paste(this_img, (x, y))

  # "url" stores the frame's local path; it is used later as the key into MANIFEST_DICT
  coordinates_list.append({"url" : image_path, "x" : x, "y" : y, "w" : w, "h" : h})

# Initialize image and coordinates
full_image = Image.new('RGBA', (WIDTH, HEIGHT))
coordinates = {"images" : []}

# Frames and 2D points of the video selected with TO_DISPLAY
# (same file order as in the embedding step):
video_name = list(LOCAL_IMAGE_PATHS)[TO_DISPLAY]
corpus = collect_files(LOCAL_IMAGE_PATHS[video_name], ["jpg"])
points = normalized[TO_DISPLAY]

# Add all of the images:
for i, item in enumerate(points):
  image_path = corpus[i]
  print_progress_bar(i + 1, len(corpus), f"Treating {os.path.basename(image_path)}")
  add_image(full_image, coordinates["images"], image_path, item)

full_image.save(IMAGE_PATH)
write_json(COORDINATES_PATH, coordinates)

print("🎨 Image created!")
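The placement above centres each thumbnail on its projected point and then rescales the result into the padded area, so points near 0 can land inside the padding: with a 1920-pixel-wide frame (a 192 px thumbnail at `IMAGE_ZOOM = 0.1`), a point at 0 is placed about 9 px from the left edge. An alternative sketch, using a hypothetical `place()` helper, maps the normalized coordinate directly onto the range of valid top-left positions, which keeps every thumbnail fully inside the padded canvas.

```python
def place(coord, canvas_size, padding, thumb_size):
    """Map a normalized coordinate in [0, 1] to a top-left pixel position.

    0 puts the thumbnail's left/top edge on the padding; 1 puts its right/bottom
    edge on the opposite padding, so it never overflows the canvas.
    """
    usable = canvas_size - 2 * padding - thumb_size
    return int(round(padding + coord * usable))


# With the notebook's settings (5000 px canvas, 100 px padding) and a 192 px thumbnail:
print(place(0.0, 5000, 100, 192))  # 100
print(place(1.0, 5000, 100, 192))  # 4708
```

Inside `add_image()` this would replace the `x`/`y` lines with `x = place(coordinates[0], WIDTH, PADDING, w)` and `y = place(coordinates[1], HEIGHT, PADDING, h)`.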

## Create Manifests
Now we need to create our Manifests. In order to make the main visualization Manifest truly interactive, we shall also make a _Manifest for each of the images in our corpus_. This must be done first, as we will need the URLs of these Manifests when creating our annotations.

Next, we'll create our Manifests using the [arvesttools](https://github.com/ARVEST-APP/arvest-api-tools) package's helper function `media_to_manifest()`. The extracted frames only exist locally, so we first upload each one to Arvest with `add_media()` to give it an online URL, and then create a Manifest from the resulting media item. We keep track of the Manifest URLs that are created in the `MANIFEST_DICT` variable:

In [None]:
MANIFEST_DICT = {}

# Frames of the video selected with TO_DISPLAY (same order as in the export step):
video_name = list(LOCAL_IMAGE_PATHS)[TO_DISPLAY]
corpus = collect_files(LOCAL_IMAGE_PATHS[video_name], ["jpg"])

for i, image_path in enumerate(corpus):
  print_progress_bar(i + 1, len(corpus), f"Creating a Manifest for {os.path.basename(image_path)}")

  img_filename = os.path.splitext(os.path.basename(image_path))[0]

  # Upload the frame to Arvest so that it has an online URL (tagged for cleanup):
  frame_media = ar.add_media(path = image_path)
  frame_media.update_title(f"{video_name} - {img_filename}")
  frame_metadata = frame_media.get_metadata()
  frame_metadata["creator"] = "Image embedding projection tutorial"
  frame_metadata["identifier"] = "&&API-TUTORIAL-IMAGE-EMBEDDING"
  frame_media.update_metadata(frame_metadata)

  # Create the iiif_prezi3.Manifest from the Arvest media item:
  manifest = arvesttools.manifest_creation.media_to_manifest(frame_media)

  # Save the Manifest to disk
  local_path = os.path.join(TEMP_FOLDER, f"{slugify(img_filename)}.json")
  write_json(local_path, manifest.dict())

  # And upload to Arvest:
  added_manifest = ar.add_manifest(path = local_path, update_id = True)
  added_manifest.update_title(f"{img_filename}")
  added_manifest.update_description("Local view of an image embedding projection.")

  manifest_metadata = added_manifest.get_metadata()
  manifest_metadata["creator"] = "Image embedding projection tutorial"
  manifest_metadata["identifier"] = "&&API-TUTORIAL-IMAGE-EMBEDDING"
  added_manifest.update_metadata(manifest_metadata)

  # Keep track of the urls that are created (keyed by the frame's local path):
  MANIFEST_DICT[image_path] = added_manifest.get_full_url()

print("👍 Finished!")
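`media_to_manifest()` returns an `iiif_prezi3` Manifest. In the IIIF Presentation 3 model, a single image sits at `items[0].items[0].items[0].body`: Manifest → Canvas → AnnotationPage → "painting" Annotation, whose body is the image. To make that path concrete, here is a hand-built minimal manifest as a plain dict. It illustrates the IIIF structure and is not the exact output of `media_to_manifest()`; the function name and all `example.org` IDs are placeholders.

```python
def minimal_image_manifest(manifest_id, image_url, width, height, label="Frame"):
    """Build a minimal IIIF Presentation 3 manifest (as a dict) for one image."""
    canvas_id = f"{manifest_id}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{                                   # Canvas
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{                               # AnnotationPage
                "id": f"{canvas_id}/page/1",
                "type": "AnnotationPage",
                "items": [{                           # painting Annotation
                    "id": f"{canvas_id}/page/1/annotation/1",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {"id": image_url, "type": "Image", "format": "image/jpeg",
                             "width": width, "height": height},
                    "target": canvas_id,
                }],
            }],
        }],
    }


m = minimal_image_manifest("https://example.org/manifests/frame_0000",
                           "https://example.org/images/frame_0000.jpg", 1920, 1080)
print(m["items"][0]["items"][0]["items"][0]["body"]["id"])  # https://example.org/images/frame_0000.jpg
```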

Finally, let's create the main visualization Manifest. First, we need to upload the image we created of the projection to Arvest. For this, we'll use the `add_media()` function.

In [None]:
added_media = ar.add_media(path = IMAGE_PATH)
added_media.update_title("Image collection projection")
added_media.update_description("A projection in 2D space of a collection of images.")

media_metadata = added_media.get_metadata()
media_metadata["creator"] = "Image embedding projection tutorial"
media_metadata["identifier"] = "&&API-TUTORIAL-IMAGE-EMBEDDING"
added_media.update_metadata(media_metadata)

print(f"👍 Media uploaded to Arvest at the following url: {added_media.get_full_url()}")

Now let's create the Manifest. Again, we'll use the `media_to_manifest()` function, which in this case can also accept an Arvest media item. 

Once we've created the Manifest, we can add annotations to make it interactive. We'll add an annotation for each of the Manifests created earlier using the `add_textual_annotation()` function, passing the corresponding Manifest URL along with the image's position and dimensions:

In [None]:
# Create the Manifest:
manifest = arvesttools.manifest_creation.media_to_manifest(added_media)

# Add an annotation for each Manifest:
for item in coordinates["images"]:
    image_url = item["url"]
    manifest_url = MANIFEST_DICT[image_url]
    xywh = {"x" : item["x"], "y" : item["y"], "w" : item["w"], "h" : item["h"]}
    
    arvesttools.manifest_creation.add_textual_annotation(
        manifest,
        text_content = f"<p>{os.path.basename(image_url)}</p>",
        xywh = xywh,
        linked_manifest = manifest_url
    )

# Save to disk:
local_path = os.path.join(TEMP_FOLDER, "projection-manifest.json")
write_json(local_path, manifest.dict())

# And upload to Arvest:
added_manifest = ar.add_manifest(path = local_path, update_id = True)
added_manifest.update_title("Image embedding projection")
added_manifest.update_description("Projection of a collection of images in 2-D space.")

manifest_metadata = added_manifest.get_metadata()
manifest_metadata["creator"] = "Image embedding projection tutorial"
manifest_metadata["identifier"] = "&&API-TUTORIAL-IMAGE-EMBEDDING"
added_manifest.update_metadata(manifest_metadata)

print(f"👍 Manifest uploaded to Arvest at the following url: {added_manifest.get_preview_url()}")
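Spatial annotations in IIIF target a rectangle on the Canvas using a media fragment of the form `#xywh=x,y,w,h` (from the W3C Media Fragments syntax). We haven't checked exactly how `add_textual_annotation()` encodes its `xywh` argument or the `linked_manifest` link, so treat this hand-written dict (built by a hypothetical `textual_annotation()` helper) as an illustration of the convention rather than of that function's output.

```python
def textual_annotation(annotation_id, canvas_id, html, xywh):
    """Build an IIIF 'commenting' annotation whose target is a rectangle on the canvas."""
    # Media fragment selecting the region: x, y = top-left corner; w, h = size in pixels
    fragment = "xywh={x},{y},{w},{h}".format(**xywh)
    return {
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": html, "format": "text/html"},
        "target": f"{canvas_id}#{fragment}",
    }


anno = textual_annotation(
    "https://example.org/anno/1",
    "https://example.org/canvas/1",
    "<p>frame_0000.jpg</p>",
    {"x": 100, "y": 250, "w": 192, "h": 108},
)
print(anno["target"])  # https://example.org/canvas/1#xywh=100,250,192,108
```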

# 7. Cleanup
To finish, let's clean up our mess! First, we can delete the temporary folder, along with the local projection image and coordinates file.

In [None]:
shutil.rmtree(TEMP_FOLDER)
os.remove(IMAGE_PATH)
os.remove(COORDINATES_PATH)
print(f"🗑️ {TEMP_FOLDER} removed !")

And finally, we can remove from Arvest all of our content. We can get all of our content by using the `get_manifests()` and `get_medias()` functions, then check the metadata. If it's one of the files we want to remove, we can then use the `remove()` function.

**⚠️ Warning: there's no going back after using the `remove()` function, so be careful! To avoid accidental removal, the code below only runs once you set the `REMOVE` variable to `True`.**

In [None]:
REMOVE = False

def is_tutorial_item(metadata):
    # Items tagged by this notebook: the tutorial's creator + identifier, or the
    # source videos' "&&WORKSHOP-API-CONTENT-..." identifier.
    # .get() avoids a KeyError on items that lack these fields.
    metadata = metadata or {}
    identifier = metadata.get("identifier") or ""
    is_projection_item = (metadata.get("creator") == "Image embedding projection tutorial"
                          and identifier == "&&API-TUTORIAL-IMAGE-EMBEDDING")
    return is_projection_item or identifier.startswith("&&WORKSHOP-API-CONTENT-")

if REMOVE:
    all_manifests = ar.get_manifests()
    count = 0
    print("Removing manifests...")

    for i, manifest_item in enumerate(all_manifests):
        print_progress_bar(i + 1, len(all_manifests), f"(Processing file {i + 1}/{len(all_manifests)})")
        if is_tutorial_item(manifest_item.get_metadata()):
            manifest_item.remove()
            count = count + 1

    all_media = ar.get_medias()
    print("Removing medias...")

    for i, media_file in enumerate(all_media):
        print_progress_bar(i + 1, len(all_media), f"(Processing file {i + 1}/{len(all_media)})")
        if is_tutorial_item(media_file.get_metadata()):
            media_file.remove()
            count = count + 1

    print(f"🗑️ Removed {count} items!")
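Because `remove()` is irreversible, it can help to preview what would be deleted before setting `REMOVE` to `True`. Here is a minimal dry-run sketch of the creator/identifier check. `select_for_removal()` is a hypothetical helper, and `FakeItem` stands in for the objects returned by `get_manifests()` and `get_medias()`; it only assumes they have a `get_metadata()` method returning a dict, as used above.

```python
TUTORIAL_CREATOR = "Image embedding projection tutorial"
TUTORIAL_IDENTIFIER = "&&API-TUTORIAL-IMAGE-EMBEDDING"


def select_for_removal(items):
    """Return the items whose metadata carries this tutorial's creator and identifier."""
    selected = []
    for item in items:
        metadata = item.get_metadata() or {}
        if (metadata.get("creator") == TUTORIAL_CREATOR
                and metadata.get("identifier") == TUTORIAL_IDENTIFIER):
            selected.append(item)
    return selected


class FakeItem:
    """Hypothetical stand-in for an Arvest manifest/media object."""

    def __init__(self, title, metadata):
        self.title = title
        self._metadata = metadata

    def get_metadata(self):
        return self._metadata


items = [
    FakeItem("projection", {"creator": TUTORIAL_CREATOR, "identifier": TUTORIAL_IDENTIFIER}),
    FakeItem("my own work", {"creator": "Jacob"}),
    FakeItem("untagged", {}),
]
print([item.title for item in select_for_removal(items)])  # ['projection']
```

With a real account, you could pass `ar.get_manifests()` (and then `ar.get_medias()`) to the same function and inspect the result before deleting anything.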