![Video shot decomposition](https://raw.githubusercontent.com/arvest-data-in-context/ml-notebooks/refs/heads/main/docs/images/notebooks/video-shot-decomposition.png)

In this notebook, we shall find a video that we have stored on Arvest, and then use the [Distant Viewing Toolkit](https://github.com/distant-viewing/dvt) to detect the shot changes in the video. Once this is done, we shall take the results and build an interactive IIIF Manifest which can be viewed directly in [Arvest](https://arvest.app).

# 0. Setup

Let's begin by installing and importing all of the different components we will need.

In [None]:
print("Installing and importing packages...")

# Uninstall and reinstall packages for a clean environment
!pip uninstall -q -y arvestapi
!pip uninstall -q -y arvesttools
!pip uninstall -q -y jhutils
!pip uninstall -q -y iiif_prezi3
!pip uninstall -q -y dvt
!pip install -q --disable-pip-version-check git+https://github.com/arvest-data-in-context/arvest-api.git
!pip install -q --disable-pip-version-check git+https://github.com/arvest-data-in-context/arvest-api-tools.git
!pip install -q --disable-pip-version-check git+https://github.com/jdchart/jh-py-utils.git
!pip install -q --disable-pip-version-check git+https://github.com/iiif-prezi/iiif-prezi3.git
!pip install -q --disable-pip-version-check git+https://github.com/distant-viewing/dvt.git
!mkdir -p /root/.cache/torch/hub/checkpoints/

# Import packages
import arvestapi
import arvesttools.manifest_creation
from jhutils.local_files import read_json, write_json
import jhutils.online_files
from jhutils.misc import print_progress_bar_colab, slugify
import os
import dvt
import iiif_prezi3
import shutil
import numpy as np
import random
import copy
import mimetypes
mimetypes.add_type('image/webp', '.webp')

TEMP_FOLDER = os.path.join(os.getcwd(), "_TEMP")
# Create the temporary folder if it does not already exist
os.makedirs(TEMP_FOLDER, exist_ok=True)

print("👍 Ready!")

# 1. Find our video
The first step is to get the video that we wish to process. We have ours stored on our Arvest account, and we have given its metadata `identifier` field the value `"API-TUTORIAL-CONTENT-AUTO-SHOT-DECOMP"`. This allows us to find our media using the [Arvest API](https://github.com/arvest-data-in-context/arvest-api).

First, we need to "connect" to Arvest using the Arvest API package. For this, we need our user email and our password which we will give to an instance of the `arvestapi.Arvest()` class.

In [None]:
EMAIL = "my_email@something.com"
PASSWORD = "myarvestpassword"

ar = arvestapi.Arvest(EMAIL, PASSWORD)
print(f"👍 Successfully connected to Arvest with \"{ar.profile.name}\"")
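
Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the credentials could instead be read from environment variables, falling back to an interactive prompt. The variable names `ARVEST_EMAIL` and `ARVEST_PASSWORD` and the helper `load_credentials()` are hypothetical choices made for this example; they are not part of the Arvest API.

```python
import os
from getpass import getpass


def load_credentials(env=None, prompt=getpass):
    """Return (email, password), preferring environment variables.

    ARVEST_EMAIL and ARVEST_PASSWORD are hypothetical variable names
    chosen for this sketch; they are not defined by the Arvest API.
    """
    env = os.environ if env is None else env
    # Fall back to asking the user when a variable is missing or empty.
    email = env.get("ARVEST_EMAIL") or input("Arvest email: ")
    password = env.get("ARVEST_PASSWORD") or prompt("Arvest password: ")
    return email, password
```

The returned pair could then replace the constants above, for example `EMAIL, PASSWORD = load_credentials()` before creating the `arvestapi.Arvest(EMAIL, PASSWORD)` instance.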

Next, we'll get all of our media using the `get_medias()` function, and search until we find the right video.

In [None]:
found_media = []
media_items = ar.get_medias()

for media_item in media_items:
    media_item_metadata = media_item.get_metadata()
    if media_item_metadata["identifier"] == "API-TUTORIAL-CONTENT-AUTO-SHOT-DECOMP":
        found_media.append(media_item)

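# Stop with a readable error if the search returned nothing
if not found_media:
    raise LookupError("No media found with identifier API-TUTORIAL-CONTENT-AUTO-SHOT-DECOMP")
video_item = found_media[0]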

print(f"🔍 Found {len(found_media)} media files corresponding to search criteria.")
print(f"Treating first item: \"{video_item.title}\"")
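
The same search is useful whenever items need to be selected by a metadata field. A minimal sketch of a reusable version is given here; the helper name `find_by_identifier()` is hypothetical, and it only assumes what the cell above already relies on, namely that each item has a `get_metadata()` method returning a dictionary. Unlike the loop above, it skips items that have no `identifier` field instead of raising a `KeyError`.

```python
def find_by_identifier(items, identifier):
    """Return the items whose metadata "identifier" field equals `identifier`.

    Hypothetical helper: assumes each item exposes get_metadata() and that
    it returns a dict (or None). Items without the field are skipped.
    """
    matches = []
    for item in items:
        metadata = item.get_metadata() or {}
        if metadata.get("identifier") == identifier:
            matches.append(item)
    return matches
```

With it, the search above would read `found_media = find_by_identifier(ar.get_medias(), "API-TUTORIAL-CONTENT-AUTO-SHOT-DECOMP")`.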

Finally, we need to download the corresponding video in order to analyze it. To do this, we shall use the `download()` helper function from `jhutils.online_files`.

In [None]:
local_video_path = jhutils.online_files.download(video_item.get_full_url(), dir = TEMP_FOLDER)

print(f"👍 Video downloaded to {local_video_path}")

# 2. Perform Analysis
Now that we have our video, we are ready to analyze it! We shall use the [Distant Viewing Toolkit](https://github.com/distant-viewing/dvt)'s `AnnoShotBreaks()` class (note that, if you're running this for the first time, the model will first need to be downloaded).

The results are expressed in frame numbers, so we convert the start and end of each shot to seconds by dividing by the video's frame rate.

In [None]:
# DVT's shot detection model:
anno_breaks = dvt.AnnoShotBreaks()

print("Processing...")
# Run the analysis here:
breaks = anno_breaks.run(local_video_path)

# The results are returned in frames, here we parse the results so that we have them in seconds:
result_parse = {"shots" : []}
frame_rate = dvt.video_info(local_video_path)["fps"]
for start_frame, end_frame in zip(breaks["scenes"]["start"], breaks["scenes"]["end"]):
    result_parse["shots"].append([start_frame / frame_rate, end_frame / frame_rate])

print(f"👍 Analysis complete! Found {len(result_parse['shots'])} shots.")
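
The conversion from frames to seconds can be checked in isolation. The sketch below uses a hypothetical helper, `frames_to_seconds()`, and made-up frame numbers rather than real DVT output: at 25 frames per second, a shot spanning frames 50 to 124 runs from 2.00 s to 4.96 s.

```python
def frames_to_seconds(starts, ends, fps):
    """Convert parallel lists of start/end frame numbers into [start, end] pairs in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [[start / fps, end / fps] for start, end in zip(starts, ends)]


# Made-up frame numbers for a 25 fps video (illustrative values only).
shots = frames_to_seconds([0, 50, 125], [49, 124, 300], fps=25.0)
for index, (start, end) in enumerate(shots, start=1):
    print(f"Shot {index}: {start:.2f}s to {end:.2f}s")
```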

# 4. Export to Arvest
Finally, we shall export the results of our analysis to an interactive IIIF Manifest that can be opened in Arvest. Let's begin by creating the basic Manifest with the [arvesttools](https://github.com/arvest-data-in-context/arvest-api-tools) package's `media_to_manifest()` function.

In [None]:
manifest = arvesttools.manifest_creation.media_to_manifest(video_item)
print("👍 Manifest created!")

Next, we'll add a new Canvas for each of the detected shots that crops the original video according to the shot start and end times.

**ℹ️ This is currently not working, as the IIIF Presentation API does not allow us to supply a start time to a video resource. Hopefully this feature will be implemented soon. Until then, the code in this cell is left commented out.**

In [None]:
# for i, shot in enumerate(result_parse["shots"]):

#     duration = shot[1] - shot[0]
    
#     shot_canvas = copy.copy(manifest.items[0])
#     shot_ap = copy.copy(shot_canvas.items[0])
#     shot_an = copy.copy(shot_ap.items[0])
#     shot_body = copy.copy(shot_an.body)

#     shot_canvas.label = {"en" : [f"{video_item.title} (shot {i + 1})"]}
#     shot_canvas.duration = "{:.4f}".format(duration)
    
#     original_target = shot_an.target.split("&t=")[0]
#     shot_an.target = f"{original_target}&t={shot[0]:.2f},{shot[1]:.2f}"

#     shot_body.duration = duration

#     shot_an.body = shot_body
#     shot_ap.items = [shot_an]
#     shot_canvas.items = [shot_ap]

#     arvesttools.manifest_creation.append_canvas_to_manifest(manifest, shot_canvas)

Next, let's add a timed annotation to the main Canvas, one for each shot.

In [None]:
print("Adding annotations...")
for i, shot in enumerate(result_parse["shots"]):
    print_progress_bar_colab(i + 1, len(result_parse["shots"]), f"(shot {i + 1}/{len(result_parse['shots'])})")

    arvesttools.manifest_creation.add_textual_annotation(
        manifest,
        text_content = f"<p><strong>Shot {i + 1}</strong><br>(<em>{shot[0]:.2f}-{shot[1]:.2f}</em>)</p>",
        t = {"start" : "{:.4f}".format(shot[0]), "end" : "{:.4f}".format(shot[1])}
    )

print("👍 Finished")
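
In IIIF, a timed annotation targets a span of the Canvas timeline with a temporal media fragment, `#t=start,end`, appended to the Canvas id. How `add_textual_annotation()` builds its target internally is not shown in this notebook, so the following is only a sketch of the fragment syntax. The helper `temporal_target()` and the Canvas URL are hypothetical, and the sketch discards any fragment already present on the id, including a spatial `xywh` part.

```python
def temporal_target(canvas_id, start, end, precision=4):
    """Build a Canvas target with a temporal media fragment (#t=start,end)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    base = canvas_id.split("#")[0]  # drop any existing fragment
    return f"{base}#t={start:.{precision}f},{end:.{precision}f}"


# Hypothetical Canvas URL, for illustration only.
print(temporal_target("https://example.org/manifest/canvas/1", 2.0, 4.96))
```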

Finally, we can upload the Manifest to Arvest. You can then either find it in your [workspace](https://workspace.arvest.app/) or view it at the URL printed below.

In [None]:
# Save to disk
local_path = os.path.join(TEMP_FOLDER, f"{slugify(video_item.title)}-shot-decomposition.json")
write_json(local_path, manifest.dict())

# Upload Manifest:
added_manifest = ar.add_manifest(path = local_path, update_id = True)
added_manifest.update_title(f"{video_item.title} (automatic shot decomposition)")
added_manifest.update_description("A Manifest annotated using a video shot detection model.")
if video_item.thumbnail_url is not None:
    added_manifest.update_thumbnail_url(video_item.thumbnail_url)

# Update metadata:
manifest_metadata = added_manifest.get_metadata()
manifest_metadata["creator"] = "Video shot deocmposition tutorial"
manifest_metadata["identifier"] = "&&API-TUTORIAL-VIDEO-SHOT-DECOMP"
added_manifest.update_metadata(manifest_metadata)

print(f"👍 Manifest created, view it here: {added_manifest.get_preview_url()}")

# 5. Cleanup
To finish, let's clean up our mess! First, we can delete the temporary folder.

In [None]:
shutil.rmtree(TEMP_FOLDER)
print(f"🗑️ {TEMP_FOLDER} removed!")

And finally, we can remove the Manifests we created from Arvest. We can get all of our Manifests by using the `get_manifests()` function, then check the metadata of each one. If it's one of the files we want to remove, we can then use the `remove()` function.

**⚠️ Warning: there's no going back after using the remove function, so be careful! To avoid accidental removal, we've added a `REMOVE` variable that needs to be set to `True` for the code to run.**

In [None]:
REMOVE = False

if REMOVE:
    all_manifests = ar.get_manifests()
    count = 0
    print("Removing manifests...")

    for i, media_file in enumerate(all_manifests):
        print_progress_bar_colab(i + 1, len(all_manifests), f"(Processing file {i + 1}/{len(all_manifests)})")
        media_metadata = media_file.get_metadata()
        if media_metadata["creator"] == "Video shot deocmposition tutorial" and media_metadata["identifier"] == "&&API-TUTORIAL-VIDEO-SHOT-DECOMP":
            media_file.remove()
            count = count + 1

    print(f"🗑️ Removed {count} items!")