![Arvest batch media upload](https://raw.githubusercontent.com/arvest-data-in-context/ml-notebooks/refs/heads/main/docs/images/notebooks/arvest-batch-media-upload.png)

In this notebook, we'll take either local or online media, and see how to upload it to [Arvest](https://arvest.app) using the [Arvest API](https://github.com/arvest-data-in-context/arvest-api).

# 0. Setup

Let's begin by installing and importing all of the different components we will need.

In [None]:
print("Installing and importing packages...")

# Uninstall and reinstall packages for a clean environment
!pip uninstall -q -y arvestapi
!pip uninstall -q -y jhutils
!pip install -q --disable-pip-version-check git+https://github.com/arvest-data-in-context/arvest-api.git
!pip install -q --disable-pip-version-check git+https://github.com/jdchart/jh-py-utils.git

# Import packages
import arvestapi
from jhutils.local_files import collect_files, read_json
from jhutils.misc import print_progress_bar
import os

print("👍 Ready!")


# 1. Prepare your media
In Arvest, it is possible to upload media from your machine, or link to media that already exists online. In both cases, we will need to get the **path** to your media files, and bring them together in a list. First, here is a list of online files (hosted at the [Library of Congress](https://www.loc.gov/collections/fsa-owi-color-photographs/about-this-collection/)):

In [None]:
ONLINE_MEDIA_FILES = [
    "https://tile.loc.gov/storage-services/service/pnp/fsac/1a34000/1a34600/1a34630v.jpg",
    "https://tile.loc.gov/storage-services/service/pnp/fsac/1a34000/1a34200/1a34209v.jpg",
    "https://tile.loc.gov/storage-services/service/pnp/fsac/1a33000/1a33800/1a33859v.jpg"
]

We can also get a list of files on our computer:

In [None]:
FOLDER_TO_UPLOAD = os.path.join(os.getcwd(), "..", "..", "..", 'test-media')
LOCAL_MEDIA_FILES = collect_files(FOLDER_TO_UPLOAD)
print(f"🔍 Found {len(LOCAL_MEDIA_FILES)} files in {FOLDER_TO_UPLOAD}.")
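
A folder often contains files that are not media (hidden system files, notes, and so on). As a minimal sketch, assuming that `collect_files` returns a list of path strings, we could filter that list by extension before uploading. The `filter_media_files` helper and the `MEDIA_EXTENSIONS` set are hypothetical names introduced for illustration; they are not part of `jhutils` or the Arvest API, and the extensions listed are not a statement of what Arvest accepts.

```python
import os

# Hypothetical set of extensions; adjust it to the media types you actually use.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4", ".mp3", ".wav"}

def filter_media_files(paths, extensions=MEDIA_EXTENSIONS):
    """Keep only the paths whose extension (case-insensitive) is in `extensions`."""
    kept = []
    for path in paths:
        _, ext = os.path.splitext(path)
        if ext.lower() in extensions:
            kept.append(path)
    return kept

# Example with made-up file names
example_paths = ["photo.JPG", "notes.txt", ".DS_Store", "clip.mp4"]
filtered_paths = filter_media_files(example_paths)
print(filtered_paths)
```

In the notebook, this could be applied as `LOCAL_MEDIA_FILES = filter_media_files(LOCAL_MEDIA_FILES)`.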

# 2. Connect to Arvest
Next, we need to "connect" to Arvest using the Arvest API package. For this, we need the email address and password of our Arvest account. We've saved ours to a JSON file (`login_private.json`, with `email` and `password` keys) to keep them out of the notebook.

In [None]:
LOGIN_DATA = os.path.join(os.getcwd(), "login_private.json")
credentials = read_json(LOGIN_DATA)

ar = arvestapi.Arvest(credentials["email"], credentials["password"])
print(f"👍 Successfully connected to Arvest with \"{ar.profile.name}\"")
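
If the credentials file is missing a key, the cell above fails with a bare `KeyError`. The sketch below shows one way to check the file first, using only the standard library. The `email` and `password` keys match what the cell above reads; the `load_credentials` helper is a hypothetical name, and the values written to the temporary file are placeholders.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("email", "password")

def load_credentials(path):
    """Read a JSON credentials file and check that the expected keys exist."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Missing keys in {path}: {', '.join(missing)}")
    return data

# Demonstration with a throwaway file and placeholder values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "login_private.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"email": "you@example.com", "password": "not-a-real-password"}, f)
    demo_credentials = load_credentials(demo_path)

print(sorted(demo_credentials))
```

Whichever way the file is read, it is worth excluding `login_private.json` from version control so that the password is never committed.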

Now we can add the media to Arvest using the `add_media()` function. We pass it one keyword argument, `path`, which is the path to the file we'd like to upload. This can be a local path or a URL; the API package will take care of things for us. Each time, we'll keep the returned media object and also set its **title** and **description**.

In [None]:
uploaded_media = []
count = 0
print("Uploading files...")

for i, media_file_path in enumerate(ONLINE_MEDIA_FILES):
    print_progress_bar(i, len(ONLINE_MEDIA_FILES) - 1, f"(Online file {i + 1}/{len(ONLINE_MEDIA_FILES)})")

    added_media = ar.add_media(path = media_file_path)
    added_media.update_title(f"Batch upload file {i + 1} (online)")
    added_media.update_description("Uploaded to demonstrate batch media uploading from a Python notebook.")
    uploaded_media.append(added_media)
    count = count + 1

for i, media_file_path in enumerate(LOCAL_MEDIA_FILES):
    print_progress_bar(i, len(LOCAL_MEDIA_FILES) - 1, f"(Local file {i + 1}/{len(LOCAL_MEDIA_FILES)})")

    added_media = ar.add_media(path = media_file_path)
    added_media.update_title(f"Batch upload file {i + 1} (local)")
    added_media.update_description("Uploaded to demonstrate batch media uploading from a Python notebook.")
    uploaded_media.append(added_media)
    count = count + 1

print(f"👏 Added {count} media files to Arvest!")
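
In the loops above, a single failing file would stop the whole batch. As a hedged sketch, we could wrap each upload so that failures are collected instead. This assumes that a failed upload raises an exception, which we have not verified for `add_media()`. The `upload_all` helper is a hypothetical name, and `fake_upload` is a stand-in so that the example runs without an Arvest account.

```python
def upload_all(paths, upload_fn):
    """Call `upload_fn` on every path and return (successes, failures)."""
    successes, failures = [], []
    for path in paths:
        try:
            successes.append(upload_fn(path))
        except Exception as error:  # broad on purpose: we don't know what the uploader raises
            failures.append((path, error))
    return successes, failures

# Stand-in uploader for demonstration only
def fake_upload(path):
    if not path.endswith(".jpg"):
        raise ValueError(f"Unsupported file: {path}")
    return f"uploaded:{path}"

done, failed = upload_all(["a.jpg", "b.txt", "c.jpg"], fake_upload)
print(f"{len(done)} uploaded, {len(failed)} failed")
for path, error in failed:
    print(f"  {path}: {error}")
```

In the notebook, the uploader could be `lambda p: ar.add_media(path=p)`, with the title and description updates applied afterwards to the items in `done`.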

# 3. Update metadata
Finally, we can update our media's metadata. Among other things, this makes it easier to search through our media and find the files we need when scripting. The metadata is available as a Python `dict` via the `get_metadata()` function. We can modify this dict and pass it to the `update_metadata()` function to save the changes.

In [None]:
print("Updating metadata...")

for i, added_media in enumerate(uploaded_media):
    print_progress_bar(i, len(uploaded_media) - 1, f"(File {i + 1}/{len(uploaded_media)})")
    media_metadata = added_media.get_metadata()
    media_metadata["creator"] = "Batch media upload example script"
    media_metadata["identifier"] = "&&BATCH_UPLOAD"
    added_media.update_metadata(media_metadata)

print("👍 Metadata updated!")
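
Since the metadata is handled as a plain `dict`, the tags we apply can be kept in one place and merged in. The sketch below is a minimal illustration using only built-in dict operations. The `merge_metadata` helper and the `BATCH_TAGS` constant are hypothetical names, and the sample dict stands in for whatever `get_metadata()` actually returns.

```python
def merge_metadata(existing, updates):
    """Return a copy of `existing` with `updates` applied; the inputs are not modified."""
    merged = dict(existing)
    merged.update(updates)
    return merged

# The same values as in the cell above, gathered in one place
BATCH_TAGS = {
    "creator": "Batch media upload example script",
    "identifier": "&&BATCH_UPLOAD",
}

# Made-up sample standing in for a real metadata dict
sample_metadata = {"creator": "", "language": "en"}
merged_metadata = merge_metadata(sample_metadata, BATCH_TAGS)
print(merged_metadata)
```

Keeping the tags in a single constant also means the removal step can reuse exactly the same values when looking for these files.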

# 4. Batch remove media
If we need to remove media files, we can do so by looping through all of our media and checking certain conditions. For example, we can get all of our media files using the `get_medias()` function, then check each file's metadata. If it's one of the files we want to remove, we can then use the `remove()` function.

**Warning: there's no going back after using the remove function, so be careful!**

In [None]:
all_media = ar.get_medias()
count = 0
print("Removing files...")

for i, media_file in enumerate(all_media):
    print_progress_bar(i, len(all_media) - 1, f"(Processing file {i + 1}/{len(all_media)})")
    media_metadata = media_file.get_metadata()
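    # Use .get() so that media without these metadata keys are skipped instead of raising a KeyError
    if media_metadata.get("creator") == "Batch media upload example script" and media_metadata.get("identifier") == "&&BATCH_UPLOAD":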
        media_file.remove()
        count = count + 1

print(f"🗑️ Removed {count} media files!")
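
Because removal cannot be undone, it can help to do a dry run first: apply the same condition, but only count the matches. The sketch below works on plain dicts so that it runs without an Arvest account. The `is_batch_upload` helper is a hypothetical name, and the sample list stands in for the metadata of real media files.

```python
def is_batch_upload(metadata, expected):
    """True if every expected key is present in `metadata` with the expected value."""
    return all(metadata.get(key) == value for key, value in expected.items())

EXPECTED = {
    "creator": "Batch media upload example script",
    "identifier": "&&BATCH_UPLOAD",
}

# Made-up metadata dicts standing in for media_file.get_metadata()
sample_metadata_list = [
    {"creator": "Batch media upload example script", "identifier": "&&BATCH_UPLOAD"},
    {"creator": "Someone else"},
    {},
]

# Indices of the items that would be removed
to_remove = [i for i, metadata in enumerate(sample_metadata_list) if is_batch_upload(metadata, EXPECTED)]
print(f"Would remove {len(to_remove)} of {len(sample_metadata_list)} files")
```

Once the count looks right, the same check can guard the call to `remove()` in the loop above.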