![Arvest batch media upload](https://raw.githubusercontent.com/arvest-data-in-context/ml-notebooks/refs/heads/main/docs/images/notebooks/arvest-batch-media-upload.png)

In this notebook, we shall learn how to batch upload media files from your machine to [Arvest](https://arvest.app) using the [Arvest API](https://github.com/arvest-data-in-context/arvest-api).

# 0. Setup

Let's begin by installing and importing all of the different components we will need.

In [None]:
print("Installing and importing packages...")

# Uninstall and reinstall packages for a clean environment
!pip uninstall -q -y arvestapi
!pip uninstall -q -y jhutils
!pip install -q --disable-pip-version-check git+https://github.com/arvest-data-in-context/arvest-api.git
!pip install -q --disable-pip-version-check git+https://github.com/jdchart/jh-py-utils.git

# Import packages
import arvestapi
from jhutils.local_files import collect_files
from jhutils.misc import print_progress_bar_colab
import os

print("👍 Ready!")

# 1. Prepare your media

In the local version of this notebook, you can directly look for files on your computer. As we're operating in Google Colab here, you'll first have to upload your files to the temporary storage section. To do this, open the `Files` tab on the left, and drop the files into the workspace.

Next, you can get the paths to your media files. We shall define the absolute path to a folder in the Colab workspace with the `FOLDER_TO_UPLOAD` variable, then use our utility function `collect_files()`, which finds all of the files of a certain type within a given folder. We shall be uploading image files, so we give the `ACCEPTED_FILES` variable a list of image extensions.

In [None]:
# Absolute path to the folder to scan (here, the current working directory):
FOLDER_TO_UPLOAD = os.getcwd()
ACCEPTED_FILES = ["jpg", "jpeg", "png"]

local_media_files = collect_files(FOLDER_TO_UPLOAD, ACCEPTED_FILES)
print(f"🔍 Found {len(local_media_files)} image files in {FOLDER_TO_UPLOAD}.")
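
For readers curious about what a helper like `collect_files()` does, here is a minimal standard-library sketch. It is not the `jhutils` implementation: the name `find_files_by_extension`, the recursive search, and the case-insensitive matching are all assumptions made for illustration.

```python
import os

def find_files_by_extension(folder, extensions):
    """Hypothetical stand-in for collect_files(): recursively list files whose
    extension (case-insensitive, without the dot) is in `extensions`."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # splitext returns e.g. ".JPG"; normalise it to "jpg" before comparing
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in wanted:
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Calling `find_files_by_extension(FOLDER_TO_UPLOAD, ACCEPTED_FILES)` would return a sorted list of matching paths under these assumptions.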

# 2. Connect to Arvest
Next, we need to "connect" to Arvest using the Arvest API package. For this, we need our user email and our password which we will give to an instance of the `arvestapi.Arvest()` class.

In [None]:
EMAIL = "my_email@something.com"
PASSWORD = "myarvestpassword"

ar = arvestapi.Arvest(EMAIL, PASSWORD)
print(f"👍 Successfully connected to Arvest with \"{ar.profile.name}\"")
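
If you plan to share this notebook, you may prefer not to leave your password in a cell. The following is a minimal sketch of one way to do that with the standard library. The function name `read_credentials` and the environment variable names `ARVEST_EMAIL` and `ARVEST_PASSWORD` are hypothetical and are not part of the Arvest API.

```python
import os
from getpass import getpass

def read_credentials(env=os.environ, prompt=getpass):
    """Return (email, password) from the hypothetical ARVEST_EMAIL and
    ARVEST_PASSWORD environment variables, prompting for the password
    only when it is not set."""
    email = env.get("ARVEST_EMAIL", "")
    if not email:
        raise ValueError("Set ARVEST_EMAIL before connecting.")
    # Fall back to a hidden interactive prompt when no password is stored
    password = env.get("ARVEST_PASSWORD") or prompt("Arvest password: ")
    return email, password
```

You could then write `EMAIL, PASSWORD = read_credentials()` and pass both values to `arvestapi.Arvest()` exactly as in the cell above.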

Now we can add the media to Arvest using the `add_media()` function. It takes one keyword argument, `path`, which is the path to the file we'd like to upload.

We'll first upload the file and put the returned object into a variable called `added_media`. This will then allow us to update the **title** and the **description** in Arvest of the media item.

In [None]:
uploaded_media = []
count = 0
print("Uploading files...")

for i, media_file_path in enumerate(local_media_files):
    print_progress_bar_colab(i, len(local_media_files) - 1, f"(Local file {i + 1}/{len(local_media_files)})")

    # Add media using the add_media() function:
    added_media = ar.add_media(path = media_file_path)

    # Update the title and description (change this to whatever you want):
    added_media.update_title(f"{os.path.basename(media_file_path)} (batch upload file {i + 1}).")
    added_media.update_description("Uploaded to demonstrate batch media uploading from a Python notebook.")
    
    # We add the media to a list so that we can retrieve them later:
    uploaded_media.append(added_media)
    count = count + 1

print(f"👏 Added {count} media files to Arvest!")
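
In the loop above, a single failed upload would stop the whole batch. As a hedged sketch, the helper below wraps each call so that failures are collected instead. The name `upload_all` and its `upload_fn` parameter are hypothetical, and because we do not know which exceptions the Arvest API raises, the sketch catches `Exception` broadly.

```python
def upload_all(paths, upload_fn):
    """Call upload_fn on each path and return (successes, failures).

    failures is a list of (path, exception) pairs so they can be retried."""
    successes, failures = [], []
    for path in paths:
        try:
            successes.append(upload_fn(path))
        except Exception as error:  # broad on purpose: the API's error types are unknown here
            failures.append((path, error))
    return successes, failures
```

In this notebook, it could be called as `upload_all(local_media_files, lambda p: ar.add_media(path=p))`, after which the failed paths can be inspected or retried.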

If you like, run the following cell, which uses the `get_full_url()` function to list the URLs of your media now stored online, or log in to your [workspace](https://workspace.arvest.app/) and see the new media items in your list.

In [None]:
print("Uploaded media:")
for media in uploaded_media:
    print(f"{media.get_full_url()}")

# 3. Update metadata
Finally we can update our media's metadata. Among other things, this will notably be useful for parsing our documents and making sure that we find the files we need when scripting.

We can work with our metadata as a Python `dict`, which we get using the `get_metadata()` function. We can then update this dict and use the `update_metadata()` function to push the changes to Arvest.

Check your [workspace](https://workspace.arvest.app/) again to examine how the metadata has been updated.

In [None]:
print("Updating metadata...")

for i, added_media in enumerate(uploaded_media):
    print_progress_bar_colab(i, len(uploaded_media) - 1, f"(File {i + 1}/{len(uploaded_media)})")

    # Get the metadata dict:
    media_metadata = added_media.get_metadata()

    # Update fields:
    media_metadata["creator"] = "Batch media upload example script"
    media_metadata["identifier"] = "&&BATCH_UPLOAD"

    # Update on Arvest:
    added_media.update_metadata(media_metadata)

print("👍 Metadata updated!")

# 4. Batch remove media
If we need to remove media files, we can do so by iterating over all of our media and checking certain conditions. For example, we can get all of our media files using the `get_medias()` function, then check each item's metadata. If it's one of the files we want to remove, we can then use the `remove()` function.

**⚠️ Warning: there's no going back after using the remove function, so be careful! To avoid accidental removal, we've added a `REMOVE` variable that needs to be set to `True` for the code to run.**

In [None]:
REMOVE = False

if REMOVE:
    count = 0
    print("Removing files...")

    # Get all of our media files:
    all_media = ar.get_medias()
    
    for i, media_file in enumerate(all_media):
        print_progress_bar_colab(i, len(all_media) - 1, f"(Processing file {i + 1}/{len(all_media)})")
        
        # Get the media item's metadata and check if it matches some conditions:
        media_metadata = media_file.get_metadata()
        # .get() avoids a KeyError for media items that lack these fields:
        if media_metadata.get("creator") == "Batch media upload example script" and media_metadata.get("identifier") == "&&BATCH_UPLOAD":
            
            # Remove the item:
            media_file.remove()
            count = count + 1

    print(f"🗑️ Removed {count} media files!")
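
Because removal cannot be undone, it can help to preview what would be deleted first. The sketch below isolates the matching condition in a small function that works on plain dicts. The names `BATCH_TAGS` and `is_batch_upload` are hypothetical helpers, not part of the Arvest API.

```python
# The two metadata fields set in section 3 of this notebook
BATCH_TAGS = {
    "creator": "Batch media upload example script",
    "identifier": "&&BATCH_UPLOAD",
}

def is_batch_upload(metadata, tags=BATCH_TAGS):
    """True when every tag key is present in metadata with the expected value."""
    return all(metadata.get(key) == value for key, value in tags.items())
```

For a dry run, you could build `to_remove = [m for m in ar.get_medias() if is_batch_upload(m.get_metadata())]` and print `m.get_full_url()` for each item before setting `REMOVE = True`.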