# Data Lake Example 2 - Object tagging
## Why Use Object Tagging?

Object tagging is useful for adding custom metadata to objects, which makes it easier to organize, manage, and search for objects within your storage buckets. In the film-making industry, for instance, you might use tags to indicate the project, scene, priority, or other attributes relevant to an asset.

## 0. Load libraries and common configuration

In [1]:
# Install necessary packages
!pip install boto3 certifi



In [2]:
import boto3
from botocore.client import Config
from botocore.exceptions import ClientError
import os
import ssl
import certifi
import sys
import warnings
warnings.filterwarnings('ignore')

# SSL verification errors can appear with the client if Python's certificates are not properly configured.
# The line below disables certificate verification for default HTTPS contexts; use it only in test environments.
ssl._create_default_https_context = ssl._create_unverified_context


# MinIO server connection information
minio_url = 'https://s3api.scene.local'  # Replace with your MinIO instance URL
access_key = 'testuser'       # Replace with your actual access key
secret_key = 'testscene'       # Replace with your actual secret key


# Initialize a session using boto3
session = boto3.session.Session()

# Create a client for the MinIO server
# "verify=False" skips SSL certificate verification; remove it once your certificates validate correctly
s3_client = session.client(
    's3',
    verify=False,
    endpoint_url=minio_url,    
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    config=Config(signature_version='s3v4'),
    region_name='us-east-1'  # You can choose any region name. Not applicable here
)
print("Libraries loaded successfully")

Libraries loaded successfully


## 1. Adding Tags to Objects
In this step, we’ll add different tags to athens.png (an image) and athens.webm (a video file) in testbucket. We’ll apply a unique set of tags for each object.

In [6]:
# Parameters
bucket_name = "testbucket"
object_key_image = "athens.png"  # Image file
object_key_video = "athens.webm"  # Video file

# Define tags for each object
tags_image = [
    {'Key': 'FileType', 'Value': 'Image'},
    {'Key': 'Location', 'Value': 'Athens'},
    {'Key': 'Priority', 'Value': 'Medium'}
]

tags_video = [
    {'Key': 'FileType', 'Value': 'Video'},
    {'Key': 'Location', 'Value': 'Athens'},
    {'Key': 'Priority', 'Value': 'High'}
]

def add_tags_to_object(bucket_name, object_key, tags, client=s3_client):
    # Build a readable "Key=Value&..." summary of the tags for the log message only;
    # the API call itself takes the list of Key/Value dicts as a TagSet
    tag_string = '&'.join([f"{tag['Key']}={tag['Value']}" for tag in tags])
    
    try:
        # put_object_tagging replaces any tags already set on the object
        client.put_object_tagging(
            Bucket=bucket_name,
            Key=object_key,
            Tagging={'TagSet': tags}
        )
        print(f"Tags applied to {object_key}: {tag_string}")
    except ClientError as e:
        print(f"Error tagging object {object_key}: {e}")


# Add tags to athens.png (image)
add_tags_to_object(bucket_name, object_key_image, tags_image)

# Add tags to athens.webm (video)
add_tags_to_object(bucket_name, object_key_video, tags_video)


Tags applied to athens.png: FileType=Image&Location=Athens&Priority=Medium
Tags applied to athens.webm: FileType=Video&Location=Athens&Priority=High
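The `Key=Value&...` string printed above is only a log message, but a properly URL-encoded version of it is the format boto3's `put_object` expects in its `Tagging` parameter when you tag an object at upload time. The following is a minimal sketch of that encoding. The helper name `tags_to_query_string` is hypothetical, and whether your MinIO deployment accepts tags on upload is an assumption you should check.

```python
from urllib.parse import urlencode


def tags_to_query_string(tags):
    """Encode a TagSet-style list of {'Key', 'Value'} dicts as a URL query string."""
    # urlencode escapes characters such as spaces and '&' that would
    # otherwise corrupt the Key=Value&Key=Value format
    return urlencode([(tag['Key'], tag['Value']) for tag in tags])


example_tags = [
    {'Key': 'FileType', 'Value': 'Image'},
    {'Key': 'Location', 'Value': 'Athens'},
]
print(tags_to_query_string(example_tags))  # FileType=Image&Location=Athens

# Possible usage at upload time (not executed here, requires a reachable server):
# s3_client.put_object(Bucket=bucket_name, Key='athens.png', Body=data,
#                      Tagging=tags_to_query_string(example_tags))
```

Tagging at upload saves the separate `put_object_tagging` request made by `add_tags_to_object`.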


## 2. Retrieving Tags for Each Object
Now that we’ve tagged the objects, let’s retrieve and display these tags to verify that they were correctly applied.

In [7]:
def get_tags_for_object(bucket_name, object_key):
    try:
        # Retrieve tags for the object
        response = s3_client.get_object_tagging(Bucket=bucket_name, Key=object_key)
        tags = response['TagSet']
        print(f"Tags for {object_key} in {bucket_name}: {tags}")
    except ClientError as e:
        print(f"Error retrieving tags for object {object_key}: {e}")

# Retrieve and display tags for athens.png
get_tags_for_object(bucket_name, object_key_image)

# Retrieve and display tags for athens.webm
get_tags_for_object(bucket_name, object_key_video)


Tags for athens.png in testbucket: [{'Key': 'FileType', 'Value': 'Image'}, {'Key': 'Location', 'Value': 'Athens'}, {'Key': 'Priority', 'Value': 'Medium'}]
Tags for athens.webm in testbucket: [{'Key': 'FileType', 'Value': 'Video'}, {'Key': 'Location', 'Value': 'Athens'}, {'Key': 'Priority', 'Value': 'High'}]
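Reading the printed lists by eye works for two objects, but the check can also be automated. Below is a small sketch that compares the tags you intended to apply with the `TagSet` returned by `get_object_tagging`. The helper names `tagset_to_dict` and `tags_match` are hypothetical, and the comparison deliberately ignores ordering because this sketch does not assume the server returns tags in the order they were submitted.

```python
def tagset_to_dict(tag_set):
    """Convert a TagSet list of {'Key', 'Value'} dicts into a plain dict."""
    return {tag['Key']: tag['Value'] for tag in tag_set}


def tags_match(expected, actual):
    """Return True if both TagSets contain the same key/value pairs, in any order."""
    return tagset_to_dict(expected) == tagset_to_dict(actual)


# Sample data shaped like the 'TagSet' field of a get_object_tagging response
expected = [
    {'Key': 'FileType', 'Value': 'Video'},
    {'Key': 'Priority', 'Value': 'High'},
]
returned = [
    {'Key': 'Priority', 'Value': 'High'},
    {'Key': 'FileType', 'Value': 'Video'},
]
print(tags_match(expected, returned))       # True
print(tags_match(expected, returned[:1]))   # False
```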


## 3. List Objects with a Specific Tag (Limitations in S3 API)
The standard S3 API has no operation for listing or querying objects by tag. To find the objects that carry a given tag, you typically need to:
* List all objects in the bucket.
* Retrieve each object’s tags individually and then filter based on the tags you’re interested in.

Below is a workaround approach to list objects in testbucket and filter them by a specific tag, e.g., "Priority": "High".

In [8]:
def list_objects_with_tag(bucket_name, tag_key, tag_value):
    try:
        # List all objects in the bucket. A single list_objects_v2 call returns
        # at most 1000 keys, so use a paginator to cover larger buckets.
        paginator = s3_client.get_paginator('list_objects_v2')
        object_keys = [
            obj['Key']
            for page in paginator.paginate(Bucket=bucket_name)
            for obj in page.get('Contents', [])
        ]

        if not object_keys:
            print(f"No objects found in bucket: {bucket_name}")
            return

        print(f"Listing objects in {bucket_name} with tag '{tag_key}: {tag_value}':")
        for object_key in object_keys:
            # Retrieve the tags for each object
            tag_response = s3_client.get_object_tagging(Bucket=bucket_name, Key=object_key)
            tags = {tag['Key']: tag['Value'] for tag in tag_response['TagSet']}

            # Check if the object has the specified tag key and value
            if tags.get(tag_key) == tag_value:
                print(f"- {object_key} matches tag '{tag_key}: {tag_value}'")
    except ClientError as e:
        print(f"Error listing objects with tag {tag_key}: {tag_value} in {bucket_name}: {e}")

# List objects in testbucket with the tag "Priority: High"
list_objects_with_tag(bucket_name, "Priority", "High")


Listing objects in testbucket with tag 'Priority: High':
- athens.webm matches tag 'Priority: High'
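The workaround above issues one tagging request per object on every search, which becomes slow if you run several searches against a large bucket. One possible mitigation, sketched below, is to fetch each object's tags once and build an in-memory index that can answer many lookups. The function name `build_tag_index` and the input shape (a dict mapping object key to its `TagSet`) are assumptions made for this illustration, and the index goes stale as soon as tags change on the server.

```python
from collections import defaultdict


def build_tag_index(tag_sets_by_key):
    """Map each (tag key, tag value) pair to the sorted list of object keys carrying it."""
    index = defaultdict(list)
    for object_key, tag_set in tag_sets_by_key.items():
        for tag in tag_set:
            index[(tag['Key'], tag['Value'])].append(object_key)
    return {pair: sorted(keys) for pair, keys in index.items()}


# Sample data: object key -> TagSet, as collected from get_object_tagging calls
sample = {
    'athens.png': [
        {'Key': 'Location', 'Value': 'Athens'},
        {'Key': 'Priority', 'Value': 'Medium'},
    ],
    'athens.webm': [
        {'Key': 'Location', 'Value': 'Athens'},
        {'Key': 'Priority', 'Value': 'High'},
    ],
}

index = build_tag_index(sample)
print(index.get(('Priority', 'High'), []))    # ['athens.webm']
print(index.get(('Location', 'Athens'), []))  # ['athens.png', 'athens.webm']
```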


## 4. Removing a Specific Tag from an Object
Below is the code for removing a specific tag from an object. We’ll demonstrate removing the Priority tag from the athens.webm object in testbucket.

In [7]:
def remove_tag_from_object(bucket_name, object_key, tag_key_to_remove):
    try:
        # Step 1: Retrieve the current tags
        response = s3_client.get_object_tagging(Bucket=bucket_name, Key=object_key)
        current_tags = response['TagSet']
        
        # Step 2: Filter out the tag we want to remove
        updated_tags = [tag for tag in current_tags if tag['Key'] != tag_key_to_remove]
        
        # Step 3: Reapply the updated tags to the object
        s3_client.put_object_tagging(
            Bucket=bucket_name,
            Key=object_key,
            Tagging={'TagSet': updated_tags}
        )
        
        print(f"Tag '{tag_key_to_remove}' removed from {object_key} in {bucket_name}.")
        print(f"Updated tags: {updated_tags}")
    except ClientError as e:
        print(f"Error removing tag '{tag_key_to_remove}' from object {object_key}: {e}")

# Remove the 'Priority' tag from athens.webm
remove_tag_from_object(bucket_name, object_key_video, 'Priority')


Tag 'Priority' removed from athens.webm in testbucket.
Updated tags: [{'Key': 'FileType', 'Value': 'Video'}, {'Key': 'Location', 'Value': 'Athens'}]
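The same read, modify, and write pattern applies when you want to add or change a single tag, because `put_object_tagging` replaces the object's whole tag set rather than merging into it. The sketch below covers only the "modify" step as a pure function. The name `upsert_tag` is hypothetical, and you would still need to wrap it with `get_object_tagging` and `put_object_tagging` calls as in `remove_tag_from_object`.

```python
def upsert_tag(tag_set, key, value):
    """Return a new TagSet with `key` set to `value`, leaving other tags unchanged."""
    # Copy every tag except the one being replaced, so the input list is not mutated
    updated = [dict(tag) for tag in tag_set if tag['Key'] != key]
    updated.append({'Key': key, 'Value': value})
    return updated


current_tags = [
    {'Key': 'FileType', 'Value': 'Video'},
    {'Key': 'Location', 'Value': 'Athens'},
]

# Add a tag that is not present yet
with_priority = upsert_tag(current_tags, 'Priority', 'Low')
print(with_priority)

# Change the value of an existing tag
print(upsert_tag(with_priority, 'Priority', 'High'))
```

The resulting list can be passed as `Tagging={'TagSet': ...}` in the same way as `updated_tags` in the cell above.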


## 5. Removing all Tags from an Object
Below is the code for removing all tags from athens.webm and athens.png.

In [9]:
# Function to delete all tags from an object
def delete_object_tags(bucket_name, object_key):
    try:
        s3_client.delete_object_tagging(Bucket=bucket_name, Key=object_key)
        print(f"Tags removed from {object_key} in bucket {bucket_name}")
    except ClientError as e:
        print(f"Error removing tags from object {object_key}: {e}")

# Define your bucket and object keys
bucket_name = 'testbucket'
object_keys = ['athens.png', 'athens.webm']

# Remove tags from each object
for object_key in object_keys:
    delete_object_tags(bucket_name, object_key)

Tags removed from athens.png in bucket testbucket
Tags removed from athens.webm in bucket testbucket
