# Image Operations

In this notebook, we will perform the following image operations and configure which results to include in the deposition of a record:
1) Resize Images
2) Extract EXIF data
3) Mask Persons using state-of-the-art (SOTA) models

Before we start, we will initialize our sandbox database and create a new record. We will leave out debugging here, since you are probably familiar with handling errors at this point:

In [None]:
from datetime import date
import json
import os
from pathlib import Path

# Switch to the project root when the notebook is run from the Tutorials directory
if Path().absolute().name == "Tutorials":
    os.chdir(Path().absolute().parent)

from db_tools import initialize_db
from main_functions import create_record, publish_record, upload_files_into_deposition
from person_masker import load_person_masker_models, mask_persons
from utilities import append_to_json, get_image_metadata, load_config, resize_image, write_json

# Load DB configuration and change path to DB
db_config = load_config("Tutorials/Configs/db_config.yaml")
db_connection = initialize_db(db_config)

# Initial Configuration
zenodo_config = load_config("Tutorials/Configs/zenodo.yaml")
USE_ENV_API_KEY = zenodo_config["main"]["use_env_api_key"]
USE_SANDBOX = zenodo_config["main"]["use_sandbox"]
ZENODO_BASE_URL = "https://sandbox.zenodo.org" if USE_SANDBOX else "https://zenodo.org"

if USE_ENV_API_KEY:
    ZENODO_API_KEY = os.environ.get("ZENODO_SANDBOX_API_KEY") if USE_SANDBOX else os.environ.get("ZENODO_API_KEY")
else:
    ZENODO_API_KEY = "your_sandbox_api_key_here" if USE_SANDBOX else "your_production_api_key_here"

HEADERS = {"Content-Type": "application/json"}
PARAMS = {"access_token": ZENODO_API_KEY}
print(f"Using {'Sandbox' if USE_SANDBOX else 'Production'} Zenodo Environment.")

test_metadata = {
    "metadata": {
        "title": "Test Dataset for Image Operations (Tutorial)",
        "description": "This is a test dataset for image operations.",
        "upload_type": "dataset",
        "creators": [{"name": "Doe, John", "affiliation": "Test University"}],
        "access_right": "open",
        "license": "cc-by-4.0",
        "version": "0.0.1",
        "publication_date": date.today().strftime("%Y-%m-%d")
    }
}

result_msg, result_data = create_record(test_metadata, db_connection)
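
The cell above reads the API key with `os.environ.get`, which silently returns `None` when the variable is missing. The following is a minimal sketch of a stricter lookup. The helper name `resolve_api_key` is hypothetical and not part of the project; only the two environment variable names are taken from the cell above.

```python
import os


def resolve_api_key(use_sandbox, environ=None):
    """Return the Zenodo API key for the chosen environment or raise if it is unset."""
    # Hypothetical helper for illustration; falls back to the real environment by default.
    environ = os.environ if environ is None else environ
    var_name = "ZENODO_SANDBOX_API_KEY" if use_sandbox else "ZENODO_API_KEY"
    api_key = environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {var_name} is not set or empty.")
    return api_key


# Example with an explicit mapping instead of the real environment
demo_env = {"ZENODO_SANDBOX_API_KEY": "dummy-token"}
print(resolve_api_key(use_sandbox=True, environ=demo_env))
```

Failing early like this makes a missing key visible before the first request is sent, instead of surfacing later as an authentication error.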

### Resize / Scale Images

To scale our images, we can either set a maximum dimension (width or height) in pixels or set a scaling ratio. Both options are set in the configuration, which also controls whether the scaled images are uploaded. Lanczos resampling is used by default:

In [None]:
# Load Configuration for Image Operations
config_images = load_config("Tutorials/Configs/image_operations.yaml")

# Modify Configuration
config_images["settings"]["image_resize"]["active"] = True # activate resizer
config_images["settings"]["image_resize"]["use_ratio"] = False # disable Ratio Mode
config_images["settings"]["image_resize"]["max_dimension_value"] = 400 # set maximum dimension value in pixel
config_images["settings"]["upload_resized_image"] = True
config_images["settings"]["upload_resized_image_only"] = True # decide if you want to upload the resized images only

# Set input images
filepaths = ["Tutorials/Images/test_image.png", "Tutorials/Images/test_image_2.png"]

# Resize Images and add Paths based on your configurations
new_filepaths = []
for filepath in filepaths:
    resized_filepath = resize_image(filepath, config_images)
    
    if config_images["settings"]["upload_resized_image"]:
        new_filepaths.append(resized_filepath)
    
    if not config_images["settings"]["upload_resized_image_only"]:
        new_filepaths.append(filepath)
        
print(f'Resized Images have been saved to: {config_images["paths"]["output"]["resized_images"]}')

# Perform File Uploads into Draft Deposition
fileupload_msg, fileupload_data = upload_files_into_deposition(result_data, new_filepaths, db_connection=db_connection)
if fileupload_msg["success"] and fileupload_data:
    print("\nFiles successfully uploaded!")
    for i in fileupload_data:
        print(f"\nDirect Link to {i['filename']}: {i['links']['download'].replace('/files', '/draft/files')}")
else:
    print("\nFailed to upload Files. Please check the error message above or in fileupload_msg['text']:")
    print(fileupload_msg["text"])
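
To make the two resize modes more concrete, here is a minimal sketch of how a target size could be derived from either a maximum dimension or a ratio while preserving the aspect ratio. The function `compute_target_size` is hypothetical. The internals of `resize_image` are not shown in this notebook, and the choice to never upscale smaller images is an assumption.

```python
def compute_target_size(width, height, max_dimension=None, ratio=None):
    """Return (new_width, new_height), preserving the aspect ratio."""
    if ratio is not None:
        # Ratio mode: scale both sides by the same factor
        scale = ratio
    elif max_dimension is not None:
        # Max-dimension mode: the longer side becomes max_dimension.
        # Assumption: images that are already smaller are left unchanged.
        scale = min(1.0, max_dimension / max(width, height))
    else:
        raise ValueError("Provide either max_dimension or ratio.")
    return max(1, round(width * scale)), max(1, round(height * scale))


print(compute_target_size(1600, 1200, max_dimension=400))  # (400, 300)
print(compute_target_size(1600, 1200, ratio=0.5))          # (800, 600)
```

If Pillow is used for the actual resampling (again an assumption), the computed size could be passed to `Image.resize` together with `Image.Resampling.LANCZOS`, which is available in Pillow 9.1 and newer.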

### Extract EXIF, Metadata and Paradata

Photos and images often contain valuable metadata and paradata, so extracting them is very useful. However, they can also contain personal data, in particular residential addresses, which we do not want to publish in the Zenodo record. In the following, we will try this out using an example image, print the extracted data and export it as a JSON file:

In [None]:
# initialize variables
project_title = config_images["project_title"]
image_path = "Tutorials/Images/test_image_exif.jpg"
json_path = f"Tutorials/Output/{project_title}/{Path(image_path).stem}.json"

# extract metadata, paradata and EXIF
image_metadata = get_image_metadata(image_path=image_path, remove_address=True, remove_mail=True)

# print and write result
print(json.dumps(image_metadata, indent=2, ensure_ascii=False))
write_json(image_metadata, json_path)
print(f"JSON written to: {json_path}")

Try changing the argument `remove_mail` to `False` and check the result.
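
To illustrate what such a cleanup could look like, here is a minimal sketch that drops address fields and redacts e-mail addresses in a nested dictionary. It is not the implementation of `get_image_metadata`. The function `scrub_metadata`, the key names in `ADDRESS_KEYS` and the example values are all hypothetical.

```python
import re

# Simple e-mail pattern; good enough for a sketch, not a full RFC 5322 matcher
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical key names that are treated as address fields
ADDRESS_KEYS = {"address", "street", "city", "postal_code"}


def scrub_metadata(metadata, remove_address=True, remove_mail=True):
    """Return a copy of a nested dict with address fields dropped and e-mails redacted."""
    cleaned = {}
    for key, value in metadata.items():
        if remove_address and key.lower() in ADDRESS_KEYS:
            continue  # drop the whole field
        if isinstance(value, dict):
            value = scrub_metadata(value, remove_address, remove_mail)
        elif isinstance(value, str) and remove_mail:
            value = EMAIL_PATTERN.sub("[removed]", value)
        cleaned[key] = value
    return cleaned


example = {
    "Artist": "Jane Doe <jane@example.org>",
    "iptc": {"City": "Berlin", "Caption": "Test"},
}
print(scrub_metadata(example))
print(scrub_metadata(example, remove_mail=False))
```

Comparing the two printed results shows the effect of the `remove_mail` flag in this sketch: the e-mail address is only redacted in the first call.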

### Mask Persons on Images

To mask persons in images, for example due to privacy concerns, we will use a detection model and a segmentation model, specifically YOLOv10 and Segment Anything Model 2 (SAM 2). You can also define your own models in the configuration.
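
Conceptually, the detector proposes bounding boxes for persons, the segmentation model turns each box into a pixel-level mask, and the masked pixels are then overwritten. The following toy sketch shows only the last step on a tiny image represented as nested lists. The function `apply_mask` is hypothetical, and how `mask_persons` handles this internally is not shown in this notebook.

```python
def apply_mask(pixels, mask, fill=(0, 0, 0)):
    """Replace every pixel where the mask is True with the fill color."""
    return [
        [fill if masked else pixel for pixel, masked in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(pixels, mask)
    ]


# 2x2 toy image: the segmentation mask marks the right column as "person"
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
person_mask = [
    [False, True],
    [False, True],
]
print(apply_mask(image, person_mask))
```

In practice, the same operation would typically run on array data rather than nested lists, but the principle of overwriting only the pixels inside the mask stays the same.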

In [None]:
config_images["person_masker"]["active"] = True # activate person segmentation
config_images["person_masker"]["bbox_device"] = "cpu" # set to cpu or cuda
config_images["person_masker"]["segmentation_device"] = "cpu" # set to cpu or cuda
config_images["person_masker"]["threshold"] = 0.1 # set threshold for person detection model

bbox_model, segmentation_model = load_person_masker_models(config_images)

In [None]:
# define filepaths to images that contain persons
filepaths = ["Tutorials/Images/persontest_image.jpg", "Tutorials/Images/persontest_image_2.jpg"]

# mask persons on images, provide the loaded models
new_filepaths = mask_persons(bbox_model=bbox_model, segmentation_model=segmentation_model,
                             config=config_images, filepaths=filepaths, process_directory=False)

# upload images; combine new_filepaths and filepaths via list operations to upload only the masked images or both
fileupload_msg, fileupload_data = upload_files_into_deposition(result_data, new_filepaths, db_connection=db_connection)
    

### Publish Record

In [None]:
# Publish Record
published_msg, published_data = publish_record(result_data, db_connection)
if published_msg["success"]:
    print("Record successfully published!")

    # Save the published record data locally
    append_to_json(published_data, "Tutorials/Output/sandbox_published.json")

    # Only read from published_data once publishing has succeeded
    print("Published Record Information:")
    print(f"Title: {published_data['metadata']['title']}")
    print(f"DOI: {published_data['doi']}")
    print(f"Record URL: {published_data['links']['record_html']}")
    print("\nFiles in the published record:")
    for file in published_data['files']:
        size_mb = int(file['filesize']) / (1024 * 1024)
        download_url = file['links']['download'].replace('/draft', '')
        print(f"- {file['filename']} (Size: {size_mb:.2f} MB): {download_url}")
else:
    print("Failed to publish the record:")
    print(published_msg)

### Adjust Parameters

You will notice that `persontest_image_2.jpg` was masked perfectly, while `persontest_image.jpg` was not.
<br>To see what went wrong, consult the detector results in the configured directory and the printed logs.
<br>Adjust parameters such as the threshold, or choose a more capable model, depending on your needs and computing resources.
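
To get a feeling for the threshold, the following sketch filters a list of detections by confidence. The detection format and the function `filter_detections` are hypothetical and only illustrate the general idea: a lower threshold keeps more candidate boxes, which may include false positives, while a higher one may miss persons.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical detector output: bounding boxes as (x1, y1, x2, y2) with a confidence score
detections = [
    {"bbox": (10, 20, 110, 220), "confidence": 0.92},
    {"bbox": (200, 40, 260, 180), "confidence": 0.31},
    {"bbox": (300, 50, 330, 120), "confidence": 0.08},
]

for threshold in (0.1, 0.5):
    kept = filter_detections(detections, threshold)
    print(f"threshold={threshold}: {len(kept)} detection(s) kept")
```

With these example values, the threshold of 0.1 used earlier keeps two boxes, while 0.5 keeps only the most confident one.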

<small>

Note: Don't worry, the test images you just uploaded are in the public domain (CC0).

</small>