# Custom Configurations for Individual Projects

## Introduction

This notebook explores the use of YAML configuration files to streamline and customize Zenodo upload workflows and associated data management processes. By leveraging YAML configurations, we can create flexible, project-specific settings for various operations including:

- Zenodo upload parameters
- Local & Remote Database operations
- Data mappings
- Excel column handling
- File processing routines (Image & 3D Model Operations)

The goal is to develop adaptable workflows that can be easily modified for different projects or datasets without altering the core code.

## Objectives

In this notebook, we will:

1. Introduce the structure and syntax of YAML configuration files
2. Demonstrate how to load and parse YAML configurations in Python
3. Apply custom configurations to Zenodo upload processes
4. Explore examples of configuring database operations and data mappings
5. Show how to use configurations for Excel column handling and file processing


# Working with Configuration Files for Zenodo Uploads

This notebook demonstrates how to use YAML configuration files to customize Zenodo upload workflows and associated data management processes.

## Loading Configuration Files

First, let's import the necessary libraries and load our configuration files.


In [None]:
import os
from pathlib import Path

# Run from the repository root when the notebook is started inside "Tutorials"
if Path().absolute().name == "Tutorials":
    os.chdir(Path().absolute().parent)
from db_tools import get_row, initialize_db, print_table, upsert_data
from utilities import load_config, printJSON

# Load example configuration files
zenodo_config = load_config('Tutorials/Configs/zenodo.yaml')
db_config = load_config('Tutorials/Configs/db_config.yaml')
excel_config = load_config('Tutorials/Configs/excel_operations.yaml')

# All three files must have loaded before we continue
if zenodo_config and db_config and excel_config:
    print("All configurations loaded.")


## Understanding Configuration Structure

Each of these example YAML files contains specific settings for different aspects of our workflow:

1. `zenodo.yaml`: Contains Zenodo API settings and rate limits.
2. `db_config.yaml`: Defines database configurations and table structures.
3. `excel_operations.yaml`: Specifies Excel processing parameters and column mappings.

Let's examine some key sections:

In [None]:
# Zenodo API settings
print("(Selection) Zenodo API settings:")
printJSON(zenodo_config['main'])

# Database connection settings
print("\n(Selection) Database connection settings:")
printJSON({k: v for k, v in db_config.items() if k not in ['db_structures']})

# Excel column mappings
print("\n(Selection) Excel column mappings:")
printJSON(excel_config['column_mapping'])
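
The cells above index directly into sections such as `main` or `column_mapping`, so a missing section only surfaces as a `KeyError`. The following is a minimal sketch of an up-front check, assuming the loaded configurations are plain nested dictionaries. The helper `missing_keys` and the sample dictionary with its values are hypothetical and not part of `utilities`.

```python
def missing_keys(config, required):
    """Return the required top-level keys that are absent from config."""
    return [key for key in required if key not in config]


# Hypothetical stand-in for a loaded configuration (values are illustrative)
sample_zenodo_config = {
    "main": {"use_env_api_key": True},
    "rates": {"per_minute": 10, "per_hour": 600},
}

# An empty list means every required section is present
print(missing_keys(sample_zenodo_config, ["main", "rates"]))
print(missing_keys(sample_zenodo_config, ["main", "rates", "column_mapping"]))
```

Running such a check right after loading makes a misspelled or missing section visible before any later cell depends on it.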


## Modifying Configurations (Temporary)

You can modify these configurations permanently to suit your project needs, or temporarily for testing purposes.
<br>For example, let's change the configuration so that it does not read the API key from environment variables and instead uses a custom access token, e.g. one that is read-only. We will also set the rate limits to more relaxed ones:


In [None]:
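# Use the token below instead of one read from environment variables
zenodo_config["main"]["use_env_api_key"] = False
zenodo_config["main"]["zenodo_sandbox_api_key"] = "123456789"  # placeholder, not a real token

# Keep the original hour-to-minute ratio when changing the rate limits
rates_ratio = int(zenodo_config["rates"]["per_hour"] / zenodo_config["rates"]["per_minute"])
zenodo_config["rates"]["per_minute"] = 25
zenodo_config["rates"]["per_hour"] = zenodo_config["rates"]["per_minute"] * rates_ratio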
printJSON(zenodo_config)
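
Temporary changes like these only affect the dictionary in memory, not the YAML file. If the unmodified values are still needed later in the same session, one option is to edit a deep copy instead. This is a minimal sketch, assuming the configuration is a plain nested dictionary. The sample values are illustrative and not taken from `zenodo.yaml`.

```python
import copy

# Hypothetical stand-in for the loaded configuration (values are illustrative)
loaded_config = {
    "main": {"use_env_api_key": True},
    "rates": {"per_minute": 10, "per_hour": 600},
}

# deepcopy also duplicates the nested dictionaries, so edits stay in the copy
test_config = copy.deepcopy(loaded_config)

# Preserve the hour-to-minute ratio of the original limits
rates_ratio = loaded_config["rates"]["per_hour"] // loaded_config["rates"]["per_minute"]
test_config["main"]["use_env_api_key"] = False
test_config["rates"]["per_minute"] = 25
test_config["rates"]["per_hour"] = test_config["rates"]["per_minute"] * rates_ratio

print(loaded_config["rates"])  # original limits, unchanged
print(test_config["rates"])    # modified copy
```

A shallow copy (`dict(loaded_config)`) would not be enough here, because the nested `rates` dictionary would still be shared between both objects.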

## Modifying Configuration Files

Modifying configurations in every script with the method above can become repetitive and tedious. The main advantages of YAML files are their human readability and the fact that they can easily be shared across scripts.
<br>Let's do the following:
1) Modify the values by opening `zenodo.yaml` with your favorite text or code editor.
2) Load the configuration again.
3) Examine the changes.


In [None]:
# assume that you have changed the values
zenodo_config = load_config("Tutorials/Configs/zenodo.yaml")
printJSON(zenodo_config)
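
To examine the changes after reloading, it can help to compare the old and the new dictionary instead of reading both printouts side by side. The helper `diff_configs` below is a hypothetical sketch, not part of `utilities`, and assumes nested dictionaries with illustrative values.

```python
def diff_configs(old, new, prefix=""):
    """Return {dotted.key: (old_value, new_value)} for every value that differs.

    A key that is missing on one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new), key=str):
        path = f"{prefix}{key}"
        old_value, new_value = old.get(key), new.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Recurse into nested sections such as "main" or "rates"
            changes.update(diff_configs(old_value, new_value, prefix=f"{path}."))
        elif old_value != new_value:
            changes[path] = (old_value, new_value)
    return changes


# Illustrative values only: "before" stands for the config prior to editing the file
before = {"main": {"use_env_api_key": True}, "rates": {"per_minute": 10, "per_hour": 600}}
after = {"main": {"use_env_api_key": False}, "rates": {"per_minute": 25, "per_hour": 600}}

for path, (old_value, new_value) in diff_configs(before, after).items():
    print(f"{path}: {old_value} -> {new_value}")
```

In the notebook, `before` would be a copy of the configuration taken before editing the file, and `after` the result of the second `load_config` call.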

## Set Custom Database Parameters

For better reproducibility, we will continue with temporarily set configurations in this Notebook.
<br>Now that the basic operations are known, we can explore more advanced configurations, starting with examining and modifying the table structure for 'states'. We will remove all tables except 'records' and 'states'. The 'records' table has to stay because foreign key constraints are defined by default.

In [None]:
# Load Configuration from File and print current Path
db_config = load_config('Tutorials/Configs/db_config.yaml')
print(f"Local Database Path: {db_config['local_db_path']}")

# Set New Path
db_config["local_db_path"] = "Tutorials/Output/new_states.sqlite"
print(f"New Database Path: {db_config['local_db_path']}\n")

# Print current Database Structure for 'states'
print("Old Table Structure for 'states':\n")
printJSON(db_config['db_structures']['tables'][3])

# Set New Table Structures
db_config["db_structures"]["tables"] = []

# records table must be created for this example due to foreign key constraints
db_config["db_structures"]["tables"].append({
    "records": {
        "id": "INTEGER PRIMARY KEY AUTOINCREMENT",
        "concept_recid": "TEXT NOT NULL UNIQUE",
        "recid": "TEXT NOT NULL",
    }
})

db_config["db_structures"]["tables"].append({
    "states": {
        "id": "INTEGER PRIMARY KEY AUTOINCREMENT",
        "concept_recid": "TEXT NOT NULL",
        "recid": "TEXT NOT NULL",
        "edit_mode_enabled": "BOOLEAN", # change #1
        "draft_mode_enabled": "BOOLEAN", # change #2
        "access_right": "TEXT NOT NULL", # change #3
        "updated_at": "DATETIME DEFAULT CURRENT_TIMESTAMP"
    },
    "indexes": [
        "recid"
    ]
})

# View Changes
print("\nNew Table Structure for 'states':\n")
# 'records' is at index 0, so the new 'states' table is at index 1
printJSON(db_config['db_structures']['tables'][1])
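
To make the link between such a table definition and the resulting database more concrete, here is a minimal sketch of how a `{column: definition}` mapping could be turned into a SQLite table. It illustrates the idea under stated assumptions and is not the implementation of `initialize_db`. The helper `create_table_sql` and the index name `idx_states_recid` are hypothetical.

```python
import sqlite3


def create_table_sql(table_name, columns):
    """Build a CREATE TABLE statement from a {column: definition} mapping.

    Names are interpolated into the SQL text, so they must come from a
    trusted configuration file, never from user input.
    """
    column_sql = ", ".join(f'"{name}" {definition}' for name, definition in columns.items())
    return f'CREATE TABLE IF NOT EXISTS "{table_name}" ({column_sql})'


states_columns = {
    "id": "INTEGER PRIMARY KEY AUTOINCREMENT",
    "concept_recid": "TEXT NOT NULL",
    "recid": "TEXT NOT NULL",
    "edit_mode_enabled": "BOOLEAN",
    "draft_mode_enabled": "BOOLEAN",
    "access_right": "TEXT NOT NULL",
    "updated_at": "DATETIME DEFAULT CURRENT_TIMESTAMP",
}

# An in-memory database keeps the sketch free of side effects on disk
connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("states", states_columns))
connection.execute('CREATE INDEX IF NOT EXISTS "idx_states_recid" ON "states" ("recid")')

# PRAGMA table_info returns one row per column; index 1 holds the column name
created_columns = [row[1] for row in connection.execute("PRAGMA table_info(states)")]
print(created_columns)
connection.close()
```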

## Initialize, Connect and Modify Database Structure

Let's initialize the database with the modified configuration, connect to it, then write and print a test entry to check that our modifications have been applied:

In [None]:
db_connection = initialize_db(custom_cfg=db_config)

new_data = {
    "concept_recid": "123456",
    "recid": "123457",
    "edit_mode_enabled": False,
    "draft_mode_enabled": True,
    "access_right": "restricted"
}

if db_connection:
    print(f"Database created: {db_config['local_db_path']}")
    # Default to True so the tables are still printed if the row already exists
    success_0 = success_1 = True
    if not get_row(db_connection, "records", "concept_recid", "123456"):
        success_0 = upsert_data(
            connection=db_connection,
            table_name="records",
            data={"concept_recid": "123456", "recid": "123456"},
            primary_key="concept_recid",
            primary_key_value="123456"
        )

        success_1 = upsert_data(
            connection=db_connection,
            table_name="states",
            data=new_data,
            primary_key="concept_recid",
            primary_key_value="123456"
        )

        if success_0 and success_1:
            print("\nUpsert operation successful")

    if success_0 and success_1:
        print("\nrecords:")
        print_table(db_connection, "records", "123456")
        print("\nstates:")
        print_table(db_connection, "states", "123456")
    else:
        print("Upsert operation failed.")

    # Close only if a connection was actually opened
    db_connection.close()
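
For readers unfamiliar with the term, an "upsert" inserts a row if its key does not exist yet and updates it otherwise. The following sketch shows one possible way to do this with SQLite's `INSERT ... ON CONFLICT` clause (SQLite 3.24 or newer). It is not the implementation of `upsert_data`. The helper `upsert_row` is hypothetical, and it assumes that the key column carries a UNIQUE constraint, as `concept_recid` does in the 'records' table above.

```python
import sqlite3


def upsert_row(connection, table_name, data, key_column):
    """Insert data, or update the existing row that has the same key_column value.

    Assumes key_column is UNIQUE and data holds at least one other column.
    Table and column names must come from trusted configuration.
    """
    columns = list(data)
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    updates = ", ".join(f'"{c}" = excluded."{c}"' for c in columns if c != key_column)
    sql = (
        f'INSERT INTO "{table_name}" ({column_list}) VALUES ({placeholders}) '
        f'ON CONFLICT("{key_column}") DO UPDATE SET {updates}'
    )
    with connection:  # commits on success, rolls back on error
        connection.execute(sql, [data[c] for c in columns])


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "concept_recid TEXT NOT NULL UNIQUE, "
    "recid TEXT NOT NULL)"
)

# The first call inserts the row, the second call updates its recid
upsert_row(connection, "records", {"concept_recid": "123456", "recid": "123456"}, "concept_recid")
upsert_row(connection, "records", {"concept_recid": "123456", "recid": "123457"}, "concept_recid")

rows = connection.execute("SELECT concept_recid, recid FROM records").fetchall()
print(rows)
connection.close()
```

Values are passed as `?` parameters so that SQLite handles quoting. Only the table and column names are interpolated into the statement.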

## What's Next

We will continue to use configurations in order to understand and apply mappings in upcoming notebooks!
<br>This is especially useful for Excel file processing or mappings to various data models, like the Europeana Data Model (EDM).

<br>
By using YAML configuration files, you can easily customize your Zenodo upload workflows and associated processes without changing your core code. This approach provides flexibility and makes it simpler to manage different project settings.

<br>Remember to update your YAML files when you need to change settings, and always load the latest configurations in your scripts to ensure you're using the most up-to-date parameters.
