# Updating Zenodo Records and Communities

This notebook demonstrates advanced techniques for managing and updating Zenodo records and communities. We'll cover several key aspects of working with the Zenodo API and our local database:

### 1. Creating a New Version and Updating Record Metadata

- Modifying existing record information
- Handling HTML content in descriptions
- Incorporating file tables and changelogs

### 2. File Management

- Updating files in existing records
- Dealing with filename conflicts in record deposits
- Strategies for efficient file updates

### 3. Community Integration

- Adding existing records to Zenodo communities
- Updating community metadata
- Managing record visibility within communities

### 4. Local Database Synchronization

- Reflecting Zenodo updates in our local SQLite database
- Ensuring consistency between Zenodo and local data
- Optimizing database operations for performance

### 5. HTML Content Handling

- Safely incorporating HTML in record descriptions
- Creating and updating file tables dynamically
- Maintaining changelogs with HTML formatting

### 6. Error Handling and Edge Cases

- Addressing API rate limits and timeouts
- Handling partial updates and rollbacks
- Ensuring data integrity across operations

Throughout this notebook, we'll use practical examples to illustrate these concepts. By the end, you'll have a comprehensive understanding of how to effectively manage and update your Zenodo records and communities while maintaining a synchronized local database.

Let's begin by setting up our environment and initializing our connections to both Zenodo and our local database.


In [None]:
import copy
from datetime import date
import os
from pathlib import Path

# Switch to the repository root when the notebook is run from the "Tutorials" folder
if Path().absolute().name == "Tutorials":
    os.chdir(Path().absolute().parent)
from db_tools import clear_operations_by_status, initialize_db, print_table
from main_functions import create_new_version, discard_draft, publish_record, update_metadata, upload_files_into_deposition
from utilities import increment_version, load_config, load_json, printJSON, validate_zenodo_metadata


# Initialize Database Connection to track and update Operations
db_config = load_config("Configs/db_config.yaml")
db_path = "Tutorials/sandbox.db"
db_config["local_db_path"] = db_path
db_connection = initialize_db(db_config)

if db_connection:
    print(f"Database connection initialized successfully at {db_path}.")
else:
    print("Failed to initialize database.")
    

# Load recently published record response
data = load_json("Tutorials/Output/sandbox_published.json")[-1]
if data:
    print(f"Successfully loaded Zenodo response data with ConceptRecordID {data['conceptrecid']} and RecordID {data['id']} (Version = {data['metadata']['version']})")

## Create new Record Version

We use the links provided in the response data to create a new version and track this operation in the database. Passing the response data is enough: the function handles the rest, e.g. finding the correct link or discarding pending drafts:

In [None]:
new_version_msg, new_version_data = create_new_version(data, db_connection=db_connection)
if new_version_msg["success"]:
    print(f"Successfully created a new Version with RecordID {new_version_data['id']}: {new_version_data['links']['html']}\n")
    printJSON(new_version_data)
    
    print_table(db_connection, "operations", new_version_data["conceptrecid"])
else:
    print(f"Something went wrong, check the error messages: {new_version_msg}")

As you can see, the state is 'unsubmitted' and you have received a new Record ID, but the Concept Record ID remained the same.
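
That expectation can also be checked programmatically before moving on. The following is a minimal sketch, not part of the toolkit: `check_new_version` is a hypothetical helper, the key `state` is assumed from the response printed above, and the example values are made up.

```python
def check_new_version(published: dict, draft: dict) -> list[str]:
    """Return a list of problems; an empty list means the draft looks as expected."""
    problems = []
    if draft.get("conceptrecid") != published.get("conceptrecid"):
        problems.append("conceptrecid changed")
    if draft.get("id") == published.get("id"):
        problems.append("record id did not change")
    if draft.get("state") != "unsubmitted":
        problems.append(f"unexpected state: {draft.get('state')!r}")
    return problems


# Made-up example values, not real Zenodo identifiers
published = {"conceptrecid": "1000", "id": 1001, "state": "done"}
draft = {"conceptrecid": "1000", "id": 1002, "state": "unsubmitted"}
print(check_new_version(published, draft))  # []
```

Called with the real `data` and `new_version_data` from the cells above, an empty list would suggest that the draft matches the description.
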
#### (optional) Discard Version Draft
The function `create_new_version()` discards existing drafts unless the flag `discard_existing_drafts` is set to `False`, because a new version cannot be created while a draft is pending. To discard this version manually, use the `discard_draft` function:

In [None]:
discard_link = new_version_data["links"]["discard"]
discard_msg, discard_data = discard_draft(discard_link, "", db_connection, new_version_data)
if discard_msg["success"]:
    print("Successfully discarded Draft!\n")
    print(f"This page should not be available anymore: {new_version_data['links']['html']}")
    print_table(db_connection, "operations", new_version_data["conceptrecid"])
else:
    print(f"Error while discarding draft: {discard_msg}")

## Update Metadata
After creating a new version again, we can update its metadata, starting from the previous metadata, and validate it before pushing:

In [None]:
# Create a new version again, after discarding it
new_version_msg, new_version_data = create_new_version(data, db_connection=db_connection) # data still represents the latest published record, so it is still valid for this operation
if new_version_msg["success"]:
    print(f"Successfully created a new Version with RecordID {new_version_data['id']}: {new_version_data['links']['html']}\n")
    print_table(db_connection, "operations", new_version_data["conceptrecid"])
else:
    print(f"Something went wrong, check the error messages: {new_version_msg}")


# Copy recent metadata and update it
new_metadata = {"metadata": copy.deepcopy(new_version_data["metadata"])}
new_metadata["metadata"]["version"] = "0.0.2"
new_metadata["metadata"]["description"] = "This is Version N of the Test Dataset."
new_metadata["metadata"]["publication_date"] = date.today().strftime("%Y-%m-%d")

print("Validating Metadata...")
validation_errors = validate_zenodo_metadata(new_metadata)

if validation_errors:
    print("\nValidation errors:")
    for error in validation_errors:
        print(f"- {error}")
else:
    print("\nNo validation errors found.")
    

# Push updated Metadata to the draft of the new record version
update_msg, update_data = update_metadata(new_version_data, new_metadata, db_connection=db_connection)
if update_msg["success"]:
    print(f"New Metadata pushed to pending Draft Version with RecordID {update_data['id']}: {update_data['links']['html']}\n")
    printJSON(update_data)
    print_table(db_connection, "operations", update_data["conceptrecid"])
else:
    print(f"Something went wrong, check the error messages: {update_msg}")

<small>
Note: If you created a draft and did not discard it, you will see the message 'Discard completed', which comes from the automatic draft discarding performed when a new version is created.
</small>

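To make the validation step above more concrete, here is a minimal sketch of the kind of checks such a validator might run. It is not the implementation of `validate_zenodo_metadata`: `basic_metadata_errors` and `REQUIRED_FIELDS` are hypothetical names, and the field list is an assumed subset of what the Zenodo deposit documentation lists as required.

```python
from datetime import date

# Assumed subset of required deposit metadata fields (illustration only)
REQUIRED_FIELDS = ("title", "upload_type", "description", "creators", "publication_date")


def basic_metadata_errors(payload: dict) -> list[str]:
    """Collect human-readable problems found in a {'metadata': {...}} payload."""
    metadata = payload.get("metadata", {})
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not metadata.get(name)]
    pub_date = metadata.get("publication_date")
    if pub_date:
        try:
            date.fromisoformat(pub_date)  # accepts YYYY-MM-DD
        except (TypeError, ValueError):
            errors.append(f"publication_date is not an ISO date: {pub_date}")
    return errors


example = {
    "metadata": {
        "title": "Test Dataset",
        "upload_type": "dataset",
        "description": "This is Version N of the Test Dataset.",
        "creators": [{"name": "Doe, Jane"}],
        "publication_date": "2024-13-01",  # deliberately invalid month
    }
}
print(basic_metadata_errors(example))
# ['publication_date is not an ISO date: 2024-13-01']
```

Returning a list of messages instead of raising on the first problem mirrors how the cell above prints every validation error in one pass.
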
### Add and Replace Files

Because a deposition cannot contain two files with the same filename, the function `upload_files_into_deposition` deletes already existing files automatically if the flag `replace_existing` is set to `True`. Providing a list of filepaths is sufficient.
<br>Let's try to add some 3D Model files and one already existing image file:

In [None]:
filepaths = ["Tutorials/3DModels/test_model.obj", "Tutorials/3DModels/test_model.mtl", "Tutorials/Images/test_image_2.png"]
fileupload_msg, fileupload_data = upload_files_into_deposition(new_version_data, filepaths, replace_existing=True, db_connection=db_connection)

print("\nResponse of Fileupload to Zenodo Sandbox:")
printJSON(fileupload_data)

if fileupload_msg["success"] and fileupload_data:
    print("\nFiles successfully uploaded!")
    for i in fileupload_data:
        print(f"\nDirect Link to {i['filename']}: {i['links']['download'].replace('/files', '/draft/files')}")
    print_table(db_connection, "operations", update_data["conceptrecid"])
else:
    print("\nFailed to upload Files. Please check the error message above or in fileupload_msg['text']:")
    print(fileupload_msg["text"])

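The filename conflict handling described above boils down to comparing the names of the new files with those already in the draft. The sketch below only illustrates that comparison and does not reproduce `upload_files_into_deposition`: `find_filename_conflicts` is a hypothetical helper, and the `id` key and example entries are assumptions.

```python
from pathlib import Path


def find_filename_conflicts(existing_files: list[dict], filepaths: list[str]) -> list[dict]:
    """Return the existing file entries whose filename matches one of the new uploads."""
    new_names = {Path(p).name for p in filepaths}
    return [entry for entry in existing_files if entry.get("filename") in new_names]


# Made-up entries standing in for the files already attached to a draft
existing = [
    {"id": "a1", "filename": "test_image_1.png"},
    {"id": "b2", "filename": "test_image_2.png"},
]
uploads = ["Tutorials/3DModels/test_model.obj", "Tutorials/Images/test_image_2.png"]
print(find_filename_conflicts(existing, uploads))
# [{'id': 'b2', 'filename': 'test_image_2.png'}]
```

Only the entries returned here would have to be deleted before the upload, so files that are not being replaced stay untouched.
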
### Add Filetables and Changelogs to Description

Zenodo allows ([limited](https://github.com/zenodo/zenodo/blob/master/zenodo/modules/records/serializers/fields/html.py#L33)) HTML in descriptions, including **tables**, which lets us build file tables with direct links to the latest versions of recently uploaded files.
<br>For **3D Models**, we can additionally upload **Thumbnails** in different resolutions to cover various use cases. **Changelogs** list all versions as links to the persistent record versions.

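As a rough illustration of the idea, a file table can be assembled from the upload response with plain string formatting. This sketch is not the implementation of the description helper used in the next cell: `build_file_table` is a hypothetical function, the example URL is made up, and whether each tag survives Zenodo's sanitizer should be checked against the allow-list linked above.

```python
from html import escape


def build_file_table(files: list[dict], heading: str = "Main Files") -> str:
    """Build a one-column HTML table linking each filename to its download URL."""
    rows = []
    for entry in files:
        url = escape(entry["links"]["download"], quote=True)
        name = escape(entry["filename"])
        rows.append(f'<tr><td><a href="{url}">{name}</a></td></tr>')
    header = f"<tr><th>{escape(heading)}</th></tr>"
    return f"<table>{header}{''.join(rows)}</table>"


# Made-up entry shaped like the items in the upload response
files = [{"filename": "a&b.obj", "links": {"download": "https://example.org/files/a&b.obj"}}]
html_table = build_file_table(files)
print(html_table)
```

Escaping filenames and URLs keeps characters such as `&` or `<` from breaking the markup of the description.
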
In [None]:
from datetime import datetime
from utilities import update_description
latest_data = update_data # always proceed with the most recent data

new_metadata = {"metadata": copy.deepcopy(latest_data["metadata"])}
new_metadata["metadata"]["publication_date"] = datetime.now().strftime("%Y-%m-%d") # do not forget to set the current date of publication
new_version = increment_version(latest_data["metadata"]["version"], 1) # e.g. turns 0.0.1 into 0.0.2; change the second argument to set the increment level
new_metadata["metadata"]["version"] = new_version

changelog = f"Testing the Changelog Functionality of Version {new_version}."
new_description = update_description(latest_data, fileupload_data, new_version, changelog)
new_metadata["metadata"]["description"] = new_description

metadata_msg, metadata_data = update_metadata(latest_data, new_metadata, db_connection=db_connection)
if metadata_msg["success"]:
    print(f'Successfully pushed new Metadata: {metadata_data["links"]["html"]}')
    print_table(db_connection, "operations", update_data["conceptrecid"])
else:
    print("\nSomething went wrong. Please check the error messages:")
    print(metadata_msg["text"])


Following the given link, you should be able to click Preview on the right side of your draft and see the new description, a changelog with a link to the current version, and tables with "Main Files" and "Thumbnails".
<br>**Thumbnails** are automatically sorted by suffixes like `..._perspective_1.png`, `..._perspective_4_512x512.png` etc., as defined in the 3D thumbnail rendering function. This sorting behaviour is configurable.

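One way to picture suffix-based thumbnail sorting is a sort key that extracts the perspective number and the optional resolution from the filename. The ordering the toolkit actually applies is not shown in this notebook, so the sketch below is an assumption: `THUMB_PATTERN` and `thumbnail_sort_key` are hypothetical names, and `cover.png` is a made-up filename.

```python
import re

# Matches suffixes such as "_perspective_1.png" or "_perspective_4_512x512.png"
THUMB_PATTERN = re.compile(r"_perspective_(\d+)(?:_(\d+)x(\d+))?\.png$")


def thumbnail_sort_key(filename: str) -> tuple:
    """Sort by perspective number, then width; unmatched names go last."""
    match = THUMB_PATTERN.search(filename)
    if match is None:
        return (1, 0, 0, filename)
    perspective = int(match.group(1))
    width = int(match.group(2) or 0)  # 0 when no resolution suffix is present
    return (0, perspective, width, filename)


names = ["test_model_perspective_4_512x512.png", "test_model_perspective_1.png", "cover.png"]
print(sorted(names, key=thumbnail_sort_key))
# ['test_model_perspective_1.png', 'test_model_perspective_4_512x512.png', 'cover.png']
```
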
## Publish New Version

After updating the Metadata and Files in the new version's draft, we can finally publish the new Version:

In [None]:
# set additional data for DB updates
additional_data = {
    "type": "image",
    "subset": "project_sandbox",
    "changelogs": {},
    "filedata": fileupload_data
}

publish_msg, publish_data = publish_record(metadata_data, db_connection, additional_data) # remember to always use the most recent response data, metadata_data in this case

if publish_msg["success"]:
    print("Record successfully published!")
    print(f"DOI: {publish_data['doi']}")
    print(f"Record URL: {publish_data['links']['record_html']}")
    print_table(db_connection, "operations", update_data["conceptrecid"])
else:
    print("Failed to publish record. Error message:")
    print(publish_msg["text"])

## All-in-One Function for Easy Updates

The steps above show what happens inside the function `update_record` and how updating a Zenodo record works in general.
<br>If you want these steps handled automatically, use `update_record`, which does the following:

- Create New Record Version
- Upload Files
- Update Metadata, including the Version + Description with Filetables and Changelogs
- Publish New Version and Write to DB

In [None]:
from main_functions import identify_latest_record, retrieve_by_concept_recid, update_record
from utilities import increment_version

filepaths = ["Tutorials/3DModels/test_model.obj", "Tutorials/3DModels/test_model.mtl", "Tutorials/Images/test_image_2.png", "Tutorials/Thumbnails/test_model_perspective_4_512x512.png", "Tutorials/Thumbnails/test_model_perspective_1.png"]
# set additional data for DB updates
additional_data = {
    "type": "image",
    "subset": "project_sandbox",
    "changelogs": {},
}

# Retrieve the latest published version from Zenodo.
# This is safer, but for large operations use the responses table in the local DB to minimize queries and stay within the API rate limits
retrieval_msg, retrieval_data = retrieve_by_concept_recid(data["conceptrecid"], all_versions=False)
if retrieval_msg["success"] and retrieval_data:
    print(f"Latest Record Data retrieved from Zenodo for ConceptRecordID {data['conceptrecid']}.")
    latest_msg, latest_published_data = identify_latest_record(retrieval_data)
    if not latest_msg["success"]:
        print(f"Could not identify latest Record: {latest_msg['text']}")
else:
    print(f"Could not retrieve latest Record Data for ConceptRecordID {data['conceptrecid']}")

# Write a new Description and a Changelog Text
new_description = f"This is an even newer description of Version {increment_version(latest_published_data['metadata']['version'], 1)}"
changelog_text = f"Uploaded/Updated Files: {' | '.join([Path(i).name for i in filepaths])}"

# Perform Update Processes using the update_record() function
update_msg, update_data = update_record(latest_data=latest_published_data, filepaths=filepaths, replace_existing_files=True, replace_description=new_description, 
                                        changelog=changelog_text, debug_mode=False, 
                                        db_connection=db_connection, additional_data=additional_data)
if update_msg["success"]:
    print(f"Successfully updated ConceptRecordID {update_data['conceptrecid']} to Version {update_data['metadata']['version']}: {update_data['links']['html']}\n")
    printJSON(update_data)
    print_table(db_connection, "operations", update_data["conceptrecid"])
else:
    print(f"Something went wrong, check the error messages: {update_msg}")

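The cell above derives the next version number with `increment_version(..., 1)`. To show what a level-based increment can look like, here is a minimal sketch under stated assumptions: `bump_version` is a hypothetical stand-in, and counting the level from the right (1 = last component) is inferred from the earlier comment that level 1 turns 0.0.1 into 0.0.2. The toolkit's own function may behave differently.

```python
def bump_version(version: str, level: int = 1) -> str:
    """Increment a dotted version; level 1 is the last component, 2 the one before, etc."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= level <= len(parts):
        raise ValueError(f"level must be between 1 and {len(parts)}")
    index = len(parts) - level
    parts[index] += 1
    parts[index + 1:] = [0] * (level - 1)  # reset less significant components
    return ".".join(str(p) for p in parts)


print(bump_version("0.0.1", 1))  # 0.0.2
print(bump_version("0.3.7", 2))  # 0.4.0
print(bump_version("0.3.7", 3))  # 1.0.0
```
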
## Update Database and Close the Database Connection

<small>

<u>Note</u>:
The column `all_recids` might not contain the `recid` of the first version in this example, as it was published in Notebook #02 without a connected DB.
<br>This is intentional, as the integrity tools (used to ensure convergence between responses and databases) will be demonstrated in an upcoming Notebook.

</small>

In [None]:
for table_name in [list(i.keys())[0] for i in db_config["db_structures"]["tables"]]:
    print(f"{table_name}:")
    print_table(db_connection, table_name)

db_operation = clear_operations_by_status(db_connection, ["discarded", "published"])
print(f"Clear Operations Table: {db_operation}")
print_table(db_connection, "operations", update_data["conceptrecid"])

db_connection.close()
print("Database connection closed.")

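Clearing finished operations by status, as done in the last cell, can be expressed in SQLite with a parameterized `IN` clause. The sketch below is an assumption-laden illustration rather than the code behind `clear_operations_by_status`: `delete_rows_by_status` is a hypothetical helper, and the two-column `operations` table is a simplified stand-in for the real schema.

```python
import sqlite3


def delete_rows_by_status(connection: sqlite3.Connection, statuses: list[str]) -> int:
    """Delete rows whose status is in `statuses`; return the number of deleted rows."""
    if not statuses:
        return 0
    placeholders = ", ".join("?" for _ in statuses)  # one "?" per status value
    cursor = connection.execute(
        f"DELETE FROM operations WHERE status IN ({placeholders})", statuses
    )
    connection.commit()
    return cursor.rowcount


# In-memory database with a simplified, assumed table layout
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE operations (id INTEGER PRIMARY KEY, status TEXT)")
demo.executemany(
    "INSERT INTO operations (status) VALUES (?)",
    [("discarded",), ("published",), ("draft",)],
)
print(delete_rows_by_status(demo, ["discarded", "published"]))  # 2
print(demo.execute("SELECT status FROM operations").fetchall())  # [('draft',)]
```

Only the placeholders are formatted into the SQL string; the status values themselves are passed as parameters, so they are never interpolated into the statement.
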