## 🧭 Argo FormatChecker Notebook (AMRIT Consortium)

This notebook uses the **Argo Format Checker** provided by **AMRIT** to validate Argo NetCDF files.  
The FormatChecker performs both **format** and **content** checks to ensure compliance with the Argo data standards.

📘 **References:**
- The Argo NetCDF format is defined in the [Argo User’s Manual](http://dx.doi.org/10.13155/29825).  
- More details and documentation are available on the [Argo Data Management website](https://www.argodatamgt.org/Documentation).


In [None]:
!pip install ipywidgets

## The main steps to run the checker

1. Run the cells that import the necessary packages.
2. Configure the API URL and DAC.
3. Run the cells that define the helper functions.
4. Run the health check to verify connectivity.
5. Upload .json or .nc files using the upload button.
6. Optionally run the checker on all files of a deployment.
7. Review the results in the table or in the output below each cell.

In [2]:
# ===============================
# 1️⃣ Importing necessary packages
# ===============================

import requests
import ipywidgets as widgets
import json
import os
from pathlib import Path
import pandas as pd
import time
from ipywidgets import FileUpload, Button, VBox, Output
from IPython.display import display, HTML
import logging
from typing import Any
logger = logging.getLogger("filechecker")
logger.setLevel(logging.INFO)

if logger.hasHandlers():
    logger.handlers.clear()
ch = logging.StreamHandler()
ch.setLevel(logging.INFO)

logger.addHandler(ch)
logger.propagate = False


# File Checker Environment Setup

1. Set `API_BASE_URL` to the address where the File Checker API is running.
 - If you are running the File Checker locally inside Docker, use the address where the
   Docker container exposes the API, for example `http://localhost:8000`.
 - If you are running the File Checker in a Kubernetes cluster, use the cluster
   URL where the service is exposed. For example:
   `https://livkrakentst.clusters.bodc.me/ewetchy/amrit/argo-toolbox/api/file-checker`
 - Leave only one of the URLs uncommented, matching your environment.
2. Set the default DAC for validation.
3. Do NOT run the remaining cells until this configuration is done.
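
Because the base URL and the endpoint paths are configured separately, a stray or missing slash is easy to introduce when they are combined. The following is a minimal sketch of a join helper; `build_url` is a hypothetical name used for illustration and is not part of the File Checker API.

```python
def build_url(base_url: str, endpoint: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"


# Both spellings of the base URL and endpoint give the same address.
print(build_url("http://localhost:8000", "/check-files"))
print(build_url("http://localhost:8000/", "check-files"))
```

A helper like this would let the endpoint constants keep or drop their leading slash without affecting the final request URL.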

In [3]:

# ===============================
# 2️⃣ Configuration
# ===============================
# Local Docker instance
API_BASE_URL="http://localhost:8000"

# Kubernetes test instance
# API_BASE_URL= "https://livkrakentst.clusters.bodc.me/ewetchy/amrit/argo-toolbox/api/file-checker"

# Default DAC for validation
DEFAULT_DAC = "bodc"

# Mount location of the deployment files:
# if you are running the API locally via Docker, ensure the deployment folder matches the volume mount in your Docker setup.
# Example container start: docker run --rm --name argo-file-checker2 -p 8000:8000 argo-file-checker

# Result file for full deployment checks
deployments_files_check_result_file="deployment_files_check_result.csv"

# Endpoints
CHECK_FILE_ENDPOINT = "/check-files"
FULL_DEPLOYMENT_CHECK_ENDPOINT = "/check-deployment"
HEALTH_ENDPOINT = "/"

# Request settings
TIMEOUT = 30  # API request timeout in seconds
HEADERS = {
    "accept": "application/json"
}

# Full URL for file check endpoint (the endpoint already starts with "/")
FILE_CHECK_URL = f"{API_BASE_URL}{CHECK_FILE_ENDPOINT}"

logger.info("🔗 API Base URL: %s", API_BASE_URL)
logger.info("🏛️ DAC: %s", DEFAULT_DAC)

🔗 API Base URL: http://localhost:8000
🏛️ DAC: bodc


In [4]:
# ============================================
#  3️⃣ Function to test the configuration and API
# ============================================
def test_api_connection() -> bool:
    """Test if the API is accessible."""
    logger.info("\n🔍 Testing API Connection..")
    logger.info("-" * 30)

    try:
        response = requests.get(f"{API_BASE_URL}{HEALTH_ENDPOINT}", timeout=5)
        if response.status_code != 200:
            logger.info("API returned status code: %s", response.status_code)
            return False

        result = response.json()
        logger.info("API is accessible!")
        logger.info("Health Check Response: %s", result)
        return True

    except requests.exceptions.ConnectionError as error:
        logger.info("Could not connect to API. Is the container running?")
        logger.info(error)
        return False

    except requests.exceptions.Timeout:
        logger.info("Connection timed out")
        return False

    except Exception:
        logger.exception("Unexpected error")
        return False


In [5]:
# ============================================
#  4️⃣ Function to check the files
# ============================================
def file_check(file_paths: list[str], dac: str = DEFAULT_DAC) -> dict[str, Any]:
    """Upload the given .nc or .json files to the File Checker API and return the outcome."""
    files_data = []

    for file_path in file_paths:
        path = Path(file_path)
        if not path.exists():
            logger.info("File not found: %s",file_path)
            continue

        # Determine content type based on file extension (case-insensitive)
        content_type = {
            ".json": "application/json",
            ".nc": "application/octet-stream",
        }.get(path.suffix.lower(), "application/octet-stream")

        try:
            file_obj = path.open("rb")
        except OSError:
            logger.exception("Error opening %s", path)
            continue
        files_data.append(("files", (path.name, file_obj, content_type)))

    if not files_data:
        logger.info("No valid files to upload!")
        return {"success": False, "error": "no files to upload"}

    params = {"dac": dac}
    start_time = time.time()

    try:
        response = requests.post(
            FILE_CHECK_URL,
            files=files_data,
            params=params,
            headers=HEADERS,
            timeout=TIMEOUT,
        )
    except requests.exceptions.Timeout:
        logger.info("Request timed out after %s seconds", TIMEOUT)
        return {"success": False, "error": "timeout"}
    except requests.exceptions.RequestException as e:
        logger.info("Request error: %s", e)
        return {"success": False, "error": str(e)}
    except Exception as e:
        logger.exception("Unexpected error")
        return {"success": False, "error": str(e)}
    finally:
        # Close the uploaded file handles whether or not the request succeeded
        for _, (_, file_obj, _) in files_data:
            file_obj.close()

    processing_time = time.time() - start_time
    success = response.status_code == 200
    # Only a 200 response is expected to carry a JSON body
    result = response.json() if success else response.text

    return {
        "success": success,
        "status_code": response.status_code,
        "processing_time": processing_time,
        "result": result,
    }
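
`file_check` has to keep several files open at once for a single multipart request. As a minimal sketch of one way to guarantee that every handle is released afterwards, `contextlib.ExitStack` from the standard library can own all of them. The helper name `read_sizes` and the throwaway files are hypothetical; reading the bytes stands in for the point where the request would be sent.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path


def read_sizes(paths: list[Path]) -> dict[str, int]:
    """Open every path at once, use the handles, and close them all on exit."""
    with ExitStack() as stack:
        handles = [stack.enter_context(p.open("rb")) for p in paths]
        # All handles are open here, as they would be while a request is in flight.
        return {p.name: len(h.read()) for p, h in zip(paths, handles)}


# Demonstration with two throwaway files
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.nc"
    second = Path(tmp) / "b.json"
    first.write_bytes(b"1234")
    second.write_bytes(b"{}")
    sizes = read_sizes([first, second])

print(sizes)
```

The stack closes each file when the `with` block ends, including when an exception is raised partway through opening the list.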


In [6]:
# ============================================
#  5️⃣ Function to check the whole deployment
# ============================================
# The folder holding all files of a deployment (for example a mounted volume) is
# passed in, and its files are sent to the API in a single POST request.
def file_check_deployment(deployment_files_folder: str, dac: str = DEFAULT_DAC) -> dict[str, Any]:
    """Filechecker to check nc files for a whole deployment."""
    logger.info("\n🔍 Checking full deployment in folder: %s", deployment_files_folder)
    # Check that folder exists
    if not Path(deployment_files_folder).exists() or not Path(deployment_files_folder).is_dir():
        logger.error("Deployment folder does not exist: %s", deployment_files_folder)
        return {"success": False, "error": "folder not found"}

    # Count number of files in the folder
    file_list = [str(f) for f in Path(deployment_files_folder).iterdir() if f.is_file()]
    num_files = len(file_list)
    logger.info("Number of files found in deployment folder: %d", num_files)

    if num_files == 0:
        return {"success": False, "error": "no files found in deployment folder"}

    return file_check(file_list, dac)
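
`file_check_deployment` sends every file of the folder in one request, which is subject to the 30-second `TIMEOUT`. For large deployments, one possible workaround is to split the file list into batches and check each batch separately. The sketch below shows only the batching step; `chunked` and the batch size are assumptions, and the notebook does not confirm that results from several requests can simply be concatenated.

```python
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Illustrative file names only
batches = list(chunked(["a.nc", "b.nc", "c.nc", "d.nc", "e.nc"], 2))
print(batches)
```

Each batch could then be passed to `file_check(batch, dac)` in a loop, with the per-batch outcomes collected by the caller.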


In [7]:
# ============================================
#  6️⃣ Function to display the results and save to CSV
# ============================================
def show_result(result: dict, save_path: str | None = None) -> None:
    """Nicely display the result dict in Jupyter."""
    if not isinstance(result, dict):
        logger.info(result)
        return

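    # On a failed request "result" may be plain text or missing, so guard before reading it
    payload = result.get("result")
    results = payload.get("results", []) if isinstance(payload, dict) else []
    if not results:
        logger.info("No results found.")
        return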

    for r in results:
        r["errors_messages"] = "\n".join(r.get("errors_messages", []))
        r["warnings_messages"] = "\n".join(r.get("warnings_messages", []))
    df = pd.DataFrame(results, columns=[
        "file",
        "result",
        "phase",
        "errors_number",
        "warnings_number",
        "errors_messages",
        "warnings_messages",
    ])

    # Optional: Save to CSV
    if save_path:
        df.to_csv(save_path, index=False, encoding="utf-8")
        logger.info("✅ Results saved to %s", save_path)

    # CSS style for borders and wrapping
    styles = """
    <style>
        table {
            border: 1px solid black;
            border-collapse: collapse;
        }
        th, td {
            border: 1px solid black !important;
            padding: 5px;
            text-align: left;
            vertical-align: top;
            max-width: 400px;
            white-space: pre-wrap;
            word-wrap: break-word;
        }
        th {
            background-color: #f2f2f2;
        }
    </style>
    """

    html_table = df.to_html(escape=False).replace("\\n", "<br>")
    display(HTML(styles + html_table))
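
For a deployment with many files, a short summary can be easier to scan than the full table. The sketch below counts files per `result` value and totals the error and warning counts, using the field names that `show_result` reads. The sample rows, including the `result` values, are made up for illustration, and `summarize` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical rows shaped like the entries show_result() reads;
# the file names and "result" values are invented for this example.
sample_results = [
    {"file": "a.nc", "result": "accepted", "errors_number": 0, "warnings_number": 1},
    {"file": "b.nc", "result": "rejected", "errors_number": 2, "warnings_number": 0},
    {"file": "c.nc", "result": "accepted", "errors_number": 0, "warnings_number": 0},
]


def summarize(results: list[dict]) -> dict:
    """Count files per result value and total the reported errors and warnings."""
    return {
        "files_by_result": dict(Counter(r.get("result", "unknown") for r in results)),
        "errors": sum(r.get("errors_number", 0) for r in results),
        "warnings": sum(r.get("warnings_number", 0) for r in results),
    }


summary = summarize(sample_results)
print(summary)
```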


In [8]:
# Test 1: Check API connection
api_available = test_api_connection()

if not api_available:
    logger.info("\nAPI is not accessible. Please check that the container is running and that API_BASE_URL is correct.")
else:
    logger.info("\n🎉 API is ready for testing!")


🔍 Testing API Connection..
------------------------------
API is accessible!
Health Check Response: {'status': 'OK'}

🎉 API is ready for testing!
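
If the container has only just been started, the first health check may fail even though the API becomes available a few seconds later. A minimal sketch of a retry loop is shown here, assuming a check function that returns `True` on success; `wait_for_api`, the attempt count, and the delay are hypothetical choices, not part of the File Checker.

```python
import time
from typing import Callable


def wait_for_api(check: Callable[[], bool], attempts: int = 5, delay: float = 2.0) -> bool:
    """Call `check` until it returns True or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            time.sleep(delay)  # pause before the next try
    return False
```

In this notebook it could be called as `wait_for_api(test_api_connection)`, since that function already returns a boolean.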


<h3>📝 Instructions</h3>
<p>Click the <b>Upload</b> button below and select the files you want to check.</p>
<p>You can select multiple files from your computer.</p>

In [9]:
# 2. Upload file(s) to test
upload = FileUpload(accept='.json,.nc', multiple=True)
out = Output()

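import tempfile

def on_upload_change(change):
    file_paths = []
    for fileinfo in upload.value:
        # Keep only the base name and write into the platform's temp folder
        fname = Path(fileinfo["name"]).name
        tmp_path = str(Path(tempfile.gettempdir()) / fname)
        with open(tmp_path, "wb") as f:
            f.write(fileinfo["content"])
        file_paths.append(tmp_path)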

    with out:
        out.clear_output()
        logger.info("Selected files: %s",file_paths)
        res=file_check(file_paths)
        show_result(res, save_path="file_check_results.csv")

upload.observe(on_upload_change, names="value")

display(VBox([upload, out]))

VBox(children=(FileUpload(value=(), accept='.json,.nc', description='Upload', multiple=True), Output()))
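
The shape of `FileUpload.value` differs between ipywidgets releases. The output above shows a tuple, which matches the handler's `fileinfo["name"]` access. As a hedged sketch, the helper below accepts both the tuple layout and the older dict-keyed-by-name layout; both layouts are stated here as assumptions about ipywidgets 8 and 7 respectively, and `iter_uploads` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterator


def iter_uploads(value) -> Iterator[tuple[str, bytes]]:
    """Yield (file name, content bytes) pairs from a FileUpload-style value."""
    if isinstance(value, dict):
        # Assumed ipywidgets 7 layout: {name: {"content": bytes, ...}}
        entries = [{"name": name, "content": info["content"]} for name, info in value.items()]
    else:
        # Assumed ipywidgets 8 layout: a tuple of dicts with "name" and "content"
        entries = list(value)
    for entry in entries:
        # Keep only the final path component so a name cannot point outside the target folder
        safe_name = Path(entry["name"]).name
        yield safe_name, bytes(entry["content"])
```

The `bytes(...)` call also converts a `memoryview` payload into plain bytes before it is written to disk.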

## Next, run the file checker on a whole deployment

In [10]:
# Create a text input widget for the deployment folder

deployment_folder_widget = widgets.Text(
        value="//wsl.localhost/Ubuntu/home/vidkri/argo_mount",  # default value
        placeholder="Enter path to deployment folder",
        description="Deployment Folder:",
        layout=widgets.Layout(width="90%"),
    )

# Create a button to trigger the check

run_button = widgets.Button(description="Run Deployment Check", button_style="success")

# Output area (created before the handler that writes to it)
output = widgets.Output()

# Define button click handler
def on_run_button_clicked(b):
    with output:
        output.clear_output(wait=True)
        deployment_folder = deployment_folder_widget.value.strip()
        if not Path(deployment_folder).exists():
            print(f"Folder does not exist: {deployment_folder}")
            return
        # Run the deployment check
        result = file_check_deployment(deployment_folder, DEFAULT_DAC)
        logger.info("\n📂 Full deployment check result: %s", result)
        show_result(result, deployments_files_check_result_file)

# Link button to handler
run_button.on_click(on_run_button_clicked)

# Display widgets
display(deployment_folder_widget, run_button, output)

Text(value='//wsl.localhost/Ubuntu/home/vidkri/argo_mount', description='Deployment Folder:', layout=Layout(wi…

Button(button_style='success', description='Run Deployment Check', style=ButtonStyle())

Output()