## 🧭 Argo FormatChecker Notebook (AMRIT Consortium)

This notebook lets you validate Argo NetCDF files with the **Argo Format Checker** provided by **AMRIT**.  
The FormatChecker performs both **format** and **content** checks on Argo NetCDF files to ensure compliance with the Argo data standards.

📘 **References:**
- The Argo NetCDF format is defined in the [Argo User's Manual](http://dx.doi.org/10.13155/29825).  
- More details and documentation are available on the [Argo Data Management website](https://www.argodatamgt.org/Documentation).


## The main steps to run the checker

1. Set up
    - Before running this notebook, install the dependencies with:
    ```bash
    pip install -r notebooks/requirements.txt
    ```
2. Check and fix the configuration
    - Check that the API URL and DAC are correct.
        - Set `API_BASE_URL` to the address where the File Checker API is running.
            - If you are running the File Checker locally inside Docker, set it to the address where the
                Docker container exposes the API, for example `http://localhost:8000`.
            - If you are running the File Checker in a Kubernetes cluster, use the cluster
                URL where the service is exposed. For example:
                `https://livkrakentst.clusters.bodc.me/ewetchy/amrit/argo-toolbox/api/file-checker`
        - Leave only one of the URLs uncommented, depending on your environment.

3. Run the complete notebook
    - This will
        - Import the necessary packages.
        - Set the configuration and DAC.
        - Check the connectivity to the API.
4. Check files
    - To check a few selected files, select them by clicking the **Upload** button.
    - The 'samples' folder has a few files that can be used for testing.
5. Optionally, run the checker on all files of a deployment
    - To check a whole deployment, enter the path to its files and click **Run complete deployment Check**.
    - The 'profiles' subdirectory of the sample float within the 'samples' folder has a subset of the full deployment files for testing.
6. Results are shown in the cell output and, if a result path is configured, also saved to a CSV file.
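
The last step writes the results to a CSV path. The following is a minimal sketch of that step using only the standard library and a made-up row. The helper name `save_results` and the sample values are assumptions for illustration; the column names follow the result table built later in this notebook. The sketch creates the parent folder first, since writing to a path such as `results/...` fails when the folder does not exist yet.

```python
import csv
import tempfile
from pathlib import Path

# Column names as used by the result table in this notebook.
COLUMNS = [
    "file", "result", "phase", "errors_number",
    "warnings_number", "errors_messages", "warnings_messages",
]


def save_results(rows: list[dict], csv_path: str) -> Path:
    """Write result rows to csv_path, creating parent folders if needed."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical row; real values come from the API response.
sample_rows = [{
    "file": "example_profile.nc", "result": "example", "phase": "example",
    "errors_number": 0, "warnings_number": 0,
    "errors_messages": "", "warnings_messages": "",
}]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_results(sample_rows, f"{tmp}/results/check_result.csv")
    # Show the header line of the file that was just written.
    print(saved.read_text(encoding="utf-8").splitlines()[0])
```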

In [None]:
# ===============================
# 1️⃣ Import the necessary packages
# ===============================


import tempfile
from pathlib import Path
from typing import Any

import ipywidgets as widgets
import pandas as pd
import requests
from IPython.display import Markdown, display
from ipywidgets import HTML, FileUpload, Output, VBox

## Check the configuration and make any necessary changes.
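
One detail worth checking in the configuration is how the base URL and the endpoint path are joined, because a trailing slash on one and a leading slash on the other produce a double slash in the request URL. A minimal sketch of a join helper follows; the name `build_url` is hypothetical and not part of the notebook.

```python
def build_url(base_url: str, endpoint: str) -> str:
    """Join base URL and endpoint path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"


# Works the same with or without slashes on either side.
print(build_url("http://localhost:8000", "/check-files"))
print(build_url("http://localhost:8000/", "check-files"))
```

Plain string handling is used here rather than `urllib.parse.urljoin`, because `urljoin` with an endpoint that starts with `/` replaces the path of the base URL, which would drop a path prefix such as the one in the Kubernetes URL.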

In [None]:
# ===============================
# 2️⃣ Configuration
# ===============================

# Local Docker instance
API_BASE_URL = "http://localhost:8000"

# Kubernetes test instance
# API_BASE_URL = "https://livkrakentst.clusters.bodc.me/ewetchy/amrit/argo-toolbox/api/file-checker"

# Default DAC for validation
DEFAULT_DAC = "bodc"

# Mount location of all the deployment files.
# If you are running the API locally via Docker, ensure this path matches the volume mount in your Docker setup.
# For example: docker run --rm --name argo-file-checker2 -p 8000:8000 argo-file-checker

# CSV file for check results, used by both the upload check and the deployment check.
# Set it to "" to skip saving.
deployments_files_check_result_file = "results/deployment_files_check_result.csv"

# Endpoints
CHECK_FILE_ENDPOINT = "/check-files"
FULL_DEPLOYMENT_CHECK_ENDPOINT = "/check-deployment"

# Request settings
TIMEOUT = 30  # API request timeout in seconds
HEADERS = {
    "accept": "application/json",
}

# Full URL for the file check endpoint (the endpoint already starts with "/")
FILE_CHECK_URL = f"{API_BASE_URL.rstrip('/')}{CHECK_FILE_ENDPOINT}"

print(f"🔗 API Base URL: {API_BASE_URL}")
print(f"🏛️ DAC: {DEFAULT_DAC}")

## Check the API connection
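
For use outside the notebook, the same kind of reachability check can be sketched with the standard library alone. This is an illustrative variant under stated assumptions, not the notebook's own check: the function name `is_api_reachable` is hypothetical, and it uses `urllib` instead of `requests`.

```python
import urllib.request


def is_api_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on base_url answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


# A string that is not a URL is reported as unreachable instead of raising.
print(is_api_reachable("not a url"))
```

Returning a boolean instead of printing makes the check easy to reuse, for example to skip later cells when the API is down.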

In [None]:
# ===============================
# 3️⃣ Check the API connection
# ===============================
try:
    response = requests.get(f"{API_BASE_URL}/", timeout=5)
    response.raise_for_status()
    print("✅ API is reachable.")
except requests.RequestException as e:
    # Covers connection errors, timeouts, and HTTP error statuses
    print(f"❌ API is not reachable: {e}")


## Check a file using the API
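
The handler in this section reads a `results` list from the JSON response and joins each entry's message lists into a single string before building the table. A minimal sketch of that flattening step on a made-up payload is shown here. The function name `flatten_results` and all sample values are assumptions; only the key names are taken from this notebook.

```python
def flatten_results(payload: dict) -> list[dict]:
    """Turn each entry of payload["results"] into a flat, table-ready row."""
    rows = []
    for entry in payload.get("results", []):
        row = dict(entry)  # copy so the original payload is left untouched
        # Message lists become one newline-separated string per cell.
        row["errors_messages"] = "\n".join(entry.get("errors_messages", []))
        row["warnings_messages"] = "\n".join(entry.get("warnings_messages", []))
        rows.append(row)
    return rows


# Made-up payload for illustration; only the key names come from this notebook.
sample_payload = {
    "results": [
        {"file": "a.nc", "errors_number": 0, "warnings_number": 0,
         "errors_messages": [], "warnings_messages": []},
        {"file": "b.nc", "errors_number": 2, "warnings_number": 1,
         "errors_messages": ["first problem", "second problem"],
         "warnings_messages": ["a warning"]},
    ]
}

rows = flatten_results(sample_payload)
# List the files that reported at least one error.
files_with_errors = [r["file"] for r in rows if r["errors_number"] > 0]
print(files_with_errors)
```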


In [None]:
def on_upload_change(_change: dict[str, object]) -> None:
    """Save the uploaded files to a temporary folder and send them to the checker API."""
    with out:
        out.clear_output()
        file_paths = []
        with tempfile.TemporaryDirectory() as tmp_dir:
            # Save uploaded files to temporary directory
            for fileinfo in upload.value:
                fname = fileinfo["name"]
                tmp_path = Path(tmp_dir) / fname
                with Path(tmp_path).open("wb") as f:
                    f.write(fileinfo["content"])
                file_paths.append(tmp_path)


            files_data = []
            spinner = widgets.HTML("Preparing files to check <i class='fa fa-spinner fa-spin'></i>")
            display(spinner)

            for file_path in file_paths:
                if not Path(file_path).exists():
                    print(f"🚨 File not found: {file_path}")
                    continue
                try:
                    file_obj = Path(file_path).open("rb")
                    files_data.append(("files", (Path(file_path).name, file_obj, "application/x-netcdf")))
                except OSError as e:
                    print(f"📁 Error opening: {file_path} ({e})")
                    continue
            spinner.value = ""
            # Proceed to send files for checking
            params = {"dac": DEFAULT_DAC}
            spinner = widgets.HTML("Sending files to check <i class='fa fa-spinner fa-spin'></i>")
            display(spinner)
            # Send files to API
            try:
                response = requests.post(
                    FILE_CHECK_URL,
                    files=files_data,
                    params=params,
                    headers=HEADERS,
                    timeout=TIMEOUT,
                )
            except requests.RequestException as e:
                print(f"🚨 Request failed: {e}")

            else:
                try:
                    checked_result = response.json()['results']
                    for r in checked_result:
                        r["errors_messages"] = "\n".join(r.get("errors_messages", []))
                        r["warnings_messages"] = "\n".join(r.get("warnings_messages", []))
                    df = pd.DataFrame(checked_result, columns=[
                        "file",
                        "result",
                        "phase",
                        "errors_number",
                        "warnings_number",
                        "errors_messages",
                        "warnings_messages",
                    ])

                    # Display the results
                    with pd.option_context(
                        'display.max_columns', None,
                        'display.width', None,
                        'display.max_colwidth', None
                        ):
                        display(df)
                    # Optional: save to CSV
                    if deployments_files_check_result_file:
                        result_path = Path(deployments_files_check_result_file)
                        # Create the output folder if it does not exist yet
                        result_path.parent.mkdir(parents=True, exist_ok=True)
                        df.to_csv(result_path, index=False, encoding="utf-8")
                        print(f"✅ Results saved to: {result_path}")

                except Exception as e:
                    print(f"🚨 Error {e}")

            finally:
                # This block runs no matter what, ensuring file handles are closed
                spinner.value = ""
                for _, (_, file_obj, _) in files_data:
                    file_obj.close()
                display(Markdown("**✅ Check complete!**"))

# ===============================
print("Upload the file or files to check by clicking on the 'upload' button")
upload = FileUpload(accept=".nc", multiple=True)
out = Output()
upload.observe(on_upload_change, names="value")
display(VBox([upload, out]))


## [Optional] Check all the files of a deployment.
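
The cell in this section sends every regular file found in the deployment folder. If the folder may also hold other files, one option is to keep only NetCDF files before sending. This is a minimal sketch under that assumption: the helper name `list_netcdf_files` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def list_netcdf_files(folder: str) -> list[Path]:
    """Return the .nc files directly inside folder, sorted by name."""
    root = Path(folder)
    # An empty string would resolve to the current directory, so reject it.
    if not folder or not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".nc"
    )


# Demonstration on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("profile_002.nc", "profile_001.nc", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    found = [p.name for p in list_netcdf_files(tmp)]
    print(found)
```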

In [None]:
# ===============================
# 4️⃣ Deployment folder input and run button
# ===============================
def on_run_button_clicked(_b: widgets.Button) -> None:
    """Send all files in the deployment folder to the checker API when the button is clicked."""
    with output:
        output.clear_output(wait=True)
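        deployment_folder = deployment_folder_widget.value.strip()
        # An empty path would resolve to the current directory, so reject it explicitly
        if not deployment_folder or not Path(deployment_folder).is_dir():
            print(f"📁 Folder does not exist: {deployment_folder}")
            return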

        # Count number of files in the folder
        file_paths = [str(f) for f in Path(deployment_folder).iterdir() if f.is_file()]
        num_files = len(file_paths)

        if num_files == 0:
            print(f"No files found in deployment folder: {deployment_folder}")
        else:
            print(f"✔️ Number of files found in deployment folder: {num_files}")
            files_data = []
            spinner = widgets.HTML("Preparing files to check <i class='fa fa-spinner fa-spin'></i>")
            display(spinner)

            for file_path in file_paths:
                try:
                    file_obj = Path(file_path).open("rb")
                    files_data.append(("files", (Path(file_path).name, file_obj, "application/x-netcdf")))
                except OSError as e:
                    print(f"📁 Error opening: {file_path} ({e})")
                    continue

            spinner.value = ""
            # Proceed to send files for checking
            params = {"dac": DEFAULT_DAC}
            spinner = widgets.HTML("Sending files to check <i class='fa fa-spinner fa-spin'></i>")
            display(spinner)
            # Send files to API
            try:
                response = requests.post(
                    FILE_CHECK_URL,
                    files=files_data,
                    params=params,
                    headers=HEADERS,
                    timeout=TIMEOUT,
                )
            except requests.RequestException as e:
                print(f"🚨 Request failed: {e}")

            else:
                try:
                    checked_result = response.json()['results']

                    for r in checked_result:
                        r["errors_messages"] = "\n".join(r.get("errors_messages", []))
                        r["warnings_messages"] = "\n".join(r.get("warnings_messages", []))
                    df = pd.DataFrame(checked_result, columns=[
                        "file",
                        "result",
                        "phase",
                        "errors_number",
                        "warnings_number",
                        "errors_messages",
                        "warnings_messages",
                    ])

                    # Display the results
                    with pd.option_context(
                        'display.max_columns', None,
                        'display.width', None,
                        'display.max_colwidth', None
                        ):
                        display(df)
                    # Optional: save to CSV
                    if deployments_files_check_result_file:
                        result_path = Path(deployments_files_check_result_file)
                        # Create the output folder if it does not exist yet
                        result_path.parent.mkdir(parents=True, exist_ok=True)
                        df.to_csv(result_path, index=False, encoding="utf-8")
                        print(f"✅ Results saved to: {result_path}")

                except Exception as e:
                    print(f"🚨 Error {e}")

            finally:
                # This block runs no matter what, ensuring file handles are closed
                spinner.value = ""
                for _, (_, file_obj, _) in files_data:
                    file_obj.close()
                display(Markdown("**✅ Check complete!**"))

# ===============================
deployment_folder_widget = widgets.Text(
    value="/path/to/your/data",  # default value; replace with your deployment folder
    placeholder="Enter path to deployment folder",
    description="Enter path to deployment folder:",
    layout=widgets.Layout(width="90%"),
    style={"description_width": "200px"},
)
run_button = widgets.Button(description="Run complete deployment Check", button_style="success")
output = widgets.Output()
# Link button to handler
run_button.on_click(on_run_button_clicked)
display(deployment_folder_widget, run_button, output)
