# ReproLab Demo

Welcome to ReproLab! This extension helps you make your research more reproducible.

## Features

- **Create Experiments**: Automatically save immutable snapshots of your code under `git` tags to preserve the **exact code and outputs**
- **Manage Dependencies**: Automatically gather and pin **exact package versions**, so that others can set up your environment with one command
- **Cache Data**: Call an external API or load a dataset manually only once; the caching function handles the rest
- **Archive Data**: The caching function can also preserve the compressed data in *AWS S3*, so you always know which data was used and make fewer API calls
- **Publishing guide**: The reproducibility checklist and automated generation of a reproducibility package make publishing to platforms such as Zenodo straightforward
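ReproLab's own dependency-gathering code is not part of this demo. The sketch below illustrates what "pinning exact package versions" means, using only the standard library (`importlib.metadata`, Python 3.8+) to turn package names into `name==version` lines. The helper `pin_requirements` is hypothetical and is not the extension's API.

```python
from importlib import metadata


def pin_requirements(packages):
    """Return (pins, missing): 'name==version' lines for installed packages,
    plus the names that are not installed in the current environment."""
    pins, missing = [], []
    for name in packages:
        try:
            pins.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pins, missing


pins, not_installed = pin_requirements(["pandas", "requests", "surely-not-installed-pkg"])
print("\n".join(pins))  # e.g. write these lines to a requirements.txt
print("Not installed:", not_installed)
```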

## Getting Started

1. Use the sidebar to view ReproLab features
2. Create a virtual environment and pin your dependencies: go to the ReproLab section `Create reproducible environment`
3. Create an experiment to save your current state: go to the ReproLab section `Create experiment`
4. Archive your data for long-term storage: go to the ReproLab section `Demo` and try it out
5. Publish your work when ready, using the reproducibility checklist from the section `Reproducibility Checklist`

## Example Usage of the `persistio` Decorator

To cache and archive the datasets you use, whether they come from local files or from APIs, we developed a simple decorator. You put it on the function that loads your dataset, and it caches the result both locally and in the cloud. As a result, the dataset you used is archived, calls to external APIs are kept to a minimum, and you don't need to keep the original file around after the first run.

Here is an example using one of NASA's open APIs. If you want to try it yourself, copy the code. First, though, provide the bucket name, access key, and secret key in the `AWS S3 Configuration` section of the left-hand panel.

```python
import requests
import pandas as pd
from io import StringIO

# The two lines below are all that you need to add
from reprolab.experiment import persistio
@persistio()
def get_exoplanets_data_from_nasa():
    url = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

    query = """
    SELECT TOP 10
        pl_name AS planet_name,
        hostname AS host_star,
        pl_orbper AS orbital_period_days,
        pl_rade AS planet_radius_earth,
        disc_year AS discovery_year
    FROM
        ps
    WHERE
        default_flag = 1
    """

    params = {
        "query": query,
        "format": "csv"
    }

    response = requests.get(url, params=params, timeout=60)

    if response.status_code == 200:
        df = pd.read_csv(StringIO(response.text))
        print(df)
    else:
        # Raise instead of reaching `return df` with df undefined
        raise RuntimeError(f"Error: {response.status_code} - {response.text}")
    return df

exoplanets_data = get_exoplanets_data_from_nasa()
```

If you run this cell twice, the logs show that the second run read the data from the compressed file in the cache. If you lose access to the local cache (e.g. by cloning the repository on a different device), `persistio` fetches the data from the cloud archive.
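`persistio`'s implementation is not part of this demo. The lookup order described above (local cache first, then the cloud archive, then the original function) can be sketched roughly as follows. Everything here is an illustrative assumption: `load_with_fallback` and `DictRemote` are hypothetical names, an in-memory dict stands in for the S3 bucket, and results are stored as gzip-compressed pickles.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


class DictRemote:
    """Hypothetical in-memory stand-in for a cloud bucket such as S3."""

    def __init__(self):
        self.objects = {}

    def get(self, key):
        return self.objects.get(key)

    def put(self, key, blob):
        self.objects[key] = blob


def load_with_fallback(key, loader, cache_dir, remote):
    """Return (data, source), trying the local cache, then the remote archive, then loader()."""
    local_file = Path(cache_dir) / f"{key}.pkl.gz"
    if local_file.exists():
        return pickle.loads(gzip.decompress(local_file.read_bytes())), "local"

    blob = remote.get(key)
    source = "remote"
    if blob is None:
        # Nothing archived yet: call the expensive loader once and archive its compressed result
        blob = gzip.compress(pickle.dumps(loader()))
        remote.put(key, blob)
        source = "loader"

    # Refill the local cache so the next call is served from disk
    local_file.parent.mkdir(parents=True, exist_ok=True)
    local_file.write_bytes(blob)
    return pickle.loads(gzip.decompress(blob)), source


def fetch():
    return {"rows": 10}  # stands in for the NASA API call


remote = DictRemote()
device_a, device_b = tempfile.mkdtemp(), tempfile.mkdtemp()
sources = [
    load_with_fallback("exoplanets", fetch, device_a, remote)[1],  # first run
    load_with_fallback("exoplanets", fetch, device_a, remote)[1],  # second run, same device
    load_with_fallback("exoplanets", fetch, device_b, remote)[1],  # fresh clone elsewhere
]
print(sources)  # ['loader', 'local', 'remote']
```

The second device starts with an empty cache directory, which shows why the archive matters: the data comes back from the remote copy without calling the loader again.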


For more information, visit our [documentation](https://github.com/your-repo/reprolab).

In [11]:
import requests
import pandas as pd
from io import StringIO

# The two lines below are all that you need to add
from reprolab.experiment import persistio
@persistio()
def get_exoplanets_data_from_nasa():
    url = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

    query = """
    SELECT TOP 10
        pl_name AS planet_name,
        hostname AS host_star,
        pl_orbper AS orbital_period_days,
        pl_rade AS planet_radius_earth,
        disc_year AS discovery_year
    FROM
        ps
    WHERE
        default_flag = 1
    """

    params = {
        "query": query,
        "format": "csv"
    }

    response = requests.get(url, params=params, timeout=60)

    if response.status_code == 200:
        df = pd.read_csv(StringIO(response.text))
        print(df)
    else:
        # Raise instead of reaching `return df` with df undefined
        raise RuntimeError(f"Error: {response.status_code} - {response.text}")
    return df

exoplanets_data = get_exoplanets_data_from_nasa()
exoplanets_data

|   | planet_name | host_star | orbital_period_days | planet_radius_earth | discovery_year |
|---|---|---|---|---|---|
| 0 | Kepler-6 b | Kepler-6 | 3.2347 | 14.616536 | 2009 |
| 1 | Kepler-491 b | Kepler-491 | 4.225385 | 8.92 | 2016 |
| 2 | Kepler-29 c | Kepler-29 | 13.28613 | 2.34 | 2011 |
| 3 | Kepler-257 b | Kepler-257 | 2.382667 | 2.61 | 2014 |
| 4 | Kepler-216 b | Kepler-216 | 7.693641 | 2.35 | 2014 |
| 5 | Kepler-32 c | Kepler-32 | 8.7522 | 2.0 | 2011 |
| 6 | Kepler-259 c | Kepler-259 | 36.924931 | 2.7 | 2014 |
| 7 | Kepler-148 c | Kepler-148 | 4.180043 | 3.6 | 2014 |
| 8 | Kepler-222 d | Kepler-222 | 28.081912 | 3.69 | 2014 |
| 9 | Kepler-179 b | Kepler-179 | 2.735926 | 1.64 | 2014 |


In [12]:
print(1*4)

4


In [13]:
!pip install pandas



In [14]:
# DO NOT CONTAINERISE
# parametrization - v95
# Run specific - fill in text to preset, keep empty to prompt
import os


param_radar = "HRW"  # DHL | HRW | DBL
param_start_date = (
    "2019-12-31T23:00+00:00"  # ISO 8601 (YYYY-MM-DDTHH:MM+HH:MM), e.g. 2019-12-31T23:00+00:00
)
param_end_date = (
    "2020-01-01T01:00+00:00"  # ISO 8601 (YYYY-MM-DDTHH:MM+HH:MM), e.g. 2020-01-01T01:00+00:00
)
param_concurrency = 5
param_interval_in_minutes = 60
# These flags control uploading and cleaning
# Upload results to S3 / MinIO?
param_upload_results = "True"
# Store and retrieve data from the public S3 / MinIO bucket
param_public_minio_data = 0
# Remove the KNMI input after converting it from KNMI format to ODIM format?
param_clean_knmi_input = "True"
# Remove the final polar volumes after producing a VP and/or RBC?
param_clean_pvol_output = "True"
# Remove the final vertical profiles after processing?
param_clean_vp_output = "True"
# Maximum number of timepoints to download and turn into polar volumes and vertical profiles
param_maximum_KNMI_files = 4

# Param
### User specific, not necessarily run specific.
#### Update: some of this user info should perhaps be hardcoded parameters:
#### not something we enter every time, but something set once at setup time.
param_user_number = "001"

#### Visualization parameters
param_dtype = "pvol"  # pvol | vp
param_country = "NL"  # only NL
param_year = "2023"  # as string, YYYY
param_month = "12"  # as string, mm
param_day = "31"  # as string, dd

# Perhaps a single prefix parameter works better than the separate date fields
param_prefix = "NL/DHL/2023/12/31"

# parameters
param_elevation = 3  #
param_param = "VRADH"  # radar quantity, e.g. VRADH, DBZH, TH, WRADH, RHOHV, DBZ
param_imtype = "ppi"  # Likely the only type so far: PVOL -> PPI, VP -> VPTS. More types may follow.
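The date and flag parameters above are plain strings, so a typo only surfaces deep inside the workflow. Below is a minimal sketch of an up-front check. It rests on several assumptions: the timestamps are ISO 8601 (as in the examples), KNMI publishes a file every 5 minutes (per the comment in the list-knmi-files cell), the start time lies on that 5-minute grid, and both ends of the range are inclusive. `expected_timepoints` is a hypothetical helper, and `str2bool` mirrors the helper defined in the converter cells.

```python
from datetime import datetime, timedelta

KNMI_CADENCE = timedelta(minutes=5)  # assumption: one KNMI file every 5 minutes


def str2bool(value):
    """Interpret string flags such as "True" or "0" (mirrors the converter cells)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("yes", "true", "t", "y", "1"):
        return True
    if text in ("no", "false", "f", "n", "0"):
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


def expected_timepoints(start_iso, end_iso, interval_minutes):
    """Count 5-minute timestamps in [start, end] whose minute lies on the interval grid.

    Assumes start_iso is itself on the 5-minute grid.
    """
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    if end < start:
        raise ValueError(f"End date {end_iso} is before start date {start_iso}")
    count, current = 0, start
    while current <= end:
        # Same rule as the list-knmi-files cell: minute in range(0, 60, interval)
        if current.minute % interval_minutes == 0:
            count += 1
        current += KNMI_CADENCE
    return count


n_timepoints = expected_timepoints("2019-12-31T23:00+00:00", "2020-01-01T01:00+00:00", 60)
print(n_timepoints, str2bool("True"))
```

With the default values (23:00 to 01:00, 60-minute interval), this predicts 3 timepoints, within `param_maximum_KNMI_files = 4`.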

In [15]:
# DO NOT CONTAINERISE
# Get secrets; if they don't exist, set them
from SecretsProvider import SecretsProvider
from getpass import getpass


secrets_provider = SecretsProvider(input_func=getpass)
secret_key_knmi_api = secrets_provider.get_secret("secret_knmi_api_key")
secret_minio_access_key = secrets_provider.get_secret(
    "secret_minio_access_key"
)
secret_minio_secret_key = secrets_provider.get_secret(
    "secret_minio_secret_key"
)

ModuleNotFoundError: No module named 'SecretsProvider'

In [None]:
# DO NOT CONTAINERISE
# Use this cell to reset existing keys
from SecretsProvider import SecretsProvider
from getpass import getpass


secrets_provider = SecretsProvider(input_func=getpass)
secrets_provider.set_secret("secret_knmi_api_key")
secrets_provider.set_secret("secret_minio_access_key")
secrets_provider.set_secret("secret_minio_secret_key")
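The secrets cell above fails with `ModuleNotFoundError` because `SecretsProvider` is not installed in this environment. If you only need something that lets the notebook run, a hypothetical stand-in could reuse the interface these cells call: `SecretsProvider(input_func=...)`, `get_secret(name)`, and `set_secret(name)`. This is a sketch, not the original module. It checks an upper-cased environment variable first (an assumption), falls back to prompting, and keeps secrets only in memory for the current session.

```python
import os
from getpass import getpass


class SecretsProvider:
    """Hypothetical stand-in with the interface used above; NOT the original module."""

    def __init__(self, input_func=getpass):
        self.input_func = input_func
        self._secrets = {}  # kept in memory only, lost when the kernel restarts

    def get_secret(self, name):
        """Return a cached secret, else the environment variable NAME, else prompt once."""
        if name not in self._secrets:
            value = os.environ.get(name.upper())
            if value is None:
                value = self.input_func(f"{name}: ")
            self._secrets[name] = value
        return self._secrets[name]

    def set_secret(self, name):
        """Prompt for a new value and overwrite the cached one."""
        self._secrets[name] = self.input_func(f"New value for {name}: ")
        return self._secrets[name]


# Non-interactive demo: record how often the user would be prompted
prompts = []


def fake_input(prompt):
    prompts.append(prompt)
    return "abc123"


provider = SecretsProvider(input_func=fake_input)
first = provider.get_secret("secret_demo_only_key")
second = provider.get_secret("secret_demo_only_key")  # served from memory, no new prompt
```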

In [None]:
# DO NOT CONTAINERISE
# configuration - v95
import os
import pathlib

conf_minio_user_bucket_name = "naa-vre-user-data"  # the user bucket name
conf_minio_tutorial_prefix = "ravl-tutorial"
conf_minio_public_bucket_name = "naa-vre-public"  # the public bucket name
conf_minio_public_root_prefix = "vl-vol2bird"
conf_minio_public_conf_prefix = "vl-vol2bird/conf"
conf_minio_public_conf_radar_db_object_name = (
    "vl-vol2bird/conf/OPERA_RADARS_DB.json"
)
conf_minio_endpoint = "scruffy.lab.uvalight.net:9000"

### Directories
conf_local_root = "/tmp/data"
conf_local_knmi = "/tmp/data/knmi"
conf_local_odim = "/tmp/data/odim"
conf_local_vp = "/tmp/data/vp"
conf_local_ppi = "/tmp/data/ppi"
conf_local_vpts = "/tmp/data/vpts"
conf_local_conf = "/tmp/data/conf"
conf_local_radar_db = "/tmp/data/conf/OPERA_RADARS_DB.json"
conf_local_visualization_input = "/tmp/data/visualizations/input"
conf_local_visualization_output = "/tmp/data/visualizations/output"

conf_pvol_output_prefix = "pvol"
conf_vp_output_prefix = "vp"
conf_ppi_output_prefix = "ppi"
conf_vpts_output_prefix = "vpts"
conf_user_directory = "user"

# radar configuration for the KNMI API
# Written in long format without line breaks to prevent
# the code analyzer from yielding an error.
# datasetName, datasetVersion, api_url, radar code (odim)
conf_herwijnen = [
    "radar_volume_full_herwijnen",
    1.0,
    "https://api.dataplatform.knmi.nl/open-data/v1/datasets/radar_volume_full_herwijnen/versions/1.0/files",
    "NL/HRW",
]
conf_denhelder = [
    "radar_volume_full_denhelder",
    2.0,
    "https://api.dataplatform.knmi.nl/open-data/v1/datasets/radar_volume_denhelder/versions/2.0/files",
    "NL/DHL",
]
conf_radars = {
    "hrw": conf_herwijnen,
    "herwijnen": conf_herwijnen,
    "dhl": conf_denhelder,
    "den helder": conf_denhelder,
}
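`param_radar` accepts `DHL | HRW | DBL`, but `conf_radars` only defines Herwijnen and Den Helder. For `DBL`, `conf_radars.get(...)` therefore returns `None`, and the tuple unpacking in the later cells fails with an unhelpful `TypeError`. A small lookup helper makes the failure explicit. It is shown here with a trimmed copy of the configuration above, and `get_radar_config` is a hypothetical name.

```python
def get_radar_config(radar, radars):
    """Return (datasetName, datasetVersion, api_url, odim_code) for a radar name or code."""
    key = radar.strip().lower()
    if key not in radars:
        known = ", ".join(sorted(radars))
        raise KeyError(f"Unknown radar {radar!r}; configured radars: {known}")
    return tuple(radars[key])


# Trimmed copy of conf_herwijnen / conf_radars from the configuration cell
demo_herwijnen = [
    "radar_volume_full_herwijnen",
    1.0,
    "https://api.dataplatform.knmi.nl/open-data/v1/datasets/radar_volume_full_herwijnen/versions/1.0/files",
    "NL/HRW",
]
demo_radars = {"hrw": demo_herwijnen, "herwijnen": demo_herwijnen}

dataset_name, dataset_version, api_url, odim_code = get_radar_config("HRW", demo_radars)
try:
    get_radar_config("DBL", demo_radars)
except KeyError as err:
    print(err)  # names the configured radars instead of failing on None
```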

In [None]:
# DO NOT CONTAINERISE - v60
# Upload the initial resource (the common conf file) to MinIO if it is missing
debug = False
from minio import Minio, S3Error
import os
import pathlib

minioClient = Minio(
    endpoint=conf_minio_endpoint,
    access_key=secret_minio_access_key,
    secret_key=secret_minio_secret_key,
    secure=True,
)
# Stat the object to see whether it is there. We could also wrap fget in a try/except,
# but stat is the more canonical check.
# Paths are passed as POSIX strings because stat_object does not accept Path objects.
try:
    radar_db_stat = minioClient.stat_object(
        bucket_name=pathlib.Path(conf_minio_public_bucket_name).as_posix(),
        object_name=pathlib.Path(
            conf_minio_public_conf_radar_db_object_name
        ).as_posix(),
    )
    print(
        f"Reference file found [{pathlib.Path(conf_minio_public_bucket_name).as_posix()}]/{pathlib.Path(conf_minio_public_conf_radar_db_object_name).as_posix()}"
    )
    print(radar_db_stat)

except S3Error as e:
    print(
        f"Failed to find reference file [{pathlib.Path(conf_minio_public_bucket_name).as_posix()}]/{pathlib.Path(conf_minio_public_conf_radar_db_object_name).as_posix()}"
    )
    if debug:
        print(f"{e=}")

    file_stat = os.stat("/home/jovyan/data/conf/OPERA_RADARS_DB.json")
    with open(
        "/home/jovyan/data/conf/OPERA_RADARS_DB.json", mode="rb"
    ) as file_data:
        put_result = minioClient.put_object(
            bucket_name=pathlib.Path(conf_minio_public_bucket_name).as_posix(),
            object_name=pathlib.Path(
                conf_minio_public_conf_radar_db_object_name
            ).as_posix(),
            data=file_data,
            length=file_stat.st_size,
        )
    print(f"{put_result=}")
    # put_object raises an exception if the upload fails, so without a try/except
    # a failed upload stops the cell before the message below is printed
    print("Successfully uploaded reference file")

In [None]:
# initializer
import pathlib

# Make directories on shared (local) storage
for local_dir in [
    conf_local_root,
    conf_local_knmi,
    conf_local_odim,
    conf_local_vp,
    conf_local_conf,
]:
    local_dir = pathlib.Path(local_dir)
    if not local_dir.exists():
        local_dir.mkdir(parents=True, exist_ok=True)
# Reference files
if not pathlib.Path(conf_local_radar_db).exists():
    from minio import Minio, S3Error

    minioClient = Minio(
        endpoint=conf_minio_endpoint,
        access_key=secret_minio_access_key,
        secret_key=secret_minio_secret_key,
        secure=True,
    )
    print(f"{conf_local_radar_db} not found, downloading")
    minioClient.fget_object(
        bucket_name=conf_minio_public_bucket_name,
        object_name=conf_minio_public_conf_radar_db_object_name,
        file_path=conf_local_radar_db,
    )

# Now produce a variable which acts as a marker for the workflow manager
# We can then drag a line from the configuration / initializer
# and time the start of the rest of the workflow
# If you decide to make different sets of configurations, you can store them
# and decide per workflow which config to attach
init_complete = "Yes"  # Can't send a bool
print("Finished initialization")

In [None]:
# list-knmi-files
"""
Consume the dummy variable from the config cell to signal the workflow start.
Strings are passed between cells inconsistently:
the string "Yes" may arrive as '"Yes"'.
To keep the extra quotes from breaking the check,
we strip the quotes from init_complete before
testing whether it equals "Yes".
"""
# Libraries
import requests


def validate_api_errors():
    if api_response.status_code >= 400:
        # Use .text rather than .json(): error bodies are not guaranteed to be JSON
        raise ValueError(
            f"API {api_response.url} returned an error status code: {api_response.status_code}. {api_response.text=}"
        )


def validate_number_of_KNMI_files():
    if len(dataset_files) > param_maximum_KNMI_files:
        raise ValueError(
            f"{len(dataset_files)} KNMI files were found to download, but {param_maximum_KNMI_files=}."
            f"\n The data was retrieved with the following parameters:"
            f"\n {param_start_date=} \n {param_end_date=} \n {param_interval_in_minutes=}"
            f"\n Increase {param_maximum_KNMI_files=}, decrease the time range, or increase the interval."
        )


# Strip any extra quotes
init_complete = init_complete.replace("'", "")
init_complete = init_complete.replace('"', "")
if init_complete == "Yes":
    print("Workflow configuration successful")
else:
    print("Workflow configuration was not complete, exiting")
    import sys

    sys.exit(1)

# Notes:
# Timestamps in iso8601
# 2020-01-01T00:00+00:00

# configure
start_ts = param_start_date
end_ts = param_end_date
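radar_config = conf_radars.get(param_radar.lower())
if radar_config is None:
    raise ValueError(
        f"Unknown radar {param_radar!r}; configured radars: {sorted(conf_radars)}"
    )
datasetName, datasetVersion, api_url, _ = radar_config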
params = {
    "datasetName": datasetName,
    "datasetVersion": datasetVersion,
    "maxKeys": 10,
    "sorting": "asc",
    "orderBy": "created",
    "begin": start_ts,
    "end": end_ts,
}
# Request a response from the KNMI servers
# and follow the next-page tokens until none is returned
dataset_files = []
while True:
    api_response = requests.get(
        url=api_url,
        headers={"Authorization": secret_key_knmi_api},
        params=params,
    )
    validate_api_errors()

    api_response_json = api_response.json()
    dset_files = api_response_json.get("files")

    dset_files = [list(dset_file.values()) for dset_file in dset_files]
    dataset_files += dset_files
    nextPageToken = api_response_json.get("nextPageToken")
    if not nextPageToken:
        break
    else:
        params.update({"nextPageToken": nextPageToken})

# KNMI outputs a file every 5 minutes; a coarser interval (e.g. 15 or 60 minutes)
# is a lighter hit on downloads and processing.
# Quick and dirty way to keep only the files on the param_interval_in_minutes grid.
# Check whether the API can filter for this on its end; if not, this is fine.
filtered_list = []
interval_list = list(range(0, 60, param_interval_in_minutes))
for dataset_file in dataset_files:
    minute = int(dataset_file[0].split("_")[-1].split(".")[0][-2:])
    if minute in interval_list:
        filtered_list.append(dataset_file)

dataset_files = filtered_list

validate_number_of_KNMI_files()

print(f"Found {len(dataset_files)} files")
print(dataset_files)
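The cell above does two things: it follows `nextPageToken` until the API stops returning one, and it keeps only files whose minute falls on the `param_interval_in_minutes` grid. Both steps can be isolated into pure functions that are easy to test without network access. In this sketch, `iterate_pages` and `filter_by_interval` are hypothetical names and the fake page dictionary stands in for the KNMI API. The filenames are assumed examples of the shape the cell parses. Entries are read by an assumed `"filename"` key rather than by dictionary value order.

```python
def iterate_pages(fetch_page):
    """Yield file entries from a token-paginated listing.

    fetch_page(token) returns a dict with "files" and, if more pages exist,
    "nextPageToken"; token is None for the first request.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("files", [])
        token = page.get("nextPageToken")
        if not token:
            break


def filter_by_interval(filenames, interval_minutes):
    """Keep files whose trailing YYYYMMDDHHMM timestamp has a minute on the interval grid."""
    allowed = set(range(0, 60, interval_minutes))
    kept = []
    for name in filenames:
        stamp = name.split("_")[-1].split(".")[0]  # e.g. "201912312305"
        if int(stamp[-2:]) in allowed:
            kept.append(name)
    return kept


# Fake two-page API response (assumed filename shape), keyed by page token
fake_pages = {
    None: {
        "files": [
            {"filename": "RAD_NL52_VOL_NA_201912312300.h5"},
            {"filename": "RAD_NL52_VOL_NA_201912312305.h5"},
        ],
        "nextPageToken": "page-2",
    },
    "page-2": {"files": [{"filename": "RAD_NL52_VOL_NA_202001010000.h5"}]},
}

names = [entry["filename"] for entry in iterate_pages(fake_pages.get)]
hourly = filter_by_interval(names, 60)
print(hourly)
```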

In [None]:
# Download-KNMI
##libraries
import requests
from pathlib import Path
import os

# Changes per 16-11-2023
# Test if we are working with a one element nested list
dataset_files
n_files = len(dataset_files)
print(f"Starting download of {n_files} files.")
_, _, api_url, radar_code = conf_radars.get(param_radar.lower())
knmi_pvol_paths = []
for idx, dataset_file in enumerate(dataset_files, start=1):
    filename = dataset_file[0]
    fname_parts = filename.split("_")
    fname_date_part = fname_parts[-1].split(".")[0]
    year = fname_date_part[0:4]
    month = fname_date_part[4:6]
    day = fname_date_part[6:8]
    p = Path(f"{conf_local_knmi}/{radar_code}/{year}/{month}/{day}/{filename}")
    knmi_pvol_paths.append(str(p))

    if not p.exists():
        print(f"Downloading file {idx}/{n_files}")
        endpoint = f"{api_url}/{filename}/url"
        get_file_response = requests.get(
            endpoint, headers={"Authorization": secret_key_knmi_api}
        )
        # Fail early on API errors instead of writing an error page to disk
        get_file_response.raise_for_status()
        download_url = get_file_response.json().get("temporaryDownloadUrl")
        dataset_file_response = requests.get(download_url)
        dataset_file_response.raise_for_status()
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(dataset_file_response.content)
    else:
        print(f"{p} already exists, skipping")
print(knmi_pvol_paths)
print("Finished downloading files")
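The download cell stores each file under `<conf_local_knmi>/<radar code>/YYYY/MM/DD/<filename>`. The converter cell then derives the ODIM path with `knmi_path.replace("knmi", "odim")`, which would also rewrite any other `knmi` substring in the path. Both steps can be sketched with `pathlib` as follows. `knmi_local_path` and `to_odim_path` are hypothetical helpers, and the filename is an assumed example.

```python
from pathlib import Path


def knmi_local_path(root, radar_code, filename):
    """Return root/radar_code/YYYY/MM/DD/filename for a KNMI file ending in _YYYYMMDDHHMM.h5."""
    stamp = filename.split("_")[-1].split(".")[0]
    year, month, day = stamp[0:4], stamp[4:6], stamp[6:8]
    return Path(root, radar_code, year, month, day, filename)


def to_odim_path(knmi_path, knmi_root, odim_root):
    """Mirror a path from the KNMI tree into the ODIM tree, replacing only the root."""
    return Path(odim_root) / Path(knmi_path).relative_to(knmi_root)


knmi_path = knmi_local_path("/tmp/data/knmi", "NL/HRW", "RAD_NL52_VOL_NA_201912312300.h5")
odim_path = to_odim_path(knmi_path, "/tmp/data/knmi", "/tmp/data/odim")
print(knmi_path.as_posix())
print(odim_path.as_posix())
```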

In [None]:
# KNMI-to-ODIM-converter
"""
Notes:
Still to do: upload the PVOLs from this stage, and add an option to remove
the PVOL files at this stage.
Warning: removing the PVOLs at this stage breaks the VP / RBC generation.
We could introduce a flag that RBC and VP check to see whether the PVOLs were 'supposed' to be removed.
If that flag is set, abort: there 'shouldn't' be any input files then.
"""
import subprocess
import pathlib
import h5py
import json
import sys
import shutil


# from typing import List, Object
import math


def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    elif v.lower() in ("no", "false", "f", "n", "0"):
        return False
    else:
        raise ValueError(f"Cannot interpret {v!r} as a boolean")


class FileTranslatorFileTypeError(LookupError):
    """raise this when there's a filetype mismatch derived from h5 file"""


def load_radar_db(radar_db_path):
    """Load and return the radar database

    Output dict sample (wmo code is used as key):
    {
        11038: {'number': '1209', 'country': 'Austria', 'countryid': 'LOWM41', 'oldcountryid': 'OS41', 'wmocode': '11038', 'odimcode': 'atrau', 'location': 'Wien/Schwechat', 'status': '1', 'latitude': '48.074', 'longitude': '16.536', 'heightofstation': ' ', 'band': 'C', 'doppler': 'Y', 'polarization': 'D', 'maxrange': '224', 'startyear': '1978', 'heightantenna': '224', 'diametrantenna': ' ', 'beam': ' ', 'gain': ' ', 'frequency': '5.625', 'single_rrr': 'Y', 'composite_rrr': 'Y', 'wrwp': 'Y'},
        11052: {'number': '1210', 'country': 'Austria', 'countryid': 'LOWM43', 'oldcountryid': 'OS43', 'wmocode': '11052', 'odimcode': 'atfel', 'location': 'Salzburg/Feldkirchen', 'status': '1', 'latitude': '48.065', 'longitude': '13.062', 'heightofstation': ' ', 'band': 'C', 'doppler': 'Y', 'polarization': 'D', 'maxrange': '224', 'startyear': '1992', 'heightantenna': '581', 'diametrantenna': ' ', 'beam': ' ', 'gain': ' ', 'frequency': '5.6', 'single_rrr': 'Y', 'composite_rrr': ' ', 'wrwp': ' '},
        ...
    }
    """
    with open(radar_db_path, mode="r") as f:
        radar_db_json = json.load(f)
    radar_db = {}
    # Reorder list to a usable dict with sub dicts which we can search with wmo codes
    for radar_dict in radar_db_json:
        try:
            wmo_code = int(radar_dict.get("wmocode"))
            radar_db.update({wmo_code: radar_dict})
        except Exception:  # Happens when there is for ex. no wmo code.
            pass
    return radar_db


def translate_wmo_odim(radar_db, wmo_code):
    """Translate a WMO code [int] into the upper-case ODIM code from the radar database."""
    if not isinstance(wmo_code, int):
        raise ValueError("Expecting a wmo_code [int]")
    odim_code = (
        radar_db.get(wmo_code).get("odimcode").upper().strip()
    )  # Apparently, people sometimes forget to remove whitespace..
    return odim_code


def extract_wmo_code(in_path):
    with h5py.File(in_path, mode="r") as f:
        # DWD Specific
        # Main attributes
        what = f["what"].attrs
        # Source block
        source = what.get("source")
        source = source.decode("utf-8")
        # Determine if we are dealing with a WMO code or with an ODIM code set
        # Example from Germany where source block is set as WMO
        # what/source: "WMO:10103"
        # Example from The Netherlands where source block is set as a combination of ODIM and various codes
        # what/source: RAD:NL52,NOD:nlhrw,PLC:Herwijnen
        source_list = source.split(sep=",")
    wmo_code = [string for string in source_list if "WMO" in string]
    # Determine if we had exactly one WMO hit
    if len(wmo_code) == 1:
        wmo_code = wmo_code[0]
        wmo_code = wmo_code.replace("WMO:", "")
    # No WMO code found, most likely dealing with a Dutch radar
    elif len(wmo_code) == 0:
        rad_str = [string for string in source_list if "RAD" in string]

        if len(rad_str) == 1:
            rad_str = rad_str[0]
        else:
            print(
                "Something went wrong with determining the rad_str and it wasn't WMO either, exiting"
            )
            sys.exit(1)
        # Split the rad_str
        rad_str_split = rad_str.split(":")
        # [0] = RAD, [1] = rad code
        rad_code = rad_str_split[1]

        rad_codes = {"NL52": "6356", "NL51": "6234", "NL50": "6260"}

        wmo_code = rad_codes.get(rad_code)
    return int(wmo_code)


def translate_knmi_filename(in_path_h5):
    wmo_code = extract_wmo_code(in_path_h5)
    odim_code = translate_wmo_odim(radar_db, wmo_code)
    with h5py.File(in_path_h5, mode="r") as f:
        what = f["what"].attrs
        # Date block
        date = what.get("date")
        date = date.decode("utf-8")
        # Time block
        time = what.get("time")
        # time = f['dataset1/what'].attrs['endtime']
        time = time.decode("utf-8")
        hh = time[:2]
        mm = time[2:4]
        ss = time[4:]
        time = time[:-2]  # Do not include seconds
        # File type
        filetype = what.get("object")
        filetype = filetype.decode("utf-8")
        if filetype != "PVOL":
            raise FileTranslatorFileTypeError("File type was NOT pvol")
    name = [
        odim_code,
        filetype.lower(),
        date + "T" + time,
        str(wmo_code) + ".h5",
    ]
    ibed_fname = "_".join(name)
    return ibed_fname


def knmi_to_odim(in_fpath, out_fpath):
    """
    Converter usage:
    Usage: KNMI_vol_h5_to_ODIM_h5 ODIM_file.h5 KNMI_input_file.h5

    Returns out_fpath and returncode
    """
    converter = "/opt/radar/vol2bird/bin/./KNMI_vol_h5_to_ODIM_h5"
    command = [converter, out_fpath, in_fpath]
    proc = subprocess.run(command, stderr=subprocess.PIPE)
    output = proc.stderr.decode("utf-8")
    returncode = int(proc.returncode)
    return (out_fpath, returncode, output)


def get_pvol_storage_path(relative_path: str = "") -> pathlib.Path:
    if param_public_minio_data:
        return (
            pathlib.Path(conf_minio_public_root_prefix)
            .joinpath(conf_minio_tutorial_prefix)
            .joinpath(conf_pvol_output_prefix)
            .joinpath(relative_path)
        )
    else:
        return (
            pathlib.Path(conf_minio_tutorial_prefix)
            .joinpath(conf_user_directory + param_user_number)
            .joinpath(conf_pvol_output_prefix)
            .joinpath(relative_path)
        )


print(f"{knmi_pvol_paths=}")
odim_pvol_paths = []
radar_db = load_radar_db(conf_local_radar_db)
for knmi_path in knmi_pvol_paths:
    out_path_pvol_odim = pathlib.Path(knmi_path.replace("knmi", "odim"))
    print(f"{knmi_path=}")
    print(f"{out_path_pvol_odim=}")
    if not out_path_pvol_odim.parent.exists():
        out_path_pvol_odim.parent.mkdir(parents=True, exist_ok=False)
    converter_results = knmi_to_odim(
        in_fpath=str(knmi_path), out_fpath=str(out_path_pvol_odim)
    )
    print(f"{converter_results=}")
    if str2bool(param_clean_knmi_input):
        pathlib.Path(knmi_path).unlink()
        if not any(pathlib.Path(knmi_path).parent.iterdir()):
            pathlib.Path(knmi_path).parent.rmdir()
    # Determine name for our convention
    ibed_pvol_name = translate_knmi_filename(in_path_h5=out_path_pvol_odim)
    out_path_pvol_odim_tce = pathlib.Path(out_path_pvol_odim).parent.joinpath(
        ibed_pvol_name
    )
    shutil.move(src=out_path_pvol_odim, dst=out_path_pvol_odim_tce)
    odim_pvol_paths.append(out_path_pvol_odim_tce)

print(f"{odim_pvol_paths=}")
if str2bool(param_upload_results):
    # Minio version
    from minio import Minio, S3Error

    minioClient = Minio(
        endpoint=conf_minio_endpoint,
        access_key=secret_minio_access_key,
        secret_key=secret_minio_secret_key,
        secure=True,
    )
    print(f"Uploading results to {get_pvol_storage_path()}")
    for odim_pvol_path in odim_pvol_paths:
        odim_pvol_path = pathlib.Path(odim_pvol_path)
        local_pvol_storage = pathlib.Path(conf_local_odim)
        relative_path = odim_pvol_path.relative_to(local_pvol_storage)
        remote_odim_pvol_path = get_pvol_storage_path(relative_path)
        # Check whether the object already exists; stat_object takes
        # bucket_name/object_name and raises S3Error when the object is missing
        exists = False
        try:
            _ = minioClient.stat_object(
                bucket_name=(
                    conf_minio_public_bucket_name
                    if param_public_minio_data
                    else conf_minio_user_bucket_name
                ),
                object_name=remote_odim_pvol_path.as_posix(),
            )
            exists = True
        except S3Error:
            pass
        if not exists:
            print(f"Uploading {odim_pvol_path} to {remote_odim_pvol_path}")
            with open(odim_pvol_path, mode="rb") as file_data:
                file_stat = os.stat(odim_pvol_path)
                minioClient.put_object(
                    bucket_name=(
                        conf_minio_public_bucket_name
                        if param_public_minio_data
                        else conf_minio_user_bucket_name
                    ),
                    object_name=remote_odim_pvol_path.as_posix(),
                    data=file_data,
                    length=file_stat.st_size,
                )
        else:
            print(f"{remote_odim_pvol_path} exists, skipping ")
    print("Finished uploading results")
# Cast to strings so the JSON serializer does not break
odim_pvol_paths = [path.as_posix() for path in odim_pvol_paths]
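The converter names each polar volume `{ODIM}_pvol_{YYYYMMDD}T{HHMM}_{WMO}.h5`. It takes the WMO code either from a `WMO:` entry in `what/source` or, for Dutch radars, from the `RAD:` code via a small lookup table. The next cell then derives the vertical-profile path from that name. The string logic can be sketched without `h5py` as below. The function names are hypothetical, the lookup table is copied from the cell above, and the inputs are example values rather than real file contents.

```python
import re
from pathlib import Path

RAD_TO_WMO = {"NL52": 6356, "NL51": 6234, "NL50": 6260}  # copied from extract_wmo_code


def parse_wmo_code(source):
    """Return the WMO code from an ODIM what/source string such as 'WMO:10103'."""
    fields = dict(item.split(":", 1) for item in source.split(",") if ":" in item)
    if "WMO" in fields:
        return int(fields["WMO"])
    if fields.get("RAD") in RAD_TO_WMO:  # Dutch radars: map the RAD code to a WMO code
        return RAD_TO_WMO[fields["RAD"]]
    raise LookupError(f"No WMO code or known RAD code in {source!r}")


def pvol_name(odim_code, date, time, wmo_code, filetype="PVOL"):
    """Build ODIM_pvol_YYYYMMDDTHHMM_WMO.h5 from what/date and what/time (HHMMSS)."""
    if filetype != "PVOL":
        raise ValueError("File type was NOT pvol")
    return "_".join([odim_code, filetype.lower(), f"{date}T{time[:-2]}", f"{wmo_code}.h5"])


def vp_relative_path(pvol_filename, odim_code, version="v0-3-20"):
    """Map a PVOL filename to the ODIM/YYYY/MM/DD/..._vp_..._<version>.h5 layout of the VP cell."""
    path = Path(pvol_filename)
    vp_name = "_".join([path.stem.replace("pvol", "vp"), version]) + path.suffix
    stamp = re.search(r"[0-9]{8}", vp_name)[0]  # first 8-digit run: YYYYMMDD
    return Path(odim_code[:2], odim_code[2:], stamp[:4], stamp[4:6], stamp[6:8], vp_name)


wmo = parse_wmo_code("RAD:NL52,NOD:nlhrw,PLC:Herwijnen")
name = pvol_name("NLHRW", "20191231", "230000", wmo)
vp_path = vp_relative_path(name, "NLHRW")
print(wmo, name, vp_path.as_posix())
```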

In [None]:
# PVOL-VP-converter
import json
import pathlib
import re
import subprocess
import sys

import h5py
import pandas as pd


def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    elif v.lower() in ("no", "false", "f", "n", "0"):
        return False
    else:
        raise ValueError(f"Cannot interpret {v!r} as a boolean")


def load_radar_db(radar_db_path):
    """Load and return the radar database
    Output dict sample (wmo code is used as key):
    {
        11038: {'number': '1209', 'country': 'Austria', 'countryid': 'LOWM41', 'oldcountryid': 'OS41', 'wmocode': '11038', 'odimcode': 'atrau', 'location': 'Wien/Schwechat', 'status': '1', 'latitude': '48.074', 'longitude': '16.536', 'heightofstation': ' ', 'band': 'C', 'doppler': 'Y', 'polarization': 'D', 'maxrange': '224', 'startyear': '1978', 'heightantenna': '224', 'diametrantenna': ' ', 'beam': ' ', 'gain': ' ', 'frequency': '5.625', 'single_rrr': 'Y', 'composite_rrr': 'Y', 'wrwp': 'Y'},
        11052: {'number': '1210', 'country': 'Austria', 'countryid': 'LOWM43', 'oldcountryid': 'OS43', 'wmocode': '11052', 'odimcode': 'atfel', 'location': 'Salzburg/Feldkirchen', 'status': '1', 'latitude': '48.065', 'longitude': '13.062', 'heightofstation': ' ', 'band': 'C', 'doppler': 'Y', 'polarization': 'D', 'maxrange': '224', 'startyear': '1992', 'heightantenna': '581', 'diametrantenna': ' ', 'beam': ' ', 'gain': ' ', 'frequency': '5.6', 'single_rrr': 'Y', 'composite_rrr': ' ', 'wrwp': ' '},
        ...
    }
    """
    with open(radar_db_path, mode="r") as f:
        radar_db_json = json.load(f)
    radar_db = {}
    # Reorder list to a usable dict with sub dicts which we can search with wmo codes
    for radar_dict in radar_db_json:
        try:
            wmo_code = int(radar_dict.get("wmocode"))
            radar_db.update({wmo_code: radar_dict})
        except Exception:  # Happens when there is for ex. no wmo code.
            pass
    return radar_db


def translate_wmo_odim(radar_db, wmo_code):
    """Translate a WMO code [int] into the upper-case ODIM code from the radar database."""
    if not isinstance(wmo_code, int):
        raise ValueError("Expecting a wmo_code [int]")
    odim_code = (
        radar_db.get(wmo_code).get("odimcode").upper().strip()
    )  # Apparently, people sometimes forget to remove whitespace..
    return odim_code


def extract_wmo_code(in_path):
    with h5py.File(in_path, "r") as f:
        # DWD Specific
        # Main attributes
        what = f["what"].attrs
        # Source block
        source = what.get("source")
        source = source.decode("utf-8")
        # Determine if we are dealing with a WMO code or with an ODIM code set
        # Example from Germany where source block is set as WMO
        # what/source: "WMO:10103"
        # Example from The Netherlands where source block is set as a combination of ODIM and various codes
        # what/source: RAD:NL52,NOD:nlhrw,PLC:Herwijnen
        source_list = source.split(sep=",")
    wmo_code = [string for string in source_list if "WMO" in string]
    # Determine if we had exactly one WMO hit
    if len(wmo_code) == 1:
        wmo_code = wmo_code[0]
        wmo_code = wmo_code.replace("WMO:", "")
    # No WMO code found, most likely dealing with a Dutch radar
    elif len(wmo_code) == 0:
        rad_str = [string for string in source_list if "RAD" in string]
        if len(rad_str) == 1:
            rad_str = rad_str[0]
        else:
            print(
                "Something went wrong with determining the rad_str and it wasn't WMO either, exiting"
            )
            sys.exit(1)
        # Split the rad_str
        rad_str_split = rad_str.split(":")
        # [0] = RAD, [1] = rad code
        rad_code = rad_str_split[1]
        rad_codes = {"NL52": "6356", "NL51": "6234", "NL50": "6260"}
        wmo_code = rad_codes.get(rad_code)
    return int(wmo_code)


def vol2bird(
    in_file,
    out_dir,
    radar_db,
    add_version=True,
    add_sector=False,
    overwrite=False,
):
    # Construct output file
    date_regex = "([0-9]{8})"
    if add_version:
        version = "v0-3-20"
        suffix = pathlib.Path(in_file).suffix
        in_file_name = pathlib.Path(in_file).name
        in_file_stem = pathlib.Path(in_file_name).stem
        #
        out_file_name = in_file_stem.replace("pvol", "vp")
        out_file_name = "_".join([out_file_name, version]) + suffix
        # odim = odim_code(out_file_name)
        wmo = extract_wmo_code(in_file)
        odim = translate_wmo_odim(radar_db, wmo)
        datetime = pd.to_datetime(re.search(date_regex, out_file_name)[0])
        ibed_path = "/".join(
            [
                odim[:2],
                odim[2:],
                str(datetime.year),
                str(datetime.month).zfill(2),
                str(datetime.day).zfill(2),
            ]
        )
        # check if we need to make this dir
        out_file = "/".join([out_dir, ibed_path, out_file_name])
        out_file_dir = pathlib.Path(out_file).parent
        if not out_file_dir.exists():
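            out_file_dir.mkdir(parents=True)
    else:
        # out_file is only built when add_version is True
        raise NotImplementedError("vol2bird() currently requires add_version=True")

    process = False
    if not overwrite:
        if not pathlib.Path(out_file).exists():
            process = True
        else:
            print(f"Not processing {out_file}: it exists and overwrite is set to {overwrite}")
    else:
        process = True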

    if process:
        command = ["vol2bird", in_file, out_file]
        result = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT
        )
    return [in_file, out_file]


def get_vp_storage_path(relative_path: str = "") -> pathlib.Path:
    if param_public_minio_data:
        return (
            pathlib.Path(conf_minio_public_root_prefix)
            .joinpath(conf_minio_tutorial_prefix)
            .joinpath(conf_vp_output_prefix)
            .joinpath(relative_path)
        )
    else:
        return (
            pathlib.Path(conf_minio_tutorial_prefix)
            .joinpath(conf_user_directory + param_user_number)
            .joinpath(conf_vp_output_prefix)
            .joinpath(relative_path)
        )


vertical_profile_paths = []
radar_db = load_radar_db(conf_local_radar_db)
# cast back to pathlib after deserializing
odim_pvol_paths = [pathlib.Path(path) for path in odim_pvol_paths]
for odim_pvol_path in odim_pvol_paths:
    pvol_path, vp_path = vol2bird(
        odim_pvol_path, conf_local_vp, radar_db, overwrite=False
    )
    vertical_profile_paths.append(vp_path)
print(vertical_profile_paths)

if str2bool(param_clean_pvol_output):
    print("Removing PVOL output from local storage")
    for pvol_path in odim_pvol_paths:
        pathlib.Path(pvol_path).unlink()
        if not any(pathlib.Path(pvol_path).parent.iterdir()):
            pathlib.Path(pvol_path).parent.rmdir()

if str2bool(param_upload_results):
    # Minio version
    from minio import Minio
    from minio.error import S3Error

    minioClient = Minio(
        endpoint=conf_minio_endpoint,
        access_key=secret_minio_access_key,
        secret_key=secret_minio_secret_key,
        secure=True,
    )
    bucket_name = (
        conf_minio_public_bucket_name
        if param_public_minio_data
        else conf_minio_user_bucket_name
    )
    print(f"Uploading results to {get_vp_storage_path()}")
    for vp_path in vertical_profile_paths:
        vp_path = pathlib.Path(vp_path)
        local_vp_storage = pathlib.Path(conf_local_vp)
        relative_path = vp_path.relative_to(local_vp_storage)
        remote_vp_path = get_vp_storage_path(relative_path)
        # Check whether the object already exists; stat_object raises S3Error if it does not
        try:
            minioClient.stat_object(
                bucket_name=bucket_name,
                object_name=remote_vp_path.as_posix(),
            )
            exists = True
        except S3Error:
            exists = False
        if not exists:
            print(f"Uploading {vp_path} to {remote_vp_path}")
            minioClient.fput_object(
                bucket_name=bucket_name,
                object_name=remote_vp_path.as_posix(),
                file_path=str(vp_path),
            )
        else:
            print(f"{remote_vp_path} exists, skipping")
    print("Finished uploading results")

if str2bool(param_clean_vp_output):
    print("Removing VP output from local storage")
    for vp_path in vertical_profile_paths:
        pathlib.Path(vp_path).unlink()
        if not any(pathlib.Path(vp_path).parent.iterdir()):
            pathlib.Path(vp_path).parent.rmdir()

PVOL_VP_converter_complete = 1
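The output-path rules inside `vol2bird` can be checked without running vol2bird or loading the radar database by isolating them in a pure function. The sketch below uses a hypothetical helper, `build_vp_path`. It assumes the ODIM code (an illustrative value such as `nldhl`) is already known, and that the first 8-digit run in the file name is the `YYYYMMDD` date, which is also what `date_regex` assumes above.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

VOL2BIRD_VERSION = "v0-3-20"  # version tag used in the notebook


def build_vp_path(out_dir: str, pvol_name: str, odim: str, add_version: bool = True) -> str:
    """Hypothetical helper: map a PVOL file name to its VP output path.

    Layout: <out_dir>/<odim[:2]>/<odim[2:]>/<YYYY>/<MM>/<DD>/<vp file name>
    """
    pvol = PurePosixPath(pvol_name)
    vp_name = pvol.stem.replace("pvol", "vp")
    if add_version:
        vp_name = f"{vp_name}_{VOL2BIRD_VERSION}"
    vp_name += pvol.suffix
    # The first 8-digit run in the name is taken as the date (YYYYMMDD)
    date = datetime.strptime(re.search(r"[0-9]{8}", vp_name).group(0), "%Y%m%d")
    return str(
        PurePosixPath(out_dir, odim[:2], odim[2:], f"{date:%Y}", f"{date:%m}", f"{date:%d}", vp_name)
    )
```

For example, `build_vp_path("/tmp/data/vp", "NLDHL_pvol_20231231T2200_6234.h5", "nldhl")` gives `/tmp/data/vp/nl/dhl/2023/12/31/NLDHL_vp_20231231T2200_6234_v0-3-20.h5`.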

In [None]:
# Retrieve PVOL (DEPRECATED)
from minio import Minio
import sys
import pathlib

if PVOL_VP_converter_complete:
    print("PVOL-VP-converter successful")
else:
    print("PVOL-VP-converter was not complete, exiting")
    sys.exit(1)

minioClient = Minio(
    endpoint=conf_minio_endpoint,
    access_key=secret_minio_access_key,
    secret_key=secret_minio_secret_key,
    secure=True,
)

# dtype = "pvol"
# country = "NL"
# radar = "DHL"
# year = "2023"
# month = "12"
# day = "31"
recursive = True

if param_dtype.lower() in ["pvol", "polar volume", "polarvolume"]:
    search_prefix = f"{conf_minio_tutorial_prefix}/{conf_user_directory+param_user_number}/{conf_pvol_output_prefix}/{param_country}/{param_radar}/{param_year}/{param_month}/{param_day}"
elif param_dtype.lower() in ["vp", "vertical profile", "verticalprofile"]:
    search_prefix = f"{conf_minio_tutorial_prefix}/{conf_user_directory+param_user_number}/{conf_vp_output_prefix}/{param_country}/{param_radar}/{param_year}/{param_month}/{param_day}"
else:
    print(f"{param_dtype} not understood")
    sys.exit(1)
print(f"{search_prefix=}")
# To be implemented:
# The code below works, and could later be used to filter on the parameters.
# This should be done after the demo version.
# start_after_prefix=f'{conf_minio_tutorial_prefix}/{conf_pvol_output_prefix}/{country}/{radar}/{year}/{month}/{day}/{country}{radar}_{dtype}_{year}{month}{day}T2200_6234.h5'
# print(f"{start_after_prefix=}")
# objects = minioClient.list_objects(bucket_name=conf_minio_user_bucket_name,
#                                   prefix=search_prefix,
#                                   recursive=recursive,
#                                   start_after=start_after_prefix
#                                  )
objects = minioClient.list_objects(
    bucket_name=(
        conf_minio_public_bucket_name
        if param_public_minio_data
        else conf_minio_user_bucket_name
    ),
    prefix=search_prefix,
    recursive=recursive,
)
local_file_paths = []
for obj in objects:
    obj_path = pathlib.Path(obj.object_name)
    local_file_path = f"{conf_local_visualization_input}/{obj_path.name}"
    print(f"Downloading {obj.object_name} to {local_file_path}")
    minioClient.fget_object(
        bucket_name=obj.bucket_name,
        object_name=obj.object_name,
        file_path=local_file_path,
    )
    # Record each downloaded file once
    local_file_paths.append(local_file_path)
print("Finished")

In [None]:
# S3 PVOL Downloader
# ---
# NaaVRE:
#  cell:
#   outputs:
#    - local_pvol_paths: List
# ...

# Code analyzer fix block. Assign empty strings to variables that should not be picked up by the analyzer.
minioClient = ""

# libraries
from minio import Minio
import pandas as pd
import pathlib


# functions
def get_pvol_storage_path(relative_path: str = "") -> pathlib.Path:
    if param_public_minio_data:
        return (
            pathlib.Path(conf_minio_public_root_prefix)
            .joinpath(conf_minio_tutorial_prefix)
            .joinpath(conf_pvol_output_prefix)
            .joinpath(relative_path)
        )
    else:
        return (
            pathlib.Path(conf_minio_tutorial_prefix)
            .joinpath(conf_user_directory + param_user_number)
            .joinpath(conf_pvol_output_prefix)
            .joinpath(relative_path)
        )


# main
psd = pd.to_datetime(param_start_date)
ped = pd.to_datetime(param_end_date)
minioClient = Minio(
    endpoint=conf_minio_endpoint,
    access_key=secret_minio_access_key,
    secret_key=secret_minio_secret_key,
    secure=True,
)

download_objs = []


# grab psd and ped, rework them to include a start_after prefix
psd_prefix = f"{get_pvol_storage_path()}/NL/{param_radar}/{psd.year}/{psd.month:02}/{psd.day:02}"
print(f"Parsing first prefix: {psd_prefix}")
psd_start_after_prefix = f"{psd_prefix}/NL{param_radar}_pvol_{psd.year}{psd.month:02}{psd.day:02}T{psd.hour:02}{psd.minute:02}"
print(f"Building a start-after prefix: {psd_start_after_prefix}")
psd_prefix_objs = minioClient.list_objects(
    bucket_name=(
        conf_minio_public_bucket_name
        if param_public_minio_data
        else conf_minio_user_bucket_name
    ),
    prefix=psd_prefix,
    start_after=psd_start_after_prefix,
    recursive=True,
)
psd_prefix_objs = list(psd_prefix_objs)
print(f"{psd_prefix_objs=}")
download_objs += psd_prefix_objs

# Middle prefixes: the whole days strictly between psd and ped, downloaded completely.
print("Determining middle prefixes...")
# One timestamp per calendar day from psd to ped (inclusive). Normalizing to midnight
# ensures the last day is included even when ped's time of day is earlier than psd's.
drange = pd.date_range(start=psd.normalize(), end=ped.normalize(), freq="D")
date_prefix_list = [
    f"{dstamp.year}/{dstamp.month:02}/{dstamp.day:02}" for dstamp in drange
]
unique_date_prefix_list = sorted(set(date_prefix_list))
# drop the first and last day; those are filtered on time separately
middle_prefixes = unique_date_prefix_list[1:-1]
# Now add the correct country, radar to the date prefixes
middle_prefixes = [
    f"{get_pvol_storage_path()}/NL/{param_radar}/{middle_prefix}"
    for middle_prefix in middle_prefixes
]
print(f"Parsing {len(middle_prefixes)} middle prefixes.")
for middle_prefix in middle_prefixes:
    print(f"Downloading {middle_prefix}")
    middle_prefix_objs = minioClient.list_objects(
        bucket_name=(
            conf_minio_public_bucket_name
            if param_public_minio_data
            else conf_minio_user_bucket_name
        ),
        prefix=middle_prefix,
        recursive=True,
    )
    download_objs += list(middle_prefix_objs)


# For the end date (ped), list_objects offers no 'end' bound analogous to start_after,
# so list the whole last day prefix and keep only the objects up to ped.
ped_prefix = f"{get_pvol_storage_path()}/NL/{param_radar}/{ped.year}/{ped.month:02}/{ped.day:02}"

ped_until_prefix = f"{ped_prefix}/NL{param_radar}_pvol_{ped.year}{ped.month:02}{ped.day:02}T{ped.hour:02}{ped.minute:02}"
print(f"Parsing last prefix:{ped_prefix}")
ped_until_datetimestr = (
    f"{ped.year}{ped.month:02}{ped.day:02}T{ped.hour:02}{ped.minute:02}"
)
ped_until_timestamp = pd.to_datetime(ped_until_datetimestr)
print(f"Building an end timestamp for object filtering: {ped_until_timestamp}")
ped_prefix_objs = minioClient.list_objects(
    bucket_name=(
        conf_minio_public_bucket_name
        if param_public_minio_data
        else conf_minio_user_bucket_name
    ),
    prefix=ped_prefix,
    recursive=True,
)
print(f"Filtering last prefix objects on timestamps")
ped_prefix_objs = list(ped_prefix_objs)
print(f"{ped_prefix_objs=}")
_ped_prefix_objs = []
for obj in ped_prefix_objs:
    fpath = pathlib.Path(obj._object_name)
    fname = fpath.name
    corad, dtype, datetimestr, radcode_suffix = fname.split("_")
    timestamp = pd.to_datetime(datetimestr)
    if timestamp <= ped_until_timestamp:
        _ped_prefix_objs.append(obj)
        download_objs.append(obj)

ped_prefix_objs = _ped_prefix_objs

# Ensure that download objects are not duplicated in case of single prefix searches
if psd_prefix == ped_prefix:
    # same prefix
    print("Single prefix filtering")
    psd_object_names = [
        psd_prefix_obj._object_name for psd_prefix_obj in psd_prefix_objs
    ]
    print(f"{psd_object_names=}")
    ped_object_names = [
        ped_prefix_obj._object_name for ped_prefix_obj in ped_prefix_objs
    ]
    print(f"{ped_object_names=}")
    intersect_object_names = [
        obj_name
        for obj_name in psd_object_names
        if obj_name in ped_object_names
    ]
    print(f"{intersect_object_names=}")
    # Reset download_objs list
    download_objs = []
    for psd_prefix_obj in psd_prefix_objs:
        if psd_prefix_obj._object_name in intersect_object_names:
            download_objs.append(psd_prefix_obj)


local_pvol_paths = []
for obj in download_objs:
    obj_path = pathlib.Path(obj._object_name)
    if param_public_minio_data:
        lab, workshop, dtype, country, radar, year, month, day, filename = (
            obj_path.parts
        )
    else:
        workshop, uname, dtype, country, radar, year, month, day, filename = (
            obj_path.parts
        )
    local_pvol_path = (
        f"{conf_local_odim}/{country}/{radar}/{year}/{month}/{day}/{filename}"
    )
    print(local_pvol_path)
    minioClient.fget_object(
        bucket_name=obj._bucket_name,
        object_name=obj._object_name,
        file_path=local_pvol_path,
    )
    local_pvol_paths.append(local_pvol_path)
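The PVOL downloader splits the requested window into three parts: a first day listed with `start_after`, whole middle days, and a last day filtered by timestamp. The minimal sketch below covers the two building blocks with hypothetical helpers, `day_prefixes` and `in_window`. It uses only the standard library instead of `pd.date_range`, and assumes file names of the form `NLDHL_pvol_20231231T2200_6234.h5`. Both ends are treated as inclusive at minute precision, which should match how `start_after` and the `<=` filter behave for these names.

```python
from datetime import datetime, timedelta


def day_prefixes(start: datetime, end: datetime) -> list[str]:
    """Hypothetical helper: one 'YYYY/MM/DD' prefix per calendar day, start to end inclusive."""
    day, last = start.date(), end.date()
    prefixes = []
    while day <= last:
        prefixes.append(f"{day:%Y/%m/%d}")
        day += timedelta(days=1)
    return prefixes


def in_window(file_name: str, start: datetime, end: datetime) -> bool:
    """Hypothetical helper: parse '<CC><RAD>_<dtype>_<YYYYMMDDTHHMM>_...' and test start <= t <= end."""
    timestamp = datetime.strptime(file_name.split("_")[2], "%Y%m%dT%H%M")
    return start <= timestamp <= end
```

With these helpers, `day_prefixes(start, end)[0]` and `[-1]` are the days that need time filtering, and `[1:-1]` are the middle days that can be downloaded whole.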

In [None]:
# S3 VP Downloader
# ---
# NaaVRE:
#  cell:
#   outputs:
#    - vp_paths: List
# ...

# Code analyzer fix block. Assign empty strings to variables that should not be picked up by the analyzer.
minioClient = ""

# libraries
from minio import Minio
import pandas as pd
import pathlib

# functions

# main
psd = pd.to_datetime(param_start_date)
ped = pd.to_datetime(param_end_date)
minioClient = Minio(
    endpoint=conf_minio_endpoint,
    access_key=secret_minio_access_key,
    secret_key=secret_minio_secret_key,
    secure=True,
)

# grab psd and ped, rework them to include a start_after prefix
psd_prefix = f"bwijers1@gmail.com/vp/NL/{param_radar}/{psd.year}/{psd.month:02}/{psd.day:02}"
psd_start_after_prefix = f"{psd_prefix}/NL{param_radar}_vp_{psd.year}{psd.month:02}{psd.day:02}T{psd.hour:02}{psd.minute:02}"
psd_prefix_objs = list(
    minioClient.list_objects(
        bucket_name=conf_minio_user_bucket_name,
        prefix=psd_prefix,
        start_after=psd_start_after_prefix,
        recursive=True,
    )
)

# For the end date (ped), list_objects offers no 'end' bound analogous to start_after,
# so list the whole last day prefix and keep only the objects up to ped.
ped_prefix = f"bwijers1@gmail.com/vp/NL/{param_radar}/{ped.year}/{ped.month:02}/{ped.day:02}"
ped_until_datetimestr = (
    f"{ped.year}{ped.month:02}{ped.day:02}T{ped.hour:02}{ped.minute:02}"
)
ped_until_timestamp = pd.to_datetime(ped_until_datetimestr)
ped_prefix_objs = []
for obj in minioClient.list_objects(
    bucket_name=conf_minio_user_bucket_name, prefix=ped_prefix, recursive=True
):
    fname = pathlib.Path(obj.object_name).name
    corad, dtype, datetimestr, radcode, v2b_version_suffix = fname.split("_")
    if pd.to_datetime(datetimestr) <= ped_until_timestamp:
        ped_prefix_objs.append(obj)

if psd_prefix == ped_prefix:
    # Start and end fall on the same day: keep only objects within both bounds,
    # otherwise every object of that day would be downloaded (some twice).
    ped_object_names = {obj.object_name for obj in ped_prefix_objs}
    download_objs = [
        obj for obj in psd_prefix_objs if obj.object_name in ped_object_names
    ]
else:
    # Note: days strictly between psd and ped are not listed in this cell.
    download_objs = psd_prefix_objs + ped_prefix_objs

vp_paths = []
for obj in download_objs:
    obj_path = pathlib.Path(obj._object_name)
    uname, dtype, country, radar, year, month, day, filename = obj_path.parts
    local_vp_path = (
        f"{conf_local_vp}/{country}/{radar}/{year}/{month}/{day}/{filename}"
    )
    print(local_vp_path)
    minioClient.fget_object(
        bucket_name=obj._bucket_name,
        object_name=obj._object_name,
        file_path=local_vp_path,
    )
    vp_paths.append(local_vp_path)
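Both downloaders unpack `obj_path.parts` into a fixed number of names, so they break whenever the remote prefix gains or loses a level. For example, the PVOL downloader needs a separate branch for public and user data. The local layout only uses the last six components (`<country>/<radar>/<YYYY>/<MM>/<DD>/<file>`), so a sketch can take those directly. `local_path_for` is a hypothetical helper, and the object keys in the example are illustrative.

```python
from pathlib import PurePosixPath


def local_path_for(object_name: str, local_root: str) -> str:
    """Hypothetical helper: keep <country>/<radar>/<YYYY>/<MM>/<DD>/<file> from an object key."""
    parts = PurePosixPath(object_name).parts
    if len(parts) < 6:
        raise ValueError(f"unexpected object key layout: {object_name}")
    # Re-root the last six key components under the local data directory
    return str(PurePosixPath(local_root, *parts[-6:]))
```

With this helper, `local_path_for(obj.object_name, conf_local_vp)` would work for both the 8-part VP keys and the 9-part PVOL keys.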

In [None]:
# S3 PPI Uploader

# Libraries
import pathlib

from minio import Minio

minioClient = Minio(
    endpoint=conf_minio_endpoint,
    access_key=secret_minio_access_key,
    secret_key=secret_minio_secret_key,
    secure=True,
)

for path in local_ppi_paths:
    print(path)
    # strip the leading "/tmp/data"
    obj_key = pathlib.Path(*pathlib.Path(path).parts[3:])
    obj_name = f"{conf_minio_tutorial_prefix}/{conf_user_directory + param_user_number}/{obj_key}"
    print(obj_name)
    minioClient.fput_object(
        bucket_name=conf_minio_user_bucket_name,
        object_name=obj_name,
        file_path=path,
    )
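The uploaders build object keys with `pathlib.Path(path).parts[3:]`, which drops the first three components of any path, even one not under `/tmp/data`. `relative_to` states the expected root explicitly and raises `ValueError` otherwise. The sketch below assumes `/tmp/data` is the local data root, as the "strip the leading /tmp/data" comment suggests. `object_name_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath

LOCAL_ROOT = "/tmp/data"  # assumed local data root


def object_name_for(local_path: str, *prefix: str) -> str:
    """Hypothetical helper: turn /tmp/data/<rest> into <prefix...>/<rest>."""
    # Raises ValueError if local_path is not under LOCAL_ROOT
    relative = PurePosixPath(local_path).relative_to(LOCAL_ROOT)
    return PurePosixPath(*prefix, relative).as_posix()
```

For the PPI uploader, the call would be `object_name_for(path, conf_minio_tutorial_prefix, conf_user_directory + param_user_number)`.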

In [None]:
# S3 VPTS Uploader
# Libraries
import pathlib

from minio import Minio

minioClient = Minio(
    endpoint=conf_minio_endpoint,
    access_key=secret_minio_access_key,
    secret_key=secret_minio_secret_key,
    secure=True,
)

for path in local_vpts_paths:
    # strip the leading "/tmp/data"
    print(path)
    obj_key = pathlib.Path(*pathlib.Path(path).parts[3:])
    obj_name = f"{conf_minio_tutorial_prefix}/{obj_key}"
    print(obj_name)
    minioClient.fput_object(
        bucket_name=conf_minio_user_bucket_name,
        object_name=obj_name,
        file_path=path,
    )

In [17]:
from reprolab.environment import create_new_venv
create_new_venv('my_venv')

[✔] Virtual environment 'my_venv' created at /home/koen-greuell/local_notebooks/my_venv
[✔] Pip upgraded
Collecting ipykernel
  Using cached ipykernel-6.30.1-py3-none-any.whl.metadata (6.2 kB)
Collecting boto3
  Using cached boto3-1.40.15-py3-none-any.whl.metadata (6.7 kB)
Collecting ipylab
  Using cached ipylab-1.1.0-py3-none-any.whl.metadata (6.7 kB)
Collecting pandas
  Using cached pandas-2.3.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (91 kB)
Collecting numpy
  Using cached numpy-2.3.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB)
Collecting xarray
  Using cached xarray-2025.8.0-py3-none-any.whl.metadata (12 kB)
Collecting requests
  Using cached requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting pyarrow
  Using cached pyarrow-21.0.0-cp313-cp313-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting nbformat
  Using cached nbformat-5.10.4-py3-none-any.whl.metadata (3.6 kB)
Collecting pyyaml
  Using cached PyYAML-6.

In [3]:
from reprolab.environment import freeze_venv_dependencies
freeze_venv_dependencies('my_venv')

Trying pip at: /home/koen-greuell/local_notebooks/my_venv/bin/pip
Running command: /home/koen-greuell/local_notebooks/my_venv/bin/pip freeze
Pip dependencies saved to requirements.txt
Found 108 packages
Not a Conda environment or not activated. Skipping Conda export.

To recreate the environment:
- For pip: Activate the virtual environment and run: `pip install -r requirements.txt`
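Recreating the environment exactly depends on every line in `requirements.txt` being pinned to one version. `pip freeze` normally writes `name==version`, but it can also emit `name @ <url>` for packages installed from a local path or URL. The sketch below is a hypothetical check, `unpinned_requirements`, that lists lines not of the `name==version` form so they can be reviewed before publishing. The regular expression is a simplification of the full requirement syntax.

```python
import re

# Simplified pattern: package name, optional [extras], then '==' and a version
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

To check the file written above, use `unpinned_requirements(pathlib.Path("requirements.txt").read_text())`.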


In [4]:
from reprolab.experiment import start_experiment
start_experiment()

2025-08-22 17:19:52 - INFO - Starting experiment process
2025-08-22 17:19:52 - INFO - Step 1: Saving all notebooks
2025-08-22 17:19:52 - INFO - Attempting to save all Jupyter notebooks...
2025-08-22 17:19:53 - INFO - ipylab save command executed successfully
2025-08-22 17:19:53 - INFO - nbformat processing completed for 2 notebooks
2025-08-22 17:19:53 - INFO - Jupyter save commands executed successfully
2025-08-22 17:19:53 - INFO - All save methods completed
2025-08-22 17:19:53 - INFO - Step 2: Determining next tag name
2025-08-22 17:19:53 - INFO - Determining next tag name
2025-08-22 17:19:53 - INFO - Fetching all tags from remote repositories
2025-08-22 17:19:54 - INFO - Found 3 tags: ['v1.0.0', 'v1.1.0', 'v1.2.0']
2025-08-22 17:19:54 - INFO - Latest tag: v1.2.0, next tag: v1.3.0
2025-08-22 17:19:54 - INFO - Step 3: Committing with message: 'Project state before running experiment v1.3.0'
2025-08-22 17:19:54 - INFO - Starting commit process with message: 'Project state before running

'v1.3.0'
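The log above shows the next tag being derived from the existing ones (`v1.2.0` becomes `v1.3.0`). One way to compute this is sketched below, assuming tags follow `vMAJOR.MINOR.PATCH`, that each experiment bumps the minor version, and that `v1.0.0` is used when no tags exist. `next_tag` is a hypothetical helper, not ReproLab's implementation. Comparing integers rather than strings keeps `v1.10.0` above `v1.9.0`.

```python
import re

TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_tag(tags: list[str]) -> str:
    """Bump the minor version of the highest vMAJOR.MINOR.PATCH tag (assumed scheme)."""
    versions = []
    for tag in tags:
        match = TAG.match(tag)
        if match:  # ignore tags that do not follow the scheme
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return "v1.0.0"  # assumed starting tag
    major, minor, _patch = max(versions)
    return f"v{major}.{minor + 1}.0"
```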

In [None]:
from reprolab.experiment import end_experiment
end_experiment()

2025-08-22 17:19:58 - INFO - Ending experiment process
2025-08-22 17:19:58 - INFO - Step 1: Saving all notebooks
2025-08-22 17:19:58 - INFO - Attempting to save all Jupyter notebooks...
2025-08-22 17:19:58 - INFO - ipylab save command executed successfully
2025-08-22 17:19:58 - INFO - nbformat processing completed for 2 notebooks


In [3]:
from reprolab.experiment import list_and_sort_git_tags
list_and_sort_git_tags()
# Pick a git tag from this list to download its reproducibility package

['v1.2.0', 'v1.1.0', 'v1.0.0']
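The tags above are listed newest first. A plain string sort gets this wrong once a component reaches two digits: `sorted(["v1.9.0", "v1.10.0"], reverse=True)` puts `v1.9.0` first. This output does not show whether `list_and_sort_git_tags` handles that case. The sketch below is a hypothetical `sort_tags_desc` that assumes the `vMAJOR.MINOR.PATCH` scheme, sorts numerically, and drops other tags.

```python
import re

TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def sort_tags_desc(tags: list[str]) -> list[str]:
    """Return vMAJOR.MINOR.PATCH tags sorted newest first; other tags are dropped."""
    versioned = []
    for tag in tags:
        match = TAG.match(tag)
        if match:
            # Pair the numeric version with the tag so sorting compares integers
            versioned.append((tuple(int(part) for part in match.groups()), tag))
    return [tag for _version, tag in sorted(versioned, reverse=True)]
```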

In [None]:
from reprolab.experiment import download_reproducability_package
download_reproducability_package('<git_tag>')