In [1]:
from reprolab.experiment import start_experiment, end_experiment
start_experiment()

2025-07-16 13:06:39 - INFO - Starting experiment process
2025-07-16 13:06:39 - INFO - Step 1: Saving all notebooks
2025-07-16 13:06:39 - INFO - Attempting to save all Jupyter notebooks...
2025-07-16 13:06:40 - INFO - ipylab save command executed successfully
2025-07-16 13:06:40 - INFO - nbformat processing completed for 5 notebooks
2025-07-16 13:06:41 - INFO - Jupyter save commands executed successfully
2025-07-16 13:06:41 - INFO - All save methods completed
2025-07-16 13:06:41 - INFO - Step 2: Determining next tag name
2025-07-16 13:06:41 - INFO - Determining next tag name
2025-07-16 13:06:41 - INFO - Fetching all tags from remote repositories
2025-07-16 13:06:42 - INFO - Found 15 tags: ['v1.0.0', 'v1.1.0', 'v1.10.0', 'v1.11.0', 'v1.12.0', 'v1.13.0', 'v1.14.0', 'v1.2.0', 'v1.3.0', 'v1.4.0', 'v1.5.0', 'v1.6.0', 'v1.7.0', 'v1.8.0', 'v1.9.0']
2025-07-16 13:06:42 - INFO - Latest tag: v1.14.0, next tag: v1.15.0
2025-07-16 13:06:42 - INFO - Step 3: Committing with message: 'Project state be

'v1.15.0'
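
The log above lists the tags in lexicographic order (`v1.10.0` sorts before `v1.2.0`), yet the latest tag is reported as `v1.14.0`, so the version numbers are evidently compared numerically rather than as strings. A minimal sketch of how such a next-tag computation could work, assuming tags of the form `vMAJOR.MINOR.PATCH` and a minor-version bump per experiment. The helper names `parse_tag` and `next_tag` and the `v1.0.0` starting point are hypothetical and not part of ReproLab's API:

```python
import re

# Assumed tag format: v<major>.<minor>.<patch>
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

def parse_tag(tag):
    """Return (major, minor, patch) as integers, or None if the tag does not match."""
    match = TAG_PATTERN.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None

def next_tag(tags):
    """Find the numerically highest tag and bump its minor version."""
    versions = [v for v in map(parse_tag, tags) if v is not None]
    if not versions:
        return "v1.0.0"  # assumed starting point when no tags exist yet
    major, minor, _patch = max(versions)  # tuples compare number by number
    return f"v{major}.{minor + 1}.0"

tags = ['v1.0.0', 'v1.1.0', 'v1.10.0', 'v1.11.0', 'v1.12.0', 'v1.13.0',
        'v1.14.0', 'v1.2.0', 'v1.3.0', 'v1.4.0', 'v1.5.0', 'v1.6.0',
        'v1.7.0', 'v1.8.0', 'v1.9.0']
print(next_tag(tags))  # v1.15.0
```

Comparing integer tuples is the important design choice here: a plain string comparison would pick `v1.9.0` as the latest tag.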

In [2]:
from reprolab.environment import create_new_venv
create_new_venv('.my_venv')

[✔] Virtual environment '.my_venv' created at /Users/spoton/Documents/master_thesis/poc/reprolab/.my_venv
[✔] Pip upgraded
[✔] Installed essential packages: ipykernel, boto3, ipylab, pandas, numpy, xarray, requests, pyarrow, nbformat, pyyaml, ipywidgets, jupyterlab_widgets
Installed kernelspec .my_venv_kernel in /Users/spoton/Library/Jupyter/kernels/.my_venv_kernel
[✔] Kernel '.my_venv_kernel' registered for Jupyter

🎉 Setup complete!
➡ To use the virtual environment in Jupyter:
   1. Restart your Jupyter server
   2. Select kernel: Python (.my_venv)
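
For readers curious about what the first step in this output involves, here is a minimal sketch that creates a virtual environment with Python's standard-library `venv` module. It is an illustration under stated assumptions, not ReproLab's implementation: the function name `create_venv` is hypothetical, and the package installation and kernel registration shown in the output are left out.

```python
import venv
from pathlib import Path

def create_venv(path, with_pip=True):
    """Create a virtual environment at `path` and return its absolute location.

    Hypothetical helper: it only covers environment creation, not the
    package installation or Jupyter kernel registration logged above.
    """
    env_dir = Path(path).resolve()
    # EnvBuilder is the standard-library API behind `python -m venv`
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return env_dir

# Example call (creates the directory on disk):
# create_venv(".my_venv")
```

Registering the environment as a Jupyter kernel is typically a separate step, for example by running `python -m ipykernel install --user --name <name>` with the environment's interpreter once `ipykernel` is installed there.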


# ReproLab Demo

Welcome to ReproLab! This extension helps you make your research more reproducible.

## Features

- **Create Experiments**: Automatically save immutable snapshots of your code under `git` tags to preserve the **exact code and outputs**
- **Manage Dependencies**: Automatically gather and pin **exact package versions**, so that others can set up your environment with one command
- **Cache Data**: Call an external API or load a dataset manually only once; the caching function handles the rest
- **Archive Data**: The caching function can also store the compressed data in *AWS S3*, so you always know which data was used and make fewer API calls
- **Publishing Guide**: The reproducibility checklist and the automated generation of a reproducibility package make it easy to publish to platforms such as Zenodo

## Getting Started

1. Use the sidebar to view ReproLab features
2. Create a virtual environment and pin your dependencies: go to the ReproLab section `Create reproducible environment`
3. Create an experiment to save your current state: go to the ReproLab section `Create experiment`
4. Archive your data for long-term storage: go to the ReproLab section `Demo` and experiment with it
5. Publish your work when ready; remember to use the reproducibility checklist from the section `Reproducibility Checklist`

## Example Usage of the `persistio` Decorator

To cache and archive the datasets you use, whether they come from local files or from APIs, we developed a simple decorator. Put it over the function that fetches the dataset, and it caches the result both locally and in the cloud. The dataset you used is archived, the number of calls to external APIs stays minimal, and you don't need to keep the file around after the first run.

Here is an example using one of NASA's open APIs. If you want to try it yourself, you can copy the code, but you first need to provide a bucket name, an access key, and a secret key in the `AWS S3 Configuration` section of the left-hand panel.

```python
import requests
import pandas as pd
from io import StringIO

# The two lines below are all that you need to add
from reprolab.experiment import persistio
@persistio()
def get_exoplanets_data_from_nasa():
    url = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

    query = """
    SELECT TOP 10
        pl_name AS planet_name,
        hostname AS host_star,
        pl_orbper AS orbital_period_days,
        pl_rade AS planet_radius_earth,
        disc_year AS discovery_year
    FROM
        ps
    WHERE
        default_flag = 1
    """

    params = {
        "query": query,
        "format": "csv"
    }

    response = requests.get(url, params=params)

    # Raise on an HTTP error so the function never returns an undefined `df`
    if response.status_code != 200:
        raise RuntimeError(f"Error: {response.status_code} - {response.text}")

    df = pd.read_csv(StringIO(response.text))
    print(df)
    return df

exoplanets_data = get_exoplanets_data_from_nasa()
```

If you run this cell twice, the logs will show that the second time the data was read from the compressed file in the local cache. If you lose access to the local cache (e.g. by pulling the repository onto a different device), `persistio` fetches the data from the cloud archive.
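
The caching behaviour described above can be illustrated with a simplified, local-only decorator. This is a sketch under stated assumptions, not `persistio`'s actual implementation: the name `cache_to_disk`, the `.sketch_cache` directory, the gzip-compressed pickle files, and the SHA-256 key derived from the function name and arguments are all hypothetical choices, and the cloud fallback is omitted.

```python
import functools
import gzip
import hashlib
import pickle
from pathlib import Path

def cache_to_disk(cache_dir=".sketch_cache"):
    """Hypothetical local-only stand-in for a caching decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key the cache entry on the function name and its arguments
            key_source = repr((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
            cache_file = Path(cache_dir) / f"{key}.pkl.gz"

            if cache_file.exists():
                # Cache hit: skip the expensive call entirely
                with gzip.open(cache_file, "rb") as fh:
                    return pickle.load(fh)

            # Cache miss: call the function and store the compressed result
            result = func(*args, **kwargs)
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            with gzip.open(cache_file, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator
```

Decorating a data-loading function with `@cache_to_disk()` means its body runs only on the first call with a given set of arguments; later calls read the compressed file instead. A fuller version would add a remote lookup between the local cache check and the function call.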


For more information, visit our [documentation](https://github.com/your-repo/reprolab).

In [5]:
import requests
import pandas as pd
from io import StringIO

# The two lines below are all that you need to add
from reprolab.experiment import persistio
@persistio()
def get_exoplanets_data_from_nasa():
    url = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

    query = """
    SELECT TOP 10
        pl_name AS planet_name,
        hostname AS host_star,
        pl_orbper AS orbital_period_days,
        pl_rade AS planet_radius_earth,
        disc_year AS discovery_year
    FROM
        ps
    WHERE
        default_flag = 1
    """

    params = {
        "query": query,
        "format": "csv"
    }

    response = requests.get(url, params=params)

    # Raise on an HTTP error so the function never returns an undefined `df`
    if response.status_code != 200:
        raise RuntimeError(f"Error: {response.status_code} - {response.text}")

    df = pd.read_csv(StringIO(response.text))
    print(df)
    return df

exoplanets_data = get_exoplanets_data_from_nasa()


[persistio] Function: get_exoplanets_data_from_nasa
[persistio] Hash: ca840447667cb2059aa83ed68ec9e995
✅ Metadata written to test_notebook_after_reprolab.ipynb_persistio_archive.yaml
[persistio] Trigger logged for function: get_exoplanets_data_from_nasa
[persistio] Attempting to load from local cache...
[persistio] Successfully loaded from local cache!


In [6]:
exoplanets_data

Unnamed: 0,planet_name,host_star,orbital_period_days,planet_radius_earth,discovery_year
0,Kepler-6 b,Kepler-6,3.2347,14.616536,2009
1,Kepler-491 b,Kepler-491,4.225385,8.92,2016
2,Kepler-257 b,Kepler-257,2.382667,2.61,2014
3,Kepler-216 b,Kepler-216,7.693641,2.35,2014
4,Kepler-32 c,Kepler-32,8.7522,2.0,2011
5,Kepler-259 c,Kepler-259,36.924931,2.7,2014
6,Kepler-148 c,Kepler-148,4.180043,3.6,2014
7,Kepler-222 d,Kepler-222,28.081912,3.69,2014
8,Kepler-29 c,Kepler-29,13.28613,2.34,2011
9,Kepler-179 b,Kepler-179,2.735926,1.64,2014


In [3]:
from reprolab.environment import freeze_venv_dependencies
freeze_venv_dependencies('.my_venv')

Trying pip at: /Users/spoton/Documents/master_thesis/poc/reprolab/.my_venv/bin/pip
Running command: /Users/spoton/Documents/master_thesis/poc/reprolab/.my_venv/bin/pip freeze
Pip dependencies saved to requirements.txt
Found 65 packages
Not a Conda environment or not activated. Skipping Conda export.

To recreate the environment:
- For pip: Activate the virtual environment and run: `pip install -r requirements.txt`
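
The output above shows ReproLab running `pip freeze` inside the virtual environment. As a rough illustration of what pinning exact versions means, the sketch below swaps in the standard-library `importlib.metadata` module instead of calling pip. It is an approximation, not ReproLab's method: it inspects the interpreter it runs in rather than a named environment, its output can differ from `pip freeze` (for example for editable or VCS installs), and the names `pinned_requirements`, `write_requirements`, and `requirements.sketch.txt` are hypothetical.

```python
from importlib import metadata

def pinned_requirements():
    """Return sorted `name==version` lines for packages visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)

def write_requirements(path="requirements.sketch.txt"):
    """Write the pinned list to `path` and return the number of packages."""
    lines = pinned_requirements()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)
```

Pinning with `==` rather than a version range is what lets someone else rebuild the same environment later with `pip install -r`.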


In [4]:
end_experiment()

2025-07-16 13:06:45 - INFO - Ending experiment process
2025-07-16 13:06:45 - INFO - Step 1: Saving all notebooks
2025-07-16 13:06:45 - INFO - Attempting to save all Jupyter notebooks...
2025-07-16 13:06:45 - INFO - ipylab save command executed successfully
2025-07-16 13:06:45 - INFO - nbformat processing completed for 5 notebooks
2025-07-16 13:06:46 - INFO - Jupyter save commands executed successfully
2025-07-16 13:06:46 - INFO - All save methods completed
2025-07-16 13:06:46 - INFO - Step 2: Determining next tag name
2025-07-16 13:06:46 - INFO - Determining next tag name
2025-07-16 13:06:46 - INFO - Fetching all tags from remote repositories
2025-07-16 13:06:48 - INFO - Found 15 tags: ['v1.0.0', 'v1.1.0', 'v1.10.0', 'v1.11.0', 'v1.12.0', 'v1.13.0', 'v1.14.0', 'v1.2.0', 'v1.3.0', 'v1.4.0', 'v1.5.0', 'v1.6.0', 'v1.7.0', 'v1.8.0', 'v1.9.0']
2025-07-16 13:06:48 - INFO - Latest tag: v1.14.0, next tag: v1.15.0
2025-07-16 13:06:48 - INFO - Step 3: Committing with message: 'Project state afte

'v1.15.0'