# MeteoSaver v1.0 – Minimal Working Example (MWE)

### A Jupyter Notebook to demonstrate the setup and use of MeteoSaver on sample data.

This notebook guides you through setting up and running **MeteoSaver v1.0** on a sample dataset.  
We will:
- Clone the repository  
- Set up the environment  
- Run the transcription  
- View results  

⚠️ **Note:** Run each cell **sequentially** or execute the entire notebook.


## Step 1: Clone the Repository

In [1]:
# Step 1: Clone the repository
!git clone https://github.com/VUB-HYDR/MeteoSaver.git

Cloning into 'MeteoSaver'...


## Step 2: Navigate to the Project Directory

In [2]:
import os

# Change working directory to the cloned repo
os.chdir("MeteoSaver")

# Confirm contents
print(os.listdir())

['.git', '.gitignore', '.vscode', 'configuration.ini', 'data', 'Dockerfile', 'docs', 'environment.yml', 'job_script.sh', 'LICENSE', 'OCR_HTR_models', 'README.md', 'results', 'setup.py', 'src']
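Re-running the cell above from inside the repository fails, because `MeteoSaver` is then no longer a subdirectory of the working directory. A minimal sketch of a re-runnable alternative follows; the helper name `enter_repo` is hypothetical and not part of MeteoSaver.

```python
import os
from pathlib import Path


def enter_repo(repo_name: str = "MeteoSaver") -> Path:
    """Change into the cloned repository unless we are already inside it."""
    cwd = Path.cwd()
    if cwd.name == repo_name:
        # Already in the repository, e.g. because the cell was run twice
        return cwd
    target = cwd / repo_name
    if not target.is_dir():
        raise FileNotFoundError(f"{target} not found; clone the repository first")
    os.chdir(target)
    return Path.cwd()
```

Calling `enter_repo()` in place of `os.chdir("MeteoSaver")` should make the cell safe to execute more than once.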


## Step 3: Set Up the Conda Environment

⚠️ **Note:** In Jupyter, `conda activate` does not persist across notebook cells. Either run scripts with `!conda run` from within the notebook, or activate the environment manually in a terminal.

In [None]:
# Create the Python environment from the environment.yml file in the cloned directory
!conda env create -f environment.yml
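Creating the environment can take several minutes, so it may help to confirm that it exists before moving on. The sketch below parses the output of `conda env list --json`; the environment name is the one used in Step 4, and the helper names are hypothetical.

```python
import json
import subprocess
from pathlib import PureWindowsPath

ENV_NAME = "transcribing_drc_data_environment"  # name used in Step 4


def env_names_from_json(conda_json):
    """Return the set of environment directory names listed by conda."""
    envs = json.loads(conda_json).get("envs", [])
    # PureWindowsPath accepts both "/" and "\\" as separators
    return {PureWindowsPath(path).name for path in envs}


def conda_env_exists(name=ENV_NAME):
    """Ask conda for its environments and check whether `name` is among them."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return name in env_names_from_json(result.stdout)
```

`conda_env_exists()` requires `conda` to be on the `PATH` of the notebook kernel.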

⚠️ **Note: Tesseract-OCR Setup (Required)**

To run MeteoSaver, ensure **Tesseract-OCR** is correctly installed and configured. You need to specify:
- **Tesseract executable path**
- **Language data directory (tessdata)**

📌 **Local Setup:** Install Tesseract and make sure its correct path is set in `configuration.ini`, located in MeteoSaver's root directory:
```python
tesseract_path = "C:/Program Files/Tesseract-OCR/tesseract.exe"
```

📌 **HPC Setup:** Contact your admin or load the Tesseract module, then specify the executable path and the language data directory:

    tesseract_path = "/path/to/tesseract"

    export TESSDATA_PREFIX="/OCR_HTR_models/"
        
For detailed setup instructions, see **MeteoSaver v1.0 User Manual, Section 2.3**. 
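Before running the transcription, a quick check that the configured path points to an existing file can save a failed run. The following is a sketch only: it assumes `configuration.ini` uses standard INI section headers, and since the section that holds `tesseract_path` is not stated here, it searches every section. The helper name `find_option` and the `[paths]` section in the usage comment are hypothetical.

```python
import configparser
from pathlib import Path


def find_option(config_text, option="tesseract_path"):
    """Return the first value of `option` found in any section, without quotes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    for section in parser.sections():
        if parser.has_option(section, option):
            # Values may be written with surrounding quotes; strip them
            return parser.get(section, option).strip().strip("\"'")
    return None


# Usage (run from the MeteoSaver root directory):
# path = find_option(Path("configuration.ini").read_text(encoding="utf-8"))
# print(path, "exists" if path and Path(path).is_file() else "is missing")
```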

## Step 4: Run MeteoSaver on the Sample Dataset

In [18]:
# Run MeteoSaver to transcribe the sample data included in this repository i.e. "MeteoSaver/data/00_post1960_DRC_hydroclimate_datasheet_images/" 

!conda run -n transcribing_drc_data_environment python src/main.py


Running in Local mode.

Number of current rows in the current column: 42

Added new box at: (57, 2615, 120, 50)

Number of current rows in the current column: 41

Added new box at: (198, 1224, 120, 50)

Added new box at: (216, 1995, 120, 50)

Number of current rows in the current column: 40

Added new box at: (348, 1218, 120, 50)

Added new box at: (355, 1676, 120, 50)

Added new box at: (333, 1996, 120, 50)

Number of current rows in the current column: 41

Added new box at: (499, 2128, 120, 50)

Added new box at: (493, 1681, 120, 50)

Number of current rows in the current column: 42

Added new box at: (634, 2130, 120, 50)

Number of current rows in the current column: 40

Added new box at: (759, 2047, 120, 50)

Added new box at: (782, 2626, 120, 50)

Added new box at: (778, 1186, 120, 50)

Number of current rows in the current column: 40

Added new box at: (955, 2631, 120, 50)

Added new box at: (925, 1930, 120, 50)

Added new box at: (936, 1349, 120, 50)

Number of current rows in t

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.

Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much 
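The log above follows two recurring patterns, so it can be summarized instead of read line by line. This is a sketch that assumes the log text has been captured as a single string; the meaning of the four numbers in each box tuple is not documented here, so they are kept as plain integers. The helper name `summarize_log` is hypothetical.

```python
import re

BOX_RE = re.compile(r"Added new box at: \((\d+), (\d+), (\d+), (\d+)\)")
ROWS_RE = re.compile(r"Number of current rows in the current column: (\d+)")


def summarize_log(log_text):
    """Collect per-column row counts and added boxes from MeteoSaver's log output."""
    boxes = [tuple(int(v) for v in m.groups()) for m in BOX_RE.finditer(log_text)]
    row_counts = [int(m.group(1)) for m in ROWS_RE.finditer(log_text)]
    return {"row_counts": row_counts, "boxes": boxes}
```

In a notebook, the output of a shell command can be captured with IPython's `output = !command` syntax and joined with `"\n".join(output)` before being passed to `summarize_log`.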

## Step 5: View Sample Output – Post QA/QC Transcribed Values


Below is a sample post QA/QC transcribed monthly climate data sheet generated using MeteoSaver v1.0.

This output represents the processed climate data after Quality Assessment/Quality Control (QA/QC) checks, ensuring accuracy and consistency.

In [32]:
import pandas as pd
from IPython.display import HTML

# Load one of the transcribed data sheets after quality assessment and quality control
post_qa_qc_file = "results/02_post_QA_QC_transcribed_hydroclimate_data/203/203_196905_SF.JPG_post_QA_QC.xlsx"

# Read the Excel file; sheet rows 0 and 2 form the two header levels (row 1 is skipped)
df = pd.read_excel(post_qa_qc_file, header=[0, 2])

# Render the first 10 rows as an HTML table so they display cleanly in Jupyter
HTML(df.head(10).to_html())

Unnamed: 0_level_0,No de la pentade,Date,Bellani (gr. Cal/cm2) 6-6h,Températures extrêmes,Températures extrêmes,Températures extrêmes,Températures extrêmes,Températures extrêmes,Evaportation en cm3 6 - 6h,Evaportation en cm3 6 - 6h,Pluies en mm. 6-6h,Température et Humidité de l'air à 6 heures,Température et Humidité de l'air à 6 heures,Température et Humidité de l'air à 6 heures,Température et Humidité de l'air à 6 heures,Température et Humidité de l'air à 6 heures,Température et Humidité de l'air à 15 heures,Température et Humidité de l'air à 15 heures,Température et Humidité de l'air à 15 heures,Température et Humidité de l'air à 15 heures,Température et Humidité de l'air à 15 heures,Température et Humidité de l'air à 18 heures,Température et Humidité de l'air à 18 heures,Température et Humidité de l'air à 18 heures,Température et Humidité de l'air à 18 heures,Température et Humidité de l'air à 18 heures,Date
Unnamed: 0_level_1,Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Max.,Min.,(M+m)/2,Ampl.,Min. gazon,Abri.,Ext.,Unnamed: 10_level_1,T,T'a,e.,U,∆e,T,T'a,e.,U,∆e,T,T'a,e.,U,∆e,Unnamed: 26_level_1
0,,1,186.2,28.8,19.8,24.3,87.8,18.2,1.7,9.9,0.0,20.2,20.1,83.4,99.0,0.2,28.7,24.1,0.8,168.9,12.2,25.2,22.8,26.2,811.8,0.8,1
1,,2,403.5,32.6,18.2,25.4,14.4,16.8,32.5,5.5,0.0,18.9,18.8,28.6,99.0,0.2,32.6,25.4,27.8,156.6,21.2,27.2,24.0,27.8,77.0,8.2,2
2,,3,422.3,34.2,18.8,26.5,15.4,17.2,11.6,7.2,0.0,20.0,20.0,23.3,100.0,0.0,32.7,25.7,28.6,55.2,20.7,28.7,23.7,26.1,66.4,4.1,3
3,,4,4335.1,34.7,19.3,27.0,15.4,18.2,43.3,1.9,0.0,20.1,20.1,35.1,100.0,0.0,34.2,25.5,27.1,50.6,26.5,29.5,24.4,27.3,66.2,13.8,4
4,,5,299.0,34.41,20.8,27.6,14.0,119.0,2.0,0.4,30.9,21.0,20.6,40.9,96.5,10.8,21.4,21.2,25.1,98.0,10.4,22.2,21.9,26.1,97.4,0.6,5
5,,Tot.,1783.5,164.8,96.9,130.8,67.9,89.4,16.5,27.5,30.9,100.2,99.6,115.8,1894.5,11.2,149.6,121.9,135.7,229.9,81.0,132.8,116.8,133.5,388.8,,Tot.
6,,Moy.,356.7,33.0,19.4,26.2,13.6,17.9,3.3,5.5,,20.0,19.9,23.2,98.9,10.2,29.9,24.4,27.1,66.0,16.2,26.6,23.4,26.7,77.8,41.6,Moy.
7,,6,265.6,30.0,19.8,24.9,10.3,18.2,86.6,82.8,0.0,21.2,20.9,124.5,92.3,0.6,29.9,25.5,29.8,70.6,12.4,27.1,24.9,30.1,84.0,8.3,6
8,,7,347.1,31.8,22.6,27.2,9.2,21.2,2.7,44.6,10.0,23.3,23.0,27.9,97.5,0.5,31.8,24.9,27.1,57.8,19.8,27.9,24.6,28.8,76.7,5.7,7
9,,8,1428.6,32.8,20.4,26.6,12.4,19.9,3.3,16.0,0.0,22.1,21.9,26.1,98.0,10.4,32.6,24.6,25.8,52.7,23.3,28.0,24.8,29.3,77.6,8.7,8
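Reading with `header=[0, 2]` produces two-level column labels, and pandas fills empty header cells with `Unnamed: ...` placeholders, as seen in the output above. A minimal sketch for flattening these into single readable names follows; the helper name `flatten_columns` and the `" | "` separator are arbitrary choices, not part of MeteoSaver.

```python
def flatten_columns(columns):
    """Join two-level column labels, dropping pandas' 'Unnamed: ...' placeholders."""
    flat = []
    for top, sub in columns:
        parts = [
            str(part).strip()
            for part in (top, sub)
            if not str(part).startswith("Unnamed:")
        ]
        flat.append(" | ".join(parts))
    return flat


# Usage with the DataFrame loaded above:
# df.columns = flatten_columns(df.columns.to_list())
```

Note that the sheet contains a `Date` column at both ends, so the flattened names are not guaranteed to be unique.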


## Step 6: Display Sample Validation Figures

Below are sample time series plots of the transcribed daily maximum (red), average (orange), and minimum (blue) temperatures for the respective stations.

- Automatically transcribed values (using MeteoSaver v1.0) are shown as solid markers.
- Manually transcribed values appear as lighter time series bands, with a 0.2°C uncertainty margin applied during QA/QC checks.
- Accuracy percentage and Mean Absolute Error (MAE) between automatically and manually transcribed values are displayed in the upper right corner of each plot.

This visualization helps assess the performance of MeteoSaver's transcription against manually transcribed data.
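To make the two reported metrics concrete, the sketch below computes an MAE and an accuracy percentage from paired values. How MeteoSaver defines accuracy is not stated in this notebook; the sketch assumes a value counts as correct when it lies within the 0.2°C margin of the manual value. The function name `compare_series` is hypothetical.

```python
def compare_series(auto, manual, tolerance=0.2):
    """Return (MAE, accuracy in percent) for paired values, skipping missing ones."""
    pairs = [(a, m) for a, m in zip(auto, manual) if a is not None and m is not None]
    if not pairs:
        raise ValueError("no overlapping values to compare")
    errors = [abs(a - m) for a, m in pairs]
    mae = sum(errors) / len(errors)
    # Round before comparing so floating-point noise does not exclude
    # differences that are exactly at the tolerance
    within = sum(1 for e in errors if round(e, 6) <= tolerance)
    return mae, 100.0 * within / len(errors)
```

Pairs in which either value is `None` are excluded from both metrics, which is also an assumption of this sketch.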

In [34]:
from IPython.display import Image, display

# Display comparison plot 1 from validation
img_path = "results/03_validation_transcibed_data/203/temperature_comparison_plot_203_196905_SF.JPG_post_QA_QC.jpg"
if os.path.exists(img_path):
    display(Image(filename=img_path))
else:
    print("Validation plot not found.")

[Image output: temperature comparison plot for 203_196905_SF.JPG]

In [36]:
# Display comparison plot 2 from validation
img_path = "results/03_validation_transcibed_data/203/temperature_comparison_plot_203_196906_SF.JPG_post_QA_QC.jpg"
if os.path.exists(img_path):
    display(Image(filename=img_path))
else:
    print("Validation plot not found.")

[Image output: temperature comparison plot for 203_196906_SF.JPG]
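The two cells above differ only in the file name, so the plots can also be collected in one pass. This sketch assumes validation plots follow the `temperature_comparison_plot_*.jpg` naming seen above and sit in one folder per station; the helper name `find_validation_plots` is hypothetical.

```python
from pathlib import Path


def find_validation_plots(validation_dir, station):
    """Return the sorted comparison plots found for one station folder."""
    station_dir = Path(validation_dir) / str(station)
    return sorted(station_dir.glob("temperature_comparison_plot_*.jpg"))


# Usage in the notebook:
# from IPython.display import Image, display
# for plot in find_validation_plots("results/03_validation_transcibed_data", 203):
#     display(Image(filename=str(plot)))
```

An empty list means no plots were found, which replaces the "Validation plot not found." branch of the original cells.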