# Data Collection and Integration of the Meteostat Dataset

In this notebook, we extend our data collection process to include historical weather data from Meteostat, accessed through its Python library. This lets us obtain data going back to 1990, giving our predictive models a longer historical record to learn from.

We will:

- Set up Meteostat and install the necessary libraries.
- Download data for the variables `temperature`, `rainfall`, and `snowfall`.
- Process and save the data in the same format and structure as our existing datasets.
- Integrate the new data with our existing data cleaning pipeline.
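
As a rough illustration of the "same format and structure" step, the sketch below renames Meteostat's daily column codes to the variable names listed above. It assumes the raw frame carries Meteostat's `tavg`, `prcp`, and `snow` columns; the mapping and the helper `rename_meteostat_columns` are hypothetical and may differ from what `process_meteostat_data` actually does. Note also that Meteostat describes `snow` as snow depth, so treating it as `snowfall` is an assumption.

```python
import pandas as pd

# Assumed mapping from Meteostat daily column codes to this project's variable names.
COLUMN_MAP = {"tavg": "temperature", "prcp": "rainfall", "snow": "snowfall"}


def rename_meteostat_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the mapped columns that are present and rename them."""
    present = [col for col in COLUMN_MAP if col in df.columns]
    return df[present].rename(columns=COLUMN_MAP)


# Tiny hand-made frame standing in for a Meteostat daily download (not real data).
sample = pd.DataFrame(
    {
        "tavg": [-3.2, -1.0],
        "prcp": [0.0, 4.1],
        "snow": [250.0, 260.0],
        "wspd": [12.0, 8.5],
    },
    index=pd.to_datetime(["1990-01-01", "1990-01-02"]),
)

renamed = rename_meteostat_columns(sample)
print(renamed.columns.tolist())
```

Columns outside the mapping, such as `wspd` here, are dropped so the result lines up with the three variables of interest.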


### 1. Import libraries

In [6]:
import sys
import os
import pandas as pd
import numpy as np
from datetime import datetime
import time

### 2. Update Python Path & Import Custom Modules

In [7]:
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

print("Updated sys.path:")
for path in sys.path:
    print(path)

from src.data.fetch_data import (
    get_nearest_station,
    download_meteostat_data
)
from src.data.processing import (
    process_meteostat_data,
    compile_meteostat_data
)

Updated sys.path:
/workspace/SkiSnow
/home/gitpod/.pyenv/versions/3.12.6/lib/python312.zip
/home/gitpod/.pyenv/versions/3.12.6/lib/python3.12
/home/gitpod/.pyenv/versions/3.12.6/lib/python3.12/lib-dynload

/workspace/SkiSnow/venv/lib/python3.12/site-packages


### 3. Define Data Paths

In [8]:
# Define the root data directory
data_root = '/workspace/SkiSnow/data'

# Define subdirectories for raw data
raw_data_root = os.path.join(data_root, 'raw', 'cds')

# Create the directories if they don't exist
os.makedirs(raw_data_root, exist_ok=True)

print(f"Raw data will be saved to: {raw_data_root}")

Raw data will be saved to: /workspace/SkiSnow/data/raw/cds


### 4. Define the List of Resorts and Their Coordinates

In [9]:
resorts = {
    'austrian_alps/st._anton': {
        'latitude': 47.1787,
        'longitude': 10.3143,
        'months_open': ['12', '01', '02', '03', '04'],
    },
    'austrian_alps/kitzbuhel': {
        'latitude': 47.4967,
        'longitude': 12.4429,
        'months_open': ['11', '12', '01', '02', '03', '04'],
    },
    'austrian_alps/solden': {
        'latitude': 47.0190,
        'longitude': 11.0606,
        'months_open': ['10', '11', '12', '01', '02', '03', '04', '05'],
    },
    'swiss_alps/st_moritz': {
        'latitude': 46.5407,
        'longitude': 9.8855,
        'months_open': ['11', '12', '01', '02', '03', '04'],
    },
    'swiss_alps/verbier': {
        'latitude': 46.1465,
        'longitude': 7.2769,
        'months_open': ['12', '01', '02', '03', '04'],
    },
    'italian_alps/cortina_d_ampezzo': {
        'latitude': 46.5905,
        'longitude': 12.1857,
        'months_open': ['12', '01', '02', '03', '04'],
    },
    'italian_alps/val_gardena': {
        'latitude': 46.6219,
        'longitude': 11.7673,
        'months_open': ['12', '01', '02', '03', '04'],
    },
    'italian_alps/sestriere': {
        'latitude': 45.0055,
        'longitude': 6.9335,
        'months_open': ['12', '01', '02', '03', '04'],
    },
    'slovenian_alps/kranjska_gora': {
        'latitude': 46.5347,
        'longitude': 13.8336,
        'months_open': ['12', '01', '02', '03'],
    },
    'slovenian_alps/mariborsko_pohorje': {
        'latitude': 46.5652,
        'longitude': 15.6431,
        'months_open': ['12', '01', '02', '03'],
    },
    'slovenian_alps/krvavec': {
        'latitude': 46.3471,
        'longitude': 14.5875,
        'months_open': ['12', '01', '02', '03', '04'],
    },
}
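
Each resort entry carries a `months_open` list of zero-padded month strings. One way such a list could be used is to restrict a date-indexed DataFrame to the ski season. The helper `filter_open_months` below is a hypothetical sketch, not part of `src.data.processing`, and the demo frame is synthetic.

```python
import pandas as pd


def filter_open_months(df: pd.DataFrame, months_open: list[str]) -> pd.DataFrame:
    """Return only the rows whose DatetimeIndex month appears in months_open."""
    open_months = {int(m) for m in months_open}  # '01' -> 1, '12' -> 12
    return df[df.index.month.isin(open_months)]


# One synthetic row per month of 1990, standing in for real daily data.
dates = pd.date_range("1990-01-01", periods=12, freq="MS")
demo = pd.DataFrame({"temperature": range(12)}, index=dates)

# Month list copied from the 'austrian_alps/st._anton' entry above.
season = filter_open_months(demo, ['12', '01', '02', '03', '04'])
print(sorted(season.index.month.tolist()))
```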


### 5. Execute the Data Retrieval Workflow

In [10]:
def run_data_retrieval(resorts, start_year, end_year, raw_data_root):
    """
    Orchestrate the data retrieval and processing for all resorts.
    
    Parameters:
    - resorts (dict): Dictionary of resorts with coordinates and open months.
    - start_year (int): Starting year for data retrieval.
    - end_year (int): Ending year for data retrieval.
    - raw_data_root (str): Root directory for raw data.
    
    Returns:
    - None
    """
    for resort_name, resort_info in resorts.items():
        latitude = resort_info['latitude']
        longitude = resort_info['longitude']
        # Read from the resort entry; not used by the download or compile calls below.
        months_open = resort_info['months_open']
        
        # Split the resort_name into region and resort
        try:
            region, resort = resort_name.split('/')
        except ValueError:
            print(f"Invalid resort_name format: {resort_name}. Skipping.")
            continue
        
        # Define resort-specific raw data directory using the hierarchical structure
        resort_raw_dir = os.path.join(raw_data_root, region, resort)
        
        # Define compiled CSV path within raw_data_root
        compiled_csv_path = os.path.join(
            resort_raw_dir,
            f"{resort}_meteostat_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.csv"
        )
        
        # Download data and get the DataFrame
        data_df = download_meteostat_data(
            resort_name,
            latitude,
            longitude,
            start_year,
            end_year,
            resort_raw_dir
        )
        
        if data_df is not None:
            print(f"Data fetched for {resort_name}. Proceeding to compile.")
            time.sleep(1)
            
            # Compile data and save to CSV
            compile_meteostat_data(
                resort_name,
                data_df,
                compiled_csv_path
            )
        else:
            print(f"Skipping compilation for {resort_name} due to missing data.")

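
`run_data_retrieval` passes plain integer years to `download_meteostat_data`, while Meteostat's query classes take `datetime` objects for the start and end of a period. The conversion presumably happens inside the download function; a minimal sketch of what it might look like follows. The helper name `year_range_to_dates` is hypothetical, and covering both years in full is an assumption.

```python
from datetime import datetime


def year_range_to_dates(start_year: int, end_year: int) -> tuple[datetime, datetime]:
    """Convert an inclusive year range to (first day, last day) datetimes."""
    if start_year > end_year:
        raise ValueError(f"start_year {start_year} is after end_year {end_year}")
    return datetime(start_year, 1, 1), datetime(end_year, 12, 31)


start, end = year_range_to_dates(1990, 2023)
print(start.date(), end.date())
```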

### 6. Run the Data Retrieval Process

In [11]:
# Define temporal range
start_year = 1990
end_year = 2023

# Run the data retrieval process
run_data_retrieval(resorts, start_year, end_year, raw_data_root)

Nearest station for austrian_alps/st._anton: 10948
Data for austrian_alps/st._anton fetched successfully.
Data fetched for austrian_alps/st._anton. Proceeding to compile.
Processing data for austrian_alps/st._anton...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/austrian_alps/st._anton/st._anton_meteostat_2024-10-28_10-48-08.csv.
Nearest station for austrian_alps/kitzbuhel: D5941
Data for austrian_alps/kitzbuhel fetched successfully.
Data fetched for austrian_alps/kitzbuhel. Proceeding to compile.
Processing data for austrian_alps/kitzbuhel...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/austrian_alps/kitzbuhel/kitzbuhel_meteostat_2024-10-28_10-48-10.csv.
Nearest station for austrian_alps/solden: 11120
Data for austrian_alps/solden fetched successfully.
Data fetched for austrian_alps/solden. Proceeding to compile.
Processing data for austrian_alps/solden...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/austrian_alps/solden/solden_meteostat_2024-10-28_10-48-12.csv.
Nearest station for swiss_alps/st_moritz: 16008
Data for swiss_alps/st_moritz fetched successfully.
Data fetched for swiss_alps/st_moritz. Proceeding to compile.
Processing data for swiss_alps/st_moritz...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/swiss_alps/st_moritz/st_moritz_meteostat_2024-10-28_10-48-14.csv.
Nearest station for swiss_alps/verbier: 06720
Data for swiss_alps/verbier fetched successfully.
Data fetched for swiss_alps/verbier. Proceeding to compile.
Processing data for swiss_alps/verbier...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/swiss_alps/verbier/verbier_meteostat_2024-10-28_10-48-15.csv.
Nearest station for italian_alps/cortina_d_ampezzo: 16033
Data for italian_alps/cortina_d_ampezzo fetched successfully.
Data fetched for italian_alps/cortina_d_ampezzo. Proceeding to compile.
Processing data for italian_alps/cortina_d_ampezzo...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/italian_alps/cortina_d_ampezzo/cortina_d_ampezzo_meteostat_2024-10-28_10-48-17.csv.
Nearest station for italian_alps/val_gardena: 16033
Data for italian_alps/val_gardena fetched successfully.
Data fetched for italian_alps/val_gardena. Proceeding to compile.
Processing data for italian_alps/val_gardena...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/italian_alps/val_gardena/val_gardena_meteostat_2024-10-28_10-48-18.csv.
Nearest station for italian_alps/sestriere: 16061
Data for italian_alps/sestriere fetched successfully.
Data fetched for italian_alps/sestriere. Proceeding to compile.
Processing data for italian_alps/sestriere...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/italian_alps/sestriere/sestriere_meteostat_2024-10-28_10-48-19.csv.
Nearest station for slovenian_alps/kranjska_gora: 14008
Data for slovenian_alps/kranjska_gora fetched successfully.
Data fetched for slovenian_alps/kranjska_gora. Proceeding to compile.
Processing data for slovenian_alps/kranjska_gora...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/slovenian_alps/kranjska_gora/kranjska_gora_meteostat_2024-10-28_10-48-20.csv.
Nearest station for slovenian_alps/mariborsko_pohorje: 11240
Data for slovenian_alps/mariborsko_pohorje fetched successfully.
Data fetched for slovenian_alps/mariborsko_pohorje. Proceeding to compile.
Processing data for slovenian_alps/mariborsko_pohorje...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/slovenian_alps/mariborsko_pohorje/mariborsko_pohorje_meteostat_2024-10-28_10-48-22.csv.
Nearest station for slovenian_alps/krvavec: 14015
Data for slovenian_alps/krvavec fetched successfully.
Data fetched for slovenian_alps/krvavec. Proceeding to compile.
Processing data for slovenian_alps/krvavec...
Compiled data saved to /workspace/SkiSnow/data/raw/cds/slovenian_alps/krvavec/krvavec_meteostat_2024-10-28_10-48-23.csv.
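
The log above shows `italian_alps/cortina_d_ampezzo` and `italian_alps/val_gardena` both resolving to station 16033, so their compiled files are presumably built from the same station record. A quick way to judge whether one station can reasonably represent two resorts is to check how far apart they are. The sketch below uses the standard haversine formula with the coordinates from the `resorts` dictionary; `haversine_km` is an illustrative helper and not part of `src.data.fetch_data`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates copied from the resorts dictionary defined earlier.
cortina = (46.5905, 12.1857)
val_gardena = (46.6219, 11.7673)

distance = haversine_km(*cortina, *val_gardena)
print(f"Cortina d'Ampezzo to Val Gardena: {distance:.1f} km")
```

The same function could be applied to a resort and its matched station's coordinates to flag matches that are too distant to be representative. Horizontal distance ignores elevation differences, which matter a great deal for snow in alpine terrain.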
