
# 🌍 Open-Meteo - Data Engineering Project

In the previous **Silver Layer**, we cleaned and standardized the raw weather data, ensuring type consistency, proper formatting, and reliable structure.

Now, in the **Gold Layer**, we elevate our data to a **business-level view** — transforming clean technical data into **actionable insights**. This layer is designed to support **reporting**, **analytics**, and **decision-making**.



### 🎯 Objectives

We will perform the following transformations:

- **Cross-table merging**:
  - Join each weather table with the **geolocation** table and the **weather condition codes**.
  - Enrich weather observations with geographical and categorical context.

- **Aggregations and Grouping**:
  - Compute **global** and **per-city** historical weather statistics.
  - Generate **monthly** and **yearly** temperature summaries.
  - Evaluate **forecast accuracy** by comparing forecasted vs. actual historical data (*future implementation*).

- **Statistical Analysis** *(optional/advanced)*:
  - Explore correlations (e.g., between **latitude/longitude**, **elevation**, and **weather variables**) (*future implementation*).




### 📦 Available Silver Tables

We will use the following inputs from the Silver Layer:

- `📍 geolocation` — Cleaned metadata for cities and coordinates.
- `🌤 weathercode` — Descriptions and classification of weather conditions.
- `☁ current_weather` — Real-time conditions at time of data collection.
- `📆 forecast_daily_weather` — Daily-level weather predictions.
- `🕒 forecast_hourly_weather` — Hourly-level weather predictions.
- `📈 historical_weather` — Actual observed weather data.



In [1]:
# Import libraries
import os
import sys

# Summary reports about a DataFrame; only used for investigation
from ydata_profiling import ProfileReport
# DeltaTable is used below to read the Gold tables back
# (it may already be re-exported by DF_functions)
from deltalake import DeltaTable

# Add the modules directory to the import path
my_modules_dir = "/Users/focus_profond/UTN/Data_engineering/proyecto/UTN_data_engineering_project/Entrega_Final/Modules"
if my_modules_dir not in sys.path:
    sys.path.append(my_modules_dir)

# Import personal modules
from DF_functions import *
from openmeteo_API import *

# Move to the parent directory so that the relative 'Data/...' paths resolve
os.chdir('../')




Module executed in: /Users/focus_profond/UTN/Data_engineering/proyecto/UTN_data_engineering_project/utn_env/bin/python


### 🌐 3.1 Merge Weather Tables with Geolocation & Weather Descriptions

In this step, we will enrich each weather dataset — **Current Weather**, **Forecast Daily**, **Forecast Hourly**, and **Historical Weather** — with geographical and descriptive context.

📊 The goal is to create **business-level weather tables**, combining:
- **Raw meteorological metrics** (temperature, humidity, precipitation, etc.)
- **Geographical information** (city name, country, coordinates, elevation, etc.)
- **Semantic descriptions** (decoded weather codes)

🔗 We will perform a two-step join:
1. Merge the weather data with the **geolocation table** using the `latitude` and `longitude` columns as the location key.
2. Merge the result with the **weather code table** using the `weather_code` column.

This will allow us to produce rich datasets ready for:
- **Business analysis**
- **Visualization**
- **Advanced correlation or predictive modeling**
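
The two-step join described above can be sketched in plain pandas. This is a minimal illustration on toy data, not the project's `merging_df` helper (whose internals live in the personal `DF_functions` module); the DataFrame names and the sample values are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the Silver tables (illustrative values only).
weather = pd.DataFrame({
    "city": ["Barcelona", "Boston"],
    "latitude": [41.38879, 42.35843],
    "longitude": [2.15899, -71.05977],
    "weather_code": [2, 3],
    "temperature_2m_C": [16.95, 2.62],
})
geolocation = pd.DataFrame({
    "latitude": [41.38879, 42.35843],
    "longitude": [2.15899, -71.05977],
    "country": ["Spain", "United States"],
    "elevation_m": [15.0, 14.0],
})
weathercode = pd.DataFrame({
    "weather_code": [2, 3],
    "description": ["Partly cloudy", "Overcast"],
})

# Step 1: left join on the coordinate pair, so every weather row is kept.
# validate= raises if the geolocation table has duplicate coordinates.
with_geo = weather.merge(
    geolocation, how="left", on=["latitude", "longitude"], validate="many_to_one"
)

# Step 2: left join on the weather code to attach its description.
enriched = with_geo.merge(
    weathercode, how="left", on="weather_code", validate="many_to_one"
)

print(enriched[["city", "country", "elevation_m", "description"]])
```

Because the join keys are floating-point coordinates, the match relies on both tables storing exactly the same values; a left join leaves the geolocation columns empty for any row whose coordinates differ.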


##### 🌐 3.1.1. Current weather

In [2]:
path1 = 'Data/Silver/OpenMeteo/Current'
path2 = 'Data/Silver/OpenMeteo/Others/Geolocation'
path3 = 'Data/Silver/OpenMeteo/Others/WeatherCode'
type_merge = 'left'
df1_keys = ['latitude','longitude']
df2_keys = ['latitude','longitude']
df2_columns = ['elevation_m','timezone','population','country','latitude','longitude']

df1a_keys = ['weather_code']
df2a_keys = ['weather_code']
df2a_columns = ['weather_code','description']

# Left join: enrich the current weather data with the geolocation attributes
df_current = merging_df(path1,path2,type_merge,df1_keys,df2_keys,df2_columns)
# Left join: add the weather code description
df_current_full = merging_df(df_current,path3,type_merge,df1a_keys,df2a_keys,df2a_columns)
my_dt = df_current_full

# Add new columns based on temperature thresholds
my_dt['warm_enough'] = my_dt['temperature_2m_C'] > 23
my_dt['too_cold'] = my_dt['temperature_2m_C'] < 10

In [None]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Current'
predicate = "target.date = source.date AND target.time = source.time AND target.city = source.city"
partition_cols = "date"
save_new_data_as_delta(my_dt,name_folder,predicate = predicate, partition_cols=partition_cols, layer = 'Gold', source = 'open-meteo-current',author='Augustin')

# Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,date,time,city,longitude,latitude,surface_pressure_hPa,snowfall_cm,is_day,wind_gusts_10m,pressure_msl_hPa,...,weather_code,precipitation_mm,temperature_2m_C,warm_enough,too_cold,elevation_m,timezone,population,country,description
0,2025-04-12,23:39,Barcelona,2.15899,41.38879,1015.924927,0.0,True,31.68,1021.200012,...,2,0.0,16.950001,False,False,15.0,Europe/Madrid,1621537,Spain,Partly cloudy
1,2025-04-12,23:39,Biu,12.19458,10.61285,926.992493,0.0,False,18.359999,1010.099976,...,0,0.0,25.6,True,False,762.0,Africa/Lagos,95005,Nigeria,Clear sky
2,2025-04-12,23:39,Boston,-71.05977,42.35843,1013.312744,0.0,False,47.519997,1015.700012,...,3,0.0,2.622,False,True,14.0,America/New_York,667137,United States,Overcast
3,2025-04-12,23:39,Brussels,4.34878,50.85045,1024.516846,0.0,True,28.08,1027.800049,...,3,0.0,15.05,False,False,28.0,Europe/Brussels,1019022,Belgium,Overcast
4,2025-04-12,23:39,Buenos Aires,-58.37723,-34.61315,1016.168091,0.0,True,37.439999,1018.400024,...,3,0.0,22.65,False,False,31.0,America/Argentina/Buenos_Aires,13076300,Argentina,Overcast
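
The storage call above passes a merge predicate on the date, time and city columns. `save_new_data_as_delta` belongs to the personal `DF_functions` module, so its internals are not shown here. Assuming it inserts only the rows whose key combination is not yet present in the target table, the idea can be sketched in plain pandas; `keep_only_new_rows` is a hypothetical helper written for this illustration, and the sample rows are toy data.

```python
import pandas as pd

def keep_only_new_rows(target: pd.DataFrame, source: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Return the rows of `source` whose key combination is absent from `target`."""
    existing = target[keys].drop_duplicates()
    # indicator=True adds a `_merge` column telling where each row was found.
    flagged = source.merge(existing, how="left", on=keys, indicator=True)
    is_new = flagged["_merge"] == "left_only"
    return flagged.loc[is_new, list(source.columns)].reset_index(drop=True)

# Rows already stored in the (hypothetical) target table.
target = pd.DataFrame({
    "date": ["2025-04-12"],
    "time": ["23:39"],
    "city": ["Barcelona"],
    "temperature_2m_C": [16.95],
})
# Freshly collected rows: one duplicate, one new.
source = pd.DataFrame({
    "date": ["2025-04-12", "2025-04-12"],
    "time": ["23:39", "23:39"],
    "city": ["Barcelona", "Boston"],
    "temperature_2m_C": [16.95, 2.62],
})

new_rows = keep_only_new_rows(target, source, ["date", "time", "city"])
print(new_rows)
```

Only the Boston row survives the filter, which is the behavior one would expect from an insert-only merge keyed on the same three columns.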


##### 🌐 3.1.2. Forecast daily weather

In [4]:
path1 = 'Data/Silver/OpenMeteo/Forecast/Daily'
path2 = 'Data/Silver/OpenMeteo/Others/Geolocation'
path3 = 'Data/Silver/OpenMeteo/Others/WeatherCode'
type_merge = 'left'
df1_keys = ['latitude','longitude']
df2_keys = ['latitude','longitude']
df2_columns = ['elevation_m','timezone','population','country','latitude','longitude']

df1a_keys = ['weather_code']
df2a_keys = ['weather_code']
df2a_columns = ['weather_code','description']

# Left join: enrich the forecast daily weather data with the geolocation attributes
df_current = merging_df(path1,path2,type_merge,df1_keys,df2_keys,df2_columns)
# Left join: add the weather code description
df_current_full = merging_df(df_current,path3,type_merge,df1a_keys,df2a_keys,df2a_columns)

my_dt = df_current_full

# Add new columns based on conditions
my_dt['warm_enough'] = my_dt['temperature_2m_max_C'] > 23
my_dt['too_cold'] = my_dt['temperature_2m_min_C'] < 10
my_dt['sufficient_sunshine'] = my_dt['sunshine_duration_hours'] > 6

my_dt.head()

Unnamed: 0,requested_date,city,forecast_day,weather_code,apparent_temperature_min_C,sunshine_duration_seconds,rain_sum_mm,precipitation_probability_max_inPercent,shortwave_radiation_sum_MJm2,temperature_2m_max_C,...,daylight_duration_minutes,daylight_duration_hours,elevation_m,timezone,population,country,description,warm_enough,too_cold,sufficient_sunshine
0,2025-04-12,Barcelona,2025-04-15 00:00:00+00:00,95,10.728018,43362.71875,0.0,68.0,21.540001,20.271,...,800.52771,13.342129,15.0,Europe/Madrid,1621537,Spain,,False,False,True
1,2025-04-12,Barcelona,2025-04-16 00:00:00+00:00,80,10.020213,29489.005859,0.6,88.0,14.97,17.821001,...,803.131897,13.385531,15.0,Europe/Madrid,1621537,Spain,Rain showers slight,False,False,True
2,2025-04-12,Barcelona,2025-04-17 00:00:00+00:00,3,5.977081,44506.859375,0.0,15.0,23.42,20.0825,...,805.713745,13.428562,15.0,Europe/Madrid,1621537,Spain,Overcast,False,True,True
3,2025-04-12,Chicago,2025-04-18 00:00:00+00:00,51,2.674845,30807.576172,0.3,48.0,12.81,20.925999,...,809.530457,13.492174,179.0,America/Chicago,2720546,United States,Drizzle light intensity,False,True,True
4,2025-04-12,La Paz,2025-04-18 00:00:00+00:00,3,17.340675,43364.820312,0.0,2.0,26.75,32.548,...,767.507751,12.791796,47.0,America/Mazatlan,215178,Mexico,Overcast,True,False,True


In [5]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Forecast/Daily'
predicate = """target.requested_date = source.requested_date AND target.city = source.city and target.forecast_day = source.forecast_day """
partition_cols = ["requested_date"]

save_new_data_as_delta(my_dt,name_folder,predicate= predicate, partition_cols=partition_cols,layer = 'Gold', source= 'open-meteo-forecast-daily', author ='Augustin')

# Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,requested_date,city,forecast_day,weather_code,apparent_temperature_min_C,sunshine_duration_seconds,rain_sum_mm,precipitation_probability_max_inPercent,shortwave_radiation_sum_MJm2,temperature_2m_max_C,...,daylight_duration_minutes,daylight_duration_hours,warm_enough,too_cold,sufficient_sunshine,elevation_m,timezone,population,country,description
0,2025-04-12,Biu,2025-04-12 00:00:00+00:00,3,21.073042,39960.453125,0.0,0.0,26.219999,36.323498,...,740.483826,12.341397,True,False,True,762.0,Africa/Lagos,95005,Nigeria,Overcast
1,2025-04-12,Biu,2025-04-14 00:00:00+00:00,3,18.443285,41903.117188,0.0,0.0,27.799999,37.4235,...,741.623596,12.360393,True,False,True,762.0,Africa/Lagos,95005,Nigeria,Overcast
2,2025-04-12,Biu,2025-04-15 00:00:00+00:00,3,22.393799,39693.480469,0.0,3.0,27.26,38.573498,...,742.186218,12.36977,True,False,True,762.0,Africa/Lagos,95005,Nigeria,Overcast
3,2025-04-12,Biu,2025-04-16 00:00:00+00:00,2,23.345686,42075.976562,0.0,10.0,28.16,38.823498,...,742.743408,12.379057,True,False,True,762.0,Africa/Lagos,95005,Nigeria,Partly cloudy
4,2025-04-12,Biu,2025-04-17 00:00:00+00:00,3,26.679255,42106.542969,0.0,0.0,25.870001,38.973499,...,743.295532,12.388259,True,False,True,762.0,Africa/Lagos,95005,Nigeria,Overcast



##### 🌐 3.1.3. Forecast hourly weather

In [6]:
path1 = 'Data/Silver/OpenMeteo/Forecast/Hourly'
path2 = 'Data/Silver/OpenMeteo/Others/Geolocation'
path3 = 'Data/Silver/OpenMeteo/Others/WeatherCode'
type_merge = 'left'
df1_keys = ['latitude','longitude']
df2_keys = ['latitude','longitude']
df2_columns = ['elevation_m','timezone','population','country','latitude','longitude']

df1a_keys = ['weather_code']
df2a_keys = ['weather_code']
df2a_columns = ['weather_code','description']

# Left join: enrich the forecast hourly weather data with the geolocation attributes
df_current = merging_df(path1,path2,type_merge,df1_keys,df2_keys,df2_columns)
# Left join: add the weather code description
df_current_full = merging_df(df_current,path3,type_merge,df1a_keys,df2a_keys,df2a_columns)
my_dt = df_current_full

# Add new columns based on conditions
my_dt['warm_enough'] = my_dt['temperature_2m_C'] > 23
my_dt['too_cold'] = my_dt['temperature_2m_C'] < 10
my_dt['too_much_cloud'] = my_dt['cloud_cover_inPercent'] > 65
my_dt.head()


Unnamed: 0,requested_date,city,forecast_date,forecast_hour,longitude,latitude,soil_moisture_27_to_81cm_m3m3,soil_moisture_9_to_27cm_m3m3,soil_moisture_3_to_9cm_m3m3,soil_moisture_1_to_3cm_m3m3,...,relative_humidity_2m_inPercent,temperature_2m_C,elevation_m,timezone,population,country,description,warm_enough,too_cold,too_much_cloud
0,2025-04-12,Rio de Janeiro,2025-04-13,20:00,-43.18223,-22.90642,0.373,0.35,0.328,0.306,...,62.0,27.575998,12.0,America/Sao_Paulo,6023699,Brazil,Mainly clear,True,False,False
1,2025-04-12,Barcelona,2025-04-10,12:00,2.15899,41.38879,0.317,0.286,0.272,0.265,...,74.0,17.371,15.0,Europe/Madrid,1621537,Spain,Partly cloudy,False,False,True
2,2025-04-12,Barcelona,2025-04-10,15:00,2.15899,41.38879,0.317,0.286,0.271,0.261,...,73.0,17.171001,15.0,Europe/Madrid,1621537,Spain,Partly cloudy,False,False,False
3,2025-04-12,Barcelona,2025-04-11,00:00,2.15899,41.38879,0.317,0.285,0.269,0.264,...,97.0,12.321,15.0,Europe/Madrid,1621537,Spain,Fog,False,False,False
4,2025-04-12,Barcelona,2025-04-11,01:00,2.15899,41.38879,0.316,0.285,0.269,0.264,...,98.0,11.821,15.0,Europe/Madrid,1621537,Spain,Fog,False,False,True


In [7]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Forecast/Hourly'
partition_cols = ["requested_date"]
predicate = """target.requested_date = source.requested_date  AND target.city = source.city AND target.forecast_date = source.forecast_date and target.forecast_hour = source.forecast_hour """

upsert_data_as_delta(my_dt,name_folder,predicate= predicate, partition_cols=partition_cols, layer = 'Gold', source= 'open-meteo-forecast-hourly', author ='Augustin')

# Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,requested_date,city,forecast_date,forecast_hour,longitude,latitude,soil_moisture_27_to_81cm_m3m3,soil_moisture_9_to_27cm_m3m3,soil_moisture_3_to_9cm_m3m3,soil_moisture_1_to_3cm_m3m3,...,relative_humidity_2m_inPercent,temperature_2m_C,warm_enough,too_cold,too_much_cloud,elevation_m,timezone,population,country,description
0,2025-04-12,Biu,2025-04-12,11:00,12.19458,10.61285,0.156,0.128,0.11,0.085,...,12.0,34.073498,True,False,False,762.0,Africa/Lagos,95005,Nigeria,Clear sky
1,2025-04-12,Biu,2025-04-12,15:00,12.19458,10.61285,0.156,0.127,0.11,0.082,...,11.0,35.973499,True,False,False,762.0,Africa/Lagos,95005,Nigeria,Clear sky
2,2025-04-12,Biu,2025-04-13,05:00,12.19458,10.61285,0.156,0.127,0.107,0.077,...,28.0,22.473499,False,False,False,762.0,Africa/Lagos,95005,Nigeria,Clear sky
3,2025-04-12,Biu,2025-04-13,08:00,12.19458,10.61285,0.156,0.126,0.107,0.077,...,19.0,27.723499,True,False,False,762.0,Africa/Lagos,95005,Nigeria,Clear sky
4,2025-04-12,Biu,2025-04-14,11:00,12.19458,10.61285,0.155,0.124,0.103,0.071,...,13.0,35.473499,True,False,False,762.0,Africa/Lagos,95005,Nigeria,Clear sky


##### 🌐 3.1.4. Historical daily weather

In [8]:
path1 = 'Data/Silver/OpenMeteo/Historical/Daily'
path2 = 'Data/Silver/OpenMeteo/Others/Geolocation'
path3 = 'Data/Silver/OpenMeteo/Others/WeatherCode'
type_merge = 'left'
df1_keys = ['latitude','longitude']
df2_keys = ['latitude','longitude']
df2_columns = ['elevation_m','timezone','population','country','latitude','longitude']

df1a_keys = ['weather_code']
df2a_keys = ['weather_code']
df2a_columns = ['weather_code','description']

# Left join: enrich the historical daily weather data with the geolocation attributes
df_current = merging_df(path1,path2,type_merge,df1_keys,df2_keys,df2_columns)
# Left join: add the weather code description
df_current_full = merging_df(df_current,path3,type_merge,df1a_keys,df2a_keys,df2a_columns)
my_dt = df_current_full

# Add new columns based on conditions
my_dt['warm_enough'] = my_dt['temperature_2m_max_C'] > 23
my_dt['too_cold'] = my_dt['temperature_2m_min_C'] < 10
my_dt['sufficient_sunshine'] = my_dt['sunshine_duration_hours'] > 6

my_dt.head()

Unnamed: 0,city,historical_date,historical_year,historical_month,historical_day,longitude,latitude,wind_direction_10m_dominant_deg,precipitation_hours_h,precipitation_sum,...,daylight_duration_minutes,daylight_duration_hours,warm_enough,too_cold,sufficient_sunshine,elevation_m,timezone,population,country,description
0,Barcelona,2013-01-18,2013,1,Fri,2.15899,41.38879,245.865112,9.0,2.4,...,576.67981,9.61133,False,True,False,15.0,Europe/Madrid,1621537,Spain,Drizzle light intensity
1,Barcelona,2013-01-26,2013,1,Sat,2.15899,41.38879,35.972466,4.0,1.2,...,592.010498,9.866841,False,True,True,15.0,Europe/Madrid,1621537,Spain,Drizzle moderate intensity
2,Barcelona,2013-01-24,2013,1,Thu,2.15899,41.38879,299.295288,3.0,0.6,...,587.966187,9.799437,False,True,True,15.0,Europe/Madrid,1621537,Spain,Drizzle light intensity
3,Barcelona,2013-02-15,2013,2,Fri,2.15899,41.38879,211.914871,0.0,0.0,...,639.140747,10.652346,False,True,True,15.0,Europe/Madrid,1621537,Spain,Overcast
4,Barcelona,2013-02-03,2013,2,Sun,2.15899,41.38879,327.231506,0.0,0.0,...,609.381714,10.156362,False,True,True,15.0,Europe/Madrid,1621537,Spain,Mainly clear


In [9]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Historical/Daily'
predicate = """target.city = source.city AND target.historical_date = source.historical_date"""
partition_cols = ["historical_year"]
# Works with a small amount of data; to be re-checked as the table grows.
save_new_data_as_delta(my_dt,name_folder,predicate= predicate, partition_cols=partition_cols, layer = 'Gold', source= 'open-meteo-historical-daily', author ='Augustin')

#Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()



Unnamed: 0,city,historical_date,historical_year,historical_month,historical_day,longitude,latitude,wind_direction_10m_dominant_deg,precipitation_hours_h,precipitation_sum,...,daylight_duration_minutes,daylight_duration_hours,warm_enough,too_cold,sufficient_sunshine,elevation_m,timezone,population,country,description
0,Barcelona,2013-01-28,2013,1,Mon,2.15899,41.38879,300.627197,5.0,2.8,...,596.161682,9.936028,False,True,True,15.0,Europe/Madrid,1621537,Spain,Drizzle dense intensity
1,Barcelona,2013-01-27,2013,1,Sun,2.15899,41.38879,282.633362,2.0,0.8,...,594.073914,9.901232,False,True,True,15.0,Europe/Madrid,1621537,Spain,Drizzle moderate intensity
2,Barcelona,2013-01-17,2013,1,Thu,2.15899,41.38879,116.564987,4.0,0.5,...,574.950378,9.582506,False,True,True,15.0,Europe/Madrid,1621537,Spain,Drizzle light intensity
3,Barcelona,2013-02-04,2013,2,Mon,2.15899,41.38879,301.835205,0.0,0.0,...,611.725891,10.195432,False,True,True,15.0,Europe/Madrid,1621537,Spain,Partly cloudy
4,Barcelona,2013-02-25,2013,2,Mon,2.15899,41.38879,49.23642,0.0,0.0,...,665.394775,11.089913,False,True,True,15.0,Europe/Madrid,1621537,Spain,Overcast


### 📈 3.2 Apply Aggregations & Comparative Analysis

In this section, we will apply aggregation and comparative logic to the cleaned weather datasets, in order to extract **meaningful insights** and **business-level metrics**.

📊 **Available Datasets:**
- `current_weather`
- `forecast_daily_weather`
- `forecast_hourly_weather`
- `historical_weather`
- `geolocation`
- `weathercode`


##### 🧮 Step 1 — Compute Aggregations on Historical Data

We will analyze **historical weather patterns** by computing the following:
- **Global averages per city**: one value per city across the whole dataset.
- **Monthly averages per city**: to observe seasonal variations.
- **Yearly averages per city**: to analyze long-term climate trends.
- **Global monthly and yearly averages**: the same summaries computed across all cities, useful for high-level comparisons.

These aggregations will help in detecting **climatic trends**, **temperature shifts**, or **anomalies** over time.
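
As a rough sketch of these aggregation levels, the snippet below groups a toy daily table by different key sets and flattens the resulting two-level column index into names such as `temperature_2m_max_C_MEAN`. The `summarize` function is a hypothetical stand-in for the project's `aggregate_dataframe` helper, whose actual implementation is not shown in this notebook; the sample values are invented for the example.

```python
import pandas as pd

# Toy daily records (illustrative values only).
daily = pd.DataFrame({
    "city": ["Barcelona", "Barcelona", "Barcelona", "Boston", "Boston", "Boston"],
    "historical_year": [2010, 2010, 2011, 2010, 2010, 2011],
    "historical_month": [1, 1, 1, 1, 2, 1],
    "temperature_2m_max_C": [11.0, 13.0, 12.0, 2.0, 4.0, 3.0],
    "weather_code": [3, 3, 51, 71, 3, 3],
})

def summarize(df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Aggregate per key set and flatten the (column, statistic) index."""
    out = df.groupby(keys).agg({
        "temperature_2m_max_C": ["min", "max", "mean"],
        # Most frequent weather code in the group (first one on ties).
        "weather_code": lambda s: s.mode().iloc[0],
    })
    out.columns = [f"{col}_{stat.upper()}" for col, stat in out.columns]
    return out.reset_index()

per_city_month = summarize(daily, ["city", "historical_year", "historical_month"])
per_city_year = summarize(daily, ["city", "historical_year"])
per_month = summarize(daily, ["historical_year", "historical_month"])
per_year = summarize(daily, ["historical_year"])

print(per_city_year)
```

Only the grouping keys change between the four summaries, so a single helper taking the key list covers all of them.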



##### 🔍 Step 2 — Forecast vs. Historical Comparison (*future implementation*)

We will assess the **forecast accuracy** by comparing forecasted data with historical records. Specifically:
- For **daily forecasts**, compare each metric with historical daily averages.
- For **hourly forecasts**, compare with historical hourly patterns.

We will compute a **forecast accuracy score**, defined as a **percentage difference** between forecasted and actual historical values.

This enables:
- **Quantitative evaluation** of forecast reliability
- **City-level forecast performance analysis**
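
Since this step is not implemented yet, the following is only a sketch of how such a score could be computed, assuming the percentage difference is taken as `|forecast - actual| / |actual| * 100` on a single metric. The toy values, the `abs_pct_error` column name, and the alignment of `forecast_day` with `historical_date` are assumptions made for the illustration.

```python
import pandas as pd

# Toy forecast and observed values (illustrative only).
forecast = pd.DataFrame({
    "city": ["Barcelona", "Boston"],
    "forecast_day": ["2025-04-15", "2025-04-15"],
    "temperature_2m_max_C": [20.0, 9.0],
})
actual = pd.DataFrame({
    "city": ["Barcelona", "Boston"],
    "historical_date": ["2025-04-15", "2025-04-15"],
    "temperature_2m_max_C": [25.0, 10.0],
})

# Align both tables on city and day; the shared metric gets a suffix per side.
compared = forecast.merge(
    actual.rename(columns={"historical_date": "forecast_day"}),
    how="inner",
    on=["city", "forecast_day"],
    suffixes=("_forecast", "_actual"),
)

# Absolute percentage difference between forecast and observation.
diff = compared["temperature_2m_max_C_forecast"] - compared["temperature_2m_max_C_actual"]
compared["abs_pct_error"] = diff.abs() * 100 / compared["temperature_2m_max_C_actual"].abs()

# City-level forecast performance: mean error per city.
per_city_error = compared.groupby("city")["abs_pct_error"].mean()
print(per_city_error)
```

A percentage-based score becomes unstable when the observed value is close to zero, which happens often with temperatures in °C; an absolute error in degrees may be a safer complement for such metrics.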



##### 🧠 Step 3 — Additional Exploratory Metrics (*optional, future implementation*)

- Correlation analysis between **latitude, longitude, and elevation** with temperature and precipitation
- Identify **outliers** or **unexpected weather behavior** in specific regions
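
As this step is also left for later, here is only a small sketch of what the correlation analysis might look like, assuming one row per city with its coordinates, elevation and a temperature summary. The numbers are invented so that the result is easy to check by eye; they are not taken from the project's data.

```python
import pandas as pd

# One row per city (illustrative values only): temperature falls steadily with latitude.
cities = pd.DataFrame({
    "latitude": [10.0, 20.0, 30.0, 40.0, 50.0],
    "elevation_m": [700.0, 15.0, 300.0, 30.0, 28.0],
    "temperature_2m_max_C": [35.0, 30.0, 25.0, 20.0, 15.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = cities.corr(method="pearson")
print(corr.round(2))

# Correlation of each geographic variable with the temperature metric.
print(corr["temperature_2m_max_C"].drop("temperature_2m_max_C"))
```

In this toy table the latitude and temperature columns are perfectly linearly related, so their coefficient is -1; real data would of course give weaker values.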



##### 🧮 3.2.1. Means per city per month

In [10]:
name_folder = 'Data/Gold/OpenMeteo/Historical/Daily'
my_dt = DeltaTable(name_folder).to_pandas()
groupby = ['city','historical_year', 'historical_month']
agg_columns = {'wind_direction_10m_dominant_deg':['min','max','mean']
               ,'precipitation_hours_h':'mean'
               ,'precipitation_sum':'mean'
               ,'apparent_temperature_max_C':'mean'
               ,'daylight_duration_seconds':'mean'
               ,'wind_gusts_10m_max_kmh':'mean'
               ,'snowfall_sum_cm':'mean'
               , 'temperature_2m_min_C':'mean'
               ,'et0_fao_evapotranspiration_mm':'mean'
               ,'wind_speed_10m_max_kmh':'mean'
               ,'showers_sum_mm':'mean'
               ,'temperature_2m_max_C':'mean'
               ,'shortwave_radiation_sum_MJm2':'mean'
               ,'rain_sum_mm':'mean'
               ,'sunshine_duration_seconds':'mean'
               ,'apparent_temperature_min_C':'mean'
               ,'sunshine_duration_minutes':'mean'
               ,'sunshine_duration_hours':'mean'
               ,'daylight_duration_minutes':'mean'
               ,'daylight_duration_hours':'mean'
               ,'weather_code':lambda x: x.mode().iloc[0] }

my_dt =aggregate_dataframe(my_dt, groupby, agg_columns)
my_dt.head()

Unnamed: 0,city,historical_year,historical_month,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,Barcelona,2010,1,21.209312,340.920624,181.725616,4.903226,2.377419,8.945212,34491.246094,...,11.314355,6.125484,2.254839,19725.25,3.090729,328.75415,5.479236,574.854065,9.580901,3
1,Barcelona,2010,2,6.916732,314.799225,183.86293,4.142857,2.892857,9.794568,38254.699219,...,12.696786,9.768214,2.867857,25891.765625,2.730789,431.529388,7.192157,637.57843,10.626306,3
2,Barcelona,2010,3,22.299452,358.054688,188.971069,3.258065,2.564516,12.013682,43027.148438,...,14.230483,15.243226,1.977419,33314.097656,4.588932,555.234985,9.253916,717.119141,11.951986,3
3,Barcelona,2010,4,27.315474,275.102081,183.652283,1.566667,0.523333,17.23802,48013.386719,...,17.67,19.0,0.523333,37735.097656,9.062257,628.918213,10.48197,800.223083,13.337051,3
4,Barcelona,2010,5,34.653992,281.52713,170.053055,5.193548,3.551613,19.99824,52247.675781,...,19.461128,21.741613,3.551613,43421.300781,12.292632,723.688354,12.061473,870.794678,14.513245,51
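
One caveat on the `wind_direction_10m_dominant_deg` mean: a direction is an angle, so an arithmetic mean can be misleading (350° and 10° average to 180°, although both point roughly north). The notebook uses the plain mean; the snippet below is only a hedged sketch of an alternative, with `circular_mean_deg` as a hypothetical helper that is not part of the project's modules.

```python
import numpy as np

def circular_mean_deg(angles_deg) -> float:
    """Mean direction in degrees, computed from the mean unit vector."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    # Map the result from (-180, 180] back to [0, 360).
    return float(np.rad2deg(mean_angle) % 360.0)

northish = [350.0, 10.0]
print(np.mean(northish))            # arithmetic mean of the raw degrees
print(circular_mean_deg(northish))  # circular mean, close to north
print(circular_mean_deg([80.0, 100.0]))
```

Such a function could be passed to the aggregation dictionary in place of `'mean'` for the wind direction column, assuming the aggregation helper accepts callables as it does for `weather_code`.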


In [11]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Historical/Calculations/PerCityPerMonth'
save_data_as_delta(my_dt, name_folder, mode="overwrite", layer='Gold',source='open-meteo-historical-daily', author='Augustin')

#Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,city,historical_year,historical_month,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,Barcelona,2010,1,21.209312,340.920624,181.725616,4.903226,2.377419,8.945212,34491.246094,...,11.314355,6.125484,2.254839,19725.25,3.090729,328.75415,5.479236,574.854065,9.580901,3
1,Barcelona,2010,2,6.916732,314.799225,183.86293,4.142857,2.892857,9.794568,38254.699219,...,12.696786,9.768214,2.867857,25891.765625,2.730789,431.529388,7.192157,637.57843,10.626306,3
2,Barcelona,2010,3,22.299452,358.054688,188.971069,3.258065,2.564516,12.013682,43027.148438,...,14.230483,15.243226,1.977419,33314.097656,4.588932,555.234985,9.253916,717.119141,11.951986,3
3,Barcelona,2010,4,27.315474,275.102081,183.652283,1.566667,0.523333,17.23802,48013.386719,...,17.67,19.0,0.523333,37735.097656,9.062257,628.918213,10.48197,800.223083,13.337051,3
4,Barcelona,2010,5,34.653992,281.52713,170.053055,5.193548,3.551613,19.99824,52247.675781,...,19.461128,21.741613,3.551613,43421.300781,12.292632,723.688354,12.061473,870.794678,14.513245,51


##### 🧮 3.2.2. Means per city per year

In [12]:
name_folder = 'Data/Gold/OpenMeteo/Historical/Daily'
my_dt = DeltaTable(name_folder).to_pandas()
groupby = ['city','historical_year']
agg_columns = {'wind_direction_10m_dominant_deg':['min','max','mean']
               ,'precipitation_hours_h':'mean'
               ,'precipitation_sum':'mean'
               ,'apparent_temperature_max_C':'mean'
               ,'daylight_duration_seconds':'mean'
               ,'wind_gusts_10m_max_kmh':'mean'
               ,'snowfall_sum_cm':'mean'
               , 'temperature_2m_min_C':'mean'
               ,'et0_fao_evapotranspiration_mm':'mean'
               ,'wind_speed_10m_max_kmh':'mean'
               ,'showers_sum_mm':'mean'
               ,'temperature_2m_max_C':'mean'
               ,'shortwave_radiation_sum_MJm2':'mean'
               ,'rain_sum_mm':'mean'
               ,'sunshine_duration_seconds':'mean'
               ,'apparent_temperature_min_C':'mean'
               ,'sunshine_duration_minutes':'mean'
               ,'sunshine_duration_hours':'mean'
               ,'daylight_duration_minutes':'mean'
               ,'daylight_duration_hours':'mean'
               ,'weather_code':lambda x: x.mode().iloc[0] }

my_dt =aggregate_dataframe(my_dt, groupby, agg_columns)
my_dt.head(2)

Unnamed: 0,city,historical_year,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,wind_gusts_10m_max_kmh_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,Barcelona,2010,3.813969,358.054688,189.707962,3.39726,1.956438,18.843021,43955.074219,31.305204,...,18.988974,15.075836,1.893151,34342.546875,11.756217,572.375793,9.539597,732.584595,12.209743,3
1,Barcelona,2011,2.076218,359.999939,181.439728,2.654794,1.599452,20.602261,43955.238281,29.398684,...,20.298698,15.863863,1.599452,36369.554688,13.244725,606.159302,10.102654,732.587341,12.209788,3


In [13]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Historical/Calculations/PerCityPerYear'
save_data_as_delta(my_dt, name_folder, mode="overwrite", layer='Gold',source='open-meteo-historical-daily', author='Augustin')

#Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,city,historical_year,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,wind_gusts_10m_max_kmh_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,Barcelona,2010,3.813969,358.054688,189.707962,3.39726,1.956438,18.843021,43955.074219,31.305204,...,18.988974,15.075836,1.893151,34342.546875,11.756217,572.375793,9.539597,732.584595,12.209743,3
1,Barcelona,2011,2.076218,359.999939,181.439728,2.654794,1.599452,20.602261,43955.238281,29.398684,...,20.298698,15.863863,1.599452,36369.554688,13.244725,606.159302,10.102654,732.587341,12.209788,3
2,Barcelona,2012,0.473433,355.962402,190.508514,2.073771,1.245082,20.143944,43925.964844,30.775082,...,20.23571,16.407732,1.238525,36560.914062,12.451295,609.348511,10.155809,732.099487,12.201657,3
3,Barcelona,2013,4.6e-05,359.610168,197.353226,2.794521,1.670411,19.750835,43955.203125,31.835836,...,19.867056,15.61252,1.663288,35721.398438,12.273068,595.356689,9.922611,732.586731,12.209779,3
4,Barcelona,2016,0.364887,359.692749,183.21312,2.527322,1.45765,20.659861,43926.230469,30.89213,...,20.535559,15.748087,1.45765,36032.515625,13.077597,600.541992,10.009032,732.103821,12.201731,3


##### 🧮 3.2.3. Means across all cities per month

In [14]:
name_folder = 'Data/Gold/OpenMeteo/Historical/Daily'
my_dt = DeltaTable(name_folder).to_pandas()
groupby = ['historical_year','historical_month']
agg_columns = {'wind_direction_10m_dominant_deg':['min','max','mean']
               ,'precipitation_hours_h':'mean'
               ,'precipitation_sum':'mean'
               ,'apparent_temperature_max_C':'mean'
               ,'daylight_duration_seconds':'mean'
               ,'wind_gusts_10m_max_kmh':'mean'
               ,'snowfall_sum_cm':'mean'
               , 'temperature_2m_min_C':'mean'
               ,'et0_fao_evapotranspiration_mm':'mean'
               ,'wind_speed_10m_max_kmh':'mean'
               ,'showers_sum_mm':'mean'
               ,'temperature_2m_max_C':'mean'
               ,'shortwave_radiation_sum_MJm2':'mean'
               ,'rain_sum_mm':'mean'
               ,'sunshine_duration_seconds':'mean'
               ,'apparent_temperature_min_C':'mean'
               ,'sunshine_duration_minutes':'mean'
               ,'sunshine_duration_hours':'mean'
               ,'daylight_duration_minutes':'mean'
               ,'daylight_duration_hours':'mean'
               ,'weather_code':lambda x: x.mode().iloc[0] }

my_dt =aggregate_dataframe(my_dt, groupby, agg_columns)
my_dt.head(5)

Unnamed: 0,historical_year,historical_month,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,wind_gusts_10m_max_kmh_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,2010,1,1.432091,358.986053,189.476181,3.37276,2.168459,11.686591,38814.847656,35.526451,...,13.011835,12.511756,1.901434,26589.986328,3.964203,443.166443,7.386107,646.914124,10.781903,3
1,2010,2,1.537713,359.610809,189.849503,4.142857,3.007738,12.858019,40825.683594,38.695,...,14.246036,13.815198,2.449008,27422.53125,5.02788,457.042206,7.61737,680.428101,11.340467,3
2,2010,3,1.77875,359.714386,182.350708,3.281362,2.878495,16.092392,43342.796875,38.107098,...,17.288538,16.216846,2.735305,31881.773438,7.396659,531.362915,8.856049,722.379944,12.039666,3
3,2010,4,0.422862,357.929993,186.750519,2.833333,1.747037,18.965096,45982.90625,35.669998,...,19.701887,18.593388,1.715926,35620.183594,9.272553,593.669739,9.894496,766.381775,12.773029,3
4,2010,5,0.897164,359.123108,191.353516,2.573477,1.681004,20.704971,48272.6875,35.599354,...,21.188807,18.945932,1.681004,36703.542969,11.385653,611.725708,10.195428,804.5448,13.409081,3


In [15]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Historical/Calculations/PerMonth'
save_data_as_delta(my_dt, name_folder, mode="overwrite", layer='Gold',source='open-meteo-historical-daily', author='Augustin')

#Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,historical_year,historical_month,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,wind_gusts_10m_max_kmh_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,2010,1,1.432091,358.986053,189.476181,3.37276,2.168459,11.686591,38814.847656,35.526451,...,13.011835,12.511756,1.901434,26589.986328,3.964203,443.166443,7.386107,646.914124,10.781903,3
1,2010,2,1.537713,359.610809,189.849503,4.142857,3.007738,12.858019,40825.683594,38.695,...,14.246036,13.815198,2.449008,27422.53125,5.02788,457.042206,7.61737,680.428101,11.340467,3
2,2010,3,1.77875,359.714386,182.350708,3.281362,2.878495,16.092392,43342.796875,38.107098,...,17.288538,16.216846,2.735305,31881.773438,7.396659,531.362915,8.856049,722.379944,12.039666,3
3,2010,4,0.422862,357.929993,186.750519,2.833333,1.747037,18.965096,45982.90625,35.669998,...,19.701887,18.593388,1.715926,35620.183594,9.272553,593.669739,9.894496,766.381775,12.773029,3
4,2010,5,0.897164,359.123108,191.353516,2.573477,1.681004,20.704971,48272.6875,35.599354,...,21.188807,18.945932,1.681004,36703.542969,11.385653,611.725708,10.195428,804.5448,13.409081,3


##### 🧮 3.2.4. Means across all countries per year

In [26]:
name_folder = 'Data/Gold/OpenMeteo/Historical/Daily'
my_dt = DeltaTable(name_folder).to_pandas()
groupby = ['historical_year']
agg_columns = {'wind_direction_10m_dominant_deg':['min','max','mean']
               ,'precipitation_hours_h':'mean'
               ,'precipitation_sum':'mean'
               ,'apparent_temperature_max_C':'mean'
               ,'daylight_duration_seconds':'mean'
               ,'wind_gusts_10m_max_kmh':'mean'
               ,'snowfall_sum_cm':'mean'
               , 'temperature_2m_min_C':'mean'
               ,'et0_fao_evapotranspiration_mm':'mean'
               ,'wind_speed_10m_max_kmh':'mean'
               ,'showers_sum_mm':'mean'
               ,'temperature_2m_max_C':'mean'
               ,'shortwave_radiation_sum_MJm2':'mean'
               ,'rain_sum_mm':'mean'
               ,'sunshine_duration_seconds':'mean'
               ,'apparent_temperature_min_C':'mean'
               ,'sunshine_duration_minutes':'mean'
               ,'sunshine_duration_hours':'mean'
               ,'daylight_duration_minutes':'mean'
               ,'daylight_duration_hours':'mean'
               # most frequent weather code in each group (mode)
               ,'weather_code':lambda x: x.mode().iloc[0] }

my_dt = aggregate_dataframe(my_dt, groupby, agg_columns)
my_dt.head(10)

Unnamed: 0,historical_year,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,wind_gusts_10m_max_kmh_MEAN,snowfall_sum_cm_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,2010,0.132633,359.999969,191.827621,3.148706,2.255997,18.372244,43852.160156,36.517315,0.096317,...,19.006775,15.845911,2.129589,32150.509766,9.976358,535.841797,8.930697,730.869324,12.181156,3
1,2011,5e-06,359.999969,184.626678,3.059513,2.208326,18.867092,43852.246094,36.48811,0.081592,...,19.352322,15.849396,2.098387,32184.056641,10.323406,536.40094,8.940016,730.870728,12.181179,3
2,2012,0.123495,359.999969,190.983566,3.036582,2.08221,19.169992,43836.628906,36.722294,0.067652,...,19.657406,15.964025,1.990027,32479.445312,10.669188,541.324097,9.022068,730.610474,12.176842,3
3,2013,4.6e-05,359.999969,188.623444,3.086806,2.114376,19.041803,43842.222656,36.921932,0.107094,...,19.559967,16.047319,1.97119,32162.359375,10.375408,536.039307,8.93399,730.703674,12.178395,3
4,2016,4.6e-05,359.999969,190.417831,3.020188,2.086202,19.230024,43836.765625,36.958687,0.077066,...,19.638399,15.845064,1.982301,32314.154297,10.793228,538.569275,8.976154,730.612793,12.176879,3
5,2018,5e-06,359.999969,189.520172,4.282192,2.848043,17.389984,43887.613281,39.933651,0.096603,...,18.335117,14.799181,2.710391,30154.84375,8.478782,502.580719,8.376346,731.460205,12.191004,3
6,2019,0.063526,359.999969,196.506866,4.147749,2.732407,17.226034,43887.828125,41.119465,0.096096,...,18.368162,15.100665,2.595499,30763.587891,8.119406,512.72644,8.545441,731.463806,12.191063,3
7,2020,0.189092,359.968353,192.767044,3.949649,2.543228,17.491987,43870.316406,41.41869,0.065273,...,18.727514,15.209746,2.450039,31247.123047,8.538176,520.7854,8.679756,731.171936,12.186198,3
8,2021,0.103045,360.0,192.347672,4.163405,2.684755,17.118025,43887.722656,39.702644,0.087822,...,18.251059,15.152159,2.559472,30904.958984,8.357899,515.082642,8.58471,731.462036,12.191034,3
9,2022,0.000128,359.999969,190.866287,3.618004,2.577182,17.658636,43887.480469,38.859848,0.081027,...,18.749727,15.666607,2.461683,31362.736328,8.256819,522.71228,8.711872,731.457947,12.190967,3
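
One caveat on `wind_direction_10m_dominant_deg_MEAN`: wind direction is an angle, so an arithmetic mean can mislead near the 0°/360° boundary (350° and 20° average to 185°, almost the opposite direction). The sketch below shows a circular mean as one possible alternative. `circular_mean_deg` is a hypothetical helper, not part of the project's modules, and it is not what `aggregate_dataframe` computes in the cell above.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees, mapped back to the 0-360 range."""
    # Hypothetical helper for illustration; not part of the project's Modules.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360

directions = [350.0, 20.0]
arithmetic = sum(directions) / len(directions)  # 185.0
circular = circular_mean_deg(directions)        # approximately 5.0
print(arithmetic, round(circular, 3))
```

Assuming `aggregate_dataframe` forwards callables to pandas, as it appears to do for `weather_code`, such a function could be passed in `agg_columns` in place of `'mean'`. How the resulting column would be named is not something this notebook shows.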


In [28]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Historical/Calculations/PerYear'
save_data_as_delta(my_dt, name_folder, mode="overwrite", layer='Gold',source='open-meteo-historical-daily', author='Augustin')

#Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head(15)

Unnamed: 0,historical_year,wind_direction_10m_dominant_deg_MIN,wind_direction_10m_dominant_deg_MAX,wind_direction_10m_dominant_deg_MEAN,precipitation_hours_h_MEAN,precipitation_sum_MEAN,apparent_temperature_max_C_MEAN,daylight_duration_seconds_MEAN,wind_gusts_10m_max_kmh_MEAN,snowfall_sum_cm_MEAN,...,temperature_2m_max_C_MEAN,shortwave_radiation_sum_MJm2_MEAN,rain_sum_mm_MEAN,sunshine_duration_seconds_MEAN,apparent_temperature_min_C_MEAN,sunshine_duration_minutes_MEAN,sunshine_duration_hours_MEAN,daylight_duration_minutes_MEAN,daylight_duration_hours_MEAN,weather_code_<LAMBDA>
0,2010,0.132633,359.999969,191.827621,3.148706,2.255997,18.372244,43852.160156,36.517315,0.096317,...,19.006775,15.845911,2.129589,32150.509766,9.976358,535.841797,8.930697,730.869324,12.181156,3
1,2011,5e-06,359.999969,184.626678,3.059513,2.208326,18.867092,43852.246094,36.48811,0.081592,...,19.352322,15.849396,2.098387,32184.056641,10.323406,536.40094,8.940016,730.870728,12.181179,3
2,2012,0.123495,359.999969,190.983566,3.036582,2.08221,19.169992,43836.628906,36.722294,0.067652,...,19.657406,15.964025,1.990027,32479.445312,10.669188,541.324097,9.022068,730.610474,12.176842,3
3,2013,4.6e-05,359.999969,188.623444,3.086806,2.114376,19.041803,43842.222656,36.921932,0.107094,...,19.559967,16.047319,1.97119,32162.359375,10.375408,536.039307,8.93399,730.703674,12.178395,3
4,2016,4.6e-05,359.999969,190.417831,3.020188,2.086202,19.230024,43836.765625,36.958687,0.077066,...,19.638399,15.845064,1.982301,32314.154297,10.793228,538.569275,8.976154,730.612793,12.176879,3
5,2018,5e-06,359.999969,189.520172,4.282192,2.848043,17.389984,43887.613281,39.933651,0.096603,...,18.335117,14.799181,2.710391,30154.84375,8.478782,502.580719,8.376346,731.460205,12.191004,3
6,2019,0.063526,359.999969,196.506866,4.147749,2.732407,17.226034,43887.828125,41.119465,0.096096,...,18.368162,15.100665,2.595499,30763.587891,8.119406,512.72644,8.545441,731.463806,12.191063,3
7,2020,0.189092,359.968353,192.767044,3.949649,2.543228,17.491987,43870.316406,41.41869,0.065273,...,18.727514,15.209746,2.450039,31247.123047,8.538176,520.7854,8.679756,731.171936,12.186198,3
8,2021,0.103045,360.0,192.347672,4.163405,2.684755,17.118025,43887.722656,39.702644,0.087822,...,18.251059,15.152159,2.559472,30904.958984,8.357899,515.082642,8.58471,731.462036,12.191034,3
9,2022,0.000128,359.999969,190.866287,3.618004,2.577182,17.658636,43887.480469,38.859848,0.081027,...,18.749727,15.666607,2.461683,31362.736328,8.256819,522.71228,8.711872,731.457947,12.190967,3
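
The `weather_code` entry uses `x.mode().iloc[0]`. pandas returns the modes sorted in ascending order, so when two codes are equally frequent the smallest one is kept. The pure-Python sketch below reproduces that tie-breaking rule for illustration only; `most_frequent_code` is a hypothetical name.

```python
from collections import Counter

def most_frequent_code(codes):
    """Return the most frequent code; ties resolve to the smallest value."""
    # Hypothetical helper mirroring `x.mode().iloc[0]` for non-null inputs.
    counts = Counter(codes)
    top = max(counts.values())
    return min(code for code, n in counts.items() if n == top)

print(most_frequent_code([3, 61, 3, 2]))      # 3 (clear winner)
print(most_frequent_code([61, 3, 61, 3, 2]))  # 3 (tie between 3 and 61)
```

`Series.mode` drops missing values by default, so a group containing only nulls would produce an empty result and `.iloc[0]` would raise an `IndexError`. The Gold tables above report no such case, but it is worth keeping in mind if new sources are added.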


##### 🧮 3.2.5. Mean forecast values per city per day

In [18]:
name_folder = 'Data/Gold/OpenMeteo/Forecast/Hourly'
my_dt = DeltaTable(name_folder).to_pandas()

groupby = ['requested_date', 'city', 'forecast_date']
agg_columns = {'temperature_2m_C': 'mean',
    'relative_humidity_2m_inPercent': 'mean',
    'apparent_temperature_C': 'mean',
    'wind_speed_10m_kmh': 'mean',
    'precipitation_mm': 'mean',
    'rain_mm': 'mean',
    'snowfall_cm': 'mean',
    'pressure_msl_hPa': 'mean',
    'cloud_cover_inPercent': 'mean',
    'visibility_m': 'mean',
    'soil_temperature_0cm_C': 'mean',
    'soil_moisture_0_to_1cm_m3m3': 'mean'
}
my_dt = aggregate_dataframe(my_dt, groupby, agg_columns)


my_dt.head(5)

Unnamed: 0,requested_date,city,forecast_date,temperature_2m_C_MEAN,relative_humidity_2m_inPercent_MEAN,apparent_temperature_C_MEAN,wind_speed_10m_kmh_MEAN,precipitation_mm_MEAN,rain_mm_MEAN,snowfall_cm_MEAN,pressure_msl_hPa_MEAN,cloud_cover_inPercent_MEAN,visibility_m_MEAN,soil_temperature_0cm_C_MEAN,soil_moisture_0_to_1cm_m3m3_MEAN
0,2025-04-11,Barcelona,2025-04-10,15.118917,83.916664,14.993148,7.029439,0.0,0.0,0.0,1021.270813,69.916664,18199.166016,17.343916,0.262292
1,2025-04-11,Barcelona,2025-04-11,14.221001,83.333336,13.796891,6.2861,0.0,0.0,0.0,1019.845886,87.25,21068.333984,16.631416,0.259292
2,2025-04-11,Barcelona,2025-04-12,15.623084,76.5,15.278325,6.146988,0.0,0.0,0.0,1012.06665,93.958336,33533.332031,17.810583,0.255042
3,2025-04-11,Barcelona,2025-04-13,15.798083,83.291664,16.019545,5.351457,0.016667,0.0,0.0,1008.225037,99.166664,19317.5,17.843916,0.25375
4,2025-04-11,Barcelona,2025-04-14,16.148085,81.291664,15.894337,7.985165,0.304167,0.0,0.0,1006.420837,92.875,25020.833984,17.771,0.316458


In [19]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Forecast/Calculations/PerCityPerDay'
save_data_as_delta(my_dt, name_folder, mode="overwrite", layer='Gold',source='open-meteo-forecast-daily', author='Augustin')

# Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,requested_date,city,forecast_date,temperature_2m_C_MEAN,relative_humidity_2m_inPercent_MEAN,apparent_temperature_C_MEAN,wind_speed_10m_kmh_MEAN,precipitation_mm_MEAN,rain_mm_MEAN,snowfall_cm_MEAN,pressure_msl_hPa_MEAN,cloud_cover_inPercent_MEAN,visibility_m_MEAN,soil_temperature_0cm_C_MEAN,soil_moisture_0_to_1cm_m3m3_MEAN
0,2025-04-11,Barcelona,2025-04-10,15.118917,83.916664,14.993148,7.029439,0.0,0.0,0.0,1021.270813,69.916664,18199.166016,17.343916,0.262292
1,2025-04-11,Barcelona,2025-04-11,14.221001,83.333336,13.796891,6.2861,0.0,0.0,0.0,1019.845886,87.25,21068.333984,16.631416,0.259292
2,2025-04-11,Barcelona,2025-04-12,15.623084,76.5,15.278325,6.146988,0.0,0.0,0.0,1012.06665,93.958336,33533.332031,17.810583,0.255042
3,2025-04-11,Barcelona,2025-04-13,15.798083,83.291664,16.019545,5.351457,0.016667,0.0,0.0,1008.225037,99.166664,19317.5,17.843916,0.25375
4,2025-04-11,Barcelona,2025-04-14,16.148085,81.291664,15.894337,7.985165,0.304167,0.0,0.0,1006.420837,92.875,25020.833984,17.771,0.316458
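
Note that `precipitation_mm_MEAN`, `rain_mm_MEAN` and `snowfall_cm_MEAN` are average hourly amounts, not daily totals. If a group holds 24 hourly rows, the daily total can be recovered as mean × row count. The 24-row figure is an assumption: the metadata table further down lists 4632 hourly rows against 193 per-day rows, which averages to 24 but does not guarantee it for every group. A small worked example with the Barcelona row for 2025-04-14:

```python
HOURS_PER_DAY = 24  # assumption: one forecast row per hour in each group

def daily_total_from_hourly_mean(hourly_mean, hours=HOURS_PER_DAY):
    """Convert an average hourly amount into a total over `hours` rows."""
    return hourly_mean * hours

# precipitation_mm_MEAN for Barcelona, forecast_date 2025-04-14 (from the table above)
total_mm = daily_total_from_hourly_mean(0.304167)
print(round(total_mm, 1))  # 7.3
```

If daily totals are what the report needs, requesting `'sum'` instead of `'mean'` for these columns in `agg_columns` would avoid the conversion, assuming `aggregate_dataframe` accepts it the same way it accepts `'min'` and `'max'`.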


##### 🧮 3.2.6. Mean forecast values per city

In [20]:
name_folder = 'Data/Gold/OpenMeteo/Forecast/Hourly'
my_dt = DeltaTable(name_folder).to_pandas()
# Number of distinct forecast dates in the whole table (one global value, not per group)
days_of_forecast = len(my_dt['forecast_date'].unique())


groupby = ['requested_date', 'city']
agg_columns = {'temperature_2m_C': 'mean',
    'relative_humidity_2m_inPercent': 'mean',
    'apparent_temperature_C': 'mean',
    'wind_speed_10m_kmh': 'mean',
    'precipitation_mm': 'mean',
    'rain_mm': 'mean',
    'snowfall_cm': 'mean',
    'pressure_msl_hPa': 'mean',
    'cloud_cover_inPercent': 'mean',
    'visibility_m': 'mean',
    'soil_temperature_0cm_C': 'mean',
    'soil_moisture_0_to_1cm_m3m3': 'mean'
}

my_dt = aggregate_dataframe(my_dt, groupby, agg_columns)

# Inserting the number of forecast days as the third column
my_dt.insert(2, 'nb_day_of_forecast', days_of_forecast)

my_dt.head(5)

Unnamed: 0,requested_date,city,nb_day_of_forecast,temperature_2m_C_MEAN,relative_humidity_2m_inPercent_MEAN,apparent_temperature_C_MEAN,wind_speed_10m_kmh_MEAN,precipitation_mm_MEAN,rain_mm_MEAN,snowfall_cm_MEAN,pressure_msl_hPa_MEAN,cloud_cover_inPercent_MEAN,visibility_m_MEAN,soil_temperature_0cm_C_MEAN,soil_moisture_0_to_1cm_m3m3_MEAN
0,2025-04-11,Barcelona,7,15.381834,81.666664,15.19645,6.55983,0.064167,0.0,0.0,1013.565796,88.633331,23427.833984,17.480167,0.269367
1,2025-04-11,Brussels,7,13.035937,66.01667,10.678061,10.543744,0.025833,0.025833,0.0,1014.97168,53.266666,44215.5,11.285833,0.25535
2,2025-04-11,Buenos Aires,7,18.727583,80.050003,19.122736,9.891475,0.0225,0.0,0.0,1017.578308,59.75,23709.833984,19.090916,0.378008
3,2025-04-11,Chicago,7,7.930583,71.675003,4.565909,13.537216,0.039167,0.031667,0.000583,1016.190796,68.541664,21355.333984,9.665584,0.289883
4,2025-04-11,London,7,12.08815,62.408333,9.589204,8.8247,0.0,0.0,0.0,1013.420837,53.658333,52578.832031,14.158566,0.242367


In [21]:
#STORING THE DATA
name_folder = 'Data/Gold/OpenMeteo/Forecast/Calculations/PerCity'
save_data_as_delta(my_dt, name_folder, mode="overwrite", layer='Gold',source='open-meteo-forecast-daily', author='Augustin')

#Verifying the data of the gold layer
my_dt = DeltaTable(name_folder).to_pandas()
my_dt.head()

Unnamed: 0,requested_date,city,nb_day_of_forecast,temperature_2m_C_MEAN,relative_humidity_2m_inPercent_MEAN,apparent_temperature_C_MEAN,wind_speed_10m_kmh_MEAN,precipitation_mm_MEAN,rain_mm_MEAN,snowfall_cm_MEAN,pressure_msl_hPa_MEAN,cloud_cover_inPercent_MEAN,visibility_m_MEAN,soil_temperature_0cm_C_MEAN,soil_moisture_0_to_1cm_m3m3_MEAN
0,2025-04-11,Barcelona,7,15.381834,81.666664,15.19645,6.55983,0.064167,0.0,0.0,1013.565796,88.633331,23427.833984,17.480167,0.269367
1,2025-04-11,Brussels,7,13.035937,66.01667,10.678061,10.543744,0.025833,0.025833,0.0,1014.97168,53.266666,44215.5,11.285833,0.25535
2,2025-04-11,Buenos Aires,7,18.727583,80.050003,19.122736,9.891475,0.0225,0.0,0.0,1017.578308,59.75,23709.833984,19.090916,0.378008
3,2025-04-11,Chicago,7,7.930583,71.675003,4.565909,13.537216,0.039167,0.031667,0.000583,1016.190796,68.541664,21355.333984,9.665584,0.289883
4,2025-04-11,London,7,12.08815,62.408333,9.589204,8.8247,0.0,0.0,0.0,1013.420837,53.658333,52578.832031,14.158566,0.242367
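
`nb_day_of_forecast` is computed once over the whole Hourly table, so every row receives the same value (7). If some `(requested_date, city)` groups cover fewer dates, a per-group count would be more precise. The metadata table below lists 193 per-day rows for 33 per-city rows, fewer than 33 × 7 = 231, which suggests this may be the case. A minimal pure-Python sketch of the per-group count, using made-up rows and a hypothetical function name:

```python
from collections import defaultdict

def forecast_days_per_group(rows):
    """Count distinct forecast dates for each (requested_date, city) pair."""
    dates = defaultdict(set)
    for requested_date, city, forecast_date in rows:
        dates[(requested_date, city)].add(forecast_date)
    return {key: len(values) for key, values in dates.items()}

# Illustrative rows only (not taken from the Gold tables).
rows = [
    ("2025-04-11", "Barcelona", "2025-04-11"),
    ("2025-04-11", "Barcelona", "2025-04-11"),  # second hour of the same day
    ("2025-04-11", "Barcelona", "2025-04-12"),
    ("2025-04-11", "London", "2025-04-11"),
]
print(forecast_days_per_group(rows))
```

With pandas, the equivalent would be `my_dt.groupby(['requested_date', 'city'])['forecast_date'].nunique()` on the Hourly table, merged back onto the aggregated result.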


**📊 CHECK GOLD TABLE STATS AND COMPARE WITH SILVER: Rows, Nulls, Duplicates**

In [22]:
#Checking gold layer
name_folder = 'Data/_meta/metadata_table'
my_dt = DeltaTable(name_folder).to_pandas()
my_dt = my_dt[my_dt['layer']=='Gold']
my_dt.head(15)

Unnamed: 0,table_path,table_name,layer,total_rows,rows_with_nulls,rows_duplicated,columns,dtypes,delta_table_size_MB,file_count,updated_at,created_at,source,author
0,Data/Gold/OpenMeteo/Forecast/Calculations/PerCity,PerCity,Gold,33,0,0,"[""requested_date"", ""city"", ""nb_day_of_forecast...","{""requested_date"": ""datetime64[us]"", ""city"": ""...",0.08,18,2025-04-12 23:46:24.577568,2025-04-12 16:20:03.649771,open-meteo-forecast-daily,Augustin
1,Data/Gold/OpenMeteo/Forecast/Calculations/PerC...,PerCityPerDay,Gold,193,0,0,"[""requested_date"", ""city"", ""forecast_date"", ""t...","{""requested_date"": ""datetime64[us]"", ""city"": ""...",0.12,18,2025-04-12 23:46:24.467994,2025-04-12 16:20:03.547844,open-meteo-forecast-daily,Augustin
2,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,PerYear,Gold,12,0,0,"[""historical_year"", ""wind_direction_10m_domina...","{""historical_year"": ""object"", ""wind_direction_...",0.11,18,2025-04-12 23:46:24.365032,2025-04-12 16:20:03.449528,open-meteo-historical-daily,Augustin
3,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,PerMonth,Gold,144,0,0,"[""historical_year"", ""historical_month"", ""wind_...","{""historical_year"": ""object"", ""historical_mont...",0.2,18,2025-04-12 23:46:24.217049,2025-04-12 16:20:03.309486,open-meteo-historical-daily,Augustin
4,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,PerCityPerYear,Gold,189,0,0,"[""city"", ""historical_year"", ""wind_direction_10...","{""city"": ""object"", ""historical_year"": ""object""...",0.22,18,2025-04-12 23:46:24.034897,2025-04-12 16:20:03.182500,open-meteo-historical-daily,Augustin
5,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,PerCityPerMonth,Gold,2268,0,0,"[""city"", ""historical_year"", ""historical_month""...","{""city"": ""object"", ""historical_year"": ""object""...",1.42,18,2025-04-12 23:46:23.880231,2025-04-12 16:20:03.054421,open-meteo-historical-daily,Augustin
6,Data/Gold/OpenMeteo/Historical/Daily,Daily,Gold,69049,1,0,"[""city"", ""historical_date"", ""historical_year"",...","{""city"": ""object"", ""historical_date"": ""datetim...",5.39,18,2025-04-12 23:46:23.617671,2025-04-12 16:20:02.856602,open-meteo-historical-daily,Augustin
7,Data/Gold/OpenMeteo/Forecast/Hourly,Hourly,Gold,4632,9,0,"[""requested_date"", ""city"", ""forecast_date"", ""f...","{""requested_date"": ""datetime64[us]"", ""city"": ""...",5.52,41,2025-04-12 23:46:23.165166,2025-04-12 16:20:02.576407,open-meteo-forecast-hourly,Augustin
8,Data/Gold/OpenMeteo/Forecast/Daily,Daily,Gold,231,4,0,"[""requested_date"", ""city"", ""forecast_day"", ""we...","{""requested_date"": ""datetime64[us]"", ""city"": ""...",0.09,8,2025-04-12 23:46:22.932314,2025-04-12 16:20:02.373351,open-meteo-forecast-daily,Augustin
9,Data/Gold/OpenMeteo/Current,Current,Gold,19,0,0,"[""date"", ""time"", ""city"", ""longitude"", ""latitud...","{""date"": ""datetime64[us]"", ""time"": ""object"", ""...",0.12,18,2025-04-12 23:46:22.709489,2025-04-12 16:20:02.224662,open-meteo-current,Augustin


In [23]:
#Comparing gold and silver
name_folder = 'Data/_meta/metadata_table'
my_dt = DeltaTable(name_folder).to_pandas()
my_dt = my_dt[(my_dt['layer'] == 'Gold') | (my_dt['layer'] == 'Silver')]
row_counts_per_table = pd.DataFrame({
    "layer":my_dt["layer"],
    "table_name": my_dt["table_name"],
    "table_path": my_dt["table_path"],
    "total_rows": my_dt['total_rows'],
    "rows_with_at_least_one_nulls":my_dt['rows_with_nulls'],
    "rows_duplicated":my_dt['rows_duplicated']
})

row_counts_per_table.sort_values(by='table_path').head(30)

Unnamed: 0,layer,table_name,table_path,total_rows,rows_with_at_least_one_nulls,rows_duplicated
9,Gold,Current,Data/Gold/OpenMeteo/Current,19,0,0
0,Gold,PerCity,Data/Gold/OpenMeteo/Forecast/Calculations/PerCity,33,0,0
1,Gold,PerCityPerDay,Data/Gold/OpenMeteo/Forecast/Calculations/PerC...,193,0,0
8,Gold,Daily,Data/Gold/OpenMeteo/Forecast/Daily,231,4,0
7,Gold,Hourly,Data/Gold/OpenMeteo/Forecast/Hourly,4632,9,0
5,Gold,PerCityPerMonth,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,2268,0,0
4,Gold,PerCityPerYear,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,189,0,0
3,Gold,PerMonth,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,144,0,0
2,Gold,PerYear,Data/Gold/OpenMeteo/Historical/Calculations/Pe...,12,0,0
6,Gold,Daily,Data/Gold/OpenMeteo/Historical/Daily,69049,1,0


In [24]:
export_metadata_to_excel(layer='Gold')

✅ Metadata successfully exported to: logs/2025-04-12/gold_metadata_20h46.xlsx
