# Climate indicators

The notebook titled climate_indicators.ipynb from the FAIRiCUBE Urban Climate Use Case repository focuses on computing and analyzing climate-related indicators for European cities. It serves as a foundational step in understanding urban climate dynamics, which is essential for informed urban planning and climate adaptation strategies.


This notebook breaks down several climate indicators.
The data source is the ERA5 reanalysis, published as a Google Cloud public dataset:
https://cloud.google.com/storage/docs/public-datasets/era5?hl=de

We use this source because it provides the data in an analysis-ready, cloud-optimised format.

The selected climate indices are:
- Number of summer days
- Number of tropical nights
- 2m temperature statistics (mean, std, min, max)
- total precipitation (mean, std, min, max)


This notebook is designed to:
- Access Climate Data: Retrieve gridded climate datasets, such as temperature and humidity, from sources like the Copernicus Climate Data Store.
- Compute Climate Indicators: Calculate various climate indicators, including:
    - Average annual temperature
    - Number of summer days (e.g., days with temperatures above 25°C)
    - Number of tropical nights (e.g., nights with temperatures above 20°C)
    - Universal Thermal Climate Index (UTCI) metrics
- Spatial Analysis: Aggregate and analyze these indicators at the city level, aligning with urban boundaries to provide city-specific insights.
- Data Integration: Prepare the computed indicators for integration with other datasets, such as socio-economic or land use data, facilitating comprehensive urban climate assessments.
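
As a rough illustration of the two threshold indicators listed above, the following dependency-free sketch counts summer days (daily maximum above 25°C) and tropical nights (daily minimum above 20°C) in an hourly series. The function name `count_indicators` and the two-day series are hypothetical, and the minimum is taken over the whole calendar day for simplicity, whereas the notebook cells further down restrict it to night-time hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SUMMER_DAY_THRESHOLD_C = 25.0      # daily maximum must exceed this
TROPICAL_NIGHT_THRESHOLD_C = 20.0  # daily minimum must exceed this


def count_indicators(hourly):
    """Count summer days and tropical nights in (timestamp, temp_c) pairs."""
    by_day = defaultdict(list)
    for timestamp, temp_c in hourly:
        by_day[timestamp.date()].append(temp_c)
    summer = sum(1 for temps in by_day.values() if max(temps) > SUMMER_DAY_THRESHOLD_C)
    tropical = sum(1 for temps in by_day.values() if min(temps) > TROPICAL_NIGHT_THRESHOLD_C)
    return summer, tropical


# Hypothetical two-day series: a hot day followed by a mild day.
start = datetime(2018, 7, 1)
hot = [21.0 + (h % 12) for h in range(24)]   # min 21.0, max 32.0
mild = [12.0 + (h % 12) for h in range(24)]  # min 12.0, max 23.0
series = [(start + timedelta(hours=i), t) for i, t in enumerate(hot + mild)]

summer_days, tropical_nights = count_indicators(series)
print(summer_days, tropical_nights)
```

Only the first day passes either threshold, so both counts are 1 for this series.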

In [1]:
# import libraries
import sqlalchemy as sa  # connection to the database
from datetime import datetime, timedelta
#import timezonefinder
from configparser import ConfigParser
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr      
import fsspec

print ("lib done")

lib done


In [2]:
from dask.distributed import Client, performance_report
client = Client()  # Connect to distributed cluster and override default
client

Client
Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster
Workers: 4, Total threads: 8, Total memory: 30.00 GiB
Status: running, Using processes: True

Scheduler
Comm: tcp://127.0.0.1:43551, Workers: 4, Total threads: 8, Total memory: 30.00 GiB
Dashboard: http://127.0.0.1:8787/status, Started: Just now

Workers

| Comm | Dashboard | Nanny | Total threads | Memory | Local directory |
|---|---|---|---|---|---|
| tcp://127.0.0.1:37557 | http://127.0.0.1:45909/status | tcp://127.0.0.1:43673 | 2 | 7.50 GiB | /tmp/dask-worker-space/worker-w7jb0cuu |
| tcp://127.0.0.1:34631 | http://127.0.0.1:41929/status | tcp://127.0.0.1:45829 | 2 | 7.50 GiB | /tmp/dask-worker-space/worker-ktoixwpq |
| tcp://127.0.0.1:38761 | http://127.0.0.1:33713/status | tcp://127.0.0.1:41833 | 2 | 7.50 GiB | /tmp/dask-worker-space/worker-o3uilh13 |
| tcp://127.0.0.1:42023 | http://127.0.0.1:45901/status | tcp://127.0.0.1:43599 | 2 | 7.50 GiB | /tmp/dask-worker-space/worker-tf26lnmr |


In [3]:
# check data:

fs = fsspec.filesystem('gs')
fs.ls('gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/')


arco_era5 = xr.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2', 
    # chunks={'time': 24*30},
    consolidated=True,
    drop_variables=["10m_u_component_of_wind",  
    "10m_v_component_of_wind",
    # "2m_temperature",
    "angle_of_sub_gridscale_orography",
    "anisotropy_of_sub_gridscale_orography",
    "geopotential",
    "geopotential_at_surface",
    "high_vegetation_cover",
    "lake_cover",
    "lake_depth",
    "land_sea_mask",
    "low_vegetation_cover",
    "mean_sea_level_pressure",
    "sea_ice_cover",
    "sea_surface_temperature",
    "slope_of_sub_gridscale_orography",
    "soil_type",
    "specific_humidity",
    "standard_deviation_of_filtered_subgrid_orography",
    "standard_deviation_of_orography",
    "surface_pressure",
    "temperature",
    "toa_incident_solar_radiation",
    "total_cloud_cover",
    "total_column_water_vapour",
    # "total_precipitation",
    "type_of_high_vegetation",
    "type_of_low_vegetation",
    "u_component_of_wind",
    "v_component_of_wind",
    "vertical_velocity"]
)
arco_era5


#for var in arco_era5.data_vars:
#    print(var)

Dataset view (summary): each data variable shown is a lazily loaded Dask array with the following layout.

| | Array | Chunk |
|---|---|---|
| Bytes | 2.09 TiB | 3.96 MiB |
| Shape | (552264, 721, 1440) | (1, 721, 1440) |
| Dask graph | 552264 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |
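
The sizes reported in the dataset view can be checked by hand. The sketch below recomputes the chunk and array sizes from the shape and the float32 item size. It also tests one assumption that is inferred from the store name rather than read from the dataset metadata: that the time axis holds hourly steps starting on 1959-01-01, in which case its length would correspond to the hours up to the end of 2021.

```python
from datetime import date

shape = (552264, 721, 1440)  # (time, latitude, longitude) from the dataset view
chunk = (1, 721, 1440)       # one hourly global field per chunk
bytes_per_value = 4          # float32

chunk_bytes = chunk[0] * chunk[1] * chunk[2] * bytes_per_value
total_bytes = shape[0] * shape[1] * shape[2] * bytes_per_value

chunk_mib = chunk_bytes / 2**20
total_tib = total_bytes / 2**40

# Assumption: hourly steps from 1959-01-01 up to (not including) 2022-01-01.
hours_1959_to_2021 = 24 * (date(2022, 1, 1) - date(1959, 1, 1)).days

print(f"{chunk_mib:.2f} MiB per chunk, {total_tib:.2f} TiB per variable")
print(hours_1959_to_2021 == shape[0])
```

The spatial dimensions match a global 0.25° grid: 721 latitudes (180 / 0.25 + 1) and 1440 longitudes (360 / 0.25). Because each chunk holds one full global field, extracting a single point for a whole year still touches every hourly chunk in that year.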


## Testing the data source and data:

In [None]:
arco_era5['2m_temperature_degree_celcius'].sel(time=slice(start_date, end_date)).sel(longitude=lon_city[0],latitude=lat_city[0], method="ffill")

In [None]:
# subset of the big climate data cube:

# select a time range (one month for testing)

##PARAMETER+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++S
## location of the area of interest (AOI):
lat_city=[37.8193869629]  #### y
lon_city=[lon_to_360(-25.7287017223)]   #### x (lon_to_360 is defined in a later cell)
# subset time: TIME-RANGE
start_date = "2020-12-01"   # TESTING only one month in 2020
end_date = "2020-12-31"
##PARAMETER+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++E
# update values: Kelvin to degrees Celsius:
arco_era5['2m_temperature_degree_celcius'] = arco_era5['2m_temperature'] - 273.15

# subset cube:
#arco_era5_2018_luxembourg = arco_era5['2m_temperature'].sel(time=slice(start_date, end_date),longitude=lon_city,latitude=lat_city)
arco_era5_2018_luxembourg = arco_era5['2m_temperature_degree_celcius'].sel(time=slice(start_date, end_date)).sel(longitude=lon_city[0],latitude=lat_city[0], method="ffill")

## calculation of different indicators:


## 1. TEMPERATURE indicators:

## DAY and NIGHT SET:
START_NIGHT_h = 18  
END_NIGHT_h = 6

START_DAY_h = 6
END_DAY_h = 18


## 1.1 MAX and MIN per DAY and NIGHT:

# --------------------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------------------

## 1.1.1 MAX
max_temperature_per_day = arco_era5_2018_luxembourg.resample(time='D').max(dim='time')  # calc max and min temp per day
night_time_range = arco_era5_2018_luxembourg.sel(time=((arco_era5_2018_luxembourg['time.hour'] >= START_NIGHT_h) | (arco_era5_2018_luxembourg['time.hour'] < END_NIGHT_h)))
max_night_temperature_per_day = night_time_range.resample(time='D').max(dim='time') 


## MAX temp NIGHT per YEAR
max_night_temperature_per_year = night_time_range.resample(time='Y').max(dim='time') 
#print("Max Temp [°C] in the night per year:")
#print(max_night_temperature_per_year.time.dt.year.values.item())
#print(max_night_temperature_per_year.values.item())
#print("---")


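# the day window (06:00-18:00) does not cross midnight, so both conditions must hold (&);
# combining them with | would select every hour of the day
day_time_range = arco_era5_2018_luxembourg.sel(time=((arco_era5_2018_luxembourg['time.hour'] >= START_DAY_h) & (arco_era5_2018_luxembourg['time.hour'] < END_DAY_h)))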
max_day_temperature_per_day = day_time_range.resample(time='D').max(dim='time') 

## MAX temp DAY per YEAR
max_day_temperature_per_year = day_time_range.resample(time='Y').max(dim='time') 
#print("Max Temp [°C] in the day per year:")
#print(max_day_temperature_per_year.time.dt.year.values.item())
#print(max_day_temperature_per_year.values.item())
#print("---")


## 1.1.2 MIN
min_temperature_per_day = arco_era5_2018_luxembourg.resample(time='D').min(dim='time')
#night_time_range = arco_era5_2018_luxembourg.sel(time=((arco_era5_2018_luxembourg['time.hour'] >= START_NIGHT_h) | (arco_era5_2018_luxembourg['time.hour'] < END_NIGHT_h)))
min_night_temperature_per_day = night_time_range.resample(time='D').min(dim='time') 
#day_time_range = arco_era5_2018_luxembourg.sel(time=((arco_era5_2018_luxembourg['time.hour'] >= START_DAY_h) | (arco_era5_2018_luxembourg['time.hour'] < END_DAY_h)))
min_day_temperature_per_day = day_time_range.resample(time='D').min(dim='time') 





## 1.2 Number of tropical nights
# Calculate the number of tropical nights per year
#tropical_threshold = 30
#tropical_nights_count_per_year = (max_night_temperature_per_year > tropical_threshold).groupby('time.year').sum()
## Print the result
#print("Number of tropical nights per year:")
#print("---")
##print(tropical_nights_count_per_year['year'].values.item())
##print(tropical_nights_count_per_year.values.item())
#tropical_nights_count_per_year_df = tropical_nights_count_per_year.to_dataframe()
#new_column_names = {'2m_temperature_degree_celcius': 'tropical_nights_count_per_year'}
#tropical_nights_count_per_year_df.rename(columns=new_column_names, inplace=True)


#print(tropical_nights_count_per_year_df)
#print("---")




## 1.3 Number of summer days
# Annual count of days when TX (daily maximum temperature) > 25°C. Let TXij be the daily maximum temperature on day i in year j. Count the number of days where TXij > 25°C.
max_temperature_per_day = arco_era5_2018_luxembourg.resample(time='D').max(dim='time')  # calc max and min temp per day

summer_day_threshold = 25
summer_days_count_per_year = (max_temperature_per_day > summer_day_threshold).groupby('time.year').sum()


#print("Number of summer days per year:")
#print("---")
##print(summer_days_count_per_year['year'].values.item())
##print(summer_days_count_per_year.values.item())
#summer_days_count_per_year_df = summer_days_count_per_year.to_dataframe()
#new_column_names = {'2m_temperature_degree_celcius': 'summer_days_count_per_year'}
#summer_days_count_per_year_df.rename(columns=new_column_names, inplace=True)
#print (summer_days_count_per_year_df)
#print("---")
#

#arco_era5_2018_month = arco_era5_2018.resample(time="1M")
#ds =arco_era5_2018_month

#ds_time_subeset = ds.sel(time=slice(start_date, end_date)) 

#ds = ds_time_subeset

#print (arco_era5_2018_luxembourg)
    

#max_temperature_per_day.plot() 

# #


# Create a single figure with one set of axes
fig, ax = plt.subplots(figsize=(12, 5))

# Plot the original temperature data
max_night_temperature_per_day.plot(ax=ax, label='max Temperature per NIGHT', color='blue')
max_day_temperature_per_day.plot(ax=ax, label='max Temperature per DAY', color='red')

min_night_temperature_per_day.plot(ax=ax, label='min Temperature per NIGHT', color='green')
min_day_temperature_per_day.plot(ax=ax, label='min Temperature per DAY', color='black')

ax.set_title('Maximum & Minimum Temperature per Day and Night')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend()

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()

print ("done")
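
The day and night windows used in the cell above behave differently at midnight. The night window (18:00 to 06:00) wraps around midnight, so an hour belongs to it if either bound is satisfied. The day window (06:00 to 18:00) lies inside one calendar day, so both bounds must hold. A minimal sketch with plain integers, where the function names `is_night` and `is_day` are illustrative only:

```python
START_NIGHT_H, END_NIGHT_H = 18, 6  # night wraps around midnight
START_DAY_H, END_DAY_H = 6, 18      # day lies within one calendar day


def is_night(hour: int) -> bool:
    # The window crosses midnight, so either condition is enough (OR).
    return hour >= START_NIGHT_H or hour < END_NIGHT_H


def is_day(hour: int) -> bool:
    # The window does not cross midnight, so both conditions must hold (AND).
    return START_DAY_H <= hour < END_DAY_H


night_hours = [h for h in range(24) if is_night(h)]
day_hours = [h for h in range(24) if is_day(h)]
print(night_hours)
print(day_hours)
```

With these bounds the two masks split the 24 hours into two disjoint sets of 12. Joining the day conditions with OR instead would select all 24 hours.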


## Filter datacube by location (city centre point coordinates, time)

TODO remove dependency from db

In [4]:
from sqlalchemy import create_engine, text

### SET connection to the PostgreSQL server:
################################################## SET PostgreSQL connection:

################################################## read database keys:
def config(filename, section='postgresql'):
    # create a parser
    parser = ConfigParser()
    # read config file
    parser.read(filename)

    # get section, default to postgresql
    db = {}
    if parser.has_section(section):
        params = parser.items(section)
        for param in params:
            db[param[0]] = param[1]
    else:
        raise Exception(
            'Section {0} not found in the {1} file'.format(section, filename))

    return db
#config(filename='./../../../../uc1-urban-climate/database.ini')
keys = config(filename='./../../../../uc1-urban-climate/database.ini')

POSTGRESQL_SERVER_NAME=keys['host']
PORT=                  keys['port']
Database_name =        keys['database']
USER =                 keys['user']
PSW =                  keys['password']
##################################################
                                   
engine_postgresql = sa.create_engine('postgresql://'+USER+':'+PSW+ '@'+POSTGRESQL_SERVER_NAME+':'+str(PORT)+ '/' + Database_name)

## testing reading tables from database:

with engine_postgresql.begin() as conn:
    query = text("""SELECT urau_code, _wgs84y, _wgs84x, time_zone_offset
    FROM lut.l_city_urau2021;""")
    city_center_df = pd.read_sql_query(query, conn)
#print (city_center_df)
city_center_df = city_center_df.reset_index()  # make sure indexes pair with number of rows

city_center_df_r = city_center_df#[city_center_df['urau_code'] == 'PT007C']#[66:] #use this to create dataset subset
# city_center_df_r = city_center_df_r.loc[~(city_center_df_r['urau_code'].isin(['PT001C1', 'PT002C1', 'PT003C1']))]
# get city coordinates
# lonlat_list =[["NL005C", 4.640960, 52.113299], ["NL006C", 5.384670, 52.173656], ["NL007C", 5.921886, 52.189884]]
# helper function
def lon_to_360(dlon: float) -> float:
    return ((360 + (dlon % 360)) % 360)

lon_list = [lon_to_360(val) for val in city_center_df_r["_wgs84x"].values.tolist()]
lat_list = city_center_df_r["_wgs84y"].values.tolist()
city_list = city_center_df_r["urau_code"].values.tolist()
target_lon = xr.DataArray(lon_list, dims="city", coords={"city": city_list})
target_lat = xr.DataArray(lat_list, dims="city", coords={"city": city_list})
time_zone_offset = xr.DataArray(city_center_df_r['time_zone_offset'], dims="city", coords={"city": city_list})
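
The connection string in the cell above is assembled by plain concatenation, which can break if the user name or password contains characters such as `@` or `/`. One possible variation, sketched here with a hypothetical in-memory `database.ini` (host, credentials and database name are invented for illustration), percent-encodes both parts with `urllib.parse.quote_plus` before building the URL:

```python
from configparser import ConfigParser
from urllib.parse import quote_plus

# Hypothetical database.ini content; the real file is not part of this notebook.
INI_TEXT = """
[postgresql]
host = db.example.org
port = 5432
database = climate
user = reader
password = p@ss/word
"""

parser = ConfigParser()
parser.read_string(INI_TEXT)
keys = dict(parser.items("postgresql"))

# Percent-encode user and password so that characters such as '@' or '/'
# cannot be mistaken for URL delimiters.
url = (
    "postgresql://"
    f"{quote_plus(keys['user'])}:{quote_plus(keys['password'])}"
    f"@{keys['host']}:{keys['port']}/{keys['database']}"
)
print(url)
```

The resulting string can be passed to `sa.create_engine` in the same way as the concatenated one.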


In [5]:
# subset time: TIME-RANGE
start_date = "2018-01-01"
end_date = "2018-12-31"
data = arco_era5.sel(time=slice(start_date, end_date))
## next filter dataframe by city:
data = data.sel(
    longitude=target_lon, 
    latitude=target_lat, method="ffill")

data = xr.merge([data,time_zone_offset])

# ##PARAMETER+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++E
# # update values:  # Kelvin to Degree:
# data['2m_temperature_degree_celcius'] = data['2m_temperature'] - 273.15

# transform into Dask DataFrame to speed up computations
data_df = data.to_dask_dataframe()
data_df = data_df.reset_index()
# select only relevant variables
data_df = data_df[['city', 'time', 'time_zone_offset', 'latitude', 'longitude', '2m_temperature', 'total_precipitation']]
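
The longitudes passed to `.sel` above were first mapped by `lon_to_360`, because the city coordinates are stored in the -180° to 180° convention while the ERA5 grid counts longitude eastwards from 0°. A small sketch of that mapping follows. It relies on the fact that Python's `%` on floats returns a result with the sign of the divisor, so a single modulo gives the same result as the double-modulo form in the notebook, up to floating-point rounding.

```python
def lon_to_360(dlon: float) -> float:
    """Map a longitude in degrees east to the 0-360 convention."""
    return dlon % 360


sample_lons = (-25.7287017223, -2.973099, 0.0, 8.525289, 180.0, -180.0)
converted = [lon_to_360(lon) for lon in sample_lons]
for lon, lon360 in zip(sample_lons, converted):
    print(lon, "->", lon360)
```

Western longitudes move to the upper half of the range (for example -2.973099 becomes roughly 357.026901), while eastern longitudes are unchanged.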

## Summer days and tropical nights

In [6]:
def daytime(x):
    time_local = x.time+timedelta(hours=x.time_zone_offset)
    if(time_local.hour >= 6 and time_local.hour < 18):
        return 1
    else:
        return 0

def night_date(x):
    # assign previous day date to nighttime hours [0..end_night]
    # first adjust to local time
    time_local = x.time+timedelta(hours=x.time_zone_offset)
    if(time_local.hour < 6):
        # date = x.time - timedelta(days=1)
        return x.time - timedelta(days=1)
    else:
        return x.time


data_df['daytime'] = data_df.apply(daytime, axis=1, meta=(None, 'int'))
data_df['time_shifted'] =data_df.apply(night_date, axis=1, meta=(None, 'datetime64[ns]'))
data_df['date'] = data_df.time_shifted.dt.date
# data_df['2m_temperature_degree_celcius'] = data['2m_temperature'] - 273.15
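
The classification above can be tried on single timestamps without Dask. The sketch below is a simplified, hypothetical variant of the two helpers. One difference is deliberate and should be read as an assumption rather than as the notebook's behaviour: the night date is derived from the local timestamp, so that all hours of one local night share the same date, whereas the cell above subtracts the day from the UTC timestamp.

```python
from datetime import datetime, timedelta

START_DAY_H, END_DAY_H = 6, 18  # local daytime window, as in the cell above


def is_daytime(time_utc: datetime, tz_offset_h: int) -> int:
    """Return 1 if the local hour falls in [6, 18), else 0."""
    local = time_utc + timedelta(hours=tz_offset_h)
    return 1 if START_DAY_H <= local.hour < END_DAY_H else 0


def night_date(time_utc: datetime, tz_offset_h: int):
    """Attribute the hours before 06:00 local time to the previous day's night."""
    local = time_utc + timedelta(hours=tz_offset_h)
    if local.hour < START_DAY_H:
        return (local - timedelta(days=1)).date()
    return local.date()


t_night = datetime(2018, 7, 1, 22, 30)  # 00:30 local time on 2 July at UTC+2
t_day = datetime(2018, 7, 2, 4, 0)      # 06:00 local time on 2 July at UTC+2
print(is_daytime(t_night, 2), night_date(t_night, 2))
print(is_daytime(t_day, 2), night_date(t_day, 2))
```

The first timestamp is classified as night and grouped with the night that began on 1 July; the second is the first daytime hour of 2 July.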

In [7]:
## calculation of different indicators:
## 1.1 MAX and MIN per DAY and NIGHT:
with performance_report(filename="temp_min_2018.html"):
    # temp_max = data_df.groupby(['city', 'daytime', 'date']).max()
    temp_min = data_df.groupby(['city', 'daytime', 'date']).min()
    # --------------------------------------------------------------------------------------------------------------
    # temp_max_c = temp_max.compute()
    temp_min_c = temp_min.compute()
    # --------------------------------------------------------------------------------------------------------------
client.close()



### Filter summer days and tropical nights and yearly count

In [11]:
tropical_night_threshold = 20.0 + 273.15
# summer_day_threshold = 25.0 + 273.15
# # select only days and nights above the threshold
# temp_max_c.reset_index(inplace=True)
# summer_days = temp_max_c.loc[(temp_max_c['2m_temperature'] > summer_day_threshold) & (temp_max_c['daytime'] == 1)].reset_index()
# # count how many days pass the threshold per year
# summer_days_count = summer_days.groupby('city').count()

temp_min_c.reset_index(inplace=True)
tropical = temp_min_c.loc[(temp_min_c['2m_temperature'] > tropical_night_threshold) & (temp_min_c['daytime'] == 0)].reset_index()
tropical_count = tropical.groupby(['city', 'daytime']).count()

# add result to the list of cities and set result to 0 for cities that do not have tropical nights/summer days
# zero is a valid value too!
# summer_days_all = pd.merge(city_center_df_r, summer_days_count['2m_temperature'], how='left', left_on='urau_code', right_on='city').fillna(0)
tropical_nights_all = pd.merge(city_center_df_r, tropical_count['2m_temperature'], how='left', left_on='urau_code', right_on='city').fillna(0)
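
The filter, count and left-merge steps above can be summarised in a few lines of plain Python. The city codes and minima below are invented for illustration. The point of the sketch is the last step: cities that never exceed the threshold must still appear in the result, with a count of 0 rather than a missing value.

```python
from collections import defaultdict

TROPICAL_NIGHT_THRESHOLD_K = 20.0 + 273.15  # 20 °C expressed in kelvin

# Hypothetical night-time minima in kelvin, keyed by (city, date).
night_minima = {
    ("CITY_A", "2018-07-01"): 294.3,
    ("CITY_A", "2018-07-02"): 292.9,
    ("CITY_A", "2018-07-03"): 295.0,
    ("CITY_B", "2018-07-01"): 288.4,
}
all_cities = ["CITY_A", "CITY_B", "CITY_C"]

counts = defaultdict(int)
for (city, _date), t_min in night_minima.items():
    if t_min > TROPICAL_NIGHT_THRESHOLD_K:
        counts[city] += 1

# Left-join semantics: every city in the master list gets a row, and cities
# without any qualifying night receive 0 instead of a missing value.
tropical_nights = {city: counts.get(city, 0) for city in all_cities}
print(tropical_nights)
```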

In [12]:
year = '2018'
# summer_days_count['year'] = year
# summer_days_count.reset_index()[['city', 'index', 'year']].to_csv('summer_days_2018.csv')
# tropical_count['year'] = year
# tropical_count.reset_index()[['city', 'index', 'year']].to_csv('tropical_count_2018.csv')
# summer_days_all
tropical_nights_all

| | index | urau_code | _wgs84y | _wgs84x | time_zone_offset | 2m_temperature |
|---|---|---|---|---|---|---|
| 0 | 0 | ES019C | 43.284187 | -2.973099 | 1 | 49.0 |
| 1 | 1 | CH012C | 47.178167 | 8.525289 | 1 | 1.0 |
| 2 | 2 | BG018C | 43.283887 | 23.610568 | 2 | 4.0 |
| 3 | 3 | IT050C | 40.913821 | 14.788222 | 1 | 22.0 |
| 4 | 4 | BG014C | 41.891262 | 25.596173 | 2 | 17.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 724 | 724 | RO010C | 46.541113 | 24.556075 | 2 | 0.0 |
| 725 | 725 | SE006C | 59.944869 | 17.716293 | 1 | 3.0 |
| 726 | 726 | SI001C | 46.071022 | 14.495290 | 1 | 0.0 |
| 727 | 727 | RO028C | 44.855644 | 24.859411 | 2 | 0.0 |


### Save into the database

In [56]:
##################################
## cu_* tables mandatory columns:
# city_code,
# parameter, 
# parameter_id, 
# parameter_value, 
# year, 
# city_code_version, 
# lineage, 
# datasource
table_name = 'cu_city_era5_summer_days'
schema_name = 'cube'

summer_days_all.rename( columns= {'urau_code': 'city_code',
                         '2m_temperature': 'parameter_value'}, inplace=True)
summer_days_all['year'] = year
summer_days_all['parameter'] = 'Count of summer days (>25 degrees) per year per city, based on 5th gen. ECMWF Atmospheric Reanalysis model'
summer_days_all['parameter_id'] = 'city_era5_summer_days_count'
summer_days_all['lineage'] = 'https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f04_climate_data/climate_indicators.ipynb'
summer_days_all['datasource'] = 'https://cloud.google.com/storage/docs/public-datasets/era5'
summer_days_all.drop(columns=['index', '_wgs84x', '_wgs84y', 'time_zone_offset'], inplace=True)
summer_days_all

| | city_code | parameter_value | year | parameter | parameter_id | lineage | datasource |
|---|---|---|---|---|---|---|---|
| 0 | ES019C | 2.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 1 | CH012C | 68.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 2 | BG018C | 131.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 3 | IT050C | 101.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 4 | BG014C | 142.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 724 | RO010C | 105.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 725 | SE006C | 35.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 726 | SI001C | 46.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 727 | RO028C | 88.0 | 2018 | Count of summer days (>25 degrees) per year pe... | city_era5_summer_days_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |


In [57]:
summer_days_all.to_sql(table_name, engine_postgresql, schema=schema_name, if_exists='replace')

729
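
Before writing a `cu_*` table, the frame's columns can be compared against the mandatory list given in the cell comments. The helper `missing_columns` below is a hypothetical sketch, not part of the notebook. The column names fed to it are those of the summer-days frame shown above.

```python
MANDATORY_COLUMNS = [
    "city_code", "parameter", "parameter_id", "parameter_value",
    "year", "city_code_version", "lineage", "datasource",
]


def missing_columns(columns):
    """Return the mandatory cu_* columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in MANDATORY_COLUMNS if name not in present]


# Column names taken from the summer-days output above.
summer_days_columns = [
    "city_code", "parameter_value", "year", "parameter",
    "parameter_id", "lineage", "datasource",
]
missing = missing_columns(summer_days_columns)
print(missing)
```

For the summer-days frame the check reports `city_code_version` as absent, a column that the precipitation table later in the notebook sets to `'ua_2021'`.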

In [13]:
##################################
## cu_* tables mandatory columns:
# city_code,
# parameter, 
# parameter_id, 
# parameter_value, 
# year, 
# city_code_version, 
# lineage, 
# datasource
table_name = 'cu_city_era5_tropical_nights'
schema_name = 'cube'
tropical = tropical_nights_all
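# the merged frame keeps the city key as 'urau_code', so that is the column to rename to 'city_code'
tropical.rename( columns= {'urau_code': 'city_code',
                  '2m_temperature': 'parameter_value'}, inplace=True)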
tropical['year'] = year
tropical['parameter'] = 'Count of tropical nights (min. temperature >20 degrees) per year per city, based on 5th gen. ECMWF Atmospheric Reanalysis model'
tropical['parameter_id'] = 'city_era5_tropical_nights_count'
tropical['lineage'] = 'https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f04_climate_data/climate_indicators.ipynb'
tropical['datasource'] = 'https://cloud.google.com/storage/docs/public-datasets/era5'
tropical.drop(columns=['index', '_wgs84x', '_wgs84y', 'time_zone_offset'], inplace=True)
tropical

| | urau_code | parameter_value | year | parameter | parameter_id | lineage | datasource |
|---|---|---|---|---|---|---|---|
| 0 | ES019C | 49.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 1 | CH012C | 1.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 2 | BG018C | 4.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 3 | IT050C | 22.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 4 | BG014C | 17.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 724 | RO010C | 0.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 725 | SE006C | 3.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 726 | SI001C | 0.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |
| 727 | RO028C | 0.0 | 2018 | Count of tropical nights (min. temperature >20... | city_era5_tropical_nights_count | https://github.com/FAIRiCUBE/uc1-urban-climate... | https://cloud.google.com/storage/docs/public-d... |


In [14]:
tropical.to_sql(table_name, engine_postgresql, schema=schema_name, if_exists='replace')

729

## Yearly temperature and precipitation stats

In [6]:
with performance_report(filename="climate_stats_2018.html"):
    climate_stats = data_df.groupby('city')[['2m_temperature', 'total_precipitation']].agg(['mean', 'std', 'min', 'max'])
    climate_stats_c = climate_stats.compute()

ERROR:tornado.application:Exception in callback <bound method BokehTornado._keep_alive of <bokeh.server.tornado.BokehTornado object at 0x7fa376c6e8e0>>
Traceback (most recent call last):
  File "/home/conda/fairicubeuc1/467086c01351ef630f14ef1c1e2c3607605786e25d92214e8ed7a5ca85b36969-20230703-150612-959289-12-fairicube_env/lib/python3.9/site-packages/tornado/ioloop.py", line 921, in _run
    val = self.callback()
  File "/home/conda/fairicubeuc1/467086c01351ef630f14ef1c1e2c3607605786e25d92214e8ed7a5ca85b36969-20230703-150612-959289-12-fairicube_env/lib/python3.9/site-packages/bokeh/server/tornado.py", line 760, in _keep_alive
    c.send_ping()
  File "/home/conda/fairicubeuc1/467086c01351ef630f14ef1c1e2c3607605786e25d92214e8ed7a5ca85b36969-20230703-150612-959289-12-fairicube_env/lib/python3.9/site-packages/bokeh/server/connection.py", line 93, in send_ping
    self._socket.ping(str(self._ping_count).encode("utf-8"))
  File "/home/conda/fairicubeuc1/467086c01351ef630f14ef1c1e2c360760578

In [15]:
client.close()

In [7]:
climate_stats_c

| city | 2m_temperature mean | 2m_temperature std | 2m_temperature min | 2m_temperature max | total_precipitation mean | total_precipitation std | total_precipitation min | total_precipitation max |
|---|---|---|---|---|---|---|---|---|
| ES019C | 288.266296 | 4.468978 | 274.669495 | 299.118530 | 0.000152 | 0.000353 | -3.725290e-09 | 0.005259 |
| CH012C | 284.568481 | 8.230749 | 260.960510 | 306.778198 | 0.000128 | 0.000372 | -3.725290e-09 | 0.007403 |
| BG018C | 285.901049 | 10.107052 | 256.783844 | 306.631409 | 0.000083 | 0.000284 | -3.725290e-09 | 0.007257 |
| IT050C | 288.210292 | 7.184011 | 264.208771 | 305.620911 | 0.000166 | 0.000534 | -3.725290e-09 | 0.009408 |
| BG014C | 287.567296 | 9.539940 | 256.713562 | 308.798645 | 0.000084 | 0.000342 | -3.725290e-09 | 0.006873 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| RO010C | 284.615500 | 9.644480 | 260.641907 | 304.793335 | 0.000073 | 0.000248 | -3.725290e-09 | 0.005556 |
| SE006C | 280.396224 | 9.955242 | 256.820801 | 305.406708 | 0.000057 | 0.000238 | -3.725290e-09 | 0.005981 |
| SI001C | 283.285869 | 8.665122 | 256.816132 | 303.702789 | 0.000159 | 0.000543 | -3.725290e-09 | 0.013283 |
| RO028C | 283.647069 | 9.588638 | 255.702881 | 303.032104 | 0.000100 | 0.000352 | -3.725290e-09 | 0.007217 |
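
The statistics above are stored in the dataset's native units. As a reading aid, the sketch below converts the ES019C row to more familiar units. It assumes that `2m_temperature` is in kelvin (consistent with the 273.15 offset used elsewhere in the notebook) and that `total_precipitation` is an hourly accumulation in metres, which is the usual ERA5 convention but is not stated in the notebook itself. Clipping the tiny negative minimum to zero is likewise a choice made only for this illustration.

```python
KELVIN_OFFSET = 273.15

# Values for ES019C copied from the table above.
temperature_k = {"mean": 288.266296, "std": 4.468978, "min": 274.669495, "max": 299.118530}
precipitation_m = {"mean": 0.000152, "std": 0.000353, "min": -3.725290e-09, "max": 0.005259}

# Mean, min and max shift by the offset; the standard deviation measures
# spread and is therefore identical in K and °C.
temperature_c = {
    name: (value if name == "std" else value - KELVIN_OFFSET)
    for name, value in temperature_k.items()
}

# Metres to millimetres; tiny negative values are clipped to zero.
precipitation_mm = {name: max(value, 0.0) * 1000.0 for name, value in precipitation_m.items()}

# Mean hourly precipitation times the 8760 hours of 2018 gives an annual total.
annual_total_mm = precipitation_mm["mean"] * 8760

print({name: round(value, 2) for name, value in temperature_c.items()})
print(round(annual_total_mm))
```

Under these assumptions the yearly mean for ES019C is about 15.1°C and the annual precipitation total is on the order of 1300 mm.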


### Save into the database

In [18]:
year = '2018'
# work on explicit copies so that the in-place edits below do not act on a slice of climate_stats_c
precipitation_stats = climate_stats_c[['total_precipitation']].copy()
precipitation_stats.columns = precipitation_stats.columns.droplevel(0)
precipitation_stats.reset_index(inplace=True)
precipitation_stats

temperature_stats = climate_stats_c[['2m_temperature']].copy()
temperature_stats.columns = temperature_stats.columns.droplevel(0)
temperature_stats.reset_index(inplace=True)
temperature_stats

| | city | mean | std | min | max |
|---|---|---|---|---|---|
| 0 | ES019C | 288.266296 | 4.468978 | 274.669495 | 299.118530 |
| 1 | CH012C | 284.568481 | 8.230749 | 260.960510 | 306.778198 |
| 2 | BG018C | 285.901049 | 10.107052 | 256.783844 | 306.631409 |
| 3 | IT050C | 288.210292 | 7.184011 | 264.208771 | 305.620911 |
| 4 | BG014C | 287.567296 | 9.539940 | 256.713562 | 308.798645 |
| ... | ... | ... | ... | ... | ... |
| 724 | RO010C | 284.615500 | 9.644480 | 260.641907 | 304.793335 |
| 725 | SE006C | 280.396224 | 9.955242 | 256.820801 | 305.406708 |
| 726 | SI001C | 283.285869 | 8.665122 | 256.816132 | 303.702789 |
| 727 | RO028C | 283.647069 | 9.588638 | 255.702881 | 303.032104 |


#### Long tables (cu_* tables)

In [19]:
schema_name = 'cube'
table_name = 'cu_city_era5_total_precipitation'
precipitation_stats.rename(columns={'mean': 'parameter_value-city_era5_total_precipitation_yearly_mean',
                     'std' : 'parameter_value-city_era5_total_precipitation_yearly_std',
                     'min' : 'parameter_value-city_era5_total_precipitation_yearly_min',
                     'max' : 'parameter_value-city_era5_total_precipitation_yearly_max',
                     'city' : 'city_code'}, inplace=True)
precipitation_stats_long = pd.wide_to_long(precipitation_stats, stubnames='parameter_value', 
                                  i='city_code', 
                                  j='parameter_id', 
                                  sep='-', 
                                  suffix='(city_era5_total_precipitation_yearly_[A-Za-z]*)')  # [A-Za-z]: letters only; [A-z] also matches punctuation between 'Z' and 'a'
precipitation_stats_long.reset_index(inplace=True)
##################################
## cu_* tables mandatory columns:
# city_code,
# parameter, 
# parameter_id, 
# parameter_value, 
# year, 
# city_code_version, 
# lineage, 
# datasource

precipitation_stats_long['year'] = year
precipitation_stats_long['parameter'] = precipitation_stats_long['parameter_id']
precipitation_stats_long.loc[precipitation_stats_long['parameter_id'] == 'city_era5_total_precipitation_yearly_mean', 'parameter'] ='hourly total precipitation averaged in city over one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
precipitation_stats_long.loc[precipitation_stats_long['parameter_id'] == 'city_era5_total_precipitation_yearly_std', 'parameter'] = 'hourly total precipitation standard deviation in city over one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
precipitation_stats_long.loc[precipitation_stats_long['parameter_id'] == 'city_era5_total_precipitation_yearly_min', 'parameter'] = 'hourly total precipitation minimum in city within one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
precipitation_stats_long.loc[precipitation_stats_long['parameter_id'] == 'city_era5_total_precipitation_yearly_max', 'parameter'] = 'hourly total precipitation maximum in city within one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
precipitation_stats_long['city_code_version'] = 'ua_2021'
precipitation_stats_long['lineage'] = 'https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f04_climate_data/climate_indicators.ipynb'
precipitation_stats_long['datasource'] = 'https://cloud.google.com/storage/docs/public-datasets/era5'
precipitation_stats_long



Unnamed: 0,city_code,parameter_id,parameter_value,year,parameter,city_code_version,lineage,datasource
0,ES019C,city_era5_total_precipitation_yearly_mean,0.000152,2018,hourly total precipitation averaged in city ov...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
1,CH012C,city_era5_total_precipitation_yearly_mean,0.000128,2018,hourly total precipitation averaged in city ov...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2,BG018C,city_era5_total_precipitation_yearly_mean,0.000083,2018,hourly total precipitation averaged in city ov...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
3,IT050C,city_era5_total_precipitation_yearly_mean,0.000166,2018,hourly total precipitation averaged in city ov...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
4,BG014C,city_era5_total_precipitation_yearly_mean,0.000084,2018,hourly total precipitation averaged in city ov...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
...,...,...,...,...,...,...,...,...
2911,RO010C,city_era5_total_precipitation_yearly_max,0.005556,2018,hourly total precipitation maximum in city wit...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2912,SE006C,city_era5_total_precipitation_yearly_max,0.005981,2018,hourly total precipitation maximum in city wit...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2913,SI001C,city_era5_total_precipitation_yearly_max,0.013283,2018,hourly total precipitation maximum in city wit...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2914,RO028C,city_era5_total_precipitation_yearly_max,0.007217,2018,hourly total precipitation maximum in city wit...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
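
The message printed above the table ("A value is trying to be set on a copy of a slice from a DataFrame") is the text of pandas' SettingWithCopyWarning. In pandas versions that emit it, it typically appears when `rename(..., inplace=True)` is called on a DataFrame that was itself obtained by slicing another one. The following is a minimal sketch of one way to avoid it, not the notebook's own code: it assumes the statistics table comes from a column selection, and `raw`, `stats`, the city codes and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table standing in for the per-city statistics;
# city codes and values are invented for this example.
raw = pd.DataFrame({
    "city": ["AA001C", "BB002C"],
    "mean": [0.00015, 0.00012],
    "std": [0.00050, 0.00040],
    "min": [0.0, 0.0],
    "max": [0.0060, 0.0050],
    "unused": [1, 2],
})

prefix = "city_era5_total_precipitation_yearly"

# Take an explicit copy of the slice, then rename WITHOUT inplace=True,
# so pandas never has to guess whether a view or a copy is being modified.
stats = raw[["city", "mean", "std", "min", "max"]].copy()
stats = stats.rename(columns={
    "mean": f"parameter_value-{prefix}_mean",
    "std": f"parameter_value-{prefix}_std",
    "min": f"parameter_value-{prefix}_min",
    "max": f"parameter_value-{prefix}_max",
    "city": "city_code",
})

# One row per (city_code, parameter_id) instead of one column per statistic.
stats_long = pd.wide_to_long(
    stats,
    stubnames="parameter_value",
    i="city_code",
    j="parameter_id",
    sep="-",
    suffix=rf"{prefix}_[a-z]+",
).reset_index()

print(stats_long)
```

The reshaped frame has the same three columns (`city_code`, `parameter_id`, `parameter_value`) that the notebook then extends with the metadata columns.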


In [20]:
precipitation_stats_long.to_sql(table_name, engine_postgresql, schema=schema_name, if_exists='replace')

916
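
With `if_exists='replace'`, `to_sql` drops an existing table of the same name and recreates it on every run, and with the default `index=True` it also writes the DataFrame index as an extra column. The sketch below illustrates both points under stated assumptions: an in-memory SQLite connection stands in for `engine_postgresql`, and the table name `cu_demo` and the toy rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Toy long-format rows; city codes and values are invented for this example.
toy = pd.DataFrame({
    "city_code": ["AA001C", "BB002C"],
    "parameter_id": ["city_era5_total_precipitation_yearly_mean"] * 2,
    "parameter_value": [0.00015, 0.00012],
    "year": [2018, 2018],
})

# Assumption: SQLite in memory replaces the PostgreSQL engine of the notebook.
conn = sqlite3.connect(":memory:")

# Writing twice with if_exists="replace" leaves a single copy of the data.
# index=False keeps the 0..n-1 DataFrame index out of the table.
toy.to_sql("cu_demo", conn, if_exists="replace", index=False)
toy.to_sql("cu_demo", conn, if_exists="replace", index=False)

# Read the table back to check what was actually stored.
stored = pd.read_sql("SELECT * FROM cu_demo", conn)
conn.close()

print(stored)
```

Reading the table back after the write is a cheap way to confirm the row count and column set before other notebooks depend on the table.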

In [21]:
schema_name = 'cube'
table_name = 'cu_city_era5_2m_temperature'
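# Rename the statistic columns to '<stub>-<parameter_id>' so that pd.wide_to_long can split them.
# Note: this in-place rename is the line reported by the pandas warning in the output below;
# the rename still takes effect on temperature_stats.
temperature_stats.rename(columns={'mean': 'parameter_value-city_era5_2m_temperature_yearly_mean',
                     'std' : 'parameter_value-city_era5_2m_temperature_yearly_std',
                     'min' : 'parameter_value-city_era5_2m_temperature_yearly_min',
                     'max' : 'parameter_value-city_era5_2m_temperature_yearly_max',
                     'city' : 'city_code'}, inplace=True)
# Reshape from one column per statistic to one row per (city_code, parameter_id).
# [A-Za-z] replaces [A-z], which would also match the characters [ \ ] ^ _ and `.
temperature_stats_long = pd.wide_to_long(temperature_stats, stubnames='parameter_value',
                                  i='city_code',
                                  j='parameter_id',
                                  sep='-',
                                  suffix=r'(city_era5_2m_temperature_yearly_[A-Za-z]*)')
temperature_stats_long.reset_index(inplace=True)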
##################################
## cu_* tables mandatory columns:
# city_code,
# parameter, 
# parameter_id, 
# parameter_value, 
# year, 
# city_code_version, 
# lineage, 
# datasource

temperature_stats_long['year'] = year
temperature_stats_long['parameter'] = temperature_stats_long['parameter_id']
temperature_stats_long.loc[temperature_stats_long['parameter_id'] == 'city_era5_2m_temperature_yearly_mean', 'parameter'] = 'hourly 2m temperature averaged in city over one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
temperature_stats_long.loc[temperature_stats_long['parameter_id'] == 'city_era5_2m_temperature_yearly_std', 'parameter'] = 'hourly 2m temperature standard deviation in city over one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
temperature_stats_long.loc[temperature_stats_long['parameter_id'] == 'city_era5_2m_temperature_yearly_min', 'parameter'] = 'hourly 2m temperature minimum in city within one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
temperature_stats_long.loc[temperature_stats_long['parameter_id'] == 'city_era5_2m_temperature_yearly_max', 'parameter'] = 'hourly 2m temperature maximum in city within one year, based on 5th gen. ECMWF Atmospheric Reanalysis model'
temperature_stats_long['city_code_version'] = 'ua_2021'
temperature_stats_long['lineage'] = 'https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f04_climate_data/climate_indicators.ipynb'
temperature_stats_long['datasource'] = 'https://cloud.google.com/storage/docs/public-datasets/era5'
temperature_stats_long

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  temperature_stats.rename(columns={'mean': 'parameter_value-city_era5_2m_temperature_yearly_mean',


Unnamed: 0,city_code,parameter_id,parameter_value,year,parameter,city_code_version,lineage,datasource
0,ES019C,city_era5_2m_temperature_yearly_mean,288.266296,2018,hourly 2m temperature averaged in city over on...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
1,CH012C,city_era5_2m_temperature_yearly_mean,284.568481,2018,hourly 2m temperature averaged in city over on...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2,BG018C,city_era5_2m_temperature_yearly_mean,285.901049,2018,hourly 2m temperature averaged in city over on...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
3,IT050C,city_era5_2m_temperature_yearly_mean,288.210292,2018,hourly 2m temperature averaged in city over on...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
4,BG014C,city_era5_2m_temperature_yearly_mean,287.567296,2018,hourly 2m temperature averaged in city over on...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
...,...,...,...,...,...,...,...,...
2911,RO010C,city_era5_2m_temperature_yearly_max,304.793335,2018,hourly 2m temperature maximum in city within o...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2912,SE006C,city_era5_2m_temperature_yearly_max,305.406708,2018,hourly 2m temperature maximum in city within o...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2913,SI001C,city_era5_2m_temperature_yearly_max,303.702789,2018,hourly 2m temperature maximum in city within o...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...
2914,RO028C,city_era5_2m_temperature_yearly_max,303.032104,2018,hourly 2m temperature maximum in city within o...,ua_2021,https://github.com/FAIRiCUBE/uc1-urban-climate...,https://cloud.google.com/storage/docs/public-d...


In [22]:
temperature_stats_long.to_sql(table_name, engine_postgresql, schema=schema_name, if_exists='replace')

916
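
The precipitation and temperature cells repeat the same steps: rename the statistic columns, reshape to long format, then attach the mandatory `cu_*` columns. As a sketch only, these steps could be factored into one function. The helper `build_cu_table`, its arguments, and the toy input are hypothetical and do not appear in the notebook; the description strings and the mandatory column list are taken from the cells above.

```python
import re

import pandas as pd

# Mandatory columns of the cu_* tables, as listed in the notebook cell above.
MANDATORY_COLUMNS = [
    "city_code", "parameter", "parameter_id", "parameter_value",
    "year", "city_code_version", "lineage", "datasource",
]


def build_cu_table(stats, variable, label, year, lineage, datasource,
                   city_code_version="ua_2021"):
    """Hypothetical helper: wide per-city statistics -> long cu_* layout."""
    prefix = f"city_era5_{variable}_yearly"
    statistics = ["mean", "std", "min", "max"]

    # Non-inplace rename: works on slices without modifying the caller's frame.
    columns = {s: f"parameter_value-{prefix}_{s}" for s in statistics}
    columns["city"] = "city_code"
    wide = stats.rename(columns=columns)

    long = pd.wide_to_long(
        wide, stubnames="parameter_value", i="city_code", j="parameter_id",
        sep="-", suffix=rf"{re.escape(prefix)}_[a-z]+",
    ).reset_index()

    # Map each parameter_id to its description instead of repeated .loc calls.
    model = "based on 5th gen. ECMWF Atmospheric Reanalysis model"
    descriptions = {
        f"{prefix}_mean": f"hourly {label} averaged in city over one year, {model}",
        f"{prefix}_std": f"hourly {label} standard deviation in city over one year, {model}",
        f"{prefix}_min": f"hourly {label} minimum in city within one year, {model}",
        f"{prefix}_max": f"hourly {label} maximum in city within one year, {model}",
    }
    long["parameter"] = long["parameter_id"].map(descriptions)
    long["year"] = year
    long["city_code_version"] = city_code_version
    long["lineage"] = lineage
    long["datasource"] = datasource
    return long[MANDATORY_COLUMNS]


# Toy input; city codes and temperatures (in kelvin) are invented.
toy_stats = pd.DataFrame({
    "city": ["AA001C", "BB002C"],
    "mean": [288.3, 284.6], "std": [7.1, 8.0],
    "min": [270.2, 262.5], "max": [305.1, 303.9],
})

table = build_cu_table(
    toy_stats, variable="2m_temperature", label="2m temperature", year=2018,
    lineage="https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f04_climate_data/climate_indicators.ipynb",
    datasource="https://cloud.google.com/storage/docs/public-datasets/era5",
)
print(table[["city_code", "parameter_id", "parameter_value"]])
```

Selecting `MANDATORY_COLUMNS` at the end makes the function fail with a `KeyError` if any required column is missing, which catches schema drift before the table is written.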