# Radiation per Gemeinde for Switzerland

This notebook uses the Meteomatics API, a commercial weather data provider. The API is normally not free of charge, but Yann Bregy works there and received permission to use it for academic purposes. The notebook retrieves data for all ZIP codes (PLZ) in Switzerland. We chose the 10-year mean radiation per ZIP code and retrieved all data for April at an hourly interval. Data for the whole year would be more representative, but it would have meant many more data points and hence more computationally intensive API requests, which timed out when we tried.

In [1]:
import pandas as pd
import numpy as np
import datetime as dt
import time
import meteomatics.api as api
from dateutil.relativedelta import relativedelta
import pprint
import re

In [2]:
# Credentials for the Meteomatics API
username = 'ybregy'
password = 'FnJUNA0OaoAM'
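
Hard-coded credentials travel with the notebook whenever it is shared. A minimal sketch of one alternative, assuming the credentials were exported as environment variables beforehand. The variable names `METEOMATICS_USERNAME` and `METEOMATICS_PASSWORD` and the helper `load_credentials` are hypothetical and not part of the Meteomatics API:

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    The variable names below are hypothetical; use the names you exported.
    """
    try:
        return env["METEOMATICS_USERNAME"], env["METEOMATICS_PASSWORD"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty credentials
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


# Usage (both variables must be set in the shell that starts Jupyter):
# username, password = load_credentials()
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.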

In [14]:
# Define start, end and interval of query
# We chose April as a representative month for an average analysis. A full year would have been
# too computationally expensive for the API and led to timeouts.
now = dt.datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
startdate_ts = dt.datetime(2022, 4, 1, 0, 0)
enddate_ts = dt.datetime(2022, 5, 1, 0, 0)
interval_ts = dt.timedelta(hours=1)
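
To get a feeling for the request size, the number of timestamps per ZIP code can be counted directly from the start date, end date and interval. This is a rough sketch that assumes the end timestamp is included in the result, which is an assumption about the API rather than something this notebook verifies. The helper `count_timestamps` is hypothetical:

```python
import datetime as dt


def count_timestamps(start, end, step):
    """Count the timestamps from start to end in increments of step.

    Assumes the end timestamp itself is part of the series.
    """
    return (end - start) // step + 1


hourly = dt.timedelta(hours=1)

# April only, as queried in this notebook
april = count_timestamps(dt.datetime(2022, 4, 1), dt.datetime(2022, 5, 1), hourly)

# A full calendar year, for comparison
full_year = count_timestamps(dt.datetime(2022, 1, 1), dt.datetime(2023, 1, 1), hourly)

print(april, full_year)  # 721 8761
```

Under that assumption a full year means roughly twelve times as many values per ZIP code as April alone.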

In [11]:
# Use the 10-year mean radiation parameter: the radiation averaged over the last 10 years
# for a given day and hour
parameters_ts = ['global_rad_10y_mean:W']

In [4]:
# Read the dataset of all "Gemeinden" (municipalities) in Switzerland. Only the PLZ (ZIP code) column is needed here
plz = pd.read_csv("plz.csv", sep=';', encoding='latin-1')
plz.head()

Unnamed: 0,Gde-Nr.,PLZ,Gemeinde
0,2703,4125,Riehen
1,615,3532,Mirchel
2,692,2743,Eschert
3,884,3125,Toffen
4,6808,2882,Clos du Doubs


In [5]:
# Insert postal_code column that respects the coordinate format of the Meteomatics API query
plz["postal_code"] = "postal_CH" + plz["PLZ"].astype(str)

In [6]:
# Insert a placeholder column (initialised to 0) for the radiation parameter that we are looking for
plz['global_rad_10y_mean:W'] = 0

In [7]:
# Set index of df to facilitate matching with API dataframe
plz = plz.set_index('postal_code')
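
With `postal_code` as the index, a per-chunk result can be matched to the right rows by label. The following sketch illustrates this on a toy dataframe. The three ZIP codes come from the table above, but the radiation values and the names `toy` and `chunk_result` are made up for illustration:

```python
import pandas as pd

col = "global_rad_10y_mean:W"

# Toy version of the plz dataframe, indexed by the API identifier
toy = pd.DataFrame(
    {"Gemeinde": ["Riehen", "Mirchel", "Eschert"], col: 0},
    index=pd.Index(["postal_CH4125", "postal_CH3532", "postal_CH2743"], name="postal_code"),
)

# Made-up per-ZIP means standing in for one chunk's API result
chunk_result = pd.DataFrame(
    {col: [151.0, 149.5]},
    index=pd.Index(["postal_CH4125", "postal_CH3532"], name="postal_code"),
)

# Both frames share the column name, so the merge creates "_x" and "_y" versions
merged = toy.merge(chunk_result, on="postal_code", how="left")

# Keep the new value where one exists, otherwise the placeholder (NaN is skipped by max)
merged[col] = merged[[col + "_x", col + "_y"]].max(axis=1)
merged = merged.drop(columns=[col + "_x", col + "_y"])
print(merged)
```

Rows that are not part of the chunk keep their placeholder value, so results can be accumulated chunk by chunk.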

In [8]:
# Create a list of all postal codes in the postal_CHXXXX format to use for the API query
list_plz = plz.index.tolist()

In [9]:
def split_list(lst, n):
    """Yield successive chunks of at most n items from the initial list.
       Splitting up the workload keeps each API request small and avoids timing out.
       Inspired by https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks"""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

In [10]:
# Split up the plz list into chunks of 50
lists_plz = list(split_list(list_plz, 50))
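
A quick way to see what the chunking does is to run it on a small list. The sketch below repeats the generator so that it runs on its own; the 120 identifiers are generated for illustration only and are not real ZIP codes from `plz.csv`:

```python
def split_list(lst, n):
    """Yield successive chunks of at most n items from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Hypothetical identifiers, for illustration only
codes = ["postal_CH{}".format(1000 + i) for i in range(120)]

chunks = list(split_list(codes, 50))
print([len(chunk) for chunk in chunks])  # [50, 50, 20]
```

The last chunk simply holds whatever remains, so no identifier is dropped or duplicated.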

The following cell sends the requests to the API, broken down into chunks of 50 ZIP codes.

In [12]:
# Iterate through the list chunks to cover all ZIP codes (PLZ)
for count, list_chunk in enumerate(lists_plz):
    # Set coordinates of request to all the ZIP codes from that list chunk
    coordinates_ts = list_chunk
    while True:
        try:
            # Query the API for the chunk of up to 50 locations with the dates and parameters defined above
            df_ts = api.query_time_series(coordinates_ts, startdate_ts, enddate_ts, interval_ts,
                                          parameters_ts, username, password)

        # In case a postal code is not found in the API, we deal with it by removing that PLZ from the list chunk
        except Exception as e:
            # If the PLZ is not in the database, an error naming the postal code is printed
            print("Failed, the exception is {}".format(e))

            # x finds the postal code in question (raw string so that \d is not read as a string escape)
            x = re.findall(r"CH\d\d\d\d", str(e))

            # Any other error (e.g. a timeout) names no postal code, so re-raise it instead of failing on x[0]
            if not x:
                raise

            # y sets the postal code in the same format as in the list
            y = "postal_" + str(x[0])
            print(y)  # Only to keep track of missing PLZ/ZIPs

            # Remove postal code from chunk and then repeat the loop
            coordinates_ts.remove(y)
        else:
            # Take the average of all results for a given postal code (aggregation of roughly 30*24 hourly values)
            df_ts = df_ts.groupby('postal_code').mean()

            # Merge results with initial dataframe
            plz = plz.merge(df_ts, on="postal_code", how="left")

            # Combine the two columns created by the merge (placeholder and new values), then drop them
            plz['global_rad_10y_mean:W'] = np.max(plz[['global_rad_10y_mean:W_x', 'global_rad_10y_mean:W_y']], axis=1)
            plz = plz.drop(labels=['global_rad_10y_mean:W_x', 'global_rad_10y_mean:W_y'], axis=1)

            # Print successful operation to keep track of scraping progress
            print("Iteration {} was successful".format(count))
            break

    # Make the loop sleep for 10 seconds to avoid excessive usage of API
    time.sleep(10)

# As soon as the loop is over, print the info of the dataframe to mark the end of the scraping
plz.info()

Iteration 0 was successful
Iteration 1 was successful
Iteration 2 was successful
Iteration 3 was successful
Iteration 4 was successful
Iteration 5 was successful
Failed, the exception is Code CH2874 could not be found.
postal_CH2874
Failed, the exception is Code CH2875 could not be found.
postal_CH2875
Iteration 6 was successful
Iteration 7 was successful
Iteration 8 was successful
Iteration 9 was successful
Iteration 10 was successful
Iteration 11 was successful
Failed, the exception is Code CH2877 could not be found.
postal_CH2877
Iteration 12 was successful
Iteration 13 was successful
Iteration 14 was successful
Iteration 15 was successful
Iteration 16 was successful
Iteration 17 was successful
Iteration 18 was successful
Iteration 19 was successful
Iteration 20 was successful
Iteration 21 was successful
Iteration 22 was successful
Iteration 23 was successful
Iteration 24 was successful
Iteration 25 was successful
Iteration 26 was successful
Failed, the exception is Code CH7201 could not be found.
postal_CH72
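
The retry logic relies on the error message naming the missing code, as in the log lines above. The extraction step can be sketched in isolation as follows; the helper `missing_identifier` is hypothetical, and the second message is an invented example of an error that names no postal code:

```python
import re


def missing_identifier(message):
    """Return the 'postal_CHxxxx' identifier named in an error message, or None."""
    match = re.search(r"CH\d{4}", message)
    return "postal_" + match.group(0) if match else None


print(missing_identifier("Code CH2874 could not be found."))  # postal_CH2874
print(missing_identifier("Connection timed out"))             # None
```

Returning `None` for unrelated errors lets the caller tell a missing ZIP code apart from a failure that should not be retried blindly.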

In [13]:
# Export results to csv for further ROI analysis
plz.to_csv("radiation_HourlyAverageAprilTenYear.csv")