# Augmenting weather data

### Step 03

## Historical weather data (from file)
Code for importing and cleaning historical data, including quality flags, from a CSV.

> **Quality flag codes** \
> (-1 ... expected, but not (yet) received) \
> 0   ... not (yet) checked \
> 1   ... not measured \
> 2   ... missing or too late \
> 100 ... original, ok \
> 200 ... original, ok \
> 300 ... not original, manually supplemented \
> 400 ... not original, deleted \
> 500 ... not original, automatically supplemented

Based on the values listed above, codes 100-300 and 500 are taken into account.
> Because forward fill (`ffill`) is used to close the (very few) gaps in the data, rows with quality flag codes 2 and 400 can also be kept: their empty cells are filled with the previous value.
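
The import code in this section does not filter on the flag columns explicitly. The following is a minimal sketch of how such a filter could look before the forward fill. The column names `TL` and `TL_FLAG`, the helper `mask_by_flag` and the sample values are assumptions for illustration and are not taken from the data set.

```python
import numpy as np
import pandas as pd

# Codes kept as-is, per the list above (100-300 and 500)
ACCEPTED_FLAGS = (100, 200, 300, 500)

def mask_by_flag(df, value_col, flag_col, accepted=ACCEPTED_FLAGS):
    """Return a copy in which values with a non-accepted flag are set to NaN."""
    out = df.copy()
    out.loc[~out[flag_col].isin(accepted), value_col] = np.nan
    return out

# Illustrative data only; 'TL' and 'TL_FLAG' are assumed column names
sample = pd.DataFrame({
    "TL": [4.1, 4.3, None, 7.7, 4.6],
    "TL_FLAG": [100, 200, 400, 0, 500],
})

# Mask first, then forward fill so every gap takes the previous accepted value
cleaned = mask_by_flag(sample, "TL", "TL_FLAG").ffill()
print(cleaned["TL"].tolist())
```

The row flagged 400 is already empty and the row flagged 0 is masked, so both end up carrying the last accepted value.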

In [3]:
import pandas as pd 
import csv
from datetime import date

# global variables
g_len_list = 50 #variable for dataframe head or tail method
debug = 0

###### INPUTS
path = 'S:/Andreas/FH/Technikum/BA/'  #including slash at the end!
get = '30_AugmentData/'
put = '40_Prep/'

filename = 'ZEHNMIN Datensatz_7710_20161114T1510_20230916T1700_Part2.csv'
ROOM = 'OA'
VAL = '2'
delimiter = ','
###### ######

class ImportWeather:
    def __init__(self):
        self.process()
        
    def process(self):
        # 01 import data from csv
        # newline='' is what the csv module recommends when opening files
        with open(path+get+filename, encoding="ISO8859", newline='') as f:
            # read all rows while the file is still open
            df = pd.DataFrame(csv.DictReader(f, delimiter=delimiter))

        # Parse the first column as datetimes before the numeric conversion below,
        # so that the second to_datetime() call does not produce NaT.
        df['time'] = pd.to_datetime(df.iloc[:, 0])
        df.rename(columns=lambda x: f'{ROOM}_{x}', inplace=True)  # prefix every column with ROOM
        df = df.apply(pd.to_numeric, errors='coerce')  # non-numeric cells become NaN
        df['UTC'] = pd.to_datetime(df.iloc[:, 0])  # rebuild the timestamps as a new 'UTC' column
        df.insert(0, 'UTC', df.pop('UTC'))  # move 'UTC' to the front

        if debug:
            print(df.info())
            print(df.dtypes)
        print(f'> {len(df.index)} lines into DF imported')
        # 02 check for duplicates, list and clean them
        ## not needed
        # 03 add data to the 10mins timestamp list
        ## not needed

        # 04 fill gaps
        
        if debug:
            #show gaps first...
            print(df[df.isna().any(axis=1)].tail(g_len_list))
        
        met = 'ffill'
        df = self.fill_gap_in_col(df, met)
        if debug:
            print(df.tail(g_len_list))
        print(f'> "NaN" gaps filled with method \'{met}\'')
        
        # 05 export to csv
        file = path+put+str(ROOM) +'_'+ str(VAL) + '.csv'   #+ '_AUG_cleaned_and_filled_at_' + str(date.today()) + '.csv'
        df.to_csv(file, sep=',', index=False, encoding='utf-8')
        print(f"> Export to '{file}' cleaned and filled successful")
        
    def fill_gap_in_col(self, col, method):
        """Fill gaps between the first and last valid row of a Series or DataFrame.

        Only 'ffill' is implemented; any other method returns an unmodified copy.
        """
        colf = col.copy()
        first_idx = colf.first_valid_index()
        last_idx = colf.last_valid_index()
        # fillna(method=...) is deprecated in pandas, so ffill() is called directly
        if method == 'ffill':
            colf.loc[first_idx:last_idx] = colf.loc[first_idx:last_idx].ffill()
        return colf
        
if __name__ == "__main__":
    ImportWeather()  # instantiating the class runs process() via __init__

> 359580 lines into DF imported
> "NaN" gaps filled with method 'ffill'
> Export to 'S:/Andreas/FH/Technikum/BA/40_Prep/OA_2.csv' cleaned and filled successful
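
To make the behaviour of `fill_gap_in_col` easier to see, here is a small stand-alone sketch of the same bounded forward fill on a toy frame. The function name `ffill_between_valid` and the sample values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def ffill_between_valid(df):
    """Forward-fill only between the first and the last valid row."""
    out = df.copy()
    first, last = out.first_valid_index(), out.last_valid_index()
    out.loc[first:last] = out.loc[first:last].ffill()
    return out

# Toy data: one leading gap, one inner gap, one trailing gap
demo = pd.DataFrame({"T": [np.nan, 1.0, np.nan, 3.0, np.nan]})
filled = ffill_between_valid(demo)
print(filled["T"].tolist())
```

Only the inner gap is filled. The leading gap has no previous value to copy, and the trailing gap lies outside the first-to-last valid range, so both stay `NaN`.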


## Historical weather data (from API)
To fetch data and store it in a pandas DataFrame, the following code needs
* a `start date` in ISO 8601 style (UTC), e.g. `2023-11-18T17:10`
* an `end date` in the same format, and
* `parameters`, i.e. abbreviations such as `TS` to load historical temperature data

> Import historical meteorological records from ZAMG aka `geosphere` \
> The nearest station ID is `7710` (Seibersdorf) \
> API documentation: https://data.hub.geosphere.at/dataset/klima-v1-10min
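
The three inputs listed above can be assembled into the request URL programmatically instead of by hand. This is a minimal sketch using only the standard library. The helper name `build_url` is an assumption, while the base URL and query keys are taken from the call at the end of the cell below.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataset.api.hub.geosphere.at/v1/station/historical/klima-v1-10min"

def build_url(start, end, parameters, station_id="7710"):
    """Compose the request URL from start date, end date and parameter codes."""
    query = {
        "parameters": ",".join(parameters),
        "station_ids": station_id,
        "start": start,
        "end": end,
    }
    # Keep ',' and ':' unescaped so the result reads like the hand-written URL
    return f"{BASE_URL}?{urlencode(query, safe=',:')}"

url = build_url("2023-11-18T17:10", "2023-11-19T23:50", ["TS", "DD", "FFAM"])
print(url)
```

The resulting string can be passed to `MakeApiCall` unchanged.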

In [None]:
import requests
import json
import pandas as pd
from glom import glom  # json_normalize and Flatten were imported but never used
from datetime import date

class MakeApiCall:

    def get_user_data(self, api, parameters):
        response = requests.get(f"{api}", params=parameters)
        if response.status_code == 200:
            # print data
            print("Successfully fetched the data with parameters provided")
            self.formatted_print(response.json())
            
            # get data into dataframe
            dataraw = response.json()
            
            df1 = glom(dataraw,  ('timestamps'))
            df2 = glom(dataraw,  ('features', ['properties.parameters.TS.data']))
            df3 = glom(dataraw,  ('features', ['properties.parameters.DD.data']))
            df4 = glom(dataraw,  ('features', ['properties.parameters.FFAM.data']))
            df22 = df2[0] # unwrap the outer list (one entry per feature, i.e. per station)
            df33 = df3[0] # unwrap the outer list (one entry per feature, i.e. per station)
            df44 = df4[0] # unwrap the outer list (one entry per feature, i.e. per station)
            
            df = pd.DataFrame({'timestamp': df1, 'TS': df22, 'DD': df33, 'FFAM': df44})
            
            # print data example based on first line
            print(df.iloc[:1])
            #print("---")
            #print(df.values[:1])
            
            # save data to file (path and get are defined in the file-import cell above)
            filename = 'ZEHNMIN Datensatz_7710_fetched_at_' + str(date.today()) + '.csv'
            df.to_csv(path+get+filename, sep=',', index=False, encoding='utf-8')
            print(f"Export to: {filename} successful")
            
        else:
            print(
                f"There's a {response.status_code} error with your request")

    def formatted_print(self, obj):
        """
        function prints text out of json-object 
        """
        text = json.dumps(obj, sort_keys=True, indent=4)  #indent - clustering of json "folder"
        print(text)

    def __init__(self, api):
        # The query string is already part of `api`, so no extra parameters are sent.
        # Example of what could be passed instead:
        # {"station_ids": "7710", "start": "2023-09-16T17:10", "end": "2023-09-16T23:50"}
        parameters = {}
        print(api)
        print(parameters)

        self.get_user_data(api, parameters)


if __name__ == "__main__":
    api_call = MakeApiCall("https://dataset.api.hub.geosphere.at/v1/station/historical/klima-v1-10min?parameters=TS,DD,FFAM&station_ids=7710&start=2023-11-18T17:10&end=2023-11-19T23:50")
    #DD,FFAM,P,RFAM,TP,TS
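
The cell above extracts `TS`, `DD` and `FFAM` one by one. As a sketch of an alternative, the same frame can be built for whatever parameters the response contains. The payload layout (`timestamps`, `features[0].properties.parameters.<name>.data`) is inferred from the `glom` paths used above. The function name `response_to_frame` and the sample values are made up for illustration and are not real measurements.

```python
import pandas as pd

def response_to_frame(payload, feature_index=0):
    """Build a DataFrame with one column per parameter found in the payload."""
    params = payload["features"][feature_index]["properties"]["parameters"]
    columns = {"timestamp": payload["timestamps"]}
    for name, entry in params.items():
        columns[name] = entry["data"]
    return pd.DataFrame(columns)

# Illustrative payload only; the numbers are invented
sample_payload = {
    "timestamps": ["2023-11-18T17:10+00:00", "2023-11-18T17:20+00:00"],
    "features": [{
        "properties": {
            "parameters": {
                "TS": {"data": [3.2, 3.1]},
                "DD": {"data": [310.0, 315.0]},
                "FFAM": {"data": [4.8, 5.0]},
            }
        }
    }],
}

frame = response_to_frame(sample_payload)
print(frame.columns.tolist())
```

With this approach, changing the `parameters` list in the URL does not require touching the parsing code.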

## Current weather data
Helper code used while testing to get the historical-data API working and to find the weather station nearest to the house.
> Uses the `zamg` module (library) found on GitHub.
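
One common way to define "nearest station" is the great-circle (haversine) distance between two coordinates. The sketch below illustrates that idea only. Whether the `zamg` library computes it this way is not confirmed here, and the station IDs and coordinates are hypothetical placeholders.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def closest(stations, lat, lon):
    """Return the ID of the station with the smallest distance to (lat, lon)."""
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))

# Hypothetical station IDs and coordinates, for illustration only
stations = {"A": (48.0, 16.5), "B": (48.2, 16.4), "C": (47.1, 15.4)}
nearest = closest(stations, 48.03, 16.48)  # same query point as in the cell below
print(nearest)
```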

In [1]:
"""Asynchronous Python client for ZAMG weather data."""
import asyncio
import zamg
from zamg import ZamgData
from zamg.exceptions import ZamgError  # needed by the except clause in main()
#from os import curdir

# Patch asyncio to allow nested event loops
import nest_asyncio 
nest_asyncio.apply()

async def main():
    """Sample of getting data"""
    try:
        async with ZamgData() as zamg:
            # optionally disable SSL certificate verification
            zamg.verify_ssl = False
            # trying to read zamg station id of the closest station
            data = await zamg.closest_station(48.03, 16.48)    ###### INPUT
            # set closest station as default one to read
            zamg.set_default_station(data)
            print("Closest_station = " + str(zamg.get_station_name) + " / " + str(data))
            # print list with all possible parameters
            print(f"Possible station parameters: {zamg.get_all_parameters()}")
            # set parameters directly
            ##zamg.station_parameters = "TL,SO"
            # or set parameters as list
            ##zamg.set_parameters(("TL", "TS", "DD", "FFAM"))
            # if none of the above parameters are set, all possible parameters are read
            
            # do an update
            await zamg.update()

            print(f"---------- Weather for station {zamg.get_station_name} ({data}) ----------")
            for param in zamg.get_parameters():
                print(
                    str(param)
                    + " -> "
                    + str(zamg.get_data(parameter=param, data_type="name"))
                    + " -> "
                    + str(zamg.get_data(parameter=param))
                    + " "
                    + str(zamg.get_data(parameter=param, data_type="unit"))
                )
            print("--- Last update:",zamg.last_update, " ---")
    except (ZamgError) as exc:
        print(exc)


if __name__ == "__main__":
    asyncio.run(main())

Closest_station = SEIBERSDORF / 11387
Possible station parameters: ['DD', 'DDX', 'FFAM', 'FFX', 'FFN_2', 'GLOW', 'P', 'PGPM', 'PMAX', 'PMIN', 'PRED', 'RFAM', 'RFMAX', 'RFMIN', 'RR', 'RR1', 'RR10', 'RR2', 'RR3', 'RR4', 'RR5', 'RR6', 'RR7', 'RR8', 'RR9', 'RRM', 'RRM1', 'RRM10', 'RRM2', 'RRM3', 'RRM4', 'RRM5', 'RRM6', 'RRM7', 'RRM8', 'RRM9', 'SCHNEE', 'SO', 'TB1', 'TB2', 'TB3', 'TL', 'TLAM', 'TLMAX', 'TLMIN', 'TP', 'TPAM', 'TS', 'TSMAX', 'TSMIN', 'ZEITX']
---------- Weather for station SEIBERSDORF (11387) ----------
DD -> Windrichtung -> 316.0 °
DDX -> Windrichtung der Windspitze -> 318.0 °
FFAM -> Arithmetisches Mittel der Windgeschwindigkeit -> 5.5 m/s
FFX -> Windspitze -> 8.1 m/s
FFN_2 -> Anzahl der 2 Sekunden-Messwerte des Windes -> 300.0 
GLOW -> Globalstrahlung [W/m²] -> 434.0 W/m²
P -> Luftdruck -> 1001.6 hPa
PGPM -> Geopotential -> None gpm
PMAX -> Luftdruck Maximalwert -> 1001.9 hPa
PMIN -> Luftdruck Minimalwert -> 1001.6 hPa
PRED -> Reduzierter Luftdruck -> 1024.2 hPa
RFAM -> Re