# Creating the input database for ecPoint-Calibrate

### What does this Jupyter notebook do?
This Jupyter notebook will generate the input database, for forecasts and observations, in the format required by ecPoint-Calibrate.

### What raw forecasts will be used?
The raw forecasts used in this Jupyter notebook come from the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF). They were produced by an experiment run with version 47r1 of the IFS and cover the period from January 1st to December 31st, 2019. The forecasts are provided in GRIB format.

### What raw observations will be used?
The raw observations used in this Jupyter notebook are the "Global Surface Summary of Day" product produced by NOAA's National Centers for Environmental Information (NCEI). The observations are freely downloadable from https://www.ncei.noaa.gov/data/global-summary-of-the-day/archive/ and are processed with pandas and Metview (https://pypi.org/project/metview/). Of all the parameters in the raw files, this Jupyter notebook will use only:
1. Precipitation amount for 24 hours, PRCP (inches, reported to the nearest 0.01)
2. Mean temperature over 24 hours, TEMP (degrees Fahrenheit, reported to the nearest 0.1)

The observations will be converted to geopoints, and the units will be converted to mm for precipitation and to degrees Celsius for temperature.

NOTE: each file contains all the observations for the year at a specific station
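
The two unit conversions described above can be sketched on a few made-up values. This is only an illustration: the sample rows and the helper names `inches_to_mm` and `fahrenheit_to_celsius` are hypothetical and are not part of the notebook or of the raw files; only the column names `PRCP` and `TEMP` come from the raw data.

```python
import pandas as pd

# Hypothetical sample rows; only the column names PRCP and TEMP come from the raw files
sample = pd.DataFrame({"PRCP": [0.0, 1.0, 0.25], "TEMP": [32.0, 50.0, 212.0]})


def inches_to_mm(inches):
    """Convert a precipitation amount from inches to millimetres."""
    return inches * 25.4


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


# Both helpers work element-wise on pandas Series
sample["PRCP_mm"] = inches_to_mm(sample["PRCP"])
sample["TEMP_C"] = fahrenheit_to_celsius(sample["TEMP"])
print(sample)
```

Keeping the conversions in small functions makes it easy to check them against known reference points (for example, 32 °F is 0 °C) before applying them to a full day of observations.
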
_______________________________________________________________________________________________________________________________

## Setting the environment

In [3]:
import os
from datetime import date, datetime, timedelta
import numpy as np
import pandas as pd

In [4]:
# INPUT PARAMETERS
DateS = date(2019, 1, 1)
DateF = date(2019, 1, 1)
Delta_Date = timedelta(days=1)
WorkDir = "C:/Users/f_ati/OneDrive/Desktop/GitHub/ecPointCalibrate_CaseStudy"
RawData_Dir = "RawData"
InputDB_Dir = "InputDB" 
FC_Dir = "FC"
OBS_Dir = "OBS"

In [5]:
# CREATING SOME ENVIRONMENT VARIABLES
RawData_FC_Dir = WorkDir + "/" + RawData_Dir + "/" + FC_Dir
RawData_OBS_Dir = WorkDir + "/" + RawData_Dir + "/" + OBS_Dir
InputDB_FC_Dir = WorkDir + "/" + InputDB_Dir + "/" + FC_Dir
InputDB_OBS_Dir = WorkDir + "/" + InputDB_Dir + "/" + OBS_Dir
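
As an aside, the same four directories could also be assembled with `pathlib` instead of string concatenation. The sketch below is one possible alternative, not what the notebook does: the root `ecPointCalibrate_CaseStudy` is a placeholder for `WorkDir`, and the helper `build_dirs` is a hypothetical name introduced only for illustration.

```python
from pathlib import Path

# Placeholder root; in the notebook this role is played by WorkDir
work_dir = Path("ecPointCalibrate_CaseStudy")


def build_dirs(root):
    """Return the raw-data and input-database directories under `root`."""
    return {
        "RawData_FC_Dir": root / "RawData" / "FC",
        "RawData_OBS_Dir": root / "RawData" / "OBS",
        "InputDB_FC_Dir": root / "InputDB" / "FC",
        "InputDB_OBS_Dir": root / "InputDB" / "OBS",
    }


dirs = build_dirs(work_dir)
for name, path in dirs.items():
    # as_posix() prints forward slashes on every platform
    print(name, "->", path.as_posix())
```

`Path` objects are accepted by `os.listdir` and `pd.read_csv`, so they could be passed to the calls used later in the notebook without converting them back to strings.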

## Create the InputDB for observations

In [None]:
# List all the raw observation files in the directory "RawData_OBS_Dir"
arr = os.listdir(RawData_OBS_Dir)
print("NOTE:")
print("There are " + str(len(arr)) + " stations around the globe to analyse.")

# Beginning of the observations pre-processing
# NOTE: the observations are pre-processed one day at a time
TheDate = DateS
while TheDate <= DateF:
    
    TheDateSTR = TheDate.strftime("%Y-%m-%d")
    print(" ")
    print("Pre-Processing observations for " + TheDateSTR)
    now = datetime.now()
    print("Starting at... ", now.strftime("%H:%M:%S"))
    
    # Generating empty dataframes for each day of the year
    df_prcp = pd.DataFrame()
    df_temp = pd.DataFrame()
    
    for RawData_OBS_Filename in arr:
        
        # Read the raw observations for each station
        RawData_OBS_File = RawData_OBS_Dir + "/" + RawData_OBS_Filename
        df = pd.read_csv(RawData_OBS_File)
        df1 = df[df["DATE"].isin([TheDateSTR])] #selection of the date of interest
        
        # Pre-processing of precipitation observations
        df1_prcp = df1[["STATION", "DATE", "LATITUDE", "LONGITUDE", "ELEVATION", "PRCP", "PRCP_ATTRIBUTES"]]
        frames = [df_prcp, df1_prcp]
        df_prcp = pd.concat(frames)
        
        # Pre-processing of temperature observations
        df1_temp = df1[["STATION", "DATE", "LATITUDE", "LONGITUDE", "ELEVATION", "TEMP", "TEMP_ATTRIBUTES"]]
        frames = [df_temp, df1_temp]
        df_temp = pd.concat(frames)

    # Pre-processing of precipitation observations
    df_prcp["LONGITUDE"] = df_prcp["LONGITUDE"] % 360 # wrapping longitudes from [-180°, 180°] to [0°, 360°), so that Greenwich stays at 0°
    df_prcp = df_prcp[df_prcp.PRCP != 99.99] # eliminating the missing values
    df_prcp = df_prcp[df_prcp.PRCP_ATTRIBUTES != "A"] # eliminating the stations that reported only 1 report of 6-hour precipitation amount.
    df_prcp = df_prcp[df_prcp.PRCP_ATTRIBUTES != "B"] # eliminating the stations that reported only the summation of 2 reports of 6-hour precipitation amount.
    df_prcp = df_prcp[df_prcp.PRCP_ATTRIBUTES != "C"] # eliminating the stations that reported only the summation of 3 reports of 6-hour precipitation amount.
    df_prcp = df_prcp[df_prcp.PRCP_ATTRIBUTES != "E"] # eliminating the stations that reported only 1 report of 12-hour precipitation amount.
    df_prcp = df_prcp[df_prcp.PRCP_ATTRIBUTES != "H"] # eliminating the stations that reported incomplete data for the day.
    df_prcp = df_prcp[df_prcp.PRCP_ATTRIBUTES != "I"] # eliminating the stations that did not report any precipitation data for the day.
    df_prcp["PRCP"] = df_prcp["PRCP"] * 25.4 # converting rainfall from inches to mm.
    del df_prcp['PRCP_ATTRIBUTES'] # eliminate the "PRCP_ATTRIBUTES" column from the final output
    print("Total n. of rainfall observations maintained: " + str(len(df_prcp)))
    
    # Pre-processing of temperature observations
    df_temp["LONGITUDE"] = df_temp["LONGITUDE"] % 360 # wrapping longitudes from [-180°, 180°] to [0°, 360°), so that Greenwich stays at 0°.
    df_temp = df_temp[df_temp.TEMP != 9999.9] # eliminating the missing values.
    df_temp = df_temp[df_temp.TEMP_ATTRIBUTES >= 20] # eliminating those reports that did not provide hourly observations for at least 20 hours in the day.
    df_temp["TEMP"] = (df_temp["TEMP"] - 32) * (5/9) # converting temperature from degrees Fahrenheit to degrees Celsius.
    del df_temp['TEMP_ATTRIBUTES'] # eliminate the "TEMP_ATTRIBUTES" column from the final output
    print("Total n. of temperature observations maintained: " + str(len(df_temp)))
    
    TheDate += Delta_Date

    now = datetime.now()
    print("Ending at... ", now.strftime("%H:%M:%S"))
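
The precipitation quality control in the cell above can be tried out on a small, made-up table. The sketch below is an illustration under stated assumptions, not the notebook's own code: the four toy rows are invented, `REJECTED_FLAGS` and `PRCP_MISSING` are hypothetical names, the flag rejections are collapsed into a single `isin` test, and longitudes are assumed to run from -180° to 180° in the raw files so that a modulo wraps them to the 0° to 360° convention.

```python
import pandas as pd

# Invented rows that mimic the columns selected in the notebook
toy = pd.DataFrame({
    "STATION": ["s1", "s2", "s3", "s4"],
    "LONGITUDE": [-180.0, -0.5, 0.0, 179.5],
    "PRCP": [0.10, 99.99, 0.00, 1.00],
    "PRCP_ATTRIBUTES": ["G", "G", "A", "D"],
})

# Hypothetical names for the values hard-coded in the notebook cell
REJECTED_FLAGS = ["A", "B", "C", "E", "H", "I"]  # incomplete 24-hour totals
PRCP_MISSING = 99.99  # missing-value marker for PRCP

# Wrap longitudes to [0, 360) while keeping Greenwich at 0 (assumes input in [-180, 180])
toy["LONGITUDE"] = toy["LONGITUDE"] % 360

# Keep rows that are neither missing nor flagged as incomplete
keep = (toy["PRCP"] != PRCP_MISSING) & ~toy["PRCP_ATTRIBUTES"].isin(REJECTED_FLAGS)
kept = toy[keep].copy()  # copy() avoids assigning into a view of `toy`

# Convert the retained amounts from inches to mm and drop the flag column
kept["PRCP"] = kept["PRCP"] * 25.4
kept = kept.drop(columns="PRCP_ATTRIBUTES")
print(kept)
```

With these toy rows, station `s2` is dropped because its amount is the missing-value marker, and station `s3` is dropped because of its `A` flag; the other two stations are retained.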