### Keeping WOW stations Oct-Nov-Dec 2020

Following a conversation with AEMET, we keep only the stations that reported during the last quarter of 2020. This leaves fewer "ghost" stations in the HARP test once we add the European WOW 2023-2024 data. The program is simple: we iterate over the directory of hourly `L0` files created by `data_carpentry.ipynb` and collect the `Site Id` of every station seen into a Python set. We then filter the pre-computed list of stations obtained in 2022 by that set, which preserves the order of the list.
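
As a minimal sketch of that idea, the example below uses small in-memory tables instead of the real files. The station identifiers and the `hourly_frames` list are hypothetical stand-ins for illustration; only the column names `Site Id` and `HARP_SID_int` come from the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the hourly L0 files: each frame lists the
# stations that reported during one hour.
hourly_frames = [
    pd.DataFrame({"Site Id": ["b", "a"]}),
    pd.DataFrame({"Site Id": ["a", "d"]}),
]

# Hypothetical stand-in for the pre-computed station table.
df_stations = pd.DataFrame(
    {"Site Id": ["a", "b", "c", "d"], "HARP_SID_int": [1, 2, 3, 4]}
)

# Collect every identifier seen in the period; the set removes duplicates.
seen = set()
for df in hourly_frames:
    seen.update(df["Site Id"])

# Boolean filtering with isin keeps the row order of df_stations,
# whatever order the identifiers were seen in.
df_sel = df_stations[df_stations["Site Id"].isin(seen)]
print(df_sel["Site Id"].tolist())
```

Station `c` never appears in the hourly frames, so it is dropped, while the remaining rows stay in the order of the pre-computed table.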

In [2]:
import datetime
import pandas as pd

In [41]:
path_in = r"/home/jovyan/work/private/data-common/KNMI/eumetnet-wow-2020/proc/L0/WOW-Eumetnet-L0-{0}.csv"
path_all_stations = r"/home/jovyan/work/private/sources/projects/wow4mod/reference_tables/WOW_EUMETNET_CWS_Locations_2020_Within_Elevation.csv"
path_ou = r"/home/jovyan/work/private/sources/projects/wow4mod/reference_tables/WOW_EUMETNET_CWS_Locations_Oct-Nov-Dec_2020_Within_Elevation.csv"


sd = datetime.datetime(2020, 10, 1)
ed = datetime.datetime(2021, 1, 1)
date_range = pd.date_range(sd, ed, freq="h")

df_stations = pd.read_csv(path_all_stations, sep=";")

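s = set()  # unique one-element tuples (Site Id,) seen in the period
not_found = 0  # number of hourly files missing on disk
for date in date_range:
    path_cur_in = path_in.format(date.strftime("%Y-%m-%d_%H"))
    try:
        df = pd.read_csv(path_cur_in, sep=",", parse_dates=[5])
    except FileNotFoundError:
        # Some hours have no file; count them and move on
        not_found += 1
        continue
    for item in df[["Site Id"]].values.tolist():
        s.add(tuple(item))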

stations_in_ond2020 = 3754  # precomputed
stations_in_2020 = 5199  # precomputed
print("Total number of stations seen in OND-2020: ", stations_in_ond2020)
print("Total number of stations seen in year: ", stations_in_2020)
print("Hourly files not found: ", not_found)
print("\n")

# Now we select the rows in the pre-computed file from 2022
print("Number of stations in the pre-computed file: ", df_stations.shape)
site_ids = [item[0] for item in s]  # unpack the one-element tuples
df_sel = df_stations[df_stations["Site Id"].isin(site_ids)]
print("Number of stations after the Oct-Nov-Dec selection: ", df_sel.shape)
print(df_sel.shape)
print("\n\n")

# We keep only the columns we need and tidy them up
cols_to_keep = ["HARP_SID_int", "latitude", "longitude", "elevation"]
df_keep = df_sel[cols_to_keep].copy()  # copy so the assignments below do not act on a view
df_keep["elevation"] = df_keep["elevation"].astype('Int64')
df_keep["name"] = "---"
df_keep = df_keep.rename(columns={"HARP_SID_int": "SID", "latitude": "lat", "longitude": "lon", "elevation":"elev"})
print(df_keep.head())

df_keep.to_csv(path_ou, sep=";", index=False)



Total number of stations seen in OND-2020:  3754
Total number of stations seen in year:  5199
Hourly files not found:  18


Number of stations in the pre-computed file:  (3903, 12)
Number of stations after the Oct-Nov-Dec selection:  (3222, 12)
(3222, 12)



          SID      lat      lon  elev name
0  4200000001  54.1866  -2.9194    37  ---
1  4200000002  51.1036   3.6398     5  ---
2  4200000003  50.8769   4.7110    38  ---
3  4200000007  51.5072  -0.1275    14  ---
4  4200000009  60.2276  16.7794    39  ---
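
The column handling in the cell above can be sketched on a toy table. The values below are invented, and the missing elevation is an assumption made only to show what the nullable `Int64` dtype does; whether the real station table contains missing elevations is not confirmed here.

```python
import pandas as pd

# Toy table with the same column names as the pre-computed station file.
# All values are invented; "extra" stands for any column we do not keep.
df = pd.DataFrame(
    {
        "HARP_SID_int": [1, 2],
        "latitude": [54.1, 51.1],
        "longitude": [-2.9, 3.6],
        "elevation": [37.0, None],  # assumed gap, for illustration only
        "extra": ["x", "y"],
    }
)

cols_to_keep = ["HARP_SID_int", "latitude", "longitude", "elevation"]
df_keep = df[cols_to_keep].copy()

# Nullable Int64 stores whole-number elevations as integers and
# represents the missing one as <NA> instead of forcing floats.
df_keep["elevation"] = df_keep["elevation"].astype("Int64")
df_keep["name"] = "---"
df_keep = df_keep.rename(
    columns={
        "HARP_SID_int": "SID",
        "latitude": "lat",
        "longitude": "lon",
        "elevation": "elev",
    }
)

csv_text = df_keep.to_csv(sep=";", index=False)
print(csv_text)
```

Passing no path to `to_csv` returns the text instead of writing a file, which makes it easy to inspect the header and separator before writing to `path_ou`.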
