<h1><center> NASA Airathon - NO2 Track </center></h1>

### <center> Preprocessing of Training Labels </center>

<div style="text-align: center"> 
    Dr. Sukanta Basu <br/> Associate Professor <br/> Delft University of Technology, The Netherlands <br/> Email: s.basu@tudelft.nl<br/> https://sites.google.com/view/sukantabasu/
</div>

#### Log

Last updated: 4th April, 2022

#### User instructions

Running this notebook produces the **trainOBS.csv** file inside the 'path/to/repo_sukantabasu/data/airathon/processed/train/STN' folder. For each training label, the file contains the datetime, grid ID, longitude, latitude, and measured NO2 value.

#### Load packages

In [1]:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from pathlib import Path

#### Directories

In [2]:
ROOT_DIR    = '../../'

#Location of raw datasets
DATA_DIR    = ROOT_DIR + 'data/airathon/raw/STN/'

#Location of processed datasets
EXTDATA_DIR = ROOT_DIR + 'data/airathon/processed/'

#### Load Train & Grid Metadata

In [3]:
df_trn = pd.read_csv(DATA_DIR    + 'train_labels.csv')
df_grd = pd.read_csv(EXTDATA_DIR + 'grid_latlon.csv') #Contains: ID, longitude, latitude

#### Extract variables

In [4]:
datetime = df_trn['datetime'].values
ID       = df_trn['grid_id'].values
NO2      = df_trn['value'].values

#### For each grid id in the training label file, extract lat-lon values using grid_metadata

In [5]:
nTrn = np.size(NO2)

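# Collect the matching (ID, longitude, latitude) row for every training label.
# An exact comparison is used because str.contains does a regex substring match.
rows = []
for i in range(nTrn):
    latlon = df_grd[df_grd['ID'] == ID[i]]
    rows.append(latlon.values)

# Stack once after the loop instead of regrowing the array at every iteration
latlonAll = np.vstack(rows)

print(latlonAll)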

[['3A3IE' -117.91136694359656 34.1494450091748]
 ['3S31A' -117.9562827078025 33.81424261935068]
 ['7II4T' -118.04611423621448 34.00062937561966]
 ...
 ['VYH7U' 77.05749508102988 28.763422811135467]
 ['YHOPV' 77.28207390205976 28.64523466998896]
 ['ZF3ZW' 77.05749508102987 28.684645547971925]]
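
The row-by-row lookup above can also be written as a single join. The following is a minimal sketch, not the notebook's own method. It assumes that every `grid_id` in the training labels appears exactly once in the grid file. The two small dataframes are hypothetical stand-ins for `train_labels.csv` and `grid_latlon.csv`, and the third NO2 value is purely illustrative.

```python
import pandas as pd

# Hypothetical miniature stand-in for train_labels.csv
df_trn_demo = pd.DataFrame({
    'datetime': ['2019-01-01T08:00:00Z', '2019-01-01T08:00:00Z', '2019-01-02T08:00:00Z'],
    'grid_id': ['3A3IE', '3S31A', '3A3IE'],
    'value': [8.695, 10.496667, 9.1],  # last value is illustrative only
})

# Hypothetical miniature stand-in for grid_latlon.csv
df_grd_demo = pd.DataFrame({
    'ID': ['3S31A', '3A3IE'],
    'longitude': [-117.956283, -117.911367],
    'latitude': [33.814243, 34.149445],
})

# A left join keeps every training row in its original order;
# validate raises an error if an ID is duplicated in the grid file
df_merged = df_trn_demo.merge(
    df_grd_demo, how='left', left_on='grid_id', right_on='ID',
    validate='many_to_one'
)

# Arrange the columns in the same layout as trainOBS.csv
df_demo = (df_merged[['datetime', 'ID', 'longitude', 'latitude', 'value']]
           .rename(columns={'value': 'NO2'}))
print(df_demo)
```

With a left join, a `grid_id` that is missing from the grid file would show up as NaN in `ID`, `longitude`, and `latitude` rather than being silently dropped, which makes unmatched labels easy to spot.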


#### Create new dataframe

In [6]:
df_new = pd.DataFrame(data=latlonAll,columns=['ID','longitude','latitude'])
df_new.insert(0, 'datetime', datetime)
df_new.insert(4, 'NO2', NO2)
print(df_new)

                   datetime     ID   longitude   latitude        NO2
0      2019-01-01T08:00:00Z  3A3IE -117.911367  34.149445   8.695000
1      2019-01-01T08:00:00Z  3S31A -117.956283  33.814243  10.496667
2      2019-01-01T08:00:00Z  7II4T -118.046114  34.000629  37.208333
3      2019-01-01T08:00:00Z  8BOQH -118.450356  34.037858   9.791667
4      2019-01-01T08:00:00Z  A2FBI -117.417294  34.000629   4.308333
...                     ...    ...         ...        ...        ...
36126  2020-10-31T18:30:00Z  UC74Z   77.282074  28.526913  64.384211
36127  2020-10-31T18:30:00Z  VXNN3   77.147327  28.802789  46.573913
36128  2020-10-31T18:30:00Z  VYH7U   77.057495  28.763423  36.300000
36129  2020-10-31T18:30:00Z  YHOPV   77.282074  28.645235  68.415455
36130  2020-10-31T18:30:00Z  ZF3ZW   77.057495  28.684646  41.390909

[36131 rows x 5 columns]
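
Because the dataframe is built from a stacked array that mixes strings and numbers, the coordinate columns may be stored with the generic `object` dtype even though they print like floats. The sketch below, using a hypothetical two-row array in place of `latlonAll`, shows one way to check the dtypes and convert the coordinates to floats if that is needed downstream.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for latlonAll (mixed strings and floats)
latlon_demo = np.array(
    [['3A3IE', -117.911367, 34.149445],
     ['3S31A', -117.956283, 33.814243]],
    dtype=object,
)

df_demo = pd.DataFrame(data=latlon_demo, columns=['ID', 'longitude', 'latitude'])
print(df_demo.dtypes)  # inspect the dtypes before conversion

# Convert only the coordinate columns; the ID column stays as strings
df_demo = df_demo.astype({'longitude': float, 'latitude': float})
print(df_demo.dtypes)  # inspect the dtypes after conversion
```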


#### Save dataframe to a csv file

In [7]:
df_new.to_csv(EXTDATA_DIR + 'train/STN/' + 'trainOBS.csv', index=False)
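
`to_csv` does not create missing folders, so the save step fails if the 'train/STN' folder does not exist yet. The following sketch shows one way to create the folder with `pathlib` before writing, and then reads the file back as a quick check. It writes to a temporary directory, and the one-row dataframe is a hypothetical stand-in for `df_new`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical one-row stand-in for df_new
df_demo = pd.DataFrame({
    'datetime': ['2019-01-01T08:00:00Z'],
    'ID': ['3A3IE'],
    'longitude': [-117.911367],
    'latitude': [34.149445],
    'NO2': [8.695],
})

with tempfile.TemporaryDirectory() as tmp:
    # Temporary location standing in for EXTDATA_DIR + 'train/STN/'
    out_dir = Path(tmp) / 'processed' / 'train' / 'STN'

    # Create the folder and any missing parents; do nothing if it already exists
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / 'trainOBS.csv'
    df_demo.to_csv(out_file, index=False)

    # Read the file back to confirm the columns and row count
    df_back = pd.read_csv(out_file)

print(df_back)
```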