# NAWA TREND dataset extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is used to retrieve and concatenate the NAWA TREND dataset. 

The output is one file per catchment (similar to CAMELS-CH), with a `date` index followed by 22 variable columns:

- date
- NH4_N
- Cl
- q_max_kanton 
- q_min_kanton
- q_mean_kanton
- q_mean_sensor
- doc
- ec25_lab
- ec25_sensor
- tp
- tn
- NO3_N
- NO2_N 
- drp
- pH_lab
- pH_sensor
- O2_lab
- O2_sensor 
- O2S_sensor
- turbidity_sensor
- temp_lab
- temp_sensor


## Requirements
**Python:**

* Python>=3.9 (required by pandas 2.1.3)
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas=2.1.3
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or a requirements.txt (for pip) file.

**Files:**

* nawa_data_neu_v5.xlsx
* CAMELS_CH_chem_stations_short_v3.xlsx


**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations", with the relative path to the EStreams directory.
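
Before running the notebook, it can help to confirm that both input files are where the code expects them. The sketch below is not part of the original workflow: the helper name `missing_inputs` and the folder layout passed to it are assumptions for illustration and should be adapted to the local setup.

```python
from pathlib import Path


def missing_inputs(data_dir, relative_files):
    """Return the entries of `relative_files` that do not exist under `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in relative_files if not (data_dir / name).is_file()]


# Assumed layout: adjust the folder and the relative file names to your setup.
expected_files = [
    "NAWA/nawa_data_neu_v5.xlsx",
    "CAMELS_CH_chem_stations_short_v3.xlsx",
]
missing = missing_inputs("data", expected_files)
if missing:
    print("Missing input files:", missing)
```

An empty list means that every expected file was found.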


## References
* NAWA. National Surface Water Quality Monitoring Programme. https://www.bafu.admin.ch/bafu/en/home/topics/water/state/water--monitoring-networks/national-surface-water-quality-monitoring-programme--nawa-.html (last access: 15 May 2023).

## Observations
* None

# Import modules

In [None]:
import pandas as pd
import tqdm
import os
import warnings

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../.."

# Suppress all warnings
warnings.filterwarnings("ignore")

# Path to where the data are stored
path_nawa = r"C:\Users\nascimth\Documents\data\CAMELS_CH_Chem\\"

PATH_OUTPUT = r"results\Dataset\stream_water_chemistry\interval_samples"


#### Users should NOT change anything in the code below this point.

In [None]:
# Non-editable variables:
# Set the directory:
os.chdir(PATH)

# Import data
* FULL dataset

In [None]:
# Full dataset of interval (time-series)
dataset_nawa = pd.read_excel(path_nawa+"data\\NAWA/nawa_data_neu_v5.xlsx")
dataset_nawa

* Network

In [None]:
# Network NAWA
network_nawa = pd.read_excel(path_nawa+"data\\CAMELS_CH_chem_stations_short_v3.xlsx", sheet_name='nawa')
network_nawa

### Renaming the columns

In [None]:
dataset_nawa.columns

In [None]:
dataset_nawa.columns = ['nawatrend_id', 'date', 'NH4_N', 'Cl', 'q_max_kanton', 
                        'q_min_kanton', 'q_mean_kanton', 'q_mean_sensor', 'doc',
                        'ec25_lab', 'ec25_sensor', 'tp', 'tn', 'NO3_N', 'NO2_N', 
                        'drp', 'pH_lab', 'pH_sensor', 'O2_lab', 'O2_sensor', 
                        'O2S_sensor', 'turbidity_sensor', 'temp_lab', 'temp_sensor']
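
Assigning a list to `.columns` renames by position, so it only fails when the number of names differs and stays silent if the column order in the spreadsheet changes. A minimal sketch of a length check on a toy table is given below; `rename_by_position` is a hypothetical helper and the toy headers are made up, not the real spreadsheet headers.

```python
import pandas as pd


def rename_by_position(df, new_names):
    """Rename columns by position after checking that the counts match."""
    if len(df.columns) != len(new_names):
        raise ValueError(
            f"Expected {len(new_names)} columns, found {len(df.columns)}: "
            f"{list(df.columns)}"
        )
    renamed = df.copy()
    renamed.columns = new_names
    return renamed


# Toy example with made-up headers (assumption, for illustration only).
toy = pd.DataFrame({"Station": [1], "Datum": ["2020-01-01"], "Chlorid": ["<0.5"]})
toy = rename_by_position(toy, ["nawatrend_id", "date", "Cl"])
print(list(toy.columns))
```

Comparing the output of `dataset_nawa.columns` with the new names by eye remains the only safeguard against a changed column order.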

In [None]:
dataset_nawa

In [None]:
# Function to round numbers and preserve symbols
def round_values(val):
    if isinstance(val, str):  # Handle string values with symbols
        if val.startswith('>') or val.startswith('<'):
            symbol = val[0]  # Extract the symbol ('>' or '<')
            try:
                number = float(val[1:])  # Convert the rest to a float
                return f"{symbol}{round(number, 4)}"
            except ValueError:  # Handle cases where conversion might fail
                return val
        else:
            try:
                return str(round(float(val), 4))  # Round plain string numbers
            except ValueError:
                return val  # Return original value if conversion fails
    elif isinstance(val, (int, float)):  # Handle numeric values
        return round(val, 4)
    return val  # Return unchanged if it's neither string nor numeric

In [None]:
network_nawa

In [None]:
# Network CAMELS_CH_Chem
network_camels_ch_chem = pd.read_excel(path_nawa+r"data/CAMELS_CH_chem_stations_short_v3.xlsx", sheet_name='all_5')
#network_camels_ch_chem.set_index("basin_id", inplace=True)
network_camels_ch_chem

In [None]:
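code = network_nawa.nawa_id[0]

# Work on a copy so the in-place operations do not act on a slice of dataset_nawa
dataset = dataset_nawa[dataset_nawa["nawatrend_id"] == code].copy()
dataset.set_index("date", inplace=True)
dataset.drop(columns=["nawatrend_id"], inplace=True)

# Round every cell to 4 decimals, keeping any leading '<' or '>'
# (from pandas 2.1 on, DataFrame.map is the newer name for applymap)
dataset = dataset.applymap(round_values)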

dataset

In [None]:
for code in tqdm.tqdm(network_nawa.nawa_id):

    # Select one station; copy so the in-place operations do not act on a slice
    dataset = dataset_nawa[dataset_nawa["nawatrend_id"] == code].copy()
    dataset.set_index("date", inplace=True)
    dataset.drop(columns=["nawatrend_id"], inplace=True)

    # Round every cell to 4 decimals, keeping any leading '<' or '>'
    dataset = dataset.applymap(round_values)

    # Alternative (not used): strip '<' and '>' from the strings and convert with
    # dataset.apply(pd.to_numeric, errors='coerce') to obtain purely numeric columns.

    # Look up the CAMELS-CH-Chem basin_id that corresponds to this NAWA station
    basin_id_name = str(network_camels_ch_chem.loc[network_camels_ch_chem.nawa_id == code, "basin_id"].values[0])

    # One CSV file per catchment
    dataset.to_csv(os.path.join(PATH_OUTPUT, "nawa_trend", "camels_ch_chem_nawatrend_" + basin_id_name + ".csv"), encoding='latin')
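
To check an exported file, it can be read back with `date` as the index. The sketch below writes and reads a small made-up table in a temporary folder; the file name and the values are illustrative only. Columns that contain censored entries such as `<0.005` come back as text, not as floats.

```python
import os
import tempfile

import pandas as pd

# Made-up values standing in for one exported catchment file.
sample = pd.DataFrame(
    {"NH4_N": ["<0.005", "0.012"], "Cl": [3.2, 4.1]},
    index=pd.to_datetime(["2020-01-06", "2020-02-03"]),
)
sample.index.name = "date"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example_station.csv")  # hypothetical name
    sample.to_csv(csv_path, encoding="latin1")
    # Same encoding as on export; the index column is parsed as dates.
    restored = pd.read_csv(
        csv_path, index_col="date", parse_dates=True, encoding="latin1"
    )

print(restored.dtypes)
```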
    

## Observations
- We have 76 stations in total from NAWA-Trend
- The non-numeric values are kept (> or <)
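
Because the censoring symbols are kept, the columns that contain them hold text. One possible way to prepare such a column for numerical analysis is to split it into a flag and a number, as sketched below. The helper `split_censored` and the output column names `flag` and `value` are assumptions for illustration; how to treat values below or above a detection limit is left to the user.

```python
import pandas as pd


def split_censored(series):
    """Split a column such as ['<0.005', 0.012] into a flag and a numeric value."""
    text = series.astype(str).str.strip()
    # '<' or '>' when the entry starts with one of them, otherwise an empty string.
    flag = text.str.extract(r"^([<>])", expand=False).fillna("")
    # Drop the leading symbol; anything that is still non-numeric becomes NaN.
    value = pd.to_numeric(text.str.replace(r"^[<>]", "", regex=True), errors="coerce")
    return pd.DataFrame({"flag": flag, "value": value}, index=series.index)


# Made-up example values, including a missing entry and a non-numeric note.
example = pd.Series(["<0.005", 0.012, ">100", None, "n.a."], name="NH4_N")
print(split_censored(example))
```

Keeping the flag next to the number preserves the censoring information that a plain `pd.to_numeric(..., errors='coerce')` would discard.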

# End