# Download WSC real-time flow data
Real-time flow data can be accessed through a URL request. Here, "real-time" means recent observations: the record covers roughly the last 18 months before the present, at high temporal resolution.

Sources:
- GUI: https://wateroffice.ec.gc.ca/services/links_e.html
- Readme: https://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/Document/WebService_Guidelines.pdf

Parameters:
- Streamflow; ID = `47`

Approval flags:
- Either `provisional` or `final`

Data quality flags:
- `-1`: `UNSPECIFIED`: Automatically assigned during data recording
- `10`: `ICE`: Data may be affected by the presence of ice (backwater effects)
- `20`: `ESTIMATED`: Estimated value
- `30`: `PARTIAL DAY`: More than 120 minutes missing during calculation of daily mean
- `40`: `DRY`: Water level has dropped below what the sensor can measure
- `50`: `REVISED`: Previously approved data that were subsequently reviewed and edited
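The flag lists above map directly onto a lookup table. The sketch below is illustrative only. It assumes the downloaded CSV stores the quality flag as the numeric code and the approval status as text. The function names and the `approval_col` argument are hypothetical, not part of `python_cs_functions`.

```python
import pandas as pd

# Quality flag codes copied from the list above.
QUALITY_FLAGS = {
    -1: "UNSPECIFIED",
    10: "ICE",
    20: "ESTIMATED",
    30: "PARTIAL DAY",
    40: "DRY",
    50: "REVISED",
}


def label_quality_flags(codes: pd.Series) -> pd.Series:
    """Map numeric quality codes to their labels; unknown codes become NaN."""
    return pd.to_numeric(codes, errors="coerce").map(QUALITY_FLAGS)


def keep_final_only(df: pd.DataFrame, approval_col: str) -> pd.DataFrame:
    """Keep rows whose approval value is 'final' (ignoring case and spaces).

    `approval_col` is a placeholder: the real column name in the
    downloaded CSV is an assumption to check against an actual file.
    """
    approval = df[approval_col].astype(str).str.strip().str.lower()
    return df[approval == "final"]
```

Codes that are not in the list come back as NaN instead of raising, which makes unexpected flags easy to spot with `isna()`.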

Data are returned as `.csv` files, with timestamps in UTC.
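Because timestamps arrive in UTC, it is safest to parse them as timezone-aware values right away. Below is a minimal pandas sketch. The column name `Date` and the toy rows are made up for illustration, so check the header of a real download before reusing it. `read_wsc_csv` is a hypothetical helper.

```python
import io
from typing import Optional

import pandas as pd


def read_wsc_csv(source, date_col: str, tz: Optional[str] = None) -> pd.DataFrame:
    """Read a real-time CSV and index it by timezone-aware timestamps.

    Timestamps are parsed as UTC; pass `tz`, e.g. "America/Edmonton",
    to convert them to local time. `date_col` is a placeholder for
    whatever the date column is called in the real file.
    """
    df = pd.read_csv(source)
    # utc=True localizes naive stamps to UTC and converts offset-aware ones
    df[date_col] = pd.to_datetime(df[date_col], utc=True)
    if tz is not None:
        df[date_col] = df[date_col].dt.tz_convert(tz)
    return df.set_index(date_col)


# Toy input with made-up values and a hypothetical column layout
toy = io.StringIO("Date,Value\n2023-04-05 06:00:00,12.5\n2023-04-05 06:05:00,12.7\n")
flows = read_wsc_csv(toy, "Date")
print(flows.index.tz)
```

Keeping the index in UTC and converting only for display avoids ambiguous local times around daylight-saving changes.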

### Reference
“Extracted from the Environment and Climate Change Canada Real-time Hydrometric Data web site (https://wateroffice.ec.gc.ca/mainmenu/real_time_data_index_e.html) on 2023-04-05”

In [1]:
import sys
import time
import pandas as pd
import urllib.request
from pathlib import Path
sys.path.append(str(Path().absolute().parent))
import python_cs_functions as cs

### Config handling

In [2]:
# Specify where the config file can be found
config_file = '../0_config/config.txt'

In [3]:
# Get the required info from the config file
data_path = cs.read_from_config(config_file,'data_path')

# CAMELS-spat metadata
cs_meta_path = cs.read_from_config(config_file,'cs_basin_path')
cs_meta_name = cs.read_from_config(config_file,'cs_meta_name')
cs_unusable_name = cs.read_from_config(config_file,'cs_unusable_name')

# Basin folder
cs_basin_folder = cs.read_from_config(config_file, 'cs_basin_path')
basins_path = Path(data_path) / cs_basin_folder

# Data period
time_s = cs.read_from_config(config_file, 'wsc_start_t')
time_e = cs.read_from_config(config_file, 'wsc_start_e')
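The requested period (`time_s`, `time_e`) is only meaningful inside the real-time look-back window. The following is a hedged sketch that clips a period to roughly 18 months before now. It assumes the config values are date strings that pandas can parse. `clip_to_realtime_window` is a hypothetical helper that the notebook does not use.

```python
import pandas as pd


def _as_utc(ts) -> pd.Timestamp:
    """Parse `ts` and return it in UTC (naive input is taken to be UTC)."""
    ts = pd.Timestamp(ts)
    return ts.tz_localize("UTC") if ts.tzinfo is None else ts.tz_convert("UTC")


def clip_to_realtime_window(start, end, now=None, months: int = 18):
    """Clip a requested [start, end] period to the real-time look-back window.

    The 18-month default follows the description at the top of this
    notebook and is approximate; check the service documentation for
    the exact limit.
    """
    now = _as_utc(pd.Timestamp.now(tz="UTC") if now is None else now)
    earliest = now - pd.DateOffset(months=months)
    start, end = max(_as_utc(start), earliest), min(_as_utc(end), now)
    if start > end:
        raise ValueError("Requested period lies entirely outside the real-time window")
    return start, end
```

Calendar months via `pd.DateOffset` keep the cutoff on the same day of the month, rather than drifting as a fixed number of days would.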

### Data loading

In [4]:
# CAMELS-spat metadata file
cs_meta_path = Path(data_path) / cs_meta_path
cs_meta = pd.read_csv(cs_meta_path / cs_meta_name)

In [5]:
# Open list of unusable stations; Enforce reading IDs as string to keep leading 0's
cs_unusable = pd.read_csv(cs_meta_path / cs_unusable_name, dtype={'Station_id': object}) 

FileNotFoundError: [Errno 2] No such file or directory: '/Users/darrieythorsson/compHydro/data/CAMELS_spat/camels-spat-data/camels_spat_unusable.csv'
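The cell above fails when the unusable-station file does not exist yet, which leaves the `pd.concat` step at the end of the notebook with nothing to append to. One way around this is to fall back to an empty DataFrame with the columns used further down. This sketch assumes that starting from an empty table is acceptable. `load_unusable` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd

# Column layout used when new entries are added further down
UNUSABLE_COLUMNS = ["Country", "Station_id", "Missing", "Reason"]


def load_unusable(path) -> pd.DataFrame:
    """Read the unusable-station list, or return an empty one if the file is missing.

    Station IDs are read as strings so leading zeros survive.
    """
    path = Path(path)
    if path.is_file():
        return pd.read_csv(path, dtype={"Station_id": object})
    return pd.DataFrame(columns=UNUSABLE_COLUMNS)
```

With this helper, `cs_unusable = load_unusable(cs_meta_path / cs_unusable_name)` would replace the `pd.read_csv` call.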

### Loop over sites and download the flow record

In [6]:
# General settings
iv_var = '47' # WSC parameter ID for streamflow (discharge)
iv_url = 'https://wateroffice.ec.gc.ca/services/real_time_data/csv/' 
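`cs.download_wsc_values` is not shown here, so the exact request it sends is unknown. As a rough illustration of a request built from `iv_url`, the sketch below assembles the query string with the standard library. The `inline` endpoint and the query parameter names are assumptions based on the Readme linked above, and `build_wsc_realtime_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def build_wsc_realtime_url(base_url: str, station: str, param_id: str,
                           start: str, end: str) -> str:
    """Build a real-time data request URL for one station and parameter.

    Assumption: the 'inline' endpoint and the stations[] / parameters[] /
    start_date / end_date query names follow the web service Readme;
    verify them before relying on this.
    """
    query = [
        ("stations[]", station),
        ("parameters[]", param_id),
        ("start_date", start),
        ("end_date", end),
    ]
    # Keep [] and : readable; quote_via=quote encodes spaces as %20, not '+'
    return base_url + "inline?" + urlencode(query, safe="[]:", quote_via=quote)
```

For example, `build_wsc_realtime_url(iv_url, '09AA004', iv_var, '2023-01-01 00:00:00', '2023-04-05 23:59:59')` returns a URL that could be opened with `urllib.request.urlopen`.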

In [7]:
# Loop over the Canadian stations only
dnf = [] # List of incomplete stations, retaining these for easier printout and checking later
for ix,row in cs_meta.iterrows():
    if row.Country == 'CAN':
        
        # Get paths, etc
        site, _, _, raw_path_iv, _,_ = cs.prepare_flow_download_outputs(cs_meta, ix, basins_path, time='iv')
        
        # Resume after interrupts
        if raw_path_iv.is_file():
            tmp = pd.read_csv(raw_path_iv)
            if len(tmp) == 0:
                dnf.append(site)
                print(f'No data downloaded for {site}')
            else:
                print(f'Completed {site}')
            continue
        
        # Downloads
        dnf = cs.download_wsc_values(iv_url, site, time_s, time_e, iv_var, raw_path_iv, dnf)
        time.sleep(0.5) # pause for half a second so we don't bombard the server with requests

No data downloaded for 09AA004
Completed 09AA012
Completed 09AA013
Completed 09AC001
Completed 09AC007
No data downloaded for 09AE002
Completed 09AE003
Completed 09AE006
Completed 09AG001
Completed 09AH003
Completed 09AH004
Completed 09BA001
Completed 09BB001
Completed 09BC001
Completed 09BC004
Completed 09CA004
Completed 09CA006
Completed 09CB001
Completed 09CD001
Completed 09DB001
Completed 09DD003
Completed 09DD004
Completed 09EA003
Completed 09EA004
Completed 09EB003
No data downloaded for 09ED001
Completed 09FB002
Completed 09FC001
Completed 09FD002
Completed 10AA001
Completed 10AA004
Completed 10AA005
Completed 10AB001
Completed 10AC005
Completed 10BE001
Completed 10BE004
Completed 10BE007
Completed 10BE009
Completed 10BE013
Completed 10CA001
Completed 10CB001
Completed 10CD001
Completed 10CD003
Completed 10CD004
Completed 10CD005
Completed 10EA003
Completed 10EB001
Completed 10ED001
Completed 10ED002
Completed 10ED003
Completed 10ED007
Completed 10ED009
Completed 10FA002
Complet
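The resume check in the loop calls `pd.read_csv` on an existing file, and that call raises `EmptyDataError` if a previous run left a zero-byte file. Whether `cs.download_wsc_values` ever writes such files is not known from this notebook. The hypothetical helper below treats both zero-byte and header-only files as empty, so the resume loop does not crash on either.

```python
from pathlib import Path

import pandas as pd
from pandas.errors import EmptyDataError


def download_status(path) -> str:
    """Classify an output file as 'missing', 'empty' or 'complete'.

    A zero-byte file makes pd.read_csv raise EmptyDataError, so it is
    reported as 'empty' instead of interrupting the loop.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    try:
        data = pd.read_csv(path)
    except EmptyDataError:
        return "empty"
    # A header-only file parses to a DataFrame with zero rows
    return "empty" if data.empty else "complete"
```

Inside the loop, `download_status(raw_path_iv)` could replace the `is_file()` and `len(tmp)` checks.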

### Update the `unusable` file list

In [None]:
# Print which basins we need to check
for entry in dnf:
    print(f'No data downloaded for {entry}')

No data downloaded for 01AD002
No data downloaded for 01AE001
No data downloaded for 01BD008
No data downloaded for 01BG005
No data downloaded for 01BG008
No data downloaded for 01BG009
No data downloaded for 01BH005
No data downloaded for 01BH010
No data downloaded for 02BF005
No data downloaded for 02BF006
No data downloaded for 02BF007
No data downloaded for 02BF008
No data downloaded for 02BF009
No data downloaded for 02BF012
No data downloaded for 02BF013
No data downloaded for 02DB007
No data downloaded for 02ED014
No data downloaded for 02FD001
No data downloaded for 02FF004
No data downloaded for 02GB007
No data downloaded for 02GH003
No data downloaded for 02HG001
No data downloaded for 02JB013
No data downloaded for 02LB017
No data downloaded for 02LC027
No data downloaded for 02LC043
No data downloaded for 02LD005
No data downloaded for 02LG005
No data downloaded for 02NE011
No data downloaded for 02NF003
No data downloaded for 02OA057
No data downloaded for 02OB032
No data 

In [None]:
country = 'CAN'

In [None]:
missing = 'iv'
reason = 'No real-time discharge observations available'

In [None]:
# Make a dataframe that lists the basins we cannot use
tmp = pd.DataFrame({'Country': country,
                    'Station_id': dnf,
                    'Missing': missing,
                    'Reason': reason})

In [None]:
cs_unusable = pd.concat([cs_unusable,tmp]).reset_index(drop=True)
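Re-running the notebook appends the same stations to the unusable list again. The following is a hedged alternative to the plain `pd.concat`: it drops repeated (`Country`, `Station_id`, `Missing`) combinations and keeps the newest `Reason`. The choice of key columns is an assumption, and `append_unusable` is a hypothetical helper.

```python
import pandas as pd

# Columns that identify one "unusable" entry (assumption: Reason may change)
KEY_COLUMNS = ["Country", "Station_id", "Missing"]


def append_unusable(existing: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Append new entries and drop repeats, keeping the most recent row."""
    combined = pd.concat([existing, new], ignore_index=True)
    return (combined
            .drop_duplicates(subset=KEY_COLUMNS, keep="last")
            .reset_index(drop=True))
```

Used as `cs_unusable = append_unusable(cs_unusable, tmp)`, this keeps the saved file the same size no matter how many times the notebook is run.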

In [None]:
cs_unusable.to_csv(cs_meta_path / cs_unusable_name, encoding='utf-8', index=False)