In [7]:
from pathlib import Path
import os

if Path('.').absolute().parents[1].name == 'ml_drought':
    os.chdir(Path('.').absolute().parents[1])

from src import exporters

exporters.__all__

['ERA5Exporter',
 'VHIExporter',
 'ERA5ExporterPOS',
 'CHIRPSExporter',
 'S5Exporter',
 'GLEAMExporter',
 'SRTMExporter',
 'ESACCIExporter']

# Exporters

The `Exporters` are responsible for downloading data from external sources and writing it to the `data/raw` directory.

They pull data from a variety of web-based sources. All of these sources are openly available, although some may require you to accept a licence agreement first. Please check the terms carefully before publishing results produced with this pipeline.

<img src="img/exporter_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

### Sources:
- The `S5Exporter` and the `ERA5Exporter` work with the [`Climate Data Store` (CDS)](https://cds.climate.copernicus.eu/#!/home) to download data. 
- The `ERA5ExporterPOS` downloads data from the PlanetOS AWS data mirror, which can be browsed [here](https://data.planetos.com/datasets/ecmwf_era5).
- The `GLEAMExporter` downloads data from the [GLEAM FTP Server](https://www.gleam.eu/).
- The `VHIExporter` downloads data from the [NOAA Vegetation Health FTP Server](https://www.star.nesdis.noaa.gov/smcd/emb/vci/VH/vh_ftp.php).
- The `SRTMExporter` uses the [`elevation` package](https://github.com/bopen/elevation).

NOTE: By default the data 

## Exporters API

The exporters share a common `export` method, which downloads the data to the `data/raw` directory by default. If you wish to download the data elsewhere, pass a `pathlib.Path` to the `Exporter` when you create it.
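
A minimal sketch of how a custom location might be wired through. `SketchExporter` is a hypothetical stand-in, not the real class in `src.exporters`; the keyword name `data_folder` is an assumption based on the `data_folder` and `raw_folder` attributes listed by `dir(exporter)` further down, and whether the real exporters create the folder on construction is also assumed.

```python
from pathlib import Path
import tempfile


class SketchExporter:
    """Hypothetical stand-in for an exporter (not the real implementation)."""

    def __init__(self, data_folder: Path = Path("data")) -> None:
        # Assumption: the root folder is stored and `raw` is derived from it.
        self.data_folder = data_folder
        self.raw_folder = data_folder / "raw"
        # Assumption: the raw folder is created if it does not exist yet.
        self.raw_folder.mkdir(parents=True, exist_ok=True)


# Use a temporary directory so the sketch does not touch a real data folder.
custom_root = Path(tempfile.mkdtemp()) / "drought_data"
exporter = SketchExporter(data_folder=custom_root)

print(exporter.raw_folder)  # ends with drought_data/raw
```

With the real exporters the equivalent call would presumably be along the lines of `exporters.ERA5Exporter(custom_root)`; check the constructor signature with `help(exporters.ERA5Exporter)` before relying on it.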

**Be aware that data volumes are significant (upwards of 1TB if you download all of the data)**

**NOTE: the CDS Exporters download only the area surrounding Kenya by default. The other Exporters download global data, which is subset later.**
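
The default Kenya region shows up in the printed API request later in this notebook as the string `'6.002/33.501/-5.202/42.283'`, which reads as North/West/South/East. The sketch below shows one way such a string could be built from a bounding box. `BoundingBox` and `format_area` are hypothetical names for illustration; the exporter's own `create_area` method may work differently.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Hypothetical container for a region, in degrees."""

    north: float
    west: float
    south: float
    east: float


def format_area(box: BoundingBox) -> str:
    # Join the edges in North/West/South/East order, three decimals each
    # (the three-decimal formatting is a choice made for this sketch).
    return "/".join(f"{edge:.3f}" for edge in box)


kenya = BoundingBox(north=6.002, west=33.501, south=-5.202, east=42.283)
print(format_area(kenya))  # 6.002/33.501/-5.202/42.283
```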

## Let's explore the `ERA5Exporter`

In [12]:
exporter = exporters.ERA5Exporter()

[method for method in dir(exporter) if '__' not in method]

['_check_iterable',
 '_correct_input',
 '_export',
 '_filename_from_selection_request',
 '_print_api_request',
 'client',
 'create_area',
 'create_selection_request',
 'data_folder',
 'dataset',
 'export',
 'get_dataset',
 'get_era5_times',
 'make_filename',
 'raw_folder']

In [15]:
help(exporter.export)

Help on method export in module src.exporters.cds:

export(variable: str, dataset: Union[str, NoneType] = None, granularity: str = 'hourly', show_api_request: bool = True, selection_request: Union[Dict, NoneType] = None, break_up: bool = False, n_parallel_requests: int = 3) -> List[pathlib.Path] method of src.exporters.cds.ERA5Exporter instance
    Export functionality to prepare the API request and to send it to
    the cdsapi.client() object.
    
    Arguments:
    ---------
    variable: str
        The variable to be exported
    dataset: Optional[str], default = None
        The dataset from which to pull the variable from. If None, this
        is inferred from the dataset and its granularity
    granularity: str: {'hourly', 'monthly'}, default = 'hourly'
        The temporal resolution of the data to be pulled
    show_api_request: bool = True
        Whether to print the selection dictionary before making the API request
    selection_request: Optional[Dict], default = None
  

In [18]:
exporter.export(variable='total_precipitation', granularity='monthly', selection_request=dict(year=[2010], month=[1]))

------------------------
Dataset: reanalysis-era5-single-levels-monthly-means
Selection Request:
{'area': '6.002/33.501/-5.202/42.283',
 'format': 'netcdf',
 'month': ['01'],
 'product_type': 'monthly_averaged_reanalysis',
 'time': ['00:00',
          '01:00',
          '02:00',
          '03:00',
          '04:00',
          '05:00',
          '06:00',
          '07:00',
          '08:00',
          '09:00',
          '10:00',
          '11:00',
          '12:00',
          '13:00',
          '14:00',
          '15:00',
          '16:00',
          '17:00',
          '18:00',
          '19:00',
          '20:00',
          '21:00',
          '22:00',
          '23:00'],
 'variable': ['total_precipitation'],
 'year': ['2010']}
------------------------
Output Filename:
data/raw/reanalysis-era5-single-levels-monthly-means/total_precipitation/2010/01.nc
------------------------


[PosixPath('data/raw/reanalysis-era5-single-levels-monthly-means/total_precipitation/2010/01.nc')]
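
Note how the `year=[2010], month=[1]` passed to `export` appear in the printed request as `['2010']` and `['01']`, alongside defaults such as `area` and `format`. A rough sketch of that kind of merge is given below. The helper names `normalise` and `build_request` are hypothetical and the logic is an assumption about the behaviour, not the code in `src.exporters.cds`.

```python
from typing import Dict, Iterable, List


def normalise(key: str, values: Iterable) -> List[str]:
    # Assumption: months and days are zero-padded to two digits,
    # everything else is simply converted to a string.
    if key in {"month", "day"}:
        return [f"{int(value):02d}" for value in values]
    return [str(value) for value in values]


def build_request(defaults: Dict, user_selection: Dict) -> Dict:
    # Start from the defaults, then let the user's selection override them.
    request = dict(defaults)
    for key, values in user_selection.items():
        request[key] = normalise(key, values)
    return request


defaults = {
    "format": "netcdf",
    "product_type": "monthly_averaged_reanalysis",
    "variable": ["total_precipitation"],
    "area": "6.002/33.501/-5.202/42.283",
}

request = build_request(defaults, dict(year=[2010], month=[1]))
print(request["year"], request["month"])  # ['2010'] ['01']
```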

## Let's look at the `VHIExporter`

In [20]:
exporter = exporters.VHIExporter()

[method for method in dir(exporter) if '__' not in method]

['_run_export',
 'check_52_files',
 'check_failures',
 'chunks',
 'data_folder',
 'dataset',
 'export',
 'get_default_years',
 'get_filepaths_for_year',
 'get_ftp_filenames',
 'get_missing_filepaths',
 'output_folder',
 'raw_folder',
 'save_errors']

In [21]:
help(exporter.export)

Help on method export in module src.exporters.vhi:

export(years: Union[List, NoneType] = None, repeats: int = 5, num_processes: int = 100) -> List method of src.exporters.vhi.VHIExporter instance
    Export VHI data from the ftp server.
    By default write output to raw/vhi/{YEAR}/{filename}
    
    Arguments:
    ---------
    years : Optional[List] = None
        list of years that you want to download. If None, all years will
        be downloaded
    repeats: int = 5
        The number of times to retry downloads which failed
    num_processes: int = 100
        The number of processes to run. If 1, the download happens serially
    
    Returns:
    -------
    batches : List
        list of lists containing batches of filenames downloaded



In [None]:
exporter.export(years=[2015], num_processes=1)

Successful Download! data/raw/vhi/2015/VHP.G04.C07.npp.P2015001.VH.nc
Successful Download! data/raw/vhi/2015/VHP.G04.C07.npp.P2015002.VH.nc


TODO:
- Standardise the VHI API (`num_processes` -> `n_parallel_processes`)
- Allow the `.export` function to download only a single file or a few files