# Raster data processing
This notebook is designed to download and process raster-based environmental data into consistent gridded datasets for analysis in geospatial software and in MaxEnt species distribution modeling.

For each data type, we run the following steps:
- downloading from a remote server
- handling no-data values
- mosaicking and reprojecting imagery
- resampling to multiple grid sizes
- converting to MaxEnt's raster data format
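
MaxEnt reads environmental layers as ESRI ASCII grids (`.asc`). Each file is a short header (grid size, lower-left corner, cell size and a numeric no-data value) followed by one line of values per raster row, written from north to south. The minimal sketch below shows that layout for a NumPy array. The helper name `write_ascii_grid` and the example coordinates and no-data value (`-9999`) are hypothetical. On real rasters, GDAL's `AAIGrid` driver (for example via `gdal.Translate`) writes the same format.

```python
import io

import numpy as np


def write_ascii_grid(array, xll, yll, cellsize, nodata, fp):
    """Write a 2-D array to `fp` as an ESRI ASCII grid (hypothetical helper).

    `xll`/`yll` are the lower-left corner coordinates, `cellsize` is the square
    pixel size, and NaN cells are written as the numeric `nodata` value.
    """
    array = np.asarray(array, dtype=float)
    nrows, ncols = array.shape
    # MaxEnt expects a numeric no-data sentinel, so swap NaN for `nodata`
    filled = np.where(np.isnan(array), nodata, array)
    fp.write("ncols {}\n".format(ncols))
    fp.write("nrows {}\n".format(nrows))
    fp.write("xllcorner {}\n".format(xll))
    fp.write("yllcorner {}\n".format(yll))
    fp.write("cellsize {}\n".format(cellsize))
    fp.write("NODATA_value {}\n".format(nodata))
    # rows go top (north) to bottom (south); values keep up to 10 significant digits
    for row in filled:
        fp.write(" ".join("{:.10g}".format(value) for value in row) + "\n")


# hypothetical 2x2 grid with one missing cell
buffer = io.StringIO()
write_ascii_grid([[1.0, np.nan], [3.5, 4.0]], xll=-90.0, yll=10.0,
                 cellsize=0.5, nodata=-9999, fp=buffer)
print(buffer.getvalue())
```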

The raw data are hosted in a publicly accessible cloud bucket, `gs://aedes-americas/`, and some of the data processed in this notebook will be uploaded back to it. Many of the raw data were generated in Google Earth Engine; the scripts used to generate them are in the directory `aedes-americas/data-processing/earthengine/`.

To run this notebook, first navigate to the root directory of the `aedes-americas` repository and activate the conda environment.

```bash
cd /path/to/aedes-americas
conda activate aedes
jupyter notebook
```

## 0.1 Loading modules and configs

In [1]:
# set to allow module re-loading during development
%load_ext autoreload
%autoreload 2

In [16]:
# import packages
import os
from osgeo import gdal
import glob
import yaml
import numpy as np
import otbApplication as otb

# raise gdal runtime errors
gdal.UseExceptions()

# set up orfeo application calls
bandMath = otb.Registry.CreateApplication("BandMath")
manageNoData = otb.Registry.CreateApplication("ManageNoData")

# set the otb creation options to pass to output tif files
creation_options = "&gdal:co:COMPRESS=DEFLATE&gdal:co:TILED=YES"
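
OTB passes writer options through "extended filenames": the output path is followed by `?` and a list of `&key=value` pairs, and `gdal:co:<OPTION>=<value>` forwards a GDAL creation option. GDAL itself does not parse this suffix, so any later `gdal.Open` call should receive the plain path. A minimal sketch of building and stripping such names; the helpers `otb_output_path` and `plain_path` are hypothetical.

```python
def otb_output_path(path, creation_options):
    """Build an OTB extended filename, e.g. 'out.tif?&gdal:co:COMPRESS=DEFLATE'.

    `creation_options` maps GDAL creation-option names to values (hypothetical helper).
    """
    suffix = "".join(
        "&gdal:co:{}={}".format(key, value) for key, value in creation_options.items()
    )
    return "{}?{}".format(path, suffix) if suffix else path


def plain_path(extended_filename):
    """Drop any OTB extended-filename options, leaving a path GDAL can open."""
    return extended_filename.split("?", 1)[0]


options = {"COMPRESS": "DEFLATE", "TILED": "YES"}
out = otb_output_path("/tmp/example-nd.tif", options)
print(out)              # /tmp/example-nd.tif?&gdal:co:COMPRESS=DEFLATE&gdal:co:TILED=YES
print(plain_path(out))  # /tmp/example-nd.tif
```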

In [11]:
# read the config into memory (read-only; safe_load avoids constructing arbitrary objects)
with open('../config.yml', 'r') as f:
    config = yaml.safe_load(f)
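
This notebook reads four keys from `config.yml`: `data-dir`, `data-mx-in`, `data-mx-out` and `nodata`. A quick check that they are all present can save a failed run later on. The sketch below is a hypothetical helper (`check_config`). The example directory paths match the output of the next cell, while the `nodata` value of `-9999` is only an assumed placeholder.

```python
REQUIRED_KEYS = ("data-dir", "data-mx-in", "data-mx-out", "nodata")


def check_config(config):
    """Raise KeyError naming any keys this notebook reads that are missing."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.yml is missing: {}".format(", ".join(missing)))
    return True


# hypothetical example; the real values live in config.yml
example = {
    "data-dir": "/external/aedes-americas/data-raw/",
    "data-mx-in": "/external/aedes-americas/data-mx-in/",
    "data-mx-out": "/external/aedes-americas/data-mx-out/",
    "nodata": -9999,
}
check_config(example)
```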

In [12]:
# create the data directories if they don't yet exist
for directory in [config['data-dir'], config['data-mx-in'], config['data-mx-out']]:
    if not os.path.exists(directory):
        print('Creating data directory: {}'.format(directory))
        os.makedirs(directory)  # also creates any missing parent directories

Creating data directory: /external/aedes-americas/data-raw/
Creating data directory: /external/aedes-americas/data-mx-in/
Creating data directory: /external/aedes-americas/data-mx-out/


### 0.2 Helper functions for common tasks

In [None]:
# set the no-data value on every band of a raster, in place
def update_nodata(path, nodata=config['nodata']):
    
    # open the file in update mode and get the band count
    ref = gdal.Open(path, gdal.GA_Update)
    n_bands = ref.RasterCount
    
    # replace no-data values for each band
    for band_number in range(n_bands):
        band = ref.GetRasterBand(band_number + 1)
        band.SetNoDataValue(nodata)
        band.FlushCache()
    
    # write these changes to disk
    ref.FlushCache()
    ref = None
    return True

## 1.1 Population density data
The human population density data were produced by WorldPop (https://www.worldpop.org/) and accessed via Google Earth Engine.

### 1.2 Downloading data
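
The raw data live in `gs://aedes-americas/`. One way to copy them locally is `gsutil -m cp`, which expands `*` wildcards in cloud URLs itself and copies files in parallel with `-m`. The sketch below assumes the population tiles sit at the bucket root and match `LACR-pop-density-*.tif`; that object layout and the helpers `gsutil_copy_command` and `download` are hypothetical. By default it only prints the command.

```python
import shutil
import subprocess


def gsutil_copy_command(bucket, pattern, destination):
    """Build a `gsutil -m cp` command copying matching objects into `destination`."""
    return ["gsutil", "-m", "cp", "gs://{}/{}".format(bucket, pattern), destination]


def download(bucket, pattern, destination, dry_run=True):
    """Run the copy, or just print it when dry_run is set or gsutil is unavailable."""
    command = gsutil_copy_command(bucket, pattern, destination)
    if dry_run or shutil.which("gsutil") is None:
        print(" ".join(command))
        return command
    # no shell is involved, so gsutil (not bash) expands the wildcard
    subprocess.run(command, check=True)
    return command


# hypothetical object pattern; destination is config['data-dir'] in this notebook
download("aedes-americas", "LACR-pop-density-*.tif", "/external/aedes-americas/data-raw/")
```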

### 1.3 Handling no-data values

In [15]:
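# get the list of raw file paths, skipping any "-nd.tif" outputs from a previous run
raw_paths = glob.glob(os.path.join(config['data-dir'], 'LACR-pop-density-*.tif'))
raw_paths = [path for path in raw_paths if not path.endswith('-nd.tif')]
raw_paths.sort()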
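
The raw files are Earth Engine export tiles named like `LACR-pop-density-2015-100m-0000098304-0000065536.tif`. The two 10-digit suffixes appear to be the pixel offsets of each tile within the full export; treat that reading as an assumption. A strict filename pattern makes the year and resolution explicit and skips derived `-nd.tif` files automatically. The helper `parse_tile_name` is a hypothetical sketch.

```python
import os
import re

# year, resolution and two 10-digit tile offsets; derived "-nd.tif" files do not match
TILE_PATTERN = re.compile(
    r"LACR-pop-density-(?P<year>\d{4})-(?P<res>\d+m)-(?P<a>\d{10})-(?P<b>\d{10})\.tif"
)


def parse_tile_name(path):
    """Return year, resolution and offsets for a raw tile, or None if it does not match."""
    match = TILE_PATTERN.fullmatch(os.path.basename(path))
    if match is None:
        return None
    return {
        "year": int(match.group("year")),
        "resolution": match.group("res"),
        "offsets": (int(match.group("a")), int(match.group("b"))),
    }


names = [
    "/external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000098304-0000065536.tif",
    "/external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000065536-0000065536.tif",
    "/external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000065536-0000065536-nd.tif",
]
# keep only raw tiles and order them numerically by their offsets
tiles = [parse_tile_name(name) for name in names]
tiles = sorted((t for t in tiles if t is not None), key=lambda t: t["offsets"])
print(tiles)
```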

In [23]:
# replace all the `nan` no-data values with numeric no-data values
for input_path in raw_paths:
    print('Replacing no-data for: {}'.format(input_path))
    
    # set the output file name; OTB receives it as an extended filename carrying the creation options
    nd_path = input_path[:-4] + "-nd.tif"
    output_path = '{file}?{options}'.format(file=nd_path, options=creation_options)
    
    # build the otb command
    manageNoData.SetParameterString("in", input_path)
    manageNoData.SetParameterString("out", output_path)
    manageNoData.SetParameterString("mode", "changevalue")
    manageNoData.SetParameterValue("mode.changevalue.newv", config['nodata'])
    manageNoData.SetParameterValue("usenan", True)
    
    # run it
    manageNoData.ExecuteAndWriteOutput()
    
    # reset the application parameters before processing the next file
    manageNoData.ClearValue("in")
    manageNoData.ClearValue("out")
    manageNoData.ClearValue("mode")
    manageNoData.ClearValue("mode.changevalue.newv")
    manageNoData.ClearValue("usenan")
    
    # then make sure the nodata value is set in the file metadata
    # (gdal needs the plain path, not the OTB extended filename)
    update_nodata(nd_path, config['nodata'])

Replacing no-data for: /external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000098304-0000065536.tif
2020-05-31 17:39:12 (INFO) ManageNoData: Default RAM limit for OTB is 256 MB
2020-05-31 17:39:12 (INFO) ManageNoData: GDAL maximum cache size is 801 MB
2020-05-31 17:39:12 (INFO) ManageNoData: OTB will use at most 8 threads
2020-05-31 17:39:13 (INFO): Estimated memory for full processing: 426.533MB (avail.: 256 MB), optimal image partitioning: 2 blocks
2020-05-31 17:39:13 (INFO): File /external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000098304-0000065536-nd.tif will be written in 3 blocks of 10752x1168 pixels
Writing /external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000098304-0000065536-nd.tif?&gdal:co:COMPRESS=DEFLATE&gdal:co:TILED=YES...: 100% [**************************************************] (2s)
Replacing no-data for: /external/aedes-americas/data-raw/LACR-pop-density-2015-100m-0000065536-0000065536.tif
2020-05-31 17:39:15 (INFO): Estimated memory
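
In `changevalue` mode with `usenan` enabled, `ManageNoData` treats NaN pixels as no-data and writes them out with the new numeric value. A numeric sentinel suits downstream tools such as MaxEnt's ASCII grids, whose header declares a single `NODATA_value`. The NumPy sketch below approximates that per-pixel behaviour; it is not OTB's implementation. The helper `replace_nan_nodata` and the value `-9999` are hypothetical.

```python
import numpy as np


def replace_nan_nodata(array, nodata, old_nodata=None):
    """Return (copy with NaN and optional `old_nodata` cells set to `nodata`, count replaced)."""
    array = np.asarray(array, dtype=float)
    mask = np.isnan(array)
    if old_nodata is not None:
        # also treat a previous numeric no-data value as missing
        mask |= array == old_nodata
    return np.where(mask, nodata, array), int(mask.sum())


band = np.array([[0.5, np.nan], [np.nan, 12.0]])
filled, n_replaced = replace_nan_nodata(band, nodata=-9999)
print(filled.tolist(), n_replaced)
# [[0.5, -9999.0], [-9999.0, 12.0]] 2
```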

### 1.4 Mosaicking and reprojecting imagery
First we'll build a virtual raster (VRT) mosaic of the tiles, then run `gdalwarp` to reproject the imagery.
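
The same two steps are available from Python as `gdal.BuildVRT` and `gdal.Warp`. The sketch below is a minimal version under stated assumptions. The target CRS, resolution and the `average` resampling method (a reasonable choice when coarsening a density surface) are placeholders rather than this project's settings. The helpers `warp_kwargs` and `mosaic_and_reproject` are hypothetical. GDAL is imported inside the function so the option helper can be used without it.

```python
def warp_kwargs(dst_srs, resolution, nodata, resample="average"):
    """Collect keyword arguments accepted by gdal.Warp (via gdal.WarpOptions)."""
    return {
        "dstSRS": dst_srs,
        "xRes": resolution,
        "yRes": resolution,
        "targetAlignedPixels": True,  # snap the output grid to multiples of the resolution
        "resampleAlg": resample,
        "srcNodata": nodata,
        "dstNodata": nodata,
        "creationOptions": ["COMPRESS=DEFLATE", "TILED=YES"],
    }


def mosaic_and_reproject(paths, vrt_path, output_path, **kwargs):
    """Mosaic `paths` into a VRT, then reproject/resample it with gdal.Warp."""
    from osgeo import gdal  # imported here so warp_kwargs works without GDAL installed

    gdal.UseExceptions()
    vrt = gdal.BuildVRT(vrt_path, paths,
                        srcNodata=kwargs["srcNodata"], VRTNodata=kwargs["dstNodata"])
    vrt = None  # close the dataset so the VRT is flushed to disk
    gdal.Warp(output_path, vrt_path, **kwargs)
    return output_path


# hypothetical settings; one call per target grid size gives the multi-resolution outputs
options = warp_kwargs("EPSG:4326", 0.01, -9999)
```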

In [None]:
# get the list of tiles with updated no-data values
raw_paths = glob.glob(os.path.join(config['data-dir'], 'LACR-pop-density-*-nd.tif'))
raw_paths.sort()