# Data Stack Preparation (data-prep)

This tutorial walks you through the workflow of the [VegMapper](https://github.com/NaiaraSPinto/VegMapper) repo. By the end of this tutorial, you will have created multi-band GeoTIFFs that can be used to identify and classify specific agroforestry systems, such as palm-oil plantations.

## 1) Get credentials ##
This repo uses several third-party services that require credentials, which can be obtained from the links below. Note that account approval may take several days in some cases.

1) [NASA Earthdata](https://urs.earthdata.nasa.gov/users/new)

2) [JAXA](https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/registration.htm)

3) [AWS S3/EC2](https://portal.aws.amazon.com/billing/signup#/start) or [Google Cloud Storage/GCS](https://cloud.google.com/storage/), if you want to use cloud storage

4) [Google Earth Engine](https://earthengine.google.com/)

One-time authentication for Google Earth Engine:

```python
import ee
ee.Authenticate()
```

```
Enter verification code: xxxxxx
Successfully saved authorization token.
```


## 2) Set up data-prep conda environment ##

To create the **data-prep** environment and install required packages, run these commands in your terminal:

```
% cd data-prep
% conda env create -f data-prep-env.yml
% conda activate data-prep
```

Run **setup.py** to verify the environment and the permissions of the scripts:

```
%run setup.py
```

```
ALOS-2/alos2_download_mosaic.py
ALOS-2/alos2_proc.py
Landsat/gee_export_landsat_ndvi.py
MODIS/gee_export_modis_tc.py
Sentinel/s1_build_vrt.py
Sentinel/s1_metadata_summary.py
Sentinel/s1_proc.py
Sentinel/s1_remove_edges.py
Sentinel/s1_submit_hyp3_jobs.py
Stacks/build_stacks.py
Stacks/build_condensed_stacks.py
Utils/calc_vrt_stats.py
Utils/prep_tiles.py
Utils/remove_edges.py
```
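If a script in this list later fails with a "permission denied" error, a quick way to check and fix execute permissions is sketched below. This is a hypothetical stand-in written with the standard library, not the code inside setup.py; the helper names `non_executable` and `make_executable` and the shortened `SCRIPTS` list are illustrative assumptions.

```python
import os
import stat
from pathlib import Path

# Hypothetical helpers; this is not the code in setup.py.
SCRIPTS = [
    "ALOS-2/alos2_download_mosaic.py",
    "Sentinel/s1_proc.py",
    "Utils/prep_tiles.py",
]  # extend with the rest of the list printed by setup.py


def non_executable(root, rel_paths):
    """Return the scripts under `root` that are missing or lack execute permission."""
    return [rel for rel in rel_paths
            if not (Path(root, rel).is_file() and os.access(Path(root, rel), os.X_OK))]


def make_executable(path):
    """Add the user/group/other execute bits while keeping the other mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Run from the data-prep directory.
    for rel in non_executable(".", SCRIPTS):
        print("not executable or missing:", rel)
```

On macOS and Linux this is equivalent to running `chmod +x` on each reported script.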


## 3) Set up project directory ##

This repo uses a single, consistent project directory, referred to from here on as proj_dir. You can name proj_dir however you like; all subfolders and completed tiles are generated automatically. The completed stacks, as well as any intermediate products, are stored in proj_dir. AWS S3, GCS, and local storage are currently supported. Depending on your storage system, some extra setup may be required:

* To set up AWS Command Line Interface (CLI) configurations and credentials (required if your proj_dir is S3):

    ```
    (data-prep) % aws configure
    ```

    where you will be asked to enter your **aws_access_key_id** and **aws_secret_access_key**.


* To set up Google Cloud gsutil tool (required if your proj_dir is GCS):

    ```
    (data-prep) % gsutil config
    ```

    Then you will be prompted to sign in using your Google credentials. 
    
    
* If using local storage, create a new folder at the desired location in your filesystem. Local storage is not generally recommended: the generated stacks are large files, and some steps may run faster in the cloud.

## 4) Prepare UTM tiles for Area of Interest (AOI) ##

To create the stacks, a universal tiling system is required to ensure all data sources are aligned to the same grid. In the following section, we will generate a geoJSON file that contains the tiles to be used by all of the data processing scripts.
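Tiles defined in UTM coordinates need a UTM zone. A common way to choose one is the standard 6-degree zone formula, sketched below. Whether prep_tiles.py selects its projection this way is an assumption; the function `utm_epsg` is hypothetical and returns the EPSG code of the WGS 84 / UTM zone that contains a given point.

```python
import math


def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat), in degrees.

    Standard 6-degree zones; the Norway/Svalbard exceptions are ignored.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    return (32600 if lat >= 0 else 32700) + zone


# A point near Pucallpa, the capital of Ucayali (approx. 8.38 S, 74.55 W)
print(utm_epsg(-74.55, -8.38))  # 32718 (WGS 84 / UTM zone 18S)
```

Northern-hemisphere zones use EPSG 326xx and southern-hemisphere zones use 327xx, which is why the latitude sign matters.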

### prep_tiles.py ###

prep_tiles.py creates the tile GeoJSON. It takes three required arguments: aoi_name, aoi_shp, and tile_size. aoi_name is used to name the output GeoJSON. aoi_shp should point to a shapefile or GeoJSON of the area of interest; GeoJSON files for many subnational administrative boundaries can be found here: (insert link). tile_size is the desired size of each tile, in meters.

### Usage ###

```
(data-prep) % python prep_tiles.py [-h] aoi_name aoi_shp tile_size
```

Suppose our area of interest is the Ucayali region of Peru, highlighted in blue:

![ucayali_region](img/ucayali_region.png)

Create tiles for our AOI with a tile size of 150x150 km:

```
%run Utils/prep_tiles.py AOI/ucayali/ucayali AOI/ucayali/ucayali_boundary.geojson 150000
```

```
Tiles for Aoi/ucayali/ucayali: aoi/ucayali/ucayali_tiles.geojson
14 out of 20 tiles intersecting Aoi/ucayali/ucayali
```


Output tiles:

![ucayali_tiles](img/ucayali_tiles.png)

Note that some tiles (6 of the 20 in this example) do not intersect the region. These are masked out and are not used for the final stacks.
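The tiling and masking idea can be sketched with the standard library: cover the AOI's bounding box with square tiles snapped to multiples of tile_size, then keep only the tiles that overlap the AOI polygon. This is an illustration under stated assumptions, not prep_tiles.py's implementation: the AOI is a single polygon without holes that has already been reprojected to UTM meters, the toy triangle below is made up, and tiles that only touch the AOI boundary may be classified either way. In practice a geometry library such as shapely (`box(...).intersects(polygon)`) would do the overlap test.

```python
import math


def make_tiles(xmin, ymin, xmax, ymax, tile_size):
    """Square tiles (xmin, ymin, xmax, ymax), snapped to multiples of tile_size,
    covering a bounding box given in projected (UTM) meters."""
    tiles = []
    y = math.floor(ymin / tile_size) * tile_size
    while y < ymax:
        x = math.floor(xmin / tile_size) * tile_size
        while x < xmax:
            tiles.append((x, y, x + tile_size, y + tile_size))
            x += tile_size
        y += tile_size
    return tiles


def point_in_polygon(px, py, poly):
    """Ray-casting test; poly is a list of (x, y) vertices without a closing repeat."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if the two segments cross at a single interior point (touching is ignored)."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def tile_intersects(tile, poly):
    """Does the tile overlap the polygon? Boundary-only touches may go either way."""
    xmin, ymin, xmax, ymax = tile
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    if any(xmin < x < xmax and ymin < y < ymax for x, y in poly):
        return True  # a polygon vertex lies inside the tile
    if any(point_in_polygon(cx, cy, poly) for cx, cy in corners):
        return True  # a tile corner lies inside the polygon
    return any(segments_cross(poly[i], poly[(i + 1) % len(poly)],
                              corners[j], corners[(j + 1) % 4])
               for i in range(len(poly)) for j in range(4))


aoi = [(0, 0), (300000, 0), (0, 300000)]  # toy AOI, already in UTM meters
tiles = make_tiles(0, 0, 300000, 300000, tile_size=100000)
kept = [t for t in tiles if tile_intersects(t, aoi)]
print(f"{len(kept)} out of {len(tiles)} tiles intersecting the AOI")  # 6 out of 9
```

Snapping the grid origin to multiples of tile_size is what keeps every data source aligned on the same tiles, regardless of the exact AOI extent.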

## 5) Prepare Sentinel-1 Tiles ##

The first piece of the data stack is Sentinel-1 tiles. In the following section, we search for granules within our AOI, process them using the ASF HyP3 API, and calculate statistics for the granules, saving the results as .tif files either locally or in the cloud.

## Search for Sentinel-1 granules on [ASF Vertex](https://search.asf.alaska.edu/#/) ##

1. Sign in using your Earthdata credentials. If you haven't used ASF Vertex before, you will need to agree to their terms in order to use HyP3 processing.

2. Use the following "Additional Filters" when searching for granules within your AOI:

    * File Type: L1 Detected High-Res Dual-Pol (GRD-HD)
    * Beam Mode: IW
    * Polarization: VV+VH

    ![vertex_search_filters](img/vertex_search_filters.png)

3. Add the selected granules into the download queue:

    ![vertex_add_queue](img/vertex_add_queue.png)

4. Download metadata files. Download at least one csv or geojson file, which will be used for submitting HyP3 jobs.

    ![vertex_download_metadata](img/vertex_download_metadata.png)

5. Clear the selected granules from the download queue. Do not download these GRD-HD products; instead, we will submit HyP3 jobs to apply radiometric terrain correction (RTC) to them.

## Submit HyP3 RTC jobs ##

For the initial processing of the Sentinel-1 granules, we make use of ASF's HyP3 API. Information about the specifics of this processing can be found in the [HyP3 documentation](https://hyp3-docs.asf.alaska.edu/).

### s1_submit_hyp3_jobs.py ###

s1_submit_hyp3_jobs.py submits the granules chosen in the previous step to the HyP3 API for processing. The results can be saved locally or uploaded to cloud storage, such as AWS S3 or Google Cloud Storage (GCS). It takes two arguments: proj_dir and csv/geojson. proj_dir should point to the location the results will be copied to. Supported proj_dirs are:
* AWS S3: s3://your_bucket/some_prefix
* GCS: gs://your_bucket/some_prefix
* Local storage: your_local_path

The processed granules will be saved in the following directory structure:
```
        proj_dir
           └──sentinel_1
               └──year
                   └──path_frame
                       └──processed_granules
```
csv/geojson should point to the metadata file downloaded from Vertex in the previous step, which lists the granules to be submitted for processing.
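The three kinds of proj_dir and the directory layout above can be captured in a couple of small helpers. The sketch below is hypothetical (`split_proj_dir` and `granule_dir` are not functions from the repo) and only shows how such paths are commonly parsed and joined with the standard library. Note how a malformed prefix such as `s3//bucket` (missing the colon) is silently treated as a local path.

```python
import posixpath
from urllib.parse import urlparse


def split_proj_dir(proj_dir):
    """Return (storage, bucket, prefix), where storage is 's3', 'gs' or 'local'."""
    parsed = urlparse(proj_dir)
    if parsed.scheme in ("s3", "gs"):
        return parsed.scheme, parsed.netloc, parsed.path.strip("/")
    return "local", "", proj_dir


def granule_dir(proj_dir, year, path_frame):
    """proj_dir/sentinel_1/<year>/<path_frame>, following the layout shown above."""
    root = proj_dir.rstrip("/") or "/"
    return posixpath.join(root, "sentinel_1", str(year), path_frame)


print(split_proj_dir("s3://your_bucket/some_prefix"))  # ('s3', 'your_bucket', 'some_prefix')
print(split_proj_dir("s3//your_bucket/some_prefix"))   # ('local', '', 's3//your_bucket/some_prefix')
print(granule_dir(".", 2017, "171_622"))              # ./sentinel_1/2017/171_622
```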

### Notes ###

* Since ASF HyP3 stores the processed granules in their AWS S3 buckets, the data transfer will be much faster if you set up your S3 bucket to host these data. That is, using **s3://your_bucket/some_prefix** for the proj_dir option.


    
### Usage ###

```
(data-prep) % python s1_submit_hyp3_jobs.py [-h] proj_dir csv/geojson
```
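Before submitting, it can help to preview how many granules fall into each year/path/frame group, which is the summary the script prints before asking for confirmation. The sketch below reads a Vertex metadata CSV with the standard library. The column names `Granule Name`, `Path Number` and `Frame Number` are assumptions about the Vertex export (check the header of your file), and the granule names in `sample` are made up.

```python
import csv
import io
import re
from collections import Counter

START_TIME = re.compile(r"(\d{8})T\d{6}")  # e.g. 20170105T104510


def count_by_path_frame(csv_text):
    """Count granules per '<year>_<path>_<frame>' key in a Vertex metadata CSV.

    Assumes 'Granule Name', 'Path Number' and 'Frame Number' columns.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        date = START_TIME.search(row["Granule Name"]).group(1)
        path = int(float(row["Path Number"]))    # tolerate '171' or '171.0'
        frame = int(float(row["Frame Number"]))
        counts[f"{date[:4]}_{path}_{frame}"] += 1
    return dict(sorted(counts.items()))


sample = """Granule Name,Path Number,Frame Number
S1A_IW_GRDH_1SDV_20170105T104510_20170105T104535_014693_017F36_9C2B,171,617
S1A_IW_GRDH_1SDV_20170117T104510_20170117T104535_014868_018478_1A2B,171,617
S1B_IW_GRDH_1SDV_20170111T104510_20170111T104535_003797_006838_3C4D,25,621
"""
for key, n in count_by_path_frame(sample).items():
    print(f"{key} - {n} granules")
```

Comparing this count with your remaining HyP3 quota before answering `Y` avoids a partially submitted batch.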

Submit the Sentinel-1 granules from the Ucayali region for the year 2017 for processing, using the current directory (`.`) as proj_dir:

```
%run Sentinel/s1_submit_hyp3_jobs.py . Sentinel/granules/ucayali/ucayali_sentinel_granules_2017.geojson
```

```
Enter Earthdata Username: xxxxxx
Enter Earthdata Password: ········

Your remaining quota for HyP3 jobs: 250 granules.

You will be submitting the following granules for HyP3 RTC processing:
    2017_171_617 - 18 granules
    2017_171_622 - 18 granules
    2017_25_621 - 20 granules
    2017_25_626 - 16 granules
    2017_25_631 - 16 granules
    2017_98_617 - 28 granules
    2017_98_622 - 28 granules
    2017_98_627 - 28 granules

Enter 'Y' to confirm you would like to submit these granules, or 'N' if you have already submitted the granules and want to copy the processed granules to your proj_dir: N

2017_171_617: downloading processed granules to ./sentinel_1
2017_171_617: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...

Traceback (most recent call last):
  File "/Users/Remy/OneDrive - Cal Poly/HDD/cal poly/classes/deep gis/palm oil/VegMapper/data-prep/Sentinel/s1_submit_hyp3_jobs.py", line 188, in main
    download_granules(proj_dir, year, path_frame, granule_sources)
  File "/Users/Remy/OneDrive - Cal Poly/HDD/cal poly/classes/deep gis/palm oil/VegMapper/data-prep/Sentinel/s1_submit_hyp3_jobs.py", line 109, in download_granules
    raise Exception('\nJobs already expired and cannot be copied.')
Exception: 
Jobs already expired and cannot be copied.

2017_171_622: downloading processed granules to ./sentinel_1
2017_171_622: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...
2017_25_621: downloading processed granules to ./sentinel_1
2017_25_621: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...
2017_25_626: downloading processed granules to ./sentinel_1
2017_25_626: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...
2017_25_631: downloading processed granules to ./sentinel_1
2017_25_631: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...
2017_98_617: downloading processed granules to ./sentinel_1
2017_98_617: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...
2017_98_622: downloading processed granules to ./sentinel_1
2017_98_622: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...
2017_98_627: downloading processed granules to ./sentinel_1
2017_98_627: There was an error when downloaing processed granules from ASF S3 bucket to .. Continuing to the next granule ...

Done with everything.
```

In this run, N was entered because the granules had been submitted earlier, so the script only tried to copy the processed products to proj_dir. The HyP3 jobs had already expired, so every path_frame failed with the same `Jobs already expired and cannot be copied.` traceback (shown once above). If your jobs have expired, run the script again and enter Y to resubmit them.

## Sentinel-1 Processing ##

The final processing step involves calculating the temporal mean for the Sentinel-1 granules and removing left/right (cross-track) edge pixels where border noise is prominent. 
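Both operations can be illustrated on small in-memory rasters. The sketch below is a generic illustration, not the code in s1_build_vrt.py, calc_vrt_stats.py, or remove_edges.py: it assumes NaN marks missing data, trims a hypothetical number of pixels `n` from both ends of each row's valid-data run, and averages the values as given (whether the scripts average in linear power or in dB, and in which order they apply the two steps, is not stated here).

```python
import math


def temporal_mean(stack):
    """Per-pixel mean over time of a list of 2-D images (lists of rows).

    NaN marks missing data; pixels with no valid observation stay NaN.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    out = [[math.nan] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [img[r][c] for img in stack if not math.isnan(img[r][c])]
            if vals:
                out[r][c] = sum(vals) / len(vals)
    return out


def trim_row_edges(image, n):
    """Blank the first and last n valid pixels of every row (left/right edges).

    Works on the valid-data run of each row, so it also handles geocoded
    scenes whose swath edges do not line up with the raster columns.
    """
    out = [row[:] for row in image]
    for row in out:
        valid = [i for i, v in enumerate(row) if not math.isnan(v)]
        if not valid:
            continue
        first, last = valid[0], valid[-1]
        for i in range(first, min(first + n, last + 1)):
            row[i] = math.nan
        for i in range(max(last - n + 1, first), last + 1):
            row[i] = math.nan
    return out


nan = math.nan
granules = [                                   # one-row images for brevity
    [[nan, 0.10, 0.20, 0.30, 0.90, nan]],
    [[0.80, 0.12, 0.22, 0.28, 0.10, nan]],
]
trimmed = [trim_row_edges(g, 1) for g in granules]
print(temporal_mean(trimmed))
```

Trimming each acquisition before averaging keeps the noisy swath borders of one date from contaminating pixels that other dates observe cleanly.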

### s1_proc.py ###

s1_proc.py handles this final processing step. It relies on the helper scripts **s1_build_vrt.py** and **calc_vrt_stats.py** for the temporal mean, and **remove_edges.py** for removing edge pixels (see the setup.py listing above for their locations). It takes three optional arguments (path_frame, m1, and m2) and two required arguments (srcpath and year):

* path_frame (`--pf`): if specified, only granules matching path_frame are processed; otherwise, all path_frames under srcpath/sentinel_1/year are processed.
* m1 and m2 (`--m1`, `--m2`): if specified, only granules with an acquisition month between m1 and m2 (inclusive) are processed.
* srcpath: the proj_dir where the processed granules are stored (see the previous section for valid paths).
* year: the year of the granules to be processed.
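The month filter can be sketched by reading the acquisition timestamp out of each granule or product name. This is an assumption-laden illustration, not s1_proc.py's code: it takes the first `YYYYMMDDThhmmss` timestamp in the name, the names below are made up (the second only mimics the style of a HyP3 RTC product name), and the defaults `m1=1`, `m2=12` mirror the `--m1 1 --m2 12` that appears in the example run's error message when neither option is given.

```python
import re

START_DATE = re.compile(r"(\d{4})(\d{2})(\d{2})T\d{6}")


def acquisition_month(name):
    """Month of the first YYYYMMDDThhmmss timestamp in a granule/product name."""
    match = START_DATE.search(name)
    if match is None:
        raise ValueError(f"no acquisition time found in {name!r}")
    return int(match.group(2))


def select_months(names, m1=1, m2=12):
    """Keep names whose acquisition month satisfies m1 <= month <= m2."""
    return [n for n in names if m1 <= acquisition_month(n) <= m2]


names = [
    "S1A_IW_GRDH_1SDV_20170105T104510_20170105T104535_014693_017F36_9C2B",
    "S1B_IW_20170714T104510_DVP_RTC30_G_gpuned_1A2B.zip",
]
print(select_months(names, m1=6, m2=8))  # only the July product
```

Restricting months this way is useful, for example, to build a dry-season-only mean.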

### Notes ###

* The processing will be slow if srcpath is on AWS S3 or GCS because it requires heavy network I/O between the cloud and your local machine. If srcpath is on AWS S3, it is strongly recommended that you run the processing on AWS EC2.

### Usage ###

```
(data-prep) % python s1_proc.py [-h] [--pf path_frame] [--m1 m1] [--m2 m2] srcpath year
```

Process all granules stored locally for the year 2017:

```
%run Sentinel/s1_proc.py . 2017
```

```
Processing Sentinel-1 RTC data in ./sentinel_1/2017 ...

Processing 2017_171_622 ...

ls: sentinel_1/2017/171_622/*.zip: No such file or directory
Traceback (most recent call last):
  File "/Users/Remy/opt/anaconda3/envs/data-prep/bin/s1_build_vrt.py", line 115, in <module>
    main()
  File "/Users/Remy/opt/anaconda3/envs/data-prep/bin/s1_build_vrt.py", line 70, in main
    zip_list = subprocess.check_output(ls_cmd, shell=True).decode(sys.stdout.encoding).splitlines()
  File "/Users/Remy/opt/anaconda3/envs/data-prep/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/Users/Remy/opt/anaconda3/envs/data-prep/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ls sentinel_1/2017/171_622/*.zip' returned non-zero exit status 1.

CalledProcessError: Command 's1_build_vrt.py ./sentinel_1 2017_171_622 VV --m1 1 --m2 12' returned non-zero exit status 1.
```

This run fails because no processed granules (.zip files) were found in ./sentinel_1/2017/171_622, so s1_build_vrt.py has nothing to build a VRT from. Make sure the HyP3 products have been copied to proj_dir before running s1_proc.py.
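The failure above comes from `ls .../*.zip` finding nothing. A pre-flight check like the hypothetical one below (not part of the repo, and for a local proj_dir only) lists the path_frame folders that contain no .zip products, so missing downloads show up before s1_proc.py starts.

```python
from pathlib import Path


def path_frames_without_zips(proj_dir, year):
    """List path_frame folders under proj_dir/sentinel_1/<year> that hold no .zip files."""
    year_dir = Path(proj_dir) / "sentinel_1" / str(year)
    if not year_dir.is_dir():
        raise FileNotFoundError(f"{year_dir} does not exist; copy the HyP3 products first")
    return sorted(d.name for d in year_dir.iterdir()
                  if d.is_dir() and not any(d.glob("*.zip")))
```

For the example above, you would call `path_frames_without_zips(".", 2017)` and expect an empty list before running s1_proc.py.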

## 6) Prepare ALOS-2 tiles ##

## Download ALOS/ALOS-2 Mosaic ##

### alos2_download_mosaic.py ###

### Usage ###

```
(data-prep) % alos2_download_mosaic.py [-h] aoi year dst
```

### Description ###

Download ALOS/ALOS-2 Mosaic data from the JAXA website.

### Positional Arguments ###

  **aoi**

      shp/geojson of area of interest (AOI)

  **year**

      Year

  **dst**

      Destination location (s3:// or gs:// or local paths). Downloaded data will be stored under dst/year/tarfiles/

### Notes ###

* Downloading ALOS/ALOS-2 Mosaic data requires a JAXA account, which can be registered from: https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/registration.htm.

```
# sample run here
```
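To see which mosaic files an AOI needs, one can enumerate the degree-grid cells that overlap the AOI's bounding box. The sketch below assumes, based on our understanding of JAXA's distribution, that the mosaic comes as 1° × 1° tiles named by their north-west corner (for example `S08W075`) and packaged in 5° × 5° archives; verify this against the actual file names on the JAXA site. It is not how alos2_download_mosaic.py is implemented, and the bounding box is made up.

```python
import math


def cell_name(lat, lon):
    """Name a grid cell by its north-west corner, e.g. (-8, -75) -> 'S08W075'."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{ns}{abs(lat):02d}{ew}{abs(lon):03d}"


def cells_for_bbox(west, south, east, north, step=1):
    """Names of the step-degree cells that overlap a lon/lat bounding box."""
    top_max = math.ceil(north / step) * step            # northernmost cell's top edge
    top_min = math.floor(south / step) * step + step    # southernmost cell's top edge
    left_min = math.floor(west / step) * step           # westernmost cell's left edge
    left_max = math.ceil(east / step) * step - step     # easternmost cell's left edge
    return [cell_name(top, left)
            for top in range(top_max, top_min - step, -step)
            for left in range(left_min, left_max + step, step)]


print(cells_for_bbox(-74.6, -8.9, -73.2, -8.1))          # ['S08W075', 'S08W074']
print(cells_for_bbox(-74.6, -8.9, -73.2, -8.1, step=5))  # ['S05W075']
```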

## ALOS-2 Processing ##

### alos2_proc.py ###

### Usage ###

```
(data-prep) % alos2_proc.py [-h] proj_dir aoi year
```

### Description ###

Process ALOS-2 tiles by applying an Enhanced Lee filter, an adaptive filter that reduces speckle noise in SAR imagery.

### Positional Arguments ###

  **proj_dir**

      project directory (s3:// or gs:// or local dir). ALOS/ALOS-2 mosaic data (.tar.gz) will be stored under
      proj_dir/alos2_mosaic/year/tarfiles/

  **aoi**

      shp/geojson of area of interest (AOI)

  **year**

      year

### Notes ###

* ...

```
# sample run here
```
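For reference, the textbook Enhanced Lee filter (Lopes et al.) compares each window's coefficient of variation `ci` with two thresholds derived from the number of looks: below `cu` the pixel is replaced by the window mean, above `cmax` it is kept unchanged, and in between the two are blended with an exponential weight. The sketch below implements that formulation on linear intensities in plain Python; the window size, number of looks, and damping values alos2_proc.py uses are not documented here, so the defaults are assumptions.

```python
import math


def enhanced_lee(image, size=5, looks=1.0, damping=1.0):
    """Enhanced Lee speckle filter for a 2-D list of linear intensities.

    size: odd window width; looks: equivalent number of looks of the data;
    damping: how quickly smoothing is switched off in heterogeneous areas.
    Windows are clipped at the image borders.
    """
    cu = 1.0 / math.sqrt(looks)           # speckle coefficient of variation
    cmax = math.sqrt(1.0 + 2.0 / looks)   # above this, treat the pixel as a point target
    rows, cols, half = len(image), len(image[0]), size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            win = [image[i][j]
                   for i in range(max(r - half, 0), min(r + half + 1, rows))
                   for j in range(max(c - half, 0), min(c + half + 1, cols))]
            mean = sum(win) / len(win)
            center = image[r][c]
            if mean <= 0:
                out[r][c] = center         # avoid dividing by zero
                continue
            ci = math.sqrt(sum((v - mean) ** 2 for v in win) / len(win)) / mean
            if ci <= cu:                   # homogeneous area: window mean
                out[r][c] = mean
            elif ci >= cmax:               # strong scatterer: keep as is
                out[r][c] = center
            else:                          # blend mean and center value
                w = math.exp(-damping * (ci - cu) / (cmax - ci))
                out[r][c] = mean * w + center * (1.0 - w)
    return out
```

A production version would use NumPy or SciPy window operations instead of Python loops; the loops here only make the per-pixel logic explicit.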

## 7) Prepare Landsat Tiles ##

## Export Landsat NDVI ##

### gee_export_landsat_ndvi.py ###

### Usage ###

```
(data-prep) % gee_export_landsat_ndvi.py [-h] sitename tiles res year
```

### Description ###

Submit Google Earth Engine (GEE) processing for Landsat NDVI.

### Positional Arguments ###

  **sitename**

      sitename 

  **tiles**

      shp/geojson file containing tiles onto which output raster will be resampled

  **res**

      Resolution
      
  **year**

      Year

```
# sample run here
```
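NDVI is the normalized difference of the near-infrared and red reflectances, (NIR - red) / (NIR + red). Which bands play those roles depends on the Landsat sensor; the mapping below uses the standard band numbering, while which Landsat collection gee_export_landsat_ndvi.py actually reads is not stated here. In Earth Engine itself, `ee.Image.normalizedDifference` computes the same ratio from two band names. The plain-Python sketch shows only the per-pixel arithmetic.

```python
import math

# (NIR band, red band) per sensor, using the standard Landsat band numbering.
NDVI_BANDS = {
    "Landsat 5 TM": (4, 3),
    "Landsat 7 ETM+": (4, 3),
    "Landsat 8 OLI": (5, 4),
    "Landsat 9 OLI-2": (5, 4),
}


def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red); NaN where NIR + red == 0."""
    total = nir + red
    return math.nan if total == 0 else (nir - red) / total


print(round(ndvi(0.30, 0.05), 3))  # 0.714, typical of dense vegetation
```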

## 8) Prepare MODIS Tree Cover Tiles ##

## Export MODIS TC ##

### gee_export_modis_tc.py ###

### Usage ###

```
(data-prep) % gee_export_modis_tc.py [-h] sitename tiles res year
```

### Description ###

Submit Google Earth Engine (GEE) processing for MODIS tree cover.

### Positional Arguments ###

  **sitename**

      sitename 

  **tiles**

      shp/geojson file containing tiles onto which output raster will be resampled

  **res**

      Resolution
      
  **year**

      Year. 

```
# sample run here
```
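Both GEE export scripts resample their output onto the tiles at resolution `res`. The size of each output raster follows directly from the tile extent and `res`, as sketched below. The helper is hypothetical, it assumes `res` is in the same units as the tile coordinates (meters for UTM tiles), and the tile bounds are made up; the geotransform tuple follows GDAL's north-up convention.

```python
import math


def grid_for_tile(bounds, res):
    """(rows, cols, geotransform) of a north-up raster covering bounds at pixel size res.

    bounds = (xmin, ymin, xmax, ymax) in the tile's projected units (assumed meters).
    The geotransform follows GDAL's (x_origin, pixel_width, 0, y_origin, 0, -pixel_height).
    """
    xmin, ymin, xmax, ymax = bounds
    cols = math.ceil((xmax - xmin) / res)
    rows = math.ceil((ymax - ymin) / res)
    return rows, cols, (xmin, res, 0.0, ymax, 0.0, -res)


tile = (600000, 8900000, 750000, 9050000)  # a hypothetical 150 km UTM tile
for res in (30, 250):
    rows, cols, _ = grid_for_tile(tile, res)
    print(res, rows, cols)                 # 30 5000 5000, then 250 600 600
```

Because every source is resampled to the same tile and resolution, the resulting rasters can later be stacked band by band without further alignment.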

## 9) Build Stacks ##

### build_stacks.py ###

### Usage ###

```
(data-prep) % build_stacks.py [-h] [--sitename sitename] proj_dir tiles year
```

### Description ###

Build 8-band stacks containing C-VV, C-VH, and C-INC from Sentinel-1 (C-band); L-HH, L-HV, and L-INC from ALOS-2 (L-band); NDVI from Landsat; and TC (tree cover) from MODIS.
   
### Positional Arguments ###

  **proj_dir**
   
      project directory (s3:// or gs:// or local dirs)
  
  **tiles**
  
      shp/geojson file that contains tiles for the output stacks
      
  **year** 
  
      Year
      
### Optional Arguments ###
      
  **sitename**
  
      sitename. If sitename is not specified, the basename of proj_dir is used as the sitename.
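For example, a proj_dir of `s3://your_bucket/ucayali/` yields the sitename `ucayali`. The hypothetical helper below mirrors that rule for cloud and local paths; whether build_stacks.py strips a trailing slash or resolves `.` the same way is an assumption.

```python
import os
import posixpath


def default_sitename(proj_dir):
    """Basename of proj_dir, ignoring any trailing slash."""
    if proj_dir.startswith(("s3://", "gs://")):
        return posixpath.basename(proj_dir.rstrip("/"))
    return os.path.basename(os.path.abspath(proj_dir))  # resolves '.' to the folder name


print(default_sitename("s3://your_bucket/ucayali/"))  # ucayali
```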
  
### Notes ###

* ...

```
# sample run here
```
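When reading the finished stacks, it helps to look bands up by name rather than by position. The sketch below assumes the bands are stored in the order listed in the description above (an assumption, not confirmed by the script's documentation) and uses 1-based band numbers, the GDAL/rasterio convention; with rasterio, for instance, `src.read(band_number("NDVI"))` would then return the NDVI band.

```python
# Band order as listed in the build_stacks.py description; the order is an assumption.
STACK_BANDS = ("C-VV", "C-VH", "C-INC", "L-HH", "L-HV", "L-INC", "NDVI", "TC")

SOURCE = {"C": "Sentinel-1 (C-band SAR)", "L": "ALOS-2 (L-band SAR)",
          "NDVI": "Landsat", "TC": "MODIS"}


def band_number(name):
    """1-based band index (GDAL/rasterio convention) of a named stack band."""
    return STACK_BANDS.index(name) + 1


def band_source(name):
    """Sensor that a stack band comes from."""
    return SOURCE[name.split("-")[0]]


for name in STACK_BANDS:
    print(band_number(name), name, band_source(name))
```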