# Introduction

The *pyveg* package contains some useful functions for interacting with the Python API of Google Earth Engine.

However, before we can use GEE, we need to authenticate (assuming we have an account).

In [None]:
import ee
ee.Authenticate()

## Pipelines, Sequences, and Modules

In pyveg, we have the concept of a "Pipeline" for downloading and processing data from GEE.

A Pipeline is composed of one or more Sequences, which are in turn composed of Modules.

A Module is a class designed for one specific task (e.g. "download vegetation data from GEE", or "calculate network centrality of binary images"). Modules are generally grouped into Sequences such that each Module works on the output of the previous one.  
So our standard Pipeline has:
* A vegetation Sequence consisting of VegetationDownloader, VegetationImageProcessor, NetworkCentralityCalculator, and NDVICalculator.   
* A weather Sequence consisting of WeatherDownloader and WeatherImageToJSON.
* A combiner Sequence consisting of a single combiner Module, that takes the outputs of the other two Sequences and produces a final output file.
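
The nesting described above can be sketched in plain Python. The class names, the `run` method, and the toy modules below are hypothetical illustrations rather than the pyveg API; the sketch only shows how each Module receives the output of the previous one. It does not model the combiner Sequence, which reads the outputs of the other Sequences.

```python
# Illustrative sketch only: these classes are hypothetical, not the pyveg API.

class Module:
    """One unit of work; subclasses override run()."""
    def run(self, data):
        raise NotImplementedError


class Downloader(Module):
    def run(self, data):
        # Pretend to download raw pixel values for the requested location.
        return {"location": data, "pixels": [3, 1, 2]}


class Processor(Module):
    def run(self, data):
        # Work on the output of the previous Module.
        return {**data, "pixels": sorted(data["pixels"])}


class Sequence:
    """An ordered chain of Modules; each one receives the previous output."""
    def __init__(self, name, modules):
        self.name = name
        self.modules = modules

    def run(self, data):
        for module in self.modules:
            data = module.run(data)
        return data


class Pipeline:
    """A collection of Sequences, each run over the same input."""
    def __init__(self, sequences):
        self.sequences = sequences

    def run(self, data):
        return {seq.name: seq.run(data) for seq in self.sequences}


pipeline = Pipeline([Sequence("vegetation", [Downloader(), Processor()])])
result = pipeline.run((27.94, 11.58))
print(result)
```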

### Running the full pipeline from the command-line

In the second part of this notebook we will demonstrate running individual Modules and Sequences, but most users will probably just want to run the full Pipeline for their selected location, collection, and date range, so we will cover that first.

We have a couple of "entrypoints" (i.e. command-line commands) linked to functions in some pyveg scripts to help do this.  
* To configure and run a downloading-and-processing pipeline we run the command `pyveg_run_pipeline --config_file <some-config-file>`
* To generate the config file in the above command we have the command `pyveg_generate_config`.

Both commands accept several command-line arguments, which can be listed with the `--help` argument:

In [5]:
!pyveg_generate_config --help

usage: pyveg_generate_config [-h] [--configs_dir CONFIGS_DIR]
                             [--collection_name COLLECTION_NAME]
                             [--output_dir OUTPUT_DIR] [--test_mode]
                             [--latitude LATITUDE] [--longitude LONGITUDE]
                             [--country COUNTRY] [--start_date START_DATE]
                             [--end_date END_DATE]
                             [--time_per_point TIME_PER_POINT]
                             [--run_mode RUN_MODE] [--n_threads N_THREADS]

create a config file for running pyveg_pipeline

optional arguments:
  -h, --help            show this help message and exit
  --configs_dir CONFIGS_DIR
                        path to directory containing config files
  --collection_name COLLECTION_NAME
                        collection name (e.g. 'Sentinel2')
  --output_dir OUTPUT_DIR
                        Directory for local output data
  --test_mode           Run in test mode, over fe

`pyveg_generate_config` prompts the user for any parameters that are not provided as command-line arguments, listing the allowed options and (in most cases) a default value that is used if the user just presses "enter".
However, these interactive prompts don't work well in Jupyter, so let's provide all the arguments it needs:

In [None]:
!pyveg_generate_config --configs_dir ../../pyveg/configs --collection_name Sentinel2 --output_dir ./ --test_mode --latitude 11.58 --longitude 27.94 --country Sudan --start_date 2019-01-01 --end_date 2019-04-01 --time_per_point 1m --run_mode local --n_threads 2

latitude 11.58 lat_range[0] -90.0 lat_range[1]
Enter name of country, or press return to use OpenCage country lookup based on coordinates : 

In [3]:
x = input("say something")

say somethingdasfafa


In [4]:
x


'dasfafa'

## Prepare data - get download URL

To specify what we want to download, we create a dictionary.  There is a file *config.py* in pyveg that demonstrates the format (and this is the default one that will be used if you use the command line entrypoint *pyveg_gee_download*).

For this example, let's download some Sentinel 2 NDVI data:

In [None]:
collection_dict = {
        'collection_name': 'COPERNICUS/S2',
        'type': 'vegetation',
        'RGB_bands': ('B4','B3','B2'),
        'NIR_band': 'B8',
        'cloudy_pix_flag': 'CLOUDY_PIXEL_PERCENTAGE',
}
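
The dictionary names a `NIR_band` alongside the RGB bands because NDVI is the normalized difference of near-infrared and red reflectance, (NIR - Red) / (NIR + Red). A minimal sketch of that formula for a single pixel follows; the function name and the convention of returning 0 when both bands are 0 are assumptions made for illustration, not pyveg or GEE behaviour.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red).

    Returns 0.0 when both bands are 0 (an assumed convention to avoid
    dividing by zero).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Healthy vegetation reflects strongly in the near infrared, giving high NDVI.
vegetated = ndvi(nir=0.5, red=0.1)
# Bare soil reflects similarly in both bands, giving NDVI near zero.
bare = ndvi(nir=0.3, red=0.25)
print(vegetated, bare)
```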


We also need to specify the coordinates we want to look at (in ***(long,lat)*** format). Let's look at one of our locations in the Sahel:

In [None]:
coords = [28.37,11.12]

And we need to choose a date range.  If we are looking at vegetation data as in this case, we will take the median of all images available within this date range (after filtering out cloudy ones).

In [None]:
date_range = ["2018-06-01","2018-07-31"]
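
To make the "median of all images after filtering out cloudy ones" step concrete, here is a toy sketch in plain Python. The data, the 20% cloud threshold, and the variable names are all made up for illustration; the real filtering and compositing are done by GEE, not by code like this.

```python
from statistics import median

# Hypothetical toy data: each "image" is a 2x2 grid of pixel values plus a
# cloud-cover percentage.
images = [
    {"cloud_pct": 5, "pixels": [[10, 20], [30, 40]]},
    {"cloud_pct": 80, "pixels": [[99, 99], [99, 99]]},  # too cloudy, dropped
    {"cloud_pct": 10, "pixels": [[12, 18], [33, 44]]},
    {"cloud_pct": 0, "pixels": [[14, 22], [31, 42]]},
]

MAX_CLOUD_PCT = 20  # assumed threshold, chosen for illustration

# Keep only the images that are clear enough.
clear = [img["pixels"] for img in images if img["cloud_pct"] <= MAX_CLOUD_PCT]

# Take the per-pixel median across the remaining images.
n_rows, n_cols = 2, 2
composite = [
    [median(img[r][c] for img in clear) for c in range(n_cols)]
    for r in range(n_rows)
]
print(composite)
```

Using the median rather than the mean makes the composite less sensitive to a few outlying pixel values in individual images.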

Now we're ready to talk to our GEE interface:

In [None]:
urls = ee_prep_data(collection_dict, coords, date_range)

GEE has given us a URL from which we can download a zipfile, which will in turn contain one .tif file per band.

In [None]:
urls[0]

## Downloading and unzipping 

We need to choose a directory in which to put the unzipped .tif files. Let's just use a temporary directory.

In [None]:
tif_filebase = download_and_unzip(urls[0][0], "/tmp/gee_test_veg")[0]

## Constructing RGB, NDVI, and binary images

So we have some .tif files locally (one per band), but they're not that interesting to look at (most image viewing software won't interpret the pixel values in a way that we can see).

The first and simplest thing we can do is to create an RGB image from those three bands.

In [None]:
from pyveg.src.image_utils import *

In [None]:
rgb_img = convert_to_rgb(tif_filebase,collection_dict["RGB_bands"])

We have to jump through a few hoops to look at this image in a Jupyter notebook, but in a script you could just do `rgb_img.save(<filename>)`.

In [None]:
from matplotlib.pyplot import imshow
import numpy as np

In [None]:
%matplotlib inline

In [None]:
imshow(np.asarray(rgb_img))

## Single band image (e.g. NDVI)

OK, let's look at the NDVI image. Again, the .tif file only contains one value per pixel, so to view it we need to scale those values into the 8-bit colour range and use the same value for the r, g, and b channels, giving a greyscale image.

In [None]:
ndvi_img = scale_tif(tif_filebase, "NDVI")
imshow(np.asarray(ndvi_img))
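
As a rough illustration of mapping single-band values into the 8-bit range, the sketch below applies a linear min-max scaling to a small grid. This is an assumed approach for illustration only; the function name is hypothetical and `scale_tif` may scale the values differently.

```python
def scale_to_8bit(values):
    """Linearly map a 2D grid of floats onto integers in 0..255."""
    flat = [v for row in values for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast; map everything to 0.
        return [[0 for _ in row] for row in values]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in values]


# Toy NDVI values (made up), which always lie between -1 and 1.
ndvi_grid = [[-0.2, 0.0], [0.4, 0.8]]
grey = scale_to_8bit(ndvi_grid)
print(grey)
```

The smallest input value becomes 0 (black) and the largest becomes 255 (white); using the same value for all three colour channels then gives a grey pixel.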

## Image processing and thresholding

For our vegetation study, we want to create a binary image from this, where vegetation is in black, and bare soil is white.

We have a function in pyveg that does histogram equalization, adaptive thresholding and median filtering on an input image, to give us a binary version:

In [None]:
binary_img = process_and_threshold(ndvi_img)
imshow(np.asarray(binary_img))
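
Of the three steps, median filtering is the one that removes isolated speckles from the image. The sketch below is a plain-Python 3x3 median filter on a toy image, written only to illustrate the idea. The function name is hypothetical, border pixels are simply copied unchanged, and this is not the code that `process_and_threshold` runs.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (a simplification for this sketch).
    """
    n_rows, n_cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            window = [
                img[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)
    return out


# White (255) background with a single isolated black (0) speckle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = median_filter_3x3(noisy)
print(cleaned[2][2])
```

Because the speckle is outnumbered by white pixels in every 3x3 window that contains it, the filter replaces it with white and leaves the rest of the image unchanged.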

## Getting weather data

For our study, we are also interested in the precipitation for this region, and this time range.
We can use the ERA5 dataset for this.

In [None]:
collection_dict = {
        'collection_name': 'ECMWF/ERA5/MONTHLY',
        'type': 'weather',
        'precipitation_band': ['total_precipitation'],
        'temperature_band': ['mean_2m_air_temperature']
}

Here, when we ask GEE for the images within our date range, because we set "type" to be "weather", rather than taking the median image, we will take the sum of the precipitation, and the mean of the temperature.

In [None]:
urls = ee_prep_data(collection_dict, coords, date_range)
download_path = "/tmp/gee_test_weather"
tif_filebase = download_and_unzip(urls[0][0], download_path)[0]
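
The aggregation described above (sum for precipitation, mean for temperature) can be sketched on toy monthly values. The numbers below are made up and the variable names are hypothetical; in practice the aggregation happens when GEE is queried.

```python
# Hypothetical monthly values for one location (not real ERA5 output).
monthly = [
    {"month": "2018-06", "total_precipitation": 0.06, "mean_2m_air_temperature": 302.0},
    {"month": "2018-07", "total_precipitation": 0.08, "mean_2m_air_temperature": 300.0},
]

# Precipitation accumulates, so the months are summed.
total_precip = sum(m["total_precipitation"] for m in monthly)

# Temperature does not accumulate, so the months are averaged.
mean_temp = sum(m["mean_2m_air_temperature"] for m in monthly) / len(monthly)

print(total_precip, mean_temp)
```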

In the pyveg *process_satellite_data* module we have some code in a function called *get_weather* that downloads the ERA5 data and puts it into a dictionary.  Let's copy the last few lines of that here.

In [None]:
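import os

import cv2 as cv

metrics_dict = {}

for file in os.listdir(download_path):
    if file.endswith(".tif"):
        # The variable name is the second dot-separated field of the filename.
        name_variable = (file.split('.'))[1]
        variable_array = cv.imread(os.path.join(download_path, file), cv.IMREAD_ANYDEPTH)

        # Average the variable over all pixels in the image.
        metrics_dict[name_variable] = variable_array.mean().astype(np.float64)
metrics_dict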

So we can see there was 14 cm of rain in total in this region in June and July 2018.

## Splitting vegetation image into sub-images and analysing connectedness

This is quite specific to our analysis, but we also have code to divide the images seen above, which cover about 0.1 degrees in latitude and longitude, into small 50x50 pixel sub-images. These are then the input to our network centrality calculation.

In [None]:
sub_images = crop_image_npix(binary_img, 50)
imshow(np.asarray(sub_images[300]))
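
The idea of cutting an image into fixed-size tiles can be sketched on a small grid of numbers. The function below is a hypothetical stand-in written for illustration, not the implementation of `crop_image_npix`. In particular, discarding partial tiles at the right and bottom edges is an assumption.

```python
def crop_into_tiles(img, n_pix):
    """Split a 2D grid into non-overlapping n_pix x n_pix tiles, row by row.

    Partial tiles at the right and bottom edges are discarded (an assumption).
    """
    n_rows, n_cols = len(img), len(img[0])
    tiles = []
    for top in range(0, n_rows - n_pix + 1, n_pix):
        for left in range(0, n_cols - n_pix + 1, n_pix):
            tiles.append([row[left:left + n_pix] for row in img[top:top + n_pix]])
    return tiles


# A toy 4-row by 5-column "image" whose pixel values encode their position.
img = [[r * 5 + c for c in range(5)] for r in range(4)]
tiles = crop_into_tiles(img, 2)
print(len(tiles), tiles[0])
```

With 2x2 tiles the fifth column does not fill a whole tile, so this sketch drops it and returns four tiles.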

We can then run the network centrality on this, and quantify the connectedness of the vegetation.

In [None]:
from pyveg.src.subgraph_centrality import subgraph_centrality, feature_vector_metrics

In [None]:
img_array = pillow_to_numpy(sub_images[300])
feature_vec, _ = subgraph_centrality(img_array)

The feature vector holds one Euler characteristic value for each quantile of vegetation-covered pixels, ordered by subgraph centrality.

In [None]:
import matplotlib.pyplot as plt

plt.plot(feature_vec,'o')

The single number we are using to quantify the connectedness is "offset50" - essentially the slope of the second half of the feature vector.

In [None]:
offset50 = feature_vector_metrics(feature_vec)["offset50"]
offset50
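
One plausible reading of "the slope of the second half of the feature vector" is the rise between the midpoint and the last element. The sketch below implements that reading on a made-up vector. The function name and the exact formula are assumptions, and `feature_vector_metrics` may define "offset50" differently.

```python
def offset50_sketch(feature_vec):
    """Rise over the second half: last value minus the value at the midpoint.

    This is an assumed definition for illustration only.
    """
    midpoint = len(feature_vec) // 2
    return feature_vec[-1] - feature_vec[midpoint]


# A made-up feature vector that gets steeper towards the end.
toy_vec = [0, 2, 5, 9, 14, 20, 27, 35, 44, 54]
toy_offset = offset50_sketch(toy_vec)
print(toy_offset)
```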