# vegMapper

https://github.com/NaiaraSPinto/VegMapper

This version is set up for Colaboratory but it should run fine in Jupyter notebook with a few tweaks.

## Setup

### Requirements

This cell will attempt to import *geemap* and *pandas*, installing them if unsuccessful.

In [2]:
import ee
from google.colab import files, drive
from os.path import isdir
from io import BytesIO
try:
    import geemap
    import pandas as pd
except ImportError as e:
    !pip install -q pandas geemap

### Authenticate with your GEE credentials

>Authenticate by running the next cell and following the instructions. Click the displayed link, opening a new browser tab; log in with your GEE credentials; copy the temporary token and paste it into the prompt below; hit enter.

Run the next cell and authenticate the current ipynb session. 

In [3]:
ee.Authenticate()
ee.Initialize()

To authorize access needed by Earth Engine, open the following URL in a web browser and follow the instructions. If the web browser does not start automatically, please manually browse the URL below.

    https://accounts.google.com/o/oauth2/auth?client_id=517222506229-vsmmajv00ul0bs7p89v5m89qs8eb9359.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fearthengine+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&code_challenge=xYXFN5IEH-RZNxVGJYfwD8i6UWt2_W_ptPNE2d2l4GE&code_challenge_method=S256

The authorization workflow will generate a code, which you should paste in the box below. 
Enter verification code: 4/1AY0e-g76jmhQ19pvIKplXazrmBZ2WGuhvdTUTm9T8Z1qAFe2vp43RDkGYEQ

Successfully saved authorization token.


### Getting data into colaboratory 

#### Option a: Mount Google Drive

>*Do you want to mount Google Drive to your Colaboratory environment?*
>
>Drive should appear in the *Files* panel to the left after the next cell executes. You may need to hit the refresh button at the top of the panel for it to appear. Try to remember to unmount Drive once you're finished in Colaboratory. You can do that by calling this other function from the drive module: `google.colab.drive.flush_and_unmount()`


Run the next cell to mount your files to the *DRIVE/* directory inside the current workspace.

In [4]:
if not isdir("DRIVE"):
    try:
        drive.mount("DRIVE")
    except Exception as e:
        raise e

!ls DRIVE/

Mounted at DRIVE
MyDrive


### Option b: Upload files to colaboratory

*The workflow requires input features to determine the areas in which to calculate zonal statistics.*

Make sure a suitable file exists in the colab workspace or in Google Drive. You can provide one in either of two ways:

1. Navigate to an input CSV in Google Drive and copy its path into the cell below (assuming Drive is mounted), or
2. Run the next cell as-is and upload a file to the workspace when prompted.

If the second option, run the next cell and click *Choose Files* to upload a file. You may also click *Cancel Upload* to abort the cell and move on.

In [5]:
uploads = files.upload()

Saving smartin.csv to smartin.csv


Now set the path to the input CSV. (You can skip this step if you uploaded a CSV in the last cell.)

In [6]:
csv = None  #"inputs.csv"
if csv is None and len(uploads)!=0:
    csv = list(uploads)[0]

print(csv)

smartin.csv


## Input zones/regions for stats

As mentioned before, this procedure builds a stack of images and calculates the zonal statistics within regions defined by an input feature dataset. 

*Remember: CSV is the only supported input file format at this time.* It should have these columns at a minimum:

* *lat* (float)
* *lon* (float)
* *landcover* (int, optional)


In [7]:
pts = pd.read_csv(csv)

pts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   longitude   100 non-null    float64
 1   latitude    100 non-null    float64
 2   obs_year    50 non-null     float64
 3   class       50 non-null     object 
 4   class_2017  100 non-null    object 
dtypes: float64(3), object(2)
memory usage: 4.0+ KB


### Get feature collection containing point geometries

>We will make a feature collection from the table of points such that a reducer function can be efficiently mapped over point geometries to compute an output statistic for all images/bands in the stack.
>
>Here is some helpful GEE docs to describe *ee.FeatureCollection*: 
>* https://developers.google.com/earth-engine/guides/feature_collections
>* 


Make point geometry for each row from the columns of latitudes and longitudes, then get the geometries as a feature collection. 

In [8]:
def get_geom(x):
    return ee.Geometry.Point(x['longitude'], x['latitude'])

pfc = ee.FeatureCollection(pts.apply(get_geom, axis=1).tolist())

type(pfc)

ee.featurecollection.FeatureCollection


### Region of Interest

Get the minimum bounding extent of the points in the input CSV. Add an arbitrary buffer around the minimum extent and then get a ee.Geometry.Rectangle to represent the ROI.

In [10]:
lon_min = pts['longitude'].min()
lon_max = pts['longitude'].max()
lat_min = pts['latitude'].min()
lat_max = pts['latitude'].max()

roi = ee.Geometry.Polygon([[lon_min, lat_max],
                           [lon_min, lat_min],
                           [lon_max, lat_min],
                           [lon_max, lat_max]])

M = geemap.Map(center=[pts['latitude'].mean(), pts['longitude'].mean()], zoom=8)
M.addLayer(roi, {'color': 'd63000'}, 'Input ROI')
M.addLayer(pfc, name='Input XY')
M.add_layer_control()
M

### Time coverage

Define the time coverage for the output stack by the start and end times of the source datasets (Sentinel-1, Landsat 8, MODIS, etc). The stack will be produced from EOS observations collected only during this period.

In [11]:
startDate = '2017-01-01'  #@param {type: "date"}
endDate = '2017-12-31'    #@param {type: "date"}

## Imagery

Important concepts in GEE:

* Scale: https://developers.google.com/earth-engine/guides/scale
* Projections: https://developers.google.com/earth-engine/guides/projections
  * Compositing: https://developers.google.com/earth-engine/guides/ic_reducing#composites-have-no-projection
  
>If I'm not mistaken, our approach below (which calls *mean* or *median* on a filtered *ImageCollection* from each imagery source) is considered "compositing".

### Sentinel-1

https://developers.google.com/earth-engine/guides/sentinel1

In [12]:
s1 = (ee.ImageCollection("COPERNICUS/S1_GRD_FLOAT")
        .filter(ee.Filter.eq("orbitProperties_pass", "DESCENDING"))
        .filter(ee.Filter.eq("instrumentMode", "IW"))
        .filterDate(startDate, endDate)
        .filterBounds(roi)
        .mean())

# Calculate radar volume index and add it as a band.
s1.addBands(s1.expression(
    expression="4 * VH / (VH + VV)", 
    opt_map={'VV': s1.select('VV'), 
             'VH': s1.select('VH')}
).rename('radar_volume_index'))

<ee.image.Image at 0x7f619b58cdd0>

### ALOS2

https://developers.google.com/earth-engine/datasets/catalog/JAXA_ALOS_PALSAR_YEARLY_SAR

Calculate radar volume index: `4 * VH / (VH + VV)`

In [13]:
alos2 = (ee.ImageCollection('JAXA/ALOS/PALSAR/YEARLY/SAR')
           .select(ee.List(['HV', 'HH']))
           .filterDate(startDate, endDate)
           .filterBounds(roi)
           .mean())

# Calculate radar volume index and add it as a band
alos2.addBands(alos2.expression(
    expression="4 * HV / (HV + HH)", 
    opt_map={'HV': alos2.select('HV'),
             'HH': alos2.select('HH')}
).rename('radar_volume_index'))

<ee.image.Image at 0x7f619ba2b490>

### NDVI from Landsat

* https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR
* https://developers.google.com/earth-engine/guides/image_math#colab-python_1

In [21]:
def mask_l8sr(image):
    cloudShadowBitMask = 1<<3
    cloudBitMask = 1<<5
    qa = image.select('pixel_qa')
    mask = qa.bitwiseAnd(cloudShadowBitMask).eq(0).And(
           qa.bitwiseAnd(cloudBitMask).eq(0))
    return image.updateMask(mask)


l8sr = (ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
          .filterDate(startDate, endDate)
          .filterBounds(roi)
          .map(mask_l8sr)
          .median())

# Get NDVI and add the band to the surface reflectance image.
l8out = l8sr.addBands(l8sr.normalizedDifference(['B5', 'B4']).rename('ndvi'))

### VCF from MODIS

In [15]:
modis = (ee.ImageCollection('MODIS/006/MOD44B')
           .select('Percent_Tree_Cover')
           .filterDate(startDate, endDate)
           .filterBounds(roi)
           .first())

## Run the prediction

Not clear on this so it's removed for now:

```python
prior_mean = [0.06491638, -26.63132179, 0.05590800, -29.64091620]
prior_mean_int = ee.Number(6.07)
prediction = (ee.Image(prior_mean_int)
              #.add((Fmod_tc_aoi.multiply(prior_mean[0]))
              .add(s1.select('radar_volume_index').multiply(prior_mean[1]))
              .add(l8sr.select('ndvi').multiply(prior_mean[2]))
              #.add(smooth.select('constant').multiply(prior_mean[3]))))
              ).clip(roi)
predx = prediction.exp()
pred_final = ee.Image(predx.divide(predx.add(1)))
type(pred_final)
```

### Build the stack

>Documentation for the image method *clipToBoundsAndScale* is helpful to understanding this step in GEE: https://developers.google.com/earth-engine/apidocs/ee-image-cliptoboundsandscale
>
>Also see this information on compositing and image projections:      
>https://developers.google.com/earth-engine/guides/projections#the-default-projection

Configure some user preferences here to determine how the stack is created on a common grid. 

In [18]:
mode = "bilinear"  # Resampling method: ["nearest", "bilinear", "bicubic"]
scale = 30         # Resolution in GIS-speak; given in meters

def resampler(image, resample_method: str="bilinear"):
    return image.resample(mode=resample_method)

Build the stack by resampling all the imagery to a common grid and clip them to the region of interest defined by the `roi` geometry.    
(In this case, "common grid" means that pixels in the grid have identical xy coverage across all source bands.)

Achieve this in three simple steps:
1. Merge the imagery from the cells above into one *ImageCollection*.
2. Map the `resampler` function over images in the *ImageCollection*.
3. Convert the *ImageCollection* back into a multi-band *Image*.


In [22]:
stack = (ee.ImageCollection.fromImages([s1, alos2, l8out, modis])
           .map(resampler)  
           .toBands()
           .clipToBoundsAndScale(geometry=roi, scale=scale))

type(stack)

ee.image.Image

Make sure that all the bands were named sensibly.

In [23]:
bands = stack.bandNames().getInfo()

bands

['0_VV',
 '0_VH',
 '0_angle',
 '1_HV',
 '1_HH',
 '2_B1',
 '2_B2',
 '2_B3',
 '2_B4',
 '2_B5',
 '2_B6',
 '2_B7',
 '2_B10',
 '2_B11',
 '2_sr_aerosol',
 '2_pixel_qa',
 '2_radsat_qa',
 '2_ndvi',
 '3_Percent_Tree_Cover']

Verify spatial referencing information by displaying a dictionary of SRS information for each band.

In [24]:
srs = {}
for b in bands:
    p = stack.select(b).projection()
    srs[b] = p.getInfo()
    srs[b]['nominalScale'] = p.nominalScale().getInfo()
    del srs[b]['type']

# All image have identical projections if this test returns true:
len(list(set([str(p) for p in list(srs.values())]))) == 1

True

We are good to proceed if *True* is displayed as the only output of the cell directly above. It means that all imagery in the final stack have the same projection information.

Sanity check: make sure the dictionaries contains sensible projection information.

In [27]:
list(srs.items())[0][1]  # Print the dictionary for the first band.

{'crs': 'EPSG:4326',
 'nominalScale': 30.000000000000004,
 'transform': [0.00026949458523585647, 0, 0, 0, -0.00026949458523585647, 0]}

### Render the stack on a map

GEE makes me so nervous about what's happening on the backend... Mapping for eyeball verification is a must. That's pretty easy with the `geemap` package.

In [30]:
Map = geemap.Map(center=[pts['latitude'].mean(), pts['longitude'].mean()], zoom=8)
for b in bands:
    Map.addLayer(stack.select(b), {'min': -1, 'max': 1.0}, name=b)
Map.add_layer_control()
Map

## Zonal statistics

Map over the feature collection after building the stack.

In [None]:
outputs = stack.reduceRegions(
    collection=pfc,
    reducer=ee.Reducer.mean(),
    crs=stack.projection(),
    scale=scale,
)

type(outputs)

ee.featurecollection.FeatureCollection

In [None]:
rename_lookup = {
    '0': "S1",
    '1': "ALOS2",
    '2': "LS8",
    '3': "MODIS",
}

def rename_columns(x):
    if x.startswith("properties."):
        # Remove 'properties' prefix from column names.
        x = x.split(".")[1]
        # Look up/replace the integer prefix with imagery source.
        x = x.replace(x[:2], rename_lookup[x[0]] + "-")
    # Replace periods '.' with hyphens for compatibility with pandas.
    x = x.replace(".", "-")
    return x

# Get the FeatureCollection dataset as a Python dictionary.
outputs_dict = outputs.getInfo()

# Call pd convenience function 'json_normalize' to translate to table.
outputs_table = pd.json_normalize(outputs_dict['features'])

# Get formatted column names for all columns in the table.
outputs_cols = outputs_table.apply(lambda x: rename_columns(x.name)).to_dict()

# Rename the columns and print summary statistics.
outputs_final = outputs_table.rename(mapper=outputs_cols)

outputs_final.describe()

Unnamed: 0,properties.0_VH,properties.0_VV,properties.0_angle,properties.1_HH,properties.1_HV,properties.2_B1,properties.2_B10,properties.2_B11,properties.2_B2,properties.2_B3,properties.2_B4,properties.2_B5,properties.2_B6,properties.2_B7,properties.2_pixel_qa,properties.2_radsat_qa,properties.2_sr_aerosol,properties.3_Percent_Tree_Cover
count,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0
mean,0.051191,0.225688,38.743445,6464.705847,3660.110781,215.561098,2934.569692,2908.097547,232.545544,414.560264,268.335613,2990.863214,1393.915991,588.577543,322.0,0.0,143.166817,61.666667
std,0.004857,0.025965,0.097461,205.684514,460.404152,48.527183,25.335914,20.724111,33.068989,33.149141,12.965712,344.560974,129.376544,50.502895,0.0,0.0,11.985447,25.696952
min,0.04709,0.200354,38.665733,6237.099725,3131.81935,162.498048,2905.334317,2884.167514,194.374306,379.749895,253.50037,2679.523727,1284.349397,531.376262,322.0,0.0,135.996392,32.0
25%,0.04851,0.212412,38.68877,6378.424436,3502.283768,194.49915,2926.792287,2902.08374,222.562432,398.964845,263.754107,2805.763417,1322.550881,569.364166,322.0,0.0,136.248547,54.0
50%,0.049929,0.224469,38.711807,6519.749147,3872.748187,226.500253,2948.250257,2919.999966,250.750559,418.179796,274.007844,2932.003107,1360.752365,607.352071,322.0,0.0,136.500702,76.0
75%,0.053242,0.238356,38.782301,6578.508907,3924.256497,242.092623,2949.187379,2920.062564,251.631162,431.965448,275.753234,3146.532958,1448.699288,617.178184,322.0,0.0,146.752029,76.5
max,0.056555,0.252242,38.852795,6637.268668,3975.764807,257.684993,2950.124501,2920.125162,252.511766,445.7511,277.498623,3361.06281,1536.646211,627.004296,322.0,0.0,157.003356,77.0


## Export to Google Drive

>*Important: Make sure to specify a path that's inside the directory that corresponds to the Google Drive mount.*

We used pandas to generate the table so simply write to Google Drive with the data frame's `to_csv` method.

In [None]:
# # Omit the column of json metadata from the output table.
# out_columns = [c for c in out.columns if c != "metadata"]

# Write the selected columns to a csv:
outputs_final.to_csv("results.csv", index=None)

## Download to local disk

>You may want to move this to a configuration cell near the top of the notebook: 
>```
>output_filename = "results.csv"
>```

Run this next cell to save to your local machine as a CSV.

In [None]:
output_filename = "results.csv"

# Write a CSV into the colaboratory workspace.
outputs_final.to_csv(output_filename, index=None)

# This function triggers a prompt for you to save the file to local disk.
files.download(filename=output_filename)

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

## Clean-up

Try not to forget to unmount Google Drive.

In [None]:
drive.flush_and_unmount()

## References

Helpful links:

* https://developers.google.com/earth-engine/guides/resample#resampling
* https://developers.google.com/earth-engine/tutorials/community/extract-raster-values-for-points#understanding_which_pixels_are_included_in_polygon_statistics
  * https://developers.google.com/earth-engine/tutorials/community/extract-raster-values-for-points#notes_on_crs_and_scale
* https://developers.google.com/earth-engine/tutorials/community/extract-raster-values-for-points#zonalstatsfc_params_%E2%87%92_eefeaturecollection
* https://developers.google.com/earth-engine/tutorials/community/beginners-cookbook#example_exporting_data