# EXERCISE 5.2: Agricultural Pipeline using Google Earth Engine 

---

**Using the Google Earth Engine (GEE) Python API to visualize geodata. In this exercise, we will use the popular EO data platform Google Earth Engine (GEE) to access geospatial image data from the Landsat and Sentinel-2 satellite platforms. We will cover the following major topics:**
* a) Clipping and filtering GEE image collections
* b) Summarizing and analyzing time series
* c) Applying reductions to image collections
* d) Creating mosaics


You may already be familiar with GEE via the browser (JavaScript) API. Here we will use the convenient Python API so that we can work in our preferred environment: Python notebooks in Colab. There is a bit of setup involved in getting your GEE API configured, but it's worth it.


## Setup
Before working on this exercise, set up a GCP project with the GEE and Google Drive APIs enabled by following the instructions at https://docs.google.com/document/d/13SKLn_mqhlaRc1gElr4kmBrkw6KZPeqDDW3AjcTr8YY/

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import ee
import time
import sys
import itertools
import os
import traceback
import urllib

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import folium
from scipy import optimize

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from oauth2client.service_account import ServiceAccountCredentials

## Setup Your Google Earth Engine Credentials
Upload the `.private-key.json` you created while setting up GEE to the current runtime. Click Files > Upload to Session storage on the left pane in this notebook to upload. <br/>
Replace the service account in the code below with your Google Cloud project service account email. It should be of the format <br/>`<id>@ml4eo-<some_number>.iam.gserviceaccount.com`

In [None]:
service_account = 'ml4eo-service@ml4eo-383508.iam.gserviceaccount.com'
credentials = ee.ServiceAccountCredentials(service_account, '.private-key.json')
ee.Initialize(credentials)


Now let's load the following helper function, which we will use later in the notebook:

In [None]:
def Mapdisplay(center, dicc, Tiles="OpenStreetMap", zoom_start=10):
    '''
    :param center: Center of the map ([latitude, longitude]).
    :param dicc: Dictionary of Earth Engine map IDs (tiles) or geometries, keyed by layer name.
    :param Tiles: Base map tiles: OpenStreetMap, Mapbox Bright, Mapbox Control Room, Stamen Terrain, Stamen Toner, stamenwatercolor, cartodbpositron.
    :param zoom_start: Initial zoom level for the map.
    :return: A folium.Map object.
    '''
    mapViz = folium.Map(location=center, tiles=Tiles, zoom_start=zoom_start)
    for k, v in dicc.items():
        # Map IDs returned by getMapId() contain an ee.Image; add them as tile layers
        if ee.image.Image in [type(x) for x in v.values()]:
            folium.TileLayer(
                tiles=v["tile_fetcher"].url_format,
                attr='Google Earth Engine',
                overlay=True,
                name=k
            ).add_to(mapViz)
        else:
            # Anything else is treated as GeoJSON
            folium.GeoJson(data=v, name=k).add_to(mapViz)
    mapViz.add_child(folium.LayerControl())
    return mapViz



## Introduction to GEE: Getting started with Collections


In [None]:
# USGS Landsat 8 Collection 1 Tier 1 TOA Reflectance
ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA')

# Import the USGS ground elevation image.
elv = ee.Image('USGS/SRTMGL1_003')

### Introduction to GEE: Image Collection
An ImageCollection is a stack or time series of images. In addition to loading an ImageCollection using an Earth Engine collection ID, Earth Engine has methods to create image collections. The constructor `ee.ImageCollection()` or the convenience method `ee.ImageCollection.fromImages()` create image collections from lists of images. You can also create new image collections by merging existing collections. 

As with Images, there are a variety of ways to get information about an ImageCollection. The collection can be printed directly to the console, but the console printout is limited to 5000 elements. Collections larger than 5000 images will need to be filtered before printing. Printing a large collection will be correspondingly slower. The following example shows various ways of getting information about image collections programmatically.

We will use the Landsat 8 Collection 1 Tier 1 calibrated top-of-atmosphere (TOA) reflectance dataset. The available bands are detailed in the table below. For more information, check out the details of the collection in the GEE data catalog: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_TOA

NOTE: This collection has been deprecated, but we can still use it for the purposes of this exercise.
<table class="eecat">
<tr>
<th scope="col">Name</th>
<th scope="col">Pixel Size</th>
<th scope="col">Wavelength</th>
<th scope="col">Description</th>
</tr>
<tr>
<td><code translate="no" dir="ltr">B1</code></td>
<td>
      30 meters
</td>
<td>0.43 - 0.45 &mu;m</td>
<td><p>Coastal aerosol</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B2</code></td>
<td>
      30 meters
</td>
<td>0.45 - 0.51 &mu;m</td>
<td><p>Blue</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B3</code></td>
<td>
      30 meters
</td>
<td>0.53 - 0.59 &mu;m</td>
<td><p>Green</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B4</code></td>
<td>
      30 meters
</td>
<td>0.64 - 0.67 &mu;m</td>
<td><p>Red</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B5</code></td>
<td>
      30 meters
</td>
<td>0.85 - 0.88 &mu;m</td>
<td><p>Near infrared</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B6</code></td>
<td>
      30 meters
</td>
<td>1.57 - 1.65 &mu;m</td>
<td><p>Shortwave infrared 1</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B7</code></td>
<td>
      30 meters
</td>
<td>2.11 - 2.29 &mu;m</td>
<td><p>Shortwave infrared 2</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B8</code></td>
<td>
      15 meters
</td>
<td>0.52 - 0.90 &mu;m</td>
<td><p>Band 8 Panchromatic</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B9</code></td>
<td>
      15 meters
</td>
<td>1.36 - 1.38 &mu;m</td>
<td><p>Cirrus</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B10</code></td>
<td>
      30 meters
</td>
<td>10.60 - 11.19 &mu;m</td>
<td><p>Thermal infrared 1, resampled from 100m to 30m</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B11</code></td>
<td>
      30 meters
</td>
<td>11.50 - 12.51 &mu;m</td>
<td><p>Thermal infrared 2, resampled from 100m to 30m</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">BQA</code></td>
<td>
      30 meters
</td>
<td></td>
<td><p>Landsat Collection 1 QA Bitmask (<a href="https://www.usgs.gov/land-resources/nli/landsat/landsat-collection-1-level-1-quality-assessment-band">See Landsat QA page</a>)</p></td>
</tr>
<tr class="alt">
<td colspan=100>
<section class="expandable">
<p class="showalways">Bitmask for BQA</p>
<ul>
<li>
            Bit 0: Designated Fill
<ul>
<li>0: No</li>
<li>1: Yes</li>
</ul>
</li>
<li>
            Bit 1: Terrain Occlusion
<ul>
<li>0: No</li>
<li>1: Yes</li>
</ul>
</li>
<li>
            Bits 2-3: Radiometric Saturation
<ul>
<li>0: No bands contain saturation</li>
<li>1: 1-2 bands contain saturation</li>
<li>2: 3-4 bands contain saturation</li>
<li>3: 5 or more bands contain saturation</li>
</ul>
</li>
<li>
            Bit 4: Cloud
<ul>
<li>0: No</li>
<li>1: Yes</li>
</ul>
</li>
<li>
            Bits 5-6: Cloud Confidence
<ul>
<li>0: Not Determined / Condition does not exist.</li>
<li>1: Low, (0-33 percent confidence)</li>
<li>2: Medium, (34-66 percent confidence)</li>
<li>3: High, (67-100 percent confidence)</li>
</ul>
</li>
<li>
            Bits 7-8: Cloud Shadow Confidence
<ul>
<li>0: Not Determined / Condition does not exist.</li>
<li>1: Low, (0-33 percent confidence)</li>
<li>2: Medium, (34-66 percent confidence)</li>
<li>3: High, (67-100 percent confidence)</li>
</ul>
</li>
<li>
            Bits 9-10: Snow / Ice Confidence
<ul>
<li>0: Not Determined / Condition does not exist.</li>
<li>1: Low, (0-33 percent confidence)</li>
<li>2: Medium, (34-66 percent confidence)</li>
<li>3: High, (67-100 percent confidence)</li>
</ul>
</li>
<li>
            Bits 11-12: Cirrus Confidence
<ul>
<li>0: Not Determined / Condition does not exist.</li>
<li>1: Low, (0-33 percent confidence)</li>
<li>2: Medium, (34-66 percent confidence)</li>
<li>3: High, (67-100 percent confidence)</li>
</ul>
</li>
</ul>
</section>
</td>
</tr>
</table>
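
The BQA bitmask described in the table can be decoded with bit shifts and masks. The following is a minimal sketch in plain Python, not Earth Engine code; the helper name `decode_bqa` and the sample value are illustrative assumptions.

```python
def decode_bqa(bqa):
    """Decode the fields of a Landsat Collection 1 BQA value (an integer)."""
    return {
        'designated_fill': bqa & 1,                 # bit 0
        'terrain_occlusion': (bqa >> 1) & 1,        # bit 1
        'radiometric_saturation': (bqa >> 2) & 3,   # bits 2-3
        'cloud': (bqa >> 4) & 1,                    # bit 4
        'cloud_confidence': (bqa >> 5) & 3,         # bits 5-6
        'cloud_shadow_confidence': (bqa >> 7) & 3,  # bits 7-8
        'snow_ice_confidence': (bqa >> 9) & 3,      # bits 9-10
        'cirrus_confidence': (bqa >> 11) & 3,       # bits 11-12
    }

# Hypothetical value: cloud bit set and high (3) cloud confidence
sample = (1 << 4) | (3 << 5)
flags = decode_bqa(sample)
print(flags['cloud'], flags['cloud_confidence'])
```

A two-bit field such as cloud confidence takes values 0 to 3, matching the four confidence levels listed in the table.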

Now let's import the collection:

In [None]:
# Load the Landsat 8 Collection 1 Tier 1 calibrated top-of-atmosphere (TOA) reflectance
collection = ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA')

### Introduction to GEE: Filtering Image Collection and Selecting Bands


The function `ImageCollection.select(selectors, names)` is used to select bands from each image in a collection. <br/>
`selectors`: An array of names, regexes or numeric indices specifying the bands to select. <br/>
Let's select the RGB bands from the image collection.

In [None]:
trueColor432 = collection.select(['B4', 'B3', 'B2'])

The function `ImageCollection.filterDate(start, end)` filters a collection by a date range. The start and end may be Dates, numbers (interpreted as milliseconds since 1970-01-01T00:00:00Z), or strings (such as '1996-01-01T08:00'). The filter is applied to the 'system:time_start' property of each image. <br/>

`start` (Date|Number|String): The start date (inclusive). <br/>
`end` (Date|Number|String, optional): The end date (exclusive). If not specified, a 1-millisecond range starting at `start` is created.
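
The inclusive-start, exclusive-end behaviour can be illustrated without Earth Engine. This is a plain-Python sketch of the date arithmetic only; the helper names `to_millis` and `in_date_range` are assumptions made for illustration, and date strings are interpreted as UTC.

```python
from datetime import datetime, timezone

def to_millis(date_string):
    """Convert an ISO date string (interpreted as UTC) to milliseconds since the epoch."""
    dt = datetime.fromisoformat(date_string).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def in_date_range(time_start_ms, start, end):
    """Mimic the half-open interval: start is inclusive, end is exclusive."""
    return to_millis(start) <= time_start_ms < to_millis(end)

# An image stamped exactly at the start date is kept ...
print(in_date_range(to_millis('2017-01-01'), '2017-01-01', '2020-01-01'))
# ... while one stamped exactly at the end date is dropped.
print(in_date_range(to_millis('2020-01-01'), '2017-01-01', '2020-01-01'))
```

To include all of December 31 in a filter, the end date therefore has to be January 1 of the following year.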

Let's set up some objects to help us filter by date:

In [None]:
# Initial date of interest (inclusive).
i_date = '2017-01-01'

# Final date of interest (exclusive).
f_date = '2020-01-01'

In [None]:
# Filter the selection B4, B3, B2 by the dates
rgb_17_20 = trueColor432.filterDate(i_date, f_date)

## Question 5.2.1
Select the Near Infrared, Shortwave Infrared 1, and Shortwave Infrared 2 bands from the USGS Landsat 8 Collection 1 Tier 1 TOA Reflectance dataset and filter the data to the period from January 1, 2015 to December 31, 2015.

## Introduction to GEE: Time series

We can get geospatial information about a particular area of interest (AOI) very easily using GEE. It often makes sense to analyze the data as a time series, using charts and summary statistics to produce insights about our AOI as it changes over time. To do that, we first need to isolate our geographic AOI. Let's first import image data for a particular location using the `getRegion()` method.

The function `ImageCollection.getRegion(geometry, scale, crs, crsTransform)` can be used to output an array of values for each [pixel, band, image] tuple in an ImageCollection. The output contains rows of id, lon, lat, time, and all bands for each image that intersect each pixel in the given region. Attempting to extract more than 1048576 values will result in an error.<br/>

<table class="blue"><thead><th>Usage</th><th>Returns</th></thead><tbody><tr><td><code translate="no" dir="ltr"><span class="bound-function">ImageCollection.</span><span>getRegion</span>(geometry, <i>scale</i>, <i>crs</i>, <i>crsTransform</i>)</code></td><td>List</td></tr></tbody></table><table class="details"><thead><th>Argument</th><th>Type</th><th>Details</th></thead><tbody><tr><td>this: <code translate="no" dir="ltr">collection</code></td><td>ImageCollection</td><td>The image collection to extract data from.</td></tr><tr><td><code translate="no" dir="ltr">geometry</code></td><td>Geometry</td><td>The region over which to extract data.</td></tr><tr class="docs-arg-optional"><td><code translate="no" dir="ltr">scale</code></td><td>Float, default: null</td><td>A nominal scale in meters of the projection to work in.</td></tr><tr class="docs-arg-optional"><td><code translate="no" dir="ltr">crs</code></td><td>Projection, optional</td><td>The projection to work in. If unspecified, defaults to EPSG:4326. If specified in addition to scale, the projection is rescaled to the specified scale.</td></tr><tr class="docs-arg-optional"><td><code translate="no" dir="ltr">crsTransform</code></td><td>List, default: null</td><td>The array of CRS transform values.  This is a row-major ordering of a 3x2 affine transform.  This option is mutually exclusive with the scale option, and will replace any transform already set on the given projection.</td></tr></tbody></table>
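
Once fetched with `getInfo()`, the result is a list of rows whose first row is the header. The sketch below uses a small mocked list in that layout rather than a real Earth Engine call; the image ids and coordinates are made up, while the time and `B4` value of the first row are taken from the expected output shown later in this exercise.

```python
from datetime import datetime, timedelta, timezone

# Mocked result: a header row, then one row per (pixel, image).
# The ids and coordinates are made up for illustration.
mock_region = [
    ['id', 'longitude', 'latitude', 'time', 'B4'],
    ['image_a', 30.45, -1.98, 1368950995220, 0.083271],
    ['image_b', 30.45, -1.98, 1370333397120, None],  # no data at this pixel
]

header, rows = mock_region[0], mock_region[1:]
records = [dict(zip(header, row)) for row in rows]

# 'time' is in milliseconds since 1970-01-01T00:00:00Z
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
first_time = epoch + timedelta(milliseconds=records[0]['time'])
print(records[0]['B4'], first_time.isoformat())
```

Rows whose band value is `None` carry no data for that pixel and usually need to be dropped before analysis.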


Let's try it out:

In [None]:
# Specify a latitude and longitude and create a point geometry object from it
rw_lat, rw_lon = -1.9812595147561296, 30.454505195296836
rw_poi = ee.Geometry.Point(rw_lon, rw_lat)  # create a point data structure
scale = 1000 # meters

In [None]:
# Use the point geometry to specify a point of interest (POI)
toa_rw_poi = collection.getRegion(rw_poi,  scale).getInfo()
print(len(toa_rw_poi))

In [None]:
# Inspect some examples
toa_rw_poi[:5]

## Question 5.2.2
Complete the function below, which converts the data in `toa_rw_poi` to a Pandas DataFrame, and use it to convert your RGB and Coastal Aerosol band data.

**Expected Output of conversion:**
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>time</th>
      <th>datetime</th>
      <th>B4</th>
      <th>B3</th>
      <th>B2</th>
      <th>B1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1368950995220</td>
      <td>2013-05-19 08:09:55.220</td>
      <td>0.083271</td>
      <td>0.103904</td>
      <td>0.110392</td>
      <td>0.130476</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1370333397120</td>
      <td>2013-06-04 08:09:57.120</td>
      <td>0.139077</td>
      <td>0.151624</td>
      <td>0.164563</td>
      <td>0.181041</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1371715789660</td>
      <td>2013-06-20 08:09:49.660</td>
      <td>0.105830</td>
      <td>0.122555</td>
      <td>0.143716</td>
      <td>0.166191</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1373098193550</td>
      <td>2013-07-06 08:09:53.550</td>
      <td>0.087905</td>
      <td>0.090236</td>
      <td>0.098570</td>
      <td>0.116454</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1374480591687</td>
      <td>2013-07-22 08:09:51.687</td>
      <td>0.100140</td>
      <td>0.100164</td>
      <td>0.111266</td>
      <td>0.129167</td>
    </tr>
  </tbody>
</table>

In [None]:
def ee_array_to_df(arr, list_of_bands):
    """Transforms a client-side ee.ImageCollection.getRegion array to a pandas.DataFrame."""
    # Simply convert the input array to a Pandas DF

    # Rearrange the header.

    # Remove rows without data inside.

    # Convert the data to numeric values.

    # Convert the time field into a datetime.

    # Keep the columns of interest.

    # Return the data frame
    

In [None]:
toa_rw_df = ee_array_to_df(toa_rw_poi, ['B4', 'B3', 'B2', 'B1'])

In [None]:
toa_rw_df.head()

## Question 5.2.3
Save `toa_rw_df` as a CSV file named "toa_rw.csv" and download the file for submission.

## Visualize Time Series Data
We will now study the trend of the Coastal aerosol band.

In [None]:
# extract and transform dataframe columns for plotting
x = np.asanyarray(toa_rw_df['time'].apply(float))
y = np.asanyarray(toa_rw_df['B1'].apply(float))

In [None]:
fig, ax = plt.subplots(figsize=(14, 6))
ax.scatter(toa_rw_df['datetime'], toa_rw_df['B1'],
           c='black', alpha=0.2, label='Coastal Aerosol')

ax.set_title('Coastal Aerosol Near Rwamagana', fontsize=16)
ax.set_xlabel('Date', fontsize=14)
ax.set_ylabel('Coastal Aerosol', fontsize=14)

## <font color=orange>Discussion:</font>
What are your observations about the plot? What do you think about the choice of bands for this analysis?

## Question 5.2.4: Application Exercise -- Time Series Application Example
Please complete the following steps


1. Use the `ee` library to import the MODIS land surface temperature collection. The name of the collection is 'MODIS/006/MOD11A1'.
2. Filter the imported collection for the period January 1, 2017 to December 31, 2019 (inclusive) and select the band named `LST_Day_1km`.
3. Define a Point of Interest (POI) at 1.9441° South, 30.0619° East.
4. Get the time series for the POI with a scale of 1000 meters.
5. Use the function you defined above to convert the dataset into a Pandas DataFrame.
6. Use a scatter plot to visualize the trend.
7. Download the plot for submission.





In [None]:
%matplotlib inline




## <font color=orange>Discussion:</font>
Describe the trend you see in the data. <br/>Answer: Sinusoidal
<br/>
Does this match your intuition of what the surface temperature trend at the POI should look like? Why or why not?


## Reducing an ImageCollection
It is sometimes useful to aggregate a particular AOI to extract insights. Reduction is a method for creating a composite of all the images in a collection into a single image. The composite can represent summary statistics: for example, the min, max, mean or standard deviation of the images. 
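
The idea of a per-pixel reduction can be shown on a toy example. This is a plain-Python analogy, not Earth Engine code: three tiny 2x2 single-band "images" with made-up values stand in for a collection, and `reduce_stack` is a hypothetical helper that applies a reducer at each pixel position.

```python
from statistics import median

# Three tiny single-band "images" (2x2 pixels) standing in for a collection.
stack = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.12, 0.90], [0.28, 0.41]],  # 0.90 mimics a bright cloudy pixel
    [[0.11, 0.22], [0.95, 0.39]],  # 0.95 likewise
]

def reduce_stack(images, reducer):
    """Apply `reducer` to the values at each pixel position across all images."""
    n_rows, n_cols = len(images[0]), len(images[0][0])
    return [[reducer([img[r][c] for img in images]) for c in range(n_cols)]
            for r in range(n_rows)]

composite = reduce_stack(stack, median)
print(composite)
```

The median ignores the two outlying bright values, which is one reason median composites are a common first step for suppressing clouds.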

NOTE - in the code below we are using the Landsat World Reference System (WRS) coordinates rather than latitude and longitude. For information, please see: https://landsat.gsfc.nasa.gov/about/the-worldwide-reference-system/

In [None]:
# Pull image data for a particular date range and Landsat WRS path/row
collection = (ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA')
              .filter(ee.Filter.eq('WRS_PATH', 44))
              .filter(ee.Filter.eq('WRS_ROW', 34))
              .filterDate('2014-01-01', '2015-01-01'))

# Compute a median image and display.
median = collection.median()

dicc = {
    'median' : median.getMapId({'bands': ['B4', 'B3', 'B2'], 'max': 0.3})
}

# Display the results
center = [37.7726, -122.3578]
Mapdisplay(center, dicc, zoom_start= 12)

## Crop Yield Estimation
Now that we are familiar with the basic operations in GEE let's use what we learned to build parts of the pipeline for crop yield estimation (we will be working with this again in the next module where we train a machine learning model to predict crop yields). 

We will use the Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A dataset for this task. Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas.<br/>

Here is the description of the bands in the Harmonized Sentinel-2 MSI dataset.
<table class="eecat">
<tbody><tr>
<th scope="col">Name</th>
<th scope="col">Units</th>
<th scope="col">Min</th>
<th scope="col">Max</th>
<th scope="col">Scale</th>
<th scope="col">Pixel Size</th>
<th scope="col">Wavelength</th>
<th scope="col">Description</th>
</tr>
<tr>
<td><code translate="no" dir="ltr">B1</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      60 meters
</td>
<td>443.9nm (S2A) / 442.3nm (S2B)</td>
<td><p>Aerosols</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B2</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      10 meters
</td>
<td>496.6nm (S2A) / 492.1nm (S2B)</td>
<td><p>Blue</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B3</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      10 meters
</td>
<td>560nm (S2A) / 559nm (S2B)</td>
<td><p>Green</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B4</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      10 meters
</td>
<td>664.5nm (S2A) / 665nm (S2B)</td>
<td><p>Red</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B5</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      20 meters
</td>
<td>703.9nm (S2A) / 703.8nm (S2B)</td>
<td><p>Red Edge 1</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B6</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      20 meters
</td>
<td>740.2nm (S2A) / 739.1nm (S2B)</td>
<td><p>Red Edge 2</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B7</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      20 meters
</td>
<td>782.5nm (S2A) / 779.7nm (S2B)</td>
<td><p>Red Edge 3</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B8</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      10 meters
</td>
<td>835.1nm (S2A) / 833nm (S2B)</td>
<td><p>NIR</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B8A</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      20 meters
</td>
<td>864.8nm (S2A) / 864nm (S2B)</td>
<td><p>Red Edge 4</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B9</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      60 meters
</td>
<td>945nm (S2A) / 943.2nm (S2B)</td>
<td><p>Water vapor</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B11</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      20 meters
</td>
<td>1613.7nm (S2A) / 1610.4nm (S2B)</td>
<td><p>SWIR 1</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">B12</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.0001</td>
<td>
      20 meters
</td>
<td>2202.4nm (S2A) / 2185.7nm (S2B)</td>
<td><p>SWIR 2</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">AOT</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td>0.001</td>
<td>
      10 meters
</td>
<td></td>
<td><p>Aerosol Optical Thickness</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">WVP</code></td>
<td>cm</td>
<td>
</td>
<td>
</td>
<td>0.001</td>
<td>
      10 meters
</td>
<td></td>
<td><p>Water Vapor Pressure. The height the water would occupy if the vapor were condensed into
liquid and spread evenly across the column.</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">SCL</code></td>
<td></td>
<td>
          1
</td>
<td>
          11
</td>
<td></td>
<td>
      20 meters
</td>
<td></td>
<td><p>Scene Classification Map (The "No Data" value of 0 is masked out)</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">TCI_<wbr>R</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td></td>
<td>
      10 meters
</td>
<td></td>
<td><p>True Color Image, Red channel</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">TCI_<wbr>G</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td></td>
<td>
      10 meters
</td>
<td></td>
<td><p>True Color Image, Green channel</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">TCI_<wbr>B</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td></td>
<td>
      10 meters
</td>
<td></td>
<td><p>True Color Image, Blue channel</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">MSK_<wbr>CLDPRB</code></td>
<td></td>
<td>
          0
</td>
<td>
          100
</td>
<td></td>
<td>
      20 meters
</td>
<td></td>
<td><p>Cloud Probability Map (missing in some products)</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">MSK_<wbr>SNWPRB</code></td>
<td></td>
<td>
          0
</td>
<td>
          100
</td>
<td></td>
<td>
      10 meters
</td>
<td></td>
<td><p>Snow Probability Map (missing in some products)</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">QA10</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td></td>
<td>
      10 meters
</td>
<td></td>
<td><p>Always empty</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">QA20</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td></td>
<td>
      20 meters
</td>
<td></td>
<td><p>Always empty</p></td>
</tr>
<tr>
<td><code translate="no" dir="ltr">QA60</code></td>
<td></td>
<td>
</td>
<td>
</td>
<td></td>
<td>
      60 meters
</td>
<td></td>
<td><p>Cloud mask</p></td>
</tr>
<tr class="alt">
<td colspan="100">
<devsite-expandable is-upgraded="" id="expandable-1"><a class="exw-control" aria-controls="expandable-1" aria-expanded="false" tabindex="0" role="button"><p class="showalways">Bitmask for QA60</p></a>

<ul>
<li>
            Bits 0-9: Unused
<ul>
</ul>
</li>
<li>
            Bit 10: Opaque clouds
<ul>
<li>0: No opaque clouds</li>
<li>1: Opaque clouds present</li>
</ul>
</li>
<li>
            Bit 11: Cirrus clouds
<ul>
<li>0: No cirrus clouds</li>
<li>1: Cirrus clouds present</li>
</ul>
</li>
</ul>
</devsite-expandable>
</td>
</tr>
</tbody></table>
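
Two entries in the table are easy to put to work: the scale factor of the reflectance bands and the QA60 cloud bits. The sketch below is plain Python operating on single integer values, not Earth Engine code; the function names and sample values are assumptions made for illustration.

```python
SCALE = 0.0001           # scale factor listed for the reflectance bands above
OPAQUE_CLOUD_BIT = 10    # QA60 bit 10: opaque clouds
CIRRUS_BIT = 11          # QA60 bit 11: cirrus clouds

def to_reflectance(stored_value):
    """Convert a stored band value to reflectance using the table's scale factor."""
    return stored_value * SCALE

def is_cloudy(qa60):
    """Return True if QA60 flags opaque or cirrus clouds."""
    mask = (1 << OPAQUE_CLOUD_BIT) | (1 << CIRRUS_BIT)
    return (qa60 & mask) != 0

print(is_cloudy(0), is_cloudy(1 << 10), is_cloudy(1 << 11))
```

A stored value of 2500 in band `B4`, for example, corresponds to a reflectance of about 0.25.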

### Compute NDVI
We will use the function `normalizedDifference` to compute NDVI. <br/>
`Image.normalizedDifference(bandNames)` computes the normalized difference between two bands. If the bands are not specified, the function uses the first two bands of the image. The normalized difference is computed as (first − second) / (first + second).
<table class="details"><thead><tr><th>Argument</th><th>Type</th><th>Details</th></tr></thead><tbody><tr><td>this: <code translate="no" dir="ltr">input</code></td><td>Image</td><td>The input image.</td></tr><tr class="docs-arg-optional"><td><code translate="no" dir="ltr">bandNames</code></td><td>List, default: null</td><td>A list of names specifying the bands to use. If not specified, the first and second bands are used.</td></tr></tbody></table>
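
The formula itself is simple to check on scalar values. This is a plain-Python sketch of the arithmetic, not the Earth Engine implementation; returning `None` for a zero denominator is an assumption of this sketch, and the reflectance values are made up.

```python
def normalized_difference(first, second):
    """(first - second) / (first + second); None when the denominator is zero."""
    total = first + second
    if total == 0:
        return None
    return (first - second) / total

# Hypothetical reflectances: vegetation reflects strongly in NIR and weakly in red.
nir, red = 0.5, 0.1
ndvi = normalized_difference(nir, red)
print(round(ndvi, 3))
```

For non-negative inputs the result always lies between -1 and 1, and swapping the two bands flips its sign.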

First we will create a function that computes the NDVI. 

## <font color=orange>Discussion:</font>
Refer to the table above listing the bands available in the collection. Which bands should be used to compute the NDVI? 
Answer: B8 and B4

In [None]:
def addNDVI(img):
  ndvi = img.normalizedDifference(['B8', 'B4']).rename('NDVI')
  return img.addBands(ndvi)


We will now use the function `map` to apply the function `addNDVI` to each image in the image collection.
`ee.ImageCollection.map`: Maps an algorithm over a collection.
<table class="details"><thead><tr><th>Argument</th><th>Type</th><th>Details</th></tr></thead><tbody><tr><td>this: <code translate="no" dir="ltr">collection</code></td><td>Collection</td><td>The Collection instance.</td></tr><tr><td><code translate="no" dir="ltr">algorithm</code></td><td>Function</td><td>The operation to map over the images or features of the collection. A Python function that receives an image or features and returns one. The function is called only once and the result is captured as a description, so it cannot perform imperative operations or rely on external state.</td></tr><tr class="docs-arg-optional"><td><code translate="no" dir="ltr">dropNulls</code></td><td>Boolean, optional</td><td>If true, the mapped algorithm is allowed to return nulls, and the elements for which it returns nulls will be dropped.</td></tr></tbody></table>

In [None]:
# Load the COPERNICUS/S2_SR_HARMONIZED dataset and filter it to October 1-14, 2021 (the end date '2021-10-15' is exclusive)
collection = ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED").filterDate('2021-10-01', '2021-10-15')

# Define a rectangular region of interest
roi = ee.Geometry.Rectangle(-71.17965, 42.35125, -71.08824, 42.40584)

# Keep only the images that intersect the ROI (filterBounds does not clip pixels) and apply the NDVI function
collection = collection.filterBounds(roi)
withNDVI = collection.map(addNDVI)


### Mosaicking
The term "mosaicking" refers to the process of combining multiple images into a single seamless image. The resulting mosaic is a composite representation of the input images, where **each pixel in the mosaic is selected from one of the input images based on a specified criterion**. This method is particularly useful in earth observation (EO) applications, where images can be affected by factors such as clouds, atmospheric conditions, and radiometric quality. By creating a mosaic based on a "quality band", you can ensure that the output mosaic has the best possible data quality, and avoid areas with low quality data in the individual images.

The `ImageCollection.qualityMosaic()` method is useful when the images in a collection differ in quality and you want a single composite. For each pixel location, it takes the band values from the image with the highest value in the specified quality band, so the output holds the best available observation at each pixel regardless of the overall quality of any individual image.

The method is also useful for time series: applied to a collection filtered to a specific time period, it produces a single mosaic holding the best observation per pixel over that period. <br/>

<table class="details"><thead><tr><th>Argument</th><th>Type</th><th>Details</th></tr></thead><tbody><tr><td>this: <code translate="no" dir="ltr">collection</code></td><td>ImageCollection</td><td>The collection to mosaic.</td></tr><tr><td><code translate="no" dir="ltr">qualityBand</code></td><td>String</td><td>The name of the quality band in the collection.</td></tr></tbody></table>
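
The per-pixel selection rule can be sketched on toy data. This is a plain-Python analogy, not the Earth Engine implementation: each "image" is a dictionary of flattened bands with made-up values, `quality_mosaic` is a hypothetical helper, and ties are resolved in favour of the earlier image purely as an artifact of this sketch.

```python
def quality_mosaic(images, quality_band):
    """For each pixel, keep all band values from the image whose
    `quality_band` value is highest at that pixel."""
    n_pixels = len(images[0][quality_band])
    bands = list(images[0].keys())
    mosaic = {band: [] for band in bands}
    for i in range(n_pixels):
        best = max(images, key=lambda img: img[quality_band][i])
        for band in bands:
            mosaic[band].append(best[band][i])
    return mosaic

# Two made-up images, each with a flattened 3-pixel 'NDVI' and 'B4' band.
images = [
    {'NDVI': [0.2, 0.7, 0.1], 'B4': [0.30, 0.05, 0.40]},
    {'NDVI': [0.6, 0.3, 0.1], 'B4': [0.08, 0.20, 0.35]},
]
result = quality_mosaic(images, 'NDVI')
print(result)
```

Note that every band of the output pixel comes from the same source image, so the `B4` values follow the NDVI winner rather than being chosen independently.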

Let's use the `ImageCollection.qualityMosaic(qualityBand)` to select the greenest pixel in the image collection at each pixel.

In [None]:
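# Landsat 8 NDVI uses B5 (near infrared) and B4 (red); B8 is the panchromatic band
def addNDVI_L8(img):
  return img.addBands(img.normalizedDifference(['B5', 'B4']).rename('NDVI'))

collection = ee.ImageCollection('LANDSAT/LC08/C01/T1_TOA')\
               .filterDate('2014-06-01', '2014-12-31')\
               .map(addNDVI_L8)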

In [None]:
# use the added NDVI band to mosaic the image collection
greenest = collection.qualityMosaic('NDVI')

In [None]:
# Display the map
center = [42.3712,-71.12532]
dicc = {
    'greenest': greenest.getMapId({'bands': ['B5', 'B4', 'B3'], 'min': 0, 'max': 0.4})
}
Mapdisplay(center, dicc, zoom_start=12)

## Question 5.2.5: Application Exercise

1. Complete the function `addNDWI` given below so that it computes the NDWI ([Click Here to See Formula](https://eos.com/make-an-analysis/ndwi/)).
2. For the ROI used above, compute the NDWI for the dates October 1, 2020 to October 14, 2020.
3. Use NDWI as the quality band to create a mosaic.
4. Display the result on a map with center [42.3712, -71.12532].

In [None]:
# define a function that computes NDWI

In [None]:
# Use the function above to compute NDWI band

# display on map with the given center
center = [42.3712,-71.12532]
dicc = {
    'moisture': moisture.getMapId({'bands': ['B5', 'B4', 'B3'], 'min': 0, 'max': 0.4})
}
Mapdisplay(center, dicc, zoom_start=12)

## Transferring Data from GEE to Local Setup


Up to now we have seen how to manipulate GEE data using different operations and how to prepare data for crop yield estimation. In the upcoming modules, we will use different Python libraries to train different models. Training is easiest when the data is on the same machine that trains the model. In this section, we will see how to transfer data to our machine (in this case, a Google Colab VM).

In [None]:
# Load a CSV containing the locations for which we would like to download data
locations = pd.read_csv('/content/drive/MyDrive/ML4EO_M5/5_02_Agricultural_Pipeline_using_GEE/data/locations_final.csv')[:5]   # Use only 5 examples for demonstration (downloading entire dataset takes too long)
locations.head()

In [None]:
def export_oneimage(img, folder, name, region, scale, crs):
  """Starts a GEE batch export of `img` to the service account's Google Drive and waits for it to finish."""
  task = ee.batch.Export.image.toDrive(
      image=img,
      description=name,
      folder=folder,
      fileNamePrefix=name,
      region=region,
      scale=scale,
      crs=crs
  )
  task.start()
  # Poll the task while it is queued (READY) or executing (RUNNING)
  while task.status()['state'] in ('READY', 'RUNNING'):
    print('Running...')
    # Wait for some time before showing an update
    time.sleep(100)
  print('Done..', task.status())
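
The polling loop in `export_oneimage` waits indefinitely if a task never leaves the queue. A hedged sketch of the same pattern with a timeout follows; `wait_for_task` and its parameters are hypothetical names, and the demonstration uses a fake state sequence instead of a real Earth Engine task.

```python
import time

ACTIVE_STATES = ('READY', 'RUNNING')  # the two states polled in export_oneimage

def wait_for_task(get_state, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll `get_state()` until it leaves ACTIVE_STATES or the timeout expires.

    Returns the final state string; raises TimeoutError if still active.
    """
    waited = 0
    while True:
        state = get_state()
        if state not in ACTIVE_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f'Task still {state} after {waited} seconds')
        sleep(poll_seconds)
        waited += poll_seconds

# Demonstration with a fake task that finishes on the third poll.
fake_states = iter(['READY', 'RUNNING', 'COMPLETED'])
print(wait_for_task(lambda: next(fake_states), poll_seconds=0))
```

With a real task, the call would look like `wait_for_task(lambda: task.status()['state'])`, which also avoids querying the status twice per iteration.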


Warning: the code chunk below might take a while to run. Good time to get a coffee!

In [None]:
imgcoll = ee.ImageCollection('MODIS/MOD09A1').filterBounds(ee.Geometry.Rectangle(-106.5, 50,-64, 23))

# Apply a median reduction to get a single composite image
img = imgcoll.median()
for loc1, loc2, lat, lon in locations.values:
    fname = '{}_{}'.format(int(loc1), int(loc2))

    offset = 0.11
    scale  = 500
    crs='EPSG:4326'

    region = str([
        [lat - offset, lon + offset],
        [lat + offset, lon + offset],
        [lat + offset, lon - offset],
        [lat - offset, lon - offset]])

    while True:
        try:
            export_oneimage(img,'GEE_Data',fname,region,scale,crs)
        except Exception as e:
            traceback.print_exc()
            print('retry')
            time.sleep(10)
            continue
        break

In [None]:
# Transfer the data from GCP service account drive to current working directory in Colaboratory
gauth = GoogleAuth()
scopes = ['https://www.googleapis.com/auth/drive']
gauth.credentials = ServiceAccountCredentials.from_json_keyfile_name(".private-key.json", scopes=scopes)

drive = GoogleDrive(gauth)

# get list of files
file_list = drive.ListFile({'q': "trashed=false"}).GetList()
print(f"{len(file_list)} files found")
for file_tiff in file_list:
    filename = file_tiff['title']
    if file_tiff['mimeType'] == 'image/tiff':
        # download file into working directory (in this case a tiff-file)
        file_tiff.GetContentFile(filename, mimetype="image/tiff")

        # delete file afterwards to keep the Drive empty
        file_tiff.Delete()