## Assignment 2: Learning About Datasets

Allie Cole, Clarissa Boyajian, Scout Leonard

In [1]:
# Import packages
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]:
ee.Authenticate()
ee.Initialize()


### 1.) Write code to load in the dataset 

The dataset we are interested in is Landsat 8 surface reflectance (Collection 1, Tier 1). We load it in the code cell below using the Google Earth Engine (GEE) Python API.

In [3]:
gdat = ee.ImageCollection("LANDSAT/LC08/C01/T1_SR")

Next, we create a test image from the first image in the Landsat 8 collection. We will use it to explore the features of the images in the collection.

In [5]:
# pull the first image in the collection 
testimg = gdat.first()

In [6]:
#extract a list containing the names of the bands 
bands = testimg.bandNames()
str(bands.getInfo())

"['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11', 'sr_aerosol', 'pixel_qa', 'radsat_qa']"

In [7]:
#select a mangrove region (the Sundarbans, in Bangladesh) to explore Landsat 8 data for

#lon and lat for the Sundarbans mangrove forest
sundarbans_lon = 89
sundarbans_lat = 21.37

#create point of interest for the Sundarbans (ee.Geometry.Point takes longitude first)
sundarbans_poi = ee.Geometry.Point(sundarbans_lon, sundarbans_lat)
scale = 1000   # scale in m

In [8]:
#explore a data feature of the Landsat images; we chose Band 2
B2 = gdat.select('B2')
sundarbans_B2 = B2.getRegion(sundarbans_poi, scale).getInfo()

In [9]:
#create a dataframe using Band 2 measurements for the Sundarbans mangrove forest over time
df = pd.DataFrame(sundarbans_B2)

print(df)

                        0          1          2              3     4
0                      id  longitude   latitude           time    B2
1    LC08_138045_20130708  89.000587  21.366429  1373258003390  1091
2    LC08_138045_20130724  89.000587  21.366429  1374640402070  1587
3    LC08_138045_20130825  89.000587  21.366429  1377405206730  1102
4    LC08_138045_20130910  89.000587  21.366429  1378787604620   198
..                    ...        ...        ...            ...   ...
156  LC08_138045_20210714  89.000587  21.366429  1626237078562  3196
157  LC08_138045_20210815  89.000587  21.366429  1629001892732  3953
158  LC08_138045_20210831  89.000587  21.366429  1630384297322  2219
159  LC08_138045_20210916  89.000587  21.366429  1631766701206   951
160  LC08_138045_20211002  89.000587  21.366429  1633149105920   550

[161 rows x 5 columns]
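
In the printout above, the column names (`id`, `longitude`, `latitude`, `time`, `B2`) appear as row 0 rather than as headers, and `time` is a timestamp in milliseconds since the Unix epoch. The sketch below shows one way to tidy this up. It uses only the standard library and the first two data rows copied from the output above; the helper name `rows_to_records` is ours, not part of the Earth Engine API.

```python
from datetime import datetime, timezone

# Header row plus the first two data rows, copied from the printed output above
region_rows = [
    ["id", "longitude", "latitude", "time", "B2"],
    ["LC08_138045_20130708", 89.000587, 21.366429, 1373258003390, 1091],
    ["LC08_138045_20130724", 89.000587, 21.366429, 1374640402070, 1587],
]

def rows_to_records(rows):
    """Use the first row as field names and add a UTC datetime for each record."""
    header, *data = rows
    records = []
    for row in data:
        record = dict(zip(header, row))
        # 'time' is in milliseconds since the epoch, so divide by 1000
        record["datetime"] = datetime.fromtimestamp(record["time"] / 1000, tz=timezone.utc)
        records.append(record)
    return records

records = rows_to_records(region_rows)
days_between = (records[1]["datetime"] - records[0]["datetime"]).days
```

With pandas, the same idea would be `pd.DataFrame(sundarbans_B2[1:], columns=sundarbans_B2[0])` followed by `pd.to_datetime(df['time'], unit='ms')`.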


[Landsat 8](https://developers.google.com/earth-engine/datasets/catalog/landsat-8) data is accessible through GEE, and the data collection is supported jointly by NASA and the USGS. 
Landsat 8 data can be downloaded in several formats, depending on how you want to work with it, and different products offer different combinations of bands. The data in these products come from USGS as .TIFF and .JPEG files.

[Metadata](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_SR) on GEE describes how the data were collected (atmospherically corrected surface reflectance from the Landsat 8 OLI/TIRS sensors). It also describes the contents of the .TIFF and .JPEG images in the dataset: 5 visible and near-infrared (VNIR) bands, 2 short-wave infrared (SWIR) bands, and 2 thermal infrared (TIR) bands, with short descriptions of how these were processed. It includes links to USGS descriptions of band metadata.

### 2.) Investigate data quality

Landsat 8 revisits a location every 8-16 days: its own repeat cycle is 16 days, and the 8-day figure applies when its acquisitions are combined with those of a second Landsat satellite. This is a higher revisit rate than many other satellites offer. Additionally, Landsat 8 captures the multispectral bands necessary for vegetation indices such as the Normalized Difference Vegetation Index (NDVI). The Landsat 8 resolution is 30 meters for the visible, NIR, and SWIR bands; 100 meters for the thermal bands; and 15 meters for the panchromatic band.

The frequent revisit rate and free access make this a good dataset for our purposes. The downside is the relatively coarse resolution, which limits how accurately vegetation cover percentages can be calculated. Mangroves are not a homogeneous land cover type, but many analyses of mangroves (such as those based on NDVI) treat each image as a single land cover type. A single image can contain mangroves as well as sand, water, and other debris or detritus.

One solution is to use multiple indices to estimate the canopy cover percentage of one type of vegetation more accurately, without needing higher-resolution images.
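
To make the mixed-cover problem concrete, here is a minimal sketch of how NDVI behaves when an area contains more than one cover type. It assumes a simple linear mixing model (the observed reflectance is the area-weighted average of the pure cover types), and the reflectance values are hypothetical, chosen only for illustration.

```python
# Hypothetical surface reflectances (red, NIR) for two pure cover types
ENDMEMBERS = {
    "mangrove": (0.04, 0.40),
    "water": (0.03, 0.01),
}

def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    return (nir - red) / (nir + red)

def mixed_ndvi(fractions):
    """NDVI of an area whose reflectance is an area-weighted mix of cover types."""
    red = sum(f * ENDMEMBERS[name][0] for name, f in fractions.items())
    nir = sum(f * ENDMEMBERS[name][1] for name, f in fractions.items())
    return ndvi(red, nir)

pure_mangrove = mixed_ndvi({"mangrove": 1.0})
half_mangrove = mixed_ndvi({"mangrove": 0.5, "water": 0.5})
pure_water = mixed_ndvi({"water": 1.0})
```

With these made-up numbers, the half-covered case gives an NDVI of roughly 0.71 against roughly 0.82 for pure mangrove, so halving the cover does not halve the index. This is one reason NDVI alone is hard to read as a cover percentage.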

### 3.) Consider appropriate use cases

Landsat 8 is the best option for projects with limited financial resources or in remote locations, as it is freely available and covers the globe. Projects that can afford it should consider paying for higher-resolution imagery. For projects using Landsat 8, we recommend using multiple indices to increase the accuracy of land cover estimates. We will use the 30 m resolution bands, as those are the ones appropriate to our analysis, and we will calculate NDVI as well as the Normalized Difference Infrared Index (NDII).
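
Both indices named above are normalized differences of two bands. The sketch below computes them for a single pixel in plain Python. The pixel values are hypothetical, and pairing NIR (`B5`) with the first short-wave infrared band (`B6`) for NDII is our assumption about which SWIR band would be used.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None

# Hypothetical scaled surface-reflectance values for one vegetated pixel
pixel = {"B4": 400, "B5": 4000, "B6": 1500}  # red, NIR, SWIR1

ndvi = normalized_difference(pixel["B5"], pixel["B4"])  # (NIR - red) / (NIR + red)
ndii = normalized_difference(pixel["B5"], pixel["B6"])  # (NIR - SWIR1) / (NIR + SWIR1)
```

Because each index is a ratio, the integer scaling applied to the surface reflectance bands cancels out, so the indices can be computed from the stored values directly.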




In [10]:
#select the region: Cacheu National Park Mangrove Forest
#ee.Geometry.Point expects (longitude, latitude), so longitude goes first
pt = ee.Geometry.Point(-16.283462202, 12.1582165861)

In [11]:
#filter image collection using our point in Cacheu National Park Mangrove Forest
gdat_filt = gdat.filterBounds(pt)

In [12]:
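#visualization parameters for a true-color (red, green, blue) composite
#surface reflectance bands are stored scaled by 10000, so 3000 corresponds to a reflectance of 0.3
visParams = {'bands': ['B4', 'B3', 'B2'],
             'min': 0,
             'max': 3000
            }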

In [13]:
map_ndvi = geemap.Map(center = [12.1582165861, -16.283462202], zoom = 12)
map_ndvi


In [35]:
map_ndvi.addLayer(gdat_filt, visParams)

In [36]:
# sort by the CLOUD_COVER property (ascending) and take the least cloudy image
gdat_leastcloudy = gdat_filt.sort('CLOUD_COVER').first()

In [37]:
# Function to calculate NDVI for a given input image
def addNDVI(image):
    red = image.select('B4')
    nir = image.select('B5')
    
    ndvi = (nir.subtract(red)).divide((nir.add(red))).rename('NDVI')
    
    return image.addBands(ndvi)

In [38]:
gdat_withndvi = gdat_filt.map(addNDVI)

In [39]:
ndviParams = {'bands': 'NDVI',
              'min': -1, 
              'max': 1, 
              'palette': ['blue', 'white', 'green']
             }

map_ndvi.addLayer(gdat_withndvi, ndviParams, "NDVI")

In [46]:
# Use filter to extract all "non-cloudy" images: ones with less than 20% cloud cover
# CLOUD_COVER is stored as a percentage (0-100), so the threshold is 20, not 0.2
dat_nocld = gdat_withndvi.filter(ee.Filter.lt('CLOUD_COVER', 20))

In [47]:
#date filter: keep 2014 images and average them (the end date is exclusive)
date_2014 = dat_nocld.filter(ee.Filter.date('2014-01-01', '2015-01-01')).mean()

In [48]:
map_ndvi.addLayer(date_2014, ndviParams, 'NDVI 2014 mean')
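
The last three cells filter by cloud cover, filter by date, and then average what remains. As a rough illustration of that logic outside Earth Engine, the sketch below applies the same steps to a handful of hypothetical scenes at a single pixel. It reflects two details of the real API: `CLOUD_COVER` is a percentage from 0 to 100, and the end date of `ee.Filter.date` is exclusive. The scene values and the helper `mean_ndvi` are made up for this example.

```python
from datetime import datetime, timezone

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

# Hypothetical scenes: (acquisition time, CLOUD_COVER in percent, NDVI at one pixel)
scenes = [
    (utc(2014, 1, 12), 3.5, 0.71),
    (utc(2014, 3, 1), 64.0, 0.12),    # dropped: too cloudy
    (utc(2014, 11, 28), 12.0, 0.65),
    (utc(2015, 1, 1), 1.0, 0.80),     # dropped: the end date is exclusive
]

def mean_ndvi(scenes, start, end, max_cloud_pct):
    """Mean NDVI of scenes with start <= time < end and cloud cover below the threshold."""
    kept = [ndvi for t, cloud, ndvi in scenes
            if start <= t < end and cloud < max_cloud_pct]
    return sum(kept) / len(kept) if kept else None

mean_2014 = mean_ndvi(scenes, utc(2014, 1, 1), utc(2015, 1, 1), max_cloud_pct=20)
too_strict = mean_ndvi(scenes, utc(2014, 1, 1), utc(2015, 1, 1), max_cloud_pct=0.2)
```

Here a threshold of 0.2 leaves no scenes at all, which shows how much the choice of units for the cloud-cover threshold matters.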
