<a href="https://colab.research.google.com/github/acoiman/lulc_zamora_1986_2016/blob/master/reference_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Collect ground-truth data using the GEE Python API

This notebook shows how to collect ground-truth data with [Google Earth Engine](https://earthengine.google.com/) (GEE) and how to clean that data with the [geopandas](https://geopandas.org/) and [numpy](https://numpy.org/) libraries.

This notebook has two parts. In part 1 we collect ground-truth data using the GEE Python library and the Copernicus Global Land Cover Layers: [CGLS-LC100](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_Landcover_100m_Proba-V_Global) collection 2. In part 2 we clean the ground-truth samples: we keep only the columns of interest and, for each sample, take the name of the column (land cover class) that holds the maximum value.

To run this notebook, use [Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) and authenticate GEE with your own credentials. You must also upload the `srsample` shapefile, located in the [shp folder](../shp), as a feature collection using the GEE [code editor](https://code.earthengine.google.com/).

## Part 1: collecting ground-truth samples

In [0]:
# authenticate GEE
!earthengine authenticate

In [0]:
# import pretty print package
import pprint
# import earth engine (ee) python package
import ee 
# initialize the Earth Engine object, using the authentication credentials.
ee.Initialize() 

In [0]:
# CGLS-LC100 image collection
imgcollection = ee.ImageCollection('COPERNICUS/Landcover/100m/Proba-V/Global')

In [0]:
# list of bands to be used 
bands = ['forest_type',
         'bare-coverfraction',
         'crops-coverfraction',
         'grass-coverfraction',
         'moss-coverfraction',
         'shrub-coverfraction',
         'tree-coverfraction',
         'snow-coverfraction',
         'urban-coverfraction',
         'water-permanent-coverfraction',
         'water-seasonal-coverfraction'
         ]

In [0]:
# select bands
imgcollection2 = imgcollection.select(bands)

In [0]:
# load feature collection from assets
newft = ee.FeatureCollection('users/abrahamcoiman/zamora_2016/srsample')  # use your own path

In [0]:
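# perform the reduction: average the collection per pixel, then take the
# maximum value of each band over the pixels of every sample feature
gt = imgcollection2.mean().reduceRegions(
    reducer=ee.Reducer.max(),
    collection=newft,
    scale=100)  # nominal pixel size in meters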
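
The reduction above happens in two steps, and a small NumPy sketch may make them easier to follow. The array values and the region mask below are made up for illustration, and the sketch ignores the masking and reprojection that GEE handles internally.

```python
import numpy as np

# Hypothetical stack of 3 images, each 2x2 pixels, for a single band
# (for example a cover fraction in percent). The values are made up.
stack = np.array([
    [[10.0, 20.0], [30.0, 40.0]],
    [[20.0, 30.0], [40.0, 50.0]],
    [[30.0, 40.0], [50.0, 60.0]],
])

# Step 1: per-pixel mean across the collection, analogous to .mean().
composite = stack.mean(axis=0)

# Step 2: maximum over the pixels inside one sample region, analogous to
# ee.Reducer.max() applied per feature. The mask is hypothetical.
region_mask = np.array([[True, True], [False, False]])
region_max = composite[region_mask].max()

print(composite)
print(region_max)
```

In the notebook this is repeated for every selected band and every feature in `newft`, so each sample ends up with one value per band.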

In [0]:
# export the feature collection to Google Drive as a GeoJSON file
task = ee.batch.Export.table.toDrive(**{
  'collection': gt,
  'description':'reference',
  'fileFormat': 'GeoJSON'
})
task.start()
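
The export runs asynchronously, so the GeoJSON file is not in Drive as soon as `task.start()` returns. A minimal polling sketch follows; `wait_for_task` is a hypothetical helper, not part of the `ee` package, and it assumes that `task.status()` returns a dict with a `'state'` key taking values such as `READY`, `RUNNING`, `COMPLETED` or `FAILED`.

```python
import time


def wait_for_task(task, poll_seconds=10, timeout_seconds=3600):
    """Poll task.status() until the task is no longer queued or running.

    `task` is any object whose status() method returns a dict with a
    'state' key (assumed here to match what ee.batch.Task reports).
    Returns the final state string.
    """
    pending_states = {'UNSUBMITTED', 'READY', 'RUNNING'}
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = task.status().get('state')
        if state not in pending_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f'task still {state} after {timeout_seconds} s')
        time.sleep(poll_seconds)
```

Calling `wait_for_task(task)` after `task.start()` would block the cell until the export finishes or fails. Part 2 should only be run once the returned state is `COMPLETED`.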

## Part 2: cleaning the ground-truth dataset

In [0]:
# install geopandas
!pip install geopandas

In [0]:
# import modules
import geopandas as gpd
import numpy as np

In [0]:
# activate your Drive
from google.colab import drive
drive.mount('/content/drive')

In [0]:
# reference the geojson file path
fname = "/content/drive/My Drive/reference.geojson"

In [0]:
# load geojson file
df = gpd.read_file(fname)

In [0]:
# create a new column holding, for each row, the name of the column (land cover class) with the maximum value
df['referenceX'] = df[['bare-coverfraction', 'crops-coverfraction',
       'forest_type', 'grass-coverfraction', 'moss-coverfraction',
       'shrub-coverfraction', 'snow-coverfraction', 'tree-coverfraction',
       'urban-coverfraction', 'water-permanent-coverfraction',
       'water-seasonal-coverfraction']].idxmax(axis=1)
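
To see what `idxmax(axis=1)` returns, here is a small sketch on a toy table. The three columns reuse band names from the notebook, but the values are made up.

```python
import pandas as pd

# Toy cover fractions (percent) for three hypothetical samples.
toy = pd.DataFrame({
    'crops-coverfraction': [10.0, 70.0, 5.0],
    'tree-coverfraction': [80.0, 20.0, 5.0],
    'urban-coverfraction': [10.0, 10.0, 90.0],
})

# For each row, return the name of the column holding the largest value.
dominant = toy.idxmax(axis=1)
print(dominant.tolist())
```

When two columns tie for the maximum, `idxmax` returns the first of them in column order. Also note that `forest_type` appears to be a categorical code rather than a percentage, so comparing it with the cover fractions may not be meaningful.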

In [0]:
# drop unneeded columns
columntokeep = ['CLASS_NAME', 'referenceX', 'geometry']
df = df[columntokeep]

In [0]:
# rename column
df = df.rename(columns = {'CLASS_NAME':'LandCover'})

In [0]:
# function to label reference data; classes not listed below return None
def label_ref(row):
    if row['referenceX'] == 'urban-coverfraction':
        return 1
    if row['referenceX'] == 'water-permanent-coverfraction':
        return 2
    if row['referenceX'] == 'tree-coverfraction':
        return 3
    # shrub and grass are merged into a single class
    if row['referenceX'] == 'shrub-coverfraction':
        return 4
    if row['referenceX'] == 'grass-coverfraction':
        return 4
    if row['referenceX'] == 'crops-coverfraction':
        return 5

In [0]:
# apply function to label reference data
df['Reference'] = df.apply(label_ref, axis=1)
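
As an alternative to the chain of `if` statements, the same labelling could be written as a dictionary lookup with `Series.map`. This is a sketch on a toy frame; `REFERENCE_CODES` is a hypothetical name, and its codes are copied from `label_ref`.

```python
import pandas as pd

# Same codes as label_ref; classes missing from the dict become NaN.
REFERENCE_CODES = {
    'urban-coverfraction': 1,
    'water-permanent-coverfraction': 2,
    'tree-coverfraction': 3,
    'shrub-coverfraction': 4,
    'grass-coverfraction': 4,
    'crops-coverfraction': 5,
}

# Toy input; the last class is deliberately absent from the mapping.
toy = pd.DataFrame({'referenceX': ['tree-coverfraction',
                                   'grass-coverfraction',
                                   'bare-coverfraction']})
toy['Reference'] = toy['referenceX'].map(REFERENCE_CODES)
print(toy)
```

Keeping the mapping in one dictionary makes duplicated or missing classes easier to spot than in a long function body.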

In [0]:
# function to label classified data
def label_class(row):
    if row['LandCover'] == 'urban fabric':
        return 1
    if row['LandCover'] == 'inland waters':
        return 2
    if row['LandCover'] == 'forest':
        return 3
    if row['LandCover'] == 'scrub and/or herbaceous vegetation associations':
        return 4
    if row['LandCover'] == 'heterogeneous agricultural areas':
        return 5
    if row['LandCover'] == 'open spaces with little or no vegetation':
        return 6
    if row['LandCover'] == 'industrial and commercial units':
        return 7
    if row['LandCover'] == 'mine and dump sites':
        return 8

In [0]:
# apply function to label classified data
df['Classified'] = df.apply(label_class, axis=1)
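
Both labelling functions return `None` for any class they do not list, so it can be worth checking for unlabelled rows before saving. This is a minimal sketch on a toy frame with made-up values; in the notebook the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy table in which the second sample has no reference label.
toy = pd.DataFrame({
    'LandCover': ['forest', 'inland waters', 'forest'],
    'Classified': [3, 2, 3],
    'Reference': [3, np.nan, 4],
})

# Count missing labels per column.
missing_per_column = toy[['Classified', 'Reference']].isna().sum()

# Keep only the rows where both labels are present.
complete = toy.dropna(subset=['Classified', 'Reference'])

print(missing_per_column.to_dict())
print(len(complete))
```

Whether to drop such rows or extend the label functions depends on the analysis. The sketch only shows how to find them.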

In [0]:
# keep only the columns needed in the output
df = df[['LandCover','Classified', 'Reference', 'geometry']]

In [0]:
# save the GeoDataFrame as a shapefile
df.to_file("/content/drive/My Drive/values.shp")