# Ingest MODIS Land Cover Data

This notebook ingests MODIS land cover data onto the DL platform. The MODIS land cover data product (MCD12Q1) is released yearly at a maximum resolution of 500 m. The product features five different land cover classification bands. They are quite similar, so we'll use the first one, the _Annual International Geosphere-Biosphere Programme (IGBP) classification_. The data are available from a number of US government data services; see https://lpdaac.usgs.gov/products/mcd12q1v006/.

The land cover data is available in tiles that follow the MODIS Sinusoidal Grid, a special projection system for MODIS products (see the figure below). We'll need to use GDAL to convert the HDF tiles to GeoTiffs. The tiles will be downloaded from NASA's Earthdata, for which a registered account is required. A free account can be created [here](https://urs.earthdata.nasa.gov/home). User credentials should then be stored in a file as a JSON object: `{"username": "<username>", "password": "<password>"}`.
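
As a minimal sketch of the credentials file described above, the snippet below writes and reads back a JSON object with quoted `username` and `password` keys. The helper names `write_credentials` and `read_credentials`, the placeholder values, and the temporary directory are assumptions for illustration; in this notebook the real file would live at `params['credentials_path']`.

```python
import json
import os
import tempfile


def write_credentials(path, username, password):
    """Write Earthdata credentials as a JSON object with quoted keys."""
    with open(path, "w") as f:
        json.dump({"username": username, "password": password}, f)
    # Restrict the file to its owner (has full effect on POSIX systems only).
    os.chmod(path, 0o600)


def read_credentials(path):
    """Load the credentials dict back from disk."""
    with open(path) as f:
        return json.load(f)


# Demonstration with placeholder values in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "earthdata.cred")
write_credentials(demo_path, "my_user", "my_password")
credentials = read_credentials(demo_path)
print(sorted(credentials))
```

Note that JSON requires double-quoted keys and string values, so a file written by hand must follow the same form.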

**Figure: MODIS Sinusoidal Grid**

![img](MODIS_sinusoidal_grid1.gif)

In [6]:
import logging, os, sys, json, requests, glob, pickle, subprocess
from requests.auth import HTTPBasicAuth

import descarteslabs as dl
from descarteslabs.catalog import Product
from descarteslabs.catalog import Image as dl_Image
from descarteslabs.catalog import ClassBand, DataType, Resolution, ResolutionUnit
from bs4 import BeautifulSoup

### Approach
**Fetch the Data**
- Create and store login credentials
- For each year of the land cover product:
  - Parse the website and extract the hdf files
  - Retrieve the hdf files
  
**Push to DL**
- Create the DL product and land cover band
- Convert the hdf files to GeoTiff
- Upload the GeoTiffs to the DL product

In [3]:
params = {}
params['modis_path'] = '/home/jovyan/solar-pv-global-inventory/data/MODIS'  # directory where the MODIS data is stored
params['credentials_path'] = os.path.join(params['modis_path'], 'earthdata.cred')
params['product_params'] = {'_id':'modis-land-cover',
                            'name':'MODIS land cover product for uploaded MODIS land cover tiles'}
params['year'] = '2014'
params['band_params'] = {'name':'IGBP_class',
                         'data_range':(0,255),
                         'display_range':(0,20),
                         'resolution':500, 
                         'index':0}

### Download the Data

In [None]:
# load the Earthdata username and password, closing the file afterwards
with open(params['credentials_path'], 'r') as f:
    credentials = json.load(f)

In [None]:
def get_url_paths(url, ext='', params=None):
    """Return the URLs of all links on the page at `url` whose href ends with `ext`."""
    response = requests.get(url, params=params)
    response.raise_for_status()  # raise on HTTP errors rather than silently returning None
    soup = BeautifulSoup(response.text, 'html.parser')
    # collect hrefs, skipping <a> tags that have no href attribute
    hrefs = [node.get('href') for node in soup.find_all('a')]
    return [url + href for href in hrefs if href and href.endswith(ext)]
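
The link-filtering step can be tried offline. The sketch below is not the notebook's `get_url_paths`; it swaps BeautifulSoup and `requests` for the standard library's `html.parser` and `urllib.parse.urljoin` so that it runs without network access. The `LinkCollector` class, the `extract_links` helper, and the sample HTML (including the file names in it) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.hrefs.append(href)


def extract_links(html, base_url, ext=""):
    """Return absolute URLs for all links whose href ends with `ext`."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(ext)]


# Hypothetical directory listing; the file names are made up for illustration.
sample_html = """
<a href="../">Parent Directory</a>
<a name="top">anchor without href</a>
<a href="MCD12Q1.A2014001.h00v08.006.hypothetical.hdf">tile</a>
<a href="MCD12Q1.A2014001.h00v08.006.hypothetical.hdf.xml">metadata</a>
"""
base_url = "https://e4ftl01.cr.usgs.gov/MOTA/MCD12Q1.006/2014.01.01/"
links = extract_links(sample_html, base_url, ext="hdf")
print(links)
```

Filtering on the `hdf` suffix keeps the tile itself and drops the accompanying `.hdf.xml` metadata link, since that href ends in `xml`.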

In [None]:
url = 'https://e4ftl01.cr.usgs.gov/MOTA/MCD12Q1.006/'+params['year']+'.01.01/'
ext = 'hdf'

In [None]:
hdf_urls = get_url_paths(url, ext)

In [None]:
print(len(hdf_urls), hdf_urls[0])

In [None]:
with open(os.path.join(params['modis_path'], 'list.txt'),'w') as f:
    f.write('\n'.join(hdf_urls))

In [None]:
!wget --user={credentials["username"]} --password={credentials["password"]} -i {os.path.join(params['modis_path'],'list.txt')} -P {params['modis_path']+'/tmp'} -q 

### Get Class Labels

In [4]:
class_labels = {
    1 : 'Evergreen Needleleaf Forests: dominated by evergreen conifer trees (canopy >2m). Tree cover >60%.',
    2 : 'Evergreen Broadleaf Forests: dominated by evergreen broadleaf and palmate trees (canopy >2m). Tree cover >60%.',
    3 : 'Deciduous Needleleaf Forests: dominated by deciduous needleleaf (larch) trees (canopy >2m). Tree cover >60%.',
    4 : 'Deciduous Broadleaf Forests: dominated by deciduous broadleaf trees (canopy >2m). Tree cover >60%.',
    5 : 'Mixed Forests: dominated by neither deciduous nor evergreen (40-60% of each) tree type (canopy >2m). Tree cover >60%.',
    6 : 'Closed Shrublands: dominated by woody perennials (1-2m height) >60% cover.',
    7 : 'Open Shrublands: dominated by woody perennials (1-2m height) 10-60% cover.',
    8 : 'Woody Savannas: tree cover 30-60% (canopy >2m).',
    9 : 'Savannas: tree cover 10-30% (canopy >2m).',
    10: 'Grasslands: dominated by herbaceous annuals (<2m).',
    11: 'Permanent Wetlands: permanently inundated lands with 30-60% water cover and >10% vegetated cover.',
    12: 'Croplands: at least 60% of area is cultivated cropland.',
    13: 'Urban and Built-up Lands: at least 30% impervious surface area including building materials, asphalt and vehicles.',
    14: 'Cropland/Natural Vegetation Mosaics: mosaics of small-scale cultivation 40-60% with natural tree, shrub, or herbaceous vegetation.',
    15: 'Permanent Snow and Ice: at least 60% of area is covered by snow and ice for at least 10 months of the year.',
    16: 'Barren: at least 60% of area is non-vegetated barren (sand, rock, soil) areas with less than 10% vegetation.',
    17: 'Water Bodies: at least 60% of area is covered by permanent water bodies.',
}

In [7]:
# save the full label descriptions, closing the file afterwards
with open('./class_labels_MODIS.pkl', 'wb') as f:
    pickle.dump(class_labels, f)

In [None]:
class_labels = [': '.join([str(kk),vv.split(':')[0]]) for kk,vv in class_labels.items()]

In [None]:
class_labels
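
To show what the label-shortening step produces, here is a small sketch on a two-entry subset of the dictionary above. The names `full_labels` and `short_labels` are assumptions used for illustration; keeping the dict and the list under separate names also means the cell can be re-run safely.

```python
# Two entries copied from the full class-label dictionary above.
full_labels = {
    1: 'Evergreen Needleleaf Forests: dominated by evergreen conifer trees (canopy >2m). Tree cover >60%.',
    17: 'Water Bodies: at least 60% of area is covered by permanent water bodies.',
}

# Keep only the class name (the text before the first colon), prefixed by the class value.
short_labels = [': '.join([str(k), v.split(':')[0]]) for k, v in full_labels.items()]
print(short_labels)
# -> ['1: Evergreen Needleleaf Forests', '17: Water Bodies']
```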

### Convert MODIS to GeoTiff

In [None]:
hdf_files = glob.glob(params['modis_path']+'/tmp'+'/*.hdf')

In [None]:
print(len(hdf_files), hdf_files[0])

In [None]:
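# gdal_translate does not create the output directory, so make sure it exists
os.makedirs(os.path.join(params['modis_path'], params['year']), exist_ok=True)

for f in hdf_files:
    #gdal_translate HDF4_EOS:EOS_GRID:"MCD12Q1.A2018001.h22v04.006.2019200003144.hdf":MCD12Q1:LC_Type1 test.ti
    # name each GeoTiff after its tile ID (e.g. h22v04), the third dot-separated field of the filename
    tifname = os.path.basename(f).split('.')[2]+'.tif'
    subprocess.call(['gdal_translate',
                     'HDF4_EOS:EOS_GRID:"{}":MCD12Q1:LC_Type1'.format(f),
                     os.path.join(params['modis_path'],params['year'],tifname)])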
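
The conversion loop relies on the structure of the HDF filenames. As a hedged sketch, the helpers below split a filename of the form used in the comment above into its fields and build the GDAL subdataset string. The function names `parse_modis_filename` and `gdal_subdataset` are hypothetical, and the interpretation of the fields (acquisition year plus day of year, then horizontal and vertical tile indices) is an assumption based on that example filename.

```python
import datetime
import os


def parse_modis_filename(path):
    """Split e.g. MCD12Q1.A2018001.h22v04.006.2019200003144.hdf into its fields."""
    name = os.path.basename(path)
    product, acq, tile, version, production, ext = name.split('.')
    year, day_of_year = int(acq[1:5]), int(acq[5:8])
    acquired = datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)
    return {
        'product': product,
        'acquired': acquired,
        'tile': tile,
        'h': int(tile[1:3]),  # horizontal tile index
        'v': int(tile[4:6]),  # vertical tile index
        'version': version,
        'production': production,
    }


def gdal_subdataset(path, grid='MCD12Q1', layer='LC_Type1'):
    """Build the HDF4_EOS subdataset name passed to gdal_translate."""
    return 'HDF4_EOS:EOS_GRID:"{}":{}:{}'.format(path, grid, layer)


example = '/tmp/MCD12Q1.A2018001.h22v04.006.2019200003144.hdf'
info = parse_modis_filename(example)
print(info['tile'], info['acquired'].isoformat())
print(gdal_subdataset(example))
```

Parsing the acquisition date from the filename would also let the image's `acquired` field be derived per file rather than from `params['year']`, if tiles from several years were ever mixed in one directory.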

### Prep DL Product and Bands

In [None]:
product = Product.get('oxford-university:modis-land-cover')#(params['product_params']['_id'])

In [None]:
if not product:
    product = Product(id=params['product_params']['_id'], 
                      name=params['product_params']['name'])
    product.save()

In [None]:
bands = [bb for bb in product.bands().limit(2)]

In [None]:
if not bands:
    band = ClassBand(name=params['band_params']['name'], product=product)
    band.data_type = DataType.BYTE
    band.data_range = params['band_params']['data_range']
    band.display_range = params['band_params']['display_range']
    band.resolution = Resolution(unit=ResolutionUnit.METERS, value=params['band_params']['resolution'])
    band.band_index = params['band_params']['index']
    band.class_labels = class_labels
    band.save()

In [None]:
### delete the product if it needs to be remade
# status = product.delete_related_objects()

In [None]:
# status

In [None]:
# product.delete()

In [None]:
### add readers
# product.readers = ["email:kyle@descarteslabs.com", "email:krishna@descarteslabs.com", "email:jeremy@descarteslabs.com"]
# product.save()

### Upload Images

In [None]:
image_files = glob.glob(os.path.join(params['modis_path'],params['year'],'*.tif'))

In [None]:
print(len(image_files), image_files[0])

In [None]:
uploads = []
for f in image_files:
    image = dl_Image(product=product, name=params['year']+'.'+f.split('/')[-1])
    image.acquired = params['year']+"-01-01"
    image_path = f
    uploads.append(image.upload(image_path))

In [None]:
for u in uploads:
    print(u.status)