# Writing Cloud-Optimized GeoTIFF (COG) with Python 

Authors: Rajat Shinde (UAH), Sheyenne Kirkland (UAH), Alex Mandel (DevSeed), Jamison French (DevSeed), Brian Freitag (NASA MSFC)

Date: January 29, 2024

Description: In this tutorial, we will explore how to write Cloud-Optimized GeoTIFF outputs in Python.

### Run This Notebook
To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the ["Getting started with the MAAP"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.

Disclaimer: It is highly recommended to run this tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.

### About The Dataset
For this tutorial, we are using the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data. This data contains calibrated at-sensor radiance, corresponding to the ASTER Level 1B ([AST_L1B](https://doi.org/10.5067/ASTER/AST_L1B.003)) product, that has been geometrically corrected and rotated to a north-up UTM projection. The AST_L1T is created from a single resampling of the corresponding ASTER L1A ([AST_L1A](https://doi.org/10.5067/ASTER/AST_L1A.003)) product. The bands available in the AST_L1T depend on the bands in the AST_L1A and can include up to three Visible and Near Infrared (VNIR) bands, six Shortwave Infrared (SWIR) bands, and five Thermal Infrared (TIR) bands. The AST_L1T dataset does not include the aft-looking VNIR band 3.

The precision terrain correction process incorporates GLS2000 digital elevation data with derived ground control points (GCPs) to achieve topographic accuracy for all daytime scenes where correlation statistics reach a minimum threshold. Alternate levels of correction are possible (systematic terrain, systematic, or precision) for scenes acquired at night or that otherwise represent a reduced quality ground image (e.g., cloud cover).

For daytime images, if the VNIR or SWIR telescope collected data and precision correction was attempted, each precision terrain corrected image will have an accompanying independent quality assessment. It includes the geometric correction and is available for distribution both as a text file and as a single-band browse image with the valid GCPs overlaid.

This multi-file product also includes georeferenced full resolution browse images. The number of browse images and the band combinations of the images depend on the bands available in the corresponding [AST_L1A](https://doi.org/10.5067/ASTER/AST_L1A.003) dataset.

### Additional Resources
- [How to recognize a COG and how to create a proper one!](https://cogeotiff.github.io/rio-cogeo/Is_it_a_COG/)
- [Rasterio documentation](https://rasterio.readthedocs.io/en/stable/intro.html)
- [ASTER Level 1 precision terrain corrected registered at-sensor radiance V003 dataset landing page](https://dx.doi.org/10.5067/ASTER/AST_L1T.003)

### Importing Packages

We import the `os` module, `rasterio` and `rio-cogeo` for reading and writing rasters, and the `MAAP` package, and then create a new MAAP class instance.

In [None]:
# import the os module and the raster I/O packages
import os
import rasterio
from rasterio.io import MemoryFile
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles

# import the MAAP package to handle queries
from maap.maap import MAAP

# import printing package to help display outputs
from pprint import pprint

# invoke the MAAP search client
maap = MAAP()

### Creating a Data Directory for this Tutorial

We create a data directory to hold all the files downloaded in this tutorial.

In [None]:
# set data directory path
dataDir = './data'

# create the directory if it doesn't already exist
os.makedirs(dataDir, exist_ok=True)

### Accessing the dataset

The `searchCollection()` method of maap-py searches for collections in the NASA CMR. Here we search by the short name `AST_L1T`.

In [44]:
maap_collections = maap.searchCollection(
    short_name='AST_L1T',
    cmr_host='cmr.earthdata.nasa.gov')
# maap_collections

In [53]:
COLLECTION_ID = maap_collections[0]["concept-id"]

results = maap.searchGranule(
    concept_id=COLLECTION_ID,
    cmr_host="cmr.earthdata.nasa.gov"
)
pprint(f'Got {len(results)} results')

'Got 20 results'


We will use the `downloadGranule()` method from maap-py to download the required GeoTIFF file. Let's first explore the metadata of a Granule.

In [74]:
# print the metadata for the first granule
# the depth parameter limits how many nested levels of metadata are expanded, with (1) showing the least detail
# (1) displays the concept ID, format, and revision ID
# adjust the depth to a larger value (6) if you would like to view all of the metadata
pprint(results[0], depth=2)

{'Granule': {'AdditionalAttributes': {...},
             'Campaigns': {...},
             'CloudCover': '75',
             'Collection': {...},
             'DataFormat': 'HDF-EOS2',
             'DataGranule': {...},
             'GranuleUR': 'SC:AST_L1T.003:2148809731',
             'InputGranules': {...},
             'InsertTime': '2015-04-09T09:28:24.676Z',
             'LastUpdate': '2018-07-09T14:35:36.464Z',
             'MeasuredParameters': {...},
             'OnlineAccessURLs': {...},
             'OnlineResources': {...},
             'Orderable': 'true',
             'PGEVersionClass': {...},
             'Platforms': {...},
             'Spatial': {...},
             'Temporal': {...},
             'Visible': 'true'},
 'collection-concept-id': 'C1000000320-LPDAAC_ECS',
 'concept-id': 'G1012302826-LPDAAC_ECS',
 'format': 'application/echo10+xml',
 'revision-id': '15'}
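
To see how `depth` changes what is printed without querying the CMR, here is a minimal sketch using a small stand-in dictionary. The `sample_granule` structure and its placeholder values are assumptions made for illustration only. They mimic the shape of the output above and are not real metadata.

```python
from pprint import pformat

# Hypothetical stand-in for a granule record; values are placeholders, not real metadata.
sample_granule = {
    "Granule": {
        "CloudCover": "75",
        "Temporal": {"placeholder-key": "placeholder-value"},
    },
    "concept-id": "G1012302826-LPDAAC_ECS",
}

# depth=1 expands only the top level; nested dictionaries collapse to {...}
shallow = pformat(sample_granule, depth=1)

# depth=2 expands one more level, so the keys inside 'Granule' become visible
deeper = pformat(sample_granule, depth=2)

print(shallow)
print(deeper)
```

Anything nested deeper than the requested level is shown as `{...}`, which is why most keys under `Granule` appear collapsed in the output above.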


The `downloadGranule()` method requires the Granule's HTTP access URL and the path to the data directory where the downloaded file will be stored. To get the HTTP access URL, we explore the metadata under the `OnlineAccessURLs` key of the Granule.

In [75]:
#Checking the online access URLs
online_access_urls = results[0]["Granule"]["OnlineAccessURLs"]
online_access_urls

{'OnlineAccessURL': [{'URL': 'https://e4ftl01.cr.usgs.gov//ASTER_L1T/ASTT/AST_L1T.003/2000.03.04/AST_L1T_00303042000203404_20150409092553_2788.hdf',
   'URLDescription': 'AST_L1T_00303042000203404_20150409092553_2788.hdf. MimeType: application/x-hdfeos',
   'MimeType': 'application/x-hdfeos'},
  {'URL': 'https://e4ftl01.cr.usgs.gov//ASTER_L1T/ASTT/AST_L1T.003/2000.03.04/AST_L1T_00303042000203404_20150409092553_2788_T.tif',
   'URLDescription': 'AST_L1T_00303042000203404_20150409092553_2788_T.tif. MimeType: application/x-geotiff',
   'MimeType': 'application/x-geotiff'}]}

As we can see above, the Granule has two access URLs: (1) for downloading the data in HDF-EOS (`.hdf`) format, and (2) for downloading the GeoTIFF (`.tif`) file. We will download the GeoTIFF file using the corresponding URL.

In [76]:
#Extracting the URL for the tiff file
tiff_url = online_access_urls["OnlineAccessURL"][1]["URL"]
print(tiff_url)

https://e4ftl01.cr.usgs.gov//ASTER_L1T/ASTT/AST_L1T.003/2000.03.04/AST_L1T_00303042000203404_20150409092553_2788_T.tif
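
Selecting the URL by position works for this Granule, but the order of the entries may differ for other Granules. A more defensive option is to pick the entry by its `MimeType`. The sketch below is one way to do that. The helper `find_url_by_mime_type` is hypothetical (it is not part of maap-py), and the handling of a single entry returned as a dictionary rather than a list is an assumption about how the metadata may be parsed.

```python
# Sample data copied from the `OnlineAccessURLs` output above.
online_access_urls = {
    "OnlineAccessURL": [
        {
            "URL": "https://e4ftl01.cr.usgs.gov//ASTER_L1T/ASTT/AST_L1T.003/2000.03.04/AST_L1T_00303042000203404_20150409092553_2788.hdf",
            "MimeType": "application/x-hdfeos",
        },
        {
            "URL": "https://e4ftl01.cr.usgs.gov//ASTER_L1T/ASTT/AST_L1T.003/2000.03.04/AST_L1T_00303042000203404_20150409092553_2788_T.tif",
            "MimeType": "application/x-geotiff",
        },
    ]
}


def find_url_by_mime_type(access_urls, mime_type):
    """Return the first URL whose MimeType matches, or None if there is none.

    Hypothetical helper for illustration; not part of maap-py.
    """
    entries = access_urls.get("OnlineAccessURL", [])
    # Assumption: a Granule with a single URL may yield a dict instead of a list.
    if isinstance(entries, dict):
        entries = [entries]
    for entry in entries:
        if entry.get("MimeType") == mime_type:
            return entry["URL"]
    return None


geotiff_url = find_url_by_mime_type(online_access_urls, "application/x-geotiff")
print(geotiff_url)
```

Returning `None` when no match is found lets the caller decide whether to skip the Granule or raise an error.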


In [78]:
input_path = maap.downloadGranule(tiff_url, dataDir, overwrite=True)
print(input_path)

./data/AST_L1T_00303042000203404_20150409092553_2788_T.tif


### Loading the GeoTIFF file

Once the file is downloaded locally, we can use `rasterio` to read the `tiff` file. 

In [None]:
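with rasterio.open(input_path) as src:
    # read all bands into a NumPy array of shape (bands, rows, columns)
    arr_cog = src.read()
    # keep the basic metadata (driver, dtype, size, CRS, transform) for writing later
    kwargs = src.meta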

### Writing the Cloud Optimized GeoTIFF (COG)

There are multiple ways to write a COG in Python using `rasterio`. We present the recommended approach, which uses the `cog_translate` function from `rio-cogeo` together with a rasterio `MemoryFile`. This approach has been found to be efficient for writing large GeoTIFF files while also copying the overviews and input image metadata.

In [79]:
input_filename = input_path.split('/')[-1]
output_filename = input_filename.replace('.tif', '_COG.tif')
output_file = os.path.join(dataDir, output_filename)

with MemoryFile() as memfile:
    with memfile.open(**kwargs) as mem:
        # populate the in-memory dataset with the NumPy array
        mem.write(arr_cog)

        # use the predefined "deflate" COG profile for the output
        dst_profile = cog_profiles.get("deflate")
        cog_translate(
            mem,
            output_file,
            dst_profile,
            in_memory=False
        )

Reading input: <open DatasetWriter name='/vsimem/e15d1864-5c88-4c87-91ae-99c2c7edb491/e15d1864-5c88-4c87-91ae-99c2c7edb491.tif' mode='w+'>

Adding overviews...
Updating dataset tags...
Writing output to: ./data/AST_L1T_00303042000203404_20150409092553_2788_T_COG.tif
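
The output filename above is built with `str.replace`, which substitutes every occurrence of `.tif` in the name. That is fine for this file. As a sketch of a slightly more robust alternative, the hypothetical helper `cog_output_path` below (a name introduced here for illustration) splits off the extension with `os.path.splitext` so that only the final extension is affected.

```python
import os


def cog_output_path(input_path, out_dir, suffix="_COG"):
    """Build an output path in `out_dir` by adding `suffix` before the file extension.

    Hypothetical helper for illustration; not part of rasterio or rio-cogeo.
    """
    # split the file name into its stem and its final extension
    stem, ext = os.path.splitext(os.path.basename(input_path))
    return os.path.join(out_dir, f"{stem}{suffix}{ext}")


example_path = cog_output_path(
    "./data/AST_L1T_00303042000203404_20150409092553_2788_T.tif", "./data"
)
print(example_path)
```

Because only the final extension is split off, a `.tif` appearing elsewhere in the path or file name is left unchanged.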


### Validating the COG Output

See the [COG vs Non-COG validation](maap-documentation/docs/source/technical_tutorials/user_data/COG-NonCOG-validation.ipynb) notebook to validate the generated COG.
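
For a quick local check before opening that notebook, rio-cogeo also provides a `cog_validate` function. The sketch below wraps it in a hypothetical helper, `summarize_cog_check`, which is not part of rio-cogeo or maap-py. It assumes a rio-cogeo version in which `cog_validate` returns a tuple of a validity flag, a list of errors, and a list of warnings.

```python
def summarize_cog_check(path):
    """Return a small report on whether `path` passes rio-cogeo's COG validation.

    Hypothetical helper for illustration; assumes `cog_validate` returns
    (is_valid, errors, warnings).
    """
    # imported inside the function so the helper can be defined before rio-cogeo is needed
    from rio_cogeo.cogeo import cog_validate

    is_valid, errors, warnings = cog_validate(path)
    return {"is_valid": is_valid, "errors": list(errors), "warnings": list(warnings)}


# Example usage with the file written above (requires rio-cogeo and the output file):
# report = summarize_cog_check(output_file)
# print(report)
```

An empty `errors` list together with `is_valid` set to `True` would indicate that the file was recognized as a COG.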