# GEOS-CF in Google Earth Engine


This notebook details methods for interacting with GEOS Composition Forecast (GEOS-CF) model diagnostics via the Google Earth Engine (GEE) Python API.

This API gives users access to GEE's large data repository and lets them manipulate that data with fast server-side GEE functions.

A strength of the GEE Python API is that users can also pull data client-side and work with selected fields in Pandas dataframes and other familiar data structures.

This notebook will go through the following steps:

- Accessing GEOS-CF historical estimates and TROPOMI observations from the GEE data repository
- Turning an image collection into a Pandas dataframe at a selected point-of-interest
- Merging GEOS-CF model estimates of NO2 with TROPOMI observations of NO2
- Visualizing these data as images on an interactive folium map
- Visualizing these data using interactive plotly plots

Notebook created by Callum Wayman, 2023

## Importing Required Modules

The Python Earth Engine API requires installation, authentication, and initialization before it can run in a Python environment. These steps can be completed from the command line or within Python code.

More information on these steps can be found [here.](https://developers.google.com/earth-engine/guides/python_install)

Other modules required for this tutorial are listed below.

In [None]:
import ee

# ee.Authenticate()  # Uncomment on first use; ee.Authenticate(force=True) forces re-authentication

ee.Initialize(project='ee-callumwayman-cf') # Enter your own Earth Engine project name here

In [None]:
import datetime as dt
from math import sqrt
import requests

import warnings
warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*")

import folium
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from plotly import graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
import plotly.express as px

## Importing Data from Google Earth Engine

Two collections are imported in this tutorial. The first is the GEOS-CF time-averaged hourly replay collection; more information on this collection can be found [here.](https://developers.google.com/earth-engine/datasets/catalog/NASA_GEOS-CF_v1_rpl_tavg1hr)

The second is the Near Real-Time Nitrogen Dioxide image collection from the TROPOMI instrument. This collection offers observations of several atmospheric parameters, including NO2, which is the focus of this tutorial.

In [None]:
# Import Data

# GEE Data
geosCf = ee.ImageCollection("NASA/GEOS-CF/v1/rpl/tavg1hr")
tropomi = ee.ImageCollection("COPERNICUS/S5P/NRTI/L3_NO2")

## Turning Earth Engine Objects into Pandas Dataframes

Google Earth Engine data usually comes in two object types: features and images.

Features and images may be stacked into feature collections and image collections.

These object types make it fast to manipulate large data sets within GEE; however, they limit how we can work with the underlying values in Python.

For this reason, this tutorial will geographically subset the imported image collections and transform them into Pandas dataframes. More information on Pandas dataframes can be found [here.](https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/frame.html)

In [None]:
def ee_array_to_df(arr, list_of_bands):
    """Transforms client-side ee.Image.getRegion array to pandas.DataFrame."""
    df = pd.DataFrame(arr)

    # Rearrange the header.
    headers = df.iloc[0]
    df = pd.DataFrame(df.values[1:], columns=headers)

    # Remove rows without data inside.
    df = df[['longitude', 'latitude', 'time', *list_of_bands]].dropna()

    # Convert the data to numeric values.
    for band in list_of_bands:
        df[band] = pd.to_numeric(df[band], errors='coerce')
        
    # Convert the time field (milliseconds since the Unix epoch) into a datetime.
    # TROPOMI band names contain 'number'; their timestamps are formatted as
    # minute-resolution strings. GEOS-CF timestamps are kept as datetime objects.
    if 'number' in list_of_bands[0]:
        df['datetime'] = pd.to_datetime(df['time'], unit='ms').dt.strftime('%Y-%m-%d %H:%M')
    else:
        df['datetime'] = pd.to_datetime(df['time'], unit='ms')
        # Remove time columns from GEOS-CF dataframes
        df.pop('time')

    # Keep the columns of interest.
    df = df[['datetime',  *list_of_bands]]

    df = df.reset_index(drop=True)
    
    return df
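
As a quick check of the header-promotion and time-conversion steps above, the sketch below applies them to a small hand-written array in the shape that `getRegion(...).getInfo()` returns: a header row followed by one row per image. The image ids and NO2 values are made up for illustration and do not come from GEE.

```python
import pandas as pd

# Synthetic stand-in for coll.getRegion(poi, scale).getInfo():
# the first row is the header, the remaining rows are one record per image.
arr = [
    ['id', 'longitude', 'latitude', 'time', 'NO2'],
    ['img_a', -77.02, 38.97, 1684022400000, 2.0e-9],
    ['img_b', -77.02, 38.97, 1684026000000, None],    # masked pixel
    ['img_c', -77.02, 38.97, 1684029600000, 3.5e-9],
]

# Promote the first row to column names.
demo = pd.DataFrame(arr[1:], columns=arr[0])

# Drop masked records and make sure the band is numeric.
demo = demo.dropna(subset=['NO2'])
demo['NO2'] = pd.to_numeric(demo['NO2'], errors='coerce')

# Earth Engine timestamps are milliseconds since the Unix epoch.
demo['datetime'] = pd.to_datetime(demo['time'], unit='ms')

# Keep only the columns of interest and renumber the rows.
demo = demo[['datetime', 'NO2']].reset_index(drop=True)
print(demo)
```

The masked record is dropped, leaving two rows with timestamps on 2023-05-14.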

In [None]:
class ee_subset:
    """
    A class with methods to create subsets of Earth Engine
    image collections.
      
    ...

    Attributes
    ----------
    band_dict : dict
        dictionary whose keys are band names and values are unit
        conversion values
    num_days : int
        number of days to be subset

    Methods
    -------
    get_subset_collection
        Returns a GEE collection subset for the number of days
        specified.
    get_point_df
        Takes a GEE image collection and returns a dataframe subset for the 
        number of days at a specific set of coordinates.
    """
    def __init__(self, band_dict, num_days):
        # Save band names and unit conversion values
        self.band_dict = band_dict
        self.band_list = list(band_dict.keys())
        
        # Get Dates
        # All subsets end at June 15th, 2023 for this tutorial
        f_date_datetime = dt.datetime.strptime('2023-06-15', "%Y-%m-%d")
        i_date_datetime = f_date_datetime - dt.timedelta(days=num_days)

        # Initial date of interest (inclusive), zero-padded as YYYY-MM-DD.
        self.i_date = i_date_datetime.strftime('%Y-%m-%d')

        # Final date of interest (exclusive), zero-padded as YYYY-MM-DD.
        self.f_date = f_date_datetime.strftime('%Y-%m-%d')
    
    def get_subset_collection(self, collection):
        """
        Subsets a GEE collection and returns subset 
        with appropriate time-window and bands selected.
        
        Parameters
        ----------
        collection : ee.imagecollection.ImageCollection
            GEE image collection to be subset
        

        Returns
        -------
        coll_subset : ee.imagecollection.ImageCollection
            Subset of collection
        """

        # Select bands and dates for collection
        coll_subset = collection.select(self.band_list).filterDate(self.i_date, self.f_date)

        return coll_subset
        
    
    def get_point_df(self, collection, lat, lon):
        """
        Subsets a GEE collection and returns subset 
        dataframe for a specific set of coordinates.
        
        Parameters
        ----------
        collection : ee.imagecollection.ImageCollection
            GEE image collection to be subset
        lat : float
            Latitude coordinate for point of interest, in decimal degrees
        lon : float
            Longitude coordinate for point of interest, in decimal degrees
        

        Returns
        -------
        data_features: pandas.core.frame.DataFrame
            Subset of collection returned as data frame
            for selected bands, time window, and point of interest.
        """
        
        # Get subsetted collection
        coll_subset = self.get_subset_collection(collection)
        
        # EE point from lat, lon
        poi = ee.Geometry.Point(lon, lat)
        
        # Scale in meters
        scale = 1000
        
        # Get the data for the pixel intersecting the point
        data_poi = coll_subset.getRegion(poi, scale).getInfo()
        
        # Call function to turn data into dataframe
        data_features = ee_array_to_df(data_poi, self.band_list)
        
        # Convert feature units to desired units for each band
        for b in self.band_list:
            data_features[b] = data_features[b]*self.band_dict[b]

        return data_features
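
The date window built in `__init__` is plain `datetime` arithmetic and can be checked without Earth Engine. The sketch below uses a hypothetical helper, `date_window` (not part of the class above), and formats both ends as zero-padded `YYYY-MM-DD` strings. As noted in the class comments, the start date is treated as inclusive and the end date as exclusive when the collection is filtered.

```python
import datetime as dt


def date_window(end_date, num_days):
    """Return (start, end) as zero-padded 'YYYY-MM-DD' strings.

    Hypothetical helper for illustration only.
    """
    end = dt.datetime.strptime(end_date, '%Y-%m-%d')
    start = end - dt.timedelta(days=num_days)
    return start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')


# Same end date and window length used in this tutorial.
i_date, f_date = date_window('2023-06-15', 300)
print(i_date, f_date)  # 2022-08-19 2023-06-15
```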

## Subsetting GEE Data

Select the bands you wish to analyze from each collection. In this tutorial, we work with tropospheric column and surface-level NO2 data.

We choose the relevant GEOS-CF and TROPOMI bands, a lat/lon point of interest, and a number of days, then subset the data using the ee_subset class and its methods.

In [None]:
# Set selected bands
cfSurfBand = 'NO2' # mol mol-1
cfTropBand = 'TROPCOL_NO2' # 1.0e15 molec cm-2
tropNo2Band = 'tropospheric_NO2_column_number_density' # mol/m2

# Create band dictionaries to store unit conversion information
cf_chm_band_dict = {cfSurfBand: 1.0e9, cfTropBand: 10000*1e15/6.02e23, 'O3': 1.0e9, 'NOy': 1.0e9, 'PM25_RH35_GCC': 1}
cf_met_band_dict = {'T10M': 1, 'ZPBL': 1, 'U10M': 1, 'V10M': 1, 'RH': 1}
trop_band_dict = {tropNo2Band: 1}

# Point of interest (Takoma rec); change to the exact lat/lon of your site
lat = 38.97
lon = -77.02

# EE point from lat, lon
poi = ee.Geometry.Point(lon, lat)

#Number of days to visualize
num_days = 300

# Get subset GEE collection and subset Dataframe

# GEOS-CF Chemistry
cf_chm_subset = ee_subset(cf_chm_band_dict, num_days)
cf_chm_subset_collection = cf_chm_subset.get_subset_collection(geosCf)
cf_chm_features = cf_chm_subset.get_point_df(geosCf, lat, lon)

# GEOS-CF Meteorology
cf_met_subset = ee_subset(cf_met_band_dict, num_days)
cf_met_subset_collection = cf_met_subset.get_subset_collection(geosCf)
cf_met_features = cf_met_subset.get_point_df(geosCf, lat, lon)

# TROPOMI chemistry
trop_subset = ee_subset(trop_band_dict, num_days)
trop_subset_collection = trop_subset.get_subset_collection(tropomi)
trop_features = trop_subset.get_point_df(tropomi, lat, lon)
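
The conversion factors stored in the band dictionaries above can be verified by hand. The sketch below recomputes them from the units given in the band comments (mol mol-1 for surface NO2, 1.0e15 molec cm-2 for the GEOS-CF tropospheric column, mol/m2 for TROPOMI). The example band values are hypothetical and chosen only to show the arithmetic.

```python
AVOGADRO = 6.02e23   # molecules per mole (the value used in this notebook)
CM2_PER_M2 = 1.0e4   # square centimetres per square metre

# GEOS-CF TROPCOL_NO2: units of 1e15 molec cm-2 -> mol m-2
tropcol_to_mol_m2 = 1.0e15 * CM2_PER_M2 / AVOGADRO

# GEOS-CF surface NO2: mol mol-1 -> parts per billion by volume
mol_mol_to_ppbv = 1.0e9

example_column = 5.0       # hypothetical value, i.e. 5 x 1e15 molec cm-2
example_surface = 4.0e-9   # hypothetical value in mol mol-1

print(tropcol_to_mol_m2)                   # roughly 1.66e-5
print(example_column * tropcol_to_mol_m2)  # roughly 8.3e-5 mol m-2
print(example_surface * mol_mol_to_ppbv)   # roughly 4 ppbv
```

After this conversion the GEOS-CF column is in the same units as the TROPOMI band, so the two can share an axis and a color scale.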

## Mapping GEOS-CF and TROPOMI

Using the folium package, we can create leaflet maps that allow us to view the data we have accessed from the GEE data repository.

This section will also explore the use of plotly plots to show a time-series of the subsetted data from GEOS-CF and TROPOMI at the point of interest.

In [None]:
# Define a region of interest (ROI) as a 1000 km buffer around the point of interest (POI).
roi = poi.buffer(1e6)

# Reduce the GEOS-CF chemistry collection by mean.
cf_img = cf_chm_subset_collection.mean()

# Select surface NO2 and convert from mol/mol to ppbv.
cf_img = cf_img.select(cfSurfBand).multiply(cf_chm_band_dict[cfSurfBand])

my_map = folium.Map(location=[lat, lon], zoom_start=10)

### Adding a Useful Folium Method

The function below defines a new method and attaches it to the folium.Map class.

This function receives an Earth Engine image object and creates a set of folium tiles from that image which can be added to the map.

The new method allows us to easily visualize images on a basemap as we might in the GEE code editor.

In [None]:
def add_ee_layer(self, ee_image_object, vis_params, name):
    """
    Adds a method for displaying Earth Engine image tiles to folium map.
    
    Parameters
    ----------
    ee_image_object : ee.image.Image
        GEE image to display
    vis_params : dict
        Dictionary of GEE visualization parameters
    name : str
        Layer name shown in the folium layer control
    
    Returns
    -------
    None
    """
    map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)
    folium.raster_layers.TileLayer(
        tiles=map_id_dict['tile_fetcher'].url_format,
        attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
        name=name,
        overlay=True,
        control=True
    ).add_to(self)
    
# Add Earth Engine drawing method to folium.
folium.Map.add_ee_layer = add_ee_layer

### Creating an Interactive Time-Series Plot

In this section, we create and save a plotly plot showing a time-series of tropospheric NO2 data from GEOS-CF and TROPOMI.

The saved plot is later embedded in a pop-up on the folium map.

In [None]:
# Make the figure object
fig=make_subplots(specs=[[{"secondary_y":True}]])

# Create the first plotly trace showing GEOS-CF tropospheric NO2
fig.add_trace(
    go.Scatter(
    x=cf_chm_features['datetime'],
    y=cf_chm_features[cfTropBand],
    name="GEOS-CF NO2 (mol/m^2)",
    #mode='lines+markers',
    #hoverinfo='y',
    line = dict(color='indigo', width=3)
     ),
    secondary_y=False)

# Create the second plotly trace showing TROPOMI tropospheric NO2
fig.add_trace(
    go.Scatter(
    x=trop_features['datetime'],
    y=trop_features[tropNo2Band],
    name="TROPOMI NO2 (mol/m^2)",
    #mode='markers',
    #hoverinfo='Date=x<br>Value=y',
    line = dict(color='red', width=3)
     ),
    secondary_y=False)
fig.update_traces(mode="markers+lines", hovertemplate=None)

# Update figure axis aesthetics and labels
fig.update_layout(hoverlabel_bgcolor='#DAEEED',  #Set the tooltip background color (pale cyan)
             title_text="GEOS-CF Tropospheric NO2 Replay", #Add a chart title
             title_font_family="Times New Roman",
             title_font_size = 20,
             title_font_color="darkblue", #Specify font color of the title
             title_x=0.5, #Specify the title position
             hovermode="x",
             xaxis=dict(
                    tickfont_size=10,
                    tickangle = 270,
                    showgrid = True,
                    zeroline = True,
                    showline = True,
                    showticklabels = True,
                    #dtick=86400000,
                    dtick="M1",
                    tickformat="%m/%d\n%Y"
                    ),
             legend = dict(orientation = 'h', xanchor = "center", x = 0.72, y= 1), #Adjust legend position
             yaxis_title='Tropospheric NO2 (mol/m^2)')

# Set scientific notation format for y-axis
fig.update_yaxes(exponentformat='e')

# Write the figure to a local HTML file
fig.write_html('surface_no2.html')

# Display the figure
display(fig)
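
The plot overlays the two series, but comparing them value by value requires aligning them in time. One possible approach, sketched here with synthetic data, is `pandas.merge_asof`, which pairs each TROPOMI observation with the nearest GEOS-CF timestamp. The column names `cf_no2` and `trop_no2` and the 30 minute tolerance are assumptions for illustration. Because the TROPOMI `datetime` column produced by `ee_array_to_df` holds strings, it is converted with `pd.to_datetime` first.

```python
import pandas as pd

# Synthetic hourly model values (stand-in for the GEOS-CF dataframe).
cf = pd.DataFrame({
    'datetime': pd.to_datetime(
        ['2023-05-14 17:30', '2023-05-14 18:30', '2023-05-14 19:30']),
    'cf_no2': [1.0e-5, 2.0e-5, 3.0e-5],
})

# Synthetic satellite observations with string timestamps.
trop = pd.DataFrame({
    'datetime': ['2023-05-14 18:20', '2023-05-14 22:00'],
    'trop_no2': [2.5e-5, 4.0e-5],
})
trop['datetime'] = pd.to_datetime(trop['datetime'])

# Both frames must be sorted on the key before merge_asof.
merged = pd.merge_asof(
    trop.sort_values('datetime'),
    cf.sort_values('datetime'),
    on='datetime',
    direction='nearest',
    tolerance=pd.Timedelta('30min'),
)
print(merged)
```

The first observation is matched to the 18:30 model value. The second has no model value within 30 minutes, so its `cf_no2` entry is left as NaN.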

In [None]:
# Select tropospheric NO2 and date range for GEOS-CF and convert to mol/m^2
cf_trop_img = cf_chm_subset_collection.select(cfTropBand).filterDate('2023-05-14', '2023-05-15').mean().multiply(cf_chm_band_dict[cfTropBand])

# Select tropospheric NO2 and date range for TROPOMI
trop_img = trop_subset_collection.filterDate('2023-05-14', '2023-05-15').mean()

# Visualization parameters for surface NO2 in ppbv (defined for reference; not used below)
cf_vis_params = {
    'min': 1,'max': 40,
    'palette': ['white', 'purple'],
    'opacity': 0.5
}

# Visualization parameters for tropospheric column NO2
visTrop = {'min': 1e-6,
    'max': 1e-4,
    'palette': ['white', 'purple'],
    'opacity': 0.5
          }

# Create a map.
my_map = folium.Map(location=[lat, lon], zoom_start=10)

# Add the cf and tropomi data to the map object.
my_map.add_ee_layer(cf_trop_img, visTrop, 'GEOS-CF NO2')
my_map.add_ee_layer(trop_img, visTrop, 'TROPOMI NO2')

# Add a layer control panel to the map.
my_map.add_child(folium.LayerControl())

# Wrap the plotly-generated html file in an iframe snippet
html = """
    <iframe src="surface_no2.html" width="850" height="400" frameborder="0"></iframe>
    """
    
# Create pop-up with added plotly html snippet
popup = folium.Popup(folium.Html(html, script=True))

# Add marker to map for point of interest
marker = folium.Marker([lat, lon],
                       popup=popup).add_to(my_map)

# Display the map.
display(my_map)
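
The `min` and `max` entries in the visualization dictionaries set the data range that is stretched across the palette. As a rough illustration (an assumption about the general behavior, not a reproduction of Earth Engine's renderer), the hypothetical helper `palette_position` below clamps a value to that range and scales it linearly to a 0 to 1 position along the palette, where 0 corresponds to 'white' and 1 to 'purple'.

```python
def palette_position(value, vmin, vmax):
    """Clamp value to [vmin, vmax] and scale linearly to [0, 1].

    Hypothetical helper for illustration only.
    """
    clamped = min(max(value, vmin), vmax)
    return (clamped - vmin) / (vmax - vmin)


# Same range as the tropospheric column parameters used above (mol/m^2).
vis_trop = {'min': 1e-6, 'max': 1e-4}

for column in (5e-7, 5.05e-5, 2e-4):
    pos = palette_position(column, vis_trop['min'], vis_trop['max'])
    print(column, pos)
```

Values below `min` all map to the start of the palette and values above `max` to the end, so the range should bracket the columns you expect to see.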