# SIF AND TEMPO DATA ACQUISITION AND ANALYSIS

Emily Rogers, NASA SARP WEST 2024

Contact: erogers4@bellarmine.edu

Acknowledgements

Thank you to Dr. Barron Henderson for developing the pyrsig package for Python, and to Dr. Dan Sousa and Megan Ward-Baranyay for offering crucial guidance. I also thank Lily Lyons from the Aerosols group at NASA SARP West 2024 for guidance on retrieving SIF data. This procedure adapts resources and advice from these individuals.

Source code for the TEMPO Python interface (pyrsig): https://barronh.github.io/pyrsig/

Earth Data, OCO-2 SIF : https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1863

In [None]:
# IMPORTING AND INSTALLING LIBRARIES --------------------------------------------------
#!python -m pip install -qq pandas xarray matplotlib netcdf4 pyproj pyrsig pycno
import pyproj
import xarray as xr
import pyrsig
import pandas as pd
import pycno
import getpass
import matplotlib.pyplot as plt
import glob
import netCDF4
from netCDF4 import Dataset
import calendar
import numpy as np

# SELECTING BOUNDING BOXES.

In my research, I selected 10 areas each from High Altitude Forests & Woodlands and Coastal Shrublands & Grasslands using Google Earth and land use maps. The lists below let the later cells iterate over all of these areas. Dr. Henderson's GitHub, linked above, has a script for processing data for a single bounding box, as does the NASA ARSET training for TEMPO.
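
Because each bounding box must list its lower corner before its upper corner, a quick check before any downloads can catch swapped coordinates. The following is a minimal sketch; `check_bbox` is a hypothetical helper written for illustration and is not part of pyrsig.

```python
def check_bbox(bbox):
    """Raise ValueError unless bbox is (lon_min, lat_min, lon_max, lat_max) with min < max."""
    lon_min, lat_min, lon_max, lat_max = bbox
    if not (-180 <= lon_min < lon_max <= 180):
        raise ValueError(f"longitudes out of order or range: {bbox}")
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError(f"latitudes out of order or range: {bbox}")
    return bbox

# First box from the high altitude forest list below: passes the check
good = check_bbox((-118.41, 35.49, -118.40, 35.50))

# Same box with the longitudes swapped on purpose: raises ValueError
try:
    check_bbox((-118.40, 35.49, -118.41, 35.50))
    message = None
except ValueError as err:
    message = str(err)

print(good)
print(message)
```

Running every tuple in a list through a check like this before the download loop starts is cheaper than finding an empty result afterwards.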

In [None]:
# Selected locations for high altitude forest and woodlands-- TEMPO
bbox_list= [(-118.41, 35.49, -118.40, 35.50),
            (-118.38, 35.45, -118.37, 35.46),
            (-118.33, 35.43, -118.32, 35.44),
            (-117.92, 34.06, -117.91, 34.07),
            (-116.75, 34.23, -116.74, 34.24),
            (-116.82, 34.01, -116.81, 34.02),
            (-117.79, 34.21, -117.78, 34.22),
            (-116.74, 33.84, -116.73, 33.85),
            (-117.56, 33.74, -117.55, 33.75),
            (-117.49, 33.66, -117.48, 33.67)]
# Each bbox sets the corners of the area you take data from. An area can include multiple pixels from TEMPO.
# Order is (lon_min, lat_min, lon_max, lat_max) for each area -- the first pair must be lower than the second!

In [None]:
# Selected locations for coastal shrublands and grasslands
bbox_list_csg= [(-120.55, 34.56, -120.54, 34.57),
                (-120.46, 34.46, -120.45, 34.47),
                (-120.32, 34.47, -120.31, 34.48),
                (-119.34, 34.31, -119.33, 34.32),
                (-119.23, 34.34, -119.22, 34.35),
                (-119.20, 34.36, -119.19, 34.37),
                (-117.38, 33.39, -117.37, 33.40),
                (-117.34, 33.35, -117.33, 33.36),
                (-117.47, 33.32, -117.46, 33.33),
                (-117.36, 33.24, -117.35, 33.25)]
# Call additional bbox lists different names!!

# ACCESSING TEMPO DATA

The following procedure allows you to find the name of the TEMPO product you need. Select a date within your timeframe of interest and a single bounding box for simplicity. The `workdir=locname` argument creates a subfolder that the pyrsig package will use to store data. **You must delete these files or choose a new locname each time you use a new bounding box.**
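
One way to handle the cleanup is to clear the working folder in code before each new request. This is a minimal sketch using only the Python standard library; `fresh_workdir` and the folder name `socal_test` are hypothetical names chosen for illustration, and the demonstration runs inside a temporary directory so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path

def fresh_workdir(base, name):
    """Delete any existing folder base/name, then recreate it empty."""
    workdir = Path(base) / name
    if workdir.exists():
        shutil.rmtree(workdir)  # removes files cached for a previous bounding box
    workdir.mkdir(parents=True)
    return workdir

# Demonstration inside a throwaway temporary directory
with tempfile.TemporaryDirectory() as base:
    first = fresh_workdir(base, "socal_test")
    (first / "old_cache.nc").write_text("stale")  # stand-in for a cached file
    second = fresh_workdir(base, "socal_test")
    leftover = list(second.iterdir())

print(leftover)  # empty list: the stale file is gone
```

The alternative used later in this notebook, a numbered folder per bounding box, avoids deleting anything but leaves more folders behind.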

In this example, I used a particular area in Southern California on December 18, 2023 to identify the product name for level 2 formaldehyde vertical column data. This allows data retrieval and analysis later.
 
Filename format: `TEMPO_{gas}_L2_V03_{YYYY}{MM}{DD}T{HH}{NN}{SS}Z_S{XXX}G{YY}.nc`

`{XXX}` is the scan number and `{YY}` is the granule number.
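
To show how the fields in this format line up, here is a minimal sketch that splits a filename into its parts with a regular expression. The example filename is made up for illustration, and the pattern assumes the version field contains only letters and digits.

```python
import re
from datetime import datetime, timezone

# Pattern mirrors the filename format described above (assumed, not taken from TEMPO documentation)
TEMPO_NAME = re.compile(
    r"TEMPO_(?P<gas>[A-Z0-9]+)_L2_V(?P<version>[A-Z0-9]+)_"
    r"(?P<stamp>\d{8}T\d{6})Z_S(?P<scan>\d{3})G(?P<granule>\d{2})\.nc$"
)

def parse_tempo_filename(name):
    """Return a dict of the filename fields, or None if the name does not match."""
    match = TEMPO_NAME.search(name)
    if match is None:
        return None
    stamp = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S")
    return {
        "gas": match["gas"],
        "version": match["version"],
        "time": stamp.replace(tzinfo=timezone.utc),  # the trailing Z marks UTC
        "scan": int(match["scan"]),
        "granule": int(match["granule"]),
    }

# Hypothetical filename, made up for illustration only
example = "TEMPO_HCHO_L2_V03_20231218T183000Z_S008G04.nc"
info = parse_tempo_filename(example)
print(info["gas"], info["time"].isoformat(), info["scan"], info["granule"])
```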

In [None]:
# LOOKING FOR A PRODUCT NAME ---------------------------------------------------------------
bdate_test = '2023-12-18'  # bounding date -- YYYY-MM-DD format
bbox_test = (-120.55, 34.56, -120.54, 34.57)
locname = 'socal_test'  # example folder name for pyrsig's working files; change as needed

api_test = pyrsig.RsigApi(bdate=bdate_test, bbox=bbox_test, workdir=locname, gridfit=True)
api_key = 'anonymous'
api_test.tempo_kw['api_key'] = api_key

# After the cell runs, scroll through the table to find the product you need.
# Filter on names starting with "tempo" and modify the search as needed.
descdf = api_test.descriptions()
#descdf  # uncomment (and comment out the line below) to show the full table
descdf.query('name.str.contains("tempo.l2.hcho")', engine='python')

# ACCESSING FORMALDEHYDE VERTICAL COLUMN DATA FOR 1 MONTH

TEMPO collects data every **daylight** hour. In my research, I chose to display these measurements as histograms to visualize the spread of the data. The following script allows you to collect and manipulate data for an entire month for each of two regions. The empty output lists prepared for each region must have unique names.
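
The collection pattern used for both regions can be sketched as one reusable function. This is an illustration under stated assumptions, not the notebook's actual code: `month_days`, `collect`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the pyrsig request so the sketch runs without a network connection.

```python
import calendar
from datetime import date

def month_days(year, month):
    """Return every day of the month as 'YYYY-MM-DD' strings."""
    n_days = calendar.monthrange(year, month)[1]
    return [date(year, month, d).isoformat() for d in range(1, n_days + 1)]

def collect(bboxes, days, fetch, prefix):
    """Call fetch(day, bbox, workdir) for each pair; return (results, failures)."""
    results, failures = [], []
    for i, bbox in enumerate(bboxes):      # enumerate supplies the folder number
        workdir = f"{prefix}{i}"           # one folder per bounding box
        for day in days:
            try:
                results.append(fetch(day, bbox, workdir))
            except Exception as err:       # keep a record instead of discarding the error
                failures.append((day, bbox, repr(err)))
    return results, failures

def fake_fetch(day, bbox, workdir):
    """Stand-in for the pyrsig request; pretends one day has no data."""
    if day == "2024-07-04":
        raise ValueError("no data")
    return {"date": day, "bbox": str(bbox), "workdir": workdir}

days = month_days(2024, 7)
boxes = [(-118.41, 35.49, -118.40, 35.50), (-118.38, 35.45, -118.37, 35.46)]
results, failures = collect(boxes, days, fake_fetch, "socal_new_")
print(len(days), len(results), len(failures))
```

Keeping the failures in a list makes it possible to see afterwards which days and boxes returned nothing, which matters when comparing sample sizes between regions.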

In [None]:
# JULY 2024, HIGH ALTITUDE FORESTS AND WOODLANDS---------------------------------------

# Preparing for an automated process
api_key = 'anonymous'  # using public data, so using anonymous
tempokey='tempo.l2.hcho.vertical_column' # defining a key for the TEMPO product of interest
output_hafw=[] # reserving an empty list for the loop output

# making a list of consecutive dates
start_jul= '2024-07-01'
end_jul= '2024-07-31'
date_range_jul= pd.date_range(start= start_jul, end= end_jul)
date_list_jul = date_range_jul.strftime('%Y-%m-%d').tolist() # this makes a list of usable strings
jul_dates= pd.to_datetime(date_list_jul) # this converts strings to date

In [None]:
# Making a loop: HCHO column for individual coordinates/bboxes
#    It is essential that you do not put files from different bounding boxes in the same folder-- the data will not read
#    properly. We use a number, i, in the locname (the folder it will make in the cloud) so that a new folder is made for
#    each bounding box. By putting outputs in an empty list, the data will be together in the jupyter notebook when you
#    are ready to use it. - Emily R. SARP West 2024
for i, bbox in enumerate(bbox_list):
    locname = 'socal_new_' + str(i)  # one folder per bounding box
    for day in jul_dates:
        try:
            api = pyrsig.RsigApi(bdate=day, bbox=bbox, workdir=locname, gridfit=True)
            api.tempo_kw['api_key'] = api_key
            df = api.to_dataframe(tempokey, unit_keys=False, parse_dates=True, backend='xdr')

            # the next 3 lines tag each row with its bounding box and date for later indexing
            df['bbox'] = str(bbox)
            df['Timestamp'] = pd.to_datetime(df['Timestamp'], errors='coerce')
            df['date'] = df['Timestamp'].dt.date
            output_hafw.append(df)  # adds output to the list
            #print(f'successfully processed date: {day} and bbox: {bbox}') # to print each successful run if needed
        except Exception as e:
            # days with no data or failed requests are skipped; uncomment the print to see why
            #print(f'an error occurred for date: {day} and bbox: {bbox}. Error: {e}')
            pass

In [None]:
# JULY 2024 COASTAL SHRUBLANDS AND GRASSLANDS ----------------------------------------

# Preparing for an automated process
api_key = 'anonymous'  # using public data, so using anonymous
tempokey='tempo.l2.hcho.vertical_column' #defining a key for the TEMPO product of interest
output_csg=[] # reserving an empty list for the loop output

# Making a loop: HCHO column for individual coordinates/bboxes
# using the same date list as high altitude forests and woodlands
for i, bbox in enumerate(bbox_list_csg):
    locname = 'socal_jul_csg' + str(i)  # one folder per bounding box
    for day in jul_dates:
        try:
            api = pyrsig.RsigApi(bdate=day, bbox=bbox, workdir=locname, gridfit=True)
            api.tempo_kw['api_key'] = api_key
            df = api.to_dataframe(tempokey, unit_keys=False, parse_dates=True, backend='xdr')
            df['bbox'] = str(bbox)
            df['Timestamp'] = pd.to_datetime(df['Timestamp'], errors='coerce')
            df['date'] = df['Timestamp'].dt.date
            output_csg.append(df)  # adds output to the list
            #print(f'successfully processed date: {day} and bbox: {bbox}')
        except Exception as e:
            # days with no data or failed requests are skipped; uncomment the print to see why
            #print(f'an error occurred for date: {day} and bbox: {bbox}. Error: {e}')
            pass

In [None]:
# MERGING HIGH ALTITUDE FOREST & WOODLANDS AND COASTAL SHRUBLANDS & GRASSLANDS FOR JULY 2024
df_highaltitude= pd.concat(output_hafw) # convert individual output lists to dataframes
df_coastalshrub= pd.concat(output_csg)
tempo_jul = [df_highaltitude, df_coastalshrub] # make a list of output dataframes
df_tempo_jul = pd.concat(tempo_jul) # merge dataframes
df_tempo_jul

# GRAPHING AND STATISTICS FOR 1 MONTH OF TEMPO DATA

The following cells demonstrate useful analyses for vertical column data. The examples below contain the code I used to analyze the formaldehyde vertical column data for the two regions in Southern California.
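
Two details are worth keeping in mind when reading the summary statistics: NumPy's plain reductions return NaN if any value is missing, and `np.std` divides by N while the pandas `.std()` method divides by N-1 by default. The sketch below uses made-up column values to show the difference; it is an illustration, not output from the TEMPO data.

```python
import numpy as np
import pandas as pd

# Made-up column values (molecules/cm^2), including one missing retrieval
col = pd.Series([1.0e16, 2.0e16, np.nan, 3.0e16], name="vertical_column")
values = col.to_numpy()

plain_median = np.median(values)      # NaN, because one entry is missing
nan_median = np.nanmedian(values)     # ignores the missing entry
pandas_mean = col.mean()              # pandas skips NaN by default

std_population = np.nanstd(values)    # ddof=0: divides by N
std_sample = col.std()                # ddof=1: divides by N-1

print(plain_median, nan_median, pandas_mean)
print(std_population, std_sample)
```

Whichever convention is chosen, using the same one for both regions keeps the comparison fair.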

In [None]:
# MEDIANS FOR EACH PLANT CLASSIFICATION------------------------------------------------------
median_hafw = np.median(df_highaltitude['vertical_column'])
median_csg = np.median(df_coastalshrub['vertical_column'])
print('median_hafw =', median_hafw)
print('median_csg =', median_csg)

In [None]:
# AVERAGES FOR EACH PLANT CLASSIFICATION-----------------------------------------------------
mean_hafw = df_highaltitude['vertical_column'].mean()
mean_csg = df_coastalshrub['vertical_column'].mean()
print('mean_hafw =', mean_hafw)
print('mean_csg =', mean_csg)

In [None]:
# STANDARD DEVIATION FOR EACH PLANT CLASSIFICATION-----------------------------------------------
stdev_hafw = np.std(df_highaltitude['vertical_column'])
stdev_csg = np.std(df_coastalshrub['vertical_column'])
print('stdev_hafw =', stdev_hafw)
print('stdev_csg =', stdev_csg)

In [None]:
# PLOTTING HISTOGRAMS FOR JULY 2024-----------------------------------------------------------
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)  # (rows, cols, panel number)
plt.hist(df_highaltitude['vertical_column'], bins= 50, color='#3CA3FD', edgecolor='black')
plt.title('HCHO Vertical Column - High Altitude Forests')
plt.axvline(mean_hafw, color='black', linestyle='dashed', linewidth= 3.5, label=f'Mean: {mean_hafw:.2f}')
plt.xlabel('HCHO Vertical Column')
plt.ylabel('Frequency')
plt.xlim(0, 7.5e16)
plt.ylim(0, 28)

# Plot histogram for Coastal Shrublands
plt.subplot(1, 2, 2)
plt.hist(df_coastalshrub['vertical_column'], bins= 50, color='#FF00E7', edgecolor='black')
plt.title('HCHO Vertical Column - Coastal Shrublands')
plt.axvline(mean_csg, color='#0707CE', linestyle='dashed', linewidth= 3.5, label=f'Mean: {mean_csg:.2f}')
plt.xlabel('HCHO Vertical Column')
plt.ylabel('Frequency')
plt.xlim(0, 7.5e16)
plt.ylim(0, 28)

plt.tight_layout()
plt.show()

In [None]:
# Overlaying the two plots above
plt.figure(figsize=(8, 6))

# High Altitude Forests and Woodlands
plt.hist(df_highaltitude['vertical_column'], bins= 50, color='#3CA3FD', edgecolor='black')
plt.axvline(mean_hafw, color='black', linestyle='dashed', linewidth= 3.5, label=f'Mean: {mean_hafw:.2f}')
#plt.axvline(median_hafw, color = 'black', linestyle= ':' , linewidth = 3)

plt.hist(df_coastalshrub['vertical_column'], bins= 50, color= '#FF00E7' , edgecolor='black')
plt.title('HCHO Vertical Column')
#plt.axvline(median_csg, color = 'blue', linestyle = ':', linewidth = 3)
plt.axvline(mean_csg, color='#0707CE', linestyle= 'dashed', linewidth= 3.5, label=f'Mean: {mean_csg:.2f}')
plt.xlabel('HCHO Vertical Column')
plt.ylabel('Frequency')

plt.xlim(0, 7.5e16)
plt.ylim(0, 28)

plt.tight_layout()
plt.show()
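
One optional refinement for overlaid histograms is to give both datasets the same bin edges, so that bar heights refer to the same intervals. The sketch below computes shared edges with NumPy on randomly generated stand-in values; the numbers are synthetic and only illustrate the technique.

```python
import numpy as np

# Synthetic stand-ins for the two vertical_column series (not real TEMPO data)
rng = np.random.default_rng(0)
a = rng.normal(2.0e16, 0.5e16, 200)
b = rng.normal(1.5e16, 0.4e16, 200)

# Compute 50 bins spanning both datasets, then count each dataset on those bins
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=50)
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

print(len(edges), counts_a.sum(), counts_b.sum())

# With matplotlib, the same edges can be passed as plt.hist(..., bins=edges, alpha=0.6)
# so that the second histogram does not hide the first.
```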

# SIF DATA ACQUISITION FOR MULTIPLE MONTHS

I used OCO-2 Solar-Induced Chlorophyll Fluorescence (SIF) data from May 2019 through December 2019 to observe trends in vegetation phenology that are consistent in the regions of interest. As of Aug. 2024, the OCO-2 data record only goes to 2020.

The OCO-2 satellite has a 16-day return time, affording 2 measurements each month (designated A and B below). Files are available for 0.05 degree coordinate intervals.

The procedure below makes a dataframe containing longitude and latitude values for 10 bounding boxes at a time; this can be modified as needed. I then use this to graph SIF trends for the same months for which TEMPO data are available as of Aug. 2024. I had to go month by month, but there may be a way to automate this. My work can be used as a template if needed, but be very mindful of the names you assign to variables and datasets.
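
One possible route to automating the month-by-month steps is to generate the file labels in a loop instead of writing a cell per file. This is a minimal sketch, assuming the `sif_ann_YYYYMM{a,b}.nc` naming and the folder used in the cells below; `sif_periods` is a hypothetical helper written for illustration.

```python
from pathlib import PurePosixPath

def sif_periods(year, first_month, last_month, skip=()):
    """Yield (label, month, filename) for each half-month file in the range."""
    for month in range(first_month, last_month + 1):
        for half in ("a", "b"):
            label = f"{year}{month:02d}{half}"
            if label in skip:
                continue
            yield label, month, f"sif_ann_{label}.nc"

# Folder taken from the file paths in the cells below; adjust to your own upload location
data_dir = PurePosixPath("/home/jovyan/Emily_Rogers/Learning_SIF")

# May B through December B of 2019 (May A is skipped, as in the cells below)
periods = list(sif_periods(2019, 5, 12, skip={"201905a"}))
paths = [str(data_dir / name) for _, _, name in periods]

print(len(paths))
print(paths[0])
print(paths[-1])

# Inside a loop over `periods`, each path could then be opened with xr.open_dataset
# and sliced per bounding box in the same way as the monthly cells.
```

Carrying the month number alongside each path means the `month` column can be filled from the loop variable rather than typed by hand.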

**You will need to download individual files from Earth Data and upload them to the cloud for this procedure.**

Source: https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1863

In [None]:
#making an empty list for merging all hafw sif dataframes
df_sif_hafw=[]

# Selected locations for high altitude forest and woodlands
bbox_list_hafw= [(-118.40, 35.49, -118.35, 35.54),
                 (-118.38, 35.45, -118.33, 35.50),
                 (-118.33, 35.43, -118.28, 35.48),
                 (-117.92, 34.06, -117.87, 34.11),
                 (-116.75, 34.23, -116.70, 34.28),
                 (-116.82, 34.01, -116.77, 34.06),
                 (-117.79, 34.21, -117.74, 34.26),
                 (-116.74, 33.84, -116.69, 33.89),
                 (-117.56, 33.74, -117.51, 33.79),
                 (-117.49, 33.66, -117.44, 33.71)]
# Each bbox sets the corners of the area you take data from.
# Order is (lon_min, lat_min, lon_max, lat_max) for each area -- the first pair must be lower than the second

# Making a dataframe of coordinates to iterate through-- we need column names for the loop
bbox_hafw = pd.DataFrame(bbox_list_hafw, columns=['lon_min', 'lat_min', 'lon_max', 'lat_max'])
#bbox_hafw

DECEMBER 2019

In [None]:
# DECEMBER A ----------------------------------------------------------

# Making an empty list to store the first december measurement for each coordinate pair
results_deca = []

# Telling Python where to pull data from (upload Earth Data files to the cloud and copy the filepath)
filepath_deca = '/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201912a.nc'
sifdeca = xr.open_dataset(filepath_deca)

# Making a loop-- first telling it what each number is in the bbox list above
for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locdeca = sifdeca.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_deca = sif_locdeca.sif_ann.values
    #print(sif_values) # to check

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_deca = sif_values_deca.flatten() # this way, you won't have to extract numbers later

    # Appending results to the list
    results_deca.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_deca})

df_deca_hafw = pd.DataFrame(results_deca) # converts results to dataframe
df_deca_hafw['month'] = pd.Timestamp(year= 2019, month = 12, day=1).strftime('%m') # makes a month column
#print(df_deca_hafw)
df_sif_hafw.append(df_deca_hafw) # adds this dataset to what will be a combined list

In [None]:
# DECEMBER B-----------------------------------------------------------------
results_decb = []

filepath_decb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201912b.nc' # opening the dataset
sifdecb = xr.open_dataset(filepath_decb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locdecb = sifdecb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_decb = sif_locdecb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_decb = sif_values_decb.flatten()

    # Appending results to the list
    results_decb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_decb})

df_decb_hafw = pd.DataFrame(results_decb) # converts results to dataframe
df_decb_hafw['month'] = pd.Timestamp(year= 2019, month = 12, day=1).strftime('%m')
#print(df_decb_hafw)
df_sif_hafw.append(df_decb_hafw)

NOVEMBER 2019

In [None]:
# NOVEMBER A -------------------------------------------------------
results_nova = []

filepath_nova='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201911a.nc'
sifnova = xr.open_dataset(filepath_nova)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locnova = sifnova.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_nova = sif_locnova.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_nova = sif_values_nova.flatten()

    # Appending results to the list
    results_nova.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_nova})

df_nova_hafw = pd.DataFrame(results_nova) # converts results to dataframe
df_nova_hafw['month'] = pd.Timestamp(year= 2019, month = 11, day=1).strftime('%m')
#print(df_nova_hafw)
df_sif_hafw.append(df_nova_hafw)

In [None]:
# NOVEMBER B -------------------------------------------------------
results_novb = []

filepath_novb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201911b.nc'
sifnovb = xr.open_dataset(filepath_novb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locnovb = sifnovb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_novb = sif_locnovb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_novb = sif_values_novb.flatten()

    # Appending results to the list
    results_novb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_novb})

df_novb_hafw = pd.DataFrame(results_novb) # converts results to dataframe
df_novb_hafw['month'] = pd.Timestamp(year= 2019, month = 11, day=1).strftime('%m')
#print(df_novb_hafw)
df_sif_hafw.append(df_novb_hafw)

OCTOBER 2019

In [None]:
# OCTOBER A -------------------------------------------------------
results_octa = []

filepath_octa='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201910a.nc'
sifocta = xr.open_dataset(filepath_octa)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_lococta = sifocta.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_octa = sif_lococta.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_octa = sif_values_octa.flatten()

    # Appending results to the list
    results_octa.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_octa})

df_octa_hafw = pd.DataFrame(results_octa) # converts results to dataframe
df_octa_hafw['month'] = pd.Timestamp(year= 2019, month = 10, day=1).strftime('%m')
#print(df_octa_hafw)
df_sif_hafw.append(df_octa_hafw)

In [None]:
# OCTOBER B -------------------------------------------------------
results_octb = []

filepath_octb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201910b.nc'
sifoctb = xr.open_dataset(filepath_octb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_lococtb = sifoctb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_octb = sif_lococtb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_octb = sif_values_octb.flatten()

    # Appending results to the list
    results_octb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_octb})

df_octb_hafw = pd.DataFrame(results_octb) # converts results to dataframe
df_octb_hafw['month'] = pd.Timestamp(year= 2019, month = 10, day=1).strftime('%m')
#print(df_octb_hafw)
df_sif_hafw.append(df_octb_hafw)

SEPTEMBER 2019

In [None]:
# SEPTEMBER A -------------------------------------------------------------
results_sepa = []

filepath_sepa='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201909a.nc'
sifsepa = xr.open_dataset(filepath_sepa)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locsepa = sifsepa.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_sepa = sif_locsepa.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_sepa = sif_values_sepa.flatten()

    # Appending results to the list
    results_sepa.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_sepa})

df_sepa_hafw = pd.DataFrame(results_sepa) # converts results to dataframe
df_sepa_hafw['month'] = pd.Timestamp(year= 2019, month = 9, day=1).strftime('%m') 
#print(df_sepa_hafw)
df_sif_hafw.append(df_sepa_hafw)

In [None]:
# SEPTEMBER B -------------------------------------------------------------
results_sepb = []

filepath_sepb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201909b.nc'
sifsepb = xr.open_dataset(filepath_sepb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locsepb = sifsepb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_sepb = sif_locsepb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_sepb = sif_values_sepb.flatten()

    # Appending results to the list
    results_sepb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_sepb})

df_sepb_hafw = pd.DataFrame(results_sepb) # converts results to dataframe
df_sepb_hafw['month'] = pd.Timestamp(year= 2019, month = 9, day=1).strftime('%m')
#print(df_sepb_hafw)
df_sif_hafw.append(df_sepb_hafw)

AUGUST 2019

In [None]:
# AUGUST A --------------------------------------------------------------------------------
results_auga = []

filepath_auga='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201908a.nc'
sifauga = xr.open_dataset(filepath_auga)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locauga = sifauga.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_auga = sif_locauga.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_auga = sif_values_auga.flatten()

    # Appending results to the list
    results_auga.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_auga})

df_auga_hafw = pd.DataFrame(results_auga) # converts results to dataframe
df_auga_hafw['month'] = pd.Timestamp(year= 2019, month = 8, day=1).strftime('%m')
#print(df_auga_hafw)
df_sif_hafw.append(df_auga_hafw)

In [None]:
# AUGUST B ----------------------------------------------------------------------------------
results_augb = []

filepath_augb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201908b.nc'
sifaugb = xr.open_dataset(filepath_augb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locaugb = sifaugb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_augb = sif_locaugb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_augb = sif_values_augb.flatten()

    # Appending results to the list
    results_augb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_augb})

df_augb_hafw = pd.DataFrame(results_augb) # converts results to dataframe
df_augb_hafw['month'] = pd.Timestamp(year= 2019, month = 8, day=1).strftime('%m')
#print(df_augb_hafw)
df_sif_hafw.append(df_augb_hafw)

JULY 2019

In [None]:
# JULY A ---------------------------------------------------------------------------------------
results_jula = []

filepath_jula='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201907a.nc'
sifjula = xr.open_dataset(filepath_jula)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locjula = sifjula.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_jula = sif_locjula.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_jula = sif_values_jula.flatten()

    # Appending results to the list
    results_jula.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_jula})

df_jula_hafw = pd.DataFrame(results_jula) # converts results to dataframe
df_jula_hafw['month'] = pd.Timestamp(year= 2019, month = 7, day=1).strftime('%m')
#print(df_jula_hafw)
df_sif_hafw.append(df_jula_hafw)

In [None]:
# JULY B --------------------------------------------------------------------------------------
results_julb = []

filepath_julb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201907b.nc'
sifjulb = xr.open_dataset(filepath_julb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locjulb = sifjulb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_julb = sif_locjulb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_julb = sif_values_julb.flatten()

    # Appending results to the list
    results_julb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_julb})

df_julb_hafw = pd.DataFrame(results_julb) # converts results to dataframe
df_julb_hafw['month'] = pd.Timestamp(year= 2019, month = 7, day=1).strftime('%m')
print(df_julb_hafw)
df_sif_hafw.append(df_julb_hafw)

JUNE 2019

In [None]:
# JUNE A ---------------------------------------------------------------------------------------
results_juna = []

filepath_juna='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201906a.nc'
sifjuna = xr.open_dataset(filepath_juna)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locjuna = sifjuna.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_juna = sif_locjuna.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_juna = sif_values_juna.flatten()

    # Appending results to the list
    results_juna.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_juna})

df_juna_hafw = pd.DataFrame(results_juna) # converts results to dataframe
df_juna_hafw['month'] = pd.Timestamp(year= 2019, month = 6, day=1).strftime('%m')
#print(df_juna_hafw)
df_sif_hafw.append(df_juna_hafw)

In [None]:
# JUNE B ---------------------------------------------------------------------------------------
results_junb = []

filepath_junb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201906b.nc'
sifjunb = xr.open_dataset(filepath_junb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locjunb = sifjunb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_junb = sif_locjunb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_junb = sif_values_junb.flatten()

    # Appending results to the list
    results_junb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_junb})

df_junb_hafw = pd.DataFrame(results_junb) # converts results to dataframe
df_junb_hafw['month'] = pd.Timestamp(year= 2019, month = 6, day=1).strftime('%m')
#print(df_junb_hafw)
df_sif_hafw.append(df_junb_hafw)

MAY 2019

In [None]:
# MAY B (no TEMPO data for May A for comparison)-----------------------------------------------
results_mayb = []

filepath_mayb='/home/jovyan/Emily_Rogers/Learning_SIF/sif_ann_201905b.nc'
sifmayb = xr.open_dataset(filepath_mayb)

for index, row in bbox_hafw.iterrows():
    lon_min = row['lon_min']
    lat_min = row['lat_min']
    lon_max = row['lon_max']
    lat_max = row['lat_max']

    # Selecting data for each bounding box
    sif_locmayb = sifmayb.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    sif_values_mayb = sif_locmayb.sif_ann.values
    #print(sif_values)

    # Flattening the array to make it easier to store in the DataFrame
    flattened_values_mayb = sif_values_mayb.flatten()

    # Appending results to the list
    results_mayb.append({'lon_min': lon_min,
                         'lat_min': lat_min,
                         'lon_max': lon_max,
                         'lat_max': lat_max,
                         'sif_values': flattened_values_mayb})

df_mayb_hafw = pd.DataFrame(results_mayb) # converts results to dataframe
df_mayb_hafw['month'] = pd.Timestamp(year= 2019, month = 5, day=1).strftime('%m')
print(df_mayb_hafw)
df_sif_hafw.append(df_mayb_hafw)

# GRAPHING MONTHLY SIF DATA FOR HIGH ALTITUDE FORESTS AND WOODLANDS

I used the following code to graph the monthly average SIF over the high altitude forests and woodlands for the available time interval, for comparison with the formaldehyde concentrations above.
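
Because each `sif_values` entry is stored as an array, one way to obtain a monthly average is to reduce each array to a single number first and then group by month. The sketch below uses a small made-up table in place of the merged SIF dataframe; `array_mean` and the `sif_mean` column are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the merged SIF table: one array of pixel values per row
demo = pd.DataFrame({
    "month": ["06", "06", "07", "07"],
    "sif_values": [
        np.array([0.2, 0.4]),
        np.array([0.6]),
        np.array([]),            # a bounding box that matched no pixels
        np.array([1.0, np.nan]),
    ],
})

def array_mean(values):
    """Mean of the finite entries, or NaN when the array is empty or all-NaN."""
    values = np.asarray(values, dtype=float)
    finite = values[np.isfinite(values)]
    return finite.mean() if finite.size else np.nan

# Reduce each array to one number, then average the rows within each month
demo["sif_mean"] = demo["sif_values"].apply(array_mean)
monthly = demo.groupby("month", as_index=False)["sif_mean"].mean()
print(monthly)
```

Note that this averages the per-box means, so a box with many pixels counts the same as a box with one; pooling all pixels before averaging would weight them differently.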

In [None]:
# MERGING DATAFRAMES FOR HIGH ALTITUDE FORESTS AND WOODLANDS-----------------------------------
sif_hafw= pd.concat(df_sif_hafw)
sif_hafw

In [None]:
# MAKING A DATAFRAME FOR MONTHLY AVERAGE--------------------------------------------------
avg_sif = sif_hafw.groupby('month', as_index=False)['sif_values'].mean()
avg_sif

In [None]:
# If the sif_values column holds bracketed values and the cell above raises errors,
# uncomment and run the following lines, then try the cell above again.

#sif_hafw['sif_values_str'] = sif_hafw['sif_values'].astype(str) # converts each bracketed value to text
#sif_hafw['sif_values'] = sif_hafw['sif_values_str'].str.extract(r'(-?\d+\.?\d*(?:e[-+]?\d+)?)')[0].astype(float) # keeps the first number in each row
#sif_hafw # to check

In [None]:
# PLOTTING HIGH ALTITUDE FORESTS AND WOODLANDS ----------------------------------------------
plt.plot(avg_sif['month'], avg_sif['sif_values'])
plt.title('Average SIF for High Altitude Forest and Woodlands')
plt.xlabel('Month')
plt.ylabel('Average SIF')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()