## USGS and EMIT Data Matchup
In this notebook we search the USGS database for a specific state code and one or more parameter codes to retrieve a list of sites. We then use the site coordinates to find matching EMIT granules and gather USGS data around each granule's timestamp.

### 1. Retrieving site codes
First, import the packages and the utilities file.

In [1]:
import dataretrieval.nwis as nwis
import geopandas as gpd
from shapely.geometry import Point, box, Polygon, MultiPolygon
import requests
import pandas as pd
import datetime as dt
import earthaccess
from tqdm import tqdm
from retrieval_utils import get_param_sites, get_all_site_granules, match_granules

Next, find active parameter codes on the USGS website. A separate guide, a PDF called "Get param codes", is available in the GitHub repository.

Then define the time frame, state code, and parameter codes, and call the function. Note: all three are required for the function to work.

In [2]:
param_codes = ['32315'] # chla fluorescence
param_codes_str = ','.join(param_codes) 
state_code = '06' # california
start_date = '2023-01-01'
end_date = '2023-12-31'

site_list = get_param_sites(param_codes_str, state_code, start_date, end_date)
print(site_list.head())

42 sites found for param codes
    site_no                              station_nm   dec_lat_va   dec_long_va
0  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA   37.3472151  -120.9761777
1  11336600     DELTA CROSS CHANNEL NR WALNUT GROVE   38.2447222  -121.5052778
2  11447650              SACRAMENTO R A FREEPORT CA  38.45566389  -121.5016167
3  11447890  SACRAMENTO R AB DELTA CROSS CHANNEL CA  38.25769218  -121.5182865
4  11448750              SF SCOTTS C NR LAKEPORT CA  39.04027778  -122.9833056


### 2. Retrieving granules based on site locations
Now that we have the site list, we can use the site coordinates to search for matching granules.
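As a rough illustration of how a site coordinate can be turned into a spatial search area, the sketch below builds a small bounding box around a site. This is not taken from `retrieval_utils`; the helper name `site_bounding_box` and the `buffer_deg` value are hypothetical, and `get_all_site_granules` may search by point or polygon instead.

```python
def site_bounding_box(lon, lat, buffer_deg=0.01):
    """Return (west, south, east, north) around a site.

    Hypothetical helper: buffer_deg is an assumed half-width in degrees,
    not a value used by the notebook's own functions.
    """
    # Coordinates may arrive as strings from a DataFrame, so cast first.
    lon, lat = float(lon), float(lat)
    return (lon - buffer_deg, lat - buffer_deg, lon + buffer_deg, lat + buffer_deg)


# Coordinates of site 11273400 from the site list above.
bbox = site_bounding_box("-120.9761777", "37.3472151")
print(bbox)
```

The tuple order (west, south, east, north) follows the lower-left, upper-right convention commonly used for bounding-box searches.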

Next, set up the granule search and call the function.

In [3]:
start_date_dt = dt.datetime.strptime(start_date, '%Y-%m-%d')
end_date_dt = dt.datetime.strptime(end_date, '%Y-%m-%d')
dt_format = '%Y-%m-%dT%H:%M:%SZ'
temporal_str = start_date_dt.strftime(dt_format) + ',' + end_date_dt.strftime(dt_format)


site_granules = get_all_site_granules(site_list.head(), temporal_str)
df_granules = pd.DataFrame(site_granules)
print(df_granules.head())

Processing sites: 100%|███████████████████████████| 5/5 [00:04<00:00,  1.11it/s]

    site_no                              station_nm    site_lat      site_lon  \
0  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
1  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
2  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
3  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
4  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   

                                        granule_urls                  datetime  
0  [https://data.lpdaac.earthdatacloud.nasa.gov/l...  2023-06-04T19:37:03.000Z  
1  [https://data.lpdaac.earthdatacloud.nasa.gov/l...  2023-08-14T22:35:05.000Z  
2  [https://data.lpdaac.earthdatacloud.nasa.gov/l...  2023-08-18T21:00:31.000Z  
3  [https://data.lpdaac.earthdatacloud.nasa.gov/l...  2023-08-22T19:25:31.000Z  
4  [https://data.lpdaac.earthdatacloud.nasa.gov/l...  2023-08-22T19:25:43.000Z  
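The `datetime` column above holds UTC timestamps as strings ending in `Z`. Before comparing them with USGS measurement times, they need to become timezone-aware datetimes. A minimal sketch using only the standard library follows; the helper name `parse_granule_time` is hypothetical, and the notebook's own functions may parse these values differently (for example with pandas).

```python
from datetime import datetime, timezone


def parse_granule_time(stamp):
    """Parse a string like '2023-06-04T19:37:03.000Z' into an aware UTC datetime."""
    # The trailing 'Z' denotes UTC; strptime matches it literally, so attach tzinfo explicitly.
    naive = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


parsed = parse_granule_time("2023-06-04T19:37:03.000Z")
print(parsed.isoformat())
```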





### 3. Collecting and matching data based on granule times

Next, we can use the granule times and site locations to collect and match the USGS data.
For each granule, the function selects the USGS measurement whose timestamp is closest to the granule time, within the matching time window.
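The closest-time matching described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of `match_granules`: the function name `closest_within_window`, the 30-minute window, and the observation values are all invented for the example.

```python
from datetime import datetime, timedelta, timezone


def closest_within_window(granule_time, observations, window):
    """Return the (time, value) pair nearest to granule_time.

    Returns None when no observation lies within +/- window.
    """
    candidates = [
        (abs(obs_time - granule_time), obs_time, value)
        for obs_time, value in observations
        if abs(obs_time - granule_time) <= window
    ]
    if not candidates:
        return None
    # Pick the candidate with the smallest absolute time difference.
    _, obs_time, value = min(candidates, key=lambda c: c[0])
    return obs_time, value


granule_time = datetime(2023, 6, 4, 19, 37, 3, tzinfo=timezone.utc)
# Made-up observations at 15-minute intervals, for illustration only.
observations = [
    (datetime(2023, 6, 4, 19, 15, tzinfo=timezone.utc), 1.0),
    (datetime(2023, 6, 4, 19, 30, tzinfo=timezone.utc), 2.0),
    (datetime(2023, 6, 4, 19, 45, tzinfo=timezone.utc), 3.0),
]
match = closest_within_window(granule_time, observations, timedelta(minutes=30))
print(match)
```

Returning `None` when nothing falls inside the window keeps unmatched granules distinguishable from matched ones.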

Call the function and, optionally, save the results to a CSV file.

In [4]:
results = match_granules(df_granules.head(), param_codes)
print(results.head())
#results.to_csv('results.csv', index=False)

Processing granules: 100%|████████████████████████| 5/5 [00:02<00:00,  2.03it/s]

    site_no                              station_nm    site_lat      site_lon  \
0  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
1  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
2  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
3  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   
4  11273400  SAN JOAQUIN R AB MERCED R NR NEWMAN CA  37.3472151  -120.9761777   

               granule_time  \
0 2023-06-04 19:37:03+00:00   
1 2023-08-14 22:35:05+00:00   
2 2023-08-18 21:00:31+00:00   
3 2023-08-22 19:25:31+00:00   
4 2023-08-22 19:25:43+00:00   

                                        granule_urls result result_unit  \
0  [https://data.lpdaac.earthdatacloud.nasa.gov/l...   2.43         RFU   
1  [https://data.lpdaac.earthdatacloud.nasa.gov/l...   8.93         RFU   
2  [https://data.lpdaac.earthdatacloud.nasa.gov/l...   9.92         RFU   
3  [https://data.lpdaac.ea


