## Downloading ERA5-Land hourly data at a single location (coordinate) using GEE
This notebook demonstrates how to sample raw ERA5-Land data from Google Earth Engine (GEE). Eventual improvements will tidy this data up for use in ELM.
One step here requires waiting for GEE's servers to work through the request, after which you'll need to move a file from your Google Drive to your local machine. There are no great ways to automate this, sorry.

In [4]:
import ee
import pandas as pd
from ngeegee import e5l_utils as eu
from pathlib import Path

# Initialize with your own GEE project name (mine won't work for you)
ee.Initialize(project='ee-jonschwenk')

# Define a point we want to sample
lat, lon = 68.62758, -149.59429

### Set up our request via a dictionary of parameters
The only tricky thing here is specifying the bands (variables) you want. You have two options: set `gee_bands` to `'all'` to fetch all 68 bands, or provide a `list` of band names. If you aren't sure which bands are available, you can run `eu.e5lh_bands()['band_name']`. Below, I'll select just a handful for demonstration.
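Since band names must match exactly, it can help to check a request before sending it off. The sketch below is not part of `ngeegee`: `resolve_bands` is a hypothetical helper, and the short `available` list is a stand-in for what `list(eu.e5lh_bands()['band_name'])` would presumably return.

```python
def resolve_bands(requested, available):
    """Return the bands to request, raising if any name is not available.

    `requested` is either the string "all" or a list of band names.
    """
    if requested == "all":
        return list(available)
    unknown = [band for band in requested if band not in available]
    if unknown:
        raise ValueError(f"Unknown band name(s): {unknown}")
    return list(requested)


# Stand-in for the real band list (assumption: only four bands shown here).
available = [
    "temperature_2m",
    "total_precipitation",
    "dewpoint_temperature_2m",
    "snow_albedo",
]

bands = resolve_bands(["temperature_2m", "total_precipitation"], available)
print(bands)
```

A misspelled name such as `"temperature2m"` would raise a `ValueError` here instead of failing later on the GEE side.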

In [None]:
params = {
    "start_date": "2000-01-01",
    # If the end date is later than the last available date, the request is
    # truncated there. The year 2100 ensures we download everything available.
    "end_date": "2100-01-01",
    # Bands (variables) to sample. Names must exactly match the band names of the
    # ECMWF/ERA5_LAND/HOURLY ImageCollection. Use 'all' to get every band.
    "gee_bands": [
        "temperature_2m",
        "total_precipitation",
        "dewpoint_temperature_2m",
        "snow_albedo",
    ],
    "point": (lon, lat),  # note the (lon, lat) order
    "gdrive_folder": "NGEE_test",  # Google Drive folder name; created if it doesn't exist
    "filename": "ngee_test_era5_timeseries",  # output CSV file name
}

# Send the job to GEE!
eu.sample_e5lh_at_point(params)

'Export task started: ngee_test_era5_timeseries_short (Check Google Drive or Task Status in the Javascript Editor for completion.)'

### Now we wait.
You've sent a Task to Google Earth Engine. You can check on its state in the [GEE JavaScript code editor](http://code.earthengine.google.com) by clicking the `Tasks` tab in the upper-right panel.
Eventually it will finish, and your CSV will show up where you told GEE to put it: `gdrive_folder/filename`.
It should not take more than a few hours to fetch all variables (68 total, I think) for all timesteps. In my testing, it took 2 hours to fetch two variables from 1950-2025. Fetching more variables likely won't take much longer, because the bottleneck is that GEE has to load each hourly ERA5-Land image.
Once that file is created in your Google Drive, download it to your local machine and hammer away!
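The task state can also be read from Python rather than the code editor. This is a minimal sketch, not something `ngeegee` provides: `latest_task_state` is a hypothetical helper, and it assumes the export task's description equals the `filename` parameter. The status dictionaries are hand-written so the example runs offline; with Earth Engine initialized they could be gathered with `[t.status() for t in ee.batch.Task.list()]`.

```python
def latest_task_state(statuses, description):
    """Return the state of the newest task with this description, or None."""
    matches = [s for s in statuses if s.get("description") == description]
    if not matches:
        return None
    newest = max(matches, key=lambda s: s.get("creation_timestamp_ms", 0))
    return newest.get("state")


# Hand-written stand-ins for the dictionaries returned by task.status().
statuses = [
    {"description": "ngee_test_era5_timeseries", "state": "FAILED",
     "creation_timestamp_ms": 1},
    {"description": "ngee_test_era5_timeseries", "state": "RUNNING",
     "creation_timestamp_ms": 2},
    {"description": "some_other_export", "state": "COMPLETED",
     "creation_timestamp_ms": 3},
]

state = latest_task_state(statuses, "ngee_test_era5_timeseries")
print(state)
```

Once the state reads `COMPLETED`, the file should be waiting in Google Drive; moving it to your local machine is still a manual step.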

## Some examination of the data
Two hours later, a `.csv` appeared in my Google Drive folder (`NGEE_test/ngee_test_era5_timeseries.csv`). I downloaded it and have included it in this repo just so we can inspect it a bit.

In [7]:
df = pd.read_csv(eu._DATA_DIR.parent / 'notebooks' / 'notebook_data' / 'ngee_test_era5_timeseries.csv')
print(df.columns)

Index(['system:index', 'date', 'temperature_2m', 'total_precipitation',
       '.geo'],
      dtype='object')


There are a couple of columns here we don't care about, namely `system:index` and `.geo`. We see that there is a `date` column, and then a column for each variable we requested.
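A minimal sketch of how the unwanted columns might be dropped and the dates parsed. The small frame below is hand-made to mimic the export's layout (the values are invented, not real ERA5-Land data), and the date format is assumed to be `YYYY-MM-DD HH:MM`.

```python
import pandas as pd

# Hand-made frame with the same columns as the exported CSV (values are made up).
raw = pd.DataFrame({
    "system:index": ["0", "1"],
    "date": ["2000-01-01 00:00", "2000-01-01 01:00"],
    "temperature_2m": [250.0, 250.5],
    "total_precipitation": [0.0, 0.0001],
    ".geo": ['{"type":"Point"}', '{"type":"Point"}'],
})

# Drop the GEE bookkeeping columns and index the rows by timestamp.
clean = raw.drop(columns=["system:index", ".geo"])
clean["date"] = pd.to_datetime(clean["date"], format="%Y-%m-%d %H:%M")
clean = clean.set_index("date").sort_index()

print(clean.columns.tolist())
```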
Let's just make sure the dates cover the range we expected, then call it a day.

In [9]:
print(min(df['date']))
print(max(df['date']))

2000-01-01 00:00
2023-01-04 23:00


It looks like the latest available date was Jan 04, 2023. I think I've gotten more recent data from ERA5-Land hourly through the cdsapi (directly from the source), so I'd want to double-check this. But everything looks good overall.
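Checking the first and last dates confirms the endpoints, but not that every hour in between is present. One way to look for gaps is sketched below; `missing_hours` is a hypothetical helper, and the three-row list is invented so that it contains a deliberate gap.

```python
import pandas as pd


def missing_hours(dates):
    """Return the hourly timestamps between the first and last date that are absent."""
    stamps = pd.DatetimeIndex(pd.to_datetime(list(dates), format="%Y-%m-%d %H:%M"))
    expected = pd.date_range(stamps.min(), stamps.max(), freq=pd.Timedelta(hours=1))
    return expected.difference(stamps)


# Invented example: the 02:00 timestep is missing on purpose.
dates = ["2000-01-01 00:00", "2000-01-01 01:00", "2000-01-01 03:00"]
gaps = missing_hours(dates)
print(gaps)
```

On the real file this would be called as `missing_hours(df['date'])`; an empty result means the hourly record is continuous.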