# Load data from NASA Earthdata 

## Create the `.netrc` file
Create the file at the base level of your home directory (`/home/jovyan` here).

```text
machine urs.earthdata.nasa.gov
login <USERNAME>
password <PASSWORD>
```

Replace `<USERNAME>` and `<PASSWORD>` with your actual Earthdata Login username and password.
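
If you would rather create the file with code, here is a minimal sketch. The helper name `write_netrc` is my own (it is not from the hackweek tutorial), and note that it overwrites whatever is already at the target path. It also restricts the file to the owner, since some tools refuse a `.netrc` that other users can read.

```python
import os
from pathlib import Path


def write_netrc(path, username, password, machine="urs.earthdata.nasa.gov"):
    """Write a single-machine .netrc entry and restrict it to the owner.

    `write_netrc` is a hypothetical helper; it replaces any existing file at `path`.
    """
    path = Path(path)
    path.write_text(f"machine {machine}\nlogin {username}\npassword {password}\n")
    # Owner read/write only.
    os.chmod(path, 0o600)
    return path


# Example call (not run here, because it would overwrite your real file):
# write_netrc(Path.home() / ".netrc", "<USERNAME>", "<PASSWORD>")
```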

## Then add `.netrc` to your `.gitignore` file

You don't strictly need to do this, since `.netrc` was not created inside the repo, but for good measure I added it to my `.gitignore` file.

## Test that you can connect to Earthdata

You can also go to the [hackweek tutorial](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/04_NASA_Earthdata_Authentication.html) to see how to create the file with code, but I just created it manually. Run this code to make sure your `.netrc` is working. It should return your username.

In [12]:
from netrc import netrc
import os
# Determine if netrc file exists, and if so, if it includes NASA Earthdata Login Credentials
urs = 'urs.earthdata.nasa.gov'    # Earthdata URL endpoint for authentication
netrc_name = ".netrc"
netrcDir = os.path.expanduser(f"~/{netrc_name}")
netrc(netrcDir).authenticators(urs)[0]

'afriesz'
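
The one-liner above raises an error if the file is missing or has no entry for the Earthdata host. A slightly more defensive version might look like the sketch below. The function name `earthdata_login` is my own, and it returns `None` instead of raising.

```python
from netrc import netrc, NetrcParseError
from pathlib import Path


def earthdata_login(netrc_path=None, host="urs.earthdata.nasa.gov"):
    """Return the login stored for `host`, or None if it cannot be found.

    `earthdata_login` is a hypothetical helper, not part of any library.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.is_file():
        return None
    try:
        entry = netrc(str(path)).authenticators(host)
    except NetrcParseError:
        return None
    # authenticators() returns a (login, account, password) tuple, or None.
    return entry[0] if entry else None
```

Calling `earthdata_login()` with no arguments checks `~/.netrc`, the same file as the cell above.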

## Get the data
The data I'll use is the [GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI](https://search.earthdata.nasa.gov/search/granules/collection-details?p=C2036881712-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&ff=Available%20from%20AWS%20Cloud&fs10=Sea%20Surface%20Temperature&fsm0=Ocean%20Temperature&fst0=Oceans&m=60.46875!-145.0546875!2!1!0!0%2C2). 


"https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-public/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc.md5"

## Get the url for downloading
I'll follow the instructions in the [harmonizing tutorial](), although this seems like a convoluted way to get a simple url. It seems like the Earthdata search page ought to show me that url.

First we need to import some more packages.

In [13]:
import requests
from pprint import pprint
from pathlib import Path

import s3fs

import xarray as xr

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

### Get the name of the data on Earthdata
I am going to guess that it is the name in the box in the upper left on the Earthdata page for the OI data. That seems likely.

In [14]:
data_name = 'AVHRR_OI-NCEI-L4-GLOB-v2.1'

In [15]:
cmr_search_url = 'https://cmr.earthdata.nasa.gov/search'
cmr_collection_url = f'{cmr_search_url}/collections'
response = requests.get(cmr_collection_url, 
                        params={
                            'short_name': data_name,
                            'cloud_hosted': 'True',
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
response = response.json()
collections = response['feed']['entry']

for collection in collections:
    print(f'{collection["id"]} version:{collection["version_id"]}')

C2036881712-POCLOUD version:2.1


We want to save this for later.

In [16]:
data_concept_id = collections[0]["id"]

### Set the time and bounding box
The Earthdata page doesn't tell me what the coordinate system is, but the [documentation](https://cmr.earthdata.nasa.gov/search/concepts/C2036881712-POCLOUD.html?token=eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJhbGciOiJIUzI1NiJ9.eyJ0eXBlIjoiT0F1dGgiLCJ1aWQiOiJlZWhvbG1lcyIsImNsaWVudF9pZCI6Ik9McEFabEU0SHFJT01yMFRZcWc3VVEiLCJleHAiOjE2Mzk1OTUyNDksImlhdCI6MTYzNzAwMzI0OSwiaXNzIjoiRWFydGhkYXRhIExvZ2luIn0.YxkHDRAO7nsi0M4zKkqGbkXkC3c6lK0hCUjTRmYwG04:OLpAZlE4HqIOMr0TYqg7UQ) shows that it is on a 0.25 degree lat/lon grid with longitude 0 at the prime meridian (Greenwich). Some ocean datasets are shifted so that longitude 0 is not at the prime meridian.

In [17]:
# Bounding box spatial parameter in decimal degrees, 'W,S,E,N' format.
# The western edge (-125) comes first, then the eastern edge (-105).
bounding_box = '-125,21,-105,32'

# Each date in yyyy-MM-ddTHH:mm:ssZ format; date range in start,end format
temporal = '1980-01-01T00:00:00Z,2021-12-31T23:59:59Z'

In [18]:
granule_url = f'{cmr_search_url}/granules'
response = requests.get(granule_url, 
                        params={
                            'concept_id': data_concept_id,
                            'temporal': temporal
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
granules = response.json()['feed']['entry']

#for granule in granules:
#    print(granule['links'][0]['href'])
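
Note that the search above only passes `concept_id` and `temporal`, so the bounding box is defined but never sent. Here is a sketch of how the parameters could be assembled if you want to include it. The helper name `granule_search_params` is my own. `bounding_box` and `page_size` are CMR granule search parameters. Since this is a global product, I would not expect the bounding box to narrow the results much here.

```python
def granule_search_params(concept_id, temporal, bounding_box=None, page_size=10):
    """Assemble query parameters for a CMR granule search.

    `granule_search_params` is a hypothetical helper for illustration.
    """
    params = {
        "concept_id": concept_id,
        "temporal": temporal,
        "page_size": page_size,  # number of granules returned per request
    }
    if bounding_box is not None:
        # 'W,S,E,N' in decimal degrees, western longitude first
        params["bounding_box"] = bounding_box
    return params


params = granule_search_params(
    "C2036881712-POCLOUD",
    "1980-01-01T00:00:00Z,2021-12-31T23:59:59Z",
    bounding_box="-125,21,-105,32",
    page_size=100,
)
# The dictionary can then be passed to the same request as above:
# requests.get(granule_url, params=params, headers={'Accept': 'application/json'})
```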

I'll just grab the first one.

In [19]:
url = granules[0]['links'][0]['href']

url

's3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc'
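
Taking `['links'][0]` works here because the first link happens to be the `s3://` one. If the order of the links ever differs, a small helper that looks for the `s3://` prefix would be safer. This is only a sketch: `first_s3_href` is a name I made up, and the example granule is a hypothetical stand-in shaped like the CMR JSON entries above.

```python
def first_s3_href(granule):
    """Return the first link in a granule entry whose href starts with s3://."""
    for link in granule.get("links", []):
        href = link.get("href", "")
        if href.startswith("s3://"):
            return href
    return None


# Hypothetical granule entry, only for illustration
example_granule = {
    "links": [
        {"href": "https://example.com/file.nc"},
        {"href": "s3://example-bucket/file.nc"},
    ]
}
s3_href = first_s3_href(example_granule)
```

With the real search results this would be `first_s3_href(granules[0])`.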

## Direct Access

The code below is modified from this example: https://github.com/NASA-Openscapes/2021-Cloud-Hackathon/blob/main/tutorials/Additional_Resources__Data_Access__Direct_S3_Access__PODAAC_ECCO_SSH.ipynb

In [20]:
s3_cred_endpoint = {
    'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
    'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'
}

In [21]:
def get_temp_creds():
    # requests picks up the Earthdata login from ~/.netrc automatically
    temp_creds_url = s3_cred_endpoint['podaac']
    return requests.get(temp_creds_url).json()

In [22]:
temp_creds_req = get_temp_creds()

In [23]:
fs_s3 = s3fs.S3FileSystem(anon=False, 
                          key=temp_creds_req['accessKeyId'], 
                          secret=temp_creds_req['secretAccessKey'], 
                          token=temp_creds_req['sessionToken'])
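
Because these are temporary credentials, they will need to be requested again once they expire, and the file system has to be rebuilt each time. A small mapping function keeps that step in one place and fails with a clearer message if the response is missing a field (for example, if the login did not work). The name `s3fs_kwargs` is my own; this is a sketch, not part of `s3fs`.

```python
def s3fs_kwargs(creds):
    """Map an s3credentials response onto s3fs.S3FileSystem keyword arguments.

    `s3fs_kwargs` is a hypothetical helper for illustration.
    """
    required = ("accessKeyId", "secretAccessKey", "sessionToken")
    missing = [name for name in required if name not in creds]
    if missing:
        raise KeyError(f"credentials response is missing: {', '.join(missing)}")
    return {
        "anon": False,
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Usage with the objects defined above:
# fs_s3 = s3fs.S3FileSystem(**s3fs_kwargs(get_temp_creds()))
```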

Open the file using the `url` object from above (i.e., `s3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc`).

In [24]:
s3_file_obj = fs_s3.open(url, mode='rb')

In [25]:
ds = xr.open_dataset(s3_file_obj, engine='h5netcdf')
ds

In [26]:
import hvplot.xarray

In [27]:
ds.analysed_sst.hvplot(x='lon', y='lat')

Read in sample coordinates

In [28]:
import pandas as pd

In [29]:
df = pd.read_csv('../data/sample_point_pairs.csv')

In [31]:
df.head()

| Unnamed: 0 | x.km.ns | y.km.ns | x.km.os | y.km.os | lon.ns | lat.ns | lon.os | lat.os |
|---|---|---|---|---|---|---|---|---|
| 0 | -14625.94736 | 6966.576982 | -14740.807657 | 7205.378454 | -164.839138 | 54.893236 | -169.350851 | 56.635443 |
| 1 | -14529.116024 | 6991.414556 | -14616.450302 | 7254.117459 | -164.108622 | 55.201356 | -168.675149 | 57.173228 |
| 2 | -14430.943555 | 7010.274581 | -14529.1362 | 7283.422068 | -163.283984 | 55.463787 | -168.130569 | 57.512547 |
| 3 | -14334.933016 | 7038.029346 | -14468.846933 | 7308.501786 | -162.597992 | 55.794074 | -167.825311 | 57.785663 |
| 4 | -14239.63822 | 7066.884886 | -14430.378431 | 7331.087327 | -161.933495 | 56.132499 | -167.727642 | 58.012647 |


Extract a single point from `ds`

In [36]:
lon, lat = df['lon.ns'][0], df['lat.ns'][0]

In [37]:
lon

-164.839138084371

In [44]:
ds.analysed_sst.sel(lat=lat, lon=lon, method='nearest').values[0]

278.32
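
The value looks like kelvin, which is the unit GHRSST uses for `analysed_sst` (you can confirm with `ds.analysed_sst.attrs['units']`). A quick conversion to degrees Celsius, as a sketch with a helper name of my own:

```python
KELVIN_OFFSET = 273.15


def kelvin_to_celsius(kelvin):
    """Convert a temperature (scalar or array) from kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


sst_celsius = round(kelvin_to_celsius(278.32), 2)  # 5.17
```

The same function works on the whole `ds.analysed_sst` array, since the subtraction is applied elementwise.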

Extract a point from `ds` for each row and append it to `df`

In [47]:
ds.analysed_sst

Create a function that extracts the SST value from `ds` at the coordinates in a row of `df`

In [57]:
def coord_sel(row):
    return ds.analysed_sst.sel(lat=row['lat.ns'], lon=row['lon.ns'], method='nearest').values[0]

In [59]:
df['sst'] = df.apply(coord_sel, axis=1)
df.head()

| Unnamed: 0 | x.km.ns | y.km.ns | x.km.os | y.km.os | lon.ns | lat.ns | lon.os | lat.os | sst |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -14625.94736 | 6966.576982 | -14740.807657 | 7205.378454 | -164.839138 | 54.893236 | -169.350851 | 56.635443 | 278.320007 |
| 1 | -14529.116024 | 6991.414556 | -14616.450302 | 7254.117459 | -164.108622 | 55.201356 | -168.675149 | 57.173228 | 278.009979 |
| 2 | -14430.943555 | 7010.274581 | -14529.1362 | 7283.422068 | -163.283984 | 55.463787 | -168.130569 | 57.512547 | 277.699982 |
| 3 | -14334.933016 | 7038.029346 | -14468.846933 | 7308.501786 | -162.597992 | 55.794074 | -167.825311 | 57.785663 | 278.009979 |
| 4 | -14239.63822 | 7066.884886 | -14430.378431 | 7331.087327 | -161.933495 | 56.132499 | -167.727642 | 58.012647 | 278.029999 |
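
Calling `.sel()` once per row is fine for a handful of points, but with many rows it may be faster to select all points in one call. `xarray` does pointwise selection when the indexers are `DataArray` objects that share a dimension. The sketch below uses a small synthetic grid as a stand-in for `ds.analysed_sst`, so the SST values are random and not real data. The dimension name `points` is my own choice.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds.analysed_sst: one time step on a 0.25 degree grid.
lat = np.arange(50.0, 60.0, 0.25)
lon = np.arange(-170.0, -160.0, 0.25)
rng = np.random.default_rng(0)
sst = xr.DataArray(
    rng.uniform(271.0, 285.0, size=(1, lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="analysed_sst",
)

# Two of the sample coordinates from the table above
points = pd.DataFrame(
    {"lon.ns": [-164.839138, -164.108622], "lat.ns": [54.893236, 55.201356]}
)

# Indexers that share the "points" dimension select one value per row,
# instead of the full lat x lon cross product.
lat_idx = xr.DataArray(points["lat.ns"].to_numpy(), dims="points")
lon_idx = xr.DataArray(points["lon.ns"].to_numpy(), dims="points")
picked = sst.isel(time=0).sel(lat=lat_idx, lon=lon_idx, method="nearest")

points["sst"] = picked.values
```

With the real data, `sst` would be `ds.analysed_sst` and `points` would be `df`.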


## Assemble the data into a 3D xarray