# Load data from NASA Earthdata 

## Create the `.netrc` file
Create it in your home directory, which here is `/home/jovyan`.

```text
machine urs.earthdata.nasa.gov
login <USERNAME>
password <PASSWORD>
```

Replace `<USERNAME>` and `<PASSWORD>` with your actual Earthdata Login username and password.
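The same file can also be written from Python. The sketch below uses only the standard library. The helper name `write_netrc` and its arguments are hypothetical and are not taken from the hackweek tutorial. It restricts the file to owner read/write because it holds a plain-text password.

```python
from pathlib import Path


def write_netrc(path, login, password, machine="urs.earthdata.nasa.gov"):
    """Write a one-entry netrc file and make it readable by the owner only."""
    path = Path(path).expanduser()
    path.write_text(f"machine {machine}\nlogin {login}\npassword {password}\n")
    path.chmod(0o600)  # the file holds a plain-text password
    return path


# Example (placeholder values):
# write_netrc("~/.netrc", "<USERNAME>", "<PASSWORD>")
```

Note that this overwrites any existing file at that path, so it is only suitable when `.netrc` has no other entries you want to keep.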

## Then add `.netrc` to your `.gitignore` file

You don't really have to do this since `.netrc` wasn't made in the repo, but just for good measure I added it to my `.gitignore` file.

## Test that you can connect to Earthdata

You can also go to the [hackweek tutorial](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/04_NASA_Earthdata_Authentication.html) to see how to create the file with code, but I just created it manually. Run this code to make sure your `.netrc` is working. It should return your username.

In [2]:
from netrc import netrc
import os
# Determine if netrc file exists, and if so, if it includes NASA Earthdata Login Credentials
urs = 'urs.earthdata.nasa.gov'    # Earthdata URL endpoint for authentication
netrc_name = ".netrc"
netrcDir = os.path.expanduser(f"~/{netrc_name}")
netrc(netrcDir).authenticators(urs)[0]

'eeholmes'
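The one-liner above raises an error if the file is missing, and fails with a `TypeError` if the file exists but has no Earthdata entry. A slightly more forgiving check might look like the sketch below. The function name `earthdata_login` is hypothetical; it returns `None` instead of raising in those two cases.

```python
from netrc import netrc
from pathlib import Path


def earthdata_login(netrc_path="~/.netrc", host="urs.earthdata.nasa.gov"):
    """Return the login stored for `host`, or None if the file or entry is missing."""
    path = Path(netrc_path).expanduser()
    if not path.is_file():
        return None
    entry = netrc(str(path)).authenticators(host)  # None when the host is absent
    return entry[0] if entry else None
```

Calling `earthdata_login()` with no arguments checks the same `~/.netrc` location used above.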

## Get the data
The data I'll use is the [GHRSST Level 4 AVHRR_OI Global Blended Sea Surface Temperature Analysis (GDS2) from NCEI](https://search.earthdata.nasa.gov/search/granules/collection-details?p=C2036881712-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&ff=Available%20from%20AWS%20Cloud&fs10=Sea%20Surface%20Temperature&fsm0=Ocean%20Temperature&fst0=Oceans&m=60.46875!-145.0546875!2!1!0!0%2C2). 


"https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-public/AVHRR_OI-NCEI-L4-GLOB-v2.1/20160101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc.md5"

## Get the url for downloading
I'll follow the instructions in the [harmonizing tutorial](), although this seems like a convoluted way to get a simple url. It seems like the Earthdata search page ought to show me that url.

First we need to import some more packages.

In [3]:
import requests
from pprint import pprint
from pathlib import Path

import s3fs

import xarray as xr

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

### Get the name of the data on Earthdata
I am going to guess that it is the name in the box in the upper left on the Earthdata page for the OI data. That seems likely.

In [7]:
data_name = 'AVHRR_OI-NCEI-L4-GLOB-v2.1'

In [9]:
cmr_search_url = 'https://cmr.earthdata.nasa.gov/search'
cmr_collection_url = f'{cmr_search_url}/{"collections"}'
response = requests.get(cmr_collection_url, 
                        params={
                            'short_name': data_name,
                            'cloud_hosted': 'True',
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
response = response.json()
collections = response['feed']['entry']

for collection in collections:
    print(f'{collection["id"]} {"version:"}{collection["version_id"]}')

C2036881712-POCLOUD version:2.1


We want to save this for later.

In [14]:
data_concept_id = collections[0]["id"]

### Set the time and bounding box
The Earthdata page doesn't tell me what the coordinate system is, but the [documentation](https://cmr.earthdata.nasa.gov/search/concepts/C2036881712-POCLOUD.html?token=eyJ0eXAiOiJKV1QiLCJvcmlnaW4iOiJFYXJ0aGRhdGEgTG9naW4iLCJhbGciOiJIUzI1NiJ9.eyJ0eXBlIjoiT0F1dGgiLCJ1aWQiOiJlZWhvbG1lcyIsImNsaWVudF9pZCI6Ik9McEFabEU0SHFJT01yMFRZcWc3VVEiLCJleHAiOjE2Mzk1OTUyNDksImlhdCI6MTYzNzAwMzI0OSwiaXNzIjoiRWFydGhkYXRhIExvZ2luIn0.YxkHDRAO7nsi0M4zKkqGbkXkC3c6lK0hCUjTRmYwG04:OLpAZlE4HqIOMr0TYqg7UQ) shows that it is on a 0.25 degree lat/lon grid with longitude 0 at the prime meridian (longitudes run from -180 to 180). Some ocean datasets use a shifted longitude convention, such as 0 to 360.

In [11]:
# Bounding Box spatial parameter in decimal degree 'W,S,E,N' format.
bounding_box = '-125,21,-105,32'

# Each date in yyyy-MM-ddTHH:mm:ssZ format; date range in start,end format
temporal = '2019-06-22T00:00:00Z,2019-06-22T23:59:59Z'
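Because the `W,S,E,N` order is easy to get wrong, a small sanity check can help before sending the query. The sketch below is a hypothetical helper, not part of the CMR API. It assumes longitudes in the -180 to 180 range and only flags a box whose west edge is greater than its east edge, since, as far as I know, such a box is read as wrapping across the antimeridian rather than rejected.

```python
def parse_bounding_box(text):
    """Split a 'W,S,E,N' string into floats and sanity-check the values."""
    west, south, east, north = (float(part) for part in text.split(","))
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= S <= N <= 90")
    # west > east only makes sense for a box that wraps across the antimeridian
    wraps = west > east
    return (west, south, east, north), wraps
```

If `wraps` comes back `True` for a box that was meant to cover a small region, the west and east values are probably swapped.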

In [20]:
granule_url = f'{cmr_search_url}/{"granules"}'
response = requests.get(granule_url, 
                        params={
                            'concept_id': data_concept_id,
                            'temporal': temporal,
                            'bounding_box': bounding_box,
                            'page_size': 200,
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
granules = response.json()['feed']['entry']

for granule in granules:
    print(granule['boxes'])
    print(granule['links'][0]['href'])

['-89.875 -179.875 89.875 179.875']
s3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20190621120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc
['-89.875 -179.875 89.875 179.875']
s3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20190622120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc


The bounding boxes are the same because each granule is global; the files differ only by date (20190621 and 20190622). I'll just grab the first one.

In [21]:
box = granules[0]['boxes']
url = granules[0]['links'][0]['href']
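Taking `links[0]` works here because the s3 link happens to come first. A slightly more defensive sketch, assuming only that each link is a dictionary with an `href` key as in the output above, picks the first link that starts with `s3://`. The function name `first_s3_link` is hypothetical.

```python
def first_s3_link(granule):
    """Return the first href in a granule's links that starts with 's3://', else None."""
    for link in granule.get("links", []):
        href = link.get("href", "")
        if href.startswith("s3://"):
            return href
    return None
```

With the search results above, `first_s3_link(granules[0])` should give the same value as `granules[0]['links'][0]['href']`.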

In [29]:
url

's3://podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20190621120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc'

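Hmm, I expected the url to have the bounding box in it somewhere, but it doesn't. The CMR search only uses the bounding box to find granules that intersect it, and each granule here is a whole global file, so any subsetting has to happen after the file is opened.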

## Follow the s3 tutorial to work in cloud
[This tutorial](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/05_Data_Access_Direct_S3.html)


%matplotlib inline
import matplotlib.pyplot as plt
from datetime import datetime
import os
import subprocess
import requests
import boto3
import pandas as pd
import numpy as np
import xarray as xr
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
import rioxarray
import geopandas
import pyproj
from pyproj import Proj
from shapely.ops import transform
import geoviews as gv
from cartopy import crs
import hvplot.xarray
import holoviews as hv
gv.extension('bokeh', 'matplotlib')

Get s3 credentials

In [90]:
s3_cred_endpoint = 'https://archive.podaac.earthdata.nasa.gov/s3credentials'
def get_temp_creds():
    temp_creds_url = s3_cred_endpoint
    return requests.get(temp_creds_url).json()
temp_creds_req = get_temp_creds()

Following the code in the tutorial, create a boto3 Session object using your temporary credentials. This Session is used to pass credentials and configuration to AWS so we can interact with S3 objects from applicable buckets.

In [28]:
session = boto3.Session(aws_access_key_id=temp_creds_req['accessKeyId'], 
                        aws_secret_access_key=temp_creds_req['secretAccessKey'],
                        aws_session_token=temp_creds_req['sessionToken'],
                        region_name='us-west-2')

Following the tutorial code [here](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/05_Data_Access_Direct_S3.html#read-in-a-single-hls-file), I'll access the SST raster.

In [31]:
rio_env = rio.Env(AWSSession(session),
                  GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'))
rio_env.__enter__()

<rasterio.env.Env at 0x7fb8a656f700>

In [93]:
hls_da = rioxarray.open_rasterio(url, chunks=True)

  s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)


## Plot the data
Lots of problems. Why is land -30000 or so? Why is the temperature so odd? It should be about 280 Kelvin in the ocean, not 1200.

In [95]:
hls_da[2].analysed_sst.hvplot.image(x='x', y='y', rasterize=True, colorbar=True, xlim=[-140, -120], ylim=[42,50])
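One likely explanation is that the values are still in their packed integer form. NetCDF variables are often stored as 16-bit integers with `scale_factor`, `add_offset` and `_FillValue` attributes, and `rioxarray.open_rasterio` does not apply them unless `mask_and_scale=True` is passed. The sketch below shows the arithmetic. The three constants are assumptions for illustration, so check the attributes in the file itself before relying on them.

```python
import math

# Assumed packing attributes; check the file's own metadata before relying on them.
SCALE_FACTOR = 0.01
ADD_OFFSET = 273.15
FILL_VALUE = -32768


def decode_sst(raw, scale_factor=SCALE_FACTOR, add_offset=ADD_OFFSET, fill_value=FILL_VALUE):
    """Convert one packed integer to Kelvin; the fill value becomes NaN."""
    if raw == fill_value:
        return math.nan
    return raw * scale_factor + add_offset
```

Under these assumptions a raw ocean value of 1200 decodes to about 285 Kelvin, and the large negative value over land is simply the fill value.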


## Try another way
This didn't work. It worked once and then I ran `f.close()` and it stopped working.

In [91]:
s3_fs = s3fs.S3FileSystem(
    key=temp_creds_req['accessKeyId'],
    secret=temp_creds_req['secretAccessKey'],
    token=temp_creds_req['sessionToken'],
    client_kwargs={'region_name':'us-west-2'},
)


In [92]:
f = s3_fs.open(url, mode='rb')
ds = xr.open_dataset(f)

ValueError: I/O operation on closed file.

In [86]:
# This never worked. Didn't show the plot.
ds.analysed_sst.plot() ;
f.close()

ValueError: I/O operation on closed file.
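A possible cause of this error is that `xr.open_dataset` reads lazily: it keeps the file handle and only pulls data when something like `.plot()` asks for it. Once `f.close()` has run, any later read fails. The toy example below uses only the standard library to reproduce the same error message. `LazyReader` is a hypothetical stand-in for the dataset and is not xarray code.

```python
import io


class LazyReader:
    """Toy stand-in for a lazily loaded dataset: bytes are read only on demand."""

    def __init__(self, fileobj):
        self._f = fileobj

    def read_all(self):
        self._f.seek(0)
        return self._f.read()


# Closing the handle first, then reading, reproduces the error.
f = io.BytesIO(b"sst bytes")
lazy = LazyReader(f)
f.close()
try:
    lazy.read_all()
    error_message = None
except ValueError as err:
    error_message = str(err)

# Reading inside the `with` block, before the handle closes, works.
with io.BytesIO(b"sst bytes") as g:
    data = LazyReader(g).read_all()
```

With xarray, the analogous pattern would be to open the file in a `with s3_fs.open(url, mode='rb') as f:` block and call `xr.open_dataset(f).load()` inside it, so the data is in memory before the handle closes. I have not tested that against this bucket.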