<a href="https://colab.research.google.com/github/Stp155906/Data-Science-For-Beginners/blob/main/DataSubSettinginEarthData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Subsetting and Transformation Services in the Cloud


---

> Using the Harmony-Py library to access customized data from NASA Earthdata


---

Objectives: 
1. Conceptualize the data transformation service types and offerings provided by NASA Earthdata, including Harmony.
2. Practice skills learned from the introductory CMR tutorial to discover what access and service options exist for a given data set, as well as variable metadata.
3. Utilize the Harmony-py library to request subsetted MODIS L2 Sea Surface Temperature data over the Gulf of Mexico.
4. Read Harmony subsetted outputs directly into xarray.



---



In [None]:
# harmony-py is the package that provides the `harmony` module imported below
!pip install -U harmony-py
!pip install requests
!pip install xarray
!pip install s3fs
# datetime and pprint ship with Python's standard library, so no install is needed



---



In [None]:
from harmony import BBox
from harmony import Client
from harmony import Collection
from harmony import Request 
from harmony import LinkType
from harmony.config import Environment
import requests
from pprint import pprint
import datetime as dt
import s3fs
import xarray as xr

In [None]:
# Let's use the CMR API skills we learned on Day 1 to inspect service metadata:
url = 'https://cmr.earthdata.nasa.gov/search'
# We want to search by collection to inspect the access and service options that exist:
collection_url = f'{url}/collections'

We are going to focus on MODIS_A-JPL-L2P-v2019.0: GHRSST Level 2P Global Sea Surface Skin Temperature from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA Aqua satellite (GDS2). Let’s first save this as a variable that we can use later on once we request data from Harmony.

In [None]:
short_name = 'MODIS_A-JPL-L2P-v2019.0'
concept_id = 'C1940473819-POCLOUD'

We will view the top-level metadata for this collection to see what additional service and variable metadata exist.



---



In [None]:
response = requests.get(collection_url, 
                        params={
                            'concept_id': concept_id,
                            },
                        headers={
                            'Accept': 'application/json'
                            }
                       )
response = response.json()

In [None]:
pprint(response)

{'feed': {'entry': [{'archive_center': 'NASA/JPL/PODAAC',
                     'associations': {'services': ['S1962070864-POCLOUD',
                                                   'S2004184019-POCLOUD',
                                                   'S2153799015-POCLOUD',
                                                   'S2227193226-POCLOUD'],
                                      'tools': ['TL2108419875-POCLOUD',
                                                'TL2092786348-POCLOUD'],
                                      'variables': ['V1997812737-POCLOUD',
                                                    'V1997812697-POCLOUD',
                                                    'V2112014688-POCLOUD',
                                                    'V1997812756-POCLOUD',
                                                    'V1997812688-POCLOUD',
                                                    'V1997812670-POCLOUD',
                                                  

# What does each of these service values mean?


---


Associations 

CMR is a large web of interconnected metadata “schemas”, including Collections, Granules, Services, Tools, and Variables. In this case, this collection is associated with four services, two tools, and several variables.

Tags

There are also tags that describe what service options exist at a high level. In this case, we see that this dataset supports reformatting as well as subsetting by space, time, and variable. Web applications like Earthdata Search use these tags to surface those customization options more readily.

Service Features

In this case, we see three separate “features” listed here: esi, Harmony, and OPeNDAP.
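To check the association counts described above without reading the whole printout, a small helper along these lines could be used. This is only a sketch: the function name `summarize_associations` is hypothetical, and it assumes the JSON layout shown in the collection response above. A stand-in dictionary is used here so the cell runs without a network call.

```python
def summarize_associations(cmr_response):
    """Count the associated records of each kind for the first collection entry."""
    entry = cmr_response["feed"]["entry"][0]
    associations = entry.get("associations", {})
    return {kind: len(ids) for kind, ids in associations.items()}


# Stand-in with the same shape as the CMR JSON response (IDs taken from the output above)
sample = {"feed": {"entry": [{"associations": {
    "services": ["S1962070864-POCLOUD", "S2004184019-POCLOUD"],
    "tools": ["TL2108419875-POCLOUD"],
    "variables": ["V1997812737-POCLOUD", "V1997812697-POCLOUD", "V2112014688-POCLOUD"],
}}]}}

print(summarize_associations(sample))
```

In the notebook, the same helper could be called as `summarize_associations(response)` on the real collection response.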


---


> We will dig into more details on what Harmony offers for this dataset.

> First, we need to isolate the services returned for this dataset:

In [None]:
services = response['feed']['entry'][0]['associations']['services']
print(services)

In [None]:
service_url = "https://cmr.earthdata.nasa.gov/search/services"

Inspect the first service returned. 

Now we’re going to search the services endpoint to view that individual service’s metadata, like we did with our dataset above. 

This time, we’re explicitly setting the format of the response to umm-json in the Accept Header in order to see detailed metadata about the service.

In [None]:
service_response = requests.get(service_url, 
                        params={
                            'concept_id': services[0],
                            },
                        headers={
                            'Accept': 'application/vnd.nasa.cmr.umm_results+json'
                            }
                       )
service_response = service_response.json()

Details about the service metadata record include the service options provided by the “backend” processor connected to Harmony, in this case the PODAAC Level 2 Cloud Subsetter:

In [None]:
pprint(service_response)

{'hits': 1,
 'items': [{'meta': {'concept-id': 'S1962070864-POCLOUD',
                     'concept-type': 'service',
                     'deleted': False,
                     'format': 'application/vnd.nasa.cmr.umm+json',
                     'native-id': 'POCLOUD_podaac_l2_cloud_subsetter',
                     'provider-id': 'POCLOUD',
                     'revision-date': '2022-05-31T23:46:37.054Z',
                     'revision-id': 25,
                     'user-id': 'chen5510'},
            'umm': {'AccessConstraints': 'None',
                    'Description': 'Endpoint for subsetting L2 Subsetter via '
                                   'Harmony',
                    'LongName': 'PODAAC Level 2 Cloud Subsetter',
                    'MetadataSpecification': {'Name': 'UMM-S',
                                              'URL': 'https://cdn.earthdata.nasa.gov/umm/service/v1.5.0',
                                              'Version': '1.5.0'},
                    'Name': 'P

# Discover all datasets that support Harmony services


---

> Instead of searching for services on a known dataset of interest, we may want to discover all available datasets that are supported for a given service. 

> We can utilize GraphQL, which is a way for us to efficiently gain information across service and collection metadata so that we can print out all supported Harmony datasets. 


> First, we need to specify a query string. Here we are asking to query all collections with service type “Harmony”, and to provide details on the service options attached to those services:

In [None]:
query = """query {
  collections(limit: 2000, serviceType: "Harmony") {
    count
    items {
      shortName
      conceptId
      services {
        count
        items {
          name
          supportedReformattings
          supportedInputProjections
          supportedOutputProjections
          serviceOptions
        }
      }
      variables {
        count
      }
    }
  }
}"""

This utilizes a different API endpoint to query CMR metadata using GraphQL. Here we set up another request, passing our query string above:

In [None]:
graphql_url = 'https://graphql.earthdata.nasa.gov/api'

graphql_response = requests.get(graphql_url,
                        params={"query": query},
                        headers={
                            'Accept': 'application/json',
                        }
                       )

A JSON response is returned that provides all collections with Harmony-supported services. We can then extract just the collection shortName, conceptId, and the service names supported for each collection:

In [None]:
collections = graphql_response.json()['data']['collections']['items']

# Print each collection's short name and concept ID, followed by its services
for collection in collections:
    print(collection['shortName'], ",", collection['conceptId'])
    for service in collection['services']['items']:
        print("Services:", service['name'])

CYGNSS_L2_CDR_V1.1 , C2205121485-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PODAAC L2 Cloud Subsetter
CYGNSS_L2_SURFACE_FLUX_CDR_V1.0 , C2205618975-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PODAAC L2 Cloud Subsetter
CYGNSS_L2_SURFACE_FLUX_CDR_V1.1 , C2205121520-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PODAAC L2 Cloud Subsetter
CYGNSS_L2_V3.0 , C2205620319-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PODAAC L2 Cloud Subsetter
CYGNSS_L2_V3.1 , C2183155461-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PODAAC L2 Cloud Subsetter
ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4 , C1990404801-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PO.DAAC harmony-netcdf-to-zarr
ECCO_L4_ATM_STATE_LLC0090GRID_DAILY_V4R4 , C1991543823-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PO.DAAC harmony-netcdf-to-zarr
ECCO_L4_ATM_STATE_05DEG_MONTHLY_V4R4 , C1990404814-POCLOUD
Services: PO.DAAC Cloud OPeNDAP
Services: PO.DAAC harmony-netcdf-to-zarr
ECCO_L4_ATM_STATE_LLC0090GRID_MONTHLY_V
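The listing above is organized by collection. To answer the opposite question (which collections does a given service support?), the same items can be regrouped by service name. The sketch below assumes the GraphQL item layout requested in the query above; `collections_by_service` is a hypothetical helper name, and the sample items are a two-entry excerpt of the printed output.

```python
from collections import defaultdict


def collections_by_service(items):
    """Map each service name to the short names of the collections that list it."""
    grouped = defaultdict(list)
    for collection in items:
        for service in collection["services"]["items"]:
            grouped[service["name"]].append(collection["shortName"])
    return dict(grouped)


# Two entries copied from the output above, in the shape returned by the query
sample_items = [
    {"shortName": "CYGNSS_L2_V3.1", "conceptId": "C2183155461-POCLOUD",
     "services": {"items": [{"name": "PO.DAAC Cloud OPeNDAP"},
                            {"name": "PODAAC L2 Cloud Subsetter"}]}},
    {"shortName": "ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4", "conceptId": "C1990404801-POCLOUD",
     "services": {"items": [{"name": "PO.DAAC Cloud OPeNDAP"},
                            {"name": "PO.DAAC harmony-netcdf-to-zarr"}]}},
]

for name, short_names in collections_by_service(sample_items).items():
    print(name, "->", short_names)
```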

Discover variable names

In [None]:
variables = response['feed']['entry'][0]['associations']['variables']
print(variables)

['V1997812737-POCLOUD', 'V1997812697-POCLOUD', 'V2112014688-POCLOUD', 'V1997812756-POCLOUD', 'V1997812688-POCLOUD', 'V1997812670-POCLOUD', 'V1997812724-POCLOUD', 'V2112014684-POCLOUD', 'V1997812701-POCLOUD', 'V1997812681-POCLOUD', 'V2112014686-POCLOUD', 'V1997812663-POCLOUD', 'V1997812676-POCLOUD', 'V1997812744-POCLOUD', 'V1997812714-POCLOUD']


Several variable records are returned. Again, like we did for services, we’ll search the variables endpoint to view an individual variable’s metadata, and we’ll print out the list of variables for our dataset.

In [None]:
var_url = "https://cmr.earthdata.nasa.gov/search/variables"

In [None]:
var_response = requests.get(var_url, 
                        params={
                            'concept_id': variables[0],
                            },
                        headers={
                            'Accept': 'application/vnd.nasa.cmr.umm_results+json'
                            }
                       )
var_response = var_response.json()

Printing the response shows the detailed metadata record for the first variable associated with our dataset:



---



In [None]:
pprint(var_response)

{'hits': 1,
 'items': [{'associations': {'collections': [{'concept-id': 'C1940473819-POCLOUD'}]},
            'meta': {'concept-id': 'V1997812737-POCLOUD',
                     'concept-type': 'variable',
                     'deleted': False,
                     'format': 'application/vnd.nasa.cmr.umm+json',
                     'native-id': 'MODIS_A-JPL-L2P-v2019.0-sses_standard_deviation_4um',
                     'provider-id': 'POCLOUD',
                     'revision-date': '2022-06-22T19:21:18.580Z',
                     'revision-id': 7,
                     'user-id': 'chen5510'},
            'umm': {'DataType': 'byte',
                    'Definition': 'mid-IR SST standard deviation error; non '
                                  'Some applications are unable to properly '
                                  'handle signed byte values. If values are '
                                  'encountered > 127, please subtract 256 from '
                                  'this reporte
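The variable definition above notes that some applications read signed bytes as unsigned, and that values greater than 127 should have 256 subtracted. As a worked example of that arithmetic, here is a minimal sketch; the function name `to_signed_byte` is hypothetical and not part of any library used in this tutorial.

```python
def to_signed_byte(value):
    """Reinterpret an unsigned byte (0 to 255) as a signed byte (-128 to 127)."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value in the range 0 to 255")
    # Values above 127 are negative numbers that were read as unsigned
    return value - 256 if value > 127 else value


for raw in (5, 127, 128, 200, 255):
    print(raw, "->", to_signed_byte(raw))
```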

Next, print out a simple list of all associated variable names by repeating the variable request we submitted above, this time once for each variable:

In [None]:
var_list = []
for variable_id in variables:
    var_response = requests.get(var_url, 
                            params={
                                'concept_id': variable_id,
                                },
                            headers={
                                'Accept': 'application/vnd.nasa.cmr.umm_results+json'
                                }
                           )
    var_response = var_response.json()
    # Keep only the variable name from each metadata record
    var_list.append(var_response['items'][0]['umm']['Name'])

pprint(var_list)
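The loop above indexes `['items'][0]` directly, which raises an `IndexError` if a search returns no items. A slightly more defensive way to pull names out of a `umm_results+json` response might look like the sketch below. The helper name `variable_names` is hypothetical, and the sample name is illustrative only (it is taken from the suffix of the native-id shown earlier, not from a confirmed `Name` field).

```python
def variable_names(umm_results):
    """Return the UMM 'Name' of every item in a umm_results+json response."""
    return [item["umm"]["Name"] for item in umm_results.get("items", [])]


# Stand-in response; the Name value here is an illustrative assumption
sample_response = {
    "hits": 1,
    "items": [{"meta": {"concept-id": "V1997812737-POCLOUD"},
               "umm": {"Name": "sses_standard_deviation_4um"}}],
}

print(variable_names(sample_response))
print(variable_names({"hits": 0, "items": []}))  # no IndexError on empty results
```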

# Create Harmony Client object


---



In [None]:
harmony_client = Client(auth=('*****', '****'))  # Earthdata Login username and password

In [None]:
request = Request(
    collection=Collection(id=short_name),
    spatial=BBox(-97.77667,21.20806,-83.05197,30.16605),
    temporal={
        'start': dt.datetime(2021, 8, 20),
        'stop': dt.datetime(2021, 8, 21),
    },
)

In [None]:
request.is_valid()

True
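To build intuition for what a "valid" spatial constraint looks like, the sketch below runs a few basic sanity checks on a bounding box given in the same west, south, east, north order used in the request above. This is not harmony-py's own validation logic; `bbox_problems` is a hypothetical helper written only for illustration.

```python
def bbox_problems(west, south, east, north):
    """Return a list of problems found; an empty list means the box looks sane."""
    problems = []
    for name, lon in (("west", west), ("east", east)):
        if not -180 <= lon <= 180:
            problems.append(f"{name} longitude {lon} is outside [-180, 180]")
    for name, lat in (("south", south), ("north", north)):
        if not -90 <= lat <= 90:
            problems.append(f"{name} latitude {lat} is outside [-90, 90]")
    if south > north:
        problems.append("south latitude is greater than north latitude")
    # west > east is not flagged: such a box can legitimately cross the antimeridian
    return problems


# The Gulf of Mexico box used in the request above
print(bbox_problems(-97.77667, 21.20806, -83.05197, 30.16605))
# A box with south and north swapped
print(bbox_problems(0, 50, 10, 40))
```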

Submit request

> Now that the request is created, we can submit it to Harmony using the Harmony Client object. A job ID is returned, which is a unique identifier that represents the submitted request.

In [None]:
job_id = harmony_client.submit(request)
job_id

'a417f181-677b-4a75-98e0-794c1e3dd188'

Check request status


We can check on the progress of a processing job with status(). This method blocks while communicating with the server but returns quickly.




In [None]:
harmony_client.status(job_id)

{'created_at': datetime.datetime(2022, 7, 14, 2, 0, 43, 935000, tzinfo=tzlocal()),
 'created_at_local': '2022-07-14T02:00:43+00:00',
 'data_expiration': datetime.datetime(2022, 8, 13, 2, 0, 43, 935000, tzinfo=tzlocal()),
 'data_expiration_local': '2022-08-13T02:00:43+00:00',
 'message': 'The job is being processed',
 'num_input_granules': 6,
 'progress': 0,
 'request': 'https://harmony.earthdata.nasa.gov/MODIS_A-JPL-L2P-v2019.0/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?forceAsync=true&subset=lat(21.20806%3A30.16605)&subset=lon(-97.77667%3A-83.05197)&subset=time(%222021-08-20T00%3A00%3A00%22%3A%222021-08-21T00%3A00%3A00%22)',
 'status': 'running',
 'updated_at': datetime.datetime(2022, 7, 14, 2, 0, 45, 272000, tzinfo=tzlocal()),
 'updated_at_local': '2022-07-14T02:00:45+00:00'}
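The `request` field in the status above is the URL-encoded API call that Harmony received. Decoding its query string makes it easier to confirm that the spatial and temporal subsets match what was requested. The sketch below uses only the standard library and the URL copied from the status output; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the 'request' field of the status output above
request_url = (
    "https://harmony.earthdata.nasa.gov/MODIS_A-JPL-L2P-v2019.0/"
    "ogc-api-coverages/1.0.0/collections/all/coverage/rangeset"
    "?forceAsync=true&subset=lat(21.20806%3A30.16605)"
    "&subset=lon(-97.77667%3A-83.05197)"
    "&subset=time(%222021-08-20T00%3A00%3A00%22%3A%222021-08-21T00%3A00%3A00%22)"
)

# parse_qs percent-decodes values and collects repeated keys into lists
request_params = parse_qs(urlsplit(request_url).query)

for subset in request_params["subset"]:
    print(subset)
```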

Depending on the size of the request, it may be helpful to wait until the request has completed processing before the remainder of the code is executed. The wait_for_processing() method will block subsequent lines of code while optionally showing a progress bar.

In [None]:
harmony_client.wait_for_processing(job_id, show_progress=True)

 [ Processing: 100% ] |###################################################| [|]
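Conceptually, waiting for a job amounts to polling its status until it leaves an in-progress state. The sketch below illustrates that idea with a generic helper; it is not how harmony-py implements `wait_for_processing()`. The name `wait_until_done` is hypothetical, and the set of in-progress status strings is an assumption (only `'running'` appears in the output above). A fake status source is used so the cell runs without contacting Harmony.

```python
import time

# Assumed non-terminal statuses; only 'running' is confirmed by the output above
IN_PROGRESS = {"accepted", "running"}


def wait_until_done(get_status, poll_seconds=5.0, max_polls=120):
    """Call get_status() repeatedly until the job is no longer in progress."""
    for _ in range(max_polls):
        job = get_status()
        if job["status"] not in IN_PROGRESS:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")


# Fake status source standing in for a real job
fake_statuses = iter([
    {"status": "running", "progress": 0},
    {"status": "running", "progress": 50},
    {"status": "successful", "progress": 100},
])
final = wait_until_done(lambda: next(fake_statuses), poll_seconds=0)
print(final)
```

With a real client, the callable could be `lambda: harmony_client.status(job_id)`.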


View Harmony job response and output URLs
> Once the data request has finished processing, we can view details on the job that was submitted to Harmony, including the API call to Harmony, and informational messages on the request if available.

> result_json() calls wait_for_processing() and returns the complete job in JSON format once processing is complete.

In [None]:
data = harmony_client.result_json(job_id)
pprint(data)

{'createdAt': '2022-07-14T02:00:43.935Z',
 'dataExpiration': '2022-08-13T02:00:43.935Z',
 'jobID': 'a417f181-677b-4a75-98e0-794c1e3dd188',
 'links': [{'href': 'https://harmony.earthdata.nasa.gov/stac/a417f181-677b-4a75-98e0-794c1e3dd188/',
            'rel': 'stac-catalog-json',
            'title': 'STAC catalog',
            'type': 'application/json'},
           {'bbox': [-97.8, 21.2, -97.3, 24.6],
            'href': 'https://harmony.earthdata.nasa.gov/service-results/harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820203501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4',
            'rel': 'data',
            'temporal': {'end': '2021-08-20T20:39:58.000Z',
                         'start': '2021-08-20T20:35:01.000Z'},
            'title': '20210820203501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4',
            'type': 'application/x-netcdf4'},
           {'bbox': [-97.8, 25.3, -83.1, 30.2],
            'href
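The `links` list in the job response mixes catalog links with data links. To pull out just the output files, the entries can be filtered on `rel == 'data'`, as in the sketch below. The helper name `data_links` is hypothetical, and the sample dictionary is a trimmed copy of the response shown above.

```python
def data_links(job_json):
    """Return the href of every link in the job response whose rel is 'data'."""
    return [link["href"] for link in job_json.get("links", [])
            if link.get("rel") == "data"]


# Trimmed copy of the job response printed above
sample_job = {
    "jobID": "a417f181-677b-4a75-98e0-794c1e3dd188",
    "links": [
        {"href": "https://harmony.earthdata.nasa.gov/stac/a417f181-677b-4a75-98e0-794c1e3dd188/",
         "rel": "stac-catalog-json",
         "type": "application/json"},
        {"href": "https://harmony.earthdata.nasa.gov/service-results/harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820203501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4",
         "rel": "data",
         "type": "application/x-netcdf4"},
    ],
}

pprint_ready = data_links(sample_job)
print(pprint_ready)
```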


# Direct cloud access


---

Note that the remainder of this tutorial will only succeed when running this notebook within the AWS us-west-2 region.

Harmony data outputs can be accessed within the cloud using the s3 URLs and AWS credentials provided in the Harmony job response.



---

## Retrieve list of output URLs. 


---



The result_urls() method calls wait_for_processing() and returns a list of the processed data URLs once processing is complete. You may optionally show a progress bar while waiting. Below, we request s3 links by passing link_type=LinkType.s3.



In [None]:
results = harmony_client.result_urls(job_id, link_type=LinkType.s3)
urls = list(results)
pprint(urls)

['s3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820203501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4',
 's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820190001-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4',
 's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820185501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4',
 's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820093501-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc4',
 's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820080001-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0_subsetted.nc4',
 's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820062501-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc4']


In [None]:
urls2 = 's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820203501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4'

We can see that some of the files returned (here the fourth and sixth in the list) do not include the _subsetted suffix, which indicates that a blank file was returned, as no data values were located within our subsetted region. We’ll select the second URL in the list, which does carry the suffix, to bring into xarray below.

In [None]:
url = urls[1]
url

's3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820190001-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4'
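Rather than picking a file by position, the URLs can be filtered on the `_subsetted` suffix so that blank outputs are skipped automatically. This is a minimal sketch under the assumption, taken from the text above, that the suffix reliably marks files containing data; `subsetted_only` is a hypothetical helper, and the sample list holds two URLs copied from the output above.

```python
def subsetted_only(urls):
    """Keep only the URLs whose file name carries the _subsetted suffix."""
    return [u for u in urls if u.endswith("_subsetted.nc4")]


# Two URLs copied from the result list above: one subsetted, one blank
sample_urls = [
    "s3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820203501-JPL-L2P_GHRSST-SSTskin-MODIS_A-D-v02.0-fv01.0_subsetted.nc4",
    "s3://harmony-prod-staging/public/podaac/l2-subsetter/a5c3e2e7-94e9-4cec-ba40-66c5c54ab87a/20210820093501-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc4",
]

print(subsetted_only(sample_urls))
```

In the notebook, `subsetted_only(urls)` would return only the entries that contain data within the requested region.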

AWS credential retrieval


---
> Using aws_credentials you can retrieve the credentials needed to access the Harmony s3 staging bucket and its contents.


In [None]:
creds = harmony_client.aws_credentials()

Open staged files with s3fs and xarray

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
s3_fs = s3fs.S3FileSystem(
    key=creds['aws_access_key_id'],
    secret=creds['aws_secret_access_key'],
    token=creds['aws_session_token'],
    client_kwargs={'region_name': 'us-west-2'},
)



---



---



In [None]:
!pip install earthdata
!pip install pydap
!pip install tinynetrc
!pip install podaac

In [195]:
import pydap
import pydap.client

In [199]:
from earthdata import Auth  # DataCollections, DataGranules, Accessor
auth = Auth().login()

ContextualVersionConflict: ignored


In [None]:
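def begin_s3_direct_access():
    """Return an s3fs filesystem using temporary PO.DAAC S3 credentials."""
    url = "https://archive.podaac.earthdata.nasa.gov/s3credentials"
    # This endpoint requires Earthdata Login; requests typically picks up
    # credentials for urs.earthdata.nasa.gov from ~/.netrc when that file exists.
    response = requests.get(url).json()
    return s3fs.S3FileSystem(
        key=response["accessKeyId"],
        secret=response["secretAccessKey"],
        token=response["sessionToken"],
        client_kwargs={"region_name": "us-west-2"},
    )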

In [None]:
from pydap.client import open_url
from pydap.cas.urs import setup_session
session = setup_session("your_username", "your_pw")
#dataset = open_url('http://server.example.com/path/to/dataset', session=session)

In [None]:

%%writefile auth.py
import getpass
import os
from netrc import NetrcParseError
from pathlib import Path
from typing import Any, Dict, List, Optional, Union
from urllib.parse import urlparse

import requests  # type: ignore
from tinynetrc import Netrc

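# from .daac import DAACS  # DAACS is used by _get_cloud_auth_url below; this package-relative import is unavailable in a standalone auth.py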


class SessionWithHeaderRedirection(requests.Session):
    """
    Requests removes auth headers if the redirect happens outside the
    original req domain. This is taken from https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python
    """

    AUTH_HOST = "urs.earthdata.nasa.gov"

    def __init__(self, username: str = None, password: str = None) -> None:
        super().__init__()
        if username and password:
            self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request: Any, response: Any) -> None:
        headers = prepared_request.headers
        url = prepared_request.url

        if "Authorization" in headers:

            original_parsed = urlparse(response.request.url)
            redirect_parsed = urlparse(url)
            if (
                (original_parsed.hostname != redirect_parsed.hostname)
                and redirect_parsed.hostname != self.AUTH_HOST
                and original_parsed.hostname != self.AUTH_HOST
            ):

                del headers["Authorization"]
        return


class Auth(object):
    """
    Authentication class for operations that require Earthdata login (EDL)
    """

    def __init__(self) -> None:
        # Maybe all these predefined URLs should be in a constants.py file
        self.authenticated = False
        self.tokens: List = []
        self.EDL_GET_TOKENS_URL = "https://urs.earthdata.nasa.gov/api/users/tokens"
        self.EDL_GENERATE_TOKENS_URL = "https://urs.earthdata.nasa.gov/api/users/token"
        self.EDL_REVOKE_TOKEN = "https://urs.earthdata.nasa.gov/api/users/revoke_token"

    def login(self, strategy: str = "interactive", persist: bool = False) -> Any:
        """Authenticate with Earthdata login

        :strategy: authentication method to use
            "interactive" - (default) enter username and password
            "netrc" - retrieve username and password from ~/.netrc
            "environment" - retrieve username and password from $EDL_USERNAME and $EDL_PASSWORD
        :persist: will persist credentials in a .netrc file
        """
        if self.authenticated:
            print("We are already authenticated with NASA EDL")
            return self
        if strategy == "interactive":
            self._interactive(persist)
        if strategy == "netrc":
            self._netrc()
        if strategy == "environment":
            self._environment()
        return self

    def refresh_tokens(self) -> bool:
        if len(self.tokens) == 0:
            resp_tokens = self._generate_user_token(
                username=self._credentials[0], password=self._credentials[1]
            )
            if resp_tokens.ok:
                self.token = resp_tokens.json()
                self.tokens = [self.token]
                print(
                    f"earthdata generated a token for CMR with expiration on: {self.token['expiration_date']}"
                )
                return True
            else:
                print(resp_tokens)
                return False
        if len(self.tokens) == 1:
            resp_tokens = self._generate_user_token(
                username=self._credentials[0], password=self._credentials[1]
            )
            if resp_tokens.ok:
                self.token = resp_tokens.json()
                self.tokens.append(self.token)
                print(
                    f"earthdata generated a token for CMR with expiration on: {self.token['expiration_date']}"
                )
                return True
            else:
                print(resp_tokens)
                return False

        if len(self.tokens) == 2:
            resp_revoked = self._revoke_user_token(self.token["access_token"])
            if resp_revoked:
                resp_tokens = self._generate_user_token(
                    username=self._credentials[0], password=self._credentials[1]
                )
                if resp_tokens.ok:
                    self.token = resp_tokens.json()
                    self.tokens[0] = self.token
                    print(
                        f"earthdata generated a token for CMR with expiration on: {self.token['expiration_date']}"
                    )
                    return True
                else:
                    print(resp_tokens)
                    return False

        return False

    def get_s3_credentials(
        self, cloud_provider: str = ""
    ) -> Union[Dict[str, str], None]:
        """
        gets AWS S3 credentials for a given NASA cloud provider
        :param cloud_provider: a NASA DAAC cloud provider i.e. POCLOUD
        :returns: a python dictionary with the S3 keys or None
        """
        auth_url = self._get_cloud_auth_url(cloud_provider)
        if auth_url.startswith("https://"):
            cumulus_resp = self._session.get(auth_url, timeout=10, allow_redirects=True)
            auth_resp = self._session.get(
                cumulus_resp.url, allow_redirects=True, timeout=10
            )
            if not (auth_resp.ok):  # type: ignore
                print(
                    f"Authentication with Earthdata Login failed with:\n{auth_resp.text}"
                )
                return None
            return auth_resp.json()
        else:
            # This happens if the cloud provider doesn't list the S3 credentials or the DAAC
            # does not have cloud collections yet
            print(
                f"Credentials for the cloud provider {cloud_provider} are not available"
            )
            return None

    def get_session(self, bearer_token: bool = False) -> SessionWithHeaderRedirection:
        """
        Returns a new request session instance, since sharing a session across contexts does not appear to be thread-safe:
        https://github.com/psf/requests/issues/1871
        Sessions with bearer tokens are used by CMR; simple auth sessions can be used to download data
        from on-prem DAAC data centers.
        :returns: subclass SessionWithHeaderRedirection instance
        """
        if bearer_token and self.authenticated:
            session = SessionWithHeaderRedirection()
            session.headers.update(
                {"Authorization": f'Bearer {self.token["access_token"]}'}
            )
            return session
        else:
            return SessionWithHeaderRedirection(
                self._credentials[0], self._credentials[1]
            )

    def _interactive(self, persist_credentials: bool = True) -> bool:
        username = input("Enter your Earthdata Login username: ")
        password = getpass.getpass(prompt="Enter your Earthdata password: ")
        authenticated = self._get_credentials(username, password)
        if authenticated is True and persist_credentials is True:
            self._persist_user_credentials(username, password)
        return authenticated

    def _netrc(self) -> bool:
        try:
            my_netrc = Netrc()
        except FileNotFoundError as err:
            print(f"Expects .netrc in {os.path.expanduser('~')}")
            print(err)
            return False
        except NetrcParseError as err:
            print("Unable to parse .netrc")
            print(err)
            return False
        if my_netrc["urs.earthdata.nasa.gov"] is not None:
            username = my_netrc["urs.earthdata.nasa.gov"]["login"]
            password = my_netrc["urs.earthdata.nasa.gov"]["password"]
        else:
            return False
        authenticated = self._get_credentials(username, password)
        return authenticated

    def _environment(self) -> bool:
        username = os.getenv("EDL_USERNAME")
        password = os.getenv("EDL_PASSWORD")
        authenticated = self._get_credentials(username, password)
        return authenticated

    def _get_credentials(
        self, username: Optional[str], password: Optional[str]
    ) -> bool:
        if username is not None and password is not None:
            self._session = SessionWithHeaderRedirection(username, password)
            token_resp = self._get_user_tokens(username, password)

            if not (token_resp.ok):  # type: ignore
                print(
                    f"Authentication with Earthdata Login failed with:\n{token_resp.text}"
                )
                return False
            print("You're now authenticated with NASA Earthdata Login")
            self._credentials = (username, password)
            self.tokens = token_resp.json()
            self.authenticated = True

            if len(self.tokens) == 0:
                self.refresh_tokens()
                print(
                    f"earthdata generated a token for CMR with expiration on: {self.token['expiration_date']}"
                )
                self.token = self.tokens[0]
            elif len(self.tokens) > 0:
                self.token = self.tokens[0]
                print(
                    f"Using token with expiration date: {self.token['expiration_date']}"
                )

        return self.authenticated

    def _get_user_tokens(self, username: str, password: str) -> Any:
        session = SessionWithHeaderRedirection(username, password)
        auth_resp = session.get(
            self.EDL_GET_TOKENS_URL,
            headers={
                "Accept": "application/json",
            },
            timeout=10,
        )
        return auth_resp

    def _generate_user_token(self, username: str, password: str) -> Any:
        session = SessionWithHeaderRedirection(username, password)
        auth_resp = session.post(
            self.EDL_GENERATE_TOKENS_URL,
            headers={
                "Accept": "application/json",
            },
            timeout=10,
        )
        return auth_resp

    def _revoke_user_token(self, token: str) -> bool:
        session = SessionWithHeaderRedirection(
            self._credentials[0], self._credentials[1]
        )
        auth_resp = session.post(
            self.EDL_REVOKE_TOKEN,
            params={"token": token},
            headers={
                "Accept": "application/json",
            },
            timeout=10,
        )
        return auth_resp.ok

    def _persist_user_credentials(self, username: str, password: str) -> bool:
        # See: https://github.com/sloria/tinynetrc/issues/34
        netrc_path = Path().home().joinpath(".netrc")
        netrc_path.touch(mode=0o600, exist_ok=True)  # octal 600: read/write for the owner only
        my_netrc = Netrc(str(netrc_path))
        my_netrc["urs.earthdata.nasa.gov"] = {"login": username, "password": password}
        my_netrc.save()
        return True

    def _get_cloud_auth_url(self, cloud_provider: str = "") -> str:
        for provider in DAACS:
            if (
                cloud_provider in provider["cloud-providers"]
                and len(provider["s3-credentials"]) > 0
            ):
                return str(provider["s3-credentials"])
        return ""


Writing auth.py




---



---



In [None]:
f = s3_fs.open(url, mode='rb')
ds = xr.open_dataset(f)
ds

ParamValidationError: ignored

In [None]:
!pip install rasterio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rasterio
  Downloading rasterio-1.2.10-cp37-cp37m-manylinux1_x86_64.whl (19.3 MB)
Collecting snuggs>=1.4.1
  Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Collecting affine
  Downloading affine-2.3.1-py2.py3-none-any.whl (16 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting click-plugins
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Installing collected packages: snuggs, cligj, click-plugins, affine, rasterio
Successfully installed affine-2.3.1 click-plugins-1.1.1 cligj-0.7.2 rasterio-1.2.10 snuggs-1.4.7


In [None]:
import rasterio

In [None]:
!pip install requests
!pip install wget
!pip install pydap

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pydap
  Downloading Pydap-3.2.2-py3-none-any.whl (2.3 MB)
Collecting Webob
  Downloading WebOb-1.8.7-py2.py3-none-any.whl (114 kB)
Installing collected packages: Webob, pydap
Successfully installed Webob-1.8.7 pydap-3.2.2


In [None]:
from pydap.client import open_url

In [None]:
import pydap as pyd

In [None]:
import os  # importing all we need, it's not much
import wget

urls_to_load = list()  # a list to store the urls
path = 'download_folder'  # the path where we will download those files


In [None]:
import os

In [None]:
# Create the folder named in `path` above; exist_ok avoids an error when the cell is re-run.
os.makedirs(path, exist_ok=True)
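
The cells above prepare an empty `urls_to_load` list and a download folder. As a minimal sketch of the next step, the helper below works out where each URL would be saved locally, without downloading anything. The function name `local_target` is hypothetical; the example URL is the MUR granule used later in this notebook.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_target(url, folder):
    """Return the local path a URL would be saved to inside `folder`."""
    # Take the last component of the URL path as the file name.
    filename = posixpath.basename(urlparse(url).path)
    if not filename:
        raise ValueError(f"URL has no file name component: {url}")
    return os.path.join(folder, filename)


example_url = (
    "https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/"
    "GLOB/JPL/MUR/v4.1/2002/152/"
    "20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
)
print(local_target(example_url, "download_folder"))
```

A path built this way could then be passed as the output location to whichever download tool is used.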

In [None]:
dap_url = "https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
# Reuse the variable instead of repeating the URL.
data = xr.open_dataset(dap_url)

OSError: ignored

In [204]:
!pip install zarr-eosdis-store

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement zarr-eosdis-store (from versions: none)
ERROR: No matching distribution found for zarr-eosdis-store


In [206]:
pip install zarr-eosdis-store

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement zarr-eosdis-store (from versions: none)
ERROR: No matching distribution found for zarr-eosdis-store


In [207]:
!pip install zarr-eosdis-store-main.zip

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Processing ./zarr-eosdis-store-main.zip
Collecting requests-futures>=1.0.0
  Downloading requests_futures-1.0.0-py2.py3-none-any.whl (7.4 kB)
Collecting zarr>=2.7.1
  Downloading zarr-2.12.0-py3-none-any.whl (185 kB)
Collecting ipypb~=0.5
  Downloading ipypb-0.5.2-py3-none-any.whl (8.6 kB)
Collecting numcodecs>=0.8.1
  Downloading numcodecs-0.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.6 MB)
ERROR: Package 'zarr-eosdis-store' requires a different Python: 3.7.13 not in '>=3.8'

In [209]:
!pip install git+https://github.com/nasa/zarr-eosdis-store.git@main#egg=zarr-eosdis-store

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting zarr-eosdis-store
  Cloning https://github.com/nasa/zarr-eosdis-store.git (to revision main) to /tmp/pip-install-no10f_el/zarr-eosdis-store_2b03a61d1fc7489d90c0b8ace462167b
  Running command git clone -q https://github.com/nasa/zarr-eosdis-store.git /tmp/pip-install-no10f_el/zarr-eosdis-store_2b03a61d1fc7489d90c0b8ace462167b
Collecting requests-futures>=1.0.0
  Using cached requests_futures-1.0.0-py2.py3-none-any.whl (7.4 kB)
Collecting zarr>=2.7.1
  Using cached zarr-2.12.0-py3-none-any.whl (185 kB)
Collecting ipypb~=0.5
  Using cached ipypb-0.5.2-py3-none-any.whl (8.6 kB)
Collecting numcodecs>=0.8.1
  Using cached numcodecs-0.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.6 MB)
ERROR: Package 'zarr-eosdis-store' requires a different Python: 3.7.13 not in '>=3.8'

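The install attempts above fail because the runtime is Python 3.7.13 while `zarr-eosdis-store` declares `>=3.8`. As a minimal sketch, the interpreter version can be checked before attempting the install. The helper name `python_is_supported` is hypothetical, and the `(3, 8)` minimum is taken from the pip error message.

```python
import sys

# Minimum version reported by pip for zarr-eosdis-store ('>=3.8').
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is new enough to attempt the install.")
else:
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {found} is older than 3.8; the install is expected to fail.")
```

On a runtime like the one in the log, the check returns `False`, which signals that a newer Python runtime is needed before retrying.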

In [211]:
from eosdis_store import EosdisStore
import zarr

# Assumes you have set up .netrc with your Earthdata Login information
f = zarr.open(EosdisStore('https://example.com/your/data/file.nc4'))

# Read metadata and data from f using the Zarr API
# Index the first element along the first axis; a slice with step 0 is not valid.
print(f['parameter_name'][0])

ModuleNotFoundError: ignored

In [212]:
!pip install EosdisStore

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement EosdisStore (from versions: none)
ERROR: No matching distribution found for EosdisStore


In [217]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [218]:
!unzip /content/zarr-eosdis-store-main.zip

Archive:  /content/zarr-eosdis-store-main.zip
15790b875926185846eb3dc4e34596c8050ceda1
   creating: zarr-eosdis-store-main/
 extracting: zarr-eosdis-store-main/.coveragerc  
 extracting: zarr-eosdis-store-main/.flake8  
   creating: zarr-eosdis-store-main/.github/
  inflating: zarr-eosdis-store-main/.github/release-drafter.yml  
   creating: zarr-eosdis-store-main/.github/workflows/
  inflating: zarr-eosdis-store-main/.github/workflows/draft-release.yml  
  inflating: zarr-eosdis-store-main/.github/workflows/publish-release.yml  
  inflating: zarr-eosdis-store-main/.github/workflows/tests.yml  
  inflating: zarr-eosdis-store-main/.gitignore  
  inflating: zarr-eosdis-store-main/CHANGELOG.md  
  inflating: zarr-eosdis-store-main/LICENSE  
 extracting: zarr-eosdis-store-main/MANIFEST.in  
  inflating: zarr-eosdis-store-main/Makefile  
  inflating: zarr-eosdis-store-main/README.rst  
   creating: zarr-eosdis-store-main/docs/
  inflating: zarr-eosdis-store-main/docs/Makefile  
   creating:

In [222]:
#cd /content/zarr-eosdis-store-main/setup.py
import setup.py

ModuleNotFoundError: ignored

In [223]:
! python setup.py

python3: can't open file 'setup.py': [Errno 2] No such file or directory
