
Using data from NSIDC is not possible #13

Closed
Faramarz-bagherzadeh opened this issue Dec 6, 2022 · 12 comments

Comments
@Faramarz-bagherzadeh
Contributor

Hello,
I am trying to get data from NSIDC. The mapped products are huge files that are hard to download, process, and upload again. Is it possible to use these data in CVL directly, without needing a username and password? I need this data to proceed with the project. Thanks.


@Faramarz-bagherzadeh Faramarz-bagherzadeh changed the title Data from NSIDC is not possible Using data from NSIDC is not possible Dec 6, 2022
@ykern
Collaborator

ykern commented Dec 6, 2022

Thanks for raising the issue here.
We had an email conversation about this already. Just want to summarise here in case it is relevant for anybody else.

One approach for now can be following the procedure of NSIDC's example notebook here: https://github.com/nsidc/NSIDC-Data-Access-Notebook/blob/master/notebooks/Customize%20and%20Access%20NSIDC%20Data.ipynb
Here, one needs to provide credentials every time the data are accessed. I suggest not using any of the other methods proposed by NSIDC that involve creating a credential file when using CVL. This could otherwise accidentally lead to unintended exposure of personal credentials, for example when pushing to GitHub.
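The credentials-each-time approach can be sketched as follows. This is a minimal illustration only, assuming a requests-based session: the function name is hypothetical, and real Earthdata Login flows may additionally need the redirect-aware session class from NSIDC's notebook.

```python
import getpass

import requests


def earthdata_session(username=None, password=None):
    """Build a requests session with Earthdata Login credentials held
    only in memory, so no credential file is ever written to disk."""
    if username is None:
        username = input("Earthdata username: ")
    if password is None:
        # getpass avoids echoing the password in the notebook output
        password = getpass.getpass("Earthdata password: ")
    session = requests.Session()
    session.auth = (username, password)
    return session
```

Because the credentials live only in the session object, restarting the kernel discards them, which is exactly the behaviour wanted on a shared platform like CVL.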

However, that does not solve the problem of the general requirement for login credentials by NSIDC. Maybe @steingod has some alternative ideas or comments? Is there a way one could get access to NSIDC data through CVL without login credentials?

@steingod

steingod commented Dec 6, 2022

I have no immediate solution, as we haven't tried to interface with Earthdata profiles before. I will check with contacts at NSIDC whether there are Single Sign-On solutions that could be used.

@steingod

steingod commented Dec 7, 2022

I have no immediate solution, as we haven't tried to interface with Earthdata profiles before. I will check with contacts at NSIDC whether there are Single Sign-On solutions that could be used; we have such an interface scheduled for deployment, although it is currently tested primarily against PTEP.

@betolink

Julia mentioned this question and I'd be happy to help! I'm currently presenting at AGU but will look into this ASAP.

@Faramarz-bagherzadeh
Contributor Author

After clicking on the data link from the CVL website, I selected [NASA Earthdata Search]. After filtering for the area I needed, I could get ".nc" files. There are many files and I needed daily data only, so with the help of a Chrome extension (Simple mass downloader) I could download the massive amount of data I wanted (around 100 GB).

https://search.earthdata.nasa.gov/search

Very nice add-on:
https://chrome.google.com/webstore/detail/simple-mass-downloader/abdkkegmcbiomijcbdaodaflgehfffed

Thanks for the help

@ykern
Collaborator

ykern commented Jan 19, 2023

@steingod and @betolink, are there already any updates on how to access NSIDC data from PTEP?
I am just checking before we look into other solutions for avoiding external (outside of PTEP) download and processing of the data.

@betolink

betolink commented Jan 20, 2023

Hi @ykern, I dropped the ball on this one after AGU. I don't know what the core issue is with downloading data from NSIDC from your compute environment, but it should be doable. There are many ways of downloading data, but all of them require users to authenticate with NASA EDL. Using a .netrc is the default, but as you mentioned, it is probably not safe, since it could become a source of security issues if pushed to a repo. Some questions:

  • has this "downloading data from NSIDC" been working before?
  • what approach did you use to authenticate with NASA EDL?
  • what search API are you using to grab the files from NSIDC?

In case you're interested, I've been working on a Python library called earthaccess to simplify programmatic access to NASA data (not just NSIDC). We can download data from any NASA DAAC using very few lines of code if we know our dataset's concept_id or short_name; if not, we can search for it using keywords or a DOI.

This would be the code to download a year of MODIS Snow Cover (short_name: MOD10A1, concept_id: C1646610417-NSIDC_ECS).
Note that with this approach we still need our EDL credentials, but they can be read from the environment, from a .netrc file, or populated interactively. If we use the environment approach, they are read from $EDL_USERNAME and $EDL_PASSWORD.

import earthaccess as ea

# Reads credentials from $EDL_USERNAME and $EDL_PASSWORD
ea.login(strategy="environment")

# Search by the dataset's CMR concept id, one year of granules
granules = ea.search_data(
    concept_id="C1646610417-NSIDC_ECS",
    temporal=("2018", "2019"),
)
ea.download(granules, local_path="./modis_data")

NASA is rolling out a new way of universally accessing data using tokens. Users first generate a token with EDL; the token is valid for 3 months and can be included in HTTP requests as a bearer token. These are read-only tokens.

import requests

my_token = "LONG_STRING"

data = requests.get(URL, headers={"Authorization": f"Bearer {my_token}"})

I'm not registered with PTEP, but I assume you could integrate either of these two methods into your Python environment. Hopefully this is a start to solving the issue.
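A hedged sketch of using such a bearer token to stream a granule to disk; `url`, `token`, and `out_path` are placeholders, and `download_granule` is a hypothetical helper name, not part of any NASA library.

```python
import requests


def bearer_headers(token):
    # EDL bearer tokens go in a plain Authorization header:
    # "Bearer <token>", with no colon after "Bearer"
    return {"Authorization": f"Bearer {token}"}


def download_granule(url, token, out_path):
    # Stream to disk so large granules don't have to fit in memory
    with requests.get(url, headers=bearer_headers(token), stream=True) as r:
        r.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
```

Streaming matters here because the mapped products discussed in this thread run to tens of gigabytes.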

@ykern
Collaborator

ykern commented Jan 23, 2023

Thanks very much @betolink. I think this should be a good set of approaches for @Faramarz-bagherzadeh to check out and follow up on.
Edit: Some more information on how to set up environment variables and use them in Python, for anybody not used to this: https://towardsdatascience.com/connect-to-databases-using-python-and-hide-secret-keys-with-env-variables-a-brief-tutorial-4f68e33a6dc6

@Faramarz-bagherzadeh
Contributor Author

Thank you all.
I used the following code, which seems doable within this project's framework. But there are still two issues.
First, the total number of grabbed data granules seems to differ, and second, I need to filter for daily data, because I only need files with the .daily.nc format, as shown in the screenshot. Is it possible to do this filtering with the 'earthaccess' library?

import earthaccess

# Log in with EDL credentials, read from $EDL_USERNAME and $EDL_PASSWORD
# (use strategy="interactive" to be prompted instead)
earthaccess.login(strategy="environment")

granules = earthaccess.search_data(
    concept_id="C1597320047-NSIDC_ECS",
    bounding_box=(-51.06, 75.6, -35.9, 77.4),
    cloud_hosted=True,
    downloadable=True,
    temporal=("2011-05", "2015-05"),
    count=100,
)
# earthaccess.download(granules, local_path="./modis_data")

[screenshot of the granule listing with .daily.nc files]

@betolink

Hi @Faramarz-bagherzadeh, I think the difference is in the downloadable parameter: the portal is showing results that cannot be downloaded. If you check "Find only granules that are available online", you'll see the same number of granules as with the earthaccess query. For the daily pattern you're looking for, in the search data portal you can use MODGRNLD.*.daily.* in the granule id box to filter only files with that naming pattern. earthaccess does not support this yet, but it's great that you mentioned it, so we can add it to the list of parameters!

[Screenshot from 2023-01-30]

@Faramarz-bagherzadeh
Contributor Author

Hi @betolink, thank you for the help. I see the correct filtering is with the downloadable filter. All is good now, and I could get all the files I needed with earthaccess. As the library does most of the work, to get the daily files I simply converted the granules to strings as below. I also sent a pull request to contribute this code to the CVL early adopters project. Thanks, everyone, for helping me.

daily_granules = []
for gr in granules:
    if 'daily' in str(gr):
        daily_granules.append(gr)
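The same client-side filter can also be written with a shell-style wildcard, matching the MODGRNLD.*.daily.* pattern mentioned above. This is a sketch: `filter_granules` is a hypothetical helper, and it assumes (as the loop above does) that `str(granule)` includes the producer granule id.

```python
import fnmatch


def filter_granules(granules, pattern="*.daily.*"):
    # Shell-style wildcard match on the granule's string form,
    # e.g. "*.daily.*" keeps only the daily .nc files
    return [g for g in granules if fnmatch.fnmatch(str(g), pattern)]
```

A wildcard is slightly stricter than a plain `'daily' in str(gr)` substring check, since it anchors the `.daily.` token between dots in the file name.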

@ykern
Collaborator

ykern commented Jan 31, 2023

Thanks for all the effort, everyone. I will close the issue. Please reopen it in case a follow-up is needed.

@ykern ykern closed this as completed Jan 31, 2023