
Using data from NSIDC is not possible #13

Closed
Faramarz-bagherzadeh opened this issue Dec 6, 2022 · 12 comments

Comments
@Faramarz-bagherzadeh
Contributor

Hello,
I am trying to get data from NSIDC. The mapped products are huge files that are hard to download, process, and upload again. Is it possible to use these data in CVL directly, without needing a username and password? I need this data to proceed with the project. Thanks.


@Faramarz-bagherzadeh Faramarz-bagherzadeh changed the title Data from NSIDC is not possible Using data from NSIDC is not possible Dec 6, 2022
@ykern
Collaborator

ykern commented Dec 6, 2022

Thanks for raising the issue here.
We had an email conversation about this already. Just want to summarise here in case it is relevant for anybody else.

One approach for now can be following the procedure of NSIDC's example notebook here: https://github.com/nsidc/NSIDC-Data-Access-Notebook/blob/master/notebooks/Customize%20and%20Access%20NSIDC%20Data.ipynb
Here, one needs to provide credentials every time the data are accessed. I suggest not using any of the other methods proposed by NSIDC that involve creating a credential file when using CVL. This could otherwise accidentally lead to unintended exposure of personal credentials, for example when pushing to GitHub.
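The credentials-each-time approach can be sketched as follows. This is a minimal illustration only, assuming a requests-based session: the function name is hypothetical, and real Earthdata Login flows may additionally need the redirect-aware session class from NSIDC's notebook.

```python
import getpass

import requests


def earthdata_session(username=None, password=None):
    """Build a requests session with Earthdata Login credentials held
    only in memory, so no credential file is ever written to disk."""
    if username is None:
        username = input("Earthdata username: ")
    if password is None:
        # getpass avoids echoing the password in the notebook output
        password = getpass.getpass("Earthdata password: ")
    session = requests.Session()
    session.auth = (username, password)
    return session
```

Because the credentials live only in the session object, restarting the kernel discards them, which is exactly the behaviour wanted on a shared platform like CVL.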

However, that does not solve the problem of the general requirement for login credentials by NSIDC. Maybe @steingod has some alternative ideas or comments? Is there a way one could get access to NSIDC data through CVL without login credentials?

@steingod

steingod commented Dec 6, 2022

I have no immediate solution, as we haven't tried to interface with Earthdata profiles before. I will check with contacts at NSIDC whether there are Single Sign-On solutions that could be used.

@steingod

steingod commented Dec 7, 2022

I have no immediate solution, as we haven't tried to interface with Earthdata profiles before. I will check with contacts at NSIDC whether there are Single Sign-On solutions that could be used; we have such an interface scheduled for deployment, although it is currently tested primarily against PTEP.

@betolink

Julia mentioned this question and I'd be happy to help! I'm currently presenting at AGU but will look into this ASAP.

@Faramarz-bagherzadeh
Contributor Author

After clicking on the data link from the CVL website, I selected [NASA Earthdata Search]. After filtering for the area I needed, I could get ".nc" files. There are many files and I needed daily data only, so with the help of a Chrome extension (Simple mass downloader) I could download the massive amount of data I wanted (around 100 GB).

https://search.earthdata.nasa.gov/search

Very nice add-on:
https://chrome.google.com/webstore/detail/simple-mass-downloader/abdkkegmcbiomijcbdaodaflgehfffed

Thanks for the help

@ykern
Collaborator

ykern commented Jan 19, 2023

@steingod and @betolink, are there already any updates on how to access NSIDC data from PTEP?
I am just checking before we look into other solutions for avoiding external (outside of PTEP) download and processing of the data.

@betolink

betolink commented Jan 20, 2023

Hi @ykern, I dropped the ball on this one after AGU. I don't know what the core issue is with downloading data from NSIDC from your compute environment, but it should be doable. There are many ways of downloading data, but all of them require users to authenticate with NASA EDL. Using a .netrc is the default, but as you mentioned, it is probably not safe, since it could become a source of security issues if pushed to a repo. Some questions:

  • has this "downloading data from NSIDC" been working before?
  • what approach did you use to authenticate with NASA EDL?
  • what search API are you using to grab the files from NSIDC?

In case you're interested, I've been working on a Python library called earthaccess to simplify programmatic access to NASA data (not just NSIDC). We can download data from any NASA DAAC using very few lines of code if we know our dataset's concept_id or short_name; if not, we can search for it using keywords or a DOI.

This would be the code to download a year of MODIS Snow Cover (short_name: MOD10A1, concept_id: C1646610417-NSIDC_ECS).
Note that with this approach we still need our EDL credentials, but they can be read from the environment, from a .netrc file, or populated interactively. If we use the environment approach, they are read from $EDL_USERNAME and $EDL_PASSWORD.

import earthaccess as ea

# Reads credentials from $EDL_USERNAME and $EDL_PASSWORD
ea.login(strategy="environment")

# Search by the dataset's CMR concept id, one year of granules
granules = ea.search_data(
    concept_id="C1646610417-NSIDC_ECS",
    temporal=("2018", "2019"),
)
ea.download(granules, local_path="./modis_data")

NASA is rolling out a new way of universally accessing data using tokens. Users first generate a token with EDL; the token is valid for 3 months and can be included in HTTP requests as a bearer token. These are read-only tokens.

import requests

my_token = "LONG_STRING"

data = requests.get(URL, headers={"Authorization": f"Bearer {my_token}"})

I'm not registered with PTEP, but I assume you could integrate either of these two methods into your Python environment. Hopefully this is a start to solving the issue.
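A hedged sketch of using such a bearer token to stream a granule to disk; `url`, `token`, and `out_path` are placeholders, and `download_granule` is a hypothetical helper name, not part of any NASA library.

```python
import requests


def bearer_headers(token):
    # EDL bearer tokens go in a plain Authorization header:
    # "Bearer <token>", with no colon after "Bearer"
    return {"Authorization": f"Bearer {token}"}


def download_granule(url, token, out_path):
    # Stream to disk so large granules don't have to fit in memory
    with requests.get(url, headers=bearer_headers(token), stream=True) as r:
        r.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
```

Streaming matters here because the mapped products discussed in this thread run to tens of gigabytes.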

@ykern
Collaborator

ykern commented Jan 23, 2023

Thanks very much @betolink. I think this should be a good set of approaches for @Faramarz-bagherzadeh to check out and follow up on.
Edit: Some more information on how to set up environment variables and use them in Python, for anybody not used to this: https://towardsdatascience.com/connect-to-databases-using-python-and-hide-secret-keys-with-env-variables-a-brief-tutorial-4f68e33a6dc6

@Faramarz-bagherzadeh
Contributor Author

Thank you all.
I used the following code, which seems doable within this project's framework. But there are still two issues.
First, the total number of grabbed data granules seems to differ, and second, I need to filter for daily data, because I only need files with the .daily.nc format, as shown in the screenshot. Is it possible to do this filtering with the 'earthaccess' library?

import earthaccess

# Log in with EDL credentials, read from $EDL_USERNAME and $EDL_PASSWORD
# (use strategy="interactive" to be prompted instead)
earthaccess.login(strategy="environment")

granules = earthaccess.search_data(
    concept_id="C1597320047-NSIDC_ECS",
    bounding_box=(-51.06, 75.6, -35.9, 77.4),
    cloud_hosted=True,
    downloadable=True,
    temporal=("2011-05", "2015-05"),
    count=100,
)
# earthaccess.download(granules, local_path="./modis_data")

[screenshot of the granule listing with .daily.nc files]

@betolink

Hi @Faramarz-bagherzadeh, I think the difference is in the downloadable parameter: the portal is showing results that cannot be downloaded. If you check "Find only granules that are available online", you'll see the same number of granules as with the earthaccess query. For the daily pattern you're looking for, in the search data portal you can use MODGRNLD.*.daily.* in the granule id box to filter only files with that naming pattern. earthaccess does not support this yet, but it's great that you mentioned it, so we can add it to the list of parameters!

[Screenshot from 2023-01-30]

@Faramarz-bagherzadeh
Contributor Author

Hi @betolink, thank you for the help. I see the correct filtering is with the downloadable filter. All is good now, and I could get all the files I needed with earthaccess. As the library does most of the work, to get the daily files I simply converted the granules to strings as below. I also sent a pull request to contribute this code to the CVL early adopters project. Thanks, everyone, for helping me.

daily_granules = []
for gr in granules:
    if 'daily' in str(gr):
        daily_granules.append(gr)
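The same client-side filter can also be written with a shell-style wildcard, matching the MODGRNLD.*.daily.* pattern mentioned above. This is a sketch: `filter_granules` is a hypothetical helper, and it assumes (as the loop above does) that `str(granule)` includes the producer granule id.

```python
import fnmatch


def filter_granules(granules, pattern="*.daily.*"):
    # Shell-style wildcard match on the granule's string form,
    # e.g. "*.daily.*" keeps only the daily .nc files
    return [g for g in granules if fnmatch.fnmatch(str(g), pattern)]
```

A wildcard is slightly stricter than a plain `'daily' in str(gr)` substring check, since it anchors the `.daily.` token between dots in the file name.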

@ykern
Collaborator

ykern commented Jan 31, 2023

Thanks for all the effort, everyone. I will close the issue. Please reopen it in case a follow-up is needed.

@ykern ykern closed this as completed Jan 31, 2023