

Home_Icon Learning Center Home

NEON Data API w/ Python

NEON developed an R and Python API for downloading data from their data store.

Cloning Jupyter Tutorials from GitHub

We provide example Python 3 notebooks and R Markdown notebooks for downloading lidar and hyperspectral data.

Prerequisite: Anaconda and RStudio-Server are installed, and Jupyter Notebook or Lab is running

In the terminal:

  1. Clone notebooks from NEON Data Science or CyVerse GIS to a location on the VM (e.g. /home/user/)

    git clone https://github.com/cyverse-gis/neon_data_science
    cd neon_data_science/lessons
  2. From Jupyter Notebook or Lab select a data download notebook.
  3. Follow the notebook instructions.
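
The clone step can be made safe to re-run, which helps when a VM is reused between sessions. The following is a minimal sketch, assuming the repository URL shown in the steps; repo_dir_name and clone_once are hypothetical helper names introduced here, not part of the lesson repository.

```shell
# Sketch: clone a repository only if it is not already present.
# repo_dir_name and clone_once are hypothetical helpers for this example.

# Derive the directory name that git would create for a given URL.
repo_dir_name() {
    local url="${1%/}"            # drop a trailing slash, if any
    local name="${url##*/}"       # keep the last path component
    printf '%s\n' "${name%.git}"  # drop a trailing .git, if any
}

# Clone $1 into the parent directory $2 (default: $HOME) unless a clone exists.
clone_once() {
    local url="$1" parent="${2:-$HOME}"
    local target
    target="$parent/$(repo_dir_name "$url")"
    if [ -d "$target/.git" ]; then
        echo "Already cloned: $target"
    else
        git clone "$url" "$target"
    fi
}

# Example (uncomment to run):
# clone_once https://github.com/cyverse-gis/neon_data_science "$HOME"
# cd "$HOME/neon_data_science/lessons"
```

Checking for the .git directory rather than the bare folder avoids treating an empty or partially created directory as a finished clone.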

Download data from CyVerse DataStore in Bash

CyVerse uses a system called iRODS to move files onto and off of its Data Store.

iRODS uses multi-threaded file transfers, which gives faster downloads and uploads than traditional tools such as wget or curl.

Prerequisite: iRODS iCommands are installed and a connection has been initiated
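
One way to confirm the prerequisite before starting a long transfer is to check that the iCommands are on the PATH. This is a minimal sketch; require_cmds is a hypothetical helper name, and the check only confirms that the commands are installed, not that a connection has been initiated.

```shell
# Sketch: check that the required commands are available on the PATH.
# require_cmds is a hypothetical helper for this example.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (uncomment to run):
# require_cmds ils iget iput && ils
```

If the commands are present, running ils afterwards is a quick test of the connection itself.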

  1. Use the ils command to view your files on the Data Store

  2. Change ownership of the directory where you want to download the data.

    sudo chown -R $USER:iplant-everyone /scratch
  3. Create a new directory in /scratch

    mkdir -p /scratch/2016_Campaign/HARV/L1/DiscreteLidar/
  4. Use the iget command to download files from the Data Store

    iget -KPQbrvf /iplant/home/shared/NEON_data_institute_2018/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz /scratch/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz

In this example we are using the flags to:

-K verify the checksum
-P output the progress of the download.
-Q use RBUDP (datagram) protocol for the data transfer
-b bulk file transfer
-r recursive - retrieve subcollections
-v verbose
-f force - write local files even if they exist already (overwrite them)
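
The download steps above can be wrapped in a small function so that the local directory layout always mirrors the Data Store. This is a sketch under stated assumptions: fetch_collection, local_path_for, REMOTE_ROOT, LOCAL_ROOT and DRY_RUN are hypothetical names introduced here, and the iget flags are copied unchanged from the command above.

```shell
# Sketch: mirror a Data Store collection under /scratch with iget.
# All function and variable names here are hypothetical, for illustration only.

REMOTE_ROOT="${REMOTE_ROOT:-/iplant/home/shared/NEON_data_institute_2018}"
LOCAL_ROOT="${LOCAL_ROOT:-/scratch}"

# Map a Data Store path to a local path with the same relative layout.
local_path_for() {
    local remote="$1"
    printf '%s\n' "$LOCAL_ROOT/${remote#"$REMOTE_ROOT"/}"
}

# Download one collection; with DRY_RUN=1 only print the iget command.
fetch_collection() {
    local remote="$1"
    local dest
    dest="$(local_path_for "$remote")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "iget -KPQbrvf $remote $dest"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"   # make sure the parent directory exists
    iget -KPQbrvf "$remote" "$dest"
}

# Example (uncomment to run):
# DRY_RUN=1 fetch_collection "$REMOTE_ROOT/2016_Campaign/HARV/L1/DiscreteLidar/ClassifiedLaz"
```

Running once with DRY_RUN=1 shows the exact source and destination before any data moves, which is useful because -f overwrites existing local files.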

Upload data to the CyVerse DataStore in Bash

  1. Use the iput command to upload files to the Data Store

    iput -KPQbrvf /scratch/2016_Campaign/HARV/L1/DiscreteLidar/some_results /iplant/home/$USER/neon/results

Note that we are using the same flags as in the iget command above.

Download data from CyVerse DataStore with Cyberduck

After you've set up Cyberduck to access your CyVerse DataStore, you can drag and drop files to your local machine, or drag and drop files into a second Cyberduck window that is connected to another data source.

Note

Dragging and dropping data between two Cyberduck windows causes the data to be streamed down to your local machine and then uploaded to the second remote host. This greatly reduces the transfer speed.

We strongly suggest using the Cyberduck CLI tool to move files between two remote data stores.
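
As a rough illustration, the Cyberduck CLI is invoked as duck and has a --copy option that takes a source URL and a destination URL. The sketch below assumes duck is installed; copy_between_remotes and run are hypothetical helper names, and the example URLs are placeholders rather than real CyVerse endpoints.

```shell
# Sketch: copy between two remote servers with the Cyberduck CLI (duck).
# copy_between_remotes and run are hypothetical helpers for this example.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: source URL, $2: destination URL
copy_between_remotes() {
    run duck --copy "$1" "$2"
}

# Example with placeholder URLs (uncomment to run):
# DRY_RUN=1 copy_between_remotes sftp://user@source.example.org/data/ sftp://user@dest.example.org/data/
```

The URL scheme to use for each data store depends on the protocols and connection profiles available in your Cyberduck installation.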

Jupyter Lab Google Drive Client

Google Drive will ask you to authenticate through your browser with a token. After you authenticate you can view files in your Google Drive and move them onto the VM.

If you have any data on Google Drive, you can drag and drop it onto your VM.

Jupyter Lab iRODS Client

After you've authenticated to CyVerse, you will be able to view your data store files.

The Jupyter iRODS Client is not suitable for downloading hundreds of files, but it is useful for finding files and copying their URLs.

