
NASA PODAAC AQUARIUS Satellite Data #236

Open · nickrsan opened this issue Jan 25, 2017 · 20 comments
@nickrsan (Member) commented Jan 25, 2017:

ftp://podaac-ftp.jpl.nasa.gov/allData/aquarius

Suggested in a large email containing many URLs.

@siennathesane changed the title from "Dataset at ftp:/podaac-ftp.jpl.nasa.gov/allData/aquarius" to "NASA PODAAC AQUARIUS Satellite Data" on Jan 25, 2017
@siennathesane added this to the January milestone on Jan 25, 2017
@AdamBunn commented:

I've started downloading this, but I don't have hosting at the moment

@kenXengineering commented:

I've started downloading aquarius as well, will host on Amazon Cloud Drive and IPFS.

@astrobackup commented Jan 28, 2017:

Started downloading as well.
If anyone has finished it yet, can you create a torrent, adding the FTP mirror as a web seed?

@kenXengineering commented:

Status update: I've downloaded 622 GB and have 481 GB to go. Each file transfer seems to be limited to about 700-800 KB/s, so with FileZilla I have 10 going at once.

I don't know if I'll get this all added to ACD; it will probably take a long time to upload (10 Mbps upload). I will add it to IPFS and, at the least, work on making a torrent.
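For anyone else mirroring from the FTP, a hypothetical way to get the same parallelism from the command line (lftp is not what was used in the thread; the paths come from the URL above):

```bash
# Mirror the Aquarius tree with 10 parallel connections to work around the
# ~700-800 KB/s per-connection cap; --continue resumes interrupted files.
lftp -e 'mirror --continue --parallel=10 /allData/aquarius ./aquarius; quit' \
    ftp://podaac-ftp.jpl.nasa.gov
```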

@bkirkbri (Collaborator) commented:

@chosenken @astrobackup @AdamBunn Thanks to all of you for your help. Please update when your mirrors are complete.

@astrobackup commented:

I am 200GB in. It is quite slow so far but the issue is on my side of the connection.

@kenXengineering commented:

My download completed, and I am now verifying the files. With this data set, they provided an md5 file for each data file, so I wrote a quick and dirty script to go through each folder and check the md5sums. This will take some time; it has already found a few files that failed (most likely from when I had to restart the transfer a few times).

BTW, the data set clocks in at 1.07 TB. I don't know if I can get this fully uploaded to Amazon Cloud Drive, as I'm already at 3.4 TB with them. I will be adding it to IPFS, though, and creating a torrent with some web seeds set up. I should have this done in a day or two; it takes a while to validate all the files.
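The actual script wasn't posted; a minimal sketch of that kind of check, assuming each data file has a sibling .md5 file in md5sum-compatible format, might look like:

```bash
# Walk the tree, run md5sum -c on each .md5 file from inside its own
# directory (the checksum files reference bare filenames), and log failures.
find . -type f -name '*.md5' -print0 | while IFS= read -r -d '' sum; do
    dir=$(dirname "$sum")
    file=$(basename "$sum")
    # --quiet prints only the files that fail verification
    (cd "$dir" && md5sum -c --quiet "$file") || echo "FAILED: $sum" >> md5-failures.log
done
```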

@bkirkbri (Collaborator) commented Feb 3, 2017:

@chosenken Thank you! Please post the `hashdeep -rl ./podaac-ftp.jpl.nasa.gov/allData/aquarius` output when you can.

@bkirkbri (Collaborator) commented Feb 3, 2017:

@astrobackup Still downloading?

@astrobackup commented:

@bkirkbri Yes, I have 506 GB on my disk now. It is taking ages, but that's mainly because I am downloading to a Samba share and FileZilla only transfers a file there after it has fully downloaded. My current rate is around 80 GB/day, so hopefully ~10 days left.

@kenXengineering commented Feb 7, 2017:

@bkirkbri I ran the hashdeep command on the data directory. It took quite a while to run and generated a 100+ MB file with all the hashes. I've 7-Zipped it up and it is available here.

As for adding the data to IPFS, I had to move my IPFS data directory off of my NAS, as it was causing issues. It is still transferring to my local disk (it has close to a million files in it already). I have about 400+ GB added to IPFS, so I am about halfway done. Once the transfer is complete I can start adding to IPFS again; I expect it to take about a week (it took 3 days to add 400 GB). There is a known issue where IPFS does not scale well with 8,000+ files, and the Aquarius dataset is at 567,203 files on my system. They have a possible fix, so I'm going to build off of that and see if it helps.

EDIT:

Ok, so I got tired of waiting for IPFS to finish, and my slow upload would take forever, so I just spun up a DigitalOcean box to download all the data and upload it to ACD. The data can be accessed here. Once it is complete I will post the updated hashdeep output.

@kenXengineering commented:

Data uploaded to Amazon Cloud Drive, can be accessed here. Still working on getting data added to IPFS.

@bkirkbri (Collaborator) commented:

@astrobackup Were you able to get this one? Thanks!
@chosenken Thanks!

@astrobackup commented:

Still working on it, 740GB so far. Is there a way to rsync from ACD through the command line?

@bkirkbri (Collaborator) commented:

@astrobackup Thanks for the update. Please update once your mirror completes. You are mirroring from the source, not from @chosenken's ACD, right? Just checking. In any case, it looks like there are two FUSE drivers for accessing ACD.
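A non-FUSE alternative from the same era would be rclone (not mentioned in the thread). A sketch, assuming a remote named `acd` has already been configured via `rclone config` and that the data sits under the path shown:

```bash
# Compare sizes/hashes between the ACD copy and a local mirror without
# downloading everything; the remote path here is an assumption.
rclone check acd:podaac-ftp.jpl.nasa.gov/allData/aquarius ./aquarius
```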

@astrobackup commented:

Yes, I am downloading from the source. I wanted to check what I already have against @chosenken's data and so do an rsync dry-run, but mounting ACD as a local drive means I'd need to download it anyway. I will work with the hashdeep file instead.

@bkirkbri (Collaborator) commented:

@astrobackup Great, thanks. There is definitely a need for a hashdeep comparison script; someone on Slack mentioned that.
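One option that avoids a custom script is hashdeep's built-in audit mode; a sketch, assuming the posted hash list was saved as aquarius-hashes.txt (hypothetical filename) and was generated with relative paths as in the command above:

```bash
# -a enables audit mode, -k loads the known-hashes file; -rl recurses with
# relative paths so they match the posted list.
hashdeep -rl -a -k aquarius-hashes.txt ./podaac-ftp.jpl.nasa.gov/allData/aquarius
```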

@kenXengineering commented:

Ok, I got all the data added to IPFS. It took about 4 days to get everything added; my IPFS data store is on a NAS, and the number of small files was giving it issues, so it took a while. The root hash is Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW. You can access it at

https://gateway.ipfs.io/ipfs/Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW or

http://localhost:8080/ipfs/Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW

if you are running IPFS locally. I captured the output into a text file, and it contains all the hashes for all the files; it's 24 MB and extracts to 94 MB. Note that the directory is 1.2 TB according to IPFS, so if you want to pin it you will need to configure IPFS with more storage (I think the default is 10 GB; I set mine to 2048 GB).
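For reference, pinning the whole set on a default install would look roughly like this (the StorageMax value mirrors the one above):

```bash
# Raise the datastore cap first (default is 10GB), then pin the root hash.
ipfs config Datastore.StorageMax 2048GB
ipfs pin add Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW
```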

For my next act, I'm going to work on creating a torrent backed by IPFS (I think I can use IPFS as an HTTP source; we'll see).

NOTE: If you already have some of the data and want to speed up pinning in IPFS, you can add it yourself to IPFS and hopefully get the same hash back. For reference, my folder structure was \podaac.jpl.nasa.gov\allData\aquarius, and I added the data with the command `ipfs add -r podaac.jpl.nasa.gov`. If you follow the same structure and add what data you have to IPFS, it should give the same hash (I hope).

@kenXengineering commented:

Ok, after a long day (this actually took about 6 hours, so it kind of was), I have created 30 torrent files for the Aquarius data set. Due to the number of files in the set, I was not able to create a single all-encompassing torrent. Instead I had to create torrents for specific directories, as some directories had over 100,000 files, which caused issues with the torrent software I was using; I found it could handle about 40,000 files before it gave up.

Included in the zip are a readme, a script to check md5sums, and the output from hashdeep. It also has the root files (mainly text files) and the software (sw) folder. With this one zip, you can recreate the entire data directory. All torrents should be backed by IPFS web seeds, though it is possible I messed up a URL on one or two; if so, please let me know and I can fix them.

You can get the zip file here. The file is 41.6 MB in size, as the hashdeep output is quite large. Please let me know if you have any questions. I plan on seeding this for as long as I can. I can't guarantee 100% uptime, as I may need to stop uploads if my bandwidth is getting hit too hard, but I plan to run it for the foreseeable future. And since the torrents should be backed by IPFS, if the data is cached on the gateway you can still pull it.
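The torrent tool used above isn't named in the thread; with mktorrent, for example, one of the per-directory torrents backed by an IPFS web seed could plausibly be built like this (the L3 subdirectory and tracker are illustrative):

```bash
# -a sets the tracker, -w attaches the IPFS gateway URL as a web seed,
# -o names the output .torrent file.
mktorrent \
  -a udp://tracker.opentrackr.org:1337/announce \
  -w https://gateway.ipfs.io/ipfs/Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW/L3 \
  -o aquarius-L3.torrent \
  ./podaac-ftp.jpl.nasa.gov/allData/aquarius/L3
```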

@kenXengineering commented:

Hey, I just wanted to send out a quick update. I had an issue with IPFS and ended up having to reset everything on my machine. Luckily, adding the data back was easy, as they have added a `--nocopy` flag, so IPFS doesn't create duplicate copies of the data and instead just reads from the original files. I have re-uploaded all the data and it is now under a new hash, QmcgqJRxgLJ5eUqZHeP5ftVQ1T5Y5eDnif4FkjV42dYXsm. This contains all of aquarius and saral as of April 1st.
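For anyone reproducing this, `--nocopy` depends on the experimental filestore being switched on; roughly (the local path mirrors the folder layout mentioned earlier):

```bash
# Enable the experimental filestore, then add in place without duplicating
# the 1+ TB of data into the IPFS datastore.
ipfs config --json Experimental.FilestoreEnabled true
ipfs add -r --nocopy podaac.jpl.nasa.gov
```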

I'm also working on creating a new torrent with everything tar'd up instead of multiple torrent files. No one seems to have downloaded the torrents yet, so that shouldn't be an issue. The downside is that I can't use IPFS as a web seed.
