NASA PODAAC AQUARIUS Satellite Data #236
Comments
I've started downloading this, but I don't have hosting at the moment.
I've started downloading Aquarius as well; I will host it on Amazon Cloud Drive and IPFS.
Started downloading as well.
Status update: Downloaded 622 GB, with 481 GB to go. Each file transfer seems limited to about 700-800 KB/s, and with FileZilla I have 10 going at the moment. I don't know if I'll get this all added to ACD; it will probably take a long time to upload (10 Mb/s upload). I will add it to IPFS and will work on making a torrent at the least.
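As a rough sanity check on the remaining time, the figures above work out as follows (a sketch only: it assumes the 10 transfers each sustain the ~750 KB/s midpoint, which is not measured):

```python
# Rough ETA for the remaining download, using the figures in the thread.
# Assumptions (not from the thread): 10 parallel transfers sustained at
# ~750 KB/s each (midpoint of 700-800), and 1 GB = 1e9 bytes.
remaining_gb = 481
transfers = 10
rate_kbs = 750  # KB/s per transfer

total_bytes_per_sec = transfers * rate_kbs * 1000  # 7.5 MB/s aggregate
eta_seconds = remaining_gb * 1e9 / total_bytes_per_sec
print(f"aggregate rate: {total_bytes_per_sec / 1e6:.1f} MB/s")
print(f"ETA: {eta_seconds / 3600:.1f} hours")
```

So under those assumptions the remaining 481 GB is on the order of 18 hours of transfer time, well short of the upload bottleneck mentioned above.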
@chosenken @astrobackup @AdamBunn Thanks to all of you for your help. Please update when your mirrors are complete. |
I am 200GB in. It is quite slow so far but the issue is on my side of the connection. |
My download completed, and I am now verifying the files. With this data set, they provided a hashdeep file.
@chosenken Thank you! Please post an update when verification completes.
@astrobackup Still downloading? |
@bkirkbri Yes, I have 506 GB on my disk now. It takes ages, but that's mainly because I am downloading onto a Samba drive and FileZilla only transfers a file there after it has fully downloaded. My current rate is around 80 GB/day, so hopefully ~10 days left.
@bkirkbri I ran the hashdeep audit.

As for adding the data to IPFS, I had to move my IPFS data directory off of my NAS as it was causing issues. It is still transferring to my local disk (it has close to a million files in it already). I have about 400+ GB added to IPFS, so I am about halfway done. Once the transfer is complete I can start adding to IPFS once again. I am expecting about a week to get everything done (it took 3 days to get 400 GB added). There is a known issue with IPFS where it does not scale well past 8,000+ files, and the Aquarius dataset is at 567,203 files on my system. They have a possible fix, so I'm going to build off of that and see if it helps.

EDIT: I got tired of waiting for IPFS to finish, and my slow upload will take forever, so I just spun up a DO box to download all the data and upload to ACD. The data can be accessed here. Once it is completed I will post the updated hash.
Data uploaded to Amazon Cloud Drive, can be accessed here. Still working on getting data added to IPFS. |
@astrobackup Were you able to get this one? Thanks! |
Still working on it, 740GB so far. Is there a way to rsync from ACD through the command line? |
@astrobackup Thanks for the update. Please update once your mirror completes. You are mirroring from the source, not from @chosenken's ACD, right? Just checking. In any case, it looks like there are two FUSE drivers for accessing ACD: |
Yes, I am downloading from the source. I wanted to check what I already have against @chosenken's data and thus do an rsync dry-run. But mounting that as a local drive means I'd need to download it anyway, so I will work with the hashdeep file instead.
@astrobackup great, thanks. There is definitely a need for a hashdeep comparison script. Someone on slack mentioned that. |
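As a starting point, here is a minimal sketch of such a comparison script. It assumes hashdeep's default `size,md5,sha256,filename` column layout and keys on the file path as recorded in each manifest; the function names are my own:

```python
import csv
import io

def parse_hashdeep(text):
    """Map filename -> (size, md5, sha256) from hashdeep output.
    Header/comment lines start with '%' or '#' and are skipped."""
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith(("%", "#")):
            continue
        size, md5, sha256 = row[0], row[1], row[2]
        name = ",".join(row[3:])  # tolerate commas in filenames
        entries[name] = (int(size), md5, sha256)
    return entries

def compare_manifests(a_text, b_text):
    """Return (missing_in_b, missing_in_a, mismatched) between
    two hashdeep manifests, keyed on filename."""
    a, b = parse_hashdeep(a_text), parse_hashdeep(b_text)
    missing_in_b = sorted(set(a) - set(b))
    missing_in_a = sorted(set(b) - set(a))
    mismatched = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return missing_in_b, missing_in_a, mismatched
```

This only diffs the two manifests; actually re-hashing local files to confirm a mirror is a separate (and much slower) step. Note it assumes both manifests were generated with the same relative paths.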
Ok, I got all the data added to IPFS. It took about 4 days to get everything added. My IPFS data store is on a NAS, and the number of small files was giving it issues, so it took a while.

The root hash is https://gateway.ipfs.io/ipfs/Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW or http://localhost:8080/ipfs/Qmb86zba6KhGyfXP45fWn34WGt8CA6BufhjJGW816rxvKW if you are running IPFS locally. I captured the output into a text file, and it contains the hashes for all the files. It's 24 MB and extracts to 94 MB.

Note that the directory is 1.2 TB according to IPFS, so if you want to pin it you will need to configure IPFS with more storage (I think the default is 10 GB; I set mine to 2048 GB).

For my next act I'm going to work on creating a torrent backed by IPFS (I think I can use IPFS as an HTTP source, will see on that).

NOTE: If you already have some of the data and want to speed up pinning in IPFS, you can add it yourself to IPFS and hopefully get the same hash back. As a reference, my folder structure was
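For anyone working with that captured text file: `ipfs add -r` prints one `added <hash> <path>` line per file and directory, so a small parser can turn the log into a lookup table. A sketch, assuming that line format (non-matching lines, e.g. progress output, are skipped):

```python
def parse_ipfs_add_log(text):
    """Parse 'added <hash> <path>' lines from captured `ipfs add -r`
    output into a path -> hash dict. Lines that don't match the
    three-field 'added' shape are ignored."""
    hashes = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "added":
            _, h, path = parts
            hashes[path] = h
    return hashes
```

The root directory's hash (the one to pin) is the entry for the top-level path, which `ipfs add -r` emits last.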
Ok, after a long day (this actually took about 6 hours, so it kind of was) I have created 30 torrent files for the Aquarius data set. Due to the number of files in the set, I was not able to create a single all-encompassing torrent. Instead I had to create torrents for specific directories, as some directories had over 100,000 files, which caused issues with the torrent software I was using. I found it could handle 40,000 files before it gave up.

Included in the zip is a readme, a script to check md5sums, and the output from hashdeep. It also has the root files (mainly text files) and the software (sw) folder. With this one zip, you can recreate the entire data directory. All torrents should be backed by IPFS web seeds, though it is possible I may have messed up a URL on one or two. If that is the case, please let me know and I can fix them.

You can get the zip file here. The file is 41.6 MB in size, as the hashdeep output is quite large. Please let me know if you have any questions.

I plan on seeding this for as long as I can. I can't guarantee 100% uptime, as I may need to stop uploads if my bandwidth is getting hit too hard, but I plan to run it for the foreseeable future. And since the torrents should be backed by IPFS, if they are cached on the gateway then you can still pull them.
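The directory-splitting step above can be sketched as a simple greedy grouping under the 40,000-file cap. This is illustrative only: directory names and counts below are made up, and oversized directories are just flagged into their own group (in practice they had to be split further):

```python
def plan_torrents(dir_counts, cap=40000):
    """Greedily group (directory -> file_count) entries into torrents
    so that no torrent exceeds `cap` files. Directories at or over the
    cap get a group of their own and need further manual splitting."""
    groups, current, current_total = [], [], 0
    # Largest first, so the oversized directories surface immediately.
    for name, count in sorted(dir_counts.items(), key=lambda kv: -kv[1]):
        if count >= cap:
            groups.append([name])
            continue
        if current_total + count > cap:
            groups.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += count
    if current:
        groups.append(current)
    return groups
```

A first-fit pass like this keeps the torrent count low while respecting the limit the torrent software imposed.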
Hey, I just wanted to send out a quick update. I had an issue with IPFS and ended up having to reset everything on my machine. Luckily, adding the data back was easy, as they have added a new feature for this.

I'm also working on creating a new torrent with everything tar'd up instead of multiple torrent files. No one seems to have downloaded the torrents yet, so that shouldn't be an issue. The downside is I can't use IPFS as a web seed.
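Tar'ing the tree for a single torrent can be as simple as one uncompressed archive per top-level directory. A sketch using Python's tarfile module (paths here are hypothetical; left uncompressed so the per-file checksums in the hashdeep output still apply after extraction):

```python
import tarfile
from pathlib import Path

def tar_dataset(src_dir, out_path):
    """Bundle a directory tree into one uncompressed tar, so a torrent
    can be built from a handful of large files instead of hundreds of
    thousands of small ones."""
    with tarfile.open(out_path, "w") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)
```

Torrent clients handle a few multi-gigabyte files far better than half a million small ones, which is the same limit that forced the 30-torrent split earlier in the thread.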
ftp://podaac-ftp.jpl.nasa.gov/allData/aquarius
Suggested in a large email containing many URLs.