NCDC FTP Site #112

Open
nickrsan opened this issue Jan 12, 2017 · 14 comments

Comments

@nickrsan
Member

nickrsan commented Jan 12, 2017

Name: NOAA Full NCDC Site
Organization: NOAA NCDC
Description URL:
Download URL: https://www1.ncdc.noaa.gov/pub/data/
File Types:
Size:
Status: In progress - mirroring by non-GitHub user
NCEI pub data mirrored by Azimuth Project: https://bitbucket.org/azimuth-backup/azimuth-inventory/issues/40/noaa-ncei-complete-pub-directory

@bkirkbri
Collaborator

bkirkbri commented Jan 12, 2017

nClimGrid subset is issue #116

ftp://ftp.ncdc.noaa.gov/pub/data/climgrid/

@bkirkbri
Collaborator

bkirkbri commented Jan 12, 2017

Local Climatological Data (LCD) subset is issue #117

ftp://ftp.ncdc.noaa.gov/pub/data/lcd/

@bkirkbri
Collaborator

bkirkbri commented Jan 12, 2017

Paleoclimatology subset is issue #17

ftp://ftp.ncdc.noaa.gov/pub/data/paleo/

@ghost

ghost commented Jan 17, 2017

We received a report that https://www1.ncdc.noaa.gov/pub/ is 12 TB. We have grabbed 620 GB so far, but we don't know the source of the report, one Nick Gregory, or why he would know the size. Can anyone vouch for that number?

@ghost

ghost commented Jan 17, 2017

I just did a full directory walk of ncdc.noaa.gov/pub and got 29.620 TB across 1,325,435 files and 11,686 folders. Thanks for any help people intended.

We cannot mirror all of this ourselves. Does it make sense to divide it up? Please send advice to climate -at- mm -dot- st. Thanks!
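For anyone who wants to reproduce this kind of tally on data they have already downloaded, here is a minimal sketch. It assumes a local copy on disk (the walk reported above was run against the remote server, which would need listing parsing instead), and `summarize_tree` is a hypothetical helper name, not a tool used in this thread.

```python
import os


def summarize_tree(root):
    """Return (total_bytes, file_count, folder_count) for everything under root."""
    total_bytes = 0
    file_count = 0
    folder_count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Count the subdirectories found at this level; root itself is not counted.
        folder_count += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so no file is counted twice
            total_bytes += os.path.getsize(path)
            file_count += 1
    return total_bytes, file_count, folder_count
```

Calling `summarize_tree("/path/to/local/mirror")` (path is a placeholder) returns the three totals, which can then be compared against the numbers reported for the remote site.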

@bkirkbri
Collaborator

I posted some subsets above. I agree it's best to break up what's left. I can claim some of them. Do you have sizes for top-level directories?

Thanks!
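Once per-directory sizes are available, one way to break up the work is a simple greedy split: hand the largest remaining directory to whoever has the least data assigned so far. This is only a sketch of that idea, not a process anyone in this thread agreed on; `split_among_volunteers` and all names and sizes passed to it are hypothetical.

```python
import heapq


def split_among_volunteers(dir_sizes, volunteers):
    """Assign each directory to the volunteer with the least data so far.

    dir_sizes: dict mapping directory name -> size in bytes (hypothetical input).
    volunteers: list of volunteer names (hypothetical input).
    """
    if not volunteers:
        raise ValueError("need at least one volunteer")
    # Heap of (bytes_assigned, volunteer); the lightest load pops first.
    heap = [(0, name) for name in volunteers]
    heapq.heapify(heap)
    plan = {name: [] for name in volunteers}
    # Placing the largest directories first keeps the loads more even.
    ordered = sorted(dir_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for directory, size in ordered:
        load, name = heapq.heappop(heap)
        plan[name].append(directory)
        heapq.heappush(heap, (load + size, name))
    return plan
```

The result is a dict from volunteer name to the list of directories they would claim. The split is a heuristic and is not guaranteed to be perfectly balanced.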

@ghost

ghost commented Jan 17, 2017

I can get these tomorrow. Tracking as Azimuth Backup Kickstarter Project Issue #77.

@ghost

ghost commented Jan 18, 2017

I am still waiting for the /pub/data total, but in the interim here is what I have. The run has been going since mid-afternoon.

Note that this is probably a lower bound. I received a number of 500 error codes while running du against these directories, so the sizes of some files were missed.
I will update when I have the final total. The number above for /pub/data was another 30 TB, but we'll see.
noaa-ncdc-ncei-ftp-subdir-sizes-2017-01-17_170639
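To make the "lower bound" point concrete, here is a small sketch of how a tally can keep track of lookups that failed, so the missed paths can be retried later. It assumes each lookup has already been reduced to a `(path, size)` pair with `None` standing in for a failed request such as a 500 response; that input shape and the name `tally_sizes` are assumptions for illustration, not the format of the attached listing.

```python
def tally_sizes(results):
    """Sum the known sizes and collect the paths whose size lookup failed.

    results: iterable of (path, size_in_bytes or None); None marks a failed lookup.
    Returns (known_bytes, missed_paths). known_bytes is a lower bound on the
    true total whenever missed_paths is non-empty.
    """
    known_bytes = 0
    missed_paths = []
    for path, size in results:
        if size is None:
            missed_paths.append(path)
        else:
            known_bytes += size
    return known_bytes, missed_paths
```

Re-running the size lookup for just the returned `missed_paths` would tighten the estimate without repeating the whole walk.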

@mejackreed

I can potentially grab some. What is left?

@ghost

ghost commented Jan 20, 2017

The FTP site remains in a "being copied" state. That said, it is not clear exactly where we are. We do have 3.9 TB of it.

@mejackreed

Ok, let me know if you need me to grab anything specific.

@ghost

ghost commented Jan 20, 2017

@mejackreed I think someone should make a run at Climate Mirror issue #42. As far as I know, no one has even started it. We made a start, but it's really incomplete, and the server does not always cooperate. I don't know whether we are being throttled or what. This is the command I have been trying:

wget -N -c --dns-timeout=10 --connect-timeout=300 --read-timeout=120 --wait=5 --mirror -e robots=off --random-wait --page-requisites --retry-connrefused --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 --no-remove-listing --follow-ftp -nv --mirror --append-output=daac-ornl-gov-get-data.log --no-check-certificate https://daac.ornl.gov/
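If the server is throttling or intermittently failing, one common approach is to retry with an exponentially growing pause between attempts. The sketch below illustrates that technique in Python; it is not what the wget command above does internally, and `retry_with_backoff` and its parameters are hypothetical names chosen for this example.

```python
import time


def retry_with_backoff(fetch, tries=5, base_delay=1.0, max_delay=300.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    fetch: zero-argument callable that raises OSError on a failed attempt.
    sleep: injectable so the waiting can be replaced in tests.
    """
    delay = base_delay
    for attempt in range(1, tries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == tries:
                raise  # out of attempts: let the caller see the last error
            sleep(min(delay, max_delay))  # never wait longer than max_delay
            delay *= 2
```

Here `fetch` could wrap a single download. Network errors raised by the standard library's `urllib` (`urllib.error.URLError`) are subclasses of `OSError`, so they would be retried by this sketch.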

@JeremiahCurtis

Sorry, I'm a newcomer here (I've been writing a book that is taking some time), but I was wondering whether anyone is working on ftp://ftp.nodc.noaa.gov/pub/. Thanks.

@siennathesane siennathesane modified the milestone: January Progress Jan 25, 2017
@JeremiahCurtis

I'm willing to grab whatever is needed to get a complete mirror, if anyone has an idea of where we stand on this. Thanks.

5 participants