Readme of climate-mirror/datasets needs updating #278

Open
gabefair opened this issue Jan 25, 2017 · 7 comments
@gabefair
Collaborator

gabefair commented Jan 25, 2017

The README at https://github.com/climate-mirror/datasets is missing information about the wget commands we are using and needs updating in general.

@gabefair gabefair reopened this Jan 25, 2017
@siennathesane
Contributor

feel free to submit a PR. 👍

@siennathesane siennathesane modified the milestone: January Jan 25, 2017
@jetbalsa

A torrent RSS feed managed by the maintainers would be nice. It would allow some automatic seeding, which would help spread the data across more hosts.

@baobrien

What wget commands are you guys using?

@gabefair
Collaborator Author

gabefair commented Jan 26, 2017

@baobrien, here is an example for HTTP crawls:
wget --mirror --warc-file=www.bvo-dmo.org.warc --warc-cdx \
  --page-requisites --html-extension --convert-links \
  --execute robots=off --directory-prefix=. --span-hosts \
  --domains=bco-dmo.org,usjgofs.whoi.edu \
  --exclude-domains=mapservice.bco-dmo.org \
  --user-agent='Mozilla (mailto:flyingmana@googlemail.com)' \
  --wait=10 --random-wait http://www.bco-dmo.org/data
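
For reuse across sites, the crawl above could be wrapped in a small script. This is only a sketch: the function name `crawl_cmd`, its positional arguments, the `CONTACT` variable, and the example.org values are all hypothetical, not project conventions. The flags themselves are copied from the command above. It prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: print a WARC-producing wget crawl command for review.
# crawl_cmd, its arguments, and CONTACT are illustrative names only.

# Placeholder contact address embedded in the user agent string.
CONTACT="${CONTACT:-you@example.org}"

# usage: crawl_cmd WARC_NAME DOMAINS EXCLUDE_DOMAINS START_URL
crawl_cmd() {
    printf 'wget --mirror --warc-file=%s --warc-cdx \\\n' "$1"
    printf '  --page-requisites --html-extension --convert-links \\\n'
    printf '  --execute robots=off --directory-prefix=. --span-hosts \\\n'
    printf '  --domains=%s \\\n' "$2"
    printf '  --exclude-domains=%s \\\n' "$3"
    printf '  --user-agent="Mozilla (mailto:%s)" \\\n' "$CONTACT"
    printf '  --wait=10 --random-wait %s\n' "$4"
}

# Example call with made-up values; nothing is downloaded here.
crawl_cmd example.org.warc example.org,data.example.org \
    maps.example.org http://example.org/data
```

Once the printed command looks right, it can be pasted into a terminal or piped to `sh`.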

@gabefair
Collaborator Author

wget -N -m /*

@wantonwonton

Some notes on wget options:

-r/--recursive - Turns on recursive retrieval. The maximum depth defaults to 5!

-l/--level - Specifies how deep the recursion should go. You can specify "inf" for infinite recursion.

-N/--timestamping - For files previously downloaded, downloads them again only if the remote copy is newer than the local one or the file sizes differ.

-m/--mirror - Equivalent to -r -N -l inf --no-remove-listing. (The last option keeps .listing files, which contain the raw directory listings from the FTP server.)

-c/--continue - Treats each previously downloaded file as possibly incomplete and requests any data past the end of the local file (if the server supports it). This is good for resuming the download of a single large file where the download was interrupted (as long as the file has not changed). For files that have changed, unless the changes are only appended to the end, this option could result in a corrupted file (by combining the first half of a file from a previous download with a second half that doesn't match it).
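
To make the choice between these options concrete, here is a sketch of a helper that maps a task to the flags described above. The name `wget_flags` and the task labels are made up for illustration; it only prints flags and never runs wget.

```shell
#!/bin/sh
# Sketch: pick wget flags by task, following the notes above.
# wget_flags and the task names are hypothetical, for illustration only.

wget_flags() {
    case $1 in
        # Full mirror: -m is shorthand for the long form below.
        mirror)      echo '-m' ;;
        mirror-long) echo '-r -N -l inf --no-remove-listing' ;;
        # Recursive refresh that keeps the default depth limit of 5.
        shallow)     echo '-r -N' ;;
        # Resume one interrupted download of a file that has not changed.
        resume)      echo '-c' ;;
        *)           echo "unknown task: $1" >&2; return 1 ;;
    esac
}

# Show what a resume command would look like; <URL> is a placeholder.
echo "wget $(wget_flags resume) <URL>"
```

Keeping `resume` separate from the mirror tasks reflects the warning above: `-c` is for resuming unchanged files, while `-N` is for refreshing files that may have changed upstream.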

@nickrsan
Member

Agreed with mxplusb: we'll definitely act on a pull request that improves any of the documentation, including one on specific commands to run and tools to use. It'd probably be best to make a new markdown file and reference it from the main readme as a table of contents, but anything you submit that improves it is welcome.

6 participants