trouble with wikis with more than 500 pages #32
to give you an idea, in 10 minutes i was able to get ~1000 revisions out of the ~3 million revisions. that is 0.03%. at this rate, it would take 20 days to get a clone of the wiki. :)
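for reference, a back-of-the-envelope version of that estimate (a sketch only, assuming the fetch rate stays constant):

```python
# rough extrapolation from the numbers above; the constant-rate assumption is mine
revisions_total = 3_000_000    # ~3 million revisions on the wiki
revisions_fetched = 1_000      # fetched in the first 10 minutes
minutes_elapsed = 10

fraction_done = revisions_fetched / revisions_total                         # ~0.0003, i.e. ~0.03%
eta_days = revisions_total / revisions_fetched * minutes_elapsed / 60 / 24
print(f"{fraction_done:.2%} done, full clone in ~{eta_days:.0f} days")      # 0.03% done, ~21 days
```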
oh and oops: all the revisions are actually between 1GB and 5GB, not 80MB: that's only the latest revisions! https://dumps.wikimedia.org/enwikivoyage/20151201/ so yeah, definitely need to go through a shallow copy, and need to overcome the 500-page limitation.
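to illustrate what a shallow copy means here: the MediaWiki web API can hand back just the latest revision of a page, which is all a shallow import needs. a minimal sketch in Python (not the git-remote-mediawiki code itself, which is Perl; the endpoint and function name are only for illustration):

```python
import requests

API = "https://en.wikivoyage.org/w/api.php"  # illustrative endpoint

def latest_revision(title):
    """Fetch only the most recent revision of one page -- the 'shallow' idea."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvlimit": 1,        # latest revision only, instead of the full history
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]  # revid, timestamp, and the wikitext under "*"

print(latest_revision("Paris")["revid"])
```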
and here's a patch: note that it may seem to hang for large wikis... i thought of doing a progress bar, but couldn't get it to work.
Patch added to Gentoo git patchset, thanks!
i pushed this to https://github.com/anarcat/git/tree/large-wikis
filed the patch as #52. |
it seems virtually impossible to fetch all the pages from wikitravel. the wiki is not that large (dumps are 75MB) so it should be possible to fetch all the changes. however, getting all the revisions (over 2M revisions!) seems to be a little prohibitive:
notice here how `remote.origin.shallow=true` is not having any effect... maybe that combination should be an error, but that's another thing. trying just shallow gets only 500 pages, probably the API limit of mediawiki:
could there be a hack similar to #16 to get all the pages?
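for what it's worth, the 500-page cap is only per request: the API returns continuation parameters, and a client that keeps following them can enumerate every page. a minimal sketch of that loop in Python (again, this just hits the web API directly; the endpoint and names are for illustration, not a patch to the helper):

```python
import requests

API = "https://en.wikivoyage.org/w/api.php"  # illustrative endpoint

def all_page_titles():
    """Yield every page title by following API continuation past the 500-per-request cap."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "max",    # resolves to 500 per request for normal users
        "continue": "",      # opt in to the newer continuation style
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries apcontinue for the next batch

# e.g. peek at the first few titles without walking the whole wiki
titles = all_page_titles()
for _ in range(3):
    print(next(titles))
```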