[WIP] Add "page_limit" argument to harvesters #368
base: develop
addresses [#SHARE-50]
I wonder if it might be better to specify the number of times to follow links? That would allow us to choose the minimum number of jumps for our other harvesters to pass.
I like that idea. I'll change it so it does that!
This may have different behavior in python3 (division operator semantics changed slightly, IIRC)
Also, I would put this check as a condition of the while loop.
# python 3
1 / 2   # 0.5
1 // 2  # 0
# python 2
1 / 2   # 0
1 // 2  # 0
from __future__ import division
1 / 2   # 0.5
1 // 2  # 0
thx
Add an argument that limits harvesting to a certain number of pages of results. This helps with both local development and test generation, since we can limit the number of results that the test harvesters harvest and that local harvesters take in.
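For illustration only, here is a minimal sketch of how such a limit could be written as part of the `while` condition, as suggested in review. The diff itself is not shown in this conversation, so `harvest_pages`, `fetch_page`, and `fake_fetch_page` are hypothetical names, not the project's actual harvester API. The sketch assumes a page fetcher that returns the records for a page plus a flag saying whether another page follows.

```python
def harvest_pages(fetch_page, page_limit=None):
    """Collect records page by page, fetching at most ``page_limit`` pages.

    ``fetch_page`` is a hypothetical callable: given a page number, it
    returns ``(records, has_next)``. ``page_limit=None`` means no limit.
    """
    records = []
    page = 0
    has_next = True
    # The limit is part of the loop condition, not a break inside the body.
    while has_next and (page_limit is None or page < page_limit):
        page_records, has_next = fetch_page(page)
        records.extend(page_records)
        page += 1
    return records


def fake_fetch_page(page, total_pages=5, per_page=2):
    """Stand-in for a remote source: five pages of two records each."""
    start = page * per_page
    page_records = list(range(start, start + per_page))
    return page_records, page + 1 < total_pages


limited = harvest_pages(fake_fetch_page, page_limit=2)  # first two pages only
everything = harvest_pages(fake_fetch_page)             # no limit: all five pages
```

Comparing an integer page counter against the limit sidesteps the Python 2 versus Python 3 division difference raised above, because no division is involved.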