Non-blocking autologin #46
Logout not supported yet
Current coverage is
Hm, strange that this https://github.com/TeamHG-Memex/undercrawler/pull/46/files#diff-cd4de073722fd256d3d944b28a9c88baR95 is not covered; maybe something is fishy here. I was sure it must be covered...
@@ -36,16 +37,21 @@ class AutologinMiddleware:
- do not block event loop in login() method (instead, collect
scheduled requests in a separate queue and make request with scrapy).
Is the docstring still valid?
Nice catch, thanks! Fixed, I think now the only restriction is a single authorization domain per spider, and now it should be easier to relax if needed.
d2a838f to b2a0f63
I think it's ready now, @kmike! I tried to come up with a scrapy API that would simplify this case, but did not come up with anything good. One small thing that could simplify it is making
cb3ca6d to 794526f
else:
self._enqueue(request)
if self.waiting_for_login:
raise IgnoreRequest
Can we solve it without dropping the request and maintaining a queue ourselves? Downloader middlewares support returning Deferreds from process_request; see e.g. the implementation in https://github.com/scrapy/scrapy/blob/master/scrapy/downloadermiddlewares/robotstxt.py.
It looks much nicer, I'll try, thanks!
It took a while to get used to this style, but in the end it's better, I think. The only gotcha is that at one point the traceback was incorrect. Is it worth a bug report?
Yeah, bad tracebacks are worth a bug report.
Yep, filed scrapy/scrapy#1948
I make requests to autologin via normal scrapy requests and process responses in callbacks. Requests before login and during logout are queued and scheduled after login.
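The queueing behaviour described here could look roughly like the following. This is an illustrative sketch only, reusing the `_enqueue` and `waiting_for_login` names from the diff; `QueueingLoginMiddleware`, `schedule`, and `on_login_response` are hypothetical, and `IgnoreRequest` is a local stand-in for `scrapy.exceptions.IgnoreRequest` so the snippet runs without Scrapy.

```python
from collections import deque


class IgnoreRequest(Exception):
    """Stand-in for scrapy.exceptions.IgnoreRequest (assumption for this sketch)."""


class QueueingLoginMiddleware:
    def __init__(self, schedule):
        # `schedule` is a hypothetical callable that re-submits a request.
        self.schedule = schedule
        self.waiting_for_login = True
        self.queue = deque()

    def _enqueue(self, request):
        self.queue.append(request)

    def process_request(self, request, spider=None):
        if self.waiting_for_login:
            # Remember the request and drop it from the current pass.
            self._enqueue(request)
            raise IgnoreRequest
        return None

    def on_login_response(self):
        """Callback for the login response: reschedule everything queued."""
        self.waiting_for_login = False
        while self.queue:
            self.schedule(self.queue.popleft())
```

The cost of this design is that the middleware owns the queue and has to reschedule requests itself, which is what the review comment about returning Deferreds aims to avoid.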