Running pyspider to crawl https://phytozome.jgi.doe.gov/pz/portal.html, why doesn't it work? #526
Comments
Have you set up PhantomJS?
I think I have. It is "fetch_type='js'".
It's very tricky to debug JavaScript rendering; sometimes it just doesn't work in PhantomJS.
Thank you for your attention, binux. How should I deal with this issue?
Have you followed this? http://docs.pyspider.org/en/latest/tutorial/Render-with-PhantomJS/
Thank you for your attention, cryptocxeq. I have followed that, but the result is the same.
You could try another approach, such as extracting the data from the XHR requests.
Ok, thanks. I'll try it.
How do I deal with the request payload?
You mean the POST request payload? http://docs.pyspider.org/en/latest/apis/self.crawl/#data
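To illustrate what sending a raw payload means, here is a minimal sketch that builds (but does not send) a POST request whose body is a plain string rather than a form-encoded dict. It uses only the Python standard library so it runs without pyspider. The helper name `build_raw_post`, the endpoint URL, and the sample body are hypothetical, and the `text/x-gwt-rpc` content type is an assumption that the target is a GWT-RPC service.

```python
import urllib.request


def build_raw_post(url, body, content_type="text/x-gwt-rpc; charset=utf-8"):
    """Build a POST request carrying `body` verbatim (hypothetical helper)."""
    data = body.encode("utf-8")  # the body is sent as-is, not form-encoded
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Content-Type", content_type)
    # Content-Length is deliberately not set here: the HTTP client
    # computes it from the body when the request is actually sent.
    return req


# Placeholder endpoint and body, for illustration only.
req = build_raw_post("https://example.invalid/service", "7|0|4|a|b|c|d|1|2|3|4|")
print(req.get_method(), len(req.data))
```

In pyspider, going by the linked documentation, the equivalent would be to pass the same string as `data=` to `self.crawl` together with `method='POST'` and the matching `headers`; a dict passed as `data` is form-encoded instead.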
Ok, thanks.
How do I post the request payload? In the Chromium browser, the request payload is '7|0|9|https://phytozome.jgi.doe.gov/pz/phytoweb/|586D31F87ED8E95E9B7E23B85764FF3C|org.jgi.phyto.client.service.KWSService|fetch|org.jgi.phyto.shared.CQuery/2500317377|1|0|1111111111111111111111111111111111111111111111111111|AUX/IAA|1|2|3|4|1|5|5|6|0|0|0|0|0|0|0|0|7|7|0|0|8|0|0|7|9|0|6|0|'. Posting it seems to have no effect. The code is as follows:
from pyspider.libs.base_handler import *
class Handler(BaseHandler):
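For anyone reading along, the payload quoted above looks like a GWT-RPC request. Here is a rough sketch of how it can be split into its parts, assuming the commonly described layout: protocol version, flags, string-table size, the strings themselves, then 1-based indexes into that table. The function name `split_gwt_rpc_request` is made up for illustration, and the layout is an assumption rather than something confirmed by the site.

```python
PAYLOAD = (
    "7|0|9|https://phytozome.jgi.doe.gov/pz/phytoweb/|"
    "586D31F87ED8E95E9B7E23B85764FF3C|"
    "org.jgi.phyto.client.service.KWSService|fetch|"
    "org.jgi.phyto.shared.CQuery/2500317377|1|0|"
    "1111111111111111111111111111111111111111111111111111|AUX/IAA|"
    "1|2|3|4|1|5|5|6|0|0|0|0|0|0|0|0|7|7|0|0|8|0|0|7|9|0|6|0|"
)


def split_gwt_rpc_request(payload):
    """Split a pipe-delimited payload into header, string table and indexes.

    Assumes the layout: version | flags | table size | strings... | indexes...
    """
    fields = payload.split("|")
    if fields and fields[-1] == "":
        fields.pop()  # the payload ends with a trailing '|'
    version, flags, count = int(fields[0]), int(fields[1]), int(fields[2])
    strings = fields[3:3 + count]   # the string table
    indexes = fields[3 + count:]    # references into the table, kept as text
    return version, flags, strings, indexes


version, flags, strings, indexes = split_gwt_rpc_request(PAYLOAD)
for position, value in enumerate(strings, start=1):
    print(position, value)
```

Under that reading, the search term (`AUX/IAA`) is one entry in the string table, so a different query would mean changing that entry while keeping the rest of the payload intact. The request would also normally go out with `Content-Type: text/x-gwt-rpc; charset=utf-8` rather than a form-encoded content type.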
Thank you very much, binux.
The code is as follows:
from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    crawl_config = {
        'headers': {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.8',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            'Content-Length': '295',
            'X-Requested-With': 'XMLHttpRequest',
            'Cookie': '__utmt=1; __utma=89664858.1557390068.1472454301.1472628080.1472628080.6; __utmb=89664858.3.10.1472628080; __utmc=89664858; __utmz=89664858.1472628080.5.5.utmcsr=sogou|utmccn=(organic)|utmcmd=organic|utmctr=phytozome',
            'Host': 'phytozome.jgi.doe.gov',
            'Origin': 'https://phytozome.jgi.doe.gov',
            'Referer': 'https://phytozome.jgi.doe.gov/pz/portal.html',
            'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
            'X-GWT-Module-Base': 'https://phytozome.jgi.doe.gov/pz/phytoweb/',
            'X-GWT-Permutation': '80DA602CF8FBCB99E9D79278AD2DA616',
        }
    }
It can only fetch the CSS.