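Scraping SUV models from Autohome (汽车之家) fails with an "index out of range" error. I traced it to "SUV" being one of the obfuscated keywords: because it is an English keyword, it is not URL-escaped, so the regex does not capture it. The dictionary therefore ends up 3 entries short, and during execution an index runs past the end of the dictionary and raises the error.
For example: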
    import json
    import requests

    # get_params is this project's deobfuscation helper
    res = requests.get("http://car.autohome.com.cn/config/spec/1646.html")
    res.encoding = 'gb18030'
    item = get_params(res.text)
    print(json.dumps(item, ensure_ascii=False, indent=4))
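The deobfuscated JS yields the string below. "SUV" makes up its first three characters, but because they are not written in %xx form, they are not captured.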
SUV%E4%B8%87%E4%B8%AD%E4%BA%AC%E4%BB%B7%E4%BC%98%E4%BD%93%E4%BE%9B%E4%BF%9D%E5%85%83%E5%85%A8%E5%87%86%E5%87%91%E5%88%97%E5%88%B6%E5%89%8D%E5%8A%9B%E5%8A%9F%E5%8A%A8%E5%8A%A9%E5%8C%97%E5%8D%8E%E5%8E%8B%E5%8F%B7%E5%90%88%E5%90%8D%E5%90%8E%E5%90%B8%E5%95%86%E5%96%B7%E5%99%A8%E5%9C%B0%E5%9E%8B%E5%A4%87%E5%A4%9A%E5%A4%A7%E5%A4%AE%E5%AD%90%E5%AE%9A%E5%AE%9E%E5%AE%B9%E5%AE%BD%E5%AF%B8%E5%AF%BC%E5%B0%BA%E5%B7%AE%E5%B9%B4%E5%BA%A6%E5%BC%8F%E5%BC%B9%E5%BE%84%E5%BE%B7%E6%82%AC%E6%88%96%E6%89%AD%E6%89%BF%E6%8C%87%E6%8E%92%E6%95%B0%E6%95%B4%E6%9C%80%E6%9C%BA%E6%9D%86%E6%9E%84%E6%9E%B6%E6%A0%87%E6%A0%BC%E6%A2%B0%E6%AC%A7%E6%AF%94%E6%B0%94%E6%B2%B9%E6%B5%8B%E6%B6%B2%E7%82%B9%E7%84%B6%E7%87%83%E7%8B%AC%E7%8E%87%E7%8E%AF%E7%94%B5%E7%9B%96%E7%9B%98%E7%9F%A9%E7%A6%BB%E7%A7%AF%E7%A7%B0%E7%A8%8B%E7%A8%B3%E7%AB%8B%E7%AE%B1%E7%B0%A7%E7%B4%A7%E7%BB%BC%E7%BC%A9%E7%BC%B8%E7%BD%AE%E8%80%97%E8%83%8E%E8%87%AA%E8%93%9D%E8%A1%8C%E8%A7%84%E8%B1%AA%E8%B4%A8%E8%B7%9D%E8%BD%A6%E8%BD%AC%E8%BD%AE%E8%BD%B4%E8%BD%BD%E8%BF%9B%E8%BF%9E%E9%80%9A%E9%80%9F%E9%85%8D%E9%87%8F%E9%93%81%E9%93%9D%E9%95%BF%E9%97%A8%E9%97%B4%E9%9A%99%E9%9B%85%E9%A3%8E%E9%A9%B1%E9%A9%BB%E9%AB%98%E9%BC%93C%
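The miscount can be reproduced on a short prefix of the string above. This is only a sketch: the pattern `ESCAPED_ONLY` is a hypothetical stand-in for a regex that matches nothing but %xx sequences, not the project's actual regex, and it assumes Python 3.

```python
import re
from urllib.parse import unquote

# First six characters of the string above: S, U, V, 万, 中, 京
sample = "SUV%E4%B8%87%E4%B8%AD%E4%BA%AC"

# Hypothetical pattern (an assumption, not the project's real regex):
# it matches only characters written as three percent-encoded UTF-8 bytes.
ESCAPED_ONLY = re.compile(r"(?:%[0-9A-Fa-f]{2}){3}")

# What an escape-only regex captures: the literal "SUV" is skipped.
found = [unquote(m) for m in ESCAPED_ONLY.findall(sample)]

# What the page's index table expects: one entry per decoded character.
expected = list(unquote(sample))

print(len(found), len(expected))
```

Any index computed against the full character list then points three positions too far into the shorter list, which is consistent with the reported "index out of range".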
I suspect other English letters in the string will cause the same problem. I suggest fixing this by adjusting the regex.
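One possible way to handle unescaped letters, sketched here under assumptions, is to percent-decode the whole string first and only then split it into characters, instead of matching %xx sequences with a regex. The function name `split_keyword_chars` is hypothetical and not part of this project; the sketch assumes Python 3 and that each decoded character is one dictionary entry, as the missing count of 3 for "SUV" suggests.

```python
from urllib.parse import unquote


def split_keyword_chars(encoded):
    """Return one entry per decoded character (hypothetical helper).

    unquote() decodes %xx sequences as UTF-8 and leaves unescaped
    ASCII characters such as "SUV" untouched, so they stay in the result.
    """
    return list(unquote(encoded))


chars = split_keyword_chars("SUV%E4%B8%87%E4%B8%AD")
print(chars)
```

With this approach the literal letters occupy their own positions, so indices taken from the page line up with the character list.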
Partially fixed, but I haven't figured out how to handle the pure-English keywords yet.
It looks like this is still not resolved.