Exception on autohome SUV [FIXED] #3

Open
Vonng opened this issue Aug 17, 2017 · 2 comments

Vonng commented Aug 17, 2017

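When scraping SUV models from Autohome (汽车之家), the program fails with an "index out of range" error.
Investigation shows that SUV is one of the obfuscated keywords, but because it is an English keyword it is not URL-escaped. The regex therefore does not capture it, which leaves the dictionary three entries short, so at run time an index runs past the end of the dictionary and raises the error.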

For example:

import json
import requests

res = requests.get("http://car.autohome.com.cn/config/spec/1646.html")
res.encoding = 'gb18030'
item = get_params(res.text)  # get_params is assumed to come from this project
print(json.dumps(item, ensure_ascii=False, indent=4))

The deobfuscated JS yields the string below. SUV, the first three characters, was not captured because it is not written in %xx form.

SUV%E4%B8%87%E4%B8%AD%E4%BA%AC%E4%BB%B7%E4%BC%98%E4%BD%93%E4%BE%9B%E4%BF%9D%E5%85%83%E5%85%A8%E5%87%86%E5%87%91%E5%88%97%E5%88%B6%E5%89%8D%E5%8A%9B%E5%8A%9F%E5%8A%A8%E5%8A%A9%E5%8C%97%E5%8D%8E%E5%8E%8B%E5%8F%B7%E5%90%88%E5%90%8D%E5%90%8E%E5%90%B8%E5%95%86%E5%96%B7%E5%99%A8%E5%9C%B0%E5%9E%8B%E5%A4%87%E5%A4%9A%E5%A4%A7%E5%A4%AE%E5%AD%90%E5%AE%9A%E5%AE%9E%E5%AE%B9%E5%AE%BD%E5%AF%B8%E5%AF%BC%E5%B0%BA%E5%B7%AE%E5%B9%B4%E5%BA%A6%E5%BC%8F%E5%BC%B9%E5%BE%84%E5%BE%B7%E6%82%AC%E6%88%96%E6%89%AD%E6%89%BF%E6%8C%87%E6%8E%92%E6%95%B0%E6%95%B4%E6%9C%80%E6%9C%BA%E6%9D%86%E6%9E%84%E6%9E%B6%E6%A0%87%E6%A0%BC%E6%A2%B0%E6%AC%A7%E6%AF%94%E6%B0%94%E6%B2%B9%E6%B5%8B%E6%B6%B2%E7%82%B9%E7%84%B6%E7%87%83%E7%8B%AC%E7%8E%87%E7%8E%AF%E7%94%B5%E7%9B%96%E7%9B%98%E7%9F%A9%E7%A6%BB%E7%A7%AF%E7%A7%B0%E7%A8%8B%E7%A8%B3%E7%AB%8B%E7%AE%B1%E7%B0%A7%E7%B4%A7%E7%BB%BC%E7%BC%A9%E7%BC%B8%E7%BD%AE%E8%80%97%E8%83%8E%E8%87%AA%E8%93%9D%E8%A1%8C%E8%A7%84%E8%B1%AA%E8%B4%A8%E8%B7%9D%E8%BD%A6%E8%BD%AC%E8%BD%AE%E8%BD%B4%E8%BD%BD%E8%BF%9B%E8%BF%9E%E9%80%9A%E9%80%9F%E9%85%8D%E9%87%8F%E9%93%81%E9%93%9D%E9%95%BF%E9%97%A8%E9%97%B4%E9%9A%99%E9%9B%85%E9%A3%8E%E9%A9%B1%E9%A9%BB%E9%AB%98%E9%BC%93C%
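The mismatch described above can be reproduced on a small scale. The following is a minimal sketch, not the project's actual code: the sample string is a shortened, hypothetical stand-in for the one above, the escape-only regex is an assumption about how the failing extraction works, and both function names are made up for illustration. It compares that approach with decoding the whole string first.

```python
import re
from urllib.parse import unquote

# Shortened, hypothetical sample in the same shape as the string above:
# three literal ASCII letters followed by percent-encoded UTF-8 characters.
SAMPLE = "SUV%E4%B8%87%E4%B8%AD%E4%BA%AC"


def chars_by_escape_regex(encoded):
    """Assumed shape of the failing approach: match only %XX%XX%XX groups."""
    groups = re.findall(r"(?:%[0-9A-Fa-f]{2}){3}", encoded)
    return [unquote(g) for g in groups]


def chars_by_unquote(encoded):
    """Decode the whole string first, so literal ASCII letters are kept."""
    return list(unquote(encoded))


missed = chars_by_escape_regex(SAMPLE)
full = chars_by_unquote(SAMPLE)
print(missed)                  # the literal letters S, U, V are absent
print(full)                    # every character, including S, U, V
print(len(full) - len(missed))  # how many entries the regex-only list lacks
```

With this sample the regex-only list is three entries shorter than the fully decoded one, which matches the off-by-three dictionary length reported here.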

I suspect the other English letters in the string will cause the same problem. I suggest fixing this by adjusting the regex.
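One possible regex adjustment, sketched under assumptions: the pattern below accepts either a three-byte percent-encoded group or a single literal ASCII letter or digit, so letters in the middle of the string are kept in order. The pattern, the function name, and the sample string are all hypothetical and are not taken from the project.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: a three-byte %XX group (one CJK character in UTF-8)
# or a single unescaped ASCII letter or digit.
TOKEN = re.compile(r"(?:%[0-9A-Fa-f]{2}){3}|[A-Za-z0-9]")


def split_dictionary(encoded):
    """Return one decoded character per dictionary entry, in order."""
    return [unquote(tok) for tok in TOKEN.findall(encoded)]


# Hypothetical sample with literal letters at the start and in the middle.
print(split_dictionary("SUV%E9%BC%93C%E4%B8%87"))
```

This sketch assumes every escaped character is exactly three bytes long in UTF-8; escaped characters of other lengths would need a different pattern or a full decode of the string.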

dytttf (Owner) commented Aug 19, 2017

Partially fixed, but I haven't figured out how to handle the purely English keywords yet.

xbc922 commented Nov 8, 2017

It looks like this still isn't resolved.
