Added geonodedaili.py to crawl proxies #186

Merged
merged 2 commits into from
Mar 1, 2023

MGMCN
Contributor

@MGMCN MGMCN commented Feb 27, 2023

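Hello, I am an international student in Japan. I recently needed some proxy IP services, which is how I found your library.
I have contributed a new site to crawl; its proxy listings are very comprehensive.
However, in the class I wrote I overrode the crawl and fetch methods, because I needed to change the interval between requests and also wanted to pass more accurate header parameters. (Almost all of this site's proxies are usable, but the site also detects crawling easily, so the request interval and headers need to be adjusted.)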
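
The override described above might look roughly like the following sketch. `BaseCrawler`, `GeonodeCrawler`, the URLs, and the header values are hypothetical stand-ins rather than the project's actual classes, and the HTTP transport is injected as a plain callable so the sketch runs without network access.

```python
import time
from typing import Callable, Dict, Iterator, List


class BaseCrawler:
    """Hypothetical stand-in for a base crawler (an assumption, not the project's class)."""

    urls: List[str] = []

    def __init__(self, transport: Callable[[str, Dict[str, str]], str]):
        # transport(url, headers) returns the response body as text.
        self.transport = transport

    def fetch(self, url: str) -> str:
        # Default behaviour: no custom headers.
        return self.transport(url, {})

    def parse(self, body: str) -> Iterator[str]:
        raise NotImplementedError

    def crawl(self) -> Iterator[str]:
        # Default behaviour: no delay between requests.
        for url in self.urls:
            yield from self.parse(self.fetch(url))


class GeonodeCrawler(BaseCrawler):
    """Overrides fetch and crawl to send custom headers and pause between requests."""

    urls = ["https://example.invalid/page/1", "https://example.invalid/page/2"]
    # Illustrative header values only.
    headers = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
    delay = 0.5  # seconds to wait between consecutive requests

    def fetch(self, url: str) -> str:
        return self.transport(url, self.headers)

    def parse(self, body: str) -> Iterator[str]:
        # Assume the body is a whitespace-separated list of host:port entries.
        yield from body.split()

    def crawl(self) -> Iterator[str]:
        for index, url in enumerate(self.urls):
            if index:  # no need to wait before the first request
                time.sleep(self.delay)
            yield from self.parse(self.fetch(url))
```

Keeping the delay and headers as class attributes means they can be tuned per site without touching the base class.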

@MGMCN
Contributor Author

MGMCN commented Feb 27, 2023

Alternatively, could I pass the headers directly into the fetch function that crawl calls? That way I would not need to override fetch.
I look forward to your reply.
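
The alternative raised here could be sketched as follows, assuming the base `fetch` forwards extra keyword arguments to the underlying HTTP call. The class names and the injected `http_get` callable are hypothetical and only illustrate the idea; they are not taken from the project.

```python
from typing import Any, Callable, Dict, Iterator, List


class KwargsBaseCrawler:
    """Hypothetical base class whose fetch forwards keyword arguments (an assumption)."""

    urls: List[str] = []

    def __init__(self, http_get: Callable[..., str]):
        # http_get(url, **kwargs) returns the response body as text.
        self.http_get = http_get

    def fetch(self, url: str, **kwargs: Any) -> str:
        # Extra keyword arguments such as headers=... are passed through unchanged.
        return self.http_get(url, **kwargs)

    def parse(self, body: str) -> Iterator[str]:
        raise NotImplementedError

    def crawl(self) -> Iterator[str]:
        for url in self.urls:
            yield from self.parse(self.fetch(url))


class HeaderPassingCrawler(KwargsBaseCrawler):
    """Overrides only crawl; fetch is inherited and receives the headers as a kwarg."""

    urls = ["https://example.invalid/list"]
    headers: Dict[str, str] = {"User-Agent": "Mozilla/5.0"}  # illustrative value

    def parse(self, body: str) -> Iterator[str]:
        yield from body.split()

    def crawl(self) -> Iterator[str]:
        for url in self.urls:
            yield from self.parse(self.fetch(url, headers=self.headers))
```

With this shape, only `crawl` differs from the base class, at the cost of relying on `fetch` accepting arbitrary keyword arguments.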

@MGMCN
Contributor Author

MGMCN commented Feb 28, 2023

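I tested from Japan, and time.sleep(.5) was also enough to crawl everything without missing any entries, but I don't know what the access speed is like from within China.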

@Germey
Member

Germey commented Mar 1, 2023

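Thank you for your contribution, @MGMCN. The crawl function itself was designed with only limited functionality and does not cover things like adding a delay, so I think overriding it makes sense. I ran it and it works quite well; very useful! Merged.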

@Germey Germey merged commit 4c50711 into Python3WebSpider:master Mar 1, 2023
@MGMCN MGMCN deleted the MGMCN branch March 3, 2023 01:52