Sorted proxy list for crawling without IP blocks

DavidKimDY/available-proxy

Available-Proxy

Proxy list filtered to proxies that respond to a test request within 30 seconds

test website

https://example.com
http://example.com

tree

├── proxy-list
├── get_test.py
├── ip_list.py
├── item_code.json
├── krx_sweeper.py
├── success_http_ip.txt
├── success_https_ip.txt
├── valid_http.txt
├── valid_https.txt
└── valid_ip.py

proxy-list

https://github.com/clarketm/proxy-list

Setup

git clone https://github.com/DavidKimDY/available-proxy.git

Usage

cd proxy_crawling/proxy-list
git pull
cd ..
python available_proxy.py 

files

success_http[s]_ip.txt : IPs marked as Success in proxy-list/proxy-list-status.txt
valid_http[s].txt : IPs that passed a test using `requests.get(url, timeout=30)`
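
As a rough illustration of the validity test described above, the following is a minimal sketch and not the repository's actual code. The function name `is_valid_proxy`, the `TEST_URLS` mapping, and the injectable `get` parameter are assumptions made for this example; the 30-second limit and the test sites come from this README. It also assumes each proxy is given as a `host:port` string.

```python
import requests

# Test sites listed in this README, keyed by scheme.
TEST_URLS = {"http": "http://example.com", "https": "https://example.com"}


def is_valid_proxy(proxy, scheme="http", timeout=30, get=requests.get):
    """Return True if the test site answers through `proxy` within `timeout` seconds.

    `proxy` is assumed to be a "host:port" string. `get` is injectable so the
    function can be exercised without network access (hypothetical convenience).
    """
    url = TEST_URLS[scheme]
    # requests routes traffic for `scheme` URLs through the given proxy URL.
    proxies = {scheme: f"http://{proxy}"}
    try:
        response = get(url, proxies=proxies, timeout=timeout)
    except requests.RequestException:
        # Timeouts, refused connections and proxy errors all count as invalid.
        return False
    # response.ok is True for status codes below 400.
    return response.ok
```

A call such as `is_valid_proxy("1.2.3.4:8080", scheme="https")` would then decide whether that address belongs in valid_https.txt under these assumptions.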

Todo

Asynchronize
Avoid using pickle
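
One possible direction for running the checks concurrently is a thread pool from the standard library. This is a sketch under assumptions, not the planned implementation: it uses threads rather than `asyncio`, and the names `filter_valid`, `check`, and `max_workers` are hypothetical. `check` stands for any function that takes a proxy string and returns a truthy value when the proxy works.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_valid(proxies, check, max_workers=20):
    """Return the proxies for which check(proxy) is truthy, keeping input order."""
    proxies = list(proxies)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the checks concurrently but yields results in input order.
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]
```

Since each check spends most of its time waiting on the network, up to `max_workers` proxies can be tested at once instead of waiting out the 30-second limit one proxy at a time.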
