geosite-gfw

本项目是一个机器学习训练http内容二分类的pytorch 项目。

先利用 geosite 的网站列表下载数据，然后用 bert 微调的方式训练模型。

提供了用于识别的 api。

模型下载地址在 https://huggingface.co/e1732a364fed/geosite-gfw/tree/main

下载两个zip文件后解压到项目中即可

curl -LO "https://huggingface.co/e1732a364fed/geosite-gfw/resolve/main/bert_geosite_by_body.zip?download=true"
curl -LO "https://huggingface.co/e1732a364fed/geosite-gfw/resolve/main/bert_geosite_by_head.zip?download=true"

tar -xf bert_geosite_by_body.zip
tar -xf bert_geosite_by_head.zip

下载好模型后就可直接跳到下面第三步进行预测了

本项目已在 ruci 代理项目中使用 :geosite_gfw

Steps

install requirements

pip install transformers numpy scikit-learn flask requests
pip install torch --index-url https://download.pytorch.org/whl/cu124

download bert-base-multilingual-cased from huggingface, store the files in ./bert/ folder

if you are not using nvidia gpu, you may obmit the --index-url parameter.

1. pull geosite response with `python pull.py`

you can set the read list by python pull.py -l proxy-list.txt

the project contains 2 list from https://github.com/Loyalsoldier/v2ray-rules-dat

but maybe dated. You can use your own list file.

2. train model with

python classify.py --mode=train_head
python classify.py --mode=train_body

it will generate the trained model file.

you can set the ok and ban dir by --ok_dir and --ban_dir

3. predict with

python classify.py --mode predict_head --text "Your input text here"
python classify.py --mode predict_body --text "Your input text here"

You can download pretrained model files instead of training own your own.

4. serve api with

python classify.py --mode serve_api --port 5134

5. request

predict by passing the data

curl -X POST http://localhost:5134/predict \
    -H "Content-Type: application/json" \
    -d '{"text": "your website http response body", "model_name": "body"}'

response:

{
  "result": "ok"
}

or

{
  "result": "ban"
}

check by passing the domain

curl -X POST http://localhost:5134/check \
    -H "Content-Type: application/json" \
    -d '{"domain": "www.baidu.com"}'

for more arguments and options, see the source code

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.gitignore		.gitignore
LICENSE-APACHE		LICENSE-APACHE
README.md		README.md
china-list.txt		china-list.txt
classify.py		classify.py
proxy-list.txt		proxy-list.txt
pull.py		pull.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

geosite-gfw

Steps

1. pull geosite response with `python pull.py`

2. train model with

3. predict with

4. serve api with

5. request

predict by passing the data

check by passing the domain

About

Releases

Packages

Languages

License

e1732a364fed/geosite-gfw

Folders and files

Latest commit

History

Repository files navigation

geosite-gfw

Steps

1. pull geosite response with python pull.py

2. train model with

3. predict with

4. serve api with

5. request

predict by passing the data

check by passing the domain

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

1. pull geosite response with `python pull.py`

Packages