GitHub - Angelina354/finegrained-halu-tw: Dataset & evaluation pipeline for hallucination detection in zh-tw

outs/
7b.py
8b.py
README.md
breeze.py
gpt4-judge.py
score.py


Fine-Grained Halu Taiwan

Step 0. Dataset (hosted on HuggingFace)

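Step 1. Have the model detect errors

See the {modelname}.py script for each model. As needed, change the input and output filenames and the HuggingFace access token used for the dataset and model.
Output is saved to outs/{modelname}-res.pkl, in the format [{response for the first passage}, {response for the second passage}, ...]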
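
To inspect a saved result file, something like the following minimal sketch may help. It assumes only what is stated above: the file lives at outs/{modelname}-res.pkl and holds a Python list with one entry per passage. The helper name `load_responses` and its `outs_dir` parameter are hypothetical and not part of the repository, and the type of each entry is not specified here, so the sketch does not assume one.

```python
import pickle
from pathlib import Path


def load_responses(modelname, outs_dir="outs"):
    """Load the per-passage responses written by {modelname}.py.

    Hypothetical helper: the name and the outs_dir parameter are
    illustrative and do not come from the repository.
    """
    path = Path(outs_dir) / f"{modelname}-res.pkl"
    with path.open("rb") as f:
        responses = pickle.load(f)
    # The README describes the file as a list with one entry per passage.
    if not isinstance(responses, list):
        raise TypeError(f"expected a list in {path}, got {type(responses).__name__}")
    return responses
```

For example, `len(load_responses("7b"))` would give the number of passages the 7b script processed. Since unpickling can execute arbitrary code, only load .pkl files you produced yourself or otherwise trust.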

Step 2. Evaluate model performance (gpt-4)

python gpt4-judge.py {modelname} (7b, 8b or breeze)
Output is saved to outs/eval-{modelname}.txt, in the following format

[Label]
Yes/No
[Label]
Yes/No
...

=====
(judgments for the next passage)

The script also directly computes the accuracy for each error type and the overall accuracy
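
The scoring can be pictured with the sketch below, which parses text in the format shown above and tallies accuracy per label and overall. It is not the code in gpt4-judge.py or score.py. It assumes that a "Yes" verdict counts as a correct detection, that passages are separated by a line of "=====", and that label and verdict lines strictly alternate. The function names and the sample labels ("Entity", "Relation") are hypothetical.

```python
from collections import defaultdict


def parse_eval(text):
    """Split eval text into passages, each a list of (label, is_yes) pairs."""
    passages = []
    for block in text.split("====="):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Assumed layout: a "[Label]" line followed by a "Yes" or "No" line.
        pairs = [
            (label_line.strip("[]"), verdict.lower() == "yes")
            for label_line, verdict in zip(lines[0::2], lines[1::2])
        ]
        if pairs:
            passages.append(pairs)
    return passages


def accuracy(passages):
    """Return (per-label accuracy dict, overall accuracy).

    Assumption: "Yes" means the model handled that error type correctly.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for pairs in passages:
        for label, is_yes in pairs:
            totals[label] += 1
            hits[label] += is_yes  # True counts as 1, False as 0
    per_label = {label: hits[label] / totals[label] for label in totals}
    n = sum(totals.values())
    overall = sum(hits.values()) / n if n else 0.0
    return per_label, overall


# Hypothetical sample with two passages; the label names are made up.
sample = "[Entity]\nYes\n[Relation]\nNo\n\n=====\n[Entity]\nNo\n"
per_label, overall = accuracy(parse_eval(sample))
```

In this sample, "Entity" gets one "Yes" out of two judgments and "Relation" gets none out of one, so the overall accuracy is one out of three.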

Backup. Compute scores

Scores can be recomputed from the previously saved GPT-4 output alone (outs/eval-{modelname}.txt)
python score.py {modelname} (7b, 8b or breeze)
