fineTune Model #26

Closed
KuoFengYu opened this issue Apr 13, 2023 · 3 comments
Labels: Priority: Trivial (least priority), Status: 1-Assigned (assigned an assignee), Type: Question (ask question)

KuoFengYu commented Apr 13, 2023

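Hello,
I would like to ask about fine-tuning the following ws, pos, and ner models:
ckiplab/bert-base-chinese-ws
ckiplab/bert-base-chinese-pos
ckiplab/bert-base-chinese-ner

Following the example, I run Hugging Face's run_ner.py and replace model_name_or_path with each of the three models above for training.
When fine-tuning these three models, can my training data labels only be B and I? Can I not also annotate types, for example "B-PRODUCT" and "I-PRODUCT"? And can there be no O either? I ask because an earlier issue said to use B and I.

Thanks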

@emfomy emfomy self-assigned this Apr 13, 2023
@emfomy emfomy added Priority: Medium third priority Status: 1-Assigned assigned an assignee Type: Question ask question labels Apr 13, 2023
emfomy (Member) commented Apr 13, 2023

Training labels can take other values. For example, the NER labels include B-PRODUCT and I-PRODUCT (see https://huggingface.co/ckiplab/bert-base-chinese-ner/blob/main/config.json#L11).
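
To illustrate what typed labels such as B-PRODUCT / I-PRODUCT look like in character-level training data, here is a minimal sketch that converts entity spans into tags, with "O" for non-entity characters. The function name `spans_to_bio`, the span format, and the example sentence are hypothetical and not part of this project. The label set the released model actually uses is the one in the config.json linked above and may differ from this simple B/I/O scheme.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) character spans (end exclusive) into per-character tags.

    Hypothetical helper for illustration only.
    """
    # Every character starts as "O" (outside any entity).
    tags = ["O"] * len(text)
    for start, end, ent_type in spans:
        # First character of the entity gets the B- prefix.
        tags[start] = f"B-{ent_type}"
        # Remaining characters of the entity get the I- prefix.
        for i in range(start + 1, end):
            tags[i] = f"I-{ent_type}"
    return tags


# Made-up example: "蘋果手機" (characters 3..6) annotated as PRODUCT.
text = "我買了蘋果手機"
spans = [(3, 7, "PRODUCT")]
tags = spans_to_bio(text, spans)

# Label list and mapping in the form a token-classification setup usually needs.
label_list = sorted(set(tags))
label2id = {label: idx for idx, label in enumerate(label_list)}

for char, tag in zip(text, tags):
    print(char, tag)
```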

@emfomy emfomy added Priority: Trivial least priority and removed Priority: Medium third priority labels Apr 13, 2023
KuoFengYu (Author) commented Apr 14, 2023

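Hello, thanks for the answer.
Looking at https://huggingface.co/ckiplab/bert-base-chinese-ws/blob/main/config.json, the training labels for bert-base-chinese-ws are only B and I (this was also the answer in an earlier issue). How should the non-entity "O" parts of the training data be handled? Should the O parts simply be removed?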

emfomy (Member) commented Apr 14, 2023

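bert-base-chinese-ws is a word segmentation model, not a named entity recognition model, so its training labels do not include "O".

Please choose the model that suits the nature of your data.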
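
To make the distinction concrete: in word segmentation every character belongs to some word, so each character is either the beginning of a word (B) or inside one (I), and there is nothing left over to tag as "O". A minimal sketch, assuming the training data is available as lists of already-segmented words (the helper name `words_to_bi` and the example sentence are hypothetical):

```python
from typing import List


def words_to_bi(words: List[str]) -> List[str]:
    """Turn a segmented sentence into per-character B/I tags.

    Hypothetical helper for illustration only.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            # Skip empty strings so no stray tag is produced.
            continue
        # First character starts a word; the rest continue it.
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


# Made-up example sentence, already split into words.
words = ["我", "買了", "蘋果", "手機"]
chars = [c for w in words for c in w]
tags = words_to_bi(words)

for char, tag in zip(chars, tags):
    print(char, tag)
```

Every character receives exactly one of the two tags, which is why a segmentation label set needs only B and I.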


2 participants