
[BUG] #104

Open
danaekdml opened this issue May 9, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@danaekdml

danaekdml commented May 9, 2023

🐛 Bug

No module named 'kobert'

To Reproduce

from kobert.utils import get_tokenizer
from kobert.pytorch_kobert import get_pytorch_kobert_model

When I try to run this, the following error occurs:

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 2>()
      1 #kobert
----> 2 from kobert.utils import get_tokenizer
      3 from kobert.pytorch_kobert import get_pytorch_kobert_model

ModuleNotFoundError: No module named 'kobert'
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

This is the error that occurs. After I resolved a gluonnlp version issue, this kobert problem appeared.


Expected behavior

Environment

Additional context

@danaekdml danaekdml added the bug Something isn't working label May 9, 2023
@hwangsaeyeon

hwangsaeyeon commented May 9, 2023

I ran into the same error and solved it by switching to the Hugging Face approach.
I used the code from hugging face, and one more error comes up in BERTSentenceTransform; I fixed that part by copy-pasting the class into my .py file and editing the code.

blog
The code is long, so I wrote it up on my blog; it may be worth referring to.

@kibeomi

kibeomi commented May 11, 2023

I re-ran it following the hugging face blog above, but the accuracy comes out far too low, at 0.17. What could be the problem?
Before switching to hugging face the accuracy was around 0.56, so this is strange.

!pip install mxnet
!pip install gluonnlp==0.8.0
!pip install tqdm pandas
!pip install sentencepiece
!pip install transformers
!pip install torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'

from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel

import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
import numpy as np
from tqdm.notebook import tqdm
from transformers import AdamW
from transformers.optimization import get_cosine_schedule_with_warmup

tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')

tok = tokenizer.tokenize
Is there anything wrong in this code?

@AbdirayimovS

I have the same issue!
I used Python 3.7, which solves some of the version problems, but gluonnlp still does not install properly :(
In addition, I could not use the transformers library to download KoBERT.

@kibeomi

kibeomi commented May 11, 2023

When declaring the BERTSentenceTransform class, around line 19 you need to change

tokens_a = self._tokenizer(text_a)
to
tokens_a = self._tokenizer.tokenize(text_a)

for the model to run properly...
[Source] "No module named 'kobert'" error fix | author: yeon
After following this comment, my accuracy went back up.
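For reference, a minimal sketch of that change in context, assuming the BERTSentenceTransform class was copy-pasted from gluonnlp as described above (only the marked line differs from the original):

# inside the copied BERTSentenceTransform.__call__, around line 19:
# tokens_a = self._tokenizer(text_a)           # original: returns a BatchEncoding dict
tokens_a = self._tokenizer.tokenize(text_a)    # fixed: returns a plain token list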

@siyeol97

siyeol97 commented May 11, 2023

text_a = '한국어 모델을 공유합니다.'
tokens_1 = tokenizer.tokenize(text_a)
tokens_2 = tokenizer(text_a)
print(tokens_1, type(tokens_1))
print(tokens_2, type(tokens_2))
output : 
tokens_1 : ['▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.'] <class 'list'>
tokens_2 : {'input_ids': [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]} <class 'transformers.tokenization_utils_base.BatchEncoding'>

In the BERTSentenceTransform class, if you run the original code

tokens_a = self._tokenizer(text_a)

then tokens_a takes the tokens_2 form above. As a result, token and input_ids end up as:

token = [[CLS], 'input_ids', 'token_type_ids', 'attention_mask', [SEP]]
input_ids = [2, 0, 0, 0, 3, 1, 1, 1, 1, ...]

Since input_ids comes out in this same form for every example in the dataset, the accuracy drops sharply.
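A quick check shows why (reusing the tokenizer defined earlier in the thread): iterating over the BatchEncoding that tokenizer(...) returns yields only its dict keys, so every sentence maps to the same "token" sequence:

enc = tokenizer(text_a)
print(list(enc))  # ['input_ids', 'token_type_ids', 'attention_mask'] for any input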

And if you do

tok = tokenizer.tokenize
data_train = BERTDataset(dataset_train, 0, 1, tok, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, vocab, max_len, True, False)

i.e. pass tok = tokenizer.tokenize into the BERTSentenceTransform class, you will probably get an error saying convert_tokens_to_ids cannot be found.
Instead of passing tokenizer.tokenize in, edit the code inside BERTSentenceTransform directly:

#tokens_a = self._tokenizer(text_a)
tokens_a = self._tokenizer.tokenize(text_a)  # fixed

and the error should go away.
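To make the usage concrete, a small sketch with the names used earlier in this thread (BERTDataset is assumed to wrap the copied-and-patched BERTSentenceTransform):

# pass the tokenizer object itself, not tokenizer.tokenize;
# the patched BERTSentenceTransform calls .tokenize() internally
data_train = BERTDataset(dataset_train, 0, 1, tokenizer, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tokenizer, vocab, max_len, True, False)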

@kibeomi

kibeomi commented May 12, 2023

Thanks for the explanation!

@AbdirayimovS

I have the same issue! I used Python 3.7, which solves some of the version problems, but gluonnlp still does not install properly :( In addition, I could not use the transformers library to download KoBERT.

I installed it with transformers! Use Python 3.10 and gluonnlp != 0.10.0.
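Putting that together with the install lines quoted earlier in this thread, a minimal setup sketch might look like this (the exact pins are assumptions, beyond avoiding gluonnlp 0.10.0):

!pip install 'gluonnlp!=0.10.0'   # e.g. gluonnlp==0.8.0, as used above
!pip install transformers sentencepiece torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'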

@cwoonb

cwoonb commented May 16, 2023

@AbdirayimovS Could you please share some library installation code?

@kibeomi

kibeomi commented Aug 2, 2023 via email

@JWWPXX

JWWPXX commented Dec 28, 2023

I wonder if there are any contradictions among these installation dependencies.

@JWWPXX

JWWPXX commented Dec 28, 2023

When I use pip install git+https://git@github.com/SKTBrain/KoBERT.git@master,
it shows:
ERROR: Cannot install kobert because these package versions have conflicting dependencies.

The conflict is caused by:
onnxruntime 1.8.0 depends on numpy>=1.16.6
gluonnlp 0.6.0 depends on numpy
mxnet 1.4.0.post0 depends on numpy<1.15.0 and >=1.8.2
onnxruntime 1.8.0 depends on numpy>=1.16.6
gluonnlp 0.6.0 depends on numpy
mxnet 1.4.0 depends on numpy<1.15.0 and >=1.8.2

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Please tell me how to deal with it.
Thank you!
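One workaround consistent with the rest of this thread is to skip the legacy kobert package (whose pinned mxnet, gluonnlp, and onnxruntime requirements conflict with each other) and install only the Hugging Face tokenizer subpackage, then load the model through transformers:

pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'
pip install transformers sentencepiece

from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel
tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)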
