
[BUG] #104

Open
danaekdml opened this issue May 9, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@danaekdml

danaekdml commented May 9, 2023

🐛 Bug

No module named 'kobert'

To Reproduce

from kobert.utils import get_tokenizer
from kobert.pytorch_kobert import get_pytorch_kobert_model

When I try to run this, the following error occurs:

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 2>()
      1 #kobert
----> 2 from kobert.utils import get_tokenizer
      3 from kobert.pytorch_kobert import get_pytorch_kobert_model

ModuleNotFoundError: No module named 'kobert'
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

This is the error that occurs. After I resolved a gluonnlp version issue, this kobert problem appeared.


Expected behavior

Environment

Additional context

@danaekdml danaekdml added the bug Something isn't working label May 9, 2023
@hwangsaeyeon

hwangsaeyeon commented May 9, 2023

I ran into the same error and solved it by switching to the Hugging Face approach.
I used the code from hugging face, and one more error comes up in BERTSentenceTransform; I fixed that part by copy-pasting the class into my .py file and editing the code.

blog
The code is long, so I wrote it up on my blog; it may be worth referring to.

@kibeomi

kibeomi commented May 11, 2023

I re-ran it following the hugging face blog above, but the accuracy comes out far too low, at 0.17. What could be the problem?
Before switching to hugging face the accuracy was around 0.56, so this is strange.

!pip install mxnet
!pip install gluonnlp==0.8.0
!pip install tqdm pandas
!pip install sentencepiece
!pip install transformers
!pip install torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'

from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel

import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
import numpy as np
from tqdm.notebook import tqdm
from transformers import AdamW
from transformers.optimization import get_cosine_schedule_with_warmup

tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')

tok = tokenizer.tokenize
Is there anything wrong in this code?

@AbdirayimovS

I have the same issue!
I used Python 3.7, which solves some of the version problems, but gluonnlp still does not install properly :(
In addition, I could not use the transformers library to download KoBERT.

@kibeomi

kibeomi commented May 11, 2023

When declaring the BERTSentenceTransform class, around line 19 you need to change

tokens_a = self._tokenizer(text_a)
to
tokens_a = self._tokenizer.tokenize(text_a)

for the model to run properly...
[Source] "No module named 'kobert'" error fix | author: yeon
After following this comment, my accuracy went back up.
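For reference, a minimal sketch of that change in context, assuming the BERTSentenceTransform class was copy-pasted from gluonnlp as described above (only the marked line differs from the original):

# inside the copied BERTSentenceTransform.__call__, around line 19:
# tokens_a = self._tokenizer(text_a)           # original: returns a BatchEncoding dict
tokens_a = self._tokenizer.tokenize(text_a)    # fixed: returns a plain token list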

@siyeol97

siyeol97 commented May 11, 2023

text_a = '한국어 모델을 공유합니다.'
tokens_1 = tokenizer.tokenize(text_a)
tokens_2 = tokenizer(text_a)
print(tokens_1, type(tokens_1))
print(tokens_2, type(tokens_2))
output : 
tokens_1 : ['▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.'] <class 'list'>
tokens_2 : {'input_ids': [2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]} <class 'transformers.tokenization_utils_base.BatchEncoding'>

In the BERTSentenceTransform class, if you run the original code

tokens_a = self._tokenizer(text_a)

then tokens_a takes the tokens_2 form above. As a result, token and input_ids end up as:

token = [[CLS], 'input_ids', 'token_type_ids', 'attention_mask', [SEP]]
input_ids = [2, 0, 0, 0, 3, 1, 1, 1, 1, ...]

Since input_ids comes out in this same form for every example in the dataset, the accuracy drops sharply.
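A quick check shows why (reusing the tokenizer defined earlier in the thread): iterating over the BatchEncoding that tokenizer(...) returns yields only its dict keys, so every sentence maps to the same "token" sequence:

enc = tokenizer(text_a)
print(list(enc))  # ['input_ids', 'token_type_ids', 'attention_mask'] for any input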

And if you do

tok = tokenizer.tokenize
data_train = BERTDataset(dataset_train, 0, 1, tok, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, vocab, max_len, True, False)

i.e. pass tok = tokenizer.tokenize into the BERTSentenceTransform class, you will probably get an error saying convert_tokens_to_ids cannot be found.
Instead of passing tokenizer.tokenize in, edit the code inside BERTSentenceTransform directly:

#tokens_a = self._tokenizer(text_a)
tokens_a = self._tokenizer.tokenize(text_a)  # fixed

and the error should go away.
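To make the usage concrete, a small sketch with the names used earlier in this thread (BERTDataset is assumed to wrap the copied-and-patched BERTSentenceTransform):

# pass the tokenizer object itself, not tokenizer.tokenize;
# the patched BERTSentenceTransform calls .tokenize() internally
data_train = BERTDataset(dataset_train, 0, 1, tokenizer, vocab, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tokenizer, vocab, max_len, True, False)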

@kibeomi

kibeomi commented May 12, 2023

Thanks for the explanation!

@AbdirayimovS

I have the same issue! I used Python 3.7, which solves some of the version problems, but gluonnlp still does not install properly :( In addition, I could not use the transformers library to download KoBERT.

I installed it with transformers! Use Python 3.10 and gluonnlp != 0.10.0.
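Putting that together with the install lines quoted earlier in this thread, a minimal setup sketch might look like this (the exact pins are assumptions, beyond avoiding gluonnlp 0.10.0):

!pip install 'gluonnlp!=0.10.0'   # e.g. gluonnlp==0.8.0, as used above
!pip install transformers sentencepiece torch
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'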

@cwoonb

cwoonb commented May 16, 2023

@AbdirayimovS Could you please share some library installation code?

@kibeomi

kibeomi commented Aug 2, 2023 via email

@JWWPXX

JWWPXX commented Dec 28, 2023

I wonder if there are any contradictions among these installation dependencies.

@JWWPXX

JWWPXX commented Dec 28, 2023

When I use pip install git+https://git@github.com/SKTBrain/KoBERT.git@master,
it shows:
ERROR: Cannot install kobert because these package versions have conflicting dependencies.

The conflict is caused by:
onnxruntime 1.8.0 depends on numpy>=1.16.6
gluonnlp 0.6.0 depends on numpy
mxnet 1.4.0.post0 depends on numpy<1.15.0 and >=1.8.2
onnxruntime 1.8.0 depends on numpy>=1.16.6
gluonnlp 0.6.0 depends on numpy
mxnet 1.4.0 depends on numpy<1.15.0 and >=1.8.2

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Please tell me how to deal with it.
Thank you!
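One workaround consistent with the rest of this thread is to skip the legacy kobert package (whose pinned mxnet, gluonnlp, and onnxruntime requirements conflict with each other) and install only the Hugging Face tokenizer subpackage, then load the model through transformers:

pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'
pip install transformers sentencepiece

from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel
tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)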
