Feature request: add user-defined vocabulary support to models #211

Indigo-Coder-github · 2023-11-11T08:32:45Z

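Hello. I switched over from another topic modeling library because this one is fast and easy to use.
While using it I ran into one shortcoming, so I am leaving a suggestion.
sklearn's TfidfVectorizer has a `vocabulary` parameter, so even when the data has not been preprocessed (stopword removal, exclusion of specific words, and so on), words that are not in the vocabulary are automatically ignored internally. None of the model parameters or methods that tomotopy provides seem to support this, so if you have a vocabulary file and unpreprocessed data separately, you have to write extra code to preprocess the data before feeding it to the model.
If this is something I simply failed to find, I would appreciate it if you could point me to the section of the official documentation that covers a similar feature.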
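
For illustration, the extra preprocessing code described above might look roughly like the sketch below. It is only an assumption of what such a workaround could be, not a built-in tomotopy feature: the helper names `load_vocabulary`, `filter_to_vocabulary`, and `build_lda` are hypothetical, and whitespace tokenization stands in for whatever tokenizer is actually used. Only `tp.LDAModel(k=...)` and `add_doc(words)` are taken from tomotopy itself.

```python
from typing import Iterable, List, Set


def load_vocabulary(lines: Iterable[str]) -> Set[str]:
    """Build a vocabulary set from lines holding one word each (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}


def filter_to_vocabulary(tokens: Iterable[str], vocabulary: Set[str]) -> List[str]:
    """Keep only the tokens found in the vocabulary, preserving their order."""
    return [tok for tok in tokens if tok in vocabulary]


def build_lda(raw_docs: Iterable[str], vocabulary: Set[str], k: int = 10):
    """Add vocabulary-filtered documents to a tomotopy LDA model (hypothetical helper)."""
    import tomotopy as tp  # imported lazily so the helpers above work without tomotopy

    model = tp.LDAModel(k=k)
    for doc in raw_docs:
        # Assumption: whitespace tokenization; swap in a real tokenizer as needed.
        words = filter_to_vocabulary(doc.split(), vocabulary)
        if words:  # skip documents left empty after filtering
            model.add_doc(words)
    return model
```

With a vocabulary file, `load_vocabulary(open(path, encoding="utf-8"))` would produce the set, and `build_lda(raw_docs, vocabulary)` would return a model ready for `train()`. The filtering happens outside the model, which is exactly the extra step this request asks to avoid.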
