No explanation of the languages supported by Gensim various modules #2990

Closed
rubmz opened this issue Oct 24, 2020 · 1 comment
Labels: question (Discussions that are generally off-topic for the GitHub issue tracker)

Comments

rubmz commented Oct 24, 2020

Problem description

Many NLP libraries exist today, covering many NLP-related tasks. Unfortunately, not all of them support languages beyond English and a few others. My GUESS is that Gensim does support more languages. But does it?
It would be nice if the docs stated somewhere which languages Gensim supports (X, Y, Z) and in what sense (Unicode support, available language models, ...), or, even better, that it is language agnostic.

Reproduce

A typical search query would be "gensim supported languages".

Versions

<= 3.8.3

piskvorky (Owner) commented Oct 24, 2020

Yeah, Gensim is language agnostic. It relies on distributional semantics (word co-occurrences) to do unsupervised learning.

You can check the specific algorithm you're interested in (LSI, LDA, word2vec, etc.) to see if it's suitable for your type of data / language. Gensim itself is not concerned with NLP text preprocessing at all; your "tokens" and documents are opaque strings to Gensim.
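
To make the "opaque strings" point concrete, here is a minimal sketch in plain Python. It is not Gensim code and does not reproduce any Gensim algorithm; it only illustrates the idea that co-occurrence statistics can be computed without knowing which language the tokens come from. The sample documents and the `cooccurrence_counts` helper are made up for illustration, and tokenization is assumed to have been done beforehand by whatever tool suits the language.

```python
from collections import Counter
from itertools import combinations

# Pre-tokenized documents in several languages (hypothetical sample data).
# Tokenization is the caller's job; the code below never inspects what a
# token "means", it only compares strings for equality.
documents = [
    ["кошка", "сидит", "на", "ковре"],            # Russian
    ["猫", "坐", "在", "垫子", "上"],               # Chinese, already segmented
    ["the", "cat", "sits", "on", "the", "mat"],   # English
    ["кошка", "спит", "на", "ковре"],             # Russian
]


def cooccurrence_counts(docs):
    """Count in how many documents each unordered pair of tokens co-occurs."""
    counts = Counter()
    for doc in docs:
        # set(doc) so that a repeated token counts once per document.
        for a, b in combinations(set(doc), 2):
            counts[frozenset((a, b))] += 1
    return counts


counts = cooccurrence_counts(documents)

# Look up a pair; the order of the two tokens does not matter.
print(counts[frozenset(("кошка", "ковре"))])
```

Nothing in the counting step depends on the script or language of the tokens, which is the sense in which a library built on such statistics can be language agnostic. Whether the resulting model is any good for a given language still depends on how the text was split into tokens in the first place.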

General questions like this are better suited for the mailing list: https://groups.google.com/forum/#!forum/gensim

mpenkov added the "question" label (Discussions that are generally off-topic for the GitHub issue tracker) on Oct 28, 2020
3 participants