Consolidate lang usage in pke #225

Merged
merged 4 commits into from
Mar 15, 2023

@ygorg ygorg commented Mar 15, 2023

There are 3 cases when using pke, depending on the language of the document:

  • spacy model and stemmer available
python -m spacy download fr_core_news_sm
python << BEGIN
import pke
e = pke.unsupervised.PositionRank()
e.load_document("Ga bu zo meu", language='fr')
BEGIN
  • spacy model available but no stemmer: pke falls back to the porter stemmer and does not use stopwords
python -m spacy download pl_core_news_sm
python << BEGIN
import pke
e = pke.unsupervised.PositionRank()
e.load_document("Ga bu zo meu", language='pl')
# WARNING:root:No stoplist available in pke for 'pl' language.
# WARNING:root:No stemmer available in pke for 'pl' language -> falling back to porter stemmer.
BEGIN
  • no spacy model available: preprocessing needs to happen outside of pke
python << BEGIN
import pke
e = pke.unsupervised.PositionRank()
doc = [[('Ga', 'DET'), ('bu', 'ADJ'), ('zo', 'NOUN'), ('meu', 'VERB')]]
e.load_document(doc, language='ug')
# WARNING:root:No stoplist available in pke for 'ug' language.
# WARNING:root:No stemmer available in pke for 'ug' language -> falling back to porter stemmer.
BEGIN
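
To tell in advance which of the three cases above applies, something like the following sketch could be used. It is not part of pke: `spacy_model_installed` and `language_case` are hypothetical helper names, the returned labels are made up for illustration, and whether pke ships a stemmer for a language has to be supplied by the caller. The only assumption about spaCy is that its models are installed as regular Python packages (as with `python -m spacy download`), so a package lookup is enough.

```python
from importlib.util import find_spec


def spacy_model_installed(model_name: str) -> bool:
    """Return True if a package named `model_name` is importable.

    Hypothetical helper: it only looks the package up and does not
    load the model.
    """
    return find_spec(model_name) is not None


def language_case(model_available: bool, stemmer_available: bool) -> str:
    """Map the two availability flags to one of the three cases.

    The labels are illustrative only:
      "full"     -> spacy model and stemmer available
      "fallback" -> spacy model but no stemmer (porter stemmer, no stoplist)
      "external" -> no spacy model, preprocessing happens outside of pke
    """
    if not model_available:
        return "external"
    if not stemmer_available:
        return "fallback"
    return "full"


if __name__ == "__main__":
    # Assumption for the demo: the caller knows that pke has a stemmer
    # for 'fr' but not for 'pl', as in the examples above.
    for model, has_stemmer in [("fr_core_news_sm", True),
                               ("pl_core_news_sm", False)]:
        case = language_case(spacy_model_installed(model), has_stemmer)
        print(model, "->", case)
```

When no model is installed the result is "external" regardless of the stemmer flag, which matches the third case: the document has to be passed as pre-tagged sentences.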
