Improved error message for the case where a corpus is missing #965

Closed
bcrowell opened this issue Feb 10, 2020 · 3 comments

@bcrowell

Thanks for all your hard work on this project, and for making it open source. As a newbie trying to get going with a freshly installed copy of CLTK, I ran into a situation where a corpus I needed was missing, and the error message made it hard to tell that the problem was a missing corpus or what I needed to do to fix it. The cause was that I was trying to do lemmatization in Greek but hadn't installed greek_models_cltk. The error message was a Python stack trace, the final line of which was the following:

FileNotFoundError: [Errno 2] No such file or directory: '/home/bcrowell/cltk_data/greek/model/greek_models_cltk/lemmata/greek_lemmata_cltk.py'

It would be helpful if the software could catch this exception and give a more informative error message. Such an error message would be something like "You need to install the greek_models_cltk corpus. To do this, first use CorpusImporter('greek') to create a CorpusImporter object, then do import_corpus('greek_models_cltk')." Note that in the error message that is currently output, the word "corpus" never occurs, and the strings "greek" and "greek_models_cltk" are not contiguous in the description of the missing directory.
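To make the suggestion concrete, here is a minimal sketch of the kind of wrapper I have in mind. The names MissingCorpusError and read_model_file are hypothetical and are not part of CLTK; I am only assuming that the model file is opened somewhere by path and that the language and corpus name are known at that point.

```python
from pathlib import Path


class MissingCorpusError(FileNotFoundError):
    """Hypothetical error type: a model file is absent because its corpus is not installed."""


def read_model_file(path, language, corpus_name):
    """Return the text of a model file, or raise an error that names the missing corpus.

    Hypothetical helper for illustration; not an existing CLTK function.
    """
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        # Replace the bare path error with instructions the user can act on.
        message = (
            f"Could not find {path}. "
            f"You need to install the {corpus_name} corpus. "
            f"To do this, first use CorpusImporter('{language}') to create a "
            f"CorpusImporter object, then do import_corpus('{corpus_name}')."
        )
        raise MissingCorpusError(message) from exc
```

Because the hypothetical error subclasses FileNotFoundError, any existing code that already catches FileNotFoundError would keep working, and the original exception stays attached through exception chaining.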

@kylepjohnson kylepjohnson self-assigned this Feb 11, 2020
@kylepjohnson
Member

Hi Ben, thanks for the feedback. You're right that some of our error messages are perfectly unfriendly.

We have some code revision work going on, and improving error messages has been one of the key points.

I will keep this issue open and assigned to me, so it doesn't get lost.

Anything else? Were you in fact able to get the lemmatizer running?

@bcrowell
Author

Yes, I did figure it out -- thanks for asking! Other than this, I had some relatively minor suggestions about documentation. I'll post that as a separate issue.

@marviro

marviro commented Sep 20, 2020

Thank you very much for this issue, which helped me solve the same problem! To make it clearer for other newbies like me, here is example code for the lemmatizer:

from cltk.corpus.utils.formatter import cltk_normalize
from cltk.corpus.utils.importer import CorpusImporter
from cltk.stem.lemma import LemmaReplacer

# Install the Greek models that the lemmatizer needs (stored under ~/cltk_data)
corpus_importer = CorpusImporter('greek')
corpus_importer.import_corpus('greek_models_cltk')

lemmatizer = LemmaReplacer('greek')
sentence = "μὴ ζήτει δέλτοισιν ἐμαῖς Πρίαμον παρὰ βωμοῖς"
# Normalize the Unicode text before lemmatizing
sentence = cltk_normalize(sentence)
lemmas = lemmatizer.lemmatize(sentence)
print(lemmas)

Would it be possible to add examples like this somewhere in the documentation?
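The example above calls import_corpus on every run. A rough sketch of a check that could skip that step when the models are already on disk follows. The function corpus_is_installed is hypothetical, not a CLTK function, and the directory layout (~/cltk_data/<language>/model/<corpus name>) is only inferred from the path in the error message quoted in this issue.

```python
import os


def corpus_is_installed(language, corpus_name, data_dir="~/cltk_data"):
    """Return True if <data_dir>/<language>/model/<corpus_name> exists as a directory.

    Hypothetical helper; the layout is an assumption based on the path in the
    FileNotFoundError above, not on documented CLTK behavior.
    """
    corpus_path = os.path.join(
        os.path.expanduser(data_dir), language, "model", corpus_name
    )
    return os.path.isdir(corpus_path)
```

With this in place, the import step could be guarded with something like: if not corpus_is_installed('greek', 'greek_models_cltk'), then call corpus_importer.import_corpus('greek_models_cltk').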
