How is this actually even used? #12

Closed
swasheck opened this issue May 30, 2018 · 1 comment

Comments

@swasheck

The tutorials do not really tell me how to use this.

For example, what happens between #2 and #3?

So I can import corpora in step 2, and the download works because I can see them in ~/cltk_data/. Great. But now how do I use them? I don't really see a way to do this. Do I now have to read the files using file.read() or somesuch? Is the general goal to not use the corpora for analysis but to use them as trained data sets to analyze some other texts that we find?

@kylepjohnson
Member

> Do I now have to read the files using file.read() or somesuch?

Yes, exactly. For most text corpora, you need to write your own code to open these files.

The usual pattern to open a file:

# filepath is the path to one corpus file under ~/cltk_data/
with open(filepath) as file_open:
    text = file_open.read()

If you want to open multiple files in a directory, look at this: https://stackoverflow.com/a/3207973.
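
For the multi-file case, here is a minimal sketch using only the standard library. It assumes the corpus is a directory tree of plain-text `.txt` files encoded as UTF-8; both are assumptions, so check what you actually see under ~/cltk_data/. The name `read_corpus` is a hypothetical helper, not part of the CLTK.

```python
from pathlib import Path


def read_corpus(corpus_dir, pattern="*.txt", encoding="utf-8"):
    """Return {relative path: text} for every matching file under corpus_dir."""
    root = Path(corpus_dir).expanduser()  # expands a leading "~"
    texts = {}
    # rglob searches corpus_dir and all of its subdirectories
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            # errors="replace" keeps going if a file is not valid UTF-8
            texts[path.relative_to(root).as_posix()] = path.read_text(
                encoding=encoding, errors="replace"
            )
    return texts
```

Call it with the corpus directory you found under ~/cltk_data/, then loop over the returned dictionary to process each text.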

> Is the general goal to not use the corpora for analysis but to use them as trained data sets to analyze some other texts that we find?

Both, in fact! Some repos are simply for reading (https://github.com/cltk/latin_text_latin_library), while others are just for training (https://github.com/cltk/latin_training_set_sentence_cltk). Others could be used for both: a treebank, for example, can be used either to train a predictive model or to do statistics.
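
To make the "reading" case concrete, here is a rough sketch of simple statistics on a text you have already read into a string. It works on plain text rather than a treebank, uses a naive regex tokenizer from the standard library (an assumption for illustration, not the CLTK's own tokenization), and `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Count lowercased word tokens in text and return the top_n most common."""
    # \w+ is a naive tokenizer: runs of letters, digits, and underscores
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common(top_n)
```

Pass it the `text` string from the file-reading pattern above in this comment to get a list of (word, count) pairs.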

What is your goal in using the CLTK?

Closing, but post back here if you have other questions.
