This repository has been archived by the owner on Aug 28, 2020. It is now read-only.

Strategies for parsing / ingesting corpora - latin_text_latin_library #4

Open
lukehollis opened this issue Nov 22, 2015 · 3 comments

@lukehollis
Member

The beginnings of a solution are in https://github.com/cltk/cltk_api/blob/ingest/ingest/learn/latin_library.py, but we've discussed the difficulty of incorporating the TLL files here, and it's very likely that the added benefit at this stage is outweighed by the programming effort of attempting to parse and infer useful metadata.
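
For reference, here is a minimal sketch of the kind of heuristic metadata inference we've been discussing, assuming a local clone of plaintext files where the first non-blank line is usually a heading. The directory path, the .txt extension, and the heading heuristic are illustrative assumptions, not the actual latin_library.py implementation:

```python
# Sketch only: infer rough metadata from the first lines of each plaintext file.
# File layout, extensions, and heading conventions here are assumptions.
import os
import re

def infer_metadata(path):
    """Guess a title for a Latin Library plaintext file.

    Assumes (hypothetically) that the first non-blank line is a short heading;
    inconsistently marked-up files will still need manual review.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        return {"file": path, "title": None}
    # Headings tend to be short and not end in sentence punctuation;
    # anything else is probably body text, so fall back to the filename.
    first = lines[0]
    title = first if len(first) < 80 and not re.search(r"[.;:]$", first) else None
    return {
        "file": path,
        "title": title or os.path.splitext(os.path.basename(path))[0],
    }

def ingest_corpus(root):
    """Walk a local copy of the corpus and collect per-file metadata."""
    records = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".txt"):
                records.append(infer_metadata(os.path.join(dirpath, name)))
    return records

if __name__ == "__main__":
    # "latin_text_latin_library/" is a placeholder path to a local clone.
    for record in ingest_corpus("latin_text_latin_library")[:10]:
        print(record)
```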

@kylepjohnson
Member

I want both TLL and Lacus Curtius in the API. My perception is that it will be easier to wait on these until we have settled upon a data structure we know will work.

Idea: how about we get through the first milestone of serving the API from texts, and then have you pick it up in the frontend? I say this for three reasons: (1) to avoid duplicating effort; (2) the texts (especially TLL) are so inconsistently marked up that it may be better to find someone to copy-paste them into the form we want; (3) I would like to reach out to Bill Thayer to talk about getting the Greek LC files too, since I never wrote a scraper for them years back. He knows those files well and could be of service for the corpora.

Does this sound logical?

@lukehollis
Member Author

Sounds great to deprioritize these two for right now!

@kylepjohnson
Member

I appreciate you looking ahead. With all these included -- Pers, TLL, and LC -- the site will be a formidable resource.

