This repository was archived by the owner on Aug 28, 2020. It is now read-only.
JSON data format specifications
Luke Hollis edited this page Mar 1, 2017 · 3 revisions
With the JSON data format we're working on, we want to provide the simplest possible solution for rendering many different types of texts to the page. This is a stripped-down format whose only purpose is to communicate document structure for NLP and for rendering in the CLTK Archive reading interface.
Here's an abridged example of a converted JSON file for Vergil's Aeneid (at https://github.com/cltk/latin_text_perseus/blob/master/json/vergil__aeneid.json):
{
  "englishTitle": "The Aeneid",
  "originalTitle": "Aeneis",
  "source": "Perseus Digital Library",
  "sourceLink": "http://www.perseus.tufts.edu/hopper/",
  "language": "latin",
  "meta": "book-line",
  "author": "Vergil",
  "text": {
    "1": {
      "1": "Arma virumque cano, Troiae qui primus ab oris",
      "2": "Italiam, fato profugus, Laviniaque venit",
      ...
    },
    "2": {
      ...
    }
    ...
  }
}
The fields are defined as follows:
- englishTitle: String - the English translation of the work's title
- originalTitle: String - the original title of the work
- source: String - the title of the source where the digital copy of the work originates
- sourceLink: String - the link to the source organization where the digital copy of the work originates
- language: String - the slug of the primary language of the work
- meta: String (optional) - the citation structure of the work, such as "book-line" or "book-chapter-section", as dictated by the structure of the XML document. This value is used to determine the form of the work.
- author: String - the slug of the author, lowercase, ideally as spelled in the author's directory name in this corpus
- text: Dict - the actual text of the document, nested as described by the meta field.
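The field list above can be checked mechanically. What follows is only a minimal sketch in Python, not part of the CLTK codebase: the names `REQUIRED_FIELDS`, `OPTIONAL_FIELDS`, and `validate_document` are hypothetical, and the type checks assume that "String" and "Dict" map to Python's `str` and `dict` after the file is parsed with `json.load`.

```python
# Hypothetical validator for the fields described above (not a CLTK API).
# Assumption: "String" -> str and "Dict" -> dict once the JSON is parsed.
REQUIRED_FIELDS = {
    "englishTitle": str,
    "originalTitle": str,
    "source": str,
    "sourceLink": str,
    "language": str,
    "author": str,
    "text": dict,
}
OPTIONAL_FIELDS = {"meta": str}


def validate_document(doc):
    """Return a list of problems found in doc; an empty list means none."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in doc:
            problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], expected):
            problems.append(
                f"{name} should be {expected.__name__}, "
                f"got {type(doc[name]).__name__}"
            )
    # Optional fields are only type-checked when they are present.
    for name, expected in OPTIONAL_FIELDS.items():
        if name in doc and not isinstance(doc[name], expected):
            problems.append(
                f"{name} should be {expected.__name__}, "
                f"got {type(doc[name]).__name__}"
            )
    return problems


if __name__ == "__main__":
    example = {
        "englishTitle": "The Aeneid",
        "originalTitle": "Aeneis",
        "source": "Perseus Digital Library",
        "sourceLink": "http://www.perseus.tufts.edu/hopper/",
        "language": "latin",
        "meta": "book-line",
        "author": "Vergil",
        "text": {"1": {"1": "Arma virumque cano, Troiae qui primus ab oris"}},
    }
    print(validate_document(example))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a converted file in a single pass.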
These are the possible meta values that our frontend will be built to save and render to the page: chapter, chapter-section, book-chapter, book-chapter-section, book-line, poem-line, fragment, fragment-line, and line (as in plays).
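To illustrate how a meta value relates to the nesting of text, here is a rough Python sketch that walks the nested dict and labels each passage. It assumes that the hyphen-separated parts of meta name the nesting levels from outermost to innermost, and that keys made only of digits should sort numerically. The function names `iter_passages` and `citation_label` are hypothetical and not part of any CLTK API.

```python
# Hypothetical helpers (not a CLTK API) for walking the nested "text" dict.
# Assumption: the parts of meta ("book-line" -> book, line) name the
# nesting levels of "text" from outermost to innermost.

def _sort_key(key):
    # JSON object keys are always strings, so "10" would sort before "2";
    # sort decimal keys numerically and any other keys alphabetically after.
    return (0, int(key)) if key.isdecimal() else (1, key)


def iter_passages(text, path=()):
    """Yield (path, passage) pairs, where path is a tuple of keys."""
    for key in sorted(text, key=_sort_key):
        value = text[key]
        if isinstance(value, dict):
            yield from iter_passages(value, path + (key,))
        else:
            yield path + (key,), value


def citation_label(meta, path):
    """Pair each level name in meta with the key at that depth."""
    levels = meta.split("-")
    return ", ".join(f"{level} {key}" for level, key in zip(levels, path))


if __name__ == "__main__":
    doc = {
        "meta": "book-line",
        "text": {
            "1": {
                "1": "Arma virumque cano, Troiae qui primus ab oris",
                "2": "Italiam, fato profugus, Laviniaque venit",
            }
        },
    }
    for path, passage in iter_passages(doc["text"]):
        print(citation_label(doc["meta"], path), "|", passage)
```

Because meta is optional, a caller would need a fallback (for example, joining the raw keys with dots) for documents that omit it.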