Add genre information to metadata of plays #120

Closed
lehkost opened this issue Dec 3, 2020 · 11 comments

lehkost commented Dec 3, 2020

After deciding on a markup strategy for genre declaration (dracor-org/dracor-schema#3), we are ready to add this information to the metadata of plays in a unified manner across all corpora. For the time being, we mark genre only very coarsely, on two levels:

comedy or tragedy:

<textClass>
  <keywords>
    <term type="genreTitle">Comedy</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q40831</classCode>
</textClass>

or

<textClass>
  <keywords>
    <term type="genreTitle">Tragedy</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q80930</classCode>
</textClass>

libretto or not:

<textClass>
  <keywords>
    <term type="genreTitle">Libretto</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q131084</classCode>
</textClass>

The API should add two columns to the metadata files:

  1. Genre (Comedy, or Tragedy, or empty if no information is available).
  2. Libretto (1 for yes, this is a libretto, and 0 for no, this is not a libretto).
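A minimal sketch of how these two columns could be derived, assuming the Wikidata IDs have already been extracted from each play's classCode elements. The helper name `genre_columns`, the play names and the CSV layout are hypothetical and are not taken from the actual API:

```python
import csv
import io

# Wikidata IDs from the markup above
COMEDY = "Q40831"
TRAGEDY = "Q80930"
LIBRETTO = "Q131084"

GENRE_LABELS = {COMEDY: "Comedy", TRAGEDY: "Tragedy"}


def genre_columns(class_codes):
    """Return the Genre and Libretto columns for one play (hypothetical helper).

    Genre is empty if no genre ID is present (if several are present, this
    sketch simply takes the first); Libretto is 1 or 0.
    """
    genres = [GENRE_LABELS[c] for c in class_codes if c in GENRE_LABELS]
    return {
        "Genre": genres[0] if genres else "",
        "Libretto": 1 if LIBRETTO in class_codes else 0,
    }


# Hypothetical plays and the class codes extracted from their TEI headers
plays = {
    "play-a": ["Q40831"],
    "play-b": ["Q80930", "Q131084"],
    "play-c": [],
}

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "Genre", "Libretto"])
writer.writeheader()
for name, codes in plays.items():
    writer.writerow({"name": name, **genre_columns(codes)})
print(out.getvalue())
```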

The background for this feature is:

  1. A very general distinction between the two main genres, tragedy and comedy, so that analyses like this one can be automated.
  2. An automated way to exclude libretti from an analysis (or to select only libretti for an analysis).

The German corpus already has some files with corresponding markup to test this feature once it's implemented (see dracor-org/gerdracor@98b713b).

lehkost commented Dec 5, 2020

Update: GerDraCor has been fully marked up with basic genre information as described above (see dracor-org/gerdracor@9c2fcf5), so we could test the new function with this corpus first…

cmil commented Dec 5, 2020

@lehkost How do we deal with corpora that have textClass defined but do not conform to the new markup strategy, e.g. RusDraCor? I would suggest looking for //textClass/classCode[@scheme = "http://www.wikidata.org/entity/"] and, if the Wikidata ID provided matches one of the three mentioned above, setting the genre and libretto fields accordingly. Non-matching textClasses would be ignored.

Regarding the libretto field, I would set it to true if there is a matching textClass with the libretto class code. If a textClass/classCode is specified but is not the libretto code, I would set it to false. If there is no textClass/classCode, I'd leave the libretto field empty.
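The matching rule proposed here could be sketched roughly as follows, using Python's xml.etree.ElementTree on a TEI document. This is an illustration, not the dracor-api code; the function name `extract_genre` is hypothetical, and "classCode is specified" is assumed to mean "some classCode in the Wikidata scheme is present":

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
WIKIDATA = "http://www.wikidata.org/entity/"

GENRES = {"Q40831": "Comedy", "Q80930": "Tragedy"}
LIBRETTO = "Q131084"


def extract_genre(tei_xml):
    """Return (genre, libretto) for one play (hypothetical helper).

    libretto is True if the libretto ID is present, False if a Wikidata
    classCode is present but is not the libretto ID, and None if there is
    no Wikidata classCode at all.
    """
    root = ET.fromstring(tei_xml)
    # Collect IDs only from classCode elements in the Wikidata scheme
    codes = [
        (cc.text or "").strip()
        for cc in root.iterfind(".//tei:textClass/tei:classCode", TEI_NS)
        if cc.get("scheme") == WIKIDATA
    ]
    # First recognised genre ID wins; unknown IDs are ignored
    genre = next((GENRES[c] for c in codes if c in GENRES), None)
    if LIBRETTO in codes:
        libretto = True
    elif codes:
        libretto = False  # Wikidata classCode present, but not libretto
    else:
        libretto = None   # no classCode: leave the field empty
    return genre, libretto


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>
<textClass>
  <keywords><term type="genreTitle">Comedy</term></keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q40831</classCode>
</textClass>
</profileDesc></teiHeader></TEI>"""

print(extract_genre(sample))
```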

lehkost commented Dec 5, 2020

Speaking of which, RusDraCor will be updated with proper genre info in the next hour. 🙃

And I agree with your suggestion on how to handle textClass: let's only match the Wikidata IDs of the genres described above.

cmil added a commit that referenced this issue Dec 5, 2020

cmil commented Dec 5, 2020

@lehkost Implemented in v0.75.0 deployed on staging. Please test!

lehkost commented Dec 5, 2020

Works well for GerDraCor, metadata loaded into LibreOffice Calc:

[Screenshot: genre-screenshot]

The only thing I would change is the handling of Libretto. Assigning "true" is fine, but every play without a "Libretto" indication should be "false", including plays that have no other genre information. Reason: if a play is not marked up as a libretto, it is not a libretto.
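To make the difference concrete, here is a small sketch contrasting the earlier three-valued proposal with the two-valued rule requested here, assuming the Wikidata IDs of a play are already available as a list. Both function names are hypothetical:

```python
from typing import Optional, Sequence

LIBRETTO = "Q131084"  # Wikidata ID for libretto


def libretto_tristate(codes: Sequence[str]) -> Optional[bool]:
    """Earlier proposal: empty (None) when no Wikidata classCode is present."""
    if LIBRETTO in codes:
        return True
    return False if codes else None


def libretto_boolean(codes: Sequence[str]) -> bool:
    """Requested rule: false unless the libretto ID is explicitly present."""
    return LIBRETTO in codes


for codes in (["Q131084"], ["Q40831"], []):
    print(codes, libretto_tristate(codes), libretto_boolean(codes))
```

The two rules only disagree for plays with no genre markup at all, which is exactly the case discussed above.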

lehkost commented Dec 11, 2020

I just added "Tragicomedy" to GerDraCor where appropriate, like this:

<textClass>
  <keywords>
    <term type="genreTitle">Tragicomedy</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q192881</classCode>
</textClass>

cmil commented Dec 11, 2020

> I just added "Tragicomedy" to GerDraCor where appropriate, like this:
> ...

Is the idea that the repertoire of accepted text classes will grow over time, or will it even be completely open to additions? This has implications for how the libretto flag is implemented. I assumed that for now Tragedy, Comedy and Libretto were the only supported text classes.

lehkost commented Dec 11, 2020

Ideally, the set of text classes (genres) will grow over time. Tagging genre in TEI documents for such diverse corpora, though, is not the best solution. We might build on a drama genre ontology in the future, but there is no good candidate yet.

For now I would say we have two levels of genre information: 1. libretto or not; 2. tragedy, comedy or tragicomedy, if applicable. The latter group (2.) might grow in the near future; the first group (1.) probably won't, because it only distinguishes libretti from dramas not written for music. I hope that makes sense?
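One possible way to model these two levels, as a hypothetical sketch: the libretto flag is computed independently of the genre lookup, so adding entries to the genre mapping (as was just done for Tragicomedy, Q192881) does not affect it. The names `GENRES`, `GenreInfo` and `classify` are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LIBRETTO = "Q131084"

# Level 2: recognised genres; this mapping is expected to grow over time
GENRES = {
    "Q40831": "Comedy",
    "Q80930": "Tragedy",
    "Q192881": "Tragicomedy",
}


@dataclass
class GenreInfo:
    libretto: bool        # level 1: libretto or not
    genre: Optional[str]  # level 2: a recognised genre, or None


def classify(codes: Sequence[str]) -> GenreInfo:
    """Derive both levels from a play's Wikidata IDs (hypothetical helper)."""
    genre = next((GENRES[c] for c in codes if c in GENRES), None)
    return GenreInfo(libretto=LIBRETTO in codes, genre=genre)


print(classify(["Q192881"]))
print(classify(["Q131084", "Q40831"]))
```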

cmil added a commit that referenced this issue Dec 13, 2020
The libretto flag is now always false unless the libretto class
code is among the text classes.

See #120 (comment) and #120 (comment)

cmil commented Dec 13, 2020

> For now I would say we have two levels of genre information: 1. libretto or not; 2. tragedy, comedy or tragicomedy, if applicable. The latter group under (2.) might grow in the near future, the first group (1.) probably won't, because it just tells apart libretti and dramas not written for music.

So for libretti there will never be a chance to distinguish between, e.g., dramma giocoso, tragédie lyrique, Singspiel, etc.? That looks like an unnecessary restriction to me. In fact, why would we need to couple genre attribution and the identification of libretti at all? PR #122 tries to loosen this coupling somewhat while still following the main idea of genre attribution proposed above.

lehkost commented Dec 13, 2020

Good points! Ideally, we would like to store all genre info we can gather on each play, including the libretto subgenres you mention. The way we have started to implement genre markup doesn't prevent us from further differentiating genre in the future.

On a side note, here are some examples from the emerging French Drama Corpus:

<textClass>
  <keywords>
    <term type="genreTitle">Tragédie</term>
    <term type="genreTitle">vers</term>
  </keywords>
</textClass>
<textClass>
  <keywords>
    <term type="genreTitle">Comédie</term>
    <term type="genreTitle">prose</term>
  </keywords>
</textClass>
<textClass>
  <keywords>
    <term type="genreTitle">Monologue</term>
    <term type="genreTitle">vers</term>
  </keywords>
</textClass>
<textClass>
  <keywords>
    <term type="genreTitle">Proverbe</term>
    <term type="genreTitle">prose</term>
  </keywords>
</textClass>

Here, too, Tragedy, Comedy, Monologue and Proverb are not on the same level of genre attribution, just as "prose" and "verse" are a different (much more formal) kind of genre description. We should probably try not to lose any of the information we inherit from other sources (and should try to add rich genre markup for the sources that don't have any).

cmil commented Dec 13, 2020

I would suggest that the transformation script for dracor-org/fredracor adds the appropriate classCode where genre information found in the originals matches the recognised text classes defined in #122.
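A rough sketch of what such a step in a transformation script might look like, using xml.etree.ElementTree. The exact list of recognised text classes in #122 is not shown in this thread, so the sketch assumes the Wikidata IDs mentioned above; the French-term mapping and the function `add_class_codes` are hypothetical. The original keywords are left untouched, so no inherited information is lost:

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
WIKIDATA = "http://www.wikidata.org/entity/"

# Hypothetical mapping from French genre terms to recognised Wikidata IDs.
# Terms without an entry (e.g. Monologue, Proverbe, vers, prose) stay as
# keywords only and get no classCode.
TERM_TO_WIKIDATA = {
    "Tragédie": "Q80930",
    "Comédie": "Q40831",
}


def add_class_codes(root):
    """Append a Wikidata classCode to each textClass whose genre terms match."""
    for text_class in list(root.iter(f"{{{TEI}}}textClass")):
        # IDs already present, so running the script twice adds no duplicates
        existing = {
            (cc.text or "").strip()
            for cc in text_class.findall(f"{{{TEI}}}classCode")
            if cc.get("scheme") == WIKIDATA
        }
        for term in list(text_class.iter(f"{{{TEI}}}term")):
            if term.get("type") != "genreTitle":
                continue
            qid = TERM_TO_WIKIDATA.get((term.text or "").strip())
            if qid and qid not in existing:
                cc = ET.SubElement(text_class, f"{{{TEI}}}classCode", scheme=WIKIDATA)
                cc.text = qid
                existing.add(qid)
    return root


sample = f"""<TEI xmlns="{TEI}"><teiHeader><profileDesc>
<textClass><keywords>
  <term type="genreTitle">Tragédie</term>
  <term type="genreTitle">vers</term>
</keywords></textClass>
<textClass><keywords>
  <term type="genreTitle">Proverbe</term>
  <term type="genreTitle">prose</term>
</keywords></textClass>
</profileDesc></teiHeader></TEI>"""

root = add_class_codes(ET.fromstring(sample))
print([cc.text for cc in root.iter(f"{{{TEI}}}classCode")])
```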
