Add table with character data to API #31

lehkost · 2018-12-10T18:05:04Z

Proposed name:

/corpora/{corpusname}/play/{playname}/characters/csv

Proposed values:

ID and label:

Character ID
Character Label

Three quantitative measures:

Scene Appearances (= number of scenes a character appears in)
Speech Acts (= number of `<sp>` per character)
Number of Words

Five network-based measures (per character)

Degree
Weighted Degree
Betweenness Centrality
Closeness Centrality
Eigenvector Centrality

As far as I can see, we do not yet calculate network values per character for API purposes. Our Shiny app already implements this in the Vertices tab and may serve as a point of reference for these values.


cmil · 2019-02-02T13:41:34Z

The above PR implements the /corpora/{corpusname}/play/{playname}/cast resource. I named it "cast" since we use this term in other places as well, plus it's shorter. The resource provides both a JSON (default) and a CSV (Accept: text/csv) representation. The individual network measures are not yet implemented.

See also https://dracor.org/documentation/api/#/public/get-cast.
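A minimal sketch of how a client might select between the two representations via the Accept header. The URL is the Philotas example quoted later in this thread and may no longer match the deployed API; the helper names `build_cast_request` and `fetch_cast` are hypothetical and only serve the illustration.

```python
import urllib.request

# Example URL taken from this thread; the live API path may have changed since.
CAST_URL = "https://dracor.org/api/corpora/ger/play/lessing-philotas/cast"


def build_cast_request(url=CAST_URL, as_csv=False):
    """Build a request for the cast resource (hypothetical helper).

    Without an Accept header the resource is described as returning JSON;
    with "Accept: text/csv" it is described as returning CSV.
    """
    headers = {"Accept": "text/csv"} if as_csv else {}
    return urllib.request.Request(url, headers=headers)


def fetch_cast(url=CAST_URL, as_csv=False):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_cast_request(url, as_csv)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_cast(as_csv=True)` would request the CSV representation; `fetch_cast()` falls back to the default.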

This is a prerequisite for dracor-org/dracor-api#31

cmil · 2019-02-04T13:41:11Z

The /corpora/{corpusname}/play/{playname}/cast resource now also includes degree, betweenness, eigenvector and closeness properties. These measures are calculated by the dracor-metrics service. While the first three are taken directly from JSNetworkX's corresponding functions, closeness is calculated using the paths returned by its allPairsShortestPath function.

The values differ significantly from those provided in the Shiny app. At least in the case of closeness centrality there seems to be either a different concept or a mistake on Shiny's side.

Take, for instance, https://dracor.org/ger/lessing-philotas: if closeness centrality is the "reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph" (Wikipedia), Philotas' closeness should be 1/3 = 0.3333333333333333. Shiny gives 1.0000 though. JSNetworkX's values also differ from those of the Shiny app for betweenness centrality, while the degrees match and there is no eigenvector centrality in Shiny.

(see https://dracor.org/api/corpora/ger/play/lessing-philotas/cast)

@lehkost could you have a look at those values and clarify which ones are correct?

Also, I'm not sure how the weighted degree is calculated. It does not seem to be in Shiny yet and I have not found a function in JSNetworkX that would seem to provide it.

And finally, when calculating eigenvector centrality, JSNetworkX throws an exception with four of our plays (gogol-tjazhba, lermontov-strannyj-chelovek, brandes-ariadne-auf-naxos, panizza-nero), which is why in the cast lists of those plays the eigenvector property is always 0 at the moment.

lehkost · 2019-02-04T19:08:59Z

Great work already, so let's try to resolve remaining issues. This is what Gephi throws for Philotas:

Closeness centrality: So it seems that for the example provided, 1.0 is actually the correct value. Philotas has 3 direct connections to the other nodes (so in each case the distance = 1), so the sum of these distances is 3. Now, the number of other nodes (also 3) divided by that sum is 3/3 = 1.0. For Parmenio, this would be 3 divided by 5 = 0.6. The definition on Wikipedia is a bit hard to grasp, and its formula is different: it omits the multiplication by the number of other nodes. This paper (PDF) actually has it much shorter: CC = "Number of nodes divided by the sum of the topological distances"; it also cites the benchmark book by Wasserman/Faust.
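The two readings of closeness discussed here can be compared in a short sketch. The adjacency list below is an assumption: it is a four-node graph chosen only so that it reproduces the values quoted in this thread (Philotas adjacent to everyone, Parmenio adjacent to Philotas only); it is not taken from the DraCor data, and the function names are hypothetical.

```python
from collections import deque

# Assumed toy graph, constructed to match the numbers quoted in the thread.
GRAPH = {
    "Philotas": ["Strato", "Aridaeus", "Parmenio"],
    "Strato": ["Philotas", "Aridaeus"],
    "Aridaeus": ["Philotas", "Strato"],
    "Parmenio": ["Philotas"],
}


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness(adjacency, node, normalized=True):
    """Closeness of node.

    normalized=False: 1 / (sum of distances), the reading quoted from Wikipedia.
    normalized=True:  (number of other reachable nodes) / (sum of distances).
    """
    dist = shortest_path_lengths(adjacency, node)
    total = sum(d for other, d in dist.items() if other != node)
    if total == 0:
        return 0.0  # isolated node
    reachable = len(dist) - 1
    return reachable / total if normalized else 1 / total
```

On this assumed graph, `closeness(GRAPH, "Philotas", normalized=False)` gives 1/3, while the normalized variant gives 1.0 for Philotas and 0.6 for Parmenio, matching the figures in the comments above.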

Weighted degree: This is basically the same value as degree, but instead of adding 1 to the degree number with every new relation of a node, we add the weight number for this pair of nodes (corresponding to the Weight table in our CSV files), i.e., if two characters co-appear in 4 scenes, their weight would be 4.
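The difference between degree and weighted degree can be sketched as follows. The edge list is invented for illustration (characters "A", "B", "C" and their weights are hypothetical), and the function name is an assumption, not part of dracor-metrics.

```python
from collections import defaultdict


def degree_and_weighted_degree(edges):
    """Return (degree, weighted_degree) dicts for an undirected edge list.

    Each edge is (source, target, weight), where weight is the number of
    scenes in which the two characters co-appear.
    """
    degree = defaultdict(int)
    weighted = defaultdict(int)
    for source, target, weight in edges:
        for node in (source, target):
            degree[node] += 1          # plain degree: +1 per relation
            weighted[node] += weight   # weighted degree: + edge weight
    return dict(degree), dict(weighted)


# Hypothetical example: A and B co-appear in 4 scenes, A and C in 1, B and C in 2.
EDGES = [("A", "B", 4), ("A", "C", 1), ("B", "C", 2)]
DEGREE, WEIGHTED = degree_and_weighted_degree(EDGES)
```

Here every character has degree 2, but the weighted degrees are 5 for A, 6 for B and 3 for C.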

cmil · 2019-02-05T15:53:21Z

@lehkost I adjusted the closeness centrality and added the weighted degree. Now there is still a mismatch between Gephi's and NetworkX's eigenvector calculation (I checked, Python's NetworkX and JSNetworkX yield the same values for Philotas).

lehkost · 2019-02-09T09:54:44Z

Some more info on the Eigenvector Centrality mismatch, which seems to happen between igraph and NetworkX. We're not the first to notice that (cf. "Eigenvector Centrality Oddity with iGraph, Gephi, and NetworkX"). While that article finds diverging values for all three, igraph, Gephi and NetworkX, we find that igraph and Gephi throw the same results, while NetworkX begs to differ.

To add another example, here's what our R script throws (using igraph) for "Emilia Galotti":

The documentation for igraph and NetworkX both suggest that they rely on the same algorithm. Could you maybe check whether you feed the edge weights into the formula (which we don't do)? This could explain the different values…
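To illustrate why edge weights matter for this question, here is a plain power-iteration sketch of eigenvector centrality that can be run with or without weights. It is an assumption-laden toy, not the code used by JSNetworkX, NetworkX, igraph or Gephi, and it does not establish that weights are the actual cause of the mismatch; the example edges are invented.

```python
import math


def eigenvector_centrality(nodes, edges, use_weights=False,
                           max_iter=1000, tol=1e-10):
    """Power iteration on (A + I) for an undirected graph.

    edges: iterable of (source, target, weight).
    use_weights=False treats every edge as weight 1.
    The result is scaled to unit Euclidean length.
    """
    adjacency = {n: {} for n in nodes}
    for source, target, weight in edges:
        value = float(weight) if use_weights else 1.0
        adjacency[source][target] = value
        adjacency[target][source] = value

    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # x_new = x + A x; the added x (identity shift) avoids oscillation.
        new = {n: x[n] + sum(w * x[nb] for nb, w in adjacency[n].items())
               for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical triangle with unequal co-occurrence weights.
NODES = ["A", "B", "C"]
EDGES = [("A", "B", 4), ("A", "C", 1), ("B", "C", 2)]
UNWEIGHTED = eigenvector_centrality(NODES, EDGES, use_weights=False)
WEIGHTED = eigenvector_centrality(NODES, EDGES, use_weights=True)
```

Without weights all three nodes of the triangle receive the same score; with weights the scores diverge, so two tools that disagree on whether to use weights would report different values for the same play.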

cmil · 2019-02-25T19:01:08Z

I opened #58 to track the eigenvector oddity and would like to close this one by merging #54.

lehkost assigned cmil Dec 10, 2018

cmil mentioned this issue Feb 2, 2019

Cast list #54

Merged

cmil added a commit to dracor-org/dracor-metrics that referenced this issue Feb 4, 2019

output network measures per node

a172f37

This is a prerequisite for dracor-org/dracor-api#31

cmil assigned lehkost Feb 4, 2019

cmil mentioned this issue Feb 25, 2019

Eigenvector Centrality in cast list differs from values calculated by Gephi and igraph #58

Open

cmil closed this as completed in #54 Feb 25, 2019
