
Add table with character data to API #31

Closed
lehkost opened this issue Dec 10, 2018 · 6 comments · Fixed by #54

@lehkost
Member

lehkost commented Dec 10, 2018

Proposed name:

/corpora/{corpusname}/play/{playname}/characters/csv

Proposed values:

ID and label:

  • Character ID
  • Character Label

Three quantitative measures:

  • Scene Appearances
    • = number of scenes a character appears in
  • Speech Acts
    • = number of <sp> per character
  • Number of Words

Five network-based measures (per character)

  • Degree
  • Weighted Degree
  • Betweenness Centrality
  • Closeness Centrality
  • Eigenvector Centrality
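A minimal sketch of how the three quantitative measures could be counted from a play's source, in Python. It assumes a heavily simplified, namespace-free TEI-like structure (scenes as `<div type="scene">`, speech acts as `<sp who="#id">`); real DraCor files use the TEI namespace and richer markup. "Scene appearances" is approximated here as scenes in which a character speaks, and words are whitespace-separated tokens outside `<speaker>`. Both are assumptions, and `character_stats` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical, simplified TEI-like sample (no TEI namespace).
SAMPLE = """
<TEI>
  <body>
    <div type="scene">
      <sp who="#a"><speaker>A</speaker><p>Hello there friend</p></sp>
      <sp who="#b"><speaker>B</speaker><p>Hi</p></sp>
    </div>
    <div type="scene">
      <sp who="#a"><speaker>A</speaker><l>Goodbye</l></sp>
    </div>
  </body>
</TEI>
"""


def character_stats(xml_text):
    """Return {character id: {"scenes", "speech_acts", "words"}} counts."""
    root = ET.fromstring(xml_text.strip())
    scenes = defaultdict(set)  # character id -> indices of scenes spoken in
    speech_acts = Counter()    # character id -> number of <sp> elements
    words = Counter()          # character id -> whitespace-separated tokens
    scene_divs = [d for d in root.iter("div") if d.get("type") == "scene"]
    for index, scene in enumerate(scene_divs):
        for sp in scene.iter("sp"):
            # Spoken text: every direct child of <sp> except the <speaker> label
            spoken = " ".join(
                "".join(child.itertext()) for child in sp if child.tag != "speaker"
            )
            # An <sp> may name several speakers in @who, e.g. who="#a #b"
            for ref in sp.get("who", "").split():
                cid = ref.lstrip("#")
                scenes[cid].add(index)
                speech_acts[cid] += 1
                words[cid] += len(spoken.split())
    return {
        cid: {"scenes": len(scenes[cid]), "speech_acts": speech_acts[cid], "words": words[cid]}
        for cid in speech_acts
    }


stats = character_stats(SAMPLE)
```

A character who appears on stage without speaking is not counted as appearing in that scene by this sketch; real scene appearances would need stage directions or cast markup as well.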

As far as I can see, we do not yet calculate per-character network values for API purposes. Our Shiny app already implements this in the Vertices tab and may serve as a point of reference for these values.

cmil mentioned this issue Feb 2, 2019
@cmil
Member

cmil commented Feb 2, 2019

The above PR implements the /corpora/{corpusname}/play/{playname}/cast resource. I named it "cast" since we use this term in other places as well, and it's shorter. The resource provides both JSON (default) and CSV (Accept: text/csv) representations. The individual network measures are not yet implemented.

See also https://dracor.org/documentation/api/#/public/get-cast.
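For illustration, a small Python sketch that builds a request for the CSV representation by setting the `Accept` header. The helper `cast_request` is hypothetical, and the path follows the one used in this thread (2019); the live DraCor API may have changed it since.

```python
import urllib.request

# Base URL as it appears in this thread; the current API path may differ.
API_BASE = "https://dracor.org/api"


def cast_request(corpus, play, as_csv=False):
    """Build (without sending) a GET request for a play's cast resource."""
    url = f"{API_BASE}/corpora/{corpus}/play/{play}/cast"
    # JSON is the default representation; CSV is selected via the Accept header
    accept = "text/csv" if as_csv else "application/json"
    return urllib.request.Request(url, headers={"Accept": accept})


req = cast_request("ger", "lessing-philotas", as_csv=True)
# Sending it would look like:
# with urllib.request.urlopen(req) as resp:
#     csv_text = resp.read().decode("utf-8")
```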

cmil added a commit to dracor-org/dracor-metrics that referenced this issue Feb 4, 2019
@cmil
Member

cmil commented Feb 4, 2019

The /corpora/{corpusname}/play/{playname}/cast resource now also includes degree, betweenness, eigenvector and closeness properties. These measures are calculated by the dracor-metrics service. While the first three are taken directly from JSNetworkX's corresponding functions, closeness is calculated from the paths returned by its allPairsShortestPath function.
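A minimal Python sketch of computing closeness from all-pairs shortest paths, using plain breadth-first search in place of JSNetworkX's `allPairsShortestPath` (an assumption; this is not the dracor-metrics code). It uses the plain reciprocal definition (1 divided by the sum of shortest-path lengths), and nodes unreachable from a given node are simply left out of its sum. The adjacency-dict format and the function names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Hop distances from source to every reachable node (BFS, unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness(adj):
    """Closeness as 1 / (sum of shortest-path lengths to all reachable nodes)."""
    result = {}
    for node in adj:
        total = sum(shortest_path_lengths(adj, node).values())
        # An isolated node has no paths; give it 0 rather than dividing by zero
        result[node] = 1 / total if total else 0.0
    return result


# Hypothetical star graph: one hub linked to three leaves
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = closeness(star)
```

On this star graph the hub's distances sum to 3, giving 1/3, and each leaf's distances sum to 1 + 2 + 2 = 5, giving 0.2.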

The values differ significantly from those provided in the Shiny app. At least in the case of closeness centrality there seems to be either a different concept or a mistake on Shiny's side.

Take, for instance, https://dracor.org/ger/lessing-philotas: if closeness centrality is the "reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph" (Wikipedia), Philotas' closeness should be 1/3 = 0.3333333333333333. Shiny gives 1.0000 though. JSNetworkX's values also differ from those of the Shiny app for betweenness centrality, while the degrees match and there is no eigenvector centrality in Shiny.

(see https://dracor.org/api/corpora/ger/play/lessing-philotas/cast)
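Spelled out with the reciprocal definition, where d(y, x) is the shortest-path length between characters y and x, and Philotas is directly linked to each of the three other characters:

```latex
C(x) = \frac{1}{\sum_{y \neq x} d(y, x)},
\qquad
C(\text{Philotas}) = \frac{1}{1 + 1 + 1} = \frac{1}{3} \approx 0.3333
```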

@lehkost could you have a look at those values and clarify which ones are correct?

Also, I'm not sure how the weighted degree is calculated. It does not seem to be in Shiny yet and I have not found a function in JSNetworkX that would seem to provide it.

And finally, when calculating eigenvector centrality, JSNetworkX throws an exception for four of our plays (gogol-tjazhba, lermontov-strannyj-chelovek, brandes-ariadne-auf-naxos, panizza-nero), which is why the eigenvector property in the cast lists of those plays is always 0 at the moment.

@lehkost
Member Author

lehkost commented Feb 4, 2019

Great work already, so let's try to resolve the remaining issues. This is what Gephi returns for Philotas:

[Screenshot: Gephi network values for Philotas]

Closeness centrality: So it seems that for the example provided, 1.0 is actually the correct value. Philotas has 3 direct connections to the other nodes (so in each case the distance = 1), so the sum of these distances is 3. The number of other nodes (also 3) divided by that sum is 1.0. For Parmenio, this would be 3 divided by 5 = 0.6. The definition on Wikipedia is a bit hard to grasp, and its main formula is the plain reciprocal of the sum, without multiplying by the number of other nodes. This paper (PDF) states it much more concisely: CC = "Number of nodes divided by the sum of the topological distances"; it also cites the standard textbook by Wasserman/Faust.
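Written out, the normalized form multiplies the plain reciprocal of the distance sum by the number of other nodes, N - 1 (here N = 4 characters), which is why both 1/3 and 1.0 come up for Philotas. As far as I know, NetworkX's `closeness_centrality` uses this normalized form by default.

```latex
C_{\text{norm}}(x) = \frac{N - 1}{\sum_{y \neq x} d(y, x)}
                   = (N - 1) \cdot \frac{1}{\sum_{y \neq x} d(y, x)}
\qquad
C_{\text{norm}}(\text{Philotas}) = \frac{3}{3} = 1.0,
\qquad
C_{\text{norm}}(\text{Parmenio}) = \frac{3}{5} = 0.6
```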

Weighted degree: This is basically the same value as degree, but instead of adding 1 to the degree number with every new relation of a node, we add the weight number for this pair of nodes (corresponding to the Weight table in our CSV files), i.e., if two characters co-appear in 4 scenes, their weight would be 4.
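A minimal Python sketch of weighted degree as described here: an edge's weight is the number of scenes two characters share, a character's weighted degree sums the weights of its edges, and the plain degree just counts the edges. The scene casts are hypothetical, and `cooccurrence_weights` and `degrees` are illustrative names, not functions from dracor-metrics or JSNetworkX.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(scenes):
    """Edge weight = number of scenes in which both characters appear."""
    weights = Counter()
    for cast in scenes:
        # Sort so each undirected pair is always stored under the same key
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return weights


def degrees(weights):
    """Return (degree, weighted degree) per character for an undirected edge list."""
    degree = Counter()
    weighted = Counter()
    for (a, b), w in weights.items():
        for node in (a, b):
            degree[node] += 1    # each relation adds 1 ...
            weighted[node] += w  # ... or its co-appearance count
    return degree, weighted


# Hypothetical scene casts, not taken from any DraCor play
scenes = [["A", "B"], ["A", "B", "C"], ["A", "B"], ["A", "B"]]
w = cooccurrence_weights(scenes)
deg, wdeg = degrees(w)
```

Here A and B co-appear in four scenes, so their edge weight is 4; A has degree 2 and weighted degree 4 + 1 = 5.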

@cmil
Member

cmil commented Feb 5, 2019

@lehkost I adjusted the closeness centrality and added the weighted degree. Now there is still a mismatch between Gephi's and NetworkX's eigenvector calculation (I checked, Python's NetworkX and JSNetworkX yield the same values for Philotas).

@lehkost
Member Author

lehkost commented Feb 9, 2019

Some more info on the eigenvector centrality mismatch, which seems to occur between igraph and NetworkX. We're not the first to notice this (cf. "Eigenvector Centrality Oddity with iGraph, Gephi, and NetworkX"). While that article finds diverging values for all three (igraph, Gephi and NetworkX), we find that igraph and Gephi yield the same results, while NetworkX differs.

To add another example, here's what our R script returns (using igraph) for "Emilia Galotti":

[Screenshot: R (igraph) output for "Emilia Galotti"]

The documentation for igraph and NetworkX both suggest that they rely on the same algorithm. Could you maybe check whether you include the edge weights in the calculation (which we don't)? This could explain the different values…
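A minimal Python sketch of eigenvector centrality by power iteration, exposing two settings that, to my knowledge, commonly differ between implementations: whether edge weights enter the iteration, and how the final vector is scaled. NetworkX's `eigenvector_centrality` ignores weights unless a `weight` attribute name is passed and normalizes to unit Euclidean length, while R igraph's `eigen_centrality` by default scales the maximum score to 1 and uses a `weight` edge attribute if the graph has one. Whether either explains the numbers in this thread is not established. The function, its parameters, and the iteration on A + I (a common shift that avoids oscillation on bipartite graphs) are choices of this sketch, not any library's code.

```python
import math


def eigenvector_centrality(adj, weighted=False, scale="l2", max_iter=1000, tol=1e-10):
    """Power iteration on (A + I); adj maps node -> {neighbor: weight}, symmetric."""
    nodes = list(adj)
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = dict(x)  # the identity part of (A + I)
        for n in nodes:
            for nbr, w in adj[n].items():
                new[nbr] += x[n] * (w if weighted else 1.0)
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}  # unit Euclidean length
        converged = sum(abs(new[n] - x[n]) for n in nodes) < tol * len(nodes)
        x = new
        if converged:
            break
    else:
        raise RuntimeError("power iteration did not converge")
    if scale == "max":
        # Rescale so the most central node scores exactly 1
        top = max(x.values())
        x = {n: v / top for n, v in x.items()}
    return x


# Hypothetical star graphs, unweighted and with one heavier edge
star = {"hub": {"a": 1, "b": 1, "c": 1}, "a": {"hub": 1}, "b": {"hub": 1}, "c": {"hub": 1}}
wstar = {"hub": {"a": 3, "b": 1, "c": 1}, "a": {"hub": 3}, "b": {"hub": 1}, "c": {"hub": 1}}
l2 = eigenvector_centrality(star)
mx = eigenvector_centrality(star, scale="max")
```

On the unweighted star the two scalings give the hub about 0.707 versus 1.0 and each leaf about 0.408 versus 0.577: the ranking is identical, only the numbers differ. Turning on weights for `wstar` makes leaf `a` exactly three times as central as `b`, whereas unweighted they tie.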

@cmil
Member

cmil commented Feb 25, 2019

I opened #58 to track the eigenvector oddity and would like to close this one by merging #54.

cmil closed this as completed in #54 on Feb 25, 2019