Formulae for dependency distance calculation on Doc level #77
Hi @bma-vandijk, thanks for the question and for using the library! The implementation in textdescriptives follows Oya: we calculate the distance from each token to its dependent and take the mean of these to compute the mean dependency distance for spans. At the Doc level, we calculate DD by averaging over the sentence-level mean DDs.

To get the sentence-level means as in Oya, you could simply do:

```python
dep_dists, adj_deps = zip(
    *[sent._.dependency_distance.values() for sent in doc.sents]
)
```

To calculate the metric you cite from Liu (sum(DD) * (1 / (number of words - number of sentences))):

```python
def liu_doc_dependency(doc: Doc) -> float:
    """Calculate mean dependency distance from Liu, 2008."""
    # Sum the token-level dependency distances over the whole Doc
    dd = sum(token._.dependency_distance["dependency_distance"] for token in doc)
    return dd * (1 / (len(doc) - len(list(doc.sents))))
```

Let me know if you have any other questions!
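For anyone without a spaCy pipeline at hand, the difference between the two aggregation schemes can be sketched with hardcoded per-sentence distance lists (the token-level values below are made up purely for illustration, not produced by spaCy or textdescriptives):

```python
# Hypothetical absolute token-level dependency distances, one list per sentence.
sentences = [
    [1, 2, 1],     # sentence 1: 3 tokens
    [1, 3, 2, 1],  # sentence 2: 4 tokens
]

# Oya-style (as in textdescriptives): mean of the sentence-level mean DDs.
sent_means = [sum(s) / len(s) for s in sentences]
oya_dd = sum(sent_means) / len(sent_means)

# Liu-style: total DD of the text * 1 / (number of words - number of sentences).
n_words = sum(len(s) for s in sentences)
n_sents = len(sentences)
total_dd = sum(sum(s) for s in sentences)
liu_dd = total_dd * (1 / (n_words - n_sents))

print(oya_dd)  # mean of 4/3 and 7/4
print(liu_dd)  # 11 / (7 - 2)
```

Note that the two numbers differ whenever sentence lengths differ, since Oya's mean of means weights every sentence equally while Liu's formula weights every dependency link equally.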
@HLasse let us add this to the documentation as well
This is super late, but I would still like to thank you for your swift and helpful response :)
Hi,
first of all thanks for this very helpful library. I have a question regarding the way dependency distance (DD) for Doc objects is calculated.
Your function for calculating DD for a Doc returns the DD value here:

```python
"dependency_distance_mean": np.mean(dep_dists)
```

As far as I can see, the mean returned is the mean over the mean DD of every sentence (contained in `dep_dists`) constituting the Doc object. The two sources you cite on dependency distance in your documentation (Liu, 2008 and Oya, 2008), however, take a different approach.

For calculating the DD of a text, Liu seems to take the sum of the absolute DDs found in the whole text and multiply by (1 / (number of words - number of sentences)). Oya seems to take a mean of means like you do, but for each sentence averages the sum of absolute DDs over the number of dependency links in the utterance. I cannot find in your documentation or your code how exactly you calculate DD for a text.
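In symbols (with $n$ the number of words in the text, $s$ the number of sentences, and $DD_i$ the dependency distance of word $i$), my reading of Liu's text-level metric is:

$$\mathrm{MDD} = \frac{1}{n - s} \sum_{i=1}^{n} \lvert DD_i \rvert$$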
Would you please be so kind as to explain with what approach you calculate DD for Doc objects, and provide pointers on how we may adapt the code to e.g. implement approaches by Liu and Oya? Thanks!
Which page or section is this issue related to?
https://hlasse.github.io/TextDescriptives/dependencydistance.html