
Formulae for dependency distance calculation on Doc level #77

Closed
bma-vandijk opened this issue Dec 1, 2022 · 3 comments


bma-vandijk commented Dec 1, 2022

Hi,

First of all, thanks for this very helpful library. I have a question about how dependency distance (DD) is calculated for Doc objects.

Your function for calculating DD for a Doc returns the DD value here: `"dependency_distance_mean": np.mean(dep_dists)`. As far as I can see, the mean returned is the mean over the mean DD of every sentence (contained in `dep_dists`) that constitutes the Doc object.

The two sources you cite on dependency distance in your documentation (Liu, 2008 and Oya, 2008), however, take a different approach.

For calculating the DD of a text, Liu seems to take the sum of the absolute DDs found in the whole text and multiply it by 1 / (number of words - number of sentences). Oya seems to take a mean of means like you do, but for a sentence averages the sum of absolute DDs over the number of dependency links in the utterance. Neither in your documentation nor in your code can I find how exactly you calculate DD for a text.
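To make sure I am reading the two sources correctly, this is how I understand the two formulas (with $n$ the number of words, $s$ the number of sentences, and $DD_i$ the absolute distance of the $i$-th dependency link):

$$\mathrm{MDD}_{\mathrm{Liu}}(\text{text}) = \frac{1}{n - s}\sum_{i=1}^{n-s}\lvert DD_i\rvert$$

$$\mathrm{MDD}_{\mathrm{Oya}}(\text{sentence}) = \frac{\sum_i \lvert DD_i\rvert}{\text{number of dependency links in the sentence}}, \qquad \mathrm{MDD}_{\mathrm{Oya}}(\text{text}) = \frac{1}{s}\sum_{\text{sentences}}\mathrm{MDD}_{\mathrm{Oya}}(\text{sentence})$$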

Would you please be so kind as to explain which approach you use to calculate DD for Doc objects, and provide some pointers on how we might adapt the code to implement, e.g., the approaches of Liu and Oya? Thanks!

Which page or section is this issue related to?

https://hlasse.github.io/TextDescriptives/dependencydistance.html

bma-vandijk added the documentation label on Dec 1, 2022
HLasse (Owner) commented Dec 5, 2022

Hi @bma-vandijk, thanks for the question and for using the library!
The code for calculating DD is contained in textdescriptives/components/dependency_distance.py, with the main logic in token_dependency.

The implementation in textdescriptives follows Oya: we calculate the distance from each token to its dependent and take the mean of these distances to get the mean dependency distance for spans. In our implementation, the Doc-level DD is calculated by averaging over the sentence-level mean DDs.
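For a quick end-to-end illustration, here is a minimal sketch (the example sentences are arbitrary, and depending on your textdescriptives version the component may be registered as `dependency_distance` or `textdescriptives/dependency_distance`):

```python
import spacy
import textdescriptives  # noqa: F401  # importing registers the components with spaCy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("dependency_distance")  # component name may differ between versions

doc = nlp("The cat sat on the mat. The dog barked at the cat.")

# Doc-level DD: the mean over the sentence-level mean dependency distances
print(doc._.dependency_distance["dependency_distance_mean"])
```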

To get the sentence-level means as in Oya, you could simply do:

dep_dists, adj_deps = zip(
    *[sent._.dependency_distance.values() for sent in doc.sents]
)
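
and then, for instance, aggregate the sentence-level means yourself; a small sketch of what the Doc-level value corresponds to:

```python
import numpy as np

# mean over the sentence-level mean dependency distances (Oya's text-level measure)
doc_level_dd = np.mean(dep_dists)
```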

To calculate the metric you cite from Liu, sum(DD) * (1 / (number of words - number of sentences)), you could do something like the following (assuming you have added the dependency distance pipeline):

from spacy.tokens import Doc


def liu_doc_dependency(doc: Doc) -> float:
    """Calculate mean dependency distance from Liu, 2008."""
    # sum of the token-level dependency distances over the whole Doc
    dd = sum(token._.dependency_distance["dependency_distance"] for token in doc)
    return dd * (1 / (len(doc) - len(list(doc.sents))))
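
For example, called on a Doc processed with the pipeline above (hypothetical sentences):

```python
doc = nlp("The cat sat on the mat. The dog barked at the cat.")
print(liu_doc_dependency(doc))  # sum of token DDs / (n_words - n_sentences)
```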

Let me know if you have any other questions!

bma-vandijk changed the title from "Formulae for dependency distance calulcation on Doc level" to "Formulae for dependency distance calculation on Doc level" on Dec 5, 2022
KennethEnevoldsen (Collaborator) commented

@HLasse, let's add this to the documentation as well.

bma-vandijk (Author) commented

This is super late, but I would still like to thank you for your swift and helpful response :)
