## Dependency Parsing Using spaCy

Dependency parsing is a technique used in Natural Language Processing (NLP) to analyze the grammatical structure of a sentence and establish relationships between "head" words and words which modify those heads. spaCy, a popular NLP library in Python, offers robust support for dependency parsing, allowing users to easily extract this kind of grammatical information from text.

The result of parsing is a dependency graph that represents the grammatical structure of the sentence. Each edge links a headword to one of its dependents. Exactly one word depends on no other word; it is called the root of the sentence and is usually the main verb. Every other word is linked to the root, either directly or through a chain of intermediate heads.

The dependencies can be mapped in a directed graph representation where:

- Words are the nodes.
- Grammatical relationships are the edges.

Dependency parsing helps you know what role a word plays in the text and how different words relate to each other.
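As a rough illustration of this graph view, the sketch below stores a parse as plain Python data and builds a head-to-dependents map. The arcs are copied by hand from the spaCy output for "Gus is learning piano" shown later in this section. The `Arc` type and `build_graph` helper are hypothetical names used only for illustration and are not part of spaCy. The sketch also assumes every word in the sentence is distinct, so that word text can serve as a node key.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One labelled edge of the dependency graph (hypothetical helper type)."""
    dependent: str
    label: str
    head: str


# Hand-copied from the parser output for "Gus is learning piano".
# In that output the root is its own head and carries the label ROOT.
ARCS = [
    Arc("Gus", "nsubj", "learning"),
    Arc("is", "aux", "learning"),
    Arc("learning", "ROOT", "learning"),
    Arc("piano", "dobj", "learning"),
]


def build_graph(arcs):
    """Return (root, edges); edges maps each head to its labelled dependents."""
    edges = defaultdict(list)
    root = None
    for arc in arcs:
        if arc.label == "ROOT":
            root = arc.dependent  # the root has no incoming edge
        else:
            edges[arc.head].append((arc.label, arc.dependent))
    return root, dict(edges)


root, edges = build_graph(ARCS)
print(root)
print(edges)
```

For sentences with repeated words, keying nodes on the token position rather than the text would avoid collisions.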

Here's how dependency parsing works in spaCy:

- Parsing Process: When spaCy processes a text with a pipeline that includes a parser component (as the en_core_web_sm pipeline used below does), it performs dependency parsing automatically. This process assigns a syntactic structure to the sentence, where each word is linked to its "head" (the word that it depends on) along with the type of link that connects them (the dependency).
- Dependency Labels: Each word in a sentence is assigned a dependency label, such as nsubj (nominal subject), dobj (direct object), or aux (auxiliary verb). These labels describe the type of syntactic relationship between the word and its head.
- Visualization: spaCy provides a visualization tool called displacy which can be used to visualize the dependency tree of a sentence, making it easier to understand and analyze the grammatical structure.
- Accessing Dependency Information: In spaCy, the dependency label for each token can be accessed using the .dep_ attribute, and the head of each token can be accessed using the .head attribute, which returns another token. This allows for detailed analysis and manipulation of the parsed data.
- Applications: Dependency parsing is useful in various NLP tasks such as information extraction, question answering, and text summarization, where understanding the grammatical structure of sentences is important.
- Accuracy and Performance: spaCy's dependency parser is known for its accuracy and efficiency. It uses machine learning models trained on large annotated corpora, and pretrained pipelines are available for many languages. As with any statistical model, accuracy depends on how closely the input text resembles the training data.
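To make the head links described above concrete, here is a minimal sketch that follows them from one token up to the root. The rows are copied by hand from the parse of "John bought a new laptop from the store yesterday." shown later in this section, with token positions added so that heads can be referenced by index. The `path_to_root` helper is a hypothetical name, not a spaCy function. With a live `Doc`, the same positions should be available as `token.i` and `token.head.i`.

```python
# (index, text, dep, head_index) rows hand-copied from the parser output.
ROWS = [
    (0, "John", "nsubj", 1),
    (1, "bought", "ROOT", 1),
    (2, "a", "det", 4),
    (3, "new", "amod", 4),
    (4, "laptop", "dobj", 1),
    (5, "from", "prep", 1),
    (6, "the", "det", 7),
    (7, "store", "pobj", 5),
    (8, "yesterday", "npadvmod", 1),
    (9, ".", "punct", 1),
]


def path_to_root(rows, start):
    """Follow head links from the token at `start` until the root is reached."""
    path = []
    i = start
    while True:
        _, text, dep, head = rows[i]
        path.append((text, dep))
        if head == i:  # in this output the root is its own head
            return path
        i = head


# Walk from "the" (index 6) up to the root.
chain = path_to_root(ROWS, 6)
print(chain)
```

Reading the chain left to right shows how a determiner deep in a prepositional phrase is still connected to the main verb through intermediate heads.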

Dependency parsing with spaCy provides a deep understanding of the grammatical structure of sentences, enabling advanced text analysis and the extraction of meaningful information from text.


In [8]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

In [9]:
# Example 1
piano_text = "Gus is learning piano"
piano_doc = nlp(piano_text)
for token in piano_doc:
    print(
        f"""
TOKEN: {token.text}
=====
{token.tag_ = }
{token.head.text = }
{token.dep_ = }"""
    )


TOKEN: Gus
=====
token.tag_ = 'NNP'
token.head.text = 'learning'
token.dep_ = 'nsubj'

TOKEN: is
=====
token.tag_ = 'VBZ'
token.head.text = 'learning'
token.dep_ = 'aux'

TOKEN: learning
=====
token.tag_ = 'VBG'
token.head.text = 'learning'
token.dep_ = 'ROOT'

TOKEN: piano
=====
token.tag_ = 'NN'
token.head.text = 'learning'
token.dep_ = 'dobj'


In [15]:
displacy.render(piano_doc, style="dep", jupyter=True, options={"distance": 200})

In [10]:
# Example 2
text = "John bought a new laptop from the store yesterday."
doc = nlp(text)
for token in doc:
    print(
        f"""
TOKEN: {token.text}
=====
{token.tag_ = }
{token.head.text = }
{token.dep_ = }
"""
    )


TOKEN: John
=====
token.tag_ = 'NNP'
token.head.text = 'bought'
token.dep_ = 'nsubj'


TOKEN: bought
=====
token.tag_ = 'VBD'
token.head.text = 'bought'
token.dep_ = 'ROOT'


TOKEN: a
=====
token.tag_ = 'DT'
token.head.text = 'laptop'
token.dep_ = 'det'


TOKEN: new
=====
token.tag_ = 'JJ'
token.head.text = 'laptop'
token.dep_ = 'amod'


TOKEN: laptop
=====
token.tag_ = 'NN'
token.head.text = 'bought'
token.dep_ = 'dobj'


TOKEN: from
=====
token.tag_ = 'IN'
token.head.text = 'bought'
token.dep_ = 'prep'


TOKEN: the
=====
token.tag_ = 'DT'
token.head.text = 'store'
token.dep_ = 'det'


TOKEN: store
=====
token.tag_ = 'NN'
token.head.text = 'from'
token.dep_ = 'pobj'


TOKEN: yesterday
=====
token.tag_ = 'NN'
token.head.text = 'bought'
token.dep_ = 'npadvmod'


TOKEN: .
=====
token.tag_ = '.'
token.head.text = 'bought'
token.dep_ = 'punct'



In [11]:
displacy.render(doc, style="dep", jupyter=True, options={"distance": 90})

In [16]:
import spacy
from spacy import displacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Example sentence
text = "John bought a new laptop from the store yesterday."

# Process the text
doc = nlp(text)

# Extracting tokens and their dependency information
dependencies = [(token.text, token.dep_, token.head.text) for token in doc]

dependencies

[('John', 'nsubj', 'bought'),
 ('bought', 'ROOT', 'bought'),
 ('a', 'det', 'laptop'),
 ('new', 'amod', 'laptop'),
 ('laptop', 'dobj', 'bought'),
 ('from', 'prep', 'bought'),
 ('the', 'det', 'store'),
 ('store', 'pobj', 'from'),
 ('yesterday', 'npadvmod', 'bought'),
 ('.', 'punct', 'bought')]
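
A list of triples like this is already enough for simple information extraction. The sketch below, which is only one possible approach, pairs each verb's `nsubj` with its `dobj` to produce (subject, verb, object) tuples. The `subject_verb_object` helper is a hypothetical name, and the code assumes each head word appears only once in the sentence, since the triples identify heads by text rather than by position.

```python
# The (text, dep, head) triples printed above, repeated so the block is self-contained.
dependencies = [
    ("John", "nsubj", "bought"),
    ("bought", "ROOT", "bought"),
    ("a", "det", "laptop"),
    ("new", "amod", "laptop"),
    ("laptop", "dobj", "bought"),
    ("from", "prep", "bought"),
    ("the", "det", "store"),
    ("store", "pobj", "from"),
    ("yesterday", "npadvmod", "bought"),
    (".", "punct", "bought"),
]


def subject_verb_object(deps):
    """Return (subject, verb, object) for every head with both nsubj and dobj."""
    subjects, objects = {}, {}
    for text, dep, head in deps:
        if dep == "nsubj":
            subjects[head] = text
        elif dep == "dobj":
            objects[head] = text
    # Keep only heads that have both a subject and a direct object.
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]


svo = subject_verb_object(dependencies)
print(svo)
```

Passive constructions, conjunctions, and multi-word subjects would need additional labels and handling beyond this sketch.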