<a href="https://colab.research.google.com/github/acastellanos-ie/NLP-MBD-EN-2023-A-Electives/blob/main/tagging_parsing_practice/parsing_practice_solution_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Google Colab Configuration

**Execute these steps to configure the Google Colab environment before running this notebook. They are not required if you are running it locally and have already configured your local environment as explained in the GitHub repository.**

The first step is to clone the repository to get access to all the data and files.

In [None]:
repository_name = "NLP-MBD-EN-2023-A-Electives"
repository_url = 'https://github.com/acastellanos-ie/' + repository_name

In [None]:
! git clone $repository_url

Install the requirements

In [None]:
! pip install -Uqqr $repository_name/tagging_parsing_practice/requirements.txt

Ensure that you have the GPU runtime activated:

![](https://miro.medium.com/max/3006/1*vOkqNhJNl1204kOhqq59zA.png)

Now you have everything you need to execute the code in Colab

# Dependency Parsing with spaCy

For this practice, we will use the [spaCy](https://spacy.io/) library, which provides pre-trained models for various NLP tasks, including dependency parsing. In this example, we'll demonstrate how to perform dependency parsing and visualize the results using spaCy.

We need to start by downloading the pre-trained spaCy model for English. For more details about the available models, please check the spaCy documentation: https://spacy.io/models

In [None]:
! python -m spacy download en_core_web_sm


Now we can load the pre-trained model that we just downloaded

In [None]:
import spacy
from spacy import displacy
# `display` and `HTML` are exposed by the public IPython.display module
from IPython.display import display, HTML


nlp = spacy.load("en_core_web_sm")

Let's define a simple sample text and perform the dependency parsing

In [None]:
text = "The quick brown fox jumps over the lazy dog."

doc = nlp(text)

for token in doc:
    print(f"{token.text} <--{token.dep_}-- {token.head.text}")


The <--det-- fox
quick <--amod-- fox
brown <--amod-- fox
fox <--nsubj-- jumps
jumps <--ROOT-- jumps
over <--prep-- jumps
the <--det-- dog
lazy <--amod-- dog
dog <--pobj-- over
. <--punct-- jumps


This text output is hard to read at a glance, so we can render the dependency tree with displaCy instead.

In [None]:
# Render the dependency tree using displaCy
html = displacy.render(doc, style="dep", jupyter=False)

# Display the rendered HTML in the Jupyter Notebook
display(HTML(html))

This example demonstrates a simple usage for dependency parsing using the spaCy library. It loads a pre-trained model, performs dependency parsing on a sample text, and displays the dependency parse tree both in text format and as a visualization.
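The arcs printed in text format can also be turned into an indented tree without any rendering library. The following is a minimal sketch, not part of the original practice: the `arcs` list is copied by hand from the printed output, and `build_tree` and `format_tree` are hypothetical helper names. It keys nodes by their text, which only works here because every word in the sentence is distinct ("The" and "the" differ in case).

```python
from collections import defaultdict

# Arcs copied from the printed output as (token, relation, head) triples.
arcs = [
    ("The", "det", "fox"),
    ("quick", "amod", "fox"),
    ("brown", "amod", "fox"),
    ("fox", "nsubj", "jumps"),
    ("jumps", "ROOT", "jumps"),
    ("over", "prep", "jumps"),
    ("the", "det", "dog"),
    ("lazy", "amod", "dog"),
    ("dog", "pobj", "over"),
    (".", "punct", "jumps"),
]


def build_tree(arcs):
    """Return (root, children), where children maps a head to its (relation, token) pairs."""
    children = defaultdict(list)
    root = None
    for token, relation, head in arcs:
        if relation == "ROOT":
            # The root is the only token that is its own head
            root = token
        else:
            children[head].append((relation, token))
    return root, children


def format_tree(node, children, relation="ROOT", depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = [f"{'  ' * depth}{node} ({relation})"]
    for child_relation, child in children.get(node, []):
        lines.extend(format_tree(child, children, child_relation, depth + 1))
    return lines


root, children = build_tree(arcs)
tree_lines = format_tree(root, children)
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one arc away from the root, so "dog" appears under "over", which in turn appears under "jumps".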

Now we will try to do something more interesting

# Applied Dependency Parsing: SVO Detection

One interesting application of dependency parsing is extracting relationships between entities in a sentence, such as subject-verb-object (SVO) triples. This can be useful for tasks like information extraction, knowledge graph construction, or question-answering systems.

Here's an example of how to extract SVO triples using the dependency parser from the spaCy library:

First we need to define a function to find the subject and object connected to a verb

In [None]:
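def find_subject_object_pairs(parsed_sentence):
    """Pair each subject with the next object found in linear order (naive heuristic)."""
    subject = None
    obj = None
    pairs = []

    for token in parsed_sentence:
        if "subj" in token.dep_:
            subject = token
        if "obj" in token.dep_:
            obj = token

        if subject is not None and obj is not None:
            # The verb is the subject's head; the loop variable `token` is
            # the subject or the object itself, not the verb.
            pairs.append((subject, subject.head, obj))
            subject = None
            obj = None

    return pairs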


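Then, we define the function that extracts SVO triples from a text. Note that it does not call `find_subject_object_pairs`; instead, it walks the tree from each subject to its head verb and then to that verb's objects, which is more reliable than pairing tokens in linear order.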

In [None]:
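def extract_svo_triples(text, nlp):
    """Extract (subject, verb, object) token triples from `text`."""
    doc = nlp(text)
    svo_triples = []

    for token in doc:
        if "subj" in token.dep_:
            subject = token
            # The head of a subject is the verb it depends on
            verb = token.head
            for child in verb.children:
                if "obj" in child.dep_:
                    # Object attached directly to the verb (e.g. dobj)
                    svo_triples.append((subject, verb, child))
                elif child.dep_ == "prep":
                    # Object of a preposition attached to the verb (e.g. "over the dog")
                    for pobj in child.children:
                        if pobj.dep_ == "pobj":
                            svo_triples.append((subject, verb, pobj))

    return svo_triples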


Finally, we can use the pre-trained model and the SVO extraction function to extract SVO triples from a sample text

In [None]:
text = "The quick brown fox jumps over the lazy dog. John bought a new car. Mary gave John a book."
svo_triples = extract_svo_triples(text, nlp)

for triple in svo_triples:
    print(f"Subject: {triple[0].text}, Verb: {triple[1].text}, Object: {triple[2].text}")


Subject: fox, Verb: jumps, Object: dog
Subject: John, Verb: bought, Object: car
Subject: Mary, Verb: gave, Object: book
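In the last triple the recipient ("John") is dropped, because only children whose label contains "obj" or that sit under a preposition are collected. The sketch below shows one way the recipient could be kept as a fourth element. It rests on assumptions: spaCy's English label scheme includes a `dative` relation, and the sketch assumes the parser assigns it to "John" here, which may not hold for every model or sentence. To stay runnable without a model, it uses a hypothetical stand-in class `Tok` that exposes only the `text`, `dep_`, `head`, and `children` attributes of a spaCy token, with a hand-built parse.

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class Tok:
    """Hypothetical stand-in for the spaCy Token attributes used below."""
    text: str
    dep_: str
    head: "Tok" = None
    children: list = field(default_factory=list)


def attach(child, head):
    """Link `child` under `head` in the hand-built tree."""
    child.head = head
    head.children.append(child)


def extract_svo_with_recipient(tokens):
    """Return (subject, verb, direct_object, recipient) tuples; recipient may be None."""
    results = []
    for token in tokens:
        if "subj" not in token.dep_:
            continue
        verb = token.head
        children = list(verb.children)  # spaCy yields children lazily
        # Assumption: the indirect object carries the `dative` label
        recipient = next((c for c in children if c.dep_ == "dative"), None)
        for child in children:
            if child.dep_ == "dobj":
                results.append((token, verb, child, recipient))
    return results


# Hand-built parse of "Mary gave John a book." (labels are assumed, not model output)
gave = Tok("gave", "ROOT")
gave.head = gave
mary = Tok("Mary", "nsubj")
john = Tok("John", "dative")
book = Tok("book", "dobj")
a = Tok("a", "det")
attach(mary, gave)
attach(john, gave)
attach(book, gave)
attach(a, book)

tokens = [mary, gave, john, a, book]
rows = [
    (s.text, v.text, o.text, r.text if r is not None else None)
    for s, v, o, r in extract_svo_with_recipient(tokens)
]
print(rows)
```

Because the function only reads `dep_`, `head`, and `children`, the same logic should work on the tokens of a real `doc`, provided the parser produces the assumed labels.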


# Question Answering by means of Dependency Parsing

Let's use the SVO extraction code to build a simple question-answering system. This system will be able to answer basic "who did what" questions based on a given text

In [None]:
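def simple_qa(question, svo_triples):
    """Answer a question by matching its main verb against the extracted triples.

    Relies on the global `nlp` pipeline loaded earlier in the notebook.
    """
    question_doc = nlp(question)
    question_verb = None

    # Take the first main verb; auxiliaries such as "did" are tagged AUX, not VERB
    for token in question_doc:
        if token.pos_ == "VERB":
            question_verb = token
            break

    if question_verb is not None:
        for triple in svo_triples:
            subject, verb, obj = triple
            # Compare lemmas so that "buy" in the question matches "bought" in the text
            if verb.lemma_ == question_verb.lemma_:
                return f"{subject.text} {verb.text} {obj.text}"

    return "I don't know the answer."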


Let's test the QA system that we have built with some questions.

In [None]:
text = "John bought a new car. Mary gave John a book. Alice traveled to Paris."

# Extract SVO triples from the text
svo_triples = extract_svo_triples(text, nlp)

# Test questions
questions = [
    "Who bought a car?",
    "What did John buy?",
    "Who gave John a book?",
    "What did Mary give to John?",
    "Who traveled to Paris?",
]

for question in questions:
    answer = simple_qa(question, svo_triples)
    print(f"Question: {question}\nAnswer: {answer}\n")


Question: Who bought a car?
Answer: John bought car

Question: What did John buy?
Answer: John bought car

Question: Who gave John a book?
Answer: Mary gave book

Question: What did Mary give to John?
Answer: Mary gave book

Question: Who traveled to Paris?
Answer: Alice traveled Paris



This example demonstrates how to use the SVO extraction code to build a simple question-answering system. The system answers basic "who did what" questions by matching the main verb of the parsed question against the SVO triples extracted from the text. Note that it is a very simple system: it returns the first triple whose verb lemma matches, regardless of the question word, so it cannot handle complex questions or variations in phrasing. Still, it is a nice example of how to leverage SVO triples in a practical application.
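One small refinement is to return only the slot that the question word asks for. The sketch below is an illustration under stated assumptions rather than part of the original system: `answer_slot` is a hypothetical helper, the triples are plain strings taken from the outputs above, and the verb lemmas ("buy", "give", "travel") are written by hand instead of coming from the parser.

```python
def answer_slot(question_word, verb_lemma, triples):
    """Return the part of the first matching triple that the question word targets.

    `triples` is a list of (subject, verb_lemma, object) strings.
    Returns None when no triple has the requested verb lemma.
    """
    word = question_word.lower()
    for subject, lemma, obj in triples:
        if lemma != verb_lemma:
            continue
        if word == "who":
            return subject  # "Who bought ...?" asks for the subject
        if word == "what":
            return obj  # "What did John buy?" asks for the object
        # Unknown question word: fall back to the whole triple
        return f"{subject} {lemma} {obj}"
    return None


# Triples written by hand from the outputs above (lemmas are assumed)
triples = [
    ("John", "buy", "car"),
    ("Mary", "give", "book"),
    ("Alice", "travel", "Paris"),
]

print(answer_slot("Who", "buy", triples))
print(answer_slot("What", "give", triples))
```

In the notebook itself, the question word and verb lemma would come from the parsed question, for example the first token's text and `question_verb.lemma_`.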