# Extracting subjects and objects of the sentence

Sometimes, we might need to find the subject and direct object of a sentence, and that
can easily be accomplished with the spaCy package.

# How to do it…
We will use the subtree attribute of tokens to find the complete noun chunk that is the
subject or direct object of the verb (see the Getting the dependency parse recipe for more
information). Let's get started:

Import spaCy:

In [1]:
import spacy

Load the spaCy engine:

In [3]:
nlp = spacy.load("en_core_web_sm")

We will get the list of sentences we will be processing:

In [8]:
sentences = ["The big black cat stared at the small dog.", "Jane watched her Brother in the evenings."]

We will use two functions to find the subject and the direct object of the sentence.
These functions loop through the tokens and return the span that covers the subtree of
the first token with subj or dobj in its dependency tag, respectively. Here is the
subject function:

In [9]:
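def get_subject_phrase(doc):
  for token in doc:
    # "subj" matches nsubj, nsubjpass, csubj, and csubjpass
    if "subj" in token.dep_:
      # token.subtree yields the token and its descendants in document order
      subtree = list(token.subtree)
      start = subtree[0].i
      end = subtree[-1].i + 1
      return doc[start:end]
  # No subject token was found
  return None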
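
To make the start and end arithmetic concrete, here is a minimal plain-Python sketch of what taking a token's subtree amounts to. It does not call spaCy: the `heads` list is a hand-written parse of the first example sentence (an assumption, not model output), and `subtree_indices` and `subtree_span` are hypothetical helper names used only for illustration.

```python
# Toy stand-in for a dependency parse: heads[i] is the index of token i's head,
# and the root points at itself. Hand-written for illustration, not spaCy output.
words = ["The", "big", "black", "cat", "stared", "at", "the", "small", "dog", "."]
heads = [3, 3, 3, 4, 4, 4, 8, 8, 5, 4]


def subtree_indices(heads, i):
    """Return the sorted indices of token i and all of its descendants."""
    found = {i}
    changed = True
    while changed:
        changed = False
        for child, head in enumerate(heads):
            if head in found and child not in found:
                found.add(child)
                changed = True
    return sorted(found)


def subtree_span(words, heads, i):
    """Slice the words from the leftmost to the rightmost subtree member."""
    indices = subtree_indices(heads, i)
    start = indices[0]
    end = indices[-1] + 1
    return words[start:end]


cat_phrase = subtree_span(words, heads, 3)  # subtree of "cat"
at_phrase = subtree_span(words, heads, 5)   # subtree of "at"
print(cat_phrase)
print(at_phrase)
```

The slice runs from the leftmost to the rightmost member of the subtree, which is the same idea as `doc[start:end]` in the recipe's functions.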

Here is the direct object function. If the sentence does not have a direct object, it
will return None:

In [12]:
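def get_object_phrase(doc):
  for token in doc:
    # dobj marks the direct object of the verb
    if "dobj" in token.dep_:
      # token.subtree yields the token and its descendants in document order
      subtree = list(token.subtree)
      start = subtree[0].i
      end = subtree[-1].i + 1
      return doc[start:end]
  # No direct object was found
  return None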


We can now loop through the sentences and print out their subjects and objects:

In [13]:
for sentence in sentences:
  doc = nlp(sentence)
  subject_phrase = get_subject_phrase(doc)
  object_phrase = get_object_phrase(doc)
  print(subject_phrase)
  print(object_phrase)

The big black cat
None
Jane
her Brother


# How it works…
The code uses the spaCy engine to parse the sentence. Then, the subject function loops
through the tokens, and if a token's dependency tag contains subj, it returns the span
that covers that token's subtree as a Span object. There are different subject tags,
including nsubj for regular subjects and nsubjpass for subjects of passive sentences, so
we want to look for both.

The object function works exactly the same as the subject function, except it looks for the
token that has dobj (direct object) in its dependency tag. Since not all sentences have
direct objects, it returns None in those cases.
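
As a rough illustration of the two points above, the sketch below runs the same logic on a passive sentence. To keep it independent of any downloaded model, the parse is annotated by hand through the `Doc` constructor (the `heads` and `deps` values are assumptions about how such a sentence would be parsed, not model output), and `get_phrase` is a hypothetical helper that merges the two functions from this recipe.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def get_phrase(doc, label):
    """Hypothetical helper: return the span covering the subtree of the first
    token whose dependency tag contains `label`, or None if there is none."""
    for token in doc:
        if label in token.dep_:
            subtree = list(token.subtree)
            return doc[subtree[0].i : subtree[-1].i + 1]
    return None


# Hand-annotated parse of a passive sentence (assumed, not produced by a model).
# heads holds the absolute index of each token's head; the root points at itself.
words = ["The", "ball", "was", "thrown", "by", "Jane", "."]
heads = [1, 3, 3, 3, 3, 4, 3]
deps = ["det", "nsubjpass", "auxpass", "ROOT", "agent", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

passive_subject = get_phrase(doc, "subj")  # "subj" also matches nsubjpass
direct_object = get_phrase(doc, "dobj")    # this parse has no dobj token
print(passive_subject)
print(direct_object)
```

Matching on the substring subj is what lets one check cover both nsubj and nsubjpass. An exact comparison against nsubj would miss the passive subject here.
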

In step 1, we import spaCy, and in step 2, we load the spaCy engine. In step 3, we
initialize a list with the sentences we will be processing.

In step 4, we create the get_subject_phrase function, which gets the subject of the
sentence. It looks for the token that has a dependency tag that contains subj and then
returns the subtree that contains that token. There are several subject dependency tags,
including nsubj and nsubjpass (for a subject of a passive sentence), so we look for the
most general pattern.

In step 5, we create the get_object_phrase function, which gets the direct object of
the sentence. It works similarly to get_subject_phrase, but looks for the dobj
dependency tag instead of a tag that contains subj.

In step 6, we loop through the list of sentences we created in step 3, and use the preceding
functions to find the subjects and direct objects in the sentences. For the sentence The
big black cat stared at the small dog, the subject is the big black cat, and there is no direct
object (the small dog is the object of the preposition at). For the sentence Jane watched her
brother in the evenings, the subject is Jane and the direct object is her brother.
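
The remark about the small dog suggests that the same subtree approach could be pointed at a different dependency tag. The following is only a sketch under stated assumptions: `get_prepositional_object_phrases` is a hypothetical helper that is not part of this recipe, it assumes the parser labels objects of prepositions as pobj (as spaCy's English models do), and the parse is annotated by hand rather than produced by a model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def get_prepositional_object_phrases(doc):
    """Hypothetical helper: return a span for the subtree of every token
    tagged pobj (object of a preposition)."""
    phrases = []
    for token in doc:
        if token.dep_ == "pobj":
            subtree = list(token.subtree)
            phrases.append(doc[subtree[0].i : subtree[-1].i + 1])
    return phrases


# Hand-annotated parse of the first example sentence (assumed, not model output).
words = ["The", "big", "black", "cat", "stared", "at", "the", "small", "dog", "."]
heads = [3, 3, 3, 4, 4, 4, 8, 8, 5, 4]
deps = ["det", "amod", "amod", "nsubj", "ROOT", "prep", "det", "amod", "pobj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

pobj_phrases = [span.text for span in get_prepositional_object_phrases(doc)]
print(pobj_phrases)
```

Unlike the subject and direct object functions, this sketch collects every match instead of returning the first one, because a sentence can contain several prepositional phrases.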