# Lab1-Assignment

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

This notebook describes the assignment for Lab 1 of the text mining course. 

**Points**: each exercise is prefixed with the number of points you can obtain for the exercise.

We assume you have worked through the following notebooks:
* **Lab1.1-introduction**
* **Lab1.2-introduction-to-NLTK**
* **Lab1.3-introduction-to-spaCy** 

In this assignment, you will process an English text (**Lab1-apple-samsung-example.txt**) with both NLTK and spaCy and discuss the similarities and differences.

# Students
- Chileshe Lukwesa (2675080)
- Denise Mooren (2659193)
- Hasan Shahoud (2631087)
- Mohamad Abdulfattah (2608683)

## Credits
The notebooks in this block have been originally created by [Marten Postma](https://martenpostma.github.io). Adaptations were made by [Filip Ilievski](http://ilievski.nl).

## Tip: how to read a file from disk
Let's open the file **Lab1-apple-samsung-example.txt** from disk.

In [1]:
from pathlib import Path

In [2]:
cur_dir = Path().resolve() # this should provide you with the folder in which this notebook is placed
path_to_file = Path.joinpath(cur_dir, 'Lab1-apple-samsung-example.txt')
print(path_to_file)
print('does path exist? ->', Path.exists(path_to_file))

C:\Users\hasan\Documents\development\tm\lab_sessions\lab1\solved\Lab1-apple-samsung-example.txt
does path exist? -> True


If the output from the code cell above states that **does path exist? -> False**, please check that the file **Lab1-apple-samsung-example.txt** is in the same directory as this notebook.

In [3]:
with open(path_to_file) as infile:
    text = infile.read()

print('number of characters', len(text))

number of characters 1142


## [total points: 4] Exercise 1: NLTK
In this exercise, we use NLTK to apply **Part-of-speech (POS) tagging**, **Named Entity Recognition (NER)**, and **Constituency parsing**. The following code snippet already performs sentence splitting and tokenization. 

In [4]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk import word_tokenize

In [5]:
sentences_nltk = sent_tokenize(text)

In [6]:
tokens_per_sentence = []
for sentence_nltk in sentences_nltk:
    sent_tokens = word_tokenize(sentence_nltk)
    tokens_per_sentence.append(sent_tokens)

We will use lists to keep track of the output of the NLP tasks. We can hence inspect the output for each task using the index of the sentence.

In [7]:
sent_id = 1
print('SENTENCE', sentences_nltk[sent_id])
print('TOKENS', tokens_per_sentence[sent_id])

SENTENCE The six phones and tablets affected are the Galaxy S III, running the new Jelly Bean system, the Galaxy Tab 8.9 Wifi tablet, the Galaxy Tab 2 10.1, Galaxy Rugby Pro and Galaxy S III mini.
TOKENS ['The', 'six', 'phones', 'and', 'tablets', 'affected', 'are', 'the', 'Galaxy', 'S', 'III', ',', 'running', 'the', 'new', 'Jelly', 'Bean', 'system', ',', 'the', 'Galaxy', 'Tab', '8.9', 'Wifi', 'tablet', ',', 'the', 'Galaxy', 'Tab', '2', '10.1', ',', 'Galaxy', 'Rugby', 'Pro', 'and', 'Galaxy', 'S', 'III', 'mini', '.']


### [point: 1] Exercise 1a: Part-of-speech (POS) tagging
Use `nltk.pos_tag` to perform part-of-speech tagging on each sentence.

Use `print` to **show** the output in the notebook (and hence also in the exported PDF!).

In [8]:
pos_tags_per_sentence = []
for tokens in tokens_per_sentence:
    pos = nltk.pos_tag(tokens)
    print(pos, '\n')
    pos_tags_per_sentence.append(pos)

[('https', 'NN'), (':', ':'), ('//www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html', 'JJ'), ('Documents', 'NNS'), ('filed', 'VBN'), ('to', 'TO'), ('the', 'DT'), ('San', 'NNP'), ('Jose', 'NNP'), ('federal', 'JJ'), ('court', 'NN'), ('in', 'IN'), ('California', 'NNP'), ('on', 'IN'), ('November', 'NNP'), ('23', 'CD'), ('list', 'NN'), ('six', 'CD'), ('Samsung', 'NNP'), ('products', 'NNS'), ('running', 'VBG'), ('the', 'DT'), ('``', '``'), ('Jelly', 'RB'), ('Bean', 'NNP'), ("''", "''"), ('and', 'CC'), ('``', '``'), ('Ice', 'NNP'), ('Cream', 'NNP'), ('Sandwich', 'NNP'), ("''", "''"), ('operating', 'VBG'), ('systems', 'NNS'), (',', ','), ('which', 'WDT'), ('Apple', 'NNP'), ('claims', 'VBZ'), ('infringe', 'VB'), ('its', 'PRP$'), ('patents', 'NNS'), ('.', '.')] 

[('The', 'DT'), ('six', 'CD'), ('phones', 'NNS'), ('and', 'CC'), ('tablets', 'NNS'), ('affected', 'VBN'), ('are', 'VBP'), ('the', 'DT'), ('Galaxy', 'NNP'), ('S', 'NNP'), ('III', 'NN

### [point: 1] Exercise 1b: Named Entity Recognition (NER)
Use `nltk.chunk.ne_chunk` to perform Named Entity Recognition (NER) on each sentence.

Use `print` to **show** the output in the notebook (and hence also in the exported PDF!).

In [9]:
ner_tags_per_sentence = []
for tokens_pos_tagged in pos_tags_per_sentence:
  named_entities = nltk.chunk.ne_chunk(tokens_pos_tagged)
  print(named_entities, '\n')
  ner_tags_per_sentence.append(named_entities)

(S
  https/NN
  :/:
  //www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html/JJ
  Documents/NNS
  filed/VBN
  to/TO
  the/DT
  (ORGANIZATION San/NNP Jose/NNP)
  federal/JJ
  court/NN
  in/IN
  (GPE California/NNP)
  on/IN
  November/NNP
  23/CD
  list/NN
  six/CD
  (ORGANIZATION Samsung/NNP)
  products/NNS
  running/VBG
  the/DT
  ``/``
  Jelly/RB
  (GPE Bean/NNP)
  ''/''
  and/CC
  ``/``
  Ice/NNP
  Cream/NNP
  Sandwich/NNP
  ''/''
  operating/VBG
  systems/NNS
  ,/,
  which/WDT
  (PERSON Apple/NNP)
  claims/VBZ
  infringe/VB
  its/PRP$
  patents/NNS
  ./.) 

(S
  The/DT
  six/CD
  phones/NNS
  and/CC
  tablets/NNS
  affected/VBN
  are/VBP
  the/DT
  (ORGANIZATION Galaxy/NNP)
  S/NNP
  III/NNP
  ,/,
  running/VBG
  the/DT
  new/JJ
  (PERSON Jelly/NNP Bean/NNP)
  system/NN
  ,/,
  the/DT
  (ORGANIZATION Galaxy/NNP)
  Tab/NNP
  8.9/CD
  Wifi/NNP
  tablet/NN
  ,/,
  the/DT
  (ORGANIZATION Galaxy/NNP)
  Tab/NNP
  2/CD
  10.1/CD
  ,/,
  (

### [points: 2] Exercise 1c: Constituency parsing
Use the `nltk.RegexpParser` to perform constituency parsing on each sentence.

Use `print` to **show** the output in the notebook (and hence also in the exported PDF!).

In [10]:
constituent_parser = nltk.RegexpParser('''
NP: {<DT>? <JJ>* <NN>*} # NP
P: {<IN>}           # Preposition
V: {<V.*>}          # Verb
PP: {<P> <NP>}      # PP -> P NP
VP: {<V> <NP|PP>*}  # VP -> V (NP|PP)*''')

In [11]:
constituency_output_per_sentence = []
for entity in ner_tags_per_sentence:
  constituent_structure = constituent_parser.parse(entity)
  print(constituent_structure, '\n')
  constituency_output_per_sentence.append(constituent_structure)

(S
  (NP https/NN)
  :/:
  (NP
    //www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html/JJ)
  Documents/NNS
  (VP (V filed/VBN))
  to/TO
  (NP the/DT)
  (ORGANIZATION San/NNP Jose/NNP)
  (NP federal/JJ court/NN)
  (P in/IN)
  (GPE California/NNP)
  (P on/IN)
  November/NNP
  23/CD
  (NP list/NN)
  six/CD
  (ORGANIZATION Samsung/NNP)
  products/NNS
  (VP (V running/VBG) (NP the/DT))
  ``/``
  Jelly/RB
  (GPE Bean/NNP)
  ''/''
  and/CC
  ``/``
  Ice/NNP
  Cream/NNP
  Sandwich/NNP
  ''/''
  (VP (V operating/VBG))
  systems/NNS
  ,/,
  which/WDT
  (PERSON Apple/NNP)
  (VP (V claims/VBZ))
  (VP (V infringe/VB))
  its/PRP$
  patents/NNS
  ./.) 

(S
  (NP The/DT)
  six/CD
  phones/NNS
  and/CC
  tablets/NNS
  (VP (V affected/VBN))
  (VP (V are/VBP) (NP the/DT))
  (ORGANIZATION Galaxy/NNP)
  S/NNP
  III/NNP
  ,/,
  (VP (V running/VBG) (NP the/DT new/JJ))
  (PERSON Jelly/NNP Bean/NNP)
  (NP system/NN)
  ,/,
  (NP the/DT)
  (ORGANIZATION Gala

Augment the RegexpParser so that it also detects Named Entity Phrases (NEP), e.g., that it detects *Galaxy S III* and *Ice Cream Sandwich*

In [12]:
constituent_parser_v2 = nltk.RegexpParser('''
NP: {<DT>? <JJ>* <NN>*} # NP
P: {<IN>}           # Preposition
V: {<V.*>}          # Verb
PP: {<P> <NP>}      # PP -> P NP
VP: {<V> <NP|PP>*}  # VP -> V (NP|PP)*
NEP: {<NNP>*}             # NEP -> NNP''')

In [13]:
constituency_v2_output_per_sentence = []
for entity in ner_tags_per_sentence:
  constituent_structure = constituent_parser_v2.parse(entity)
  # Note: NLTK recognizes some NNP as an organization or person, therefore they are not nested under NEP properly.
  print(constituent_structure)
  constituency_v2_output_per_sentence.append(constituent_structure)

(S
  (NP https/NN)
  :/:
  (NP
    //www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html/JJ)
  Documents/NNS
  (VP (V filed/VBN))
  to/TO
  (NP the/DT)
  (ORGANIZATION San/NNP Jose/NNP)
  (NP federal/JJ court/NN)
  (P in/IN)
  (GPE California/NNP)
  (P on/IN)
  (NEP November/NNP)
  23/CD
  (NP list/NN)
  six/CD
  (ORGANIZATION Samsung/NNP)
  products/NNS
  (VP (V running/VBG) (NP the/DT))
  ``/``
  Jelly/RB
  (GPE Bean/NNP)
  ''/''
  and/CC
  ``/``
  (NEP Ice/NNP Cream/NNP Sandwich/NNP)
  ''/''
  (VP (V operating/VBG))
  systems/NNS
  ,/,
  which/WDT
  (PERSON Apple/NNP)
  (VP (V claims/VBZ))
  (VP (V infringe/VB))
  its/PRP$
  patents/NNS
  ./.)
(S
  (NP The/DT)
  six/CD
  phones/NNS
  and/CC
  tablets/NNS
  (VP (V affected/VBN))
  (VP (V are/VBP) (NP the/DT))
  (ORGANIZATION Galaxy/NNP)
  (NEP S/NNP III/NNP)
  ,/,
  (VP (V running/VBG) (NP the/DT new/JJ))
  (PERSON Jelly/NNP Bean/NNP)
  (NP system/NN)
  ,/,
  (NP the/DT)
  (ORGANIZ

## [total points: 1] Exercise 2: spaCy
Use Spacy to process the same text as you analyzed with NLTK.

In [14]:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

In [15]:
doc = nlp(text) # insert code here

In [16]:
sentences_spacy = list(doc.sents)

In [17]:
# print each sentence, and its tokens with their pos.
for sent in sentences_spacy:
  print('SENTENCE: ', sent, '\n')
  for token in sent:
    print('TOKEN: "{}"'.format(token), token.pos_, token.tag_)


SENTENCE:  https://www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html

Documents filed to the San Jose federal court in California on November 23 list six Samsung products running the "Jelly Bean" and "Ice Cream Sandwich" operating systems, which Apple claims infringe its patents. 

TOKEN: "https://www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html" PROPN NNP
TOKEN: "

" SPACE _SP
TOKEN: "Documents" NOUN NNS
TOKEN: "filed" VERB VBD
TOKEN: "to" ADP IN
TOKEN: "the" DET DT
TOKEN: "San" PROPN NNP
TOKEN: "Jose" PROPN NNP
TOKEN: "federal" ADJ JJ
TOKEN: "court" NOUN NN
TOKEN: "in" ADP IN
TOKEN: "California" PROPN NNP
TOKEN: "on" ADP IN
TOKEN: "November" PROPN NNP
TOKEN: "23" NUM CD
TOKEN: "list" NOUN NN
TOKEN: "six" NUM CD
TOKEN: "Samsung" PROPN NNP
TOKEN: "products" NOUN NNS
TOKEN: "running" VERB VBG
TOKEN: "the" DET DT
TOKEN: """ PUNCT ``
TOKEN: "Jelly" PROPN NNP
TOKEN: "Bean" PROPN N

In [18]:
# Named entity recognition
displacy.render(doc, jupyter=True, style='ent')

In [19]:
# Dependency and Constituency parsing
for sent in sentences_spacy:
  print('SENTENCE: ', sent, '\n')
  for token in sent:
    print('TOKEN: "{}"'.format(token), token.dep_, token.head)

SENTENCE:  https://www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html

Documents filed to the San Jose federal court in California on November 23 list six Samsung products running the "Jelly Bean" and "Ice Cream Sandwich" operating systems, which Apple claims infringe its patents. 

TOKEN: "https://www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html" compound 


TOKEN: "

" dep 


TOKEN: "Documents" appos 


TOKEN: "filed" acl Documents
TOKEN: "to" prep filed
TOKEN: "the" det court
TOKEN: "San" nmod Jose
TOKEN: "Jose" nmod court
TOKEN: "federal" amod court
TOKEN: "court" pobj to
TOKEN: "in" prep court
TOKEN: "California" pobj in
TOKEN: "on" prep filed
TOKEN: "November" pobj on
TOKEN: "23" nummod November
TOKEN: "list" appos 


TOKEN: "six" nummod products
TOKEN: "Samsung" compound products
TOKEN: "products" appos list
TOKEN: "running" acl products
TOKEN: "the" det Bean
TOKEN: """ 

In [20]:
# The drawing is huge and doesn't fit the width of the screen, therefore, we add prints in the previous cell.

small tip: You can use **sents = list(doc.sents)** to be able to use the index to access a sentence like **sents[2]** for the third sentence.


## [total points: 7] Exercise 3: Comparison NLTK and spaCy
We will now compare the output of NLTK and spaCy, i.e., in what do they differ?

### [points: 3] Exercise 3a: Part of speech tagging
Compare the output from NLTK and spaCy regarding part of speech tagging.

* To compare, you probably would like to compare sentence per sentence. Describe if the sentence splitting is different for NLTK than for spaCy. If not, where do they differ?
* After checking the sentence splitting, select a sentence for which you expect interesting results and perhaps differences. Motivate your choice.
* Compare the output in `token.tag` from spaCy to the part of speech tagging from NLTK for each token in your selected sentence. Are there any differences? This is not a trick question; it is possible that there are no differences.

In [21]:
sents_nltk_len = len(sentences_nltk)
# remove the sentence that contains only the new line string.
sentences_spacy = [s for s in sentences_spacy if s.text != '\n']
sents_spacy_len = len(sentences_spacy)
max_sents_len = min(sents_nltk_len, sents_spacy_len)

In [22]:
for i in range(max_sents_len):
  print(i+1)
  if sents_nltk_len > i:
    print('NLTK: ', sentences_nltk[i], '\n')
  if sents_spacy_len > i:
    print('SPACY: ', sentences_spacy[i], '\n')  

1
NLTK:  https://www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html

Documents filed to the San Jose federal court in California on November 23 list six Samsung products running the "Jelly Bean" and "Ice Cream Sandwich" operating systems, which Apple claims infringe its patents. 

SPACY:  https://www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html

Documents filed to the San Jose federal court in California on November 23 list six Samsung products running the "Jelly Bean" and "Ice Cream Sandwich" operating systems, which Apple claims infringe its patents. 

2
NLTK:  The six phones and tablets affected are the Galaxy S III, running the new Jelly Bean system, the Galaxy Tab 8.9 Wifi tablet, the Galaxy Tab 2 10.1, Galaxy Rugby Pro and Galaxy S III mini. 

SPACY:  The six phones and tablets affected are the Galaxy S III, running the new Jelly Bean system, the Galaxy Tab 8.9 Wifi table

**We select the very firs sentence. This is because it contains a URL, names of people and companies and locations.**

In [23]:
selected_sent_nltk = pos_tags_per_sentence[0]
selected_sent_spacy = sentences_spacy[0]
print("SPACY and NLTK POS TAGGING: ")
for tag in selected_sent_nltk:
  print("NLTK: ", tag)
for token in selected_sent_spacy:
  print('SPACY: ', token.text, token.tag_,)

SPACY and NLTK POS TAGGING: 
NLTK:  ('https', 'NN')
NLTK:  (':', ':')
NLTK:  ('//www.telegraph.co.uk/technology/apple/9702716/Apple-Samsung-lawsuit-six-more-products-under-scrutiny.html', 'JJ')
NLTK:  ('Documents', 'NNS')
NLTK:  ('filed', 'VBN')
NLTK:  ('to', 'TO')
NLTK:  ('the', 'DT')
NLTK:  ('San', 'NNP')
NLTK:  ('Jose', 'NNP')
NLTK:  ('federal', 'JJ')
NLTK:  ('court', 'NN')
NLTK:  ('in', 'IN')
NLTK:  ('California', 'NNP')
NLTK:  ('on', 'IN')
NLTK:  ('November', 'NNP')
NLTK:  ('23', 'CD')
NLTK:  ('list', 'NN')
NLTK:  ('six', 'CD')
NLTK:  ('Samsung', 'NNP')
NLTK:  ('products', 'NNS')
NLTK:  ('running', 'VBG')
NLTK:  ('the', 'DT')
NLTK:  ('``', '``')
NLTK:  ('Jelly', 'RB')
NLTK:  ('Bean', 'NNP')
NLTK:  ("''", "''")
NLTK:  ('and', 'CC')
NLTK:  ('``', '``')
NLTK:  ('Ice', 'NNP')
NLTK:  ('Cream', 'NNP')
NLTK:  ('Sandwich', 'NNP')
NLTK:  ("''", "''")
NLTK:  ('operating', 'VBG')
NLTK:  ('systems', 'NNS')
NLTK:  (',', ',')
NLTK:  ('which', 'WDT')
NLTK:  ('Apple', 'NNP')
NLTK:  ('claims', 'VB

As expected, there are several differences between Spacy and NLTK taggings. Most noticeable are:
1. URL: spacy considers the inputted URL as a **proper noun (NNP)**, while NLTK splits it into two different parts: (1) a singular noun (NN) and an adjective (JJ). This later is most likely due to the semi-colon (:). 
2. Nouns: NLTK confused the _"Jelly Bean"_ noun, despite the surrounding quotes. It assumed that Jelly was an adverb (RB) and Bean a proper noun (NNP). Whereas Spacy seemed to assign the correct tags to both (NNP).
3. Verbs: again NLTK couldn't extract the proper PoS of the subsentence: _"which Apple claims infringe its patents."_ It considered **claims** as a 3rd person present tense verb (VBZ) and **infringe** non-3rd person present tense verb (VBP). On the other hand, spacy was smart enough to recognize that "Apple claims" were nouns; the first singular and second plural (NNP and NNPS respectively.) As a result, *infringe* was considered as a non-3rd person singular present verb (VBP), which obviously is correct.
4. Perhaps not so important, but NLTK assigns TO to the preposition _"to"_, while spacy assigns IN (preposition or subordinating conjunction) to it.

### [points: 2] Exercise 3b: Named Entity Recognition (NER)
* Describe differences between the output from NLTK and spaCy for Named Entity Recognition. Which one do you think performs better?


Overall, spaCy performs better than NLTK at Named Entity Recognition. That is, while spaCy consistently recognizes 'Apple' and 'Samsung' as organizations, NLTK labels these as persons and/or geo-political entities.

Some other detailed examples of the differences between Spacy and NLTK:

- GPE: the city San Jose is recognized by the spacy NER as a Geopolitical entity(GPE) whilst the NLTK NER recognizes it as an organization. However, when it comes to California for example the NLTK NER does read it as a GPE. Additionally, the spacy NER also mislabels 'the Galaxy S III' as a Geopolitical entity (GPE).
- DATE: the NLTK NER does not seem to recognize the date 'November 23', instead it classifies the word 'November' without the number as a Proper Noun, Singular(NNP) and '23' as a Cardinal Number(CD). The spacy NER does label it as DATE.
- ORG: When it comes to organizations the spacy NER also seems to perform better than the NLTK. For instance, the company 'Apple' in the first sentence is recoginzed by the NLTK NER as a Person. In the third sentence the NLTK NER recognizes 'Apple' as a GPE, which is still incorrect.
- PERSON: As described above the NLTK NER sees 'Apple' incorrectly as a Person. The same goes for 'Galaxy Rugby Pro', 'Galaxy S', 'Samsung'.
- NORP: The word 'South Korean' is recognized as a location by the NLTK NER whilst also still being labeled correctly as an Adjective (JJ) to the word 'firm'. The spacy NER labels it as a Nationalities or religious or political groups(NORP), which also comes somewhat close to the actual meaning.
- MONEY: The spacy NER has correcly labeled the '1.05bn' as money. The NLTK NER on the other hand does not seem to have a label for money so it only recognizes it as a Cardinal Number(CD).

### [points: 2] Exercise 3c: Constituency/dependency parsing
Choose one sentence from the text and run constituency parsing using NLTK and dependency parsing using spaCy.
* describe briefly the difference between constituency parsing and dependency parsing
* describe differences between the output from NLTK and spaCy.

In [24]:
nltk_sentence = pos_tags_per_sentence[2]
parser_output = constituent_parser.parse(nltk_sentence)
print(parser_output)

(S
  Apple/NNP
  (VP (V stated/VBD))
  it/PRP
  (VP (V had/VBD))
  (VP (V â€œacted/VBN))
  quickly/RB
  and/CC
  diligently/RB
  ''/''
  (PP (P in/IN) (NP order/NN))
  to/TO
  ``/``
  (VP (V determine/VB) (PP (P that/IN) (NP these/DT)))
  newly/RB
  (VP (V released/VBN))
  products/NNS
  (VP (V do/VBP))
  (VP
    (V infringe/VB)
    (NP many/JJ)
    (PP (P of/IN) (NP the/DT same/JJ)))
  claims/NNS
  already/RB
  (VP (V asserted/VBN))
  (P by/IN)
  Apple/NNP
  ./.
  ''/'')


In [25]:
sents = list(doc.sents)
spacy_sentence = sents[3]

for token in spacy_sentence:
    print(token.text, token.dep_, token.head)


 dep stated
Apple nsubj stated
stated ROOT stated
it nsubj â€œacted
had aux â€œacted
â€œacted ccomp stated
quickly advmod â€œacted
and cc quickly
diligently conj quickly
" punct â€œacted
in prep â€œacted
order pobj in
to aux determine
" punct determine
determine acl order
that mark infringe
these det products
newly advmod released
released amod products
products nsubj infringe
do aux infringe
infringe ccomp determine
many dobj infringe
of prep many
the det claims
same amod claims
claims pobj of
already advmod asserted
asserted acl claims
by agent asserted
Apple pobj by
. punct stated
" punct stated


**Describe briefly the difference between constituency parsing and dependency parsing**  
Constitiuency parsing identifies the distinct phrases that make up a sentence e.g. noun phrases, verb phrases, etc. while dependency parsing indicates the syntactic function of words in a sentence e.g. subject, object, etc. as well as mapping how individual words and phrases relate to one another in the sentence.

**Describe differences between the output from NLTK and spaCy.**  
The output from NLTK marks out the distinct phrases and sub-phrases contained within the full sentence using nesting. The output from spaCy, on the other hand, indicates the syntactic function of individual tokens and only relates them to the token that is the head of the phrase that the token appears in, without indicating the order of phrases in the sentence. For instance, *Apple*  is recognized by Spacy as a noun subject, whereas by *NLTK* as an NNP (proper noun.)


# End of this notebook