## Activity 1 (Topic 2): Named Entity Recognition (NER) using spaCy
---
This exercise demonstrates how spaCy can be used to extract named entities from documents. It also gives participants some experience in testing and adjusting the parameters of spaCy's NER tools.

### Step 1: Importing and Loading spaCy
---

Start by importing spaCy and loading a trained pipeline with `spacy.load()`. `en_core_web_sm` is spaCy's small English pipeline and has to be installed first, for example with `python -m spacy download en_core_web_sm`. Calling the loaded `nlp` object on a string returns a `Doc` made up of individual tokens, which we can iterate over:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text)
```

```
Apple
is
looking
at
buying
U.K.
startup
for
$
1
billion
```
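A trained pipeline such as `en_core_web_sm` bundles several components; on spaCy v3 its `nlp.pipe_names` typically lists components such as `tagger`, `parser` and `ner`. For comparison, `spacy.blank("en")` builds an English pipeline that contains only the rule-based tokenizer. The sketch below assumes spaCy v3 and uses the blank pipeline (the name `nlp_blank` is just illustrative) so it runs without downloading a model; it suggests that tokenization alone already yields the tokens shown above.

```python
import spacy

# Assumption: a blank English pipeline, used so this runs without en_core_web_sm.
# It has the language-specific tokenizer but no trained components.
nlp_blank = spacy.blank("en")
print(nlp_blank.pipe_names)  # no tagger, parser or NER in a blank pipeline

doc = nlp_blank("Apple is looking at buying U.K. startup for $1 billion")
tokens = [token.text for token in doc]
print(tokens)
```

Part-of-speech tags, dependencies and entities, by contrast, come from the trained components, which is why the following steps load `en_core_web_sm`.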


### Step 2: Linguistic Annotations in spaCy
---
spaCy provides a variety of linguistic annotations to give you insights into a text’s grammatical structure. This includes the word types, like the parts of speech, and how the words are related to each other. For example, if you’re analyzing text, it makes a huge difference whether a noun is the subject of a sentence, or the object – or whether “google” is used as a verb, or refers to the website or company in a specific context.

```python
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text, token.pos_, token.dep_)
```

```
Apple PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.K. PROPN dobj
startup NOUN dep
for ADP prep
$ SYM quantmod
1 NUM compound
billion NUM pobj
```
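The abbreviations in this output (`token.pos_` is the coarse part-of-speech tag, `token.dep_` the dependency label) can be looked up with `spacy.explain()`. A small sketch that decodes the labels seen above; the label lists are simply copied from that output, and no trained pipeline is needed because `spacy.explain()` reads spaCy's built-in glossary.

```python
import spacy

# Labels copied from the Step 2 output above.
pos_labels = ["PROPN", "AUX", "VERB", "ADP", "NOUN", "SYM", "NUM"]
dep_labels = ["nsubj", "aux", "prep", "pcomp", "dobj", "dep", "quantmod", "compound", "pobj"]

explanations = {label: spacy.explain(label) for label in pos_labels + dep_labels}
for label, meaning in explanations.items():
    # spacy.explain() returns None for labels that are not in the glossary
    print(f"{label:<10}{meaning}")
```

With the trained pipeline loaded, `token.head.text` gives the word each token attaches to, which together with `token.dep_` describes the full dependency tree.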


### Step 3: Tokenization in spaCy
---
During processing, spaCy first tokenizes the text, i.e. segments it into words, punctuation and so on. This is done by applying rules specific to each language. For example, punctuation at the end of a sentence should be split off – whereas “U.K.” should remain one token. Each Doc consists of individual tokens, and we can iterate over them:

```python
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text)
```

```
Apple
is
looking
at
buying
U.K.
startup
for
$
1
billion
```
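spaCy's tokenization is non-destructive: each token records its character offset (`token.idx`) and any whitespace that followed it (`token.whitespace_`), so the original text can always be rebuilt from the tokens. The sketch below assumes a tokenizer-only `spacy.blank("en")` pipeline and appends an extra sentence ("Really?", added for illustration) to show end-of-sentence punctuation being split off while "U.K." stays whole.

```python
import spacy

# Assumption: tokenizer-only pipeline, so no trained model is required.
nlp_blank = spacy.blank("en")
doc = nlp_blank("Apple is looking at buying U.K. startup for $1 billion. Really?")

for token in doc:
    # i: position in the Doc; idx: character offset in the original string
    print(token.i, token.idx, repr(token.text), token.is_punct, token.like_num)

texts = [token.text for token in doc]

# Joining each token's text plus its trailing whitespace restores the input.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == doc.text)
```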


### Step 4: Named Entity Recognition using spaCy
---
A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction. Because models are statistical and strongly depend on the examples they were trained on, this doesn’t always work perfectly and might need some tuning later, depending on your use case.

Named entities are available as the ents property of a Doc:

```python
import spacy

def show_ents(doc):
    """Print each entity with its character offsets, label and label description."""
    if doc.ents:
        for ent in doc.ents:
            print(f"{ent.text} - {ent.start_char} - {ent.end_char} - "
                  f"{ent.label_} - {spacy.explain(ent.label_)}")
    else:
        print("No Named entities Found")
```

```python
import spacy

ner = spacy.load("en_core_web_sm")
doc = ner("Doc Raga is teaching in AI-NLP in Batangas City, today (October 26, 2023)")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

print("\n\n")
doc2 = ner("May I go to Washington, DC next May to see the Washington Monument?")
show_ents(doc2)
```

```
Doc Raga 0 8 PERSON
AI-NLP 24 30 ORG
Batangas City 34 47 GPE
today 49 54 DATE
October 26, 2023 56 72 DATE



Washington, DC - 12 - 26 - GPE - Countries, cities, states
next May - 27 - 35 - DATE - Absolute or relative dates or periods
the Washington Monument - 43 - 66 - ORG - Companies, agencies, institutions, etc.
```
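In the output above, the model labels "the Washington Monument" as `ORG`, although the `FAC` label (buildings, airports, highways, bridges, etc.) would fit better. This is the kind of tuning mentioned at the start of this step. One option is spaCy's rule-based `entity_ruler` component, which assigns labels from hand-written patterns. A minimal sketch, assuming spaCy v3 and a blank English pipeline so it runs without the model; the two patterns are illustrative, not a complete rule set.

```python
import spacy

# Assumption: blank pipeline + entity_ruler only; no statistical NER involved.
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase pattern: matches the exact token sequence "Washington Monument"
    {"label": "FAC", "pattern": "Washington Monument"},
    # Token pattern: "washington" in any casing, a comma, then "DC"
    {"label": "GPE", "pattern": [{"LOWER": "washington"}, {"TEXT": ","}, {"TEXT": "DC"}]},
])

doc = nlp_rules("May I go to Washington, DC next May to see the Washington Monument?")
found = [(ent.text, ent.label_) for ent in doc.ents]
for ent in doc.ents:
    # start_char/end_char slice the original text back to the entity
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```

To combine rules with the statistical model, the ruler can instead be added to the loaded pipeline with `ner.add_pipe("entity_ruler", before="ner")`; the entity recognizer then keeps the rule-based spans and adjusts its own predictions around them.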


### Step 5: Interface for Testing Named Entity Recognition using spaCy
---

Enter a sample text and check whether spaCy can identify the named entities it contains. Enter `?` to end the loop.
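The interactive loop needs a running notebook and someone at the keyboard. For checking several sentences at once, a non-interactive variant may be handier: `nlp.pipe()` processes a list of texts as a stream, and `displacy.render(..., jupyter=False)` returns the visualization as an HTML string instead of displaying it. The sketch below assumes spaCy v3 and, so that it runs without the model, uses a blank pipeline with a hypothetical `entity_ruler` standing in for `ner`; with the real pipeline the entities would come from the model's predictions instead.

```python
import spacy
from spacy import displacy

# Assumption: stand-in pipeline with two illustrative patterns; replace
# nlp_demo with the `ner` pipeline loaded above for real predictions.
nlp_demo = spacy.blank("en")
nlp_demo.add_pipe("entity_ruler").add_patterns([
    {"label": "GPE", "pattern": "Batangas City"},
    {"label": "ORG", "pattern": "Denver Nuggets"},
])

texts = [
    "Doc Raga is teaching in Batangas City",
    "It was a great moment for the Denver Nuggets",
    "This is a sample text",
]

results = {}
html_pages = []
for doc in nlp_demo.pipe(texts):
    results[doc.text] = [(ent.text, ent.label_) for ent in doc.ents]
    if doc.ents:  # displaCy warns when asked to render a Doc without entities
        # jupyter=False returns the markup as a string instead of displaying it
        html_pages.append(displacy.render(doc, style="ent", jupyter=False))

for text, ents in results.items():
    print(text, "->", ents or "No Named entities Found")
```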

```python
from spacy import displacy

# Reuses `ner` and `show_ents` from the cells above. Enter "?" to stop.
while True:
    sample_text = input('Enter a sample text (? to quit): ')
    if sample_text == '?':
        break  # exit before "?" itself is sent to the pipeline
    print('Raw Text Data: \n', sample_text, '\n')
    doc = ner(sample_text)
    show_ents(doc)
    print('\n')
    displacy.render(doc, style="ent", jupyter=True)
print("Program ended")
```

Raw Text Data: 
 DENVER — Michael Malone thought of his father, legendary coach Brendan Malone, who recently passed away. The son wished his dad could have been at Ball Arena on Tuesday night to witness it all, the moment each of them wanted. Jamal Murray said the celebration of a championship was a lot better than he thought it would be. He, of course, was itching to just get to real basketball. And collectively, it was a great moment for the Denver Nuggets, watching a banner go high into the rafters.  Christian Braun and DeAndre Jordan allowed reporters to see and examine the bulky rings after the Nuggets took care of the Los Angeles Lakers in a 119-107 win before a sold-out and animated crowd. And the crowd, well, toward the end of Denver’s first win of the season, it chanted, “Who’s your daddy?” at the Lakers, an ode to a little trash talk from the championship parade in June.  For sure, there were plenty of extras Tuesday night as the curtain rose on another NBA season. But if the