
# Named Entity Recognition (NER) with spaCy

In [1]:
import spacy

## Download NER model

We'll start by downloading the model. The available models are listed at [spaCy models](https://spacy.io/models).

In [2]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 32.1 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [3]:
nlp = spacy.load('en_core_web_sm')


In [4]:
txt = ("Given the recent downturn in stocks especially in tech which is likely to persist as yields keep going up, "
       "I thought it would be prudent to share the risks of investing in ARK ETFs, written up very nicely by "
       "[The Bear Cave](https://thebearcave.substack.com/p/special-edition-will-ark-invest-blow). The risks comes "
       "primarily from ARK's illiquid and very large holdings in small cap companies. ARK is forced to sell its "
       "holdings whenever its liquid ETF gets hit with outflows as is especially the case in market downturns. "
       "This could force very painful liquidations at unfavorable prices and the ensuing crash goes into a "
       "positive feedback loop leading into a death spiral enticing even more outflows and predatory shorts.")


### Perform and display NER

In [5]:
doc = nlp(txt)

In [8]:
from spacy import displacy
displacy.render(doc, style='ent')

'<div class="entities" style="line-height: 2.5; direction: ltr">Given the recent downturn in stocks especially in tech which is likely to persist as yields keep going up, I thought it would be prudent to share the risks of investing in \n<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    ARK\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">GPE</span>\n</mark>\n ETFs, written up very nicely by [\n<mark class="entity" style="background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    The Bear Cave](https://thebearcave.substack.com/p\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">ORG</span>\n</mark>\n/special-edition-will-ark-invest-blow). The risks comes primarily from \n<mark c

## Extract Entity

In [10]:
spacy.explain('GPE')

'Countries, cities, states'

In [15]:
doc.ents[0].label_

'GPE'

In [16]:
doc.ents[0].text

'ARK'

In [17]:
for entity in doc.ents:
    print(f"{entity.label_}: {entity.text}")

GPE: ARK
ORG: The Bear Cave](https://thebearcave.substack.com/p
ORG: ARK
ORG: ARK
ORG: ETF
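
The second entity in the output above absorbed part of the Markdown link syntax (`The Bear Cave](https://thebearcave.substack.com/p`), because the model was given raw Markdown rather than plain prose. One possible remedy, sketched here under the assumption that the input only contains inline links of the form `[label](url)`, is to replace each link with its label before calling `nlp`. The helper name `strip_markdown_links` and the regular expression are illustrative and not part of spaCy.

```python
import re

# Matches inline Markdown links such as [label](https://example.com).
# Assumption: labels contain no "]" and URLs contain no ")" or whitespace.
MD_LINK = re.compile(r"\[([^\]]+)\]\((?:[^)\s]+)\)")


def strip_markdown_links(text: str) -> str:
    """Replace every inline Markdown link with its visible label."""
    return MD_LINK.sub(r"\1", text)


sample = ("written up very nicely by [The Bear Cave]"
          "(https://thebearcave.substack.com/p/special-edition-will-ark-invest-blow). "
          "The risks")
print(strip_markdown_links(sample))
# written up very nicely by The Bear Cave. The risks
```

Passing `strip_markdown_links(txt)` to `nlp` instead of `txt` gives the model plain prose. Whether it then tags `The Bear Cave` as a single ORG still depends on the model and its version.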


We're almost there. Now we filter out every entity that is not labeled ORG and append the remaining ones to a list of organizations. Note that the first mention of ARK was labeled GPE above, so this filter drops it:

In [18]:
# initialize our list
org_list = []

for entity in doc.ents:
    # if label_ is ORG, we append text, otherwise ignore
    if entity.label_ == 'ORG':
        org_list.append(entity.text)

org_list

['The Bear Cave](https://thebearcave.substack.com/p', 'ARK', 'ARK', 'ETF']
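
The list above contains `'ARK'` twice, because every mention in the text yields its own entity. If the goal is to know which organizations appear and how often, a small post-processing step can help. The following is a minimal sketch using only the standard library; the `org_list` literal is copied from the output above so the block runs on its own, and the names `org_counts` and `unique_orgs` are illustrative.

```python
from collections import Counter

# Copied from the output of the cell above.
org_list = ['The Bear Cave](https://thebearcave.substack.com/p', 'ARK', 'ARK', 'ETF']

# Counter tallies how many times each organization string was found.
org_counts = Counter(org_list)

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_orgs = list(dict.fromkeys(org_list))

print(org_counts.most_common(1))  # [('ARK', 2)]
print(unique_orgs)
```

`dict.fromkeys` is used instead of `set` so that the organizations stay in the order in which they first appear in the text.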

## Assignment

In [19]:
text = "Apple reached an all-time high stock price of 143 dollars this January."


In [24]:
doc = nlp(text)
displacy.render(doc, style='ent')

'<div class="entities" style="line-height: 2.5; direction: ltr">\n<mark class="entity" style="background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Apple\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">ORG</span>\n</mark>\n reached an all-time high stock price of \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    143 dollars\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">MONEY</span>\n</mark>\n \n<mark class="entity" style="background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    this January\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-le

In [25]:
for entity in doc.ents:
    print(f"{entity.label_}: {entity.text}")

ORG: Apple
MONEY: 143 dollars
DATE: this January


In [26]:
# initialize our list
org_list = []

for entity in doc.ents:
    # if label_ is ORG, we append text, otherwise ignore
    if entity.label_ == 'ORG':
        org_list.append(entity.text)

org_list

['Apple']
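
The same filtering loop now appears twice in this notebook, so it may be worth wrapping in a function. The sketch below assumes only that its argument exposes `.ents`, and that each entity has `.label_` and `.text`, as a spaCy `Doc` does. The function name `extract_entities` is hypothetical, and `fake_doc` is a stand-in built from the entities printed above so that the example runs without a downloaded model.

```python
from types import SimpleNamespace


def extract_entities(doc, label="ORG"):
    """Return the text of every entity in `doc` whose label equals `label`."""
    return [ent.text for ent in doc.ents if ent.label_ == label]


# Stand-in for a spaCy Doc, mirroring the entities printed above.
# With spaCy loaded, pass the real `doc` instead.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(label_="ORG", text="Apple"),
    SimpleNamespace(label_="MONEY", text="143 dollars"),
    SimpleNamespace(label_="DATE", text="this January"),
])

print(extract_entities(fake_doc))           # ['Apple']
print(extract_entities(fake_doc, "MONEY"))  # ['143 dollars']
```

With the real pipeline, `extract_entities(nlp(text))` should be equivalent to the loop above, and the `label` argument makes it easy to pull out other entity types such as `MONEY` or `DATE`.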