# spaCy: Dependency Parsing + NER

spaCy is a free, open-source library for [Natural Language Processing](https://spacy.io/usage/linguistic-features) in Python. It features named entity recognition (NER), part-of-speech (POS) tagging, dependency parsing, word vectors and more.


In [None]:
import spacy
from spacy import displacy

In [None]:
# Load the English language model
# DOCUMENTATION:  https://spacy.io/api/top-level
nlp = spacy.load("en_core_web_sm")

In [None]:
#sentence = input('Enter a sentence: ')

In [None]:
#print(sentence)

In [None]:
sentence = "Issa and Iyvonne went to Inglewood. They work at HBO."


In [None]:
doc = nlp(sentence)

In [None]:
len(doc)

## Dependency Parsing

Tokenization and dependency parsing both happen by default when the pipeline is called on a text: <br>
```doc = nlp(sentence)```
<br>
<br>
 Docs: [Dependency Parser](https://spacy.io/api/dependencyparser)
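
To make the parser's per-token output concrete without loading a trained model, here is a minimal sketch that builds a `Doc` by hand. The heads and labels are written out manually for illustration (they are an assumption, not the output of `en_core_web_sm`), and the names `toy_doc` and `nlp_blank` are hypothetical.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: tokenizer and vocabulary only, no trained parser.
nlp_blank = spacy.blank("en")

words = ["They", "work", "at", "HBO"]
# Hand-written annotation for illustration: heads are absolute token indices.
heads = [1, 1, 1, 2]
deps = ["nsubj", "ROOT", "prep", "pobj"]

toy_doc = Doc(nlp_blank.vocab, words=words, heads=heads, deps=deps)

for token in toy_doc:
    # dep_ is the readable label; dep (no underscore) is its integer ID.
    print(f"{token.text:<5} {token.dep_:<6} head={token.head.text}")
```

The same attributes (`token.dep_`, `token.head`) are available on a `Doc` produced by a trained pipeline, where the parser fills them in.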





In [None]:
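# View some parser-set Token attributes: token.dep_ (relation label), token.head (syntactic parent)
# Note: token.dep (no underscore) is the integer ID of the label; token.dep_ is the readable string
for token in doc:
  print(f"{token.text}: {token.dep_} (head: {token.head.text})")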

## Dependency Parsing Visualization
Using `displaCy` from `spaCy`, we can visualize the dependency parsing tree with:


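*   `displacy.render()`: renders the visualization inline in a notebook, or returns the markup as a string
*   `displacy.serve()`: starts a local web server so the visualization can be viewed in a browser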

The visualization can be customized using `options`.

<br> Docs: [Visualization](https://spacy.io/usage/visualizers)
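
Outside a notebook, `displacy.render()` can return the SVG markup as a string instead of displaying it. The sketch below assumes a hand-annotated `Doc` (the heads and labels are illustrative, not model output) so that it runs without a trained pipeline; `viz_doc` and `svg` are hypothetical names.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

# Hand-annotated Doc, used only so the example needs no trained model.
vocab = spacy.blank("en").vocab
viz_doc = Doc(
    vocab,
    words=["They", "work", "at", "HBO"],
    heads=[1, 1, 1, 2],
    deps=["nsubj", "ROOT", "prep", "pobj"],
)

# jupyter=False forces render() to return the markup rather than display it.
svg = displacy.render(viz_doc, style="dep", jupyter=False, options={"compact": True})

print(svg[:40])
```

The returned string can be written to an `.svg` file or embedded in an HTML page.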

In [None]:
# displacy.serve(doc,style='dep')

In [None]:
# Use .render() in Colab/Jupyter
displacy.render(doc, style="dep", jupyter=True)

In [None]:
# Styled visualization
options = {"compact": True, "bg": "#09a3d5",
           "color": "white", "font": "Source Sans Pro"}
displacy.render(doc, style="dep", jupyter=True, options=options)

## Task: Named Entity Recognition (NER)
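
Entities can be read either token by token (`token.ent_type_`) or as whole spans (`doc.ents`). The difference matters for multi-token names. The sketch below uses a hand-labeled `Doc` with IOB tags (an assumption for illustration, not model output); `ner_doc` is a hypothetical name.

```python
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-written IOB entity tags: B- begins an entity, I- continues it, O is outside.
ner_doc = Doc(
    vocab,
    words=["Issa", "Rae", "works", "at", "HBO"],
    ents=["B-PERSON", "I-PERSON", "O", "O", "B-ORG"],
)

# Token-level view: each token of a multi-token entity is listed separately.
for token in ner_doc:
    if token.ent_type_:
        print(token.text, token.ent_iob_, token.ent_type_)

# Span-level view: doc.ents groups the tokens into complete entities.
for ent in ner_doc.ents:
    print(ent.text, ent.label_)
```

Iterating over `doc.ents` is usually the more convenient choice when entities may span more than one token.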

In [None]:
recognized_entities = []

for token in doc:
    # A non-empty ent_type_ means the NER component tagged this token as part of an entity
    if token.ent_type_:
        entity_type = token.ent_type_
        recognized_entities.append((token.text, entity_type))

for entity, entity_type in recognized_entities:
    print(f"{entity} is a {entity_type}")

## Task: NER + Custom Rules
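
One way to express a simple rule such as "Issa is a PERSON" is spaCy's built-in `entity_ruler` component. The sketch below is a minimal illustration on a blank pipeline (no trained model); the pattern and the names `rule_nlp` and `rule_doc` are hypothetical.

```python
import spacy

# Blank pipeline: tokenizer only, so every entity found comes from the rules.
rule_nlp = spacy.blank("en")

# entity_ruler is a built-in rule-based entity component.
ruler = rule_nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Issa"}])

rule_doc = rule_nlp("Issa and Iyvonne went to Inglewood. They work at HBO.")

for ent in rule_doc.ents:
    print(f"{ent.text} is a {ent.label_}")
```

In a trained pipeline, the ruler's position relative to the statistical `ner` component determines which of the two gets to label a span first.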

In [None]:
# Custom rule as a pipeline component
from spacy.language import Language
from spacy.tokens import Span

@Language.component("custom_ner_rule")
def custom_ner_rule(doc):
  """
  Custom Named Entity Recognition (NER) rule to identify 'Issa' as a PERSON

  Args:
  doc (spacy.tokens.Doc): The spaCy processed document.

  Returns:
  spacy.tokens.Doc: The processed document with recognized entities.

  """
  # Token indices matched by the custom rule: the exact text "Issa"
  custom_indices = {token.i for token in doc if token.text == "Issa"}

  # Build a one-token PERSON span for each match (Span uses token offsets,
  # not character offsets)
  custom_spans = [Span(doc, i, i + 1, label="PERSON") for i in custom_indices]

  # Keep the entities already set by the NER component unless they overlap
  # a custom span (doc.ents does not allow overlapping spans)
  kept = [ent for ent in doc.ents
          if not any(token.i in custom_indices for token in ent)]

  # Write the combined entities back to the Doc in document order
  doc.ents = sorted(kept + custom_spans, key=lambda span: span.start)

  return doc  # Return the processed Doc object

In [None]:
# sentence = """
# Rae, who also costars in the upcoming American Fiction with Jeffery Wright,
# recently debuted her very own Viarae Prosecco, an Italian sparkler she created
# and founded in partnership with E. & J.
# """

In [None]:
print(sentence)

In [None]:
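# Add the custom rule to the end of the pipeline, so it runs after the built-in NER
# (run this cell only once: adding the same component name twice raises an error)
nlp.add_pipe("custom_ner_rule")
doc = nlp(sentence)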

In [None]:
recognized_entities = []
recognized_entities

In [None]:
for token in doc:
    if token.ent_type_:
        entity_type = token.ent_type_
        recognized_entities.append((token.text, entity_type))

In [None]:
for entity, entity_type in recognized_entities:
    print(f"{entity} is a {entity_type}")

<font color="purple"> <b>Note:</b> There is a known issue with this output. </font>

Expected:
```
Issa is a PERSON
Iyvonne is a PERSON
Inglewood is a ORG
HBO is a ORG
```

[![Built with spaCy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)