This code demonstrates named entity recognition (NER) with the spaCy natural language processing (NLP) library. It performs the following steps:

    Load and Inspect the NLP Pipeline:
        Load a pre-trained English NLP model (en_core_web_sm) from spaCy.
        Display the components of the NLP pipeline.

    Named Entity Recognition (NER):
        Process a sample text to create a Doc object, which holds the tokens and the annotations produced by the pipeline.
        Extract and print named entities from the text, along with their labels and explanations.
        Visualize the named entities using spaCy's displacy tool.
        Display the labels used by the NER component of the pipeline.

    Basic Entity Extraction and Explanation:
        Process another text to extract and print named entities, their labels, and explanations.

    Extended Entity Extraction with Character Positions:
        Process a text to extract named entities along with their start and end character positions in the text.
        Process a variation of the text (without "Inc") to show how the predicted entities and labels change.

    Span Operations and Custom Entity Creation:
        Extract a span of tokens from the processed document.
        Print the type of the extracted span.
        Create new named entity spans with specific labels.
        Add these custom entity spans to the document.
        Print the updated named entities in the document, including the newly added custom entities.

### Load and Inspect the NLP Pipeline / Named Entity Recognition (NER)

In [1]:
import spacy
from spacy.tokens import Span

In [2]:
# Load the pre-trained spaCy model for English
nlp = spacy.load("en_core_web_sm")
# Display the pipeline components
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [3]:
# Create a document by processing a text through the NLP pipeline
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
# Iterate over the named entities in the document and print their text, label, and label explanation
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tesla Inc  |  ORG  |  Companies, agencies, institutions, etc.
$45 billion  |  MONEY  |  Monetary values, including unit


In [4]:
from spacy import displacy
# Visualize the named entities in the document using displaCy
displacy.render(doc, style="ent")
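
Outside a notebook, or when no trained model is available, the same entity view can be produced from hand-written annotations. The sketch below assumes displaCy's `manual=True` input format (a dict with `text`, `ents`, and `title`) and uses a hypothetical helper, `manual_ent`, to compute character offsets; the two entities mirror the output printed by cell 3.

```python
from spacy import displacy

text = "Tesla Inc is going to acquire twitter for $45 billion"

def manual_ent(phrase, label):
    # Hypothetical helper (not part of spaCy): find the phrase in the text
    # and build the {"start", "end", "label"} dict that displaCy expects.
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}

# Entities must be listed in order of their start offset.
manual_doc = {
    "text": text,
    "ents": [manual_ent("Tesla Inc", "ORG"), manual_ent("$45 billion", "MONEY")],
    "title": None,
}

# jupyter=False asks render() to return the HTML markup as a string
# instead of displaying it inline.
html = displacy.render(manual_doc, style="ent", manual=True, jupyter=False)
```

The returned string can be written to an `.html` file or shown in a notebook with `IPython.display.HTML(html)`.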

In [5]:
# Display the labels used by the NER component
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

### 1. Basic Entity Extraction and Explanation

In [6]:
# Process a new text to create another document
doc = nlp("Michael Bloomberg founded Bloomberg in 1982")
# Iterate over the named entities in the new document and print their text, label, and label explanation
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg | PERSON | People, including fictional
1982 | DATE | Absolute or relative dates or periods


### 2. Extended Entity Extraction with Character Positions


In [7]:
# Process another text to create a new document
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
# Iterate over the named entities in the document and print their text, label, and character positions
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  0 | 9
Twitter Inc  |  ORG  |  30 | 41
$45 billion  |  MONEY  |  46 | 57
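
The character positions are offsets into the original string, with `end_char` exclusive, so slicing the text with them should give back each entity's text. A minimal check in plain Python, using the values printed above (the variable names here are illustrative only):

```python
text = "Tesla Inc is going to acquire Twitter Inc for $45 billion"

# (entity text, label, start_char, end_char) as printed by the cell above
reported = [
    ("Tesla Inc", "ORG", 0, 9),
    ("Twitter Inc", "ORG", 30, 41),
    ("$45 billion", "MONEY", 46, 57),
]

# Slice the raw string with each pair of offsets
sliced = [text[start:end] for _, _, start, end in reported]

for (ent_text, label, start, end), piece in zip(reported, sliced):
    # Each slice should match the entity text exactly
    assert piece == ent_text
    print(f"{piece!r} -> {label} [{start}:{end}]")
```

This is why the offsets are convenient for highlighting entities in the raw text or for storing annotations independently of spaCy's tokenization.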


In [8]:
# Process the same sentence without "Inc" to see how the predicted labels change
doc = nlp("Tesla is going to acquire Twitter for $45 billion")
# Iterate over the named entities in the document and print their text and label
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  PRODUCT
$45 billion  |  MONEY


### 3. Span Operations and Custom Entity Creation

In [9]:
# Slice the document by token index to get a span (tokens 2, 3, and 4)
s = doc[2:5]
s

going to acquire

In [10]:
# Display the type of the extracted span
type(s)

spacy.tokens.span.Span

In [11]:
# Create entity spans by token index: token 0 is "Tesla", token 5 is "Twitter"
s1 = Span(doc, 0, 1, label="ORG")
s2 = Span(doc, 5, 6, label="ORG")

# Set the spans as entities; default="unmodified" keeps the existing
# annotations on all other tokens (here, "$45 billion" stays MONEY)
doc.set_ents([s1, s2], default="unmodified")

In [12]:
# Iterate over the named entities in the document and print their text and label
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  ORG
$45 billion  |  MONEY
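
The `default` argument decides what happens to tokens that are not covered by the spans passed to `set_ents`. The sketch below compares `"unmodified"` with `"outside"`, which is the default value as far as I know. It assumes a blank English pipeline (`spacy.blank("en")`) so that no trained model is needed and every entity is set by hand; `ents_after` is a hypothetical helper written for this comparison.

```python
import spacy
from spacy.tokens import Span

# A blank pipeline has a tokenizer but no NER component,
# so doc.ents starts out empty.
nlp = spacy.blank("en")
text = "Tesla is going to acquire Twitter for $45 billion"

def ents_after(default):
    # Hypothetical helper: set two entities in two separate calls and
    # return the entities that remain on the document afterwards.
    doc = nlp(text)
    # First call: mark "Tesla" (token 0) as ORG.
    doc.set_ents([Span(doc, 0, 1, label="ORG")])
    # Second call: mark "Twitter" (token 5) as ORG, using the given
    # policy for all tokens outside this span.
    doc.set_ents([Span(doc, 5, 6, label="ORG")], default=default)
    return [(ent.text, ent.label_) for ent in doc.ents]

kept = ents_after("unmodified")   # earlier "Tesla" entity is preserved
reset = ents_after("outside")     # other tokens are reset to "no entity"

print(kept)
print(reset)
```

With `"unmodified"`, the second call adds to the existing annotations. With `"outside"`, it replaces them, which is why the notebook passes `default="unmodified"` to keep the model's `$45 billion` entity.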
