In spaCy, nlp.pipeline is the ordered list of processing components that run on a text after it has been tokenized. Each component adds annotations to the resulting Doc object. Which components are present depends on the pipeline you load, but some common ones include:

- tok2vec: A component that maps each token to a vector (an embedding). The goal of tok2vec is to capture the meaning of each token in a numerical form that downstream components, such as the tagger and parser, can use as input features.

- Tagger: A part-of-speech (POS) tagger is an NLP component that labels each token in a sentence with its part of speech (noun, verb, adjective, and so on). These tags describe the grammatical role of each word and are used by later steps such as lemmatization.

- Parser: A parser is an NLP component that analyzes the grammatical structure of a sentence to determine the relationships between its components. Parsing can provide information about the subject-verb relationships, noun phrases, and other linguistic features of a sentence.

- Named Entity Recognizer (NER): An NER is an NLP component that identifies named entities, such as people, organizations, and locations, in a text. NER can be useful for information extraction, question answering, and other NLP tasks.

- Dependency Parser: In spaCy, the parser component described above is a dependency parser (spacy.pipeline.dep_parser.DependencyParser). It assigns each token a syntactic head and a dependency label, such as subject or object of a verb, which is useful for understanding how the words in a sentence relate to each other.

- Sentiment Analyzer: A sentiment analyzer is an NLP component that determines the sentiment expressed in a text. Sentiment analysis can be used to identify the overall mood or tone of a text and can be applied to tasks such as opinion mining, product reviews, and social media analysis. The en_core_web_sm pipeline loaded below does not include one.

- Text Classifier: A text classifier is an NLP component that assigns predefined categories or labels to a given text. Text classification can be used for tasks such as spam detection, sentiment analysis, and topic classification.
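
To make the idea of "a series of processing steps" concrete, here is a minimal toy sketch in plain Python. It is not how spaCy is implemented; the names tokenize, lowercase, count_tokens, and run are hypothetical and chosen only for illustration. It assumes a pipeline is an ordered list of (name, function) pairs, each of which receives the document and returns it with extra annotations.

```python
from typing import Callable, Dict, List, Tuple

# A toy "document": a dict that accumulates annotations (illustrative only).
ToyDoc = Dict[str, object]


def tokenize(text: str) -> ToyDoc:
    """Split raw text on whitespace; this runs before any component."""
    return {"text": text, "tokens": text.split()}


def lowercase(doc: ToyDoc) -> ToyDoc:
    """Toy component: add a lowercased copy of every token."""
    doc["lower"] = [tok.lower() for tok in doc["tokens"]]
    return doc


def count_tokens(doc: ToyDoc) -> ToyDoc:
    """Toy component: record how many tokens the document has."""
    doc["n_tokens"] = len(doc["tokens"])
    return doc


# The pipeline is an ordered list of (name, component) pairs.
pipeline: List[Tuple[str, Callable[[ToyDoc], ToyDoc]]] = [
    ("lowercase", lowercase),
    ("count_tokens", count_tokens),
]


def run(text: str) -> ToyDoc:
    """Tokenize the text, then apply each component in order."""
    doc = tokenize(text)
    for _name, component in pipeline:
        doc = component(doc)
    return doc


result = run("Tata Group had bought Jaguar Cars")
print([name for name, _ in pipeline])
print(result["lower"])
print(result["n_tokens"])
```

The list of names printed first plays the same role as nlp.pipe_names in spaCy: it tells you which steps will run and in what order.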

In [1]:
import spacy

In [2]:
nlp = spacy.blank("en")

doc = nlp("Hanuman can swim the sea in 30 hours. It took him 2 days from USA to India.")

for token in doc:
    print(token)

Hanuman
can
swim
the
sea
in
30
hours
.
It
took
him
2
days
from
USA
to
India
.


In [3]:
nlp.pipe_names

[]
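
The empty list above shows that a blank pipeline only tokenizes. As a small sketch of how a component gets into the pipeline, the cell below adds spaCy's built-in rule-based sentencizer to a blank English pipeline. It assumes spaCy v3, where add_pipe takes the component's registered string name; no trained model is needed.

```python
import spacy

# Start from a blank English pipeline: tokenizer only, no components.
nlp = spacy.blank("en")
print(nlp.pipe_names)

# Add the built-in rule-based sentence splitter by its registered name.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

doc = nlp("Hanuman can swim the sea in 30 hours. It took him 2 days from USA to India.")

# doc.sents is available now that a component sets sentence boundaries.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

Without the sentencizer (or a parser), iterating over doc.sents on a blank pipeline raises an error, because nothing has marked where sentences begin.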

In [5]:
# en_core_web_sm is a pre-trained pipeline for English. Download it first with:
#   python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

In [6]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x17b7974bfa0>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x17b7974bee0>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x17b7b8e2f90>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x17b7baa1780>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x17b7baee800>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x17b7b8e2d60>)]

In [7]:
doc = nlp("Hanuman can swim the sea in 30 hours. It took him 2 days from USA to India.")

for token in doc:
    print(token," | ", token.pos_, " | ", token.lemma_)

Hanuman  |  PROPN  |  Hanuman
can  |  AUX  |  can
swim  |  VERB  |  swim
the  |  DET  |  the
sea  |  NOUN  |  sea
in  |  ADP  |  in
30  |  NUM  |  30
hours  |  NOUN  |  hour
.  |  PUNCT  |  .
It  |  PRON  |  it
took  |  VERB  |  take
him  |  PRON  |  he
2  |  NUM  |  2
days  |  NOUN  |  day
from  |  ADP  |  from
USA  |  PROPN  |  USA
to  |  ADP  |  to
India  |  PROPN  |  India
.  |  PUNCT  |  .


In [8]:
doc = nlp("Tata Group had bought Jaguar Cars for $2.3 billion")

for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tata Group  |  ORG
Jaguar Cars  |  ORG
$2.3 billion  |  MONEY


In [9]:
doc = nlp("Tata Group had bought Jaguar Cars for $2.3 billion")

for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tata Group  |  ORG  |  Companies, agencies, institutions, etc.
Jaguar Cars  |  ORG  |  Companies, agencies, institutions, etc.
$2.3 billion  |  MONEY  |  Monetary values, including unit
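
Each item in doc.ents is a Span: a slice of the Doc with a label attached. The sketch below illustrates this on a blank pipeline by setting two entities by hand, which is an assumption made only so the example runs without a trained model. The token indices are chosen manually for this sentence and are not predictions.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: no NER component, so doc.ents starts out empty.
nlp = spacy.blank("en")
doc = nlp("Tata Group had bought Jaguar Cars for $2.3 billion")

# Manually mark tokens 0-1 and 4-5 as ORG entities (illustrative only).
doc.ents = [
    Span(doc, 0, 2, label="ORG"),
    Span(doc, 4, 6, label="ORG"),
]

# Each entity exposes its text, label, and character offsets in the text.
rows = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
for row in rows:
    print(row)
```

A trained NER component fills doc.ents with the same kind of Span objects, so the loops in the cells above work unchanged on either.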


In [10]:
# Visualize the named entities in the doc with displaCy

from spacy import displacy

displacy.render(doc, style="ent")

In [14]:
doc = nlp("Mr.Tata founded Tata Inc in 1868 in Mumbai")

for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tata  |  PERSON  |  People, including fictional
Tata Inc  |  ORG  |  Companies, agencies, institutions, etc.
1868  |  DATE  |  Absolute or relative dates or periods
Mumbai  |  GPE  |  Countries, cities, states
