# Quickstart
See more: https://spacy.io/usage/models

In [48]:
# English version
import spacy

nlp = spacy.load("en_core_web_sm")
docs = nlp("Tesla Inc stock price increased by 100 points today.")

for i in docs:
    print(i.text, "|", i.pos_)
print("--------------------")
for j in docs.ents:
    print(j.text, "|", j.label_, "|", spacy.explain(j.label_))

Tesla | PROPN
Inc | PROPN
stock | NOUN
price | NOUN
increased | VERB
by | ADP
100 | NUM
points | NOUN
today | NOUN
. | PUNCT
--------------------
Tesla Inc | ORG | Companies, agencies, institutions, etc.
100 | CARDINAL | Numerals that do not fall under another type
today | DATE | Absolute or relative dates or periods


# Use displaCy to visualize the dependency parse and entities

In [47]:
from spacy import displacy

displacy.render(docs, style="dep")
displacy.render(docs, style="ent")
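
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The following is a minimal sketch under a few assumptions: it uses a blank English pipeline (so no trained model is required) and sets the `Tesla Inc` entity by hand purely for illustration, rather than relying on a model's prediction.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Assumption: a blank pipeline, so the entity below is set manually.
nlp = spacy.blank("en")
doc = nlp("Tesla Inc stock price increased by 100 points today.")
doc.set_ents([Span(doc, 0, 2, label="ORG")])

# jupyter=False forces render() to return the HTML instead of displaying it;
# page=True wraps the markup in a complete HTML page.
html = displacy.render(doc, style="ent", page=True, jupyter=False)
print(html[:200])
```

The returned string can then be written to an `.html` file and opened in a browser.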

# The pipe_names attribute in spaCy
- `tok2vec`: A component that converts tokens (words) into numerical vectors, which is a common first step in processing text for machine learning models.
- `tagger`: A part-of-speech tagging component that assigns part-of-speech labels (like noun, verb, adjective, etc.) to each token in the text.
- `parser`: A dependency parser that analyzes the grammatical structure of a sentence, establishing relationships between "head" words and words that modify those heads.
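- `attribute_ruler`: A component that sets token attributes based on pattern rules. It runs after tokenization and is useful for mapping fine-grained tags to coarse part-of-speech labels and for handling exceptions to the statistical components' predictions.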
- `lemmatizer`: A component that reduces words to their base or root form (lemma). For example, the lemma of "running" is "run".
- `ner`: Named Entity Recognition component that identifies named entities (like person names, locations, organizations, etc.) in the text.
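
As a rough illustration of how a rule-based component such as `attribute_ruler` behaves, the sketch below adds one to a blank English pipeline and assigns a part-of-speech label with a single pattern. The pattern and the blank pipeline are assumptions made for this example; the trained `en_core_web_sm` pipeline ships with its own rules.

```python
import spacy

# Assumption: a blank pipeline with only an attribute_ruler, no trained model.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Illustrative rule: any token whose lowercase form is "inc" gets POS = PROPN.
ruler.add(patterns=[[{"LOWER": "inc"}]], attrs={"POS": "PROPN"})

doc = nlp("Tesla Inc stock price increased by 100 points today.")
print(nlp.pipe_names)
for token in doc:
    # Tokens not matched by the rule have no POS set in a blank pipeline.
    print(token.text, "|", token.pos_ or "(unset)")
```

Only `Inc` receives a label here, because no tagger is present to label the remaining tokens.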

In [54]:
nlp.pipe_names 

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [52]:
# List the entity labels that this pipeline's NER component can predict
nlp.pipe_labels["ner"]

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']
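
The label names are terse, so it can help to look up their descriptions with `spacy.explain`, which does not need a loaded model. A small sketch, using a hand-picked subset of the labels above:

```python
import spacy

# A handful of labels taken from the list above.
labels = ["ORG", "GPE", "MONEY", "CARDINAL", "DATE"]

# spacy.explain looks each label up in spaCy's built-in glossary.
descriptions = {label: spacy.explain(label) for label in labels}
for label, description in descriptions.items():
    print(label, "|", description)
```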

In [62]:
# English version
import spacy

nlp = spacy.load("en_core_web_sm")
docs = nlp("Tesla Inc stock price increased by 100 points today.")

# Token
for i in docs:
    print(i, "|", type(i))

print("--------------------")
# Span
print(docs[2:5], "|", type(docs[2:5]))

Tesla | <class 'spacy.tokens.token.Token'>
Inc | <class 'spacy.tokens.token.Token'>
stock | <class 'spacy.tokens.token.Token'>
price | <class 'spacy.tokens.token.Token'>
increased | <class 'spacy.tokens.token.Token'>
by | <class 'spacy.tokens.token.Token'>
100 | <class 'spacy.tokens.token.Token'>
points | <class 'spacy.tokens.token.Token'>
today | <class 'spacy.tokens.token.Token'>
. | <class 'spacy.tokens.token.Token'>
--------------------
stock price increased | <class 'spacy.tokens.span.Span'>


# Labeling the entities

In [70]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
docs = nlp("Tsmc invested Nvdia for $100 million.")

for j in docs.ents:
    print(j.text, "|", j.label_, "|", spacy.explain(j.label_))

displacy.render(docs,style="ent")

Nvdia | GPE | Countries, cities, states
$100 million | MONEY | Monetary values, including unit


In [72]:
import spacy
from spacy import displacy
from spacy.tokens import Span

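# Span(doc, start, end, label) uses token indices; end is exclusive.
span1 = Span(docs, 0, 1, label="ORG")  # "Tsmc"
span2 = Span(docs, 2, 3, label="ORG")  # "Nvdia"

# default="unmodified" keeps existing entities outside the new spans (here, MONEY).
docs.set_ents([span1, span2], default="unmodified")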

for j in docs.ents:
    print(j.text, "|", j.label_, "|", spacy.explain(j.label_))

displacy.render(docs,style="ent")

Tsmc | ORG | Companies, agencies, institutions, etc.
Nvdia | ORG | Companies, agencies, institutions, etc.
$100 million | MONEY | Monetary values, including unit
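
The effect of the `default` argument to `set_ents` can be seen in isolation. The sketch below assumes a blank English pipeline and sets every entity by hand, so the labels are illustrative rather than model predictions.

```python
import spacy
from spacy.tokens import Span

# Assumption: blank pipeline, so no entities exist until set manually.
nlp = spacy.blank("en")
doc = nlp("Tsmc invested Nvdia for $100 million.")
# Tokens: Tsmc(0) invested(1) Nvdia(2) for(3) $(4) 100(5) million(6) .(7)

# Start with a single MONEY entity covering "$100 million".
doc.set_ents([Span(doc, 4, 7, label="MONEY")])

# default="unmodified" adds the new span and leaves the MONEY entity alone.
doc.set_ents([Span(doc, 0, 1, label="ORG")], default="unmodified")
kept = [(ent.text, ent.label_) for ent in doc.ents]
print(kept)

# The default, default="outside", marks every other token as a non-entity,
# so the earlier entities are discarded.
doc.set_ents([Span(doc, 2, 3, label="ORG")])
replaced = [(ent.text, ent.label_) for ent in doc.ents]
print(replaced)
```

This is why the cell above passes `default="unmodified"`: without it, the `$100 million` entity would be dropped.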
