## spaCy

In [1]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.5.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [2]:
import spacy

nlp = spacy.load('en_core_web_sm')

In [3]:
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

In [4]:
for ent in doc.ents:
    print(ent.label_, ent.text)


ORG Apple
GPE U.K.
MONEY $1 billion
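When a text contains many entities, it can be easier to work with them grouped by label than as a flat printout. The helper below is a minimal sketch of that idea. `entities_by_label` is a hypothetical name, not part of spaCy, and it assumes only that the object passed in exposes `.ents` whose items have `.label_` and `.text`, as a spaCy `Doc` does.

```python
from collections import defaultdict


def entities_by_label(doc):
    """Group entity texts by label, keeping the order in which they appear.

    Hypothetical helper (not a spaCy API); expects a spaCy Doc or any
    object with an `ents` iterable of items carrying `label_` and `text`.
    """
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    # Return a plain dict so that missing labels raise KeyError as usual.
    return dict(grouped)
```

Given the entities printed above, calling `entities_by_label(doc)` on the same `doc` would return `{'ORG': ['Apple'], 'GPE': ['U.K.'], 'MONEY': ['$1 billion']}`.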


In [5]:
from spacy import displacy

displacy.render(doc, style='ent', jupyter=True)


**All NER tags used by spaCy's English models (some may overlap)**


* PERSON - People, including fictional.
* NORP - Nationalities or religious or political groups.
* FAC - Buildings, airports, highways, bridges, etc.
* ORG - Companies, agencies, institutions, etc.
* GPE - Countries, cities, states.
* LOC - Non-GPE locations, mountain ranges, bodies of water.
* PRODUCT - Objects, vehicles, foods, etc. (Not services)
* EVENT - Named hurricanes, battles, wars, sports events, etc.
* WORK_OF_ART - Titles of books, songs, etc.
* LAW - Named documents made into laws.
* LANGUAGE - Any named language.
* DATE - Absolute or relative dates or periods.
* TIME - Times smaller than a day.
* PERCENT - Percentage, including "%".
* MONEY - Monetary values, including unit.
* QUANTITY - Measurements, as of weight or distance.
* ORDINAL - "first", "second", etc.
* CARDINAL - Numerals that do not fall under another type.
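The descriptions above do not have to be memorised: spaCy ships a small glossary that can be queried with `spacy.explain`, which returns a short description for a known label or `None` for an unknown one. The sketch below looks up every tag from the list. The `labels` list is copied by hand from the list above rather than read from a model, so no pipeline needs to be downloaded to run it.

```python
import spacy

# Labels copied from the list above (the OntoNotes-style tag set).
labels = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

# spacy.explain returns a description string, or None if the term is unknown.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:<12} {description}")
```

To see which labels a loaded pipeline can actually predict, `nlp.get_pipe("ner").labels` returns them as a tuple; that call does require the model to be installed.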


*The NER tags in spaCy's English pipelines, including `en_core_web_sm`, follow the OntoNotes 5 named entity annotation scheme. OntoNotes is a large corpus of text in English, Chinese, and Arabic that has been manually annotated with several layers of linguistic information, including named entity tags. The OntoNotes 5 corpus was created by a collaboration of researchers from several institutions, and its entity annotations cover the 18 types listed above.*

*OntoNotes is widely used in natural language processing (NLP) research, and its named entity scheme is a common reference point in the field. Because spaCy's English models use the same labels, their output can be compared directly with other NLP tools and datasets that follow this scheme.*

## Custom