# SpaCy

“Industrial-Strength” NLP Library
- Very fast
- Easy to install and use
- Well integrated with other ML and DL libraries


### Installation and models


In [1]:

#!pip install spacy
#!python -m spacy download en_core_web_md # download a model for English
#!python -m spacy download it_core_news_sm # download a model for Italian

### Loading the model

- The model is loaded by calling `spacy.load()` with the name of the model as its argument; the call returns the pipeline object, here named `nlp`


In [2]:
import spacy
nlp = spacy.load("en_core_web_md")
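
If the model package has not been downloaded, `spacy.load()` raises an `OSError`. The following is a minimal sketch of a loader that falls back to a blank, tokenizer-only English pipeline in that case. The helper name `load_model` and the fallback behaviour are assumptions made for illustration, not part of the notebook; a blank pipeline has no tagger, parser or entity recognizer, so the later cells would produce empty annotations with it.

```python
import spacy


def load_model(name, fallback_lang="en"):
    """Load a trained pipeline, or a blank one if the package is missing (hypothetical helper)."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the model package is not installed
        return spacy.blank(fallback_lang)


nlp = load_model("en_core_web_md")
# Lists the pipeline components; empty for a blank pipeline
print(nlp.pipe_names)
```

Printing `nlp.pipe_names` is a quick way to check which annotation components are actually available before running the rest of the notebook.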

### Text annotation

To annotate some text, we pass a string to the `nlp` object (the SpaCy model)

In [3]:
doc = nlp("This is a sentence in English. This is another sentence, and we want to analyze them with Spacy.")
doc

This is a sentence in English. This is another sentence, and we want to analyze them with Spacy.

The call returns a `Doc` object (SpaCy's document container) with all the _default_ SpaCy annotations

### Annotations

- The returned document (a SpaCy `Doc`) is an _iterable_: iterating over it yields its tokens, while `doc.sents` yields its sentences

In [4]:
for sentence in doc.sents:
    print(sentence)

This is a sentence in English.
This is another sentence, and we want to analyze them with Spacy.
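
To make the difference between iterating over the document and over `doc.sents` concrete, here is a small sketch. It assumes a blank English pipeline with the rule-based `sentencizer` component instead of `en_core_web_md`, so that it runs without a downloaded model; the trained parser may place sentence boundaries differently on harder text.

```python
import spacy

# Blank pipeline: tokenizer only, plus a rule-based sentence splitter
nlp_blank = spacy.blank("en")
nlp_blank.add_pipe("sentencizer")

doc = nlp_blank("This is a sentence in English. This is another sentence.")

# Iterating over the Doc yields Token objects
tokens = [token.text for token in doc]

# doc.sents yields Span objects, one per sentence
sentences = [sent.text for sent in doc.sents]

print(len(doc), "tokens")
for sent in doc.sents:
    # Each sentence span knows its token offsets inside the document
    print(sent.start, sent.end, sent.text)
```

Each sentence is a `Span`, so it can be iterated token by token exactly like the document itself.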


- Tokens are objects as well, and they hold all the token-level annotations

In [5]:
for sentence in doc.sents:
    for token in sentence:
        print(f"{token.text}\t{token.lemma_}\t{token.pos_}\t{token.tag_}\t{token.dep_}")

This	this	PRON	DT	nsubj
is	be	AUX	VBZ	ROOT
a	a	DET	DT	det
sentence	sentence	NOUN	NN	attr
in	in	ADP	IN	prep
English	English	PROPN	NNP	pobj
.	.	PUNCT	.	punct
This	this	PRON	DT	nsubj
is	be	AUX	VBZ	ROOT
another	another	DET	DT	det
sentence	sentence	NOUN	NN	attr
,	,	PUNCT	,	punct
and	and	CCONJ	CC	cc
we	we	PRON	PRP	nsubj
want	want	VERB	VBP	conj
to	to	PART	TO	aux
analyze	analyze	VERB	VB	xcomp
them	they	PRON	PRP	dobj
with	with	ADP	IN	prep
Spacy	Spacy	PROPN	NNP	pobj
.	.	PUNCT	.	punct
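
The columns above are the token text, lemma, coarse part of speech (`pos_`), fine-grained tag (`tag_`) and dependency relation (`dep_`). As a small sketch, the abbreviations can be expanded with `spacy.explain()`, which looks labels up in SpaCy's glossary and needs no model. The selection of labels below is just a sample taken from the output.

```python
import spacy

# A few labels taken from the table above
labels = ["PRON", "AUX", "VBZ", "nsubj", "attr", "pobj"]

# spacy.explain returns a short description, or None for unknown labels
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label}\t{description}")
```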


- Using `displacy` it is possible to visualize some of the annotations, like the dependencies

In [6]:
from spacy import displacy

displacy.render(doc, style="dep", jupyter = True)
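
Outside a notebook, `displacy.render()` can return the markup as a string when called with `jupyter=False`, which makes it possible to save the figure. The sketch below assumes a hand-written parse in displaCy's manual input format so that it runs without a trained model; the words, arcs and output file name are illustrative only. With the `doc` from the cell above, one would pass `doc` directly and drop `manual=True`.

```python
import tempfile
from pathlib import Path

from spacy import displacy

# Hand-written parse in displaCy's "manual" format (illustrative data)
parse = {
    "words": [
        {"text": "This", "tag": "PRON"},
        {"text": "is", "tag": "AUX"},
        {"text": "a", "tag": "DET"},
        {"text": "sentence", "tag": "NOUN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 2, "end": 3, "label": "det", "dir": "left"},
        {"start": 1, "end": 3, "label": "attr", "dir": "right"},
    ],
}

# jupyter=False makes render() return the SVG markup instead of displaying it
svg = displacy.render(parse, style="dep", manual=True, jupyter=False)

# Hypothetical output location, chosen here only for the example
out_path = Path(tempfile.gettempdir()) / "dependency_parse.svg"
out_path.write_text(svg, encoding="utf-8")
print(f"Saved {len(svg)} characters to {out_path}")
```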

- The doc/sent objects also contain named entities

In [7]:
text = "Hi my name is Alessandro and I am a researcher at University of Pisa."

doc = nlp(text)

doc.ents

(Alessandro, University of Pisa)
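
Each element of `doc.ents` is a `Span` that carries a label and character offsets in addition to its text. The sketch below shows how these can be read. To stay runnable without `en_core_web_md`, it assumes a blank pipeline with a rule-based `entity_ruler` and two hand-written patterns in place of the statistical entity recognizer; the labels `PERSON` and `ORG` are chosen for illustration.

```python
import spacy

# Rule-based stand-in for the trained NER component (assumption for this sketch)
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns(
    [
        {"label": "PERSON", "pattern": "Alessandro"},
        {"label": "ORG", "pattern": "University of Pisa"},
    ]
)

doc = nlp_rules("Hi my name is Alessandro and I am a researcher at University of Pisa.")

# Collect text, label and character offsets for every entity span
entities = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]

for text, label, start, end in entities:
    print(f"{text}\t{label}\t{start}\t{end}")
```

The same loop works unchanged on a document produced by a trained model, since it only reads attributes of the entity spans.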

- Named entities can be visualized with `displacy` as well

In [8]:
displacy.render(doc, style="ent", jupyter = True)

### And much more

- SpaCy is very feature-rich
- Useful at all steps of a linguistic annotation pipeline
- Clear and well documented
- Many tutorials at https://spacy.io/usage/spacy-101

# SciSpaCy

## Installation and models

SciSpaCy is available via pip.

To install it:

In [9]:
#!python -m pip install scispacy

To install models:

In [10]:
#!python -m pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_sm-0.5.3.tar.gz

Models are then loaded as SpaCy objects with the same characteristics

In [11]:
import spacy
nlp = spacy.load("en_core_sci_sm")

text = "Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals."

doc = nlp(text)
for sentence in doc.sents:
    for token in sentence:
        print(f"{token.text}\t{token.lemma_}\t{token.pos_}\t{token.tag_}\t{token.dep_}")

Alterations	alteration	NOUN	NNS	nsubj
in	in	ADP	IN	case
the	the	DET	DT	det
hypocretin	hypocretin	NOUN	NN	compound
receptor	receptor	NOUN	NN	nmod
2	2	NUM	CD	nummod
and	and	CCONJ	CC	cc
preprohypocretin	preprohypocretin	NOUN	NN	compound
genes	gene	NOUN	NNS	conj
produce	produce	VERB	VBP	ROOT
narcolepsy	narcolepsy	NOUN	NN	dobj
in	in	ADP	IN	case
some	some	DET	DT	det
animals	animal	NOUN	NNS	nmod
.	.	PUNCT	.	punct



Visualization works in the same way

In [12]:
from spacy import displacy

displacy.render(doc, style="dep", jupyter = True)

By default, entities are obtained with the entity detector included in the model

In [13]:
displacy.render(doc, style="ent", jupyter = True)

## NER

NER models are separate from the other models in SciSpaCy.
They are installed in the same way as the other models.

In [14]:
#!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_ner_bionlp13cg_md-0.5.3.tar.gz

In [None]:
import spacy
nlp = spacy.load("en_ner_bionlp13cg_md")  # other NER models can be used as well

text = "Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals."

doc = nlp(text)
displacy.render(doc, style="ent", jupyter = True)

In [None]:
doc.ents

Different models yield different results
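
One way to see this is to run the same sentence through several installed models and compare the entities side by side. The following is a rough sketch under that assumption; the helper name `entities_by_model` is hypothetical, and models that are not installed are reported as `None` rather than downloaded.

```python
import spacy


def entities_by_model(text, model_names):
    """Return {model name: [(entity text, label), ...]}, or None for missing models."""
    results = {}
    for name in model_names:
        try:
            nlp_model = spacy.load(name)
        except OSError:
            # The model package is not installed; skip it
            results[name] = None
            continue
        doc = nlp_model(text)
        results[name] = [(ent.text, ent.label_) for ent in doc.ents]
    return results


text = "Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals."

results = entities_by_model(text, ["en_core_sci_sm", "en_ner_bionlp13cg_md"])
for name, ents in results.items():
    print(name, ents if ents is not None else "not installed")
```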

## And more

SciSpaCy has other features like entity linking.

For more, see the repository: https://github.com/allenai/scispacy

There is also a demo available here: https://scispacy.apps.allenai.org/