## spaCy: word embeddings and dependency parsing

In [1]:
import spacy
from spacy import displacy

# Load a scispaCy model for biomedical text (NER model trained on the BC5CDR corpus)
nlp = spacy.load("en_ner_bc5cdr_md")
print("These are the components of the pipeline: ", nlp.pipe_names)

These are the components of the pipeline:  ['tagger', 'parser', 'ner']


Now we can pass text through this pipeline and inspect the properties of each token:

In [2]:
text = "Aspirin treats headaches. Aspirin can help prevent heart attacks"
doc = nlp(text)  # run the text through the pipeline
print("Text", "Lemma", "POS", "Tag", "DEP", "Shape", "Alpha", "Stopword")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

Text Lemma POS Tag DEP Shape Alpha Stopword
Aspirin aspirin NOUN NN nsubj Xxxxx True False
treats treat VERB VBZ ROOT xxxx True False
headaches headache NOUN NNS dobj xxxx True False
. . PUNCT . punct . False False
Aspirin aspirin NOUN NN nsubj Xxxxx True False
can can AUX MD aux xxx True True
help help VERB VB ROOT xxxx True False
prevent prevent VERB VB xcomp xxxx True False
heart heart NOUN NN compound xxxx True False
attacks attack NOUN NNS dobj xxxx True False


In [3]:
displacy.render(doc, style="ent",jupyter=True)
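
The entities that displaCy highlights are also available programmatically through `doc.ents`. The following is a minimal sketch, assuming only that each entity exposes `text`, `label_`, `start_char` and `end_char`, as spaCy `Span` objects do. The helper name `entity_table` is hypothetical, and the stand-in objects are hand-written for illustration (using the two entity types of the BC5CDR corpus, CHEMICAL and DISEASE); they are not model output.

```python
from types import SimpleNamespace


def entity_table(ents):
    """Return (text, label, start_char, end_char) tuples for each entity span."""
    return [(e.text, e.label_, e.start_char, e.end_char) for e in ents]


# Hand-written stand-ins that mimic spaCy Span objects (NOT real model output).
example_ents = [
    SimpleNamespace(text="Aspirin", label_="CHEMICAL", start_char=0, end_char=7),
    SimpleNamespace(text="headaches", label_="DISEASE", start_char=15, end_char=24),
]

for row in entity_table(example_ents):
    print(row)
```

With the pipeline loaded above, `entity_table(doc.ents)` would list whatever the model actually recognizes in the text.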

Print the vector representation of a token:

In [4]:
print("Vector representation for ", doc[0].text, doc[0].vector)

Vector representation for  Aspirin [ 2.24932e-01  2.03395e-01  5.74836e-02 -1.96803e-01  5.23851e-02
  2.24480e-01  7.91169e-03  8.96598e-02 -1.15640e-01  3.40637e-01
 -5.23125e-02  2.05858e-01  1.23958e-01 -2.48504e-01  1.49680e-01
 -1.71793e-01 -4.03890e-02  5.93002e-02  9.78609e-02 -2.51881e-01
  1.42991e-01  1.21897e-02  2.73099e-01 -1.05418e-01 -2.76472e-01
 -2.00318e-01  2.79996e-02  1.29650e-01  8.99813e-03 -4.11708e-01
 -1.11245e-01 -3.56730e-01 -3.22773e-01  1.76244e-01  6.32070e-02
  3.53679e-01 -3.32153e-01  2.62634e-01 -2.15946e-01 -8.91495e-02
 -2.89161e-01 -2.39671e-01 -1.46115e-01  4.71541e-01  2.18519e-01
 -9.89318e-02 -1.70151e-01 -1.58439e-01 -1.41072e-01 -2.11200e-01
  4.99705e-02 -9.47361e-02 -1.36008e-01 -2.46474e-01 -4.75351e-01
 -8.90834e-02  3.87687e-01 -3.21877e-01 -9.66057e-02  2.53608e-01
  4.24328e-01 -2.09000e-01  4.27253e-02 -2.93681e-01 -3.12664e-01
 -4.08665e-01 -6.07253e-02 -2.71681e-01  7.16692e-02  1.17430e-02
  2.03171e-01  2.32340e-01  3.87970e-02 
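
Raw vectors like this one are mostly useful for comparing tokens with each other. A common measure is cosine similarity, sketched below in plain Python. The function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook; the zero-vector behaviour is a convention chosen for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, not real embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0 (orthogonal)
```

The same function should accept token vectors from the `doc` above, for instance `cosine_similarity(doc[2].vector, doc[9].vector)` to compare "headaches" with "attacks". spaCy also offers `token.similarity(other)` for this purpose.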

Visualize the dependency parse:

In [5]:
displacy.render(doc, style="dep", jupyter=True)
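
The arcs drawn by displaCy correspond to each token's `dep_` label and `head` attribute, so the parse can also be read out as (child, relation, head) triples. This is a minimal sketch: the helper name `dependency_triples` is hypothetical, and the tokens are hand-built stand-ins for the first sentence that reuse the DEP labels printed earlier, with the heads assumed to be the root verb.

```python
from types import SimpleNamespace


def dependency_triples(tokens):
    """Return (child, relation, head) triples; the root points to itself."""
    return [(t.text, t.dep_, t.head.text) for t in tokens]


# Hand-built stand-ins for spaCy tokens (NOT produced by the parser).
treats = SimpleNamespace(text="treats", dep_="ROOT")
treats.head = treats  # in spaCy, the root token is its own head
aspirin = SimpleNamespace(text="Aspirin", dep_="nsubj", head=treats)
headaches = SimpleNamespace(text="headaches", dep_="dobj", head=treats)

for triple in dependency_triples([aspirin, treats, headaches]):
    print(triple)
```

On the real pipeline output, `dependency_triples(doc)` would return one triple per token, including punctuation.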

Have a look at all the models available for biomedical text in scispaCy: https://github.com/allenai/scispacy