# spaCy Visualization

(C) 2024 by [Damir Cavar](http://damir.cavar.me/)

**Version:** 0.2, November 2024

**Download:** This and various other Jupyter notebooks are available from my [GitHub repo](https://github.com/dcavar/python-tutorial-for-ipython).


The following examples show how linguistic annotations from spaCy can be visualized.

In [1]:
import spacy
from spacy import displacy

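For the examples below, we use the English transformer model `en_core_web_trf`, which has to be installed first (for example with `python -m spacy download en_core_web_trf`):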

In [3]:
nlp = spacy.load("en_core_web_trf")


## Part-of-Speech Tags Visualized


The [Universal PoS tags](https://universaldependencies.org/u/pos/) are:

| tag   | part of speech            |
| ----- | ------------------------- |
| ADJ   | adjective                 |
| ADP   | adposition                |
| ADV   | adverb                    |
| AUX   | auxiliary                 |
| CCONJ | coordinating conjunction  |
| DET   | determiner                |
| INTJ  | interjection              |
| NOUN  | noun                      |
| NUM   | numeral                   |
| PART  | particle                  |
| PRON  | pronoun                   |
| PROPN | proper noun               |
| PUNCT | punctuation               |
| SCONJ | subordinating conjunction |
| SYM   | symbol                    |
| VERB  | verb                      |
| X     | other                     |

In spaCy these tags are accessible via the `pos_` property of tokens, as described in [the documentation](https://spacy.io/usage/linguistic-features).
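
As a minimal sketch of reading `pos_`, the following builds a `Doc` by hand so that it runs without a trained model. The tags are assigned manually for illustration (an assumption, not model output), and the names `nlp_blank` and `example_doc` are hypothetical; with a loaded pipeline, the tagger fills in `pos_` automatically.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: provides a vocabulary only, no trained tagger.
nlp_blank = spacy.blank("en")

words = ["Peter", "Smith", "bought", "a", "red", "car", "in", "Chicago", "."]
# True where a token is followed by a space in the original text.
spaces = [True, True, True, True, True, True, True, False, False]
# Hand-assigned Universal PoS tags (assumption, not model output).
tags = ["PROPN", "PROPN", "VERB", "DET", "ADJ", "NOUN", "ADP", "PROPN", "PUNCT"]

example_doc = Doc(nlp_blank.vocab, words=words, spaces=spaces, pos=tags)

for token in example_doc:
    # pos_ is the string label; pos is the corresponding integer ID.
    print(f"{token.text:<8} {token.pos_}")
```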

To visualize the tags in specific colors, we need to specify [displaCy options](https://spacy.io/usage/visualizers). The `ents` entry lists the labels to highlight, and `colors` maps each label to a CSS color:

In [10]:
pos_tags = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X" ]
pos_colors = {"ADJ": "blueviolet",
"ADP": "lightpink",
"ADV": "turquoise",
"AUX": "lime",
"CCONJ": "khaki",
"DET": "orange",
"INTJ": "cornflowerblue",
"NOUN": "lightblue",
"NUM": "salmon",
"PART": "yellow",
"PRON": "forestgreen",
"PROPN": "lightcoral",
"PUNCT": "lightgreen",
"SCONJ": "gold",
"SYM": "violet",
"VERB": "red",
"X": "lightgray"
}
pos_options = {"ents": pos_tags, "colors": pos_colors}

We can send an example sentence through the NLP pipeline now:

In [21]:
doc = nlp("Peter Smith bought a red car in Chicago.")

To visualize the part-of-speech (PoS) tags, we need to convert them into displaCy's manual input format: a dictionary that holds the text and a list of entries, one per token. Each entry contains the start and end character offsets of the token and its PoS label.

In [23]:
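# Convert each token into an entry with character offsets and its PoS label.
doc_pos_ents = []
for token in doc:
    doc_pos_ents.append({"start": token.idx, "end": token.idx + len(token.text), "label": token.pos_})
# Manual displaCy input: the raw text plus the list of labeled character spans.
doc_pos = {"text": doc.text, "ents": doc_pos_ents}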
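
To make the shape of this structure concrete, the following sketch builds the same kind of dictionary in plain Python, without running a pipeline. The token list and its tags are written by hand for illustration (assumptions, not model output), and `char_spans` is a hypothetical helper name.

```python
def char_spans(text, tagged_tokens):
    """Return one {"start", "end", "label"} entry per (word, tag) pair."""
    entries = []
    cursor = 0
    for word, tag in tagged_tokens:
        start = text.index(word, cursor)  # first match at or after the cursor
        end = start + len(word)           # the end offset is exclusive
        entries.append({"start": start, "end": end, "label": tag})
        cursor = end
    return entries


text = "Peter Smith bought a red car in Chicago."
# Hand-assigned tags (assumption, not model output).
tagged = [
    ("Peter", "PROPN"), ("Smith", "PROPN"), ("bought", "VERB"),
    ("a", "DET"), ("red", "ADJ"), ("car", "NOUN"),
    ("in", "ADP"), ("Chicago", "PROPN"), (".", "PUNCT"),
]

manual_doc = {"text": text, "ents": char_spans(text, tagged)}
print(manual_doc["ents"][:2])
# [{'start': 0, 'end': 5, 'label': 'PROPN'}, {'start': 6, 'end': 11, 'label': 'PROPN'}]
```

The entries are ordered by their start offset, which matches the order of the tokens in the text.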

Now we can render the PoS-tags for the text in the specified colors.

In [24]:
displacy.render(doc_pos, style="ent", options=pos_options, manual=True, jupyter=True)
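
Outside of a notebook, the same call can return the markup as a string instead of displaying it. The sketch below assumes a small hand-written input (the offsets, labels, and the two-color mapping are illustrative, not model output) and passes `jupyter=False` so that `displacy.render` returns HTML.

```python
from spacy import displacy

# Hand-written manual input (assumption): only three tokens are labeled.
manual_doc = {
    "text": "Peter Smith bought a red car.",
    "ents": [
        {"start": 0, "end": 5, "label": "PROPN"},
        {"start": 6, "end": 11, "label": "PROPN"},
        {"start": 25, "end": 28, "label": "NOUN"},
    ],
}
options = {
    "ents": ["PROPN", "NOUN"],
    "colors": {"PROPN": "lightcoral", "NOUN": "lightblue"},
}

# With jupyter=False, the markup is returned as a string and not displayed.
html = displacy.render(
    manual_doc, style="ent", options=options, manual=True, jupyter=False
)
print(html[:80])
```

The returned string can be written to a file and opened in a browser, or embedded in a larger HTML page.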

To display the named entities in the text, no conversion is needed; the `Doc` object can be passed to displaCy directly:

In [25]:
displacy.render(doc, style="ent", jupyter=True)

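Span annotations stored in `doc.spans` can be rendered with the `span` style. The `spans_key` option names the span group to display, here `span_ruler`. That group has to be populated beforehand (for example by a span ruler component or by manual annotation); otherwise there are no spans to render: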

In [26]:
displacy.render(doc, options={"spans_key": "span_ruler"}, style="span", jupyter=True)
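
As a sketch of manual span annotation, the following fills the `span_ruler` group by hand before rendering. It uses a blank English pipeline so that no trained model is required; the token indices and the labels `PERSON` and `GPE` are chosen for illustration, and the names `nlp_blank` and `span_doc` are hypothetical.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank English pipeline: tokenizer only, no trained components.
nlp_blank = spacy.blank("en")
span_doc = nlp_blank("Peter Smith bought a red car in Chicago.")

# Span boundaries are token indices, with an exclusive end:
# tokens 0-1 are "Peter Smith", token 7 is "Chicago".
span_doc.spans["span_ruler"] = [
    Span(span_doc, 0, 2, label="PERSON"),
    Span(span_doc, 7, 8, label="GPE"),
]

# jupyter=False returns the markup; in a notebook, jupyter=True displays it.
html = displacy.render(
    span_doc, style="span", options={"spans_key": "span_ruler"}, jupyter=False
)
```

Unlike `doc.ents`, a span group may contain overlapping spans, which is why the `span` style draws the labels on separate lines below the text.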

**(C) 2024 by [Damir Cavar](http://damir.cavar.me/) <<dcavar@iu.edu>>**