![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# **NER Visualizer**

This notebook will cover using the `NER Visualizer`.

`NER Visualizer` is part of `Spark NLP Display`, an open-source Python library for visualizing the annotations generated with Spark NLP.




**📖 Learning Objectives:**

1. Understand how `NER Visualizer` works.

2. Understand how `NER Visualizer` can be used to visualize the entities extracted using Spark NLP.

3. Become comfortable using the different parameters of the visualizer.


**🔗 Helpful Links:**

- Documentation: [Spark NLP Display](https://nlp.johnsnowlabs.com/docs/en/display)

- For extended examples of usage, see the [Spark NLP Display repository](https://github.com/JohnSnowLabs/spark-nlp-display).



## **📜 Background**

`Spark NLP Display` is an open-source Python [library](https://nlp.johnsnowlabs.com/docs/en/display) for visualizing the annotations generated with Spark NLP. It currently offers **out-of-the-box** support for the following types of annotations:

<br/>

*   Dependency Parser
*   Named Entity Recognition (NER)
*   Entity Resolution
*   Relation Extraction
*   Assertion Status

<br/>

The ability to **quickly visualize** the entities, relations, assertion statuses, etc. generated using Spark NLP speeds up development and makes the obtained results easier to understand.

Getting all of this in a one-liner is extremely convenient, especially when running Jupyter notebooks, which offer full support for HTML visualizations.

<br/>

The visualisation classes work with the outputs returned by both the **`Pipeline.transform()`** function and **`LightPipeline.fullAnnotate()`**.
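To make that input more concrete, here is a rough sketch of the shape of a single `fullAnnotate()` result: a dictionary that maps output column names to lists of annotation objects. The `FakeAnnotation` class is a hypothetical stand-in that mirrors the main fields of a Spark NLP annotation, and the sentence and offsets are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in mirroring the main fields of a Spark NLP annotation."""
    annotator_type: str
    begin: int
    end: int  # inclusive character offset
    result: str
    metadata: dict = field(default_factory=dict)


text = "Michelangelo was born in Italy."

# One result item: column name -> list of annotations
result = {
    "document": [FakeAnnotation("document", 0, len(text) - 1, text)],
    "entities": [
        FakeAnnotation("chunk", 0, 11, "Michelangelo", {"entity": "PER"}),
        FakeAnnotation("chunk", 25, 29, "Italy", {"entity": "LOC"}),
    ],
}


def list_entities(result, label_col="entities"):
    """Return (chunk text, label, begin, end) for every entity chunk."""
    return [(a.result, a.metadata["entity"], a.begin, a.end) for a in result[label_col]]


for row in list_entities(result):
    print(row)
```

The `label_col` and `document_col` arguments used later in this notebook refer to keys of such a dictionary.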

## **🎬 Colab Setup**

In [None]:
!pip install spark-nlp
!pip install pyspark
!pip install spark-nlp-display

In [None]:
from sparknlp.pretrained import PretrainedPipeline
from sparknlp_display import NerVisualizer
from sparknlp.annotator import *
from sparknlp.base import *
import pyspark.sql.functions as F
from pyspark.ml import Pipeline

import sparknlp

# Start Spark Session
spark = sparknlp.start()

## **NER Output Visualisation**

`NerVisualizer` highlights the named entities that are identified by Spark NLP and also displays their labels as decorations on top of the analyzed text.  

The colors assigned to the predicted labels can be configured to fit the particular needs of the application.
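To give an intuition for what this highlighting involves, here is a minimal sketch that wraps entity spans in colored HTML `<span>` tags. It is not `NerVisualizer`'s actual implementation: the `render_entities` function, its arguments, and the sample sentence are hypothetical, and it assumes non-overlapping spans with inclusive end offsets.

```python
from html import escape


def render_entities(text, entities, colors, default_color="#cccccc"):
    """Wrap each (begin, end, label) span in a colored <span>; `end` is inclusive."""
    parts = []
    cursor = 0
    for begin, end, label in sorted(entities):
        # Plain text between the previous entity and this one
        parts.append(escape(text[cursor:begin]))
        color = colors.get(label, default_color)
        parts.append(
            f'<span style="background-color: {color}">'
            f'{escape(text[begin:end + 1])} <b>{escape(label)}</b></span>'
        )
        cursor = end + 1
    # Remaining text after the last entity
    parts.append(escape(text[cursor:]))
    return "".join(parts)


text = "Michelangelo was born in Italy."
markup = render_entities(text, [(0, 11, "PER"), (25, 29, "LOC")], {"PER": "#800080"})
print(markup)
```

Labels without an entry in `colors` fall back to `default_color` in this sketch; the real visualizer has its own way of choosing colors.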

In [3]:
example = [
           """One of the most prominent centers of the Renaissance was Florence, Italy, which produced many of the era's most influential painters. 
           Among these was Sandro Botticelli, whose masterpieces include "The Birth of Venus" and "Primavera," both of which feature mythological 
           themes and strikingly beautiful figures. Another great Renaissance painter from Florence was Michelangelo, known for his stunning frescoes 
           on the ceiling of the Sistine Chapel, as well as his sculptures such as the "David" and "Pieta."""
]

### Using a **Pretrained Pipeline** - [Recognize Entities DL Pipeline](https://nlp.johnsnowlabs.com/2021/03/23/recognize_entities_dl_en.html)

A Pretrained Pipeline in Spark NLP is a pre-built NLP pipeline that allows users to quickly perform a variety of text analysis tasks without having to build the pipeline from scratch.

**`recognize_entities_dl`** is a [pretrained pipeline](https://nlp.johnsnowlabs.com/2021/03/23/recognize_entities_dl_en.html) that runs the basic text processing steps (document assembly, sentence detection, tokenization, and embeddings) and then recognizes named entities.

In [16]:
pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')

recognize_entities_dl download started this may take some time.
Approx size to download 160.1 MB
[OK!]


In [17]:
ppres = pipeline.fullAnnotate(example)[0]
ppres.keys()

dict_keys(['entities', 'document', 'token', 'ner', 'embeddings', 'sentence'])
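Before choosing colors or label filters, it can help to check which labels actually occur in the result. The helper below is a small sketch, not part of Spark NLP Display: `count_labels` is a hypothetical name, and it assumes each chunk in the `entities` column carries its label under `metadata["entity"]`. The `sample` dictionary is an invented stand-in so the snippet runs without Spark.

```python
from collections import Counter
from types import SimpleNamespace


def count_labels(result, label_col="entities"):
    """Count how often each entity label appears in one fullAnnotate() result."""
    return Counter(chunk.metadata["entity"] for chunk in result[label_col])


# Invented stand-in for a single fullAnnotate() result item
sample = {
    "entities": [
        SimpleNamespace(result="Florence", metadata={"entity": "LOC"}),
        SimpleNamespace(result="Italy", metadata={"entity": "LOC"}),
        SimpleNamespace(result="Sandro Botticelli", metadata={"entity": "PER"}),
    ]
}

print(count_labels(sample))
```

With the real pipeline output, you would pass `ppres` instead of `sample`.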

#### Observe the effects of changing some parameters on the visuals and save the results as an HTML file.

In [18]:
from sparknlp_display import NerVisualizer
visualiser = NerVisualizer()

# Render the entities and also save the visualisation as an HTML file
visualiser.display(ppres, label_col='entities', document_col='document', save_path="display_recognize_entities.html")




#### Assign a color to an entity.

In [19]:
visualiser.set_label_colors({'LOC':'#008080', 'PER':'#800080'})
visualiser.display(ppres, label_col='entities')
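The color mapping above uses 6-digit hex codes. If you build such a mapping programmatically, a quick sanity check before passing it on can catch typos early. This is only a sketch: `check_label_colors` is a hypothetical helper, and restricting colors to the `#RRGGBB` form is an assumption based on the example above, not a documented requirement of the library.

```python
import re

# Assumption: colors are written as '#RRGGBB', as in the cell above
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def check_label_colors(label_colors):
    """Return the mapping unchanged, or raise if a value is not a 6-digit hex color."""
    bad = {label: color for label, color in label_colors.items()
           if not HEX_COLOR.fullmatch(color)}
    if bad:
        raise ValueError(f"Not 6-digit hex colors: {bad}")
    return label_colors


colors = check_label_colors({'LOC': '#008080', 'PER': '#800080'})
print(colors)
```

The checked dictionary could then be passed to `visualiser.set_label_colors(colors)`.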

#### Set a label filter - just show the entities labelled as "PER".



In [20]:
visualiser.display(ppres, label_col='entities', document_col='document',
                   labels=['PER'])

print()
print('color code for label "PER": ' + visualiser.get_label_color('PER'))


color code for label "PER": #800080


### Using a **Pretrained NER Model** - [ner_dl](https://nlp.johnsnowlabs.com/2020/03/19/ner_dl_en.html)

A pretrained model in Spark NLP is a machine learning or deep learning model that has been trained on a large dataset to perform a specific NLP task, such as named entity recognition, part-of-speech tagging, sentiment analysis, or text classification.

Spark NLP provides pretrained models for many languages; the `ner_dl` model loaded below is the English (`'en'`) version.

In [21]:
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained('glove_100d') \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

public_ner = NerDLModel.pretrained('ner_dl', 'en') \
    .setInputCols(["document", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("entities")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    tokenizer,
    embeddings,
    public_ner,
    ner_converter
])

empty_df = spark.createDataFrame([['']]).toDF("text")

pipelineModel = nlpPipeline.fit(empty_df)
lmodel = LightPipeline(pipelineModel)

glove_100d download started this may take some time.
Approximate size to download 145.3 MB
[OK!]
ner_dl download started this may take some time.
Approximate size to download 13.6 MB
[OK!]


### **🔦 Light Pipeline Results**

[LightPipeline](https://nlp.johnsnowlabs.com/docs/en/concepts#using-spark-nlps-lightpipeline) is a Spark NLP-specific pipeline class equivalent to the Spark ML Pipeline.

The difference is that its execution does not follow Spark's distributed execution principles; instead, it computes everything locally (but in parallel) in order to achieve **fast results** when dealing with **small amounts of data**.

In [22]:
cpres = lmodel.fullAnnotate(example)[0]
cpres.keys()

dict_keys(['entities', 'document', 'token', 'ner', 'embeddings'])

In [23]:
from sparknlp_display import NerVisualizer
visualiser = NerVisualizer()

# Display the LightPipeline result (cpres) and save the visualisation as an HTML file
visualiser.display(cpres, label_col='entities', document_col='document', save_path="display_result_ner.html")




#### Assign a color to an entity.

In [24]:
visualiser.set_label_colors({'LOC':'#008080', 'PER':'#800080'})
visualiser.display(cpres, label_col='entities')

#### Set a label filter - just show the entities labelled as "LOC".

In [25]:
visualiser.display(cpres, label_col='entities', document_col='document',
                   labels=['LOC'])