# Named Entity Visualization
In this notebook, we first build an NER pipeline, then inspect its output in two ways: with a plain DataFrame query and with the Spark NLP NER visualizer. Comparing the two shows what the visualizer adds.

First, we set up the required packages and libraries.

In [4]:
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/colab_setup.sh -O - | bash

--2021-10-23 16:14:49--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/colab_setup.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1608 (1.6K) [text/plain]
Saving to: ‘STDOUT’


2021-10-23 16:14:50 (33.1 MB/s) - written to stdout [1608/1608]

setup Colab for PySpark 3.0.2 and Spark NLP 3.1.0

In [5]:
import sparknlp
spark= sparknlp.start()

In [6]:
from sparknlp.annotator import *
from sparknlp.base import *
from pyspark.ml import Pipeline
from pyspark.sql import functions as F
import pandas as pd

Next, we create the annotators and pretrained models and chain them into a pipeline.

In [12]:
# Wrap the raw "text" column in a Spark NLP document annotation
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Correct misspelled tokens; downstream stages read "checked" instead of "token"
spell_checker = ContextSpellCheckerModel.pretrained()\
    .setInputCols(["token"])\
    .setOutputCol("checked")

# Look up a 100-dimensional GloVe vector for every corrected token
word_embedding = WordEmbeddingsModel.pretrained("glove_100d")\
    .setInputCols(["document", "checked"])\
    .setOutputCol("embeddings")

# Tag every token with an IOB label (B-PERSON, I-PERSON, O, ...)
onto_ner = NerDLModel.pretrained("onto_100", "en")\
    .setInputCols(["document", "checked", "embeddings"])\
    .setOutputCol("ner")

# Merge consecutive IOB tags into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["document", "checked", "ner"])\
    .setOutputCol("entities")

nlp_pipeline = Pipeline(stages=[
    documentAssembler,
    tokenizer,
    spell_checker,
    word_embedding,
    onto_ner,
    ner_converter
])

spellcheck_dl download started this may take some time.
Approximate size to download 111.4 MB
[OK!]
glove_100d download started this may take some time.
Approximate size to download 145.3 MB
[OK!]
onto_100 download started this may take some time.
Approximate size to download 13.5 MB
[OK!]
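
The last two stages work as a pair: `NerDLModel` emits one IOB tag per token, and `NerConverter` merges consecutive tags into entity chunks. The grouping idea can be sketched in plain Python as below. This is a simplified illustration, not Spark NLP's implementation: `iob_to_chunks` is a hypothetical helper, and it joins tokens with spaces, whereas the real converter also tracks character offsets and metadata.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, label) pairs.

    Hypothetical helper for illustration only; not part of Spark NLP.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the chunk being built, if there is one.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            # Continuation of the chunk that is currently open.
            current_tokens.append(token)
            continue
        flush()
        if prefix in ("B", "I"):
            # "B-" opens a new chunk; a stray "I-" is treated the same way.
            current_tokens, current_label = [token], label
    flush()
    return chunks


# Tokens and tags taken from the first rows of this notebook's result table.
tokens = ["Wesley", "Sneijder", "is", "a", "great", "player"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "O"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

Two tagged tokens collapse into a single `PERSON` chunk, which is why the `entities` column has far fewer elements than the `token` column.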


Fitting the pipeline. Every stage is either pretrained or needs no training data, so a placeholder DataFrame is enough.

In [83]:
empty_df= spark.createDataFrame([[" "]]).toDF("text")
model= nlp_pipeline.fit(empty_df)

In [158]:
example= ["""Wesley Sneijder is a great player and he has numbers of achievements such as a World Cup, a UEFA Champions League title in his career.
             However, he declared his retirement last week on BBCSport livestream"""] #sample text data
df= spark.createDataFrame([example]).toDF("text")

With the classic method, we call `transform` on the fitted model to annotate the DataFrame.

In [159]:
ner_result= model.transform(df)
ner_result.columns

['text', 'document', 'token', 'checked', 'embeddings', 'ner', 'entities']

Inspecting the results with the classic method.

In [160]:
result_df=  ner_result.select(F.explode(F.arrays_zip("token.result", "checked.result", "ner.result", "entities.result")).alias("col"))\
                .select(F.expr("col['0']").alias("token"),
                        F.expr("col['1']").alias("spell_checked"),
                        F.expr("col['2']").alias("ner"),
                        F.expr("col['3']").alias("entities"))
result_df.show(truncate=False)

+------------+-------------+--------+---------------------+
|token       |spell_checked|ner     |entities             |
+------------+-------------+--------+---------------------+
|Wesley      |Wesley       |B-PERSON|Wesley Sneijder      |
|Sneijder    |Snider       |I-PERSON|a World Cup          |
|is          |is           |O       |UEFA Champions League|
|a           |a            |O       |last week            |
|great       |great        |O       |BBCSport             |
|player      |player       |O       |null                 |
|and         |and          |O       |null                 |
|he          |he           |O       |null                 |
|has         |has          |O       |null                 |
|numbers     |numbers      |O       |null                 |
|of          |of           |O       |null                 |
|achievements|achievements |O       |null                 |
|such        |such         |O       |null                 |
|as          |as           |O       |nul
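
A caution when reading the table above: the `entities` column is not aligned with the token rows. `token`, `checked` and `ner` each hold one element per token, while `entities` holds one element per chunk, so `arrays_zip` pairs them purely by position and pads the shorter array with `null`. That is why "a World Cup" appears next to "Sneijder". The same effect can be reproduced in plain Python with `itertools.zip_longest`, used here only as a stand-in for the positional pairing:

```python
from itertools import zip_longest

# Values copied from the first rows of the table above.
tokens = ["Wesley", "Sneijder", "is", "a", "great", "player"]
entities = [
    "Wesley Sneijder",
    "a World Cup",
    "UEFA Champions League",
    "last week",
    "BBCSport",
]

# Pair by position; the shorter list is padded with None,
# which mirrors the null padding seen in the Spark output.
rows = list(zip_longest(tokens, entities))
for token, entity in rows:
    print(f"{token:<10} | {entity}")
```

Exploding `entities` in a separate `select` avoids the misleading pairing, at the cost of showing the chunks in their own table.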

We created the pipeline and the fitted model and inspected the results. <br/>
Now we will view the same result with Spark NLP's **LightPipeline** and the **spark-nlp-display** package.

Creating a LightPipeline and annotating the example text.

In [161]:
lp= LightPipeline(model)
lp_result= lp.fullAnnotate(example)[0]
lp_result.keys()

dict_keys(['entities', 'checked', 'document', 'token', 'ner', 'embeddings'])
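
Each key of `lp_result` maps to a list of annotation objects. A minimal sketch of flattening the `entities` list into plain dictionaries follows. It assumes each annotation exposes `result`, `begin`, `end` and `metadata` attributes, that the chunk label is stored under the `"entity"` metadata key, and that `end` is inclusive. `entity_rows` is a hypothetical helper, and `SimpleNamespace` stands in for a real annotation so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def entity_rows(annotations):
    """Flatten chunk annotations into plain dicts.

    Hypothetical helper; assumes .result, .begin, .end and .metadata
    attributes on every annotation object.
    """
    return [
        {
            "chunk": a.result,
            "begin": a.begin,
            "end": a.end,  # assumed to be an inclusive character offset
            "label": a.metadata.get("entity"),
        }
        for a in annotations
    ]


# Stand-in for lp_result["entities"]; the offsets are computed by hand
# from the example text, not copied from real output.
sample_entities = [
    SimpleNamespace(
        result="Wesley Sneijder",
        begin=0,
        end=14,
        metadata={"entity": "PERSON"},
    )
]

rows = entity_rows(sample_entities)
print(rows)
```

In the notebook itself, the call would be `entity_rows(lp_result["entities"])`, and the resulting list can be passed to `pd.DataFrame` for display.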

Finally, we create the visualizer and display the result.

In [None]:
!pip install spark-nlp-display

In [None]:
from sparknlp_display import NerVisualizer

In [162]:
visualizer= NerVisualizer()
visualizer.display(lp_result,
                   label_col="entities",
                   document_col="document")
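
To get an intuition for what such a visualizer does, here is a rough sketch that wraps entity spans in HTML `<mark>` tags using the character offsets carried by the chunk annotations. It is not the `spark-nlp-display` implementation: `highlight` is a hypothetical function, spans are assumed to be non-overlapping, and `end` is assumed to be inclusive.

```python
import html


def highlight(text, spans):
    """Return HTML with each (begin, end_inclusive, label) span highlighted.

    Hypothetical helper; assumes the spans do not overlap.
    """
    parts = []
    cursor = 0
    for begin, end, label in sorted(spans):
        # Plain text between the previous entity and this one.
        parts.append(html.escape(text[cursor:begin]))
        # The entity itself, followed by its label.
        entity = html.escape(text[begin:end + 1])
        parts.append(f"<mark>{entity} <b>{html.escape(label)}</b></mark>")
        cursor = end + 1
    # Whatever follows the last entity.
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


markup = highlight("Wesley Sneijder is a great player", [(0, 14, "PERSON")])
print(markup)
```

In a notebook, passing `markup` to `IPython.display.HTML` renders the highlighted text inline.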