![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/english/pretrained-pipelines/Explain%20Document%20DL.ipynb)

# Explain Documents with Deep Learning

In [None]:
# Only run this cell when you are using Spark NLP on Google Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

First, we import Spark NLP and start a Spark session.

In [None]:
import sparknlp

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

Spark NLP version:  4.3.1
Apache Spark version:  3.3.0


In [None]:
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.base import *

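Now we load a pretrained pipeline model, which contains the following annotators:
Tokenizer, Deep Sentence Detector, Lemmatizer, Stemmer, Part of Speech (POS) tagger, and Context Spell Checker. Judging by the output keys listed further down, it also produces word embeddings, NER tags, and entity chunks.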

In [None]:
pipeline = PretrainedPipeline('explain_document_dl')

explain_document_dl download started this may take some time.
Approx size to download 169.4 MB
[OK!]


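We simply pass our text (a string) to `annotate` and the pipeline does the rest. Note that the sample text contains two misspellings ("beautful" and "wth"), which exercise the spell checker.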

In [None]:
text = 'He would love to visit many beautful cities wth you. He lives in an amazing country.'
result = pipeline.annotate(text)

The result is a dictionary with one entry per annotator output, and we inspect several of them below. This single pipeline does many things at once!

In [None]:
list(result.keys())

['entities',
 'stem',
 'checked',
 'lemma',
 'document',
 'pos',
 'token',
 'ner',
 'embeddings',
 'sentence']
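To get a feel for how these keys relate, it can help to compare how many items each one holds. The sketch below does not call Spark NLP. It uses a hand-copied subset of the values printed in this notebook (`sentence`, `lemma`, `checked`, `pos`), and the helper name `lengths_by_key` is hypothetical, introduced only for illustration.

```python
# Subset of the pipeline result, copied by hand from the outputs printed in
# this notebook. The other keys are omitted because their values are not shown.
result_subset = {
    "sentence": [
        "He would love to visit many beautful cities wth you.",
        "He lives in an amazing country.",
    ],
    "lemma": ["He", "would", "love", "to", "visit", "many", "beautiful", "city",
              "wth", "you", ".", "He", "life", "in", "an", "amazing", "country", "."],
    "checked": ["He", "would", "love", "to", "visit", "many", "beautiful", "cities",
                "wth", "you", ".", "He", "lives", "in", "an", "amazing", "country", "."],
    "pos": ["PRP", "MD", "VB", "TO", "VB", "JJ", "JJ", "NNS", "NN", "PRP", ".",
            "PRP", "VBZ", "IN", "DT", "JJ", "NN", "."],
}


def lengths_by_key(result):
    """Return the number of items stored under each output key."""
    return {key: len(values) for key, values in result.items()}


lengths = lengths_by_key(result_subset)
print(lengths)
```

In this subset, `sentence` holds one item per sentence while `lemma`, `checked`, and `pos` hold one item per token, which is why the token-level lists can be zipped together position by position.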

In [None]:
result['sentence']

['He would love to visit many beautful cities wth you.',
 'He lives in an amazing country.']

In [None]:
result['lemma']

['He',
 'would',
 'love',
 'to',
 'visit',
 'many',
 'beautiful',
 'city',
 'wth',
 'you',
 '.',
 'He',
 'life',
 'in',
 'an',
 'amazing',
 'country',
 '.']
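One way to see what the lemmatizer actually changed is to compare the lemmas with the spell-checked tokens position by position. The following is a minimal sketch that works on values copied from this notebook's printed outputs rather than on a live pipeline. The helper `changed_pairs` is a hypothetical name, not part of Spark NLP.

```python
# Spell-checked tokens and lemmas, copied from the outputs in this notebook.
checked = ["He", "would", "love", "to", "visit", "many", "beautiful", "cities",
           "wth", "you", ".", "He", "lives", "in", "an", "amazing", "country", "."]
lemma = ["He", "would", "love", "to", "visit", "many", "beautiful", "city",
         "wth", "you", ".", "He", "life", "in", "an", "amazing", "country", "."]


def changed_pairs(before, after):
    """Return (before, after) pairs for positions where the two lists differ."""
    if len(before) != len(after):
        raise ValueError("both lists must have the same number of tokens")
    return [(b, a) for b, a in zip(before, after) if b != a]


lemma_changes = changed_pairs(checked, lemma)
print(lemma_changes)
# [('cities', 'city'), ('lives', 'life')]
```

The second pair is worth a look: "lives" is used as a verb in the sentence, yet it is mapped to the noun "life". This suggests the lemma is chosen without regard to the part-of-speech tag, although the notebook itself does not confirm how the lemmatizer works internally.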

In [None]:
list(zip(result['checked'], result['pos']))

[('He', 'PRP'),
 ('would', 'MD'),
 ('love', 'VB'),
 ('to', 'TO'),
 ('visit', 'VB'),
 ('many', 'JJ'),
 ('beautiful', 'JJ'),
 ('cities', 'NNS'),
 ('wth', 'NN'),
 ('you', 'PRP'),
 ('.', '.'),
 ('He', 'PRP'),
 ('lives', 'VBZ'),
 ('in', 'IN'),
 ('an', 'DT'),
 ('amazing', 'JJ'),
 ('country', 'NN'),
 ('.', '.')]
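Once tokens and tags are paired like this, filtering by tag is straightforward. The sketch below assumes the tags follow the Penn Treebank convention (which the values above appear to do) and selects words by tag prefix. It runs on pairs copied from the output above, and `words_with_tag_prefix` is a hypothetical helper name.

```python
# (token, POS tag) pairs copied from the output above.
tagged = [("He", "PRP"), ("would", "MD"), ("love", "VB"), ("to", "TO"),
          ("visit", "VB"), ("many", "JJ"), ("beautiful", "JJ"), ("cities", "NNS"),
          ("wth", "NN"), ("you", "PRP"), (".", "."), ("He", "PRP"),
          ("lives", "VBZ"), ("in", "IN"), ("an", "DT"), ("amazing", "JJ"),
          ("country", "NN"), (".", ".")]


def words_with_tag_prefix(pairs, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in pairs if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, "NN")       # NN, NNS, ...
adjectives = words_with_tag_prefix(tagged, "JJ")  # JJ, JJR, JJS

print(nouns)       # ['cities', 'wth', 'country']
print(adjectives)  # ['many', 'beautiful', 'amazing']
```

Note that "wth" was left uncorrected by the spell checker and ends up tagged as a noun, so results filtered this way inherit any upstream errors.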