![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# Use pretrained `explain_document_ml` Pipeline

### Stages

 * DocumentAssembler
 * SentenceDetector
 * Tokenizer
 * Lemmatizer
 * Stemmer
 * Part of Speech
 * SpellChecker (Norvig)


In [None]:
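import sparknlp
# PretrainedPipeline is used below to download and load the pipeline
from sparknlp.pretrained import PretrainedPipeline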

### Let's create a Spark Session for our app

In [None]:
spark = sparknlp.start()

#### This is our test document. We'll use it to illustrate each pipeline stage.

In [None]:
testDoc = [
"French author who helped pioner the science-fiction genre. \
Verne wrate about space, air, and underwater travel before \
navigable aircrast and practical submarines were invented, \
and before any means of space travel had been devised. "
]

In [None]:
pipeline = PretrainedPipeline('explain_document_ml')

#### We are not handling big datasets here, so let's annotate through a LightPipeline for speed.

In [None]:
result = pipeline.annotate(testDoc)
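
When `annotate` is given a list of strings, it returns one dictionary per input document, and each dictionary maps an output column name to a list of strings. Counting the entries under each key is a quick way to get oriented. The sketch below uses a hypothetical helper, `summarize`, and a hand-written stand-in for `result` so that it runs without Spark; the keys and values shown are illustrative, not actual pipeline output.

```python
def summarize(result):
    """For each document, map every output key to its number of annotations."""
    return [{key: len(values) for key, values in doc.items()} for doc in result]

# Hand-written stand-in for `result`, NOT actual pipeline output.
sample_result = [
    {"sentence": ["Verne wrote novels."], "token": ["Verne", "wrote", "novels", "."]}
]

print(summarize(sample_result))
```

In the notebook itself the call would simply be `summarize(result)`.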

#### Let's analyze these results. First, let's see which sentences were detected

In [None]:
[content['sentence'] for content in result]

#### Now let's see how those sentences were tokenized

In [None]:
[content['token'] for content in result]


#### Notice some spelling errors? The pipeline takes care of those as well

In [None]:
[content['spell'] for content in result]
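
To see exactly which tokens the spell checker changed, the original and corrected lists can be compared position by position. This is a minimal sketch that assumes the two lists are aligned, with one corrected form per token. `changed_tokens` is a hypothetical helper, and the sample lists are hand-written stand-ins rather than real pipeline output.

```python
def changed_tokens(tokens, corrected):
    """Return (original, corrected) pairs where the two aligned lists differ."""
    return [(t, c) for t, c in zip(tokens, corrected) if t != c]

# Hand-written stand-in data, NOT actual pipeline output.
sample_tokens = ["helped", "pioner", "the", "genre"]
sample_spell = ["helped", "pioneer", "the", "genre"]

print(changed_tokens(sample_tokens, sample_spell))
```

With the notebook's data, the call would be `changed_tokens(result[0]['token'], result[0]['spell'])`, provided the alignment assumption holds.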

#### Now let's see the lemmas

In [None]:
[content['lemmas'] for content in result]

#### Let's check the stems. Any difference from the lemmas shown before?

In [None]:
[content['stems'] for content in result]

#### Now it's the turn of Part of Speech (POS)

In [None]:
pos = [content['pos'] for content in result]
token = [content['token'] for content in result]
# let's put token and tag together
list(zip(token[0], pos[0]))
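
Once tokens and tags are paired, it can be handy to group the tokens by tag, for example to list all nouns at once. The sketch below is one possible way to do that; `group_by_tag` is a hypothetical helper, and the sample tokens and tags are hand-written stand-ins, not actual pipeline output.

```python
from collections import defaultdict

def group_by_tag(tokens, tags):
    """Map each POS tag to the tokens that received it, preserving token order."""
    groups = defaultdict(list)
    for token, tag in zip(tokens, tags):
        groups[tag].append(token)
    return dict(groups)

# Hand-written stand-in data, NOT actual pipeline output.
sample_tokens = ["Verne", "wrote", "about", "space", "travel"]
sample_tags = ["NNP", "VBD", "IN", "NN", "NN"]

print(group_by_tag(sample_tokens, sample_tags))
```

In the notebook, the equivalent call would be `group_by_tag(token[0], pos[0])`.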