
## Running Pretrained Models

In the following example, we walk through several use cases for our pretrained models and pipelines, which can be used off the shelf.

The BasicPipeline returns tokens, normalized tokens, lemmas, and part-of-speech tags. The AdvancedPipeline returns everything the BasicPipeline does, plus stems, spell-checked tokens, and NER entities from the CRF model. All pipelines and pretrained models are downloaded from the internet at run time, so internet access is required.

### Spark `2.4` and Spark NLP `1.8.2`

#### 1. Import the required modules and create the Spark session

In [None]:
import os
import sys
sys.path.append('../../')

print(sys.version)

from sparknlp.pretrained import ResourceDownloader
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import *

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline

spark = SparkSession.builder \
    .appName("Model Downloader")\
    .master("local[*]")\
    .config("spark.driver.memory","4G")\
    .config("spark.driver.maxResultSize", "2G")\
    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2")\
    .config("spark.kryoserializer.buffer.max", "500m")\
    .getOrCreate()

#### 2. Create a dummy Spark DataFrame

In [None]:

l = [
  (1,'To be or not to be'),
  (2,'This is it!')
]

data = spark.createDataFrame(l, ['docID','text'])

#### 3. Use the predefined BasicPipeline to annotate a DataFrame

In [None]:
# download predefined - pipelines
from sparknlp.pretrained import PretrainedPipeline

basic_pipeline = PretrainedPipeline("pipeline_basic")
basic_data = basic_pipeline.annotate(data, 'text') 
basic_data.show()

#### We can also annotate a single string

In [None]:
# annotate quickly from a string
basic_pipeline.annotate("This world is made up of good and bad things")
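When called on a string, `annotate` returns a dictionary that maps each output column to a list of strings. The sketch below shows one way to line up two of those lists for reading. The key names `normal` and `pos` are assumptions taken from the column names used elsewhere in this notebook, the helper `pair_columns` is hypothetical, and the sample values are written by hand for illustration; they are not captured output.

```python
# Hand-written stand-in for the dict returned by annotate() on a string.
# The keys and values are illustrative assumptions, not real output.
sample = {
    "normal": ["This", "world", "is", "made"],
    "pos": ["DT", "NN", "VBZ", "VBN"],
}

def pair_columns(annotations, left, right):
    """Pair two annotation lists position by position.

    zip() stops at the shorter list, so any unmatched trailing
    items are silently dropped.
    """
    return list(zip(annotations[left], annotations[right]))

pairs = pair_columns(sample, "normal", "pos")
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

With a real result, the same helper could be applied to the dictionary returned by `basic_pipeline.annotate(...)`, provided the two chosen columns are aligned one-to-one.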

#### 4. Now we use one of the fast pretrained models: the Perceptron model, a POS tagger trained on the ANC (American National Corpus)

In [None]:

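# set the output column explicitly so the next stage can find "document"
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")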

sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")
    
# download directly - models
pos = PerceptronModel.pretrained() \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("pos")
    
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, pos])

output = pipeline.fit(data).transform(data)
output.show()
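`show()` truncates the annotation columns, so it can help to pull out just the predicted strings. The sketch below assumes each annotation is a struct with a `result` field that holds the predicted value; the `Annotation` namedtuple, its field list, and the helper `results_of` are hypothetical stand-ins so the example runs without Spark.

```python
from collections import namedtuple

# Stand-in for one annotation struct. The field list is an assumption
# made for illustration, not a definition taken from the library.
Annotation = namedtuple(
    "Annotation", ["annotatorType", "begin", "end", "result", "metadata"]
)

def results_of(annotations):
    """Return only the 'result' value of each annotation in a list."""
    return [a.result for a in annotations]

# Hand-written example for the text 'To be'; the tags are illustrative.
fake_pos = [
    Annotation("pos", 0, 1, "TO", {}),
    Annotation("pos", 3, 4, "VB", {}),
]
tags = results_of(fake_pos)
print(tags)
```

Because PySpark `Row` objects allow attribute access to named fields, the same helper should work on collected rows, for example `results_of(row.pos)` for each `row` in `output.select("pos").collect()`.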

#### 5. Next, we download a fast CRF Named Entity Recognition model trained with GloVe embeddings. Then we retrieve the Basic Pipeline and combine the two, setting each model's input columns to meet its requirements.

In [None]:
# download predefined - models
pos = PerceptronModel.pretrained() \
     .setInputCols(["document", "normal"])\
     .setOutputCol("pos")

ner = NerCrfModel.pretrained()
ner.setInputCols(["pos", "normal", "document"]).setOutputCol("ner")

annotation_data = basic_pipeline.transform(data)

pos_tagged = pos.transform(annotation_data)
ner_tagged = ner.transform(pos_tagged)
ner_tagged.show()

#### 6. Finally, let's try a pretrained sentiment analysis pipeline

In [None]:
PretrainedPipeline("pipeline_vivekn").annotate("This is a good movie!!!")