![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/spell-check-ml-pipeline/Pretrained-SpellCheckML-Pipeline.ipynb)

## 0. Colab Setup

In [None]:
# This cell only sets up PySpark and Spark NLP on Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

# Use the pretrained `check_spelling` Pipeline

The pipeline contains the following stages:
* DocumentAssembler
* SentenceDetector
* Tokenizer
* NorvigSweetingApproach
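
The last stage is named after Peter Norvig's spelling corrector. As a rough illustration of that idea only (not Spark NLP's actual implementation), the sketch below generates every string one edit away from a word and picks the most frequent candidate found in a dictionary. The vocabulary `WORD_COUNTS`, its frequencies, and the helper names `edits1` and `correct` are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical toy vocabulary with made-up frequencies
# (an assumption for illustration, not Spark NLP's dictionary).
WORD_COUNTS = Counter({
    "yesterday": 50, "unicorn": 5, "was": 900,
    "example": 120, "wrong": 80, "wrote": 40,
})

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the input."""
    lower = word.lower()
    if lower in WORD_COUNTS:
        return word  # already a known word, keep it as typed
    candidates = [w for w in edits1(lower) if w in WORD_COUNTS]
    if not candidates:
        return word  # nothing close enough in the toy vocabulary
    # Note: the suggestion is lowercase; this sketch does not restore casing.
    return max(candidates, key=WORD_COUNTS.get)


for typo in ["unikorn", "wass", "exampe", "wrog"]:
    print(typo, "->", correct(typo))
```

This sketch only looks one edit away and ignores context, so a production spell checker will behave differently on harder misspellings.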


In [None]:
import sys

# Spark ML and SQL
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql.functions import array_contains
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Spark NLP
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.annotator import *
from sparknlp.common import RegexRule
from sparknlp.base import DocumentAssembler, Finisher

### Let's create a Spark Session for our app

In [None]:
spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)


Spark NLP version:  2.5.0
Apache Spark version:  2.4.4


In [None]:
pipeline = PretrainedPipeline('check_spelling', lang='en')

check_spelling download started this may take some time.
Approx size to download 892.6 KB
[OK!]


In [None]:
result = pipeline.annotate("Yestarday I lost my blue unikorn and I wass really sad! This is an exampe of how wrog my english is.")

In [None]:
list(zip(result['token'], result['checked']))

[('Yestarday', 'Yesterday'),
 ('I', 'I'),
 ('lost', 'lost'),
 ('my', 'my'),
 ('blue', 'blue'),
 ('unikorn', 'unicorn'),
 ('and', 'and'),
 ('I', 'I'),
 ('wass', 'was'),
 ('really', 'really'),
 ('sad', 'sad'),
 ('!', '!'),
 ('This', 'This'),
 ('is', 'is'),
 ('an', 'an'),
 ('exampe', 'example'),
 ('of', 'of'),
 ('how', 'how'),
 ('wrog', 'wrong'),
 ('my', 'my'),
 ('english', 'english'),
 ('is', 'is'),
 ('.', '.')]
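
Most pairs in this list are identical, so it can help to keep only the tokens that were changed. The following is a minimal sketch in plain Python. The `sample_result` dictionary is a hand-copied excerpt of the output above, standing in for the `result` returned by `pipeline.annotate`, so the snippet runs without Spark.

```python
# Hand-copied excerpt of the annotate() output shown above (illustrative stand-in).
sample_result = {
    "token":   ["Yestarday", "I", "lost", "my", "blue", "unikorn", "and", "I", "wass"],
    "checked": ["Yesterday", "I", "lost", "my", "blue", "unicorn", "and", "I", "was"],
}

# Keep only the (original, corrected) pairs where the checker changed the token.
corrections = [
    (original, fixed)
    for original, fixed in zip(sample_result["token"], sample_result["checked"])
    if original != fixed
]

print(corrections)
# [('Yestarday', 'Yesterday'), ('unikorn', 'unicorn'), ('wass', 'was')]
```

In the notebook itself, the same comprehension should work on `result` directly, assuming it exposes the `token` and `checked` keys used in the cell above.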

The `check_spelling` pipeline corrected `Yestarday`, `unikorn`, `wass`, `exampe`, and `wrog` to `Yesterday`, `unicorn`, `was`, `example`, and `wrong`. It left the lowercase `english` unchanged.
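
The pipeline returns corrected tokens, not a corrected sentence. If a single string is needed, one naive option is to join the tokens with spaces and attach punctuation to the preceding word. The helper `detokenize` below is a hypothetical illustration, not part of Spark NLP, and its token list is hand-copied from the output above.

```python
import string

PUNCTUATION = set(string.punctuation)


def detokenize(tokens):
    """Join tokens with spaces, attaching punctuation tokens to the previous word."""
    text = ""
    for token in tokens:
        # Add a separating space unless this is the first token or a punctuation mark.
        if text and token not in PUNCTUATION:
            text += " "
        text += token
    return text


# Corrected tokens hand-copied from the first sentence of the output above.
checked = ["Yesterday", "I", "lost", "my", "blue", "unicorn",
           "and", "I", "was", "really", "sad", "!"]

print(detokenize(checked))
# Yesterday I lost my blue unicorn and I was really sad!
```

This rule is deliberately simple: it would misplace opening brackets and quotation marks, which belong with the following word.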