![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start.ipynb)

# Spark NLP Quick Start
### How to use Spark NLP pretrained pipelines

In [24]:
import os

# Install java
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! java -version

# Install pyspark
! pip install --ignore-installed -q pyspark==2.4.4

# Install Spark NLP
! pip install --ignore-installed -q spark-nlp==2.5.0

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)


In [25]:
import sparknlp 

spark = sparknlp.start()

print("Spark NLP version: ", sparknlp.version())
print("Apache Spark version: ", spark.version)

Spark NLP version:  2.5.0
Apache Spark version:  2.4.4


In [0]:
from sparknlp.pretrained import PretrainedPipeline 

Let's use a Spark NLP pretrained pipeline for `named entity recognition`.

`NOTE`: If you are using `Windows`, use this pipeline instead: `recognize_entities_dl_noncontrib`

In [27]:
pipeline = PretrainedPipeline('recognize_entities_dl', 'en')

recognize_entities_dl download started this may take some time.
Approx size to download 159 MB
[OK!]


In [0]:
result = pipeline.annotate('Google has announced the release of a beta version of the popular TensorFlow machine learning library.') 

In [29]:
print(result['ner'])

['B-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O', 'O', 'O']


In [30]:
print(result['entities'])

['Google', 'TensorFlow']
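To see which token each tag belongs to, the tags can be lined up with the tokens. The following is a minimal sketch in plain Python, not part of the pipeline itself. The `tokens` list is typed in by hand and assumes the sentence is split on whitespace with the final period separated (in the notebook the tokens would presumably come from the annotation result, which is not shown above). The `tags` list is copied from the printed `ner` output, and the helper name `tagged_tokens` is hypothetical.

```python
# Tokens typed by hand (assumption: whitespace tokenization, final period split off).
tokens = ['Google', 'has', 'announced', 'the', 'release', 'of', 'a', 'beta',
          'version', 'of', 'the', 'popular', 'TensorFlow', 'machine',
          'learning', 'library', '.']

# Tags copied from the printed result['ner'] output.
tags = ['B-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O',
        'B-ORG', 'O', 'O', 'O', 'O']


def tagged_tokens(tokens, tags):
    """Return (token, tag) pairs for every token whose tag is not 'O'."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [(tok, tag) for tok, tag in zip(tokens, tags) if tag != 'O']


print(tagged_tokens(tokens, tags))
```

Under these assumptions the two tagged tokens are the same words that `result['entities']` printed.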


Let's use a Spark NLP pretrained pipeline for `sentiment` analysis.

In [31]:
pipeline = PretrainedPipeline('analyze_sentiment', 'en') 

analyze_sentiment download started this may take some time.
Approx size to download 4.9 MB
[OK!]


In [0]:
result = pipeline.annotate('This is a very boring movie. I recommend others to awoid this movie is not good..')

In [33]:
print(result['sentiment'])

['negative', 'negative', 'negative']
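The pipeline returned a list of labels rather than a single one. If one overall label is wanted, a simple option is a majority vote over that list. This is a minimal sketch in plain Python and not something the pipeline does itself: `overall_sentiment` is a hypothetical helper, and `labels` is copied from the output above.

```python
from collections import Counter

# Labels copied from the printed result['sentiment'] output.
labels = ['negative', 'negative', 'negative']


def overall_sentiment(labels):
    """Return the most frequent label, or None for an empty list."""
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


print(overall_sentiment(labels))
```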


In [34]:
print(result['checked'])

['This', 'is', 'a', 'very', 'boring', 'movie', '.', 'I', 'recommend', 'others', 'to', 'avoid', 'this', 'movie', 'is', 'not', 'good', '.', '.']


The word `awoid` has been corrected to `avoid` by the spell checker inside this pipeline.
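To list corrections like this one, the original tokens can be compared with the `checked` tokens position by position. The following is a minimal sketch in plain Python. The `original` list is typed in by hand from the input sentence and assumes the same tokenization as the `checked` output (in the notebook it would presumably come from the annotation result, which is not shown above). The `checked` list is copied from the printed output, and the helper name `corrections` is hypothetical.

```python
# Tokens typed by hand from the input sentence (assumption: same tokenization as 'checked').
original = ['This', 'is', 'a', 'very', 'boring', 'movie', '.', 'I', 'recommend',
            'others', 'to', 'awoid', 'this', 'movie', 'is', 'not', 'good', '.', '.']

# Tokens copied from the printed result['checked'] output.
checked = ['This', 'is', 'a', 'very', 'boring', 'movie', '.', 'I', 'recommend',
           'others', 'to', 'avoid', 'this', 'movie', 'is', 'not', 'good', '.', '.']


def corrections(original, checked):
    """Return (before, after) pairs for every position where the token changed."""
    if len(original) != len(checked):
        raise ValueError("token lists must have the same length")
    return [(before, after) for before, after in zip(original, checked)
            if before != after]


print(corrections(original, checked))
```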