![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/english/openai-completion/OpenAICompletion.ipynb)

## OpenAICompletion in Spark NLP

In this notebook, we'll walk through using OpenAICompletion within Spark NLP.

Spark NLP integrates with several OpenAI APIs. Starting with Spark NLP 5.1.0, the OpenAICompletion and OpenAIEmbeddings transformers are available, so you can call OpenAI models from inside a Spark pipeline and rely on Spark to scale the workload.

## Colab Setup

In [3]:
!wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

Installing PySpark 3.2.3 and Spark NLP 5.2.3
setup Colab for PySpark 3.2.3 and Spark NLP 5.2.3


## Spark NLP Settings

All you need to do is set up your [OpenAI API Key](https://platform.openai.com/docs/api-reference/authentication) and add it to the Spark properties.

In [None]:
print("Enter your OPENAI API Key:")
OPENAI_API_KEY = input()
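
For non-interactive runs, or to avoid echoing the key on screen, the key could instead be read from an environment variable, with a hidden prompt as a fallback. The following is a minimal sketch under stated assumptions: `resolve_api_key` is a hypothetical helper, not part of Spark NLP, and the variable name `OPENAI_API_KEY` is only a common convention.

```python
import os
from getpass import getpass


def resolve_api_key(env=None, prompt=getpass):
    """Return the OpenAI API key from the environment, else ask for it without echo.

    `env` and `prompt` are injectable so the helper can be exercised without
    touching the real environment or blocking on user input.
    """
    env = os.environ if env is None else env
    # Assumption: the key is exported as OPENAI_API_KEY
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed characters, unlike input()
        key = prompt("Enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key
```

With this helper, the cell above would become `OPENAI_API_KEY = resolve_api_key()`; the rest of the notebook stays unchanged.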

In [4]:
from sparknlp.annotator import *
from pyspark.ml import Pipeline
# DocumentAssembler and LightPipeline both live in sparknlp.base
from sparknlp.base import DocumentAssembler, LightPipeline

In [5]:
import sparknlp
# Start Spark with Spark NLP, passing the OpenAI API key as a Spark property
openai_params = {"spark.jsl.settings.openai.api.key": OPENAI_API_KEY}
spark = sparknlp.start(params=openai_params)

In [6]:
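# Wrap the raw "text" column into Spark NLP document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Send each document to the OpenAI completion endpoint as a prompt
openai_completion = OpenAICompletion() \
    .setInputCols("document") \
    .setOutputCol("completion") \
    .setModel("gpt-3.5-turbo-instruct") \
    .setMaxTokens(50)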

# Define the pipeline
pipeline = Pipeline(stages=[
    document_assembler, openai_completion
])

In [7]:
empty_df = spark.createDataFrame([[""]], ["text"])
sample_text = [
    ["Generate a restaurant review."],
    ["Write a review for a local eatery."],
    ["Create a JSON with a review of a dining experience."],
]
sample_df = spark.createDataFrame(sample_text).toDF("text")
sample_df.show()

+--------------------+
|                text|
+--------------------+
|Generate a restau...|
|Write a review fo...|
|Create a JSON wit...|
+--------------------+



In [8]:
pipeline_model = pipeline.fit(empty_df)
completion_df = pipeline_model.transform(sample_df)

In [9]:
completion_df.select("completion").show(truncate=False)

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|completion                                                                                                                                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[{document, 0, 223, \n\nI recently had the pleasure of dining at "Elevate", a trendy new restaurant nestled in the heart of the city. From the moment I walked in, I was greeted with warmth and hospital

## LightPipeline

In [10]:
light_pipeline_openai = LightPipeline(pipeline_model)

In [11]:
light_pipeline_openai.fullAnnotate("Generate a negative review of a movie")

[{'document': [Annotation(document, 0, 36, Generate a negative review of a movie, {}, [])],
  'completion': [Annotation(document, 0, 215, 
   
   I recently watched the movie "The Last Dance" and boy, was I disappointed. To say that this movie was a waste of time would be an understatement. From start to finish, it was an absolute snooze fest.
   
   First of all, {}, [])]}]

In [12]:
light_pipeline_openai.annotate("Generate a negative review of a movie")

{'document': ['Generate a negative review of a movie'],
 'completion': ['\n\nThe movie "Piece of Garbage" was a complete waste of my time. From start to finish, it was filled with terrible acting, a nonsensical plot, and cheap special effects. I couldn\'t believe I actually paid money to']}
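
The completions above start with `\n\n` and stop mid-sentence because the pipeline caps generation at 50 tokens with `setMaxTokens(50)`. Below is a small post-processing sketch for the dictionary returned by `annotate`. Both `clean_completions` and `looks_truncated` are hypothetical helpers, not part of Spark NLP, and the truncation check is only a punctuation heuristic, not a signal reported by the API.

```python
def clean_completions(result, column="completion"):
    """Strip leading/trailing whitespace from every string in result[column]."""
    return [text.strip() for text in result.get(column, [])]


def looks_truncated(text):
    """Heuristic: treat text that does not end in ., ! or ? as cut off."""
    return not text.rstrip().endswith((".", "!", "?"))


# Sample copied from the annotate() output shown above
sample = {
    'document': ['Generate a negative review of a movie'],
    'completion': ['\n\nThe movie "Piece of Garbage" was a complete waste of my time. From start to finish, it was filled with terrible acting, a nonsensical plot, and cheap special effects. I couldn\'t believe I actually paid money to'],
}

for completion in clean_completions(sample):
    if looks_truncated(completion):
        print("Possibly truncated:", completion[-40:])
```

If many completions are flagged, raising the value passed to `setMaxTokens` is the first thing to try.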