# Working with OpenAI GPT models

Like any other web API, OpenAI's API lets you send a request to its servers and get back a response to your query.
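To make the "request and response" idea concrete, here is a minimal sketch of what a chat completion request looks like before it is sent. It only builds the URL, headers and JSON body and does not call the network. The helper name `build_chat_request` and the placeholder key are made up for illustration; the `openai` client used later in this notebook handles all of this for you.

```python
import json

# Public chat completions endpoint of the OpenAI API.
API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key, user_text, model="gpt-3.5-turbo"):
    """Return the URL, headers and JSON body of a chat completion request.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return API_URL, headers, json.dumps(body)


# "sk-placeholder" is not a real key.
url, headers, body = build_chat_request("sk-placeholder", "Hello!")
print(url)
print(body)
```

Sending this with any HTTP client would return a JSON document containing the model's reply, which is what `response.choices[0].message.content` reads later on.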

[<img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6UwyTHKO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t637s1yazyyxfl31ymmq.jpg">](https://res.cloudinary.com/practicaldev/image/fetch/s--6UwyTHKO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t637s1yazyyxfl31ymmq.jpg)


You need to:
* [Set up an account](https://auth0.openai.com/u/signup/identifier?state=hKFo2SBLZVEyMlJSRDNkbWVMUWVYdU5SVGZKQWltY016ek1POaFur3VuaXZlcnNhbC1sb2dpbqN0aWTZIEJxeTRsb191RnZySEV0b2dlYnRZdGNzQWpZdkRWZjI4o2NpZNkgRFJpdnNubTJNdTQyVDNLT3BxZHR3QjNOWXZpSFl6d0Q)
* [Get an API Key](https://platform.openai.com/api-keys)
* Add money!

In [2]:
!pip install --upgrade openai

Collecting openai
  Obtaining dependency information for openai from https://files.pythonhosted.org/packages/38/ae/0a6b73156176c10ff52b94f5444712bcdb8d22dddf68f106c14f0937e390/openai-1.2.4-py3-none-any.whl.metadata
  Using cached openai-1.2.4-py3-none-any.whl.metadata (16 kB)
Collecting anyio<4,>=3.5.0 (from openai)
  Obtaining dependency information for anyio<4,>=3.5.0 from https://files.pythonhosted.org/packages/19/24/44299477fe7dcc9cb58d0a57d5a7588d6af2ff403fdd2d47a246c91a3246/anyio-3.7.1-py3-none-any.whl.metadata
  Using cached anyio-3.7.1-py3-none-any.whl.metadata (4.7 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Using cached distro-1.8.0-py3-none-any.whl (20 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Obtaining dependency information for httpx<1,>=0.23.0 from https://files.pythonhosted.org/packages/82/61/a5fca4a1e88e40969bbd0cf0d981f3aa76d5057db160b94f49603fc18740/httpx-0.25.1-py3-none-any.whl.metadata
  Using cached httpx-0.25.1-py3-none-any.whl.metadata (7.1 kB)
Collecti

In [4]:
from openai import OpenAI
from datasets import load_dataset

import random
import pandas as pd
from pprint import pprint

# I created a local config.py file to manage my secret keys
from config import API_KEY 
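
If you would rather not keep a `config.py` file, one common alternative is to read the key from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`; the helper name `load_api_key` is hypothetical and not part of this notebook.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage, once the variable is set in your shell:
# API_KEY = load_api_key()
```

Either way, the point is the same: keep the key out of the notebook itself so it does not end up in version control.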

In [5]:
client = OpenAI(api_key=API_KEY)

### Text classification with an LLM

In [6]:
# download and cache the dataset:
raw_datasets = load_dataset("imdb")

In [7]:
raw_datasets['train']['text'][1]

'"I Am Curious: Yellow" is a risible and pretentious steaming pile. It doesn\'t matter what one\'s political views are because this film can hardly be taken seriously on any level. As for the claim that frontal male nudity is an automatic NC-17, that isn\'t true. I\'ve seen R-rated films with male nudity. Granted, they only offer some fleeting views, but where are the R-rated films with gaping vulvas and flapping labia? Nowhere, because they don\'t exist. The same goes for those crappy cable shows: schlongs swinging in the breeze but not a clitoris in sight. And those pretentious indie movies like The Brown Bunny, in which we\'re treated to the site of Vincent Gallo\'s throbbing johnson, but not a trace of pink visible on Chloe Sevigny. Before crying (or implying) "double-standard" in matters of nudity, the mentally obtuse should take into account one unavoidably obvious anatomical difference between men and women: there are no genitals on display when actresses appears nude, and the s

In [8]:
raw_datasets['train']['label'][1]

0

## LLM Parameters

- **Number of tokens**: sets a limit on how many tokens are generated.
- **Temperature**: controls the randomness of the LLM's output. The lower the temperature, the more deterministic the response.
- **Top-k**: tells the model to pick the next token from the top `k` tokens in its list, sorted by probability.
- **Top-p**: is similar to `top-k` but picks from the smallest set of top tokens whose probabilities sum to at least `p`.
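
The parameters above can be sketched on a toy distribution in plain Python. This is an illustration of the ideas only, not how any particular model implements sampling; the logits are made up and the function names are hypothetical. Note that the OpenAI chat API used below exposes `temperature`, `top_p` and `max_tokens`, but not a top-k setting.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    biggest = max(scaled)
    exps = [math.exp(x - biggest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens
sharp = softmax_with_temperature(toy_logits, temperature=0.1)
flat = softmax_with_temperature(toy_logits, temperature=2.0)
print(sharp)
print(flat)
print(top_k_filter(flat, k=2))
print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.7))
```

With a low temperature almost all of the probability lands on the top token, which is why the classifier below uses `0.1`.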

In [9]:
params ={
    "model_name": "gpt-3.5-turbo",
    "temperature": 0.1,
    "max_tokens":256
}

def classifier(input_text, parameters, client=client):
    
    
    messages=[
    {"role": "system", "content": "You are a useful assistant for the IMDb website. You should read the movie review submitted by a user below and decide if it is positive or negative. Return the result with 0 or 1 for negative and positive respectively."},
    {"role": "user", "content": input_text}
    ]
    
    response = client.chat.completions.create(
        model=parameters["model_name"],
        messages=messages,
        temperature=parameters["temperature"], 
        max_tokens=parameters["max_tokens"],
    )

    return response.choices[0].message.content

In [10]:
classifier(raw_datasets['train']['text'][1], params)

'0'

I can now run the same query on a random sample of rows and collect the responses.

In [11]:
# select 20 random reviews and their labels
train = raw_datasets['train']
random_idx = random.sample(range(len(train)), 20)

# index the row first so only the selected examples are loaded
sel_text = [train[i]['text'] for i in random_idx]
sel_labels = [train[i]['label'] for i in random_idx]

In [12]:
# turn into a DataFrame to make the data easier to inspect and manipulate
df = pd.DataFrame({'text': sel_text, 'label': sel_labels})

df.head()

| | text | label |
|---|---|---|
| 0 | My guess is that this director/writer had some... | 0 |
| 1 | If you know the story of Grey Owl, you'll love... | 1 |
| 2 | Scarecrow is set in the small American town of... | 0 |
| 3 | Slow and riddled with inaccuracy. Over-looking... | 0 |
| 4 | Who made this film? I love this film? Somebody... | 1 |


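**Pro tip**: **Partial functions**

A partial function wraps an existing function with some of its arguments fixed in advance, so the new function only needs the remaining ones. Here it fixes `parameters`, leaving a one-argument function that `.apply()` can call on each review.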
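
As a small standalone sketch of the idea (the `power` function is a toy example, unrelated to the classifier):

```python
from functools import partial


def power(base, exponent):
    """Raise base to the given exponent."""
    return base ** exponent


# Fix `exponent` so the new function only needs `base`.
square = partial(power, exponent=2)
print(square(5))  # prints 25
```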

In [13]:
from functools import partial

classifier_pd = partial(classifier, parameters=params)


In [14]:
%%time
df['predicted'] = df['text'].apply(classifier_pd)

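You can't get away from data cleaning! The model sometimes replies with words such as "positive" or "negative" instead of the requested digit, so map those back to `1` and `0` before converting the column to numbers.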

In [None]:
df['predicted'] = df['predicted'].apply(lambda x: '1' if "positive" in x else x)
df['predicted'] = df['predicted'].apply(lambda x: '0' if "negative" in x else x)

df[["label", "predicted"]] = df[["label", "predicted"]].apply(pd.to_numeric)
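
The two `lambda` calls above only handle lowercase words. A slightly more defensive variant is sketched below, assuming replies may also be capitalised or wrapped in a sentence. The helper name `normalise_prediction` is hypothetical, and returning `None` for unreadable replies is a design choice made here, not something the notebook does.

```python
import re


def normalise_prediction(raw):
    """Map a raw model reply to 0, 1, or None if it cannot be interpreted."""
    text = raw.strip().lower()
    if "positive" in text:
        return 1
    if "negative" in text:
        return 0
    # Fall back to a standalone 0 or 1 anywhere in the reply.
    match = re.search(r"\b[01]\b", text)
    return int(match.group()) if match else None


print(normalise_prediction("1"))
print(normalise_prediction("Negative."))
print(normalise_prediction("The answer is 0"))
print(normalise_prediction("I am not sure"))
```

It could be applied with `df['predicted'].apply(normalise_prediction)`; rows that come back as `None` would then need to be dropped or inspected before scoring.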

In [None]:
from sklearn.metrics import classification_report

print(classification_report(df["label"], df["predicted"]))
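
To see what the report is measuring, here is a plain-Python sketch of accuracy, precision and recall for the positive class. The labels are made up for illustration and are not results from the run above; `binary_metrics` is a hypothetical helper, not an sklearn function.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels, for illustration only.
toy_true = [0, 1, 1, 0, 1]
toy_pred = [0, 1, 0, 0, 1]
accuracy, precision, recall = binary_metrics(toy_true, toy_pred)
print(accuracy, precision, recall)
```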

# LangChain

LangChain provides many modules that can be used to build language model applications.

In [None]:
!pip install langchain
!pip install typing-inspect==0.8.0
!pip install typing_extensions==4.5.0

In [None]:
from langchain.chat_models import ChatOpenAI

from config import API_KEY 

In [None]:
# LLM Model: The language model is the core reasoning engine.
llm = ChatOpenAI(openai_api_key=API_KEY)


There are two types of language models:

* `LLM`: underlying model takes a string as input and returns a string
* `ChatModel`: underlying model takes a list of messages as input and returns a message

In [None]:
from langchain.schema import HumanMessage

In [None]:
review = raw_datasets['train']['text'][1]

text = f"""
You are a useful assistant for the IMDb website.
You should read the movie review submitted by a user below and decide if it is positive or negative.
Return the result with 0 or 1 for negative and positive respectively.

{review}
"""

messages = [HumanMessage(content=text)]
messages


The simplest way to call an LLM or ChatModel is with `.invoke()`.

In [None]:
llm.invoke(text)


## Prompt template

When you don't want to pass user input directly into an LLM, you can add it to a larger piece of text called a `prompt template`, which provides additional context on the specific task at hand.

In the previous example, the text we passed to the model contained instructions. By using prompt templates, we only have to provide the review itself, without having to worry about giving the model instructions.
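
Conceptually, a prompt template is a string with named placeholders. The sketch below shows the same idea with plain `str.format`, as an illustration only; LangChain's `PromptTemplate` adds validation and composition on top of this, and `build_prompt` is a hypothetical helper.

```python
TEMPLATE = """
You are a useful assistant for the IMDb website.
Read the movie review below and decide if it is positive or negative.
Return the result with 0 or 1 for negative and positive respectively.

{review}
"""


def build_prompt(review):
    """Fill the {review} placeholder with the user's text."""
    return TEMPLATE.format(review=review)


print(build_prompt("A wonderful film from start to finish."))
```

The instructions are written once, and only the review changes from call to call.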

In [None]:
from langchain.prompts import PromptTemplate


In [None]:
prompt = PromptTemplate.from_template("""
You are a useful assistant for the IMDb website.
You should read the movie review submitted by a user below and decide if it is positive or negative.
Return the result with 0 or 1 for negative and positive respectively.

{review}
"""
)

prompt.format(review=raw_datasets['train']['text'][1])

## Output parser

`OutputParsers` convert the raw output of a language model into a format that can be used downstream.


In [None]:
from langchain.schema import BaseOutputParser

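class BooleanParser(BaseOutputParser):
    """Parse the output of an LLM call to 0 and 1."""

    def parse(self, text: str) -> int:
        """Map the raw reply to 0 (negative) or 1 (positive)."""
        # The prompt asks for a digit, but the model may answer in words, so accept both.
        cleaned = text.strip().lower()
        return 0 if cleaned.startswith('0') or 'negative' in cleaned else 1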


## Putting it all together

We can now combine all these into one chain. This chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to a language model, and then pass the output through an (optional) output parser.
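
The `|` syntax used for chaining can be understood through a toy version in plain Python. This is a sketch of the composition idea only, not LangChain's actual implementation: the `Step` class is hypothetical, and `fake_llm` is a stand-in rule that does not call any model.

```python
class Step:
    """Wrap a function so that steps can be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for the three stages: prompt template, model, output parser.
prompt_step = Step(lambda inputs: f"Classify this review as 0 or 1:\n{inputs['review']}")
fake_llm = Step(lambda text: "1" if "love" in text.lower() else "0")
parser_step = Step(lambda reply: int(reply.strip()))

toy_chain = prompt_step | fake_llm | parser_step
print(toy_chain.invoke({"review": "I love this film"}))
print(toy_chain.invoke({"review": "Dull and slow"}))
```

Each stage only needs to accept the previous stage's output, which is why the prompt, the model and the parser can be swapped independently.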

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import BaseOutputParser


In [None]:
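class BooleanParser(BaseOutputParser):
    """Parse the output of an LLM call to 0 and 1."""

    def parse(self, text: str) -> int:
        """Map the raw reply to 0 (negative) or 1 (positive)."""
        # The prompt asks for a digit, but the model may answer in words, so accept both.
        cleaned = text.strip().lower()
        return 0 if cleaned.startswith('0') or 'negative' in cleaned else 1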

prompt = PromptTemplate.from_template("""
You are a useful assistant for the IMDb website.
You should read the movie review submitted by a user below and decide if it is positive or negative.
Return the result with 0 or 1 for negative and positive respectively.

{review}
"""
)

llm = ChatOpenAI(openai_api_key=API_KEY)

In [None]:
chain = prompt | llm | BooleanParser()

In [None]:
chain.invoke({"review": raw_datasets['train']['text'][1]})

# [Code Assistant](https://python.langchain.com/docs/expression_language/cookbook/code_writing)

In [None]:
!pip install langchain_experimental

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema.output_parser import StrOutputParser
from langchain_experimental.utilities import PythonREPL

from config import API_KEY 

In [None]:
template = """Write some python code to solve the user's problem. 

Return only python code in Markdown format, e.g.:

```python
....
```"""
prompt = ChatPromptTemplate.from_messages([("system", template), ("human", "{input}")])

model = ChatOpenAI(openai_api_key=API_KEY)

In [None]:
def _sanitize_output(text: str):
    # keep only the code between the opening python marker and the next closing fence
    _, after = text.split("```python", 1)
    return after.split("```")[0]
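
`_sanitize_output` raises an error when the reply contains no python block. A more tolerant variant is sketched below using a regular expression. The name `extract_python_block` is hypothetical, and returning `None` instead of raising is an assumption about what the caller wants.

```python
import re

# Three backticks, built this way so the marker never appears on a line of its own.
FENCE = "`" * 3


def extract_python_block(text):
    """Return the body of the first fenced python block, or None if there is none."""
    pattern = re.escape(FENCE) + r"python\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


sample_reply = f"Here you go:\n{FENCE}python\nprint('hi')\n{FENCE}\nDone."
print(extract_python_block(sample_reply))
print(extract_python_block("Sorry, I cannot help with that."))
```

A caller can then decide what to do when no code comes back, rather than failing inside the chain.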

In [None]:
chain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run

In [None]:
chain.invoke({"input": "use sklearn pipelines to create a xgboost classifier pipeline with scaling "})