## Step 1 - Add your OpenAI API key
The key must come from a registered OpenAI account.
Keys are managed in the API keys sub-menu: https://platform.openai.com/account/api-keys.
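
Outside a notebook, where the `%env` magic is not available, the key can be read from the process environment instead. The following is a minimal sketch: the helper name `require_api_key` and the placeholder check are illustrative assumptions, not part of the original notebook.

```python
import os

PLACEHOLDER = "YOUR_OPENAI_API_KEY"  # the dummy value used in this notebook


def require_api_key(env=None):
    """Return the OpenAI API key from `env` (defaults to os.environ).

    Hypothetical helper: it raises early if the key is missing or still
    set to the placeholder, instead of failing on the first API call.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set OPENAI_API_KEY to a real key before running the notebook.")
    return key


# Example with an explicit mapping, so the sketch runs without a real key.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

Calling `require_api_key()` with no argument checks the real environment.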

In [25]:
%env OPENAI_API_KEY YOUR_OPENAI_API_KEY

env: OPENAI_API_KEY=YOUR_OPENAI_API_KEY


## Step 2 - Installing necessary libraries
The libraries below are needed for:
- working with CSV files;
- visualising file characteristics;
- creating embeddings;
- prompting the GPT model.

In [5]:
!pip install openai
!pip install "openai[embedding_utils]"
!pip install pandas
!pip install numpy
!pip install scikit-learn
!pip install langchain
!pip install matplotlib

## Step 3 - Libraries import
The imported libraries are used throughout the notebook.

In [6]:
import pandas as pd
import openai
import numpy as np
from langchain import OpenAI, LLMChain
from langchain.prompts import Prompt

from openai.embeddings_utils import cosine_similarity

from utils import get_embedding

## Step 4 - Configuration
The parameters below are used throughout the notebook.

In [7]:
# The model used for creating embeddings.
embedding_model = "text-embedding-ada-002"

# The token encoding (tokenizer) used by text-embedding-ada-002.
embedding_encoding = "cl100k_base"

# The maximum number of tokens a text chunk passed to the embeddings model may have.
# The actual limit for text-embedding-ada-002 is 8191.
max_tokens = 8000

## Step 5 - Loading a prepared dataset
The dataset loaded below is produced by a separate notebook, [data preparation](./data_preparation.ipynb), which normally has to be run before this one.

### IMPORTANT!
The required file `docs/50_reviews_with_embeddings.csv` has already been created and is present in the repo, so there is no need to run the data preparation step separately before trying this notebook.
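
A CSV file stores each embedding as text, so the loading cell below has to turn that text back into numbers. The following is a minimal standard-library sketch of that parsing step. The two sample rows are made up, and `parse_embedding` is a hypothetical helper name. `ast.literal_eval` is used because it only accepts Python literals.

```python
import ast
import csv
import io

# Two made-up rows in the same shape as the prepared CSV: the embedding
# column holds a stringified list of floats.
sample_csv = """combined,embedding
"Title: A; Content: a","[0.1, -0.2, 0.3]"
"Title: B; Content: b","[0.0, 0.5, -0.5]"
"""


def parse_embedding(cell):
    """Turn a stringified list such as "[0.1, -0.2]" into a list of floats."""
    values = ast.literal_eval(cell)  # literals only, no arbitrary code
    return [float(v) for v in values]


rows = list(csv.DictReader(io.StringIO(sample_csv)))
embeddings = [parse_embedding(row["embedding"]) for row in rows]
print(len(embeddings), "embeddings of length", len(embeddings[0]))
```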

In [10]:
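import ast

df = pd.read_csv("./docs/50_reviews_with_embeddings.csv")
# Embeddings are stored as stringified lists; parse them back into NumPy arrays.
# ast.literal_eval only accepts Python literals, so it is safer than eval here.
df["embedding"] = df.embedding.apply(ast.literal_eval).apply(np.array)

df.head(5)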

| Unnamed: 0.1 | Unnamed: 0 | ProductId | UserId | Score | Summary | Text | combined | n_tokens | embedding |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 346130 | B004TJF3BE | A2TZKSY1ZWPOU9 | 5 | Great Hot Cider!!! | It is hard to find much of anything sugarfree ... | Title: Great Hot Cider!!!; Content: It is hard... | 46 | [0.0007746443734504282, -0.0042549604550004005... |
| 1 | 135890 | B001ACMCLM | A2PCNXBSKCABG5 | 4 | GOOD GLUTEN FREE BREAD STCK MIX | Makes very good break sticks.. Also can be use... | Title: GOOD GLUTEN FREE BREAD STCK MIX; Conten... | 52 | [-0.0011603275779634714, -0.009558608755469322... |
| 2 | 182237 | B004LM9KHW | A1AOOCCQ27K9IT | 3 | French Vanilla Wolfgang Puck | Product is easy to use.... Just cut or tear pa... | Title: French Vanilla Wolfgang Puck; Content: ... | 123 | [0.004102123901247978, -0.014700944535434246, ... |
| 3 | 354602 | B000LKU3A6 | A2YRK0YLBN5CC2 | 3 | Good flavor, but a wet mess | I got the teriyaki flavor and, while the flavo... | Title: Good flavor, but a wet mess; Content: I... | 267 | [-0.01738598570227623, 0.001981155714020133, -... |
| 4 | 320387 | B008JA73RG | AFJFXN42RZ3G2 | 4 | Neither too sweet nor fizzy | V8 V-Fusion may appear to be the typical energ... | Title: Neither too sweet nor fizzy; Content: V... | 230 | [-0.0018182089552283287, -0.031320970505476, 0... |


## Step 6 - Search functionality
Search retrieves the records that are most relevant to the prompt.

At a high level, search creates an embedding (via the OpenAI API) for the user-provided prompt and then uses it to find the "closest" records in the prepared dataset.
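
To make "closest" concrete, here is a dependency-free sketch of ranking records by cosine similarity and keeping the top `k`. The 2-D vectors are made up for illustration (real `text-embedding-ada-002` vectors have 1536 dimensions), and `cosine` and `top_k` are hypothetical helper names rather than the notebook's own functions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(records, query_vec, k):
    """Return the k (text, score) pairs whose vectors are closest to query_vec."""
    scored = [(text, cosine(vec, query_vec)) for text, vec in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings": each record is (text, vector).
records = [
    ("coffee review", [1.0, 0.0]),
    ("tea review", [0.8, 0.6]),
    ("dog food review", [0.0, 1.0]),
]
query = [1.0, 0.1]  # stands in for the embedding of the user's prompt

for text, score in top_k(records, query, k=2):
    print(f"{score:.3f}  {text}")
```

The notebook's `search` function follows the same idea, with the scores stored in a dataframe column and `depth` playing the role of `k`.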

In [17]:
def search(source_df, query_str, depth):
    """
    Generates an embedding for the given "query_str" and ranks the records by the similarity of their "embedding" vectors to it.

    Returns the most similar records; their number is set by the "depth" param

    :param source_df: Pandas dataframe
    :param query_str: str
    :param depth: int
    :return: Pandas dataframe
    """
    query_embedding = get_embedding(query_str, model=embedding_model)
    # Use the dataframe passed in, not the global "df"
    source_df["similarities"] = source_df.embedding.apply(lambda x: cosine_similarity(x, query_embedding))
    res = source_df.sort_values("similarities", ascending=False).head(depth)
    return res

## Step 7 - Prepare your prompt template
The prompt template wraps the user-provided prompt with additional information and settings for the final ML model (GPT in this case). As a result, the model receives not only the user prompt but also any restrictions, settings, and data it needs to produce the final result (generated text in the GPT case).
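
The substitution itself can be illustrated with plain `str.format`, which is conceptually close to how the `{history}`, `{context}` and `{question}` placeholders get filled. This is only a sketch: the shortened template and the sample values are made up, and no LangChain API is involved.

```python
# A shortened, made-up template with the same three placeholders.
template = (
    "ConversationHistory: {history}\n"
    "---\n"
    "MemoryContext: {context}\n"
    "---\n"
    "Human: {question}\n"
    "Bot:"
)

# Fill every placeholder; the model would receive the resulting text.
filled = template.format(
    history="(empty)",
    context="Context 0:\nTitle: Great Hot Cider!!!; Content: ...",
    question="Does hot cider taste good?",
)
print(filled)
```

The text ends with `Bot:` so that the model continues from that point with its answer.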

In [15]:
# The prompt below is just an example and very likely requires customisation
prompt_template = """
You are Alex, Senior Infrastructure Engineer.  You will talk to the human conversing with you and provide meaningful answers as they ask questions.
Avoid saying things like "how can I help you?".
Be social and engaging while you speak, and be very logically and technically oriented.

Use appropriate language and encouraging statements.
Don't make your answers so long unless you are asked your opinion, something about your past, or if you are asked to explain a concept.
Don't repeat an identical answer if you have given it in the past, or it appears in ConversationHistory.
Be honest, if you can't answer something, tell the human that you can't provide an answer.
Use the following pieces of MemoryContext to answer the question at the end. Also remember ConversationHistory is a list of Conversation objects.
---
ConversationHistory: {history}
---
MemoryContext: {context}
---
Human: {question}
Bot:
"""

prompt = Prompt(template=prompt_template, input_variables=["history", "context", "question"])
llm_chain = LLMChain(prompt=prompt, llm=OpenAI(temperature=0.15))

## Step 8 - Predict functionality
The search functionality and the prompt defined above are used to produce the ML model's response to the user input.
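
The prompt also has a `{history}` slot, which the cell below fills with an empty list. If the accumulated turns were passed instead, they would need to be rendered as text and kept short enough to leave room for the contexts. A minimal sketch of one way to do that, assuming the "Human: ..." / "Bot: ..." entry format used below; `render_history` and `max_turns` are hypothetical names:

```python
def render_history(history, max_turns=3):
    """Join the most recent turns into one string for the {history} slot.

    Each turn is two entries ("Human: ..." and "Bot: ..."), so the last
    max_turns turns are the last 2 * max_turns entries.
    """
    if max_turns <= 0:
        return ""
    recent = history[-2 * max_turns:]
    return "\n".join(recent)


example_history = [
    "Human: Who are you?",
    "Bot: I'm Alex.",
    "Human: What is Nectresse Sweetener?",
    "Bot: A light-tasting sweetener.",
]
print(render_history(example_history, max_turns=1))
```

Capping the number of turns is a design choice to keep the prompt size bounded; a token-based cap would be another option.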

### Ideas for further evolution of the code below:
- Semantic search recommendations: https://github.com/openai/openai-cookbook/blob/main/examples/Semantic_text_search_using_embeddings.ipynb
- Question answering using embeddings: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

In [22]:
def search_contexts_by_text_prompt(text_prompt):
    res = search(df, query_str=text_prompt, depth=10)
    str_formatted_results = "\n\n - ".join(res.combined.values)
    print(f'Found results: {str_formatted_results}')
    combined_values_list = list(res.combined.values)
    prompt_contexts = []
    for idx, value in enumerate(combined_values_list):
        prompt_contexts.append(f'Context {idx}:\n{value}')
    return prompt_contexts

history = []

def predict_answer(question_str):
    contexts = search_contexts_by_text_prompt(question_str)
    answer = llm_chain.predict(
        question=question_str,
        context="\n\n".join(contexts),
        # Passing the accumulated history keeps the model on the conversation flow,
        # so follow-up prompts can build on earlier answers. It is disabled here:
        # history=history
        history=[]
    )
    history.append(f'Human: {question_str}')
    history.append(f'Bot: {answer}')
    return answer

## Step 9 - Q/A testing

In [23]:
question = "Who are you?"
answer = predict_answer(question)
answer


    Input: Who are you?
    OpenAI API tokens usage:
    - Prompt tokens: 4
    - Total tokens: 4
    
Found results: Title: Morning Coffee; Content: Great coffee at a good price. I'm a subscription buyer and I buy this month after month. What more can I say?

 - Title: WOW.....; Content: This chocolate is amazing..I love the taste and smell, this is the only chocolate for me...I found a new love!

 - Title: Arizona Green Tea with Pomegranate & Acai; Content: I love the convenience of the mail order process as I cannot locate this product in any super market in my area.

 - Title: super; Content: I love this coffee!  And such a great price.  Will buy more when I am running out which will be soon.

 - Title: Supurb; Content: I was introduced to Lagavulin 16 three days ago. I'm no single malt Scotch authority, but this was straight out delicious; better than anything I've tried, including highly-touted and expensive products served at Robert Burns tastings. Microsoft: put "Lagavulin" in

"Hello, I'm Alex, the Senior Infrastructure Engineer. I'm here to help you with any questions you may have about infrastructure engineering."

In [24]:
question = "What is Nectresse Sweetener?"
answer = predict_answer(question)
answer


    Input: What is Nectresse Sweetener?
    OpenAI API tokens usage:
    - Prompt tokens: 8
    - Total tokens: 8
    
Found results: Title: Its LIGHT but very nice sweetener; Content: This Nectresse Sweetener is rather light in taste --but is very nice over other light foods such as fruits , berries , melons -----anywhere that lightness yet sweetness is desired.<br /><br />I also enjoyed it in iced teas .....especially my fruited teas like strawberry or orange.....<br /><br />I also used it hot teas as welll---but since it was summer more cold drinks went to table than hot.But it would be great on and in either.<br /><br />I did not try it as a cooking sweetener because of its smaller sized containers....so I cannot recommend it for baking or cooking.<br /><br />I can and whole heartedly DO RECOMMEND it for any type of sweetening  where its to be sprinkled over such as cereal , oatmeals ect.......or in drinks of any type...and fruits as I mentioned earlier !!<br /><br />NO bad tasted

'Nectresse Sweetener is a light-tasting sweetener that is great for adding sweetness to light foods such as fruits, berries, melons, and iced or hot teas. It can also be used to sweeten cereal, oatmeal, and other dishes. However, it is not recommended for baking or cooking due to its smaller sized containers.'

In [None]:
question = "What do you think about a hot cider?"
answer = predict_answer(question)
answer

In [None]:
question = "Does hot cider taste good?"
answer = predict_answer(question)
answer