<h2> Prerequisites </h2>

To request access to an Azure OpenAI resource, fill out this <a href="https://learn.microsoft.com/en-us/legal/cognitive-services/openai/limited-access">form</a>.


<h2>Deploying Azure Models</h2>

<h4> Text Similarity Models </h4>
<li> text-embedding-ada-002 </li>
<li> text-similarity-davinci-001 </li>

<h4> Completions </h4>
<li> text-davinci-003 </li>
<li> gpt-35-turbo (v0301) </li>
<li> gpt-35-turbo-16k </li>

Create the resource in West Europe or South Central US to get access to the models above. gpt-35-turbo-16k is available in North Central US.

For GPT-4 access, fill out this <a href="https://aka.ms/oai/get-gpt4">form</a>.

<h2> The Code </h2>

<h4> Embeddings API </h4>
<h5> Libraries </h5>
<br>
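<li> OpenAI library for Python </li>
Provides a predefined set of classes for API resources. The classes initialize themselves dynamically from API responses, which keeps the library compatible with a wide range of OpenAI API versions. The code in this notebook uses the pre-1.0 interface (<code>openai.Embedding</code>, <code>openai.Completion</code>, <code>openai.embeddings_utils</code>), so install a version below 1.0.

In [None]:
pip install "openai<1.0"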

<br>
<li> NumPy </li>
Provides a large collection of high-level mathematical functions to operate on multi-dimensional arrays and matrices.

In [None]:
pip install numpy

<br>
<li> Pandas </li>
Provides data structures and utilities for analyzing tabular data as DataFrames.

In [None]:
pip install pandas

<h4> Set Up the OpenAI Configuration </h4>

In [None]:
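import openai

openai.api_type = "azure"
openai.api_key = ""       # key of your Azure OpenAI resource
openai.api_base = ""      # endpoint of the resource, e.g. https://<resource-name>.openai.azure.com/
openai.api_version = ""   # API version supported by the resource, e.g. "2023-05-15"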
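
Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the settings live in environment variables: the variable names (`AZURE_OPENAI_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_VERSION`) and the helper `load_azure_openai_config` are hypothetical choices for this example, not names the library requires.

```python
import os


def load_azure_openai_config(env=None):
    """Collect Azure OpenAI settings from environment variables.

    The variable names are illustrative assumptions, not library defaults.
    """
    env = os.environ if env is None else env
    names = {
        "api_key": "AZURE_OPENAI_KEY",
        "api_base": "AZURE_OPENAI_ENDPOINT",
        "api_version": "AZURE_OPENAI_API_VERSION",
    }
    # Fail early with a clear message instead of sending an empty key.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    config = {key: env[var] for key, var in names.items()}
    config["api_type"] = "azure"
    return config
```

The returned dictionary can then be assigned field by field, for example `openai.api_key = config["api_key"]`.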


<h4> The Embedding API Call </h4>

In [None]:
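def createEmbeddings(text):
    # On Azure, `engine` is the name of your deployment; here the deployment
    # is assumed to share the name of the model it serves.
    response = openai.Embedding.create(
        input=text,
        engine="text-embedding-ada-002"  # or "text-similarity-davinci-001"
    )
    # The response holds one entry per input; return the vector of the single input.
    return response['data'][0]['embedding']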

In [None]:
embeddingVector = createEmbeddings("Harry Potter and the Sorcerer's Stone")
print(embeddingVector)

In [None]:
# text-embedding-ada-002 returns 1536-dimensional vectors.
print(len(embeddingVector))

In [None]:
pip install plotly

In [None]:
from openai.embeddings_utils import cosine_similarity
import time

# The timer also covers the createEmbeddings call, so it mostly measures the API round trip.
start_time = time.time()
print(cosine_similarity(createEmbeddings("Harry Potter"), embeddingVector))
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
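import time
from numpy import dot
from numpy.linalg import norm

def findCosineSimilarity(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    return dot(a, b) / (norm(a) * norm(b))

# As in the previous cell, the timer includes the createEmbeddings API call.
start_time = time.time()
print(findCosineSimilarity(createEmbeddings("Harry Potter"), embeddingVector))
print("--- %s seconds ---" % (time.time() - start_time))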
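
To make concrete what the function above computes, here is a small worked example in plain Python with hand-made 2-D vectors. The vectors and the helper names `cosine_similarity_plain` and `normalize` are made up for illustration. The last part shows that for vectors already scaled to length 1 the dot product alone gives the same value; OpenAI's documentation states that its embeddings are normalized to length 1, so the same shortcut should apply to them.

```python
import math


def cosine_similarity_plain(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


# Parallel vectors score close to 1, perpendicular ones 0, opposite ones close to -1.
same_direction = cosine_similarity_plain([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity_plain([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity_plain([1.0, 1.0], [-1.0, -1.0])

# For unit-length vectors the dot product equals the cosine similarity.
u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])
dot_only = sum(x * y for x, y in zip(u, v))

print(same_direction, orthogonal, opposite, dot_only)
```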

<h4> Applications </h4>

- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)
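
As one illustration of the list above, classification picks the label whose embedding is most similar to the embedding of the text. The sketch below uses tiny hand-made vectors in place of real embeddings so that it runs without an API call. The vectors, the labels, and the helper `classify_by_similarity` are hypothetical; in this notebook the vectors would come from `createEmbeddings`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify_by_similarity(text_vector, label_vectors):
    """Return the label whose vector is most similar to text_vector."""
    return max(label_vectors, key=lambda label: cosine(text_vector, label_vectors[label]))


# Stand-ins for the embeddings of the label names (made-up numbers).
label_vectors = {
    "positive": [0.9, 0.1, 0.0],
    "negative": [0.1, 0.9, 0.0],
}

# Stand-in for the embedding of a review (made-up numbers).
review_vector = [0.8, 0.3, 0.1]

predicted = classify_by_similarity(review_vector, label_vectors)
print(predicted)
```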

In [None]:
reviews = [
    {"text": "Kettle Chips flavors can be hit or miss.  Some of their flavors are terrible.  But this is very simple and delicious.  Probably one of their best flavors.<br /><br />Pros: Thick, crunchy potato chips with light salt that doesn't compromise on taste, eco-friendly business that isn't just giving us lip service<br /><br />Cons: The occasional burnt chip and the bag is a pain to open.  They have a tab that you can pull down but most of the time I end up tearing down the entire side of the bag. Use scissors instead."},
    {"text": "These chips are the only ones I found to be tasty and healthy. They have fewer fat calories plus higher fiber for those who want good taste and nutrition--the perfect blend!"},
    {"text": "This cinnamon cake loaf has a wonderful natural flavor.  It's moist and tender and a great little sweet treat when you want something that isn't terribly bad for your diet.  It's delicious!"},
    {"text": "Recently purchased this cereal assuming it would contain nothing but healthy ingredients, then I read the label and discovered it contains partially hydrogenated soybean oil and high fructose corn syrup."},
    {"text": "Great taffy at a great price.  There was a wide assortment of yummy taffy.  Delivery was very quick.  If your a taffy lover, this is a deal."},
    {"text": "great product, poor delivery:  The coffee is excellent and I am a repeat buyer.  Problem this time was with the UPS delivery.  They left the box in front of my garage door in the middle of the driveway"}
]

In [None]:
from openai.embeddings_utils import get_embedding

for review in reviews:
    review["embeddings"] = get_embedding(
        review["text"],
        engine="text-embedding-ada-002"
    )
print (reviews[0])

<h4> Let's search for chips flavours</h4>

In [None]:
search_key_embeddings = createEmbeddings("Chips flavours")
for review in reviews:
    review["cosineSim"] = findCosineSimilarity(review["embeddings"], search_key_embeddings)
    print (review["cosineSim"])

In [None]:
from operator import itemgetter

# Sort in place, most similar review first.
reviews.sort(key=itemgetter('cosineSim'), reverse=True)

In [None]:
reviews[:2]
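
`reviews[:2]` also prints the full embedding vectors, which makes the output hard to read. A minimal sketch of a helper that returns only the text and score of the best matches, without sorting the list in place: `top_k_matches` and the sample records are hypothetical, and the scores are made-up numbers standing in for the `cosineSim` values computed above.

```python
import heapq


def top_k_matches(records, k=2, score_key="cosineSim"):
    """Return (text, score) pairs for the k highest-scoring records."""
    best = heapq.nlargest(k, records, key=lambda record: record[score_key])
    return [(record["text"], record[score_key]) for record in best]


# Made-up records shaped like the entries of `reviews`.
sample = [
    {"text": "chips review", "cosineSim": 0.86, "embeddings": [0.1, 0.2]},
    {"text": "coffee review", "cosineSim": 0.74, "embeddings": [0.3, 0.1]},
    {"text": "healthy chips review", "cosineSim": 0.84, "embeddings": [0.2, 0.2]},
]

results = top_k_matches(sample, k=2)
for text, score in results:
    print(f"{score:.2f}  {text}")
```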

<h2> Similarity for Recommendations </h2>

In [None]:
a = createEmbeddings("an apple a day keeps the doctor away")
b = createEmbeddings("orange juice is rich in vitamin C")
c = createEmbeddings("Niki Lauda is the OG F1 champ")
d = createEmbeddings("Sebastian Vettel drove for Aston Martin before retiring from Formula One at the end of the 2022 season")

print (findCosineSimilarity(a,b))
print (findCosineSimilarity(c,b))
print (findCosineSimilarity(a,c))
print (findCosineSimilarity(d,c))


In [None]:
resume = createEmbeddings("react, nodejs")

jd1 = createEmbeddings("python, r, responsible AI, semantic kernel")
jd2 = createEmbeddings("account management, power bi, jira")
jd3 = createEmbeddings("react, angular, materialui")
jd4 = createEmbeddings("bootstrap, angular, fluent, flutter")
jd5 = createEmbeddings("sql, dbms, mongodb, vectordb")

print (findCosineSimilarity(resume, jd1))
print (findCosineSimilarity(resume, jd2))
print (findCosineSimilarity(resume, jd3))
print (findCosineSimilarity(resume, jd4))
print (findCosineSimilarity(resume, jd5))

<h2> Completions API </h2>

In [None]:
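response = openai.Completion.create(
    engine="trialDavinci",  # name of your Azure deployment of a completions model
    prompt="",              # the text to complete goes here
    temperature=1,
    max_tokens=100,
    top_p=0.5,
    frequency_penalty=0,
    presence_penalty=0,
    best_of=1,
    stop=None,
)

# The generated text is in the first choice.
print(response["choices"][0]["text"])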
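
With `max_tokens=100` the model may stop in the middle of a sentence. A minimal sketch of how the response could be unpacked to detect that, assuming the response shape of the pre-1.0 library (a `choices` list whose entries carry `text` and `finish_reason`). The helper `extract_completion` and the mock response are hypothetical and stand in for a real API result.

```python
def extract_completion(response):
    """Return (text, truncated) for the first choice of a completion response."""
    choice = response["choices"][0]
    # "length" means generation stopped because max_tokens was reached.
    truncated = choice.get("finish_reason") == "length"
    return choice["text"].strip(), truncated


# Mock response with made-up content, shaped like a completion result.
mock_response = {
    "choices": [
        {
            "text": "\n\nEmbeddings map text to vectors",
            "index": 0,
            "finish_reason": "length",
        }
    ]
}

text, truncated = extract_completion(mock_response)
print(text, truncated)
```

If `truncated` is true, raising `max_tokens` or adding a `stop` sequence are the usual options to consider.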

