<font color='green'>
pip install is the command that installs Python packages using pip, Python's package manager.
<br><br>Install the LangChain package:
</font>

In [1]:
!pip install langchain



<font color='green'>
Install the openai package, which provides the classes we use to communicate with OpenAI services.
</font>

In [2]:
!pip install openai



## Let's use OpenAI

<font color='green'>
The cell below imports Python's built-in "os" module.
<br>This module lets a program interact with the operating system: reading environment variables, working with files and directories, running shell commands, and so on.
<br><br>
The os.environ attribute is a dictionary-like object that holds the environment variables of the current process.
<br><br>
Through os.environ you can read and set environment variables from within your Python program. For example, os.environ['VARIABLE_NAME'] returns the value of the variable named "VARIABLE_NAME". Here we set OPENAI_API_KEY, which the OpenAI classes read in order to authenticate.
</font>

In [3]:
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder: use your own key and never commit a real one
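
A key typed directly into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch of the os.environ usage described above, the helper below reads a variable that was set outside the notebook and fails with a clear message if it is missing. The names require_env and DEMO_API_KEY are hypothetical and used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    # .get returns None for a missing variable instead of raising KeyError
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a throwaway variable (hypothetical name, not a real key).
os.environ["DEMO_API_KEY"] = "demo-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself, the same helper could be called as require_env("OPENAI_API_KEY") after exporting the key in the shell that launches Jupyter.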

<font color='green'>
LangChain provides wrappers around the OpenAI APIs, through which we can access the services OpenAI offers.
<br>
The code snippet below imports the 'OpenAIEmbeddings' class (a wrapper around OpenAI's embedding models) from the 'embeddings' module of the 'langchain' library.

</font>

In [4]:
from langchain.embeddings import OpenAIEmbeddings

<font color='green'>
Initialize the OpenAIEmbeddings object.
</font>

In [5]:
embeddings = OpenAIEmbeddings()

<font color='green'>
Let's read our input data and compute its embedding representation, so that we can use it in the tasks that follow.
</font>

In [7]:
import pandas as pd
df = pd.read_excel('data.xlsx')
print(df)

     Words
0   School
1  College
2      Car
3     Bike
4    Apple
5   Orange
6   Banana


<font color='green'>
    Because our words are stored in a pandas dataframe, we can use "apply" to call embed_query on each row. We then save the computed word embeddings to a new CSV file called "word_embeddings.csv", so that we do not have to call OpenAI again to repeat these computations.
    </font>

In [8]:
df['embedding'] = df['Words'].apply(lambda x: embeddings.embed_query(x))
df.to_csv('word_embeddings.csv')
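
The caching idea can also be sketched without pandas: compute a vector only for words that are not yet stored on disk, and reuse the stored ones otherwise. This is an illustration under stated assumptions, not the notebook's method: a JSON file stands in for the CSV, and load_or_compute and fake_embed are hypothetical names, with fake_embed replacing embeddings.embed_query so that the example runs without an API key.

```python
import json
import os
import tempfile


def load_or_compute(path, words, embed_fn):
    """Return {word: vector}, calling embed_fn only for words missing from the cache file."""
    cache = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            cache = json.load(f)
    missing = [w for w in words if w not in cache]
    for w in missing:
        cache[w] = embed_fn(w)
    if missing:
        # Rewrite the cache file only when something new was computed
        with open(path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return {w: cache[w] for w in words}


# Stand-in for a paid API call; records every word it is asked to embed.
calls = []


def fake_embed(word):
    calls.append(word)
    return [float(len(word)), 0.0]


cache_path = os.path.join(tempfile.mkdtemp(), "embeddings_cache.json")
first = load_or_compute(cache_path, ["School", "Car"], fake_embed)
second = load_or_compute(cache_path, ["School", "Car"], fake_embed)  # served from disk
print(len(calls))
```

Only the first call reaches fake_embed; the second one is answered entirely from the cache file.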

<font color='green'>
    Let's load the existing file, which contains the embeddings, so that we can save on charges by not hitting the API repeatedly.
    </font>

In [9]:
new_df = pd.read_csv('word_embeddings.csv')
print(new_df)

   Unnamed: 0    Words                                          embedding
0           0   School  [0.00558300968259573, 0.009224693290889263, -0...
1           1  College  [0.008637277409434319, -0.009738443419337273, ...
2           2      Car  [-0.004948626272380352, -0.012337295338511467,...
3           3     Bike  [0.0054834443144500256, -0.013623781502246857,...
4           4    Apple  [0.01444941945374012, -0.0039136698469519615, ...
5           5   Orange  [0.020670153200626373, -0.029327111318707466, ...
6           6   Banana  [-0.013021613471210003, -0.019990751519799232,...
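
One thing to watch for: a list written to CSV comes back from read_csv as a single string, not as a list of floats, so it has to be parsed before any arithmetic. A minimal sketch, assuming each cell holds the list's printed form as in the output above; parse_embedding is a hypothetical helper name.

```python
import json


def parse_embedding(cell: str) -> list:
    """Turn the text '[0.1, -0.2, ...]' back into a list of floats."""
    return [float(x) for x in json.loads(cell)]


raw = "[0.25, -0.5, 1.0]"          # what a CSV cell looks like after loading
vector = parse_embedding(raw)      # a real list of floats again
print(type(raw).__name__, type(vector).__name__)
```

With the dataframe loaded above, this could be applied as new_df['embedding'].apply(parse_embedding).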


<font color='green'>
Let's get the embedding for our search text.
</font>

In [10]:
our_Text = "Mango"

In [11]:
text_embedding = embeddings.embed_query(our_Text)

In [12]:
print(f"Our embedding is {text_embedding}")

Our embedding is [-0.0034083453938364983, -0.01983037404716015, 0.010483776219189167, -0.01615050993859768, 0.006273655686527491, 0.00859912484884262, -0.02514573186635971, -0.014233915135264397, 0.0011579430429264903, -0.033016547560691833, -0.0011914834612980485, 0.0037245836574584246, 0.0019357613055035472, 0.01682770811021328, -0.021619195118546486, 0.004817042965441942, 0.03273544833064079, -0.008228582330048084, 0.00938492827117443, -0.01572885923087597, -0.015882186591625214, 0.002833366859704256, 0.0014757784083485603, -0.01243231538683176, 0.004622189328074455, 0.009531867690384388, 0.02451964281499386, -0.013658936135470867, 0.011563458479940891, 0.0030409980099648237, 0.04778711125254631, -0.015026107430458069, -0.0043474771082401276, -0.009896020404994488, -0.017619900405406952, 0.004599828738719225, -0.010304894298315048, -0.006343930494040251, -0.006308793090283871, -0.0183737613260746, 0.01443835161626339, -0.008330801501870155, -0.015652194619178772, -0.0180543288588523

<font color='green'>
    Once we have a vector representing our search word, we can measure how similar it is to each word in our dataframe.
    <br>
    We do this by computing the cosine similarity between the search word's vector and each word embedding in the dataframe.
    </font>

In [13]:
from openai.embeddings_utils import cosine_similarity

df["similarity score"] = df['embedding'].apply(lambda x: cosine_similarity(x, text_embedding))

df

     Words                                          embedding  similarity score
0   School  [0.00558300968259573, 0.009224693290889263, -0...          0.781657
1  College  [0.008637277409434319, -0.009738443419337273, ...          0.782224
2      Car  [-0.004948626272380352, -0.012337295338511467,...          0.780015
3     Bike  [0.0054834443144500256, -0.013623781502246857,...          0.797835
4    Apple  [0.01444941945374012, -0.0039136698469519615, ...          0.813925
5   Orange  [0.020670153200626373, -0.029327111318707466, ...          0.843855
6   Banana  [-0.013021613471210003, -0.019990751519799232,...          0.898687
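
Cosine similarity is the dot product of two vectors divided by the product of their lengths. To my knowledge, the openai.embeddings_utils module is not shipped with version 1.0 and later of the openai package, so the import above may fail on a recent install. The function below is a plain-Python sketch of the same calculation, written for illustration and not taken from either library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Tiny hand-made vectors, not real embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

A value near 1 means the vectors point in almost the same direction, and a value near 0 means they are close to orthogonal.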


<font color='green'>
    Sorting the dataframe by similarity score shows that Banana, Orange, and Apple are the words closest to our search term, Mango.
    </font>

In [14]:
df.sort_values("similarity score", ascending=False).head(10)

     Words                                          embedding  similarity score
6   Banana  [-0.013021613471210003, -0.019990751519799232,...          0.898687
5   Orange  [0.020670153200626373, -0.029327111318707466, ...          0.843855
4    Apple  [0.01444941945374012, -0.0039136698469519615, ...          0.813925
3     Bike  [0.0054834443144500256, -0.013623781502246857,...          0.797835
1  College  [0.008637277409434319, -0.009738443419337273, ...          0.782224
0   School  [0.00558300968259573, 0.009224693290889263, -0...          0.781657
2      Car  [-0.004948626272380352, -0.012337295338511467,...          0.780015
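
When only the few best matches are needed, the full sort can be replaced by a top-k selection. The sketch below uses the similarity scores from the table above and the standard library's heapq.nlargest; top_k is a hypothetical helper name.

```python
import heapq

# Similarity scores to "Mango", copied from the output above.
scores = {
    "School": 0.781657,
    "College": 0.782224,
    "Car": 0.780015,
    "Bike": 0.797835,
    "Apple": 0.813925,
    "Orange": 0.843855,
    "Banana": 0.898687,
}


def top_k(scores, k):
    """Return the k (word, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


print(top_k(scores, 3))
# [('Banana', 0.898687), ('Orange', 0.843855), ('Apple', 0.813925)]
```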
