# 00 Import Libraries
We need `pandas` to read in the data and the `openai` client library to get the embeddings.

In [1]:
import pandas as pd
from openai import OpenAI

# 01 Functions
We need one function to open files and another to return the embedding of a piece of text.

The function <code>get_embedding</code> takes two arguments, `text` and `model`, and works in three steps:<br><br>
<ol>
    <li><strong>Text Processing:</strong> The function first replaces newline characters (`\n`) in `text` with spaces, so the input is a single line.</li>
<li><strong>Generate Embeddings:</strong> We then call `client.embeddings.create` with a one-element list containing the processed text and the name of the embedding model to use.</li>
<li><strong>Return Embeddings:</strong> The response contains one result per input in its `data` list, so we take the first element and return its `embedding` vector.</li>
</ol>
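The three steps can be exercised without network access. The sketch below uses a hypothetical `FakeClient` that mimics only the shape of the real response (`response.data[0].embedding`) and returns a made-up three-element vector; the real API returns a much longer vector. It is also a variant of `get_embedding` that takes the client as an explicit parameter instead of relying on a global, which is a choice of this sketch rather than of the notebook.

```python
from types import SimpleNamespace


class FakeEmbeddings:
    """Hypothetical stand-in for `client.embeddings` (no API call is made)."""

    def create(self, input, model):
        # Mimic the response shape: one item in `data` per input string,
        # each carrying an `embedding` list. The vector values are made up.
        return SimpleNamespace(
            data=[SimpleNamespace(embedding=[float(len(t)), 0.0, 1.0]) for t in input]
        )


class FakeClient:
    """Hypothetical client exposing only the `embeddings` attribute."""

    embeddings = FakeEmbeddings()


def get_embedding(text, model, client):
    # Step 1: replace newlines with spaces so the text is on one line
    text = text.replace("\n", " ")
    # Step 2: request embeddings for a one-element list
    response = client.embeddings.create(input=[text], model=model)
    # Step 3: take the first (and only) result and return its vector
    return response.data[0].embedding


print(get_embedding("Travel\nbook", model="text-embedding-3-small", client=FakeClient()))
# [11.0, 0.0, 1.0]
```

Passing the client in makes the function easy to test with a stub like this one; in the notebook, the real `OpenAI` client would be passed instead.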

In [2]:
# FUNCTION TO OPEN FILES
def open_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as infile:
        return infile.read()

# FUNCTION TO GET EMBEDDINGS
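def get_embedding(text, model):
    # Collapse the text onto one line by replacing newlines with spaces
    text = text.replace("\n", " ")
    # Request the embedding for a single input (uses the global `client`) and return its vector
    return client.embeddings.create(input=[text], model=model).data[0].embedding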

# 02 Credentials
Since we are going to use OpenAI's embedding model `text-embedding-3-small`, we need a valid OpenAI API key.

In [4]:
# LOAD OPENAI CREDENTIALS
client = OpenAI(api_key=open_file('KEYS/openaiapikey.txt').strip())  # strip() drops a trailing newline from the key file
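A key file saved by a text editor often ends with a newline, which would become part of the key if the file contents were passed through unchanged. The sketch below is one hedged way to load the key: it strips whitespace and falls back to the `OPENAI_API_KEY` environment variable, which `OpenAI()` also reads on its own when no `api_key` argument is given. The helper name `load_api_key` and the fallback order are assumptions, not part of the notebook.

```python
import os
from pathlib import Path


def load_api_key(path="KEYS/openaiapikey.txt"):
    """Read an API key from `path`, falling back to OPENAI_API_KEY.

    Hypothetical helper: the default path mirrors the notebook, the fallback
    order is an assumption.
    """
    key_file = Path(path)
    if key_file.is_file():
        # strip() removes the trailing newline many editors add to text files
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    # Fall back to the environment variable the OpenAI client also uses
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path} or OPENAI_API_KEY")
    return key
```

In the notebook this could be used as `client = OpenAI(api_key=load_api_key())`.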

# 03 Read in Data
Next, we load our book database, which we saved earlier as a JSON file.

In [12]:
# READ JSON INTO DATAFRAME
df = pd.read_json('json/books_info.json', encoding='utf-8')
df

| | book_title | book_category | book_description | book_link | book_rating | book_price | book_availability |
|---|---|---|---|---|---|---|---|
| 0 | It's Only the Himalayas | Travel | “Wherever you go, whatever you do, just . . . ... | its-only-the-himalayas_981/index.html | Two | £45.17 | In stock |
| 1 | Full Moon over Noah’s ... | Travel | Acclaimed travel writer Rick Antonson sets his... | full-moon-over-noahs-ark-an-odyssey-to-mount-a... | Four | £49.43 | In stock |
| 2 | See America: A Celebration ... | Travel | To coincide with the 2016 centennial anniversa... | see-america-a-celebration-of-our-national-park... | Three | £48.87 | In stock |
| 3 | Vagabonding: An Uncommon Guide ... | Travel | With a new foreword by Tim Ferriss •There’s no... | vagabonding-an-uncommon-guide-to-the-art-of-lo... | Two | £36.94 | In stock |
| 4 | Under the Tuscan Sun | Travel | A CLASSIC FROM THE BESTSELLING AUTHOR OF UNDER... | under-the-tuscan-sun_504/index.html | Three | £37.33 | In stock |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | Why the Right Went ... | Politics | “Dionne's expertise is evident in this finely ... | why-the-right-went-wrong-conservatism-from-gol... | Four | £52.65 | In stock |
| 996 | Equal Is Unfair: America's ... | Politics | We’ve all heard that the American Dream is van... | equal-is-unfair-americas-misguided-fight-again... | One | £56.86 | In stock |
| 997 | Amid the Chaos | Cultural | Some people call Eritrea the “North Korea of A... | amid-the-chaos_788/index.html | One | £36.58 | In stock |
| 998 | Dark Notes | Erotica | They call me a slut. Maybe I am.Sometimes I do... | dark-notes_800/index.html | Five | £19.19 | In stock |
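Before embedding, it may be worth guarding against missing descriptions: `text.replace` raises `AttributeError` if a value is `None` or a NaN float, which pandas can produce for missing JSON fields. The hypothetical helper `normalize_description` below (not part of the original notebook) turns such values into empty strings and trims whitespace, so rows that would send empty text to the API can be spotted first.

```python
import math


def normalize_description(value):
    """Return `value` as a single-line, trimmed string; None/NaN become ''."""
    # Missing values in a pandas object column show up as None or float NaN
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).replace("\n", " ").strip()


descriptions = ["First line\nsecond line", None, float("nan"), "  padded  "]
cleaned = [normalize_description(d) for d in descriptions]
empty_rows = [i for i, d in enumerate(cleaned) if not d]
print(cleaned)     # ['First line second line', '', '', 'padded']
print(empty_rows)  # [1, 2]
```

With the notebook's DataFrame, this could be applied as `df['book_description'] = df.book_description.map(normalize_description)`, after which rows with an empty description can be inspected or dropped before calling the API.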


# 04 Get Embeddings
Now we call our `get_embedding` function on every book description in our database and store the results in a new column, **text_embedding**.<br><br> <strong><em>CAUTION: This sends one API request per book, so it may take a while!</em></strong>

In [10]:
# GET EMBEDDINGS FOR DATABASE
df['text_embedding'] = df.book_description.apply(lambda x: get_embedding(x, model='text-embedding-3-small'))
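Because the cell above makes one request per description, one possible speed-up is to send several descriptions per request: the embeddings endpoint accepts a list as `input` and returns one item per input in `response.data`. The sketch below splits the texts into batches. `embed_in_batches` and `embed_batch` are hypothetical names, `embed_batch` being any callable that maps a list of strings to a list of vectors, and the batch size of 100 is an arbitrary assumption that should be checked against the API's current limits. A fake `embed_batch` is used so the example runs offline.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,  # assumed value, not taken from the API docs
) -> List[Vector]:
    """Embed `texts` in chunks of `batch_size`, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        # Same newline handling as get_embedding, applied to each text
        batch = [t.replace("\n", " ") for t in texts[start:start + batch_size]]
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("embed_batch returned the wrong number of vectors")
        vectors.extend(result)
    return vectors


# Offline stand-in: records each batch size and returns one made-up vector per text
calls = []


def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
print(embed_in_batches(texts, fake_embed_batch, batch_size=2))
# [[1.0], [2.0], [3.0], [4.0], [5.0]]
print(calls)  # [2, 2, 1]
```

With the real client, `embed_batch` could be `lambda batch: [d.embedding for d in sorted(client.embeddings.create(input=batch, model='text-embedding-3-small').data, key=lambda d: d.index)]`, and the result could be stored with `df['text_embedding'] = pd.Series(vectors, index=df.index)`.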

# 05 Save Embeddings
Lastly, we save the DataFrame, including the new **text_embedding** column, to a new local JSON file.

In [11]:
# SAVE EMBEDDINGS IN NEW JSON
with open('json/books_info_embeddings.json', 'w', encoding='utf-8') as file:
    df.to_json(file, orient='records', indent=4, force_ascii=False)
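To confirm the file round-trips, one option is to read the records back with the standard `json` module and check that every book has a vector of the same length. The helper name `load_embeddings` is hypothetical, and the sketch writes a tiny made-up file rather than reading the real `json/books_info_embeddings.json`. For `text-embedding-3-small`, the expected length would be 1536, the model's default dimension.

```python
import json
import os
import tempfile


def load_embeddings(path, column="text_embedding"):
    """Load records saved with orient='records' and check vector lengths."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # a list of dicts, one per book
    dims = {len(r[column]) for r in records}
    if len(dims) > 1:
        raise ValueError(f"Inconsistent embedding lengths: {sorted(dims)}")
    return records, (dims.pop() if dims else 0)


# Made-up sample in the same shape that df.to_json(orient='records') produces
sample = [
    {"book_title": "Book A", "text_embedding": [0.1, 0.2, 0.3]},
    {"book_title": "Book B", "text_embedding": [0.0, -0.5, 0.4]},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=4, ensure_ascii=False)
    records, dim = load_embeddings(path)

print(len(records), dim)  # 2 3
```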