### **Using OpenAI Embedding Models with LangChain**

In this section, we’ll explore how to use OpenAI’s embedding models with LangChain. Embedding models convert text into numerical vectors, which are useful for tasks like semantic search, clustering, and similarity comparisons. We’ll use OpenAI’s **text-embedding-3-large** model as an example.
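
To make "similarity comparisons" concrete before calling any API, here is a minimal sketch of cosine similarity, a common way to compare two embedding vectors. The three toy vectors are made-up stand-ins for real embeddings, and `cosine_similarity` is a helper written for this illustration, not part of LangChain.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values)
capital_a = [0.9, 0.1, 0.0]
capital_b = [0.8, 0.2, 0.1]
unrelated = [0.0, 0.1, 0.9]

print(cosine_similarity(capital_a, capital_b))  # close to 1: similar direction
print(cosine_similarity(capital_a, unrelated))  # close to 0: different direction
```

Scores near 1 mean the vectors point in nearly the same direction, which is how semantically similar texts tend to show up in embedding space.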

---

### **Step 1: Create a File for OpenAI Embedding**

Under the `EmbeddedModels` folder, create a new file named `openai_embedding_query.py`.

```bash
touch EmbeddedModels/openai_embedding_query.py
```

---

### **Step 2: Write the Code**

Open the `openai_embedding_query.py` file and add the following code:

```python
# Import necessary libraries
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configure the OpenAI embedding model
embedding = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=32)

# Define your sentence
sentence = "Dhaka is the capital of Bangladesh."

# Generate embedding for the sentence
result = embedding.embed_query(sentence)

# Print the embedding
print(str(result))
```

---

### **Step 3: Understand the Code**

Let’s break down the code step by step:

#### **3.1 Import Libraries**
- **`OpenAIEmbeddings`**: This is the class provided by LangChain to interact with OpenAI’s embedding models.
- **`load_dotenv`**: This function loads environment variables from the `.env` file.

#### **3.2 Load Environment Variables**
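- **`load_dotenv()`**: This reads the `.env` file and adds its variables to the process environment. `OpenAIEmbeddings` looks up `OPENAI_API_KEY` there, so the key must be defined in `.env` or already exported in your shell.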
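
If the key is missing, the failure only surfaces later as an authentication error from the API. A small check right after `load_dotenv()` can fail earlier with a clearer message. The helper name `require_api_key` is hypothetical and uses only the standard library; treat it as a sketch rather than a required step.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Typical placement (after load_dotenv() in the script):
# require_api_key()
```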

#### **3.3 Configure the Embedding Model**
- **`OpenAIEmbeddings(model="text-embedding-3-large", dimensions=32)`**: This initializes the OpenAI embedding model. The `model` parameter selects which embedding model to call, and the `dimensions` parameter (note the plural) sets the length of the returned vector. Without it, `text-embedding-3-large` returns 3072 values per input.

#### **3.4 Define Your Sentence**
- **`sentence = "Dhaka is the capital of Bangladesh."`**: This is the input text for which we want to generate an embedding.

#### **3.5 Generate Embedding**
- **`embedding.embed_query(sentence)`**: This sends the sentence to the embedding model and retrieves the embedding vector.

#### **3.6 Print the Result**
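- **`print(str(result))`**: The embedding vector is stored in `result` as a plain Python list of floats and printed to the console. The `str()` call is optional, since `print` converts the list to text on its own.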

---

### **Step 4: Run the Script**

Execute the script to see the embedding vector.

```bash
python EmbeddedModels/openai_embedding_query.py
```

---

### **Step 5: Expected Output**

When you run the script, you should see an output similar to the following:

```plaintext
[-0.012345, 0.023456, -0.034567, 0.045678, -0.056789, 0.067890, -0.078901, 0.089012, -0.090123, 0.101234, -0.112345, 0.123456, -0.134567, 0.145678, -0.156789, 0.167890, -0.178901, 0.189012, -0.190123, 0.201234, -0.212345, 0.223456, -0.234567, 0.245678, -0.256789, 0.267890, -0.278901, 0.289012, -0.290123, 0.301234, -0.312345, 0.323456]
```

- **Explanation**: This is a 32-dimensional embedding vector representing the input sentence. The numbers shown here are illustrative placeholders; the values you get will differ.

---

### **Step 6: Generate Embeddings for Multiple Documents**

You can also generate embeddings for multiple documents (e.g., a list of sentences). Let’s extend the code to handle this.

#### **6.1 Update the Code**
Add the following code to generate embeddings for multiple documents:

```python
# Define a list of sentences
docs = ["Dhaka is the capital of Bangladesh.", "My name is John."]

# Generate embeddings for the documents
result = embedding.embed_documents(docs)

# Print the embeddings
print(str(result))
```

#### **6.2 Full Updated Code**
Here’s the complete updated code:

```python
# Import necessary libraries
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configure the OpenAI embedding model
embedding = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=32)

# Define your sentence
sentence = "Dhaka is the capital of Bangladesh."

# Generate embedding for the sentence
result = embedding.embed_query(sentence)

# Print the embedding
print("Single Document Embedding:")
print(str(result))

# Define a list of sentences
docs = ["Dhaka is the capital of Bangladesh.", "My name is John."]

# Generate embeddings for the documents
result = embedding.embed_documents(docs)

# Print the embeddings
print("\nMultiple Documents Embedding:")
print(str(result))
```

#### **6.3 Run the Script**
Execute the script to see the embeddings for both single and multiple documents.

```bash
python EmbeddedModels/openai_embedding_query.py
```

---

### **Step 7: Expected Output for Multiple Documents**

When you run the updated script, you should see an output similar to the following:

```plaintext
Single Document Embedding:
[-0.012345, 0.023456, -0.034567, 0.045678, -0.056789, 0.067890, -0.078901, 0.089012, -0.090123, 0.101234, -0.112345, 0.123456, -0.134567, 0.145678, -0.156789, 0.167890, -0.178901, 0.189012, -0.190123, 0.201234, -0.212345, 0.223456, -0.234567, 0.245678, -0.256789, 0.267890, -0.278901, 0.289012, -0.290123, 0.301234, -0.312345, 0.323456]

Multiple Documents Embedding:
[[-0.012345, 0.023456, -0.034567, 0.045678, -0.056789, 0.067890, -0.078901, 0.089012, -0.090123, 0.101234, -0.112345, 0.123456, -0.134567, 0.145678, -0.156789, 0.167890, -0.178901, 0.189012, -0.190123, 0.201234, -0.212345, 0.223456, -0.234567, 0.245678, -0.256789, 0.267890, -0.278901, 0.289012, -0.290123, 0.301234, -0.312345, 0.323456],
 [0.098765, -0.087654, 0.076543, -0.065432, 0.054321, -0.043210, 0.032109, -0.021098, 0.010987, -0.000876, 0.009765, -0.018654, 0.027543, -0.036432, 0.045321, -0.054210, 0.063109, -0.072098, 0.081087, -0.090076, 0.099065, -0.108054, 0.117043, -0.126032, 0.135021, -0.144010, 0.153099, -0.162088, 0.171077, -0.180066, 0.189055, -0.198044]]
```

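- **Explanation**: The first array is the embedding returned by `embed_query` for the single sentence. The second is a list of two vectors returned by `embed_documents`, one per sentence in `docs` and in the same order. As before, the numbers are illustrative placeholders.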
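
A quick way to confirm that structure is to check that you got one vector per document and that each has the requested length. The sketch below uses placeholder vectors of zeros in place of a real API response, and `describe_embeddings` is a hypothetical helper introduced only for this example.

```python
def describe_embeddings(docs, vectors, expected_dim):
    """Pair each document with its vector length, validating the shapes."""
    if len(docs) != len(vectors):
        raise ValueError(f"expected {len(docs)} vectors, got {len(vectors)}")
    summary = []
    for doc, vec in zip(docs, vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"{doc!r}: expected {expected_dim} values, got {len(vec)}")
        summary.append((doc, len(vec)))
    return summary


docs = ["Dhaka is the capital of Bangladesh.", "My name is John."]
# Placeholder vectors; in the real script this would be embedding.embed_documents(docs)
vectors = [[0.0] * 32, [0.0] * 32]

for doc, dim in describe_embeddings(docs, vectors, expected_dim=32):
    print(f"{dim} values for: {doc}")
```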

---

### **Step 8: Additional Tips**

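1. **Model Selection**: OpenAI offers multiple embedding models, such as `text-embedding-3-small` (1536 dimensions by default) and `text-embedding-3-large` (3072 dimensions by default). The large model generally captures more detail but costs more per token, so choose the one that best fits your use case.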

2. **Dimension Customization**: The `dimensions` parameter, supported by the `text-embedding-3` models, controls the size of the embedding vector. Smaller vectors reduce storage and computational cost but may lose some information.

3. **Error Handling**: Always include error handling to manage API rate limits or connectivity issues. For example:
   ```python
   try:
       result = embedding.embed_query(sentence)
       print(str(result))
   except Exception as e:
       print(f"An error occurred: {e}")
   ```
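
For transient failures such as rate limits, printing the error is often not enough; retrying after a short pause usually is. Below is a minimal sketch of retrying with exponential backoff. The `with_retries` helper and its parameters are assumptions made for this example, not a LangChain API, and it catches the broad `Exception` only to stay library-agnostic. In real code you would likely catch narrower errors such as `openai.RateLimitError`.

```python
import time


def with_retries(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)


# Usage in the script from this section:
# result = with_retries(embedding.embed_query, sentence)
```

The `sleep` argument exists only so the wait can be swapped out in tests; in normal use the default `time.sleep` applies.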

---

### **Next Steps**
- Explore other embedding models, such as those from Hugging Face or Cohere.
- Use embeddings to build a semantic search engine or recommendation system.
- Compare the performance of different embedding models for your specific use case.
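
As a starting point for the semantic search idea, the sketch below ranks documents against a query by dot product. It assumes the vectors are unit-length, in which case the dot product equals cosine similarity. The toy 2-dimensional vectors are made up, and `rank_by_similarity` is a hypothetical helper; in a real pipeline the vectors would come from `embed_query` and `embed_documents`.

```python
def rank_by_similarity(query_vec, doc_vecs, docs, top_k=3):
    """Return up to top_k (score, document) pairs, highest score first."""
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        # Dot product; equals cosine similarity when both vectors are unit-length
        score = sum(q * d for q, d in zip(query_vec, vec))
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy unit-length vectors (made-up values, not real embeddings)
docs = ["My name is John.", "Dhaka is the capital of Bangladesh."]
doc_vecs = [[0.6, 0.8], [1.0, 0.0]]
query_vec = [1.0, 0.0]  # stands in for the embedding of a question about Dhaka

for score, doc in rank_by_similarity(query_vec, doc_vecs, docs, top_k=2):
    print(f"{score:.2f}  {doc}")
```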