# Getting started with `embeddings`

In [1]:
from langchain.embeddings import BedrockEmbeddings

## Embedding a sentence 

In [2]:
embeddings = BedrockEmbeddings()
text = "New Delhi is the capital of India"  

query_result = embeddings.embed_query(text)  

print(len(query_result))

1536


## Embedding a group of sentences (documents)

In [3]:
embeddings = BedrockEmbeddings()
texts = ["New Delhi is the capital of India", 
         "Welcome to India", 
         "I am going to play football today"]

doc_vectors = embeddings.embed_documents(texts)

print(f"No. of vectors: {len(doc_vectors)}")
print(f"Dimension of each vector: {[len(i) for i in doc_vectors]}")

No. of vectors: 3
Dimension of each vector: [1536, 1536, 1536]
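
Once sentences are turned into vectors, they are usually compared numerically. The sketch below shows one common measure, cosine similarity, on toy 3-dimensional vectors. The helper `cosine_similarity` and the vector values are illustrative assumptions and stand in for the 1536-dimensional vectors returned above; nothing here calls Bedrock.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (hypothetical values).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

# Compare the first vector against every vector, including itself.
for i, vec in enumerate(toy_vectors):
    print(i, round(cosine_similarity(toy_vectors[0], vec), 3))
```

A value near 1 means the two vectors point in almost the same direction, and a value near 0 means they are unrelated. With real embeddings the same function could be applied to any two entries of `doc_vectors`.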


# Vector datastore (Amazon Aurora using `pgvector`)

<div style="background-color: #f0f8ff; padding: 10px; border-radius: 5px; font-size: 1.1em;">
<b>Prerequisites:</b>
<ol>
    <li>Have an <b>Aurora cluster ready</b>.</li>
    <li>Create the <b>pgvector extension</b> on your Aurora PostgreSQL database (DB) cluster:
        <pre style="font-size: 1.1em;"><code>
        CREATE EXTENSION vector;
        </code></pre>
    </li>
</ol>
</div>


We can connect to the Aurora cluster and check the current database and the tables it contains:


```sql
-- Show the current database
SELECT current_database();

-- Show all the tables in the public schema of the current database
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';
```

## Importing a few of the `libs`

In [4]:
import os
from dotenv import load_dotenv
from langchain.vectorstores.pgvector import PGVector, DistanceStrategy

## Loading all the `env` variables 


In [5]:
load_dotenv()

True
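
`load_dotenv()` returning `True` only says that a `.env` file was found and read. It does not confirm that every setting is present. A minimal sketch of a check, assuming the six `PGVECTOR_*` variable names read by the connection cell in this notebook; the helper `missing_env_vars` is hypothetical and not part of `dotenv` or LangChain:

```python
import os

# Variable names taken from the connection cell in this notebook.
REQUIRED_VARS = [
    "PGVECTOR_DRIVER", "PGVECTOR_USER", "PGVECTOR_PASSWORD",
    "PGVECTOR_HOST", "PGVECTOR_PORT", "PGVECTOR_DATABASE",
]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All PGVECTOR_* settings are present.")
```

Running a check like this early can surface a missing value before it turns into a harder-to-read connection error, since `os.getenv` silently returns `None` for unset names.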

## Create a `collection` and `connect` to the vector store

In [6]:
# Collection name
COLLECTION_NAME = "my_collection"

# Connection string, built from the PGVECTOR_* variables loaded by load_dotenv()
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.getenv("PGVECTOR_DRIVER"),
    user=os.getenv("PGVECTOR_USER"),
    password=os.getenv("PGVECTOR_PASSWORD"),
    host=os.getenv("PGVECTOR_HOST"),
    port=os.getenv("PGVECTOR_PORT"),
    database=os.getenv("PGVECTOR_DATABASE"),
)

# Text embedding model
embeddings = BedrockEmbeddings()

# Create the vector store instance, ranking matches by Euclidean distance
my_vector_store = PGVector(
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    embedding_function=embeddings,
    distance_strategy=DistanceStrategy.EUCLIDEAN,
)
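
To make the connection string in the cell above less opaque, here is a rough sketch of the SQLAlchemy-style URL that such parameters typically map to, `postgresql+driver://user:password@host:port/database`. The helper `build_connection_string` and all example values are hypothetical. It is not LangChain's implementation, and the URL-quoting of the user and password is an addition of this sketch.

```python
from urllib.parse import quote_plus

def build_connection_string(driver, user, password, host, port, database):
    """Assemble a SQLAlchemy-style PostgreSQL URL (hypothetical helper)."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{int(port)}/{database}"
    )

# Example values are placeholders, not real credentials.
example = build_connection_string(
    driver="psycopg2",
    user="demo_user",
    password="p@ss/word",
    host="localhost",
    port="5432",
    database="postgres",
)
print(example)
```

Quoting matters when a password contains characters such as `@` or `/`, which would otherwise be read as URL delimiters.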

At this point, `LangChain` will create **two** tables in the Aurora database:
- `langchain_pg_collection`
- `langchain_pg_embedding`


Querying the database shows these **two** tables:

![Vector Store on AWS](images/sql_lc_show_tables.png)

The `langchain_pg_collection` table contains our newly created collection, **my_collection**:

![Vector Store on AWS](images/sql_lc_show_tables2.png)

Let's see what the other table, `langchain_pg_embedding`, contains:

![Vector Store on AWS](images/sql_lc_show_tables3.png)

## Create some `vectors`

In [7]:
texts = ["New Delhi is the capital of India", 
         "Welcome to India", 
         "I am going to play football today"]

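# Text --> Embeddings --> Vectors --> Aurora
# from_texts is a classmethod: it embeds the texts, stores them in the
# collection, and returns a new PGVector instance (discarded here).
my_vector_store.from_texts(
    texts=texts,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    embedding=embeddings,
);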

Let's now check the `langchain_pg_embedding` table

![Vector Store on AWS](images/sql_lc_show_tables4.png)

#### Add a few more vectors

In [8]:
texts = ['The sky is clear tonight.',
         'Cats are curious animals.',
         "It's raining in Paris.",
         'Learning Python can be fun.',
         'Coffee tastes better with friends.',
         'I live in Boston, its a beautiful city',
         'There is museum next to my home',
         'Music brings people together.',
         'The museum is closed on Mondays in few places',
         'In few places museums are open 7 days a week. like in my city']

# Text --> Embeddings --> Vectors --> Aurora
my_vector_store.from_texts(
    texts=texts,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    embedding=embeddings,
);


## Run a `similarity search` with PGVector

In [9]:
my_vector_store.similarity_search(query = "Are museums open all the days in any city", k=4) 

[Document(page_content='In few places museums are open 7 days a week. like in my city'),
 Document(page_content='The museum is closed on Mondays in few places'),
 Document(page_content='There is museum next to my home'),
 Document(page_content='I live in Boston, its a beautiful city')]
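
Conceptually, a search like the one above embeds the query and returns the `k` stored vectors closest to it. The sketch below illustrates that idea with a brute-force Euclidean search over toy 2-dimensional vectors. The helpers `euclidean_distance` and `top_k`, the labels, and the vector values are all hypothetical. In the real setup the ranking is done inside the database by `pgvector`, not in Python.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, stored, k):
    """Return the k (text, distance) pairs closest to query_vec."""
    scored = [(text, euclidean_distance(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy stand-ins for stored (text, embedding) rows (hypothetical values).
stored = [
    ("museum hours", [0.9, 0.1]),
    ("weather", [0.1, 0.9]),
    ("city life", [0.6, 0.4]),
]
query = [1.0, 0.0]

for text, dist in top_k(query, stored, k=2):
    print(text, round(dist, 3))
```

With these toy values, "museum hours" ranks first and "city life" second, because their vectors lie closest to the query vector.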