# PGVector to AlloyDB Migration
Self Link: [go/pg-to-alloy-migration-search](http://go/pg-to-alloy-migration-search)

## Introduction

In this codelab, you'll learn how to use the AlloyDB interface for vector search on any database created using the PGVector interface.

This allows you to migrate from [PGVector](https://api.python.langchain.com/en/latest/vectorstores/langchain_postgres.vectorstores.PGVector.html#langchain_postgres.vectorstores.PGVector) to the [AlloyDB Vector Store](https://github.com/googleapis/langchain-google-alloydb-pg-python/blob/main/docs/vector_store.ipynb) search methods.

The AlloyDB interface simplifies secure connections to the AlloyDB database, even for users with little experience.


## Before you begin

This notebook assumes that you have done the following:

* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)
* [Enable the AlloyDB API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)
* [Create an AlloyDB cluster and instance](https://cloud.google.com/alloydb/docs/cluster-create)
* [Create an AlloyDB database](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)
* [Add a User to the database](https://cloud.google.com/alloydb/docs/database-users/about)


### 🦜🔗 Library Installation
Install the integration library, `langchain-google-alloydb-pg`, and the library for the embedding service, `langchain-google-vertexai`.

In [None]:
%pip install --upgrade --quiet  langchain-google-alloydb-pg langchain-google-vertexai

**Colab only:** Uncomment the following cell to restart the kernel, or use the restart button. For Vertex AI Workbench, you can restart the terminal using the button at the top.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

### 🔐 Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

* If you are using Colab to run this notebook, use the cell below and continue.

* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)

In [None]:
from google.colab import auth

auth.authenticate_user()

### ☁ Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "twisha-dev"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}

Updated property [core/project].


## Basic Usage

### Set AlloyDB database values

Find your database values on the [AlloyDB Instances page](https://console.cloud.google.com/alloydb/clusters).

In [None]:
# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
CLUSTER = "twisha-dev-cluster"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "test_db"  # @param {type: "string"}

### AlloyDBEngine Connection Pool

To establish AlloyDB as a vector store, you must supply an `AlloyDBEngine` object. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.

To create an `AlloyDBEngine` using `AlloyDBEngine.from_instance()`, you need to provide only five things:

1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.
2. `region` : Region where the AlloyDB instance is located.
3. `cluster`: The name of the AlloyDB cluster.
4. `instance` : The name of the AlloyDB instance.
5. `database` : The name of the database to connect to on the AlloyDB instance.


By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/connect-iam) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.

Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) using a username and password to access the AlloyDB database can also be used. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:

* `user` : Database user to use for built-in database authentication and login.
* `password` : Database password to use for built-in database authentication and login.

In [None]:
# @title Set Your Values Here { display-mode: "form" }
USER = "postgres"  # @param {type: "string"}
PASSWORD = "alloydb"  # @param {type: "string"}

Create a connection to your AlloyDB for PostgreSQL instance using the AlloyDBEngine class.



In [None]:
from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBVectorStore, Column
from langchain_core.documents import Document
import uuid

engine = AlloyDBEngine.from_instance(
    project_id=PROJECT_ID,
    region=REGION,
    cluster=CLUSTER,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
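
The choice between IAM and built-in authentication described above can be sketched as a small helper that assembles the keyword arguments for `AlloyDBEngine.from_instance()`. The helper name `engine_kwargs` is hypothetical and not part of the library; only the argument names come from this notebook. It leaves out `user` and `password` unless both are supplied, so that the default IAM database authentication applies.

```python
from typing import Dict, Optional


def engine_kwargs(
    project_id: str,
    region: str,
    cluster: str,
    instance: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build keyword arguments for AlloyDBEngine.from_instance().

    Hypothetical helper for illustration; it is not part of the library.
    """
    # Reject a half-specified built-in login instead of guessing the intent.
    if (user is None) != (password is None):
        raise ValueError("Provide both user and password, or neither.")

    kwargs = {
        "project_id": project_id,
        "region": region,
        "cluster": cluster,
        "instance": instance,
        "database": database,
    }
    # Only add credentials for built-in database authentication.
    if user is not None and password is not None:
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


# Usage (needs a reachable AlloyDB instance, so it is left commented out):
# engine = AlloyDBEngine.from_instance(
#     **engine_kwargs(PROJECT_ID, REGION, CLUSTER, INSTANCE, DATABASE)
# )
```

Calling the helper without `user` and `password` yields only the five required arguments, which corresponds to the IAM authentication path.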

### Create an embedding class instance

You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).
You may need to enable the Vertex AI API to use `VertexAIEmbeddings`. For production, we recommend pinning the embedding model's version. Learn more about the [text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings).

In [None]:
# enable Vertex AI API
!gcloud services enable aiplatform.googleapis.com

In [None]:
from langchain_google_vertexai import VertexAIEmbeddings

embeddings_service = VertexAIEmbeddings(
    model_name="textembedding-gecko@003", project=PROJECT_ID
)

### Initialize an AlloyDB Loader

Initialize an AlloyDB Loader to fetch the collection UUID from the `langchain_pg_collection` table.

In [None]:
from langchain_google_alloydb_pg import AlloyDBLoader

collection_name = "test_table"
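# Select only the `uuid` column so that page_content holds the collection UUID
# regardless of the table's column order.
collection_loader = AlloyDBLoader.create_sync(
    engine=engine,
    query=f"SELECT uuid FROM langchain_pg_collection WHERE name = '{collection_name}'",
)
doc = collection_loader.load()
# Note: this rebinds the name `uuid`, shadowing the `uuid` module imported earlier.
uuid = doc[0].page_content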
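
The lookup above indexes `doc[0]` directly, which raises an `IndexError` when no collection matches the name. A minimal sketch of a guard follows, assuming the loader returns objects whose `page_content` is the UUID as a string. The helper `extract_collection_uuid` is hypothetical and not part of the library, and `SimpleNamespace` stands in for a loaded document so that the sketch runs without a database.

```python
import uuid as uuid_lib  # aliased so it cannot clash with a variable named `uuid`
from types import SimpleNamespace
from typing import Any, Sequence


def extract_collection_uuid(docs: Sequence[Any], collection_name: str) -> str:
    """Return the collection UUID from loader results (hypothetical helper)."""
    if not docs:
        raise LookupError(f"No collection named {collection_name!r} was found.")
    # uuid_lib.UUID raises ValueError if the content is not a valid UUID.
    return str(uuid_lib.UUID(docs[0].page_content.strip()))


# Stand-in for the loader output, for illustration only.
fake_docs = [SimpleNamespace(page_content="90462d20-97e5-4094-ba4b-9e2b776938e6")]
collection_uuid = extract_collection_uuid(fake_docs, "test_table")
```

Validating the value before it is interpolated into a filter also surfaces a wrong column selection early, rather than as an empty search result.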

### Initialize an AlloyDBVectorStore on the embeddings data

In [None]:
embedding_vectorstore = AlloyDBVectorStore.create_sync(
    engine=engine,
    table_name="langchain_pg_embedding",
    embedding_service=embeddings_service,
    content_column="document",
    metadata_columns=["cmetadata", "collection_id"],
    id_column="id",
)

### Perform basic similarity search

In [None]:
# Equivalent PGVector code:
# pg_vectorstore.similarity_search(
#     "cats", k=5
# )

embedding_vectorstore.similarity_search(
    "cats", k=5, filter=f"collection_id='{uuid}'"
)

[Document(metadata={'cmetadata': {'id': 1, 'topic': 'animals', 'location': 'pond'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='there are cats in the pond'),
 Document(metadata={'cmetadata': {'id': 5, 'topic': 'art', 'location': 'museum'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='the new art exhibit is fascinating'),
 Document(metadata={'cmetadata': {'id': 6, 'topic': 'art', 'location': 'museum'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='a sculpture exhibit is also at the museum'),
 Document(metadata={'cmetadata': {'id': 2, 'topic': 'animals', 'location': 'pond'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='ducks are also found in the pond'),
 Document(metadata={'cmetadata': {'id': 9, 'topic': 'reading', 'location': 'library'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='the library hosts a weekly story time for 

### Perform similarity search with metadata filters

The filter should be written using SQL syntax as it forms part of the WHERE clause in your query.

In [None]:
embedding_vectorstore.similarity_search(
    "cats", k=5, filter=f"collection_id='{uuid}' and cmetadata->>'topic' = 'animals'"
)

[Document(metadata={'cmetadata': {'id': 1, 'topic': 'animals', 'location': 'pond'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='there are cats in the pond'),
 Document(metadata={'cmetadata': {'id': 2, 'topic': 'animals', 'location': 'pond'}, 'collection_id': UUID('90462d20-97e5-4094-ba4b-9e2b776938e6')}, page_content='ducks are also found in the pond')]
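
Because the filter is interpolated into the WHERE clause, a value containing a single quote would break the query. A minimal sketch of building the filter strings used above follows, assuming PostgreSQL's default `standard_conforming_strings` setting, under which doubling a single quote escapes it. `quote_literal` and `build_filter` are hypothetical helpers, not part of the library.

```python
import uuid as uuid_lib
from typing import Dict, Optional


def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(collection_id: str, metadata: Optional[Dict[str, str]] = None) -> str:
    """Build a WHERE-clause fragment for a PGVector-style embedding table.

    Hypothetical helper: matches on collection_id, then on each
    cmetadata key/value pair using the ->> text extraction operator.
    """
    uuid_lib.UUID(collection_id)  # raises ValueError for a malformed UUID
    clauses = [f"collection_id = {quote_literal(collection_id)}"]
    for key, value in (metadata or {}).items():
        clauses.append(f"cmetadata->>{quote_literal(key)} = {quote_literal(value)}")
    return " AND ".join(clauses)


animals_filter = build_filter(
    "90462d20-97e5-4094-ba4b-9e2b776938e6", {"topic": "animals"}
)
# The result could then be passed as filter=animals_filter to similarity_search.
```

Since `->>` returns text, this sketch only covers string comparisons; numeric metadata values would need an explicit cast in the clause.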