## Purpose

1. BigQuery Vector Store

2. RAG with Filters

3. LLM Response

https://python.langchain.com/v0.2/docs/integrations/platforms/google/#bigquery-vector-search

https://cloud.google.com/bigquery/docs/vector-search-intro

Google Cloud BigQuery Vector Search lets you use GoogleSQL to do semantic search, using vector indexes for fast approximate results, or using brute force for exact results.
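To make the "brute force for exact results" idea concrete, here is a minimal pure-Python sketch. It is not how BigQuery implements search; the toy 3-dimensional vectors and the helper names (`euclidean`, `brute_force_search`) are assumptions made for illustration. Brute force scores every stored vector against the query, so the result is exact but the cost grows with the number of rows, which is the cost a vector index trades away for approximate answers.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, vectors, k=2):
    """Score every stored vector and return the k nearest as (distance, key) pairs."""
    scored = sorted((euclidean(query, vec), key) for key, vec in vectors.items())
    return scored[:k]

# Toy embeddings (hypothetical values); real embeddings have hundreds of dimensions.
toy_vectors = {
    "boson": [0.9, 0.1, 0.0],
    "maxwell": [0.1, 0.9, 0.0],
    "bose_einstein": [0.8, 0.2, 0.1],
}

nearest = brute_force_search([1.0, 0.0, 0.0], toy_vectors, k=2)
print([key for _, key in nearest])
```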

#### Reading from GCP Bucket

In [None]:
%pip install python-dotenv
%pip install --upgrade --quiet  langchain-google-community[gcs]
%pip install unstructured
%pip install --upgrade --quiet  langchain langchain-google-vertexai google-cloud-bigquery

In [31]:
from dotenv import load_dotenv  # For loading settings from the .env file
import os
import json  # For saving the results as .json on disk

from langchain_google_community import GCSDirectoryLoader #For getting data from GCP Bucket

from langchain_google_vertexai import VertexAIEmbeddings # For document embedding

from google.cloud import bigquery # Big Query client 

from langchain_community.vectorstores import BigQueryVectorSearch #For Vector Store

from langchain_community.vectorstores.utils import DistanceStrategy

from langchain_core.output_parsers import StrOutputParser

from langchain_core.prompts import ChatPromptTemplate

from langchain_core.runnables import RunnableParallel, RunnablePassthrough

from langchain_google_vertexai import ChatVertexAI


In [32]:
load_dotenv()

True

In [33]:

loader = GCSDirectoryLoader(project_name=os.getenv('PROJECT_NAME'), bucket=os.getenv('BUCKET_NAME'))  # Set PROJECT_NAME and BUCKET_NAME in your .env file
lstDoc = loader.load()
lstDoc

[Document(metadata={'source': 'gs://strg-light-sourcedoc/Boson.txt'}, page_content='Boson, subatomic particle with integral spin (i.e., angular momentum in quantum-mechanical units of 0, 1, etc.) that is governed by the Bose-Einstein statistics (q.v.). Bosons include mesons (e.g., pions and kaons), nuclei of even mass number (e.g., helium-4), and the particles required to embody the fields of quantum field theory (e.g., photons and gluons). Bosons differ significantly from a group of subatomic particles known as fermions in that there is no limit to the number that can occupy the same quantum state. This behaviour gives rise, for example, to the remarkable properties of helium-4 when it is cooled to become a superfluid.'),
 Document(metadata={'source': 'gs://strg-light-sourcedoc/James Clerk Maxwell.txt'}, page_content='James Clerk Maxwell (born June 13, 1831, Edinburgh, Scotland—died November 5, 1879, Cambridge, Cambridgeshire, England) was a Scottish physicist best known for his formu

#### What is the type?

In [34]:
# What is the type?
print(type(lstDoc[0]))

<class 'langchain_core.documents.base.Document'>


#### What are the Properties?

In [35]:
vars(lstDoc[0])

{'id': None,
 'metadata': {'source': 'gs://strg-light-sourcedoc/Boson.txt'},
 'page_content': 'Boson, subatomic particle with integral spin (i.e., angular momentum in quantum-mechanical units of 0, 1, etc.) that is governed by the Bose-Einstein statistics (q.v.). Bosons include mesons (e.g., pions and kaons), nuclei of even mass number (e.g., helium-4), and the particles required to embody the fields of quantum field theory (e.g., photons and gluons). Bosons differ significantly from a group of subatomic particles known as fermions in that there is no limit to the number that can occupy the same quantum state. This behaviour gives rise, for example, to the remarkable properties of helium-4 when it is cooled to become a superfluid.',
 'type': 'Document'}

#### Adding some Metadata [in this case, Id]

In [36]:
# Tag each document with its list position so it can be filtered on later
for i, doc in enumerate(lstDoc):
    doc.metadata['Id'] = i

In [37]:
for doc in lstDoc:
    print(vars(doc))

{'id': None, 'metadata': {'source': 'gs://strg-light-sourcedoc/Boson.txt', 'Id': 0}, 'page_content': 'Boson, subatomic particle with integral spin (i.e., angular momentum in quantum-mechanical units of 0, 1, etc.) that is governed by the Bose-Einstein statistics (q.v.). Bosons include mesons (e.g., pions and kaons), nuclei of even mass number (e.g., helium-4), and the particles required to embody the fields of quantum field theory (e.g., photons and gluons). Bosons differ significantly from a group of subatomic particles known as fermions in that there is no limit to the number that can occupy the same quantum state. This behaviour gives rise, for example, to the remarkable properties of helium-4 when it is cooled to become a superfluid.', 'type': 'Document'}
{'id': None, 'metadata': {'source': 'gs://strg-light-sourcedoc/James Clerk Maxwell.txt', 'Id': 1}, 'page_content': 'James Clerk Maxwell (born June 13, 1831, Edinburgh, Scotland—died November 5, 1879, Cambridge, Cambridgeshire, Eng

In [38]:
lstSrtContent=[doc.page_content for doc in lstDoc]
lstSrtContent

['Boson, subatomic particle with integral spin (i.e., angular momentum in quantum-mechanical units of 0, 1, etc.) that is governed by the Bose-Einstein statistics (q.v.). Bosons include mesons (e.g., pions and kaons), nuclei of even mass number (e.g., helium-4), and the particles required to embody the fields of quantum field theory (e.g., photons and gluons). Bosons differ significantly from a group of subatomic particles known as fermions in that there is no limit to the number that can occupy the same quantum state. This behaviour gives rise, for example, to the remarkable properties of helium-4 when it is cooled to become a superfluid.',
 'James Clerk Maxwell (born June 13, 1831, Edinburgh, Scotland—died November 5, 1879, Cambridge, Cambridgeshire, England) was a Scottish physicist best known for his formulation of electromagnetic theory. He is regarded by most modern physicists as the scientist of the 19th century who had the greatest influence on 20th-century physics, and he is r

#### GCP Information

In [39]:

PROJECT_ID = os.getenv('PROJECT_ID') 

# Set the project id
#! gcloud config set project {PROJECT_ID}

REGION = os.getenv('REGION') 
DATASET = os.getenv('DATASET') 
TABLE = os.getenv('TABLE')  

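# load_dotenv() has already exported GOOGLE_API_KEY if it is defined in .env; fail early if it is missing
if not os.getenv('GOOGLE_API_KEY'):
    raise RuntimeError('GOOGLE_API_KEY is not set')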
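Because every setting in this cell comes from the environment, a missing variable only shows up later as a confusing downstream error. The sketch below is one possible way to check them up front; `require_env` is a hypothetical helper, and the sample values are placeholders, not the notebook's real configuration.

```python
import os

def require_env(names, env=None):
    """Return {name: value} for each name, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}

# Stand-in mapping with placeholder values, used here instead of the real environment.
sample_env = {
    "PROJECT_ID": "my-project",
    "REGION": "US",
    "DATASET": "vector_dataset",
    "TABLE": "docs",
}
settings = require_env(["PROJECT_ID", "REGION", "DATASET", "TABLE"], env=sample_env)
print(settings["DATASET"])
```

In the notebook itself, calling `require_env([...])` without the `env` argument would check `os.environ` after `load_dotenv()` has run.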

#### Embedding

In [40]:


embedding = VertexAIEmbeddings(
    model_name=os.getenv('MODEL_NAME') , project=PROJECT_ID
)

#### Creating the Vector DB

In [41]:


client = bigquery.Client(project=PROJECT_ID, location=REGION)
client.create_dataset(dataset=DATASET, exists_ok=True)



Dataset(DatasetReference('proj-traning', 'vector_dataset'))

https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.utils.DistanceStrategy.html

https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.bigquery_vector_search.BigQueryVectorSearch.html

In [42]:


store = BigQueryVectorSearch(
    project_id=PROJECT_ID,
    dataset_name=DATASET,
    table_name=TABLE,
    location=REGION,
    embedding=embedding,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)
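The `distance_strategy` argument decides what "closest" means. As a rough illustration (plain Python with made-up 2-dimensional vectors, not the store's implementation), Euclidean distance is sensitive to vector length, while cosine distance only compares direction, so the two can rank the same candidates differently.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; grows when the vectors differ in length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query_vec = [1.0, 1.0]
candidates = {
    "same_direction_but_long": [3.0, 3.0],
    "nearby_but_tilted": [1.0, 0.5],
}

closest_euclidean = min(candidates, key=lambda k: euclidean_distance(query_vec, candidates[k]))
closest_cosine = min(candidates, key=lambda k: cosine_distance(query_vec, candidates[k]))
print(closest_euclidean, closest_cosine)
```

When embeddings are normalized to unit length the two measures produce the same ranking, so the choice matters mostly for unnormalized vectors.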

#### Adding Documents to the Vector DB

In [43]:
store.add_documents(lstDoc)

['9c56632480f84edf92c9a922c13736cb',
 '1973f2f4be314f89b7ea79fb7078249a',
 '7e3394e4f0834fc99875552948c015ed',
 '7f7158be0e654046ad6203e2e110fd63',
 '3ec26e88f3be436689df421ec5710dc3',
 '3d08ab41745c4180bab0d1fbead3668d',
 '687ba21f6cd643fa98fa4fe71097f2fa']

#### Search Data

In [47]:


query = 'S N Bose'
# Restrict the search to the document whose metadata Id is 6
metadata_filter = {"Id": 6}
query_vector = embedding.embed_query(query)

# Unfiltered alternative: store.similarity_search_by_vector(query_vector)
ret_Val2 = store.similarity_search_by_vector(query_vector, filter=metadata_filter)

print(ret_Val2)

# Save the hits to disk as JSON
json_output = [{'page_content': doc.page_content, 'metadata': doc.metadata} for doc in ret_Val2]

with open('VectorSearch.json', "w") as json_file:
    json.dump(json_output, json_file, indent=4)

[Document(metadata={'Id': 6, 'source': 'gs://strg-light-sourcedoc/SNB and Einstein.txt', '__id': '687ba21f6cd643fa98fa4fe71097f2fa', '__job_id': 'job_CB6zWdvCrUHLb5iAcIytSi3BgY1d'}, page_content='In 1924 an Indian physicist called Satyendra Nath Bose wrote to Albert Einstein saying he had solved a problem in quantum physics that had stumped the great man. One century on, Robert P Crease and Gino Elia explain how the correspondence led to the notion of Bose–Einstein condensation and why it revealed the power of diverse thinking. Short but sweet In 1924 Satyendra Nath Bose (left) wrote to Albert Einstein (right) saying he had developed a more satisfactory derivation of Planck’s law. The resulting correspondence, which was brief but deep, led to the prediction of what we now call Bose–Einstein condensation. (Courtesy: Left: Falguni Sarkar, courtesy AIP Emilio Segrè Visual Archives. Right: AIP Emilio Segrè Visual Archives, W. F. Meggers Gallery of Nobel Laureates Collection) One day in Jun
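Conceptually, the `filter` argument narrows the candidate set by metadata, and only the surviving rows are ranked by distance, which is why the single hit above is the document with `Id` 6. The sketch below mimics that behaviour in plain Python; the rows, distances, and helper names are invented for illustration and say nothing about how BigQuery executes the query internally.

```python
def matches(metadata, wanted):
    """True when every key/value pair in `wanted` is present in `metadata`."""
    return all(metadata.get(key) == value for key, value in wanted.items())

def filtered_search(rows, wanted, k=5):
    """Keep rows that satisfy the filter, then return the k closest by distance."""
    kept = [row for row in rows if matches(row[0], wanted)]
    return sorted(kept, key=lambda row: row[1])[:k]

# Hypothetical stand-ins for stored rows: (metadata, distance to the query vector).
rows = [
    ({"Id": 0, "source": "doc-a"}, 0.42),
    ({"Id": 6, "source": "doc-b"}, 0.77),
    ({"Id": 3, "source": "doc-c"}, 0.15),
]

hits = filtered_search(rows, {"Id": 6})
print(hits)
```

Note that the filtered hit is returned even though other rows are closer to the query; an empty filter would rank all rows by distance alone.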

#### The Retriever

In [48]:
retriever = store.as_retriever()

#### The Chat Model

In [49]:

model=ChatVertexAI(model="gemini-pro")

#### Executing Chain [LangChain Expression Language (LCEL)]

https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel

https://python.langchain.com/v0.2/docs/concepts/#runnable-interface
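The data flow behind the `|` operator can be pictured with a toy analog. This is not LangChain's implementation; `Step`, `parallel`, and the fake components are hypothetical stand-ins that only imitate the shape of the real chain: a parallel step builds a dict from the same input, and each later step receives the previous step's output.

```python
class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # Feed this step's output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))

def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})

passthrough = Step(lambda value: value)
fake_retriever = Step(lambda question: ["Bose was a physicist."])
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(lambda text: text.upper())  # stands in for the LLM call

toy_chain = parallel(context=fake_retriever, question=passthrough) | fake_prompt | fake_model
result = toy_chain.invoke("Who was S N Bose?")
print(result)
```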



In [50]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
output_parser = StrOutputParser()

setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

chain.invoke("Who was S N Bose?")

'## S N Bose: A Renowned Indian Physicist\n\n## Contributions:\n\n* Collaborated with Albert Einstein on the theory of the gaslike qualities of electromagnetic radiation.\n* Developed a more satisfactory derivation of Planck\'s Law, which led to the prediction of Bose–Einstein condensation. \n* Conducted research and published papers in various fields:\n    * Statistical mechanics\n    * Electromagnetic properties of the ionosphere\n    * Theories of X-ray crystallography and thermoluminescence\n    * Unified field theory\n\n## Education and Career:\n\n* Graduated from the University of Calcutta\n* Taught at the University of Dacca (1921–45) and then at Calcutta (1945–56)\n\n## Collaboration with Einstein:\n\n* Their collaboration helped bridge the "transnational nature of the quantum"\n* It exemplified the growing importance of international collaboration in science\n\n## Legacy:\n\n* His work has had a significant impact on physics, particularly in the areas of quantum mechanics and 
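In the chain above, `context` receives the list of `Document` objects returned by the retriever, so the prompt sees their string representation, metadata included. A common variation is to join only the `page_content` values first. The sketch below shows that idea with a minimal stand-in class; `Doc` and `format_docs` are illustrative names, not part of the notebook.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

sample_docs = [
    Doc("Bosons have integral spin.", {"Id": 0}),
    Doc("Bose wrote to Einstein in 1924.", {"Id": 6}),
]
context_text = format_docs(sample_docs)
print(context_text)
```

In an LCEL chain, a function like this is usually piped after the retriever (for example `retriever | format_docs`) so that only clean text reaches the prompt; treat that wiring as an assumption to verify against the LangChain version in use.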