# Local Query System Using Gemini, RAG, and Vector Stores
                                                                   

Lead Data Scientist: Y.B.

## Project Overview
This project is a step-by-step guide to building a question-answering system over local documents using Google's Gemini models (via the free tier of the Gemini API), Retrieval-Augmented Generation (RAG), and a vector store. Local PDFs are split into chunks, embedded, and indexed in the vector store. At query time, the chunks most relevant to a question are retrieved and passed to Gemini, which answers from that context. The notebook covers setting up the environment, configuring credentials and dependencies, and running queries, along with setup tips for authenticating with Google Cloud.

In [2]:
```python
from google import genai
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader
import config  # local module that holds the Google API key
```

In [3]:
```python
from langchain_community.document_loaders import PyPDFLoader
```

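In [4]:
```python
# PyPDFLoader reads a single file, so DirectoryLoader applies it to every PDF under the folder
loader = DirectoryLoader('filepath', glob='**/*.pdf', loader_cls=PyPDFLoader)
documents = loader.load()
```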
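`PyPDFLoader` reads one PDF at a time, so loading a folder means first finding every PDF path. The helper below, `find_pdfs`, is a hypothetical stdlib sketch (not part of LangChain) that lists the PDFs under a folder, which is also a quick way to confirm that a folder actually contains files before loading it.

```python
from pathlib import Path


def find_pdfs(directory: str) -> list[Path]:
    """Return every PDF file under `directory`, searched recursively, in sorted order.

    Hypothetical helper: the suffix check is case-insensitive, so 'report.PDF'
    is included as well as 'report.pdf'.
    """
    root = Path(directory)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")
```

Each path can then be loaded with `PyPDFLoader(str(path)).load()`, which returns one document per page, and the resulting lists concatenated.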

In [5]:
```python
# Gemini client from the google-genai SDK, authenticated with an API key
client = genai.Client(api_key=config.Google_API_KEY50323)
```
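The notebook reads the API key from a local `config` module that is not shown. A minimal sketch of what `config.py` might contain, assuming the key lives in the `GOOGLE_API_KEY` environment variable instead of being pasted into source files that could end up in version control. The variable name is an assumption, chosen because `langchain_google_genai` falls back to it when no key is passed; `load_api_key` is a hypothetical helper.

```python
# config.py (hypothetical): read the Gemini API key from the environment
import os


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the key stored in `var_name`, raising an error if it is missing or empty."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Gemini API key.")
    return key


# Attribute name kept to match `config.Google_API_KEY50323` in the notebook.
# It is None when the variable is unset, so importing this module never fails by itself.
Google_API_KEY50323 = os.environ.get("GOOGLE_API_KEY")
```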

### Gemini Setup Tips

1. Associate the notebook with a Google Cloud project. Run the following commands to initialize the gcloud CLI and create Application Default Credentials for your environment:

   ```shell
   gcloud init
   gcloud auth application-default login
   ```

2. Enable the required APIs and credentials in your Google Cloud project:
   - In the Google Cloud console, open the APIs & Services section.
   - Enable the APIs that Gemini needs: the Generative Language API for API-key access, or the Vertex AI API if you call Gemini through Vertex AI.
   - Create the credentials used for authentication and access, such as an API key or a service-account key file.


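In [8]:
```python
import os

# Point Google client libraries at a service-account key file (Application Default Credentials)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "filepath/credential.json"
```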
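A wrong path in `GOOGLE_APPLICATION_CREDENTIALS` usually surfaces only later, as an authentication error. The stdlib sketch below uses a hypothetical helper, `check_credentials_file`, to confirm that the file exists, parses as JSON, and has a `type` field. That field is `service_account` for a service-account key and `authorized_user` for the file written by `gcloud auth application-default login`.

```python
import json
import os
from pathlib import Path


def check_credentials_file(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Validate the credentials file named by `env_var` and return its 'type' field."""
    path_str = os.environ.get(env_var)
    if not path_str:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    with path.open(encoding="utf-8") as fh:
        info = json.load(fh)  # raises json.JSONDecodeError if the file is not valid JSON
    cred_type = info.get("type")
    if not cred_type:
        raise ValueError(f"{path} has no 'type' field; is it a Google credentials file?")
    return cred_type
```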

## Embedding

In [9]:
```python
# Split into ~1,000-character chunks with no overlap, embed each chunk, and index them in Chroma
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004",
                                          google_api_key=config.Google_API_KEY50323)
docsearch = Chroma.from_documents(texts, embeddings)  # in-memory vector store
```
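`chunk_size` sets the maximum length of each chunk, and `chunk_overlap` sets how many characters neighboring chunks share. The sketch below is a simplified, hypothetical fixed-window chunker, not LangChain's algorithm: `CharacterTextSplitter` first splits on a separator (`"\n\n"` by default) and then merges the pieces up to `chunk_size`, so its boundaries differ. The sketch only illustrates how overlap works.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so text near a
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With `chunk_overlap=0`, as in the cell above, a fact that straddles a boundary (for example, a table row split across two chunks) may be retrieved only partially. A modest overlap is a common trade-off against a slightly larger index.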


## Vector Store

In [4]:
```python
from google import genai
from google.genai import types
from langchain_google_genai import ChatGoogleGenerativeAI
```
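At query time, the vector store embeds the question and returns the stored chunks whose vectors are closest to it. The pure-Python sketch below shows that step using cosine similarity and a brute-force scan. The toy 3-dimensional vectors and the `top_k` helper are illustrative assumptions. A real store such as Chroma works with the model's embeddings, uses an approximate nearest-neighbor index, and may use a different distance metric by default.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec (brute force)."""
    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]


# Toy vectors standing in for chunk embeddings (hypothetical values).
store = {
    "emissions_chunk": [0.9, 0.1, 0.0],
    "revenue_chunk": [0.1, 0.9, 0.1],
    "water_chunk": [0.6, 0.0, 0.8],
}
print(top_k([1.0, 0.0, 0.1], store, k=2))  # → ['emissions_chunk', 'water_chunk']
```

In LangChain, `docsearch.similarity_search(query, k=4)` performs this lookup against the Chroma store, and 4 is also its default `k`.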


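In [11]:
```python
# Gemini 2.0 Flash chat model; temperature and top_p control how random the sampling is
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash",
                             temperature=0.7, top_p=0.85,
                             google_api_key=config.Google_API_KEY50323)
```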
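`temperature` and `top_p` shape how each next token is sampled. As a rough illustration (a textbook sketch, not Gemini's internal implementation), temperature divides the logits before the softmax, so values below 1 sharpen the distribution. Top-p (nucleus) sampling then keeps only the smallest set of most-likely tokens whose cumulative probability reaches `top_p`. The toy logits and the `nucleus` helper are hypothetical.

```python
import math


def nucleus(logits: dict[str, float], temperature: float, top_p: float) -> dict[str, float]:
    """Return the renormalized distribution that top-p sampling would draw from."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: val / temperature for tok, val in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(kept.values())
    return {tok: p / mass for tok, p in kept.items()}


toy_logits = {"Scope": 2.0, "NVIDIA": 1.5, "The": 0.5, "banana": -3.0}
print(nucleus(toy_logits, temperature=0.7, top_p=0.85))
```

With `temperature=0.7, top_p=0.85`, answers can vary somewhat between runs. For factual lookups like the query below, a lower temperature (for example, 0) tends to make the output more repeatable.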

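In [13]:
```python
from langchain.chains import RetrievalQA  # replaces the deprecated VectorDBQA
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
```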
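With `chain_type="stuff"`, the chain places all retrieved chunks into a single prompt and sends it to the LLM in one call. The sketch below mimics that step in plain Python. The prompt wording and the `build_stuff_prompt` helper are hypothetical and differ from LangChain's built-in template. Because everything goes into one prompt, the combined chunks must fit in the model's context window.

```python
def build_stuff_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """'Stuff' every retrieved chunk into one prompt (hypothetical template)."""
    context = separator.join(chunk.strip() for chunk in chunks)
    return (
        "Use the following pieces of context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What's carbon emission for NVDA?",
    ["Scope 1 emissions were ...", "Scope 2 (market-based) emissions were ..."],
)
print(prompt)
```

When the retrieved text is too long for one prompt, LangChain also offers other chain types such as `"map_reduce"` and `"refine"`, which process chunks across several LLM calls.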




In [15]:
```python
query = "What's carbon emission for NVDA?"
qa.invoke(query)
```

```
{'query': "What's carbon emission for NVDA?",
 'result': "In FY24, NVIDIA's Scope 1 emissions were 14,390 MT CO2e, Scope 2 (market-based) were 40,555 MT CO2e, and Scope 3 emissions totaled 3,637,478 MT CO2e."}
```
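`qa.invoke` returns a dict holding the original `query` and the generated `result`. To ask several questions against the same index, a small helper can loop over them. `run_queries` below is a hypothetical sketch that works with any object exposing `invoke`. It is shown with a stub in place of the real chain so that it runs without an API key.

```python
from typing import Protocol


class Invocable(Protocol):
    def invoke(self, query: str) -> dict: ...


def run_queries(qa: Invocable, queries: list[str]) -> dict[str, str]:
    """Run each query through the chain and map it to the answer text."""
    return {q: qa.invoke(q)["result"] for q in queries}


class StubQA:
    """Stand-in for the QA chain: echoes the query instead of calling Gemini."""

    def invoke(self, query: str) -> dict:
        return {"query": query, "result": f"(stub answer for: {query})"}


answers = run_queries(StubQA(), ["What's carbon emission for NVDA?"])
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```

Generated answers like the one above should be checked against the source documents. Passing `return_source_documents=True` to `from_chain_type` adds the retrieved chunks to the output under `source_documents`.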