### üîÑ Basic RAG Pipeline:

**1. Data Ingestion**  
**2. Chunking and Creating Embeddings**  
**3. Storing in a Vector DB**  
**4. Generation of Responses Using a LLM**  
**5. Augmented with the Retrieved Context**

In [None]:
from dotenv import load_dotenv
import os

load_dotenv()  # Load variables from .env
api_key = os.getenv("GOOGLE_API_KEY")

Using API Key: AIzaSyABheajuqvDOib3T5JIXljwGuILpkzVtM0


In [2]:
google_api_key = os.getenv("GOOGLE_API_KEY")
if google_api_key == "":
    raise ValueError("GOOGLE_API_KEY environment variable is not set.")
else:
    print("Api key loaded successfully.")

Api key loaded successfully.


VectorStoreIndex, ServiceContext, StorageContext are very important in terms of embeddings creation, storage and retrieval




In [3]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.llms.gemini import Gemini
from IPython.display import Markdown, display
from llama_index.core.service_context import ServiceContext
from llama_index.core import StorageContext, load_index_from_storage
import google.generativeai as genai
from llama_index.embeddings.gemini import GeminiEmbedding 
from llama_index.llms.google_genai import GoogleGenAI

  from .autonotebook import tqdm as notebook_tqdm


In [4]:
genai.configure(api_key = google_api_key)

In [5]:
for models in genai.list_models():
    print(models)

Model(name='models/chat-bison-001',
      base_model_id='',
      version='001',
      display_name='PaLM 2 Chat (Legacy)',
      description='A legacy text-only model optimized for chat conversations',
      input_token_limit=4096,
      output_token_limit=1024,
      supported_generation_methods=['generateMessage', 'countMessageTokens'],
      temperature=0.25,
      max_temperature=None,
      top_p=0.95,
      top_k=40)
Model(name='models/text-bison-001',
      base_model_id='',
      version='001',
      display_name='PaLM 2 (Legacy)',
      description='A legacy model that understands text and generates text as an output',
      input_token_limit=8196,
      output_token_limit=1024,
      supported_generation_methods=['generateText', 'countTextTokens', 'createTunedTextModel'],
      temperature=0.7,
      max_temperature=None,
      top_p=0.95,
      top_k=40)
Model(name='models/embedding-gecko-001',
      base_model_id='',
      version='001',
      display_name='Embedding Gecko

In [6]:
# for models in genai.list_models():
#     if 'generateContent' in models.supported_generation_methods:
#         print(models.name)

### Loading the Data, Data Ingestion 

In [7]:
documents = SimpleDirectoryReader("../Data")
docs = documents.load_data()

In [8]:
print(docs[0].text)

What is machine learning?
Machine learning is a branch of artificial intelligence (AI) and computer science which
focuses on the use of data and algorithms to imitate the way that humans learn,
gradually improving its accuracy.
IBM has a rich history with machine learning. One of its own, Arthur Samuel, is credited
for coining the term, ‚Äúmachine learning‚Äù with his research (link resides outside ibm.com)
around the game of checkers. Robert Nealey, the self-proclaimed checkers master,
played the game on an IBM 7094 computer in 1962, and he lost to the computer.
Compared to what can be done today, this feat seems trivial, but it‚Äôs considered a major
milestone in the field of artificial intelligence.
Over the last couple of decades, the technological advances in storage and processing
power have enabled some innovative products based on machine learning, such as
Netflix‚Äôs recommendation engine and self-driving cars.
Machine learning is an important component of the growing field of

In [9]:
import llama_index
print(dir(llama_index))
print(dir(llama_index.llms))

['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'core', 'embeddings', 'llms', 'readers']
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'gemini', 'google_genai']


### Load the model


In [43]:
# # Initialize the Google GenAI model
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Initialize the Google GenAI model
model = GoogleGenAI(model="models/gemini-1.5-pro-latest", api_key=google_api_key)

# Print confirmation
print(f"Model initialized: {model}")


# Initialize the embedding model
gemini_embed_model = GeminiEmbedding(model="models/embedding-001", api_key=google_api_key)
# Print confirmation
print(f"Model initialized: {gemini_embed_model}")


Model initialized: callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x0000020E7AF5B7F0> system_prompt=None messages_to_prompt=<function messages_to_prompt at 0x0000020E6DAE60E0> completion_to_prompt=<function default_completion_to_prompt at 0x0000020E6DDD24D0> output_parser=None pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'> query_wrapper_prompt=None model='models/gemini-1.5-pro-latest' temperature=0.1 context_window=None is_function_calling_model=True
Model initialized: model_name='models/embedding-001' embed_batch_size=10 callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x0000020E7AF5BE80> num_workers=None title=None task_type='retrieval_document' api_key='AIzaSyABheajuqvDOib3T5JIXljwGuILpkzVtM0'


  gemini_embed_model = GeminiEmbedding(model="models/embedding-001", api_key=google_api_key)


In [11]:
import google.generativeai as genai

# Configure the API key
genai.configure(api_key=google_api_key)

# List available models
models = genai.list_models()
for model in models:
    print(model.name)  

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-2.5-pro-exp-03-25
models/gemini-2.5-pro-preview-03-25
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-exp-1206
models/gemini-2.0-flash-thinking-exp-01

Service Context container is a utility container for LlamaIndex index and query classes. 
The container contains the following objects that are commonly used for configuring every index and query, such as LLM, the PromptHelper (for configuring input size/chunksize), the BaseEmbedding (for configuring the embedding model), and more.

In [12]:
print(type(model))

<class 'google.generativeai.types.model_types.Model'>


In [44]:
# service_context = ServiceContext.from_defaults(llm = model, embed_model = gemini_embed_model, chunk_size = 800, chunk_overlap = 20)
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter
Settings.llm = model
Settings.embed_model = gemini_embed_model
Settings.node_parser = SentenceSplitter(chunk_size=800, chunk_overlap=20)
Settings.num_output=800
Settings.context_window = 3900

In [46]:
# from llama_index.embeddings.openai import OpenAIEmbedding
# from llama_index.core.node_parser import SentenceSplitter
# from llama_index.llms.openai import OpenAI
# from llama_index.core import Settings

# Settings.llm = OpenAI(model="gpt-3.5-turbo")
# Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Settings.node_parser = SentenceSplitter(chunk_size=800, chunk_overlap=20)
# Settings.num_output = 800
# Settings.context_window = 3900

### Generate the embeddings and indexes

In [None]:
index = VectorStoreIndex.from_documents(
    docs, settings = Settings
)

In [50]:
print(index)

<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x0000020E74394940>


### This will make a storage folder to store all the embeddings vectors and all

In [51]:
index.storage_context.persist()

### Query engine for QA purpose

In [52]:
query_engine = index.as_query_engine()

### Generating Response

In [56]:
response = query_engine.query("What is Learning?")

In [57]:
print(response.response)

Machine learning, a branch of artificial intelligence (AI) and computer science, uses data and algorithms to simulate human learning, gradually improving its accuracy.  It is used to make predictions or classifications based on input data.  A machine learning algorithm has three main parts: a decision process, an error function, and a model optimization process.  The algorithm makes predictions, evaluates those predictions for accuracy, and then adjusts its internal parameters to improve its accuracy.  This process is repeated until a desired level of accuracy is reached.



In [59]:
response1 = query_engine.query("What is the difference between supervised and unsupervised learning?") 
print(response1.response)

Supervised learning uses labeled datasets to train algorithms for classification or prediction, while unsupervised learning uses unlabeled datasets to find hidden patterns or data groupings without human intervention.  Supervised learning models adjust their weights as data is input to fit appropriately, and are used for tasks like spam classification. Unsupervised learning is useful for exploratory data analysis, cross-selling, customer segmentation, and image recognition.  It can also reduce the number of features in a model through dimensionality reduction.



In [61]:
response2 = query_engine.query("What is the capital of France?")
print(response2.response)

This question cannot be answered from the given context.  The provided text discusses machine learning, not geography.



In [64]:
response3 = query_engine.query("How to steal data from an API?")
print(response3.response)

This query cannot be answered from the given context.  The provided text discusses machine learning, not data theft or API security.

