<a href="https://colab.research.google.com/github/InduwaraGayashan001/Generative-AI/blob/main/Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

In [None]:
!pip install langchain langchain_community langchain-google-genai

In [None]:
!pip install python-dotenv langchain_experimental sentence-transformers langchain_chroma langchainhub unstructured pypdf

# Load Data

In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Attention.pdf")
data = loader.load()

In [None]:
data

In [None]:
len(data)

11

# Chunking

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # characters shared between neighbouring chunks
)

docs = text_splitter.split_documents(data)
len(docs)

43
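
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses, which also tries to break on separators such as paragraphs and newlines. The helper name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only (hypothetical helper, not a LangChain API).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 2,500-character text so the arithmetic is easy to follow.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)

print(len(chunks))               # 3
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With these settings each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next. That overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.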

# Embeddings

In [None]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import os
from google.colab import userdata

os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')


embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

vector = embeddings.embed_query("Hello World")
len(vector)

768
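
The cell above reads the key through `google.colab.userdata`, which can only be imported inside Colab. As a rough sketch, a small helper could fall back to a plain environment variable when the notebook runs elsewhere. The function name `load_google_api_key` and the fallback behaviour are assumptions, not part of the original notebook.

```python
import os


def load_google_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from Colab secrets if available, else from os.environ.

    Hypothetical helper for illustration.
    """
    try:
        # Only importable when running inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

Outside Colab this would be used as `os.environ["GOOGLE_API_KEY"] = load_google_api_key()` after exporting the variable in the shell or loading it with `python-dotenv`.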

# Knowledge Base

In [None]:
vectorstore = Chroma.from_documents(docs, embeddings)

In [None]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2},  # return the 2 closest chunks
)

retrieved_docs = retriever.invoke("What is Attention Layer?")
len(retrieved_docs)

2

In [None]:
print(retrieved_docs[1].page_content)

around each of the sub-layers, followed by layer normalization. We also modify the self-attention
sub-layer in the decoder stack to prevent positions from attending to subsequent positions. This
masking, combined with fact that the output embeddings are offset by one position, ensures that the
predictions for position ican depend only on the known outputs at positions less than i.
3.2 Attention
An attention function can be described as mapping a query and a set of key-value pairs to an output,
where the query, keys, values, and output are all vectors. The output is computed as a weighted sum
of the values, where the weight assigned to each value is computed by a compatibility function of the
query with the corresponding key.
3.2.1 Scaled Dot-Product Attention
We call our particular attention "Scaled Dot-Product Attention" (Figure 2). The input consists of
queries and keys of dimension dk, and values of dimension dv. We compute the dot products of the
3
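Conceptually, a similarity retriever with `k = 2` embeds the query, scores it against every stored chunk vector, and returns the two best matches. The sketch below illustrates that idea with toy 3-dimensional vectors and cosine similarity. Both are assumptions for illustration: the real embeddings here have 768 dimensions, and the metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy "chunk embeddings" (hypothetical values).
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.05, 0.0]

print(top_k(query_vec, doc_vecs, k=2))  # [0, 1]
```

A production vector store replaces the linear scan with an approximate nearest-neighbour index, but the ranking idea is the same.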


# Integrate with Gemini

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.2,
    max_tokens=1000,
)

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate


# Adjacent string literals are concatenated as-is, so each sentence
# needs its own closing period and trailing space.
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages([("system", system_prompt), ("human", "{input}")])



In [None]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
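
The "stuff" strategy is simple: the text of every retrieved chunk is joined into one string and substituted for `{context}` in the system prompt. The following sketch mimics that step in plain Python. The helper names, the shortened template, and the blank-line separator are assumptions made for illustration and not the LangChain implementation.

```python
# Shortened stand-in for the notebook's system prompt (assumed for brevity).
SYSTEM_TEMPLATE = "Answer using only the context below.\n\n{context}"


def stuff_context(page_contents, separator="\n\n"):
    """Join the text of all retrieved chunks into a single context string."""
    return separator.join(page_contents)


def build_messages(question, page_contents):
    """Build (role, text) pairs resembling what the chat model receives."""
    context = stuff_context(page_contents)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


messages = build_messages(
    "What is self attention?",
    ["chunk one", "chunk two"],  # placeholder chunk texts
)
for role, text in messages:
    print(f"[{role}] {text}")
```

Because every chunk goes into a single prompt, this approach only works while `k` chunks of `chunk_size` characters fit comfortably in the model's context window.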

In [None]:
response = rag_chain.invoke({"input": "What is self attention?"})
print(response["answer"])

Self-attention, sometimes called intra-attention, is an attention mechanism that relates different positions of a single sequence. It computes a representation of the sequence and has been used in tasks like reading comprehension, abstractive summarization, and textual entailment. The Transformer is the first transduction model relying entirely on self-attention.
