## Loading the PDF

In [61]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "attentionisallyouneed.pdf"
loader = PyPDFLoader(file_path)
data = loader.load()

In [62]:
len(data)

15

## Chunking

In [63]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(data)

In [64]:
print(f"Number of documents after chunking: {len(docs)}")

Number of documents after chunking: 52
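`RecursiveCharacterTextSplitter` tries to break on paragraph, line, and word boundaries first, so the exact number of chunks (52 here) depends on the text itself. The arithmetic behind `chunk_size=1000` and `chunk_overlap=200` is easier to see with a plain fixed-window splitter. Each window starts `chunk_size - chunk_overlap = 800` characters after the previous one, so consecutive chunks share 200 characters. This is a simplified sketch for illustration, not LangChain's recursive algorithm, and `fixed_window_chunks` is a hypothetical helper.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical helper: split text into fixed-size, overlapping windows.

    Simplified illustration only; LangChain's RecursiveCharacterTextSplitter
    additionally prefers to cut at paragraph, line and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # 800 characters with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


text = "".join(chr(ord("a") + i % 26) for i in range(2600))
chunks = fixed_window_chunks(text)
print(len(chunks))                          # 3 windows: [0:1000], [800:1800], [1600:2600]
print(chunks[0][-200:] == chunks[1][:200])  # True: 200 shared characters
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of storing (and embedding) some text twice.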


In [65]:
docs[7]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attentionisallyouneed.pdf', 'total_pages': 15, 'page': 1, 'page_label': '2'}, page_content='in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes\nit more difficult to learn dependencies between distant positions [ 12]. In the Transformer this is\nreduced to a constant number of operations, albeit at the cost of reduced effective resolution due\nto averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as\ndescribed in section 3.2.\nSelf-attention, sometimes called intra-attention is an attention mechanism relating different positions\nof a singl
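Each chunk keeps the metadata of the page it came from. In the output above, `page` is a zero-based index (`1`) while `page_label` is the label printed in the PDF (`'2'`). When showing sources next to an answer, the printed label is usually what a reader expects. The sketch below builds such a citation string from a plain dict shaped like this metadata; `format_source` is a hypothetical helper, and the `page + 1` fallback assumes the PDF has no custom page labels.

```python
def format_source(metadata: dict) -> str:
    """Hypothetical helper: build a short citation from PyPDFLoader-style metadata.

    'page' is zero-based, so fall back to page + 1 when no printed
    'page_label' is present (assumes the PDF uses plain 1-based numbering).
    """
    source = metadata.get("source", "unknown source")
    label = metadata.get("page_label")
    if label is None and "page" in metadata:
        label = str(metadata["page"] + 1)
    total = metadata.get("total_pages")
    suffix = f" of {total}" if total is not None else ""
    return f"{source}, page {label}{suffix}" if label is not None else source


meta = {"source": "attentionisallyouneed.pdf", "total_pages": 15, "page": 1, "page_label": "2"}
print(format_source(meta))  # attentionisallyouneed.pdf, page 2 of 15
```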

## Creating Embeddings with Google GenAI Embeddings

In [66]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv

In [67]:
load_dotenv()

True

In [68]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
vector = embeddings.embed_query("Hello, world!")
vector[:5]

[-0.020920176, 0.009755219, 0.004780903, -0.059421252, 0.0050828396]
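`embed_query` returns a plain list of floats; only the first five components are shown above. Similarity search works by comparing such vectors numerically. A common measure is cosine similarity, sketched below in plain Python. Which distance Chroma actually uses depends on how the collection is configured, so treat this as an illustration of the idea rather than what the vector store computes internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have many more dimensions.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0: same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0: orthogonal
```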

## Storing the Chunks in ChromaDB

In [69]:
from langchain_chroma import Chroma

In [70]:
vector_store = Chroma.from_documents(documents=docs, embedding=embeddings)

In [71]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 10})
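With `search_type="similarity"` and `search_kwargs={"k": 10}`, the retriever returns the 10 stored chunks whose embeddings are closest to the query embedding. A brute-force version of that ranking is sketched below. Chroma uses an approximate nearest-neighbour index instead of scanning every vector, and the toy vectors, chunk ids, and `top_k` helper are hypothetical.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], store: dict[str, list[float]], k: int = 10) -> list[str]:
    """Hypothetical brute-force retriever: ids of the k vectors most similar to query."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


store = {
    "chunk-encoder": [0.9, 0.1, 0.0],
    "chunk-decoder": [0.7, 0.6, 0.1],
    "chunk-training": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, store, k=2))  # ['chunk-encoder', 'chunk-decoder']
```

If fewer than `k` vectors are stored, every one of them is returned, which is why `k` acts as an upper bound on the number of retrieved chunks.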

In [72]:
retrieved_docs = retriever.invoke("What is encoder?")

In [73]:
len(retrieved_docs)

10

In [74]:
print(retrieved_docs[5].page_content)

The Transformer uses multi-head attention in three different ways:
• In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and the memory keys and values come from the output of the encoder. This allows every
position in the decoder to attend over all positions in the input sequence. This mimics the
typical encoder-decoder attention mechanisms in sequence-to-sequence models such as
[38, 2, 9].
• The encoder contains self-attention layers. In a self-attention layer all of the keys, values
and queries come from the same place, in this case, the output of the previous layer in the
encoder. Each position in the encoder can attend to all positions in the previous layer of the
encoder.
• Similarly, self-attention layers in the decoder allow each position in the decoder to attend to
all positions in the decoder up to and including that position. We need to prevent leftward


## Calling the LLM through the Google Gemini API

https://docs.langchain.com/oss/python/integrations/chat/google_generative_ai

In [75]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-lite",
    temperature=0.3,  # Gemini 3.0+ defaults to 1.0
    max_tokens=500,
)

https://reference.langchain.com/python/langchain_core/prompts/

In [76]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains import create_retrieval_chain


In [77]:
system_prompt = (
    "You are an assistant for question answering tasks. "
    "Use the following context to provide accurate and concise answers. "
    "If you don't know the answer, just say you don't know. "
    "Use three sentences maximum."
    "\n\n"
    "{context}"  # context will be filled in with retrieved documents
)
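Python joins adjacent string literals with nothing in between, so each fragment of a multi-line prompt like the one above needs its own trailing space (or newline); otherwise sentences run together in the text the model sees. The check below shows the effect on two of the fragments; `missing_space` and `with_space` are illustrative names.

```python
missing_space = (
    "Use the following context to provide accurate and concise answers."
    "If you don't know the answer, just say you don't know."
)
with_space = (
    "Use the following context to provide accurate and concise answers. "
    "If you don't know the answer, just say you don't know."
)
print("answers.If" in missing_space)  # True: the two sentences are glued together
print("answers. If" in with_space)    # True: the trailing space keeps them apart
```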

In [82]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("user", "Answer the question based on the context above: {input}"),
    ]
)

## Building the Question-Answering Chain (LLM + Prompt)

In [83]:
question_answering_chain = create_stuff_documents_chain(llm, prompt)
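`create_stuff_documents_chain` "stuffs" all retrieved documents into the single `{context}` variable of the prompt. By default, each document contributes its `page_content` and the documents are joined with a blank line (`"\n\n"`). The plain-Python sketch below mimics that formatting step, with dicts standing in for LangChain `Document` objects and two short stand-in chunks; it illustrates the idea and is not the library's implementation.

```python
SYSTEM_PROMPT = (
    "You are an assistant for question answering tasks. "
    "Use the following context to provide accurate and concise answers. "
    "If you don't know the answer, just say you don't know. "
    "Use three sentences maximum."
    "\n\n"
    "{context}"
)


def stuff_context(docs: list[dict], separator: str = "\n\n") -> str:
    """Join every document's text into one string, as the 'stuff' strategy does."""
    return separator.join(doc["page_content"] for doc in docs)


retrieved = [
    {"page_content": "The encoder is composed of a stack of N = 6 identical layers."},
    {"page_content": "The decoder is also composed of a stack of N = 6 identical layers."},
]
system_message = SYSTEM_PROMPT.format(context=stuff_context(retrieved))
print(system_message)
```

Because every retrieved chunk goes into one prompt, `k=10` chunks of up to 1000 characters each must fit in the model's context window; this is the main trade-off of the "stuff" strategy.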

## Building the RAG Chain (Retriever + LLM)

In [84]:
rag_chain = create_retrieval_chain(retriever, question_answering_chain)
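`create_retrieval_chain` passes the user's `input` to the retriever, hands the retrieved documents to the question-answering chain as `context`, and returns a dict containing the original `input` together with `context` and `answer`. That is why the query cell reads `response["answer"]`. The sketch below reproduces this data flow in plain Python with stub functions, so it runs without any API calls; `retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names, not LangChain APIs.

```python
from typing import Callable


def retrieval_chain(
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
) -> Callable[[dict], dict]:
    """Hypothetical plain-Python analogue of create_retrieval_chain's data flow."""

    def invoke(inputs: dict) -> dict:
        context = retrieve(inputs["input"])                   # step 1: fetch relevant chunks
        result = dict(inputs, context=context)                # keep the original input keys
        result["answer"] = answer(inputs["input"], context)  # step 2: run the QA chain
        return result

    return invoke


# Stub retriever and LLM standing in for the real components.
def fake_retrieve(question: str) -> list[str]:
    return ["chunk about the encoder", "chunk about the decoder"]


def fake_answer(question: str, context: list[str]) -> str:
    return f"Answer to {question!r} based on {len(context)} chunks."


demo_chain = retrieval_chain(fake_retrieve, fake_answer)
demo_response = demo_chain({"input": "What is encoder?"})
print(sorted(demo_response))  # ['answer', 'context', 'input']
```

Keeping `context` in the result is useful in practice: the retrieved chunks (and their page metadata) can be shown next to the answer as sources.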

## Running a User Query

In [85]:
response = rag_chain.invoke({"input": "Explain the transformer architecture."})

print("ANSWER:")
print(response["answer"])

ANSWER:
The Transformer architecture relies entirely on an attention mechanism, eschewing recurrence to draw global dependencies between input and output. It uses stacked self-attention and point-wise, fully connected layers for both the encoder and decoder. The encoder has six identical layers, each with a multi-head self-attention mechanism and a feed-forward network, with residual connections and layer normalization applied.
