## Loading the PDF

In [9]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "attentionisallyouneedgemini.pdf"
loader = PyPDFLoader(file_path)
data = loader.load()

In [10]:
len(data)

15

## Splitting the Data into Chunks (Chunking)

In [11]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(data)

In [12]:
print(f"Number of documents after chunking: {len(docs)}")

Number of documents after chunking: 48
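`RecursiveCharacterTextSplitter` tries coarse separators first (paragraph breaks, then line breaks, then spaces) and falls back to finer ones only when a piece is still longer than `chunk_size`. That is how the 15 pages became 48 chunks of roughly 1000 characters or less. Below is a simplified pure-Python sketch of the idea. It assumes the separator order `["\n\n", "\n", " ", ""]` and ignores `chunk_overlap` and other details, so it is not LangChain's actual implementation. The function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (no overlap handling)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        return [text]  # nothing left to split on; keep the oversized piece
    # Use the first (coarsest) separator that actually occurs in the text;
    # "" always matches and means "split into single characters".
    sep, rest = separators[-1], ()
    for i, s in enumerate(separators):
        if s == "" or s in text:
            sep, rest = s, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Piece is too big on its own: flush the buffer, recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, rest))
            continue
        # Greedily merge pieces while the result still fits in chunk_size.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc\n\nddd eee", 8))  # ['aaa bbb', 'ccc', 'ddd eee']
```

The first paragraph is too long for `chunk_size=8`, so it is re-split on spaces. The second paragraph fits and is kept whole, which is the point of preferring coarse separators: chunks break at natural boundaries whenever possible.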


In [13]:
docs[7]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attentionisallyouneedgemini.pdf', 'total_pages': 15, 'page': 1, 'page_label': '2'}, page_content='Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35].\nHere, the encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence\nof continuous representations z = (z1, ..., zn). Given z, the decoder then generates an output\nsequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive\n[10], consuming the previously generated symbols as additional input when generating the next.\n2')

## Creating Embeddings with Google Generative AI Embeddings

In [14]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

In [5]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
vector = embeddings.embed_query("hello, world!")
vector[:5]

[-0.02276923693716526,
 0.010134130716323853,
 0.011886735446751118,
 -0.09669032692909241,
 -0.0027089761570096016]
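The embedding model turns a piece of text into a fixed-length list of floats. Only the first five values are printed above. Texts with similar meaning tend to get vectors that point in similar directions, and a common way to measure this is cosine similarity. The following is a minimal pure-Python sketch. The toy 3-dimensional vectors are made up for illustration and are not real Gemini embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors; real embeddings are much longer.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```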

## Storing the Chunks in ChromaDB

In [7]:
from langchain_chroma import Chroma

In [16]:
vector_store = Chroma.from_documents(documents=docs, embedding=embeddings)

In [None]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 10})

In [18]:
retrieved_docs = retriever.invoke("What is encoder?")

In [19]:
len(retrieved_docs)

10
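With `search_type="similarity"` and `k=10`, the retriever embeds the query and returns the 10 stored chunks closest to it. That is why `len(retrieved_docs)` is 10. The idea can be sketched as a brute-force top-k search. This is an illustration only: Chroma uses an approximate index, and its distance function is configurable per collection and may not be cosine. The 2-D "embeddings" and chunk names below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, store, k):
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D embeddings for three stored chunks.
store = [
    ("encoder stack", [0.9, 0.1]),
    ("decoder stack", [0.7, 0.7]),
    ("training data", [0.0, 1.0]),
]
for text, score in top_k([1.0, 0.0], store, k=2):
    print(text, round(score, 3))
```

If `k` is larger than the number of stored chunks, this sketch simply returns all of them.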

In [20]:
print(retrieved_docs[5].page_content)

Figure 1: The Transformer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1 Encoder and Decoder Stacks
Encoder: The encoder is composed of a stack of N = 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-
wise fully connected feed-forward network. We employ a residual connection [11] around each of
the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is
LayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512.


## Calling the LLM through the Google Gemini API

The `temperature` parameter controls how deterministic the model's output is:

- Low values (0.1-0.4): More precise and consistent answers; the model becomes more predictable.
- Medium values (0.5-0.7): Answers that are both sensible and creative.
- High values (above 0.7, up to 1.0): More random and creative answers, which can sometimes be inconsistent.
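These ranges follow from the standard definition of temperature in sampling: the model's raw scores (logits) for candidate tokens are divided by the temperature before the softmax. A low temperature sharpens the distribution toward the top token, while a high one flattens it. The sketch below shows that textbook formula with made-up logits. It assumes the standard formulation and does not model Gemini's actual sampling, which can also involve top-p/top-k.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.3, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `t=0.3` almost all probability mass lands on the first token. At `t=1.0` the other two tokens keep a noticeable share, so sampling is less predictable.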

In [41]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-lite",
    temperature=0.3,  # Gemini 3.0+ defaults to 1.0
    max_tokens=500
)

In [42]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

In [43]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, don't try to make up an answer. "
    "Use three sentences maximum to answer."
    "\n\n"
    "{context}"
)
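Python joins adjacent string literals inside parentheses with nothing in between. Each sentence of a multi-line prompt therefore needs its own trailing space or newline; otherwise the model receives words glued together across sentence boundaries. The short strings below are illustrative and are not the notebook's prompt.

```python
# Adjacent string literals are concatenated with no separator.
broken = (
    "Answer briefly."
    "Cite the context."
)
fixed = (
    "Answer briefly. "  # trailing space keeps the sentences apart
    "Cite the context."
)
print(repr(broken))  # 'Answer briefly.Cite the context.'
print(repr(fixed))   # 'Answer briefly. Cite the context.'
```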

In [44]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("user", "{input}"),
    ]
)
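The prompt has two placeholders. `{context}` in the system message receives the retrieved text, and `{input}` in the user message receives the question. By default, `ChatPromptTemplate` uses f-string-style placeholders, so literal braces in a prompt would need doubling (`{{` and `}}`). The following sketch approximates the filling step with `str.format`. It is not LangChain's implementation, and `render` is a hypothetical helper.

```python
# Hypothetical approximation of how the two message templates get filled.
system_template = "Use the context to answer.\n\n{context}"
user_template = "{input}"


def render(context, question):
    """Return (role, text) pairs with both placeholders substituted."""
    return [
        ("system", system_template.format(context=context)),
        ("user", user_template.format(input=question)),
    ]


messages = render("The encoder has 6 layers.", "How many encoder layers?")
for role, text in messages:
    print(f"{role}: {text}")
```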

## Building the Question-Answering Chain (LLM + Prompt)

In [45]:
question_answering_chain = create_stuff_documents_chain(llm, prompt)
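The "stuff" strategy puts every retrieved document into a single prompt: the documents' text is concatenated and substituted for `{context}`. A minimal sketch follows. The `Doc` class is a hypothetical stand-in for LangChain's `Document`. The blank-line separator reflects my understanding of LangChain's default and should be checked against the installed version.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:  # hypothetical stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_documents(docs, separator="\n\n"):
    """'Stuff' strategy: concatenate all documents into one context string."""
    return separator.join(d.page_content for d in docs)


docs = [Doc("Encoder: 6 identical layers."), Doc("Decoder: 6 identical layers.")]
context = stuff_documents(docs)
print(context)
```

The prompt grows linearly with the number of documents. With `k=10` chunks of up to 1000 characters each, the context can reach roughly 10,000 characters, so `k` and `chunk_size` together bound the prompt length.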

## Building the RAG Chain (Retriever + LLM)

In [46]:
rag_chain = create_retrieval_chain(retriever, question_answering_chain)
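`create_retrieval_chain` places the retriever in front of the question-answering chain. The user's `"input"` goes to the retriever, the retrieved documents become `"context"`, and the returned dictionary carries `"input"`, `"context"`, and `"answer"`; the notebook reads `response["answer"]` from it. The data flow can be sketched with plain functions. `fake_retriever`, `fake_qa_chain`, and `retrieval_chain` are all hypothetical and make no LLM or database calls.

```python
def fake_retriever(question):
    # Hypothetical retriever: a real one would embed the question and search Chroma.
    return ["Encoder: 6 identical layers.", "Each layer has two sub-layers."]


def fake_qa_chain(inputs):
    # Hypothetical QA chain: a real one would fill the prompt and call the LLM.
    return f"Answer based on {len(inputs['context'])} chunks."


def retrieval_chain(inputs, retriever, qa_chain):
    """Sketch of the retrieve-then-answer data flow."""
    context = retriever(inputs["input"])
    answer = qa_chain({**inputs, "context": context})
    return {**inputs, "context": context, "answer": answer}


result = retrieval_chain({"input": "What is the encoder?"}, fake_retriever, fake_qa_chain)
print(result["answer"])
```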

## Running the User Query

In [47]:
response = rag_chain.invoke({"input": "Explain the transformer architecture?"})

In [48]:
print(response["answer"])

The Transformer architecture is a model that relies entirely on self-attention mechanisms, eschewing recurrence and convolution. It consists of an encoder and a decoder, each composed of a stack of identical layers. Each encoder layer has a multi-head self-attention mechanism and a position-wise feed-forward network, with residual connections and layer normalization applied around each sub-layer. The decoder has these two sub-layers plus a third sub-layer for multi-head attention over the encoder's output, also incorporating residual connections and layer normalization.
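Besides `"answer"`, the response also holds the retrieved documents under `"context"`. Each document keeps the PDF metadata shown earlier, such as `page_label`. Listing those labels lets a reader check which pages of the paper the answer was drawn from. The sketch below uses hypothetical stand-in documents. With the real response, you would pass `response["context"]` instead.

```python
from types import SimpleNamespace

# Hypothetical stand-in for response["context"]: retrieved chunks with metadata.
context = [
    SimpleNamespace(page_content="chunk A", metadata={"page_label": "3"}),
    SimpleNamespace(page_content="chunk B", metadata={"page_label": "2"}),
    SimpleNamespace(page_content="chunk C", metadata={"page_label": "3"}),
]


def cited_pages(docs):
    """Unique page labels of the retrieved chunks, in first-seen order."""
    seen = []
    for d in docs:
        label = d.metadata.get("page_label")
        if label is not None and label not in seen:
            seen.append(label)
    return seen


print(cited_pages(context))  # ['3', '2']
```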
