In [1]:
!pip install langchain
!pip install -U langchain-community
!pip install faiss-cpu


In [2]:
!pip install -U langchain langchain-google-genai google-generativeai
!pip install PyPDF2


In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI
import PyPDF2 as pdf
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain

In [23]:
import os

# Read the API key from the environment instead of hard-coding it in the notebook.
llm = ChatGoogleGenerativeAI(
    model = "gemini-2.5-pro",
    google_api_key = os.environ["GOOGLE_API_KEY"],
    temperature = 0
)

In [24]:
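def get_text(pdf_paths):
  """Extract and concatenate the text of every page in the given PDF files."""
  pages = []
  for path in pdf_paths:
    reader = pdf.PdfReader(path)
    for page in reader.pages:
      extracted = page.extract_text()
      if extracted:  # skip pages with no extractable text (e.g. scanned images)
        pages.append(extracted)
  # Join with newlines so words at page boundaries do not run together.
  return "\n".join(pages)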

In [25]:
def get_chunks(text):
  splitter = RecursiveCharacterTextSplitter(
      separators = ["\n\n", "\n", " "],
      chunk_size = 1500,
      chunk_overlap = 150
  )

  chunks = splitter.split_text(text)
  return chunks
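
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to break on the listed separators); `sliding_chunks` is a hypothetical helper written only for illustration.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears intact in at least one chunk when it is shorter than the overlap.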

In [39]:
pdf_files = ["/content/pdf1.pdf", "/content/pdf2.pdf"]

In [26]:
embeddings = HuggingFaceEmbeddings(model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

In [43]:
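def get_response(pdf_path, query):
  text = get_text(pdf_path)
  chunks = get_chunks(text)
  # Note: the index is rebuilt on every call; build it once when asking several questions.
  vectorstore = FAISS.from_texts(chunks, embedding = embeddings)
  # similarity_search takes `k` (number of chunks to return), not `n_results`;
  # 4 is the wrapper's default.
  document = vectorstore.similarity_search(query, k = 4)
  chain = load_qa_chain(
      llm = llm,
      chain_type = "stuff"
  )
  # `invoke` replaces the deprecated `run`; the "stuff" chain returns its answer under "output_text".
  response = chain.invoke({"input_documents": document, "question": query})
  return response["output_text"]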
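
The similarity search step in `get_response` can be pictured as a nearest-neighbour lookup over embedding vectors. The sketch below is an illustration only: the 2-D vectors are made-up stand-ins for real embeddings, `l2_distance` and `top_k` are hypothetical helpers, and ranking by Euclidean distance is an assumption about the index's metric. A real FAISS index does the same job far more efficiently.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, chunk_vecs, chunks, k=1):
    """Return the k chunks whose vectors lie closest to the query vector."""
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: l2_distance(query_vec, pair[1]))
    return [chunk for chunk, _ in ranked[:k]]

# Toy data: three "chunks" with hand-picked 2-D vectors (not real embeddings).
toy_chunks = ["attention", "tokenizer", "optimizer"]
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

nearest = top_k([0.9, 0.1], toy_vecs, toy_chunks, k=2)
print(nearest)
```

Only the retrieved chunks are passed to the LLM, so the choice of `k` bounds how much of the PDFs the model can actually see when answering.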

In [44]:
query = "What is scaled dot-product attention?"
response = get_response(pdf_files, query)
print(response)

Based on the provided text, **Scaled Dot-Product Attention** is an attention mechanism defined by the following formula:

`Attention(Q, K, V) = softmax((Q * K^T) / sqrt(dk)) * V`

Here is a breakdown of its components and purpose:

*   **Inputs**: It takes three inputs: queries (Q), keys (K), and values (V).
*   **Dot-Product**: It computes the dot product of the query with all the keys (`Q * K^T`).
*   **Scaling**: The result of the dot product is then scaled by dividing by the square root of the dimension of the keys, `dk`. This scaling factor is `1 / sqrt(dk)`.
*   **Softmax**: A softmax function is applied to the scaled scores to obtain the weights on the values.
*   **Output**: The final output is a weighted sum of the values (V), where the weights are the results of the softmax function.

The key feature is the **scaling factor**. The text explains that for large values of the key dimension (`dk`), the dot products can grow very large in magnitude. This pushes the softmax functio
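
The formula in the response above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: matrices are lists of row vectors, there is no batching or masking, and the function names and toy inputs are chosen here rather than taken from the PDFs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Dot product of this query with every key, scaled by 1 / sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output

# Toy inputs: one query that matches the first key more closely than the second.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(Q, K, V)
print(out)
```

Because the query aligns with the first key, the first value vector receives the larger weight, and the two weights still sum to one.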

In [45]:
query = "What is multi-token prediction?"
response = get_response(pdf_files, query)
print(response)

Based on the provided context, **Multi-Token Prediction (MTP)** is a technique where a language model, instead of predicting only the next single token in a sequence, is trained to predict multiple future tokens at each position.

Specifically, the text states:

*   It "extends the prediction scope to multiple future tokens at each position."
*   In the case of DeepSeek-V3, the model predicts the next **2 tokens** instead of just one.
*   This technique serves as a training objective to improve model performance.
*   When combined with speculative decoding, it can significantly accelerate the model's decoding speed, with DeepSeek-V3 achieving 1.8 times the Tokens Per Second (TPS).
*   The acceptance rate for the second predicted token is high (85-90%), demonstrating its reliability.
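
As a rough sanity check on the figures in the response above, the acceptance rate can be related to the speedup with a back-of-the-envelope model. The assumption here, which is not taken from the PDFs, is that each decoding step always yields one token and keeps the second predicted token with probability equal to the acceptance rate, with verification overhead ignored; `expected_tokens_per_step` is a hypothetical helper.

```python
def expected_tokens_per_step(acceptance_rate):
    """One guaranteed token plus a second one kept with probability acceptance_rate."""
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    return 1.0 + acceptance_rate

# Acceptance rates quoted in the response above.
for rate in (0.85, 0.90):
    print(rate, expected_tokens_per_step(rate))
```

Under this simplified model, an 85-90% acceptance rate gives roughly 1.85 to 1.90 tokens per step, which is in line with the reported 1.8 times TPS once some overhead is allowed for.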
