# Installing Dependencies

---
This notebook uses the paper "Attention Is All You Need" as the source document for the RAG example.




In [10]:
! pip install -q --upgrade google-generativeai langchain-google-genai chromadb pypdf



In [11]:
from IPython.display import display
from IPython.display import Markdown
import textwrap


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
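
To see what `to_markdown` does before the text is handed to `Markdown`, the sketch below applies the same two string operations to a short sample. The name `quote_block` and the sample string are illustrative only and are not part of the original notebook.

```python
import textwrap


def quote_block(text: str) -> str:
    """Build the same string that to_markdown passes to Markdown(...)."""
    # Bullet characters become Markdown list items.
    text = text.replace('•', '  *')
    # predicate=lambda _: True prefixes every line, including blank ones,
    # so the whole response renders as one continuous block quote.
    return textwrap.indent(text, '> ', predicate=lambda _: True)


sample = "Key points:\n\n• first\n• second"
quoted = quote_block(sample)
print(quoted)
```

Without the `predicate` argument, `textwrap.indent` skips whitespace-only lines, which would break the quote into separate blocks at each blank line.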

# Setting up Gemini without LangChain

In [12]:
import google.generativeai as genai
from google.colab import userdata

In [13]:
import os
gemini=userdata.get('gemini')
genai.configure(api_key=gemini)

In [14]:
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content('What should I do with my data?')
to_markdown(response.text)

> **1. Determine Your Data Goals and Objectives:**
> 
> * Define the specific reasons and outcomes you seek from your data.
> * Identify the key questions or problems you aim to solve or understand.
> 
> **2. Clean and Prepare Your Data:**
> 
> * Check for errors, duplicates, and missing values.
> * Transform and normalize data to ensure consistency.
> * Remove irrelevant or redundant information.
> 
> **3. Explore and Analyze Your Data:**
> 
> * Utilize visualizations, charts, and summary statistics to identify patterns and trends.
> * Conduct statistical tests to test hypotheses or identify correlations.
> * Perform machine learning or deep learning algorithms to derive insights.
> 
> **4. Interpret Your Findings:**
> 
> * Summarize the key insights and conclusions drawn from your analysis.
> * Identify actionable recommendations based on the data.
> * Consider potential biases or limitations in your data or analysis.
> 
> **5. Communicate Your Findings:**
> 
> * Present your results in a clear and concise manner.
> * Choose the appropriate visualization or storytelling techniques to effectively convey your message.
> * Use clear language and avoid technical jargon.
> 
> **6. Govern and Secure Your Data:**
> 
> * Establish data policies and procedures to ensure data quality, privacy, and security.
> * Implement data governance tools and technologies to manage and protect your data.
> * Regularly review and update your data governance framework.
> 
> **7. Act on Your Data:**
> 
> * Implement the recommendations derived from your data analysis.
> * Track the impact and results of your actions.
> * Continuously monitor your data to identify new trends or changes.
> 
> **8. Continuous Improvement:**
> 
> * Regularly revisit your data goals and objectives.
> * Incorporate new data sources or analysis techniques to enhance your insights.
> * Seek feedback and collaborate with stakeholders to improve data-driven decision-making.

# Setting up Gemini With LangChain

This is a minimal example of calling a Gemini model through LangChain.

In [78]:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-pro",google_api_key=gemini,
                              temperature=0.2,convert_system_message_to_human=True)

In [16]:
%%time
result = llm.invoke("What are the usecases of LLMs?")


CPU times: user 62.4 ms, sys: 12.7 ms, total: 75 ms
Wall time: 7.3 s


In [17]:
to_markdown(result.content)


> **Content Generation and Enhancement:**
> 
> * **Text generation:** Create engaging and informative articles, stories, poems, scripts, and more.
> * **Summarization:** Condense large amounts of text into concise summaries.
> * **Translation:** Translate text between languages with high accuracy.
> * **Chatbots:** Develop conversational AI assistants that can provide information, answer questions, and engage with users.
> 
> **Code Generation and Analysis:**
> 
> * **Code generation:** Write code snippets, functions, and even entire applications in various programming languages.
> * **Code completion:** Suggest code completions and refactorings to improve code quality and efficiency.
> * **Code analysis:** Detect errors, vulnerabilities, and performance issues in code.
> 
> **Data Analysis and Insights:**
> 
> * **Data summarization:** Summarize large datasets, identify trends, and extract insights.
> * **Question answering:** Provide answers to complex questions from structured or unstructured data sources.
> * **Data classification:** Categorize and label data points based on specific criteria.
> 
> **Research and Education:**
> 
> * **Academic writing:** Assist students and researchers in writing academic papers, dissertations, and presentations.
> * **Knowledge extraction:** Extract key information and insights from research papers, books, and other sources.
> * **Educational content:** Create engaging and interactive educational materials, such as quizzes, simulations, and interactive lessons.
> 
> **Creative Applications:**
> 
> * **Music generation:** Compose original music pieces or generate lyrics.
> * **Image generation:** Create realistic images from text descriptions.
> * **Video generation:** Synthesize videos from text or image inputs.
> 
> **Business and Industry:**
> 
> * **Customer service:** Automate customer interactions and provide personalized support.
> * **Marketing:** Generate marketing content, create personalized campaigns, and analyze customer sentiment.
> * **Healthcare:** Assist in medical diagnosis, treatment planning, and drug discovery.
> * **Finance:** Analyze financial data, generate reports, and forecast market trends.
> 
> **Other Use Cases:**
> 
> * **Personal assistance:** Create personalized to-do lists, reminders, and schedules.
> * **Language learning:** Improve language skills through interactive conversations and exercises.
> * **Game development:** Develop AI-powered NPCs, generate game content, and design levels.

# Developing simple RAG application with PDF

In [27]:
# !pip install pypdf
# !pip install langchain
!pip install langchain_community


Collecting langchain_community
  Downloading langchain_community-0.2.6-py3-none-any.whl (2.2 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.21.3-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Installing collected packages: mypy-extensi

In [32]:
import urllib
import warnings
from pathlib import Path as p
from pprint import pprint

import pandas as pd
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA



warnings.filterwarnings("ignore")
# Restart the Python kernel if the LangChain imports fail.

# Loading PDF

In [35]:
file_path = ('/content/1706.03762v7.pdf')
loader = PyPDFLoader(file_path)
pages = loader.load_and_split()

print(pages[3].page_content)

Figure 1: The Transformer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1 Encoder and Decoder Stacks
Encoder: The encoder is composed of a stack of N= 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-
wise fully connected feed-forward network. We employ a residual connection [ 11] around each of
the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is
LayerNorm( x+ Sublayer( x)), where Sublayer( x)is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512 .
Decoder: The decoder is also composed of a stack of N= 6identical layers.

# Loading the embedding model and embedding documents

In [57]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001" , google_api_key = gemini)

# Example embedding: embed a short query and inspect the first five values
vector = embeddings.embed_query("hello, world!")
vector[:5]

[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734449505806,
 0.01813092641532421]

Next, join the pages into a single string and split it into overlapping chunks so that the text can be embedded and retrieved.

In [41]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
# Use `page` as the loop variable so it does not shadow the `p` alias for pathlib.Path
context = "\n\n".join(str(page.page_content) for page in pages)
texts = text_splitter.split_text(context)
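
The roles of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler fixed-window splitter. This is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks before falling back to characters, whereas the hypothetical `split_with_overlap` below cuts at fixed character offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


demo = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.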

In [48]:
texts

['Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and co

In [52]:
vector_index = Chroma.from_texts(texts, embeddings).as_retriever(search_kwargs={"k":5})
# search_kwargs={"k": 5} makes the retriever return the 5 most similar chunks for each query.
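
Conceptually, the retriever embeds the query, scores every stored chunk vector against it, and keeps the `k` best. The sketch below shows that idea on toy 2-D vectors using cosine similarity. The similarity measure is an assumption made for illustration (the metric Chroma actually uses depends on how the collection is configured), and `cosine_similarity`, `top_k`, and the toy data are hypothetical names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [idx for _, idx in scored[:k]]


# Toy stand-ins for chunk embeddings; real embeddings have hundreds of dimensions.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_query = [1.0, 0.1]
best = top_k(toy_query, toy_docs, k=2)
print(best)
```

A production vector store replaces the exhaustive scan with an approximate nearest-neighbour index, but the returned set plays the same role: it becomes the context passed to the LLM.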


In [81]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_index,
    return_source_documents=True )

In [82]:
question = "Describe the Multi-head attention layer in detail?"
result = qa_chain({"query": question})
result["result"]

"The Multi-Head Attention layer is a neural network layer that allows the model to attend to different parts of the input sequence simultaneously. It consists of several attention layers running in parallel, each of which attends to a different subspace of the input. The outputs of the attention layers are then concatenated and projected to produce the final output of the Multi-Head Attention layer.\n\nThe Multi-Head Attention layer is typically used in transformer models, which are a type of neural network that is particularly well-suited for processing sequential data. Transformer models have been shown to achieve state-of-the-art results on a variety of natural language processing tasks, including machine translation, text summarization, and question answering.\n\nThe Multi-Head Attention layer is a key component of transformer models, and it plays an important role in the model's ability to learn long-range dependencies in the input sequence. By attending to different parts of the 

In [83]:
Markdown(result["result"])


The Multi-Head Attention layer is a neural network layer that allows the model to attend to different parts of the input sequence simultaneously. It consists of several attention layers running in parallel, each of which attends to a different subspace of the input. The outputs of the attention layers are then concatenated and projected to produce the final output of the Multi-Head Attention layer.

The Multi-Head Attention layer is typically used in transformer models, which are a type of neural network that is particularly well-suited for processing sequential data. Transformer models have been shown to achieve state-of-the-art results on a variety of natural language processing tasks, including machine translation, text summarization, and question answering.

The Multi-Head Attention layer is a key component of transformer models, and it plays an important role in the model's ability to learn long-range dependencies in the input sequence. By attending to different parts of the input sequence simultaneously, the Multi-Head Attention layer can capture relationships between words that are far apart in the sequence. This is important for tasks such as machine translation, where the model needs to be able to understand the meaning of a sentence even if the words are in a different order in the target language.

The Multi-Head Attention layer is a powerful tool that can be used to improve the performance of transformer models on a variety of natural language processing tasks. It is a key component of transformer models, and it plays an important role in the model's ability to learn long-range dependencies in the input sequence.

In [85]:
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vector_index,
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
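
With the default "stuff" chain type, the retrieved chunks are concatenated and substituted into `{context}`, and the user query into `{question}`, before a single call to the LLM. The plain-Python sketch below mimics that substitution with `str.format`. The shortened `demo_template`, the `build_prompt` helper, and the sample chunks are illustrative assumptions and not LangChain internals.

```python
# A shortened stand-in for the notebook's template (same two placeholders).
demo_template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(template: str, chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and fill both placeholders of the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


retrieved = [
    "Multi-head attention runs several attention layers in parallel.",
    "The outputs are concatenated and projected.",
]
prompt = build_prompt(demo_template, retrieved, "What is multi-head attention?")
print(prompt)
```

Printing the filled prompt like this is a quick way to check that the instructions, the context, and the question appear in the order the model is expected to read them.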

In [86]:
question = "Describe the Multi-head attention layer in detail?"
result = qa_chain({"query": question})
result["result"]

'Multi-head attention is a type of attention mechanism that allows a model to jointly attend to information from different representation subspaces at different positions. It is composed of several attention layers running in parallel, each of which attends to a different subspace of the input. The outputs of the individual attention layers are then concatenated and projected to produce the final output of the multi-head attention layer.\n\nMulti-head attention is beneficial because it allows the model to learn different types of relationships between the input elements. For example, one attention head might learn to attend to the syntactic relationships between words, while another attention head might learn to attend to the semantic relationships between words. This allows the model to capture a more complete representation of the input data.\n\nThe number of attention heads in a multi-head attention layer is a hyperparameter that can be tuned to optimize the performance of the model

In [87]:
question = "Describe LSTM in detail?"
result = qa_chain({"query": question})
Markdown(result["result"])

I'm sorry, but this context does not mention anything about LSTM, so I cannot answer this question from the provided context.
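
Because the chain was built with `return_source_documents=True`, the result dictionary also carries the retrieved chunks under `"source_documents"`, which makes it possible to check why the model declined to answer. The sketch below shows one way to preview them. `FakeDocument` is a hypothetical stand-in for LangChain's document objects (only the `page_content` attribute is used) so that the example runs without an API key, and `preview_sources` is an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document; only page_content is needed here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview_sources(result: dict, width: int = 60) -> list[str]:
    """Return a one-line preview of each retrieved chunk."""
    previews = []
    for doc in result.get("source_documents", []):
        text = " ".join(doc.page_content.split())  # collapse newlines and repeated spaces
        previews.append(text[:width])
    return previews


fake_result = {
    "result": "placeholder answer",
    "source_documents": [
        FakeDocument("Attention Is All\nYou Need"),
        FakeDocument("3.1 Encoder and   Decoder Stacks"),
    ],
}
previews = preview_sources(fake_result)
print(previews)
```

In the notebook itself, the same helper could be called as `preview_sources(result)` after running the chain. If none of the previews mention the topic of the question, a refusal like the one above is the expected behaviour under this prompt.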