<a href="https://colab.research.google.com/github/Ahmad10Raza/Generative-AI-with-LangChain/blob/master/16.%20Chatting%20With%20PDFs%20Using%20GEMINI%20Pro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [20]:
!pip -q install langchain_experimental langchain_core
!pip -q install google-generativeai==0.3.1
!pip -q install google-ai-generativelanguage==0.4.0
!pip -q install langchain-google-genai
!pip -q install "langchain[docarray]"



In [21]:
!pip -q install langchain
!pip -q install PyPDF2

## The Game Plan


<img src="https://dl.dropboxusercontent.com/s/gxij5593tyzrvsg/Screenshot%202023-04-26%20at%203.06.50%20PM.png" alt="vectorstore">


<img src="https://dl.dropboxusercontent.com/s/v1yfuem0i60bd88/Screenshot%202023-04-26%20at%203.52.12%20PM.png" alt="retriever chain">


In [22]:
# Download the PDF of Reid Hoffman's book "Impromptu" (written with GPT-4) from its free download link
!wget -q https://www.impromptubook.com/wp-content/uploads/2023/03/impromptu-rh.pdf

In [26]:
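import getpass
import os

# Prompt for the Gemini API key instead of hard-coding it in the notebook
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key: ")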



In [27]:
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

In [29]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

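# Read the key from the environment rather than pasting it into the notebook
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=os.environ.get("GOOGLE_API_KEY", ""),
)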
vector = embeddings.embed_query("hello, world!")
vector[:5]

[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]

In [30]:
# Location of the PDF file downloaded above
doc_reader = PdfReader('/content/impromptu-rh.pdf')

In [31]:
# Extract the text of every page and concatenate it into raw_text
raw_text = ''
for page in doc_reader.pages:
    text = page.extract_text()
    if text:  # skip pages with no extractable text
        raw_text += text

In [32]:
len(raw_text)

371090

In [33]:
raw_text[:100]

'Impromptu\nAmplifying Our Humanity \nThrough AI\nBy Reid Hoffman  \nwith GPT-4Impromptu: AmplIfyIng our '

### Text Splitter

In [34]:
# Split the text into smaller chunks for indexing
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,  # consecutive chunks share up to 200 characters
    length_function=len,
)
texts = text_splitter.split_text(raw_text)
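
To make `chunk_size` and `chunk_overlap` more concrete, here is a minimal pure-Python sketch of the idea: split on the separator, greedily merge pieces until the size limit is hit, then carry a short tail into the next chunk. `split_with_overlap` is a hypothetical helper written for illustration; it is a simplification, not the actual `CharacterTextSplitter` implementation.

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Greedily merge separator-delimited pieces into overlapping chunks (simplified)."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        # Flush the current chunk when the next piece would push it past chunk_size
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Keep only a tail that fits within chunk_overlap characters
            while current and joined_len(current) > chunk_overlap:
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Ten short lines, with small limits so the overlap is easy to see
sample = "\n".join(f"line {i:03d}" for i in range(10))
chunks = split_with_overlap(sample, chunk_size=30, chunk_overlap=10)
for c in chunks:
    print(repr(c))
```

With these toy settings each chunk starts with the last line of the previous one, which is the role the 200-character overlap plays for the book text.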

In [35]:
texts[20]

'Because, really, an AI book? When things are moving so \nquickly? Even with a helpful AI on hand to speed the process, \nany such book would be obsolete before we started to write it—\nthat’s how fast the industry is moving.\nSo I hemmed and hawed for a bit. And then I thought of a frame \nthat pushed me into action.\nThis didn’t have to be a comprehensive “book” book so much as \na travelog, an informal exercise in exploration and discovery, \nme (with GPT-4) choosing one path among many. A snapshot \nmemorializing—in a subjective and decidedly not definitive \nway—the AI future we were about to experience.\nWhat would we see? What would impress us most? What would \nwe learn about ourselves in the process? Well aware of the brief \nhalf-life of this travelog’s relevance, I decided to press ahead.\nA month later, at the end of November 2022, OpenAI released \nChatGPT, a “conversational agent,” aka chatbot, a modified \nversion of GPT-3.5 that they had fine-tuned through a process'

## Making the embeddings

In [37]:
!pip install faiss-gpu

Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2


In [38]:
docsearch = FAISS.from_texts(texts, embeddings)

In [39]:
docsearch.embedding_function

GoogleGenerativeAIEmbeddings(model='models/embedding-001', task_type=None, google_api_key=SecretStr('**********'))

In [40]:
query = "how does GPT-4 change social media?"
docs = docsearch.similarity_search(query)

In [41]:
len(docs)

4
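
Four documents come back because `similarity_search` returns the `k` nearest chunks and `k` defaults to 4. The sketch below shows the underlying idea with made-up 3-dimensional vectors: rank every indexed chunk by its distance to the query vector and keep the closest `k`. The texts, vectors, and the `top_k` helper are hypothetical, and plain L2 distance is assumed here; a real index works on the embedding model's full-size vectors.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=4):
    """Return the k texts whose vectors lie closest to query_vec."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk text, embedding) pairs with invented values
toy_index = [
    ("chunk about social media", [0.9, 0.1, 0.0]),
    ("chunk about education", [0.1, 0.9, 0.0]),
    ("chunk about journalism", [0.7, 0.2, 0.1]),
    ("chunk about creativity", [0.0, 0.2, 0.9]),
    ("chunk about justice", [0.2, 0.7, 0.3]),
    ("chunk about acknowledgments", [0.5, 0.5, 0.5]),
]

query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the query
hits = top_k(query_vec, toy_index)
print(hits)
```

Passing a different `k` to `docsearch.similarity_search(query, k=...)` changes how many chunks are returned.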

In [42]:
docs[0]

Document(page_content='laid the groundwork for my creation\n- The countless data scientists and engineers who \nhave contributed to my training and development over \nthe years\n- The early adopters and enthusiasts who have \nembraced and championed my capabilities, even when \nothers were skeptical\n- GPT-4')

## Plain QA Chain

In [43]:
from langchain.chains.question_answering import load_qa_chain
from langchain_google_genai import ChatGoogleGenerativeAI


llm = ChatGoogleGenerativeAI(model="gemini-pro",
                             temperature=0.7,
                             google_api_key=os.environ.get("GOOGLE_API_KEY", ""))

chain = load_qa_chain(llm,
                      chain_type="stuff") # we are going to stuff all the docs in at once
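
Conceptually, the "stuff" strategy joins all retrieved chunks into one context string and drops it into a single prompt. Here is a rough pure-Python sketch of that step. It reuses the template text that this notebook prints a few cells later; `build_stuff_prompt` is a hypothetical helper, and the blank-line separator between chunks is an assumption about the default.

```python
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks, question):
    """Concatenate every chunk into one context block and fill the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return STUFF_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "who are the authors of the book?",
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window.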

In [46]:
# check the prompt
chain.llm_chain.prompt_template

AttributeError: ignored

In [45]:
chain.llm_chain.prompt.template.template

AttributeError: ignored

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:

ValueError: ignored

In [50]:
llm = ChatGoogleGenerativeAI(model="gemini-pro", convert_system_message_to_human=True,
                             google_api_key="AIz")
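
One thing to watch for here: this cell rebinds the name `llm`, but the `chain` created earlier was built from the previous `llm` object and keeps using it. That may be why the calls that follow still fail, although this is an assumption, since the notebook only records `ValueError: ignored`. The sketch below reproduces the pattern with hypothetical stand-in classes (`FakeLLM` and `FakeChain` are not LangChain classes).

```python
class FakeLLM:
    """Hypothetical stand-in for a chat model."""

    def __init__(self, accepts_system_messages):
        self.accepts_system_messages = accepts_system_messages


class FakeChain:
    """Hypothetical stand-in for a chain that stores the llm it was built with."""

    def __init__(self, llm):
        self.llm = llm

    def run(self):
        if not self.llm.accepts_system_messages:
            raise ValueError("system messages are not supported by this model")
        return "ok"


llm = FakeLLM(accepts_system_messages=False)
chain = FakeChain(llm)

# Rebinding `llm` creates a new object; `chain` still points at the old one
llm = FakeLLM(accepts_system_messages=True)
try:
    chain.run()
    stale_failed = False
except ValueError:
    stale_failed = True

# Rebuilding the chain picks up the new model
chain = FakeChain(llm)
result = chain.run()
print(stale_failed, result)
```

In this notebook the equivalent step would be re-running `chain = load_qa_chain(llm, chain_type="stuff")` after redefining `llm`.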

In [51]:
query = "who are the authors of the book?"
docs = docsearch.similarity_search(query)
chain.run(input_documents=docs, question=query)

ValueError: ignored

In [52]:
query = "who are the authors of the book?"
docs = docsearch.similarity_search(query)
docs = docs[1:]
chain.run(input_documents=docs, question=query)

ValueError: ignored