# Retrieval-Augmented Generation

## Load the document

### PDF loader

In [1]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('./src/dataset/mail parser.pdf')
document = loader.load()  # returns one Document per PDF page
len(document)

3

In [12]:
loader

<langchain_community.document_loaders.url.UnstructuredURLLoader at 0x25f4f92df00>

In [5]:
document[1]

Document(metadata={'source': './src/dataset/mail parser.pdf', 'page': 1}, page_content='1. Creat ing a Scenario:  You s tart by creating a new scenario, which is a workflow that \nconnects different apps and services.  For example, email, LLMs like ChatGPT, google \nsheets etcetera.  \n2. Add Modules:  Add the apps and services you want to connect as modules. Each \nmodule represents a specific action (e.g., sending an email, retrieving data) . For \ninstance, retrieving data from emails the module would be a text parser.  \n3. Set Triggers:  Choose a trigger module that starts the scenario when a certain event \noccurs (e.g., a new email arrives, a file is updated).  For instance, when a new email \narrives the email is retrieved from the inbox to parse the data in it using NLP.  \n4. Define Actions:  Add modules that perform actions based on the trigger (e.g., send a \nnotification, create a record in a database).  For instance, once the appropriate data is \nretrieved from an email 

### Website loader

In [4]:
from langchain_community.document_loaders import UnstructuredURLLoader
urls = ['https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal']
loader = UnstructuredURLLoader(urls=urls)
url_document = loader.load()

In [61]:
url_document[:1]

[Document(metadata={'source': 'https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal'}, page_content='')]
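The empty `page_content` above suggests the loader reached the URL but extracted no text (the cause is not shown here). A minimal sketch of a guard that drops such documents before splitting, assuming only that each document exposes a `page_content` string. `drop_empty` and the `SimpleNamespace` stand-ins are hypothetical helpers, not part of LangChain.

```python
from types import SimpleNamespace


def drop_empty(docs):
    """Return only the documents whose page_content has non-whitespace text."""
    return [doc for doc in docs if doc.page_content.strip()]


# Stand-in objects that mimic the `page_content` attribute of a loaded document.
sample_docs = [
    SimpleNamespace(page_content=""),
    SimpleNamespace(page_content="   \n"),
    SimpleNamespace(page_content="Quickstart text"),
]
usable_docs = drop_empty(sample_docs)
print(len(usable_docs))
```

Applied to `url_document`, a guard like this would leave an empty list, which is an early signal that the page needs a different loader.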

#### Split the document into chunks

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# split the pages into chunks of at most 200 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=200)
chunks = splitter.split_documents(document)

In [15]:
chunks[0].page_content

'Email parser workflow and description  \nInput  \nFor this particular workflow the input is an email straight from the inbox , the following image'
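To make `chunk_size` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks), so real chunks will differ. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij', 'j']
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.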

#### Create embeddings from the split documents

In [3]:
from langchain_chroma import Chroma  # a vector db to store embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings  # to create embeddings
import os
from dotenv import load_dotenv

load_dotenv()
GOOGLE_API_KEY= os.getenv("GOOGLE_API_KEY")
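`os.getenv` returns `None` silently when the variable is missing, so a misconfigured `.env` file only surfaces later as an authentication error. A minimal sketch of an early check; `require_env` and the `DEMO_API_KEY` name are hypothetical and used only for illustration.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value


# Demo with a placeholder variable so the sketch runs without real credentials.
os.environ["DEMO_API_KEY"] = "placeholder"
print(require_env("DEMO_API_KEY"))
```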


##### Test vector embeddings

In [4]:
embeddings = GoogleGenerativeAIEmbeddings(model='models/embedding-001')
vector = embeddings.embed_query("Hello, world!")
len(vector), vector[:5]

(768,
 [0.05069493129849434,
  -0.0275444146245718,
  -0.03001042827963829,
  -0.02415528893470764,
  0.014552797190845013])
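Embedding vectors like the one above are useful because similar texts tend to map to nearby vectors. Cosine similarity is one common way to measure that closeness; the sketch below is a plain-Python illustration and makes no claim about which metric the vector store uses internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```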

#### Create vector embeddings, then store them in a vector store

In [6]:
# embed every document chunk and store the vectors in a Chroma vector store
vectorStore = Chroma.from_documents(documents=chunks, embedding=GoogleGenerativeAIEmbeddings(model='models/embedding-001'))

In [7]:
vectorStore

<langchain_chroma.vectorstores.Chroma at 0x167e3bfe470>

In [8]:
# Retriever that fetches the chunks most similar to the user's question
retriever = vectorStore.as_retriever(search_type='similarity', search_kwargs={'k': 10})
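Conceptually, a similarity retriever with `k` set scores every stored chunk against the query vector and keeps the `k` best. A toy in-memory sketch of that ranking step, assuming the vectors are already unit-length so a dot product can serve as the score. `top_k`, the toy vectors, and the store layout are illustrative and are not how Chroma is implemented.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, store, k):
    """Return the texts of the k entries in `store` that score highest against the query.

    `store` is a list of (text, vector) pairs.
    """
    scored = [(dot(query_vec, vec), text) for text, vec in store]
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [text for _, text in best]


# Toy 2-D unit vectors standing in for real 768-dimensional embeddings.
toy_store = [
    ("Set Triggers", [1.0, 0.0]),
    ("Define Actions", [0.6, 0.8]),
    ("Set Filters and Conditions", [0.0, 1.0]),
]
hits = top_k([0.8, 0.6], toy_store, k=2)
print(hits)  # ['Define Actions', 'Set Triggers']
```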

In [9]:
retriever_docs = retriever.invoke("What is the a zapier?")

In [10]:
retriever_docs

[Document(metadata={'page': 1, 'source': './src/dataset/mail parser.pdf'}, page_content='notification, create a record in a database).  For instance, once the appropriate data is \nretrieved from an email the data is append onto a spread sheet.'),
 Document(metadata={'page': 1, 'source': './src/dataset/mail parser.pdf'}, page_content='sheets etcetera.  \n2. Add Modules:  Add the apps and services you want to connect as modules. Each \nmodule represents a specific action (e.g., sending an email, retrieving data) . For'),
 Document(metadata={'page': 1, 'source': './src/dataset/mail parser.pdf'}, page_content='retrieved from an email the data is append onto a spread sheet.  \n5. Set Filters and Conditions:  Apply filters or conditions to control when actions should'),
 Document(metadata={'page': 1, 'source': './src/dataset/mail parser.pdf'}, page_content='occurs (e.g., a new email arrives, a file is updated).  For instance, when a new email \narrives the email is retrieved from the inbox 

In [11]:
retriever_docs[4].page_content

'1. Creat ing a Scenario:  You s tart by creating a new scenario, which is a workflow that \nconnects different apps and services.  For example, email, LLMs like ChatGPT, google \nsheets etcetera.'

In [22]:
from langchain_google_genai import ChatGoogleGenerativeAI
gemini_llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro",
                                    temperature=0,  # was misspelled as `temprature`
                                    max_tokens=100)

In [24]:
response = gemini_llm.invoke("write me a story i can tell my crush")

In [26]:
response.content

"Once upon a time, in a bustling city lit by starlight and glowing windows, lived a shy baker. They made the most incredible treats, pastries that tasted like whispers of summer sunsets and breads as comforting as a warm hug.  But despite their talent, the baker was too timid to share their creations with the one person they truly admired - a kind and clever artist with eyes like the summer sky.\n\nOne day, the baker decided to be brave. They carefully packaged their most delicate pastries – moon-shaped cookies dusted with sugar like starlight – and left them anonymously at the artist's studio.  \n\nThe artist, upon finding the treats, was charmed. Every day, a new gift appeared: a loaf of bread swirled with cinnamon like a painter's palette, a tart decorated with sugared violets, each one a masterpiece of flavor and design. \n\nIntrigued, the artist began leaving sketches in return: whimsical drawings of dancing teacups, portraits of smiling fruits, and landscapes painted with coffee 

In [16]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt_template = """
            You are an AI-powered chatbot designed to provide
            information and assistance to customers based only on
            the context provided to you. Do not make anything up.
            Only use the context you are provided.

            context: {context}
            Question: {input}
            """

prompt = ChatPromptTemplate.from_messages(
    [
        ('system', prompt_template),
        ('human', "{input}")
    ]
)

#### Create execution chain

In [14]:
question_answer_chain = create_stuff_documents_chain(gemini_llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
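The "stuff" strategy places the text of all retrieved documents into the prompt's `{context}` slot. A rough plain-Python sketch of that idea; the `"\n\n"` separator, the shortened template, and `stuff_prompt` are assumptions for illustration and do not reproduce LangChain's implementation.

```python
from types import SimpleNamespace

# Shortened stand-in for the prompt template, keeping the same two placeholders.
TEMPLATE = "context: {context}\nQuestion: {input}"


def stuff_prompt(docs, question, separator="\n\n"):
    """Join every document's text and substitute it, with the question, into the template."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


# Stand-ins for retrieved documents; only `page_content` is needed here.
retrieved = [
    SimpleNamespace(page_content="Set Triggers: choose a trigger module."),
    SimpleNamespace(page_content="Define Actions: add modules that perform actions."),
]
filled = stuff_prompt(retrieved, "What are the steps?")
print(filled)
```

Because every retrieved chunk is pasted in, a large `k` or large chunks can push the prompt toward the model's context limit.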

In [15]:
response = rag_chain.invoke({'input': "What are the steps to automate workflow in make.com?"})
print(response['answer'])

1. **Creating a Scenario:**  You start by creating a new scenario, which is a workflow that connects different apps and services. For example, email, LLMs like ChatGPT, Google sheets, etc.

2. **Set Triggers:** Choose a trigger module that starts the scenario when a certain event occurs (e.g., a new email arrives, a file is updated). For instance, when a new email.

3. **Set Filters and Conditions:** Apply filters or conditions to control when actions should be executed. This is especially useful when you wish to abide by a particular.

4. **Define Actions:** Add modules that perform actions based on the trigger (e.g., send a notification, create a record in a database).  For instance, once the appropriate data is.

5. **Test and Run:** Test the scenario to ensure it works as expected, then activate it to run automatically based on your trigger conditions. 
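To my understanding, the dictionary returned by the retrieval chain also carries the retrieved documents under a `context` key next to `answer`, which makes it possible to show where an answer came from. A hedged sketch that collects the unique `(source, page)` pairs; `list_sources` is a hypothetical helper and the response below is a hand-built stand-in, not real chain output.

```python
from types import SimpleNamespace


def list_sources(response):
    """Return unique (source, page) pairs from the documents in response['context'], in order."""
    seen = []
    for doc in response.get("context", []):
        key = (doc.metadata.get("source"), doc.metadata.get("page"))
        if key not in seen:
            seen.append(key)
    return seen


# Hand-built stand-in shaped like the metadata shown in the retriever output.
pdf_path = "./src/dataset/mail parser.pdf"
fake_response = {
    "answer": "...",
    "context": [
        SimpleNamespace(metadata={"source": pdf_path, "page": 1}),
        SimpleNamespace(metadata={"source": pdf_path, "page": 1}),
        SimpleNamespace(metadata={"source": pdf_path, "page": 2}),
    ],
}
sources = list_sources(fake_response)
print(sources)
```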

