## Step 1: Environment Setup


First, we install all necessary libraries:

In [None]:
!pip install langchain langchain-experimental langchain-community langchain-openai openai chromadb pypdf sentence_transformers gradio langchain-together

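These libraries provide the tools needed for document processing, embeddings, vector storage, and access to hosted language models. The chatbot in this walkthrough uses a Llama 2 model served through Together (Step 5), so the OpenAI packages are not used by the steps shown here.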

<hr>

## Step 2: Document Loading

We load the documents using Langchain’s community document loaders:

In [None]:
import os

# Document loader
from langchain_community.document_loaders import PyPDFLoader

# Vector store
from langchain_community.vectorstores import Chroma

# LLM
from langchain_openai import OpenAI



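In [None]:
# Load the PDF into one Document per page.
# "your_document.pdf" is a placeholder: the source does not name the file.
loader = PyPDFLoader("your_document.pdf")
pages = loader.load()

In [None]:
pages[16]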

The **PyPDFLoader** class loads a PDF as a list of page-level documents. These pages are not used to train a model; they form the knowledge base from which the chatbot retrieves context.



<hr>

## Step 3: Splitting the Documents

To handle large documents, we split them into smaller chunks:

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs(documents, chunk_size=500, chunk_overlap=100):
  # Pass the arguments through instead of hardcoding 500 and 100
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
  docs = text_splitter.split_documents(documents)
  return docs

In [None]:
new_pages = split_docs(pages)
len(new_pages)

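Smaller chunks keep each retrieved passage focused and short enough to fit in the model's prompt.

__Adjacent chunks share up to 100 characters (the `chunk_overlap`), so text cut at a chunk boundary still appears with some of its surrounding context.__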

In [None]:
new_pages[500].page_content

In [None]:
new_pages[499].page_content
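
The overlap between neighbouring chunks can be sketched in plain Python. This is a simplified fixed-window version, not the actual algorithm of `RecursiveCharacterTextSplitter` (which also tries to break on separators such as paragraphs and sentences); the function name `chunk_with_overlap` and the sample text are made up for illustration.

```python
def chunk_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Split text into character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each printed chunk begins with the last three characters of the previous one, which is the same effect we see when comparing `new_pages[499]` with `new_pages[500]`.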

## Step 4: Creating Embeddings

Next, we create embeddings for the document chunks using a Sentence Transformer model:

**SENTENCE TRANSFORMER:** A Sentence Transformer is a deep learning model that generates embeddings (numerical representations) of sentences or text chunks. These embeddings capture semantic meaning and can be used for natural language processing tasks such as similarity search, clustering, and classification. Such models are typically built on a transformer architecture like BERT and fine-tuned for sentence-level tasks.

In [None]:
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

db = Chroma.from_documents(new_pages, embedding_function)

The __embeddings__ convert each text chunk into a numerical vector. Chroma stores these vectors so that the chunks most similar to a query can be found by comparing vectors.
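
To make "comparing vectors" concrete, here is a minimal sketch that ranks two chunks against a query using cosine similarity. The three-dimensional vectors and chunk labels are invented toy values, not real output from `all-MiniLM-L6-v2`, and cosine similarity is only one common measure; the metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values for illustration only)
query_vec = [0.9, 0.1, 0.0]
chunk_vecs = {
    "chunk about S3 storage": [0.8, 0.2, 0.1],
    "chunk about IAM roles": [0.1, 0.9, 0.2],
}

# Rank chunks from most to least similar to the query
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
print(ranked[0])
```

The chunk whose vector points in nearly the same direction as the query vector ranks first.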

## Step 5: Setting Up the Language Model

We configure the language model using the Together library:

In [None]:
from langchain_together import Together


llm = Together(
    model="meta-llama/Llama-2-70b-chat-hf",
    max_tokens=256,
    temperature=0,
    top_k=1,
    # Create a key at https://api.together.ai/settings/api-keys
    together_api_key="ENTER YOUR API KEY",
)

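This setup selects the Llama 2 70B chat model and passes the API key. Setting `temperature=0` and `top_k=1` makes generation close to deterministic, and `max_tokens=256` caps the length of each reply.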
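
As a rough picture of what `top_k=1` does, the sketch below keeps only the `top_k` most likely next tokens and samples among them. It is a simplified illustration, not the decoding code of the hosted model; the function `sample_top_k` and the token probabilities are assumptions made up for the example.

```python
import random


def sample_top_k(token_probs, top_k, rng):
    """Keep the top_k most likely tokens, then sample among them by probability."""
    candidates = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    tokens = [token for token, _ in candidates]
    weights = [prob for _, prob in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented next-token probabilities for illustration
token_probs = {"S3": 0.6, "EC2": 0.3, "Lambda": 0.1}
rng = random.Random(0)

# With top_k=1 only the single most likely token survives, so every draw is the same
picks = {sample_top_k(token_probs, top_k=1, rng=rng) for _ in range(20)}
print(picks)
```

With a larger `top_k`, the lower-probability tokens would also be eligible, and repeated runs could produce different answers.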

## Step 6: Configuring the Retriever

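We set up the retriever with a similarity score threshold so that only sufficiently relevant chunks are returned:

In [None]:
# The threshold is passed through search_type and search_kwargs;
# a bare similarity_score_threshold keyword is not a retriever option.
retriever = db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.9},
)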

The retriever fetches the document chunks most relevant to the user's query. A threshold of 0.9 is strict; if no chunks come back, try a lower value.
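
Conceptually, a score threshold acts as a filter on scored candidates. The following is a minimal sketch of that idea only; the helper `filter_by_threshold`, the chunk titles, and the scores are hypothetical and do not come from the vector store.

```python
def filter_by_threshold(scored_chunks, score_threshold):
    """Keep only (chunk, score) pairs whose score meets the threshold."""
    return [(chunk, score) for chunk, score in scored_chunks if score >= score_threshold]


# Hypothetical relevance scores in the range 0 to 1
scored = [
    ("EC2 pricing overview", 0.93),
    ("S3 bucket policies", 0.71),
    ("Unrelated appendix", 0.22),
]

kept = filter_by_threshold(scored, 0.9)
print(kept)
```

Raising the threshold trades recall for precision: fewer chunks pass, and a query with no close match may return nothing at all.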



<hr>

## Step 7: Defining the Prompt Template

We define a prompt template that structures the input for the language model:

In [None]:
from langchain.prompts import PromptTemplate

prompt_template = """Please answer questions related to AWS (Amazon Web Services). Try explaining in simple words. Answer in fewer than 100 words. If you don't know the answer, simply respond with "Don't know man!"
 CONTEXT: {context}
 QUESTION: {question}"""

# The f-string only wraps the text in [INST] tags; {context} and {question} remain placeholders
PROMPT = PromptTemplate(template=f"[INST] {prompt_template} [/INST]", input_variables=["context", "question"])

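The template fixes the instructions and marks where the retrieved context and the user's question are inserted. The `[INST] ... [/INST]` wrapper is the instruction format that Llama 2 chat models expect.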
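
The sketch below shows how the placeholders survive the `[INST]` wrapping and get filled later. Plain `str.format` stands in for `PromptTemplate` here, and the shortened instructions, context, and question are invented for the example.

```python
# Shortened stand-in for the tutorial's instructions (assumed text)
instructions = "Answer using the context.\nCONTEXT: {context}\nQUESTION: {question}"

# The f-string inserts the instructions verbatim; their braces are not evaluated
wrapped = f"[INST] {instructions} [/INST]"

# Filling the placeholders happens in a separate, later step
filled = wrapped.format(
    context="S3 is an object storage service.",
    question="What is S3?",
)
print(filled)
```

Because the f-string substitutes only `instructions`, the `{context}` and `{question}` markers are still present in `wrapped` and are replaced only when the template is formatted.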

## Step 8: Creating the QA Chain


We create a RetrievalQA chain using the defined components:

In [None]:
from langchain.chains import RetrievalQA
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever,
    input_key='query',
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
    verbose=True,
)

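The RetrievalQA chain ties the pieces together: it retrieves chunks for the query, places ("stuffs") them into the prompt's context, and sends the filled prompt to the language model. With `return_source_documents=True`, the response also includes the chunks that were used.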
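
The retrieve, stuff, and generate flow can be sketched with stand-in functions. This is not LangChain's implementation: `fake_retriever`, `fake_llm`, and `stuff_qa` are hypothetical helpers, and the stub retriever returns fixed strings instead of searching a vector store.

```python
def fake_retriever(query):
    """Stand-in retriever: returns fixed chunks regardless of the query."""
    return ["S3 stores objects in buckets.", "Buckets live in a single region."]


def fake_llm(prompt):
    """Stand-in model: reports the prompt size instead of generating text."""
    return f"(model reply to a prompt of {len(prompt)} characters)"


def stuff_qa(query, retriever, llm, template):
    docs = retriever(query)                # 1. retrieve chunks for the query
    context = "\n\n".join(docs)            # 2. "stuff" all chunks into one context string
    prompt = template.format(context=context, question=query)
    return {"query": query, "result": llm(prompt), "source_documents": docs}


template = "CONTEXT: {context}\nQUESTION: {question}"
response = stuff_qa("What is S3?", fake_retriever, fake_llm, template)
print(response["result"])
```

Because every retrieved chunk is placed in a single prompt, the "stuff" approach works best when the chunks are small and few enough to fit in the model's context window.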

## Step 9: Testing the Chatbot

Finally, we test the chatbot by inputting queries and getting responses:

In [None]:
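query = input()
# invoke() is the current call style; calling chain(query) directly is deprecated
response = chain.invoke({"query": query})
response['result']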

This step allows users to interact with the chatbot and receive answers to their AWS-related questions.
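
Since the chain was created with `return_source_documents=True`, the response also carries the chunks behind each answer. Below is a small sketch of printing the answer together with its sources. The helper `format_answer` and the stub response are hypothetical; the sketch assumes each source document exposes a `metadata` dictionary with `source` and `page` keys, as PDF page documents typically do.

```python
from types import SimpleNamespace


def format_answer(response):
    """Return the answer text followed by one line per source document."""
    lines = [response["result"].strip()]
    for doc in response.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        lines.append(f"- {source}, page {page}")
    return "\n".join(lines)


# Stub response standing in for the chain's output (file name is made up)
stub_response = {
    "result": " S3 is an object storage service. ",
    "source_documents": [
        SimpleNamespace(page_content="stub text", metadata={"source": "guide.pdf", "page": 16}),
    ],
}
print(format_answer(stub_response))
```

Showing the sources lets a user check an answer against the original PDF pages.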