# LLM Question-Answering System

In an era of information overload, efficient question-answering systems are increasingly valuable. This notebook builds one with LangChain for orchestration, Weaviate as the vector database, and the Mistral LLM for answer generation: passages relevant to a question are retrieved from a document and passed to the model as context. The same architecture gives fast access to large knowledge bases and can be scaled to domains such as education and customer support.

First, install the libraries needed to build the LLM question-answering system.



In [1]:
!pip install weaviate-client langchain langchain-community tiktoken pypdf rapidocr-onnxruntime sentence-transformers

Collecting weaviate-client
  Downloading weaviate_client-4.8.1-py3-none-any.whl.metadata (3.6 kB)
Collecting langchain
  Downloading langchain-0.3.1-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.1-py3-none-any.whl.metadata (2.8 kB)
Collecting tiktoken
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting pypdf
  Downloading pypdf-5.0.1-py3-none-any.whl.metadata (7.4 kB)
Collecting rapidocr-onnxruntime
  Downloading rapidocr_onnxruntime-1.3.24-py3-none-any.whl.metadata (1.3 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Collecting httpx<=0.27.0,>=0.25.0 (from weaviate-client)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting validators==0.34.0 (from weaviate-client)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting authlib<1.3.2,>=1.2.1 (from weaviate-c

In this step, store the Weaviate API key and cluster URL in variables.

In [2]:
WEAVIATE_API_KEY = "PGvZdI8MBvDxx19BD988REnlpjf5k1zn1sXj"
WAVEIATE_CLUSTER = "https://uofdus5xtmismovjzeoglw.c0.asia-southeast1.gcp.weaviate.cloud"

* from langchain.vectorstores import Weaviate: This imports the Weaviate vector store integration from the LangChain library. LangChain provides tools for building language model applications, and the vector store helps in storing and querying embeddings for similarity search.
* import weaviate: This imports the weaviate Python client, which provides the interface to interact with the Weaviate instance.
* WEAVIATE_API_KEY: This is a placeholder where you need to put your actual API key for authenticating with your Weaviate instance. Weaviate is a vector search engine, and access to it may require an API key.
* WEAVIATE_URL: This is the URL of your Weaviate instance. Replace it with your actual cluster URL (for example, https://your-weaviate-instance.com).
* client = weaviate.Client(): This line creates a weaviate.Client object, which is the main interface to interact with the Weaviate server. It allows you to send API requests to the server, such as querying data or performing operations like storing and retrieving embeddings.
* url=WEAVIATE_URL: This specifies the URL of the Weaviate instance where you're connecting.
* auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY): This adds authentication to the connection using the API key provided. AuthApiKey is used here to securely authenticate your connection with Weaviate.
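Rather than typing credentials directly into a cell, the key and URL can be read from the environment. The sketch below is one possible way to do this, not part of the original notebook: the helper `load_weaviate_settings` and the environment variable names are hypothetical, and the demo uses an in-memory mapping so that it runs without real credentials.

```python
import os

# Hypothetical environment variable names; use whatever names your setup defines.
URL_VAR = "WEAVIATE_URL"
API_KEY_VAR = "WEAVIATE_API_KEY"


def load_weaviate_settings(env=os.environ):
    """Return (url, api_key) from a mapping, failing loudly if either is missing."""
    missing = [name for name in (URL_VAR, API_KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return env[URL_VAR], env[API_KEY_VAR]


# Demo with placeholder values instead of real credentials.
demo_env = {URL_VAR: "https://example.invalid", API_KEY_VAR: "not-a-real-key"}
url, api_key = load_weaviate_settings(demo_env)
print(url)
```

The returned pair can then be passed as the url and API key arguments when creating the client.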

In [None]:
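from langchain.vectorstores import Weaviate
import weaviate

# Reuse the credentials defined above (the earlier cell names the URL WAVEIATE_CLUSTER).
WEAVIATE_URL = WAVEIATE_CLUSTER

# Connect to the Weaviate instance, authenticating with the API key.
client = weaviate.Client(
    url=WEAVIATE_URL, auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY)
)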

The next cell overrides locale.getpreferredencoding so that it always reports UTF-8.

* In environments that do not default to UTF-8 (for example, older systems or systems configured with non-UTF-8 locales), this override helps avoid character-encoding errors.
* It is particularly useful in internationalized applications or when reading and writing text files that contain characters from several languages.
* However, use such overrides with care, since they change how the entire Python program handles encodings.
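A narrower alternative to a global override is to pass the encoding explicitly wherever a file is opened. The following is a minimal sketch of that idea using only the standard library; the file name and sample text are made up for illustration.

```python
import os
import tempfile

# Sample text mixing several scripts (illustrative only).
text = "Cosmos: κόσμος, 宇宙, космос"

# Write and read back with an explicit encoding, independent of the locale default.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, "r", encoding="utf-8") as handle:
        round_trip = handle.read()

print(round_trip == text)
```

This only affects the files opened this way, so it cannot change the behaviour of unrelated code.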

In [None]:
import locale

# Keep the original signature: some callers pass do_setlocale as an argument.
locale.getpreferredencoding = lambda do_setlocale=True: "UTF-8"

**What is embedding?**

An embedding represents data (such as words, sentences, images, or categories) as a dense numeric vector in a continuous space. The vectors capture the meaning or features of the data, so similar items end up close together. The model used here, all-mpnet-base-v2, maps each piece of text to a 768-dimensional vector.
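To make "closer together" concrete, the sketch below measures Euclidean distance between a few hand-made three-dimensional vectors. These toy vectors are assumptions for illustration only; real embeddings come from a trained model and have many more dimensions.

```python
import math

# Hand-made toy vectors, not the output of any real embedding model.
toy_embeddings = {
    "star": [0.9, 0.1, 0.0],
    "sun": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def distance(word_a, word_b):
    """Euclidean distance between the toy vectors of two words."""
    return math.dist(toy_embeddings[word_a], toy_embeddings[word_b])


# Related words should be closer than unrelated ones.
print("star-sun:", round(distance("star", "sun"), 3))
print("star-banana:", round(distance("star", "banana"), 3))
```

With these values, "star" lies much nearer to "sun" than to "banana", which is the property a similarity search relies on.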

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

# Sentence-transformers model used to embed both the document chunks and the queries.
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model_name,
)

**Why do we use PyPDFLoader?**

PyPDFLoader is the LangChain document loader for PDF files, built on the pypdf library. It returns one Document per page, containing the page text plus metadata such as the source path and page number. With extract_images=True, it also runs OCR (using rapidocr-onnxruntime, installed above) on images embedded in the PDF and adds the recognized text. Because the output is already in LangChain's Document format, it can be passed directly to text splitters and vector stores for tasks like summarization, question-answering, or information retrieval.

In [None]:
from langchain.document_loaders import PyPDFLoader

# extract_images=True also runs OCR on images embedded in the PDF.
loader = PyPDFLoader("/content/drive/MyDrive/Data/Carl Sagan - Cosmos.pdf", extract_images=True)
pages = loader.load()  # one Document per page

**What is chunking?**

* Chunking is the process of dividing large data into smaller, manageable parts called chunks. In natural language processing (NLP), it breaks long text into pieces that are easier to embed, search, and fit into a model's context.
* We use chunking because it improves efficiency, memory usage, and retrieval quality: each chunk gets its own embedding, so a query can match the specific passage that answers it. The cell below uses chunks of up to 1000 characters, with 20 characters of overlap between neighbouring chunks.
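The roles of chunk size and overlap can be seen in a simplified fixed-window chunker. This is a sketch for illustration and not how RecursiveCharacterTextSplitter works internally: that splitter prefers to break on separators such as paragraphs and spaces, while the hypothetical `chunk_text` below cuts purely by character position.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


for chunk in chunk_text("0123456789", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk repeats the last character of the previous one, so text that straddles a boundary is not lost entirely to either side.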

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
docs = text_splitter.split_documents(pages)

The next cell builds the Weaviate vector store with Weaviate.from_documents(...), which embeds each chunk in docs with the given embedding model and stores the vectors in Weaviate. The client=client argument supplies the connection to the Weaviate server. Setting by_text=False makes searches use vectors rather than raw text: the query is embedded with the same model and compared against the stored vectors. This setup enables semantic similarity search and document retrieval.

In [None]:
vector_db = Weaviate.from_documents(
    docs, embeddings, client=client, by_text=False
)

**What is similarity search?**

Similarity search finds the items in a dataset that are most similar to a given query. It is a core technique in information retrieval, natural language processing, and recommendation systems. The cell below retrieves the three chunks most similar to the query (k=3) and prints the text of the best match.
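A brute-force version of the idea can be sketched in a few lines: score every stored vector against the query with cosine similarity and keep the k best. The vectors and labels below are invented for illustration, and a vector database such as Weaviate uses an index rather than scanning every item.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_vectors, k):
    """Return the k (score, label) pairs with the highest similarity, best first."""
    scored = (
        (cosine_similarity(query_vector, vector), label)
        for label, vector in indexed_vectors.items()
    )
    return heapq.nlargest(k, scored)


# Toy index: labels and vectors are made up, not real embeddings.
index = {
    "chunk about galaxies": [0.9, 0.1, 0.0],
    "chunk about planets": [0.7, 0.3, 0.1],
    "chunk about cooking": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

for score, label in top_k(query, index, k=2):
    print(f"{score:.3f}  {label}")
```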

In [None]:
print(vector_db.similarity_search("What is cosmos?", k=3)[0].page_content)

The next cells build a structured prompt for a question-answering assistant. The template states the assistant's role, instructs it to answer from the provided context, and contains the placeholders {question} and {context}, which are filled in at run time. ChatPromptTemplate.from_template(template) turns the string into a reusable prompt object, which suits applications such as chatbots and virtual assistants that need context-aware responses.
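The placeholder substitution can be illustrated with plain Python string formatting. This is only an analogy, under the assumption that the filling step behaves like str.format for simple templates; it does not use LangChain, and the shortened template and sample values are made up.

```python
# Simplified stand-in for the notebook's template; the wording is illustrative.
TEMPLATE = (
    "Use the retrieved context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def fill_prompt(question, context):
    """Substitute the two placeholders with concrete values."""
    return TEMPLATE.format(question=question, context=context)


filled = fill_prompt(
    question="What is the Cosmos?",
    context="(retrieved passage goes here)",
)
print(filled)
```

The model then receives the filled text, ending in "Answer:", and continues from there.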

In [None]:
from langchain.prompts import ChatPromptTemplate

template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise and to the point.
Question: {question}
Context: {context}
Answer:
"""

In [None]:
prompt = ChatPromptTemplate.from_template(template)

In [None]:
prompt

**What is Hugging Face?**

Hugging Face is a leading platform for large language models (LLMs), offering the popular Transformers library that simplifies the use of pre-trained models for various natural language processing tasks. Its Model Hub enables users to share and access LLMs easily, promoting collaboration while providing tools for datasets and deployment, making advanced NLP technologies widely accessible.

In [None]:
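from google.colab import userdata

# Read the token from Colab's secret store instead of hardcoding it in the notebook.
# "HF_TOKEN" is an assumed secret name; use the name under which you saved your token.
huggingfacehub_api_token = userdata.get("HF_TOKEN")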

In [None]:
from langchain_community.llms import HuggingFaceHub

# Mistral-7B-Instruct, called through the Hugging Face Hub.
model = HuggingFaceHub(
    huggingfacehub_api_token=huggingfacehub_api_token,
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",
    model_kwargs={"temperature": 1, "max_length": 500},
)

The next cell builds a Retrieval-Augmented Generation (RAG) chain with LangChain by combining a retriever, the prompt, the language model, and an output parser. It imports RunnablePassthrough, which passes its input through unchanged, and StrOutputParser, which returns the model's output as a string. vector_db.as_retriever() turns the vector store into a retriever that fetches context relevant to the question. The chain sends the question to the retriever and, unchanged, to the prompt; the filled prompt goes to the model, and the parser formats the result. Answers are therefore grounded in retrieved passages rather than only in what the model learned during training.
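The data flow through such a chain can be sketched with ordinary functions. Everything below is a hypothetical stand-in: `fake_retriever` and `fake_model` return canned text, and `run_chain` simply feeds each step's output to the next, which is roughly what the `|` operator expresses in the real chain.

```python
def fake_retriever(question):
    """Hypothetical stand-in for the retriever: returns canned passages."""
    return ["(passage 1 related to the question)", "(passage 2 related to the question)"]


def build_inputs(question):
    # Mirrors {"context": retriever, "question": RunnablePassthrough()}:
    # the question goes to the retriever and is also passed through unchanged.
    return {"context": fake_retriever(question), "question": question}


def format_prompt(inputs):
    """Fill a simplified prompt with the question and the retrieved context."""
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_model(prompt_text):
    """Hypothetical stand-in for the LLM: echoes the prompt and adds a canned answer."""
    return prompt_text + " (model answer)"


def parse_output(raw):
    """Stand-in for the output parser: make sure the result is a string."""
    return str(raw)


def run_chain(question, steps):
    """Apply each step to the output of the previous one."""
    value = question
    for step in steps:
        value = step(value)
    return value


result = run_chain("What is a galaxy?", [build_inputs, format_prompt, fake_model, parse_output])
print(result)
```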

In [None]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

output_parser = StrOutputParser()
retriever = vector_db.as_retriever()

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | output_parser
)

This line invokes the RAG chain with a question, then splits the response at the string "Answer:" and prints the last part, which is the generated answer.
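The effect of the split can be checked on a made-up response string. The helper name `extract_answer` and the sample text are assumptions for illustration; the sketch also strips surrounding whitespace, which the notebook line does not do.

```python
def extract_answer(raw_output, marker="Answer:"):
    """Return the text after the last marker, or the whole text if the marker is absent."""
    return raw_output.split(marker)[-1].strip()


# Made-up response in which the prompt is echoed before the answer.
sample = "Question: (question text)\nContext: (retrieved passages)\nAnswer: (generated answer text)"

print(extract_answer(sample))
```

Taking the last element means that if the response contains no "Answer:" marker at all, the full text is returned instead of raising an error.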

In [None]:
print(rag_chain.invoke("what's difference in earth and sun").split("Answer:")[-1])