# LangChain-based Question Answering System

This notebook implements a question-answering system using LangChain and a fine-tuned language model. The main components and processes are:

1. **Model and Dataset Setup:**
   - The system is designed to work with two dataset types: 'sleep' and 'cars'.
   - A fine-tuned Llama 3.2 1B model, loaded from a local path specific to the selected dataset, generates the answers.

2. **Prompt Engineering:**
   - Different prompt templates are defined for each dataset type, including a system prompt and two types of user prompts: basic and RAG (Retrieval-Augmented Generation).
   - The RAG prompt incorporates additional context ("resources") to enhance the model's responses.

3. **LangChain Integration:**
   - The notebook uses LangChain's `HuggingFacePipeline`, `LLMChain`, and `PromptTemplate` components to build a pipeline that formats each query into a prompt and generates a response.

4. **Retrieval-Augmented Generation (RAG):**
   - The system retrieves relevant passages from a FAISS vector index built over the dataset's source text and inserts them into the prompt, so that answers are grounded in that text.

5. **Dataset Processing:**
   - The test question-answer files for both datasets are loaded from JSON and converted to Hugging Face `Dataset` objects with `question`, `resources`, and `answer` columns. The questions drive the prediction loops.

6. **Model Configuration:**
   - Generation is capped at 8,192 new tokens (`MAX_NEW_TOKENS`), which leaves 24,576 tokens of a 32,768-token budget for the prompt (`MAX_SEQ_LENGTH`). Sampling uses a temperature of 0.5 and a repetition penalty of 1.1.

The result is a question-answering system that can switch between two domains (sleep science or automobile history) and draw on external text to inform its responses.


In [1]:
import json
import transformers  # type: ignore

from tqdm.auto import tqdm  # type: ignore

from langchain.llms import HuggingFacePipeline  # type: ignore
from langchain.chains import LLMChain  # type: ignore
from langchain.prompts import PromptTemplate  # type: ignore

from datasets import Dataset, DatasetDict  # type: ignore

# disable warnings
import warnings
warnings.filterwarnings("ignore")



In [2]:
dataset_type = 'sleep'
PROMPT_MODE = 'rag'

In [3]:
BASE_PATH = "/home/stepan/cars-sleep-chatbot"
MODEL_ID = f"{BASE_PATH}/models/{dataset_type}/llama-3_2-1b-it"
MAX_NEW_TOKENS = 8192
MAX_SEQ_LENGTH = 32768 - MAX_NEW_TOKENS
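
The two limits above split a fixed token budget between the prompt and the generated answer. A minimal sketch of that arithmetic follows, assuming 32,768 is the model's total context window (the notebook does not state this) and using a hypothetical `fits_budget` helper. Note that `MAX_SEQ_LENGTH` is not passed to the pipeline in the cells shown here, so a check like this would have to be applied explicitly.

```python
# Values copied from the configuration cell above.
CONTEXT_WINDOW = 32768  # assumption: total tokens the model can attend to
MAX_NEW_TOKENS = 8192
MAX_SEQ_LENGTH = CONTEXT_WINDOW - MAX_NEW_TOKENS  # tokens left for the prompt


def fits_budget(prompt_token_count: int) -> bool:
    """Hypothetical helper: True if the prompt leaves room for generation."""
    return prompt_token_count <= MAX_SEQ_LENGTH


print(MAX_SEQ_LENGTH)      # 24576
print(fits_budget(30000))  # False
```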

In [4]:
PROMPTS = {
    'cars': {
        'system': "You are an expert in the history of automobiles with in-depth knowledge of the development of automobiles from the late 19th century to the present day. Your task is to generate insightful and varied answers on automobile history. The answers should be diverse in complexity, suitable for learners and experts alike.",
        'basic': "Human: Generate me an answer to the given question: {question}\n\nAssistant:",
        'rag': "Use resources provided to answer the following question.\nResources: {resources}\n\nHuman: Generate me an answer to the given question: {question}\n\nAssistant:",
    },
    'sleep': {
        'system': "You are an expert in sleep science with in-depth knowledge of sleep physiology, circadian rhythms, sleep disorders, and the impact of sleep on health and cognitive performance. Your task is to generate insightful and varied answers on sleep-related topics. The answers should be diverse in complexity, suitable for learners and experts alike.",
        'basic': "Human: Generate me an answer to the given question: {question}\n\nAssistant:",
        'rag': "Use resources provided to answer the following question.\nResources: {resources}\n\nHuman: Generate me an answer to the given question: {question}\n\nAssistant:",
    }
}
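
The templates use `{question}` and `{resources}` placeholders, which follow the same syntax as Python's `str.format`. The sketch below fills the RAG template by hand to show the final prompt text. The `build_prompt` helper is hypothetical, and prepending the system prompt is only one possible choice: the `'system'` entries are defined above but not referenced by the later cells.

```python
# Template text copied from the 'rag' entry above.
RAG_TEMPLATE = (
    "Use resources provided to answer the following question.\n"
    "Resources: {resources}\n\n"
    "Human: Generate me an answer to the given question: {question}\n\n"
    "Assistant:"
)


def build_prompt(question: str, resources: str, system: str = "") -> str:
    """Hypothetical helper: fill the placeholders, optionally prepending a system prompt."""
    body = RAG_TEMPLATE.format(question=question, resources=resources)
    return f"{system}\n\n{body}" if system else body


# Made-up inputs, for illustration only.
print(build_prompt("Why do we yawn?", "Example passage about yawning."))
```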

In [5]:
text_generation_pipeline = transformers.pipeline(
    model=MODEL_ID,
    task="text-generation",
    temperature=0.5,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=MAX_NEW_TOKENS,
)

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [6]:
llama_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

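# from_template infers the input variables from the placeholders, so the same
# line works for both the 'basic' template (question only) and the 'rag' one.
prompt = PromptTemplate.from_template(PROMPTS[dataset_type][PROMPT_MODE])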

llm_chain = LLMChain(llm=llama_llm, prompt=prompt)

In [7]:
def load_data(file_path):
    with open(file_path, "r") as f:
        data = json.load(f)
    return data


def to_dataset(data):
    restructured_data = {
        "question": [],
        "resources": [],
        "answer": [],
    }

    for qna in data:
        restructured_data["question"].append(qna["question"])
        restructured_data["answer"].append(qna["answer"])
        restructured_data["resources"].append('\n'.join([resource['summary'] for resource in qna["citation"]]))

    return Dataset.from_dict(restructured_data)


def prepare_dataset(base_path=None):
    test_cars = load_data(f"{base_path}/data/test_qa_car.json")
    test_sleep = load_data(f"{base_path}/data/test_qa_sleep.json")

    test_cars_dataset = to_dataset(test_cars)
    test_sleep_dataset = to_dataset(test_sleep)

    return {"cars": test_cars_dataset, "sleep": test_sleep_dataset}
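
The field accesses in `to_dataset` imply a record shape: each entry has a `question`, an `answer`, and a `citation` list whose items carry a `summary`. The sketch below runs the same column-building logic on one invented record, without the `Dataset` wrapper, to show what ends up in the `resources` column. The record contents are made up and `restructure` is a hypothetical name.

```python
def restructure(data):
    """Same column-building logic as to_dataset, minus the Dataset wrapper."""
    columns = {"question": [], "resources": [], "answer": []}
    for qna in data:
        columns["question"].append(qna["question"])
        columns["answer"].append(qna["answer"])
        # One newline-joined string of citation summaries per question.
        columns["resources"].append("\n".join(c["summary"] for c in qna["citation"]))
    return columns


# Invented record, shaped after the keys the function reads.
sample = [
    {
        "question": "Q1",
        "answer": "A1",
        "citation": [{"summary": "S1"}, {"summary": "S2"}],
    }
]

columns = restructure(sample)
print(columns["resources"])  # ['S1\nS2']
```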

In [8]:
dataset = prepare_dataset(base_path=BASE_PATH)

In [None]:
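# Baseline without retrieval: use the 'basic' template explicitly, because the
# 'rag' template selected by PROMPT_MODE also requires a "resources" input.
basic_prompt = PromptTemplate.from_template(PROMPTS[dataset_type]["basic"])
basic_chain = LLMChain(llm=llama_llm, prompt=basic_prompt)

predictions = []
for question in tqdm(dataset[dataset_type]["question"]):
    predictions.append(basic_chain.invoke({"question": question}))
# save predictions
with open(f"{BASE_PATH}/data/{dataset_type}_predictions.json", "w") as f:
    json.dump(predictions, f)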

# Test without RAG

# RAG

In [10]:
from langchain_community.document_loaders import TextLoader # type: ignore
from langchain.text_splitter import CharacterTextSplitter, NLTKTextSplitter # type: ignore
from langchain.vectorstores import FAISS # type: ignore
from langchain.embeddings.huggingface import HuggingFaceEmbeddings  # type: ignore
from langchain.schema.runnable import RunnablePassthrough # type: ignore
from langchain.schema import Document # type: ignore
import nltk # type: ignore

nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /home/stepan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [11]:
loader = TextLoader(f"{BASE_PATH}/data/{dataset_type}.txt")
docs = loader.load()

In [12]:
print(f"Number of documents loaded: {len(docs)}")
print(f"Length of the first document: {len(docs[0].page_content)}")
print(f"First 100 characters of the document: {docs[0].page_content[:100]}")

Number of documents loaded: 1
Length of the first document: 273058
First 100 characters of the document: Yawning and an Introduction to Sleep Yawn. There, I said it. And I even provided an image (ﬁgure I.1


In [13]:
text_splitter = NLTKTextSplitter(chunk_size=250, chunk_overlap=20)
chunked_documents = text_splitter.split_documents(docs)

Created a chunk of size 337, which is longer than the specified 250
Created a chunk of size 360, which is longer than the specified 250
Created a chunk of size 316, which is longer than the specified 250
Created a chunk of size 255, which is longer than the specified 250
Created a chunk of size 382, which is longer than the specified 250
Created a chunk of size 293, which is longer than the specified 250
Created a chunk of size 565, which is longer than the specified 250
Created a chunk of size 313, which is longer than the specified 250
Created a chunk of size 275, which is longer than the specified 250
Created a chunk of size 273, which is longer than the specified 250
Created a chunk of size 311, which is longer than the specified 250
Created a chunk of size 275, which is longer than the specified 250
Created a chunk of size 363, which is longer than the specified 250
Created a chunk of size 262, which is longer than the specified 250
Created a chunk of size 301, which is longer tha

In [14]:
len(chunked_documents)

1312
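
The "longer than the specified 250" warnings are a consequence of sentence-based splitting: sentences are packed into chunks up to the size limit, and a single sentence that exceeds the limit cannot be split further, so it becomes an oversized chunk. The sketch below is a simplified illustration of that behavior, not the splitter's actual implementation. It uses a regex instead of NLTK's tokenizer and ignores chunk overlap, and `merge_sentences` is a hypothetical name.

```python
import re


def merge_sentences(text: str, chunk_size: int, separator: str = " ") -> list[str]:
    """Greedy sentence packing; simplified (regex sentence split, no overlap)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current}{separator}{sentence}" if current else sentence
        if len(candidate) <= chunk_size or not current:
            # Either it fits, or a lone sentence is too long and is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


text = "Short one. " + "x" * 40 + ". Tail."
chunks = merge_sentences(text, chunk_size=20)
print([len(c) for c in chunks])  # [10, 41, 5]
```

The middle chunk is 41 characters long even though the limit is 20, which mirrors the warnings in the output above.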

In [15]:
for doc in chunked_documents:
    doc.metadata['dataset_type'] = dataset_type

In [16]:
db = FAISS.from_documents(chunked_documents, HuggingFaceEmbeddings(model_name='sentence-transformers/multi-qa-MiniLM-L6-dot-v1'))
retriever = db.as_retriever(
    search_type="similarity",
    # The metadata filter belongs inside search_kwargs so the vector store receives it.
    search_kwargs={'k': 4, 'score_threshold': 0.5, 'filter': {'dataset_type': dataset_type}},
)
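
Conceptually, the retriever embeds the question, scores it against every chunk embedding, and returns the `k` best chunks. The sketch below shows top-k selection by dot product on toy vectors. The dot-product scoring is an assumption suggested by the embedding model's name, and the FAISS index built above may use a different metric. The `dot` and `top_k` helpers are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k documents with the highest dot-product score."""
    scored = sorted(
        ((dot(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)),
        reverse=True,
    )
    return [idx for _, idx in scored[:k]]


# Toy 2-D "embeddings", for illustration only.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))  # [0, 2]
```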

In [17]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [18]:
rag_chain = (
    {"resources": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_chain
)
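
The chain above routes the question two ways: once through the retriever and `format_docs` to produce `resources`, and once unchanged as `question`. Both values then fill the prompt template before generation. The sketch below reproduces that data flow with plain functions. The retriever and the "LLM" are stubs, plain strings stand in for `Document` objects, and `make_rag_pipeline` is a hypothetical name.

```python
def format_docs(docs):
    """Join retrieved passages; plain strings stand in for Document.page_content."""
    return "\n\n".join(docs)


def make_rag_pipeline(retrieve, generate, template):
    """Hypothetical stand-in for the chain: retrieve, format, fill the template, generate."""
    def run(question):
        resources = format_docs(retrieve(question))
        prompt = template.format(resources=resources, question=question)
        return generate(prompt)
    return run


# Stubs, for illustration only: a fixed retriever and an "LLM" that echoes its prompt.
def fake_retrieve(question):
    return ["passage one", "passage two"]


def echo_llm(prompt):
    return prompt


pipeline = make_rag_pipeline(
    fake_retrieve, echo_llm, "Resources: {resources}\nQuestion: {question}"
)
print(pipeline("Why do we sleep?"))
```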

In [19]:
predictions = []
for question in tqdm(dataset[dataset_type]["question"]):
    predictions.append(rag_chain.invoke(question))
# save predictions
with open(f"{BASE_PATH}/data/{dataset_type}_rag_predictions.json", "w") as f:
    json.dump(predictions, f)

  0%|          | 0/27 [00:00<?, ?it/s]Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
 37%|███▋      | 10/27 [02:45<04:02, 14.24s/it]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
100%|██████████| 27/27 [07:50<00:00, 17.41s/it]
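
Each saved prediction is a dictionary in which the generated text sits under the chain's default `text` key. Because the pipeline is created with `return_full_text=True`, that text may include the prompt, depending on the LangChain version. The sketch below is one way to isolate the answer, assuming the prompt ends with the `Assistant:` marker from the templates. The `extract_answer` helper is hypothetical, and it would cut too much if an answer itself contained that marker.

```python
def extract_answer(prediction: dict, marker: str = "Assistant:") -> str:
    """Hypothetical helper: return the text after the last marker, or all of it."""
    text = prediction.get("text", "")
    # rpartition yields an empty separator when the marker is absent.
    _, found, tail = text.rpartition(marker)
    return (tail if found else text).strip()


# Made-up prediction, for illustration only.
example = {"question": "q", "text": "Human: q\n\nAssistant: hello"}
print(extract_answer(example))  # hello
```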
