# RAG Chatbot with PDF Knowledge Base

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) chatbot that answers questions using a PDF document as its knowledge base.

We will use:
- **Generator LLM:** `unsloth/llama-3-8b-Instruct` (the 8-billion-parameter, instruction-tuned Llama 3 model)
- **Retriever Model:** `BAAI/bge-large-en-v1.5` (an English text-embedding model that produces 1024-dimensional vectors)
- **Vector Store:** ChromaDB (in-memory)

### Install Dependencies

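First, install the required Python libraries: `torch`, `transformers`, `sentence-transformers`, `chromadb`, `pypdf`, `datasets`, and `numpy`. `accelerate` and `bitsandbytes` are what you would add to load the model in 4-bit quantized form; the code below loads it in `float16`, which does not require them.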
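Before running the cells below, it can help to confirm that the packages are importable. This is a minimal sketch using only the standard library; the module list is an assumption based on the imports in step 1 (note that `sentence-transformers` is imported as `sentence_transformers`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Top-level module names imported in step 1 (assumed list; adjust if you add libraries)
REQUIRED_MODULES = ["torch", "transformers", "sentence_transformers",
                    "chromadb", "pypdf", "datasets", "numpy"]


def missing_modules(modules):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

`find_spec` only locates a module without importing it, so the check is fast even for heavy packages like `torch`.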

### 1. Import Libraries

In [1]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer
import chromadb
from pypdf import PdfReader
from datasets import Dataset
import numpy as np
import textwrap


### 2. Load and Process the PDF

We load the `Aluminium.pdf` file, extract its text content, and split the text into smaller, overlapping chunks (500 characters each, with 50 characters shared between neighbouring chunks).
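Because the splitter advances by `chunk_size - chunk_overlap` characters per step, the number of windows it produces (before whitespace-only chunks are dropped) is the ceiling of the text length divided by that step, and each full chunk ends with the characters the next one starts with. The sketch below checks both properties on a toy string; the splitter uses the same windowing logic as the cell below so the snippet runs on its own, and `count_chunks` is a hypothetical helper.

```python
import math


def split_text_into_chunks(text, chunk_size=500, chunk_overlap=50):
    """Sliding-window character splitter (same windowing logic as the notebook cell)."""
    chunks = []
    current_pos = 0
    while current_pos < len(text):
        chunks.append(text[current_pos:current_pos + chunk_size])
        current_pos += chunk_size - chunk_overlap
    return [chunk for chunk in chunks if chunk.strip()]


def count_chunks(text_length, chunk_size, chunk_overlap):
    """Hypothetical helper: number of windows the loop produces for a given length."""
    step = chunk_size - chunk_overlap
    return math.ceil(text_length / step)


toy = "abcdefghij"  # 10 characters
chunks = split_text_into_chunks(toy, chunk_size=4, chunk_overlap=1)
print(chunks)                        # ['abcd', 'defg', 'ghij', 'j']
print(count_chunks(len(toy), 4, 1))  # 4

# Each chunk starts with the last `chunk_overlap` characters of the previous one
for prev, nxt in zip(chunks, chunks[1:]):
    assert prev[-1:] == nxt[:1]
```

With the defaults the step is 450 characters, so 42 chunks corresponds to a text of roughly 18,451 to 18,900 characters, assuming no whitespace-only chunks were dropped.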

In [2]:
def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file."""
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""
    return text

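def split_text_into_chunks(text, chunk_size=500, chunk_overlap=50):
    """Splits text into overlapping fixed-size character chunks."""
    # The window must advance by at least one character, otherwise the loop never ends
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    current_pos = 0
    while current_pos < len(text):
        end_pos = current_pos + chunk_size
        chunk = text[current_pos:end_pos]
        chunks.append(chunk)
        # Step forward so consecutive chunks share `chunk_overlap` characters
        current_pos += chunk_size - chunk_overlap
    # Drop chunks that contain only whitespace
    return [chunk for chunk in chunks if chunk.strip()]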


pdf_path = 'Aluminium.pdf'


pdf_text = extract_text_from_pdf(pdf_path)
text_chunks = split_text_into_chunks(pdf_text)


documents_dict = {'text': text_chunks}
dataset = Dataset.from_dict(documents_dict)

print(f"Successfully loaded and split the PDF into {len(dataset)} chunks.")

Successfully loaded and split the PDF into 42 chunks.


In [3]:
print(pdf_text)

Aluminium
Ore & Mining: Bauxite ore (mainly in tropical countries) is the principal source of alumina. Global
bauxite mines are often large open-pit operations producing 3–5 tonnes of ore per tonne of Al.
(India’s bauxite reserves lie mainly in Odisha and Jharkhand.).
Production Steps: Primary Al production is a three-step process (bauxite mining, alumina
refining via Bayer , then Hall–Héroult electrolysis). In India, alumina (Al₂O₃) is refined (Bayer
process) at plants in Odisha/Chhattisgarh, and smelters (Hall–Héroult) are coal-power-intensive.
Smelting uses carbon anodes, yielding CO₂ and trace PFCs. Global production uses ~13–15 kWh/
kg Al (47–54 MJ/kg). Indian smelters report higher energy/GHG intensity (~20 tCO₂/t Al) due to
coal-heavy grids (global average ~15 tCO₂/t).
Energy Intensity: Primary Al is highly energy-intensive. Global best practice uses ≈13 kWh/kg
(46 MJ/kg) . Secondary (recycled) Al needs only ~5% of that energy. India’s

In [4]:
print(text_chunks[0])
print(len(text_chunks))
print(text_chunks[1])

Aluminium
Ore & Mining: Bauxite ore (mainly in tropical countries) is the principal source of alumina. Global
bauxite mines are often large open-pit operations producing 3–5 tonnes of ore per tonne of Al.
(India’s bauxite reserves lie mainly in Odisha and Jharkhand.).
Production Steps: Primary Al production is a three-step process (bauxite mining, alumina
refining via Bayer , then Hall–Héroult electrolysis). In India, alumina (Al₂O₃) is refined (Bayer
process) at plants in Odisha/Chhattisgarh,
42
(Bayer
process) at plants in Odisha/Chhattisgarh, and smelters (Hall–Héroult) are coal-power-intensive.
Smelting uses carbon anodes, yielding CO₂ and trace PFCs. Global production uses ~13–15 kWh/
kg Al (47–54 MJ/kg). Indian smelters report higher energy/GHG intensity (~20 tCO₂/t Al) due to
coal-heavy grids (global average ~15 tCO₂/t).
Energy Intensity: Primary Al is highly energy-intensive. Global best practice uses ≈13 kWh/kg
(46 MJ/kg) . Secondary

### 3. Create Text Embeddings with BGE-Large

We use the BGE-Large embedding model to convert each text chunk into a dense numerical vector (an embedding), so that chunks can later be compared with a query by similarity.
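Retrieval compares these vectors by how closely they point in the same direction. The sketch below shows cosine similarity with NumPy on made-up 3-dimensional vectors (the real BGE-Large vectors have 1024 dimensions), and shows that once vectors are L2-normalized, cosine similarity reduces to a plain dot product. The toy vectors and the `cosine_similarity` helper are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy "embeddings" (made up for illustration)
query = np.array([1.0, 0.0, 1.0])
chunk_a = np.array([2.0, 0.0, 2.0])   # same direction as the query
chunk_b = np.array([0.0, 1.0, 0.0])   # orthogonal to the query

print(cosine_similarity(query, chunk_a))  # ~1.0
print(cosine_similarity(query, chunk_b))  # 0.0

# After L2 normalization, the dot product equals the cosine similarity
q_unit = query / np.linalg.norm(query)
a_unit = chunk_a / np.linalg.norm(chunk_a)
print(np.isclose(np.dot(q_unit, a_unit), cosine_similarity(query, chunk_a)))  # True
```

Length does not matter for cosine similarity: `chunk_a` is twice as long as `query` but still scores about 1.0 because it points the same way.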

In [5]:

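# Prefer a GPU backend if one is available: CUDA (NVIDIA), then MPS (Apple Silicon), else CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")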

embedding_model_name = 'BAAI/bge-large-en-v1.5'
embedding_model = SentenceTransformer(embedding_model_name, device=device)

embeddings = embedding_model.encode(dataset['text'], show_progress_bar=True)


dataset = dataset.add_column('embeddings', embeddings.tolist())

print("Embeddings created with BGE-Large and added to the dataset.")


Embeddings created with BGE-Large and added to the dataset.


### 4. Build the ChromaDB Collection

We load the documents and their embeddings into an in-memory ChromaDB collection, using each chunk's index as its ID.
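Later, `collection.query(..., n_results=k)` returns the k stored vectors closest to the query. By default (unless `hnsw:space` is set) ChromaDB ranks by squared L2 distance and uses an approximate HNSW index rather than the exhaustive search below; this NumPy sketch only shows the exact top-k computation on toy data, and `top_k_l2` is a hypothetical helper. For unit-length vectors, $\|a-b\|^2 = 2 - 2\cos(a,b)$, so ranking by L2 distance gives the same order as ranking by cosine similarity.

```python
import numpy as np


def top_k_l2(query, doc_vectors, k=3):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Returns (indices, distances) sorted from closest to farthest.
    """
    diffs = doc_vectors - query              # broadcast the query over all rows
    dists = np.sum(diffs * diffs, axis=1)    # squared L2 distance per document
    order = np.argsort(dists)[:k]
    return order.tolist(), dists[order].tolist()


# Toy unit-length document vectors (made up for illustration)
docs = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [np.sqrt(0.5), np.sqrt(0.5)],
])
query = np.array([1.0, 0.0])

ids, dists = top_k_l2(query, docs, k=2)
print(ids)  # [0, 2]

# For unit vectors, squared L2 distance = 2 - 2 * cosine similarity
cosines = docs @ query
print(np.allclose(np.sum((docs - query) ** 2, axis=1), 2 - 2 * cosines))  # True
```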

In [6]:
client = chromadb.Client()


collection = client.get_or_create_collection(name="aluminium_kb_v2")

doc_ids = [str(i) for i in range(len(dataset))]
documents_list = list(dataset['text'])

# Store each chunk's text together with its precomputed BGE embedding
collection.add(
    embeddings=np.array(dataset['embeddings']),
    documents=documents_list,
    ids=doc_ids
)

print(f"ChromaDB collection created with {collection.count()} documents.")

ChromaDB collection created with 42 documents.


### 5. Define the RAG Chatbot with Llama 3

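This is the core of the chatbot. We load Llama 3 8B Instruct in half precision (`float16`, roughly 16 GB for the weights at 2 bytes per parameter), move it to the selected device, and wrap it in a text-generation pipeline. We then define three helpers: `retrieve_context` fetches the `k` chunks most similar to the query from ChromaDB, `generate_answer` formats them into a Llama 3 chat prompt and generates a response, and `chatbot` ties the two together.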
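`apply_chat_template` turns the list of messages into a single prompt string built from special tokens. The sketch below approximates the Llama 3 Instruct format by hand, so you can see what the model receives and why `<|eot_id|>` is passed as a terminator; `build_llama3_prompt` is a hypothetical helper, and the tokenizer's own template remains the authoritative version.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper approximating the Llama 3 Instruct chat template.

    Each message is wrapped in header tokens and closed with <|eot_id|>;
    the whole prompt starts with <|begin_of_text|>.
    """
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the answer next;
        # generation should stop when the model emits <|eot_id|>.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Question: What is red mud?"},
]
print(build_llama3_prompt(messages))
```

Each turn ends in `<|eot_id|>` rather than the tokenizer's generic EOS token, which is why `generate_answer` lists both IDs in `terminators`.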

In [7]:
llm_model_name = 'unsloth/llama-3-8b-Instruct'

tokenizer = AutoTokenizer.from_pretrained(llm_model_name)
model = AutoModelForCausalLM.from_pretrained(
    llm_model_name,
    torch_dtype=torch.float16
)
model.to(device)

# Create the pipeline for text generation
llm_pipeline = pipeline('text-generation', model=model, tokenizer=tokenizer)

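def retrieve_context(query, k=3):
    """Returns the k chunks most similar to the query, joined into one string."""
    query_embedding = embedding_model.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=k)
    # results['documents'] holds one list of chunks per query; we sent a single query
    retrieved_chunks = results['documents'][0]
    # Separate chunks with blank lines so their boundaries stay visible to the LLM
    return "\n\n".join(retrieved_chunks)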

def generate_answer(query, context):
    # Llama 3 uses a specific chat template
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Answer the user's question based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
    ]
    
    prompt = llm_pipeline.tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
    )
    
    terminators = [
        llm_pipeline.tokenizer.eos_token_id,
        llm_pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]

    outputs = llm_pipeline(
        prompt,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        return_full_text=False,  # return only the newly generated text, not the prompt
    )

    # The pipeline returns a list with one dict per generated sequence
    response = outputs[0]['generated_text'].strip()
    return response

def chatbot(query):
    print(f"❓ Query: {query}")
    context = retrieve_context(query)
    answer = generate_answer(query, context)
    print(f"\nAnswer:\n{textwrap.fill(answer, width=80)}")

`torch_dtype` is deprecated! Use `dtype` instead!



Device set to use mps:0


### 6. Ask a Question!

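Now let's test the chatbot with a few questions. Answers are capped at `max_new_tokens=256`, so longer ones can be cut off mid-sentence, as in the first two examples below. Because sampling is enabled (`do_sample=True`), re-running a cell produces a different answer. The last query falls outside the PDF's subject: retrieval always returns the `k` nearest chunks, relevant or not, and the model ends up answering about one of them instead.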
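In the answers below, numbered lists and paragraph breaks run together on the same lines (note double spaces such as `disposal.  To manage`). That happens because `textwrap.fill` replaces every whitespace character, including newlines, with a space. A minimal alternative, sketched here with a hypothetical `fill_paragraphs` helper, wraps each line separately so the model's own line breaks survive.

```python
import textwrap


def fill_paragraphs(text, width=80):
    """Wrap each line of `text` separately, preserving existing line breaks."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.splitlines())


answer = "Measures:\n\n1. Minimize red mud generation.\n2. Store red mud in lined facilities."
print(textwrap.fill(answer, width=80))    # newlines collapse into spaces
print(fill_paragraphs(answer, width=80))  # list items stay on their own lines
```

To use it, you could replace `textwrap.fill(answer, width=80)` in `chatbot` with `fill_paragraphs(answer)`.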

In [None]:
user_query = "What is red mud and how is it managed in India?"
chatbot(user_query)

❓ Query: What is red mud and how is it managed in India?

🤖 Llama 3 Answer:
According to the provided context, red mud is a type of waste generated during
the bauxite mining process, specifically from the production of alumina. It is a
byproduct that contains caustic, iron, and titanium oxides. In India, red mud is
classified as hazardous waste and is subject to strict regulations for safe
disposal.  To manage red mud in India, the Central Pollution Control Board
(CPCB) has brought in strict norms to curb its generation. The CPCB has also
issued guidelines, "CPCB Guidelines on Handling of Red Mud" (2013-24), which
address the management of alumina plant waste. According to these guidelines,
the following measures are recommended:  1. Minimize red mud generation by
improving ore quality and washing. 2. Store red mud in lined storage facilities
to prevent environmental contamination. 3. Use CPCB-approved treatment and
disposal methods for red mud. 4. Submit annual returns to the Sta

In [10]:
user_query_2 = "Explain the concept of a circular economy for metals like aluminium and copper."
chatbot(user_query_2)

❓ Query: Explain the concept of a circular economy for metals like aluminium and copper.

🤖 Llama 3 Answer:
The concept of a circular economy for metals like aluminium and copper aims to
keep these materials at high value and eliminate waste. This is achieved through
several key actions:  1. Design products for disassembly: Designing products,
such as vehicles or electronics, to make it easy to recover metals like
aluminium and copper boosts recycling rates. 2. Extend product life: Reusing or
refurbishing products to extend their life, reducing the need for frequent
replacements and subsequent waste generation. 3. Maximize recycling and closed-
loop recovery: Recovering metals from waste and recycling them back into new
products, reducing the need for primary production and the associated
environmental impacts. 4. Adopt industrial symbiosis: One industry's waste can
be another industry's input, for example, sending waste heat or chemicals from
mines and smelters to other industrie

In [11]:
q = "what is langchain"
chatbot(q)

❓ Query: what is langchain

Answer:
Based on the provided context, I couldn't find any information about
"langchain". However, I can help you with the question about ISO 14040 steps for
Life Cycle Assessment (LCA) of Aluminium production. According to the provided
text, the steps are:  1. Define the Goal & Scope (including functional unit) 2.
Compile the Life Cycle Inventory (LCI) 3. Perform Life Cycle Impact Assessment
(LCIA)  If you have any further questions or concerns, please feel free to ask!
