In [None]:
!pip install transformers langchain sentence-transformers chromadb gradio langchain-community




In [None]:
!pip install huggingface_hub
!pip install bitsandbytes accelerate

In [None]:
# Model loading and generation
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# LangChain components (HuggingFacePipeline is imported once, from langchain_community)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader

# Web UI
import gradio as gr


In [None]:
from huggingface_hub import login

login()



### LLM Documentation: `get_llm` Function

This function sets up a text-generation pipeline around the Llama 2 chat model. The steps are explained below for intermediate readers:

#### Step 1: Specify the Model
The variable `model_name` is set to `meta-llama/Llama-2-7b-chat-hf`, the 7-billion-parameter Llama 2 checkpoint fine-tuned for dialogue. Access to this model is gated on the Hugging Face Hub, which is why the notebook calls `login()` before loading it.


#### Step 2: Load the Tokenizer
A tokenizer is loaded using `AutoTokenizer.from_pretrained(model_name)` to:
- **Preprocess:** Convert text input into tokens that the model understands.
- **Postprocess:** Convert tokens back into human-readable text after generation.
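
The round trip described above can be illustrated with a toy whitespace tokenizer. This is only a sketch: the vocabulary and the `encode`/`decode` helpers are hypothetical, and the real Llama 2 tokenizer works on subword units rather than whole words.

```python
# Toy illustration of text -> token ids -> text; NOT the Llama 2 tokenizer.
vocab = {"<unk>": 0, "what": 1, "is": 2, "rag": 3, "?": 4}
id_to_token = {idx: tok for tok, idx in vocab.items()}


def encode(text):
    """Preprocess: map each lowercased, whitespace-separated token to an id."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]


def decode(ids):
    """Postprocess: map ids back to tokens and join them into a string."""
    return " ".join(id_to_token[i] for i in ids)


ids = encode("What is RAG ?")
print(ids)
print(decode(ids))
```

Words missing from the toy vocabulary map to `<unk>`; subword tokenizers largely avoid that by breaking unknown words into smaller known pieces.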


#### Step 3: Load the Model
The model is loaded using `AutoModelForCausalLM.from_pretrained`. Key options include:
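- **`device_map="auto"`**: Lets `accelerate` place model layers across the available devices (GPU, CPU, disk) automatically.
- **`offload_folder="./offload"`**: Directory on disk where layers are offloaded if they do not fit in GPU or CPU memory.
- **`load_in_8bit=True`**: Quantizes the weights to 8-bit with `bitsandbytes`, roughly halving weight memory compared with 16-bit. This mainly saves memory; it does not necessarily make inference faster.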
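
As a rough back-of-the-envelope check on why 8-bit loading helps, the sketch below estimates the memory needed for the weights alone. The parameter count of 7e9 is a nominal assumption for a "7B" model, and the helper `weight_memory_gib` is hypothetical; activations, the KV cache, and framework overhead are not included.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 7e9  # assumption: nominal parameter count for a "7B" model

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int8_gib = weight_memory_gib(NUM_PARAMS, 8)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f" 8-bit weights: ~{int8_gib:.1f} GiB")
```

Under these assumptions the 8-bit weights need about half the memory of the 16-bit weights, which can decide whether the model fits on a single GPU.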


#### Step 4: Create the Pipeline
The Hugging Face `pipeline` is configured for `text-generation`. Key parameters:
- **`max_new_tokens=200`**: Limits the number of tokens generated in each response.
- **`truncation=True`**: Truncates input that exceeds the model's maximum sequence length instead of raising an error.


#### Step 5: Wrap in `HuggingFacePipeline`
The pipeline is wrapped in a `HuggingFacePipeline` object, making it compatible with frameworks like LangChain.

#### Summary
The `get_llm` function bundles model loading, quantization, and pipeline setup into one call and returns an object that LangChain chains can use directly as their LLM.


In [None]:
def get_llm():
    model_name = "meta-llama/Llama-2-7b-chat-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",        # Automatically allocate resources
        offload_folder="./offload",  # Use disk for offloading
        load_in_8bit=True,         # Quantize model to 8-bit
    )

    hf_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=200,
        truncation=True,
    )
    return HuggingFacePipeline(pipeline=hf_pipeline)


### `document_loader` Function

The `document_loader` function loads and processes a PDF file for further analysis.

#### Parameters
- **`file`**: Path to the PDF file to be loaded.

#### Steps
1. Initializes a `PyPDFLoader` to load the PDF.
2. Extracts the content using `load()`.
3. Prints the number of pages loaded.

#### Return Value
- Returns a list of LangChain `Document` objects, one per page, each carrying the page text (`page_content`) and its `metadata`.
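
To make the shape of the return value concrete, here is a sketch that mimics it with a stand-in class. `PageDocument` and the sample contents are hypothetical; only the `page_content` and `metadata` attribute names follow LangChain's `Document`.

```python
from dataclasses import dataclass, field


@dataclass
class PageDocument:
    """Hypothetical stand-in for one loaded page: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Pretend result of loading a two-page PDF (contents are made up).
loaded_document = [
    PageDocument("Introduction ...", {"source": "example.pdf", "page": 0}),
    PageDocument("Methods ...", {"source": "example.pdf", "page": 1}),
]

print(f"Loaded {len(loaded_document)} pages.")
for doc in loaded_document:
    print(doc.metadata["page"], doc.page_content[:20])
```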


In [None]:
def document_loader(file):
    loader = PyPDFLoader(file)
    loaded_document = loader.load()
    print(f"Loaded {len(loaded_document)} pages.")
    return loaded_document

### `text_splitter` Function

The `text_splitter` function divides large text documents into smaller chunks for easier processing.

#### Parameters
- **`data`**: The list of loaded documents (pages) to be split into chunks.

#### Steps
1. Uses `RecursiveCharacterTextSplitter` to split the text into chunks.
   - **`chunk_size`**: Maximum size of each chunk (1000 characters).
   - **`chunk_overlap`**: Overlap between adjacent chunks (50 characters).
2. Prints the number of chunks created.

#### Return Value
- Returns a list of chunks. Each chunk is a `Document` that keeps the metadata of the page it came from.
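
The effect of the chunk size and overlap settings can be seen in a simplified sketch. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on separators such as paragraphs and sentences); the hypothetical `sliding_window_chunks` below just cuts fixed-size character windows that share `chunk_overlap` characters.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=3)
print(chunks)
```

With these small values, the last three characters of each chunk reappear at the start of the next one. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.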


In [None]:
def text_splitter(data):
    # Named `splitter` so the local variable does not shadow this function's own name.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50, length_function=len)
    chunks = splitter.split_documents(data)
    print(f"Split into {len(chunks)} chunks.")
    return chunks


### `huggingface_embedding` Function

The `huggingface_embedding` function loads a pre-trained sentence embedding model to generate embeddings for text.

#### Steps
1. Loads the **"sentence-transformers/all-MiniLM-L6-v2"** model from Hugging Face for embedding generation.
2. Prints a success message after loading the model.

#### Return Value
- Returns the loaded embedding model (`HuggingFaceEmbeddings`).


In [None]:
def huggingface_embedding():
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embed_model = HuggingFaceEmbeddings(model_name=model_name)
    print("Embeddings model loaded successfully!")
    return embed_model


### `vector_database` Function

The `vector_database` function creates a vector store by embedding document chunks and storing them for fast similarity search.

#### Steps
1. Calls the `huggingface_embedding` function to load the embedding model.
2. Records the start time to measure the performance of the vector database creation.
3. Uses the **Chroma** library to create a vector store (`Chroma.from_documents`), which stores document embeddings.
4. Prints the time taken to create the vector store.

#### Return Value
- Returns the created vector store (`vectordb`).
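
The idea behind a vector store can be sketched in plain Python: embed each chunk once, keep the vectors, and rank them by similarity to the query vector. Everything below is a toy under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for the sentence-embedding model, and the in-memory list stands in for Chroma, which uses dense vectors and its own index.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (hypothetical stand-in)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(chunks):
    """Embed every chunk once and keep (vector, text) pairs in memory."""
    return [(embed(chunk), chunk) for chunk in chunks]


def search(store, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


store = build_store([
    "chroma stores document embeddings",
    "gradio builds web interfaces",
    "embeddings map text to vectors",
])
print(search(store, "what are embeddings", k=1))
```

Embedding happens once at build time, so each query costs only one embedding plus a similarity ranking.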


In [None]:
import time


def vector_database(chunks):
    embedding_model = huggingface_embedding()
    start_time = time.perf_counter()  # monotonic clock, suited to measuring durations
    vectordb = Chroma.from_documents(chunks, embedding_model)
    print(f"Vector store created in {time.perf_counter() - start_time:.2f} seconds.")
    return vectordb

### `retriever` Function

The `retriever` function loads documents, splits them into chunks, and creates a retriever from a vector database for efficient information retrieval.

#### Steps
1. **Load documents** using the `document_loader` function, which loads the contents of the provided file.
2. **Split the documents** into manageable chunks using the `text_splitter` function.
3. **Create a vector database** by calling the `vector_database` function, which stores the chunked documents as embeddings.
4. Converts the vector database into a **retriever** for efficient retrieval of relevant information.

#### Return Value
- Returns a **retriever** object that can be used to fetch relevant chunks from the vector database.


In [None]:
def retriever(file):
    documents = document_loader(file)      # load the PDF pages
    chunks = text_splitter(documents)      # split pages into overlapping chunks
    vectordb = vector_database(chunks)     # embed the chunks and index them
    return vectordb.as_retriever()         # expose the index as a retriever


### `retriever_qa` Function

The `retriever_qa` function processes a query by retrieving relevant information from a file and generating a response using a Language Model (LLM).

#### Steps

1. **Load LLM**  
   Calls the `get_llm()` function to load a pre-configured language model for answering the query.

2. **Initialize Retriever**  
   Uses the `retriever(file)` function to create a retriever object from the provided file. This object handles fetching relevant chunks of data.

3. **Set Up RetrievalQA**  
   Configures the question-answering chain with `RetrievalQA.from_chain_type()`.  
   - **Parameters**:
     - `llm`: The LLM instance used for generating responses.
     - `chain_type`: Set to `"stuff"`, defining how the retrieved information is combined.
     - `retriever`: The retriever object for fetching document chunks.
     - `return_source_documents`: Set to `False` to exclude source documents from the output.

4. **Execute Query**  
   Calls `qa.run(query)` to process the query and generate the response.

5. **Error Handling**  
   Wraps the entire process in a `try-except` block to gracefully handle errors.
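
The `"stuff"` chain type from step 3 can be pictured as follows: all retrieved chunks are concatenated ("stuffed") into a single prompt together with the question. The sketch below is an illustration only. `build_stuff_prompt` and its template wording are hypothetical and do not reproduce LangChain's actual prompt.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate all retrieved chunks into one context block plus the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = ["Chunk one text.", "Chunk two text."]
prompt = build_stuff_prompt(retrieved, "What does the document say?")
print(prompt)
```

Because every retrieved chunk ends up in one prompt, this approach only works while the combined text fits within the model's context window.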


#### Return Value
- Returns the generated response as a string (`response.strip()`).
- If an error occurs, returns a string with the error message.
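
Steps 1 and 2 run on every call, so each query reloads the model and rebuilds the vector store. One possible mitigation, shown here only as a sketch with a stand-in loader, is to cache the expensive object with `functools.lru_cache`. `get_llm_cached` and `load_count` are hypothetical names used for illustration.

```python
from functools import lru_cache

load_count = 0  # counts how often the expensive loader actually runs


@lru_cache(maxsize=1)
def get_llm_cached():
    """Hypothetical stand-in for an expensive model load; the body runs once."""
    global load_count
    load_count += 1
    return object()  # placeholder for the real LLM wrapper


first = get_llm_cached()
second = get_llm_cached()
print(first is second, load_count)
```

The same idea could be applied to the retriever, keyed by the file path, since a path string is hashable. Whether that is appropriate depends on how often the uploaded files change.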


In [None]:
def retriever_qa(file, query):
    try:
        llm = get_llm()
        retriever_obj = retriever(file)
        qa = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=retriever_obj,
            return_source_documents=False
        )
        response = qa.run(query)

        return response.strip()
    except Exception as e:
        return f"Error: {str(e)}"


### `rag_application` (Gradio Interface)

`rag_application` is a **Gradio** `Interface` that exposes the retrieval-augmented generation (RAG) pipeline through a web UI. Users upload a PDF and ask questions about its content.

#### Key Components:
1. **Inputs:**
   - **File Upload (`gr.File`)**: Users can upload a single PDF file. The file must be in `.pdf` format.
   - **Textbox (`gr.Textbox`)**: Users can type their query or question. It supports multiline input with a placeholder text.
   
2. **Outputs:**
   - **Textbox (`gr.Textbox`)**: Displays the chatbot's response to the input query.

3. **Function (`fn`)**:
   - `retriever_qa` is called when the user submits the form. It receives the uploaded file's path and the query, in the order given in `inputs`, and returns the answer text.

4. **Title and Description:**
   - The interface includes a title ("RAG Chatbot") and a description explaining the functionality: upload a PDF and ask any question related to the document.

#### Usage:
- Users upload a PDF, input their query, and receive an answer based on the document’s content.
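
Since the interface passes user input straight to the QA function, a small validation step can return a clear message before any model is loaded. The sketch below assumes that a missing upload arrives as `None` (or an empty value) and that the file is passed as a path string. `validate_inputs` and `answer` are hypothetical helpers, not part of Gradio.

```python
def validate_inputs(file, query):
    """Return an error message for unusable inputs, or None if they look fine."""
    if not file:
        return "Error: please upload a PDF file."
    if not str(file).lower().endswith(".pdf"):
        return "Error: the uploaded file is not a PDF."
    if not query or not query.strip():
        return "Error: please enter a question."
    return None


def answer(file, query):
    """Hypothetical wrapper: validate first, then hand off to the QA function."""
    error = validate_inputs(file, query)
    if error:
        return error
    # Placeholder for the real call, e.g. retriever_qa(file, query).
    return f"(would answer {query!r} using {file!r})"


print(answer(None, "What is this about?"))
print(answer("report.pdf", "   "))
```

A wrapper like `answer` could be passed as `fn` instead of `retriever_qa`, so that empty submissions never trigger model loading.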


In [None]:
rag_application = gr.Interface(
    fn=retriever_qa,
    allow_flagging="never",
    inputs=[
       gr.File(label="Upload PDF File", file_count="single", file_types=['.pdf'], type="filepath"),
       gr.Textbox(label="Input Query", lines=2, placeholder="Type your question here...")
    ],
    outputs=gr.Textbox(label="Output"),
    title="RAG Chatbot",
    description="Upload a PDF document and ask any question. The chatbot will try to answer using the provided document."
)



In [None]:
!pip install pypdf  # required by PyPDFLoader



In [None]:
rag_application.launch(server_name="0.0.0.0", server_port=7860)

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://b9ae837de733352d52.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


