<a href="https://colab.research.google.com/github/KingOz-stack/RAG_Pipeline/blob/main/RAG_PIPELINE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **AI-Driven Document Classification & Retrieval Notebook Using an Open-Source Model (Mistral)**

This notebook builds a small retrieval-augmented generation (RAG) system for extracting, classifying, and querying PDF documents. It extracts the text of each PDF, uses a local Mistral 7B model to assign the document to one of five predefined categories, and builds one vector index per category for semantic search. At query time, the same model routes each question to the most relevant category's index and synthesizes an answer from the retrieved text.


### Environment Setup

In [None]:
# Install required libraries with CUDA support
!pip install -q torch \
    llama-cpp-python==0.2.90 --no-cache-dir --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu123 \
    pymupdf \
    llama-index-llms-llama-cpp \
    llama-index-embeddings-huggingface

In [None]:
import torch
import os
from llama_cpp import Llama

# Check CUDA availability and version
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
!nvcc --version

CUDA available: True
GPU: Tesla T4
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0


In [None]:
# Define model path
model_path = "/content/mistral.gguf"

# Download Mistral model if not already present
if not os.path.exists(model_path):
    !wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
        -O {model_path}
    print(f"Model downloaded to {model_path}")

# Verify file existence and size
if os.path.exists(model_path):
    print(f"Model file exists. Size: {os.path.getsize(model_path) / (1024 * 1024):.2f} MB")
else:
    print("Model file not found!")

# Load the model with GPU acceleration
try:
    llm = Llama(
        model_path=model_path,
        n_gpu_layers=1,  # Start with 1 layer on GPU to be safe
        n_ctx=2048,      # Context window size
        verbose=True     # Show loading progress
    )
    print("Model loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")

--2025-04-02 23:04:05--  https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
Resolving huggingface.co (huggingface.co)... 3.169.137.5, 3.169.137.119, 3.169.137.111, ...
Connecting to huggingface.co (huggingface.co)|3.169.137.5|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs-us-1.hf.co/repos/72/62/726219e98582d16c24a66629a4dec1b0761b91c918e15dea2625b4293c134a92/3e0039fd0273fcbebb49228943b17831aadd55cbcbf56f0af00499be2040ccf9?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27mistral-7b-instruct-v0.2.Q4_K_M.gguf%3B+filename%3D%22mistral-7b-instruct-v0.2.Q4_K_M.gguf%22%3B&Expires=1743638645&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MzYzODY0NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzcyLzYyLzcyNjIxOWU5ODU4MmQxNmMyNGE2NjYyOWE0ZGVjMWIwNzYxYjkxYzkxOGUxNWRlYTI2MjViNDI5M2MxMzRhOTIvM2UwMDM5ZmQwMjczZmNiZWJiNDky

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /content/mistral.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.

Model downloaded to /content/mistral.gguf
Model file exists. Size: 4166.07 MB


llm_load_tensors: ggml ctx size =    0.27 MiB
llm_load_tensors: offloading 1 repeating layers to GPU
llm_load_tensors: offloaded 1/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4165.37 MiB
llm_load_tensors:      CUDA0 buffer size =   132.50 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =   248.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =     8.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.12 MiB
llama_new_context_with_model:      CUDA0 compute buffer

Model loaded successfully!


AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
Model metadata: {'tokenizer.chat_template': "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}", 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': 
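
The load log above reports `offloaded 1/33 layers to GPU`, with 132.50 MiB of weights and 8.00 MiB of KV cache placed on `CUDA0` for that single layer. As a rough sketch of how one might reason about a larger `n_gpu_layers`, the snippet below extrapolates those two figures linearly. The linear scaling and the `estimate_vram_mib` helper are assumptions made for illustration: layers in a quantized model may differ in size, and the compute buffer is not counted, so treat the result as a ballpark only.

```python
# Figures copied from the load log above (one layer offloaded, n_ctx=2048).
WEIGHTS_PER_LAYER_MIB = 132.50  # "CUDA0 buffer size" with 1 layer on the GPU
KV_PER_LAYER_MIB = 8.00         # "CUDA0 KV buffer size" with 1 layer on the GPU


def estimate_vram_mib(n_layers: int) -> float:
    """Linear extrapolation (an assumption) of GPU memory for n_layers."""
    return n_layers * (WEIGHTS_PER_LAYER_MIB + KV_PER_LAYER_MIB)


for n in (1, 16, 32):
    print(f"{n:>2} layers -> about {estimate_vram_mib(n):.1f} MiB")
```

Under this estimate, the 32 repeating layers would need roughly 4.4 GiB, which is why raising `n_gpu_layers` step by step (or passing `-1` to request all layers) is worth trying once the single-layer load succeeds.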

# **1. Importing Required Libraries**

In [None]:
import fitz  # PyMuPDF
import re
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine

* `fitz` (PyMuPDF) extracts text from PDF documents.

* `re` is the regular expressions module for text processing.

* `llama_index.core` provides components for document indexing, retrieval, and querying.

* `LlamaCPP` integrates an open-source LLM for classification and query processing.

* `HuggingFaceEmbedding` embeds document text for vector search.

* `RetrieverQueryEngine` and `CompactAndRefine` process queries and synthesize responses.







# **2. DocumentClassifier Class Initialization**

In [None]:
class DocumentClassifier:
    def __init__(self, model_path, embedding_model="BAAI/bge-small-en-v1.5"):

The constructor sets up an LLM-based document classifier.

`model_path`: path to the local GGUF model file (here, Mistral).

`embedding_model`: name of the Hugging Face embedding model used for similarity search (default `BAAI/bge-small-en-v1.5`).

In [None]:
self.llm = LlamaCPP(
    model_path=model_path,
    temperature=0.1,
    max_new_tokens=50,
    context_window=4096
)

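* Loads the Mistral model through LlamaIndex's `LlamaCPP` wrapper with these parameters:

* `temperature=0.1` keeps sampling close to deterministic, so responses stay consistent.

* `max_new_tokens=50` caps each response at 50 generated tokens. Because the same LLM object also synthesizes answers later, this cap applies to answers as well as to category labels.

* `context_window=4096` sets the maximum number of tokens the model handles per call, prompt and response combined.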

In [None]:
self.embed_model = HuggingFaceEmbedding(model_name=embedding_model)

* Loads a transformer-based embedding model for document indexing and retrieval.

In [None]:
self.DOCUMENT_CATEGORIES = [
    'Bank Statement',
    'Pay Slip',
    'Appraisal Report',
    'Lender Fees Worksheet',
    'Sample Contract'
]

* Defines document categories that the classifier will identify.

In [None]:
self.index_map = {}

* A dictionary to store vector indexes for each classified document type.

# **3. Extracting Text from PDFs**

In [None]:
def extract_pdf_text(self, pdf_path):
    """Extract text from a given PDF file."""
    try:
        doc = fitz.open(pdf_path)
        text = "\n".join([page.get_text("text", sort=True) for page in doc])
        return text
    except Exception as e:
        print(f"Error extracting text from {pdf_path}: {e}")
        return ""

* Opens a PDF and extracts text from all pages.

* Returns extracted text as a single string.

* If an error occurs, it prints a message and returns an empty string instead of raising.

# **4. Preprocessing Text for Classification**

In [None]:
def prepare_document_for_classification(self, text):
    """Create a structured representation of document text."""

Extracts important portions of a document for classification.

In [None]:
doc_length = len(text)
first_part = text[:min(500, doc_length)]
middle_start = max(0, doc_length // 2 - 250)
middle_part = text[middle_start:middle_start + min(500, doc_length - middle_start)]
last_part = text[-500:] if doc_length > 500 else text

* Extracts up to three 500-character excerpts: the beginning, the middle, and the end of the text.

* Keeps the classification prompt short by sampling the document rather than passing the full text.
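
To see what the slicing above produces, here is a minimal standalone sketch. The function name `sample_excerpts` and the `size` parameter are hypothetical, introduced only for this illustration; the slicing itself mirrors the cell above, and the two input strings are made up.

```python
def sample_excerpts(text: str, size: int = 500) -> dict:
    """Return up to `size` characters from the start, middle, and end of text."""
    n = len(text)
    middle_start = max(0, n // 2 - size // 2)
    return {
        "first_part": text[:size],
        "middle_part": text[middle_start:middle_start + size],
        "last_part": text[-size:] if n > size else text,
    }


# A long document: each excerpt comes from a different region.
long_doc = "A" * 1000 + "B" * 1000 + "C" * 1000
parts = sample_excerpts(long_doc)
print({name: (len(part), part[0]) for name, part in parts.items()})

# A short document: all three excerpts are the whole text.
print(sample_excerpts("NET PAY 9500"))
```

For documents shorter than 500 characters, the three excerpts are identical, so the same text ends up in the prompt three times.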

In [None]:
potential_headers = [
    line.strip() for line in text.split('\n')
    if line.strip() and len(line.strip()) < 50 and line.strip().isupper()
][:5]

* Collects up to five short, all-uppercase lines (under 50 characters) as potential headers, such as titles or section names.
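
The header heuristic can be tried on its own. This is a sketch under the same rules as the cell above; the `find_headers` name, its parameters, and the sample text are hypothetical.

```python
def find_headers(text: str, max_len: int = 50, limit: int = 5) -> list:
    """Collect short, all-uppercase lines as candidate headers."""
    headers = []
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped and len(stripped) < max_len and stripped.isupper():
            headers.append(stripped)
    return headers[:limit]


# Made-up sample text, for illustration only.
sample = """ACME BANK
Statement period: July 2018
ACCOUNT SUMMARY
Opening balance 1,200.00
2018-07-31
"""
print(find_headers(sample))  # ['ACME BANK', 'ACCOUNT SUMMARY']
```

Lines without any letters, such as the date on the last line, are skipped because `str.isupper()` returns `False` when a string contains no cased characters.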


# **5. Classifying Documents**

In [None]:
def classify_document(self, text):
    """Classify document into predefined categories."""

* Uses the LLM to classify a document into one of the predefined categories.

In [None]:
prompt = f"""Classify this document into one of these precise categories:
...
ONLY respond with the EXACT category name:"""

* Provides an LLM prompt that describes each category and asks for a classification.
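
In the complete pipeline at the end of this notebook, the prompt is an indented triple-quoted f-string, so each line reaches the model with its source-code indentation. Whether that matters for this model is not measured here. One possible alternative, sketched below, assembles the prompt line by line. The helper name `build_classification_prompt`, the `CATEGORY_HINTS` dictionary, and the demo values are hypothetical; the category descriptions are copied from the notebook's prompt.

```python
# Category descriptions as they appear in the notebook's prompt.
CATEGORY_HINTS = {
    "Bank Statement": "Official financial record showing account transactions",
    "Pay Slip": "Employment earnings document with salary details",
    "Appraisal Report": "Professional property valuation document",
    "Lender Fees Worksheet": "Detailed loan cost breakdown",
    "Sample Contract": "Legal document template or example",
}


def build_classification_prompt(doc_info: dict) -> str:
    """Hypothetical helper: assemble the prompt without source-code indentation."""
    lines = ["Classify this document into one of these precise categories:"]
    lines += [f"- {name}: {hint}" for name, hint in CATEGORY_HINTS.items()]
    lines += [
        "- Unknown: If no clear match exists",
        "",
        "Extracted Document Characteristics:",
        f"First Excerpt: {doc_info['first_part']}",
        f"Middle Excerpt: {doc_info['middle_part']}",
        f"End Excerpt: {doc_info['last_part']}",
        f"Potential Headers: {doc_info['potential_headers']}",
        f"Total Length: {doc_info['total_length']} characters",
        "",
        "ONLY respond with the EXACT category name:",
    ]
    return "\n".join(lines)


# Made-up values shaped like the output of prepare_document_for_classification.
demo = {
    "first_part": "PAY SLIP for July",
    "middle_part": "Net pay 9500",
    "last_part": "Authorised signature",
    "potential_headers": "PAY SLIP",
    "total_length": 49,
}
print(build_classification_prompt(demo))
```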

In [None]:
response = self.llm.complete(prompt)
classified_type = response.text.strip()

* Calls the LLM to generate a classification.

In [None]:
for category in self.DOCUMENT_CATEGORIES:
    if category.lower() in classified_type.lower():
        return category

* Performs case-insensitive substring matching, so that a response such as `Pay Slip.` still maps to a predefined category.
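
The matching loop above can be exercised without a model. In this sketch the function name `match_category` is hypothetical and the sample responses are made up; the loop body follows the cell above, with the `"Unknown"` fallback taken from the complete pipeline.

```python
DOCUMENT_CATEGORIES = [
    "Bank Statement",
    "Pay Slip",
    "Appraisal Report",
    "Lender Fees Worksheet",
    "Sample Contract",
]


def match_category(llm_output: str) -> str:
    """Return the first category whose name occurs in the LLM output."""
    lowered = llm_output.lower()
    for category in DOCUMENT_CATEGORIES:
        if category.lower() in lowered:
            return category
    return "Unknown"


print(match_category("Pay Slip."))                       # Pay Slip
print(match_category("  pay slip\n"))                    # Pay Slip
print(match_category("Payslip"))                         # Unknown
print(match_category("Pay Slip, not a Bank Statement"))  # Bank Statement
```

The last two calls show the limits of this approach: spelling variants are not recognized, and when a response names several categories, the one listed first in `DOCUMENT_CATEGORIES` wins regardless of what the response meant.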


# **6. Building Indexes for Classified Documents**

In [None]:
def build_indexes(self, document_paths):
    """Builds vector indexes for classified documents."""

* Loops through each document, extracts text, classifies it, and stores it in a vector index.

In [None]:
if doc_type != "Unknown":
    document = Document(text=text, metadata={"doc_type": doc_type})

    if doc_type not in self.index_map:
        self.index_map[doc_type] = VectorStoreIndex.from_documents(
            [document], embed_model=self.embed_model
        )
    else:
        self.index_map[doc_type].insert(document)

* Adds each classified document to an in-memory vector index, one index per document type, for later retrieval.
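
The control flow of `build_indexes` can be sketched without any model or vector store. Here plain lists stand in for `VectorStoreIndex`, and the `group_by_type` function, the stub `extract` and `classify` callables, and the file names are all hypothetical.

```python
def group_by_type(paths, extract, classify):
    """Group document paths by predicted type, skipping 'Unknown'."""
    groups = {}
    for path in paths:
        doc_type = classify(extract(path))
        if doc_type == "Unknown":
            continue  # unclassified documents are dropped, as in the notebook
        # Create the group on first sight, append to it afterwards.
        groups.setdefault(doc_type, []).append(path)
    return groups


# Stubs standing in for extract_pdf_text and classify_document.
fake_text = {"a.pdf": "net pay", "b.pdf": "net pay", "c.pdf": "???"}
extract = fake_text.get


def classify(text):
    return "Pay Slip" if "net pay" in text else "Unknown"


print(group_by_type(["a.pdf", "b.pdf", "c.pdf"], extract, classify))
# {'Pay Slip': ['a.pdf', 'b.pdf']}
```

Documents classified as `Unknown` never reach an index, so nothing in them can be retrieved later.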


# **7. Query Routing and Retrieval**

In [None]:
def route_query(self, query):
    """Routes a query to the relevant document type and retrieves an answer."""

    # Ask the LLM which document type is most relevant to the query.
    type_prompt = f"""Classify this query to the most relevant document type:
...
Respond ONLY with: Bank Statement, Pay Slip, Appraisal Report, Lender Fees Worksheet, Sample Contract, or Unknown"""

    doc_type_response = self.llm.complete(type_prompt)
    doc_type = doc_type_response.text.strip()

    # If the predicted document type has no index, the query cannot be answered.
    if doc_type not in self.index_map:
        return f"Unable to route query to document type: {doc_type}"

    # Retrieve the two most similar chunks from that type's index,
    # then let the LLM synthesize an answer from them.
    retriever = self.index_map[doc_type].as_retriever(similarity_top_k=2)
    response_synthesizer = CompactAndRefine(llm=self.llm)
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer
    )

    # Run the query against the selected index and return the answer.
    response = query_engine.query(query)
    return f"📄 **Document Type:** {doc_type}\n🔍 **Answer:** {response}"
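
To illustrate what `similarity_top_k=2` means, the toy sketch below ranks text chunks by cosine similarity to a query and keeps the best two. Everything in it is an assumption made for illustration: word counts stand in for the real embedding model, the function names are hypothetical, and the chunks are made up. It is not how LlamaIndex is implemented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for the real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks, for illustration only.
chunks = [
    "gross salary 12000 for the month",
    "net salary 9500 after deductions",
    "employee name and department",
]
print(top_k("what is my net salary", chunks))
```

Only the retrieved chunks are passed to the response synthesizer, so anything ranked below the top two never reaches the LLM.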

# **8.  Running the Classifier and Query System**

In [None]:
def main():
    # Paths to the documents that need classification.
    document_paths = [
        "/content/payslip_sample_image.pdf",
        "/content/sample_bank_statement.pdf",
        "/content/appraisal_report.pdf",
        "/content/LenderFeesWorksheetNew.pdf",
        "/content/sample_contract.pdf"
    ]

    # Load the Mistral model and build the document indexes.
    model_path = "/content/mistral.gguf"
    classifier = DocumentClassifier(model_path)
    classifier.build_indexes(document_paths)

    # Example queries to test query routing and retrieval.
    test_queries = [
        "What is my net salary?",
        "What is the appraised value of the house?",
        "What was my last deposit?"
    ]

    # Run each query through the classifier and print the response.
    for query in test_queries:
        print(f"\nQuery: {query}")
        print(classifier.route_query(query))

# Call main() only when the file is executed as a script.
if __name__ == "__main__":
    main()

# **9. Complete RAG Pipeline**

In [None]:
import fitz  # PyMuPDF
import re
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine

class DocumentClassifier:
    def __init__(self, model_path, embedding_model="BAAI/bge-small-en-v1.5"):
        # Initialize LLM with optimized parameters
        self.llm = LlamaCPP(
            model_path=model_path,
            temperature=0.1,  # Slightly lower temperature for more consistent results
            max_new_tokens=50,  # Increased to allow more flexibility
            context_window=4096
        )

        # Initialize embedding model
        self.embed_model = HuggingFaceEmbedding(model_name=embedding_model)

        # Predefined document categories
        self.DOCUMENT_CATEGORIES = [
            'Bank Statement',
            'Pay Slip',
            'Appraisal Report',
            'Lender Fees Worksheet',
            'Sample Contract'
        ]

        # Index map to store document indexes
        self.index_map = {}

    def extract_pdf_text(self, pdf_path):
        """Extract text from PDF with improved text extraction."""
        try:
            doc = fitz.open(pdf_path)
            text = "\n".join([page.get_text("text", sort=True) for page in doc])
            return text
        except Exception as e:
            print(f"Error extracting text from {pdf_path}: {e}")
            return ""

    def prepare_document_for_classification(self, text):
        """Create a more comprehensive document representation."""
        doc_length = len(text)

        # Extract key sections
        first_part = text[:min(500, doc_length)]
        middle_start = max(0, doc_length // 2 - 250)
        middle_part = text[middle_start:middle_start + min(500, doc_length - middle_start)]
        last_part = text[-500:] if doc_length > 500 else text

        # Extract potential headers or key phrases
        potential_headers = [
            line.strip() for line in text.split('\n')
            if line.strip() and len(line.strip()) < 50 and line.strip().isupper()
        ][:5]

        return {
            "first_part": first_part,
            "middle_part": middle_part,
            "last_part": last_part,
            "total_length": doc_length,
            "potential_headers": "\n".join(potential_headers) if potential_headers else ""
        }

    def classify_document(self, text):
        """Classify document with improved prompting and fallback mechanisms."""
        doc_info = self.prepare_document_for_classification(text)

        prompt = f"""Classify this document into one of these precise categories:
        - Bank Statement: Official financial record showing account transactions
        - Pay Slip: Employment earnings document with salary details
        - Appraisal Report: Professional property valuation document
        - Lender Fees Worksheet: Detailed loan cost breakdown
        - Sample Contract: Legal document template or example
        - Unknown: If no clear match exists

        Extracted Document Characteristics:
        First Excerpt: {doc_info['first_part']}
        Middle Excerpt: {doc_info['middle_part']}
        End Excerpt: {doc_info['last_part']}
        Potential Headers: {doc_info['potential_headers']}
        Total Length: {doc_info['total_length']} characters

        ONLY respond with the EXACT category name:"""

        try:
            response = self.llm.complete(prompt)
            classified_type = response.text.strip()

            # Fuzzy matching for categories
            for category in self.DOCUMENT_CATEGORIES:
                if category.lower() in classified_type.lower():
                    return category

            return "Unknown"

        except Exception as e:
            print(f"Classification error: {e}")
            return "Unknown"

    def build_indexes(self, document_paths):
        """Build vector indexes for classified documents."""

        for doc_path in document_paths:
            text = self.extract_pdf_text(doc_path)
            doc_type = self.classify_document(text)

            if doc_type != "Unknown":
                document = Document(text=text, metadata={"doc_type": doc_type})

                if doc_type not in self.index_map:
                    self.index_map[doc_type] = VectorStoreIndex.from_documents(
                        [document],
                        embed_model=self.embed_model
                    )
                else:
                    self.index_map[doc_type].insert(document)

        return self.index_map

    def route_query(self, query):
        """Route query to appropriate document type and retrieve answer."""
        # Use LLM to determine document type
        type_prompt = f"""Classify this query to the most relevant document type:
        Query: {query}
        Respond ONLY with: Bank Statement, Pay Slip, Appraisal Report, Lender Fees Worksheet, Sample Contract, or Unknown"""

        try:
            doc_type_response = self.llm.complete(type_prompt)
            doc_type = doc_type_response.text.strip()

            # Fuzzy matching
            for category in self.DOCUMENT_CATEGORIES:
                if category.lower() in doc_type.lower():
                    doc_type = category
                    break

            # Check if document type exists in our index
            if doc_type not in self.index_map:
                return f"Unable to route query to document type: {doc_type}"

            # Create query engine
            retriever = self.index_map[doc_type].as_retriever(similarity_top_k=2)
            response_synthesizer = CompactAndRefine(llm=self.llm)
            query_engine = RetrieverQueryEngine(
                retriever=retriever,
                response_synthesizer=response_synthesizer
            )

            # Execute query
            response = query_engine.query(query)
            return f"Document Type: {doc_type}\n Answer: {response}"

        except Exception as e:
            return f"Query routing error: {e}"

# Example usage
def main():
    # Paths to your documents
    document_paths = [
        "/content/payslip_sample_image.pdf",
        "/content/sample_bank_statement.pdf",
        "/content/appraisal_report.pdf",
        "/content/LenderFeesWorksheetNew.pdf",
        "/content/sample_contract.pdf"
    ]

    # Path to your Mistral model
    model_path = "/content/mistral.gguf"

    # Initialize classifier
    classifier = DocumentClassifier(model_path)

    # Build indexes
    classifier.build_indexes(document_paths)

    # Test queries
    test_queries = [
        "What is my net salary?",
        "What is the appraised value of the house?",
        "What was my last deposit?",  # comma added; without it, Python joins this string with the next one
        "How much tax was deducted from the paycheck?"
    ]

    for query in test_queries:
        print(f"\nQuery: {query}")
        print(classifier.route_query(query))

if __name__ == "__main__":
    main()



Query: What is my net salary?



llama_print_timings:        load time =    2456.24 ms
llama_print_timings:      sample time =       0.60 ms /    10 runs   (    0.06 ms per token, 16528.93 tokens per second)
llama_print_timings: prompt eval time =    1105.21 ms /    51 tokens (   21.67 ms per token,    46.14 tokens per second)
llama_print_timings:        eval time =    5652.63 ms /     9 runs   (  628.07 ms per token,     1.59 tokens per second)
llama_print_timings:       total time =    6766.53 ms /    60 tokens
Llama.generate: 1 prefix-match hit, remaining 329 prompt tokens to eval

llama_print_timings:        load time =    2456.24 ms
llama_print_timings:      sample time =       0.30 ms /     6 runs   (    0.05 ms per token, 19933.55 tokens per second)
llama_print_timings: prompt eval time =    2523.99 ms /   329 tokens (    7.67 ms per token,   130.35 tokens per second)
llama_print_timings:        eval time =    2916.91 ms /     5 runs   (  583.38 ms per token,     1.71 tokens per second)
llama_print_timings:   

Document Type: Pay Slip
 Answer: 9500.

Query: What is the appraised value of the house?



llama_print_timings:        load time =    2456.24 ms
llama_print_timings:      sample time =       0.64 ms /    12 runs   (    0.05 ms per token, 18779.34 tokens per second)
llama_print_timings: prompt eval time =    1106.82 ms /    59 tokens (   18.76 ms per token,    53.31 tokens per second)
llama_print_timings:        eval time =    6895.72 ms /    11 runs   (  626.88 ms per token,     1.60 tokens per second)
llama_print_timings:       total time =    8010.22 ms /    70 tokens
Llama.generate: 1 prefix-match hit, remaining 2330 prompt tokens to eval

llama_print_timings:        load time =    2456.24 ms
llama_print_timings:      sample time =       1.18 ms /    21 runs   (    0.06 ms per token, 17872.34 tokens per second)
llama_print_timings: prompt eval time =    8593.07 ms /  2330 tokens (    3.69 ms per token,   271.15 tokens per second)
llama_print_timings:        eval time =   14152.41 ms /    20 runs   (  707.62 ms per token,     1.41 tokens per second)
llama_print_timings:  

Document Type: Appraisal Report
 Answer:  The appraised value of the house is $1,918,507.

Query: What was my last deposit?



llama_print_timings:        load time =    2456.24 ms
llama_print_timings:      sample time =       0.47 ms /     9 runs   (    0.05 ms per token, 19027.48 tokens per second)
llama_print_timings: prompt eval time =    1079.55 ms /    54 tokens (   19.99 ms per token,    50.02 tokens per second)
llama_print_timings:        eval time =    4553.34 ms /     8 runs   (  569.17 ms per token,     1.76 tokens per second)
llama_print_timings:       total time =    5638.29 ms /    62 tokens
Llama.generate: 1 prefix-match hit, remaining 1414 prompt tokens to eval

llama_print_timings:        load time =    2456.24 ms
llama_print_timings:      sample time =       1.45 ms /    31 runs   (    0.05 ms per token, 21453.29 tokens per second)
llama_print_timings: prompt eval time =    5569.21 ms /  1414 tokens (    3.94 ms per token,   253.90 tokens per second)
llama_print_timings:        eval time =   19563.20 ms /    30 runs   (  652.11 ms per token,     1.53 tokens per second)
llama_print_timings:  

Document Type: Bank Statement
 Answer:  Your last deposit was $2,678.39 on July 31, 2018, via an ATM.
