# RAG Pipeline for Audit Report Generation

This notebook demonstrates how to augment your fine-tuned Audit LLM with a Retrieval-Augmented Generation (RAG) system.

**Goal**: Combine the *internal knowledge* of your fine-tuned model (style, general audit rules) with *external knowledge* (specific client data, recent financial statements) to generate accurate reports.

## 1. Setup
We use `langchain`, `chromadb` (vector store), and `sentence-transformers` (embeddings).

In [1]:
!pip install -q langchain langchain-classic langchain-text-splitters langchain-community langchain-huggingface chromadb sentence-transformers bitsandbytes accelerate peft

In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter  
from langchain_classic.chains.retrieval_qa.base import RetrievalQA
from langchain_core.prompts import PromptTemplate
from langchain_community.docstore.document import Document  

print("‚úÖ All imports successful!")

‚úÖ All imports successful!


In [3]:
# Mount Google Drive (for Colab)
import sys
from pathlib import Path
if 'google.colab' in sys.modules:
    from google.colab import drive
    try:
        drive.mount('/content/drive')
        print("‚úÖ Google Drive mounted successfully!")
    except Exception as e:
        print(f"‚ö†Ô∏è Drive mount failed: {e}")
else:
    print("‚ÑπÔ∏è Not running in Colab. Skipping Drive mount.")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
‚úÖ Google Drive mounted successfully!


## 2. Load Your Fine-Tuned Model
We load the model you saved in the previous step.

In [4]:
# Path to your saved fine-tuned model (adapters)
# Ensure this path is correct in your environment
MODEL_PATH = "/content/drive/MyDrive/Self_Supervised_finetuning_Model/audit-mistral-7b-dpo"
BASE_MODEL_ID = "mistralai/Mistral-7B-v0.1"
# Clear GPU cache first
import gc
torch.cuda.empty_cache()
gc.collect()
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Load Adapters
from peft import PeftModel
model = PeftModel.from_pretrained(model, MODEL_PATH)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Create Text Generation Pipeline
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2, # Low temp for factual generation
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=512,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

Loading model...


Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


## 3. Prepare RAG Knowledge Base
**Task**: Populate `enterprise_documents` below with your specific company data (financials, meeting minutes, internal memos).

In [5]:
# --- PASTE YOUR DATA HERE ---
enterprise_data = [

    # --- General Information ---
    "Mazars LLP is a UK Limited Liability Partnership and part of the Mazars international integrated partnership network.",
    "The inspection was conducted by the Financial Reporting Council (FRC) Audit Inspection Unit (AIU) for the year ended 31 March 2012.",
    "The inspection period covered April 2011 to November 2011 and reviewed audits with financial year ends between 31 August 2010 and 31 March 2011.",

    # --- Firm Structure & Financial Data ---
    "Mazars LLP had 15 UK offices as of 31 August 2011.",
    "The firm's turnover for the year ended 31 August 2011 was ¬£109.1 million.",
    "¬£44.3 million of turnover related to audit and assurance services.",
    "The firm had 108 partners, of whom 53 were authorized to sign audit reports, plus four audit directors.",
    "The AIU estimated the firm audited eleven entities within inspection scope, including two main market listed entities.",

    "The AIU reviewed four audit engagements selected on a risk basis.",
    "Two audits were performed to a good standard, one was acceptable with improvements required, and one required significant improvement.",
    "Areas of particular focus included going concern, valuation of assets at fair value, impairment of assets, revenue recognition, related parties, group audits, and reporting to Audit Committees.",

    # --- Key Findings ---
    "Weaknesses were identified in audit evidence sufficiency in three of the four audits reviewed.",
    "Going concern assessments were inadequate in three of the four audits; in two cases no formal assessment was obtained from management.",
    "Insufficient procedures were performed to assess the independence and competence of experts in three audits.",
    "Substantive analytical procedures were weak in three audits due to imprecise expectations and insufficient corroboration.",
    "In two audits, insufficient work was performed regarding management override of controls and journal testing.",
    "Weaknesses were identified in communications with Audit Committees across all four audits.",
    "In some cases, independence threats from non-audit services were not properly identified or mitigated.",

]
# ----------------------------

# Convert strings to Document objects
docs = [Document(page_content=text) for text in enterprise_data]

# Splitter (if you have long documents)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_splits = text_splitter.split_documents(docs)

# Create Vector Database
# We use 'all-MiniLM-L6-v2' for embeddings (fast and effective)
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vectorstore = Chroma.from_documents(documents=all_splits, embedding=embedding_model)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3}) # Retrieve top 3 relevant chunks

 # --- Governance & Independence Issues ---
    "Non-audit fees exceeded audit fees significantly in certain engagements, creating independence threats.",
    "The firm lacked a central register for monitoring business relationships with audited entities.",
    "Performance appraisals included references to selling non-audit services to audit clients, contrary to ethical standards.",
    "The firm‚Äôs central approval process for audit acceptance only covered high-risk entities.",
    "There was no formal process to reassess audit reports if Annual Quality Review findings raised concerns.",

    # --- Regulatory Outcome ---
    "The AIU recommended that Mazars LLP‚Äôs audit registration be continued.",
    "The firm committed to continuous improvement and agreed to address identified weaknesses."

    # --- Inspection Dates & Timeline ---
    "Inspection year covered: year ended 31 March 2012.",
    "Inspection fieldwork period: April 2011 to November 2011.",
    "Audit financial year ends reviewed ranged from 31 August 2010 to 31 March 2011.",
    "Public report issued on 10 May 2012.",
    "Private report finalized in March 2012.",
    "Previous limited inspection occurred in 2009.",
    "Inspection cycle frequency: every 2 years.",

    # --- Firm Size & Financial Metrics ---
    "Total UK offices: 15.",
    "Total turnover for year ended 31 August 2011: ¬£109.1 million.",
    "Audit and assurance revenue: ¬£44.3 million.",
    "Audit revenue percentage of total turnover: approximately 40.6 percent.",
    "Total partners: 108.",
    "Partners authorized to sign audit reports: 53.",
    "Audit directors authorized to sign reports: 4.",
    "Estimated audited entities within AIU scope: 11.",
    "Entities listed on main London Stock Exchange market: 2.",

    # --- Audit Review Sample Statistics ---
    "Total audit engagements reviewed: 4.",
    "Audits rated good standard: 2 (50%).",
    "Audits rated acceptable with improvements required: 1 (25%).",
    "Audits requiring significant improvement: 1 (25%).",
    "Audits with insufficient audit evidence findings: 3 out of 4 (75%).",
    "Audits with going concern weaknesses: 3 out of 4 (75%).",
    "Audits with insufficient expert independence assessment: 3 out of 4 (75%).",
    "Audits with weaknesses in substantive analytical procedures: 3 out of 4 (75%).",
    "Audits with fraud-related testing deficiencies: 2 out of 4 (50%).",
    "Audits with Audit Committee communication weaknesses: 4 out of 4 (100%).",

    # --- Independence & Fee Ratios ---
    "In one listed company audit, non-audit fees exceeded four times the audit fee (greater than 400%).",
    "In one non-listed company audit, non-audit fees exceeded twice the audit fee (greater than 200%).",

    # --- Governance & Partner Oversight ---
    "Engagement partner had 8 consecutive years of senior involvement in one audit without independent partner review.",
    "Central acceptance approval process covered only entities classified as high risk.",
    "Firm operates under 2 business units: Public Interest Entities and Owner Managed Businesses.",

    # --- Structural & Regulatory Data ---
    "Registered office located at Tower Bridge House, London E1W 1DD.",
    "FRC headquarters located at 71-91 Aldwych, London WC2B 4HN.",
    "Company registration number: 2486368.",
    "Mazars LLP registration number: OC308299."


## 4. Setup RAG Chain
We define a custom prompt that forces the model to use the retrieved context.

In [6]:
template = """
You are an expert auditor provided with specific internal documents.
Use the following pieces of context to write a section of the audit report.
If the answer is not in the context, say you don't have enough information.

Context:
{context}

Instruction: {question}

Audit Report Section:
"""

prompt = PromptTemplate(
    template=template, 
    input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)

## 5. Generate Report
Now we ask the model to generate report sections based on the loaded data.

In [7]:
# Example 1: Revenue Generation
query = "bult a audit report on Mazars LLP "
result = qa_chain.invoke(query)
print("--- Generated Revenue Section ---\n")
print(result['result'])

# Example 2: Risk Assessment
""" print("\n--- Generated Risk Section ---\n")
query = "What are the key audit matters regarding market risks?"
result = qa_chain.invoke(query)
print(result['result']) """

--- Generated Revenue Section ---


You are an expert auditor provided with specific internal documents.
Use the following pieces of context to write a section of the audit report.
If the answer is not in the context, say you don't have enough information.

Context:
Mazars LLP is a UK Limited Liability Partnership and part of the Mazars international integrated partnership network.

Mazars LLP had 15 UK offices as of 31 August 2011.

¬£44.3 million of turnover related to audit and assurance services.

Instruction: bult a audit report on Mazars LLP 

Audit Report Section:

The audit report should be written in a clear and concise manner. It should be written in a professional tone and should include all relevant information. The report should be easy to understand and should be free from any errors or omissions.


' print("\n--- Generated Risk Section ---\n")\nquery = "What are the key audit matters regarding market risks?"\nresult = qa_chain.invoke(query)\nprint(result[\'result\']) '

In [8]:
from sentence_transformers import SentenceTransformer, util

# 1. Run the RAG Query
query = "What was Mazars LLP's total turnover and how many partners did they have in 2011?"
print(f"üîç Question: {query}")

# Get response from your RAG system
result = qa_chain.invoke(query)
model_response = result['result'] # <--- This defines the variable!

print(f"ü§ñ AI Response: {model_response[:200]}...")

# 2. Check Similarity (Relativity)
print("\n" + "-"*50)
print("üéØ CALULCATING RELATIVITY (Cosine Similarity)")
print("-"*50)

sim_model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode Query and Response
emb_query = sim_model.encode(query, convert_to_tensor=True)
emb_resp = sim_model.encode(model_response, convert_to_tensor=True)

# Calculate Score
score = util.cos_sim(emb_query, emb_resp).item()

print(f"Relativity Score: {score:.4f}")

if score > 0.6:
    print("‚úÖ The model is staying on topic.")
else:
    print("‚ùå The model is talking about something else (High risk of hallucination).")

üîç Question: What was Mazars LLP's total turnover and how many partners did they have in 2011?
ü§ñ AI Response: 
You are an expert auditor provided with specific internal documents.
Use the following pieces of context to write a section of the audit report.
If the answer is not in the context, say you don't hav...

--------------------------------------------------
üéØ CALULCATING RELATIVITY (Cosine Similarity)
--------------------------------------------------
Relativity Score: 0.7767
‚úÖ The model is staying on topic.


## 6. LLM-as-a-Judge: Evaluate Response Accuracy

This section uses a separate LLM (or the same model) to judge whether the generated responses are grounded in the provided enterprise data or if the model is hallucinating.

In [38]:
!pip install -q -U google-genai

[2K     [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m53.1/53.1 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m724.7/724.7 kB[0m [31m15.0 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h

In [None]:
from google import genai
from google.genai import types
import json
import re
from typing import List, Dict, Any

# 1. Setup API Key and Client
GEMINI_API_KEY = "Your-Api-Key"  

# Initialize the new Client
client = genai.Client(api_key=GEMINI_API_KEY)

# Using the requested model
MODEL_ID = "gemini-2.5-flash"

def extract_json(text: str) -> Dict[str, Any]:
    """Helper to safely get JSON from markdown response"""
    # Clean up markdown code blocks if present
    text = re.sub(r"```(json)?", "", text)
    text = text.replace("```", "").strip()
    
    try:
        # Try direct load
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback: Try to find { ... } block
        match = re.search(r'\{.*\}', text, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
        return None

def llm_as_judge_evaluation(
    relevant_data: List[str],
    query: str,
    llm_response: str
) -> Dict[str, Any]:
    
    full_context = "\n".join([f"- {item}" for item in relevant_data])
    
    # Prompt explicitly asking for JSON
    prompt = f"""
    You are an expert Audit Fact-Checker.
    VERIFY if the AI Response is FULLY supported by the Provided Data.
    
    Provided Data:
    {full_context}
    
    User Query: {query}
    AI Response: {llm_response}
    
    Task:
    1. Identify any facts in the AI Response NOT present in Provided Data (Hallucinations).
    2. Respond with ONLY a JSON object.
    
    Format:
    {{
        "faithfulness_score": <int 0-100>,
        "hallucination_detected": <bool>,
        "hallucinated_facts": [<list of strings>],
        "verdict": "<short explanation>"
    }}
    """
    
    try:
        print(f"üîÑ Calling Gemini SDK ({MODEL_ID})...")
        
        # New SDK Call Structure
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=prompt,
            config=types.GenerateContentConfig(
                temperature=0.0,
                response_mime_type="application/json"
            )
        )
        
        # Access text directly from the response object
        # Note: The new SDK sometimes returns response.text or you access candidates
        result_text = response.text 
        
        judgment = extract_json(result_text)
        
        if judgment:
            print("‚úÖ Evaluation Complete!")
            return judgment
        else:
            raise ValueError(f"Empty or invalid JSON response. Raw text: {result_text}")

    except Exception as e:
        print(f"‚ùå Gemini SDK Error: {e}")
        return {
            "faithfulness_score": 0,
            "hallucination_detected": True,
            "hallucinated_facts": [f"Error: {str(e)}"],
            "verdict": "Evaluation Failed"
        }

# --- OPTIMIZED TEST: Single Query with Focused Data ---
print("="*70)
print("üîç FOCUSED HALLUCINATION DETECTION TEST (Gemini 2.5 Flash)")
print("="*70)

# Select ONLY relevant data for this specific test
test_query = "What was Mazars LLP's total turnover and how many partners did they have in 2011?"

# Manually select only the relevant facts (simulating what RAG retriever would return)
relevant_facts = [
    "The firm's turnover for the year ended 31 August 2011 was ¬£109.1 million.",
    "¬£44.3 million of turnover related to audit and assurance services.",
    "The firm had 108 partners, of whom 53 were authorized to sign audit reports, plus four audit directors.",
    "Total UK offices: 15.",
    "Audit revenue percentage of total turnover: approximately 40.6 percent."
]

print(f"\nüìã QUERY: {test_query}")
print(f"\nüìä RELEVANT DATA SENT TO MODEL ({len(relevant_facts)} facts):")
for i, fact in enumerate(relevant_facts, 1):
    print(f"  {i}. {fact}")

# Check if qa_chain is defined before invoking
if 'qa_chain' in globals():
    try:
        # Get model response
        result = qa_chain.invoke(test_query)
        model_response = result['result']

        print(f"\nü§ñ MODEL'S RESPONSE:")
        print(model_response)

        # Judge evaluation
        print("\n" + "-"*70)
        print("‚öñÔ∏è GEMINI HALLUCINATION CHECK:")
        print("-"*70)

        judgment = llm_as_judge_evaluation(
            relevant_data=relevant_facts,  
            query=test_query,
            llm_response=model_response
        )

        print(json.dumps(judgment, indent=2))

        # Clear verdict
        if judgment.get('hallucination_detected'):
            print("\n‚ùå HALLUCINATION DETECTED!")
            print("\nüö® Facts NOT in the provided data:")
            for fact in judgment.get('hallucinated_facts', []):
                print(f"  ‚Ä¢ {fact}")
        else:
            print("\n‚úÖ ALL FACTS ARE GROUNDED IN PROVIDED DATA")

        print(f"\nüìä Faithfulness Score: {judgment.get('faithfulness_score', 0)}/100")
            
    except Exception as e:
        print(f"\n‚ùå Error during RAG execution: {e}")
else:
    print("\n‚ö†Ô∏è 'qa_chain' is not defined. Please run the RAG Setup cells above first.")

üîç FOCUSED HALLUCINATION DETECTION TEST (Gemini 2.5 Flash)

üìã QUERY: What was Mazars LLP's total turnover and how many partners did they have in 2011?

üìä RELEVANT DATA SENT TO MODEL (5 facts):
  1. The firm's turnover for the year ended 31 August 2011 was ¬£109.1 million.
  2. ¬£44.3 million of turnover related to audit and assurance services.
  3. The firm had 108 partners, of whom 53 were authorized to sign audit reports, plus four audit directors.
  4. Total UK offices: 15.
  5. Audit revenue percentage of total turnover: approximately 40.6 percent.

ü§ñ MODEL'S RESPONSE:

You are an expert auditor provided with specific internal documents.
Use the following pieces of context to write a section of the audit report.
If the answer is not in the context, say you don't have enough information.

Context:
Mazars LLP had 15 UK offices as of 31 August 2011.

Mazars LLP had 15 UK offices as of 31 August 2011.

Mazars LLP is a UK Limited Liability Partnership and part of the Mazars i