# Module 771764 – MSc Research Project

## StructureGPT: Multi-Model Retrieval-Augmented Generation System for UK Building Regulations using Low-Rank Adaptation and Quantization
### Student ID: 202403820 | Samuel Datubo Jaja
### MSc Artificial Intelligence & Data Science | DAIM - Data-Science Artificial Intelligence & Modelling

# Notebook 2 - GOV.UK FAQ Data Curation & Fine-tuning (LoRA & Quantization) 

## Data Curation 

In [1]:
# GPT-4o mini ($0.15 input / $0.60 output per 1M tokens) is used instead of GPT-3.5 Turbo ($0.50 / $1.50 per 1M tokens): it is cheaper and comprehends the regulation text better.
# A hybrid dataset of 3,000 instruction-style Q&A pairs was curated from the UK Building Regulations.
# Pre-existing GOV.UK FAQs were used where available (10 PDFs); sections lacking sufficient coverage were extended using GOV.UK PDF content.
# GPT-4o mini was prompted to convert regulation text into structured Q&A format. The dataset was balanced across sections and aligned with the instruction-pair format.
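
The price difference can be made concrete with a small calculation. This is a rough sketch only: the token counts are hypothetical placeholders, not figures measured in this notebook, and `estimate_cost` is an illustrative helper.

```python
# Prices in USD per 1M tokens as (input, output).
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-3.5-turbo": (0.50, 1.50),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated API cost in USD for the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens and 1M output tokens.
for model in PRICES:
    print(model, round(estimate_cost(model, 2_000_000, 1_000_000), 2))
```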

In [None]:
#Install required packages: OpenAI API client, PDF reader, and progress bar   
!pip install openai pypdf tqdm

Defaulting to user installation because normal site-packages is not writeable
Collecting openai
  Downloading openai-1.65.5-py3-none-any.whl.metadata (27 kB)
Collecting pypdf
  Downloading pypdf-5.3.1-py3-none-any.whl.metadata (7.3 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.8.2-cp312-cp312-win_amd64.whl.metadata (5.3 kB)
Downloading openai-1.65.5-py3-none-any.whl (474 kB)
Downloading pypdf-5.3.1-py3-none-any.whl (302 kB)


In [None]:
import os
import json
import openai
import time
import re
from pypdf import PdfReader
from tqdm import tqdm
from collections import defaultdict

#OpenAI API key from environment variable
openai.api_key = os.environ.get("OPENAI_API_KEY") 

In [None]:
#Function to extract text from PDFs (with basic encryption handling)
def extract_text_from_pdf(pdf_path):
    """
    Extracts text from a PDF file, handling encrypted files by attempting decryption.

    Args:
        pdf_path (str): Path to the PDF file.

    Returns:
        str: Extracted text from the PDF, or an empty string if extraction fails.
    """
    try:
        reader = PdfReader(pdf_path)

        # Check if PDF is encrypted and try to decrypt with empty password
        if reader.is_encrypted:
            print("PDF is encrypted, attempting to unlock with empty password...")
            success = reader.decrypt("")
            if not success:
                print(f"Could not decrypt {pdf_path}. You may need to provide a password.")
                password = input(f"Enter password for {os.path.basename(pdf_path)} (or press Enter to skip): ")
                if password:
                    success = reader.decrypt(password)
                    if not success:
                        print(f"Incorrect password. Skipping {pdf_path}")
                        return ""
                else:
                    return ""

        # Concatenate page text; guard against pages that yield no text
        text = ""
        for page in reader.pages:
            text += (page.extract_text() or "") + "\n"
        return text
    except Exception as e:
        print(f"Error extracting text from {pdf_path}: {e}")
        return ""

In [None]:
# Function to identify building regulation section from filename or content
def identify_section(pdf_path, pdf_text):
    """
    Identifies the building regulation section (e.g., Part A, B, etc.) from the PDF filename or content.

    Args:
        pdf_path (str): Path to the PDF file.
        pdf_text (str): Extracted text from the PDF.

    Returns:
        str: Identified section letter (e.g., 'A', 'B'), or 'Unknown' if not identifiable.
    """
    filename = os.path.basename(pdf_path).lower()

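    #Try to find the part letter in the filename (Part_A, PartB, part-c, approved-document-R, etc.)
    #The letter must stand alone, so a longer word after "part" or "document" is not matched
    part_match = re.search(r'(?:part[_\-\s]?|document[_\-\s])([a-z])(?![a-z])', filename)
    if part_match:
        return part_match.group(1).upper()

    #Try to find it in the PDF content. Matching is case-sensitive and needs a single capital
    #letter, so phrases such as "part of" are not read as Part O
    content_match = re.search(r'\b(?:Part|PART)\s+([A-Z])\b', pdf_text[:5000])
    if content_match:
        return content_match.group(1)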

    #Checking for specific keywords in filename
    if "structure" in filename:
        return "A"
    elif "fire" in filename:
        return "B"
    elif "site" in filename or "contaminant" in filename:
        return "C"
    elif "toxic" in filename:
        return "D"
    elif "sound" in filename or "acoustic" in filename:
        return "E"
    elif "ventilation" in filename:
        return "F"
    elif "sanitation" in filename:
        return "G"
    elif "drainage" in filename:
        return "H"
    elif "glazing" in filename:
        return "N"

    # Ask user to identify if automated detection fails
    print(f"Could not automatically identify the section for {filename}")
    section = input("Please enter the section (A, B, C, etc.) or press Enter to mark as 'Unknown': ")
    return section.upper() if section.strip() else "Unknown"
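
The run log further down shows why the letter match has to be strict: `approved_document_A_Structure.pdf` and `approved-document_Q_Security in Dwellings.pdf` were both grouped under Section O, which is what a case-insensitive `part\s+([a-zA-Z])` returns for the phrase "part of". The sketch below checks a stricter rule in isolation. `detect_section` is a hypothetical helper, and it assumes filenames follow the `approved-document-X_...` pattern seen in the log.

```python
import re

# The captured letter must not be followed by another letter.
FILENAME_PATTERN = re.compile(
    r"(?:part[_\-\s]?|document[_\-\s])([a-z])(?![a-z])", re.IGNORECASE
)
# Case-sensitive on purpose: "Part B" matches, "part of" does not.
CONTENT_PATTERN = re.compile(r"\b(?:Part|PART)\s+([A-Z])\b")


def detect_section(filename, text=""):
    """Return the part letter from the filename, else from the text, else 'Unknown'."""
    match = FILENAME_PATTERN.search(filename)
    if match:
        return match.group(1).upper()
    match = CONTENT_PATTERN.search(text[:5000])
    if match:
        return match.group(1)
    return "Unknown"


print(detect_section("approved-document-R_Infrastructure_Electronic_communications.pdf"))
print(detect_section("approved_document_A_Structure.pdf", "This forms part of the guidance"))
print(detect_section("guidance.pdf", "This forms part of Part B of Schedule 1"))
```

Filename matches take priority over content matches, which keeps a document from being assigned to another part merely because it cites that part in its opening pages.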

In [None]:
# Function to split text into chunks that preserve paragraphs
def split_text(text, max_chunk_size=7500):
    """
    Splits the input text into chunks, preserving paragraph boundaries and ensuring each chunk does not exceed the maximum size.

    Args:
        text (str): The text to split.
        max_chunk_size (int, optional): Maximum size of each chunk. Defaults to 7500.

    Returns:
        list: List of text chunks.
    """
    # Split by paragraphs
    paragraphs = re.split(r'\n\s*\n', text)
    chunks = []
    current_chunk = []
    current_size = 0

    for paragraph in paragraphs:
        paragraph_size = len(paragraph)

        # If adding this paragraph would exceed the limit, save current chunk and start a new one
        if current_size + paragraph_size > max_chunk_size and current_chunk:
            chunks.append("\n\n".join(current_chunk))
            current_chunk = []
            current_size = 0

        # If a single paragraph is too large, split it by sentences
        if paragraph_size > max_chunk_size:
            sentences = re.split(r'(?<=[.!?])\s+', paragraph)
            sentence_chunk = []
            sentence_size = 0

            for sentence in sentences:
                if sentence_size + len(sentence) > max_chunk_size and sentence_chunk:
                    chunks.append(" ".join(sentence_chunk))
                    sentence_chunk = []
                    sentence_size = 0

                sentence_chunk.append(sentence)
                sentence_size += len(sentence) + 1  # +1 for space

            if sentence_chunk:
                current_chunk.append(" ".join(sentence_chunk))
                current_size += sentence_size
        else:
            current_chunk.append(paragraph)
            current_size += paragraph_size + 2  # +2 for newlines

    # Add the last chunk if it's not empty
    if current_chunk:
        chunks.append("\n\n".join(current_chunk))

    return chunks
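
The run log includes a chunk of 8633 characters, above the 7500 limit. That can happen when a single sentence is longer than `max_chunk_size`, because sentences are never split further. A minimal sketch of one way to enforce the limit strictly is shown here; `chunk_paragraphs` is a hypothetical variant, and it swaps sentence splitting for fixed-size slicing of oversized paragraphs, which may cut mid-sentence.

```python
import re


def chunk_paragraphs(text, max_chunk_size=7500):
    """Greedily pack paragraphs into chunks that never exceed max_chunk_size."""
    chunks, current, size = [], [], 0
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Slice anything longer than the limit into fixed-size pieces.
        pieces = [
            paragraph[i:i + max_chunk_size]
            for i in range(0, len(paragraph), max_chunk_size)
        ]
        for piece in pieces:
            sep = 2 if current else 0  # length of the "\n\n" joiner
            if current and size + sep + len(piece) > max_chunk_size:
                chunks.append("\n\n".join(current))
                current, size, sep = [], 0, 0
            current.append(piece)
            size += sep + len(piece)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "A" * 30 + "\n\n" + "B" * 120 + "\n\n" + "C" * 20
print([len(chunk) for chunk in chunk_paragraphs(sample, max_chunk_size=50)])
```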


In [None]:
# Function to generate Q&A pairs using GPT-4o mini with retry logic
def generate_qa_pairs(text_chunk, section, num_pairs=5, retries=3, base_delay=5):
    """
    Generates question-answer pairs from a text chunk using OpenAI's GPT-4o-mini model.

    Args:
        text_chunk (str): The text chunk to generate Q&A pairs from.
        section (str): The building regulation section associated with the text.
        num_pairs (int, optional): Number of Q&A pairs to generate. Defaults to 5.
        retries (int, optional): Number of retry attempts in case of failure. Defaults to 3.
        base_delay (int, optional): Base delay in seconds between retries. Defaults to 5.

    Returns:
        list: List of generated Q&A pairs.
    """
    for attempt in range(retries):
        try:
            response = openai.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant that creates training data for fine-tuning a model on UK building regulations."},
                    {"role": "user", "content": f"""
                    Create {num_pairs} detailed question-answer pairs from this UK building regulations Part {section} text.

                    Format your response as a JSON array of objects with these exact fields:
                    - "instruction": "Answer this question about UK building regulations Part {section}"
                    - "input": The question
                    - "output": The detailed answer

                    Make sure:
                    1. Questions are diverse and cover different aspects of Part {section}
                    2. Questions focus on technical details, measurements, requirements, compliance
                    3. Answers are detailed and accurate based on the text
                    4. Each answer includes specific references to regulations where possible

                    ONLY RETURN VALID JSON. Don't add any explanations before or after.

                    TEXT:
                    {text_chunk}
                    """}
                ],
                temperature=0.4,
            )

            #Parse the response
            content = response.choices[0].message.content.strip()

            #Extract JSON content (handles different formats)
            json_content = content
            if "```json" in content:
                json_content = content.split("```json")[1].split("```")[0].strip()
            elif "```" in content:
                json_content = content.split("```")[1].split("```")[0].strip()

            # Try to parse JSON
            try:
                qa_pairs = json.loads(json_content)
                return qa_pairs
            except json.JSONDecodeError as e:
                print(f"JSON parsing error: {e}")
                print("Attempting to clean JSON and retry...")

                # Try to fix common JSON issues
                json_content = json_content.replace("'", '"')  # Replace single quotes with double quotes (this also rewrites apostrophes inside text)
                json_content = re.sub(r',\s*}', '}', json_content)  # Remove trailing commas in objects
                json_content = re.sub(r',\s*]', ']', json_content)  # Remove trailing commas in arrays

                try:
                    qa_pairs = json.loads(json_content)
                    return qa_pairs
                except json.JSONDecodeError:
                    print("Failed to parse JSON even after cleaning")

        except Exception as e:
            delay = base_delay * (2 ** attempt)  # Exponential backoff
            print(f"Error on attempt {attempt+1}/{retries}: {e}")
            print(f"Retrying in {delay} seconds...")
            time.sleep(delay)

    print("All retries failed. Returning empty list.")
    return []
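
The parsing step can be tested without calling the API. The sketch below strips an optional code fence, parses the JSON, and keeps only entries that carry the three fields requested in the prompt. `parse_qa_json` and `REQUIRED_KEYS` are hypothetical names introduced for illustration and are not part of the notebook.

```python
import json
import re

REQUIRED_KEYS = {"instruction", "input", "output"}
FENCE = "`" * 3  # built programmatically so this example stays fence-safe
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_qa_json(content):
    """Return the well-formed Q&A dicts found in a model reply, or []."""
    # Use the body of the first fenced block if present, else the raw reply.
    fenced = FENCED_BLOCK.search(content)
    payload = fenced.group(1) if fenced else content
    try:
        data = json.loads(payload.strip())
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):
        data = [data]  # a single object was returned instead of an array
    if not isinstance(data, list):
        return []
    return [
        item for item in data
        if isinstance(item, dict)
        and REQUIRED_KEYS.issubset(item)
        and all(isinstance(item[k], str) and item[k].strip() for k in REQUIRED_KEYS)
    ]


reply = (
    FENCE + "json\n"
    '[{"instruction": "i", "input": "q", "output": "a"}, {"input": "no answer"}]\n'
    + FENCE
)
print(parse_qa_json(reply))
```

Dropping incomplete entries at this point means that later code reading `pair['input']` and `pair['output']` cannot hit a missing key.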


In [None]:
# Main function to process PDFs and generate training data by section
def create_training_data_by_section(pdf_directory, output_file="construction_training_data.jsonl", questions_per_section=200):
    """
    Processes PDFs in the specified directory to generate training data by building regulation section.

    Args:
        pdf_directory (str): Path to the directory containing PDF files.
        output_file (str, optional): Path to the output JSONL file. Defaults to "construction_training_data.jsonl".
        questions_per_section (int, optional): Number of questions to generate per section. Defaults to 200.

    Returns:
        tuple: A tuple containing:
            - all_qa_pairs (list): List of all generated Q&A pairs.
            - section_qa_files (dict): Dictionary mapping sections to their respective output files.
    """
    pdf_files = [f for f in os.listdir(pdf_directory) if f.lower().endswith('.pdf')]
    print(f"Found {len(pdf_files)} PDF files to process")

    # First pass: Extract text and identify sections
    section_pdfs = defaultdict(list)
    pdf_sections = {}
    pdf_texts = {}

    print("Step 1: Extracting text and identifying sections...")
    for pdf_file in tqdm(pdf_files, desc="Analyzing PDFs"):
        pdf_path = os.path.join(pdf_directory, pdf_file)

        # Extract text
        text = extract_text_from_pdf(pdf_path)
        if not text:
            print(f"  Skipping {pdf_file} - no text extracted")
            continue

        # Identify section
        section = identify_section(pdf_path, text)
        pdf_sections[pdf_file] = section
        pdf_texts[pdf_file] = text
        section_pdfs[section].append(pdf_file)

    print("\nIdentified sections:")
    for section, pdfs in section_pdfs.items():
        print(f"  Section {section}: {len(pdfs)} PDFs")

    # Second pass: Process each section to generate exactly N questions
    all_qa_pairs = []
    section_qa_files = {}

    print("\nStep 2: Generating QA pairs by section...")
    for section, pdfs in section_pdfs.items():
        print(f"\nProcessing Section {section} ({len(pdfs)} PDFs)...")

        # Create a list of all chunks for this section
        section_chunks = []
        pdf_chunk_map = {}  # Maps chunk index to pdf filename

        for pdf_file in pdfs:
            chunks = split_text(pdf_texts[pdf_file])
            for chunk in chunks:
                section_chunks.append(chunk)
                pdf_chunk_map[len(section_chunks) - 1] = pdf_file

        print(f"  Total chunks for Section {section}: {len(section_chunks)}")

        # Calculate questions per chunk
        total_chunks = len(section_chunks)
        if total_chunks == 0:
            print(f"  No content for Section {section}. Skipping.")
            continue

        base_questions_per_chunk = questions_per_section // total_chunks
        extra_questions = questions_per_section % total_chunks

        # Generate QA pairs for each chunk
        section_pairs = []
        questions_generated = 0

        print(f"  Target: {questions_per_section} questions ({base_questions_per_chunk} per chunk + {extra_questions} extra)")

        for i, chunk in enumerate(tqdm(section_chunks, desc=f"Generating QA for Section {section}")):
            # Calculate how many questions to generate from this chunk
            questions_needed = base_questions_per_chunk
            if i < extra_questions:
                questions_needed += 1

            # If we're near the end, adjust to hit exactly our target
            remaining_chunks = total_chunks - i
            questions_remaining = questions_per_section - questions_generated
            if remaining_chunks > 0:
                questions_needed = min(questions_needed, questions_remaining // remaining_chunks)
                if i == total_chunks - 1:  # Last chunk
                    questions_needed = questions_remaining

            # Skip if no questions needed
            if questions_needed <= 0:
                continue

            pdf_file = pdf_chunk_map[i]
            print(f"\n  Chunk {i+1}/{total_chunks} from {pdf_file} ({len(chunk)} chars)")
            print(f"  Generating {questions_needed} questions (total so far: {questions_generated}/{questions_per_section})")

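            pairs = generate_qa_pairs(chunk, section, num_pairs=questions_needed)
            # Keep only dict entries and never exceed the number requested for this chunk
            pairs = [pair for pair in pairs if isinstance(pair, dict)][:questions_needed]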

            # Add metadata
            for pair in pairs:
                pair["source"] = pdf_file
                pair["section"] = section
                pair["instruction"] = f"Answer this question about UK building regulations Part {section}"

            # Track progress
            questions_generated += len(pairs)
            section_pairs.extend(pairs)
            print(f"  Generated {len(pairs)} pairs. Running total: {questions_generated}/{questions_per_section}")

            # Save intermediate results
            section_output = f"section_{section}_qa.jsonl"
            with open(section_output, 'w', encoding='utf-8') as f:
                for pair in section_pairs:
                    f.write(json.dumps(pair, ensure_ascii=False) + '\n')

            section_qa_files[section] = section_output

            # Add adaptive delay based on token usage
            delay = 5 + (len(chunk) // 5000)  # 5 seconds base + 1 second per 5000 chars
            if i < total_chunks - 1:  # Don't delay after the last chunk
                print(f"  Waiting {delay}s before next chunk...")
                time.sleep(delay)

        # Add to overall results
        all_qa_pairs.extend(section_pairs)
        print(f"\nSection {section} complete: {len(section_pairs)} QA pairs")

    # Save all results to main output file
    with open(output_file, 'w', encoding='utf-8') as f:
        for pair in all_qa_pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + '\n')

    print(f"\nAll sections processed.")
    print(f"Total generated: {len(all_qa_pairs)} QA pairs across {len(section_pdfs)} sections")
    print(f"Final results saved to {output_file}")

    # Print summary of each section
    print("\nSummary by section:")
    for section in sorted(section_pdfs.keys()):
        section_count = len([p for p in all_qa_pairs if p.get("section") == section])
        print(f"  Section {section}: {section_count} questions")

    return all_qa_pairs, section_qa_files
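
The allocation logic is easier to check in isolation. In the run log, the last chunk of Section A was asked for 82 questions to cover earlier shortfalls and returned 84, so the section ended at 202/200. The sketch below shows the even split, a capped fair-share quota, and trimming of surplus pairs. All three helpers are hypothetical, and `max_per_request` is an assumed cap, not a documented API limit.

```python
import math


def allocate_questions(target, n_chunks):
    """Split `target` questions across `n_chunks` as evenly as possible."""
    base, extra = divmod(target, n_chunks)
    return [base + 1 if i < extra else base for i in range(n_chunks)]


def next_quota(questions_remaining, chunks_remaining, max_per_request=30):
    """Fair share of the outstanding questions, capped per request."""
    if questions_remaining <= 0 or chunks_remaining <= 0:
        return 0
    fair_share = math.ceil(questions_remaining / chunks_remaining)
    return min(fair_share, max_per_request)


def trim_to_quota(pairs, quota):
    """Discard any pairs returned beyond the requested quota."""
    return pairs[:quota]


print(allocate_questions(200, 7))
print(next_quota(82, 1))
print(len(trim_to_quota(list(range(84)), 82)))
```

With a per-request cap, a section may finish below its target when earlier chunks under-deliver. That trades exact counts for not asking one short chunk to supply dozens of questions.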

In [None]:
if __name__ == "__main__":
    #Get OpenAI API key if not set (getpass keeps the key out of the notebook output)
    if not openai.api_key or openai.api_key == "your_api_key_here":
        from getpass import getpass
        openai.api_key = getpass("Enter your OpenAI API key: ")

    #Get PDF directory
    pdf_directory = input("Enter the directory containing PDF files (default: ./uk_construction_bot/data/raw/documents): ")
    if not pdf_directory:
        pdf_directory = "./uk_construction_bot/data/raw/documents"

    #Ensure directory exists
    if not os.path.isdir(pdf_directory):
        raise SystemExit(f"Directory {pdf_directory} not found.")

    #Get number of questions per section
    questions_input = input("How many questions per section? (default: 200): ")
    questions_per_section = int(questions_input) if questions_input.strip() else 200

    #Run the process
    qa_pairs, section_files = create_training_data_by_section(
        pdf_directory,
        questions_per_section=questions_per_section
    )

    #Preview examples by section
    print("\nSample QA pairs by section:")
    for section in sorted(section_files.keys()):
        section_pairs = [p for p in qa_pairs if p.get("section") == section]
        if section_pairs:
            print(f"\nSection {section} examples:")
            for i, pair in enumerate(section_pairs[:2]):
                print(f"  Example {i+1} (from {pair.get('source', 'unknown')}):")
                print(f"  Q: {pair['input']}")
                print(f"  A: {pair['output'][:150]}..." if len(pair['output']) > 150 else f"  A: {pair['output']}")

Enter your OpenAI API key:  [REDACTED]
Enter the directory containing PDF files (default: ./uk_construction_bot/data/raw/documents):  
How many questions per section? (default: 200):  


Found 28 PDF files to process
Step 1: Extracting text and identifying sections...


Analyzing PDFs:   4%|██▍                                                                | 1/28 [00:00<00:15,  1.74it/s]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:   7%|████▊                                                              | 2/28 [00:01<00:12,  2.04it/s]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:  14%|█████████▌                                                         | 4/28 [00:02<00:16,  1.46it/s]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:  21%|██████████████▎                                                    | 6/28 [00:17<01:44,  4.76s/it]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:  25%|████████████████▊                                                  | 7/28 [00:24<01:52,  5.38s/it]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:  29%|███████████████████▏                                               | 8/28 [00:24<01:14,  3.74s/it]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:  39%|█████████████████████████▉                                        | 11/28 [00:38<01:01,  3.63s/it]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs:  46%|██████████████████████████████▋                                   | 13/28 [00:53<01:31,  6.09s/it]

PDF is encrypted, attempting to unlock with empty password...


Analyzing PDFs: 100%|██████████████████████████████████████████████████████████████████| 28/28 [01:19<00:00,  2.83s/it]



Identified sections:
  Section A: 2 PDFs
  Section P: 1 PDFs
  Section O: 5 PDFs
  Section T: 2 PDFs
  Section B: 2 PDFs
  Section D: 1 PDFs
  Section E: 2 PDFs
  Section F: 2 PDFs
  Section G: 2 PDFs
  Section H: 1 PDFs
  Section J: 1 PDFs
  Section K: 1 PDFs
  Section L: 1 PDFs
  Section S: 2 PDFs
  Section C: 1 PDFs
  Section M: 1 PDFs
  Section Q: 1 PDFs

Step 2: Generating QA pairs by section...

Processing Section A (2 PDFs)...
  Total chunks for Section A: 7
  Target: 200 questions (28 per chunk + 4 extra)


Generating QA for Section A:   0%|                                                               | 0/7 [00:00<?, ?it/s]


  Chunk 1/7 from approved-document-R_Infrastructure_Electronic_communications.pdf (7398 chars)
  Generating 28 questions (total so far: 0/200)
  Generated 24 pairs. Running total: 24/200
  Waiting 6s before next chunk...


Generating QA for Section A:  14%|███████▊                                               | 1/7 [00:31<03:11, 31.95s/it]


  Chunk 2/7 from approved-document-R_Infrastructure_Electronic_communications.pdf (7127 chars)
  Generating 29 questions (total so far: 24/200)
  Generated 24 pairs. Running total: 48/200
  Waiting 6s before next chunk...


Generating QA for Section A:  29%|███████████████▋                                       | 2/7 [01:19<03:24, 40.90s/it]


  Chunk 3/7 from approved-document-R_Infrastructure_Electronic_communications.pdf (7363 chars)
  Generating 29 questions (total so far: 48/200)
  Generated 24 pairs. Running total: 72/200
  Waiting 6s before next chunk...


Generating QA for Section A:  43%|███████████████████████▌                               | 3/7 [01:55<02:34, 38.69s/it]


  Chunk 4/7 from approved-document-R_Infrastructure_Electronic_communications.pdf (430 chars)
  Generating 29 questions (total so far: 72/200)
  Generated 24 pairs. Running total: 96/200
  Waiting 5s before next chunk...


Generating QA for Section A:  57%|███████████████████████████████▍                       | 4/7 [02:28<01:49, 36.66s/it]


  Chunk 5/7 from approved-document-R_Infrastructure_Electronic_communications.pdf (7307 chars)
  Generating 28 questions (total so far: 96/200)
  Generated 22 pairs. Running total: 118/200
  Waiting 6s before next chunk...


Generating QA for Section A:  71%|███████████████████████████████████████▎               | 5/7 [03:05<01:13, 36.84s/it]


  Chunk 6/7 from approved-document-R_Infrastructure_Electronic_communications.pdf (512 chars)
  Generating 28 questions (total so far: 118/200)
  Generated 0 pairs. Running total: 118/200
  Waiting 5s before next chunk...


Generating QA for Section A:  86%|███████████████████████████████████████████████▏       | 6/7 [03:11<00:26, 26.28s/it]


  Chunk 7/7 from Part_A_FAQ_approved_document_A_Structure.pdf (5767 chars)
  Generating 82 questions (total so far: 118/200)


Generating QA for Section A: 100%|███████████████████████████████████████████████████████| 7/7 [05:14<00:00, 44.93s/it]


  Generated 84 pairs. Running total: 202/200

Section A complete: 202 QA pairs

Processing Section P (1 PDFs)...
  Total chunks for Section P: 6
  Target: 200 questions (33 per chunk + 2 extra)


Generating QA for Section P:   0%|                                                               | 0/6 [00:00<?, ?it/s]


  Chunk 1/6 from approved-document_P_Electrical_Safety.pdf (7460 chars)
  Generating 33 questions (total so far: 0/200)
  Generated 27 pairs. Running total: 27/200
  Waiting 6s before next chunk...


Generating QA for Section P:  17%|█████████▏                                             | 1/6 [00:39<03:19, 39.95s/it]


  Chunk 2/6 from approved-document_P_Electrical_Safety.pdf (4613 chars)
  Generating 34 questions (total so far: 27/200)
  Generated 27 pairs. Running total: 54/200
  Waiting 5s before next chunk...


Generating QA for Section P:  33%|██████████████████▎                                    | 2/6 [01:20<02:40, 40.10s/it]


  Chunk 3/6 from approved-document_P_Electrical_Safety.pdf (5973 chars)
  Generating 33 questions (total so far: 54/200)
  Generated 25 pairs. Running total: 79/200
  Waiting 6s before next chunk...


Generating QA for Section P:  50%|███████████████████████████▌                           | 3/6 [02:05<02:07, 42.50s/it]


  Chunk 4/6 from approved-document_P_Electrical_Safety.pdf (5710 chars)
  Generating 33 questions (total so far: 79/200)
  Generated 28 pairs. Running total: 107/200
  Waiting 6s before next chunk...


Generating QA for Section P:  67%|████████████████████████████████████▋                  | 4/6 [02:41<01:19, 39.86s/it]


  Chunk 5/6 from approved-document_P_Electrical_Safety.pdf (6421 chars)
  Generating 33 questions (total so far: 107/200)
  Generated 28 pairs. Running total: 135/200
  Waiting 6s before next chunk...


Generating QA for Section P:  83%|█████████████████████████████████████████████▊         | 5/6 [03:21<00:39, 40.00s/it]


  Chunk 6/6 from approved-document_P_Electrical_Safety.pdf (5283 chars)
  Generating 65 questions (total so far: 135/200)


Generating QA for Section P: 100%|███████████████████████████████████████████████████████| 6/6 [04:21<00:00, 43.55s/it]


  Generated 43 pairs. Running total: 178/200

Section P complete: 178 QA pairs

Processing Section O (5 PDFs)...
  Total chunks for Section O: 63
  Target: 200 questions (3 per chunk + 11 extra)


Generating QA for Section O:   0%|                                                              | 0/63 [00:00<?, ?it/s]


  Chunk 1/63 from approved-document_Q_Security in Dwellings.pdf (5280 chars)
  Generating 3 questions (total so far: 0/200)
  Generated 3 pairs. Running total: 3/200
  Waiting 6s before next chunk...


Generating QA for Section O:   2%|▊                                                     | 1/63 [00:11<11:42, 11.34s/it]


  Chunk 2/63 from approved-document_Q_Security in Dwellings.pdf (7099 chars)
  Generating 3 questions (total so far: 3/200)
  Generated 3 pairs. Running total: 6/200
  Waiting 6s before next chunk...


Generating QA for Section O:   3%|█▋                                                    | 2/63 [00:24<12:47, 12.58s/it]


  Chunk 3/63 from approved-document_Q_Security in Dwellings.pdf (1786 chars)
  Generating 3 questions (total so far: 6/200)
  Generated 3 pairs. Running total: 9/200
  Waiting 5s before next chunk...


Generating QA for Section O:   5%|██▌                                                   | 3/63 [00:35<11:42, 11.70s/it]


  Chunk 4/63 from approved-document_Q_Security in Dwellings.pdf (7390 chars)
  Generating 3 questions (total so far: 9/200)
  Generated 3 pairs. Running total: 12/200
  Waiting 6s before next chunk...


Generating QA for Section O:   6%|███▍                                                  | 4/63 [00:50<12:55, 13.14s/it]


  Chunk 5/63 from approved-document_Q_Security in Dwellings.pdf (7155 chars)
  Generating 3 questions (total so far: 12/200)
  Generated 3 pairs. Running total: 15/200
  Waiting 6s before next chunk...


Generating QA for Section O:   8%|████▎                                                 | 5/63 [01:14<16:31, 17.10s/it]


  Chunk 6/63 from approved-document_Q_Security in Dwellings.pdf (2599 chars)
  Generating 3 questions (total so far: 15/200)
  Generated 3 pairs. Running total: 18/200
  Waiting 5s before next chunk...


Generating QA for Section O:  10%|█████▏                                                | 6/63 [01:27<14:38, 15.41s/it]


  Chunk 7/63 from approved_document_A_Structure.pdf (7467 chars)
  Generating 3 questions (total so far: 18/200)
  Generated 3 pairs. Running total: 21/200
  Waiting 6s before next chunk...


Generating QA for Section O:  11%|██████                                                | 7/63 [01:39<13:23, 14.34s/it]


  Chunk 8/63 from approved_document_A_Structure.pdf (5459 chars)
  Generating 3 questions (total so far: 21/200)
  Generated 3 pairs. Running total: 24/200
  Waiting 6s before next chunk...


Generating QA for Section O:  13%|██████▊                                               | 8/63 [01:50<12:24, 13.53s/it]


  Chunk 9/63 from approved_document_A_Structure.pdf (7071 chars)
  Generating 3 questions (total so far: 24/200)
  Generated 3 pairs. Running total: 27/200
  Waiting 6s before next chunk...


Generating QA for Section O:  14%|███████▋                                              | 9/63 [02:03<11:49, 13.14s/it]


  Chunk 10/63 from approved_document_A_Structure.pdf (8633 chars)
  Generating 3 questions (total so far: 27/200)
  Generated 3 pairs. Running total: 30/200
  Waiting 6s before next chunk...


Generating QA for Section O:  16%|████████▍                                            | 10/63 [02:15<11:15, 12.74s/it]


  Chunk 11/63 from approved_document_A_Structure.pdf (3318 chars)
  Generating 3 questions (total so far: 30/200)
  Generated 3 pairs. Running total: 33/200
  Waiting 5s before next chunk...


Generating QA for Section O:  17%|█████████▎                                           | 11/63 [02:25<10:32, 12.16s/it]


  Chunk 12/63 from approved_document_A_Structure.pdf (6764 chars)
  Generating 3 questions (total so far: 33/200)
  Generated 3 pairs. Running total: 36/200
  Waiting 6s before next chunk...


Generating QA for Section O:  19%|██████████                                           | 12/63 [02:42<11:35, 13.65s/it]


  Chunk 13/63 from approved_document_A_Structure.pdf (7487 chars)
  Generating 3 questions (total so far: 36/200)
  Generated 3 pairs. Running total: 39/200
  Waiting 6s before next chunk...


Generating QA for Section O:  21%|██████████▉                                          | 13/63 [02:54<10:56, 13.13s/it]


  Chunk 14/63 from approved_document_A_Structure.pdf (7407 chars)
  Generating 3 questions (total so far: 39/200)
  Generated 3 pairs. Running total: 42/200
  Waiting 6s before next chunk...


Generating QA for Section O:  22%|███████████▊                                         | 14/63 [03:05<10:05, 12.36s/it]


  Chunk 15/63 from approved_document_A_Structure.pdf (7018 chars)
  Generating 3 questions (total so far: 42/200)
  Generated 3 pairs. Running total: 45/200
  Waiting 6s before next chunk...


Generating QA for Section O:  24%|████████████▌                                        | 15/63 [03:18<10:06, 12.64s/it]


  Chunk 16/63 from approved_document_A_Structure.pdf (6424 chars)
  Generating 3 questions (total so far: 45/200)
  Generated 3 pairs. Running total: 48/200
  Waiting 6s before next chunk...


Generating QA for Section O:  25%|█████████████▍                                       | 16/63 [03:32<10:02, 12.82s/it]


  Chunk 17/63 from approved_document_A_Structure.pdf (7040 chars)
  Generating 3 questions (total so far: 48/200)
  Generated 3 pairs. Running total: 51/200
  Waiting 6s before next chunk...


Generating QA for Section O:  27%|██████████████▎                                      | 17/63 [03:45<09:59, 13.03s/it]


  Chunk 18/63 from approved_document_A_Structure.pdf (7019 chars)
  Generating 3 questions (total so far: 51/200)
  Generated 3 pairs. Running total: 54/200
  Waiting 6s before next chunk...


Generating QA for Section O:  29%|███████████████▏                                     | 18/63 [03:59<10:03, 13.41s/it]


  Chunk 19/63 from approved_document_A_Structure.pdf (4396 chars)
  Generating 3 questions (total so far: 54/200)
  Generated 3 pairs. Running total: 57/200
  Waiting 5s before next chunk...


Generating QA for Section O:  30%|███████████████▉                                     | 19/63 [04:10<09:19, 12.71s/it]


  Chunk 20/63 from approved_document_A_Structure.pdf (7309 chars)
  Generating 3 questions (total so far: 57/200)
  Generated 3 pairs. Running total: 60/200
  Waiting 6s before next chunk...


Generating QA for Section O:  32%|████████████████▊                                    | 20/63 [04:22<08:57, 12.50s/it]


  Chunk 21/63 from approved_document_A_Structure.pdf (4688 chars)
  Generating 3 questions (total so far: 60/200)
  Generated 3 pairs. Running total: 63/200
  Waiting 5s before next chunk...


Generating QA for Section O:  33%|█████████████████▋                                   | 21/63 [04:40<09:48, 14.01s/it]


  Chunk 22/63 from approved_document_A_Structure.pdf (7493 chars)
  Generating 3 questions (total so far: 63/200)
  Generated 3 pairs. Running total: 66/200
  Waiting 6s before next chunk...


Generating QA for Section O:  35%|██████████████████▌                                  | 22/63 [04:52<09:12, 13.49s/it]


  Chunk 23/63 from approved_document_A_Structure.pdf (5706 chars)
  Generating 3 questions (total so far: 66/200)
  Generated 3 pairs. Running total: 69/200
  Waiting 6s before next chunk...


Generating QA for Section O:  37%|███████████████████▎                                 | 23/63 [05:11<10:07, 15.20s/it]


  Chunk 24/63 from approved_document_A_Structure.pdf (3803 chars)
  Generating 3 questions (total so far: 69/200)
  Generated 3 pairs. Running total: 72/200
  Waiting 5s before next chunk...


Generating QA for Section O:  38%|████████████████████▏                                | 24/63 [05:21<08:42, 13.39s/it]


  Chunk 25/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6673 chars)
  Generating 3 questions (total so far: 72/200)
  Generated 3 pairs. Running total: 75/200
  Waiting 6s before next chunk...


Generating QA for Section O:  40%|█████████████████████                                | 25/63 [05:33<08:13, 12.97s/it]


  Chunk 26/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (2929 chars)
  Generating 3 questions (total so far: 75/200)
  Generated 3 pairs. Running total: 78/200
  Waiting 5s before next chunk...


Generating QA for Section O:  41%|█████████████████████▊                               | 26/63 [05:48<08:21, 13.56s/it]


  Chunk 27/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (4633 chars)
  Generating 3 questions (total so far: 78/200)
  Generated 3 pairs. Running total: 81/200
  Waiting 5s before next chunk...


Generating QA for Section O:  43%|██████████████████████▋                              | 27/63 [06:01<08:02, 13.40s/it]


  Chunk 28/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6937 chars)
  Generating 3 questions (total so far: 81/200)
  Generated 3 pairs. Running total: 84/200
  Waiting 6s before next chunk...


Generating QA for Section O:  44%|███████████████████████▌                             | 28/63 [06:16<08:15, 14.16s/it]


  Chunk 29/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6264 chars)
  Generating 3 questions (total so far: 84/200)
  Generated 3 pairs. Running total: 87/200
  Waiting 6s before next chunk...


Generating QA for Section O:  46%|████████████████████████▍                            | 29/63 [06:31<08:04, 14.25s/it]


  Chunk 30/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7279 chars)
  Generating 3 questions (total so far: 87/200)
  Generated 3 pairs. Running total: 90/200
  Waiting 6s before next chunk...


Generating QA for Section O:  48%|█████████████████████████▏                           | 30/63 [06:46<07:56, 14.42s/it]


  Chunk 31/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (2078 chars)
  Generating 3 questions (total so far: 90/200)
  Generated 3 pairs. Running total: 93/200
  Waiting 5s before next chunk...


Generating QA for Section O:  49%|██████████████████████████                           | 31/63 [06:56<07:04, 13.27s/it]


  Chunk 32/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7433 chars)
  Generating 3 questions (total so far: 93/200)
  Generated 3 pairs. Running total: 96/200
  Waiting 6s before next chunk...


Generating QA for Section O:  51%|██████████████████████████▉                          | 32/63 [07:09<06:45, 13.08s/it]


  Chunk 33/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7123 chars)
  Generating 3 questions (total so far: 96/200)
  Generated 3 pairs. Running total: 99/200
  Waiting 6s before next chunk...


Generating QA for Section O:  52%|███████████████████████████▊                         | 33/63 [07:20<06:16, 12.55s/it]


  Chunk 34/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (2932 chars)
  Generating 3 questions (total so far: 99/200)
  Generated 3 pairs. Running total: 102/200
  Waiting 5s before next chunk...


Generating QA for Section O:  54%|████████████████████████████▌                        | 34/63 [07:34<06:11, 12.80s/it]


  Chunk 35/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7438 chars)
  Generating 3 questions (total so far: 102/200)
  Generated 3 pairs. Running total: 105/200
  Waiting 6s before next chunk...


Generating QA for Section O:  56%|█████████████████████████████▍                       | 35/63 [07:51<06:40, 14.30s/it]


  Chunk 36/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6262 chars)
  Generating 3 questions (total so far: 105/200)
  Generated 3 pairs. Running total: 108/200
  Waiting 6s before next chunk...


Generating QA for Section O:  57%|██████████████████████████████▎                      | 36/63 [08:05<06:19, 14.04s/it]


  Chunk 37/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (4899 chars)
  Generating 3 questions (total so far: 108/200)
  Generated 3 pairs. Running total: 111/200
  Waiting 5s before next chunk...


Generating QA for Section O:  59%|███████████████████████████████▏                     | 37/63 [08:16<05:40, 13.10s/it]


  Chunk 38/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7455 chars)
  Generating 3 questions (total so far: 111/200)
  Generated 3 pairs. Running total: 114/200
  Waiting 6s before next chunk...


Generating QA for Section O:  60%|███████████████████████████████▉                     | 38/63 [08:31<05:45, 13.83s/it]


  Chunk 39/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (5274 chars)
  Generating 3 questions (total so far: 114/200)
  Generated 3 pairs. Running total: 117/200
  Waiting 6s before next chunk...


Generating QA for Section O:  62%|████████████████████████████████▊                    | 39/63 [08:47<05:45, 14.39s/it]


  Chunk 40/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (3794 chars)
  Generating 3 questions (total so far: 117/200)
  Generated 3 pairs. Running total: 120/200
  Waiting 5s before next chunk...


Generating QA for Section O:  63%|█████████████████████████████████▋                   | 40/63 [08:58<05:10, 13.49s/it]


  Chunk 41/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7405 chars)
  Generating 3 questions (total so far: 120/200)
  Generated 3 pairs. Running total: 123/200
  Waiting 6s before next chunk...


Generating QA for Section O:  65%|██████████████████████████████████▍                  | 41/63 [09:13<05:03, 13.81s/it]


  Chunk 42/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6231 chars)
  Generating 3 questions (total so far: 123/200)
  Generated 3 pairs. Running total: 126/200
  Waiting 6s before next chunk...


Generating QA for Section O:  67%|███████████████████████████████████▎                 | 42/63 [09:26<04:43, 13.51s/it]


  Chunk 43/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7313 chars)
  Generating 3 questions (total so far: 126/200)
  Generated 3 pairs. Running total: 129/200
  Waiting 6s before next chunk...


Generating QA for Section O:  68%|████████████████████████████████████▏                | 43/63 [09:42<04:44, 14.22s/it]


  Chunk 44/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (4567 chars)
  Generating 3 questions (total so far: 129/200)
  Generated 3 pairs. Running total: 132/200
  Waiting 5s before next chunk...


Generating QA for Section O:  70%|█████████████████████████████████████                | 44/63 [09:53<04:13, 13.34s/it]


  Chunk 45/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7410 chars)
  Generating 3 questions (total so far: 132/200)
  Generated 3 pairs. Running total: 135/200
  Waiting 6s before next chunk...


Generating QA for Section O:  71%|█████████████████████████████████████▊               | 45/63 [10:04<03:49, 12.76s/it]


  Chunk 46/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6454 chars)
  Generating 3 questions (total so far: 135/200)
  Generated 3 pairs. Running total: 138/200
  Waiting 6s before next chunk...


Generating QA for Section O:  73%|██████████████████████████████████████▋              | 46/63 [10:21<03:55, 13.84s/it]


  Chunk 47/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (5451 chars)
  Generating 3 questions (total so far: 138/200)
  Generated 3 pairs. Running total: 141/200
  Waiting 6s before next chunk...


Generating QA for Section O:  75%|███████████████████████████████████████▌             | 47/63 [10:34<03:39, 13.69s/it]


  Chunk 48/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7302 chars)
  Generating 3 questions (total so far: 141/200)
  Generated 3 pairs. Running total: 144/200
  Waiting 6s before next chunk...


Generating QA for Section O:  76%|████████████████████████████████████████▍            | 48/63 [10:46<03:17, 13.19s/it]


  Chunk 49/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (7190 chars)
  Generating 3 questions (total so far: 144/200)
  Generated 3 pairs. Running total: 147/200
  Waiting 6s before next chunk...


Generating QA for Section O:  78%|█████████████████████████████████████████▏           | 49/63 [10:58<02:57, 12.67s/it]


  Chunk 50/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (6023 chars)
  Generating 3 questions (total so far: 147/200)
  Generated 3 pairs. Running total: 150/200
  Waiting 6s before next chunk...


Generating QA for Section O:  79%|██████████████████████████████████████████           | 50/63 [11:09<02:39, 12.24s/it]


  Chunk 51/63 from Approved_Document_C_site preparation and resistance to contaminates and moisture.pdf (2055 chars)
  Generating 3 questions (total so far: 150/200)
  Generated 3 pairs. Running total: 153/200
  Waiting 5s before next chunk...


Generating QA for Section O:  81%|██████████████████████████████████████████▉          | 51/63 [11:20<02:23, 11.94s/it]


  Chunk 52/63 from approved_document_O_Overheating.pdf (7253 chars)
  Generating 3 questions (total so far: 153/200)
  Generated 3 pairs. Running total: 156/200
  Waiting 6s before next chunk...


Generating QA for Section O:  83%|███████████████████████████████████████████▋         | 52/63 [11:31<02:09, 11.75s/it]


  Chunk 53/63 from approved_document_O_Overheating.pdf (7413 chars)
  Generating 3 questions (total so far: 156/200)
  Generated 3 pairs. Running total: 159/200
  Waiting 6s before next chunk...


Generating QA for Section O:  84%|████████████████████████████████████████████▌        | 53/63 [11:45<02:04, 12.46s/it]


  Chunk 54/63 from approved_document_O_Overheating.pdf (7471 chars)
  Generating 3 questions (total so far: 159/200)
  Generated 3 pairs. Running total: 162/200
  Waiting 6s before next chunk...


Generating QA for Section O:  86%|█████████████████████████████████████████████▍       | 54/63 [11:59<01:53, 12.66s/it]


  Chunk 55/63 from approved_document_O_Overheating.pdf (6034 chars)
  Generating 3 questions (total so far: 162/200)
  Generated 3 pairs. Running total: 165/200
  Waiting 6s before next chunk...


Generating QA for Section O:  87%|██████████████████████████████████████████████▎      | 55/63 [12:13<01:44, 13.05s/it]


  Chunk 56/63 from approved_document_O_Overheating.pdf (7483 chars)
  Generating 3 questions (total so far: 165/200)
  Generated 3 pairs. Running total: 168/200
  Waiting 6s before next chunk...


Generating QA for Section O:  89%|███████████████████████████████████████████████      | 56/63 [12:26<01:32, 13.27s/it]


  Chunk 57/63 from approved_document_O_Overheating.pdf (5233 chars)
  Generating 3 questions (total so far: 168/200)
  Generated 3 pairs. Running total: 171/200
  Waiting 6s before next chunk...


Generating QA for Section O:  90%|███████████████████████████████████████████████▉     | 57/63 [12:45<01:29, 14.83s/it]


  Chunk 58/63 from approved_document_O_Overheating.pdf (7454 chars)
  Generating 3 questions (total so far: 171/200)
  Generated 3 pairs. Running total: 174/200
  Waiting 6s before next chunk...


Generating QA for Section O:  92%|████████████████████████████████████████████████▊    | 58/63 [12:57<01:10, 14.11s/it]


  Chunk 59/63 from approved_document_O_Overheating.pdf (5495 chars)
  Generating 3 questions (total so far: 174/200)
  Generated 3 pairs. Running total: 177/200
  Waiting 6s before next chunk...


Generating QA for Section O:  94%|█████████████████████████████████████████████████▋   | 59/63 [13:10<00:54, 13.63s/it]


  Chunk 60/63 from approved_document_O_Overheating.pdf (7261 chars)
  Generating 3 questions (total so far: 177/200)
  Generated 3 pairs. Running total: 180/200
  Waiting 6s before next chunk...


Generating QA for Section O:  95%|██████████████████████████████████████████████████▍  | 60/63 [13:21<00:38, 12.94s/it]


  Chunk 61/63 from approved_document_O_Overheating.pdf (4691 chars)
  Generating 3 questions (total so far: 180/200)
  Generated 3 pairs. Running total: 183/200
  Waiting 5s before next chunk...


Generating QA for Section O:  97%|███████████████████████████████████████████████████▎ | 61/63 [13:34<00:25, 12.90s/it]


  Chunk 62/63 from Part_O_FAQ_Approved Document O_ Overheating.pdf (7493 chars)
  Generating 3 questions (total so far: 183/200)
  Generated 3 pairs. Running total: 186/200
  Waiting 6s before next chunk...


Generating QA for Section O:  98%|████████████████████████████████████████████████████▏| 62/63 [13:45<00:12, 12.32s/it]


  Chunk 63/63 from Part_O_FAQ_Approved Document O_ Overheating.pdf (6940 chars)
  Generating 14 questions (total so far: 186/200)


Generating QA for Section O: 100%|█████████████████████████████████████████████████████| 63/63 [14:02<00:00, 13.37s/it]


  Generated 12 pairs. Running total: 198/200

Section O complete: 198 QA pairs

Processing Section T (2 PDFs)...
  Total chunks for Section T: 43
  Target: 200 questions (4 per chunk + 28 extra)


Generating QA for Section T:   0%|                                                              | 0/43 [00:00<?, ?it/s]


  Chunk 1/43 from ApprovedDocument_T Toilet accommodation.pdf (4721 chars)
  Generating 4 questions (total so far: 0/200)
  Generated 4 pairs. Running total: 4/200
  Waiting 5s before next chunk...


Generating QA for Section T:   2%|█▎                                                    | 1/43 [00:11<07:50, 11.21s/it]


  Chunk 2/43 from ApprovedDocument_T Toilet accommodation.pdf (5771 chars)
  Generating 4 questions (total so far: 4/200)
  Generated 4 pairs. Running total: 8/200
  Waiting 6s before next chunk...


Generating QA for Section T:   5%|██▌                                                   | 2/43 [00:23<08:15, 12.08s/it]


  Chunk 3/43 from ApprovedDocument_T Toilet accommodation.pdf (7179 chars)
  Generating 4 questions (total so far: 8/200)
  Generated 4 pairs. Running total: 12/200
  Waiting 6s before next chunk...


Generating QA for Section T:   7%|███▊                                                  | 3/43 [00:39<09:09, 13.73s/it]


  Chunk 4/43 from ApprovedDocument_T Toilet accommodation.pdf (6201 chars)
  Generating 4 questions (total so far: 12/200)
  Generated 4 pairs. Running total: 16/200
  Waiting 6s before next chunk...


Generating QA for Section T:   9%|█████                                                 | 4/43 [00:53<09:00, 13.87s/it]


  Chunk 5/43 from ApprovedDocument_T Toilet accommodation.pdf (5596 chars)
  Generating 4 questions (total so far: 16/200)
  Generated 4 pairs. Running total: 20/200
  Waiting 6s before next chunk...


Generating QA for Section T:  12%|██████▎                                               | 5/43 [01:06<08:36, 13.58s/it]


  Chunk 6/43 from ApprovedDocument_T Toilet accommodation.pdf (5727 chars)
  Generating 4 questions (total so far: 20/200)
  Generated 4 pairs. Running total: 24/200
  Waiting 6s before next chunk...


Generating QA for Section T:  14%|███████▌                                              | 6/43 [01:29<10:12, 16.55s/it]


  Chunk 7/43 from ApprovedDocument_T Toilet accommodation.pdf (7056 chars)
  Generating 4 questions (total so far: 24/200)
  Generated 4 pairs. Running total: 28/200
  Waiting 6s before next chunk...


Generating QA for Section T:  16%|████████▊                                             | 7/43 [01:43<09:27, 15.76s/it]


  Chunk 8/43 from ApprovedDocument_T Toilet accommodation.pdf (6855 chars)
  Generating 4 questions (total so far: 28/200)
  Generated 4 pairs. Running total: 32/200
  Waiting 6s before next chunk...


Generating QA for Section T:  19%|██████████                                            | 8/43 [01:55<08:33, 14.66s/it]


  Chunk 9/43 from ApprovedDocument_T Toilet accommodation.pdf (3223 chars)
  Generating 4 questions (total so far: 32/200)
  Generated 4 pairs. Running total: 36/200
  Waiting 5s before next chunk...


Generating QA for Section T:  21%|███████████▎                                          | 9/43 [02:06<07:42, 13.61s/it]


  Chunk 10/43 from ApprovedDocument_T Toilet accommodation.pdf (5565 chars)
  Generating 4 questions (total so far: 36/200)
  Generated 4 pairs. Running total: 40/200
  Waiting 6s before next chunk...


Generating QA for Section T:  23%|████████████▎                                        | 10/43 [02:19<07:20, 13.34s/it]


  Chunk 11/43 from ApprovedDocument_T Toilet accommodation.pdf (6235 chars)
  Generating 4 questions (total so far: 40/200)
  Generated 4 pairs. Running total: 44/200
  Waiting 6s before next chunk...


Generating QA for Section T:  26%|█████████████▌                                       | 11/43 [02:32<07:01, 13.17s/it]


  Chunk 12/43 from ApprovedDocument_T Toilet accommodation.pdf (3235 chars)
  Generating 4 questions (total so far: 44/200)
  Generated 4 pairs. Running total: 48/200
  Waiting 5s before next chunk...


Generating QA for Section T:  28%|██████████████▊                                      | 12/43 [02:46<06:59, 13.52s/it]


  Chunk 13/43 from approved_document_M_Access to and use of buildings.pdf (7468 chars)
  Generating 4 questions (total so far: 48/200)
  Generated 4 pairs. Running total: 52/200
  Waiting 6s before next chunk...


Generating QA for Section T:  30%|████████████████                                     | 13/43 [02:59<06:41, 13.38s/it]


  Chunk 14/43 from approved_document_M_Access to and use of buildings.pdf (6650 chars)
  Generating 4 questions (total so far: 52/200)
  Generated 4 pairs. Running total: 56/200
  Waiting 6s before next chunk...


Generating QA for Section T:  33%|█████████████████▎                                   | 14/43 [03:11<06:15, 12.95s/it]


  Chunk 15/43 from approved_document_M_Access to and use of buildings.pdf (4057 chars)
  Generating 4 questions (total so far: 56/200)
  Generated 4 pairs. Running total: 60/200
  Waiting 5s before next chunk...


Generating QA for Section T:  35%|██████████████████▍                                  | 15/43 [03:29<06:46, 14.53s/it]


  Chunk 16/43 from approved_document_M_Access to and use of buildings.pdf (3934 chars)
  Generating 5 questions (total so far: 60/200)
  Generated 5 pairs. Running total: 65/200
  Waiting 5s before next chunk...


Generating QA for Section T:  37%|███████████████████▋                                 | 16/43 [03:44<06:36, 14.68s/it]


  Chunk 17/43 from approved_document_M_Access to and use of buildings.pdf (7412 chars)
  Generating 5 questions (total so far: 65/200)
  Generated 5 pairs. Running total: 70/200
  Waiting 6s before next chunk...


Generating QA for Section T:  40%|████████████████████▉                                | 17/43 [03:58<06:17, 14.50s/it]


  Chunk 18/43 from approved_document_M_Access to and use of buildings.pdf (3884 chars)
  Generating 5 questions (total so far: 70/200)
  Generated 5 pairs. Running total: 75/200
  Waiting 5s before next chunk...


Generating QA for Section T:  42%|██████████████████████▏                              | 18/43 [04:18<06:37, 15.89s/it]


  Chunk 19/43 from approved_document_M_Access to and use of buildings.pdf (3659 chars)
  Generating 5 questions (total so far: 75/200)
  Generated 5 pairs. Running total: 80/200
  Waiting 5s before next chunk...


Generating QA for Section T:  44%|███████████████████████▍                             | 19/43 [04:38<06:51, 17.16s/it]


  Chunk 20/43 from approved_document_M_Access to and use of buildings.pdf (6300 chars)
  Generating 5 questions (total so far: 80/200)
  Generated 5 pairs. Running total: 85/200
  Waiting 6s before next chunk...


Generating QA for Section T:  47%|████████████████████████▋                            | 20/43 [04:55<06:33, 17.10s/it]


  Chunk 21/43 from approved_document_M_Access to and use of buildings.pdf (6186 chars)
  Generating 5 questions (total so far: 85/200)
  Generated 5 pairs. Running total: 90/200
  Waiting 6s before next chunk...


Generating QA for Section T:  49%|█████████████████████████▉                           | 21/43 [05:06<05:40, 15.48s/it]


  Chunk 22/43 from approved_document_M_Access to and use of buildings.pdf (5549 chars)
  Generating 5 questions (total so far: 90/200)
  Generated 5 pairs. Running total: 95/200
  Waiting 6s before next chunk...


Generating QA for Section T:  51%|███████████████████████████                          | 22/43 [05:30<06:17, 17.97s/it]


  Chunk 23/43 from approved_document_M_Access to and use of buildings.pdf (6752 chars)
  Generating 5 questions (total so far: 95/200)
  Generated 5 pairs. Running total: 100/200
  Waiting 6s before next chunk...


Generating QA for Section T:  53%|████████████████████████████▎                        | 23/43 [05:49<06:07, 18.37s/it]


  Chunk 24/43 from approved_document_M_Access to and use of buildings.pdf (2639 chars)
  Generating 5 questions (total so far: 100/200)
  Generated 5 pairs. Running total: 105/200
  Waiting 5s before next chunk...


Generating QA for Section T:  56%|█████████████████████████████▌                       | 24/43 [06:04<05:28, 17.27s/it]


  Chunk 25/43 from approved_document_M_Access to and use of buildings.pdf (7048 chars)
  Generating 5 questions (total so far: 105/200)
  Generated 5 pairs. Running total: 110/200
  Waiting 6s before next chunk...


Generating QA for Section T:  58%|██████████████████████████████▊                      | 25/43 [06:20<05:03, 16.87s/it]


  Chunk 26/43 from approved_document_M_Access to and use of buildings.pdf (7390 chars)
  Generating 5 questions (total so far: 110/200)
  Generated 5 pairs. Running total: 115/200
  Waiting 6s before next chunk...


Generating QA for Section T:  60%|████████████████████████████████                     | 26/43 [06:39<04:57, 17.52s/it]


  Chunk 27/43 from approved_document_M_Access to and use of buildings.pdf (7322 chars)
  Generating 5 questions (total so far: 115/200)
  Generated 5 pairs. Running total: 120/200
  Waiting 6s before next chunk...


Generating QA for Section T:  63%|█████████████████████████████████▎                   | 27/43 [06:54<04:28, 16.76s/it]


  Chunk 28/43 from approved_document_M_Access to and use of buildings.pdf (7174 chars)
  Generating 5 questions (total so far: 120/200)
  Generated 5 pairs. Running total: 125/200
  Waiting 6s before next chunk...


Generating QA for Section T:  65%|██████████████████████████████████▌                  | 28/43 [07:10<04:08, 16.57s/it]


  Chunk 29/43 from approved_document_M_Access to and use of buildings.pdf (5911 chars)
  Generating 4 questions (total so far: 125/200)
  Generated 4 pairs. Running total: 129/200
  Waiting 6s before next chunk...


Generating QA for Section T:  67%|███████████████████████████████████▋                 | 29/43 [07:33<04:16, 18.34s/it]


  Chunk 30/43 from approved_document_M_Access to and use of buildings.pdf (7014 chars)
  Generating 4 questions (total so far: 129/200)
  Generated 4 pairs. Running total: 133/200
  Waiting 6s before next chunk...


Generating QA for Section T:  70%|████████████████████████████████████▉                | 30/43 [07:53<04:05, 18.87s/it]


  Chunk 31/43 from approved_document_M_Access to and use of buildings.pdf (5910 chars)
  Generating 4 questions (total so far: 133/200)
  Generated 4 pairs. Running total: 137/200
  Waiting 6s before next chunk...


Generating QA for Section T:  72%|██████████████████████████████████████▏              | 31/43 [08:06<03:24, 17.04s/it]


  Chunk 32/43 from approved_document_M_Access to and use of buildings.pdf (5652 chars)
  Generating 4 questions (total so far: 137/200)
  Generated 4 pairs. Running total: 141/200
  Waiting 6s before next chunk...


Generating QA for Section T:  74%|███████████████████████████████████████▍             | 32/43 [08:24<03:13, 17.55s/it]


  Chunk 33/43 from approved_document_M_Access to and use of buildings.pdf (6467 chars)
  Generating 4 questions (total so far: 141/200)
  Generated 4 pairs. Running total: 145/200
  Waiting 6s before next chunk...


Generating QA for Section T:  77%|████████████████████████████████████████▋            | 33/43 [08:50<03:20, 20.09s/it]


  Chunk 34/43 from approved_document_M_Access to and use of buildings.pdf (6956 chars)
  Generating 4 questions (total so far: 145/200)
  Generated 4 pairs. Running total: 149/200
  Waiting 6s before next chunk...


Generating QA for Section T:  79%|█████████████████████████████████████████▉           | 34/43 [09:03<02:40, 17.88s/it]


  Chunk 35/43 from approved_document_M_Access to and use of buildings.pdf (6916 chars)
  Generating 4 questions (total so far: 149/200)
  Generated 4 pairs. Running total: 153/200
  Waiting 6s before next chunk...


Generating QA for Section T:  81%|███████████████████████████████████████████▏         | 35/43 [09:26<02:36, 19.54s/it]


  Chunk 36/43 from approved_document_M_Access to and use of buildings.pdf (3393 chars)
  Generating 4 questions (total so far: 153/200)
  Generated 4 pairs. Running total: 157/200
  Waiting 5s before next chunk...


Generating QA for Section T:  84%|████████████████████████████████████████████▎        | 36/43 [09:41<02:06, 18.04s/it]


  Chunk 37/43 from approved_document_M_Access to and use of buildings.pdf (6810 chars)
  Generating 4 questions (total so far: 157/200)
  Generated 4 pairs. Running total: 161/200
  Waiting 6s before next chunk...


Generating QA for Section T:  86%|█████████████████████████████████████████████▌       | 37/43 [09:56<01:41, 17.00s/it]


  Chunk 38/43 from approved_document_M_Access to and use of buildings.pdf (7223 chars)
  Generating 4 questions (total so far: 161/200)
  Generated 4 pairs. Running total: 165/200
  Waiting 6s before next chunk...


Generating QA for Section T:  88%|██████████████████████████████████████████████▊      | 38/43 [10:11<01:22, 16.47s/it]


  Chunk 39/43 from approved_document_M_Access to and use of buildings.pdf (5482 chars)
  Generating 4 questions (total so far: 165/200)
  Generated 4 pairs. Running total: 169/200
  Waiting 6s before next chunk...


Generating QA for Section T:  91%|████████████████████████████████████████████████     | 39/43 [10:23<01:00, 15.10s/it]


  Chunk 40/43 from approved_document_M_Access to and use of buildings.pdf (5908 chars)
  Generating 4 questions (total so far: 169/200)
  Generated 4 pairs. Running total: 173/200
  Waiting 6s before next chunk...


Generating QA for Section T:  93%|█████████████████████████████████████████████████▎   | 40/43 [10:41<00:47, 15.98s/it]


  Chunk 41/43 from approved_document_M_Access to and use of buildings.pdf (7192 chars)
  Generating 4 questions (total so far: 173/200)
  Generated 4 pairs. Running total: 177/200
  Waiting 6s before next chunk...


Generating QA for Section T:  95%|██████████████████████████████████████████████████▌  | 41/43 [10:53<00:29, 14.94s/it]


  Chunk 42/43 from approved_document_M_Access to and use of buildings.pdf (7321 chars)
  Generating 4 questions (total so far: 177/200)
  Generated 4 pairs. Running total: 181/200
  Waiting 6s before next chunk...


Generating QA for Section T:  98%|███████████████████████████████████████████████████▊ | 42/43 [11:08<00:14, 14.89s/it]


  Chunk 43/43 from approved_document_M_Access to and use of buildings.pdf (5498 chars)
  Generating 19 questions (total so far: 181/200)


Generating QA for Section T: 100%|█████████████████████████████████████████████████████| 43/43 [11:34<00:00, 16.15s/it]


  Generated 17 pairs. Running total: 198/200

Section T complete: 198 QA pairs
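
The "Target" lines in this log are consistent with dividing the 200-question goal by the chunk count using integer division: 43 chunks give 4 per chunk with 28 left over (Section T), and 84 chunks give 2 per chunk with 32 left over (Section B). The block below is a minimal sketch of that arithmetic only. The name `plan_quota` is hypothetical and does not come from the notebook, and the log does not show how the notebook spreads the extra questions across chunks, so the sketch does not attempt that.

```python
def plan_quota(target: int, n_chunks: int) -> tuple[int, int]:
    """Return (base questions per chunk, extra questions left over).

    Hypothetical helper for illustration; not the notebook's own code.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    # divmod gives the quotient (base quota) and remainder (extras) in one call
    return divmod(target, n_chunks)


# Chunk counts taken from the "Total chunks for Section ..." log lines
for section, n_chunks in {"T": 43, "B": 84}.items():
    base, extra = plan_quota(200, n_chunks)
    print(f"Section {section}: {base} per chunk + {extra} extra")
```

The target is a request rather than a guarantee. In the log, the final chunk of Section T was asked for 19 questions and returned 17, which is why the section closes at 198/200.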

Processing Section B (2 PDFs)...
  Total chunks for Section B: 84
  Target: 200 questions (2 per chunk + 32 extra)


Generating QA for Section B:   0%|                                                              | 0/84 [00:00<?, ?it/s]


  Chunk 1/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5535 chars)
  Generating 2 questions (total so far: 0/200)
  Generated 2 pairs. Running total: 2/200
  Waiting 6s before next chunk...


Generating QA for Section B:   1%|▋                                                     | 1/84 [00:11<15:41, 11.34s/it]


  Chunk 2/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6062 chars)
  Generating 2 questions (total so far: 2/200)
  Generated 2 pairs. Running total: 4/200
  Waiting 6s before next chunk...


Generating QA for Section B:   2%|█▎                                                    | 2/84 [00:29<20:57, 15.33s/it]


  Chunk 3/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7048 chars)
  Generating 2 questions (total so far: 4/200)
  Generated 2 pairs. Running total: 6/200
  Waiting 6s before next chunk...


Generating QA for Section B:   4%|█▉                                                    | 3/84 [00:39<17:37, 13.06s/it]


  Chunk 4/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5783 chars)
  Generating 2 questions (total so far: 6/200)
  Generated 2 pairs. Running total: 8/200
  Waiting 6s before next chunk...


Generating QA for Section B:   5%|██▌                                                   | 4/84 [00:55<18:48, 14.10s/it]


  Chunk 5/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5992 chars)
  Generating 2 questions (total so far: 8/200)
  Generated 2 pairs. Running total: 10/200
  Waiting 6s before next chunk...


Generating QA for Section B:   6%|███▏                                                  | 5/84 [01:05<16:44, 12.71s/it]


  Chunk 6/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7320 chars)
  Generating 2 questions (total so far: 10/200)
  Generated 2 pairs. Running total: 12/200
  Waiting 6s before next chunk...


Generating QA for Section B:   7%|███▊                                                  | 6/84 [01:20<17:18, 13.32s/it]


  Chunk 7/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6578 chars)
  Generating 2 questions (total so far: 12/200)
  Generated 2 pairs. Running total: 14/200
  Waiting 6s before next chunk...


Generating QA for Section B:   8%|████▌                                                 | 7/84 [01:31<16:25, 12.80s/it]


  Chunk 8/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7404 chars)
  Generating 2 questions (total so far: 14/200)
  Generated 2 pairs. Running total: 16/200
  Waiting 6s before next chunk...


Generating QA for Section B:  10%|█████▏                                                | 8/84 [01:44<15:55, 12.57s/it]


  Chunk 9/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5957 chars)
  Generating 2 questions (total so far: 16/200)
  Generated 2 pairs. Running total: 18/200
  Waiting 6s before next chunk...


Generating QA for Section B:  11%|█████▊                                                | 9/84 [01:54<14:57, 11.97s/it]


  Chunk 10/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6979 chars)
  Generating 2 questions (total so far: 18/200)
  Generated 2 pairs. Running total: 20/200
  Waiting 6s before next chunk...


Generating QA for Section B:  12%|██████▎                                              | 10/84 [02:07<15:07, 12.26s/it]


  Chunk 11/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7284 chars)
  Generating 2 questions (total so far: 20/200)
  Generated 2 pairs. Running total: 22/200
  Waiting 6s before next chunk...


Generating QA for Section B:  13%|██████▉                                              | 11/84 [02:25<17:03, 14.02s/it]


  Chunk 12/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6766 chars)
  Generating 2 questions (total so far: 22/200)
  Generated 2 pairs. Running total: 24/200
  Waiting 6s before next chunk...


Generating QA for Section B:  14%|███████▌                                             | 12/84 [02:36<15:47, 13.16s/it]


  Chunk 13/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6623 chars)
  Generating 2 questions (total so far: 24/200)
  Generated 2 pairs. Running total: 26/200
  Waiting 6s before next chunk...


Generating QA for Section B:  15%|████████▏                                            | 13/84 [02:49<15:22, 12.99s/it]


  Chunk 14/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7496 chars)
  Generating 2 questions (total so far: 26/200)
  Generated 2 pairs. Running total: 28/200
  Waiting 6s before next chunk...


Generating QA for Section B:  17%|████████▊                                            | 14/84 [03:00<14:29, 12.43s/it]


  Chunk 15/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5337 chars)
  Generating 2 questions (total so far: 28/200)
  Generated 2 pairs. Running total: 30/200
  Waiting 6s before next chunk...


Generating QA for Section B:  18%|█████████▍                                           | 15/84 [03:11<13:42, 11.93s/it]


  Chunk 16/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5995 chars)
  Generating 2 questions (total so far: 30/200)
  Generated 2 pairs. Running total: 32/200
  Waiting 6s before next chunk...


Generating QA for Section B:  19%|██████████                                           | 16/84 [03:21<12:54, 11.39s/it]


  Chunk 17/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5714 chars)
  Generating 2 questions (total so far: 32/200)
JSON parsing error: Expecting property name enclosed in double quotes: line 6 column 5 (char 919)
Attempting to clean JSON and retry...
  Generated 2 pairs. Running total: 34/200
  Waiting 6s before next chunk...
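
The parsing error and retry logged for chunk 17 suggest a clean-then-reparse fallback. The sketch below shows one possible form of it, assuming the malformed output contained a trailing comma, which is a common cause of this kind of error but is not confirmed by the log. `clean_json_text` and `parse_qa_json` are hypothetical names, not the notebook's own functions.

```python
import json
import re


def clean_json_text(raw: str) -> str:
    """Remove trailing commas that appear directly before a closing } or ].

    Caveat: this regex is deliberately simple and would also alter a
    comma-plus-bracket sequence inside a string value.
    """
    return re.sub(r",\s*([}\]])", r"\1", raw.strip())


def parse_qa_json(raw: str):
    """Parse model output as JSON, retrying once after a light cleanup."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # Messages mirror the wording seen in the log above
        print(f"JSON parsing error: {err}")
        print("Attempting to clean JSON and retry...")
        # A second failure is allowed to propagate to the caller
        return json.loads(clean_json_text(raw))


# Illustrative input only; the real model response is not shown in the log
sample = '{"qa_pairs": [{"question": "Q", "answer": "A",}],}'
pairs = parse_qa_json(sample)["qa_pairs"]
```

Requesting structured JSON output from the API, where available, would reduce the need for this kind of repair, but the fallback keeps a single malformed response from costing a whole chunk.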


Generating QA for Section B:  20%|██████████▋                                          | 17/84 [03:32<12:32, 11.23s/it]


  Chunk 18/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5064 chars)
  Generating 2 questions (total so far: 34/200)
  Generated 2 pairs. Running total: 36/200
  Waiting 6s before next chunk...


Generating QA for Section B:  21%|███████████▎                                         | 18/84 [03:42<12:08, 11.04s/it]


  Chunk 19/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5708 chars)
  Generating 2 questions (total so far: 36/200)
  Generated 2 pairs. Running total: 38/200
  Waiting 6s before next chunk...


Generating QA for Section B:  23%|███████████▉                                         | 19/84 [04:00<14:03, 12.98s/it]


  Chunk 20/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6733 chars)
  Generating 2 questions (total so far: 38/200)
  Generated 2 pairs. Running total: 40/200
  Waiting 6s before next chunk...


Generating QA for Section B:  24%|████████████▌                                        | 20/84 [04:11<13:06, 12.28s/it]


  Chunk 21/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7400 chars)
  Generating 2 questions (total so far: 40/200)
  Generated 2 pairs. Running total: 42/200
  Waiting 6s before next chunk...


Generating QA for Section B:  25%|█████████████▎                                       | 21/84 [04:21<12:25, 11.83s/it]


  Chunk 22/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7119 chars)
  Generating 2 questions (total so far: 42/200)
  Generated 2 pairs. Running total: 44/200
  Waiting 6s before next chunk...


Generating QA for Section B:  26%|█████████████▉                                       | 22/84 [04:39<13:57, 13.51s/it]


  Chunk 23/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7335 chars)
  Generating 2 questions (total so far: 44/200)
  Generated 2 pairs. Running total: 46/200
  Waiting 6s before next chunk...


Generating QA for Section B:  27%|██████████████▌                                      | 23/84 [04:51<13:23, 13.17s/it]


  Chunk 24/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5918 chars)
  Generating 2 questions (total so far: 46/200)
  Generated 2 pairs. Running total: 48/200
  Waiting 6s before next chunk...


Generating QA for Section B:  29%|███████████████▏                                     | 24/84 [05:01<12:04, 12.07s/it]


  Chunk 25/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6522 chars)
  Generating 2 questions (total so far: 48/200)
  Generated 2 pairs. Running total: 50/200
  Waiting 6s before next chunk...


Generating QA for Section B:  30%|███████████████▊                                     | 25/84 [05:12<11:35, 11.79s/it]


  Chunk 26/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7338 chars)
  Generating 2 questions (total so far: 50/200)
  Generated 2 pairs. Running total: 52/200
  Waiting 6s before next chunk...


Generating QA for Section B:  31%|████████████████▍                                    | 26/84 [05:22<10:57, 11.34s/it]


  Chunk 27/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (4889 chars)
  Generating 2 questions (total so far: 52/200)
  Generated 2 pairs. Running total: 54/200
  Waiting 5s before next chunk...


Generating QA for Section B:  32%|█████████████████                                    | 27/84 [05:32<10:25, 10.98s/it]


  Chunk 28/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6026 chars)
  Generating 2 questions (total so far: 54/200)
  Generated 2 pairs. Running total: 56/200
  Waiting 6s before next chunk...


Generating QA for Section B:  33%|█████████████████▋                                   | 28/84 [05:43<10:15, 11.00s/it]


  Chunk 29/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6061 chars)
  Generating 2 questions (total so far: 56/200)
  Generated 2 pairs. Running total: 58/200
  Waiting 6s before next chunk...


Generating QA for Section B:  35%|██████████████████▎                                  | 29/84 [05:54<10:01, 10.93s/it]


  Chunk 30/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5527 chars)
  Generating 2 questions (total so far: 58/200)
  Generated 2 pairs. Running total: 60/200
  Waiting 6s before next chunk...


Generating QA for Section B:  36%|██████████████████▉                                  | 30/84 [06:05<09:50, 10.94s/it]


  Chunk 31/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5767 chars)
  Generating 2 questions (total so far: 60/200)
  Generated 2 pairs. Running total: 62/200
  Waiting 6s before next chunk...


Generating QA for Section B:  37%|███████████████████▌                                 | 31/84 [06:15<09:26, 10.69s/it]


  Chunk 32/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5754 chars)
  Generating 2 questions (total so far: 62/200)
  Generated 2 pairs. Running total: 64/200
  Waiting 6s before next chunk...


Generating QA for Section B:  38%|████████████████████▏                                | 32/84 [06:34<11:23, 13.15s/it]


  Chunk 33/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6490 chars)
  Generating 2 questions (total so far: 64/200)
  Generated 2 pairs. Running total: 66/200
  Waiting 6s before next chunk...


Generating QA for Section B:  39%|████████████████████▊                                | 33/84 [06:47<11:03, 13.02s/it]


  Chunk 34/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7205 chars)
  Generating 2 questions (total so far: 66/200)
  Generated 2 pairs. Running total: 68/200
  Waiting 6s before next chunk...


Generating QA for Section B:  40%|█████████████████████▍                               | 34/84 [06:59<10:37, 12.74s/it]


  Chunk 35/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6677 chars)
  Generating 2 questions (total so far: 68/200)
  Generated 2 pairs. Running total: 70/200
  Waiting 6s before next chunk...


Generating QA for Section B:  42%|██████████████████████                               | 35/84 [07:13<10:52, 13.31s/it]


  Chunk 36/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6524 chars)
  Generating 2 questions (total so far: 70/200)
  Generated 2 pairs. Running total: 72/200
  Waiting 6s before next chunk...


Generating QA for Section B:  43%|██████████████████████▋                              | 36/84 [07:26<10:24, 13.00s/it]


  Chunk 37/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6853 chars)
  Generating 2 questions (total so far: 72/200)
  Generated 2 pairs. Running total: 74/200
  Waiting 6s before next chunk...


Generating QA for Section B:  44%|███████████████████████▎                             | 37/84 [07:35<09:25, 12.03s/it]


  Chunk 38/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5757 chars)
  Generating 2 questions (total so far: 74/200)
  Generated 2 pairs. Running total: 76/200
  Waiting 6s before next chunk...


Generating QA for Section B:  45%|███████████████████████▉                             | 38/84 [07:46<08:55, 11.65s/it]


  Chunk 39/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6950 chars)
  Generating 2 questions (total so far: 76/200)
  Generated 2 pairs. Running total: 78/200
  Waiting 6s before next chunk...


Generating QA for Section B:  46%|████████████████████████▌                            | 39/84 [07:58<08:49, 11.77s/it]


  Chunk 40/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6920 chars)
  Generating 2 questions (total so far: 78/200)
  Generated 2 pairs. Running total: 80/200
  Waiting 6s before next chunk...


Generating QA for Section B:  48%|█████████████████████████▏                           | 40/84 [08:09<08:24, 11.46s/it]


  Chunk 41/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5437 chars)
  Generating 2 questions (total so far: 80/200)
  Generated 2 pairs. Running total: 82/200
  Waiting 6s before next chunk...


Generating QA for Section B:  49%|█████████████████████████▊                           | 41/84 [08:20<08:05, 11.28s/it]


  Chunk 42/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6479 chars)
  Generating 2 questions (total so far: 82/200)
  Generated 2 pairs. Running total: 84/200
  Waiting 6s before next chunk...


Generating QA for Section B:  50%|██████████████████████████▌                          | 42/84 [08:30<07:45, 11.08s/it]


  Chunk 43/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7275 chars)
  Generating 2 questions (total so far: 84/200)
  Generated 2 pairs. Running total: 86/200
  Waiting 6s before next chunk...


Generating QA for Section B:  51%|███████████████████████████▏                         | 43/84 [08:41<07:23, 10.81s/it]


  Chunk 44/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (3879 chars)
  Generating 2 questions (total so far: 86/200)
  Generated 2 pairs. Running total: 88/200
  Waiting 5s before next chunk...


Generating QA for Section B:  52%|███████████████████████████▊                         | 44/84 [08:54<07:42, 11.57s/it]


  Chunk 45/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7276 chars)
  Generating 2 questions (total so far: 88/200)
  Generated 2 pairs. Running total: 90/200
  Waiting 6s before next chunk...


Generating QA for Section B:  54%|████████████████████████████▍                        | 45/84 [09:05<07:22, 11.35s/it]


  Chunk 46/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (4722 chars)
  Generating 2 questions (total so far: 90/200)
  Generated 2 pairs. Running total: 92/200
  Waiting 5s before next chunk...


Generating QA for Section B:  55%|█████████████████████████████                        | 46/84 [09:15<06:57, 10.99s/it]


  Chunk 47/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6236 chars)
  Generating 2 questions (total so far: 92/200)
  Generated 2 pairs. Running total: 94/200
  Waiting 6s before next chunk...


Generating QA for Section B:  56%|█████████████████████████████▋                       | 47/84 [09:29<07:21, 11.94s/it]


  Chunk 48/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5444 chars)
  Generating 2 questions (total so far: 94/200)
  Generated 2 pairs. Running total: 96/200
  Waiting 6s before next chunk...


Generating QA for Section B:  57%|██████████████████████████████▎                      | 48/84 [09:42<07:17, 12.14s/it]


  Chunk 49/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6067 chars)
  Generating 2 questions (total so far: 96/200)
  Generated 2 pairs. Running total: 98/200
  Waiting 6s before next chunk...


Generating QA for Section B:  58%|██████████████████████████████▉                      | 49/84 [09:53<06:56, 11.91s/it]


  Chunk 50/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6073 chars)
  Generating 2 questions (total so far: 98/200)
  Generated 2 pairs. Running total: 100/200
  Waiting 6s before next chunk...


Generating QA for Section B:  60%|███████████████████████████████▌                     | 50/84 [10:04<06:29, 11.46s/it]


  Chunk 51/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5912 chars)
  Generating 2 questions (total so far: 100/200)
  Generated 2 pairs. Running total: 102/200
  Waiting 6s before next chunk...


Generating QA for Section B:  61%|████████████████████████████████▏                    | 51/84 [10:15<06:22, 11.60s/it]


  Chunk 52/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (4057 chars)
  Generating 2 questions (total so far: 102/200)
  Generated 2 pairs. Running total: 104/200
  Waiting 5s before next chunk...


Generating QA for Section B:  62%|████████████████████████████████▊                    | 52/84 [10:27<06:06, 11.44s/it]


  Chunk 53/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7188 chars)
  Generating 2 questions (total so far: 104/200)
  Generated 2 pairs. Running total: 106/200
  Waiting 6s before next chunk...


Generating QA for Section B:  63%|█████████████████████████████████▍                   | 53/84 [10:38<05:50, 11.32s/it]


  Chunk 54/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7243 chars)
  Generating 2 questions (total so far: 106/200)
  Generated 2 pairs. Running total: 108/200
  Waiting 6s before next chunk...


Generating QA for Section B:  64%|██████████████████████████████████                   | 54/84 [10:49<05:39, 11.30s/it]


  Chunk 55/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5871 chars)
  Generating 2 questions (total so far: 108/200)
  Generated 2 pairs. Running total: 110/200
  Waiting 6s before next chunk...


Generating QA for Section B:  65%|██████████████████████████████████▋                  | 55/84 [10:59<05:14, 10.84s/it]


  Chunk 56/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (4183 chars)
  Generating 2 questions (total so far: 110/200)
  Generated 2 pairs. Running total: 112/200
  Waiting 5s before next chunk...


Generating QA for Section B:  67%|███████████████████████████████████▎                 | 56/84 [11:08<04:54, 10.52s/it]


  Chunk 57/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5993 chars)
  Generating 2 questions (total so far: 112/200)
  Generated 2 pairs. Running total: 114/200
  Waiting 6s before next chunk...


Generating QA for Section B:  68%|███████████████████████████████████▉                 | 57/84 [11:22<05:12, 11.57s/it]


  Chunk 58/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7300 chars)
  Generating 2 questions (total so far: 114/200)
  Generated 2 pairs. Running total: 116/200
  Waiting 6s before next chunk...


Generating QA for Section B:  69%|████████████████████████████████████▌                | 58/84 [11:34<05:03, 11.68s/it]


  Chunk 59/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6532 chars)
  Generating 2 questions (total so far: 116/200)
  Generated 2 pairs. Running total: 118/200
  Waiting 6s before next chunk...


Generating QA for Section B:  70%|█████████████████████████████████████▏               | 59/84 [11:48<05:04, 12.18s/it]


  Chunk 60/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5896 chars)
  Generating 2 questions (total so far: 118/200)
  Generated 2 pairs. Running total: 120/200
  Waiting 6s before next chunk...


Generating QA for Section B:  71%|█████████████████████████████████████▊               | 60/84 [11:58<04:38, 11.60s/it]


  Chunk 61/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5588 chars)
  Generating 2 questions (total so far: 120/200)
  Generated 2 pairs. Running total: 122/200
  Waiting 6s before next chunk...


Generating QA for Section B:  73%|██████████████████████████████████████▍              | 61/84 [12:08<04:13, 11.03s/it]


  Chunk 62/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5656 chars)
  Generating 2 questions (total so far: 122/200)
  Generated 2 pairs. Running total: 124/200
  Waiting 6s before next chunk...


Generating QA for Section B:  74%|███████████████████████████████████████              | 62/84 [12:20<04:14, 11.57s/it]


  Chunk 63/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6356 chars)
  Generating 2 questions (total so far: 124/200)
  Generated 2 pairs. Running total: 126/200
  Waiting 6s before next chunk...


Generating QA for Section B:  75%|███████████████████████████████████████▊             | 63/84 [12:32<04:02, 11.53s/it]


  Chunk 64/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5715 chars)
  Generating 2 questions (total so far: 126/200)
  Generated 2 pairs. Running total: 128/200
  Waiting 6s before next chunk...


Generating QA for Section B:  76%|████████████████████████████████████████▍            | 64/84 [12:50<04:29, 13.47s/it]


  Chunk 65/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5740 chars)
  Generating 2 questions (total so far: 128/200)
  Generated 2 pairs. Running total: 130/200
  Waiting 6s before next chunk...


Generating QA for Section B:  77%|█████████████████████████████████████████            | 65/84 [13:00<03:54, 12.35s/it]


  Chunk 66/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7317 chars)
  Generating 2 questions (total so far: 130/200)
  Generated 2 pairs. Running total: 132/200
  Waiting 6s before next chunk...


Generating QA for Section B:  79%|█████████████████████████████████████████▋           | 66/84 [13:09<03:28, 11.57s/it]


  Chunk 67/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6169 chars)
  Generating 2 questions (total so far: 132/200)
  Generated 2 pairs. Running total: 134/200
  Waiting 6s before next chunk...


Generating QA for Section B:  80%|██████████████████████████████████████████▎          | 67/84 [13:22<03:21, 11.88s/it]


  Chunk 68/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7299 chars)
  Generating 2 questions (total so far: 134/200)
  Generated 2 pairs. Running total: 136/200
  Waiting 6s before next chunk...


Generating QA for Section B:  81%|██████████████████████████████████████████▉          | 68/84 [13:33<03:07, 11.73s/it]


  Chunk 69/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5804 chars)
  Generating 2 questions (total so far: 136/200)
  Generated 2 pairs. Running total: 138/200
  Waiting 6s before next chunk...


Generating QA for Section B:  82%|███████████████████████████████████████████▌         | 69/84 [13:43<02:47, 11.17s/it]


  Chunk 70/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6050 chars)
  Generating 2 questions (total so far: 138/200)
  Generated 2 pairs. Running total: 140/200
  Waiting 6s before next chunk...


Generating QA for Section B:  83%|████████████████████████████████████████████▏        | 70/84 [13:53<02:30, 10.77s/it]


  Chunk 71/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6898 chars)
  Generating 2 questions (total so far: 140/200)
  Generated 2 pairs. Running total: 142/200
  Waiting 6s before next chunk...


Generating QA for Section B:  85%|████████████████████████████████████████████▊        | 71/84 [14:02<02:12, 10.21s/it]


  Chunk 72/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7489 chars)
  Generating 2 questions (total so far: 142/200)
  Generated 2 pairs. Running total: 144/200
  Waiting 6s before next chunk...


Generating QA for Section B:  86%|█████████████████████████████████████████████▍       | 72/84 [14:14<02:07, 10.63s/it]


  Chunk 73/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5416 chars)
  Generating 2 questions (total so far: 144/200)
  Generated 2 pairs. Running total: 146/200
  Waiting 6s before next chunk...


Generating QA for Section B:  87%|██████████████████████████████████████████████       | 73/84 [14:24<01:55, 10.52s/it]


  Chunk 74/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5514 chars)
  Generating 2 questions (total so far: 146/200)
  Generated 2 pairs. Running total: 148/200
  Waiting 6s before next chunk...


Generating QA for Section B:  88%|██████████████████████████████████████████████▋      | 74/84 [14:35<01:46, 10.69s/it]


  Chunk 75/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6318 chars)
  Generating 2 questions (total so far: 148/200)
  Generated 2 pairs. Running total: 150/200
  Waiting 6s before next chunk...


Generating QA for Section B:  89%|███████████████████████████████████████████████▎     | 75/84 [14:47<01:39, 11.01s/it]


  Chunk 76/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (7445 chars)
  Generating 2 questions (total so far: 150/200)
  Generated 2 pairs. Running total: 152/200
  Waiting 6s before next chunk...


Generating QA for Section B:  90%|███████████████████████████████████████████████▉     | 76/84 [14:58<01:29, 11.15s/it]


  Chunk 77/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (5295 chars)
  Generating 2 questions (total so far: 152/200)
  Generated 2 pairs. Running total: 154/200
  Waiting 6s before next chunk...


Generating QA for Section B:  92%|████████████████████████████████████████████████▌    | 77/84 [15:08<01:16, 10.89s/it]


  Chunk 78/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (6787 chars)
  Generating 2 questions (total so far: 154/200)
  Generated 2 pairs. Running total: 156/200
  Waiting 6s before next chunk...


Generating QA for Section B:  93%|█████████████████████████████████████████████████▏   | 78/84 [15:20<01:06, 11.01s/it]


  Chunk 79/84 from Approved_Document_B__fire_safety__volume_1_-_Dwellings__2019_edition_incorporating_2020_and_2022_amendments_collated_with_2025__2026_and_2029_amendments.pdf (3257 chars)
  Generating 2 questions (total so far: 156/200)
  Generated 2 pairs. Running total: 158/200
  Waiting 5s before next chunk...


Generating QA for Section B:  94%|█████████████████████████████████████████████████▊   | 79/84 [15:30<00:53, 10.65s/it]


  Chunk 80/84 from Part_B_FAQ_approved_document_B_Fire safety.pdf (7029 chars)
  Generating 2 questions (total so far: 158/200)
  Generated 2 pairs. Running total: 160/200
  Waiting 6s before next chunk...


Generating QA for Section B:  95%|██████████████████████████████████████████████████▍  | 80/84 [15:40<00:41, 10.46s/it]


  Chunk 81/84 from Part_B_FAQ_approved_document_B_Fire safety.pdf (7436 chars)
  Generating 2 questions (total so far: 160/200)
  Generated 2 pairs. Running total: 162/200
  Waiting 6s before next chunk...


Generating QA for Section B:  96%|███████████████████████████████████████████████████  | 81/84 [15:53<00:33, 11.32s/it]


  Chunk 82/84 from Part_B_FAQ_approved_document_B_Fire safety.pdf (7436 chars)
  Generating 2 questions (total so far: 162/200)
  Generated 2 pairs. Running total: 164/200
  Waiting 6s before next chunk...


Generating QA for Section B:  98%|███████████████████████████████████████████████████▋ | 82/84 [16:05<00:23, 11.63s/it]


  Chunk 83/84 from Part_B_FAQ_approved_document_B_Fire safety.pdf (7182 chars)
  Generating 2 questions (total so far: 164/200)
  Generated 2 pairs. Running total: 166/200
  Waiting 6s before next chunk...


Generating QA for Section B:  99%|████████████████████████████████████████████████████▎| 83/84 [16:16<00:11, 11.42s/it]


  Chunk 84/84 from Part_B_FAQ_approved_document_B_Fire safety.pdf (6490 chars)
  Generating 34 questions (total so far: 166/200)


Generating QA for Section B: 100%|█████████████████████████████████████████████████████| 84/84 [17:04<00:00, 12.19s/it]


  Generated 26 pairs. Running total: 192/200

Section B complete: 192 QA pairs

Processing Section D (1 PDFs)...
  Total chunks for Section D: 3
  Target: 200 questions (66 per chunk + 2 extra)


Generating QA for Section D:   0%|                                                               | 0/3 [00:00<?, ?it/s]


  Chunk 1/3 from approved_document_D_Toxic Substance.pdf (136 chars)
  Generating 66 questions (total so far: 0/200)
  Generated 52 pairs. Running total: 52/200
  Waiting 5s before next chunk...


Generating QA for Section D:  33%|██████████████████▎                                    | 1/3 [01:03<02:07, 63.63s/it]


  Chunk 2/3 from approved_document_D_Toxic Substance.pdf (7469 chars)
  Generating 67 questions (total so far: 52/200)
  Generated 53 pairs. Running total: 105/200
  Waiting 6s before next chunk...


Generating QA for Section D:  67%|████████████████████████████████████▋                  | 2/3 [02:25<01:14, 74.35s/it]


  Chunk 3/3 from approved_document_D_Toxic Substance.pdf (3140 chars)
  Generating 95 questions (total so far: 105/200)


Generating QA for Section D: 100%|███████████████████████████████████████████████████████| 3/3 [03:29<00:00, 69.87s/it]


  Generated 51 pairs. Running total: 156/200

Section D complete: 156 QA pairs
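
The per-section targets printed above are consistent with a plain integer split of the 200-question target across chunks: Section D reports "66 per chunk + 2 extra" for 3 chunks, and the final chunk of each section requests whatever is still missing (95 for Section D, 34 for Section B). The sketch below reproduces only that arithmetic. The function names `chunk_quota` and `final_chunk_request` are hypothetical and are not taken from the notebook, and the log does not reveal how the "extra" questions are spread over the non-final chunks, so that step is deliberately left out.

```python
def chunk_quota(target: int, n_chunks: int) -> tuple[int, int]:
    """Return (questions per chunk, leftover) for an even integer split.

    Hypothetical helper: mirrors the 'N per chunk + M extra' log line only.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    return divmod(target, n_chunks)


def final_chunk_request(target: int, running_total: int) -> int:
    """Questions to request from the last chunk so the target can still be met."""
    return max(target - running_total, 0)


# Section D: 3 chunks, target 200
base, extra = chunk_quota(200, 3)
print(f"{base} per chunk + {extra} extra")

# Section D, last chunk: 105 pairs collected so far
print(final_chunk_request(200, 105))
```

Requesting the full remainder from the last chunk does not guarantee the target is reached: Section D asked for 95 questions from a 3,140-character chunk and received 51, finishing at 156/200.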

Processing Section E (2 PDFs)...
  Total chunks for Section E: 40
  Target: 200 questions (5 per chunk + 0 extra)


Generating QA for Section E:   0%|                                                              | 0/40 [00:00<?, ?it/s]


  Chunk 1/40 from approved_document_E_Resistance to sound.pdf (6858 chars)
  Generating 5 questions (total so far: 0/200)
  Generated 5 pairs. Running total: 5/200
  Waiting 6s before next chunk...


Generating QA for Section E:   2%|█▎                                                    | 1/40 [00:22<14:52, 22.89s/it]


  Chunk 2/40 from approved_document_E_Resistance to sound.pdf (7385 chars)
  Generating 5 questions (total so far: 5/200)
  Generated 5 pairs. Running total: 10/200
  Waiting 6s before next chunk...


Generating QA for Section E:   5%|██▋                                                   | 2/40 [00:46<14:40, 23.18s/it]


  Chunk 3/40 from approved_document_E_Resistance to sound.pdf (672 chars)
  Generating 5 questions (total so far: 10/200)
  Generated 5 pairs. Running total: 15/200
  Waiting 5s before next chunk...


Generating QA for Section E:   8%|████                                                  | 3/40 [00:59<11:36, 18.83s/it]


  Chunk 4/40 from approved_document_E_Resistance to sound.pdf (7390 chars)
  Generating 5 questions (total so far: 15/200)
  Generated 5 pairs. Running total: 20/200
  Waiting 6s before next chunk...


Generating QA for Section E:  10%|█████▍                                                | 4/40 [01:14<10:17, 17.15s/it]


  Chunk 5/40 from approved_document_E_Resistance to sound.pdf (6565 chars)
  Generating 5 questions (total so far: 20/200)
  Generated 5 pairs. Running total: 25/200
  Waiting 6s before next chunk...


Generating QA for Section E:  12%|██████▊                                               | 5/40 [01:29<09:28, 16.24s/it]


  Chunk 6/40 from approved_document_E_Resistance to sound.pdf (6166 chars)
  Generating 5 questions (total so far: 25/200)
  Generated 5 pairs. Running total: 30/200
  Waiting 6s before next chunk...


Generating QA for Section E:  15%|████████                                              | 6/40 [01:46<09:26, 16.67s/it]


  Chunk 7/40 from approved_document_E_Resistance to sound.pdf (5327 chars)
  Generating 5 questions (total so far: 30/200)
  Generated 5 pairs. Running total: 35/200
  Waiting 6s before next chunk...


Generating QA for Section E:  18%|█████████▍                                            | 7/40 [02:04<09:28, 17.22s/it]


  Chunk 8/40 from approved_document_E_Resistance to sound.pdf (7366 chars)
  Generating 5 questions (total so far: 35/200)
  Generated 5 pairs. Running total: 40/200
  Waiting 6s before next chunk...


Generating QA for Section E:  20%|██████████▊                                           | 8/40 [02:21<08:59, 16.86s/it]


  Chunk 9/40 from approved_document_E_Resistance to sound.pdf (7401 chars)
  Generating 5 questions (total so far: 40/200)
  Generated 5 pairs. Running total: 45/200
  Waiting 6s before next chunk...


Generating QA for Section E:  22%|████████████▏                                         | 9/40 [02:35<08:20, 16.14s/it]


  Chunk 10/40 from approved_document_E_Resistance to sound.pdf (4524 chars)
  Generating 5 questions (total so far: 45/200)
  Generated 5 pairs. Running total: 50/200
  Waiting 5s before next chunk...


Generating QA for Section E:  25%|█████████████▎                                       | 10/40 [02:53<08:16, 16.56s/it]


  Chunk 11/40 from approved_document_E_Resistance to sound.pdf (7418 chars)
  Generating 5 questions (total so far: 50/200)
  Generated 5 pairs. Running total: 55/200
  Waiting 6s before next chunk...


Generating QA for Section E:  28%|██████████████▌                                      | 11/40 [03:12<08:21, 17.29s/it]


  Chunk 12/40 from approved_document_E_Resistance to sound.pdf (7362 chars)
  Generating 5 questions (total so far: 55/200)
  Generated 5 pairs. Running total: 60/200
  Waiting 6s before next chunk...


Generating QA for Section E:  30%|███████████████▉                                     | 12/40 [03:27<07:48, 16.72s/it]


  Chunk 13/40 from approved_document_E_Resistance to sound.pdf (4840 chars)
  Generating 5 questions (total so far: 60/200)
  Generated 5 pairs. Running total: 65/200
  Waiting 5s before next chunk...


Generating QA for Section E:  32%|█████████████████▏                                   | 13/40 [03:41<07:09, 15.89s/it]


  Chunk 14/40 from approved_document_E_Resistance to sound.pdf (6522 chars)
  Generating 5 questions (total so far: 65/200)
  Generated 5 pairs. Running total: 70/200
  Waiting 6s before next chunk...


Generating QA for Section E:  35%|██████████████████▌                                  | 14/40 [03:56<06:44, 15.57s/it]


  Chunk 15/40 from approved_document_E_Resistance to sound.pdf (6920 chars)
  Generating 5 questions (total so far: 70/200)
JSON parsing error: Expecting property name enclosed in double quotes: line 6 column 5 (char 771)
Attempting to clean JSON and retry...
  Generated 5 pairs. Running total: 75/200
  Waiting 6s before next chunk...


Generating QA for Section E:  38%|███████████████████▉                                 | 15/40 [04:10<06:18, 15.16s/it]


  Chunk 16/40 from approved_document_E_Resistance to sound.pdf (6132 chars)
  Generating 5 questions (total so far: 75/200)
  Generated 5 pairs. Running total: 80/200
  Waiting 6s before next chunk...


Generating QA for Section E:  40%|█████████████████████▏                               | 16/40 [04:26<06:07, 15.30s/it]


  Chunk 17/40 from approved_document_E_Resistance to sound.pdf (5515 chars)
  Generating 5 questions (total so far: 80/200)
  Generated 5 pairs. Running total: 85/200
  Waiting 6s before next chunk...


Generating QA for Section E:  42%|██████████████████████▌                              | 17/40 [04:39<05:37, 14.65s/it]


  Chunk 18/40 from approved_document_E_Resistance to sound.pdf (5354 chars)
  Generating 5 questions (total so far: 85/200)
  Generated 5 pairs. Running total: 90/200
  Waiting 6s before next chunk...


Generating QA for Section E:  45%|███████████████████████▊                             | 18/40 [04:54<05:26, 14.82s/it]


  Chunk 19/40 from approved_document_E_Resistance to sound.pdf (6546 chars)
  Generating 5 questions (total so far: 90/200)
  Generated 5 pairs. Running total: 95/200
  Waiting 6s before next chunk...


Generating QA for Section E:  48%|█████████████████████████▏                           | 19/40 [05:18<06:06, 17.46s/it]


  Chunk 20/40 from approved_document_E_Resistance to sound.pdf (5954 chars)
  Generating 5 questions (total so far: 95/200)
  Generated 5 pairs. Running total: 100/200
  Waiting 6s before next chunk...


Generating QA for Section E:  50%|██████████████████████████▌                          | 20/40 [05:38<06:06, 18.32s/it]


  Chunk 21/40 from approved_document_E_Resistance to sound.pdf (6782 chars)
  Generating 5 questions (total so far: 100/200)
  Generated 5 pairs. Running total: 105/200
  Waiting 6s before next chunk...


Generating QA for Section E:  52%|███████████████████████████▊                         | 21/40 [05:58<06:00, 18.96s/it]


  Chunk 22/40 from approved_document_E_Resistance to sound.pdf (3553 chars)
  Generating 5 questions (total so far: 105/200)
  Generated 5 pairs. Running total: 110/200
  Waiting 5s before next chunk...


Generating QA for Section E:  55%|█████████████████████████████▏                       | 22/40 [06:17<05:40, 18.92s/it]


  Chunk 23/40 from approved_document_E_Resistance to sound.pdf (5528 chars)
  Generating 5 questions (total so far: 110/200)
  Generated 5 pairs. Running total: 115/200
  Waiting 6s before next chunk...


Generating QA for Section E:  57%|██████████████████████████████▍                      | 23/40 [06:38<05:30, 19.45s/it]


  Chunk 24/40 from approved_document_E_Resistance to sound.pdf (6856 chars)
  Generating 5 questions (total so far: 115/200)
  Generated 5 pairs. Running total: 120/200
  Waiting 6s before next chunk...


Generating QA for Section E:  60%|███████████████████████████████▊                     | 24/40 [06:53<04:50, 18.15s/it]


  Chunk 25/40 from approved_document_E_Resistance to sound.pdf (6043 chars)
  Generating 5 questions (total so far: 120/200)
  Generated 5 pairs. Running total: 125/200
  Waiting 6s before next chunk...


Generating QA for Section E:  62%|█████████████████████████████████▏                   | 25/40 [07:07<04:15, 17.00s/it]


  Chunk 26/40 from approved_document_E_Resistance to sound.pdf (2998 chars)
  Generating 5 questions (total so far: 125/200)
  Generated 5 pairs. Running total: 130/200
  Waiting 5s before next chunk...


Generating QA for Section E:  65%|██████████████████████████████████▍                  | 26/40 [07:27<04:08, 17.75s/it]


  Chunk 27/40 from approved_document_E_Resistance to sound.pdf (7305 chars)
  Generating 5 questions (total so far: 130/200)
  Generated 5 pairs. Running total: 135/200
  Waiting 6s before next chunk...


Generating QA for Section E:  68%|███████████████████████████████████▊                 | 27/40 [07:43<03:42, 17.14s/it]


  Chunk 28/40 from approved_document_E_Resistance to sound.pdf (4593 chars)
  Generating 5 questions (total so far: 135/200)
  Generated 5 pairs. Running total: 140/200
  Waiting 5s before next chunk...


Generating QA for Section E:  70%|█████████████████████████████████████                | 28/40 [07:57<03:14, 16.24s/it]


  Chunk 29/40 from approved_document_E_Resistance to sound.pdf (5213 chars)
  Generating 5 questions (total so far: 140/200)
  Generated 5 pairs. Running total: 145/200
  Waiting 6s before next chunk...


Generating QA for Section E:  72%|██████████████████████████████████████▍              | 29/40 [08:21<03:24, 18.62s/it]


  Chunk 30/40 from approved_document_E_Resistance to sound.pdf (7476 chars)
  Generating 5 questions (total so far: 145/200)
  Generated 5 pairs. Running total: 150/200
  Waiting 6s before next chunk...


Generating QA for Section E:  75%|███████████████████████████████████████▊             | 30/40 [08:39<03:05, 18.57s/it]


  Chunk 31/40 from approved_document_E_Resistance to sound.pdf (6828 chars)
  Generating 5 questions (total so far: 150/200)
  Generated 5 pairs. Running total: 155/200
  Waiting 6s before next chunk...


Generating QA for Section E:  78%|█████████████████████████████████████████            | 31/40 [09:01<02:55, 19.55s/it]


  Chunk 32/40 from approved_document_E_Resistance to sound.pdf (5594 chars)
  Generating 5 questions (total so far: 155/200)
  Generated 5 pairs. Running total: 160/200
  Waiting 6s before next chunk...


Generating QA for Section E:  80%|██████████████████████████████████████████▍          | 32/40 [09:28<02:55, 21.88s/it]


  Chunk 33/40 from approved_document_E_Resistance to sound.pdf (7322 chars)
  Generating 5 questions (total so far: 160/200)
  Generated 5 pairs. Running total: 165/200
  Waiting 6s before next chunk...


Generating QA for Section E:  82%|███████████████████████████████████████████▋         | 33/40 [09:52<02:36, 22.40s/it]


  Chunk 34/40 from approved_document_E_Resistance to sound.pdf (7486 chars)
  Generating 5 questions (total so far: 165/200)
  Generated 5 pairs. Running total: 170/200
  Waiting 6s before next chunk...


Generating QA for Section E:  85%|█████████████████████████████████████████████        | 34/40 [10:12<02:09, 21.53s/it]


  Chunk 35/40 from approved_document_E_Resistance to sound.pdf (5993 chars)
  Generating 5 questions (total so far: 170/200)
  Generated 5 pairs. Running total: 175/200
  Waiting 6s before next chunk...


Generating QA for Section E:  88%|██████████████████████████████████████████████▍      | 35/40 [10:24<01:34, 18.92s/it]


  Chunk 36/40 from approved_document_E_Resistance to sound.pdf (7462 chars)
  Generating 5 questions (total so far: 175/200)
  Generated 5 pairs. Running total: 180/200
  Waiting 6s before next chunk...


Generating QA for Section E:  90%|███████████████████████████████████████████████▋     | 36/40 [10:43<01:14, 18.73s/it]


  Chunk 37/40 from approved_document_E_Resistance to sound.pdf (7173 chars)
  Generating 5 questions (total so far: 180/200)
  Generated 5 pairs. Running total: 185/200
  Waiting 6s before next chunk...


Generating QA for Section E:  92%|█████████████████████████████████████████████████    | 37/40 [11:00<00:54, 18.20s/it]


  Chunk 38/40 from approved_document_E_Resistance to sound.pdf (3924 chars)
  Generating 5 questions (total so far: 185/200)
  Generated 5 pairs. Running total: 190/200
  Waiting 5s before next chunk...


Generating QA for Section E:  95%|██████████████████████████████████████████████████▎  | 38/40 [11:13<00:33, 16.71s/it]


  Chunk 39/40 from Part_E_FAQ_approved_document_E_Resistance to sound.pdf (7003 chars)
  Generating 5 questions (total so far: 190/200)
  Generated 5 pairs. Running total: 195/200
  Waiting 6s before next chunk...


Generating QA for Section E:  98%|███████████████████████████████████████████████████▋ | 39/40 [11:32<00:17, 17.39s/it]


  Chunk 40/40 from Part_E_FAQ_approved_document_E_Resistance to sound.pdf (1165 chars)
  Generating 5 questions (total so far: 195/200)


Generating QA for Section E: 100%|█████████████████████████████████████████████████████| 40/40 [11:40<00:00, 17.52s/it]


  Generated 5 pairs. Running total: 200/200

Section E complete: 200 QA pairs
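
Chunk 15 of Section E logged a `JSON parsing error: Expecting property name enclosed in double quotes` followed by a clean-and-retry step that still yielded 5 pairs. The log does not show the offending text or the notebook's actual cleaning routine. One common cause of this error is a trailing comma before a closing brace, so the following is a minimal sketch of a parse-then-clean fallback under that assumption. The function name `parse_json_leniently` is hypothetical.

```python
import json
import re

# Matches a comma that is followed only by whitespace and a closing } or ]
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def parse_json_leniently(raw: str):
    """Try strict parsing first; on failure, strip trailing commas and retry.

    Assumption: the malformed output differs from valid JSON only by
    trailing commas. Any other defect still raises json.JSONDecodeError.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        cleaned = TRAILING_COMMA.sub(r"\1", raw.strip())
        return json.loads(cleaned)


sample = '[{"question": "q1", "answer": "a1",}]'
print(parse_json_leniently(sample))
```

The regular expression is not string-aware, so a comma followed by `}` or `]` inside an answer's text would also be altered. For model output that regularly contains such text, a stricter approach would be to re-prompt for valid JSON instead of patching it.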

Processing Section F (2 PDFs)...
  Total chunks for Section F: 18
  Target: 200 questions (11 per chunk + 2 extra)


Generating QA for Section F:   0%|                                                              | 0/18 [00:00<?, ?it/s]


  Chunk 1/18 from approved_document_F_Ventilation.pdf (7313 chars)
  Generating 11 questions (total so far: 0/200)
  Generated 11 pairs. Running total: 11/200
  Waiting 6s before next chunk...


Generating QA for Section F:   6%|███                                                   | 1/18 [00:25<07:18, 25.81s/it]


  Chunk 2/18 from approved_document_F_Ventilation.pdf (7386 chars)
  Generating 11 questions (total so far: 11/200)
  Generated 10 pairs. Running total: 21/200
  Waiting 6s before next chunk...


Generating QA for Section F:  11%|██████                                                | 2/18 [00:49<06:35, 24.71s/it]


  Chunk 3/18 from approved_document_F_Ventilation.pdf (7463 chars)
  Generating 11 questions (total so far: 21/200)
  Generated 11 pairs. Running total: 32/200
  Waiting 6s before next chunk...


Generating QA for Section F:  17%|█████████                                             | 3/18 [01:10<05:43, 22.88s/it]


  Chunk 4/18 from approved_document_F_Ventilation.pdf (7475 chars)
  Generating 11 questions (total so far: 32/200)
  Generated 11 pairs. Running total: 43/200
  Waiting 6s before next chunk...


Generating QA for Section F:  22%|████████████                                          | 4/18 [01:34<05:29, 23.54s/it]


  Chunk 5/18 from approved_document_F_Ventilation.pdf (7495 chars)
  Generating 11 questions (total so far: 43/200)
  Generated 11 pairs. Running total: 54/200
  Waiting 6s before next chunk...


Generating QA for Section F:  28%|███████████████                                       | 5/18 [02:03<05:30, 25.40s/it]


  Chunk 6/18 from approved_document_F_Ventilation.pdf (7415 chars)
  Generating 11 questions (total so far: 54/200)
  Generated 11 pairs. Running total: 65/200
  Waiting 6s before next chunk...


Generating QA for Section F:  33%|██████████████████                                    | 6/18 [02:24<04:44, 23.73s/it]


  Chunk 7/18 from approved_document_F_Ventilation.pdf (7406 chars)
  Generating 11 questions (total so far: 65/200)
  Generated 11 pairs. Running total: 76/200
  Waiting 6s before next chunk...


Generating QA for Section F:  39%|█████████████████████                                 | 7/18 [02:43<04:06, 22.43s/it]


  Chunk 8/18 from approved_document_F_Ventilation.pdf (7271 chars)
  Generating 11 questions (total so far: 76/200)
  Generated 11 pairs. Running total: 87/200
  Waiting 6s before next chunk...


Generating QA for Section F:  44%|████████████████████████                              | 8/18 [03:04<03:37, 21.76s/it]


  Chunk 9/18 from approved_document_F_Ventilation.pdf (7446 chars)
  Generating 11 questions (total so far: 87/200)
  Generated 11 pairs. Running total: 98/200
  Waiting 6s before next chunk...


Generating QA for Section F:  50%|███████████████████████████                           | 9/18 [03:25<03:13, 21.55s/it]


  Chunk 10/18 from approved_document_F_Ventilation.pdf (7356 chars)
  Generating 11 questions (total so far: 98/200)
  Generated 10 pairs. Running total: 108/200
  Waiting 6s before next chunk...


Generating QA for Section F:  56%|█████████████████████████████▍                       | 10/18 [03:52<03:07, 23.39s/it]


  Chunk 11/18 from approved_document_F_Ventilation.pdf (7487 chars)
  Generating 11 questions (total so far: 108/200)
  Generated 10 pairs. Running total: 118/200
  Waiting 6s before next chunk...


Generating QA for Section F:  61%|████████████████████████████████▍                    | 11/18 [04:16<02:44, 23.55s/it]


  Chunk 12/18 from approved_document_F_Ventilation.pdf (7436 chars)
  Generating 11 questions (total so far: 118/200)
  Generated 10 pairs. Running total: 128/200
  Waiting 6s before next chunk...


Generating QA for Section F:  67%|███████████████████████████████████▎                 | 12/18 [04:46<02:32, 25.41s/it]


  Chunk 13/18 from approved_document_F_Ventilation.pdf (7495 chars)
  Generating 11 questions (total so far: 128/200)
  Generated 11 pairs. Running total: 139/200
  Waiting 6s before next chunk...


Generating QA for Section F:  72%|██████████████████████████████████████▎              | 13/18 [05:06<01:59, 23.93s/it]


  Chunk 14/18 from approved_document_F_Ventilation.pdf (7363 chars)
  Generating 11 questions (total so far: 139/200)
  Generated 10 pairs. Running total: 149/200
  Waiting 6s before next chunk...


Generating QA for Section F:  78%|█████████████████████████████████████████▏           | 14/18 [05:35<01:41, 25.37s/it]


  Chunk 15/18 from approved_document_F_Ventilation.pdf (7356 chars)
  Generating 11 questions (total so far: 149/200)
  Generated 11 pairs. Running total: 160/200
  Waiting 6s before next chunk...


Generating QA for Section F:  83%|████████████████████████████████████████████▏        | 15/18 [06:00<01:15, 25.20s/it]


  Chunk 16/18 from approved_document_F_Ventilation.pdf (7461 chars)
  Generating 11 questions (total so far: 160/200)
  Generated 11 pairs. Running total: 171/200
  Waiting 6s before next chunk...


Generating QA for Section F:  89%|███████████████████████████████████████████████      | 16/18 [06:20<00:47, 23.64s/it]


  Chunk 17/18 from approved_document_F_Ventilation.pdf (6058 chars)
  Generating 11 questions (total so far: 171/200)
  Generated 10 pairs. Running total: 181/200
  Waiting 6s before next chunk...


Generating QA for Section F:  94%|██████████████████████████████████████████████████   | 17/18 [06:46<00:24, 24.35s/it]


  Chunk 18/18 from Part_F_FAQ_Approved Document_F_Ventilation.pdf (6380 chars)
  Generating 19 questions (total so far: 181/200)


Generating QA for Section F: 100%|█████████████████████████████████████████████████████| 18/18 [07:18<00:00, 24.35s/it]


  Generated 17 pairs. Running total: 198/200

Section F complete: 198 QA pairs
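
Several sections finish below the 200-pair target (B at 192, D at 156, F at 198), which matters for the stated aim of a dataset balanced across sections. As a hedged illustration, the "Section X complete" lines can be tallied from the captured output to list the gaps. The function `shortfalls` and the idea of post-processing the log are assumptions for illustration, not part of the notebook.

```python
import re

# Matches the summary line printed when a section finishes
SECTION_DONE = re.compile(r"Section (\w+) complete: (\d+) QA pairs")


def shortfalls(log_text: str, target: int = 200) -> dict[str, int]:
    """Map each section that ended below the target to its missing pair count."""
    result = {}
    for section, count in SECTION_DONE.findall(log_text):
        missing = target - int(count)
        if missing > 0:
            result[section] = missing
    return result


log = """Section B complete: 192 QA pairs
Section D complete: 156 QA pairs
Section E complete: 200 QA pairs
Section F complete: 198 QA pairs"""

print(shortfalls(log))  # {'B': 8, 'D': 44, 'F': 2}
```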

Processing Section G (2 PDFs)...
  Total chunks for Section G: 34
  Target: 200 questions (5 per chunk + 30 extra)


Generating QA for Section G:   0%|                                                              | 0/34 [00:00<?, ?it/s]


  Chunk 1/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5044 chars)
  Generating 5 questions (total so far: 0/200)
  Generated 5 pairs. Running total: 5/200
  Waiting 6s before next chunk...


Generating QA for Section G:   3%|█▌                                                    | 1/34 [00:18<10:21, 18.82s/it]


  Chunk 2/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6229 chars)
  Generating 5 questions (total so far: 5/200)
  Generated 5 pairs. Running total: 10/200
  Waiting 6s before next chunk...


Generating QA for Section G:   6%|███▏                                                  | 2/34 [00:33<08:44, 16.39s/it]


  Chunk 3/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6503 chars)
  Generating 5 questions (total so far: 10/200)
  Generated 5 pairs. Running total: 15/200
  Waiting 6s before next chunk...


Generating QA for Section G:   9%|████▊                                                 | 3/34 [00:54<09:31, 18.44s/it]


  Chunk 4/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5168 chars)
  Generating 5 questions (total so far: 15/200)
  Generated 5 pairs. Running total: 20/200
  Waiting 6s before next chunk...


Generating QA for Section G:  12%|██████▎                                               | 4/34 [01:12<09:07, 18.24s/it]


  Chunk 5/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4887 chars)
  Generating 6 questions (total so far: 20/200)
  Generated 6 pairs. Running total: 26/200
  Waiting 5s before next chunk...


Generating QA for Section G:  15%|███████▉                                              | 5/34 [01:31<08:58, 18.56s/it]


  Chunk 6/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5321 chars)
  Generating 6 questions (total so far: 26/200)
  Generated 6 pairs. Running total: 32/200
  Waiting 6s before next chunk...


Generating QA for Section G:  18%|█████████▌                                            | 6/34 [01:51<08:57, 19.20s/it]


  Chunk 7/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6154 chars)
  Generating 6 questions (total so far: 32/200)
  Generated 6 pairs. Running total: 38/200
  Waiting 6s before next chunk...


Generating QA for Section G:  21%|███████████                                           | 7/34 [02:11<08:41, 19.30s/it]


  Chunk 8/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6198 chars)
  Generating 6 questions (total so far: 38/200)
  Generated 6 pairs. Running total: 44/200
  Waiting 6s before next chunk...


Generating QA for Section G:  24%|████████████▋                                         | 8/34 [02:30<08:20, 19.27s/it]


  Chunk 9/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6989 chars)
  Generating 6 questions (total so far: 44/200)
  Generated 6 pairs. Running total: 50/200
  Waiting 6s before next chunk...


Generating QA for Section G:  26%|██████████████▎                                       | 9/34 [02:50<08:06, 19.45s/it]


  Chunk 10/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5039 chars)
  Generating 6 questions (total so far: 50/200)
  Generated 6 pairs. Running total: 56/200
  Waiting 6s before next chunk...


Generating QA for Section G:  29%|███████████████▌                                     | 10/34 [03:08<07:38, 19.09s/it]


  Chunk 11/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4787 chars)
  Generating 6 questions (total so far: 56/200)
  Generated 6 pairs. Running total: 62/200
  Waiting 5s before next chunk...


Generating QA for Section G:  32%|█████████████████▏                                   | 11/34 [03:25<06:59, 18.26s/it]


  Chunk 12/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5057 chars)
  Generating 6 questions (total so far: 62/200)
  Generated 6 pairs. Running total: 68/200
  Waiting 6s before next chunk...


Generating QA for Section G:  35%|██████████████████▋                                  | 12/34 [03:40<06:19, 17.25s/it]


  Chunk 13/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (3025 chars)
  Generating 6 questions (total so far: 68/200)
  Generated 6 pairs. Running total: 74/200
  Waiting 5s before next chunk...


Generating QA for Section G:  38%|████████████████████▎                                | 13/34 [03:57<06:05, 17.39s/it]


  Chunk 14/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5299 chars)
  Generating 6 questions (total so far: 74/200)
  Generated 6 pairs. Running total: 80/200
  Waiting 6s before next chunk...


Generating QA for Section G:  41%|█████████████████████▊                               | 14/34 [04:18<06:09, 18.48s/it]


  Chunk 15/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6311 chars)
  Generating 6 questions (total so far: 80/200)
  Generated 6 pairs. Running total: 86/200
  Waiting 6s before next chunk...


Generating QA for Section G:  44%|███████████████████████▍                             | 15/34 [04:38<05:57, 18.82s/it]


  Chunk 16/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4203 chars)
  Generating 6 questions (total so far: 86/200)
  Generated 6 pairs. Running total: 92/200
  Waiting 5s before next chunk...


Generating QA for Section G:  47%|████████████████████████▉                            | 16/34 [04:56<05:33, 18.52s/it]


  Chunk 17/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (7344 chars)
  Generating 6 questions (total so far: 92/200)
  Generated 6 pairs. Running total: 98/200
  Waiting 6s before next chunk...


Generating QA for Section G:  50%|██████████████████████████▌                          | 17/34 [05:13<05:09, 18.21s/it]


  Chunk 18/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (7102 chars)
  Generating 6 questions (total so far: 98/200)
  Generated 6 pairs. Running total: 104/200
  Waiting 6s before next chunk...


Generating QA for Section G:  53%|████████████████████████████                         | 18/34 [05:36<05:15, 19.74s/it]


  Chunk 19/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (7434 chars)
  Generating 6 questions (total so far: 104/200)
  Generated 6 pairs. Running total: 110/200
  Waiting 6s before next chunk...


Generating QA for Section G:  56%|█████████████████████████████▌                       | 19/34 [05:59<05:10, 20.68s/it]


  Chunk 20/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4172 chars)
  Generating 6 questions (total so far: 110/200)
  Generated 6 pairs. Running total: 116/200
  Waiting 5s before next chunk...


Generating QA for Section G:  59%|███████████████████████████████▏                     | 20/34 [06:13<04:20, 18.62s/it]


  Chunk 21/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4533 chars)
  Generating 6 questions (total so far: 116/200)
  Generated 6 pairs. Running total: 122/200
  Waiting 5s before next chunk...


Generating QA for Section G:  62%|████████████████████████████████▋                    | 21/34 [06:25<03:37, 16.72s/it]


  Chunk 22/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5189 chars)
  Generating 6 questions (total so far: 122/200)
  Generated 6 pairs. Running total: 128/200
  Waiting 6s before next chunk...


Generating QA for Section G:  65%|██████████████████████████████████▎                  | 22/34 [06:38<03:07, 15.60s/it]


  Chunk 23/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6272 chars)
  Generating 6 questions (total so far: 128/200)
  Generated 6 pairs. Running total: 134/200
  Waiting 6s before next chunk...


Generating QA for Section G:  68%|███████████████████████████████████▊                 | 23/34 [06:54<02:49, 15.44s/it]


  Chunk 24/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6334 chars)
  Generating 6 questions (total so far: 134/200)
  Generated 6 pairs. Running total: 140/200
  Waiting 6s before next chunk...


Generating QA for Section G:  71%|█████████████████████████████████████▍               | 24/34 [07:23<03:16, 19.63s/it]


  Chunk 25/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (6792 chars)
  Generating 6 questions (total so far: 140/200)
  Generated 6 pairs. Running total: 146/200
  Waiting 6s before next chunk...


Generating QA for Section G:  74%|██████████████████████████████████████▉              | 25/34 [07:48<03:12, 21.39s/it]


  Chunk 26/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4388 chars)
  Generating 6 questions (total so far: 146/200)
  Generated 6 pairs. Running total: 152/200
  Waiting 5s before next chunk...


Generating QA for Section G:  76%|████████████████████████████████████████▌            | 26/34 [08:08<02:46, 20.83s/it]


  Chunk 27/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (3747 chars)
  Generating 6 questions (total so far: 152/200)
  Generated 6 pairs. Running total: 158/200
  Waiting 5s before next chunk...


Generating QA for Section G:  79%|██████████████████████████████████████████           | 27/34 [08:26<02:19, 19.93s/it]


  Chunk 28/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5028 chars)
  Generating 6 questions (total so far: 158/200)
  Generated 6 pairs. Running total: 164/200
  Waiting 6s before next chunk...


Generating QA for Section G:  82%|███████████████████████████████████████████▋         | 28/34 [08:41<01:51, 18.50s/it]


  Chunk 29/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5415 chars)
  Generating 6 questions (total so far: 164/200)
  Generated 6 pairs. Running total: 170/200
  Waiting 6s before next chunk...


Generating QA for Section G:  85%|█████████████████████████████████████████████▏       | 29/34 [09:00<01:34, 18.82s/it]


  Chunk 30/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (5611 chars)
  Generating 6 questions (total so far: 170/200)
  Generated 6 pairs. Running total: 176/200
  Waiting 6s before next chunk...


Generating QA for Section G:  88%|██████████████████████████████████████████████▊      | 30/34 [09:22<01:18, 19.52s/it]


  Chunk 31/34 from approved_document_G_Sanitation, hot water safety and water efficiency.pdf (3322 chars)
  Generating 5 questions (total so far: 176/200)
  Generated 5 pairs. Running total: 181/200
  Waiting 5s before next chunk...


Generating QA for Section G:  91%|████████████████████████████████████████████████▎    | 31/34 [09:35<00:52, 17.60s/it]


  Chunk 32/34 from Part_G_FAQ_approved_document_G_Sanitation, hot water safety and water efficiency.pdf (7356 chars)
  Generating 5 questions (total so far: 181/200)
  Generated 5 pairs. Running total: 186/200
  Waiting 6s before next chunk...


Generating QA for Section G:  94%|█████████████████████████████████████████████████▉   | 32/34 [09:56<00:37, 18.68s/it]


  Chunk 33/34 from Part_G_FAQ_approved_document_G_Sanitation, hot water safety and water efficiency.pdf (7412 chars)
  Generating 5 questions (total so far: 186/200)
  Generated 5 pairs. Running total: 191/200
  Waiting 6s before next chunk...


Generating QA for Section G:  97%|███████████████████████████████████████████████████▍ | 33/34 [10:15<00:18, 18.68s/it]


  Chunk 34/34 from Part_G_FAQ_approved_document_G_Sanitation, hot water safety and water efficiency.pdf (4266 chars)
  Generating 9 questions (total so far: 191/200)


Generating QA for Section G: 100%|█████████████████████████████████████████████████████| 34/34 [10:37<00:00, 18.76s/it]


  Generated 9 pairs. Running total: 200/200

Section G complete: 200 QA pairs

Processing Section H (1 PDFs)...
  Total chunks for Section H: 33
  Target: 200 questions (6 per chunk + 2 extra)


Generating QA for Section H:   0%|                                                              | 0/33 [00:00<?, ?it/s]


  Chunk 1/33 from approved_document_H_Drainage and waste disposal.pdf (6839 chars)
  Generating 6 questions (total so far: 0/200)
  Generated 6 pairs. Running total: 6/200
  Waiting 6s before next chunk...


Generating QA for Section H:   3%|█▋                                                    | 1/33 [00:22<11:49, 22.16s/it]


  Chunk 2/33 from approved_document_H_Drainage and waste disposal.pdf (1973 chars)
  Generating 6 questions (total so far: 6/200)
  Generated 6 pairs. Running total: 12/200
  Waiting 5s before next chunk...


Generating QA for Section H:   6%|███▎                                                  | 2/33 [00:42<10:48, 20.92s/it]


  Chunk 3/33 from approved_document_H_Drainage and waste disposal.pdf (7444 chars)
  Generating 6 questions (total so far: 12/200)
  Generated 6 pairs. Running total: 18/200
  Waiting 6s before next chunk...


Generating QA for Section H:   9%|████▉                                                 | 3/33 [01:01<10:08, 20.28s/it]


  Chunk 4/33 from approved_document_H_Drainage and waste disposal.pdf (5454 chars)
  Generating 6 questions (total so far: 18/200)
  Generated 6 pairs. Running total: 24/200
  Waiting 6s before next chunk...


Generating QA for Section H:  12%|██████▌                                               | 4/33 [01:21<09:39, 19.98s/it]


  Chunk 5/33 from approved_document_H_Drainage and waste disposal.pdf (2102 chars)
  Generating 6 questions (total so far: 24/200)
  Generated 6 pairs. Running total: 30/200
  Waiting 5s before next chunk...


Generating QA for Section H:  15%|████████▏                                             | 5/33 [01:40<09:12, 19.73s/it]


  Chunk 6/33 from approved_document_H_Drainage and waste disposal.pdf (7311 chars)
  Generating 6 questions (total so far: 30/200)
  Generated 6 pairs. Running total: 36/200
  Waiting 6s before next chunk...


Generating QA for Section H:  18%|█████████▊                                            | 6/33 [01:55<08:12, 18.23s/it]


  Chunk 7/33 from approved_document_H_Drainage and waste disposal.pdf (7437 chars)
  Generating 6 questions (total so far: 36/200)
  Generated 6 pairs. Running total: 42/200
  Waiting 6s before next chunk...


Generating QA for Section H:  21%|███████████▍                                          | 7/33 [02:14<07:55, 18.29s/it]


  Chunk 8/33 from approved_document_H_Drainage and waste disposal.pdf (6058 chars)
  Generating 6 questions (total so far: 42/200)
  Generated 6 pairs. Running total: 48/200
  Waiting 6s before next chunk...


Generating QA for Section H:  24%|█████████████                                         | 8/33 [02:33<07:41, 18.47s/it]


  Chunk 9/33 from approved_document_H_Drainage and waste disposal.pdf (6305 chars)
  Generating 6 questions (total so far: 48/200)
  Generated 6 pairs. Running total: 54/200
  Waiting 6s before next chunk...


Generating QA for Section H:  27%|██████████████▋                                       | 9/33 [02:51<07:23, 18.47s/it]


  Chunk 10/33 from approved_document_H_Drainage and waste disposal.pdf (7033 chars)
  Generating 6 questions (total so far: 54/200)
  Generated 6 pairs. Running total: 60/200
  Waiting 6s before next chunk...


Generating QA for Section H:  30%|████████████████                                     | 10/33 [03:12<07:22, 19.26s/it]


  Chunk 11/33 from approved_document_H_Drainage and waste disposal.pdf (5503 chars)
  Generating 6 questions (total so far: 60/200)
  Generated 6 pairs. Running total: 66/200
  Waiting 6s before next chunk...


Generating QA for Section H:  33%|█████████████████▋                                   | 11/33 [03:30<06:55, 18.87s/it]


  Chunk 12/33 from approved_document_H_Drainage and waste disposal.pdf (6621 chars)
  Generating 6 questions (total so far: 66/200)
  Generated 6 pairs. Running total: 72/200
  Waiting 6s before next chunk...


Generating QA for Section H:  36%|███████████████████▎                                 | 12/33 [03:48<06:28, 18.48s/it]


  Chunk 13/33 from approved_document_H_Drainage and waste disposal.pdf (7447 chars)
  Generating 6 questions (total so far: 72/200)
  Generated 6 pairs. Running total: 78/200
  Waiting 6s before next chunk...


Generating QA for Section H:  39%|████████████████████▉                                | 13/33 [04:07<06:16, 18.83s/it]


  Chunk 14/33 from approved_document_H_Drainage and waste disposal.pdf (7435 chars)
  Generating 6 questions (total so far: 78/200)
  Generated 6 pairs. Running total: 84/200
  Waiting 6s before next chunk...


Generating QA for Section H:  42%|██████████████████████▍                              | 14/33 [04:30<06:20, 20.03s/it]


  Chunk 15/33 from approved_document_H_Drainage and waste disposal.pdf (2930 chars)
  Generating 6 questions (total so far: 84/200)
  Generated 6 pairs. Running total: 90/200
  Waiting 5s before next chunk...


Generating QA for Section H:  45%|████████████████████████                             | 15/33 [04:46<05:37, 18.76s/it]


  Chunk 16/33 from approved_document_H_Drainage and waste disposal.pdf (5240 chars)
  Generating 6 questions (total so far: 90/200)
  Generated 6 pairs. Running total: 96/200
  Waiting 6s before next chunk...


Generating QA for Section H:  48%|█████████████████████████▋                           | 16/33 [05:02<05:06, 18.05s/it]


  Chunk 17/33 from approved_document_H_Drainage and waste disposal.pdf (2493 chars)
  Generating 6 questions (total so far: 96/200)
  Generated 6 pairs. Running total: 102/200
  Waiting 5s before next chunk...


Generating QA for Section H:  52%|███████████████████████████▎                         | 17/33 [05:18<04:37, 17.35s/it]


  Chunk 18/33 from approved_document_H_Drainage and waste disposal.pdf (7478 chars)
  Generating 6 questions (total so far: 102/200)
  Generated 6 pairs. Running total: 108/200
  Waiting 6s before next chunk...


Generating QA for Section H:  55%|████████████████████████████▉                        | 18/33 [05:39<04:34, 18.28s/it]


  Chunk 19/33 from approved_document_H_Drainage and waste disposal.pdf (4649 chars)
  Generating 6 questions (total so far: 108/200)
  Generated 6 pairs. Running total: 114/200
  Waiting 5s before next chunk...


Generating QA for Section H:  58%|██████████████████████████████▌                      | 19/33 [05:53<04:00, 17.17s/it]


  Chunk 20/33 from approved_document_H_Drainage and waste disposal.pdf (7297 chars)
  Generating 6 questions (total so far: 114/200)
JSON parsing error: Expecting property name enclosed in double quotes: line 6 column 5 (char 871)
Attempting to clean JSON and retry...
  Generated 6 pairs. Running total: 120/200
  Waiting 6s before next chunk...


Generating QA for Section H:  61%|████████████████████████████████                     | 20/33 [06:18<04:13, 19.49s/it]


  Chunk 21/33 from approved_document_H_Drainage and waste disposal.pdf (2249 chars)
  Generating 6 questions (total so far: 120/200)
  Generated 6 pairs. Running total: 126/200
  Waiting 5s before next chunk...


Generating QA for Section H:  64%|█████████████████████████████████▋                   | 21/33 [06:33<03:38, 18.19s/it]


  Chunk 22/33 from approved_document_H_Drainage and waste disposal.pdf (7398 chars)
  Generating 6 questions (total so far: 126/200)
  Generated 6 pairs. Running total: 132/200
  Waiting 6s before next chunk...


Generating QA for Section H:  67%|███████████████████████████████████▎                 | 22/33 [06:52<03:20, 18.27s/it]


  Chunk 23/33 from approved_document_H_Drainage and waste disposal.pdf (6866 chars)
  Generating 6 questions (total so far: 132/200)
  Generated 6 pairs. Running total: 138/200
  Waiting 6s before next chunk...


Generating QA for Section H:  70%|████████████████████████████████████▉                | 23/33 [07:10<03:03, 18.31s/it]


  Chunk 24/33 from approved_document_H_Drainage and waste disposal.pdf (2031 chars)
  Generating 6 questions (total so far: 138/200)
  Generated 6 pairs. Running total: 144/200
  Waiting 5s before next chunk...


Generating QA for Section H:  73%|██████████████████████████████████████▌              | 24/33 [07:25<02:36, 17.37s/it]


  Chunk 25/33 from approved_document_H_Drainage and waste disposal.pdf (7470 chars)
  Generating 6 questions (total so far: 144/200)
  Generated 6 pairs. Running total: 150/200
  Waiting 6s before next chunk...


Generating QA for Section H:  76%|████████████████████████████████████████▏            | 25/33 [07:45<02:23, 17.96s/it]


  Chunk 26/33 from approved_document_H_Drainage and waste disposal.pdf (4303 chars)
  Generating 6 questions (total so far: 150/200)
  Generated 6 pairs. Running total: 156/200
  Waiting 5s before next chunk...


Generating QA for Section H:  79%|█████████████████████████████████████████▊           | 26/33 [08:02<02:05, 17.89s/it]


  Chunk 27/33 from approved_document_H_Drainage and waste disposal.pdf (4249 chars)
  Generating 6 questions (total so far: 156/200)
  Generated 6 pairs. Running total: 162/200
  Waiting 5s before next chunk...


Generating QA for Section H:  82%|███████████████████████████████████████████▎         | 27/33 [08:18<01:44, 17.38s/it]


  Chunk 28/33 from approved_document_H_Drainage and waste disposal.pdf (4203 chars)
  Generating 6 questions (total so far: 162/200)
  Generated 6 pairs. Running total: 168/200
  Waiting 5s before next chunk...


Generating QA for Section H:  85%|████████████████████████████████████████████▉        | 28/33 [08:39<01:32, 18.44s/it]


  Chunk 29/33 from approved_document_H_Drainage and waste disposal.pdf (6198 chars)
  Generating 6 questions (total so far: 168/200)
  Generated 6 pairs. Running total: 174/200
  Waiting 6s before next chunk...


Generating QA for Section H:  88%|██████████████████████████████████████████████▌      | 29/33 [08:55<01:10, 17.73s/it]


  Chunk 30/33 from approved_document_H_Drainage and waste disposal.pdf (7494 chars)
  Generating 6 questions (total so far: 174/200)
  Generated 6 pairs. Running total: 180/200
  Waiting 6s before next chunk...


Generating QA for Section H:  91%|████████████████████████████████████████████████▏    | 30/33 [09:16<00:55, 18.49s/it]


  Chunk 31/33 from approved_document_H_Drainage and waste disposal.pdf (7498 chars)
  Generating 6 questions (total so far: 180/200)
  Generated 6 pairs. Running total: 186/200
  Waiting 6s before next chunk...


Generating QA for Section H:  94%|█████████████████████████████████████████████████▊   | 31/33 [09:30<00:34, 17.37s/it]


  Chunk 32/33 from approved_document_H_Drainage and waste disposal.pdf (7361 chars)
  Generating 6 questions (total so far: 186/200)
  Generated 6 pairs. Running total: 192/200
  Waiting 6s before next chunk...


Generating QA for Section H:  97%|███████████████████████████████████████████████████▍ | 32/33 [09:48<00:17, 17.32s/it]


  Chunk 33/33 from approved_document_H_Drainage and waste disposal.pdf (3672 chars)
  Generating 8 questions (total so far: 192/200)


Generating QA for Section H: 100%|█████████████████████████████████████████████████████| 33/33 [10:14<00:00, 18.62s/it]


  Generated 8 pairs. Running total: 200/200

Section H complete: 200 QA pairs
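
The `Target:` lines in this log read as an integer division of the section target across the chunks: 200 questions over 33 chunks gives 6 per chunk with 2 left over, and the final chunk requests whatever is still missing from the running total (8 for Section H). A minimal sketch of that arithmetic follows, assuming the final chunk absorbs the remainder. The names `plan_questions` and `questions_for_chunk` are hypothetical, and the notebook's own allocation code may differ (in some sections a few mid-run chunks request one more question than the base).

```python
def plan_questions(target: int, n_chunks: int) -> tuple[int, int]:
    """Return (base questions per chunk, leftover questions)."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    return divmod(target, n_chunks)


def questions_for_chunk(chunk_idx: int, n_chunks: int, target: int,
                        running_total: int) -> int:
    """Questions to request for a 1-indexed chunk (assumed policy)."""
    base, _ = plan_questions(target, n_chunks)
    remaining = max(target - running_total, 0)
    if chunk_idx == n_chunks:
        # Assumption: the last chunk asks for everything still missing.
        return remaining
    # Never ask for more than is still needed to reach the target.
    return min(base, remaining)


base, extra = plan_questions(200, 33)
print(f"Target: 200 questions ({base} per chunk + {extra} extra)")
print(questions_for_chunk(33, 33, 200, running_total=192))
```

Under this policy the section only reaches its target if the model returns every pair requested in the final chunk, so a short final response leaves the section below 200.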

Processing Section J (1 PDFs)...
  Total chunks for Section J: 36
  Target: 200 questions (5 per chunk + 20 extra)


Generating QA for Section J:   0%|                                                              | 0/36 [00:00<?, ?it/s]


  Chunk 1/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (472 chars)
  Generating 5 questions (total so far: 0/200)
  Generated 5 pairs. Running total: 5/200
  Waiting 5s before next chunk...


Generating QA for Section J:   3%|█▌                                                    | 1/36 [00:14<08:22, 14.36s/it]


  Chunk 2/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (2686 chars)
  Generating 5 questions (total so far: 5/200)
  Generated 5 pairs. Running total: 10/200
  Waiting 5s before next chunk...


Generating QA for Section J:   6%|███                                                   | 2/36 [00:26<07:27, 13.15s/it]


  Chunk 3/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7474 chars)
  Generating 5 questions (total so far: 10/200)
  Generated 5 pairs. Running total: 15/200
  Waiting 6s before next chunk...


Generating QA for Section J:   8%|████▌                                                 | 3/36 [00:44<08:18, 15.10s/it]


  Chunk 4/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7439 chars)
  Generating 5 questions (total so far: 15/200)
  Generated 5 pairs. Running total: 20/200
  Waiting 6s before next chunk...


Generating QA for Section J:  11%|██████                                                | 4/36 [01:00<08:24, 15.77s/it]


  Chunk 5/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7295 chars)
  Generating 5 questions (total so far: 20/200)
  Generated 5 pairs. Running total: 25/200
  Waiting 6s before next chunk...


Generating QA for Section J:  14%|███████▌                                              | 5/36 [01:16<08:11, 15.85s/it]


  Chunk 6/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (5880 chars)
  Generating 5 questions (total so far: 25/200)
  Generated 5 pairs. Running total: 30/200
  Waiting 6s before next chunk...


Generating QA for Section J:  17%|█████████                                             | 6/36 [01:31<07:39, 15.30s/it]


  Chunk 7/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7330 chars)
  Generating 5 questions (total so far: 30/200)
  Generated 5 pairs. Running total: 35/200
  Waiting 6s before next chunk...


Generating QA for Section J:  19%|██████████▌                                           | 7/36 [01:49<07:54, 16.37s/it]


  Chunk 8/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7458 chars)
  Generating 5 questions (total so far: 35/200)
  Generated 5 pairs. Running total: 40/200
  Waiting 6s before next chunk...


Generating QA for Section J:  22%|████████████                                          | 8/36 [02:07<07:49, 16.77s/it]


  Chunk 9/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7208 chars)
  Generating 5 questions (total so far: 40/200)
  Generated 5 pairs. Running total: 45/200
  Waiting 6s before next chunk...


Generating QA for Section J:  25%|█████████████▌                                        | 9/36 [02:23<07:26, 16.55s/it]


  Chunk 10/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7262 chars)
  Generating 5 questions (total so far: 45/200)
  Generated 5 pairs. Running total: 50/200
  Waiting 6s before next chunk...


Generating QA for Section J:  28%|██████████████▋                                      | 10/36 [02:37<06:50, 15.79s/it]


  Chunk 11/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7488 chars)
  Generating 5 questions (total so far: 50/200)
  Generated 5 pairs. Running total: 55/200
  Waiting 6s before next chunk...


Generating QA for Section J:  31%|████████████████▏                                    | 11/36 [02:53<06:36, 15.85s/it]


  Chunk 12/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7285 chars)
  Generating 5 questions (total so far: 55/200)
  Generated 5 pairs. Running total: 60/200
  Waiting 6s before next chunk...


Generating QA for Section J:  33%|█████████████████▋                                   | 12/36 [03:08<06:16, 15.70s/it]


  Chunk 13/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7171 chars)
  Generating 5 questions (total so far: 60/200)
  Generated 5 pairs. Running total: 65/200
  Waiting 6s before next chunk...


Generating QA for Section J:  36%|███████████████████▏                                 | 13/36 [03:27<06:22, 16.63s/it]


  Chunk 14/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7412 chars)
  Generating 5 questions (total so far: 65/200)
  Generated 5 pairs. Running total: 70/200
  Waiting 6s before next chunk...


Generating QA for Section J:  39%|████████████████████▌                                | 14/36 [03:44<06:10, 16.83s/it]


  Chunk 15/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7388 chars)
  Generating 5 questions (total so far: 70/200)
  Generated 5 pairs. Running total: 75/200
  Waiting 6s before next chunk...


Generating QA for Section J:  42%|██████████████████████                               | 15/36 [03:59<05:42, 16.31s/it]


  Chunk 16/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7283 chars)
  Generating 5 questions (total so far: 75/200)
JSON parsing error: Expecting property name enclosed in double quotes: line 6 column 5 (char 863)
Attempting to clean JSON and retry...
  Generated 5 pairs. Running total: 80/200
  Waiting 6s before next chunk...


Generating QA for Section J:  44%|███████████████████████▌                             | 16/36 [04:17<05:33, 16.66s/it]


  Chunk 17/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (6947 chars)
  Generating 6 questions (total so far: 80/200)
  Generated 6 pairs. Running total: 86/200
  Waiting 6s before next chunk...


Generating QA for Section J:  47%|█████████████████████████                            | 17/36 [04:34<05:16, 16.68s/it]


  Chunk 18/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7490 chars)
  Generating 6 questions (total so far: 86/200)
  Generated 6 pairs. Running total: 92/200
  Waiting 6s before next chunk...


Generating QA for Section J:  50%|██████████████████████████▌                          | 18/36 [04:55<05:22, 17.94s/it]


  Chunk 19/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (5463 chars)
  Generating 6 questions (total so far: 92/200)
  Generated 6 pairs. Running total: 98/200
  Waiting 6s before next chunk...


Generating QA for Section J:  53%|███████████████████████████▉                         | 19/36 [05:11<04:57, 17.51s/it]


  Chunk 20/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7391 chars)
  Generating 6 questions (total so far: 98/200)
  Generated 6 pairs. Running total: 104/200
  Waiting 6s before next chunk...


Generating QA for Section J:  56%|█████████████████████████████▍                       | 20/36 [05:29<04:39, 17.50s/it]


  Chunk 21/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7393 chars)
  Generating 5 questions (total so far: 104/200)
  Generated 5 pairs. Running total: 109/200
  Waiting 6s before next chunk...


Generating QA for Section J:  58%|██████████████████████████████▉                      | 21/36 [05:48<04:30, 18.01s/it]


  Chunk 22/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7391 chars)
  Generating 5 questions (total so far: 109/200)
  Generated 5 pairs. Running total: 114/200
  Waiting 6s before next chunk...


Generating QA for Section J:  61%|████████████████████████████████▍                    | 22/36 [06:05<04:09, 17.82s/it]


  Chunk 23/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (1513 chars)
  Generating 5 questions (total so far: 114/200)
  Generated 5 pairs. Running total: 119/200
  Waiting 5s before next chunk...


Generating QA for Section J:  64%|█████████████████████████████████▊                   | 23/36 [06:18<03:31, 16.24s/it]


  Chunk 24/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7450 chars)
  Generating 5 questions (total so far: 119/200)
  Generated 5 pairs. Running total: 124/200
  Waiting 6s before next chunk...


Generating QA for Section J:  67%|███████████████████████████████████▎                 | 24/36 [06:36<03:24, 17.01s/it]


  Chunk 25/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7467 chars)
  Generating 5 questions (total so far: 124/200)
  Generated 5 pairs. Running total: 129/200
  Waiting 6s before next chunk...


Generating QA for Section J:  69%|████████████████████████████████████▊                | 25/36 [06:52<03:03, 16.64s/it]


  Chunk 26/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7377 chars)
  Generating 5 questions (total so far: 129/200)
  Generated 5 pairs. Running total: 134/200
  Waiting 6s before next chunk...


Generating QA for Section J:  72%|██████████████████████████████████████▎              | 26/36 [07:09<02:48, 16.81s/it]


  Chunk 27/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7478 chars)
  Generating 5 questions (total so far: 134/200)
  Generated 5 pairs. Running total: 139/200
  Waiting 6s before next chunk...


Generating QA for Section J:  75%|███████████████████████████████████████▊             | 27/36 [07:28<02:35, 17.33s/it]


  Chunk 28/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7496 chars)
  Generating 5 questions (total so far: 139/200)
  Generated 5 pairs. Running total: 144/200
  Waiting 6s before next chunk...


Generating QA for Section J:  78%|█████████████████████████████████████████▏           | 28/36 [07:54<02:40, 20.03s/it]


  Chunk 29/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7491 chars)
  Generating 5 questions (total so far: 144/200)
  Generated 5 pairs. Running total: 149/200
  Waiting 6s before next chunk...


Generating QA for Section J:  81%|██████████████████████████████████████████▋          | 29/36 [08:14<02:18, 19.84s/it]


  Chunk 30/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7497 chars)
  Generating 5 questions (total so far: 149/200)
  Generated 5 pairs. Running total: 154/200
  Waiting 6s before next chunk...


Generating QA for Section J:  83%|████████████████████████████████████████████▏        | 30/36 [08:37<02:06, 21.02s/it]


  Chunk 31/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7497 chars)
  Generating 5 questions (total so far: 154/200)
  Generated 5 pairs. Running total: 159/200
  Waiting 6s before next chunk...


Generating QA for Section J:  86%|█████████████████████████████████████████████▋       | 31/36 [08:56<01:42, 20.40s/it]


  Chunk 32/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (3314 chars)
  Generating 5 questions (total so far: 159/200)
  Generated 5 pairs. Running total: 164/200
  Waiting 5s before next chunk...


Generating QA for Section J:  89%|███████████████████████████████████████████████      | 32/36 [09:13<01:17, 19.38s/it]


  Chunk 33/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7357 chars)
  Generating 5 questions (total so far: 164/200)
  Generated 5 pairs. Running total: 169/200
  Waiting 6s before next chunk...


Generating QA for Section J:  92%|████████████████████████████████████████████████▌    | 33/36 [09:29<00:54, 18.11s/it]


  Chunk 34/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (7457 chars)
  Generating 5 questions (total so far: 169/200)
  Generated 5 pairs. Running total: 174/200
  Waiting 6s before next chunk...


Generating QA for Section J:  94%|██████████████████████████████████████████████████   | 34/36 [09:46<00:35, 17.75s/it]


  Chunk 35/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (14134 chars)
  Generating 5 questions (total so far: 174/200)
  Generated 5 pairs. Running total: 179/200
  Waiting 7s before next chunk...


Generating QA for Section J:  97%|███████████████████████████████████████████████████▌ | 35/36 [10:05<00:18, 18.29s/it]


  Chunk 36/36 from approved_document_J_Combustion appliances and fuel storage systems.pdf (3120 chars)
  Generating 21 questions (total so far: 179/200)


Generating QA for Section J: 100%|█████████████████████████████████████████████████████| 36/36 [10:30<00:00, 17.53s/it]


  Generated 19 pairs. Running total: 198/200

Section J complete: 198 QA pairs
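
The Section J log includes a `JSON parsing error` followed by `Attempting to clean JSON and retry...`, after which the chunk still yields its 5 pairs. The clean-up code is not shown in this output, so the following is only a sketch of one common approach, assuming the failure comes from a Markdown code fence around the response or from a trailing comma. The function name `parse_qa_json` is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly


def parse_qa_json(raw: str):
    """Parse a model response as JSON, retrying once after light clean-up."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        print(f"JSON parsing error: {err}")
        print("Attempting to clean JSON and retry...")

    cleaned = raw.strip()
    # Assumption: the model may wrap its answer in a Markdown code fence.
    cleaned = re.sub(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$", "", cleaned)
    # Assumption: trailing commas before } or ] are the other likely culprit.
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    # A second failure is raised to the caller rather than hidden.
    return json.loads(cleaned)


pairs = parse_qa_json('[\n  {"question": "Q1", "answer": "A1",},\n]')
print(len(pairs))
```

The trailing-comma pattern is deliberately naive: it would also change a comma followed by a closing bracket inside a string value, which is why it runs only after strict parsing has failed.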

Processing Section K (1 PDFs)...
  Total chunks for Section K: 15
  Target: 200 questions (13 per chunk + 5 extra)


Generating QA for Section K:   0%|                                                              | 0/15 [00:00<?, ?it/s]


  Chunk 1/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (7487 chars)
  Generating 13 questions (total so far: 0/200)
  Generated 12 pairs. Running total: 12/200
  Waiting 6s before next chunk...


Generating QA for Section K:   7%|███▌                                                  | 1/15 [00:23<05:24, 23.21s/it]


  Chunk 2/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (7119 chars)
  Generating 13 questions (total so far: 12/200)
  Generated 13 pairs. Running total: 25/200
  Waiting 6s before next chunk...


Generating QA for Section K:  13%|███████▏                                              | 2/15 [00:51<05:39, 26.10s/it]


  Chunk 3/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (7257 chars)
  Generating 13 questions (total so far: 25/200)
  Generated 13 pairs. Running total: 38/200
  Waiting 6s before next chunk...


Generating QA for Section K:  20%|██████████▊                                           | 3/15 [01:19<05:23, 26.93s/it]


  Chunk 4/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (5627 chars)
  Generating 13 questions (total so far: 38/200)
  Generated 13 pairs. Running total: 51/200
  Waiting 6s before next chunk...


Generating QA for Section K:  27%|██████████████▍                                       | 4/15 [01:44<04:50, 26.41s/it]


  Chunk 5/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (6410 chars)
  Generating 13 questions (total so far: 51/200)
  Generated 13 pairs. Running total: 64/200
  Waiting 6s before next chunk...


Generating QA for Section K:  33%|██████████████████                                    | 5/15 [02:13<04:31, 27.15s/it]


  Chunk 6/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (6912 chars)
  Generating 13 questions (total so far: 64/200)
  Generated 12 pairs. Running total: 76/200
  Waiting 6s before next chunk...


Generating QA for Section K:  40%|█████████████████████▌                                | 6/15 [02:34<03:45, 25.07s/it]


  Chunk 7/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (5155 chars)
  Generating 13 questions (total so far: 76/200)
  Generated 13 pairs. Running total: 89/200
  Waiting 6s before next chunk...


Generating QA for Section K:  47%|█████████████████████████▏                            | 7/15 [03:00<03:22, 25.34s/it]


  Chunk 8/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (6392 chars)
  Generating 13 questions (total so far: 89/200)
  Generated 13 pairs. Running total: 102/200
  Waiting 6s before next chunk...


Generating QA for Section K:  53%|████████████████████████████▊                         | 8/15 [03:37<03:24, 29.20s/it]


  Chunk 9/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (6269 chars)
  Generating 13 questions (total so far: 102/200)
  Generated 13 pairs. Running total: 115/200
  Waiting 6s before next chunk...


Generating QA for Section K:  60%|████████████████████████████████▍                     | 9/15 [04:05<02:53, 28.84s/it]


  Chunk 10/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (6835 chars)
  Generating 13 questions (total so far: 115/200)
  Generated 12 pairs. Running total: 127/200
  Waiting 6s before next chunk...


Generating QA for Section K:  67%|███████████████████████████████████▎                 | 10/15 [04:29<02:15, 27.16s/it]


  Chunk 11/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (7384 chars)
  Generating 13 questions (total so far: 127/200)
  Generated 13 pairs. Running total: 140/200
  Waiting 6s before next chunk...


Generating QA for Section K:  73%|██████████████████████████████████████▊              | 11/15 [04:58<01:51, 27.93s/it]


  Chunk 12/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (5129 chars)
  Generating 13 questions (total so far: 140/200)
  Generated 12 pairs. Running total: 152/200
  Waiting 6s before next chunk...


Generating QA for Section K:  80%|██████████████████████████████████████████▍          | 12/15 [05:20<01:18, 26.10s/it]


  Chunk 13/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (5194 chars)
  Generating 13 questions (total so far: 152/200)
  Generated 13 pairs. Running total: 165/200
  Waiting 6s before next chunk...


Generating QA for Section K:  87%|█████████████████████████████████████████████▉       | 13/15 [05:46<00:52, 26.09s/it]


  Chunk 14/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (7318 chars)
  Generating 13 questions (total so far: 165/200)
  Generated 13 pairs. Running total: 178/200
  Waiting 6s before next chunk...


Generating QA for Section K:  93%|█████████████████████████████████████████████████▍   | 14/15 [06:14<00:26, 26.42s/it]


  Chunk 15/15 from Approved_Document_K_Protection from falling, collision and impact.pdf (1273 chars)
  Generating 22 questions (total so far: 178/200)


Generating QA for Section K: 100%|█████████████████████████████████████████████████████| 15/15 [06:40<00:00, 26.67s/it]


  Generated 20 pairs. Running total: 198/200

Section K complete: 198 QA pairs

Processing Section L (1 PDFs)...
  Total chunks for Section L: 32
  Target: 200 questions (6 per chunk + 8 extra)


Generating QA for Section L:   0%|                                                              | 0/32 [00:00<?, ?it/s]


  Chunk 1/32 from approved_document_L_Conservation of fuel and power.pdf (7313 chars)
  Generating 6 questions (total so far: 0/200)
  Generated 6 pairs. Running total: 6/200
  Waiting 6s before next chunk...


Generating QA for Section L:   3%|█▋                                                    | 1/32 [00:17<09:14, 17.89s/it]


  Chunk 2/32 from approved_document_L_Conservation of fuel and power.pdf (5165 chars)
  Generating 6 questions (total so far: 6/200)
  Generated 6 pairs. Running total: 12/200
  Waiting 6s before next chunk...


Generating QA for Section L:   6%|███▍                                                  | 2/32 [00:32<07:55, 15.86s/it]


  Chunk 3/32 from approved_document_L_Conservation of fuel and power.pdf (7488 chars)
  Generating 6 questions (total so far: 12/200)
  Generated 6 pairs. Running total: 18/200
  Waiting 6s before next chunk...


Generating QA for Section L:   9%|█████                                                 | 3/32 [00:46<07:19, 15.17s/it]


  Chunk 4/32 from approved_document_L_Conservation of fuel and power.pdf (7370 chars)
  Generating 6 questions (total so far: 18/200)
  Generated 6 pairs. Running total: 24/200
  Waiting 6s before next chunk...


Generating QA for Section L:  12%|██████▊                                               | 4/32 [01:05<07:50, 16.80s/it]


  Chunk 5/32 from approved_document_L_Conservation of fuel and power.pdf (7086 chars)
  Generating 6 questions (total so far: 24/200)
  Generated 6 pairs. Running total: 30/200
  Waiting 6s before next chunk...


Generating QA for Section L:  16%|████████▍                                             | 5/32 [01:25<08:02, 17.88s/it]


  Chunk 6/32 from approved_document_L_Conservation of fuel and power.pdf (7339 chars)
  Generating 6 questions (total so far: 30/200)
  Generated 6 pairs. Running total: 36/200
  Waiting 6s before next chunk...


Generating QA for Section L:  19%|██████████▏                                           | 6/32 [01:42<07:34, 17.49s/it]


  Chunk 7/32 from approved_document_L_Conservation of fuel and power.pdf (7456 chars)
  Generating 6 questions (total so far: 36/200)
  Generated 6 pairs. Running total: 42/200
  Waiting 6s before next chunk...


Generating QA for Section L:  22%|███████████▊                                          | 7/32 [02:00<07:18, 17.54s/it]


  Chunk 8/32 from approved_document_L_Conservation of fuel and power.pdf (7132 chars)
  Generating 6 questions (total so far: 42/200)
  Generated 6 pairs. Running total: 48/200
  Waiting 6s before next chunk...


Generating QA for Section L:  25%|█████████████▌                                        | 8/32 [02:20<07:25, 18.57s/it]


  Chunk 9/32 from approved_document_L_Conservation of fuel and power.pdf (7053 chars)
  Generating 6 questions (total so far: 48/200)
  Generated 6 pairs. Running total: 54/200
  Waiting 6s before next chunk...


Generating QA for Section L:  28%|███████████████▏                                      | 9/32 [02:38<07:00, 18.27s/it]


  Chunk 10/32 from approved_document_L_Conservation of fuel and power.pdf (7445 chars)
  Generating 6 questions (total so far: 54/200)
  Generated 6 pairs. Running total: 60/200
  Waiting 6s before next chunk...


Generating QA for Section L:  31%|████████████████▌                                    | 10/32 [02:59<06:57, 18.97s/it]


  Chunk 11/32 from approved_document_L_Conservation of fuel and power.pdf (7491 chars)
  Generating 6 questions (total so far: 60/200)
  Generated 6 pairs. Running total: 66/200
  Waiting 6s before next chunk...


Generating QA for Section L:  34%|██████████████████▏                                  | 11/32 [03:15<06:22, 18.22s/it]


  Chunk 12/32 from approved_document_L_Conservation of fuel and power.pdf (7496 chars)
  Generating 6 questions (total so far: 66/200)
  Generated 6 pairs. Running total: 72/200
  Waiting 6s before next chunk...


Generating QA for Section L:  38%|███████████████████▉                                 | 12/32 [03:37<06:29, 19.49s/it]


  Chunk 13/32 from approved_document_L_Conservation of fuel and power.pdf (7485 chars)
  Generating 6 questions (total so far: 72/200)
  Generated 6 pairs. Running total: 78/200
  Waiting 6s before next chunk...


Generating QA for Section L:  41%|█████████████████████▌                               | 13/32 [03:56<06:04, 19.20s/it]


  Chunk 14/32 from approved_document_L_Conservation of fuel and power.pdf (7461 chars)
  Generating 6 questions (total so far: 78/200)
  Generated 6 pairs. Running total: 84/200
  Waiting 6s before next chunk...


Generating QA for Section L:  44%|███████████████████████▏                             | 14/32 [04:14<05:36, 18.71s/it]


  Chunk 15/32 from approved_document_L_Conservation of fuel and power.pdf (7456 chars)
  Generating 6 questions (total so far: 84/200)
  Generated 6 pairs. Running total: 90/200
  Waiting 6s before next chunk...


Generating QA for Section L:  47%|████████████████████████▊                            | 15/32 [04:31<05:11, 18.34s/it]


  Chunk 16/32 from approved_document_L_Conservation of fuel and power.pdf (7435 chars)
  Generating 6 questions (total so far: 90/200)
  Generated 6 pairs. Running total: 96/200
  Waiting 6s before next chunk...


Generating QA for Section L:  50%|██████████████████████████▌                          | 16/32 [04:46<04:34, 17.18s/it]


  Chunk 17/32 from approved_document_L_Conservation of fuel and power.pdf (7417 chars)
  Generating 6 questions (total so far: 96/200)
  Generated 6 pairs. Running total: 102/200
  Waiting 6s before next chunk...


Generating QA for Section L:  53%|████████████████████████████▏                        | 17/32 [05:01<04:11, 16.75s/it]


  Chunk 18/32 from approved_document_L_Conservation of fuel and power.pdf (7418 chars)
  Generating 6 questions (total so far: 102/200)
  Generated 6 pairs. Running total: 108/200
  Waiting 6s before next chunk...


Generating QA for Section L:  56%|█████████████████████████████▊                       | 18/32 [05:17<03:50, 16.47s/it]


  Chunk 19/32 from approved_document_L_Conservation of fuel and power.pdf (7211 chars)
  Generating 6 questions (total so far: 108/200)
  Generated 6 pairs. Running total: 114/200
  Waiting 6s before next chunk...


Generating QA for Section L:  59%|███████████████████████████████▍                     | 19/32 [05:35<03:39, 16.88s/it]


  Chunk 20/32 from approved_document_L_Conservation of fuel and power.pdf (7402 chars)
  Generating 6 questions (total so far: 114/200)
  Generated 6 pairs. Running total: 120/200
  Waiting 6s before next chunk...


Generating QA for Section L:  62%|█████████████████████████████████▏                   | 20/32 [05:50<03:14, 16.20s/it]


  Chunk 21/32 from approved_document_L_Conservation of fuel and power.pdf (7314 chars)
  Generating 6 questions (total so far: 120/200)
  Generated 6 pairs. Running total: 126/200
  Waiting 6s before next chunk...


Generating QA for Section L:  66%|██████████████████████████████████▊                  | 21/32 [06:07<03:03, 16.70s/it]


  Chunk 22/32 from approved_document_L_Conservation of fuel and power.pdf (7314 chars)
  Generating 6 questions (total so far: 126/200)
  Generated 6 pairs. Running total: 132/200
  Waiting 6s before next chunk...


Generating QA for Section L:  69%|████████████████████████████████████▍                | 22/32 [06:24<02:47, 16.72s/it]


  Chunk 23/32 from approved_document_L_Conservation of fuel and power.pdf (7460 chars)
  Generating 6 questions (total so far: 132/200)
  Generated 6 pairs. Running total: 138/200
  Waiting 6s before next chunk...


Generating QA for Section L:  72%|██████████████████████████████████████               | 23/32 [06:43<02:37, 17.48s/it]


  Chunk 24/32 from approved_document_L_Conservation of fuel and power.pdf (7370 chars)
  Generating 6 questions (total so far: 138/200)
  Generated 6 pairs. Running total: 144/200
  Waiting 6s before next chunk...


Generating QA for Section L:  75%|███████████████████████████████████████▊             | 24/32 [06:58<02:13, 16.74s/it]


  Chunk 25/32 from approved_document_L_Conservation of fuel and power.pdf (7303 chars)
  Generating 6 questions (total so far: 144/200)
  Generated 6 pairs. Running total: 150/200
  Waiting 6s before next chunk...


Generating QA for Section L:  78%|█████████████████████████████████████████▍           | 25/32 [07:16<01:58, 16.91s/it]


  Chunk 26/32 from approved_document_L_Conservation of fuel and power.pdf (7432 chars)
  Generating 6 questions (total so far: 150/200)
  Generated 6 pairs. Running total: 156/200
  Waiting 6s before next chunk...


Generating QA for Section L:  81%|███████████████████████████████████████████          | 26/32 [07:34<01:44, 17.42s/it]


  Chunk 27/32 from approved_document_L_Conservation of fuel and power.pdf (7382 chars)
  Generating 6 questions (total so far: 156/200)
  Generated 6 pairs. Running total: 162/200
  Waiting 6s before next chunk...


Generating QA for Section L:  84%|████████████████████████████████████████████▋        | 27/32 [07:52<01:26, 17.39s/it]


  Chunk 28/32 from approved_document_L_Conservation of fuel and power.pdf (7485 chars)
  Generating 6 questions (total so far: 162/200)
  Generated 6 pairs. Running total: 168/200
  Waiting 6s before next chunk...


Generating QA for Section L:  88%|██████████████████████████████████████████████▍      | 28/32 [08:08<01:08, 17.07s/it]


  Chunk 29/32 from approved_document_L_Conservation of fuel and power.pdf (7101 chars)
  Generating 6 questions (total so far: 168/200)
  Generated 6 pairs. Running total: 174/200
  Waiting 6s before next chunk...


Generating QA for Section L:  91%|████████████████████████████████████████████████     | 29/32 [08:26<00:51, 17.27s/it]


  Chunk 30/32 from approved_document_L_Conservation of fuel and power.pdf (7497 chars)
  Generating 6 questions (total so far: 174/200)
  Generated 6 pairs. Running total: 180/200
  Waiting 6s before next chunk...


Generating QA for Section L:  94%|█████████████████████████████████████████████████▋   | 30/32 [08:49<00:38, 19.02s/it]


  Chunk 31/32 from approved_document_L_Conservation of fuel and power.pdf (7444 chars)
  Generating 6 questions (total so far: 180/200)
  Generated 6 pairs. Running total: 186/200
  Waiting 6s before next chunk...


Generating QA for Section L:  97%|███████████████████████████████████████████████████▎ | 31/32 [09:04<00:17, 17.72s/it]


  Chunk 32/32 from approved_document_L_Conservation of fuel and power.pdf (6030 chars)
  Generating 14 questions (total so far: 186/200)


Generating QA for Section L: 100%|█████████████████████████████████████████████████████| 32/32 [09:25<00:00, 17.66s/it]


  Generated 14 pairs. Running total: 200/200

Section L complete: 200 QA pairs

Processing Section S (2 PDFs)...
  Total chunks for Section S: 17
  Target: 200 questions (11 per chunk + 13 extra)


Generating QA for Section S:   0%|                                                              | 0/17 [00:00<?, ?it/s]


  Chunk 1/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7414 chars)
  Generating 11 questions (total so far: 0/200)
  Generated 11 pairs. Running total: 11/200
  Waiting 6s before next chunk...


Generating QA for Section S:   6%|███▏                                                  | 1/17 [00:16<04:18, 16.14s/it]


  Chunk 2/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7443 chars)
  Generating 11 questions (total so far: 11/200)
  Generated 10 pairs. Running total: 21/200
  Waiting 6s before next chunk...


Generating QA for Section S:  12%|██████▎                                               | 2/17 [00:42<05:33, 22.24s/it]


  Chunk 3/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (6099 chars)
  Generating 11 questions (total so far: 21/200)
  Generated 11 pairs. Running total: 32/200
  Waiting 6s before next chunk...


Generating QA for Section S:  18%|█████████▌                                            | 3/17 [01:08<05:33, 23.80s/it]


  Chunk 4/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7455 chars)
  Generating 12 questions (total so far: 32/200)
  Generated 11 pairs. Running total: 43/200
  Waiting 6s before next chunk...


Generating QA for Section S:  24%|████████████▋                                         | 4/17 [01:31<05:04, 23.43s/it]


  Chunk 5/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7423 chars)
  Generating 12 questions (total so far: 43/200)
  Generated 11 pairs. Running total: 54/200
  Waiting 6s before next chunk...


Generating QA for Section S:  29%|███████████████▉                                      | 5/17 [01:58<04:58, 24.87s/it]


  Chunk 6/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7379 chars)
  Generating 12 questions (total so far: 54/200)
  Generated 11 pairs. Running total: 65/200
  Waiting 6s before next chunk...


Generating QA for Section S:  35%|███████████████████                                   | 6/17 [02:23<04:32, 24.78s/it]


  Chunk 7/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7242 chars)
  Generating 12 questions (total so far: 65/200)
  Generated 12 pairs. Running total: 77/200
  Waiting 6s before next chunk...


Generating QA for Section S:  41%|██████████████████████▏                               | 7/17 [02:49<04:12, 25.25s/it]


  Chunk 8/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7493 chars)
  Generating 12 questions (total so far: 77/200)
  Generated 10 pairs. Running total: 87/200
  Waiting 6s before next chunk...


Generating QA for Section S:  47%|█████████████████████████▍                            | 8/17 [03:13<03:43, 24.87s/it]


  Chunk 9/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7409 chars)
  Generating 12 questions (total so far: 87/200)
  Generated 12 pairs. Running total: 99/200
  Waiting 6s before next chunk...


Generating QA for Section S:  53%|████████████████████████████▌                         | 9/17 [03:38<03:19, 24.88s/it]


  Chunk 10/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7251 chars)
  Generating 12 questions (total so far: 99/200)
  Generated 12 pairs. Running total: 111/200
  Waiting 6s before next chunk...


Generating QA for Section S:  59%|███████████████████████████████▏                     | 10/17 [04:03<02:53, 24.84s/it]


  Chunk 11/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7376 chars)
  Generating 12 questions (total so far: 111/200)
  Generated 12 pairs. Running total: 123/200
  Waiting 6s before next chunk...


Generating QA for Section S:  65%|██████████████████████████████████▎                  | 11/17 [04:32<02:37, 26.24s/it]


  Chunk 12/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (7413 chars)
  Generating 12 questions (total so far: 123/200)
  Generated 11 pairs. Running total: 134/200
  Waiting 6s before next chunk...


Generating QA for Section S:  71%|█████████████████████████████████████▍               | 12/17 [04:54<02:04, 24.97s/it]


  Chunk 13/17 from Approved_Document_S_Infrastructure for charging electric vehicles.pdf (4971 chars)
  Generating 12 questions (total so far: 134/200)
  Generated 12 pairs. Running total: 146/200
  Waiting 5s before next chunk...


Generating QA for Section S:  76%|████████████████████████████████████████▌            | 13/17 [05:18<01:38, 24.58s/it]


  Chunk 14/17 from Part_S_FAQ_Approved Document S_ Infrastructure for charging electric vehicles.pdf (7437 chars)
  Generating 11 questions (total so far: 146/200)
  Generated 10 pairs. Running total: 156/200
  Waiting 6s before next chunk...


Generating QA for Section S:  82%|███████████████████████████████████████████▋         | 14/17 [05:41<01:12, 24.28s/it]


  Chunk 15/17 from Part_S_FAQ_Approved Document S_ Infrastructure for charging electric vehicles.pdf (7413 chars)
  Generating 11 questions (total so far: 156/200)
  Generated 10 pairs. Running total: 166/200
  Waiting 6s before next chunk...


Generating QA for Section S:  88%|██████████████████████████████████████████████▊      | 15/17 [06:03<00:46, 23.47s/it]


  Chunk 16/17 from Part_S_FAQ_Approved Document S_ Infrastructure for charging electric vehicles.pdf (7499 chars)
  Generating 11 questions (total so far: 166/200)
  Generated 10 pairs. Running total: 176/200
  Waiting 6s before next chunk...


Generating QA for Section S:  94%|█████████████████████████████████████████████████▉   | 16/17 [06:29<00:24, 24.10s/it]


  Chunk 17/17 from Part_S_FAQ_Approved Document S_ Infrastructure for charging electric vehicles.pdf (2008 chars)
  Generating 24 questions (total so far: 176/200)


Generating QA for Section S: 100%|█████████████████████████████████████████████████████| 17/17 [07:05<00:00, 25.06s/it]


  Generated 20 pairs. Running total: 196/200

Section S complete: 196 QA pairs

Processing Section C (1 PDFs)...
  Total chunks for Section C: 1
  Target: 200 questions (200 per chunk + 0 extra)


Generating QA for Section C:   0%|                                                               | 0/1 [00:00<?, ?it/s]


  Chunk 1/1 from Part_C_FAQ_approved_document_C_Site preparation and resistance to contaminates and moisture.pdf (4211 chars)
  Generating 200 questions (total so far: 0/200)
JSON parsing error: Expecting ',' delimiter: line 905 column 306 (char 91254)
Attempting to clean JSON and retry...
Failed to parse JSON even after cleaning


Generating QA for Section C: 100%|██████████████████████████████████████████████████████| 1/1 [05:39<00:00, 339.69s/it]


  Generated 95 pairs. Running total: 95/200

Section C complete: 95 QA pairs

Processing Section M (1 PDFs)...
  Total chunks for Section M: 7
  Target: 200 questions (28 per chunk + 4 extra)


Generating QA for Section M:   0%|                                                               | 0/7 [00:00<?, ?it/s]


  Chunk 1/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (7405 chars)
  Generating 28 questions (total so far: 0/200)
  Generated 22 pairs. Running total: 22/200
  Waiting 6s before next chunk...


Generating QA for Section M:  14%|███████▊                                               | 1/7 [00:47<04:47, 47.97s/it]


  Chunk 2/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (7043 chars)
  Generating 29 questions (total so far: 22/200)
  Generated 25 pairs. Running total: 47/200
  Waiting 6s before next chunk...


Generating QA for Section M:  29%|███████████████▋                                       | 2/7 [01:42<04:20, 52.03s/it]


  Chunk 3/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (7338 chars)
  Generating 29 questions (total so far: 47/200)
  Generated 24 pairs. Running total: 71/200
  Waiting 6s before next chunk...


Generating QA for Section M:  43%|███████████████████████▌                               | 3/7 [02:35<03:30, 52.51s/it]


  Chunk 4/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (7274 chars)
  Generating 29 questions (total so far: 71/200)
  Generated 23 pairs. Running total: 94/200
  Waiting 6s before next chunk...


Generating QA for Section M:  57%|███████████████████████████████▍                       | 4/7 [03:23<02:31, 50.56s/it]


  Chunk 5/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (7286 chars)
  Generating 28 questions (total so far: 94/200)
  Generated 23 pairs. Running total: 117/200
  Waiting 6s before next chunk...


Generating QA for Section M:  71%|███████████████████████████████████████▎               | 5/7 [04:01<01:32, 46.05s/it]


  Chunk 6/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (7337 chars)
  Generating 28 questions (total so far: 117/200)
  Generated 22 pairs. Running total: 139/200
  Waiting 6s before next chunk...


Generating QA for Section M:  86%|███████████████████████████████████████████████▏       | 6/7 [04:36<00:42, 42.14s/it]


  Chunk 7/7 from Part_M_FAQ_approved_document_M_Access to and use of buildings.pdf (4838 chars)
  Generating 61 questions (total so far: 139/200)


Generating QA for Section M: 100%|███████████████████████████████████████████████████████| 7/7 [05:30<00:00, 47.22s/it]


  Generated 41 pairs. Running total: 180/200

Section M complete: 180 QA pairs

Processing Section Q (1 PDFs)...
  Total chunks for Section Q: 1
  Target: 200 questions (200 per chunk + 0 extra)


Generating QA for Section Q:   0%|                                                               | 0/1 [00:00<?, ?it/s]


  Chunk 1/1 from Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf (2355 chars)
  Generating 200 questions (total so far: 0/200)


Generating QA for Section Q: 100%|███████████████████████████████████████████████████████| 1/1 [00:19<00:00, 19.58s/it]

  Generated 11 pairs. Running total: 11/200

Section Q complete: 11 QA pairs

All sections processed.
Total generated: 3000 QA pairs across 17 sections
Final results saved to construction_training_data.jsonl

Summary by section:
  Section A: 202 questions
  Section B: 192 questions
  Section C: 95 questions
  Section D: 156 questions
  Section E: 200 questions
  Section F: 198 questions
  Section G: 200 questions
  Section H: 200 questions
  Section J: 198 questions
  Section K: 198 questions
  Section L: 200 questions
  Section M: 180 questions
  Section O: 198 questions
  Section P: 178 questions
  Section Q: 11 questions
  Section S: 196 questions
  Section T: 198 questions
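
The `Target: 200 questions (N per chunk + M extra)` lines in the log above are consistent with an integer division of the section target by the number of chunks, and the last chunk of each section appears to be asked for whatever is still missing (for example 22 for Section K at 178/200). The allocation code is not reproduced in this part of the notebook, so the following is only a minimal sketch of that arithmetic. `allocate_questions` and `final_top_up` are hypothetical names, not the functions used in the run.

```python
def allocate_questions(target: int, n_chunks: int) -> list[int]:
    """Split `target` questions across `n_chunks` chunks as evenly as possible."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(target, n_chunks)
    # Assumption: the first `extra` chunks each take one additional question.
    return [base + 1 if i < extra else base for i in range(n_chunks)]


def final_top_up(target: int, produced: int) -> int:
    """Number of questions still missing when the last chunk is reached."""
    return max(target - produced, 0)


# Chunk counts taken from the log: Section L = 32, S = 17, M = 7.
for section, n_chunks in {"L": 32, "S": 17, "M": 7}.items():
    base, extra = divmod(200, n_chunks)
    plan = allocate_questions(200, n_chunks)
    print(f"Section {section}: {base} per chunk + {extra} extra (plan sums to {sum(plan)})")

# Running total 178/200 before the last Section K chunk, as in the log.
print("Top-up for last Section K chunk:", final_top_up(200, 178))
```

When a section has only one chunk (Sections C and Q in this run), the whole target lands on a single request. That is where the log shows the largest shortfalls, so capping the per-request count may be worth considering.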

Sample QA pairs by section:

Section A examples:
  Example 1 (from approved-document-R_Infrastructure_Electronic_communications.pdf):
  Q: What is the purpose of Approved Document R?
  A: Approved Document R provides practical guidance on how to meet the requirements of Requirement R1 of Schedule 1 to the Buildi
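
The Section C log reports a JSON parsing error followed by a partial result of 95 pairs, which suggests that some pairs were recovered from a malformed response. The recovery code is not shown here. One possible approach, sketched below under that assumption, is to walk the raw string and keep every object that parses on its own with `json.JSONDecoder.raw_decode`. The function name `salvage_json_objects` and the `question`/`answer` keys are illustrative only.

```python
import json


def salvage_json_objects(raw: str) -> list[dict]:
    """Return every complete JSON object found in a possibly malformed array."""
    decoder = json.JSONDecoder()
    objects = []
    pos = raw.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            # This object is broken: skip its opening brace and try the next one.
            pos = raw.find("{", pos + 1)
            continue
        objects.append(obj)
        pos = raw.find("{", end)
    return objects


# Hypothetical model response: the second object is missing a comma and
# the closing bracket of the array was never emitted.
raw_response = (
    '[{"question": "Q1", "answer": "A1"}, '
    '{"question": "Q2" "answer": "A2"}, '
    '{"question": "Q3", "answer": "A3"}'
)

pairs = salvage_json_objects(raw_response)
print(len(pairs), [p["question"] for p in pairs])
```

Objects that fail to parse are dropped rather than repaired, so the number of salvaged pairs can be lower than the number requested.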




In [2]:
## This notebook was started in Jupyter Notebook and then moved to Google Colab,
## so that an A100 GPU could be used for fine-tuning.

# Fine-tuning (LoRA & Quantization)

# **Fine-Tuning Llama 3.1-8B-Instruct base model**
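
As a rough orientation for why LoRA and 4-bit quantization are combined, the arithmetic below estimates (a) how many trainable parameters a rank-`r` adapter adds to one square attention projection and (b) the approximate memory needed for the frozen weights at different precisions. This is a back-of-the-envelope sketch: the rank `r = 16` and the rounded parameter count of 8 billion are assumptions for illustration, not the settings used in this notebook, and the memory figures ignore quantization constants, activations and optimizer state.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def weight_gb(n_params: float, bits: int) -> float:
    """Approximate storage for `n_params` weights at `bits` bits each, in GB (1e9 bytes)."""
    return n_params * bits / 8 / 1e9


hidden = 4096        # hidden size of Llama 3.1 8B (q_proj is 4096 x 4096)
r = 16               # assumed LoRA rank, for illustration only
n_params = 8e9       # rounded model size

full = hidden * hidden
added = lora_params(hidden, hidden, r)
print(f"q_proj weights: {full:,}; LoRA adds {added:,} ({added / full:.2%})")
print(f"Frozen weights: {weight_gb(n_params, 16):.0f} GB at 16-bit, "
      f"{weight_gb(n_params, 4):.0f} GB at 4-bit")
```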

In [None]:
!pip install transformers peft datasets trl accelerate bitsandbytes

Collecting datasets
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting trl
  Downloading trl-0.16.1-py3-none-any.whl.metadata (12 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.12.0,>=2023.1.0 (from fsspec[http]<=2024.12.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.12

In [None]:
import warnings
warnings.filterwarnings('ignore')

#Importing dependencies
import torch
import gc
import re
from datasets import load_dataset, concatenate_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

from google.colab import userdata, drive
import shutil

In [None]:
# Retrieve the Hugging Face token securely from Colab secrets
hf_token = userdata.get('HF_TOKEN')

# Mount Google Drive
drive.mount('/content/drive')
print("Drive mounted successfully!")

# Load datasets from Drive
dataset = load_dataset("json", data_files="/content/drive/MyDrive/Colab Notebooks/uk_building_regulations/construction_training_data.jsonl", split="train")
print(f"Loaded {len(dataset)} Question Pairs")

Mounted at /content/drive
Drive mounted successfully!


Generating train split: 0 examples [00:00, ? examples/s]

Loaded 3000 Question Pairs


## **Exploring the Data Quality of the Curated FAQ Data before Fine-tuning**
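
Before cleaning individual answers, it can help to check how evenly the 3,000 pairs are spread across sections. The counts below are copied from the "Summary by section" output of the curation run. The 50% cut-off is an arbitrary assumption chosen for illustration, and `under_represented` is a hypothetical helper, not part of the original notebook.

```python
# Per-section counts as reported in the curation summary.
section_counts = {
    "A": 202, "B": 192, "C": 95, "D": 156, "E": 200, "F": 198,
    "G": 200, "H": 200, "J": 198, "K": 198, "L": 200, "M": 180,
    "O": 198, "P": 178, "Q": 11, "S": 196, "T": 198,
}


def under_represented(counts: dict[str, int], target: int, fraction: float) -> dict[str, int]:
    """Sections whose count falls below `fraction` of the per-section target."""
    return {s: n for s, n in counts.items() if n < fraction * target}


total = sum(section_counts.values())
low = under_represented(section_counts, target=200, fraction=0.5)  # assumed cut-off

print(f"Total pairs: {total} across {len(section_counts)} sections")
for section, n in low.items():
    print(f"Section {section}: {n} pairs ({n / total:.1%} of the dataset)")
```

With these numbers, Sections C and Q fall below the cut-off, so the fine-tuned model will see comparatively few examples for those parts.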

In [None]:
# Define cleaning functions
def clean_instruction_tags(text):
    """Remove [INST] and [/INST] tags from text and trim surrounding whitespace."""
    return re.sub(r'\[/?INST\]', '', text).strip()

def ensure_period_ending(text):
    """Ensure non-empty text ends with terminal punctuation ('.', '?' or '!')."""
    text = text.strip()
    if text and text[-1] not in ('.', '?', '!'):
        text = text + '.'
    return text

In [None]:
# Apply cleaning to construction data FAQ dataset
def cleaner(sample):
    # Clean any instruction tags from output
    clean_output = clean_instruction_tags(sample['output'])

    # Ensure output ends with proper punctuation
    clean_output = ensure_period_ending(clean_output)

    # Return the cleaned sample
    return {
        'instruction': sample['instruction'],
        'input': sample['input'],
        'output': clean_output
    }
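
To make the effect of the two cleaning steps concrete, the sketch below applies them to a few hand-written strings. The helper definitions are repeated so the block runs on its own, and the sample strings are invented placeholders rather than rows from the dataset.

```python
import re


def clean_instruction_tags(text: str) -> str:
    """Remove [INST] and [/INST] tags and trim surrounding whitespace."""
    return re.sub(r"\[/?INST\]", "", text).strip()


def ensure_period_ending(text: str) -> str:
    """Append a full stop unless the text already ends with '.', '?' or '!'."""
    text = text.strip()
    if text and text[-1] not in ".?!":
        text += "."
    return text


# Invented placeholder strings covering the main cases.
samples = [
    "[INST] Example answer text [/INST]",   # tags removed, full stop added
    "Is this already punctuated?",          # left unchanged
    'See the section titled "Stairs"',      # ends with a quote, so '.' is appended
    "",                                     # empty strings stay empty
]

cleaned = [ensure_period_ending(clean_instruction_tags(s)) for s in samples]
for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Note that a full stop is appended after a closing quotation mark or bracket, which may or may not be the desired style for every answer.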



In [None]:
# Clean the entire dataset
cleaned_dataset = dataset.map(cleaner)

# Verify changes on one sample (index 20, despite the "first example" label in the printout)
print("Original first example output:", dataset[20]['output'])
print("Cleaned first example output:", cleaned_dataset[20]['output'])

Original first example output: For further information on enforcement and sanctions related to the Building Regulations, one should refer to Chapter B in Volume 2 of the Manual to the Building Regulations, which outlines the legal framework and consequences of non-compliance.
Cleaned first example output: For further information on enforcement and sanctions related to the Building Regulations, one should refer to Chapter B in Volume 2 of the Manual to the Building Regulations, which outlines the legal framework and consequences of non-compliance.
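
The single record printed above is identical before and after cleaning, so it does not show whether the cleaning step changed anything elsewhere. A simple way to check, sketched here in plain Python with invented strings, is to list the positions where the two versions differ. `changed_indices` is a hypothetical helper. In the notebook the inputs would be the `output` columns of `dataset` and `cleaned_dataset`.

```python
def changed_indices(before: list[str], after: list[str]) -> list[int]:
    """Indices at which the cleaned text differs from the original text."""
    if len(before) != len(after):
        raise ValueError("both lists must have the same length")
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


# Invented placeholder data standing in for the original and cleaned outputs.
original = ["Already fine.", "Missing full stop", "[INST] Tagged answer. [/INST]"]
cleaned = ["Already fine.", "Missing full stop.", "Tagged answer."]

changed = changed_indices(original, cleaned)
print(f"{len(changed)} of {len(original)} outputs changed: {changed}")
```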


In [None]:
# Convert to pandas DataFrame for full pandas functionality
df = dataset.to_pandas()
cleaned_df = cleaned_dataset.to_pandas()

In [None]:
from IPython.display import display, HTML  # Base curated dataset

# Create styled HTML that shows full content with scrollbars if needed
html = df.tail().style.set_properties(**{
    'white-space': 'pre-wrap',
    'max-width': '800px',
    'overflow-x': 'auto'
}).to_html()

display(HTML(html))

Unnamed: 0,instruction,input,output,source,section
2995,Answer this question about UK building regulations Part Q,What types of security measures are emphasized in Approved Document Q?,"Approved Document Q emphasizes various security measures that should be integrated into the design of dwellings. These include the use of robust doors and windows, secure locks and fittings, and the appropriate positioning of letter boxes and other access points. The document aims to ensure that all elements contribute to the overall security of the home, thereby reducing the risk of burglary and unauthorized entry.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2996,Answer this question about UK building regulations Part Q,How does Approved Document Q address the issue of safety for postal delivery personnel?,"Approved Document Q addresses the safety of postal delivery personnel by referencing standards such as BS EN 13724:2013, which outlines the ergonomic and performance considerations for letter box design. By ensuring that letter boxes are positioned within a specified height range (700mm to 1700mm), the document helps facilitate safe and efficient mail delivery, reducing the risk of injury to postal workers while maintaining security for residents.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2997,Answer this question about UK building regulations Part Q,"What is the role of the Ministry of Housing, Communities & Local Government in relation to Approved Document Q?","The Ministry of Housing, Communities & Local Government is responsible for issuing and updating Approved Document Q as part of the UK building regulations. This document provides statutory guidance on security measures for dwellings, ensuring that new buildings are designed with adequate security features to protect residents. The ministry oversees the compliance and implementation of these regulations, helping to enhance safety and security standards across the housing sector.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2998,Answer this question about UK building regulations Part Q,What are the compliance implications for designers and manufacturers regarding Approved Document Q?,"Designers and manufacturers must adhere to the specifications outlined in Approved Document Q to ensure compliance with UK building regulations. This includes integrating security measures as specified by referenced standards such as TS 008:2012 and BS EN 13724:2013. By following these standards, they can demonstrate that their designs and products meet the necessary safety and security requirements, which is essential for obtaining building approvals and ensuring the safety of occupants.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2999,Answer this question about UK building regulations Part Q,What does the term 'robust door hardware' refer to in the context of Approved Document Q?,"In the context of Approved Document Q, 'robust door hardware' refers to the fittings and locking mechanisms used in doors that are designed to withstand attempted break-ins and provide a high level of security. This includes features such as durable locks, reinforced hinges, and secure door frames. The standard TS 008:2012 provides specific criteria for what constitutes robust door hardware, ensuring that these elements contribute effectively to the overall security of the dwelling.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q


In [None]:
from IPython.display import display, HTML  # Cleaned dataset

# Create styled HTML that shows full content with scrollbars if needed
html = cleaned_df.tail().style.set_properties(**{
    'white-space': 'pre-wrap',
    'max-width': '800px',
    'overflow-x': 'auto'
}).to_html()

display(HTML(html))

Unnamed: 0,instruction,input,output,source,section
2995,Answer this question about UK building regulations Part Q,What types of security measures are emphasized in Approved Document Q?,"Approved Document Q emphasizes various security measures that should be integrated into the design of dwellings. These include the use of robust doors and windows, secure locks and fittings, and the appropriate positioning of letter boxes and other access points. The document aims to ensure that all elements contribute to the overall security of the home, thereby reducing the risk of burglary and unauthorized entry.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2996,Answer this question about UK building regulations Part Q,How does Approved Document Q address the issue of safety for postal delivery personnel?,"Approved Document Q addresses the safety of postal delivery personnel by referencing standards such as BS EN 13724:2013, which outlines the ergonomic and performance considerations for letter box design. By ensuring that letter boxes are positioned within a specified height range (700mm to 1700mm), the document helps facilitate safe and efficient mail delivery, reducing the risk of injury to postal workers while maintaining security for residents.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2997,Answer this question about UK building regulations Part Q,"What is the role of the Ministry of Housing, Communities & Local Government in relation to Approved Document Q?","The Ministry of Housing, Communities & Local Government is responsible for issuing and updating Approved Document Q as part of the UK building regulations. This document provides statutory guidance on security measures for dwellings, ensuring that new buildings are designed with adequate security features to protect residents. The ministry oversees the compliance and implementation of these regulations, helping to enhance safety and security standards across the housing sector.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2998,Answer this question about UK building regulations Part Q,What are the compliance implications for designers and manufacturers regarding Approved Document Q?,"Designers and manufacturers must adhere to the specifications outlined in Approved Document Q to ensure compliance with UK building regulations. This includes integrating security measures as specified by referenced standards such as TS 008:2012 and BS EN 13724:2013. By following these standards, they can demonstrate that their designs and products meet the necessary safety and security requirements, which is essential for obtaining building approvals and ensuring the safety of occupants.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q
2999,Answer this question about UK building regulations Part Q,What does the term 'robust door hardware' refer to in the context of Approved Document Q?,"In the context of Approved Document Q, 'robust door hardware' refers to the fittings and locking mechanisms used in doors that are designed to withstand attempted break-ins and provide a high level of security. This includes features such as durable locks, reinforced hinges, and secure door frames. The standard TS 008:2012 provides specific criteria for what constitutes robust door hardware, ensuring that these elements contribute effectively to the overall security of the dwelling.",Part_Q_FAQ_approved_document_Q_Security in dwellings.pdf,Q


In [None]:
# Count how many examples were modified
different_count = sum(1 for i in range(len(dataset)) if dataset[i]['output'] != cleaned_dataset[i]['output'])
print(f"Modified {different_count} examples out of {len(dataset)}")

# Use the cleaned dataset for fine-tuning
dataset = cleaned_dataset

Modified 1 examples out of 3000


The result above shows that the curated data was already in good shape: only 1 of the 3,000 examples changed during cleaning. The original dataset was therefore well formatted to begin with, and the cleaning step carried little risk of introducing errors.

In [None]:
#Split into train/validation
train_val = dataset.train_test_split(test_size=0.05)
train_dataset = train_val["train"]
val_dataset = train_val["test"]
print(f"Training examples: {len(train_dataset)}, Validation examples: {len(val_dataset)}")

torch.cuda.empty_cache()
gc.collect()

Training examples: 2850, Validation examples: 150


495

In [None]:
#Test bitsandbytes installation
!python -c "import bitsandbytes as bnb; print('bitsandbytes version:', bnb.__version__)"

#Using the Instruct variant
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

bitsandbytes version: 0.45.5


In [None]:
#Loading tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

tokenizer_config.json:   0%|          | 0.00/55.4k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

In [None]:
#Configuring 8-bit quantization: generally better quality than 4-bit while still memory efficient
print("Configuring 8-bit quantization...")
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False
)

Configuring 8-bit quantization...


In [None]:
#Loading model with bitsandbytes quantization
print(f"Loading model {model_name} with 8-bit quantization...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    max_memory={0: "30GB", "cpu": "30GB"},  #Reserving some VRAM for training
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)

Loading model meta-llama/Meta-Llama-3.1-8B-Instruct with 8-bit quantization...


config.json:   0%|          | 0.00/855 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/184 [00:00<?, ?B/s]

In [None]:
# Free up memory
gc.collect()
torch.cuda.empty_cache()

print(f"Memory footprint: {model.get_memory_footprint() / 1e9:,.1f} GB")

Memory footprint: 9.1 GB
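The reported 9.1 GB can be roughly reproduced by hand. The sketch below is a back-of-the-envelope estimate, not a description of what `get_memory_footprint()` does internally. It assumes the linear layers are stored at 1 byte per weight, while the token embedding and the output head (each 128256 x 4096) stay in float16 at 2 bytes per weight. The base parameter count is derived from the LoRA summary printed further down (all params minus trainable params).

```python
# Back-of-the-envelope memory estimate for the 8-bit model.
# Assumption: linear layers use 1 byte per weight; the embedding and the
# output head are left in float16 (2 bytes per weight). Quantization
# statistics and normalisation weights are ignored as negligible.
VOCAB_SIZE = 128256
HIDDEN_SIZE = 4096
BASE_PARAMS = 8_043_892_736 - 13_631_488  # all params minus LoRA params

fp16_params = 2 * VOCAB_SIZE * HIDDEN_SIZE   # embed_tokens + lm_head
int8_params = BASE_PARAMS - fp16_params      # everything else (approximation)

estimated_bytes = int8_params * 1 + fp16_params * 2
estimated_gb = estimated_bytes / 1e9

print(f"Estimated footprint: {estimated_gb:,.1f} GB")
```

Under these assumptions the estimate lands at about 9.1 GB, which suggests that most of the gap between "8 billion parameters at one byte each" and the measured figure comes from the two large vocabulary matrices kept in half precision.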


In [None]:
# Configuring LoRA with rank = 16, alpha=32, and target modules.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

In [None]:
# Apply LoRA to model
print("Applying LoRA to model...")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
print("LoRA applied successfully!")

# Print trainable parameters
model.print_trainable_parameters()

Applying LoRA to model...
LoRA applied successfully!
trainable params: 13,631,488 || all params: 8,043,892,736 || trainable%: 0.1695
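The trainable parameter count can be checked by hand. Each adapted linear layer gains two matrices, A (in_features x r) and B (r x out_features), so it adds r x (in_features + out_features) weights. The sketch below assumes that `k_proj` and `v_proj` project to 1024 dimensions (8 key/value heads of size 128, as in grouped-query attention); the printed model summary is cut off before those layers, so that value is an assumption rather than something shown in the output.

```python
def lora_param_count(in_features, out_features, rank):
    """Weights added by one LoRA adapter: A (in x r) plus B (r x out)."""
    return rank * (in_features + out_features)

RANK = 16
LORA_ALPHA = 32
HIDDEN = 4096
KV_DIM = 1024      # assumption: 8 KV heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_param_count(HIDDEN, KV_DIM, RANK)  # k_proj
    + lora_param_count(HIDDEN, KV_DIM, RANK)  # v_proj
    + lora_param_count(HIDDEN, HIDDEN, RANK)  # o_proj
)
trainable = per_layer * NUM_LAYERS
all_params = 8_043_892_736                    # from print_trainable_parameters()
trainable_pct = 100 * trainable / all_params
scaling = LORA_ALPHA / RANK                   # factor applied to the LoRA update

print(f"trainable params: {trainable:,} ({trainable_pct:.4f}%), scaling: {scaling}")
```

With these values the total matches the 13,631,488 trainable parameters reported above, and the update is scaled by alpha / r = 2.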


In [None]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear8bitLt(
                (base_layer): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj):

In [None]:
# Freeing up memory again
gc.collect()
torch.cuda.empty_cache()

output_dir="/content/drive/MyDrive/llama_3.1-8b-instruct-construction-lora-a100"


In [None]:
training_args = TrainingArguments(
    output_dir=output_dir,                                  # Save directly to Drive
    per_device_train_batch_size=4,                          # Fits comfortably on an A100
    gradient_accumulation_steps=8,                          # Effective batch size of 4 x 8 = 32
    learning_rate=2e-4,                                     # Common starting point for LoRA
    num_train_epochs=4,                                     # Four passes over the training set
    logging_steps=10,                                       # Log training loss every 10 steps
    save_strategy="steps",                                  # Checkpoint by step count rather than per epoch
    save_steps=100,                                         # Save a checkpoint every 100 steps
    save_total_limit=3,                                     # Keep only the 3 most recent checkpoints
    warmup_ratio=0.1,                                       # Warm up the learning rate over the first 10% of steps
    fp16=True,                                              # Mixed-precision training
    gradient_checkpointing=True,                            # Trade extra compute for lower memory use
    dataloader_num_workers=2,                               # Parallel data loading
    eval_steps=100,                                         # Evaluation interval; only takes effect when step-based evaluation is enabled (not enabled here)
    report_to="wandb",                                      # Log metrics to Weights & Biases
)

In [None]:
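#Defining the formatting function for instruction tuning: each example is wrapped in Llama-2-style [INST] ... [/INST] tags (not the native Llama 3.1 chat template)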
def formatting_func(examples):
    return [
        f"<s>[INST] {instruction} \n\n {input_text} [/INST] {output}</s>"
        for instruction, input_text, output in zip(
            examples["instruction"],
            examples["input"],
            examples["output"]
        )
    ]
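Because the model only ever sees the template above during fine-tuning, prompts at inference time should follow the same layout up to the closing `[/INST]` tag. The sketch below shows one way to keep the two in step. `build_training_text` and `build_inference_prompt` are hypothetical helper names introduced for illustration, not functions used elsewhere in this notebook.

```python
def build_training_text(instruction, input_text, output):
    """Single-example version of the training template."""
    return f"<s>[INST] {instruction} \n\n {input_text} [/INST] {output}</s>"

def build_inference_prompt(instruction, input_text):
    """Same layout as training, stopping where the answer should begin."""
    return f"[INST] {instruction} \n\n {input_text} [/INST]"

example_instruction = "Answer this question about UK building regulations Part Q"
example_question = "What types of security measures are emphasized in Approved Document Q?"

prompt = build_inference_prompt(example_instruction, example_question)
training_text = build_training_text(example_instruction, example_question, "Example answer.")

# The training string is the inference prompt plus the answer and end marker
print(training_text == "<s>" + prompt + " Example answer.</s>")
```

Keeping both helpers side by side makes it easier to spot when a test prompt omits the instruction line that every training example carried.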

In [None]:
import trl
print(trl.__version__)  # Check the installed TRL (Transformer Reinforcement Learning) version when debugging issues

0.16.1


In [None]:
!pip install -U trl



In [None]:
#Initializing Supervised Fine-Tuning Trainer:
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=training_args,
    peft_config=lora_config, 
    formatting_func=formatting_func
)

Applying formatting function to train dataset:   0%|          | 0/2850 [00:00<?, ? examples/s]

Converting train dataset to ChatML:   0%|          | 0/2850 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/2850 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/2850 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/2850 [00:00<?, ? examples/s]

Applying formatting function to eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

Converting eval dataset to ChatML:   0%|          | 0/150 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
print("Starting training...")
trainer.train()



Starting training...


wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

 ··········


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: samuel-jaja21 (samuel-jaja21-univeristy-of-hull) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
10,2.5659
20,2.0395
30,1.5375
40,1.481
50,1.3906
60,1.3255
70,1.315
80,1.2989
90,1.353
100,1.2447


TrainOutput(global_step=356, training_loss=1.2419542717129997, metrics={'train_runtime': 2839.6948, 'train_samples_per_second': 4.015, 'train_steps_per_second': 0.125, 'total_flos': 7.817406737670144e+16, 'train_loss': 1.2419542717129997})
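The step count and throughput in `TrainOutput` follow from the training arguments. The sketch below reproduces them under two assumptions: training ran on a single GPU, and a partial gradient-accumulation window at the end of an epoch does not count as an extra optimizer step (floor division), which is the rule that yields 356 here. The warm-up length is likewise an assumption, taken as the ceiling of the warm-up ratio times the total steps.

```python
import math

# Values taken from the cells above
n_train = 2850
per_device_batch = 4
grad_accum = 8
epochs = 4
warmup_ratio = 0.1
train_runtime_s = 2839.6948

# Assumption: one GPU, floor division for optimizer steps per epoch
batches_per_epoch = math.ceil(n_train / per_device_batch)
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = epochs * updates_per_epoch
warmup_steps = math.ceil(total_steps * warmup_ratio)

samples_per_second = n_train * epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print(f"Effective batch size: {per_device_batch * grad_accum}")
print(f"Optimizer steps: {total_steps} (warm-up: {warmup_steps})")
print(f"Throughput: {samples_per_second:.3f} samples/s, {steps_per_second:.3f} steps/s")
```

With `save_steps=100`, a run of this length would write step-based checkpoints at steps 100, 200 and 300 before the final save.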

In [None]:
print(f"Saving fine-tuned model to {output_dir}...")
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print("Fine-tuning completed successfully!")

Saving fine-tuned model to /content/drive/MyDrive/llama_3.1-8b-instruct-construction-lora-a100...
Fine-tuning completed successfully!


## Fine-Tuned Model Testing

In [None]:
#Defining a testing function that properly cleans responses
def test_generation(model, tokenizer, prompt, temperature=0.1, max_new_tokens=256):
    # Wrap the prompt in the [INST] ... [/INST] tags used during fine-tuning
    if not prompt.startswith("[INST]"):
        formatted_prompt = f"[INST] {prompt} [/INST]"
    else:
        formatted_prompt = prompt

    inputs = tokenizer(formatted_prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_new_tokens=max_new_tokens,
            temperature=temperature,  # Has no effect while do_sample=False
            top_p=0.9,                # Has no effect while do_sample=False
            do_sample=False  # Greedy decoding: deterministic for testing
        )

    # Decode the response
    full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Clean the response
    if formatted_prompt in full_response:
        response = full_response.replace(formatted_prompt, "").strip()
    else:
        response = full_response

    # Extra cleaning for any remaining tags
    response = re.sub(r'\[/?INST\]', '', response).strip()

    return response
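With `do_sample=False` the model decodes greedily, picking the highest-scoring token at every step, so the `temperature` argument does not influence the output. The toy sketch below illustrates why: dividing every logit by the same positive temperature never changes which one is largest. `greedy_pick` and the logit values are made up for illustration and are not part of the `transformers` API.

```python
def greedy_pick(logits, temperature=1.0):
    """Return the index of the largest logit after temperature scaling."""
    scaled = [value / temperature for value in logits]
    return max(range(len(scaled)), key=lambda i: scaled[i])

toy_logits = [1.2, 3.4, 0.5, 3.1]  # made-up scores for a 4-token vocabulary

# The chosen token is the same for every positive temperature
picks = {t: greedy_pick(toy_logits, t) for t in (0.1, 0.7, 1.0, 2.0)}
print(picks)
```

Temperature and `top_p` only start to matter once sampling is switched on, which is why the responses in this section are identical from run to run.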

In [None]:
#Using the fine-tuned model
print("Testing model with a sample prompt...")
from peft import PeftModel, PeftConfig

# Load the fine-tuned model
config = PeftConfig.from_pretrained(output_dir)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    token=hf_token
)
model = PeftModel.from_pretrained(model, output_dir)

#Setting the model to evaluation mode for testing
model.eval()

Testing model with a sample prompt...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear8bitLt(
                (base_layer): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj):

In [None]:
#Run sample tests; the first question is taken from Reddit: https://www.reddit.com/r/DIYUK/comments/11qcsuo/is_there_a_minimum_size_a_bedroom_door_has_to_be/
test_questions = [
    "Is there a minimum size a bedroom door has to be under UK building regs? Is 610mm too small?",
    "What are the main requirements for fire safety in commercial buildings?",
    "How should insulation be installed to meet current building regulations?",
    "What are the minimum requirements for accessible bathroom facilities?",
    "Explain the key considerations for sustainable building design in the UK"
]

for question in test_questions:
    print(f"\nQuestion: {question}")
    response = test_generation(model, tokenizer, question)
    print(f"Response: {response}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Question: Is there a minimum size a bedroom door has to be under UK building regs? Is 610mm too small?


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: Under UK building regulations, there is no minimum size requirement for a bedroom door. However, the door's size can impact the overall functionality and safety of the room. A door that is too small, such as 610mm, may not allow for easy passage of furniture or individuals, which could lead to safety issues. It is recommended to consult with building control authorities for specific guidance on door sizes in relation to the intended use of the room. Additionally, the Building Regulations Part K, which pertains to fire safety, may also have implications for door sizes in certain situations. It is advisable to ensure that any door installation complies with all relevant regulations to maintain safety and functionality.

Question: What are the main requirements for fire safety in commercial buildings?


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: The main requirements for fire safety in commercial buildings include the installation of fire alarms, sprinkler systems, and fire extinguishers. Additionally, buildings must have clear escape routes, fire-resistant materials, and regular fire drills to ensure occupant safety. Compliance with local building codes and regulations is also essential. Furthermore, buildings must have adequate lighting and signage to facilitate evacuation during emergencies. Regular inspections and maintenance of fire safety systems are also crucial to ensure they function properly in case of a fire. All these measures help to protect both people and property in commercial buildings.

Question: How should insulation be installed to meet current building regulations?


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: Insulation should be installed in accordance with the manufacturer's instructions and must meet the minimum standards set out in the Building Regulations. This includes ensuring that insulation is properly supported and sealed to prevent moisture ingress and ensure thermal performance. Additionally, insulation should be installed in a manner that allows for easy inspection and maintenance of the building's structure and services. Compliance with these guidelines is essential for achieving the desired energy efficiency and safety standards.

Question: What are the minimum requirements for accessible bathroom facilities?


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Response: The minimum requirements for accessible bathroom facilities include: a single fully enclosed self-contained toilet cubicle with a minimum clear space of 1500mm x 1500mm; a single fully enclosed self-contained shower cubicle with a minimum clear space of 1200mm x 1200mm; and a single fully enclosed self-contained washbasin with a minimum clear space of 800mm x 800mm. Additionally, there must be a minimum of one accessible bathroom facility per 20 people in the building, and all facilities must be located on the same level as the main entrance. The layout must also allow for wheelchair access and maneuverability within the bathroom area. Compliance with these requirements is essential to ensure that all individuals can use the bathroom facilities safely and independently.

Question: Explain the key considerations for sustainable building design in the UK
Response: Sustainable building design in the UK involves several key considerations, including energy efficiency, use of rene

In [None]:
def test_generation(model, tokenizer, prompt, temperature=0.1, max_new_tokens=256):
    # Wrap the prompt in the [INST] ... [/INST] tags used during fine-tuning
    if not prompt.startswith("[INST]"):
        formatted_prompt = f"[INST] {prompt} [/INST]"
    else:
        formatted_prompt = prompt

    inputs = tokenizer(formatted_prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_new_tokens=max_new_tokens,
            temperature=temperature,  # Has no effect while do_sample=False
            top_p=0.9,                # Has no effect while do_sample=False
            do_sample=False  # Greedy decoding: deterministic for testing
        )

    # Decode the response
    full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Clean the response
    if formatted_prompt in full_response:
        response = full_response.replace(formatted_prompt, "").strip()
    else:
        response = full_response

    # Extra cleaning for any remaining tags
    response = re.sub(r'\[/?INST\]', '', response).strip()

    # Trim the response at any follow-up question the model appends after its answer
    question_patterns = [
        r'\s+What\s+is', r'\s+How\s+does', r'\s+Why\s+is',
        r'\s+When\s+should', r'\s+Where\s+can'
    ]
    for pattern in question_patterns:
        parts = re.split(pattern, response, 1)
        if len(parts) > 1:
            response = parts[0].strip()

    return response
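The trimming step can be tried in isolation without loading the model. The sketch below is a compact variant, assuming the same five question openers: it combines them into one regular expression and cuts the text at the earliest match, which should give the same result as the loop above. `trim_follow_up` is a hypothetical name used only for this illustration.

```python
import re

# Same openers as in question_patterns, combined into one alternation
FOLLOW_UP = re.compile(
    r"\s+(?:What\s+is|How\s+does|Why\s+is|When\s+should|Where\s+can)"
)

def trim_follow_up(text):
    """Cut the text at the first follow-up question opener, if any."""
    match = FOLLOW_UP.search(text)
    if match is None:
        return text.strip()
    return text[:match.start()].strip()

sample = "Doors must be fire rated. What is the next step? How does this apply?"
print(trim_follow_up(sample))
```

The heuristic is deliberately blunt: an answer that legitimately contains a sentence starting with one of these openers will also be cut at that point, so it suits display clean-up better than evaluation.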

In [None]:
def test_generation_with_formatting(model, tokenizer, prompt, temperature=0.1, max_new_tokens=256):
    # Get the response using the existing test_generation function
    response = test_generation(model, tokenizer, prompt, temperature, max_new_tokens)

    # Format for display with proper wrapping
    import textwrap
    wrapped_response = textwrap.fill(response, width=80)  # Adjust width as needed

    # Print with decorators for better readability
    print(f"\nQuestion: {prompt}")
    print("-" * 80)
    print(f"Response:\n{wrapped_response}")
    print("-" * 80)

    return response

# Test with better formatting
for question in test_questions:
    test_generation_with_formatting(model, tokenizer, question)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Question: Is there a minimum size a bedroom door has to be under UK building regs? Is 610mm too small?
--------------------------------------------------------------------------------
Response:
Under UK building regulations, there is no minimum size requirement for a
bedroom door. However, the door's size can impact the overall functionality and
safety of the room. A door that is too small, such as 610mm, may not allow for
easy passage of furniture or individuals, which could lead to safety issues. It
is recommended to consult with building control authorities for specific
guidance on door sizes in relation to the intended use of the room.
Additionally, the Building Regulations Part K, which pertains to fire safety,
may also have implications for door sizes in certain situations. It is advisable
to ensure that any door installation complies with all relevant regulations to
maintain safety and functionality.
------------------------------------------------------------------------------

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Question: What are the main requirements for fire safety in commercial buildings?
--------------------------------------------------------------------------------
Response:
The main requirements for fire safety in commercial buildings include the
installation of fire alarms, sprinkler systems, and fire extinguishers.
Additionally, buildings must have clear escape routes, fire-resistant materials,
and regular fire drills to ensure occupant safety. Compliance with local
building codes and regulations is also essential. Furthermore, buildings must
have adequate lighting and signage to facilitate evacuation during emergencies.
Regular inspections and maintenance of fire safety systems are also crucial to
ensure they function properly in case of a fire. All these measures help to
protect both people and property in commercial buildings.
--------------------------------------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Question: How should insulation be installed to meet current building regulations?
--------------------------------------------------------------------------------
Response:
Insulation should be installed in accordance with the manufacturer's
instructions and must meet the minimum standards set out in the Building
Regulations. This includes ensuring that insulation is properly supported and
sealed to prevent moisture ingress and ensure thermal performance. Additionally,
insulation should be installed in a manner that allows for easy inspection and
maintenance of the building's structure and services. Compliance with these
guidelines is essential for achieving the desired energy efficiency and safety
standards.
--------------------------------------------------------------------------------


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Question: What are the minimum requirements for accessible bathroom facilities?
--------------------------------------------------------------------------------
Response:
The minimum requirements for accessible bathroom facilities include: a single
fully enclosed self-contained toilet cubicle with a minimum clear space of
1500mm x 1500mm; a single fully enclosed self-contained shower cubicle with a
minimum clear space of 1200mm x 1200mm; and a single fully enclosed self-
contained washbasin with a minimum clear space of 800mm x 800mm. Additionally,
there must be a minimum of one accessible bathroom facility per 20 people in the
building, and all facilities must be located on the same level as the main
entrance. The layout must also allow for wheelchair access and maneuverability
within the bathroom area. Compliance with these requirements is essential to
ensure that all individuals can use the bathroom facilities safely and
independently.
----------------------------------------------

In [None]:
from html import escape
from IPython.display import display, HTML

def display_qa(question, answer):
    # Escape the text so characters such as < or & in a response cannot break the markup
    html = f"""
    <div style="font-family: Arial, sans-serif; margin: 15px 0; border: 1px solid #ccc; border-radius: 8px; overflow: hidden;">
        <div style="background-color: #e6f2ff; padding: 15px; border-bottom: 1px solid #ccc;">
            <strong style="font-size: 16px; color: #003366;">Question: {escape(question)}</strong>
        </div>
        <div style="background-color: #e6f2ff; padding: 15px;
                  color: #000000 !important; font-weight: normal !important;
                  line-height: 1.6; white-space: pre-wrap;">
            {escape(answer)}
        </div>
    </div>
    """
    display(HTML(html))

# Test with HTML display
for question in test_questions:
    response = test_generation(model, tokenizer, question)
    display_qa(question, response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


## **Shipping Fine-tuned Model Adapter to Hugging Face**

In [None]:
# Install the huggingface_hub library if not already installed
!pip install -q huggingface_hub

In [None]:
# Import required libraries
from huggingface_hub import HfApi, login
import os

In [None]:
#Setting HF parameters
USERNAME = "SamuelJaja"  #HF username
MODEL_NAME = "llama_3.1-8b-instruct-construction-lora-a100"  #model repo
LOCAL_MODEL_PATH = output_dir

In [None]:
#Login to Hugging Face
login(token=hf_token)

#Initialize API
api = HfApi()

#Create a new repository if it doesn't exist
try:
    api.create_repo(
        repo_id=f"{USERNAME}/{MODEL_NAME}",
        repo_type="model",
        private=False,  # False makes the repository public
    )
    print(f"Created repository: {USERNAME}/{MODEL_NAME}")
except Exception as e:
    print(f"Repository may already exist or there was an error: {e}")

Created repository: SamuelJaja/llama_3.1-8b-instruct-construction-lora-a100


In [None]:
#Uploading the model files
print(f"Uploading model from {LOCAL_MODEL_PATH} to {USERNAME}/{MODEL_NAME}...")
api.upload_folder(
    folder_path=LOCAL_MODEL_PATH,
    repo_id=f"{USERNAME}/{MODEL_NAME}",
    repo_type="model",
    ignore_patterns=[".git/*", ".gitattributes", "**/__pycache__/*"]
)

print("Upload complete!")
print(f"Your model is now available at: https://huggingface.co/{USERNAME}/{MODEL_NAME}")

Uploading model from /content/drive/MyDrive/llama_3.1-8b-instruct-construction-lora-a100 to SamuelJaja/llama_3.1-8b-instruct-construction-lora-a100...


Upload 23 LFS files:   0%|          | 0/23 [00:00<?, ?it/s]

scaler.pt:   0%|          | 0.00/988 [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.2k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/54.6M [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/54.6M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/109M [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/1.06k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.69k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/54.6M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/109M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.2k [00:00<?, ?B/s]

scaler.pt:   0%|          | 0.00/988 [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/1.06k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.69k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/54.6M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/109M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.2k [00:00<?, ?B/s]

scaler.pt:   0%|          | 0.00/988 [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/1.06k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.69k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Upload complete!
Your model is now available at: https://huggingface.co/SamuelJaja/llama_3.1-8b-instruct-construction-lora-a100
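The upload log lists optimizer, scheduler and RNG state files several times, which suggests that intermediate training checkpoints were uploaded alongside the final adapter. If only the adapter is meant to be published, an extra ignore pattern could filter those folders out. The sketch below previews the effect locally, assuming glob-style matching comparable to Python's `fnmatch`. The file paths and the `checkpoint-*/*` pattern are hypothetical examples, and `preview_upload` is an illustrative helper, not part of `huggingface_hub`.

```python
from fnmatch import fnmatch

def preview_upload(paths, ignore_patterns):
    """Return the paths that would remain after applying ignore patterns."""
    return [
        path for path in paths
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]

# Hypothetical folder contents for illustration
files = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "tokenizer.json",
    "checkpoint-300/adapter_model.safetensors",
    "checkpoint-300/optimizer.pt",
    "checkpoint-300/rng_state.pth",
]

patterns = [".git/*", ".gitattributes", "**/__pycache__/*", "checkpoint-*/*"]
kept = preview_upload(files, patterns)
print(kept)
```

Running a preview like this before `upload_folder` gives a quick check that large training-state files are not being published by accident.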


*Publishing only the LoRA adapter allows for quick experimentation, because the adapter is applied to the base model at runtime. This is convenient for testing, as it avoids creating a new merged model for each iteration.*

In [None]:
#Adding a model card with information about fine-tuning
model_card_content = """---
language: en
license: apache-2.0
library_name: peft
tags:
- llama
- llama-3
- construction
- building-regulations
- lora
- custom construction industry dataset
---

# LLAMA3.1-8B-Instruct-Construction

This is a LoRA fine-tune of Llama-3.1-8B-Instruct, adapted for construction industry and building regulations knowledge.

## Model Details

- **Base Model:** meta-llama/Llama-3.1-8B-Instruct
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Training Data:** Custom dataset focusing on construction industry standards, building regulations, and safety requirements
- **Usage:** This model is designed to answer questions about building codes, construction best practices, and regulatory compliance

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
import torch
import re

# Load the adapter configuration
config = PeftConfig.from_pretrained("USERNAME/llama_3.1-8b-instruct-construction-lora-a100")

# Load base model with quantization
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "USERNAME/llama_3.1-8b-instruct-construction-lora-a100")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# Clean response function
def clean_response(text):
    return re.sub(r'\\[/?INST\\]', '', text).strip()

# Generate text
def generate_response(prompt, temperature=0.1, max_tokens=256):
    # Format properly
    if not prompt.startswith("[INST]"):
        formatted_prompt = f"[INST] {prompt} [/INST]"
    else:
        formatted_prompt = prompt

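    # Move inputs to the device chosen by device_map="auto"
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_tokens,
        temperature=temperature,
        top_p=0.9,
        do_sample=True  # sampling must be enabled for temperature and top_p to take effect
    )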

    full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Remove prompt from output
    if formatted_prompt in full_response:
        response = full_response.replace(formatted_prompt, "").strip()
    else:
        response = full_response

    # Clean any remaining instruction tags
    response = clean_response(response)

    return response

# Example use
question = "What are the main requirements for fire safety in commercial buildings?"
answer = generate_response(question)
print(answer)
```
"""

# Replace USERNAME with actual username in the model card
model_card_content = model_card_content.replace("USERNAME", USERNAME)

# Write model card to README.md file
with open("README.md", "w") as f:
    f.write(model_card_content)

# Upload the model card
api.upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id=f"{USERNAME}/{MODEL_NAME}",
    repo_type="model"
)

print("Model card uploaded!")
print("Your model is ready to be shared and used.")

Model card uploaded!
Your model is ready to be shared and used.


## **Merge LoRA Adapter with Base Model and Upload to Hugging Face**

In [None]:
# Model names
BASE_MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"
LORA_ADAPTER_PATH = "/content/drive/MyDrive/llama_3.1-8b-instruct-construction-lora-a100"
MERGED_MODEL_NAME = "llama-3.1-8b-instruct-construction-merged"  # Name for the merged model repository

In [None]:
# Paths
MERGED_MODEL_PATH = "/content/merged_model"  # Temporary local directory for merged model

In [None]:
# ==== LOAD BASE MODEL IN HALF PRECISION (FLOAT16) ======
print(f"Loading base model: {BASE_MODEL_ID}...")
#Load in float16 precision to save memory
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.float16,  # Use float16 to save memory
    device_map="auto",  # Automatically distribute across available GPUs
    trust_remote_code=True
)
print("Base model loaded!")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

Loading base model: meta-llama/Meta-Llama-3.1-8B-Instruct...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]



Base model loaded!


In [None]:
# ====== LOADING LORA ADAPTER AND MERGE WITH BASE MODEL ======
print(f"Loading LoRA adapter from {LORA_ADAPTER_PATH}...")
#Loading the LoRA model
lora_model = PeftModel.from_pretrained(
    base_model,
    LORA_ADAPTER_PATH,
    is_trainable=False  # Setting to False since I am just merging, not training further
)
print("LoRA adapter loaded!")

print("Merging LoRA weights with base model...")
# Merging the LoRA adapter with the base model
merged_model = lora_model.merge_and_unload()
print("Models merged successfully!")

Loading LoRA adapter from /content/drive/MyDrive/llama_3.1-8b-instruct-construction-lora-a100...




LoRA adapter loaded!
Merging LoRA weights with base model...
Models merged successfully!


In [None]:
# ====== SAVING MERGED MODEL LOCALLY ======
print(f"Saving merged model to {MERGED_MODEL_PATH}...")
# Create directory if it doesn't exist
os.makedirs(MERGED_MODEL_PATH, exist_ok=True)

# Save the merged model and tokenizer
merged_model.save_pretrained(
    MERGED_MODEL_PATH,
    safe_serialization=True  # Use safetensors format
)
tokenizer.save_pretrained(MERGED_MODEL_PATH)
print("Merged model saved locally!")

# Free up memory
del base_model, lora_model, merged_model
torch.cuda.empty_cache()
print("Memory cleared!")

Saving merged model to /content/merged_model...


Saving checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Merged model saved locally!
Memory cleared!


In [None]:
# ===== CREATING MODEL REPOSITORY ====== #CREATE A REUSABLE FUNCTION FOR THIS MUCH LATER
print(f"Creating repository: {USERNAME}/{MERGED_MODEL_NAME}...")
try:
    api.create_repo(
        repo_id=f"{USERNAME}/{MERGED_MODEL_NAME}",
        repo_type="model",
        private=False,
    )
    print(f"Repository created: {USERNAME}/{MERGED_MODEL_NAME}")
except Exception as e:
    print(f"Repository may already exist or there was an error: {e}")

Creating repository: SamuelJaja/llama-3.1-8b-instruct-construction-merged...
Repository created: SamuelJaja/llama-3.1-8b-instruct-construction-merged


In [None]:
# ====== CREATING MODEL CARD ======
print("Creating model card...")
model_card_content = """---
language: en
license: llama3.1
tags:
- llama
- llama-3
- construction
- building-regulations
- merged
- instruction-tuned
- uk-standards
pipeline_tag: text-generation
widget:
- text: "What are the main requirements for fire safety in commercial buildings?"
- text: "Explain the key differences between residential and commercial building codes."
- text: "Is there a minimum size a bedroom door has to be under UK building regs? Is 610mm too small?"
---

# LLAMA-3.1-8B-Instruct-Construction (Merged)

This is a fine-tuned version of Meta's Llama-3.1-8B-Instruct model optimized for construction industry and UK building regulations knowledge. This repository contains the **full merged model** (base + fine-tuning), making it directly usable with the Hugging Face API.

## Model Details

- **Base Model:** meta-llama/Meta-Llama-3.1-8B-Instruct
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation), merged with base model
- **Training Data:** Custom dataset focusing on UK construction industry standards, building regulations, and safety requirements
- **Parameters:** 8 billion parameters
- **Training Hardware:** A100 GPU with 40GB VRAM
- **Usage:** This model is designed to answer questions about building codes, construction best practices, and regulatory compliance with a focus on UK standards

## Capabilities

This model can:
- Answer questions about UK building regulations and standards
- Explain technical requirements for construction projects
- Provide insights on fire safety, accessibility, insulation, and sustainable design
- Assist with understanding compliance requirements for construction projects
- Interpret building code requirements for various building types

## Example Usage

You can use this model directly with the Hugging Face `transformers` library:

```python
import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "SamuelJaja/llama-3.1-8b-instruct-construction-merged"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Clean response function
def clean_response(text):
    \"\"\"Remove any instruction tags from the response\"\"\"
    return re.sub(r'\\[/?INST\\]', '', text).strip()

# Format prompt function
def format_prompt(prompt):
    if not prompt.startswith("[INST]"):
        return f"[INST] {prompt} [/INST]"
    return prompt

# Generate text
prompt = "What are the main requirements for fire safety in commercial buildings?"
formatted_prompt = format_prompt(prompt)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=256,
    temperature=0.1,
    top_p=0.9,
    do_sample=True  # sampling must be enabled for temperature and top_p to take effect
)

# Process the response
full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract just the model's response
if formatted_prompt in full_response:
    response = full_response.replace(formatted_prompt, "").strip()
else:
    response = full_response

# Clean any remaining tags
response = clean_response(response)
print(response)
```
"""

Creating model card...


In [None]:
# Write model card to a file
with open(f"{MERGED_MODEL_PATH}/README.md", "w") as f:
    f.write(model_card_content)
print("Model card created!")

Model card created!


In [None]:
#======UPLOADING MERGED MODEL TO HUGGING FACE ======
print(f"Uploading merged model to Hugging Face ({USERNAME}/{MERGED_MODEL_NAME})...")
print("This may take a while depending on your internet connection...")

api.upload_folder(
    folder_path=MERGED_MODEL_PATH,
    repo_id=f"{USERNAME}/{MERGED_MODEL_NAME}",
    repo_type="model",
    ignore_patterns=[".git/*", ".gitattributes", "**/__pycache__/*"]
)

print("Upload complete!")
print(f"Your merged model is now available at: https://huggingface.co/{USERNAME}/{MERGED_MODEL_NAME}")
print("It should be directly usable with the Hugging Face Inference API!")

#Cleaning up local files to save space
if os.path.exists(MERGED_MODEL_PATH):
    shutil.rmtree(MERGED_MODEL_PATH)
    print(f"Cleaned up local directory: {MERGED_MODEL_PATH}")


Uploading merged model to Hugging Face (SamuelJaja/llama-3.1-8b-instruct-construction-merged)...
This may take a while depending on your internet connection...


model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Upload 5 LFS files:   0%|          | 0/5 [00:00<?, ?it/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Upload complete!
Your merged model is now available at: https://huggingface.co/SamuelJaja/llama-3.1-8b-instruct-construction-merged
It should be directly usable with the Hugging Face Inference API!
Cleaned up local directory: /content/merged_model
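The shard sizes reported in the upload log can be sanity-checked against the expected size of a float16 checkpoint. The short calculation below is a rough sketch: the parameter count of about 8.03 billion is an assumption taken from the published size of the base model rather than from this notebook, and the progress bars are assumed to report decimal gigabytes.

```python
# Shard sizes in GB, copied from the upload progress bars above
shard_sizes_gb = [4.98, 5.00, 4.92, 1.17]
total_gb = sum(shard_sizes_gb)

# Assumption: roughly 8.03 billion parameters, each stored as float16 (2 bytes)
ASSUMED_PARAM_COUNT = 8.03e9
BYTES_PER_PARAM_FP16 = 2
expected_gb = ASSUMED_PARAM_COUNT * BYTES_PER_PARAM_FP16 / 1e9

print(f"Uploaded shards total: {total_gb:.2f} GB")
print(f"Expected float16 size: {expected_gb:.2f} GB")
```

The two figures agree to within rounding, which is consistent with the merged model having been saved in float16 with no extra adapter weights alongside it.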


*Merging the LoRA adapter into the base model produces a single, unified model. This simplifies deployment, since users only need to load one set of weights, and it removes the extra computation and the small amount of additional memory needed to apply the adapter dynamically at runtime.*
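As an illustration of why merging does not change the model's outputs, the pure-Python sketch below folds a low-rank update into a weight matrix, `W_merged = W + (alpha / r) * B @ A`, and checks that the merged matrix gives the same result as applying the adapter separately. All values are toy numbers chosen for illustration, and the helper names are hypothetical; this is not the internals of `merge_and_unload()`.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def merge_lora(W, A, B, alpha, r):
    """Return a new matrix W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


# Toy example (hypothetical values): 2x2 base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # shape (r, in_features)
B = [[0.5], [0.0]]    # shape (out_features, r)
x = [2.0, 3.0]
alpha, r = 2, 1

# Merged path: a single matrix-vector product
W_merged = merge_lora(W, A, B, alpha, r)
merged_out = matvec(W_merged, x)

# Runtime path: base output plus the adapter output, computed separately
runtime_out = [b + (alpha / r) * d
               for b, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

print(merged_out, runtime_out)  # [7.0, 3.0] [7.0, 3.0]
```

The merged path needs one matrix product per layer instead of three, which is where the inference saving comes from.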

## **Deploying Model to AWS SageMaker to Verify Inference Speed/Latency**

In [None]:
!pip install sagemaker --upgrade
!pip install boto3

Collecting sagemaker
  Downloading sagemaker-2.243.2-py3-none-any.whl.metadata (16 kB)
Collecting attrs<24,>=23.1.0 (from sagemaker)
  Downloading attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting boto3<2.0,>=1.35.75 (from sagemaker)
  Downloading boto3-1.37.37-py3-none-any.whl.metadata (6.7 kB)
Collecting docker (from sagemaker)
  Downloading docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Collecting fastapi (from sagemaker)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting importlib-metadata<7.0,>=1.4.0 (from sagemaker)
  Downloading importlib_metadata-6.11.0-py3-none-any.whl.metadata (4.9 kB)
Collecting numpy<2.0,>=1.9.0 (from sagemaker)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 3.9 MB/s eta 0:00:00
Collecting omegaconf<=2.3,>=2.2 (from sagemaker)
  Downloading omegaconf-2.3.0-py3-none



In [None]:
!pip install -U sagemaker



In [None]:
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
import datetime
from google.colab import userdata  # Colab secrets store; provides userdata.get() used below


# ---------- ✅ CONFIGURE ----------
HF_MODEL_ID = "SamuelJaja/llama_3.1-8b-instruct-construction-lora-a100"
IAM_ROLE = "arn:aws:iam::540019364540:role/service-role/AmazonSageMaker-ExecutionRole-20250319T155878"
REGION = "eu-west-2" #London

# ---------- 🔐 AWS SESSION ----------
aws_access_key = userdata.get('AWS_ACCESS_KEY')
aws_secret_key = userdata.get('AWS_SECRET_KEY')

session = boto3.Session(
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
    region_name=REGION
)

sagemaker_session = sagemaker.Session(boto_session=session)

# ---------- 📦 MODEL ENV ----------
hub = {
    'HF_MODEL_ID': HF_MODEL_ID,
    'HF_TASK': 'text-generation'
}

#---------- 🛠️ DEFINE MODEL ----------
huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=IAM_ROLE,
    sagemaker_session=sagemaker_session
)

# ---------- 🚀 DEPLOY ----------
#Creating a unique endpoint name using timestamp to avoid conflicts
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
endpoint_name = f"llama31-8b-instruct-lora-endpoint-{timestamp}"

print(f"Deploying to endpoint: {endpoint_name}")

#Using ml.m5.xlarge CPU instance
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name=endpoint_name
)

print(f"✅ Model deployed to endpoint: {endpoint_name}")
print("⚠️ Note: This endpoint will continue to incur charges (~$0.269/hour) until deleted.")
print("⚠️ CPU inference will be much slower than GPU for this model size.")




Deploying to endpoint: llama31-8b-instruct-lora-endpoint-20250421163927
------!✅ Model deployed to endpoint: llama31-8b-instruct-lora-endpoint-20250421163927
⚠️ Note: This endpoint will continue to incur charges (~$0.269/hour) until deleted.
⚠️ CPU inference will be much slower than GPU for this model size.


*This AWS deployment was only a test of inference speed. Using a GPU instance required applying for a service quota increase, which could incur additional cost, so I paused this approach and focused on the Hugging Face L4 GPU instead.*
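A latency check of the kind described here could be sketched as follows. The helper `measure_latency` and the stand-in `fake_predict` are hypothetical names introduced for illustration; `fake_predict` simply sleeps for a moment so that the example runs without a live endpoint, and the payload shape is an assumption.

```python
import statistics
import time


def measure_latency(call, payload, n_runs=5):
    """Time call(payload) n_runs times and return the per-call latencies in seconds."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        call(payload)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_predict(payload):
    """Hypothetical stand-in for an endpoint call: waits briefly, then echoes the input."""
    time.sleep(0.01)
    return [{"generated_text": payload["inputs"]}]


payload = {"inputs": "What are the main requirements for fire safety in commercial buildings?"}
latencies = measure_latency(fake_predict, payload, n_runs=5)

print(f"mean:   {statistics.mean(latencies):.3f} s")
print(f"median: {statistics.median(latencies):.3f} s")
```

With a live endpoint, `predictor.predict` could be passed in place of `fake_predict`, and calling `predictor.delete_endpoint()` afterwards stops the hourly charges noted in the deployment output.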

*Below are links to the deployment code hosted on Hugging Face servers. This code is not meant to be run in a notebook; it runs as app.py in the cloud.*

https://huggingface.co/spaces/SamuelJaja/llama3.1_8B_merged_uk_building_regulations/blob/main/app.py

[Multi-Model RAG Bot](https://huggingface.co/spaces/SamuelJaja/uk-building-regulations-bot?logs=container)

In [3]:
##This appendix outlines the codebase used to deploy the StructureGPT chatbot on Hugging Face Spaces. 
##The deployment leverages a Streamlit-based frontend for user interaction and integrates three LLaMA model variants through 
##a unified Retrieval-Augmented Generation (RAG) pipeline. The backend includes components for semantic chunking, vector database retrieval 
##(via ChromaDB), and model switching via Streamlit UI.

##Please note: The code is intended for deployment via Hugging Face’s cloud environment and cannot be executed directly in Jupyter Notebook 
##environments due to the use of Streamlit and GPU-based APIs (e.g., Groq API for model inference).

Live Deployment (Hugging Face Spaces): https://huggingface.co/spaces/SamueLJaja/uk-building-regulations-bot 

User Interface Deployment Code: https://huggingface.co/spaces/SamuelJaja/uk-building-regulations-bot/blob/main/app.py 

Model Deployment Code: Hugging Face – app.py https://huggingface.co/spaces/SamuelJaja/llama3.1_8B_merged_uk_building_regulations/blob/main/app.py