<a href="https://colab.research.google.com/github/Nandish4470/Claude_3nd_time_/blob/main/Analyzer_2_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Robust parsing helpers to replace brittle table/line parsing in Analyzer_2_1.ipynb
# Save this file in the repository (e.g., at the repo root or in a `utils/` folder)
# and import it from the notebook:
#   from parser_fixes import safe_normalize_and_parse, build_item_breakup, to_number
#
# This module:
# - improves numeric parsing (currency, grouping, parentheses)
# - normalizes DataFrame columns safely (no DataFrame.str)
# - provides parse_item_line with strict + heuristic fallbacks
# - builds the canonical item-breakup DataFrame with columns:
#   ["S No.", "Item No", "Description of Item", "Unit", "Qty", "Rate", "Amount"]

import re
import pandas as pd
import numpy as np

# Improved numeric conversion with currency, grouping and parentheses support
def to_number(value, default=np.nan):
    if value is None:
        return default
    if isinstance(value, (int, float, np.floating, np.integer)):
        try:
            return float(value)
        except Exception:
            return default
    s = str(value).strip()
    if s == "" or s.lower() in ("na", "n/a", "-"):
        return default
    # Remove common currency symbols and non-breaking spaces
    s = s.replace('\xa0', '').replace('$', '').replace('€', '').replace('£', '').replace('₹', '')
    # Handle parentheses as negative
    neg = False
    if s.startswith('(') and s.endswith(')'):
        neg = True
        s = s[1:-1].strip()
    # Remove commas used as thousands separators (but keep decimal points)
    s = s.replace(',', '')
    # Final sanitization: keep digits, dot, minus, plus, and exponent
    s = re.sub(r'[^0-9eE\.\-+]', '', s)
    if s in ("", ".", "-", "+"):
        return default
    try:
        num = float(s)
        return -num if neg else num
    except Exception:
        return default

# Normalize a parsed DataFrame safely (do not call DataFrame.str)
def normalize_table_df(df):
    df = df.copy()
    # Ensure column names are strings
    df.columns = [str(c).strip() for c in df.columns]
    for col in df.columns:
        if pd.api.types.is_object_dtype(df[col]) or pd.api.types.is_string_dtype(df[col]):
            # Use map instead of applymap on entire DataFrame
            df[col] = df[col].astype(str).map(lambda x: x.strip())
    return df

# Primary robust parser for a single free-text item line
def parse_item_line(text_line):
    """
    Parse a single line which may contain:
    S No., Item No, Description, Unit, Qty, Rate, Amount
    Returns a dict with keys:
      "S No.", "Item No", "Description of Item", "Unit", "Qty", "Rate", "Amount"
    Fields not found are None.
    """
    if text_line is None:
        return None
    line = str(text_line).strip()
    if line == "":
        return None

    # 1) Strict regex for fully-structured lines
    strict_re = re.compile(
        r'^\s*(?P<sno>\d{1,6})\s+'
        r'(?P<itemno>[\w\-\./]{1,30})\s+'
        r'(?P<desc>.+?)\s+'
        r'(?P<unit>[A-Za-z/%]{1,10})\s+'
        r'(?P<qty>[\d,().\u00A0-]+)\s+'
        r'(?P<rate>[\d,().\u00A0-]+)\s+'
        r'(?P<amount>[\d,().\u00A0-]+)\s*$'
    )
    m = strict_re.match(line)
    if m:
        return {
            "S No.": m.group('sno'),
            "Item No": m.group('itemno'),
            "Description of Item": m.group('desc').strip(),
            "Unit": m.group('unit'),
            "Qty": to_number(m.group('qty')),
            "Rate": to_number(m.group('rate')),
            "Amount": to_number(m.group('amount'))
        }

    # 2) Looser approach: extract trailing numeric tokens (likely Qty, Rate, Amount)
    tokens = re.split(r'\s{2,}|\t', line)
    if len(tokens) == 1:
        tokens = line.split()

    flat_tokens = []
    for t in tokens:
        parts = t.strip().split()
        flat_tokens.extend([p for p in parts if p != ""])

    def is_numeric_like(tok):
        return bool(re.search(r'\d', tok)) and not re.match(r'^[A-Za-z]+$', tok)

    parsed = {
        "S No.": None,
        "Item No": None,
        "Description of Item": None,
        "Unit": None,
        "Qty": None,
        "Rate": None,
        "Amount": None
    }

    if len(flat_tokens) >= 3:
        last3 = flat_tokens[-3:]
        if all(is_numeric_like(t) for t in last3):
            parsed["Qty"] = to_number(last3[0])
            parsed["Rate"] = to_number(last3[1])
            parsed["Amount"] = to_number(last3[2])
            prefix = flat_tokens[:-3]
            if len(prefix) >= 2 and re.match(r'^\d+$', prefix[0]):
                parsed["S No."] = prefix[0]
                parsed["Item No"] = prefix[1]
                parsed["Description of Item"] = " ".join(prefix[2:]) if len(prefix) > 2 else ""
            else:
                if len(prefix) >= 1 and re.match(r'^[\w\-\./]+$', prefix[0]):
                    parsed["Item No"] = prefix[0]
                    parsed["Description of Item"] = " ".join(prefix[1:]) if len(prefix) > 1 else ""
                else:
                    parsed["Description of Item"] = " ".join(prefix)
            if parsed["Description of Item"]:
                desc_tokens = parsed["Description of Item"].split()
                if len(desc_tokens) >= 1 and re.fullmatch(r'[A-Za-z/%]{1,6}', desc_tokens[-1]):
                    parsed["Unit"] = desc_tokens[-1]
                    parsed["Description of Item"] = " ".join(desc_tokens[:-1]).strip()
            return parsed

    # 3) Pattern where only amount is explicit
    alt_re = re.compile(
        r'^(?:(?P<sno>\d{1,6})\s+)?(?:(?P<itemno>[\w\-\./]{1,30})\s+)?(?P<desc>.+?)\s+'
        r'(?P<amount>[\d,().\u00A0\-]+)$'
    )
    m2 = alt_re.match(line)
    if m2:
        parsed["S No."] = m2.group('sno')
        parsed["Item No"] = m2.group('itemno')
        parsed["Description of Item"] = m2.group('desc').strip()
        parsed["Amount"] = to_number(m2.group('amount'))
        return parsed

    # 4) Last resort: whole line as description
    parsed["Description of Item"] = line
    return parsed

# Assemble final item-breakup DataFrame from parsed rows or raw item-line strings
def build_item_breakup(lines):
    """
    lines: iterable of either dicts (already parsed) or raw strings (item lines)
    Returns: pandas DataFrame with columns:
      ["S No.", "Item No", "Description of Item", "Unit", "Qty", "Rate", "Amount"]
    Numeric columns coerced to floats where possible.
    """
    rows = []
    for entry in lines:
        if entry is None:
            continue
        if isinstance(entry, dict):
            rows.append(entry)
        else:
            parsed = parse_item_line(entry)
            if parsed:
                rows.append(parsed)

    df = pd.DataFrame(rows, columns=["S No.", "Item No", "Description of Item", "Unit", "Qty", "Rate", "Amount"])
    for c in ["Qty", "Rate", "Amount"]:
        df[c] = df[c].apply(lambda v: to_number(v))
    df = df.reset_index(drop=True)
    return df

# Safe helper to try-normalizing parsed tables and logging failures without halting
def safe_normalize_and_parse(table_like, fallback_text_column=None):
    """
    table_like: DataFrame or list-of-strings. If DataFrame, will attempt to find a column that appears to be lines.
    fallback_text_column: explicit column name to use if DataFrame has multiple columns and one contains raw lines.
    Returns: tuple(parsed_df, logs) where parsed_df is the assembled DataFrame and logs is a list of (reason, context).
    """
    logs = []
    lines = []

    if isinstance(table_like, pd.DataFrame):
        df = normalize_table_df(table_like)
        if fallback_text_column and fallback_text_column in df.columns:
            for v in df[fallback_text_column].astype(str).tolist():
                lines.append(v)
        else:
            candidate_cols = list(df.columns)
            chosen = None
            max_score = -1
            for col in candidate_cols:
                col_sample = df[col].dropna().astype(str)
                if col_sample.empty:
                    continue
                digit_frac = col_sample.str.contains(r'\d').mean()
                avg_len = col_sample.str.len().mean()
                score = digit_frac * 2 + (avg_len / 100.0)
                if score > max_score:
                    max_score = score
                    chosen = col
            if chosen is None:
                logs.append(("no_candidate_column", "DataFrame empty or no suitable column found"))
            else:
                for v in df[chosen].astype(str).tolist():
                    lines.append(v)
                logs.append(("chosen_column", chosen))
    elif isinstance(table_like, (list, tuple, pd.Series, np.ndarray)):
        for v in table_like:
            lines.append(v)
    else:
        logs.append(("unsupported_type", str(type(table_like))))
        return pd.DataFrame(columns=["S No.", "Item No", "Description of Item", "Unit", "Qty", "Rate", "Amount"]), logs

    parsed_df = build_item_breakup(lines)
    return parsed_df, logs

# Example usage (for notebook):
# parsed_df, logs = safe_normalize_and_parse(raw_table_df, fallback_text_column='item_line')
# parsed_df.to_csv('improved_item_breakup.csv', index=False)

In [None]:
# Google Colab Notebook Script
# STEP 1: Run → installs
# STEP 2: Upload your tender PDF
# STEP 3: Wait → Full analysis ready!
# Single-cell runnable Colab script. Paste into a Colab code cell and execute.

# ------------------------------
# INSTALL DEPENDENCIES
# ------------------------------
!pip install -q pdfplumber tabula-py PyPDF2 pandas tqdm pytesseract pdf2image openpyxl

# tabula-py requires Java installed on Colab (already present in Colab). If not, user will see warnings but code will continue.
import os
import re
import io
import json
import pickle
import math
import time
import traceback
from collections import defaultdict, Counter
from tqdm import tqdm
from datetime import datetime
from decimal import Decimal, InvalidOperation

import pandas as pd
import pdfplumber
import PyPDF2
try:
    import tabula
except Exception as e:
    tabula = None

from google.colab import files
from IPython.display import Markdown, display, HTML

# ------------------------------
# UTILITIES
# ------------------------------
# Minimal helper functions for numeric parsing and safe operations.
def to_number(s):
    """Try to parse numbers robustly, remove commas and currency symbols."""
    if s is None:
        return None
    if isinstance(s, (int, float, Decimal)):
        return s
    s = str(s).strip()
    if s == '':
        return None
    s = s.replace('\xa0', ' ')
    # remove currency symbols and words
    s = re.sub(r'[^\d\.\-]', '', s)
    try:
        if s.count('.') > 1:
            # If there are multiple dots, remove all but last
            parts = s.split('.')
            s = ''.join(parts[:-1]) + '.' + parts[-1]
        return float(s)
    except Exception:
        try:
            return int(s)
        except Exception:
            return None

def safe_extract_group(m, idx=1):
    try:
        return m.group(idx).strip()
    except Exception:
        return ''

def numeric_tokens_from_line(line):
    """Return last 4 numeric-looking tokens from a line as possible [Unit?, Qty, Rate, Amount]"""
    tokens = re.split(r'\s{2,}|\t', line.strip())
    # flatten tokens by splitting on spaces if necessary if too few tokens
    if len(tokens) < 6:
        tokens = re.split(r'\s+', line.strip())
    nums = []
    for t in reversed(tokens):
        v = to_number(t)
        if v is not None:
            nums.append((t, v))
        if len(nums) >= 4:
            break
    nums.reverse()
    return nums, tokens

def confidence_msg(section, method, score):
    return f"Section {section} parsed with {method} – {score:.0f}% confidence"

def download_file(path):
    try:
        files.download(path)
    except Exception as e:
        print(f"Could not download {path}: {e}")

# ------------------------------
# UPLOAD STEP
# ------------------------------
print("Upload your tender PDF (single file). The uploaded file will appear as a variable `uploaded`.")
uploaded = files.upload()
if not uploaded:
    raise SystemExit("No file uploaded. Re-run and upload a PDF.")

# pick first file
uploaded_fname = list(uploaded.keys())[0]
print(f"Uploaded file: {uploaded_fname}")

# save path
pdf_path = uploaded_fname

# ------------------------------
# PARSING ORCHESTRATOR
# ------------------------------
verbose = True
progress_logs = []
confidence_scores = {}
parse_results = {
    "nit_header": {},
    "schedules_summary": None,
    "item_breakups": {},
    "eligibility_criteria": {"bullets": [], "raw_text": ""},
    "flags": [],
    "top10": None,
    "raw_text_pages": []
}

# read raw PDF pages texts using PyPDF2 (fast)
def method3_pypdf2_text(path):
    texts = []
    try:
        with open(path, 'rb') as f:
            reader = PyPDF2.PdfReader(f)
            for p in range(len(reader.pages)):
                try:
                    texts.append(reader.pages[p].extract_text() or "")
                except Exception:
                    texts.append("")
        return texts, True, "Method 3 (PyPDF2)"
    except Exception as e:
        return [], False, f"Method 3 failed: {e}"

# method2: pdfplumber extract page-level text + tables
def method2_pdfplumber(path):
    pages_text = []
    tables_by_page = {}
    try:
        with pdfplumber.open(path) as pdf:
            for i, page in enumerate(pdf.pages):
                try:
                    t = page.extract_text() or ""
                except Exception:
                    t = ""
                pages_text.append(t)
                # extract simple tables
                try:
                    page_tables = page.extract_tables()
                    if page_tables:
                        tables_by_page[i+1] = page_tables
                except Exception:
                    pass
        return pages_text, tables_by_page, True, "Method 2 (pdfplumber)"
    except Exception as e:
        return [], {}, False, f"Method 2 failed: {e}"

# method1: tabula (attempt at high-quality table extraction)
def method1_tabula(path):
    all_tables = []
    if tabula is None:
        return [], False, "tabula not available"
    try:
        # try lattice first
        try:
            dfs = tabula.read_pdf(path, pages='all', lattice=True, pandas_options={'header': None})
            if isinstance(dfs, list):
                all_tables.extend(dfs)
        except Exception:
            pass
        # try stream as fallback
        try:
            dfs2 = tabula.read_pdf(path, pages='all', lattice=False, pandas_options={'header': None})
            if isinstance(dfs2, list):
                all_tables.extend(dfs2)
        except Exception:
            pass
        return all_tables, True, "Method 1 (tabula)"
    except Exception as e:
        return [], False, f"Method 1 failed: {e}"

# Run the three methods in sequence with retries for robust coverage
max_retries = 3
method_outputs = {}
# Method 1 attempts
for attempt in range(1, max_retries+1):
    try:
        tabula_tables, ok_tabula, tabula_msg = method1_tabula(pdf_path)
        method_outputs['tabula'] = {'ok': ok_tabula, 'msg': tabula_msg, 'tables': tabula_tables}
        if ok_tabula and len(tabula_tables) > 0:
            progress_logs.append(confidence_msg("Tables(All)", "Method 1 (tabula)", 92))
            confidence_scores['tables'] = 92
            break
    except Exception as e:
        progress_logs.append(f"Method1 attempt {attempt} failed: {e}")
        time.sleep(0.3)
else:
    progress_logs.append("Method1 (tabula) exhausted retries.")

# Method 2 attempts
for attempt in range(1, max_retries+1):
    pages_text, tables_by_page, ok_pdfplumber, pdfplumber_msg = method2_pdfplumber(pdf_path)
    method_outputs['pdfplumber'] = {'ok': ok_pdfplumber, 'msg': pdfplumber_msg, 'pages_text': pages_text, 'tables_by_page': tables_by_page}
    if ok_pdfplumber and len(pages_text) > 0:
        progress_logs.append(confidence_msg("Text+Tables(All)", "Method 2 (pdfplumber)", 98))
        confidence_scores['text'] = 98
        break
    time.sleep(0.2)
else:
    progress_logs.append("Method2 (pdfplumber) exhausted retries.")

# Method 3 attempts
for attempt in range(1, max_retries+1):
    texts3, ok_p3, p3msg = method3_pypdf2_text(pdf_path)
    method_outputs['pypdf2'] = {'ok': ok_p3, 'msg': p3msg, 'pages_text': texts3}
    if ok_p3 and len(texts3) > 0:
        progress_logs.append(confidence_msg("RawText(All)", "Method 3 (PyPDF2)", 88))
        confidence_scores['raw_text'] = 88
        break
    time.sleep(0.2)
else:
    progress_logs.append("Method3 (PyPDF2) exhausted retries.")

# Prefer pdfplumber text for downstream parsing (best formatting). Fallback to pypdf2.
pages_text = method_outputs.get('pdfplumber', {}).get('pages_text') or method_outputs.get('pypdf2', {}).get('pages_text') or []
parse_results['raw_text_pages'] = pages_text

# Helper: merge multi-line table fragments into normalized lines
def normalize_text_block(block):
    # Remove repeated multiple spaces, keep newlines
    return '\n'.join([re.sub(r'[ \t]{2,}', '  ', ln).rstrip() for ln in block.splitlines() if ln.strip()!=''])

# ------------------------------
# PARSE NIT HEADER (page 1)
# ------------------------------
# We'll attempt structured extraction from first page text using regexes.
try:
    header_page_text = pages_text[0] if len(pages_text) >= 1 else ""
    header_text = header_page_text.replace('\r','\n')
    parse_results['nit_header_raw'] = header_text

    nit = {}
    # Common fields with flexible regexes
    patterns = {
        'Tender No': r'Tender No[:\s]*([A-Za-z0-9\-\_\/]+)',
        'Name of Work': r'Name of Work\s*(.*?)\n',
        'Bidding type': r'Bidding type\s*(.*?)\n',
        'Tender Type': r'Tender Type\s*(.*?)\n',
        'Tender Closing Date': r'Tender Closing Date\s*Time\s*([0-9\/\:\sA-Za-z]+)',
        'Advertised Value': r'Advertised Value\s*([\d\.,]+)',
        'Earnest Money': r'Earnest Money \(Rs\.\)\s*([\d\.,]+)',
        'Period of Completion': r'Period of Completion\s*([0-9A-Za-z\s]+)',
        'Are JV allowed to bid': r'Are JV allowed to bid\s*(Yes|No)',
        'Tendering Section': r'Tendering Section\s*(.*?)\n',
        'Bidding Start Date': r'Bidding Start Date\s*([0-9\/\:\sA-Za-z]+)',
        'Pre-Bid Conference': r'Pre-Bid Conference\s*(.*?)\n'
    }
    for k, pat in patterns.items():
        m = re.search(pat, header_text, re.IGNORECASE|re.DOTALL)
        if m:
            nit[k] = m.group(1).strip()
    # fallback: extract lines like "Tender No: BCT-24-25-257 Closing Date/Time 04/02/2025 15:00 Hrs"
    m2 = re.search(r'Tender No[:\s]*([A-Za-z0-9\-\_]+)', header_text)
    if m2 and 'Tender No' not in nit:
        nit['Tender No'] = m2.group(1).strip()
    # Advertised value may contain currency or other format
    adv = nit.get('Advertised Value') or re.search(r'Advertised Value\s*([^\n]+)', header_text)
    if adv:
        nit['Advertised Value'] = re.sub(r'[^\d\.,]', '', adv if isinstance(adv, str) else adv.group(1))
    # Try to get Title line preceding "Tender Document"
    first_lines = [ln.strip() for ln in header_text.splitlines() if ln.strip()]
    if first_lines:
        # pick first non-empty large line
        nit['first_lines_sample'] = first_lines[:8]
    parse_results['nit_header'] = nit
    progress_logs.append("NIT header extracted from page 1.")
    confidence_scores['nit_header'] = 96 if nit else 50
except Exception as e:
    progress_logs.append(f"NIT header parse error: {e}")
    parse_results['nit_header'] = {}
    confidence_scores['nit_header'] = 40

# ------------------------------
# PARSE SCHEDULE SUMMARY
# ------------------------------
# Try to extract the Schedule Summary table using tabula results first; otherwise parse from raw text.
schedules_df = None
try:
    # If tabula produced many tables, find the table having header "S.No. Item" or "Schedule"
    tabula_tables = method_outputs.get('tabula', {}).get('tables') or []
    candidate = None
    for df in tabula_tables:
        # Convert to string and handle potential non-string columns before applying upper()
        txt = ' '.join(df.astype(str).fillna('').apply(lambda r: ' '.join(r), axis=1).tolist()).upper()
        if 'S.NO' in txt or 'S.No.' in txt or 'SCHEDULE' in txt:
            candidate = df
            break
    if candidate is not None:
        # try to clean candidate into a schedule summary df
        df = candidate.copy()
        df = df.replace(r'^\s*$', pd.NA, regex=True)
        schedules_df = df
    else:
        # fallback parsing from raw text: search for "SCHEDULE" block
        full = '\n'.join(pages_text[:6])  # schedule summary usually near start
        # find "S.No. Item" header line
        block = ''
        m = re.search(r'2\.\s*SCHEDULE(.*?)3\.\s*ITEM BREAKUP', '\n'.join(pages_text[:8]), re.IGNORECASE|re.DOTALL)
        if m:
            block = m.group(1)
        else:
            # take pages 0-2
            block = '\n'.join(pages_text[:4])
        # attempt to extract lines with Description:- and Amounts
        lines = [ln for ln in block.splitlines() if ln.strip()]
        rows = []
        for ln in lines:
            if re.search(r'Description:-', ln, re.IGNORECASE):
                # previous few lines contain value
                rows.append(ln.strip())
        # build a very small DF fallback
        schedules_df = pd.DataFrame({'raw': rows})
    parse_results['schedules_summary'] = schedules_df
    confidence_scores['schedules_summary'] = 92 if schedules_df is not None else 45
    progress_logs.append("Schedule summary extracted.")
except Exception as e:
    progress_logs.append(f"Schedule summary parse error: {e}")
    parse_results['schedules_summary'] = None
    confidence_scores['schedules_summary'] = 40

# ------------------------------
# PARSE ITEM BREAKUPS (robust line-by-line parser)
# ------------------------------
# Strategy:
#  - Use pdfplumber page tables where available (method 2) to extract row tables directly.
#  - Else, parse page text using regex to find "Item- " blocks and read lines that look like item rows.
#  - Group items by Schedule headings (e.g., "Schedule A1 (2.0 Earthwork)" or "Schedule A3 (5.0 R.C.C work)")
item_breakups = defaultdict(list)
item_tables_dfs = {}

try:
    # Gather page-level table fragments from pdfplumber
    pdfpl_tables = method_outputs.get('pdfplumber', {}).get('tables_by_page', {}) or {}
    # Use both pdfplumber tables and textual parsing
    num_pages = len(pages_text)
    schedule_title = None
    current_schedule = None
    buffer_rows = []
    page_iter = range(num_pages)
    # Pre-scan to find schedule headers and their page indices
    schedule_page_map = {}
    schedule_heading_re = re.compile(r'Schedule\s+A[-\d\w\(\)\.\s]*|Schedule\s+B[-\d\w\(\)\.\s]*|Schedule\s+C[-\d\w\(\)\.\s]*|Schedule\s+Schedule\s+[A-C]', re.IGNORECASE)
    # also look for "Item- " and "Schedule A1 (2.0 Earthwork)" patterns
    schedule_pat2 = re.compile(r'Schedule\s*(A[\d\w]*)\s*\(?([0-9\.]+)?\)?\s*[-:]*\s*(.*)', re.IGNORECASE)
    parent_work_re = re.compile(r'(\d+\.\d+)\s+([A-Z\s&\.\-]+(?:WORK|CONCRETE|STEEL|WATER|DRAINAGE|ROOFING|MASONRY|PLASTER|CLADDING|FINISHING)?)', re.IGNORECASE)

    for p in page_iter:
        pagetxt = pages_text[p] if p < len(pages_text) else ""
        # find explicit "Schedule" lines
        for m in re.finditer(r'(Schedule\s+.*)', pagetxt, re.IGNORECASE):
            title_line = m.group(1).strip()
            # pick the whole line
            ln = title_line.splitlines()[0]
            schedule_page_map[ln] = schedule_page_map.get(ln, []) + [p+1]
    # Now parse pages to extract items: rely primarily on pdfplumber page tables if present
    for p in tqdm(page_iter, desc="Parsing pages", leave=False):
        page_no = p+1
        pt = pages_text[p] if p < len(pages_text) else ""
        # Determine current schedule on this page via heuristics
        # Look for lines like "Item- <n> Schedule A1 (2.0 Earthwork)" or "Item- 1 Schedule A1"
        schedule_found = None
        for line in pt.splitlines():
            if re.search(r'Item[-\s]*\d+\s*Schedule', line, re.IGNORECASE):
                schedule_found = line.strip()
                break
            if re.search(r'Schedule\s+A\d*\s*\(.*\)', line, re.IGNORECASE):
                schedule_found = line.strip()
                break
            if re.search(r'Schedule\s+A[-\d\w]*', line, re.IGNORECASE) and 'Item' in line:
                schedule_found = line.strip()
                break
            # also look for "Schedule A-All DSR 2021 Items" heading
            if re.search(r'Schedule\s+.*DSR\s*2021', line, re.IGNORECASE):
                schedule_found = line.strip()
                break
        if schedule_found:
            current_schedule = schedule_found
        # If pdfplumber has table(s) on this page, attempt to parse meaningful ones
        if (p+1) in pdfpl_tables and pdfpl_tables[p+1]:
            for t in pdfpl_tables[p+1]:
                # t is a table represented as list of lists
                # convert to dataframe and try to detect columns
                try:
                    df = pd.DataFrame(t)
                    # heuristics: if df has numeric columns near right side, try to set them as Rate/Amount
                    # Normalize columns: merge multi-row headers
                    # flatten header if it's repeating
                    df = df.replace('', pd.NA).dropna(how='all', axis=1)
                    # convert all to string temporarily, handling potential non-string data
                    df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
                    # Determine if this table contains item rows by scanning for numeric values in last columns
                    right_numeric = False
                    for col in df2.columns[-4:]:
                        # check many numeric tokens
                        numeric_count = df2[col].apply(lambda cell: bool(re.search(r'\d', str(cell)))).sum()
                        if numeric_count > 0:
                            right_numeric = True
                    if right_numeric:
                        # attempt to set column names
                        # try to locate header row by searching for 'S No' or 'Item No' or 'Description'
                        headerRowIdx = None
                        for ridx in range(min(3, len(df2))):
                            rowtxt = ' '.join(df2.iloc[ridx].fillna('').tolist()).lower()
                            if 'item no' in rowtxt or 'description' in rowtxt or 's no' in rowtxt or 'qty' in rowtxt:
                                headerRowIdx = ridx
                                break
                        if headerRowIdx is not None:
                            header = [str(x).strip() for x in df2.iloc[headerRowIdx].tolist()]
                            data_df = df2.iloc[headerRowIdx+1:].copy()
                            data_df.columns = header
                            # Keep only likely useful columns and rename to standard columns
                            std_cols = {}
                            for c in data_df.columns:
                                cc = str(c).lower() # Ensure column name is string
                                if 's no' in cc or re.match(r'^\s*s\.?\s*no', cc):
                                    std_cols[c] = 'S No.'
                                elif 'item' in cc and 'no' in cc:
                                    std_cols[c] = 'Item No'
                                elif 'description' in cc or 'desc' in cc:
                                    std_cols[c] = 'Description of Item'
                                elif 'unit' == cc.strip() or 'unit' in cc:
                                    std_cols[c] = 'Unit'
                                elif 'qty' in cc or 'quantity' in cc:
                                    std_cols[c] = 'Qty'
                                elif 'rate' in cc:
                                    std_cols[c] = 'Rate'
                                elif 'amount' in cc or 'value' in cc:
                                    std_cols[c] = 'Amount'
                                else:
                                    # try to guess by cell examples
                                    # if column contains mostly digits -> Rate/Amount/Qty
                                    col_sample = data_df[c].astype(str).str.replace(',','').str.replace('₹','')
                                    digits_frac = col_sample.str.match(r'^\s*[\d\.\-]+\s*$').sum()
                                    if digits_frac > len(data_df)/2:
                                        # if already have Qty then this is Rate/Amount
                                        if 'Qty' not in std_cols.values():
                                            std_cols[c] = 'Qty'
                                        elif 'Rate' not in std_cols.values():
                                            std_cols[c] = 'Rate'
                                        else:
                                            std_cols[c] = 'Amount'
                                    else:
                                        std_cols[c] = c
                            standardized = data_df.rename(columns=std_cols)
                            # Ensure standardized has the required columns present (fallback to heuristics)
                            for req in ['S No.', 'Item No', 'Description of Item', 'Unit', 'Qty', 'Rate', 'Amount']:
                                if req not in standardized.columns:
                                    standardized[req] = pd.NA
                            # Trim rows where Item No and Description both are empty
                            standardized = standardized[[ 'S No.', 'Item No', 'Description of Item', 'Unit', 'Qty', 'Rate', 'Amount']]
                            # add schedule
                            sched_key = current_schedule or f"Page_{page_no}_table"
                            item_tables_dfs.setdefault(sched_key, []).append(standardized)
                            progress_logs.append(f"Parsed table on page {page_no} with pdfplumber -> schedule {sched_key}")
                except Exception as e:
                    # ignore and continue
                    progress_logs.append(f"Table parse error on page {page_no}: {e}")
        # Fallback textual row parsing: find lines that look like item rows
        # We'll parse line by line trying to detect rows with S No leading numeric
        lines = [ln for ln in pt.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            # detect start of an item block like "Item- 1 Schedule A1 (2.0 Earthwork)" or "Item- 1 Schedule A1"
            if re.match(r'Item[-\s]*\d+\s*Schedule', ln, re.IGNORECASE):
                # set current schedule based on this line (prefer parentheses)
                current_schedule = ln.strip()
                continue
            # detect item rows: start with number optionally then item number e.g., "1 2.6.1 Earth work ... cum 1600 205.45 328720"
            mrow = re.match(r'^\s*(\d+)\s+([0-9\.]+)\s+(.*)$', ln)
            if mrow:
                sno = mrow.group(1)
                item_no = mrow.group(2)
                rest = mrow.group(3)
                # attempt to extract unit, qty, rate, amount from end of rest
                nums, tokens = numeric_tokens_from_line(rest)
                # numeric_tokens may return last numeric tokens
                unit = None; qty=None; rate=None; amount=None; desc=None
                if nums and len(nums) >= 3:
                    # choose last numeric as amount, previous as rate, earlier as qty
                    try:
                        # find positions of these tokens in original split tokens
                        amount = nums[-1][1]
                        rate = nums[-2][1] if len(nums)>=2 else None
                        qty = nums[-3][1] if len(nums)>=3 else None
                        # Unit is token just before qty in tokens if present
                        # reconstruct description by removing numeric tokens at end
                        # convert tokens to list
                        toks = re.split(r'\s{2,}|\t', rest)
                        if len(toks) < 4:
                            toks = re.split(r'\s+', rest)
                        # Remove trailing numeric tokens from rest to get description
                        desc = rest
                        # strip numeric matches from end
                        desc = re.sub(r'(\s*[\d\.,]+\s*)+$', '', desc).strip()
                        # attempt to find unit as the token just before Qty number string
                        # find original qty token string
                        qty_token_str = nums[-3][0] if len(nums)>=3 else None
                        if qty_token_str:
                            # locate in rest
                            idx = rest.rfind(qty_token_str)
                            if idx != -1:
                                left = rest[:idx].strip()
                                # last token of left likely is Unit
                                u_toks = left.split()
                                if u_toks:
                                    unit = u_toks[-1]
                                    # remove unit from desc if appended
                                    if desc.endswith(unit):
                                        desc = desc[: -len(unit)].strip()
                    except Exception:
                        pass
                else:
                    # try other pattern: some rows have 'cum 1600 205.45 328720' with item no on previous line
                    parts = rest.split()
                    # attempt to find first numeric token from right side
                    right_nums = [to_number(tok) for tok in reversed(parts)]
                    right_nums = [r for r in right_nums if r is not None]
                    if len(right_nums) >= 1:
                        amount = right_nums[0]
                    if len(right_nums) >= 2:
                        rate = right_nums[1]
                    if len(right_nums) >= 3:
                        qty = right_nums[2]
                    # attempt to find unit as first non-numeric near the right
                    unit = None
                    for tok in reversed(parts):
                        if to_number(tok) is None:
                            unit = tok
                            break
                    desc = rest
                row = {
                    'S No.': sno,
                    'Item No': item_no,
                    'Description of Item': desc,
                    'Unit': unit,
                    'Qty': qty,
                    'Rate': rate,
                    'Amount': amount,
                    'Page': page_no
                }
                key = current_schedule or f"Schedule_Page_{page_no}"
                item_breakups[key].append(row)
            else:
                # Other common pattern: lines starting with item number like "2 2.33.1 Beyond 30 cm..."
                m2 = re.match(r'^\s*([0-9]+)\s+([0-9]+\.[0-9]+\.[0-9]+)\s+(.*)$', ln)
                if m2:
                    sno = m2.group(1); item_no = m2.group(2); rest = m2.group(3)
                    nums, tokens = numeric_tokens_from_line(rest)
                    qty=None; rate=None; amount=None; unit=None; desc=rest
                    if nums and len(nums)>=3:
                        amount = nums[-1][1]; rate = nums[-2][1]; qty = nums[-3][1]
                        desc = re.sub(r'(\s*[\d\.,]+\s*)+$', '', rest).strip()
                        # try to deduce unit by finding token before qty token
                        # simple heuristics: last word before qty number is unit in many lines
                        # use tokens split on spaces
                        tks = re.split(r'\s+', rest)
                        # find position of first occurrence of str(qty) from right
                        qstr = str(int(qty)) if isinstance(qty, (int,float)) and float(qty).is_integer() else str(qty)
                        pos = None
                        for ix in range(len(tks)-1,-1,-1):
                            if re.sub(r'[^\d\.\-]','', tks[ix]) == qstr:
                                pos = ix
                                break
                        if pos and pos-1 >= 0:
                            unit = tks[pos-1]
                    row = {
                        'S No.': sno,
                        'Item No': item_no,
                        'Description of Item': desc,
                        'Unit': unit,
                        'Qty': qty,
                        'Rate': rate,
                        'Amount': amount,
                        'Page': page_no
                    }
                    key = current_schedule or f"Schedule_Page_{page_no}"
                    item_breakups[key].append(row)
    # convert collected lists into DataFrames and normalize types
    normalized_item_tables = {}
    for k, rows in item_breakups.items():
        if isinstance(rows, list) and rows:
            df = pd.DataFrame(rows)
            # ensure columns exist
            for col in ['S No.', 'Item No', 'Description of Item', 'Unit', 'Qty', 'Rate', 'Amount']:
                if col not in df.columns:
                    df[col] = pd.NA
            # coerce numeric columns
            df['Qty'] = df['Qty'].apply(lambda x: to_number(x))
            df['Rate'] = df['Rate'].apply(lambda x: to_number(x))
            df['Amount'] = df['Amount'].apply(lambda x: to_number(x))
            normalized_item_tables[k] = df[['S No.', 'Item No', 'Description of Item', 'Unit', 'Qty', 'Rate', 'Amount', 'Page']]
    # Also integrate tables parsed via pdfplumber converted to dataframes earlier (item_tables_dfs)
    for sched_k, list_of_dfs in item_tables_dfs.items():
        combined_df = pd.concat(list_of_dfs, ignore_index=True)
        # Ensure required columns exist before processing
        for col in ['S No.', 'Item No', 'Description of Item', 'Unit', 'Qty', 'Rate', 'Amount']:
            if col not in combined_df.columns:
                combined_df[col] = pd.NA
        # Convert numeric columns to appropriate types, coercing errors
        for col in ['Qty', 'Rate', 'Amount']:
             combined_df[col] = pd.to_numeric(combined_df[col], errors='coerce')
        # Keep only the desired columns and assign to normalized_item_tables
        normalized_item_tables[sched_k] = combined_df[['S No.', 'Item No', 'Description of Item', 'Unit', 'Qty', 'Rate', 'Amount']]

    parse_results['item_breakups'] = normalized_item_tables
    confidence_scores['item_breakups'] = 90 if normalized_item_tables else 40
    progress_logs.append(f"Extracted item breakups for {len(normalized_item_tables)} schedules.")
except Exception as e:
    progress_logs.append(f"Item breakup extraction error: {e}\n{traceback.format_exc()}")
    parse_results['item_breakups'] = {}
    confidence_scores['item_breakups'] = 30

# ------------------------------
# GROUP Schedule A perfectly by parent class using regex
# ------------------------------
grouped_schedules = {}
try:
    # For each schedule key, attempt to detect parent class via pattern
    pattern = re.compile(r"(\d+\.\d+)\s+([A-Z0-9\.\s&\-]+(?:WORK|CONCRETE|STEEL|WATER|DRAINAGE|ROOFING|MASONRY|PLASTER|CLADDING|FINISHING)?)", re.IGNORECASE)
    for sched_key, df in parse_results['item_breakups'].items():
        # try to search in schedule key name first
        m = pattern.search(sched_key)
        groupname = None
        if m:
            groupname = m.group(2).strip().upper()
        else:
            # search inside descriptions for strong occurrence of parent header like "5.0 REINFORCED CEMENT CONCRETE"
            found = None
            for text in parse_results['raw_text_pages'][:8]:
                mm = pattern.search(text)
                if mm:
                    found = mm.group(2).strip().upper()
                    break
            groupname = found or sched_key
        # Normalize group name
        group_label = f"{sched_key} | {groupname}"
        grouped_schedules[group_label] = df
    parse_results['item_breakups_grouped'] = grouped_schedules
    progress_logs.append("Grouped Schedule A entries by parent class using regex.")
    confidence_scores['grouping'] = 92
except Exception as e:
    progress_logs.append(f"Schedule grouping error: {e}")
    parse_results['item_breakups_grouped'] = parse_results['item_breakups']
    confidence_scores['grouping'] = 40

# ------------------------------
# ELIGIBILITY CRITERIA (search pages 15-35)
# ------------------------------
eligibility_bullets = []
elig_raw = []
try:
    # pages are 1-indexed to user; convert to 0-index
    start = max(0, 14)
    end = min(len(pages_text), 35)
    block = '\n'.join(pages_text[start:end])
    parse_results['eligibility_raw_block'] = block
    # search for headings
    keywords = ['ELIGIBILITY', 'QUALIFICATION', 'FINANCIAL CRITERIA']
    # split into lines and pick lines that look like bullet points or contain keywords
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        uln = ln.upper()
        if any(k in uln for k in keywords) or re.match(r'^\d+\.', ln) or re.match(r'^[A-Za-z]\)', ln) or ln.startswith('-') or ln.startswith('*'):
            # take this and next few lines as a bullet chunk
            chunk = ln
            # append subsequent short lines until next numbered heading
            j = i+1
            while j < len(lines) and len(lines[j]) < 200 and not re.match(r'^\d+\.', lines[j]):
                # stop if blank or uppercase title
                if lines[j].isupper() and len(lines[j].split())<8:
                    break
                chunk += ' ' + lines[j]
                j += 1
            eligibility_bullets.append(chunk.strip())
    # normalize bullets: remove long legal paragraphs & keep shorter bullets
    filtered = []
    for b in eligibility_bullets:
        if len(b.split()) > 6 and len(b.split()) < 200:
            filtered.append(b)
    # fallback: if none found, extract all text between "4. ELIGIBILITY CONDITIONS" and "5. COMPLIANCE"
    if not filtered:
        m = re.search(r'4\.\s*ELIGIBILITY CONDITIONS(.*)5\.\s*COMPLIANCE', block, re.IGNORECASE | re.DOTALL)
        if m:
            blk = m.group(1)
            bullets = [ln.strip() for ln in blk.splitlines() if ln.strip()]
            filtered = bullets
    parse_results['eligibility_criteria'] = {'bullets': filtered, 'raw_text': block}
    confidence_scores['eligibility'] = 90 if filtered else 45
    progress_logs.append(f"Extracted {len(filtered)} eligibility bullets from pages 15-35.")
except Exception as e:
    progress_logs.append(f"Eligibility extraction error: {e}")
    parse_results['eligibility_criteria'] = {'bullets': [], 'raw_text': ''}
    confidence_scores['eligibility'] = 30

# ------------------------------
# TOP 10 COST DRIVERS (global)
# ------------------------------
top10_list = []
try:
    # Combine all item_breakups and compute amounts per top-level description tokens
    all_items = []
    for sched_label, df in parse_results['item_breakups_grouped'].items():
        if isinstance(df, pd.DataFrame):
            tmp = df.copy()
            tmp['Schedule'] = sched_label
            all_items.append(tmp)
    if all_items:
        big = pd.concat(all_items, ignore_index=True, sort=False)
        # ensure Amount numeric and handle potential non-numeric values
        big['Amount'] = pd.to_numeric(big['Amount'], errors='coerce').fillna(0.0) # Ensure fillna after to_numeric
        # Create a coarse "Category" using Item No or Schedule header or Description prefix
        def category_from_row(r):
            # prefer schedule parent if includes keywords like R.C.C
            sched = r.get('Schedule') or ''
            # attempt to extract e.g., "REINFORCED CEMENT CONCRETE"
            m = parent_work_re.search(sched)
            if m:
                return m.group(2).strip().upper()
            # fallback: look into Description for keywords
            desc = str(r.get('Description of Item') or '').upper()
            if 'CONCRETE' in desc:
                return 'R.C.C WORK / CONCRETE'
            if 'STEEL' in desc or 'REINFORCEMENT' in desc:
                return 'STEEL / REINFORCEMENT'
            if 'EARTH' in desc or 'EXCAVATION' in desc:
                return 'EARTHWORK'
            # fallback to first words of schedule label
            return sched.split('|')[-1].strip()[:40].upper()
        big['Category'] = big.apply(category_from_row, axis=1)
        # Drop rows where 'Category' is None before grouping
        big = big.dropna(subset=['Category'])
        catsum = big.groupby('Category', dropna=False)['Amount'].sum().reset_index().sort_values('Amount', ascending=False)
        total_amt = catsum['Amount'].sum() if not catsum.empty else 0.0
        topk = catsum.head(10)
        # For top items, also extract prominent subitems (by description) within that category
        top10_result = []
        for _, row in topk.iterrows():
            cat = row['Category']
            amt = row['Amount']
            pct = (amt/total_amt*100) if total_amt>0 else 0.0
            sub = big[big['Category']==cat].copy()
            # find top 2 descriptions by amount
            # Ensure 'Description of Item' exists and handle potential non-string data
            if 'Description of Item' in sub.columns:
                 sub['Description of Item'] = sub['Description of Item'].astype(str)
            else:
                 sub['Description of Item'] = '' # Add column if missing
            subsum = sub.groupby('Description of Item')['Amount'].sum().reset_index().sort_values('Amount', ascending=False).head(5)
            top10_result.append({'Category': cat, 'Amount': amt, 'Pct': pct, 'TopDescriptions': list(subsum.to_dict('records'))})
        parse_results['top10'] = {'total': total_amt, 'top10': top10_result}
        confidence_scores['top10'] = 88 if top10_result else 40
        progress_logs.append("Computed top 10 cost drivers.")
    else:
        parse_results['top10'] = {'total':0,'top10':[]}
        confidence_scores['top10'] = 20
except Exception as e:
    progress_logs.append(f"Top10 computation error: {e}")
    parse_results['top10'] = {'total':0,'top10':[]}
    confidence_scores['top10'] = 20

# ------------------------------
# RISK ANALYSIS heuristics (simple)
# ------------------------------
risk_notes = []
try:
    # Heuristics: if any item description mentions 'dismantling' with qty large -> high safety risk
    for sched, df in parse_results['item_breakups_grouped'].items():
        if isinstance(df, pd.DataFrame):
            for desc, qty in zip(df['Description of Item'].fillna(''), df['Qty'].fillna(0)):
                d = str(desc).lower()
                if 'dismantl' in d or 'demolis' in d:
                    if qty and to_number(qty) and to_number(qty) > 10:
                        risk_notes.append(f"Dismantling {qty} units → High safety risk")
                if 'steel' in d or 'thermo' in d or 'fe-500' in d:
                    # aggregate steel qty
                    risk_notes.append("Steel items present → Price volatility alert")
                if 'months' in d or 'period of completion' in d:
                    # capture timeline
                    pass
    # timeline check from nit header
    period = parse_results['nit_header'].get('Period of Completion','')
    if period:
        risk_notes.append(f"Timeline: {period} → check feasibility")
    # add some static heuristics
    risk_notes = list(dict.fromkeys(risk_notes))  # unique
    parse_results['risk_analysis'] = risk_notes
    confidence_scores['risk'] = 70
except Exception:
    parse_results['risk_analysis'] = []
    confidence_scores['risk'] = 30

# ------------------------------
# FLAGS (JV NOT ALLOWED, Single Packet System, Earnest Money)
# ------------------------------
flags = []
try:
    header = parse_results['nit_header']
    # JV NOT ALLOWED
    jv_allowed = header.get('Are JV allowed to bid') or ''
    if jv_allowed.strip().lower() in ['no','not allowed','n']:
        flags.append({'text':'JV NOT ALLOWED', 'severity':'red'})
    # Single Packet System: look in raw text for token
    whole_text = '\n'.join(pages_text[:6])
    if re.search(r'Single Packet System', whole_text, re.IGNORECASE):
        flags.append({'text':'Single Packet System', 'severity':'red'})
    # Earnest Money
    em = header.get('Earnest Money') or ''
    if em:
        # format in INR with commas
        em_num = to_number(em)
        try:
            em_fmt = f"₹{int(em_num):,}" if em_num else em
        except Exception:
            em_fmt = em
        flags.append({'text':f"Earnest Money: {em_fmt}", 'severity':'red'})
    parse_results['flags'] = flags
    confidence_scores['flags'] = 95 if flags else 40
except Exception as e:
    parse_results['flags'] = []
    confidence_scores['flags'] = 40
    progress_logs.append(f"Flags detection error: {e}")

# ------------------------------
# Prepare Beautiful Markdown View
# ------------------------------
try:
    # Header top line
    tender_no = parse_results['nit_header'].get('Tender No') or os.path.splitext(os.path.basename(pdf_path))[0]
    # Advertised value guess
    adv_val = parse_results['nit_header'].get('Advertised Value') or ''
    # try to format rupee
    adv_amt = to_number(adv_val) or None
    adv_str = f"₹{adv_amt:,.2f}" if adv_amt is not None else adv_val
    # Title (try to extract station or short from Name of Work)
    name_work = parse_results['nit_header'].get('Name of Work') or ''
    title_line = f"# TENDER {tender_no} | {name_work[:30]} | {adv_str}"
    md_lines = [title_line, '\n']
    # NIT HEADER table
    md_lines.append("## NIT HEADER")
    # convert nit header dict into two-column table
    nit = parse_results['nit_header']
    if nit:
        md_lines.append("| Field | Value |")
        md_lines.append("|---|---|")
        for k,v in nit.items():
            md_lines.append(f"| {k} | {str(v).replace('|','\\|')} |")
    else:
        md_lines.append("_No header parsed._")
    md_lines.append("\n")
    # Schedule summary
    md_lines.append("## SCHEDULE SUMMARY")
    ss = parse_results['schedules_summary']
    if isinstance(ss, pd.DataFrame):
        try:
            md_lines.append(ss.to_markdown(index=False))
        except Exception:
            md_lines.append("`Schedule summary present (DataFrame)`")
    else:
        md_lines.append("_Schedule summary not parsed into table. Raw snippet:_")
        md_lines.append("```\n" + (str(ss)[:400] if ss else "None") + "\n```")
    md_lines.append("\n")
    # TOP 10 COST DRIVERS
    md_lines.append("## TOP 10 COST DRIVERS (Global)")
    top10 = parse_results.get('top10', {})
    total_amt = top10.get('total', 0)
    if top10 and top10.get('top10'):
        n=1
        for item in top10['top10']:
            cat = item['Category']
            amt = item['Amount'] or 0
            pct = item['Pct'] or 0.0
            md_lines.append(f"{n}. {cat} → {pct:.2f}% (₹{amt:,.2f})")
            # list top descriptions (limit 3)
            for td in item.get('TopDescriptions', [])[:3]:
                desc = td.get('Description of Item') or td.get('Description','') # Use get with default
                a = td.get('Amount') or 0
                # attempt to get qty and unit by searching big DF
                md_lines.append(f"   ├─ {desc[:60]} → ₹{a:,.2f}")
            n += 1
    else:
        md_lines.append("_No cost drivers computed._")
    md_lines.append("\n")
    # ELIGIBILITY
    md_lines.append("## ELIGIBILITY CRITERIA")
    bullets = parse_results.get('eligibility_criteria', {}).get('bullets', [])
    raw_elig = parse_results.get('eligibility_criteria', {}).get('raw_text', '')
    if bullets:
        for b in bullets:
            md_lines.append(f"- {b}")
    else:
        md_lines.append("_No bulletized eligibility criteria found; raw text below:_")
        md_lines.append("```")
        md_lines.append((raw_elig[:800] + '...') if len(raw_elig)>800 else raw_elig)
        md_lines.append("```")
    md_lines.append("\n")
    # RISK ANALYSIS
    md_lines.append("## RISK ANALYSIS")
    risks = parse_results.get('risk_analysis', [])
    if risks:
        for r in risks:
            md_lines.append(f"• {r}")
    else:
        md_lines.append("• _No automatic risks detected. Manual review recommended._")
    md_lines.append("\n")
    # FLAGS
    md_lines.append("## FLAGS")
    for f in parse_results.get('flags', []):
        # mark in red using HTML (Markdown doesn't have red text natively)
        md_lines.append(f"<span style='color:red; font-weight:bold'>{f['text']}</span>")
    md_lines.append("\n")
    # CONFIDENCE + LOGS
    md_lines.append("## PARSING LOGS & CONFIDENCE")
    md_lines.append("**Confidence scores:**")
    for k,v in confidence_scores.items():
        md_lines.append(f"- {k}: {v}%")
    md_lines.append("\n**Progress logs (highlights):**")
    for pl in progress_logs[-12:]:
        md_lines.append(f"- {pl}")
    # Render Markdown
    md_full = '\n'.join(md_lines)
    display(Markdown(md_full))
except Exception as e:
    print("Error while preparing markdown view:", e)
    print(traceback.format_exc())

# ------------------------------
# EXPORTS: top10.csv, full_breakup.json, parsed_data.pkl
# ------------------------------
# Create top10.csv (simple flattened)
try:
    top10_flat = []
    for entry in parse_results.get('top10', {}).get('top10', []):
        cat = entry['Category']
        amt = entry['Amount']
        pct = entry['Pct']
        # include top descriptions
        for td in entry.get('TopDescriptions', []):
            top10_flat.append({
                'Category': cat,
                'CategoryAmount': amt,
                'CategoryPct': pct,
                'Description': td.get('Description of Item') or td.get('Description',''), # Use get with default
                'DescAmount': td.get('Amount') or 0
            })
    top10_df = pd.DataFrame(top10_flat)
    top10_csv = 'top10.csv'
    top10_df.to_csv(top10_csv, index=False)
    print(f"Saved {top10_csv}")
except Exception as e:
    print("top10.csv creation failed:", e)

# full_breakup.json
try:
    # prepare serializable dict for item_breakups_grouped
    ser = {}
    for k, df in parse_results.get('item_breakups_grouped', {}).items():
        try:
            # Convert DataFrame to dictionary, handling potential non-serializable types
            ser[k] = df.fillna('').to_dict(orient='records')
        except Exception:
            ser[k] = []
    full_json_path = 'full_breakup.json'
    with open(full_json_path, 'w', encoding='utf-8') as f:
        json.dump({
            'nit_header': parse_results.get('nit_header', {}),
            'schedules_summary': None if parse_results.get('schedules_summary') is None else (parse_results['schedules_summary'].to_dict('records') if isinstance(parse_results['schedules_summary'], pd.DataFrame) else str(parse_results['schedules_summary'])),
            'item_breakups_grouped': ser,
            'eligibility_criteria': parse_results.get('eligibility_criteria', {}),
            'top10': parse_results.get('top10', {}),
            'flags': parse_results.get('flags', [])
        }, f, indent=2, ensure_ascii=False)
    print(f"Saved {full_json_path}")
except Exception as e:
    print("full_breakup.json creation failed:", e)

# parsed_data.pkl
try:
    pkl_path = 'parsed_data.pkl'
    with open(pkl_path, 'wb') as f:
        pickle.dump(parse_results, f)
    print(f"Saved {pkl_path}")
except Exception as e:
    print("parsed_data.pkl creation failed:", e)

# ------------------------------
# Provide export buttons (downloads)
# ------------------------------
print("\nExport files available for download:")
for path in ['top10.csv', 'full_breakup.json', 'parsed_data.pkl']:
    if os.path.exists(path):
        print(f"- {path}")
    else:
        print(f"- {path} (missing)")

# Attempt to provide automatic downloads (Colab will prompt)
try:
    # Use download but wrap in try/except to avoid crashes
    print("\nAttempting to trigger downloads (browser will prompt)...")
    for path in ['top10.csv', 'full_breakup.json', 'parsed_data.pkl']:
        if os.path.exists(path):
            try:
                files.download(path)
            except Exception as e:
                print(f"Could not auto-download {path}: {e}")
except Exception as e:
    print("Auto-download failed:", e)

# ------------------------------
# PROGRESS SUMMARY (display)
# ------------------------------
print("\n--- PARSING SUMMARY ---")
print(f"File: {pdf_path}")
print(f"Pages processed: {len(pages_text)}")
print("Confidence scores:")
for k,v in confidence_scores.items():
    print(f" - {k}: {v}%")
print("\nTop logs:")
for ln in progress_logs[-10:]:
    print(" -", ln)

# Ensure notebook never crashes: final safety output of partial results counts
print("\nPartial counts (guaranteed):")
print(" - NIT header fields:", len(parse_results.get('nit_header', {})))
print(" - Schedule summaries parsed:", 0 if parse_results.get('schedules_summary') is None else (len(parse_results['schedules_summary']) if isinstance(parse_results['schedules_summary'], pd.DataFrame) else 1))
print(" - Item breakup schedules:", len(parse_results.get('item_breakups_grouped', {})))
print(" - Eligibility bullets:", len(parse_results.get('eligibility_criteria', {}).get('bullets', [])))

# Save a small human-readable markdown export too
try:
    with open('parsed_summary.md', 'w', encoding='utf-8') as f:
        f.write(md_full)
    print("Saved parsed_summary.md")
except Exception:
    pass

# STEP COMPLETE
print("\nAnalysis complete. Use the downloaded files for further inspection.")
# End of script

# Suggestions for next actions (short; as comments)
# **a.** (Optional) Run the notebook's parsing cell again with `tabula` available in the environment and Java enabled for likely improved table extraction.
# **b.** (Optional) Ask to add unit-tests for parser functions or to export Excel (xlsx) with separate sheets per schedule.

Upload your tender PDF (single file). The uploaded file will appear as a variable `uploaded`.


Saving NIT-BCT-24-25-257.pdf to NIT-BCT-24-25-257 (1).pdf
Uploaded file: NIT-BCT-24-25-257 (1).pdf


  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astype(str).applymap(lambda x: x.strip() if isinstance(x, str) else x)
  df2 = df.astyp

# TENDER BCT-24-25-257 | storage tank under the jurisdi | ₹42,145,189.36


## NIT HEADER
| Field | Value |
|---|---|
| Tender No | BCT-24-25-257 |
| Name of Work | storage tank under the jurisdiction of Sr. DEN/North/MMCT. |
| Bidding type | Normal Tender |
| Tender Type | Open Bidding System Single Packet System |
| Advertised Value | 42145189.36 |
| Earnest Money | 360700.00 |
| Period of Completion | 18 Months
Contract Type Works |
| Are JV allowed to bid | No |
| Tendering Section | CETR/N/II |
| Bidding Start Date | 21/01/2025
Number of JV Member
Are JV allowed to bid No 0
Allowed
Are Consortium allowed Number of Consortium
No 0
to bid Member Allowed
Ranking Order For Bids Lowest to Highest Expenditure Type Capital |
| Pre-Bid Conference | Pre-Bid Conference Date |
| first_lines_sample | ['MUMBAI CENTRAL DIVISION-ENGINEERING/WESTERN RLY', 'TENDER DOCUMENT', 'Tender No: BCT-24-25-257 Closing Date/Time: 04/02/2025 15:00', 'Divisional Railway Manager Works Mumbai Central acting for and on behalf of The President of India invites E-', 'Tenders against Tender No BCT-24-25-257 Closing Date/Time 04/02/2025 15:00 Hrs. Bidders will be able to submit', 'their original/revised bids upto closing date and time only. Manual offers are not allowed against this tender, and any', 'such manual offer received shall be ignored.', '1. NIT HEADER'] |


## SCHEDULE SUMMARY
| raw                                                                       |
|:--------------------------------------------------------------------------|
| Description:- Schedule A1 (2.0 Earthwork)                                 |
| Description:- Schedule A2 (4.0 Concrete work)                             |
| Description:- Schedule A3 (5.0 R.C.C work)                                |
| Description:- Schedule A4 (6.0 Masonary work)                             |
| Description:- Schedule A5 (8.0 Cladding work)                             |
| Description:- Schedule A6 (9.0 Wood and P.V.C work)                       |
| Description:- Schedule A7 (10.0 Steel work)                               |
| Description:- Schedule A8 (11.0 Flooring work)                            |
| Description:- Schedule A9 (12.0 Roofing work)                             |
| Description:- Schedule A10 (13.0 Finishing work)                          |
| Description:- Schedule A11 (14.0 Repairs to building)                     |
| Description:- Schedule A12 (15.0 Dismantling and Demolishing)             |
| Description:- Schedule A13 (16.0 Road work)                               |
| Description:- Schedule A14 (18.0 Water Supply)                            |
| Description:- Schedule A15 (19.0 Drainage)                                |
| Description:- Schedule A16 (21.0 Aluminum work)                           |
| Description:- Schedule A17 (22.0 Water proofing)                          |
| Description:- Schedule A18 (23.0 Rain water harvesting)                   |
| Description:- All USSOR-2021 items                                        |
| Description:- Hand packed dry rubble soling                               |
| Description:- Supply and fixing signage board                             |
| Description:- Providing and laying 35 mm thick heavy duty chequered tiles |
| Description:- Supply & Fixing brass name plate                            |


## TOP 10 COST DRIVERS (Global)
_No cost drivers computed._


## ELIGIBILITY CRITERIA
- 4. ELIGIBILITY CONDITIONS Standard Financial Criteria S.No. Description ConfirmationRemarks Documents Required Allowed Uploading Financial Eligibility Criteria: The tenderer must have minimum average annual contractual turnover of[V/N or 'V' which ever is less; where V = Advertised value of the tender in crores of Rupees N= Number of years prescribed for completion of work for which bids have been invited. The average annual contractual turnover shall be calculated as an average of "total contractual payments'' in the previous three financial years, as per the audited balance sheet. However, in case balance sheet of the Allowed 1 previous year is yet to be prepared/ audited, the audited balance sheet No No (Mandatory) of the fourth previous year shall be considered for calculating average annual contractual turnover. The tenderers shall submit requisite information as per Annexure-VIB, along with copies of Audited Balance Sheets duly certified by the Chartered Accountant/ Certificate from Chartered Accountant duly supported by Audited Balance Sheet. (Page- 14, Para 10.2, Part-I of GCC April 2022 and as per Advance Correction slip No 1 dt 14.07.2022) Standard Technical Criteria S.No. Description ConfirmationRemarks Documents Required Allowed Uploading Page 16 of 33 Run Date/Time: 09/01/2025 10:19:22
- Standard Financial Criteria S.No. Description ConfirmationRemarks Documents Required Allowed Uploading Financial Eligibility Criteria: The tenderer must have minimum average annual contractual turnover of[V/N or 'V' which ever is less; where V = Advertised value of the tender in crores of Rupees N= Number of years prescribed for completion of work for which bids have been invited. The average annual contractual turnover shall be calculated as an average of "total contractual payments'' in the previous three financial years, as per the audited balance sheet. However, in case balance sheet of the Allowed 1 previous year is yet to be prepared/ audited, the audited balance sheet No No (Mandatory) of the fourth previous year shall be considered for calculating average annual contractual turnover. The tenderers shall submit requisite information as per Annexure-VIB, along with copies of Audited Balance Sheets duly certified by the Chartered Accountant/ Certificate from Chartered Accountant duly supported by Audited Balance Sheet. (Page- 14, Para 10.2, Part-I of GCC April 2022 and as per Advance Correction slip No 1 dt 14.07.2022) Standard Technical Criteria S.No. Description ConfirmationRemarks Documents Required Allowed Uploading Page 16 of 33 Run Date/Time: 09/01/2025 10:19:22
- Financial Eligibility Criteria: The tenderer must have minimum average annual contractual turnover of[V/N or 'V' which ever is less; where V = Advertised value of the tender in crores of Rupees N= Number of years prescribed for completion of work for which bids have been invited. The average annual contractual turnover shall be calculated as an average of "total contractual payments'' in the previous three financial years, as per the audited balance sheet. However, in case balance sheet of the Allowed 1 previous year is yet to be prepared/ audited, the audited balance sheet No No (Mandatory) of the fourth previous year shall be considered for calculating average annual contractual turnover. The tenderers shall submit requisite information as per Annexure-VIB, along with copies of Audited Balance Sheets duly certified by the Chartered Accountant/ Certificate from Chartered Accountant duly supported by Audited Balance Sheet. (Page- 14, Para 10.2, Part-I of GCC April 2022 and as per Advance Correction slip No 1 dt 14.07.2022) Standard Technical Criteria S.No. Description ConfirmationRemarks Documents Required Allowed Uploading Page 16 of 33 Run Date/Time: 09/01/2025 10:19:22
- a) The tenderer must have successfully completed or substantially completed any one of the following categories of work(s) during last 07 (seven) years, ending last day of month previous to the one in which tender is invited: (i) Three similar works each costing not less than the amount equal to 30% of advertised value of the tender, or (ii) Two similar works each costing not less than the amount equal to 40% of advertised value of the tender, or (iii) One similar work costing not less than the amount equal to 60% of advertised value of the tender. (As per Page- 12,13, Para 10.1, Part-I of GCC April 2022) (a)(i)For Works without composite components The technical eligibility for the work as per para Allowed 1 No No
- composite components The technical eligibility for the work as per para Allowed 1 No No
- values shall also be considered for fulfillment of technical eligibility criteria for different components. (As per Page-13, Para 10.1.b(1) Part-I of GCC April 2022. (b-1) For works with composite components The technical eligibility for major component of work as per para 10.1 above, shall be satisfied by either the 'JV in its own name & style' or 'Lead member of the JV' and technical eligibility for other component(s) of
- technical eligibility for major component of work as per para 10.1 above, shall be satisfied by either the 'JV in its own name & style' or 'Lead member of the JV' and technical eligibility for other component(s) of
- member of the JV' and technical eligibility for other component(s) of
- technical eligibility criteria Note for Para I 7. I 5. I: a)a) The Major component of the work for this purpose shall be the component of work having highest value. In cases where value of two or more component of work is same, any one work can be classified as Major component of work. b) Value of a completed work done by a Member in an earlier JV shall be reckoned only to the extent of the concerned member's share in that JV for the purpose of satisfying his/her compliance to the above mentioned technical eligibility criteria in the tender under consideration. (Page-24, Para 17.15.1 Part-I of GCC April 2022 and As per Advance Correction slip No 1 dt 14.07.2022). (b)(2) In such cases, what constitutes a component in a composite work shall be clearly pre-defined with estimated tender cost of it, as part of
- mentioned technical eligibility criteria in the tender under consideration. (Page-24, Para 17.15.1 Part-I of GCC April 2022 and As per Advance Correction slip No 1 dt 14.07.2022). (b)(2) In such cases, what constitutes a component in a composite work shall be clearly pre-defined with estimated tender cost of it, as part of
- 1.2 No No Not Allowed the tender documents without any ambiguity. (As per Page-13, Para
- 10.1.b(2) Part-I of GCC April 2022) Page 17 of 33 Run Date/Time: 09/01/2025 10:19:22
- (b) (3) To evaluate the technical eligibility of tenderer, only components of work as stipulated in tender documents for evaluation of technical eligibility, shall be considered. The scope of work covered in other remaining components shall be either executed by tenderer himself if he has work experience as mentioned in clause 7 of the Standard General Conditions of Contractor through subcontractor fulfilling the requirements as per clause 7 of the Standard General Conditions of Contract or jointly i.e., partly himself and remaining through subcontractor, with prior approval of Chief Engineer in writing. However, if required in tender documents by way of Special Conditions, a formal agreement duly notarized, legally enforceable in the court of law, shall be executed by the main contractor with the subcontractor for the component(s) of work proposed to be executed by the subcontractor(s), and shall be submitted along with the offer for considering subletting of
- eligibility, shall be considered. The scope of work covered in other remaining components shall be either executed by tenderer himself if he has work experience as mentioned in clause 7 of the Standard General Conditions of Contractor through subcontractor fulfilling the requirements as per clause 7 of the Standard General Conditions of Contract or jointly i.e., partly himself and remaining through subcontractor, with prior approval of Chief Engineer in writing. However, if required in tender documents by way of Special Conditions, a formal agreement duly notarized, legally enforceable in the court of law, shall be executed by the main contractor with the subcontractor for the component(s) of work proposed to be executed by the subcontractor(s), and shall be submitted along with the offer for considering subletting of
- 1.3 No No Not Allowed that scope of work towards fulfilment of technical eligibility. Such subcontractor must fulfill technical eligibility criteria as follows: The subcontractor shall have successfully completed at least one work similar to work proposed for subcontract, costing not less than 35% value of work to be subletted, in last 5 years, ending last day of month previous to the one in which tender is invited through a works contract. Note: for subletting of work costing up to Rs 50 lakh, no previous work experience of subcontractor shall be asked for by the Railway. In case after award of contract or during execution of work it becomes necessary for contractor to change subcontractor, the same shall be done with subcontractor(s) fulfilling the requirements as per clause 7 of the Standard General Conditions of Contract, with prior approval of Chief Engineer in writing. (As per Page-13,14, Para 10.1.b(3) Part-I of GCC April 2022) If a bidder has successfully completed a work as subcontractor and the work experience certificate has been issued for such work to subcontractor by a Govt. Organization or public listed company as Allowed
- that scope of work towards fulfilment of technical eligibility. Such subcontractor must fulfill technical eligibility criteria as follows: The subcontractor shall have successfully completed at least one work similar to work proposed for subcontract, costing not less than 35% value of work to be subletted, in last 5 years, ending last day of month previous to the one in which tender is invited through a works contract. Note: for subletting of work costing up to Rs 50 lakh, no previous work experience of subcontractor shall be asked for by the Railway. In case after award of contract or during execution of work it becomes necessary for contractor to change subcontractor, the same shall be done with subcontractor(s) fulfilling the requirements as per clause 7 of the Standard General Conditions of Contract, with prior approval of Chief Engineer in writing. (As per Page-13,14, Para 10.1.b(3) Part-I of GCC April 2022) If a bidder has successfully completed a work as subcontractor and the work experience certificate has been issued for such work to subcontractor by a Govt. Organization or public listed company as Allowed
- subcontractor must fulfill technical eligibility criteria as follows: The subcontractor shall have successfully completed at least one work similar to work proposed for subcontract, costing not less than 35% value of work to be subletted, in last 5 years, ending last day of month previous to the one in which tender is invited through a works contract. Note: for subletting of work costing up to Rs 50 lakh, no previous work experience of subcontractor shall be asked for by the Railway. In case after award of contract or during execution of work it becomes necessary for contractor to change subcontractor, the same shall be done with subcontractor(s) fulfilling the requirements as per clause 7 of the Standard General Conditions of Contract, with prior approval of Chief Engineer in writing. (As per Page-13,14, Para 10.1.b(3) Part-I of GCC April 2022) If a bidder has successfully completed a work as subcontractor and the work experience certificate has been issued for such work to subcontractor by a Govt. Organization or public listed company as Allowed
- 1.3.1 No No defined in Note for Item 10.1 Part-1 of GCC, the same shall be considered (Mandatory) for the purpose of fulfillment of credentials. (As per Page-15, Para 10.5.5, Part-I of GCC April 2022) Page 18 of 33 Run Date/Time: 09/01/2025 10:19:22
- has been found eligible in other eligibility criteria/tender requirement. (As per Annexure-VI, Page-35, 36, Part-I of GCC April 2022) No Technical and Financial credentials are required for tenders having
- 1.4.1 value up to Rs 50 lakh. (As per Clause 10.4, Page-14, Part-I of GCC April No No Not Allowed 2022) Work experience certificate from private individual shall not be considered. However, in addition to work experience certificates issued by any Govt. Organisation, work experience certificate issued by Public listed company having average annual turnover of Rs 500 crore and above in last 3 financial years excluding the current financial year, listed on National Stock Exchange or Bombay Stock Exchange, incorporated/registered at least 5 years prior to the date of closing of tender, shall also be considered provided the work experience certificate Allowed
- 1.5 has been issued by a person authorized by the Public listed company to No No (Mandatory) issue such certificates. In case tenderer submits work experience certificate issued by public listed company, the tenderer shall also submit along with work experience certificate, the relevant copy of work order, bill of quantities, bill wise details of payment received duly certified by Chartered Accountant, TDS certificates for all payments received and copy of final/last bill paid by company in support of above work experience certificate. (As per Note for Item 10.1, Page-14 of GCC April 2022)
- 1.6 Defination of Similar Work :- Construction of RCC overhead tank. No No Not Allowed Bidders shall confirm and certify on the behalf of the tenderer including its constituents as under: Page 19 of 33 Run Date/Time: 09/01/2025 10:19:22
- 5. COMPLIANCE Commercial-Compliance S.No. Description Confirmation Remarks Documents Required Allowed Uploading Please enter the percentage of local content in the material being offered. Please enter 0 for fully imported items, and 100 Allowed 1 for fully indigenous items. The definition and calculation of localNo Yes (Optional) content shall be in accordance with the Make in India policy as incorporated in the tender conditions. Page 20 of 33 Run Date/Time: 09/01/2025 10:19:22
- 2.1 Bid Security as mentioned in tender documents, failing whichYes No Not Allowed the tender shall be summarily rejected. (As per Clause 6 (a) Annex -I, Page-11, Part-I of GCC April 2022) Note: ii) Any firm recognized by Department of Industrial Policy and Promotion (DIPP as 'Startups' shall be exempted from payment of Bid Security subject to submission of Registration Allowed
- 2.1.1 Certificate issued by appropriate authority. iii) Labour Yes Yes (Mandatory) Cooperative Societies shall submit only 50% of Bid Security shall also additionally submit Registration Certificate. (As per Clause 5 , Page-5, Part-I of GCC April 2022) General Instructions S.No. Description Confirmation Remarks Documents Required Allowed Uploading 1 Applicability of rules for this tender No No Not Allowed
- 1.1 1.GCC-April 2022 with upto date correction slips No No Not Allowed
- 1. USSOR and IR UNIFIED Standard Specification with upto date
- 1.2 correction slip. 2. DSR with upto date correction slip (As per No No Not Allowed Clause 1.1(k), Page-42, Part-I of GCC April 2022)
- 1.3 Relevant IS-CODE & RAILWAY CODES AND MANUALS. No No Not Allowed "If any dispute arises between the parties with respect to this agreement any application or suit shall be instituted only in the court with the local lines or whose jurisdiction, the Western
- 1.4 No No Not Allowed Railway's Divisional Headquarters office is Situated and both the parties shall be bound by this clause. (Headquarters letter no. CE-Circular No. 11/No. W/623/5/ARB/1 dt. 26.04.04) In these Special Conditions of Contract the following terms shall 2 have the meaning hereby assigned to them except where theNo No Not Allowed context otherwise requires. Page 21 of 33 Run Date/Time: 09/01/2025 10:19:22
- 2.1 of Rates) and CPWD Specifications-2019 (or latest) shall be usedNo No Not Allowed for all works related to Building Works, Road Works and Horticulture works and other Miscellaneous works with effect from 01.06.2021 with latest Correction Slips issued from time to time by CPWD. (b)Standard Specifications shall mean "Specifications for Materials and Works of the Railway as specified under the authority of the Ministry of Railways or Chief Engineer or as
- 2.2 No No Not Allowed amplified, added to or superseded by special specifications if any, appended to the Tender Forms. (GCC April 2022 Part I para I Instructions to tenderers (ITT)). (c) Standard Schedule of Rates (SSOR) shall mean the schedule of Rates adopted by the Railway which includes- 1."Unified Standard Schedule of Rates of the Railway (USSOR)" i.e. the Standard Schedule of Rates of the Railway issued under the authority of the Chief Engineer from time to time, updated with correction slips issued up to date of inviting tender or as otherwise specified in the tender documents; 2. "Delhi Schedule
- 2.3 No No Not Allowed Of Rates (DSR)" i.e. the Standard Schedule of Rates published by Director General / Central Public Works Department, Government of India, New Delhi, as adopted and modified by the Railway under the authority of the Chief Engineer from time to time, updated with correction slips issued up to date of inviting tender or as otherwise specified in the tender documents. (As per Clause 1.1(k), Page-42, Part-I of GCC April 2022) Where there is any conflict in conditions/Specifications 3 contained in various parts, order of precedence will be as givenNo No Not Allowed below i . Any foot note given by the Railway in the schedule of
- 3.1 No No Not Allowed quantities and rates.
- 3.2 ii. Description of item in the Schedule of Quantities and rates. No No Not Allowed
- 3.3 iii. Special Specifications. No No Not Allowed
- 3.4 iv Additional Special Conditions/of Contract. No No Not Allowed
- 3.5 v. Standard Specifications. No No Not Allowed vi. Special Conditions of Contract & General Conditions of
- 3.6 No No Not Allowed Contract April 2022 corrected upto date. 4 Signature on Receipts for Amount. No No Not Allowed Every receipt for money which may become payable or for any security which may become transferable to the Contractors under these presents, shall, if signed in the partnership name by anyone of the partners of a Contractor's firm be a good and sufficient discharge to the Railway in respect of the moneys or security purported to be acknowledged thereby and in the event of death of any of the Contractor, partners during the pendency of the contract, it is hereby expressly agreed that every receipt by anyone of the surviving Contractor partners shall if so signed
- 4.1 as aforesaid be good and sufficient discharge as aforesaid No No Not Allowed provided that nothing in this Clause contained shall be deemed to prejudice or effect any claim which the Railway may hereafter have against the legal representative of any Contractor partner so dying for or in respect to any breach of any of the conditions of the contract, provided also that nothing in this clause contained shall be deemed to prejudice or effect the respective rights or obligations of the Contractor partners and of the legal representatives of any deceased Contractor partners interse. (As per Clause 53, Part-II, Page-86 of GCC-April 2022) Modification to GCC for introduction of measurement and recording of "Executed works" by the contractor in Railway construction works. " Introduction of Item (As per clause 45 (i) 5 No No Not Allowed (a) (b), 45 (ii) (a) (b), 46.(1), 46.(2), 46.(3), 46.(4), 46A., 46 (A.1 to A.10), 51.(1), 51.(2), 51-A, Page- Part II of GCC-April 2022) Page-65 to 85 Page 22 of 33 Run Date/Time: 09/01/2025 10:19:22
- v)Proper laying of the cables vi) No temporary joints to be permitted. vii) Use of proper size plug / sockets. For the un- metered connections of less than 1200 watt, only item No. VI & 7 No No Not Allowed VII with the use of 3 core cable with earth wire only to be insisted as other items will not be applicable. Before connecting the assets to electrical power supply, SSE incharge must personally be satisfied that the firm's installation is safe against any fire hazards/electric shocks. (This is as per letter no. Sr.DEE(P) BCT's letter no. EL.197/13/9 (Tech circular) dt 25/07/2018). (As per Para 31.4 (a)(b), Page-58 Part-II of GCC April 2022) Page 23 of 33 Run Date/Time: 09/01/2025 10:19:22
- 14.07.2022) Provisions of Contract Labour (Regulation and Abolition) Act, 1970 (Shramik Kalyan) (As per Para 55-A.1 to 55-A.5, 55-B, 55-C 10 No No Not Allowed (i) (a to e), 55-C (ii) & 55-D, Page- 88-89, Part-II of GCC-April 2022) Special Conditions S.No. Description Confirmation Remarks Documents Required Allowed Uploading Page 25 of 33 Run Date/Time: 09/01/2025 10:19:22
- 2.1 No No Not Allowed following Qualified Engineers during execution of the allotted work: (a) One Qualified Graduate Engineer when cost of work to be executed is Rs. 200 lakh and above, and (b)One Qualified Diploma Holder Engineer when cost of work to be executed is more than Rs. 25 lakh, but less than Rs. 200 lakh. (As per Railway Board's letter No. 2012/CE-I/CT/O/20 dtd. 10.05.2013) (B) Further, in case the contractor fails to employ the Qualified Engineer, as aforesaid in Para (A) above, he, in terms of provisions of Clause 26A.2 of the General Conditions of Contract, shall be liable to pay an amount of Rs. 40,000 and Rs. 25,000 for
- 2.2 No No Not Allowed each month or part thereof for the default period for the provisions, as contained in Para A (a) and A (b) above respectively. (As per Railway Board's letter No. 2012/CE- I/CT/O/20 dtd. 10.05.2013) C) Provision for deployment of Qualified Engineers (Graduate Engineer or Diploma Holder Engineer) shall be for the values as prescribed above. However, for the works contract tenders, if it is considered appropriate by the tender inviting authority, not to
- C) Provision for deployment of Qualified Engineers (Graduate Engineer or Diploma Holder Engineer) shall be for the values as prescribed above. However, for the works contract tenders, if it is considered appropriate by the tender inviting authority, not to
- 2.3 have the services of qualified engineer, the same shall be soNo No Not Allowed mentioned in the tender documents by the concerned Executive with the approval of Officer not below the level of SAG Officer, for reasons to be recorded in writing. (As per Railway Board's letter No. 2012/CEI/CT/O/20 dtd. 10.05.2013) As per para 26A.3 No. of qualified Engineers required to be deployed by the Contractor for various activities contained in
- 2.4 the works contract shall be specified in the tender documents as No No Not Allowed 'Special Condition of Contract' by the tender inviting authority." (As per Clause 26A.3 Page-57, Part-II of GCC April 2022) RESTRICTIONS ON ARBITRATION CLAUSES (As per Clause 64.(1), 3 No No Not Allowed Page-97, 98, Part-II of GCC April 2022) Demand for Arbitration: (As per Clause 64.(1), Page-97, 98, Part-
- 3.1 No No Not Allowed II of GCC April 2022) Settlement of disputes-Indian railway Arbitration and
- 3.2 Conciliation Rules: (As per Clause No 63, Page-95-97, Part-II of No No Not Allowed GCC April 2022) These special conditions shall prevail over existing clauses 63
- 3.3 No No Not Allowed and 64 of General Conditions of Contract April 2022. GUIDELINE FOR THE MAINTENANCE PERIOD (As per Clause 47, 4 No No Not Allowed Page-82, Part-II of GCC April 2022) Page 26 of 33 Run Date/Time: 09/01/2025 10:19:22
- 4.1 arise in or be discovered or be in any way connected with the No No Not Allowed works, provided that such damage or defect is not directly caused by errors in the contract documents, act of providence or insurrection or civil riot, and the Contractor shall be liable for and shall pay and make good to the Railway or other persons legally entitled thereto whenever required by the Engineer so to do, all losses, damages, costs and expenses they or any of them may incur or be put or be liable to by reasons or in consequence of the operations of the Contractor or of his failure in any respect. (As per Clause No. 47, Page-82, Part-II of GCC April 2022) However, for a zonal work, the maintenance period shall be as a) Repair and maintenance work including white/color washing: three calendar months from date of completion. b) All new works
- 4.2 No No Not Allowed except earth work: Six calendar months from date of completion. (As per Annexure-III, Page-30,31, Part-I of GCC April 2022) To cover up monsoon period, the maintenance period will be extended in cases when required and contractor shall remain responsible for maintenance for this extended period also. The contractor shall make good and remedy at his own expense within such period as may be stipulated by the Engineer, any defect which may develop or may be before the expiry of this period and intimation of which has been sent to the contractor within seven days of the expiry of the said period by a letter,
- 4.3 No No Not Allowed sent by hand delivery or by registered post. In case the contractor fails to make adequate arrangements to rectify the defects within seven days of the receipt of such notices, the Engineer without further notice may make his own arrangement to rectify the defects and the cost of such rectification shall be recovered from the Security Deposit of the contractor or from any other money due to the contractor under this or any other contract. 5 SPECIAL CONDITION FOR TAX DEDUCTION No No Not Allowed (1) In respect of works, the contract value of which is more than Rs.5,000/- each, a deduction of 2% on the gross payment from each of the Contractor's bills shall be made in terms of section
- 5.1 No No Not Allowed 194(C) of the Income Tax Act of 1961 & 1991. (From time to time surcharge will also be deducted along with I. Tax as per extent rules. (2)The Railway will deduct GST if leviable in a particular state where the work is going on, the gross amount of each bill while
- 5.2 making payment to the contractor(s). The recovery shall beNo No Not Allowed governed as per the guide lines & rates prescribed by the concerned State Government. 3) Any Other taxes The Contractor shall bear in full all taxes and royalties levied by the State Government and/or Central Government from time to time. This would be entirely a matter
- 5.3 between the contractor and State Government/or CentralNo No Not Allowed Government. Railway will recover the taxes and royalties through final bills if the contractor fails to pay the taxes and royalties to the Government. DETAILS OF INSPECTION REGISTER AND RECORDS ARE TO 6 No No Not Allowed
- 6.1 No No Not Allowed reasonable times. Records of tests made shall be handed over to the Engineer/s representative after carrying out the tests. The following registers will be maintained at site by the Contractor/s. 2) Site Order Register: The Contractor/s shall promptly acknowledge by putting his signature in the site order against any order given therein by the Engineer or his representative or
- 6.2 No No Not Allowed his superior officers and comply with them. The Compliance shall be reported by the Contractor/s to the Engineer in good time so that it can be checked. 3) Labour Register: This register will be maintained to show daily
- 6.3 strength of labour in different categories employed by the No No Not Allowed Contractor/s. 4) LOG book of events: All events are required to be
- 6.4 No No Not Allowed chronologically logged in this book shift wise and date wise. 5)Cement & steel registers shall be maintained by the
- 6.5 No No Not Allowed contractor. APPLICABILITY OF PRICE VARIATION CLAUSE AS PER GCC 7 No No Not Allowed
- 14.07.2022).@10'46@(1) Rates for Extra Item(s) of Works: (a) Standard Schedule of Rates (SSOR) Items: Any item of work carried out by the Contractor on the instructions of the Engineer which is not included in the accepted Bil(s) of Quantities but figures in the Standard Schedule of Rates (SSOR), shall be executed at the rates set forth in the "Standard Schedule of Rates (SSOR)" modified by the tender percentage as accepted in the contract for that chapter of Standard Schedule of Rates (SSOR). For item(s) not covered in this sub clause, the rate shall
- 7.2 (As per Railway Boards letter No. 2013/CE/I/CT/O/10-PVC-Pt.I No No Not Allowed dtd.27.01.2015) b. As per PCE/CCG letter No.W118/0 Vol. II (W6) date 14.06.2019. The security deposit against the contract shall be released only after the contractor has submitted the final PVC bill wherever applicable. 8 DISASTER MANAGMENT No No Not Allowed "All the available vehicles and equipment's of the contractor can be drafted by the Railway Administration in case of accidents/natural calamities involving human lives. The payment for such drafting shall be made according to the rates as shall be fixed by the Engineer. However, if the contractor is not satisfied
- 8.1 with the decision of the Engineer in this respect he may appeal No No Not Allowed to the Chief Engineer within 30 Days of getting the decision of the Engineer, supported by analysis of the rates claimed. The Chief Engineer's decision after hearing both the parties in the matter would be final and binding on the contractor and the Railway". 9 EMERGENCY WORK No No Not Allowed In the event of any accident or failure occurring in the execution of work/ arising out of it which in the opinion of the Engineer requires immediate attention, the Railway may bring its own workmen or other agency/agencies to execute or partly execute
- 9.1 No No Not Allowed the necessary work or carry out repairs if the Engineer-in-charge considers that the contractor(s) is/are not in a position to do so in time without giving any notice and charge the cost thereof, to be determined by the Engineer-in-charge, to the contractor. 10 DAMAGE BY ACCIDENT/ FLOOD/ TIDES OR NATURAL CALAMITIES No No Not Allowed The Contractor shall take all precautions against damages from accidents, floods tides or other natural occurrences. He shall not be entitled to any compensation for his tools, plants, materials, machines and other equipment lost or damaged by any cause whatsoever. The Contractor shall be liable to make good the damage to any structure or part of a structure, plant or material
- DIPP Relaxation of eligibility criteria for work tenders for 'Startups' (recognized by Department of Industrial Policy and Promotion, Ministry of Commerce and Industry) has been examined in Board's office and Board (ME & FC) have approved as under- "The technical and financial eligibility criteria mentioned in GCC shall normally apply to all firms including 'Startup' firms (recognized by Department of Industrial Policy 18 and Promotion, Ministry of Commerce and Industry). However, No No Not Allowed before inviting tender, General Manager, on the recommendation of PHOD/CHOD of the department inviting the tender and associate finance, can relax the applicability of eligibility criteria to 'Startup' firms (recognized by Department of Industrial Policy and Promotion, Ministry of Commerce and Industry). on case-to-case basis. (As per Railway Board Letter No. 2012/CE-I/CT/O/5, dated. 24-04-2019.) 19 Special Condition of Drawings No No Not Allowed
- as under- "The technical and financial eligibility criteria mentioned in GCC shall normally apply to all firms including 'Startup' firms (recognized by Department of Industrial Policy 18 and Promotion, Ministry of Commerce and Industry). However, No No Not Allowed before inviting tender, General Manager, on the recommendation of PHOD/CHOD of the department inviting the tender and associate finance, can relax the applicability of eligibility criteria to 'Startup' firms (recognized by Department of Industrial Policy and Promotion, Ministry of Commerce and Industry). on case-to-case basis. (As per Railway Board Letter No. 2012/CE-I/CT/O/5, dated. 24-04-2019.) 19 Special Condition of Drawings No No Not Allowed
- eligibility criteria to 'Startup' firms (recognized by Department of Industrial Policy and Promotion, Ministry of Commerce and Industry). on case-to-case basis. (As per Railway Board Letter No. 2012/CE-I/CT/O/5, dated. 24-04-2019.) 19 Special Condition of Drawings No No Not Allowed
- 1. Contractor will provide architectural drawings with elevations etc. complete, structural design details/drawings, design calculations, GAD (i.e. foundation, beams, lintel, column, roof etc. based on railway approved GAD and will get it approved from railway. Contractor will also submit 3 copies with soft copy (2 nos.CD/pen drive) to railway for the same. 2. Design shall be followed for respective zones and should be resistant from various disasters occurs in respective zones. Design shall be proof checked by IIT / NIT/ Govt engineering college which will be further got approved from railway before actual commencement of work. 3. Contractor has to submit completion plan in 75 Micron double mate GARWARE or similar tracing film
- 2.1 Services Within One Year of their Retirement. (As per clause No Yes No Not Allowed
- 59.(9), Page-91, Part-II of GCC April 2022) Certificate of NO Relative being an employee of Western Railway Allowed
- 2.2 No No as per attached Performa. (Mandatory) Joint Venture (JV) in works tenders (As per Clause No. 17.1 to Allowed 3 No No
- 17.15.3, Page-20 to 25, Part-1 of GCC April 2022) (Mandatory) Participation of Partnership Firm in works tenders. (As per Clause Allowed 4 No No No. 18.1 to 18.11, Page-25 to 27, Part-1 of GCC April 2022) (Mandatory) Page 30 of 33 Run Date/Time: 09/01/2025 10:19:22
- 6. Documents attached with tender S.No. Document Name Document Description 1 SPLCONDTECHOHTANKDRD.pdf SPECIAL CONDITION TECHINICAL ACS-2toGCC-2022_2022-CE-1-CT-GCC-2022- 2 GCC correction slip No. 2 dtd.13.12.2022 POLICY_13.12.2022_1.pdf 2022-CE-I-CT- Clarification regarding submission of 3 GCCCorrespondencedated.14.05.2024.pdf Annexure-V


## RISK ANALYSIS
• Timeline: 18 Months
Contract Type Works → check feasibility


## FLAGS
<span style='color:red; font-weight:bold'>JV NOT ALLOWED</span>
<span style='color:red; font-weight:bold'>Single Packet System</span>
<span style='color:red; font-weight:bold'>Earnest Money: ₹360,700</span>


## PARSING LOGS & CONFIDENCE
**Confidence scores:**
- text: 98%
- raw_text: 88%
- nit_header: 96%
- schedules_summary: 92%
- item_breakups: 90%
- grouping: 92%
- eligibility: 90%
- top10: 20%
- risk: 70%
- flags: 95%

**Progress logs (highlights):**
- Table parse error on page 23: 'DataFrame' object has no attribute 'str'
- Table parse error on page 26: 'DataFrame' object has no attribute 'str'
- Table parse error on page 27: 'DataFrame' object has no attribute 'str'
- Table parse error on page 29: 'DataFrame' object has no attribute 'str'
- Table parse error on page 30: 'DataFrame' object has no attribute 'str'
- Parsed table on page 30 with pdfplumber -> schedule Item- 18 Schedule A18 (23.0 Rain water harvesting)
- Table parse error on page 32: 'DataFrame' object has no attribute 'str'
- Parsed table on page 32 with pdfplumber -> schedule Item- 18 Schedule A18 (23.0 Rain water harvesting)
- Extracted item breakups for 11 schedules.
- Grouped Schedule A entries by parent class using regex.
- Extracted 74 eligibility bullets from pages 15-35.
- Top10 computation error: Reindexing only valid with uniquely valued Index objects

Saved top10.csv
Saved full_breakup.json
Saved parsed_data.pkl

Export files available for download:
- top10.csv
- full_breakup.json
- parsed_data.pkl

Attempting to trigger downloads (browser will prompt)...


  ser[k] = df.fillna('').to_dict(orient='records')


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


--- PARSING SUMMARY ---
File: NIT-BCT-24-25-257 (1).pdf
Pages processed: 33
Confidence scores:
 - text: 98%
 - raw_text: 88%
 - nit_header: 96%
 - schedules_summary: 92%
 - item_breakups: 90%
 - grouping: 92%
 - eligibility: 90%
 - top10: 20%
 - risk: 70%
 - flags: 95%

Top logs:
 - Table parse error on page 27: 'DataFrame' object has no attribute 'str'
 - Table parse error on page 29: 'DataFrame' object has no attribute 'str'
 - Table parse error on page 30: 'DataFrame' object has no attribute 'str'
 - Parsed table on page 30 with pdfplumber -> schedule Item- 18 Schedule A18 (23.0 Rain water harvesting)
 - Table parse error on page 32: 'DataFrame' object has no attribute 'str'
 - Parsed table on page 32 with pdfplumber -> schedule Item- 18 Schedule A18 (23.0 Rain water harvesting)
 - Extracted item breakups for 11 schedules.
 - Grouped Schedule A entries by parent class using regex.
 - Extracted 74 eligibility bullets from pages 15-35.
 - Top10 computation error: Reindexing only vali