# RFP Question Detection Investigation

This notebook investigates why the RFP processing app fails to detect the questions in `Sample_Request.pdf`.

## Section 1: Extract Text from Sample PDF

In [1]:
%pip install PyMuPDF -q

Note: you may need to restart the kernel to use updated packages.





In [2]:
import fitz  # PyMuPDF

pdf_path = r'c:\Users\webma\Source\Repos\RFPCreatorAgentic\docs\Sample_Request.pdf'
doc = fitz.open(pdf_path)

# Extract all text from all pages
extracted_text = ""
for page_num, page in enumerate(doc):
    extracted_text += f"\n--- Page {page_num + 1} ---\n"
    extracted_text += page.get_text()

print(f"Total pages: {len(doc)}")
print("=" * 50)
print("EXTRACTED TEXT:")
print("=" * 50)
print(extracted_text)

Total pages: 3
EXTRACTED TEXT:

--- Page 1 ---
Request for Proposal (RFP)
Catering & Event Services Questionnaire
Fictional Hotel

--- Page 2 ---
This Request for Proposal (RFP) document is intended to collect detailed information from Fictional
Hotel regarding its catering and event services. Please provide responses to the following questions to
assist us in evaluating your suitability for hosting our upcoming event.
G
What breakfast catering options are available (e.g., Continental, Full American, Healthy Start,
buffet)?
G
What lunch catering options are available (plated service, buffet, boxed lunches, salad stations,
sandwich bars)?
G
What dinner catering options are available, including themed menus?
G
What beverage services do you provide (full bar, beer/wine, coffee/tea, non-alcoholic options)?
G
What snack and break options do you offer during events?
G
Do you provide special menus (vegetarian, vegan, gluten-free, kosher, halal)?
G
What event service styles are available (plat

In [5]:
# Save to file for easier analysis
with open("extracted_text.txt", "w", encoding="utf-8") as f:
    f.write(extracted_text)
print(f"Saved extracted text ({len(extracted_text)} chars) to extracted_text.txt")

Saved extracted text (2293 chars) to extracted_text.txt


## Section 2: Analyze the Detection Pattern from RfpProcessingService.cs

Below, the `DetectQuestions` logic from `RfpProcessingService.cs` is ported to Python so we can see which sentences it matches.

In [6]:
import re

# These are the patterns from RfpProcessingService.cs DetectQuestions method
patterns = [
    r"^\d+[.)]\s+.+\?$",                    # Numbered questions: "1. What is your experience?"
    r"^[A-Z][^.!]*\?$",                     # Standard questions ending with ?
    r"(?:please|kindly)\s+(?:describe|explain|provide|list|detail|outline)",  # Imperative requests
]

question_starters = ["what", "how", "why", "when", "where", "who", "which", "can", "could", "would", "will", "do", "does", "is", "are", "describe", "explain", "provide"]

def split_into_sentences(text):
    """Same logic as C# SplitIntoSentences"""
    pattern = r"(?<=[.!?\n])\s+"
    sentences = re.split(pattern, text)
    return [s.strip() for s in sentences if s.strip()]

def is_question(sentence, patterns):
    """Same logic as C# IsQuestion"""
    # Check if ends with question mark
    if sentence.strip().endswith("?"):
        return True, "Ends with ?"
    
    # Check against regex patterns
    for pattern in patterns:
        if re.search(pattern, sentence, re.IGNORECASE | re.MULTILINE):
            return True, f"Matches pattern: {pattern}"
    
    # Check for question starters
    words = sentence.split()
    first_word = words[0].lower() if words else ""
    if first_word in question_starters:
        return True, f"Starts with question word: {first_word}"
    
    return False, "No match"

# Test with extracted text
sentences = split_into_sentences(extracted_text)
print(f"Total sentences after splitting: {len(sentences)}")
print("\n" + "=" * 60)
print("SENTENCE ANALYSIS:")
print("=" * 60)

for i, sentence in enumerate(sentences):
    is_q, reason = is_question(sentence, patterns)
    status = "✓ DETECTED" if is_q else "✗ NOT DETECTED"
    print(f"\n[{i+1}] {status}")
    print(f"    Sentence: {sentence[:100]}{'...' if len(sentence) > 100 else ''}")
    print(f"    Reason: {reason}")

Total sentences after splitting: 24

SENTENCE ANALYSIS:

[1] ✗ NOT DETECTED
    Sentence: --- Page 1 ---
Request for Proposal (RFP)
Catering & Event Services Questionnaire
Fictional Hotel
    Reason: No match

[2] ✗ NOT DETECTED
    Sentence: --- Page 2 ---
This Request for Proposal (RFP) document is intended to collect detailed information ...
    Reason: No match

[3] ✓ DETECTED
    Sentence: Please provide responses to the following questions to
assist us in evaluating your suitability for ...
    Reason: Matches pattern: (?:please|kindly)\s+(?:describe|explain|provide|list|detail|outline)

[4] ✓ DETECTED
    Sentence: G
What breakfast catering options are available (e.g., Continental, Full American, Healthy Start,
bu...
    Reason: Ends with ?

[5] ✓ DETECTED
    Sentence: G
What lunch catering options are available (plated service, buffet, boxed lunches, salad stations,
...
    Reason: Ends with ?

[6] ✓ DETECTED
    Sentence: G
What dinner catering options are available, includin
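The splitter regex only breaks on whitespace that immediately follows `.`, `!`, `?` or a newline. A lone `G` line is therefore never separated from the question under it, and a question that wraps onto a second line stays in one sentence. A minimal check on a hand-built string that mimics the extracted layout (the sample text is a shortened assumption, not copied byte-for-byte from the PDF):

```python
import re

# Identical to split_into_sentences above: split on whitespace that directly
# follows '.', '!', '?' or a newline character.
SPLIT_PATTERN = r"(?<=[.!?\n])\s+"

def split_into_sentences(text):
    return [s.strip() for s in re.split(SPLIT_PATTERN, text) if s.strip()]

# Hand-built sample: a lone "G" bullet line, then a question that wraps.
sample = (
    "assist us in evaluating your suitability for hosting our upcoming event.\n"
    "G\n"
    "What breakfast catering options are available (e.g., Continental,\n"
    "buffet)?\n"
    "G\n"
    "What dinner catering options are available?"
)

for s in split_into_sentences(sample):
    print(repr(s))
```

The `\n` after `G` is preceded by `G`, so no split happens there; the `\n` after `Continental,` is preceded by `,`, so the wrapped tail stays attached. Each question sentence ends up starting with `G\n` and ending with `?`.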

In [7]:
# Show only detected questions
detected = []
not_detected = []
for sentence in sentences:
    is_q, reason = is_question(sentence, patterns)
    if is_q:
        detected.append((sentence, reason))
    else:
        not_detected.append(sentence)

print(f"DETECTED: {len(detected)} questions")
print(f"NOT DETECTED: {len(not_detected)} sentences")
print("\n" + "=" * 60)
print("DETECTED QUESTIONS:")
print("=" * 60)
for q, reason in detected:
    print(f"\n• {q[:120]}...")
    print(f"  Reason: {reason}")

DETECTED: 20 questions
NOT DETECTED: 4 sentences

DETECTED QUESTIONS:

• Please provide responses to the following questions to
assist us in evaluating your suitability for hosting our upcoming...
  Reason: Matches pattern: (?:please|kindly)\s+(?:describe|explain|provide|list|detail|outline)

• G
What breakfast catering options are available (e.g., Continental, Full American, Healthy Start,
buffet)?...
  Reason: Ends with ?

• G
What lunch catering options are available (plated service, buffet, boxed lunches, salad stations,
sandwich bars)?...
  Reason: Ends with ?

• G
What dinner catering options are available, including themed menus?...
  Reason: Ends with ?

• G
What beverage services do you provide (full bar, beer/wine, coffee/tea, non-alcoholic options)?...
  Reason: Ends with ?

• G
What snack and break options do you offer during events?...
  Reason: Ends with ?

• G
Do you provide special menus (vegetarian, vegan, gluten-free, kosher, halal)?...
  Reason: Ends with ?

• G
What
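Because `is_question` returns as soon as `endswith("?")` is true, the regex patterns never get a say for these sentences. Testing the second pattern on its own against a wrapped, bullet-prefixed sentence shows how fragile it is: with `re.IGNORECASE | re.MULTILINE` it matches only the wrapped tail line, since IGNORECASE lets `[A-Z]` match the lowercase `b`, and the `.` in `e.g.` blocks a match starting at the `What` line. The sample string is a shortened, hand-built version of the breakfast question:

```python
import re

STANDARD_Q = r"^[A-Z][^.!]*\?$"  # second pattern from the patterns list above
wrapped = "G\nWhat breakfast options are available (e.g., Continental,\nbuffet)?"

# With both flags (as in is_question): only the "buffet)?" line matches.
m = re.search(STANDARD_Q, wrapped, re.IGNORECASE | re.MULTILINE)
print(repr(m.group(0)))  # 'buffet)?'

# Without IGNORECASE, [A-Z] rejects the lowercase "b", so nothing matches.
print(re.search(STANDARD_Q, wrapped, re.MULTILINE))  # None
```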

In [8]:
# List the sentences that were NOT detected
print(f"Detected: {len(detected)} | Not detected: {len(not_detected)}")
print("\nSentences that were NOT detected:")
for s in not_detected[:5]:
    print(f"  - '{s[:80]}...' (len={len(s)})")

Detected: 20 | Not detected: 4

Sentences that were NOT detected:
  - '--- Page 1 ---
Request for Proposal (RFP)
Catering & Event Services Questionnair...' (len=97)
  - '--- Page 2 ---
This Request for Proposal (RFP) document is intended to collect d...' (len=163)
  - '--- Page 3 ---
Thank you for your time and effort in completing this RFP questio...' (len=87)
  - 'Your responses will be
carefully reviewed to determine the alignment of your cat...' (len=133)


In [9]:
# Check raw bytes and characters around the "G" prefix (likely a bullet symbol)
print("Examining raw characters around question lines:")
lines = extracted_text.split('\n')
for i, line in enumerate(lines):
    if 'What' in line or line.startswith('G'):
        print(f"Line {i}: {repr(line[:60])}...")
        # Show first 5 chars as ordinals
        for j, c in enumerate(line[:5]):
            print(f"  char[{j}] = {repr(c)} (ord={ord(c)})")

Examining raw characters around question lines:
Line 10: 'G'...
  char[0] = 'G' (ord=71)
Line 11: 'What breakfast catering options are available (e.g., Contine'...
  char[0] = 'W' (ord=87)
  char[1] = 'h' (ord=104)
  char[2] = 'a' (ord=97)
  char[3] = 't' (ord=116)
  char[4] = ' ' (ord=32)
Line 13: 'G'...
  char[0] = 'G' (ord=71)
Line 14: 'What lunch catering options are available (plated service, b'...
  char[0] = 'W' (ord=87)
  char[1] = 'h' (ord=104)
  char[2] = 'a' (ord=97)
  char[3] = 't' (ord=116)
  char[4] = ' ' (ord=32)
Line 16: 'G'...
  char[0] = 'G' (ord=71)
Line 17: 'What dinner catering options are available, including themed'...
  char[0] = 'W' (ord=87)
  char[1] = 'h' (ord=104)
  char[2] = 'a' (ord=97)
  char[3] = 't' (ord=116)
  char[4] = ' ' (ord=32)
Line 18: 'G'...
  char[0] = 'G' (ord=71)
Line 19: 'What beverage services do you provide (full bar, beer/wine, '...
  char[0] = 'W' (ord=87)
  char[1] = 'h' (ord=104)
  char[2] = 'a' (ord=97)
  char[3] = 't' (ord=116)
  cha
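The `G` is most likely a bullet glyph drawn in a symbol or dingbat font that PyMuPDF maps to the letter `G`; the raw characters above cannot confirm that on their own. One way to check is to look at the font name of each span returned by `page.get_text("dict")`, which nests blocks, lines and spans, each span carrying `text` and `font` keys. The sketch below works on that dict structure; `find_marker_spans` is a hypothetical helper, and the test feeds it a small hand-built dict with invented font names so it runs without the PDF:

```python
def find_marker_spans(text_dict, marker="G"):
    """Return (line index, font name) for every span whose text is exactly `marker`.

    `text_dict` is expected to have the shape produced by PyMuPDF's
    page.get_text("dict"): {"blocks": [{"lines": [{"spans": [{"text", "font", ...}]}]}]}.
    Image blocks have no "lines" key, so .get() skips them.
    """
    hits = []
    line_no = 0  # counts text lines across all blocks on the page
    for block in text_dict.get("blocks", []):
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                if span.get("text", "").strip() == marker:
                    hits.append((line_no, span.get("font", "")))
            line_no += 1
    return hits

# Usage against the real PDF (requires PyMuPDF and pdf_path from Section 1):
# import fitz
# page = fitz.open(pdf_path)[1]  # page 2 holds the questions
# for line_no, font in find_marker_spans(page.get_text("dict")):
#     print(line_no, font)
```

If the `G` spans report a different font from the question text, that supports the bullet-glyph explanation.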

In [10]:
# Look for a line that starts with the 'G' prefix AND contains 'What'.
# This prints nothing: as In [9] shows, 'G' sits alone on its own line.
lines = extracted_text.split('\n')
for line in lines:
    if line.strip().startswith('G') and 'What' in line:
        print(f"Found line: {repr(line[:50])}")
        for j, c in enumerate(line[:8]):
            print(f"  [{j}]: '{c}' = {ord(c)}")
        break

In [11]:
# Show all lines containing "What"
lines = extracted_text.split('\n')
for i, line in enumerate(lines):
    if 'What' in line:
        print(f"[{i}]: {repr(line)}")

[11]: 'What breakfast catering options are available (e.g., Continental, Full American, Healthy Start,'
[14]: 'What lunch catering options are available (plated service, buffet, boxed lunches, salad stations,'
[17]: 'What dinner catering options are available, including themed menus?'
[19]: 'What beverage services do you provide (full bar, beer/wine, coffee/tea, non-alcoholic options)?'
[21]: 'What snack and break options do you offer during events?'
[25]: "What event service styles are available (plated, buffet, family-style, stations, hors d'oeuvres)?"
[29]: 'What staffing support do you provide for events (catering manager, servers, bartenders,'
[32]: 'What is your pricing structure for breakfast, lunch, dinner, and beverage packages?'
[34]: 'What additional services are included (table linens, china, floral, A/V support, etc.)?'
[36]: 'What are the parking options and associated costs (self-parking, valet, bus)?'
[38]: 'What are your hotel policies related to catering events (menu 

In [12]:
# Analyze questions that are split across lines - show context
lines = extracted_text.split('\n')
print("Lines 10-14 (showing split question):")
for i in range(10, 15):
    print(f"[{i}]: {repr(lines[i])}")

print("\n\nNow show how SplitIntoSentences breaks it:")
sentences = split_into_sentences(extracted_text)
for i, s in enumerate(sentences):
    if 'breakfast' in s.lower():
        print(f"Sentence {i}: {repr(s[:100])}")  # preview only: first 100 chars

Lines 10-14 (showing split question):
[10]: 'G'
[11]: 'What breakfast catering options are available (e.g., Continental, Full American, Healthy Start,'
[12]: 'buffet)?'
[13]: 'G'
[14]: 'What lunch catering options are available (plated service, buffet, boxed lunches, salad stations,'


Now show how SplitIntoSentences breaks it:
Sentence 3: 'G\nWhat breakfast catering options are available (e.g., Continental, Full American, Healthy Start,\nbu'
Sentence 12: 'G\nWhat is your pricing structure for breakfast, lunch, dinner, and beverage packages?'
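Lines 10 to 14 show the layout: a lone `G` line opens each item, and the question text may wrap over one or more following lines. Assuming that pattern holds for the whole document (only the first items are shown above) and that no real content line consists of just `G`, the items can be rebuilt by grouping the lines between markers. `group_bullet_items` is a hypothetical helper for illustration, not part of the app; it also stops at the `--- Page N ---` separators that Section 1 inserts:

```python
def group_bullet_items(text, marker="G", stop_prefix="--- Page"):
    """Join the lines that follow each lone `marker` line into one item.

    Assumes each item starts with a line containing only `marker` and ends at
    the next marker line or at a page separator starting with `stop_prefix`.
    Lines outside any item (intro and closing text) are ignored.
    """
    items = []
    current = None  # lines of the item being built; None = not inside an item
    for raw in text.split("\n"):
        line = raw.strip()
        if line == marker or line.startswith(stop_prefix):
            if current:
                items.append(" ".join(current))
            current = [] if line == marker else None
        elif current is not None and line:
            current.append(line)
    if current:
        items.append(" ".join(current))
    return items
```

Usage: `for q in group_bullet_items(extracted_text): print(q)`. Wrapped lines are joined with a single space, so a word hyphenated across a line break would come out split (e.g. `family- style`).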


In [13]:
# THE ROOT CAUSE ANALYSIS
print("=" * 60)
print("ROOT CAUSE ANALYSIS")
print("=" * 60)

# Issue 1: "G" is the first word, not "What"
sample = "G\nWhat breakfast catering options are available?"
first_word = sample.split()[0].lower()
print(f"\n1. First word of sentence with 'G' prefix: '{first_word}'")
print(f"   Is '{first_word}' in question_starters? {first_word in question_starters}")

# Issue 2: Check sentence ending
print(f"\n2. Does sentence end with '?'? {sample.endswith('?')}")

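# Issue 3: question text that wraps onto the next line. This is checked per
# line; split_into_sentences keeps the wrapped pieces in one sentence.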
print("\n3. Questions that don't end with '?' due to line breaks:")
for i, line in enumerate(lines):
    if 'What' in line and not line.strip().endswith('?'):
        print(f"   Line {i}: '{line[:60]}...' [NO '?' AT END]")

# Count properly ended vs incomplete
complete = 0
incomplete = 0
for line in lines:
    if 'What' in line:
        if line.strip().endswith('?'):
            complete += 1
        else:
            incomplete += 1
print(f"\n4. Summary:")
print(f"   Questions with '?' at line end: {complete}")
print(f"   Questions WITHOUT '?' (fragmented): {incomplete}")

ROOT CAUSE ANALYSIS

1. First word of sentence with 'G' prefix: 'g'
   Is 'g' in question_starters? False

2. Does sentence end with '?'? True

3. Questions that don't end with '?' due to line breaks:
   Line 11: 'What breakfast catering options are available (e.g., Contine...' [NO '?' AT END]
   Line 14: 'What lunch catering options are available (plated service, b...' [NO '?' AT END]
   Line 29: 'What staffing support do you provide for events (catering ma...' [NO '?' AT END]
   Line 38: 'What are your hotel policies related to catering events (men...' [NO '?' AT END]

4. Summary:
   Questions with '?' at line end: 13
   Questions WITHOUT '?' (fragmented): 4
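Point 1 matters for any check that keys on the first word, such as the question-starter list: with the `G` marker glued to the front, the first token is `g`. In this port, `is_question` tests `endswith("?")` before the first-word check, so these sentences are still detected (point 2). A minimal sketch of removing a leading one-character marker line before the first-word check; `strip_bullet_prefix` and its regex are assumptions for illustration, and the regex would also strip a genuine one-letter first line such as `I`:

```python
import re

QUESTION_STARTERS = {"what", "how", "why", "when", "where", "who", "which",
                     "can", "could", "would", "will", "do", "does", "is", "are",
                     "describe", "explain", "provide"}

# Assumed bullet shape: a single non-space character alone on the first line.
BULLET_PREFIX = re.compile(r"^\s*\S\s*\n")

def strip_bullet_prefix(sentence):
    """Drop a leading one-character bullet line, if present."""
    return BULLET_PREFIX.sub("", sentence, count=1)

def first_word(sentence):
    words = sentence.split()
    return words[0].lower() if words else ""

sample = "G\nWhat breakfast catering options are available?"
print(first_word(sample))                       # g
print(first_word(strip_bullet_prefix(sample)))  # what
```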
