Exercise 1 — Document A (FAQ)

Document Type: FAQ

FAQ documents are naturally structured as Q&A pairs.

Each pair forms a meaningful unit and should stay together for semantic clarity.


1. Chosen Strategy

Custom chunking: split by Q/A pairs

2. Reason

Splitting by characters or sentences would separate a question from its answer, which makes each chunk less useful on its own.

Paragraph chunking is fine, but FAQs are better treated as logical units.

A custom splitter based on \n\n works well here because each Q&A pair in this document is separated by a blank line.
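If the blank lines between pairs are not perfectly clean (for example, they contain stray spaces), a plain split on \n\n can miss a boundary. The sketch below is one possible way to handle that, assuming pairs are still separated by at least one empty or whitespace-only line. The helper name `split_faq_pairs` is hypothetical and not part of the exercise code.

```python
import re


def split_faq_pairs(text):
    """Split FAQ text into one string per blank-line-separated block."""
    # A separator is a newline, optional whitespace, then another newline,
    # so whitespace-only lines count as blank lines too.
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


# Two pairs taken from Document A; the "blank" line holds two stray spaces.
sample = (
    "Q: What is the return policy?\n"
    "A: Items can be returned within 30 days of purchase with original receipt.\n"
    "  \n"
    "Q: How do I track my order?\n"
    "A: Use the tracking number sent to your email after shipment."
)

pairs = split_faq_pairs(sample)
```

Each element of `pairs` is a full question with its answer, so it can be passed straight to a grouping function.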

# Implementation (following your style)
# Step A — Convert the document into a list of Q&A items
# Step B — Use a chunking function like your chunk_list() to group Q&A pairs if necessary.
# This ensures that each chunk contains complete Q&A pairs, preserving context and meaning.

# Document A: FAQ
document_A = """
Q: What is the return policy?
A: Items can be returned within 30 days of purchase with original receipt.

Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries worldwide. Shipping times vary by location.

Q: How do I track my order?
A: Use the tracking number sent to your email after shipment.
""".strip()


# 1. Strategy
strategy_A = "custom (FAQ pair chunking)"

# 2. Reason
reason_A = (
    "FAQs work best when each Q&A pair is kept as a single chunk. "
    "We first split the document into FAQ items, then we apply a chunking "
    "function if further grouping is needed."
)


# 3. Implement chunking function
def chunk_list(input_list, chunk_size):
    """Splits the input_list into chunks of size chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    chunks = []
    for i in range(0, len(input_list), chunk_size):
        chunks.append(input_list[i:i + chunk_size])
    return chunks


# Convert the FAQ into list items (each Q/A pair is one element).
# Pairs are separated by a blank line, so split on "\n\n" rather than "\n";
# splitting on "\n" would put each question and each answer in its own item.
faq_items = [item.strip() for item in document_A.split("\n\n") if item.strip()]

# Now chunk the FAQ list into desired size (e.g., chunks of 1 Q&A pair)
chunks_A = chunk_list(faq_items, chunk_size=1)

strategy_A, reason_A, chunks_A

('custom (FAQ pair chunking)',
 'FAQs work best when each Q&A pair is kept as a single chunk. We first split the document into FAQ items, then we apply a chunking function if further grouping is needed.',
 [['Q: What is the return policy?\nA: Items can be returned within 30 days of purchase with original receipt.'],
  ['Q: Do you offer international shipping?\nA: Yes, we ship to over 50 countries worldwide. Shipping times vary by location.'],
  ['Q: How do I track my order?\nA: Use the tracking number sent to your email after shipment.']])
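
A quick way to confirm that a chunker kept each pair together is to check that a chunk's text holds both a question line and an answer line. The following is only a minimal sketch, assuming the "Q:" / "A:" prefixes used in Document A. The helper `is_complete_pair` is hypothetical and not part of the exercise code.

```python
def is_complete_pair(chunk_text):
    """Return True if the text starts with a 'Q:' line and has a later 'A:' line."""
    lines = [line.strip() for line in chunk_text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith("Q:"):
        return False
    return any(line.startswith("A:") for line in lines[1:])


# A complete pair from Document A, and the same question on its own.
complete = (
    "Q: How do I track my order?\n"
    "A: Use the tracking number sent to your email after shipment."
)
question_only = "Q: How do I track my order?"

complete_ok = is_complete_pair(complete)
question_only_ok = is_complete_pair(question_only)
```

Here `complete_ok` is True and `question_only_ok` is False, so a check like `all(is_complete_pair(c) for c in chunk_texts)` would flag any chunk that lost its answer.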