#### **Propositional Chunking**

Propositional chunking splits text into small, self-contained factual statements (propositions) instead of arbitrary fixed-size blocks.

Conventional chunking forms chunks from a fixed size (characters or tokens) plus an overlap between consecutive chunks.

This can cut a sentence, or even a word, in the middle; the overlap softens the damage but does not remove it.

But what if each chunk could be a factual, self-contained proposition?
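The problem with fixed-size chunks is easy to see with a naive character-based chunker. The function below is a minimal sketch: `fixed_size_chunks` is a hypothetical name, and the notebook itself uses LangChain's `RecursiveCharacterTextSplitter`, which tries to split on separators first. Each window holds `chunk_size` characters and overlaps the previous one by `chunk_overlap` characters.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive chunker: windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return windows


demo_text = "Paul Graham published Founder Mode in 2024. Brian Chesky runs Airbnb."
for window in fixed_size_chunks(demo_text, chunk_size=30, chunk_overlap=10):
    print(repr(window))
```

With these settings the second window starts with `'d Founder'`, the tail of "published", and the overlapping text `'d Founder '` appears in both of the first two windows. The overlap carries a little context across the boundary, but no window holds one clean fact.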

#### **Benefits of Propositional Chunking**

- **Granularity** : By breaking the document into small factual propositions, the system allows for highly specific retrieval, making it easier to extract precise answers from large or complex documents.

- **Quality Assurance** : A quality-checking LLM grades each generated proposition against specific criteria, so weak propositions can be spotted (and filtered out) before indexing, improving the reliability of the retrieved information.

- **Flexibility in Retrieval** : The comparison between proposition-based and larger chunk-based retrieval allows for evaluating the trade-offs between granularity and broader context in search results.

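#### **Workflow**

- Load the document
- Split it into normal chunks (RecursiveCharacterTextSplitter)
- Convert each chunk into propositions with an LLM (propositional chunking)
- Grade the propositions with a quality-checking LLM
- Create a vector DB (FAISS) and a retriever
- Pass the query and the retrieved context to the LLM
- Evaluate the responses for `propositional chunking` and `normal chunking`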

In [1]:
## sample text: a summary of Paul Graham's "Founder Mode" essay
sample_content = """Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.
This approach, suitable for established companies, can be detrimental to startups where the founder's vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.
Unique Founder Abilities
Founders possess unique insights and abilities that professional managers do not, primarily because they have a deep understanding of their company's vision and culture.
Graham suggests that founders should leverage these strengths rather than conform to traditional managerial practices. "Founder Mode" is an emerging paradigm that is not yet fully understood or documented, with Graham hoping that over time, it will become as well-understood as the traditional manager mode, allowing founders to maintain their unique approach even as their companies scale.
Challenges of Scaling Startups
As startups grow, there is a common belief that they must transition to a more structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile spirit that drove the startup's initial success.
Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.
Steve Jobs' Management Style
Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companies' operations, challenging the traditional notion of delegating responsibilities to professional managers as companies scale.
"""

#### LLM used 
ChatOllama (llama3.2)

In [2]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model='llama3.2',
    temperature=0,
    verbose=True
)

llm.invoke("Hey What are you doing right now?")

AIMessage(content='I\'m just a language model, I don\'t have personal experiences or emotions like humans do. However, I am currently:\n\n1. Processing your question and generating a response.\n2. Waiting for any additional input from you to continue our conversation.\n3. Running on computer servers, responding to queries from users like you.\n\nIn other words, I\'m always "on" and ready to help with any questions or topics you\'d like to discuss! How about you? What\'s new with you today?', additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-12-04T13:50:17.668565Z', 'done': True, 'done_reason': 'stop', 'total_duration': 27525745708, 'load_duration': 3804161916, 'prompt_eval_count': 33, 'prompt_eval_duration': 13287917291, 'eval_count': 101, 'eval_duration': 7353193249, 'logprobs': None, 'model_name': 'llama3.2', 'model_provider': 'ollama'}, id='lc_run--51091f05-9383-43b5-b951-bd54770d0b7f-0', usage_metadata={'input_tokens': 33, 'output_tokens': 101, 'tota

#### Embedding Model 
HuggingFace Sentence Transformers (`all-MiniLM-L6-v2`)

In [60]:
from langchain_huggingface import HuggingFaceEmbeddings 

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

text = "This is a test document."
query_result = embedding_model.embed_query(text)

print(f"Dimension of embeddings : {len(query_result)}")
# show only the first 100 characters of the stringified vector
print(str(query_result)[:100] + "...")

Dimension of embeddings : 384
[-0.0383385606110096, 0.1234646886587143, -0.02864295430481434, 0.05365273356437683, 0.0088453618809...
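Retrieval compares these 384-dimensional vectors by distance, and the FAISS index created later uses `IndexFlatL2`, i.e. Euclidean distance. The stdlib sketch below (toy 3-d vectors standing in for real embeddings) shows why that works for similarity search when embeddings are unit-length: for unit vectors the squared L2 distance equals `2 - 2 * cosine similarity`, so both measures rank neighbours the same way.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two toy vectors standing in for 384-dimensional embeddings
u = l2_normalize([0.3, -0.1, 0.8])
v = l2_normalize([0.2, 0.0, 0.9])

# For unit vectors ||u - v||^2 == 2 - 2 * cos(u, v), up to floating-point error
print(squared_l2(u, v), 2 - 2 * cosine_similarity(u, v))
```

Whether the `all-MiniLM-L6-v2` embeddings come out unit-length depends on the model configuration; an easy check in the notebook is `math.sqrt(sum(x * x for x in query_result))`, which should be close to 1.0 if they are.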


In [4]:
## Wrap the sample text in a LangChain Document and attach some metadata
from langchain_core.documents import Document

doc_list = [Document(page_content=sample_content, metadata={"Title": "Paul Graham's Founder Mode Essay", "Source": "https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ"})]

print(doc_list)

[Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ'}, page_content='Paul Graham\'s essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.\nConventional Wisdom vs. Founder Mode\nThe essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.\nThis approach, suitable for established companies, can be detrimental to startups where the founder\'s vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.\nUniq

#### Make Chunks (RecursiveCharacterTextSplitter)

In [6]:
## text splitter from langchain
from pprint import pprint
from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap are counted in tiktoken tokens here, not characters
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=200, chunk_overlap=20)

## chunks 
chunks = text_splitter.split_documents(doc_list)
pprint(f"Created Chunks are : {chunks}")

('Created Chunks are : [Document(metadata={\'Title\': "Paul Graham\'s Founder '
 'Mode Essay", \'Source\': '
 "'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ'}, "
 'page_content=\'Paul Graham\\\'s essay "Founder Mode," published in September '
 '2024, challenges conventional wisdom about scaling startups, arguing that '
 'founders should maintain their unique management style rather than adopting '
 'traditional corporate practices as their companies grow.\\nConventional '
 'Wisdom vs. Founder Mode\\nThe essay argues that the traditional advice given '
 'to growing companies—hiring good people and giving them autonomy—often fails '
 'when applied to startups.\\nThis approach, suitable for established '
 "companies, can be detrimental to startups where the founder\\'s vision and "
 'direct involvement are crucial. "Founder Mode" is presented as an emerging '
 'paradigm that is not yet fully understood or documented, contrasting with '
 'the convent

In [7]:
print(f"Total number of chunks : {len(chunks)}")

Total number of chunks : 3
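`RecursiveCharacterTextSplitter` is less naive than a fixed window: it first tries coarse separators (paragraphs, then lines, then spaces) and only falls back to finer ones when a piece is still too long, merging small pieces back up to `chunk_size`. The function below is a simplified sketch of that idea, not LangChain's implementation: `recursive_split` is a hypothetical name, it measures length in characters instead of tiktoken tokens, it ignores `chunk_overlap`, and its separator list is assumed to mirror LangChain's defaults.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive splitter (character lengths, no overlap)."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, finer = separators[0], separators[1:]
    pieces = list(text) if sep == "" else text.split(sep)
    out: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # flush what we have, then break the oversized piece with finer separators
            if current:
                out.append(current)
                current = ""
            out.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
        else:
            out.append(current)
            current = piece
    if current:
        out.append(current)
    return out


demo_lines = "Founder Mode is new.\nBrian Chesky runs Airbnb.\nSteve Jobs ran Apple."
print(recursive_split(demo_lines, 30))
print(recursive_split("alpha beta gamma delta", 11))
```

Each line fits in 30 characters on its own, so the splitter keeps whole lines, while the over-long string is broken at spaces. Even so, a 200-token chunk usually still mixes several facts, which is the gap propositional chunking targets.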


#### Creating Propositional Chunks

In [41]:
# we'll use an LLM and instruct it to create relevant self-contained chunks 
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field 
from typing import Annotated, List

class PropositionalChunks(BaseModel):
    """ 
    Converts a normal chunk to list of self-contained and factual chunks
    """
    propositional_chunks: Annotated[List[str], Field(..., description="List of self-contained and factual chunks.")]

# PydanticOutputParser parses the LLM's JSON reply into a PropositionalChunks object
parser = PydanticOutputParser(pydantic_object=PropositionalChunks)

# now we'll write a system prompt that instructs our LLM to return propositional chunks
system_prompt_chunking = """
Please break down the user specified text : documents: <docs>{documents}</docs>
into simple, self-contained propositions.

Each proposition must:
1. State a single fact.
2. Be understandable without external context.
3. Use full names, not pronouns.
4. Include dates/qualifiers where relevant.
5. Contain exactly one subject–predicate relationship.

Return the final answer STRICTLY using the JSON schema shown below.
Do NOT add explanations, notes, or extra fields.

<format_instructions>
{format_instructions}
</format_instructions>
"""

prop_chunks_prompt = PromptTemplate(
    template=system_prompt_chunking,
    input_variables=['documents'],
    partial_variables={'format_instructions' : parser.get_format_instructions()}
)

## making chain for this process 
chunking_chain = prop_chunks_prompt | llm | parser
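In `chunking_chain`, the parser has to turn the model's free-text reply into a `PropositionalChunks` object. The stdlib function below is a rough sketch of that step under simple assumptions (the reply contains one JSON object with a `propositional_chunks` list of strings); it is not how `PydanticOutputParser` is implemented, and `parse_propositions` is a hypothetical helper.

```python
import json


def parse_propositions(llm_text: str) -> list[str]:
    """Find the JSON object in a model reply and check it has a list of strings."""
    start, end = llm_text.find("{"), llm_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model output")
    data = json.loads(llm_text[start:end + 1])  # raises a ValueError subclass on bad JSON
    props = data.get("propositional_chunks") if isinstance(data, dict) else None
    if not isinstance(props, list) or not all(isinstance(p, str) for p in props):
        raise ValueError("'propositional_chunks' must be a list of strings")
    return props


# A hypothetical model reply with some chatter around the JSON
raw_reply = 'Sure!\n{"propositional_chunks": ["Paul Graham\'s essay \'Founder Mode\' was published in September 2024."]}'
print(parse_propositions(raw_reply))
```

Small local models such as llama3.2 may not always follow the format instructions, so when the real parser rejects a reply, wrapping `chunking_chain.invoke` in a retry is a reasonable safeguard.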


In [42]:
## let's try the propositional chunking chain on the first chunk
prop_output_chunks = chunking_chain.invoke({'documents' : chunks[0].page_content})

print(prop_output_chunks)

propositional_chunks=["Paul Graham's essay 'Founder Mode' was published in September 2024.", 'The traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.', 'Founders possess unique insights and abilities that professional managers do not.']


In [48]:
prop_output_chunks.propositional_chunks

["Paul Graham's essay 'Founder Mode' was published in September 2024.",
 'The traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.',
 'Founders possess unique insights and abilities that professional managers do not.']

In [54]:
# attach metadata to each proposition: the source Title/Source plus the id of the chunk it came from
chunks_list = []

for i, chunk in enumerate(chunks):
    # split this chunk into propositions with the LLM
    prop_chunks = chunking_chain.invoke({'documents' : chunk.page_content})
    for prop_chunk in prop_chunks.propositional_chunks:
        # copy the parent's metadata so propositions don't share (or modify) the same dict
        metadata = {**chunk.metadata, 'chunk_id': i + 1}
        chunks_list.append(Document(page_content=prop_chunk, metadata=metadata))


In [55]:
chunks_list

[Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}, page_content="Paul Graham's essay 'Founder Mode' was published in September 2024."),
 Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}, page_content='The traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.'),
 Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}, page_content='Founders possess unique insights and abilities that professional managers do not.'),
 Document(metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-m
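When attaching `chunk_id`, note that `metadata = chunk.metadata` binds a second name to the same dictionary instead of copying it: writing `metadata['chunk_id']` then also changes the parent chunk's metadata, and every proposition built from that dict shares one object. The toy dictionaries below show the difference; `{**parent, "chunk_id": 1}` makes a shallow copy, which is enough here because the values are plain strings and ints.

```python
# Aliasing: both names refer to the same dict
base = {"Title": "Paul Graham's Founder Mode Essay"}
shared = base
shared["chunk_id"] = 1
print("chunk_id" in base)  # True: the "parent" metadata changed as well

# Copying: each proposition gets its own dict with the extra key
parent = {"Title": "Paul Graham's Founder Mode Essay"}
prop_metadata = [{**parent, "chunk_id": 1} for _ in range(3)]
prop_metadata[0]["chunk_id"] = 99
print("chunk_id" in parent, [m["chunk_id"] for m in prop_metadata])  # False [99, 1, 1]
```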

#### **Building a Quality-Check System for the Propositional Chunks**

In [57]:
from pydantic import BaseModel, Field 
from typing import Annotated 
from langchain_core.prompts import PromptTemplate
# Define the data schema for LLM output
class QualityPropChunks(BaseModel):
    """ 
    Grade a propositional chunk on its accuracy, clarity, completeness, and conciseness.
    """

    accuracy: Annotated[int, Field(description="Score from 1 to 10: how well the proposition reflects the original text.")]
    clarity: Annotated[int, Field(description="Score from 1 to 10: how easy the proposition is to understand without additional context.")]
    completeness: Annotated[int, Field(description="Score from 1 to 10: whether the proposition is complete in itself.")]
    conciseness: Annotated[int, Field(description="Score from 1 to 10: whether the proposition is concise without losing important information.")]

# configuring LLM with schema
llm_quality_check = llm.with_structured_output(QualityPropChunks)

# Quality-check prompt
evaluation_prompt_template = """
Please evaluate the following proposition based on the criteria below:
- **Accuracy**: Rate from 1-10 based on how well the proposition reflects the original text.
- **Clarity**: Rate from 1-10 based on how easy it is to understand the proposition without additional context.
- **Completeness**: Rate from 1-10 based on whether the proposition includes necessary details (e.g., dates, qualifiers).
- **Conciseness**: Rate from 1-10 based on whether the proposition is concise without losing important information.

Example:
Docs: In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission.

Proposition_1: Neil Armstrong was an astronaut.
Evaluation_1: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_2: Neil Armstrong walked on the Moon in 1969.
Evaluation_2: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_3: Neil Armstrong was the first person to walk on the Moon.
Evaluation_3: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_4: Neil Armstrong walked on the Moon during the Apollo 11 mission.
Evaluation_4: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_5: The Apollo 11 mission occurred in 1969.
Evaluation_5: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Format:
Proposition: "{proposition}"
Original Text: "{original_text}"
"""

# making the prompt
quality_check_prompt = PromptTemplate(
    template=evaluation_prompt_template,
    input_variables=['proposition', 'original_text']
)

# chain to check quality
quality_check_chain = quality_check_prompt | llm_quality_check 
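`input_variables` should name exactly the placeholders in the template. LangChain may derive the variables from the template text itself, so a misspelling such as `'propositon'` can slip through without any error. A quick stdlib check with `string.Formatter().parse` lists the placeholders; `template_fields` is a hypothetical helper and `demo_template` a shortened stand-in for the evaluation prompt.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the names of the {placeholders} in a str.format-style template."""
    # parse() yields (literal_text, field_name, format_spec, conversion); field_name is None for plain text
    return {field for _, field, _, _ in Formatter().parse(template) if field}


demo_template = 'Proposition: "{proposition}"\nOriginal Text: "{original_text}"'
declared = {"propositon", "original_text"}  # note the misspelling

print(template_fields(demo_template) - declared)  # {'proposition'}: in the template but not declared
print(declared - template_fields(demo_template))  # {'propositon'}: declared but never used
```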

In [59]:
## making the validation loop 
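for chunk in chunks_list:
    proposition = chunk 
    # look up the original chunk this proposition was generated from
    actual_chunk = chunks[proposition.metadata['chunk_id']-1]
    # pass plain text rather than Document objects so the grader sees only the content
    response = quality_check_chain.invoke({'proposition' : proposition.page_content, 'original_text' : actual_chunk.page_content})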
    print(f"Prop Chunk : {proposition}")
    print(f"Scores : {response}")
    print("-"*89)

Prop Chunk : page_content='Paul Graham's essay 'Founder Mode' was published in September 2024.' metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}
Scores : accuracy=8 clarity=6 completeness=4 conciseness=2
-----------------------------------------------------------------------------------------
Prop Chunk : page_content='The traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.' metadata={'Title': "Paul Graham's Founder Mode Essay", 'Source': 'https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ', 'chunk_id': 1}
Scores : accuracy=8 clarity=6 completeness=4 conciseness=2
-----------------------------------------------------------------------------------------
Prop Chunk : page_content='Founders possess unique insights and abilities that professional managers do not.' 
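The validation loop prints scores but keeps every proposition. To get the quality-assurance benefit described at the top, low-scoring propositions can be dropped before indexing. A minimal sketch, assuming a cut-off of 7 on every criterion (an arbitrary choice); the 10s come from the prompt's Neil Armstrong example and the 8/6/4/2 scores from the loop output above.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Same four fields as QualityPropChunks, as plain ints."""
    accuracy: int
    clarity: int
    completeness: int
    conciseness: int


def passes(scores, threshold: int = 7) -> bool:
    """True if every criterion reaches the threshold (7 is an assumed cut-off)."""
    return min(scores.accuracy, scores.clarity, scores.completeness, scores.conciseness) >= threshold


graded = [
    ("Neil Armstrong walked on the Moon in 1969.", Scores(10, 10, 10, 10)),
    ("The traditional advice given to growing companies—hiring good people and giving them "
     "autonomy—often fails when applied to startups.", Scores(8, 6, 4, 2)),
]
kept = [prop_text for prop_text, s in graded if passes(s)]
print(kept)
```

Because `passes` only reads the four attributes, it should also accept the `QualityPropChunks` objects returned by `quality_check_chain`, so `chunks_list` could be rebuilt from only the propositions where `passes(response)` is true.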

#### **Create a vector_store and retriever**

In [61]:
import faiss 
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

index = faiss.IndexFlatL2(len(embedding_model.embed_query("hello world")))

vector_store = FAISS(
    embedding_function=embedding_model, 
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

vector_store.add_documents(documents=chunks_list)

['58bff5f1-ccef-4fbe-8005-22d207125c84',
 '79e70d68-29b0-4a85-9026-2e5fdc291c67',
 '28d1ff16-fd4a-47e5-a6a7-842e92a4505a',
 '9980dd0c-1d42-4ef4-a7f9-ecd530ccdee2',
 '7c0bc5dd-2975-4d16-b953-ab633fd7d655',
 '2f6add09-6149-4192-991c-b8fe5cadcdfa',
 '879bb74d-b4b3-4b7f-83ea-d4e2f1fbdccd',
 '3474f6f9-6e99-453c-b853-1a8c4a290db4',
 '1e73ea1c-767d-40a9-a781-4b126c3a7dd0',
 '08ef6e24-7565-4c05-bc91-0d91315e2059',
 '49ee3be0-7b22-451f-afc8-8c63fbeec23e',
 '2e882d0c-ca42-42d0-ac67-33e115e8ed52',
 '6e896e05-a3f7-479e-81cb-7154c5ae1016',
 'd61eb7eb-81ac-4c67-8a31-096ffea956bc',
 '189e5fb5-48be-4e9a-9718-6de648bc1fda']
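`IndexFlatL2` is an exact, brute-force index: a search compares the query vector with every stored vector and returns the `k` nearest. The class below is a pure-Python sketch of that behaviour (hypothetical `FlatL2Index`, toy 3-d vectors); its `ids` list plays roughly the role of `index_to_docstore_id`, mapping index positions back to documents. Brute force is fine for the 15 propositions here; much larger collections may call for an approximate index.

```python
import heapq


class FlatL2Index:
    """Exact brute-force search: every query is compared against every stored vector."""

    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        """Return the k (doc_id, squared L2 distance) pairs closest to query."""
        scored = (
            (sum((q - x) ** 2 for q, x in zip(query, vec)), doc_id)
            for doc_id, vec in zip(self.ids, self.vectors)
        )
        # nsmallest keeps the k smallest distances; squared L2 ranks the same as L2
        return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


toy_index = FlatL2Index(dim=3)
toy_index.add("steve-jobs", [0.9, 0.1, 0.0])
toy_index.add("airbnb", [0.1, 0.9, 0.0])
toy_index.add("founder-mode", [0.5, 0.5, 0.1])
print(toy_index.search([0.8, 0.2, 0.0], k=2))
```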

In [63]:
## make a retriever 
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={'k': 4})

## lets retrieve some sample docs 
query = "Who's management approach served as inspiartion for Brian Chesky's \"Founder Mode\" at Airbnb?"

retrieved_docs = retriever.invoke(query)

for i, doc in enumerate(retrieved_docs):
    print(f"Content : {doc.page_content}")
    print("-"*89)

Content : Brian Chesky's "Founder Mode" at Airbnb was inspired by Steve Jobs' management approach.
-----------------------------------------------------------------------------------------
Content : Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes.
-----------------------------------------------------------------------------------------
Content : Founder Mode is an emerging paradigm that is not yet fully understood or documented, with Graham hoping that over time, it will become as well-understood as the traditional manager mode, allowing founders to maintain their unique approach even as their companies scale.
-----------------------------------------------------------------------------------------
Content : Founders possess unique insights and abilities that professional managers do not.
-----------------------------------------------------------------------------------------


#### **Making the Final LLM Call to Generate a Response**

In [68]:
from langchain_core.prompts import PromptTemplate 

final_system_prompt = """ 
    Give the query : {query} and the context : {context}.
    You need to provide a concise response for this query using the context.
    Keep the response short
"""

system_prompt = PromptTemplate(
    template=final_system_prompt,
    input_variables=['query', 'context']
)

## making the final chain
final_chain = system_prompt | llm 
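The `|` operator in `system_prompt | llm` (and in `chunking_chain` and `quality_check_chain`) composes steps so that each one's output becomes the next one's input; in LangChain the result is a runnable sequence. A toy version of the idea, with a hypothetical `Step` class and a fake "LLM" that only upper-cases its input:

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain runnable: wraps one function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


toy_prompt = Step(lambda d: f"Query: {d['query']}\nContext: {d['context']}")
fake_llm = Step(lambda text: text.upper())  # hypothetical "model" that just upper-cases its input
toy_chain = toy_prompt | fake_llm
print(toy_chain.invoke({"query": "who inspired chesky?", "context": "steve jobs"}))
```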

In [69]:
import time

if __name__ == "__main__":
    # ask the query 
    query = "Who's management approach served as inspiartion for Brian Chesky's \"Founder Mode\" at Airbnb?"
    print(f"Query : {query}")
    start = time.time()
    retrieved_chunks = retriever.invoke(query)
    mid = time.time()
    print("-"*89)
    print(f"Time to retrieve chunks : {mid-start :.2f} sec")

    context = ""
    for chunk in retrieved_chunks:
        context += chunk.page_content

    print(context) 

    response = final_chain.invoke({'query' : query, 'context' : context})
    end = time.time()
    print("-"*89)
    print(f"Time to response : {end-start :.2f} sec")
    print("-"*89)

    print(f"Response : {response.content}")

Query : Who's management approach served as inspiartion for Brian Chesky's "Founder Mode" at Airbnb?
-----------------------------------------------------------------------------------------
Time to retrieve chunks : 0.23 sec
Brian Chesky's "Founder Mode" at Airbnb was inspired by Steve Jobs' management approach.Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes.Founder Mode is an emerging paradigm that is not yet fully understood or documented, with Graham hoping that over time, it will become as well-understood as the traditional manager mode, allowing founders to maintain their unique approach even as their companies scale.Founders possess unique insights and abilities that professional managers do not.
-----------------------------------------------------------------------------------------
Time to response : 10.37 sec
------------------------------------------------------------