## Building Document Q & A With Ollama and Docling

1. Setting up Environment
2. Importing Packages
3. Understanding Document Format
4. Converting PDF into Markdown
5. Setting Q & A Chain
6. Function to Invoke QA Chain
7. Testing Sample One
8. Testing Sample Two

### Setting up Environment


*!ollama pull granite3.1-moe:latest* \
*!ollama pull nomic-embed-text* \
*!pip install -q "langchain>=0.1.0" "langchain-community>=0.0.13" "langchain-core>=0.1.17" "langchain-ollama>=0.0.1" "pdfminer.six>=20221105" "markdown>=3.5.2" "docling>=2.0.0" "beautifulsoup4>=4.12.0" "unstructured>=0.12.0" "chromadb>=0.4.22" "faiss-cpu>=1.7.4"*


### Importing Packages

In [19]:
# Required imports

import os

import tempfile

import shutil

from pathlib import Path

from IPython.display import Markdown, display



# Docling imports

from docling.datamodel.base_models import InputFormat

from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractCliOcrOptions

from docling.document_converter import DocumentConverter, PdfFormatOption, WordFormatOption, SimplePipeline



# LangChain imports

from langchain_community.document_loaders import UnstructuredMarkdownLoader

from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain_ollama import OllamaEmbeddings, OllamaLLM

from langchain_community.vectorstores import FAISS

from langchain.chains import ConversationalRetrievalChain

from langchain.memory import ConversationBufferMemory

### Understanding Document Format

In [20]:
def get_document_format(file_path) -> InputFormat:

    """Determine the document format based on file extension"""

    try:

        file_path = str(file_path)

        extension = os.path.splitext(file_path)[1].lower()

        format_map = {

            '.pdf': InputFormat.PDF,

            '.docx': InputFormat.DOCX,

            '.doc': InputFormat.DOCX,  # Legacy binary .doc may not parse as DOCX; converting to .docx first is safer

            '.pptx': InputFormat.PPTX,

            '.html': InputFormat.HTML,

            '.htm': InputFormat.HTML

        }

        return format_map.get(extension, None)

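    except Exception as e:

        # Report the failure and return None so callers can treat it as "unsupported"

        print(f"Error in get_document_format: {e}")

        return None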
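As a quick illustration of the extension lookup above, the following sketch reproduces the same mapping with a stand-in enum so that it runs without Docling installed. `DocFormat`, `FORMAT_MAP`, and `detect_format` are hypothetical names used only for this example; the notebook itself uses Docling's `InputFormat`.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class DocFormat(Enum):
    """Stand-in for docling's InputFormat (hypothetical, for illustration only)."""
    PDF = "pdf"
    DOCX = "docx"
    PPTX = "pptx"
    HTML = "html"


# Same idea as format_map in the notebook: lowercase extension -> format
FORMAT_MAP = {
    ".pdf": DocFormat.PDF,
    ".docx": DocFormat.DOCX,
    ".pptx": DocFormat.PPTX,
    ".html": DocFormat.HTML,
    ".htm": DocFormat.HTML,
}


def detect_format(file_path) -> Optional[DocFormat]:
    """Return the format for a path, or None when the extension is unknown."""
    return FORMAT_MAP.get(Path(str(file_path)).suffix.lower())


print(detect_format("Report.PDF"))   # extension matching is case-insensitive
print(detect_format("notes.txt"))    # unknown extensions map to None
```

Returning `None` for unknown extensions keeps the later `if doc_format:` check simple, because every supported format is truthy and everything else is not.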

### Converting PDF into Markdown

In [21]:
def convert_document_to_markdown(doc_path) -> str:

    """Convert document to markdown using simplified pipeline"""

    try:

        # Convert to absolute path string

        input_path = os.path.abspath(str(doc_path))

        print(f"Converting document: {doc_path}")



        # Create temporary directory for processing

        with tempfile.TemporaryDirectory() as temp_dir:

            # Copy input file to temp directory

            temp_input = os.path.join(temp_dir, os.path.basename(input_path))

            shutil.copy2(input_path, temp_input)



            # Configure pipeline options

            pipeline_options = PdfPipelineOptions()

            pipeline_options.do_ocr = False # Disable OCR temporarily

            pipeline_options.do_table_structure = True



            # Create converter with minimal options

            converter = DocumentConverter(

                allowed_formats=[

                    InputFormat.PDF,

                    InputFormat.DOCX,

                    InputFormat.HTML,

                    InputFormat.PPTX,

                ],

                format_options={

                    InputFormat.PDF: PdfFormatOption(

                        pipeline_options=pipeline_options,

                    ),

                    InputFormat.DOCX: WordFormatOption(

                        pipeline_cls=SimplePipeline

                    )

                }

            )



            # Convert document

            print("Starting conversion...")

            conv_result = converter.convert(temp_input)



            if not conv_result or not conv_result.document:

                raise ValueError(f"Failed to convert document: {doc_path}")



            # Export to markdown

            print("Exporting to markdown...")

            md = conv_result.document.export_to_markdown()



            # Create output path

            output_dir = os.path.dirname(input_path)

            base_name = os.path.splitext(os.path.basename(input_path))[0]

            md_path = os.path.join(output_dir, f"{base_name}_converted.md")



            # Write markdown file

            print(f"Writing markdown to: {base_name}_converted.md")

            with open(md_path, "w", encoding="utf-8") as fp:

                fp.write(md)



            return md_path

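    except Exception as e:

        # Re-raise with context instead of returning an error string as if it were a path

        raise RuntimeError(f"Error converting document {doc_path}: {e}") from e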
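The output-path step in the function above (a `_converted.md` file written next to the input) can be sketched on its own with `pathlib`. `markdown_output_path` is a hypothetical helper name, and unlike the notebook code this sketch does not resolve the path to an absolute one first.

```python
from pathlib import Path


def markdown_output_path(doc_path) -> Path:
    """Return '<stem>_converted.md' in the same directory as the input document."""
    src = Path(doc_path)
    return src.with_name(f"{src.stem}_converted.md")


print(markdown_output_path("/tmp/docs/ml-interview.pdf"))
```

Keeping the Markdown beside the source document makes it easy to inspect what the converter produced before indexing it.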

### Setting Q & A Chain

In [22]:
def setup_qa_chain(markdown_path: Path, embeddings_model_name: str = "nomic-embed-text:latest", model_name: str = "granite3.1-moe:latest"):

    """Set up the QA chain for document processing"""

    # Load and split the document

    loader = UnstructuredMarkdownLoader(str(markdown_path))

    documents = loader.load()



    text_splitter = RecursiveCharacterTextSplitter(

        chunk_size=1000,

        chunk_overlap=200,

        length_function=len,

    )

    texts = text_splitter.split_documents(documents)



    # Create embeddings and vector store

    embeddings = OllamaEmbeddings(model=embeddings_model_name)

    vectorstore = FAISS.from_documents(texts, embeddings)



    # Initialize LLM

    llm = OllamaLLM(

        model=model_name,

        temperature=0

    )



    # Set up conversation memory

    memory = ConversationBufferMemory(

        memory_key="chat_history",

        output_key="answer",

        return_messages=True

    )



    # Create the chain

    qa_chain = ConversationalRetrievalChain.from_llm(

        llm=llm,

        retriever=vectorstore.as_retriever(search_kwargs={"k": 10}),

        memory=memory,

        return_source_documents=True

    )



    return qa_chain
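To make the `chunk_size=1000` and `chunk_overlap=200` settings above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph and line breaks; `sliding_window_chunks` is a hypothetical name for this illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print([len(c) for c in sliding_window_chunks("x" * 2500)])  # → [1000, 1000, 900]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.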

### Function to Invoke QA Chain

In [23]:
def ask_question(qa_chain, question: str):

    """Ask a question and display the answer"""

    result = qa_chain.invoke({"question": question})

    display(Markdown(f"**Question:** {question}\n\n**Answer:** {result['answer']}"))
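Because the chain is created with `return_source_documents=True`, the result dictionary also carries a `source_documents` list alongside `answer`. The sketch below shows one possible way to list those sources; `summarize_sources` and `FakeDocument` are hypothetical names, and `FakeDocument` only mimics the `page_content` and `metadata` attributes of a LangChain document so the example runs without a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes read below (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result: dict, max_chars: int = 80) -> list:
    """Build one short line per retrieved chunk: index, source file, text snippet."""
    lines = []
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")
        snippet = " ".join(doc.page_content.split())[:max_chars]  # collapse whitespace
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


demo = {
    "answer": "demo answer",
    "source_documents": [
        FakeDocument("Salted   hashes\nprevent rainbow tables.", {"source": "demo.md"})
    ],
}
for line in summarize_sources(demo):
    print(line)
```

Printing the retrieved chunks next to each answer is a cheap way to check whether a claim in the answer is actually supported by the document.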

### Testing Sample One

In [24]:
# Process a document

doc_path = Path("/home/mayur/Downloads/System Design Playbook.pdf") # Replace with your document path

# Check format and process

doc_format = get_document_format(doc_path)

if doc_format:

    md_path = convert_document_to_markdown(doc_path)

    qa_chain = setup_qa_chain(md_path)



    # Example questions

    questions = [

        "What is the main topic of this document?",

        "What are the key points discussed?",

        "Can you summarize the conclusions?"

    ]



    for question in questions:

        ask_question(qa_chain, question)

else:

    print(f"Unsupported document format: {doc_path.suffix}")

2025-10-11 15:10:20,668 - INFO - detected formats: [<InputFormat.PDF: 'pdf'>]
2025-10-11 15:10:20,674 - INFO - Going to convert document batch...
2025-10-11 15:10:20,675 - INFO - Initializing pipeline for StandardPdfPipeline with options hash 60c8066c482b9239b869b997da3fb1da
2025-10-11 15:10:20,676 - INFO - Accelerator device: 'cuda:0'


Converting document: /home/mayur/Downloads/System Design Playbook.pdf
Starting conversion...


2025-10-11 15:10:22,390 - INFO - Accelerator device: 'cuda:0'
2025-10-11 15:10:22,759 - INFO - Processing document System Design Playbook.pdf
2025-10-11 15:10:25,777 - INFO - Finished converting document System Design Playbook.pdf in 5.11 sec.


Exporting to markdown...
Writing markdown to: System Design Playbook_converted.md


2025-10-11 15:10:27,173 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-10-11 15:10:27,283 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-10-11 15:10:29,753 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


**Question:** What is the main topic of this document?

**Answer:** The main topic of this document is system design, focusing on various aspects such as secure password storage, microfrontends, Amazon S3 features, JWT usage, and payment systems like Apple Pay. It covers topics including how databases store passwords securely, the workings of Figma in scaling Postgres to 4 million users, DNS functionality, JWT implementation, microfrontends architecture, Amazon S3's durability mechanisms, Stripe's prevention of double payments, and Apple Pay's credit card handling process without storing sensitive details on devices.

2025-10-11 15:10:30,990 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
2025-10-11 15:10:31,135 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-10-11 15:10:32,016 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


**Question:** What are the key points discussed?

**Answer:** This document primarily discusses several key aspects of system design, including:

1. **Password Storage and Security**: The use of hash functions for password storage and the importance of HTTPS for secure transmission (How Database Stores Passwords Securely).

2. **Micro Frontends Architecture**: Explanation of how microfrontends extend the microservices concept to frontend development, enabling independent teams to own features from backend to frontend (How Micro Frontends Work).

3. **Amazon S3 Durability and Data Replication**: Details about Amazon S3's architecture for achieving 99.999999999% durability through separate storage of metadata and file content, erasure coding, and other measures (How Amazon S3 Achieves 99.999999999% Durability).

4. **JWT Authentication**: The process of creating and verifying JWTs for secure communication between parties (How JWT Works).

5. **Database Design and Security**: Discussion on how databases store only the hashed password, not the actual password, and the use of salt to prevent rainbow table attacks (Database Stores Passwords Securely).

6. **Microservices and Domain-Driven Design**: Explanation of how microservices architecture is applied in frontend development, along with the benefits of domain-driven design principles (How Micro Frontends Work).

7. **Amazon S3 Features**: Description of Amazon S3's features like versioning, backups, and high availability for scalability and data integrity (How Amazon S3 Works).

8. **Uber's ETA Calculation**: The method used by Uber to compute estimated times of arrival using a graph-based approach (How Uber Computes ETA).

9. **DNS and Caching**: Explanation of the DNS workflow, including cache mechanisms at various levels (How DNS Works).

10. **JWT Security Measures**: Discussion on additional security measures implemented in JWTs to prevent unauthorized access or data manipulation (How JWT Works).

11. **Apple Pay and Secure Element**: Description of how Apple Pay securely handles credit card details by storing the unique number, DAN, in a secure element chip (Apple Pay Works).

12. **Figma's Scalability Strategies**: Overview of Figma's strategies for scaling Postgres to support 4 million users, including vertical scaling, database replicas, caching layers, and connection pooling (How Figma Scaled).

2025-10-11 15:10:37,747 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
2025-10-11 15:10:37,920 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-10-11 15:10:38,820 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


**Question:** Can you summarize the conclusions?

**Answer:** This document provides insights into various aspects of system design, focusing on secure password storage, microservices architecture (Micro Frontends), Amazon S3 durability and performance, JWT authentication, DNS, database optimization, and more. Here are the main takeaways:

1. **Secure Password Storage**: Use a one-way hash function to transform passwords into fingerprints for storage in databases. Implement HTTPS for secure transmission of JWTs, set minimum roles and expiry times to minimize damage from stolen credentials, and maintain a denial list on the server to reject suspicious tokens.

2. **Micro Frontends**: Extend microservices concept to frontend development by slicing site's frontend into self-contained, domain-driven microapps. Utilize technology agnostic frameworks like React or Angular, set boundaries based on business value, and support autonomous teams with shared libraries for code reuse and visual consistency.

3. **Amazon S3 Durability**: Store metadata separately from file content to achieve 99.999999999% durability using erasure coding, replication, and free space management. Regularly monitor disk failure rates and maintain physical isolation of data centers for high recovery throughput.

4. **JWT Authentication**: Represent the physical map as a graph and compute ETA by finding shortest paths in directed weighted graphs. Use cryptographic methods to validate location information sent via Bluetooth or Ultra Wideband, and handle DAN (Device Account Number) securely on the iPhone.

5. **DNS and JWT**: DNS maps pre-computed fingerprints to passwords, while JWT uses a signature for authentication. Ensure HTTPS for secure transmission of JWTs and set minimum roles and expiry times to protect against stolen credentials.

6. **Database Optimization**: Store only hashed password fingerprints in the database; regenerate and compare fingerprints when users enter new passwords. Add salt to passwords before hashing to prevent rainbow table attacks, and store salts alongside their corresponding fingerprints for verification.

7. **Microservices Architecture (Micro Frontends)**: Extend microservices concept to frontend development by slicing site's frontend into self-contained, domain-driven microapps. Utilize technology agnostic frameworks like React or Angular, set boundaries based on business value, and support autonomous teams with shared libraries for code reuse and visual consistency.

8. **Amazon S3**: Store unstructured data like log files in an object store using a distributed architecture to achieve high scalability and availability. Implement versioning and backups to reduce user mistakes, deploy changes only through durability reviews, and monitor disk failure rates for efficient repair services.

9. **Uber's Finding Nearby Drivers**: Use a hexagonal-shaped hierarchical geospatial index (H3) to efficiently find nearby drivers by indexing locations. Divide Earth's surface into cells on a flat grid and use bitwise operations to switch between data resolutions in constant time, supporting different data resolutions with minimal impact on performance.

10. **Apple Pay**: Don't store credit card details on the iPhone or Apple servers; instead, send it to the payment network. The payment network creates a unique number (DAN) to represent the credit card and iPhone. The iPhone stores DAN in a secure element for security, validating transactions by regenerating cryptograms using its DAN copy.

11. **Apple AirTag**: An AirTag contains a low-power CPU and memory. It communicates via Bluetooth Low Energy instead of GPS or WiFi; location information is sent to the payment network as a cryptogram and transaction details. The iPhone validates transactions by regenerating the cryptogram using its DAN copy before sending it to the card reader.

12. **Apple's Achieving 99.99999999% Cache Consistency**: Use a distributed cache for scalable reads, implement exponential back-off with jitter to avoid the thundering herd problem, and remove idempotency keys from the database after 24 hours for reuse.

13. **Uber's Finding Nearby Drivers**: Uber uses a consistent hash ring for container image indexing, reducing cold start latency by 90%. They also find shared data between container image layers for optimal data delivery and lazy load containers to reduce cold start latency further.

### Testing Sample Two

In [25]:
# Process a document

doc_path = Path("/home/mayur/Downloads/ai-pdf/ml-interview.pdf") # Replace with your document path

# Check format and process

doc_format = get_document_format(doc_path)

if doc_format:

    md_path = convert_document_to_markdown(doc_path)

    qa_chain = setup_qa_chain(md_path)



    # Example questions

    questions = [

        "what is the concept of temperature in LLM?"

    ]



    for question in questions:

        ask_question(qa_chain, question)

else:

    print(f"Unsupported document format: {doc_path.suffix}")

2025-10-11 15:12:51,492 - INFO - detected formats: [<InputFormat.PDF: 'pdf'>]
2025-10-11 15:12:51,513 - INFO - Going to convert document batch...
2025-10-11 15:12:51,515 - INFO - Initializing pipeline for StandardPdfPipeline with options hash 60c8066c482b9239b869b997da3fb1da
2025-10-11 15:12:51,515 - INFO - Accelerator device: 'cuda:0'


Converting document: /home/mayur/Downloads/ai-pdf/ml-interview.pdf
Starting conversion...


2025-10-11 15:12:53,561 - INFO - Accelerator device: 'cuda:0'
2025-10-11 15:12:53,949 - INFO - Processing document ml-interview.pdf
2025-10-11 15:13:00,334 - INFO - Finished converting document ml-interview.pdf in 8.84 sec.


Exporting to markdown...
Writing markdown to: ml-interview_converted.md


2025-10-11 15:13:00,984 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-10-11 15:13:01,098 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-10-11 15:13:01,752 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"


**Question:** what is the concept of temperature in LLM?

**Answer:** Temperature in Large Language Models (LLMs) refers to a hyperparameter that controls the randomness of text generation by adjusting the probability distribution over possible next tokens. A low temperature (close to 0) makes the model highly deterministic, favoring the most probable tokens. Conversely, a high temperature (above 1) encourages more diversity by flattening the distribution, allowing less probable tokens to be selected. For instance, a temperature of 0.7 strikes a balance between creativity and coherence, making it suitable for generating diverse but sensible outputs.

In summary, lower temperatures result in more predictable and consistent text generation, while higher temperatures introduce more variability and randomness into the model's output. This concept is crucial in LLM text generation as it allows users to control the level of creativity and diversity in the generated text.
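The answer above can be made concrete with a small sketch of temperature-scaled softmax, the usual way temperature reshapes next-token probabilities. The logits are made-up numbers for illustration, and `softmax_with_temperature` is a hypothetical helper rather than part of Ollama or LangChain.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # A temperature of exactly 0 is usually handled as greedy argmax instead
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher ones spread it out. The notebook sets `temperature=0` on `OllamaLLM`, which aims for the most deterministic answers.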