In [10]:
from utils import get_context, get_response

# **Document Classification using Inflection AI API**

This notebook demonstrates **document classification** with the **Inflection AI API**. The goal is to label each document as either **"Statement of Work" (SOW)** or **"Other"**. **Instruction-based prompting** constrains the model to these two labels, and the **XML output** format makes the result straightforward to parse in automation pipelines.

## **Overview**

- Utilizes **instruction-based prompting** to guide the classification process.
- Employs a **structured system prompt** for consistent and interpretable output.
- The workflow has three parts:
  - The **`inflection_3_productivity` model** analyzes the **document text** to determine whether it falls under the **SOW category** or **Other**.
  - The model returns its decision inside **XML tags**.
  - The notebook parses the **XML response** to extract the classification result for downstream processing.

Because the output is a fixed XML structure containing one of two labels, the classification result can be consumed directly by automated workflows.
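
The `get_response` helper imported from `utils` is not shown in this notebook, so its internals are unknown here. As a rough sketch of the XML-parsing step described above, the following uses the standard library's `xml.etree.ElementTree`. The function name `extract_tags` and the sample reply are illustrative assumptions, not the notebook's actual implementation.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def extract_tags(raw_reply: str, tags: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: pull the text of each requested tag out of a <parts> reply."""
    # A model may wrap the XML in extra prose, so isolate the <parts> element first.
    match = re.search(r"<parts>.*?</parts>", raw_reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <parts> element found in reply")
    root = ET.fromstring(match.group(0))
    result: Dict[str, Optional[str]] = {}
    for tag in tags:
        node = root.find(tag)
        # A missing or empty tag maps to None instead of raising.
        result[tag] = node.text.strip() if node is not None and node.text else None
    return result


# Sample reply written by hand to match the output format in the system prompt.
sample_reply = """
<parts>
    <category>statement_of_work</category>
</parts>
"""
print(extract_tags(sample_reply, ["category"]))
```

Returning `None` for a missing tag, rather than raising, lets the caller decide how strict to be about incomplete replies.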


In [2]:
system_instruction_prompt = """
You are an AI assistant designed to categorize documents into one of two categories: "statement_of_work" or "other".

# Purpose
1. You will be given a document in text format.
2. Your job is to determine whether the document is a statement_of_work or falls into the other category.

# Category Definitions
- statement_of_work: A formal document that defines the scope, deliverables, timelines, and conditions of a project or service agreement. Characteristics of statement_of_work typically include:
    - Detailed project scope and objectives
    - Key deliverables
    - Milestones and deadlines
    - Specific tasks to be completed
    - Payment terms
    - Responsibilities of the involved parties

- other: any document that does not align with the above structure and content of a statement_of_work.

# Valid Categories:
- statement_of_work
- other

# Output Format - XML
Respond using XML tags for the determined category. Do not provide an explanation or any other information; return only the category within the appropriate XML tags.

# Format of the Output:
<parts>
    <category>determined_category</category>
</parts>
"""

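The prompt above limits the answer to two labels, but a caller may still want to check the returned value before acting on it. A minimal sketch of such a check follows; `VALID_CATEGORIES` and `normalize_category` are hypothetical names introduced for illustration and are not part of `utils`.

```python
# Labels copied from the "Valid Categories" section of the system prompt.
VALID_CATEGORIES = {"statement_of_work", "other"}


def normalize_category(raw: str) -> str:
    """Hypothetical helper: return a canonical label or raise if it is not recognized."""
    # Tolerate surrounding whitespace, mixed case, and spaces in place of underscores.
    cleaned = raw.strip().lower().replace(" ", "_")
    if cleaned not in VALID_CATEGORIES:
        raise ValueError(f"unexpected category: {raw!r}")
    return cleaned
```

Raising on an unknown label keeps a malformed reply from silently flowing into downstream processing.
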
## Test Scenario: Classification of Text in Documents

In [5]:
class color:
    BOLD = '\033[1m'
    END = '\033[0m'

In [6]:
async def test_categorize_document():
    print("Starting test: test_categorize_document")
    print("+*"*20)

    document_text = """
    Project Alpha

    Scope and Objectives: Develop a customer management platform for small businesses with user-friendly interfaces.

    Deliverables:

        - Functional web application.

        - Deployment documentation.

    Milestones:

        - Prototype by March 15, 2025.

        - Final delivery by May 30, 2025.

    Payment Terms:

        - 50% upfront.

        - 50% on final delivery.

    Responsibilities:

        Client: Provide requirements and feedback.

        Contractor: Deliver on time and as specified.
    """
    context = get_context(system_instruction_prompt, document_text)
    result = await get_response(context, ["category"])
    assert result["category"] == "statement_of_work"
    # Single quotes inside the f-string avoid a syntax error on Python versions before 3.12.
    print(f"{color.BOLD} Category: {color.END} {result['category']}")
    print("+*"*20)

    document_text = """
    Technical Architecture Document: Drone System

    System Overview: The drone system features a GPS-enabled quadcopter with a high-definition camera for video streaming. The onboard microcontroller handles flight control and navigation, while a wireless module connects to the ground station for mission planning and live monitoring.

    Components and Integration: Advanced sensors, including LiDAR and ultrasonic, enable obstacle detection and collision avoidance. A cloud backend supports data storage and AI features like object recognition and autonomous flight, with scalability for additional modules.
    """
    context = get_context(system_instruction_prompt, document_text)
    result = await get_response(context, ["category"])
    assert result["category"] == "other"
    # Single quotes inside the f-string avoid a syntax error on Python versions before 3.12.
    print(f"{color.BOLD} Category: {color.END} {result['category']}")
    print("+*"*20)

    print("All tests passed successfully! 🙌")

# Run the test
await test_categorize_document()

Starting test: test_categorize_document
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*


INFO:inference:Inflection AI API request took 2751.12 ms


Category: statement_of_work
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*


INFO:inference:Inflection AI API request took 2330.71 ms


Category: other
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
All tests passed successfully! 🙌
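
The top-level `await` used to run the test works in Jupyter because cells execute inside a running event loop. In a plain Python script the same coroutine would need an explicit entry point. The sketch below shows one way to do that with `asyncio.run`; `fake_test` is a stand-in coroutine introduced for illustration so the example runs without the Inflection AI API.

```python
import asyncio


async def fake_test() -> str:
    # Stand-in for test_categorize_document(); sleep(0) simulates an awaited API call.
    await asyncio.sleep(0)
    return "done"


def main() -> str:
    # asyncio.run creates an event loop, runs the coroutine to completion, then closes the loop.
    return asyncio.run(fake_test())


if __name__ == "__main__":
    print(main())
```

Note that `asyncio.run` raises a `RuntimeError` when called from a thread that already has a running event loop, which is why the notebook cell uses `await` directly instead.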
