Challenge: Saptang Labs – Machine Learning Challenge
Author: Turing Machines
Date: October 2025
This document outlines the design and architecture of an Agentic Reasoning System (ARS) — an AI system capable of autonomously decomposing, planning, executing, and verifying solutions for logic-based reasoning tasks. Unlike monolithic large language models, ARS performs structured multi-step reasoning by integrating lightweight models, symbolic tools, and rule-based planning.
The system aims to:
- Decompose complex logic problems into solvable subproblems.
- Dynamically select the most appropriate solver or tool.
- Verify intermediate and final results.
The final deliverable is a modular, interpretable pipeline optimized for accuracy, transparency, and reproducibility.
| Objective | Description |
|---|---|
| Problem Decomposition | Identify subcomponents and logical relations within complex problems. |
| Tool Selection | Match subproblems with suitable solvers (symbolic, numeric, code-based). |
| Execution | Perform computations, symbolic manipulations, or simulations. |
| Verification | Check for consistency, dimensional correctness, or logical coherence. |
| Reasoning Trace Generation | Maintain a full record of all reasoning steps, justifications, and verification outcomes. |
To reward system design rather than dependence on a large LLM, the challenge restricts which models may be used:
- Prohibited: GPT-4, GPT-5, Claude-3, Gemini Ultra, and equivalent reasoning-heavy APIs.
- Permitted:
  - Small or base open models (e.g., Phi-3-mini, Mistral-7B, Llama-3-8B-Instruct)
  - Symbolic tools: SymPy, Z3, PrologPy, MiniKanren
  - Algorithmic and rule-based reasoning components
The system follows a four-phase architecture integrating planning, reasoning, and verification:
┌──────────────────────────┐
│ Input Interface          │
│ (Natural Language Query) │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│ 1. Problem Decomposer    │
│ - LLM hybrid (T5-small)  │
│ - Generates subproblems  │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│ 2. Planner & Tool Mapper │
│ - Builds reasoning graph │
│ - Assigns solvers/tools  │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│ 3. Executor & Verifier   │
│ - Runs subtasks          │
│ - Cross-verifies results │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│ 4. Reasoning Trace Gen.  │
│ - Logs all steps         │
│ - Produces final answer  │
└──────────────────────────┘
Goal: Convert a raw natural language problem into atomic subproblems.
Techniques:
- Fine-tune T5-small on the GSM8K training set, available at https://raw.githubusercontent.com/openai/grade-school-math/refs/heads/master/grade_school_math/data/train.jsonl
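GSM8K answers annotate each calculation as `<<expression=result>>` and place the final answer after `####`, which makes them usable as step-level decomposition targets. The following is a minimal sketch of extracting those steps from one JSONL record. The function name `extract_steps` and the output shape are assumptions for illustration, not the project's actual preprocessing code.

```python
import json
import re

# GSM8K wraps each calculation as <<expression=result>> and puts the final
# answer after a "####" marker.
STEP_PATTERN = re.compile(r"<<([^=<>]+)=([^<>]+)>>")


def extract_steps(record_line):
    """Turn one GSM8K JSONL line into calculation steps plus the final answer."""
    record = json.loads(record_line)
    solution, _, final = record["answer"].partition("####")
    steps = [
        {"expression": expression.strip(), "result": result.strip()}
        for expression, result in STEP_PATTERN.findall(solution)
    ]
    return {
        "question": record["question"],
        "steps": steps,
        "final_answer": final.strip(),
    }


# The GSM8K record quoted in this document, re-serialized as one JSONL line.
sample_line = json.dumps({
    "question": (
        "Natalia sold clips to 48 of her friends in April, and then she sold "
        "half as many clips in May. How many clips did Natalia sell altogether "
        "in April and May?"
    ),
    "answer": (
        "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
        "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
        "#### 72"
    ),
})

parsed = extract_steps(sample_line)
print(parsed["steps"])
print(parsed["final_answer"])
```

Each extracted step is a candidate atomic subproblem, and the final answer gives a label for end-to-end checking.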
Output Example:
{"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
"answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"}
Goal: Assign each subproblem to the most efficient solving mechanism. This stage also uses T5-small, trained on the MathQA dataset, available at https://math-qa.github.io/
| Subproblem Type | Tool/Method | Example |
|---|---|---|
| Arithmetic | Internal calculator | 200 / (60+40) |
| Algebraic | SymPy symbolic solver | Solve for x in 2x + 3 = 7 |
| Logical | Rule-based inference / Prolog | Deduce from premises |
| Algorithmic | Python code executor | Simulation or iteration tasks |
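The mapping in the table above can be approximated with a few hand-written rules before any learned model is involved. This is a minimal sketch under stated assumptions: the keyword lists are hypothetical, the tool names `rule_engine` and `code_executor` are invented here (only `calculator` and `symbolic_solver` appear in this document), and the real `tool_selector.py` may work differently.

```python
import re

# A subproblem made only of digits, whitespace, and arithmetic symbols.
ARITHMETIC = re.compile(r"[\d\s+\-*/().]+")

# Hypothetical keyword rules, checked in order.
KEYWORD_RULES = [
    ("symbolic_solver", ("solve for", "equation", "derivative", "integral")),
    ("rule_engine", ("deduce", "premise", "implies", "infer")),
    ("code_executor", ("simulat", "iterat", "loop")),
]


def select_tool(subproblem):
    """Return the name of the tool that should handle the subproblem."""
    text = subproblem.strip().lower()
    if ARITHMETIC.fullmatch(text):
        return "calculator"
    for tool, keywords in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return tool
    # Fall back to the most general tool when no rule fires.
    return "code_executor"


for example in (
    "200 / (60+40)",
    "Solve for x in 2x + 3 = 7",
    "Deduce from premises",
    "Simulation or iteration tasks",
):
    print(example, "->", select_tool(example))
```

Rules like these give a transparent baseline that a trained selector can later be compared against.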
Output Example:
{
"plan": [
{"step": "relative_speed", "tool": "calculator"},
{"step": "meeting_time", "tool": "symbolic_solver"}
]
}
Goal: Execute subtasks, verify results, and check intermediate consistency. This stage uses the following dataset: https://huggingface.co/datasets/D3xter1922/proofwriter-dataset
Verification Strategies:
- Redundant evaluation: Use both numeric and symbolic solvers.
- Tolerance-based check: `abs(result_1 - result_2) < ε`.
- Dimensional analysis: Ensure unit consistency.
- Logic equivalence: Validate logical expressions via Z3 or truth tables.
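Two of the strategies above, the tolerance check and logic equivalence via truth tables, can be sketched with the standard library alone. The function names and the default `eps` value are assumptions for illustration. A Z3-based check would replace the exhaustive enumeration for larger formulas.

```python
import itertools


def within_tolerance(result_1, result_2, eps=1e-9):
    """Tolerance-based check: abs(result_1 - result_2) < eps."""
    return abs(result_1 - result_2) < eps


def logically_equivalent(f, g, n_vars):
    """Compare two Boolean functions on every row of their truth table."""
    return all(
        f(*values) == g(*values)
        for values in itertools.product([False, True], repeat=n_vars)
    )


# Redundant evaluation: the same quantity computed along two different paths.
print(within_tolerance(0.1 + 0.2, 0.3))

# De Morgan's law: not (a and b) is equivalent to (not a) or (not b).
print(logically_equivalent(
    lambda a, b: not (a and b),
    lambda a, b: (not a) or (not b),
    n_vars=2,
))
```

Truth-table enumeration grows as 2^n in the number of variables, so it suits small formulas only.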
Output Example:
{
"execution_results": {
"relative_speed": "100 km/h",
"meeting_time": "2 hours"
},
"verification": "passed"
}
| Tool Name | Type | Library | Capability |
|---|---|---|---|
| SymPy | Symbolic | sympy | Algebraic & calculus-based reasoning |
| Z3 Solver | Logical | z3-solver | Logic and constraint satisfaction |
| Python Executor | Code | exec() sandbox | Algorithmic subtask execution |
| NumPy/Math | Numeric | numpy, math | Fast computation and array logic |
| MiniProlog | Rule-based | prologpy | Deductive inference tasks |
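For the arithmetic end of this toolset, one option is to evaluate expressions by walking the parsed syntax tree instead of passing raw text to `exec()`. The sketch below is an assumption about how an internal calculator could be built, not the project's implementation. It accepts only numeric literals and the four basic operators.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression):
    """Evaluate a plain arithmetic expression and return an int or float."""

    def walk(node):
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval").body)


print(evaluate("200 / (60+40)"))
```

Function calls, names, and attribute access all raise `ValueError`, so input such as `__import__('os')` never executes.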
- Input: Problem in text form.
- Decomposition: Identify structure and subproblems.
- Planning: Select sequence and tools.
- Execution: Perform calculations/symbolic solutions.
- Verification: Validate correctness and consistency.
- Reasoning Trace: Construct human-readable explanation.
- Output: Final answer + reasoning trace.
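The workflow above can be sketched as a small orchestration function that calls each stage in order and logs it to a trace. Everything named here (`Trace`, `run_pipeline`, the stage callables) is hypothetical. The stub stages return fixed values based on the `200 / (60+40)` example from the tool table, so the sketch shows only the control flow, not real solving.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Ordered record of what each stage produced."""
    steps: list = field(default_factory=list)

    def log(self, stage, detail):
        self.steps.append({"stage": stage, "detail": detail})


def run_pipeline(problem, decompose, plan, execute, verify):
    """Run the stages in order and return the answer with its trace."""
    trace = Trace()
    trace.log("input", problem)
    subproblems = decompose(problem)
    trace.log("decomposition", subproblems)
    steps = plan(subproblems)
    trace.log("planning", steps)
    results = execute(steps)
    trace.log("execution", results)
    verified = verify(results)
    trace.log("verification", verified)
    answer = results[-1] if results else None
    return {"answer": answer, "verified": verified, "trace": trace.steps}


# Stub stages with fixed outputs, for illustration only.
result = run_pipeline(
    "200 / (60+40)",
    decompose=lambda problem: ["60 + 40", "200 / 100"],
    plan=lambda subproblems: [(s, "calculator") for s in subproblems],
    execute=lambda steps: [100, 2.0],
    verify=lambda results: abs(results[0] * results[1] - 200) < 1e-9,
)
print(result["answer"], result["verified"])
for entry in result["trace"]:
    print(entry)
```

Passing the stages in as callables keeps each module swappable, which matches the modularity goal stated earlier.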
Input:
"A box contains 5 red, 3 blue, and 2 green balls. If one ball is drawn at random, what is the probability it is not green?"
Pipeline Trace:
- Decompose:
  - Identify total balls = 5 + 3 + 2 = 10.
  - Identify favorable outcomes (not green) = 5 + 3 = 8.
  - Apply probability formula P = favorable / total.
- Select Tools:
  - Use symbolic/numeric calculator.
- Execute:
  - P = 8 / 10 = 0.8.
- Verify:
  - Alternate check via complementary probability: 1 - (2/10) = 0.8 → matches.
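The trace above can be reproduced in a few lines. This sketch uses exact rational arithmetic from the standard library so that the direct and complementary results compare without rounding error. The variable names are illustrative.

```python
from fractions import Fraction

counts = {"red": 5, "blue": 3, "green": 2}

# Decompose: total and favorable outcomes.
total = sum(counts.values())
favorable = total - counts["green"]

# Execute: direct probability.
p_direct = Fraction(favorable, total)

# Verify: complementary probability must agree exactly.
p_complement = 1 - Fraction(counts["green"], total)
verified = p_direct == p_complement

print(p_direct, float(p_direct), verified)  # 4/5 0.8 True
```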
| Phase | Deliverable | Description |
|---|---|---|
| Phase 1 | Core architecture | Implement decomposer, planner, and executor modules. |
| Phase 2 | Tool integration | Add symbolic, logical, and numerical solvers. |
| Phase 3 | Verification logic | Implement redundancy checks and tolerance metrics. |
| Metric | Definition |
|---|---|
| Accuracy | Fraction of final answers that are correct on the evaluation dataset |
| Verification Score | % of outputs validated successfully |
| Interpretability | Clarity of reasoning trace |
| Modularity | Ease of extension to new tool types |
| Reproducibility | Ease of running pipeline from scratch |
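The two quantitative metrics in the table can be computed from per-problem records. The sketch below assumes a hypothetical record format with `predicted`, `expected`, and `verified` keys. The sample records are made-up demonstration inputs, not measured results.

```python
def score(records):
    """Return accuracy (a fraction) and verification score (a percentage)."""
    total = len(records)
    if total == 0:
        return {"accuracy": 0.0, "verification_score": 0.0}
    correct = sum(r["predicted"] == r["expected"] for r in records)
    verified = sum(bool(r["verified"]) for r in records)
    return {
        "accuracy": correct / total,
        "verification_score": 100.0 * verified / total,
    }


# Made-up demonstration records.
demo_records = [
    {"predicted": "72", "expected": "72", "verified": True},
    {"predicted": "0.8", "expected": "0.8", "verified": True},
    {"predicted": "2", "expected": "3", "verified": False},
    {"predicted": "5", "expected": "5", "verified": False},
]
print(score(demo_records))
```

Reporting the two numbers separately shows cases where an answer is correct but the verifier failed to confirm it, and the reverse.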
- Hybrid rule-based + symbolic + neural reasoning.
- Self-verifying computation via dual-tool crosschecks.
- Transparent reasoning graph instead of hidden chains.
- Designed for explainability and scientific rigor.
agentic_reasoning_system/
├── decomposer/
│ └── llm_based.py
├── Tool Mapper/
│ └── tool_selector.py
├── Verification & Logical Reasoning/
│ └── consistency_checker.py
├── datasets/
├── main.py
└── README.md
- Integration with graph-based reasoning memory (storing solved subpatterns).
- Adaptive tool learning: system updates tool-selection heuristics from past success rates.
- Expansion to multi-agent reasoning: planner + verifier + critic agents.
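As one possible starting point for the adaptive tool learning item, the sketch below tracks per-tool success counts and picks the tool with the best smoothed success rate. The class name `ToolStats`, its methods, and the choice of Laplace smoothing are all assumptions. The document does not specify how the heuristics would be updated.

```python
class ToolStats:
    """Track how often each tool succeeds on each problem type."""

    def __init__(self):
        self._attempts = {}
        self._successes = {}

    def record(self, problem_type, tool, success):
        key = (problem_type, tool)
        self._attempts[key] = self._attempts.get(key, 0) + 1
        self._successes[key] = self._successes.get(key, 0) + int(success)

    def success_rate(self, problem_type, tool):
        key = (problem_type, tool)
        # Laplace smoothing: an unseen tool starts at 0.5 instead of dividing by zero.
        return (self._successes.get(key, 0) + 1) / (self._attempts.get(key, 0) + 2)

    def best_tool(self, problem_type, candidates):
        return max(candidates, key=lambda tool: self.success_rate(problem_type, tool))


stats = ToolStats()
stats.record("algebraic", "symbolic_solver", True)
stats.record("algebraic", "symbolic_solver", True)
stats.record("algebraic", "calculator", False)
print(stats.best_tool("algebraic", ["calculator", "symbolic_solver"]))
```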
This Agentic Reasoning System bridges symbolic, algorithmic, and lightweight neural reasoning to produce reliable, interpretable, and verifiable solutions to logical problems. It emphasizes planning, transparency, and modularity — fulfilling the core objectives of the Saptang Labs Machine Learning Challenge while adhering to its restrictions on pre-trained reasoning-heavy LLMs.