<a href="https://colab.research.google.com/github/danikayoung16/MAT421/blob/main/Project_Plan.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Plan: Developing AI Agents with LLMs

### Danika Young

## 1. Research Objective

The goal of this project is to design and implement an AI agent, powered by a Large Language Model (LLM), that can solve, explain, and verify numerical and coding problems step by step and interactively. The agent will be built in Python on top of an LLM API such as OpenAI's or DeepSeek's. The project’s significance lies in demonstrating how LLMs can be adapted into educational and practical tools that go beyond static question answering, toward dynamic agents with human-like reasoning.

This project also fulfills the requirements for an Honors Contract by extending the research into UI design and accessibility, making the tool user-friendly across platforms.

## 2. Literature Review

A growing body of work combines LLMs with agent frameworks such as LangChain, AutoGPT, and BabyAGI. These systems integrate memory, tool use, and reasoning capabilities into a reusable workflow. Prior notebooks such as `SimpleMathSolverVerification.ipynb`, `mathSolverMultVerification.ipynb`, and `tradingAgent.ipynb` provide useful references for structuring LLM agents for math, verification, and decision-based applications.

However, few implementations focus on end-user accessibility or provide a complete pipeline from problem solving to interactive visualization. This project aims to fill that gap by developing a reasoning-capable agent and making it usable in a web or mobile format with visual outputs.

## 3. Hypothesis / Research Questions

This project is driven by the following research questions:

- Can a structured, prompt-based agent using LLMs reliably solve and verify complex math or code tasks?
- Do accuracy and explainability improve when the agent is combined with external tools (e.g., SymPy for math verification)?
- How does DeepSeek compare with OpenAI in performance and cost?
- Do user engagement and understanding improve with a visual interface that clearly shows step-by-step reasoning?

The underlying hypothesis is that an LLM agent with modular prompts, tool-based verification, and a visual interface will outperform basic LLM chat applications in usability, accuracy, and interpretability.

## 4. Methodology

The project will be built in Python and follow a modular agent design:

1. **Task Classification**: Classify the user input as a math problem, code issue, or verification task.
2. **Prompt Generation**: Build a structured prompt dynamically from a template and few-shot examples.
3. **LLM Query**: Call the OpenAI or DeepSeek API for reasoning and solution generation.
4. **Verification**: Use tools such as SymPy or Python's `ast` module to validate the result.
5. **Response Presentation**: Format the solution as Markdown or LaTeX, or render it visually with MathJax or Plotly.
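
Steps 1 to 3 of the list above could be wired together roughly as follows. This is a minimal sketch, not the project's implementation: the keyword lists, the few-shot examples, and the function names `classify_task`, `build_prompt`, and `run_agent` are all hypothetical, and the LLM is passed in as a plain callable so that either provider's client could be wrapped behind it.

```python
from typing import Callable

# Hypothetical keyword lists; a real classifier might itself be an LLM call.
CODE_HINTS = ("def ", "traceback", "error", "import ", "print(")
VERIFY_HINTS = ("verify", "check my", "is this correct")

# Hypothetical few-shot examples, one short list per task type.
FEW_SHOT = {
    "math": [("Differentiate x**2.", "Step 1: apply the power rule. Answer: 2*x")],
    "code": [("print('hi'", "Step 1: the closing parenthesis is missing. Fix: print('hi')")],
    "verify": [("Is 2 + 2 = 5?", "Step 1: compute 2 + 2 = 4. Verdict: incorrect")],
}


def classify_task(user_input: str) -> str:
    """Step 1: label the input as 'verify', 'code', or 'math' (the default)."""
    text = user_input.lower()
    if any(hint in text for hint in VERIFY_HINTS):
        return "verify"
    if any(hint in text for hint in CODE_HINTS):
        return "code"
    return "math"


def build_prompt(task: str, user_input: str) -> str:
    """Step 2: assemble an instruction, the few-shot examples, and the new problem."""
    parts = [f"Task type: {task}. Solve the problem and show numbered steps."]
    for question, answer in FEW_SHOT[task]:
        parts.append(f"Problem: {question}\nSolution: {answer}")
    parts.append(f"Problem: {user_input}\nSolution:")
    return "\n\n".join(parts)


def run_agent(user_input: str, llm: Callable[[str], str]) -> dict:
    """Steps 1-3: classify, build the prompt, and query the injected LLM callable."""
    task = classify_task(user_input)
    prompt = build_prompt(task, user_input)
    return {"task": task, "prompt": prompt, "answer": llm(prompt)}
```

Passing the model in as `llm` keeps the pipeline testable with a stub function and makes it straightforward to swap providers when comparing them.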

This approach allows for flexibility, explainability, and modular extension of features such as tutoring mode or alternative agents.
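
For the verification step, a SymPy check might look like the sketch below. It assumes the LLM's final answer has already been extracted as a plain expression string; the helper names `verify_derivative` and `verify_root` are hypothetical. Note that `sympify` evaluates its input string, so model output should be sanitized before it reaches these functions.

```python
import sympy as sp


def verify_derivative(expression: str, claimed: str, variable: str = "x") -> bool:
    """Return True if `claimed` equals the derivative of `expression`."""
    x = sp.Symbol(variable)
    expected = sp.diff(sp.sympify(expression), x)
    # A difference that simplifies to 0 proves equality; a nonzero result
    # usually, but not always, means the two expressions differ.
    return sp.simplify(expected - sp.sympify(claimed)) == 0


def verify_root(lhs: str, claimed_root: str, variable: str = "x", tol: float = 1e-9) -> bool:
    """Return True if substituting the claimed root makes `lhs` (approximately) zero."""
    x = sp.Symbol(variable)
    residual = sp.sympify(lhs).subs(x, sp.sympify(claimed_root))
    return abs(float(residual)) < tol
```

For example, `verify_derivative("x**3 + sin(x)", "3*x**2 + cos(x)")` accepts the answer, while `verify_root("x**2 - 2", "1.4")` rejects a root that is only roughly right.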

## 5. Data Collection

### For Math Agent:
- A curated dataset of 15–20 problems from numerical methods, calculus, and algebra textbooks.
- Types include root-finding, differentiation, integration, and equation solving.
- Problems will be encoded in a structured format (Markdown/JSON) for reproducibility.
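
One possible JSON encoding of a math problem is sketched below, together with a small loader that rejects malformed records. The field names and the example record are illustrative assumptions, not a fixed schema; the allowed topics mirror the problem types listed above.

```python
import json

# Hypothetical record layout; the field names are illustrative only.
EXAMPLE_RECORD = """
{
  "id": "calc-001",
  "topic": "differentiation",
  "statement": "Differentiate f(x) = x**3 + sin(x).",
  "answer": "3*x**2 + cos(x)",
  "source": "placeholder for the textbook citation"
}
"""

REQUIRED_FIELDS = {"id", "topic", "statement", "answer"}
ALLOWED_TOPICS = {"root-finding", "differentiation", "integration", "equation solving"}


def load_problem(raw: str) -> dict:
    """Parse one JSON record and raise ValueError if a field or the topic is invalid."""
    record = json.loads(raw)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if record["topic"] not in ALLOWED_TOPICS:
        raise ValueError(f"unknown topic: {record['topic']}")
    return record
```

Validating records at load time keeps the evaluation runs reproducible, since a typo in a field name fails immediately instead of silently skewing the results.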

### For Code Debugging Agent:
- A custom set of 10–15 Python snippets with common errors (e.g., syntax, logic, and runtime errors).
- Each problem will include a ground-truth solution and explanation.
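
As an illustration, a syntax-error entry and a consistency check based on Python's `ast` module could look like this. The entry layout and the helper names are hypothetical. The check only covers syntax errors; logic and runtime errors would need execution-based tests instead.

```python
import ast

# Hypothetical dataset entry; the snippet and its fix are illustrative.
ENTRY = {
    "category": "syntax",
    "broken": "for i in range(3)\n    print(i)\n",
    "fixed": "for i in range(3):\n    print(i)\n",
    "explanation": "The for statement is missing its trailing colon.",
}


def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def check_syntax_entry(entry: dict) -> bool:
    """An entry is consistent if the broken code fails to parse and the fix parses."""
    return not parses(entry["broken"]) and parses(entry["fixed"])
```

The same `parses` helper can also screen an LLM-proposed fix before it is compared with the ground-truth solution.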

These datasets will serve both as training/testing samples and user scenarios.

## 6. Analysis Plan

Analysis will focus on both **quantitative** and **qualitative** evaluation:

- **Accuracy Metrics**:
  - % of correct math solutions
  - % of correct code fixes
- **Explainability Score** (manual rubric):
  - Step clarity
  - Use of visual aids
  - Depth of reasoning
- **Cost/Performance**:
  - API usage comparisons between OpenAI and DeepSeek
- **User Feedback** *(optional)*:
  - Short usability test (Likert scale) if UI is deployed in time

Tools: Python (pandas, SymPy), Plotly and Matplotlib for graphs, and custom evaluation scripts.
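
The accuracy and cost metrics could be computed along these lines. Everything in the sketch is a placeholder: the result rows are made up, the providers are labeled `A` and `B` instead of real vendors, and the prices are not actual API pricing. Plain Python is used here for brevity; the same aggregation could be done with a pandas `groupby`.

```python
from statistics import mean

# Hypothetical per-problem results; every value is made up for illustration.
RESULTS = [
    {"provider": "A", "kind": "math", "correct": True, "tokens": 420},
    {"provider": "A", "kind": "code", "correct": False, "tokens": 610},
    {"provider": "B", "kind": "math", "correct": True, "tokens": 450},
    {"provider": "B", "kind": "code", "correct": True, "tokens": 580},
]

# Placeholder prices per 1,000 tokens, not real vendor pricing.
PRICE_PER_1K = {"A": 1.0, "B": 0.5}


def accuracy(results, provider, kind):
    """Percentage of correct answers for one provider and problem kind."""
    subset = [r for r in results if r["provider"] == provider and r["kind"] == kind]
    if not subset:
        raise ValueError("no results for this provider and kind")
    return 100.0 * sum(r["correct"] for r in subset) / len(subset)


def total_cost(results, provider):
    """Total cost for one provider under the placeholder price table."""
    tokens = sum(r["tokens"] for r in results if r["provider"] == provider)
    return tokens / 1000 * PRICE_PER_1K[provider]


def likert_mean(scores):
    """Mean of 1-5 Likert responses from the optional usability test."""
    return mean(scores)
```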

## 7. Team Roles and Responsibilities

This is an **individual project**.

- **Research Design, Writing, and Implementation**: Danika Young
- **UI and Honors Component Development**: Danika Young
- **Experimentation and Analysis**: Danika Young

All sections will be authored and submitted by the student listed above.

---

## 8. Honors Contract Extension: User Interface and Visualization

To fulfill the Honors Contract requirement, the project will be extended to include a polished user-facing interface:

### Web and Mobile Access

- **Web App**: A Flask or FastAPI application in which users enter problems and receive AI-generated responses with explanations and visual breakdowns.
- **Visual Output**:
  - For math problems: Equation steps rendered using MathJax.
  - For functions: Plots of functions or numerical results using Plotly.
  - For code issues: Flowcharts or syntax trees to help users visualize errors and fixes.

The UI layer enhances the project’s usability, makes the tool accessible to broader audiences, and supports real-world deployment.
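
The MathJax rendering of solution steps might be produced by a helper like the one below, which builds a standalone HTML page from `(explanation, LaTeX)` pairs. The function name `render_steps` and the page layout are assumptions; the script tag loads MathJax 3 from its public CDN, which typesets math wrapped in `\[ ... \]`.

```python
from html import escape

# MathJax 3 CDN script tag; once loaded, the browser typesets \[ ... \] blocks.
MATHJAX_TAG = (
    '<script id="MathJax-script" async '
    'src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>'
)


def render_steps(title: str, steps: list[tuple[str, str]]) -> str:
    """Build an HTML page with one list item per (explanation, LaTeX) step."""
    items = []
    for explanation, latex in steps:
        # Escape both parts so <, >, and & cannot break the page markup.
        items.append(f"<li>{escape(explanation)} \\[{escape(latex)}\\]</li>")
    return (
        f"<!doctype html><html><head><title>{escape(title)}</title>"
        f"{MATHJAX_TAG}</head><body><h1>{escape(title)}</h1>"
        f"<ol>{''.join(items)}</ol></body></html>"
    )
```

A Flask or FastAPI route could return this string as its HTML response body, so the rendering logic stays independent of whichever framework is chosen.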

---


