AI Engineering Take‑Home -- Research Assistant

(Example inspiration: TODO/plan executors such as Cursor — plan → execute → log)

Build a small AI agent that helps users tackle complex goals by breaking them into actionable steps and executing them.

Demo video: https://drive.google.com/file/d/1GfZFA9w_loGDRyLv-4hhla3V6mYl2jlR/view?usp=sharing

Run instructions

Requirements

Python 3.12+
uv package manager

Installation

git clone <repo-url>
cd testtask_aiagent
uv sync

Configuration

Copy the example env file and add your API keys:

cp .env.example .env

Required keys:

OPENAI_API_KEY — OpenAI API key
TAVILY_API_KEY — Tavily API key for web search

Usage

uv run python -m agent

Development

uv run pytest              # run tests
uv run ruff check src/     # lint

General

What type of goals or domain to focus on - General Research Assistant
How the AI interaction works (chat, CLI, minimal UI, etc.) - Rich chat in CLI
What level of automation vs. user confirmation you provide:
- User states the task
- The plan is built
- User rejects the plan and adds clarifications
- The new plan is built and accepted
- The research is done (by calling assistants internally)
- The final report is provided to the user
- The user adds some new demands/questions/instructions
- The new plan is built based on the previous context and new demands
- ...
- Persistence - main user conversation can be resumed from json

Please tell us how you spent your time and what trade‑offs you made.

I spent about 2 days it total:

2-3 hours planning
1h finalized the blueprint with Claude
5h run Claude to implement the basic structures
5h clean up and make it work
2h add token counter and context handling
3h running final demos (adjusted one prompt) and writing the report

What has been done:

1. Context & Prompt Engineering (35%)

Clear prompt structure and instructions - prompts/templates
Thoughtful context selection (what to keep vs. drop) - assistants contexts are separate, only reports are shared throughout the system
Basic handling of longer conversations or state / Avoiding prompt bloat - see task_executor, plan_executor, and __main__, see the demo transcript of a very short context window handling demo_transcripts/1_eu_diesel_ban_5k_context_window_85e9aed3

2. Agent Loop & Tool Use (45%)

High‑level goal → structured TODO list - planner
Simple execution loop (select task → execute → update status) - plan_executor
Integration of at least one real tool (web search, document reading, API call, vector search, etc.) - web search & extraction (Tavily API) - tools
Transparent logging of what the agent is doing - full debug logging to file, rich-formatted transcripts of the main conversation and task flows, see demo_transcripts, basic MLflow tracing

3. Evaluation & Communication (20%)

Clear explanation of how you would test or evaluate the system - that's a task probably bigger than the programming:

Basic python unit tests, some are already in tests
Gold standard dataset with a) basic ruled checks, b) llm-based evaluations

Design and trade‑offs

The interaction with user is described abouve, and it's an important part of the design, too
Within assistant:
- tasks are executed sequentially as planned
- reports of previous tasks are shared, so each assistant sees the main goal, work done so far, and its own task
- all reports are then passed to the main researcher to produce final answer
Withing task:
- tools are called until the model stops to require more tool calls and gives the final text answer
- if tool call limit or context window limit is reached, the model is asked to provide the final report
- the context window limit is handled intelligently, with an offset to leave space for the model to give the final report
In general, see demo_transcripts, they show the process clearly

What I would improve

Of course, at first, a minimum gold standard dataset should be gathered, to improve more or less reliably
The main flaw of the current design is a fixed plan. That may be good for a coding assistant, but inappropriate for new information researh. I would make assistants creation dynamic, depending on what has been found so far. Of course, that would require context handling in the main researcher itself
Web page extraction should be wrapped into a separate summarization/extraction LLM call, so that more web pages can be extracted within a single assistant run without hitting the context limit
As a main researcher, a better model should be used, not gpt-5-mini

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.claude		.claude
.vscode		.vscode
demo_transcripts		demo_transcripts
src/agent		src/agent
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Engineering Take‑Home -- Research Assistant

Run instructions

Requirements

Installation

Configuration

Usage

Development

General

Please tell us how you spent your time and what trade‑offs you made.

What has been done:

1. Context & Prompt Engineering (35%)

2. Agent Loop & Tool Use (45%)

3. Evaluation & Communication (20%)

Design and trade‑offs

What I would improve

About

Uh oh!

Releases

Packages

Languages

AlexanderKazakov/testtask_aiagent

Folders and files

Latest commit

History

Repository files navigation

AI Engineering Take‑Home -- Research Assistant

Run instructions

Requirements

Installation

Configuration

Usage

Development

General

Please tell us how you spent your time and what trade‑offs you made.

What has been done:

1. Context & Prompt Engineering (35%)

2. Agent Loop & Tool Use (45%)

3. Evaluation & Communication (20%)

Design and trade‑offs

What I would improve

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages