A Pythonic, GPU-accelerated, local-first AI research agent framework.
This system provides a modular framework for AI agents that perform tasks such as research, code generation, and data analysis. It is designed to be:
- Local-first: Run models locally with GPU acceleration
- Modular: Easily add new agents and capabilities
- Extensible: Integrate with external services and APIs
- Collaborative: Agents can work together to solve complex problems
The system consists of two main components:
- Frontend: Vite + React + TypeScript + TailwindCSS
- Backend: FastAPI + Pydantic + Prefect + LLM integrations
Prerequisites:
- Node.js 16+ and npm/yarn for the frontend
- Python 3.11+ for the backend
- A Vulkan-compatible GPU for LLM acceleration
# Clone the repository
git clone https://github.com/yourusername/ai-agent-framework.git
cd ai-agent-framework
# Install frontend dependencies
cd frontend
npm install
cd ..
# Install backend dependencies
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..

# Run both frontend and backend
./run-dev.sh

This will start:
- Frontend at http://localhost:5173/
- Backend at http://localhost:8000/
/ai-agent-framework
├── frontend/          # Vite + React frontend
├── backend/           # FastAPI backend
│   ├── agents/        # Agent implementations
│   ├── api/           # API endpoints
│   ├── llm/           # LLM interfaces
│   ├── memory/        # Memory systems
│   ├── orchestration/ # Prefect workflows
│   └── integrations/  # External service integrations
└── infra/             # Deployment and infrastructure
The project is currently in the stubout phase, with basic frontend and backend stubs in place; the next step is to implement full functionality in the fullycode phase.
This framework is designed to:
- Run modular autonomous agents using llama.cpp inference
- Coordinate multi-agent workflows with Prefect
- Store and retrieve knowledge via ChromaDB (with cloud sync)
- Integrate seamlessly with tools like Slack, Notion, GitHub
- Support local and AWS G5 deployment with one Dockerfile
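As a rough illustration of the "modular autonomous agents" idea above, here is a minimal sketch of what an agent base class might look like. The names (`Agent`, `run`, the injected `llm` callable) are illustrative assumptions, not the framework's actual API; a stubbed model keeps the sketch self-contained.

```python
from typing import Callable, List

class Agent:
    """Minimal autonomous-agent sketch: an injected LLM callable plus a task loop.

    The real framework wires in llama.cpp inference and ChromaDB memory; here
    the llm is any prompt -> completion callable so the sketch stays runnable.
    """

    def __init__(self, name: str, llm: Callable[[str], str]):
        self.name = name
        self.llm = llm
        self.history: List[str] = []  # stands in for a ChromaDB-backed memory

    def run(self, task: str) -> str:
        prompt = f"[{self.name}] Task: {task}"
        result = self.llm(prompt)
        self.history.append(result)  # remember output for later reflection
        return result

# Usage with a stubbed model:
echo_llm = lambda prompt: f"completed: {prompt}"
agent = Agent("researcher", echo_llm)
agent.run("summarize recent GPU papers")
```

In the real system, the `llm` argument would be a wrapper around llama.cpp inference, so agents stay decoupled from any particular model backend.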
- Agent Reflection Loops – Agents self-improve based on past outputs
- Vulkan-Accelerated LLMs – Powered by unsloth/gemma-3-4b-it or mistral-7b
- Shared + Isolated Memory – ChromaDB memory partitions with S3 sync
- FastAPI + Async Architecture – API endpoints for research, memory, and integrations
- Pluggable Integrations – Slack, Notion, GitHub, Jira (modular)
- Frontend Dashboard – Vite + React + Tailwind + shadcn/ui for live UI feedback
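The agent reflection loop listed above could be sketched roughly as follows; this is a hand-wavy illustration with stubbed generator and critic functions, not the framework's implementation.

```python
from typing import Callable

def reflect_loop(llm: Callable[[str], str],
                 critic: Callable[[str], bool],
                 task: str,
                 max_rounds: int = 3) -> str:
    """Generate, self-critique, retry: the essence of an agent reflection loop.

    llm maps a prompt to a draft answer; critic returns True when a draft is
    acceptable. Each failed round feeds the previous draft back into the prompt.
    """
    draft = llm(task)
    for _ in range(max_rounds - 1):
        if critic(draft):
            break
        draft = llm(f"{task}\nPrevious attempt:\n{draft}\nImprove it.")
    return draft

# Stubbed demo: the critic accepts only drafts that mention "sources".
drafts = iter(["first draft", "second draft with sources"])
result = reflect_loop(lambda p: next(drafts), lambda d: "sources" in d, "research X")
```

In practice the critic would itself be an LLM call scoring the draft against the task, and past drafts would be persisted to memory so agents improve across runs.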
| Layer | Stack |
|---|---|
| LLM Backend | llama.cpp + Vulkan + quantized models |
| Agents | Python classes w/ reasoning, reflection, tools |
| Orchestration | Prefect 2.x |
| API Layer | FastAPI, Pydantic v2 |
| Memory | ChromaDB (local), S3 sync, Pinecone optional |
| Frontend | Vite + React + TailwindCSS + shadcn/ui |
| Deployment | Docker (local/cloud), Terraform (AWS G5) |
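To illustrate the shared + isolated memory partitioning from the table above, here is a pure-Python stand-in. In the real stack each partition would be a ChromaDB collection (with S3 sync); the dict-backed store and all names here are illustrative, chosen only to keep the sketch runnable.

```python
from collections import defaultdict
from typing import Dict, List

class MemoryStore:
    """Dict-backed stand-in for ChromaDB collections, showing the partition
    scheme: one 'shared' partition visible to every agent, plus one isolated
    partition per agent (named 'agent_<id>')."""

    def __init__(self):
        self._partitions: Dict[str, List[str]] = defaultdict(list)

    def write(self, agent_id: str, text: str, shared: bool = False) -> None:
        key = "shared" if shared else f"agent_{agent_id}"
        self._partitions[key].append(text)

    def read(self, agent_id: str) -> List[str]:
        # An agent sees its own partition plus the shared one.
        return self._partitions[f"agent_{agent_id}"] + self._partitions["shared"]

store = MemoryStore()
store.write("a1", "private note")
store.write("a1", "team finding", shared=True)
```

The same naming scheme maps directly onto ChromaDB: one collection per partition, with reads merging an agent's own collection and the shared one.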
/ai-agent-framework
├── frontend/          # Live UI powered by Vite/React
├── backend/           # Python modules
│   ├── api/           # FastAPI endpoints
│   ├── agents/        # Agent class definitions
│   ├── llm/           # Inference wrappers (llama.cpp)
│   ├── memory/        # ChromaDB abstraction
│   ├── orchestration/ # Prefect flows
│   ├── integrations/  # Slack, Notion, etc.
│   └── main.py        # FastAPI entrypoint
├── infra/             # Dockerfile + Terraform config
│   └── terraform/
├── .cursorrules       # Project-wide Cursor rules
└── README.md          # You are here

- Phase 1: masterplan.md – Plan architecture and flows
- Phase 2: stubout.md – Create stubbed files + live frontend
- Phase 3: fullycode.md – Build production-ready system
📌 The frontend must always be running before backend logic is implemented
# Clone repo and enter project
$ git clone https://github.com/your-org/ai-agent-framework
$ cd ai-agent-framework
# Start frontend
$ cd frontend && npm install && npm run dev
# In new terminal, setup Python backend
$ cd backend && python3 -m venv venv && source venv/bin/activate
$ pip install -r requirements.txt
$ uvicorn main:app --reload

- POST /research – Start research workflow
- GET /memory/{agent_id} – Fetch memory snapshots
- POST /integrations/{tool} – Trigger integration handlers
- Define architecture
- Create dev rules + Cursor-compatible .mdc files
- Stub frontend + backend
- Build and test agent logic
- Integration modules (Slack, Notion, etc.)
- Prefect flows for reasoning and orchestration
MIT © 2025 — Built for open collaboration