A high-performance, privacy-focused AI agent built with FastAPI and the BeeAI Framework. This tool performs autonomous security research and analysis using local LLMs (via Ollama) and Wikipedia, streaming real-time thought processes and final reports to a responsive web dashboard.
- 🔒 Local LLM Processing: All AI inference runs on your hardware using Ollama. Note: Search tools (DuckDuckGo, Wikipedia) send queries to external services.
- 🧠 Agentic Workflow: Uses the BeeAI Framework to Plan (`ThinkTool`) > Research (`DuckDuckGoSearchTool`, `WikipediaTool`, `OpenMeteoTool`) > Synthesize.
- ⚡ High Performance:
  - Built on FastAPI & Uvicorn for async concurrency.
  - Uses `orjson` for ultra-fast JSON serialization (Rust-based).
  - Server-Sent Events (SSE) for low-latency streaming.
- 🛡️ Resource Management: Implements Async Semaphores to prevent GPU OOM (Out of Memory) errors by queuing concurrent agent requests.
- 💻 Integrated Dashboard: Clean, responsive HTML/JS frontend included. No separate build step required.
- 📄 RAG Document Upload (Optional): Upload documents (PDF, DOCX, PPTX, images, etc.) for context-aware conversations using docling and ChromaDB.
- Python 3.10+
- Ollama installed and running locally (Download Ollama).
- Hardware: A GPU with at least 8GB VRAM is recommended (24GB recommended for larger context windows and models).
- Clone the repository:

  ```bash
  git clone https://github.com/odevstudio/beeai-analyst.git
  cd beeai-analyst
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install core dependencies:

  ```bash
  pip install fastapi uvicorn orjson "beeai-framework[duckduckgo]"
  ```

  Pinned versions used in this project, in case you run into trouble:

  ```bash
  pip install orjson==3.11.5 fastapi==0.128.0 uvicorn==0.40.0 "beeai-framework[duckduckgo]==0.1.74"
  ```

- Install RAG dependencies (optional, for document upload):

  ```bash
  pip install docling chromadb langchain-text-splitters openai
  ```

  These enable the document upload feature. The app will work without them (file upload will be disabled).
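That graceful degradation can be sketched as a simple optional-import guard at startup (an illustrative pattern, not necessarily the exact code in this project):

```python
# Detect optional RAG dependencies at import time; if they are missing,
# disable the upload feature instead of crashing the app.
try:
    import docling    # noqa: F401  (text extraction)
    import chromadb   # noqa: F401  (vector store)
    RAG_AVAILABLE = True
except ImportError:
    RAG_AVAILABLE = False

print(f"RAG available: {RAG_AVAILABLE}")
```

Endpoints that depend on these packages can then check `RAG_AVAILABLE` and return a clear error instead of an ImportError traceback.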
The agent is configured to use a custom model named `gemma-agent` via the OpenAI-compatible endpoint. You need to create this model in Ollama so that the system prompt works correctly. This project was built with `gemma-3-27b-it-q4_k_m.gguf`.
- Pull the base model:

  ```bash
  ollama pull yourmodel
  ```

- Create a `Modelfile`: Create a file named `Modelfile` in your model folder (an example `Modelfile` for `gemma-3-27b-it-q4_k_m.gguf` is included).

- Create the custom model:

  ```bash
  ollama create gemma-agent -f Modelfile
  ```
If you want to use the document upload feature, you also need the `nomic-embed-text` embedding model:

```bash
ollama pull nomic-embed-text
```

This model is used to generate embeddings for document chunks and semantic search queries.
- Start the Server:

  ```bash
  python main.py
  ```

  Note: Ensure Ollama is running in the background (`ollama serve`).

- Access the Dashboard: Open your browser and navigate to http://localhost:5000.

- Start an Analysis: Type a query like "Analyze the security risks of Quantum Computing for banking encryption" and hit Start.

- Upload Documents (Optional): Use the file upload section in the dashboard to upload documents. The agent will use their content to inform its responses.
The application supports uploading documents for Retrieval-Augmented Generation (RAG). When you upload documents:
- Text Extraction: Documents are processed using `docling` to extract text from various formats.
- Chunking: Text is split into overlapping chunks (500 chars, 50 char overlap) using LangChain's text splitter.
- Embedding: Chunks are embedded using the `nomic-embed-text` model via Ollama.
- Storage: Embeddings are stored in a session-based ChromaDB vector store.
- Retrieval: When you ask a question, the top 5 most relevant chunks are retrieved and injected into the agent's context.
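The chunking step above can be sketched with a simple sliding window. This is a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers splitting at separators such as paragraph breaks rather than at fixed offsets:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share
    chunk_overlap characters, so a sentence cut at a boundary still
    appears whole in at least one chunk."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

document = "abcdefghij" * 100             # 1000 characters of sample text
chunks = chunk_text(document)
print(len(chunks))                        # 3 chunks: 0-500, 450-950, 900-1000
print(chunks[0][-50:] == chunks[1][:50])  # True: 50-character overlap
```

Each chunk is then embedded independently, so the overlap is what keeps boundary-straddling sentences retrievable.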
| Format | Extensions |
|---|---|
| PDF | .pdf |
| Word Documents | .doc, .docx |
| PowerPoint | .pptx |
| Excel | .xlsx |
| HTML | .html |
| Markdown | .md |
| Plain Text | .txt |
| CSV | .csv |
| Images (OCR) | .png, .jpg, .jpeg |
When uploaded documents are consulted, you will see a [RAG] entry in the Live Terminal showing:
- Number of document chunks retrieved
- Source files and their relevance scores
The application follows a modern asynchronous architecture:
- Frontend: Sends a POST request to `/stream`.
- FastAPI Endpoint:
  - Acquires a GPU semaphore (locks execution to 1 active agent to save VRAM).
  - Spawns an asynchronous background task.
- RAG Context Retrieval (if documents are uploaded):
  - Embeds the query using `nomic-embed-text`.
  - Retrieves the top-k relevant chunks from ChromaDB.
  - Injects the context into the agent's system instructions.
- BeeAI Agent:
  - Receives the prompt (with RAG context if available).
  - `ThinkTool`: Plans the research steps.
  - `DuckDuckGoSearchTool`: Searches the internet.
  - `WikipediaTool`: Fetches factual data.
  - `OpenMeteoTool`: Fetches weather data if needed.
  - LLM: Synthesizes findings into a structured report.
- Streaming:
  - Events are serialized immediately using `orjson`.
  - Data is pushed to an `asyncio.Queue`.
  - The frontend receives events via Server-Sent Events (SSE) and updates the UI in real time.
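The streaming half of this flow can be sketched as a plain `asyncio.Queue` producer/consumer pair. Stdlib `json` stands in for `orjson` here to keep the sketch dependency-free, and the generator is the kind of object FastAPI's `StreamingResponse` would consume:

```python
import asyncio
import json  # the real app uses orjson; stdlib json keeps this sketch dependency-free

def to_sse_frame(event: dict) -> str:
    # SSE wire format: a "data:" line per event, terminated by a blank line.
    return f"data: {json.dumps(event)}\n\n"

async def agent_task(queue: asyncio.Queue) -> None:
    # Stand-in for the background agent: push events as they occur,
    # then a None sentinel to signal the end of the stream.
    for step in ("thinking", "searching", "done"):
        await queue.put({"type": step})
    await queue.put(None)

async def event_stream(queue: asyncio.Queue):
    # In the real app, a generator like this feeds StreamingResponse.
    while (event := await queue.get()) is not None:
        yield to_sse_frame(event)

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(agent_task(queue))
    return [frame async for frame in event_stream(queue)]

frames = asyncio.run(main())
print(frames[0])
```

Because the producer and consumer only share the queue, slow clients never block the agent, and the first event can be flushed before the report is finished.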
You can adjust the following variables in `beefast.py` to fit your hardware:
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_NUM_CTX` | 8192 | Context window size. Reduce to 4096 if you have low VRAM. |
| `gpu_semaphore` | 1 | Number of concurrent agents allowed. Increase if you have multiple GPUs. |
| `ChatModel` | `gemma-agent` | Change this string to use Llama3 or Mistral (`openai:llama3`, etc.). |
| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model for generating embeddings. |
| `CHUNK_SIZE` | 500 | Size of text chunks in characters. |
| `CHUNK_OVERLAP` | 50 | Overlap between consecutive chunks. |
| `TOP_K_RESULTS` | 5 | Number of relevant chunks to retrieve per query. |
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Main dashboard UI |
| `/stream` | POST | Start agent analysis (SSE stream) |
| `/upload` | POST | Upload document for RAG |
| `/files/{session_id}` | GET | List uploaded files for session |
| `/files/{session_id}/(unknown)` | DELETE | Remove file from session |
| `/rag-status` | GET | Check RAG system availability |
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
Distributed under the MIT License. See LICENSE for more information.