ODevStudio/beeai-analyst

🛡️ BeeAI Local Analyst

A high-performance, privacy-focused AI agent built with FastAPI and the BeeAI Framework. This tool performs autonomous security research and analysis using local LLMs (via Ollama) and Wikipedia, streaming real-time thought processes and final reports to a responsive web dashboard.

🚀 Key Features

  • 🔒 Local LLM Processing: All AI inference runs on your hardware using Ollama. Note: Search tools (DuckDuckGo, Wikipedia) send queries to external services.
  • 🧠 Agentic Workflow: Uses the BeeAI Framework to Plan (ThinkTool) > Research (DuckDuckGoSearchTool, WikipediaTool, OpenMeteoTool) > Synthesize.
  • ⚡ High Performance:
    • Built on FastAPI & Uvicorn for async concurrency.
    • Uses orjson for ultra-fast JSON serialization (Rust-based).
    • Server-Sent Events (SSE) for low-latency streaming.
  • 🛡️ Resource Management: Implements Async Semaphores to prevent GPU OOM (Out of Memory) errors by queuing concurrent agent requests.
  • 💻 Integrated Dashboard: Clean, responsive HTML/JS frontend included. No separate build step required.
  • 📄 RAG Document Upload (Optional): Upload documents (PDF, DOCX, PPTX, images, etc.) for context-aware conversations using docling and ChromaDB.
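
The semaphore-based queuing mentioned under Resource Management can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_agent` and `demo` are hypothetical names, and the `asyncio.sleep` call stands in for a real agent run.

```python
import asyncio


async def run_agent(name: str, gpu_semaphore: asyncio.Semaphore, log: list[str]) -> None:
    """Hypothetical stand-in for one GPU-bound agent request."""
    async with gpu_semaphore:       # waits here while another agent holds the GPU
        log.append(f"start {name}")
        await asyncio.sleep(0.01)   # simulated inference
        log.append(f"end {name}")


async def demo() -> list[str]:
    gpu_semaphore = asyncio.Semaphore(1)  # one active agent at a time
    log: list[str] = []
    # Three requests arrive together; the semaphore serializes them.
    await asyncio.gather(*(run_agent(n, gpu_semaphore, log) for n in ("a", "b", "c")))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a limit of 1, every "start" entry is followed by its own "end" entry before the next request begins, so only one model run occupies VRAM at a time.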

🛠️ Prerequisites

  • Python 3.10+
  • Ollama installed and running locally (Download Ollama).
  • Hardware: A GPU with at least 8GB VRAM is recommended (24GB for larger models and context windows).

📦 Installation

  1. Clone the repository:

    git clone https://github.com/odevstudio/beeai-analyst.git
    cd beeai-analyst
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install core dependencies:

    pip install fastapi uvicorn orjson beeai-framework "beeai-framework[duckduckgo]"

    If you run into trouble, pin the versions used in this project:

    pip install orjson==3.11.5 fastapi==0.128.0 uvicorn==0.40.0 beeai-framework==0.1.74 "beeai-framework[duckduckgo]==0.1.74"
  4. Install RAG dependencies (optional - for document upload):

    pip install docling chromadb langchain-text-splitters openai

    These enable the document upload feature. The app will work without them (file upload will be disabled).

🤖 Model Setup (Ollama)

The agent is configured to use a custom model named gemma-agent via Ollama's OpenAI-compatible endpoint. You need to create this model in Ollama so that the system prompt works correctly. This project was developed with gemma-3-27b-it-q4_k_m.gguf.

Chat Model

  1. Pull the base model:

    ollama pull yourmodel
  2. Use a Modelfile: Create a file named Modelfile in your model folder (an example for gemma-3-27b-it-q4_k_m.gguf is included).

  3. Create the custom model:

    ollama create gemma-agent -f Modelfile

Embedding Model (Required for RAG/Document Upload)

If you want to use the document upload feature, you need the nomic-embed-text embedding model:

ollama pull nomic-embed-text

This model is used to generate embeddings for document chunks and semantic search queries.
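
As a rough sketch of what an embedding call looks like, the snippet below builds a request for Ollama's native `/api/embeddings` endpoint using only the standard library. It assumes Ollama listens on its default address (`http://localhost:11434`); the function names are illustrative, and the project itself may call Ollama differently (for example through the `openai` package).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address
EMBEDDING_MODEL = "nomic-embed-text"


def build_embedding_request(text: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": EMBEDDING_MODEL, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request and return the vector (requires a running Ollama)."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as resp:
        return json.loads(resp.read())["embedding"]
```

Calling `embed("some chunk of text")` with Ollama running and the model pulled should return a list of floats.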

▶️ Usage

  1. Start the Server:

    python main.py

    Note: Ensure Ollama is running in the background (ollama serve).

  2. Access the Dashboard: Open your browser and navigate to: http://localhost:5000

  3. Start an Analysis: Type a query like: "Analyze the security risks of Quantum Computing for banking encryption" and hit Start.

  4. Upload Documents (Optional): Use the file upload section in the dashboard to upload documents. The agent will use their content to inform its responses.

📄 RAG Document Upload

The application supports uploading documents for Retrieval-Augmented Generation (RAG). When you upload documents:

  1. Text Extraction: Documents are processed using docling to extract text from various formats.
  2. Chunking: Text is split into overlapping chunks (500 chars, 50 char overlap) using LangChain's text splitter.
  3. Embedding: Chunks are embedded using the nomic-embed-text model via Ollama.
  4. Storage: Embeddings are stored in a session-based ChromaDB vector store.
  5. Retrieval: When you ask a question, the top 5 most relevant chunks are retrieved and injected into the agent's context.
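
To make the chunking step concrete, here is a simplified fixed-width splitter using the sizes listed above (500 characters, 50 overlap). It is only an approximation: LangChain's text splitters also try to break on separators such as paragraphs and sentences, which this hypothetical `chunk_text` helper does not do.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this far after the previous one
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.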

Supported File Formats

| Format | Extensions |
|---|---|
| PDF | .pdf |
| Word Documents | .doc, .docx |
| PowerPoint | .pptx |
| Excel | .xlsx |
| HTML | .html |
| Markdown | .md |
| Plain Text | .txt |
| CSV | .csv |
| Images (OCR) | .png, .jpg, .jpeg |

RAG Visibility in Live Terminal

When uploaded documents are consulted, you will see a [RAG] entry in the Live Terminal showing:

  • Number of document chunks retrieved
  • Source files and their relevance scores

🏗️ Architecture

The application follows a modern asynchronous architecture:

  1. Frontend: Sends a POST request to /stream.
  2. FastAPI Endpoint:
    • Acquires a GPU Semaphore (locks execution to 1 active agent to save VRAM).
    • Spawns an asynchronous background task.
  3. RAG Context Retrieval (if documents uploaded):
    • Embeds the query using nomic-embed-text.
    • Retrieves top-k relevant chunks from ChromaDB.
    • Injects context into the agent's system instructions.
  4. BeeAI Agent:
    • Receives the prompt (with RAG context if available).
    • ThinkTool: Plans the research steps.
    • DuckDuckGoSearchTool: Searches the web.
    • WikipediaTool: Fetches factual data.
    • OpenMeteoTool: Fetches weather data when the query calls for it.
    • LLM: Synthesizes findings into a structured report.
  5. Streaming:
    • Events are serialized immediately using orjson.
    • Data is pushed to an asyncio.Queue.
    • The frontend receives events via Server-Sent Events (SSE) and updates the UI in real-time.
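
The streaming path described above (background task, queue, SSE frames) can be sketched with `asyncio` alone. This is an assumption-laden illustration: `produce`, `event_stream`, and the event shape are hypothetical, and the standard-library `json` module stands in for orjson to keep the example dependency-free.

```python
import asyncio
import json
from typing import AsyncIterator

_DONE = object()  # sentinel marking the end of the stream


async def produce(queue: asyncio.Queue, gpu_semaphore: asyncio.Semaphore) -> None:
    """Hypothetical agent task: serializes events and pushes them to the queue."""
    async with gpu_semaphore:  # only one agent may run at a time
        try:
            for step in ("thinking", "searching", "report"):
                await queue.put(json.dumps({"type": step}))  # serialize immediately
        finally:
            await queue.put(_DONE)  # always signal completion, even on error


async def event_stream(gpu_semaphore: asyncio.Semaphore) -> AsyncIterator[str]:
    """Yield Server-Sent Events frames as the background task produces them."""
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(produce(queue, gpu_semaphore))
    try:
        while True:
            item = await queue.get()
            if item is _DONE:
                break
            yield f"data: {item}\n\n"  # one SSE frame per event
    finally:
        await task  # let the producer finish and release the semaphore
```

In a FastAPI endpoint, a generator like this would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")`.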

⚙️ Configuration

You can adjust the following variables in beefast.py to fit your hardware:

Core Settings

| Variable | Default | Description |
|---|---|---|
| OLLAMA_NUM_CTX | 8192 | Context window size. Reduce to 4096 if you have low VRAM. |
| gpu_semaphore | 1 | Number of concurrent agents allowed. Increase if you have multiple GPUs. |
| ChatModel | gemma-agent | Change this string to use Llama3 or Mistral (openai:llama3, etc.). |

RAG Settings

| Variable | Default | Description |
|---|---|---|
| EMBEDDING_MODEL | nomic-embed-text | Ollama model for generating embeddings. |
| CHUNK_SIZE | 500 | Size of text chunks in characters. |
| CHUNK_OVERLAP | 50 | Overlap between consecutive chunks, in characters. |
| TOP_K_RESULTS | 5 | Number of relevant chunks to retrieve per query. |
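
To illustrate what TOP_K_RESULTS controls, the sketch below ranks stored chunks by cosine similarity to a query vector and keeps the best k. ChromaDB does this internally with its own index and configured distance metric, so the helper names and the choice of cosine similarity here are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 5,
) -> list[tuple[str, float]]:
    """Return the k (text, score) pairs most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A larger k gives the agent more context per query but consumes more of the context window set by OLLAMA_NUM_CTX.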

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Main dashboard UI |
| /stream | POST | Start agent analysis (SSE stream) |
| /upload | POST | Upload document for RAG |
| /files/{session_id} | GET | List uploaded files for session |
| /files/{session_id}/{filename} | DELETE | Remove file from session |
| /rag-status | GET | Check RAG system availability |
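
A script can consume the /stream endpoint without the dashboard. The sketch below is a guess at a minimal client: the request body field (`prompt`) and the assumption that each event arrives as a single-line JSON `data:` payload are not confirmed by this README, so check the server code before relying on them.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE byte stream."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if line.startswith("data:"):  # skip blank separators and comments
            yield json.loads(line[len("data:"):].strip())


def stream_analysis(prompt: str, base_url: str = "http://localhost:5000") -> Iterator[dict]:
    """POST a query to /stream and yield events as they arrive."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # field name is an assumption
    req = urllib.request.Request(
        f"{base_url}/stream",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        yield from parse_sse(resp)  # the response object iterates line by line
```

With the server running, `for event in stream_analysis("your query"): print(event)` would print each event as it is received.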

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project.
  2. Create your feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

📄 License

Distributed under the MIT License. See LICENSE for more information.
