ATOM

AI-Powered Pre-Incident Detection & Predictive Reliability Platform

🎯 Problem Statement

Modern infrastructure teams are stuck in reactive mode: they wait for alerts, red dashboards, or user complaints before taking action. Traditional monitoring tools excel at telling you what broke but fail to predict what is about to break.

ATOM shifts the paradigm from reactive alerting to predictive reliability intelligence, enabling teams to prevent incidents before they impact users.

💡 Solution Overview

ATOM is an end-to-end predictive observability platform that:

Collects real-time system metrics (CPU, memory, latency, error rates)
Analyzes trends using statistical methods and anomaly detection
Forecasts future metric behavior using ARIMA time-series models
Predicts risk scores with AI-powered agentic reasoning
Alerts proactively with actionable recommendations

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              ATOM Platform                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                 │
│  │  Prometheus │───▶│   Metrics   │───▶│  Firebase   │                 │
│  │   (Source)  │    │  Collector  │    │  Firestore  │                 │
│  └─────────────┘    └─────────────┘    └──────┬──────┘                 │
│                                               │                         │
│                     ┌─────────────────────────┼─────────────────────┐   │
│                     │                         ▼                     │   │
│  ┌─────────────┐    │  ┌─────────────┐  ┌──────────┐               │   │
│  │   CrewAI    │◀───┼──│  Forecast   │  │ Flutter  │               │   │
│  │  SQL Agent  │    │  │  Pipeline   │  │Dashboard │               │   │
│  └─────────────┘    │  │  (ARIMA)    │  └──────────┘               │   │
│         │           │  └─────────────┘       ▲                     │   │
│         ▼           │         │              │                     │   │
│  ┌─────────────┐    │         └──────────────┘                     │   │
│  │  SQLite DB  │    │       Real-time Updates                      │   │
│  │  (Metrics)  │    └───────────────────────────────────────────────┘   │
│  └─────────────┘                                                        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

✨ Key Features

Feature	Description
📊 Real-time Monitoring	Live metrics visualization with interactive charts
🔮 AI Forecasting	ARIMA-based prediction of CPU, memory, latency & error rates
⚠️ Risk Scoring	Dynamic risk assessment on a 0-100 scale
🤖 Agentic Chat	Natural language queries via CrewAI-powered SQL agent
📈 Trend Analysis	Slope detection for latency, memory & error patterns
🎯 Actionable Insights	Contextual recommendations based on system state
⏱️ Hourly Forecasts	Automated pipeline generating 50-point predictions

🛠️ Tech Stack

Backend

Python 3.10+ - Core runtime
Flask - REST API server
Firebase Admin SDK - Firestore integration
Prometheus Client - Metrics collection
ARIMA (pmdarima) - Time-series forecasting
CrewAI - Agentic AI framework
Groq - LLM inference (Llama 3.3)

Frontend

Flutter 3.0+ - Cross-platform dashboard
fl_chart - Data visualization
Cloud Firestore - Real-time data sync
Google Fonts - Typography

Infrastructure

Firebase Firestore - Metrics & forecast storage
Prometheus - Metrics source (optional)
SQLite - Local metrics database for SQL agent

📁 Project Structure

Atom/
├── server/                    # Backend services
│   ├── app.py                 # Flask API server
│   ├── metrics_collector.py   # Prometheus → Firestore collector
│   ├── forecast_pipeline.py   # ARIMA forecasting engine
│   ├── models/                # Pre-trained ARIMA models
│   │   ├── latency_arima_model.pkl
│   │   ├── cpu_arima_model.pkl
│   │   ├── memory_arima_model.pkl
│   │   ├── error_rate_arima_model.pkl
│   │   └── risk_score_arima_model.pkl
│   └── key.json               # Firebase service account
│
├── sql_agent/                 # CrewAI SQL Agent
│   └── src/sql_agent/
│       ├── main.py            # Agent entry point
│       ├── crew.py            # CrewAI crew definition
│       ├── db.py              # SQLite database interface
│       ├── schema.py          # Database schema loader
│       └── tools/
│           └── custom_tool.py # SQL execution tools
│
├── dashboard/                 # Flutter frontend
│   ├── lib/
│   │   ├── main.dart          # App entry point
│   │   ├── firebase_options.dart
│   │   └── pages/
│   │       ├── dashboard_page.dart  # Main dashboard
│   │       └── analytics_page.dart  # Advanced analytics
│   └── assets/
│       └── logo.png
│
└── data/                      # Data files
    └── metrics.db             # SQLite metrics database

🚀 Getting Started

Prerequisites

Python 3.10+
Flutter 3.0+
Firebase project with Firestore enabled
Groq API key (for LLM features)

1. Clone the Repository

git clone https://github.com/your-team/atom.git
cd atom

2. Backend Setup

cd server

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install flask flask-cors groq firebase-admin prometheus-client numpy pandas pmdarima

# Configure Firebase
# Place your Firebase service account key as key.json

# Set environment variables
export GROQ_API_KEY=your_groq_api_key

# Run the server
python app.py

3. SQL Agent Setup

cd sql_agent

# Install with uv (recommended)
uv sync

# Or with pip
pip install crewai groq

# Set environment variables
export GROQ_API_KEY=your_groq_api_key

# Run the agent
python -m sql_agent.main "What is the average latency?"

4. Dashboard Setup

cd dashboard

# Get Flutter dependencies
flutter pub get

# Configure Firebase
flutterfire configure

# Run the app
flutter run -d chrome  # For web
flutter run -d windows # For desktop

🔧 Configuration

Environment Variables

Variable	Description	Required
`GROQ_API_KEY`	Groq API key for LLM inference	Yes
`PROMETHEUS_URL`	Prometheus server URL	Optional
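
A minimal sketch of how a service might read these two variables at startup. The `AtomConfig` type and `load_config` helper are hypothetical names used for illustration, not taken from the ATOM codebase. Treating an unset `PROMETHEUS_URL` as "no Prometheus source" is an assumption based on the collector's synthetic-data fallback described under How It Works.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AtomConfig:
    """Hypothetical container for the two documented settings."""
    groq_api_key: str
    prometheus_url: Optional[str]  # None means no Prometheus source is configured


def load_config(env: Mapping[str, str] = os.environ) -> AtomConfig:
    """Read settings from the environment, failing early if the required key is missing."""
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required for LLM features")
    # PROMETHEUS_URL is optional; an empty value is treated as unset.
    prometheus_url = env.get("PROMETHEUS_URL", "").strip() or None
    return AtomConfig(groq_api_key=api_key, prometheus_url=prometheus_url)
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.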

Firebase Setup

Create a Firebase project at console.firebase.google.com
Enable Firestore Database
Generate a service account key (Project Settings → Service Accounts)
Save as server/key.json
Run flutterfire configure in the dashboard directory
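
Once the key is in place, the backend can authenticate with the Firebase Admin SDK. The sketch below shows one common way to do that; the `get_firestore_client` helper is a hypothetical name, and the default path simply mirrors the `server/key.json` location mentioned above. How ATOM's own code initializes Firebase may differ.

```python
def get_firestore_client(key_path: str = "server/key.json"):
    """Return a Firestore client authenticated with a service account key file."""
    # Imported lazily so this module still loads where firebase-admin is not installed.
    import firebase_admin
    from firebase_admin import credentials, firestore

    # The default app can only be initialized once per process,
    # so reuse it if it already exists.
    try:
        firebase_admin.get_app()
    except ValueError:
        firebase_admin.initialize_app(credentials.Certificate(key_path))
    return firestore.client()
```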

📊 API Endpoints

Endpoint	Method	Description
`/chat`	POST	Chat with AI assistant
`/metrics`	GET	Fetch latest metrics
`/forecast`	GET	Get latest forecast
`/forecast/run`	POST	Trigger manual forecast

Example: Chat Request

curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the current risk score?"}'
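
The same request can be sent from Python using only the standard library. This is a sketch: the helper names are hypothetical, and because the response schema is not documented here, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request


def build_chat_request(message: str, base_url: str = "http://localhost:5000") -> urllib.request.Request:
    """Build the same POST /chat request as the curl example, without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, base_url: str = "http://localhost:5000", timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running locally, `send_chat("What is the current risk score?")` issues the call shown in the curl example.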

📸 Screenshots

Main Dashboard	Advanced Analytics
Risk Forecasting	AI Assistant

🧪 How It Works

1. Metrics Collection

The MetricsCollector queries Prometheus (or generates synthetic data) every 10 minutes, computing:

Raw metrics: CPU, memory, latency, error rate
Derived metrics: slopes, trends, anomaly flags
Risk score: weighted combination of all factors
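
The derived metrics can be illustrated with a short sketch. The exact methods used by the MetricsCollector are not specified here, so the code below assumes a least-squares slope over recent samples and a z-score test for anomaly flags; both function names and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def is_anomaly(history: Sequence[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag latest if it lies more than z_threshold standard deviations from the history mean."""
    if len(history) < 2:
        return False
    spread = pstdev(history)
    if spread == 0:
        # A perfectly constant history: treat any change as a deviation.
        return latest != history[0]
    return abs(latest - mean(history)) / spread > z_threshold
```

A positive slope on memory samples indicates steady growth, which is the kind of pattern the risk score is meant to pick up before a hard limit is reached.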

2. Forecasting Pipeline

Hourly, the ForecastPipeline:

Loads pre-trained ARIMA models for each metric
Generates 50-point forecasts (8+ hours ahead, assuming 10-minute steps)
Stores predictions in Firestore for real-time dashboard updates
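
A sketch of the load-and-predict step, under stated assumptions: the models are pickled pmdarima ARIMA objects (whose `predict` method accepts `n_periods`), and each forecast step is 10 minutes, matching the collection interval. The `load_model` and `forecast_metric` helpers are hypothetical names; the Firestore write is omitted.

```python
import pickle
from datetime import datetime, timedelta
from typing import List, Tuple


def load_model(path: str):
    """Load a pickled model, e.g. server/models/cpu_arima_model.pkl."""
    # Only unpickle files you trust. Loading a pmdarima model also
    # requires pmdarima to be installed in the environment.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def forecast_metric(model, start: datetime, n_periods: int = 50,
                    step_minutes: int = 10) -> List[Tuple[datetime, float]]:
    """Return (timestamp, predicted value) pairs for the next n_periods steps."""
    predictions = model.predict(n_periods=n_periods)
    return [
        (start + timedelta(minutes=step_minutes * (i + 1)), float(value))
        for i, value in enumerate(predictions)
    ]
```

With these defaults, 50 steps of 10 minutes cover 500 minutes, or roughly 8.3 hours, which is consistent with the "8+ hours ahead" horizon.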

3. Risk Scoring Algorithm

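def compute_risk_score(latency_anomaly, error_rate, memory, memory_slope):
    """Return a 0-100 risk score; each term is capped so the caps sum to 100."""
    risk_score = 0
    risk_score += 30 if latency_anomaly else 0       # latency anomaly flag: 0 or 30
    risk_score += min(error_rate * 10, 30)           # error rate term, capped at 30
    risk_score += min((memory / 100) * 20, 20)       # memory term; dividing by 100 implies a percentage
    risk_score += min(abs(memory_slope) * 10, 20)    # memory trend term, capped at 20
    return min(risk_score, 100)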
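
To see how the individual terms add up, the sketch below returns each capped contribution alongside the total. The `risk_breakdown` helper is a hypothetical addition for illustration, and the sample inputs are invented; only the weights and caps come from the formula above.

```python
from typing import Dict


def risk_breakdown(latency_anomaly: bool, error_rate: float,
                   memory: float, memory_slope: float) -> Dict[str, float]:
    """Return each capped contribution plus the total, using the weights listed above."""
    parts = {
        "latency_anomaly": 30.0 if latency_anomaly else 0.0,
        "error_rate": min(error_rate * 10, 30),
        "memory": min((memory / 100) * 20, 20),
        "memory_slope": min(abs(memory_slope) * 10, 20),
    }
    parts["total"] = min(sum(parts.values()), 100)
    return parts


# Hypothetical inputs chosen only for illustration.
example = risk_breakdown(latency_anomaly=True, error_rate=1.5, memory=80, memory_slope=0.5)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

Because the four caps are 30, 30, 20, and 20, the sum cannot exceed 100, so the final `min(..., 100)` acts as a safeguard rather than a working limit.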

4. Agentic SQL Queries

The CrewAI SQL Agent allows natural language queries:

User: "Show me the top 5 highest latency events"
Agent: SELECT * FROM metrics ORDER BY latency DESC LIMIT 5
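
Since the SQL is generated by an LLM, it is worth restricting what gets executed. The sketch below is one simple way to do that with the standard `sqlite3` module; `run_readonly_query` is a hypothetical helper, and the in-memory table is illustrative (only the `metrics` table name and `latency` column come from the example above).

```python
import sqlite3
from typing import List, Tuple


def run_readonly_query(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """Execute a single SELECT statement and return all rows; reject anything else."""
    statement = sql.strip().rstrip(";")
    # Naive guard: must start with SELECT and contain no further statement separator.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(statement).fetchall()


# Illustrative in-memory data standing in for data/metrics.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (timestamp TEXT, latency REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [("t1", 120.0), ("t2", 950.0), ("t3", 310.0)],
)
rows = run_readonly_query(conn, "SELECT * FROM metrics ORDER BY latency DESC LIMIT 5")
```

The string check is deliberately conservative and would also reject a harmless query containing a semicolon inside a literal. Opening the database file in read-only mode is a stronger complement to this kind of filter.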

👥 Team

Member	Role
Jagavantha PA	Full Stack Developer
Pranov JB & Karunakaran M	ML Engineers
Yuva Krishna I	Frontend Developer

🏆 BeachHack 2025

This project was built during the BeachHack 2025 hackathon.

Problem Statement: Pre-Incident Detection AI
Track: AI/ML & DevOps

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ for predictive reliability
Shifting from reactive alerts to proactive intelligence
