Modern infrastructure teams are stuck in reactive mode, waiting for alerts, red dashboards, or user complaints before taking action. Traditional monitoring tools excel at telling you what broke but fail at predicting what is about to break.
ATOM shifts the paradigm from reactive alerting to predictive reliability intelligence, enabling teams to prevent incidents before they impact users.
ATOM is an end-to-end predictive observability platform that:
- Collects real-time system metrics (CPU, memory, latency, error rates)
- Analyzes trends using statistical methods and anomaly detection
- Forecasts future metric behavior using ARIMA time-series models
- Predicts risk scores with AI-powered agentic reasoning
- Alerts proactively with actionable recommendations
ATOM Platform architecture:
- Prometheus (source) → Metrics Collector → Firebase Firestore
- Firebase Firestore → Forecast Pipeline (ARIMA)
- Firebase Firestore → Flutter Dashboard (real-time updates)
- CrewAI SQL Agent → SQLite DB (metrics)
| Feature | Description |
|---|---|
| Real-time Monitoring | Live metrics visualization with interactive charts |
| AI Forecasting | ARIMA-based prediction of CPU, memory, latency & error rates |
| Risk Scoring | Dynamic risk assessment on a 0-100 scale |
| Agentic Chat | Natural language queries via CrewAI-powered SQL agent |
| Trend Analysis | Slope detection for latency, memory & error patterns |
| Actionable Insights | Contextual recommendations based on system state |
| Hourly Forecasts | Automated pipeline generating 50-point predictions |
- Python 3.10+ - Core runtime
- Flask - REST API server
- Firebase Admin SDK - Firestore integration
- Prometheus Client - Metrics collection
- ARIMA (pmdarima) - Time-series forecasting
- CrewAI - Agentic AI framework
- Groq - LLM inference (Llama 3.3)
- Flutter 3.0+ - Cross-platform dashboard
- fl_chart - Data visualization
- Cloud Firestore - Real-time data sync
- Google Fonts - Typography
- Firebase Firestore - Metrics & forecast storage
- Prometheus - Metrics source (optional)
- SQLite - Local metrics database for SQL agent
Atom/
├── server/                          # Backend services
│   ├── app.py                       # Flask API server
│   ├── metrics_collector.py         # Prometheus → Firestore collector
│   ├── forecast_pipeline.py         # ARIMA forecasting engine
│   ├── models/                      # Pre-trained ARIMA models
│   │   ├── latency_arima_model.pkl
│   │   ├── cpu_arima_model.pkl
│   │   ├── memory_arima_model.pkl
│   │   ├── error_rate_arima_model.pkl
│   │   └── risk_score_arima_model.pkl
│   └── key.json                     # Firebase service account
│
├── sql_agent/                       # CrewAI SQL Agent
│   └── src/sql_agent/
│       ├── main.py                  # Agent entry point
│       ├── crew.py                  # CrewAI crew definition
│       ├── db.py                    # SQLite database interface
│       ├── schema.py                # Database schema loader
│       └── tools/
│           └── custom_tool.py       # SQL execution tools
│
├── dashboard/                       # Flutter frontend
│   ├── lib/
│   │   ├── main.dart                # App entry point
│   │   ├── firebase_options.dart
│   │   └── pages/
│   │       ├── dashboard_page.dart  # Main dashboard
│   │       └── analytics_page.dart  # Advanced analytics
│   └── assets/
│       └── logo.png
│
└── data/                            # Data files
    └── metrics.db                   # SQLite metrics database
- Python 3.10+
- Flutter 3.0+
- Firebase project with Firestore enabled
- Groq API key (for LLM features)
git clone https://github.com/your-team/atom.git
cd atom

cd server
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install flask flask-cors groq firebase-admin prometheus-client numpy pandas pmdarima
# Configure Firebase
# Place your Firebase service account key as key.json
# Set environment variables
export GROQ_API_KEY=your_groq_api_key
# Run the server
python app.py

cd sql_agent
# Install with uv (recommended)
uv sync
# Or with pip
pip install crewai groq
# Set environment variables
export GROQ_API_KEY=your_groq_api_key
# Run the agent
python -m sql_agent.main "What is the average latency?"

cd dashboard
# Get Flutter dependencies
flutter pub get
# Configure Firebase
flutterfire configure
# Run the app
flutter run -d chrome # For web
flutter run -d windows # For desktop

| Variable | Description | Required |
|---|---|---|
| `GROQ_API_KEY` | Groq API key for LLM inference | Yes |
| `PROMETHEUS_URL` | Prometheus server URL | Optional |
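
As a minimal sketch of how a service could read these two variables, the helper below treats `GROQ_API_KEY` as mandatory and `PROMETHEUS_URL` as optional. The function name `load_config` and the returned dictionary keys are hypothetical and are not taken from the ATOM codebase.

```python
import os


def load_config(env=None):
    """Read ATOM settings from environment variables (illustrative helper)."""
    env = os.environ if env is None else env

    # GROQ_API_KEY is marked as required in the table above.
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required for LLM features")

    # PROMETHEUS_URL is optional; None means no Prometheus server is configured.
    prometheus_url = env.get("PROMETHEUS_URL") or None

    return {"groq_api_key": api_key, "prometheus_url": prometheus_url}
```

Failing fast on a missing key at startup tends to be easier to debug than a failed LLM call later in a request.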
- Create a Firebase project at console.firebase.google.com
- Enable Firestore Database
- Generate a service account key (Project Settings → Service Accounts)
- Save as `server/key.json`
- Run `flutterfire configure` in the dashboard directory
| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Chat with AI assistant |
| `/metrics` | GET | Fetch latest metrics |
| `/forecast` | GET | Get latest forecast |
| `/forecast/run` | POST | Trigger manual forecast |
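
The `/chat` endpoint in the table can also be called from Python. The sketch below only prepares the request with the standard library. The helper name `build_chat_request` is hypothetical, the default base URL assumes the Flask server is running locally on port 5000, and the response format is not documented here, so the sketch does not parse it.

```python
import json
import urllib.request


def build_chat_request(message, base_url="http://localhost:5000"):
    """Prepare a POST /chat request; the caller decides when to send it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request once the server is up.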
curl -X POST http://localhost:5000/chat \
-H "Content-Type: application/json" \
-d '{"message": "What is the current risk score?"}'

The MetricsCollector queries Prometheus (or generates synthetic data) every 10 minutes, computing:
- Raw metrics: CPU, memory, latency, error rate
- Derived metrics: slopes, trends, anomaly flags
- Risk score: weighted combination of all factors
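
The README does not say how slopes and anomaly flags are calculated. As one plausible sketch, a slope can be taken as the least-squares trend over a recent window, and an anomaly flag as a z-score test against recent history. The function names, the window contents, and the threshold of 3 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def is_anomaly(history, latest, z_threshold=3.0):
    """Flag latest if it lies more than z_threshold std devs from the history mean."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = pstdev(history)
    if sigma == 0:
        return latest != history[0]  # flat history: any change counts
    return abs(latest - mean(history)) / sigma > z_threshold
```

With samples taken every 10 minutes, a slope of 1.0 on a memory series would mean memory grows by one unit per collection cycle.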
Hourly, the ForecastPipeline:
- Loads pre-trained ARIMA models for each metric
- Generates 50-point forecasts (8+ hours ahead)
- Stores predictions in Firestore for real-time dashboard updates
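
The forecast step can be sketched as follows. Fitted pmdarima models expose `predict(n_periods=...)`, so the helper accepts any object with that method. The helper name, the output record shape, and the 10-minute spacing between forecast points (chosen to match the collector interval) are assumptions. `ConstantModel` is only a stand-in so the example runs without a trained model.

```python
from datetime import datetime, timedelta, timezone


def build_forecast(model, start, n_points=50, step_minutes=10):
    """Return [{timestamp, value}] for the next n_points steps after start."""
    values = model.predict(n_periods=n_points)
    return [
        {
            "timestamp": (start + timedelta(minutes=step_minutes * (i + 1))).isoformat(),
            "value": float(v),
        }
        for i, v in enumerate(values)
    ]


class ConstantModel:
    """Stand-in for a fitted model; only mimics predict(n_periods=...)."""

    def __init__(self, value):
        self.value = value

    def predict(self, n_periods):
        return [self.value] * n_periods


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
points = build_forecast(ConstantModel(42.0), start)
horizon = datetime.fromisoformat(points[-1]["timestamp"]) - start
print(len(points), horizon)  # 50 8:20:00
```

Under the 10-minute assumption, 50 points cover 500 minutes, which is consistent with the "8+ hours ahead" figure.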
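# Illustrative wrapper: the function name and signature are assumptions.
def compute_risk_score(latency_anomaly, error_rate, memory, memory_slope):
    risk_score = 0
    risk_score += 30 if latency_anomaly else 0      # flat 30 points for a latency anomaly
    risk_score += min(error_rate * 10, 30)          # error rate, capped at 30
    risk_score += min((memory / 100) * 20, 20)      # memory usage, capped at 20
    risk_score += min(abs(memory_slope) * 10, 20)   # memory trend magnitude, capped at 20
    return min(risk_score, 100)                     # final score capped at 100

The CrewAI SQL Agent allows natural language queries: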
User: "Show me the top 5 highest latency events"
Agent: SELECT * FROM metrics ORDER BY latency DESC LIMIT 5
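
To illustrate the last step of that exchange, the sketch below runs the generated SQL against an in-memory SQLite database. The `metrics` table and `latency` column come from the example above. The `timestamp` column, the sample rows, and the `run_read_only` helper are hypothetical. The SELECT-only check is a deliberately simple guard for the demo, not a description of how the agent's tools validate queries.

```python
import sqlite3


def run_read_only(conn, sql):
    """Execute a single SELECT statement and return rows as dicts."""
    statement = sql.strip().rstrip(";")
    # Conservative guard: reject anything that is not one SELECT statement.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(statement)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (timestamp TEXT, latency REAL)")
samples = [120, 480, 95, 310, 730, 205, 150]  # made-up latency values
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [(f"2025-01-01T00:{m:02d}:00", float(lat)) for m, lat in enumerate(samples)],
)

rows = run_read_only(conn, "SELECT * FROM metrics ORDER BY latency DESC LIMIT 5")
print([r["latency"] for r in rows])  # [730.0, 480.0, 310.0, 205.0, 150.0]
```

Restricting LLM-generated SQL to read-only statements is a common precaution, since the model's output is not guaranteed to be a harmless query.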
| Member | Role |
|---|---|
| Jagavantha PA | Full Stack Developer |
| Pranov JB & Karunakaran M | ML Engineer |
| Yuva Krishna I | Frontend Developer |
This project was built during the BeachHack 2025 hackathon.
Problem Statement: Pre-Incident Detection AI
Track: AI/ML & DevOps
This project is licensed under the MIT License - see the LICENSE file for details.
Shifting from reactive alerts to proactive intelligence




