# Agentic AI Insight System

### Problem Statement
Organizations accumulate vast amounts of structured and unstructured data, but extracting actionable business insights remains slow, manual, and error-prone. There is a need for an automated, scalable, and intelligent system that can process diverse data sources, perform advanced analytics (KPI extraction, trend and anomaly detection), and deliver clear, actionable recommendations to business users in natural language.

### Project Objective
- Automate the end-to-end analytics workflow from data ingestion to insight delivery using a multi-agent architecture.

- Enable business users to upload data and receive actionable insights (KPI, trends, anomalies, recommendations) via a simple UI or chatbot.

- Leverage advanced AI (LLMs/NLP) for natural language summaries, explanations, and user Q&A.

- Support multiple data types (CSV, SQL, images) and scale to various business domains.

### Project Index
#### 1. Intro

#### 2. Problem Statement and Objectives

#### 3. SIPOC Analysis

#### 4. System Architecture Overview

#### 5. Core Agent Implementations

#### 6. Data Source Agent

#### 7. Data Preprocessor Agent

#### 8. Multi-Agent Processing Hub (KPI, Trend, Anomaly, Insight Writer)

#### 9. Insight Delivery Agent

#### 10. End-to-End Workflow 


### SIPOC Table

### System Architecture

![Architechture.png](attachment:fa6ccf63-3f92-4686-ba3a-578cbe2175d5.png)

### What I Have Done
- Designed and implemented a modular, agent-based analytics pipeline.

- Enabled ingestion and preprocessing of multiple data types.

- Developed specialized agents for KPI extraction, trend detection, and anomaly summarization.

- Integrated LLMs (Perplexity API) for natural language explanations and recommendations.

- Built an interactive UI for file upload, insight streaming, and user Q&A.

### Setup and Configuration

This section installs and imports the libraries the notebook relies on, configures logging, and makes the Perplexity API key available to the agents defined later.

In [None]:
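# Install the required libraries (openai and seaborn are imported below; openpyxl is needed by pd.read_excel for .xlsx files)
!pip install streamlit gradio python-magic pandas numpy matplotlib seaborn openai openpyxl python-dotenv google-generativeai PyPDF2 Pillow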



In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import os
import magic
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI
import logging
import json
from typing import Dict, List, Tuple, Any, Optional

# Configure logging
logging.basicConfig(level=logging.INFO, 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

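# Read the Perplexity API key from the environment or a local .env file.
# Do not hardcode secrets in the notebook.
load_dotenv()
if not os.getenv("PERPLEXITY_API_KEY"):
    logger.warning("PERPLEXITY_API_KEY is not set; LLM-backed features will be unavailable.")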


### Configuration Management

This code creates a configuration setup that:
- Loads secret keys (like an API key) from a .env file
- Allows only specific file types (like .csv, .jpg, etc.)
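
As a minimal sketch of how these settings could be used to screen an upload before it reaches the agents: the helper `is_allowed` below is hypothetical (it is not part of the notebook), and the constants simply mirror the values in `Config`.

```python
from pathlib import Path

# Values mirror the Config class; the helper itself is an illustrative assumption
ALLOWED_TYPES = {'.csv', '.xlsx', '.jpg', '.jpeg', '.png', '.sql'}
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB


def is_allowed(file_name: str, size_bytes: int) -> bool:
    """Return True when the extension is allowed and the size is within the limit."""
    extension = Path(file_name).suffix.lower()  # case-insensitive comparison
    return extension in ALLOWED_TYPES and size_bytes <= MAX_FILE_SIZE


print(is_allowed("report.CSV", 2048))   # allowed extension, small file
print(is_allowed("notes.txt", 2048))    # extension not in the allow-list
```

Lower-casing the suffix keeps the check consistent with `DataSourceAgent.load_file`, which also compares lower-cased extensions.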


In [None]:
# Configuration class to manage settings
class Config:
    def __init__(self):
        load_dotenv()
        self.ALLOWED_TYPES = ['.csv', '.xlsx', '.jpg', '.jpeg', '.png', '.sql']
        self.MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
        self.PERPLEXITY_API_KEY = os.getenv("PERPLEXITY_API_KEY")
        
config = Config()


### Data Source Agent

In [None]:
class DataSourceAgent:
    """Handles loading and initial parsing of different file types"""
    
    def load_file(self, file_path: str) -> Tuple[Any, str]:
        """Load file based on its extension"""
        extension = Path(file_path).suffix.lower()
        
        try:
            if extension in ['.csv']:
                data = pd.read_csv(file_path)
                return data, "csv"
            elif extension in ['.xlsx', '.xls']:
                data = pd.read_excel(file_path)
                return data, "excel"
            elif extension in ['.jpg', '.jpeg', '.png']:
                # For images, return the file path for now
                return file_path, "image"
            elif extension == '.sql':
                # Read SQL file as text for now
                with open(file_path, 'r') as f:
                    sql_content = f.read()
                return sql_content, "sql"
            else:
                raise ValueError(f"Unsupported file type: {extension}")
        except Exception as e:
            logger.error(f"Error loading file: {str(e)}")
            raise RuntimeError(f"Failed to load file: {str(e)}")


### Data Preprocessor Agent

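The DataSourceAgent above detects the file type from its extension and loads it accordingly, supporting CSV, Excel, image, and SQL files, so each data source can be handled without writing separate loading code. The DataPreprocessorAgent below then cleans tabular data: it drops columns with more than 50% missing values, fills the remaining gaps with the median (numeric columns) or the mode (other columns), and derives month, year, and day features from columns whose names contain "date" or "time". Image and SQL preprocessing are placeholders for now.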
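
As a minimal sketch of the cleaning rules the preprocessor applies (drop columns that are more than 50% missing, then fill numeric gaps with the median and other gaps with the mode), here they are on a toy DataFrame. The data and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "notes" is 75% missing, the other columns 25%
toy = pd.DataFrame({
    "age": [30.0, np.nan, 40.0, 50.0],
    "dept": ["HR", "HR", None, "IT"],
    "notes": [None, None, None, "x"],
})

# Keep only columns where at most half of the values are missing
keep = toy.columns[toy.isna().mean() <= 0.5]
cleaned = toy[keep].copy()

# Numeric columns get the median, everything else the most frequent value
for col in cleaned.columns:
    if pd.api.types.is_numeric_dtype(cleaned[col]):
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
    else:
        cleaned[col] = cleaned[col].fillna(cleaned[col].mode()[0])

print(cleaned)
```

The median is used for numeric columns because it is less sensitive to extreme values than the mean.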

In [None]:
class DataPreprocessorAgent:
    """Handles data cleaning, feature engineering, and aggregation"""
    
    def preprocess(self, data: Any, data_type: str) -> Any:
        """Preprocess data based on its type"""
        if data_type in ["csv", "excel"]:
            return self._preprocess_dataframe(data)
        elif data_type == "image":
            return self._preprocess_image(data)
        elif data_type == "sql":
            return self._preprocess_sql(data)
        else:
            raise ValueError(f"Unsupported data type for preprocessing: {data_type}")
    
    def _preprocess_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
        """Preprocess pandas DataFrame"""
        # Basic cleaning
        df_cleaned = df.copy()
        
        # Handle missing values
        for col in df_cleaned.columns:
            missing_pct = df_cleaned[col].isna().mean()
            if missing_pct > 0.5:
                # Drop columns with more than 50% missing values
                df_cleaned = df_cleaned.drop(columns=[col])
            elif df_cleaned[col].dtype in ['int64', 'float64']:
                # Fill numeric columns with median
                df_cleaned[col] = df_cleaned[col].fillna(df_cleaned[col].median())
            else:
                # Fill categorical/text columns with mode
                df_cleaned[col] = df_cleaned[col].fillna(df_cleaned[col].mode()[0] 
                                                        if not df_cleaned[col].mode().empty 
                                                        else "Unknown")
        
        # Feature engineering (basic example)
        # Add date-related features if datetime columns exist
        date_cols = [col for col in df_cleaned.columns 
                    if 'date' in col.lower() or 'time' in col.lower()]
        for date_col in date_cols:
            try:
                df_cleaned[date_col] = pd.to_datetime(df_cleaned[date_col])
                df_cleaned[f'{date_col}_month'] = df_cleaned[date_col].dt.month
                df_cleaned[f'{date_col}_year'] = df_cleaned[date_col].dt.year
                df_cleaned[f'{date_col}_day'] = df_cleaned[date_col].dt.day
            except Exception:
                pass  # Skip columns that cannot be parsed as dates
        
        return df_cleaned
    
    def _preprocess_image(self, image_path: str) -> str:
        """Preprocess image (placeholder)"""
        # In a real implementation, this might use computer vision libraries
        return f"Preprocessed image at {image_path}"
    
    def _preprocess_sql(self, sql_content: str) -> str:
        """Preprocess SQL (placeholder)"""
        # In a real implementation, this might validate/optimize SQL
        return f"Preprocessed SQL with {len(sql_content)} characters"


### KPI Agent

The KPIAgent automatically extracts important metrics (KPIs) like row count, missing values, and statistics (mean, median, etc.) from data files. It currently supports structured data like CSV or Excel files. This helps in quickly understanding data quality and performance indicators without manual analysis.
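
To make the kind of output concrete, here is a small sketch of the same metrics computed on a toy DataFrame. The data is invented, and the attrition rate assumes a column holding "Yes"/"No" values.

```python
import pandas as pd

# Toy HR-style data (hypothetical values)
toy = pd.DataFrame({
    "salary": [100, 200, 300, 400],
    "Attrition": ["Yes", "No", "No", "No"],
})

kpis = {
    "row_count": len(toy),
    # Mean of the per-column missing fractions, as a percentage
    "missing_values_pct": float(toy.isna().mean().mean() * 100),
    "salary_mean": float(toy["salary"].mean()),
    "salary_median": float(toy["salary"].median()),
    # Share of rows flagged "Yes", as a percentage
    "attrition_rate": float((toy["Attrition"] == "Yes").mean() * 100),
}
print(kpis)
```

Casting to `float` keeps the dictionary free of NumPy scalar types, which makes it easier to serialize later.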

In [None]:
class KPIAgent:
    """Extracts Key Performance Indicators from data"""
    
    def extract_kpis(self, data: Any, data_type: str) -> Dict:
        """Extract KPIs from data based on its type"""
        if data_type in ["csv", "excel"]:
            return self._extract_dataframe_kpis(data)
        elif data_type == "image":
            return {"message": "KPI extraction from images not implemented"}
        elif data_type == "sql":
            return {"message": "KPI extraction from SQL not implemented"}
        else:
            raise ValueError(f"Unsupported data type for KPI extraction: {data_type}")
    
    def _extract_dataframe_kpis(self, df: pd.DataFrame) -> Dict:
        """Extract KPIs from pandas DataFrame"""
        kpis = {
            "row_count": len(df),
            "column_count": len(df.columns),
            "missing_values_pct": df.isna().mean().mean() * 100,
        }
        
        # Extract numeric KPIs
        numeric_cols = df.select_dtypes(include=['number']).columns
        if len(numeric_cols) > 0:
            kpis["numeric_columns"] = {}
            for col in numeric_cols:
                kpis["numeric_columns"][col] = {
                    "mean": df[col].mean(),
                    "median": df[col].median(),
                    "std": df[col].std(),
                    "min": df[col].min(),
                    "max": df[col].max()
                }
        
        # Look for specific KPIs based on column names
        col_lower = [col.lower() for col in df.columns]
        
        # Attrition rate (if relevant columns exist)
        if any(kw in ' '.join(col_lower) for kw in ['attrition', 'churn', 'terminated']):
            attrition_col = next((col for col in df.columns 
                                if any(kw in col.lower() 
                                      for kw in ['attrition', 'churn', 'terminated'])), None)
            if attrition_col:
                # Assuming binary values where 1/True/Yes indicates attrition
                attrition_map = {'Yes': 1, 'No': 0, True: 1, False: 0, 1: 1, 0: 0}
                if df[attrition_col].dtype == 'object':
                    kpis["attrition_rate"] = df[attrition_col].map(
                        lambda x: attrition_map.get(x, 0) 
                        if pd.notna(x) else 0).mean() * 100
                else:
                    kpis["attrition_rate"] = df[attrition_col].mean() * 100
        
        return kpis


### Trend Agent

The TrendAgent helps detect important patterns over time, like month-over-month changes in key metrics. It also finds strong correlations between numeric columns in the data. This makes it easier to spot trends and relationships that may impact business performance or decision-making.
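
The month-over-month check can be illustrated on a toy series. This is a sketch with invented data; the 10% threshold matches the one used by the agent.

```python
import pandas as pd

# Toy sales data (hypothetical): two rows in January, one each in February and March
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-10", "2024-01-20", "2024-02-05", "2024-03-15"]),
    "sales": [100.0, 100.0, 150.0, 155.0],
})

# Average per calendar month, then percentage change against the previous month
monthly_mean = toy.groupby(toy["date"].dt.to_period("M"))["sales"].mean()
pct_change = monthly_mean.pct_change() * 100

# Keep only months that moved by more than 10% in either direction
significant = pct_change[pct_change.abs() > 10]
changes = {str(period): round(change, 1) for period, change in significant.items()}
print(changes)
```

Here only February is reported: the monthly mean rises from 100 to 150, while the move from February to March stays below the threshold.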

In [None]:
class TrendAgent:
    """Detects trends in data"""
    
    def detect_trends(self, data: Any, data_type: str) -> Dict:
        """Detect trends in data based on its type"""
        if data_type in ["csv", "excel"]:
            return self._detect_dataframe_trends(data)
        elif data_type == "image":
            return {"message": "Trend detection from images not implemented"}
        elif data_type == "sql":
            return {"message": "Trend detection from SQL not implemented"}
        else:
            raise ValueError(f"Unsupported data type for trend detection: {data_type}")
    
    def _detect_dataframe_trends(self, df: pd.DataFrame) -> Dict:
        """Detect trends in pandas DataFrame"""
        trends = {}
        
        # Look for time-based columns for time series analysis
        date_cols = [col for col in df.columns if 'date' in col.lower() or 'time' in col.lower()]
        
        if date_cols:
            trends["time_series"] = {}
            for date_col in date_cols:
                try:
                    df[date_col] = pd.to_datetime(df[date_col])
                    
                    # Check for other numeric columns to analyze over time
                    numeric_cols = df.select_dtypes(include=['number']).columns
                    
                    for num_col in numeric_cols:
                        # Group by month and calculate statistics
                        monthly_data = df.groupby(df[date_col].dt.to_period("M"))[num_col].agg(['mean', 'count'])
                        
                        # Calculate month-over-month percentage change
                        pct_change = monthly_data['mean'].pct_change() * 100
                        
                        # Identify significant changes (>10% change)
                        significant_changes = pct_change[abs(pct_change) > 10]
                        
                        if not significant_changes.empty:
                            trends["time_series"][f"{num_col}_by_{date_col}"] = {
                                "significant_changes": {
                                    str(period): change for period, change in significant_changes.items()
                                }
                            }
                except Exception:
                    pass  # Skip columns that cannot be parsed as dates
        
        # Detect correlations between numeric columns
        numeric_df = df.select_dtypes(include=['number'])
        if len(numeric_df.columns) > 1:
            corr_matrix = numeric_df.corr()
            
            # Find strong correlations (absolute value > 0.7)
            strong_corrs = []
            for i in range(len(corr_matrix.columns)):
                for j in range(i):
                    if abs(corr_matrix.iloc[i, j]) > 0.7:
                        strong_corrs.append({
                            "col1": corr_matrix.columns[i],
                            "col2": corr_matrix.columns[j],
                            "correlation": corr_matrix.iloc[i, j]
                        })
            
            if strong_corrs:
                trends["correlations"] = strong_corrs
        
        return trends


### Anomaly Agent (with Perplexity API Integration)


The AnomalyAgent detects unusual data points (outliers) in numeric columns using statistical rules. If a Perplexity API key is available, it uses NLP to explain the anomalies and suggest business actions. This gives both technical detection and clear recommendations, helping in faster decision-making.
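
The statistical rule in question is the interquartile range (IQR) rule: a value is flagged when it lies more than 1.5 × IQR below the first quartile or above the third quartile. A minimal sketch on invented values:

```python
import pandas as pd

# Toy measurements (hypothetical); 100 is the obvious outlier
values = pd.Series([10, 12, 11, 13, 12, 11, 100])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Anything outside [q1 - 1.5 * IQR, q3 + 1.5 * IQR] counts as an outlier
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = values[(values < lower_bound) | (values > upper_bound)]

print(f"bounds: [{lower_bound}, {upper_bound}]")
print(outliers.tolist())
```

Because the bounds are built from quartiles, a single extreme value barely shifts them, which is why this rule is a common first pass for outlier detection.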


In [None]:
class AnomalyAgent:
    """Detects anomalies and generates summaries using NLP"""
    
    def __init__(self, api_key: str = None):
        """Initialize the Anomaly Agent with an optional API key"""
        self.api_key = api_key or os.getenv("PERPLEXITY_API_KEY")
        if not self.api_key:
            logger.warning("No API key provided for Anomaly Agent. NLP summaries will be limited.")
    
    def detect_anomalies(self, data: Any, data_type: str) -> Dict:
        """Detect anomalies in data based on its type"""
        if data_type in ["csv", "excel"]:
            anomalies = self._detect_dataframe_anomalies(data)
            
            # If API key is available, generate NLP summary
            if self.api_key:
                summary = self._generate_anomaly_summary(anomalies, data)
                anomalies["nlp_summary"] = summary
            
            return anomalies
        elif data_type == "image":
            return {"message": "Anomaly detection from images not implemented"}
        elif data_type == "sql":
            return {"message": "Anomaly detection from SQL not implemented"}
        else:
            raise ValueError(f"Unsupported data type for anomaly detection: {data_type}")
    
    def _detect_dataframe_anomalies(self, df: pd.DataFrame) -> Dict:
        """Detect anomalies in pandas DataFrame"""
        anomalies = {}
        
        # Detect outliers in numeric columns (using IQR method)
        numeric_cols = df.select_dtypes(include=['number']).columns
        
        if len(numeric_cols) > 0:
            anomalies["outliers"] = {}
            
            for col in numeric_cols:
                Q1 = df[col].quantile(0.25)
                Q3 = df[col].quantile(0.75)
                IQR = Q3 - Q1
                
                # Define outliers as values outside 1.5 * IQR from Q1 and Q3
                lower_bound = Q1 - 1.5 * IQR
                upper_bound = Q3 + 1.5 * IQR
                
                outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)][col]
                
                if not outliers.empty:
                    anomalies["outliers"][col] = {
                        "count": len(outliers),
                        "percentage": len(outliers) / len(df) * 100,
                        "min_outlier": outliers.min(),
                        "max_outlier": outliers.max()
                    }
        
        return anomalies
    
    def _generate_anomaly_summary(self, anomalies: Dict, data_sample: pd.DataFrame) -> str:
        """Generate NLP summary of anomalies using Perplexity API"""
        try:
            # Create a concise sample of the data for context
            data_context = data_sample.head(5).to_string()
            
            # Prepare anomalies as a readable string; default=str covers NumPy
            # scalars (e.g. int64) that json cannot serialize natively
            anomalies_text = json.dumps(anomalies, indent=2, default=str)
            
            # Create prompt for the model
            prompt = f"""
            Analyze these anomalies detected in a dataset and provide actionable recommendations:
            
            Data Sample (first 5 rows):
            {data_context}
            
            Detected Anomalies:
            {anomalies_text}
            
            Please provide:
            1. A summary of the key anomalies
            2. Possible business implications
            3. 3-5 actionable recommendations
            """
            
            # Call Perplexity API
            client = OpenAI(
                api_key=self.api_key,
                base_url="https://api.perplexity.ai"
            )
            
            response = client.chat.completions.create(
                model="sonar-pro-online",
                messages=[
                    {"role": "system", "content": "You are a data analyst specializing in anomaly detection and business insights."},
                    {"role": "user", "content": prompt}
                ]
            )
            
            return response.choices[0].message.content
        
        except Exception as e:
            logger.error(f"Error generating anomaly summary: {str(e)}")
            return f"Error generating summary: {str(e)}"


### Insight Writer Agent

The InsightWriterAgent takes outputs from other agents (like KPIs, trends, anomalies) and automatically generates clear, human-readable insights. It helps turn complex data analysis into simple business narratives or reports. This is useful for non-technical stakeholders to quickly understand what's happening in the data and why it matters.
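
As a rough sketch of how the three result dictionaries can be folded into a single prompt, the helper below serializes each one as JSON under its own heading. `build_prompt` is a hypothetical name, not a function from the notebook, and no API call is made here.

```python
import json


def build_prompt(kpis: dict, trends: dict, anomalies: dict, user_query: str) -> str:
    """Assemble one prompt string from the three analysis dictionaries."""
    parts = [f"Analyze this data and provide insights in response to the user's query: '{user_query}'"]
    sections = {"KPIs": kpis, "Trends": trends, "Anomalies": anomalies}
    for title, content in sections.items():
        # default=str is a fallback for values json cannot serialize natively
        parts.append(f"{title}:\n{json.dumps(content, indent=2, default=str)}")
    return "\n\n".join(parts)


prompt = build_prompt({"row_count": 4}, {}, {"outliers": {}}, "What drives attrition?")
print(prompt)
```

Keeping prompt assembly in one small function makes it possible to inspect exactly what is sent to the model before any request is issued.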

In [None]:
class InsightWriterAgent:
    """Composes insights from KPIs, trends, and anomalies"""
    
    def __init__(self, api_key: str = None):
        """Initialize the Insight Writer with an optional API key"""
        self.api_key = api_key or os.getenv("PERPLEXITY_API_KEY")
        if not self.api_key:
            logger.warning("No API key provided for Insight Writer. NLP insights will be limited.")
    
    def generate_insights(self, kpis: Dict, trends: Dict, anomalies: Dict, user_query: str) -> str:
        """Generate insights from KPIs, trends, and anomalies"""
        try:
            # If API key is available, use Perplexity for insights
            if self.api_key:
                return self._generate_nlp_insights(kpis, trends, anomalies, user_query)
            else:
                # Fallback to simple template-based insights
                return self._generate_basic_insights(kpis, trends, anomalies)
        except Exception as e:
            logger.error(f"Error generating insights: {str(e)}")
            return f"Error generating insights: {str(e)}"
    
    def _generate_nlp_insights(self, kpis: Dict, trends: Dict, anomalies: Dict, user_query: str) -> str:
        """Generate insights using Perplexity API"""
        try:
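            # Prepare data as readable strings; default=str covers NumPy
            # scalars (e.g. int64) that json cannot serialize natively
            kpis_text = json.dumps(kpis, indent=2, default=str)
            trends_text = json.dumps(trends, indent=2, default=str)
            anomalies_text = json.dumps(anomalies, indent=2, default=str)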
            
            # Create prompt for the model
            prompt = f"""
            Analyze this data and provide insights in response to the user's query: "{user_query}"
            
            KPIs:
            {kpis_text}
            
            Trends:
            {trends_text}
            
            Anomalies:
            {anomalies_text}
            
            Please provide:
            1. A summary of the key findings
            2. Direct answers to the user's query
            3. 3-5 actionable recommendations
            4. Any additional insights you can derive from the data
            """
            
            # Call Perplexity API
            client = OpenAI(
                api_key=self.api_key,
                base_url="https://api.perplexity.ai"
            )
            
            response = client.chat.completions.create(
                model="sonar-pro-online",
                messages=[
                    {"role": "system", "content": "You are a business intelligence expert who provides clear, actionable insights."},
                    {"role": "user", "content": prompt}
                ]
            )
            
            return response.choices[0].message.content
        
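        except Exception as e:
            logger.error(f"Error generating NLP insights: {str(e)}")
            return f"Error generating NLP insights: {str(e)}"
    
    def _generate_basic_insights(self, kpis: Dict, trends: Dict, anomalies: Dict) -> str:
        """Template-based fallback used when no API key is configured"""
        lines = [
            f"Rows: {kpis.get('row_count', 'n/a')}, columns: {kpis.get('column_count', 'n/a')}",
            f"Missing values: {kpis.get('missing_values_pct', 0):.2f}%",
        ]
        if "attrition_rate" in kpis:
            lines.append(f"Attrition rate: {kpis['attrition_rate']:.2f}%")
        lines.append(f"Strong correlations found: {len(trends.get('correlations', []))}")
        lines.append(f"Columns with outliers: {len(anomalies.get('outliers', {}))}")
        return "\n".join(lines)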


### Insight Delivery Agent

The InsightDeliveryAgent is responsible for delivering insights to users and answering their follow-up questions using analyzed data (like KPIs, trends, and anomalies). It uses NLP with the Perplexity API to generate meaningful, context-aware responses. This makes the system interactive and helps users make data-driven decisions more easily.
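
The agent in this notebook embeds earlier turns as JSON text inside a single user prompt. One alternative, sketched here as an assumption rather than as the notebook's method, is to pass the history as separate chat messages in the OpenAI-style `messages` format. `build_messages` is a hypothetical helper and no request is sent.

```python
from typing import Dict, List


def build_messages(system_prompt: str,
                   history: List[Dict[str, str]],
                   question: str) -> List[Dict[str, str]]:
    """Return a chat message list: system prompt, prior turns, then the new question."""
    return [
        {"role": "system", "content": system_prompt},
        *history,  # earlier {"role": ..., "content": ...} turns, oldest first
        {"role": "user", "content": question},
    ]


history = [
    {"role": "user", "content": "What is the overall attrition rate?"},
    {"role": "assistant", "content": "About 16% of employees left."},
]
messages = build_messages("You are a data analyst assistant.", history,
                          "Which department is most affected?")
print([m["role"] for m in messages])
```

Separate messages keep each turn attributed to its speaker, at the cost of sending the analysis context (KPIs, trends, anomalies) by some other route, for example in the system prompt.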

In [None]:
class InsightDeliveryAgent:
    """Delivers insights to the user"""
    
    def __init__(self, api_key: str = None):
        """Initialize the Insight Delivery Agent with an optional API key"""
        self.api_key = api_key or os.getenv("PERPLEXITY_API_KEY")
        self.conversation_history = []
    
    def deliver_insights(self, insights: str) -> str:
        """Deliver insights to the user"""
        # In a real implementation, this might format the insights for display
        return insights
    
    def answer_question(self, question: str, kpis: Dict, trends: Dict, anomalies: Dict) -> str:
        """Answer a follow-up question based on the analyzed data"""
        try:
            if not self.api_key:
                return "API key required for Q&A functionality."
            
            # Add question to conversation history
            self.conversation_history.append({"role": "user", "content": question})
            
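            # Prepare data as readable strings; default=str covers NumPy
            # scalars (e.g. int64) that json cannot serialize natively
            kpis_text = json.dumps(kpis, indent=2, default=str)
            trends_text = json.dumps(trends, indent=2, default=str)
            anomalies_text = json.dumps(anomalies, indent=2, default=str)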
            
            # Create prompt for the model
            prompt = f"""
            Answer the user's question based on this analyzed data:
            
            User's question: "{question}"
            
            KPIs:
            {kpis_text}
            
            Trends:
            {trends_text}
            
            Anomalies:
            {anomalies_text}
            
            Previous conversation:
            {json.dumps(self.conversation_history[:-1], indent=2) if len(self.conversation_history) > 1 else "None"}
            """
            
            # Call Perplexity API
            client = OpenAI(
                api_key=self.api_key,
                base_url="https://api.perplexity.ai"
            )
            
            response = client.chat.completions.create(
                model="sonar-pro-online",
                messages=[
                    {"role": "system", "content": "You are a data analyst assistant who answers questions based on analyzed data."},
                    {"role": "user", "content": prompt}
                ]
            )
            
            answer = response.choices[0].message.content
            
            # Add answer to conversation history
            self.conversation_history.append({"role": "assistant", "content": answer})
            
            return answer
        
        except Exception as e:
            logger.error(f"Error answering question: {str(e)}")
            return f"Error answering question: {str(e)}"


### Orchestration Engine


The AgentOrchestrator is the central brain of the entire analytical pipeline. It coordinates agents like data loaders, preprocessors, KPI extractors, trend/anomaly detectors, and insight generators. This enables a seamless, automated flow from raw data ingestion to delivering actionable insights and answering user questions.
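
The coordination pattern can be reduced to a few lines: run named steps in order, let each step read the results of earlier ones, and convert any failure into a structured response. The sketch below uses stand-in lambdas instead of the real agents; `run_pipeline` and the step names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_pipeline(steps: List[Step]) -> Dict[str, Any]:
    """Run named steps in order; each step receives the results gathered so far."""
    results: Dict[str, Any] = {}
    try:
        for name, step in steps:
            results[name] = step(results)
        return {"success": True, "message": "Analysis complete", "results": results}
    except Exception as exc:
        # Report the failure instead of raising, keeping any partial results
        return {"success": False, "message": f"Error: {exc}", "results": results}


demo = run_pipeline([
    ("data", lambda r: [1, 2, 3, 100]),
    ("kpis", lambda r: {"row_count": len(r["data"])}),
    ("insights", lambda r: f"{r['kpis']['row_count']} rows analyzed"),
])
print(demo["success"], demo["results"]["insights"])
```

Returning a dictionary with a `success` flag, as the orchestrator does, lets a UI or notebook cell decide how to present a failure without handling exceptions itself.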




In [None]:
class AgentOrchestrator:
    """Orchestrates the agentic workflow"""
    
    def __init__(self, api_key: str = None):
        """Initialize the orchestrator with agents"""
        self.api_key = api_key or os.getenv("PERPLEXITY_API_KEY")
        
        # Initialize all agents
        self.data_source_agent = DataSourceAgent()
        self.data_preprocessor_agent = DataPreprocessorAgent()
        self.kpi_agent = KPIAgent()
        self.trend_agent = TrendAgent()
        self.anomaly_agent = AnomalyAgent(api_key=self.api_key)
        self.insight_writer_agent = InsightWriterAgent(api_key=self.api_key)
        self.insight_delivery_agent = InsightDeliveryAgent(api_key=self.api_key)
        
        # Store analysis results
        self.results = {
            "data": None,
            "data_type": None,
            "kpis": None,
            "trends": None,
            "anomalies": None,
            "insights": None
        }
    
    def process_file(self, file_path: str, user_query: str) -> Dict:
        """Process a file through the entire agent pipeline"""
        try:
            logger.info(f"Processing file: {file_path}")
            
            # Step 1: Load file
            data, data_type = self.data_source_agent.load_file(file_path)
            logger.info(f"File loaded as {data_type}")
            
            # Step 2: Preprocess data
            preprocessed_data = self.data_preprocessor_agent.preprocess(data, data_type)
            logger.info("Data preprocessing complete")
            
            # Store preprocessed data
            self.results["data"] = preprocessed_data
            self.results["data_type"] = data_type
            
            # Step 3: Extract KPIs
            kpis = self.kpi_agent.extract_kpis(preprocessed_data, data_type)
            logger.info(f"KPI extraction complete: {len(kpis)} KPIs found")
            self.results["kpis"] = kpis
            
            # Step 4: Detect trends
            trends = self.trend_agent.detect_trends(preprocessed_data, data_type)
            logger.info("Trend detection complete")
            self.results["trends"] = trends
            
            # Step 5: Detect anomalies
            anomalies = self.anomaly_agent.detect_anomalies(preprocessed_data, data_type)
            logger.info("Anomaly detection complete")
            self.results["anomalies"] = anomalies
            
            # Step 6: Generate insights
            insights = self.insight_writer_agent.generate_insights(
                kpis, trends, anomalies, user_query)
            logger.info("Insight generation complete")
            self.results["insights"] = insights
            
            # Step 7: Deliver insights
            formatted_insights = self.insight_delivery_agent.deliver_insights(insights)
            
            return {
                "success": True,
                "message": "Analysis complete",
                "insights": formatted_insights
            }
        
        except Exception as e:
            logger.error(f"Error in processing pipeline: {str(e)}")
            return {
                "success": False,
                "message": f"Error: {str(e)}",
                "insights": None
            }
    
    def answer_question(self, question: str) -> str:
        """Answer a follow-up question about the analyzed data"""
        # Check for None explicitly: an empty trends dict is still a valid result
        if any(self.results[key] is None for key in ("kpis", "trends", "anomalies")):
            return "Please analyze data first before asking questions."
        
        return self.insight_delivery_agent.answer_question(
            question, 
            self.results["kpis"], 
            self.results["trends"], 
            self.results["anomalies"]
        )


### Visualization Utilities

The create_visualizations function generates plots for numeric and categorical data distributions, as well as a correlation heatmap if correlation data is available. It works specifically for tabular data formats like CSV or Excel. The output includes saved PNG files that visually summarize the key patterns in the dataset.


In [None]:
def create_visualizations(orchestrator):
    """Create visualizations based on analyzed data"""
    if orchestrator.results["data_type"] not in ["csv", "excel"]:
        return "Visualizations only available for tabular data."
    
    df = orchestrator.results["data"]
    
    # Set up the plots
    plt.figure(figsize=(15, 10))
    
    # 1. Plot numeric column distributions
    numeric_cols = df.select_dtypes(include=['number']).columns[:4]  # Limit to first 4
    
    for i, col in enumerate(numeric_cols):
        plt.subplot(2, 2, i+1)
        sns.histplot(df[col].dropna(), kde=True)
        plt.title(f'Distribution of {col}')
    
    plt.tight_layout()
    plt.savefig('numeric_distributions.png')
    plt.close()
    
    # 2. Plot categorical column distributions (if any)
    cat_cols = df.select_dtypes(include=['object', 'category']).columns[:4]
    
    if len(cat_cols) > 0:
        plt.figure(figsize=(15, 10))
        
        for i, col in enumerate(cat_cols):
            plt.subplot(2, 2, i+1)
            value_counts = df[col].value_counts().nlargest(10)  # Top 10 categories
            sns.barplot(x=value_counts.index, y=value_counts.values)
            plt.title(f'Top categories in {col}')
            plt.xticks(rotation=45)
        
        plt.tight_layout()
        plt.savefig('categorical_distributions.png')
        plt.close()
    
    # 3. Plot correlations (if available)
    if "correlations" in orchestrator.results["trends"]:
        corrs = orchestrator.results["trends"]["correlations"]
        if corrs:
            corr_cols = list(set([c["col1"] for c in corrs] + [c["col2"] for c in corrs]))
            if len(corr_cols) > 1:
                plt.figure(figsize=(10, 8))
                sns.heatmap(df[corr_cols].corr(), annot=True, cmap='coolwarm')
                plt.title('Correlation Matrix')
                plt.tight_layout()
                plt.savefig('correlations.png')
                plt.close()
    
    return "Visualizations created successfully"

### Agent Orchestrator process

This script demonstrates how to use the AgentOrchestrator to process an HR dataset and analyze attrition patterns. It runs a full pipeline: data loading, analysis, insight generation, and visualization. The setup also allows follow-up questions to be answered based on the analyzed results.

In [None]:
# Example setup
file_path = r'C:\Users\DurgaPrasannaKompell\OneDrive - SplashBI\Desktop\ibm data\HR Data.csv'
user_query = "Analyze this HR data and identify attrition patterns"

# Create orchestrator
orchestrator = AgentOrchestrator()

# Process the file
result = orchestrator.process_file(file_path, user_query)

# Check if processing was successful
if result["success"]:
    print("Analysis Complete!")
    print("\n" + "="*50 + "\n")
    print(result["insights"])

    # Create visualizations
    viz_status = create_visualizations(orchestrator)
    print(f"\nVisualizations: {viz_status}")
    
    # Ask follow-up questions
    answer = orchestrator.answer_question("Which department has the highest attrition rate?")
    print("\nFollow-up Question Answer:")
    print(answer)

else:
    print(f"Analysis failed: {result['message']}")


2025-04-24 18:03:33,704 - __main__ - INFO - Processing file: C:\Users\DurgaPrasannaKompell\OneDrive - SplashBI\Desktop\ibm data\HR Data.csv
2025-04-24 18:03:33,829 - __main__ - INFO - File loaded as csv
  df_cleaned[date_col] = pd.to_datetime(df_cleaned[date_col])
2025-04-24 18:03:33,968 - __main__ - INFO - Data preprocessing complete
2025-04-24 18:03:33,998 - __main__ - INFO - KPI extraction complete: 5 KPIs found
  df[date_col] = pd.to_datetime(df[date_col])
2025-04-24 18:03:34,548 - __main__ - INFO - Trend detection complete
2025-04-24 18:03:56,766 - httpx - INFO - HTTP Request: POST https://api.perplexity.ai/chat/completions "HTTP/1.1 200 OK"
2025-04-24 18:03:56,809 - __main__ - INFO - Anomaly detection complete
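
The driver cell above branches on `result["success"]` and then reads either `result["insights"]` or `result["message"]`. The sketch below mirrors that contract with a stand-in class so the control flow can be tried without data or an API key. `StubOrchestrator` and `summarize_result` are hypothetical names introduced here, and the exact keys returned by the real `process_file` beyond those three are not known from this section.

```python
import os


class StubOrchestrator:
    """Hypothetical stand-in that mimics the result dictionary used above."""

    def process_file(self, file_path, user_query):
        # Report a failure with a 'message' when the file cannot be found
        if not os.path.exists(file_path):
            return {"success": False, "message": f"File not found: {file_path}"}
        # Otherwise report success with placeholder 'insights'
        return {"success": True, "insights": f"(stub) insights for: {user_query}"}


def summarize_result(result):
    """Turn a result dictionary into the text a caller would print."""
    if result.get("success"):
        return result.get("insights", "")
    return f"Analysis failed: {result.get('message', 'unknown error')}"


stub = StubOrchestrator()
print(summarize_result(stub.process_file("no_such_file.csv", "Analyze this data")))
```

Reading the keys with `.get()` keeps the caller from raising a `KeyError` if a failed run omits `insights`.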


The analyze_with_agents function is a notebook-friendly alternative to the Streamlit app. It runs the dataset through the AgentOrchestrator class, which coordinates the individual agents (e.g., data source, preprocessing, KPI extraction), renders the generated insights in the Jupyter notebook with IPython's display utilities, and returns the orchestrator so that follow-up questions can be asked against the same analysis.

In [None]:
# Instead of running Streamlit directly:
from IPython.display import display, Markdown

def analyze_with_agents(file_path, query):
    """Run the agent pipeline and render the insights in a Jupyter notebook."""
    orchestrator = AgentOrchestrator()
    result = orchestrator.process_file(file_path, query)
    
    # Display results using IPython display instead of Streamlit
    if result["success"]:
        display(Markdown(f"## Analysis Results\n\n{result['insights']}"))
    else:
        # Failed runs report the reason under 'message', as in the example above
        display(Markdown(f"**Analysis failed:** {result['message']}"))
    
    return orchestrator  # Return for follow-up questions

# Use this in a notebook cell:
# orchestrator = analyze_with_agents("your_file.csv", "Analyze this data")


### Integrating with Streamlit

This Streamlit app lets users upload a data file, describe the insights they want, and view the AI-generated results. After the user uploads a file, enters a query, and clicks Process, the app runs the orchestrator pipeline and shows the insights, the visualizations, and a follow-up Q&A box in separate tabs.

In [None]:
import streamlit as st
import os

def create_streamlit_app():
    st.set_page_config(page_title="Agentic AI Insights", layout="wide")
    
    st.title("🤖 Agentic AI Insight System")
    st.write("Upload your data, ask questions, and get AI-powered insights.")
    
    # Initialize the orchestrator
    if "orchestrator" not in st.session_state:
        api_key = os.getenv("PERPLEXITY_API_KEY")
        st.session_state.orchestrator = AgentOrchestrator(api_key=api_key)
    
    # File upload
    uploaded_file = st.file_uploader("Upload your data file", 
                                    type=['csv', 'xlsx', 'jpg', 'jpeg', 'png', 'sql'])
    user_query = st.text_area("What insights are you looking for?", 
                             "Analyze this data and provide business insights.")
    
    # Run the pipeline only when "Process" is clicked. The result is kept in
    # session state because every widget interaction (including the "Ask"
    # button below) reruns the script, and st.button is True for one run only.
    if uploaded_file and st.button("Process", type="primary"):
        with st.spinner("Processing your data..."):
            # Save uploaded file temporarily
            temp_path = f"temp_{uploaded_file.name}"
            with open(temp_path, "wb") as f:
                f.write(uploaded_file.getbuffer())
            
            try:
                # Process the file
                result = st.session_state.orchestrator.process_file(temp_path, user_query)
                if result["success"]:
                    # Create the plots once per run, before the temp file is removed
                    st.session_state.viz_status = create_visualizations(st.session_state.orchestrator)
                st.session_state.result = result
            finally:
                # Clean up temporary file, even if processing raises
                os.remove(temp_path)
    
    # Display results from the most recent run, if any
    result = st.session_state.get("result")
    if result is not None:
        if result["success"]:
            st.success("Analysis complete!")
            
            # Create tabs for different outputs
            tab1, tab2, tab3 = st.tabs(["📊 Insights", "📈 Visualizations", "🔍 Q&A"])
            
            with tab1:
                st.markdown(result["insights"])
            
            with tab2:
                st.write("### Data Visualizations")
                
                # Display visualizations if created successfully
                try:
                    col1, col2 = st.columns(2)
                    with col1:
                        st.image("numeric_distributions.png")
                    with col2:
                        st.image("categorical_distributions.png")
                    
                    st.image("correlations.png")
                except Exception:
                    # A plot file is missing (e.g. no categorical columns); show the status text
                    st.write(st.session_state.get("viz_status", ""))
            
            with tab3:
                st.write("### Ask Follow-up Questions")
                follow_up = st.text_input("What else would you like to know?")
                
                if follow_up and st.button("Ask", type="secondary"):
                    with st.spinner("Thinking..."):
                        answer = st.session_state.orchestrator.answer_question(follow_up)
                        st.markdown(answer)
        else:
            st.error(f"Analysis failed: {result['message']}")

if __name__ == "__main__":
    create_streamlit_app()
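
The app writes each upload to `temp_<name>` in the working directory, so two sessions uploading files with the same name could overwrite each other. One possible alternative, sketched below with the standard `tempfile` module, gives every upload a unique path and removes it when processing ends. The helper name `temporary_upload` is hypothetical, and keeping the original file extension assumes the Data Source Agent picks its loader from the extension, which this section does not confirm.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_upload(filename, data):
    """Write uploaded bytes to a unique temp file and delete it afterwards."""
    # Keep the extension (e.g. ".csv") so extension-based loaders still work
    suffix = os.path.splitext(filename)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield path
    finally:
        # Runs on success and on error, so no temp files are left behind
        if os.path.exists(path):
            os.remove(path)


# Example with in-memory bytes standing in for an uploaded file
with temporary_upload("example.csv", b"a,b\n1,2\n") as temp_path:
    with open(temp_path, "rb") as f:
        print(f.read())
```

In the app this would be used as `with temporary_upload(uploaded_file.name, uploaded_file.getbuffer()) as temp_path:` around the `process_file` call.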
