# 06: Anomaly Explanation with Gemini 🧠

In this notebook, we bring our data to life. We use the **AnomalyExplainerAgent** (Day 5) to read the statistical outliers detected on Day 4 and generate human-readable business insights.

### 🎯 Goals
1.  **Load Anomalies:** Read the `anomaly_payload.json` generated by the Statistical Agent.
2.  **Initialize Agent:** Set up the `AnomalyExplainerAgent` with Gemini (Flash-Lite).
3.  **Generate Insights:** Run the agent on the top 3 highest-severity anomalies (a small sample, to save time and quota).
4.  **Review:** Compare the raw IQR-based scores with the LLM's narrative.
5.  **Save Results:** Export the fully enriched dataset for the final dashboard.

### 🏗️ Components Used
* `agents.anomaly_llm_agent.AnomalyExplainerAgent`: The Gemini-powered explainer.
* `outputs/anomalies/anomaly_payload.json`: The source data.

## 1) Setup

In [1]:
import sys
import os
import json
import pandas as pd
from dotenv import load_dotenv

# Add project root (the notebook runs from notebooks/, so the root is one level up)
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.append(project_root)

# Load API Key
load_dotenv()

from agents.anomaly_llm_agent import AnomalyExplainerAgent

print("✅ Anomaly Explainer Agent Loaded")


✅ Anomaly Explainer Agent Loaded


## 2) Load Data

In [2]:
INPUT_FILE = "../outputs/anomalies/anomaly_payload.json"

if not os.path.exists(INPUT_FILE):
    raise FileNotFoundError(f"Missing {INPUT_FILE}. Please run Notebook 05 first.")

with open(INPUT_FILE, "r") as f:
    data = json.load(f)

anomalies = data.get("top_anomalies", [])
print(f"Loaded {len(anomalies)} top anomalies from payload.")

# Preview the top anomaly (skipped if the payload is empty)
if anomalies:
    print("Top Anomaly ID:", anomalies[0]["anomaly_id"])
    print("Score:", anomalies[0]["score"])

Loaded 50 top anomalies from payload.
Top Anomaly ID: iqr_Technology_2014-03-18_s53
Score: 53.08
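
The preview above suggests that each record in `top_anomalies` carries at least an `anomaly_id` and a `score`. As a minimal sketch of a defensive selection step, in case the list arrives unsorted or a record lacks a score, something like the following could be used. The helper `select_top` and the sample records are hypothetical and not part of the project:

```python
from typing import Any


def select_top(records: list[dict[str, Any]], n: int = 3) -> list[dict[str, Any]]:
    """Return the n records with the highest numeric score, highest first."""
    # Skip records whose score is missing or not a number
    scored = [r for r in records if isinstance(r.get("score"), (int, float))]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:n]


# Hypothetical records shaped like the payload preview
sample = [
    {"anomaly_id": "a", "score": 22.21},
    {"anomaly_id": "b"},  # no score: skipped
    {"anomaly_id": "c", "score": 53.08},
    {"anomaly_id": "d", "score": 29.98},
]

top = select_top(sample, n=2)
print([r["anomaly_id"] for r in top])  # ['c', 'd']
```

If the payload is already sorted by the Statistical Agent, plain slicing gives the same result.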


## 3) Run Explanations

In [3]:
# Initialize Agent
agent = AnomalyExplainerAgent(model_name="gemini-2.5-flash-lite")
# Note: Flash-Lite is used here to keep latency and quota usage low

# Pick top 3 for demo (to save time/quota)
target_anomalies = anomalies[:3]

# Run Batch Explanation
enriched_anomalies = agent.batch_explain(target_anomalies)

print("✅ Explanations Generated.")

2025-11-22 10:15:39,268 - google_genai.models - INFO - AFC is enabled with max remote calls: 10.
2025-11-22 10:15:40,703 - httpx - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent "HTTP/1.1 200 OK"
2025-11-22 10:15:41,722 - google_genai.models - INFO - AFC is enabled with max remote calls: 10.
2025-11-22 10:15:42,920 - httpx - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent "HTTP/1.1 200 OK"
2025-11-22 10:15:43,934 - google_genai.models - INFO - AFC is enabled with max remote calls: 10.
2025-11-22 10:15:45,577 - httpx - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent "HTTP/1.1 200 OK"


✅ Explanations Generated.
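
The internals of `batch_explain` are not shown in this notebook. As a rough sketch of what such a batch loop might look like, the example below assumes one model call per record, an `error` key on failure, and a `meta` dictionary holding `latency_ms` and `version` (the fields the display cell reads). `batch_explain_sketch` and `fake_explainer` are hypothetical; the latter stands in for the Gemini call so the example runs offline:

```python
import time
from typing import Any, Callable

Record = dict[str, Any]


def batch_explain_sketch(
    records: list[Record], explain_one: Callable[[Record], Record]
) -> list[Record]:
    """Call explain_one per record without letting one failure abort the batch."""
    enriched = []
    for rec in records:
        start = time.perf_counter()
        try:
            out = {**rec, **explain_one(rec)}
        except Exception as exc:  # keep the record and attach the error text
            out = {**rec, "error": str(exc)}
        latency_ms = round((time.perf_counter() - start) * 1000)
        out["meta"] = {"latency_ms": latency_ms, "version": "1.0"}
        enriched.append(out)
    return enriched


def fake_explainer(rec: Record) -> Record:
    """Hypothetical stand-in for the LLM call."""
    if "score" not in rec:
        raise ValueError("record has no score")
    return {"explanation_short": f"Score {rec['score']} is unusually high."}


results = batch_explain_sketch(
    [{"anomaly_id": "x", "score": 53.08}, {"anomaly_id": "y"}], fake_explainer
)
print(["error" in r for r in results])  # [False, True]
```

Catching the exception per record is what lets a downstream cell show an error entry for one anomaly while still rendering the others.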


## 4) Display Results

In [4]:
# Render each enriched record as Markdown
from IPython.display import display, Markdown

for rec in enriched_anomalies:
    # Records that failed carry an "error" key; report them and move on
    if "error" in rec:
        display(
            Markdown(
                f"### ❌ Error for {rec.get('entity_id', 'Unknown')}: {rec['error']}"
            )
        )
        continue

    actions_bullet = "\n".join(f"* {a}" for a in rec.get("suggested_actions", []))
    meta = rec.get("meta", {})

    # Fall back to the anomaly ID when no entity ID is present
    title_id = rec.get("entity_id") or rec.get("anomaly_id")

    # Keep the template flush left: Markdown renders 4-space-indented lines as code.
    # The blank line before '---' stops it from turning the latency line into a heading.
    md = f"""
### 🚨 Anomaly: {title_id} ({rec.get('level')})
**Severity:** {rec.get('score')} | **Confidence:** {rec.get('confidence')}

**📝 Short Explanation:**
{rec.get('explanation_short')}

**📄 Full Analysis:**
> {rec.get('explanation_full')}

**✅ Suggested Actions:**
{actions_bullet}

*Latency: {meta.get('latency_ms', 0)}ms | Ver: {meta.get('version', '1.0')}*

---
"""
    display(Markdown(md))


### 🚨 Anomaly: Technology (category)
**Severity:** 53.08 | **Confidence:** High

**📝 Short Explanation:**
Sales for Technology significantly exceeded expectations and are an extreme outlier based on the IQR.

**📄 Full Analysis:**
> The observed sales value of 24,739.75 is drastically higher than the expected value of 585.69. This actual value is more than 54 times the upper quartile (Q3) of 585.69, indicating an extreme positive anomaly.

**✅ Suggested Actions:**
* Investigate the source of the exceptionally high sales.
* Verify data integrity for this period.
* Analyze contributing factors to the sales surge.

*Latency: 1603ms | Ver: 1.0*

---

### 🚨 Anomaly: Office Supplies (category)
**Severity:** 29.98 | **Confidence:** High

**📝 Short Explanation:**
Office Supplies sales are significantly higher than expected, indicating a major anomaly.

**📄 Full Analysis:**
> The actual sales for Office Supplies of 6,929.22 are drastically higher than the expected value of 260.77. This represents a deviation of nearly 30x the expected value (Score: 29.98), far exceeding the interquartile range, suggesting this is not a typical fluctuation.

**✅ Suggested Actions:**
* Investigate the source of the elevated sales for Office Supplies.
* Review recent promotions or sales activities related to Office Supplies.
* Verify data accuracy for Office Supplies sales to rule out input errors.

*Latency: 1206ms | Ver: 1.0*

---

### 🚨 Anomaly: Office Supplies (category)
**Severity:** 22.21 | **Confidence:** High

**📝 Short Explanation:**
Sales for Office Supplies are significantly higher than expected, with a score of 22.21 indicating a substantial anomaly.

**📄 Full Analysis:**
> The observed sales of 8,263.37 are vastly exceeding the expected value of 399.59. This anomaly is amplified by the statistical context, where the actual sales value far surpasses the third quartile (Q3) of 399.59, and the interquartile range (IQR) of 354.09 suggests this deviation is highly unusual.

**✅ Suggested Actions:**
* Investigate data entry for Office Supplies sales to confirm accuracy.
* Analyze contributing factors for the sudden spike in Office Supplies sales, such as promotions or specific customer orders.
* Compare Office Supplies sales against historical data for similar periods to identify any recurring patterns or external influences.

*Latency: 1648ms | Ver: 1.0*

---
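
Because the narrative is generated text, it is worth re-deriving the numbers it quotes. The sketch below assumes the severity score is computed as `(actual - Q3) / IQR`. That formula is inferred from the third record, where it reproduces the reported 22.21; it is not confirmed by the Statistical Agent's code. The helper names are hypothetical and the values are copied from the explanations above:

```python
def iqr_score(actual: float, q3: float, iqr: float) -> float:
    """Distance above Q3 in units of IQR (assumed scoring formula)."""
    return (actual - q3) / iqr


def ratio(actual: float, expected: float) -> float:
    """Plain multiple of the expected value."""
    return actual / expected


# Third record: actual 8,263.37, Q3 399.59, IQR 354.09
third_score = round(iqr_score(8263.37, 399.59, 354.09), 2)
print(third_score)  # 22.21

# First record: actual 24,739.75 against an expected value of 585.69
first_ratio = round(ratio(24739.75, 585.69), 2)
print(first_ratio)  # 42.24

# Second record: actual 6,929.22 against an expected value of 260.77
second_ratio = round(ratio(6929.22, 260.77), 2)
print(second_ratio)  # 26.57
```

On these figures, the first narrative's "more than 54 times" and the second's "nearly 30x the expected value" do not match the plain ratios (about 42x and 27x); the second appears to restate the IQR score as a multiple. Quoted multiples are best treated as claims to verify rather than as computed results.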
    

## 5) Save Output

In [5]:
# Save these enriched records for the Action Recommendation Agent (Day 6)
OUTPUT_FILE = "../outputs/anomalies/enriched_anomalies.json"

with open(OUTPUT_FILE, "w") as f:
    json.dump(enriched_anomalies, f, indent=2)

print(f"✅ Enriched anomalies saved to {OUTPUT_FILE}")

✅ Enriched anomalies saved to ../outputs/anomalies/enriched_anomalies.json
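
Before handing the file to the next notebook, a quick read-back can confirm that it round-trips and that each record has the fields the display cell relied on. This is a minimal sketch using a temporary file and hypothetical records; the `REQUIRED` key set and the `missing_fields` helper are assumptions based on the fields used above, not part of the project:

```python
import json
import os
import tempfile

REQUIRED = {"score", "explanation_short", "explanation_full", "suggested_actions"}


def missing_fields(path: str) -> dict[int, set[str]]:
    """Map record index -> required keys absent from that record."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    problems = {}
    for i, rec in enumerate(records):
        if "error" in rec:  # failed records are expected to be incomplete
            continue
        missing = REQUIRED - set(rec)
        if missing:
            problems[i] = missing
    return problems


# Hypothetical records: one complete, one missing its suggested actions
demo = [
    {"score": 1.0, "explanation_short": "s", "explanation_full": "f",
     "suggested_actions": ["check"]},
    {"score": 2.0, "explanation_short": "s", "explanation_full": "f"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "enriched_anomalies.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(demo, f, indent=2)
    problems = missing_fields(path)

print(problems)  # {1: {'suggested_actions'}}
```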


---
## ⏭️ Next Step: Closing the Loop (Action)

We have now moved from **Data** → **Insight**.
* **Day 4:** We detected *what* happened (Statistical Anomalies).
* **Day 5:** We explained *why* it happened (AI Analysis with Gemini).
* **Output:** We have a rich dataset in `outputs/anomalies/enriched_anomalies.json` containing scores, explanations, and suggested actions.

But a true SalesOps Agent isn't just an analyst; it is a **doer**. The final piece of the puzzle is turning these text-based suggestions into executable **actions**.

In the next notebook (Day 6), we will build the **Action Recommendation Agent**. This agent will:
1.  Read the enriched anomalies.
2.  Prioritize them based on severity and business impact.
3.  **Use Tools** (via a mock OpenAPI interface) to simulate real-world workflows, such as "Create Jira Ticket" or "Draft Email to Regional Manager".

👉 **Proceed to `notebooks/07_action_recommendation.ipynb`.**