# 📊 Excel Analysis Report

**File:** `simple_test.xlsx`  
**Analysis Time:** `2025-07-29 00:48:33`  
**Execution Time:** `0.01 seconds`  
**Status:** ✅ Success

---


## 🔒 File Integrity Analysis

**File Hash:** `0cc331c0a36af68d...`  
**File Size:** 0.01 MB  
**MIME Type:** application/vnd.openxmlformats-officedocument.spreadsheetml.sheet  
**Excel Format:** ✅ Valid Excel  
**Trust Level:** ⭐⭐⭐⭐ (4/5)  
**Processing Class:** STANDARD


## 🛡️ Security Analysis

**Risk Level:** 🟢 LOW  
**Safe to Process:** ✅ Yes  
**Threats Found:** 0


## 📋 Structural Analysis

**Total Sheets:** 2  
**Total Cells with Data:** 31  
**Total Formulas:** 9  
**Named Ranges:** 0  
**Complexity Score:** 39/100

### Sheet Details:
- **Summary**: 6 rows × 4 cols, 5 formulas
- **Details**: 4 rows × 3 cols, 4 formulas


## 🔗 Formula Analysis

**Total Formulas:** 9  
**Max Dependency Depth:** 2 levels  
**Formula Complexity Score:** 16.13/100  
**Circular References:** ⚠️ Yes  
**Volatile Formulas:** 0  
**External References:** 0

### ⚠️ Circular References Found:
1. `Details!C4` → `Details!B4`
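
A circular reference of this kind can be found with a depth-first search over the dependency graph. The sketch below is illustrative only, not the pipeline's actual detector. The `find_cycles` helper is hypothetical, and both the edge direction and the closing edge `Details!B4` → `Details!C4` are assumptions, since the report lists a single edge.

```python
from collections import defaultdict


def find_cycles(edges):
    """Return cycles found by depth-first search over a directed graph.

    `edges` is an iterable of (source, target) cell-reference pairs.
    Each cycle is returned as a list that starts and ends on the same cell.
    """
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every cell starts as UNVISITED
    cycles = []

    def visit(node, path):
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                # Back edge: nxt is already on the current path, so we have a cycle.
                cycles.append(path[path.index(nxt):] + [nxt])
            elif state[nxt] == UNVISITED:
                visit(nxt, path)
        path.pop()
        state[node] = DONE

    for node in list(graph):
        if state[node] == UNVISITED:
            visit(node, [])
    return cycles


# Assumed edges: the first comes from the report, the other two are made up.
edges = [
    ("Details!C4", "Details!B4"),
    ("Details!B4", "Details!C4"),
    ("Summary!D2", "Summary!B2"),
]
cycles = find_cycles(edges)
for cycle in cycles:
    print(" → ".join(cycle))
```

With these assumed edges the search reports one cycle that passes through `Details!C4` and `Details!B4`.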


## 📊 Content Analysis

**Data Quality Score:** 🔴 40/100  
**Patterns Found:** 0  
**Insights Generated:** 5

### Summary:
Analyzed 2 sheets, found 0 data patterns, and generated 5 insights. Overall data quality is poor (40%).

### 💡 Key Insights:

**Incomplete data in Summary** 🔴
Found 4 columns with less than 50% data completeness
- **Recommendation:** Review and fill missing data or document why values are missing

**Incomplete data in Details** 🔴
Found 3 columns with less than 50% data completeness
- **Recommendation:** Review and fill missing data or document why values are missing

**Low overall data quality** 🔴
Average data quality score is 40%
- **Recommendation:** Review data entry processes and validation rules

**Poor data quality in Summary** 🔴
Data quality score is only 47%
- **Recommendation:** Focus on improving data consistency and completeness

**Poor data quality in Details** 🔴
Data quality score is only 33%
- **Recommendation:** Focus on improving data consistency and completeness
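
One plausible way to measure column completeness is the fraction of non-missing values per column. The sketch below assumes that definition. The pipeline's own metric is not shown in this report, so the numbers here are not expected to match its scores. The `is_missing` and `column_completeness` helpers are hypothetical, and the sample rows mirror the first sheet as it loads later in this notebook.

```python
import math


def is_missing(value):
    """Treat None, NaN, and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def column_completeness(rows):
    """Return {column: fraction of non-missing values} for a list of row dicts."""
    if not rows:
        return {}
    return {
        col: sum(not is_missing(row.get(col)) for row in rows) / len(rows)
        for col in rows[0]
    }


# Sample rows copied from the preview of the first sheet (None marks an empty cell).
rows = [
    {"Product": "Widget A", "Quantity": 10.0, "Price": 15.99, "Total": None},
    {"Product": "Widget B", "Quantity": 5.0, "Price": 22.5, "Total": None},
    {"Product": "Widget C", "Quantity": 7.0, "Price": 18.75, "Total": None},
    {"Product": "Widget D", "Quantity": 3.0, "Price": 45.0, "Total": None},
    {"Product": "Total", "Quantity": None, "Price": None, "Total": None},
]

completeness = column_completeness(rows)
sparse_columns = [col for col, frac in completeness.items() if frac < 0.5]
print(completeness)
print("Columns below 50% completeness:", sparse_columns)
```

Under this definition only the `Total` column falls below the 50% threshold for the sample rows.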


## 📌 Analysis Summary

### ✅ Completed Analysis Stages:
- ✓ File Integrity Check
- ✓ Security Scan
- ✓ Structural Analysis
- ✓ Formula Analysis
- ✓ Content Intelligence

---

*This analysis was generated using the deterministic pipeline. Additional interactive analysis can be performed using the cells below.*

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from IPython.display import display  # explicit import so the cell also runs outside a notebook

# Load the Excel file
excel_path = Path(r"/Users/cheickberthe/PycharmProjects/spreadsheet-analyzer/test_assets/generated/simple_test.xlsx")
print(f"Loading data from: {excel_path}")

try:
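    # Read the first sheet (sheet_name=0 is the pandas default). pandas returns
    # cached formula results, so formula cells without a cached value load as NaN.
    df = pd.read_excel(excel_path, sheet_name=0)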
    print(f"\nLoaded data with shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    
    # Display first few rows
    print("\nFirst 5 rows:")
    display(df.head())
    
    # Basic info
    print("\nData types:")
    print(df.dtypes)
    
except Exception as e:
    print(f"Error loading Excel file: {e}")
    df = None


Loading data from: /Users/cheickberthe/PycharmProjects/spreadsheet-analyzer/test_assets/generated/simple_test.xlsx

Loaded data with shape: (5, 4)
Columns: ['Product', 'Quantity', 'Price', 'Total']

First 5 rows:


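|   | Product  | Quantity | Price | Total |
|---|----------|----------|-------|-------|
| 0 | Widget A | 10.0     | 15.99 | NaN   |
| 1 | Widget B | 5.0      | 22.5  | NaN   |
| 2 | Widget C | 7.0      | 18.75 | NaN   |
| 3 | Widget D | 3.0      | 45.0  | NaN   |
| 4 | Total    | NaN      | NaN   | NaN   |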



Data types:
Product      object
Quantity    float64
Price       float64
Total       float64
dtype: object


## 🔍 Formula Dependency Query Interface

The deterministic pipeline has created a formula dependency graph. Once the notebook session has populated the `query_interface` object, you can explore the graph with it:

### Available Query Methods:

1. **Get Cell Dependencies**
   ```python
   result = query_interface.get_cell_dependencies("Summary", "D2")
   ```

2. **Find Cells Affecting a Range**
   ```python
   affecting = query_interface.find_cells_affecting_range("Summary", "A1", "D5")
   ```

3. **Find Empty Cells in Formula Ranges**
   ```python
   empty_cells = query_interface.find_empty_cells_in_formula_ranges("Summary")
   ```

4. **Get Formula Statistics**
   ```python
   stats = query_interface.get_formula_statistics_with_ranges()
   ```

### Example Usage:


In [2]:
# Formula dependency graph query interface is available
from spreadsheet_analyzer.graph_db.query_interface import create_enhanced_query_interface

# The query interface was created from the pipeline analysis
# Note: This is a placeholder - in production, we would pass the actual pipeline result
query_interface = None  # Will be set by the notebook session

# Example: Analyze dependencies for a specific cell
def analyze_cell_dependencies(sheet_name, cell_ref):
    """Analyze all dependencies for a specific cell."""
    if query_interface is None:
        print("Query interface not available. Run the deterministic pipeline first.")
        return None
    
    result = query_interface.get_cell_dependencies(sheet_name, cell_ref)
    
    print(f"\n📊 Analysis for {sheet_name}!{cell_ref}:")
    print(f"Has formula: {result.has_formula}")
    if result.formula:
        print(f"Formula: {result.formula}")
    print(f"\nDirect dependencies: {len(result.direct_dependencies)}")
    for dep in result.direct_dependencies:
        print(f"  - {dep}")
    print(f"\nRange dependencies: {len(result.range_dependencies)}")
    for dep in result.range_dependencies:
        print(f"  - {dep}")
    print(f"\nCells that depend on this: {len(result.direct_dependents)}")
    for dep in result.direct_dependents:
        print(f"  - {dep}")
    
    return result

# Example: Find all empty cells that are part of formula ranges
def find_range_gaps(sheet_name):
    """Find empty cells that are included in formula ranges."""
    if query_interface is None:
        print("Query interface not available. Run the deterministic pipeline first.")
        return None
    
    empty_cells = query_interface.find_empty_cells_in_formula_ranges(sheet_name)
    print(f"\n🔍 Empty cells in formula ranges for {sheet_name}:")
    print(f"Found {len(empty_cells)} empty cells that are part of formula ranges:")
    for cell in empty_cells[:10]:  # Show first 10
        print(f"  - {cell}")
    if len(empty_cells) > 10:
        print(f"  ... and {len(empty_cells) - 10} more")
    
    return empty_cells

print("✅ Graph query interface tools loaded!")
print("\nAvailable functions:")
print("  - analyze_cell_dependencies(sheet_name, cell_ref)")
print("  - find_range_gaps(sheet_name)")
print("\nNote: The actual query_interface will be populated when integrated with the LLM tools.")


✅ Graph query interface tools loaded!

Available functions:
  - analyze_cell_dependencies(sheet_name, cell_ref)
  - find_range_gaps(sheet_name)

Note: The actual query_interface will be populated when integrated with the LLM tools.
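
Until the session supplies a real `query_interface`, the helper functions above can be exercised with a small in-memory stand-in. The sketch below is a hypothetical stub, not the project's implementation. `StubCellResult` and `StubQueryInterface` are invented names, the result attributes are only those the helpers read, and the sample formula and cell references are made-up illustration data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubCellResult:
    """Carries only the attributes that analyze_cell_dependencies reads."""
    has_formula: bool = False
    formula: Optional[str] = None
    direct_dependencies: list = field(default_factory=list)
    range_dependencies: list = field(default_factory=list)
    direct_dependents: list = field(default_factory=list)


class StubQueryInterface:
    """Hypothetical stand-in exposing the two methods the helpers call."""

    def __init__(self, cells, empty_cells=None):
        self._cells = cells              # {(sheet, cell_ref): StubCellResult}
        self._empty = empty_cells or {}  # {sheet: [cell refs]}

    def get_cell_dependencies(self, sheet_name, cell_ref):
        # Unknown cells come back as an empty, formula-free result.
        return self._cells.get((sheet_name, cell_ref), StubCellResult())

    def find_empty_cells_in_formula_ranges(self, sheet_name):
        return list(self._empty.get(sheet_name, []))


# Made-up data for illustration; not taken from the analyzed workbook.
query_interface = StubQueryInterface(
    cells={
        ("Summary", "D2"): StubCellResult(
            has_formula=True,
            formula="=B2*C2",
            direct_dependencies=["Summary!B2", "Summary!C2"],
        )
    },
    empty_cells={"Summary": ["Summary!B6"]},
)

result = query_interface.get_cell_dependencies("Summary", "D2")
print(result.formula, result.direct_dependencies)
```

Assigning the stub to `query_interface` in the notebook lets `analyze_cell_dependencies("Summary", "D2")` and `find_range_gaps("Summary")` run end to end without the pipeline.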
