# 📊 Excel Analysis Report

**File:** `Business Accounting.xlsx`


## 🛡️ Security Analysis

**Risk Level:** 🟡 MEDIUM

### Detected Threats:
- **DANGEROUS_FORMULA**: Potentially dangerous function HYPERLINK detected
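
A flag like this can be produced by scanning formula text for function names on a watch list. The sketch below is only an illustration of that idea, not the analyzer's actual detector: `FLAGGED_FUNCTIONS` and `find_flagged_functions` are hypothetical names, and `HYPERLINK` is the only flagged function this report confirms.

```python
import re

# Hypothetical watch list; this report only confirms that HYPERLINK is flagged.
FLAGGED_FUNCTIONS = ("HYPERLINK",)

def find_flagged_functions(formula, flagged=FLAGGED_FUNCTIONS):
    """Return the flagged function names called in an Excel formula string."""
    found = []
    for name in flagged:
        # Match the name as a whole word followed by an opening parenthesis.
        pattern = rf"\b{re.escape(name)}\s*\("
        if re.search(pattern, formula, flags=re.IGNORECASE):
            found.append(name)
    return found

print(find_flagged_functions('=HYPERLINK("https://example.com", "Open")'))
print(find_flagged_functions("=SUM(A1:A3)"))
```

The pattern does not parse string literals, so a quoted text value that happens to contain `HYPERLINK(` would also match. A production check would need a real formula tokenizer.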


## 📋 Structural Analysis

**Total Sheets:** 10
**Total Cells with Data:** 6,959
**Named Ranges:** 0
**Complexity Score:** 32/100

### Sheet Details:

| Sheet Name | Hidden | Rows | Columns | Formulas |
|------------|--------|------|---------|----------|
| Yiriden Transactions 2025 | False | 42 | 9 | 2 |
| Yiriden Transactions 2023 | True | 1,021 | 11 | 815 |
| Yiriden 2023 Loans | True | 122 | 14 | 0 |
| Sanoun Transactions 2024 | True | 255 | 28 | 2 |
| Sanoun Transactions 2025 | True | 264 | 11 | 2 |
| 2024 Shea butter shipping | True | 1,001 | 5 | 13 |
| Yiriden mileages | True | 2 | 3 | 0 |
| Truck Revenue Projections | True | 39 | 9 | 39 |
| Yiriden 2022 | True | 1,000 | 26 | 3 |
| Real Estate - Horton Rd | False | 1,001 | 26 | 1 |
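
The table is easier to interpret once it is tallied. The snippet below copies the Hidden and Formulas columns from the table and counts hidden sheets and where the formulas sit. For this workbook it shows that 8 of the 10 sheets are hidden and that almost all formulas are on hidden sheets.

```python
# (sheet name, hidden, formula count), copied from the Sheet Details table.
SHEETS = [
    ("Yiriden Transactions 2025", False, 2),
    ("Yiriden Transactions 2023", True, 815),
    ("Yiriden 2023 Loans", True, 0),
    ("Sanoun Transactions 2024", True, 2),
    ("Sanoun Transactions 2025", True, 2),
    ("2024 Shea butter shipping", True, 13),
    ("Yiriden mileages", True, 0),
    ("Truck Revenue Projections", True, 39),
    ("Yiriden 2022", True, 3),
    ("Real Estate - Horton Rd", False, 1),
]

hidden_sheets = [name for name, hidden, _ in SHEETS if hidden]
total_formulas = sum(count for _, _, count in SHEETS)
hidden_formulas = sum(count for _, hidden, count in SHEETS if hidden)
# Sheet holding the largest number of formulas.
top_name, _, top_count = max(SHEETS, key=lambda row: row[2])

print(f"Hidden sheets: {len(hidden_sheets)} of {len(SHEETS)}")
print(f"Formulas: {total_formulas} total, {hidden_formulas} on hidden sheets")
print(f"Most formulas: {top_name} ({top_count / total_formulas:.1%})")
```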


## 🔗 Formula Analysis

**Max Dependency Depth:** 0 levels
**Formula Complexity Score:** 1837.01/100 (the value exceeds the nominal 100-point scale, so the score appears to be uncapped)
**Circular References:** ✅ No
**Volatile Formulas:** 0
**External References:** 0
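
One common reading of dependency depth is the longest chain of formula cells that feed into one another. Under that reading, a depth of 0 means no formula references another formula cell. The sketch below assumes that definition, since the pipeline's own definition is not shown in this report, and runs on a hypothetical `dependencies` mapping rather than this workbook's data.

```python
from functools import lru_cache

# Hypothetical dependency map: formula cell -> cells it references.
dependencies = {
    "C1": ["A1", "B1"],  # A1 and B1 are plain values, not formulas
    "D1": ["C1"],        # a formula that references another formula
    "E1": ["D1", "A1"],
}

def max_dependency_depth(deps):
    """Longest chain of formula-to-formula references (assumes no cycles)."""
    @lru_cache(maxsize=None)
    def depth(cell):
        # Only formula cells (keys of deps) can extend a chain.
        children = [c for c in deps.get(cell, []) if c in deps]
        return 1 + max(map(depth, children)) if children else 0

    return max((depth(cell) for cell in deps), default=0)

print(max_dependency_depth(dependencies))
```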


In [1]:
import pandas as pd
from pathlib import Path

# Load the Excel file (path relative to repo root)
excel_path = Path("test_assets/collection/business-accounting/Business Accounting.xlsx")
print(f"Loading data from: {excel_path}")

try:
    # Try to read the first sheet
    df = pd.read_excel(excel_path)
    print(f"\nLoaded data with shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")

    # Display first few rows
    print("\nFirst 5 rows:")
    display(df.head())

    # Basic info
    print("\nData types:")
    print(df.dtypes)

except Exception as e:
    print(f"Error loading Excel file: {e}")
    df = None


Loading data from: test_assets/collection/business-accounting/Business Accounting.xlsx

Loaded data with shape: (10, 9)
Columns: ['Date', 'Description', 'USD Amount', 'Transaction type', 'Category', 'TOTAL REVENUES', 0, 'TOTAL EXPENSES', -2031.6299999999999]

First 5 rows:


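| | Date | Description | USD Amount | Transaction type | Category | TOTAL REVENUES | 0 | TOTAL EXPENSES | -2031.63 |
|---|------|-------------|------------|------------------|----------|----------------|---|----------------|----------|
| 0 | 2025-01-03 | Mali Apartment work - Cement | -43.51 | EXP | | | | | |
| 1 | 2025-01-15 | Rebtel International Phone charges | -9.49 | EXP | | | | | |
| 2 | 2025-01-21 | Mali Apartment work - Cement | -8.57 | EXP | | | | | |
| 3 | 2025-01-22 | Mali Apartment Work - Tile Install (Kitchen) | -48.78 | EXP | | | | | |
| 4 | 2025-01-23 | Aluminum fabricator - Issa Maiga | -424.34 | EXP | | | | | |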



Data types:
 Date                datetime64[ns]
 Description                 object
 USD Amount                 float64
 Transaction type            object
 Category                   float64
 TOTAL REVENUES             float64
 0                          float64
 TOTAL EXPENSES             float64
-2031.63                    float64
dtype: object
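
The column list above suggests that the first row of the sheet mixes transaction headers with summary cells: `TOTAL REVENUES`, `0`, `TOTAL EXPENSES` and `-2031.63` read like label/value pairs rather than column names. Here is a minimal sketch of separating the two. It assumes the first five labels are the real transaction columns and the rest alternate between a label and its value. `split_header` is a hypothetical helper, not part of the pipeline.

```python
# Column labels as pandas reported them for the first sheet.
columns = ["Date", "Description", "USD Amount", "Transaction type", "Category",
           "TOTAL REVENUES", 0, "TOTAL EXPENSES", -2031.63]

def split_header(labels, n_data_columns=5):
    """Split header labels into data columns and a {label: value} summary."""
    data_columns = list(labels[:n_data_columns])
    extras = list(labels[n_data_columns:])
    # Assumption: the remaining labels alternate between a name and its value.
    summary = {str(name): float(value)
               for name, value in zip(extras[0::2], extras[1::2])}
    return data_columns, summary

data_columns, summary = split_header(columns)
print(data_columns)
print(summary)
```

With the DataFrame loaded in the cell above, `df[data_columns]` would then keep only the transaction columns, and `summary` would hold the two totals.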


In [2]:
# Query interface for formula dependency analysis
# The pipeline has already analyzed all formulas and cached the results

import pickle
from pathlib import Path
from spreadsheet_analyzer.graph_db.query_interface import create_enhanced_query_interface

# Load cached formula analysis
cache_file = Path(r".pipeline_cache/Business Accounting_formula_analysis.pkl")
with open(cache_file, 'rb') as f:
    formula_analysis = pickle.load(f)

query_interface = create_enhanced_query_interface(formula_analysis)

# Convenience functions for graph queries
def get_cell_dependencies(sheet, cell_ref):
    """Get complete dependency information for a specific cell."""
    result = query_interface.get_cell_dependencies(sheet, cell_ref)
    print(f"\nCell {sheet}!{cell_ref}:")
    print(f"  Has formula: {result.has_formula}")
    if result.formula:
        print(f"  Formula: {result.formula}")
    if result.direct_dependencies:
        print(f"  Direct dependencies: {', '.join(result.direct_dependencies[:5])}")
        if len(result.direct_dependencies) > 5:
            print(f"    ...and {len(result.direct_dependencies) - 5} more")
    if result.direct_dependents:
        print(f"  Cells that depend on this: {', '.join(result.direct_dependents[:5])}")
        if len(result.direct_dependents) > 5:
            print(f"    ...and {len(result.direct_dependents) - 5} more")
    return result

def find_cells_affecting_range(sheet, start_cell, end_cell):
    """Find all cells that affect any cell within the specified range."""
    result = query_interface.find_cells_affecting_range(sheet, start_cell, end_cell)
    print(f"\nCells affecting range {sheet}!{start_cell}:{end_cell}:")
    for cell, deps in list(result.items())[:5]:
        print(f"  {cell} depends on: {', '.join(deps[:3])}")
        if len(deps) > 3:
            print(f"    ...and {len(deps) - 3} more")
    if len(result) > 5:
        print(f"  ...and {len(result) - 5} more cells")
    return result

def get_formula_statistics():
    """Get comprehensive statistics about formulas in the workbook."""
    stats = query_interface.get_formula_statistics_with_ranges()
    print("\nFormula Statistics:")
    print(f"  Total formulas: {stats['total_formulas']:,}")
    print(f"  Formulas with dependencies: {stats['formulas_with_dependencies']:,}")
    print(f"  Unique cells referenced: {stats['unique_cells_referenced']:,}")
    print(f"  Max dependency depth: {stats['max_dependency_depth']} levels")
    print(f"  Circular references: {stats['circular_reference_chains']}")
    print(f"  Formula complexity score: {stats['complexity_score']}/100")
    return stats

def find_empty_cells_in_formula_ranges(sheet):
    """Find empty cells that are part of formula ranges."""
    result = query_interface.find_empty_cells_in_formula_ranges(sheet)
    print(f"\nEmpty cells in formula ranges for sheet '{sheet}':")
    if result:
        print(f"  Found {len(result)} empty cells")
        # Group the first 20 cells by row number for display
        rows = {}
        for cell in list(result)[:20]:
            # Drop any "Sheet!" prefix so digits in the sheet name are not counted
            row_num = ''.join(filter(str.isdigit, cell.rsplit('!', 1)[-1]))
            rows.setdefault(row_num, []).append(cell)
        for row, cells in list(rows.items())[:5]:
            print(f"  Row {row}: {', '.join(cells)}")
        if len(result) > 20:
            print(f"  ...and {len(result) - 20} more")
    else:
        print("  No empty cells found in formula ranges")
    return result


## 🔍 Formula Dependency Query Tools

The deterministic pipeline has analyzed all formulas and created a dependency graph. You can query this graph using the following tools:

### Available Tools:

1. **get_cell_dependencies** - Analyze what a cell depends on and what depends on it
   - Parameters: `sheet` (e.g., "Yiriden Transactions 2025"), `cell_ref` (e.g., "D2")

2. **find_cells_affecting_range** - Find all cells that affect a specific range
   - Parameters: `sheet`, `start_cell`, `end_cell`

3. **find_empty_cells_in_formula_ranges** - Find gaps in data that formulas reference
   - Parameters: `sheet`

4. **get_formula_statistics** - Get overall statistics about formulas
   - No parameters needed

5. **find_circular_references** - Find all circular reference chains
   - No parameters needed
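
To illustrate what a circular reference query looks for, here is a minimal depth-first search over a dependency map. It is a sketch under stated assumptions, not the pipeline's implementation: `detect_cycles` and the `example` mapping are hypothetical, and the graph is assumed to be a plain dict from each cell to the cells it references.

```python
def detect_cycles(deps):
    """Return reference loops found by depth-first search over a dependency map."""
    cycles = []
    state = {}  # cell -> "visiting" while on the DFS path, "done" afterwards

    def visit(cell, path):
        state[cell] = "visiting"
        path.append(cell)
        for ref in deps.get(cell, []):
            if state.get(ref) == "visiting":
                # ref is already on the current path, so the path loops back to it.
                cycles.append(path[path.index(ref):] + [ref])
            elif ref not in state:
                visit(ref, path)
        path.pop()
        state[cell] = "done"

    for cell in deps:
        if cell not in state:
            visit(cell, [])
    return cycles

# Hypothetical map: A1 -> B1 -> C1 -> A1 forms a loop; D1 only points into it.
example = {"A1": ["B1"], "B1": ["C1"], "C1": ["A1"], "D1": ["A1"]}
print(detect_cycles(example))
```

This version reports one chain per back edge it encounters, so it may not list every distinct cycle in a densely connected graph. It also recurses once per cell on a chain, so very deep chains would need an iterative rewrite.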

### Usage:
These tools are available through the tool-calling interface. Each query will be documented in a markdown cell showing both the query and its results.
