AI System Proposal Analysis Pipeline

Comprehensive multi-dimensional analysis of 725 AI system proposals from 145 Fortune 500 companies.

Overview

This pipeline extracts, classifies, and analyzes AI system proposals across 19 dimensions spanning business use cases, technical architecture, and implementation complexity. The goal is to understand patterns in how AI systems are designed and identify distinct "shapes" of systems that require different implementation approaches.

Key Features

✅ Comprehensive Analysis: 19 dimensions covering business, architecture, and implementation complexity ✅ Scalable Processing: Batch processing with automatic retry on API errors (exponential backoff) ✅ Interactive Dashboard: Real-time exploration with 8 visualization types ✅ Modular Pipeline: Skip completed phases, resume from checkpoints ✅ Production Ready: Retry logic, error handling, comprehensive logging ✅ Export Capabilities: JSON, CSV, and PNG export for all visualizations

Quick Start

Get the dashboard running in under 2 minutes using pre-computed data:

1. Clone Both Repositories

cd /your/workspace/  # Choose your workspace directory

# Clone the visualization/analysis code (this repo)
git clone https://github.com/DistylAI/button-data-proposal-visualizer.git

# Clone the data repository (sibling directory)
git clone https://github.com/DistylAI/button-data.git

Directory structure:

/your/workspace/
├── button-data-proposal-visualizer/  # Code, dashboard, outputs
└── button-data/                      # Raw company proposals (146 companies)

2. Install Dependencies

cd button-data-proposal-visualizer

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Open the Dashboard

python serve_dashboard.py

The dashboard will automatically open at http://localhost:8000/dashboard.html

That's it! You can now explore all 725 proposals across 19 dimensions using pre-computed data. No API key needed for viewing.

Dashboard Screenshots

Interactive dashboard showing business use case distribution across 725 AI system proposals

Multi-dimensional sunburst visualization exploring architecture patterns and human oversight levels

Interactive Dashboard

The dashboard provides an intuitive interface for exploring all 19 dimensions of AI system analysis:

Features:

8 visualization types: Bar charts, Pie charts, Treemap, Sunburst, Heatmap, Scatter plots, Sankey diagrams
Dynamic dimension selection: Choose from 19 dimensions across business, architecture, and implementation
Multi-dimensional analysis: Combine dimensions to discover patterns (e.g., Architecture Pattern × Human Oversight)
Export capabilities: Download charts as PNG or data as CSV
Maximized viewing area: Compact top bar design for full-screen visualizations
Intelligent error handling: Clear feedback when dimension combinations are incompatible

Usage:

python serve_dashboard.py

Open http://localhost:8000/dashboard.html in your browser.

Recomputing Data

If you want to regenerate the analysis from scratch (e.g., to modify classification logic or add dimensions):

1. Set Up API Key

# Option 1: Environment variable (recommended)
export ANTHROPIC_API_KEY="your-api-key-here"

# Option 2: Create .env file
echo "ANTHROPIC_API_KEY=your-api-key-here" > .env

2. Validate Environment

python analyze.py --validate

This checks that:

API key is set
Data directory exists (../button-data/companies/)
Company proposals are accessible

3. Run Analysis

# Full analysis (all 725 proposals, ~30-45 minutes)
python analyze.py

# Test with sample first (recommended)
python analyze.py --sample 50

# Resume from checkpoint (skip completed phases)
python analyze.py --skip-extract --skip-business

4. Command-Line Options

--sample N - Analyze only N proposals (for testing)
--validate - Validate environment and exit
--skip-extract - Skip proposal extraction (use existing data)
--skip-business - Skip business clustering (use existing classifications)
--skip-architecture - Skip architecture classification
--skip-implementation - Skip implementation complexity classification

5. Generate Static Visualizations (Optional)

python visualize.py

Creates standalone HTML visualizations in visualizations/ directory.

Analysis Pipeline

Phase 1: Extraction

Extracts proposals from companies/*/self-refinement/refined_proposals.json files (with fallback to proposals/proposals.json), capturing:

Company name
Proposal name
Current state
Problems addressed
Impact
Functionality description

Phase 2: Business Use Case Clustering

Uses LLM to:

Discover business use case clusters from sample
Classify all proposals into discovered clusters

Output: 18-20 business use case clusters

Phase 3: Technical Architecture Classification

Classifies proposals across 7 architecture dimensions (see below)

Phase 4: Implementation Complexity Classification

Classifies proposals across 12 complexity dimensions (see below)

Phase 5: Summary Generation

Generates aggregate statistics and summaries

Classification Dimensions

📊 BUSINESS USE CASE (1 dimension)

Business Use Case Type

Description: The primary business function or use case the system addresses

Discovered Clusters (18):

Customer Service Automation
Regulatory Compliance Automation
Document Generation & Translation
Invoice & Financial Reconciliation
Claims Adjudication & Settlement
Technical Support & Troubleshooting
Prior Authorization & Appeals
Warranty & Returns Processing
Trade & Customs Documentation
Underwriting & Risk Assessment
Safety & Incident Reporting
Supplier & Vendor Compliance
Product Policy Enforcement
Policy & Contract Analysis
Fraud Detection & Prevention
HR & Benefits Administration
Knowledge Base Q&A
Travel & Booking Assistance

🏗️ TECHNICAL ARCHITECTURE (7 dimensions)

1. System Architecture Pattern

Description: The overall structural pattern of the AI system

Possible Values:

Basic RAG - Simple retrieval-augmented generation
Agentic RAG - RAG with autonomous query planning and multi-step retrieval
ReAct Agent - Reasoning and Acting agent (iterates: thought → action → observation)
Tool-Using Agent - Agent that calls external tools/APIs/databases
Planning Agent - Decomposes goals into tasks and executes plans
Multi-Agent System - Multiple specialized agents working together
Sequential Pipeline - Fixed sequence of processing steps
Single-Shot Inference - Single LLM call without retrieval or tools
Workflow Orchestration - Complex orchestration with conditional branching

Impact on Button Pipeline:

Rerepresentation: Different patterns need different diagram types (linear flow vs DAG vs state machine)
Construction: Affects code structure (single prompt vs multi-agent coordination)

2. Reasoning Pattern

Description: How the system approaches problem-solving and decision-making

Possible Values:

Chain-of-Thought (CoT) - Explicit step-by-step reasoning
Few-Shot - Relies on examples in prompt
Zero-Shot - No examples, direct task execution
Reflection/Self-Critique - Reviews and corrects own outputs
Planning/Decomposition - Breaks down complex tasks into subtasks
Ensemble/Multi-Path - Generates multiple solutions and selects best
Direct/None - No explicit reasoning pattern

Impact on Button Pipeline:

Construction: Determines prompt structure and whether intermediate reasoning steps are needed
Execution: Affects latency and token consumption

3. Execution Pattern

Description: How the system executes its workflow

Possible Values:

Single-Shot - One execution, no loops
Sequential Chain - Fixed sequence of steps
Iterative Loop - Repeats until condition met
Parallel - Multiple paths execute simultaneously
Conditional Branching - Different paths based on conditions
Human-in-Loop - Requires human input during execution
Event-Driven - Triggered by external events

Impact on Button Pipeline:

Rerepresentation: Sequential vs branching vs cyclic flows require different graph structures
Construction: Affects control flow implementation
Execution: Impacts retry logic and error handling

4. Knowledge Representation

Description: How the system stores and accesses knowledge

Possible Values:

Vector Embeddings - Dense vector representations for similarity search
Knowledge Graph - Structured entities and relationships
Structured Database - Traditional SQL/NoSQL databases
Document Store - Unstructured documents (PDFs, text files)
Hybrid Vector+Graph - Combines vectors and knowledge graphs
Hybrid Vector+DB - Combines vectors and structured data
Policy Rules - Explicit rules and policies
API/External - Real-time external data sources
Sensor/Telemetry - IoT, equipment sensors

Impact on Button Pipeline:

Ingestion: Determines data preparation strategies
Construction: Affects retrieval and lookup implementations

5. Input Modalities

Description: Types of input data the system processes (can be multiple)

Possible Values:

Text Only
Text + Images
Text + Audio
Text + Video
Multimodal (Text + Images + Audio)
Structured Data - Forms, tables, databases
Sensor/Telemetry - IoT, equipment sensors

Impact on Button Pipeline:

Ingestion: Multimodal inputs require preprocessing pipelines
Construction: Affects model selection and input formatting

6. Tool Integration Level

Description: The system's level of integration with external tools/APIs

Possible Values:

No Tools - Pure LLM, no external interactions
Read-Only APIs - Reads data from external systems
Write/Action APIs - Can modify external systems
Multi-System Integration - Integrates with 3+ systems
Workflow Automation - Triggers complex workflows

Impact on Button Pipeline:

Construction: Determines API client generation and error handling needs
Execution: Affects testing complexity (need mocks/stubs)

7. Human Oversight Level

Description: The degree of human involvement in the system

Possible Values:

Fully Autonomous - No human required
Human Approval Gate - Requires approval before action
Human Escalation - Escalates edge cases to humans
Human Monitoring - Humans monitor but don't intervene
Co-Pilot - Human and AI work together in real-time

Impact on Button Pipeline:

Construction: Affects UI/notification requirements
Execution: Impacts evaluation (can't fully automate testing)
Iteration: Human-in-loop systems iterate slower

🔧 IMPLEMENTATION COMPLEXITY (12 dimensions)

8. Data Complexity

Description: Complexity of input data sources and formats

Possible Values:

Single Source, Structured - Simple database queries, single API
Multiple Sources, Structured - Multiple APIs/DBs that need joining
Multimodal, Simple - Text + images OR text + audio
Multimodal, Complex - Text + images + audio + video + sensor data
Streaming/Real-time - Continuous data flow requiring stream processing
Sparse/Incomplete - Missing data, requires imputation/handling

Impact on Button Pipeline:

Ingestion: Determines data generation complexity for synthetic examples
Rerepresentation: Complex data may need entity-relationship diagrams
Construction: Affects preprocessing and validation code

9. Integration Complexity

Description: Number and type of external system integrations

Possible Values:

No External Integration - Self-contained system
Read-Only (1-3 systems) - Simple API reads from few systems
Read-Only (4+ systems) - Complex data aggregation from many systems
Write/Action (1-3 systems) - Limited side effects on few systems
Write/Action (4+ systems) - Orchestrating changes across many systems
Bidirectional with Compensation - Need rollback/saga patterns

Impact on Button Pipeline:

Construction: Higher complexity requires sophisticated error handling and state management
Execution: More integrations = more failure modes to test
Iteration: Systems with 4+ write integrations are harder to iterate on safely

10. Prompt Complexity

Description: Number and sophistication of prompts required

Possible Values:

Single Static Prompt - One template, no variation
Few Static Prompts (2-5) - Simple sequential or branching
Many Static Prompts (6+) - Complex orchestration of many prompts
Dynamic Prompt Assembly - Context-dependent prompt generation
Adaptive/Self-Modifying - Prompts that evolve based on feedback
Meta-Prompted - LLM generates its own prompts

Impact on Button Pipeline:

Construction: Determines prompt template organization and versioning strategy
Iteration: Dynamic/adaptive systems are 3-5x harder to debug
Execution: Affects prompt testing requirements

11. Chain Depth

Description: Length and structure of processing chains

Possible Values:

Single-Shot - One LLM call, done
Sequential (2-3 steps) - Simple pipeline
Sequential (4-7 steps) - Medium pipeline
Sequential (8+ steps) - Deep pipeline with accumulating errors
Branching (2-5 paths) - Decision tree structure
Branching (6+ paths) - Complex decision tree
Cyclic/Iterative - Loops with exit conditions
DAG (Directed Acyclic Graph) - Complex dependencies

Impact on Button Pipeline:

Rerepresentation: Determines graph structure (linear vs tree vs DAG vs state machine)
Construction: Affects orchestration code complexity
Execution: Deep chains have compounding failure rates

12. Schema Complexity

Description: Complexity of output data structures

Possible Values:

Unstructured Text - Free-form response
Simple Structured (flat JSON) - Basic key-value pairs
Nested Structured (2-3 levels) - Moderate nesting
Deep Structured (4+ levels) - Complex nested objects
Graph/Relational - Entities with relationships
Hybrid (Structured + Unstructured) - Mixed outputs
Streaming/Progressive - Partial results over time

Impact on Button Pipeline:

Construction: Affects schema validation code and parsing error handling
Execution: Deep structures have 40-60% higher parsing failure rates
Iteration: Complex schemas are harder to refine based on feedback

13. State Management

Description: How the system manages state across executions

Possible Values:

Stateless - No context retention needed
Session State (Short-term) - Within single conversation/session
User State (Long-term) - Across sessions, per user
Complex State Machine - Explicit state transitions
Distributed State - State across multiple systems
Event Sourcing - Full history replay capability

Impact on Button Pipeline:

Rerepresentation: State machines need state transition diagrams
Construction: Determines database/cache requirements
Execution: Stateful systems harder to test (need state setup/teardown)

14. Error Handling Requirements

Description: Level of error handling and reliability needed

Possible Values:

Best Effort - Failures acceptable, log and continue
Retry with Backoff - Transient failures recoverable
Graceful Degradation - Fallback to simpler behavior
Compensation/Rollback - Must undo partial changes
Mission Critical - Cannot fail, need redundancy
Human Escalation Required - Complex errors need human judgment

Impact on Button Pipeline:

Construction: Mission-critical systems need 5-10x more error handling code
Execution: Determines testing rigor and monitoring requirements
Iteration: More critical systems require longer validation cycles

15. Evaluation Complexity

Description: How system performance is measured

Possible Values:

Ground Truth Available (Exact Match) - Clear right/wrong answers
Ground Truth Available (Similarity) - Semantic matching needed
Proxy Metrics - Indirect quality measures
Human Evaluation Required (Simple) - Binary thumbs up/down
Human Evaluation Required (Complex) - Expert domain judgment
Multi-Dimensional Scoring - Multiple quality aspects
Delayed/Indirect Feedback - Success known days/weeks later

Impact on Button Pipeline:

Execution: Determines eval harness complexity
Iteration: Expert human eval = 10-50x slower iteration
Ingestion: Evaluation type affects synthetic data generation strategy

16. Domain Expertise Depth

Description: Level of domain knowledge required

Possible Values:

General Knowledge - Common sense reasoning
Professional Knowledge - Standard industry practices
Specialist Knowledge - Deep domain expertise (medical, legal)
Expert Knowledge with Complex Rules - Requires rare expertise + complex logic
Cutting-Edge Research - Frontier knowledge

Impact on Button Pipeline:

Ingestion: Specialist+ domains need SME involvement in data generation
Construction: Affects prompt engineering difficulty
Iteration: Specialist+ systems need 3-5x more iteration cycles

17. Latency Requirements

Description: Speed requirements for system response

Possible Values:

Batch/Async (hours-days) - No time pressure
Near Real-time (minutes) - Background processing
Interactive (<5 seconds) - User waiting but tolerant
Real-time (<1 second) - User expects instant response
Sub-second (<200ms) - Part of larger real-time flow
Burst Handling Required - Variable load, need scaling

Impact on Button Pipeline:

Construction: Affects model selection (fast vs capable) and caching strategy
Execution: Sub-second requirements eliminate many architectural patterns
Iteration: Real-time systems need performance regression testing

18. Regulatory Requirements

Description: Compliance and auditability needs

Possible Values:

No Special Requirements - General use
Basic Audit Trail - Who did what when
Full Auditability - Complete decision provenance
Explainability Required - Must justify every decision
PII/Sensitive Data - Privacy regulations apply
Highly Regulated (HIPAA/SOX/etc.) - Strict compliance
Safety-Critical - Human safety implications

Impact on Button Pipeline:

Construction: Determines logging infrastructure and explainability techniques
Execution: Highly regulated systems need 2-4x more validation/testing
Iteration: Compliance constraints slow iteration

19. Rerepresentation Type

Description: How the system should be visually/structurally represented for understanding

Possible Values:

None/Text Description - Simple enough to describe in text
Linear Flow Diagram - Sequential steps
Decision Tree - Branching logic
State Machine - Explicit states and transitions
DAG (Directed Acyclic Graph) - Complex dependencies
Entity-Relationship Diagram - Data model
Process + Data Model (Combined) - Both flow and entities
Network/Graph Structure - Complex relationships

Impact on Button Pipeline:

Rerepresentation: Directly determines what diagram type to generate
Construction: Different representations make different aspects easier to implement

Output Files

All outputs are saved to outputs/ directory:

Raw Data

raw_proposals.json/csv - Extracted proposals
proposals_with_business.json/csv - With business classifications
proposals_complete.json/csv - With architecture classifications
proposals_with_implementation.json/csv - With all 19 dimensions (FINAL OUTPUT)

Summaries

business_clusters_summary.json/csv - Business use case statistics
architecture_summary.json - Architecture dimension statistics
implementation_summary.json - Implementation complexity statistics
analysis_summary.json - Overall summary

Visualizations

visualizations/dashboard.html - Overview dashboard
visualizations/treemap.html - Hierarchical business use cases
visualizations/sunburst.html - Multi-level breakdown
visualizations/network.html - Co-occurrence patterns
visualizations/heatmap.html - Companies × use cases
visualizations/architecture_breakdown.html - Architecture statistics

Project Structure

This project consists of two repositories:

button-data-proposal-visualizer/ (This Repo)

button-data-proposal-visualizer/
├── analyze.py              # Main analysis pipeline (5 phases)
├── utils.py                # Core utilities (LLM calls, validation, file I/O)
├── visualize.py            # Static visualization generation
├── dashboard.html          # Interactive dashboard (main interface)
├── serve_dashboard.py      # Local HTTP server for dashboard
├── README.md               # This file
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignore rules
├── prompts/                # Jinja2 prompt templates (6 templates)
│   ├── business_clustering_discovery.j2
│   ├── business_clustering_classify.j2
│   ├── architecture_classify.j2
│   └── implementation_classify.j2
├── outputs/                # All analysis outputs (pre-computed)
│   ├── raw_proposals.json/csv
│   ├── proposals_with_business.json/csv
│   ├── proposals_complete.json/csv
│   ├── proposals_with_implementation.json/csv (FINAL - 725 proposals)
│   ├── business_clusters_summary.json/csv
│   ├── architecture_summary.json
│   ├── implementation_summary.json
│   └── analysis_summary.json
└── visualizations/         # Static HTML visualizations (5 files)
    ├── architecture_breakdown.html
    ├── heatmap.html
    ├── network.html
    ├── sunburst.html
    └── treemap.html

button-data/ (Data Repo)

button-data/
└── companies/              # Source data (146 companies)
    ├── 3m/
    │   └── proposals/
    │       └── proposals.json
    ├── abbvie/
    │   └── proposals/
    │       └── proposals.json
    └── ... (144 more companies)

Configuration

Environment Variables

# Required for data recomputation
export ANTHROPIC_API_KEY="your-api-key-here"

# Optional: Override data directory location
export BUTTON_DATA_PATH="/custom/path/to/button-data"

Default behavior: Looks for data in ../button-data/companies/ (sibling directory)

Model Configuration

Edit utils.py to change:

Model: MODEL = "claude-sonnet-4-5-20250929"
Max tokens: MAX_TOKENS = 8192
Batch sizes: Adjust in analyze.py phase functions

Key Findings

From analyzing 725 proposals across 145 companies:

Business Use Cases

Top 3: Customer Service Automation (127, 17.5%), Regulatory Compliance (80, 11.0%), Document Generation (66, 9.1%)

Architecture Patterns

47.3% Tool-Using Agent
19.2% Sequential Pipeline
11.5% Agentic RAG

Reasoning

55.7% Chain-of-Thought
30.1% Planning/Decomposition

Human Oversight

60.8% Human Escalation
20.1% Fully Autonomous

Extending the Analysis

Adding New Dimensions

Create prompt template in prompts/your_dimension_classify.j2
Add phase function in analyze.py
Update main() to call new phase
Add --skip-your-dimension flag
Update this README

Customizing Classification

Edit prompt templates in prompts/ directory to:

Change classification criteria
Add/remove possible values
Adjust context length for proposals

Troubleshooting

"Data directory not found" Error

Problem: extract_proposals_from_companies() can't find company proposals

Solutions:

Ensure both repos are cloned as siblings:

ls ..  # Should show both button-data-proposal-visualizer and button-data

Clone the data repo if missing:

cd /your/workspace
git clone https://github.com/DistylAI/button-data.git

Override data path if in different location:

export BUTTON_DATA_PATH="/path/to/button-data"

"ANTHROPIC_API_KEY not found" Error

Problem: API key not set (only needed for recomputing data, not for viewing dashboard)

Solutions:

Set environment variable:

export ANTHROPIC_API_KEY="your-key-here"

Create .env file:

echo "ANTHROPIC_API_KEY=your-key-here" > .env

Validate setup:
```
python analyze.py --validate
```

API Errors During Analysis

Automatic Retry: The system automatically retries on 500 errors with exponential backoff (1s, 2s, 4s)
Rate Limits: Reduce batch sizes in analyze.py if hitting rate limits
Timeouts: Use --skip-* flags to resume from checkpoints

Parse Errors

Check prompt templates for JSON format requirements
Increase max_tokens in utils.py if responses are truncated
Review failed batches in console output

Dashboard Won't Load

Run python serve_dashboard.py instead of opening dashboard.html directly
Check that outputs/proposals_with_implementation.json exists
If file is missing, data hasn't been computed yet (it should be in the repo)

Contributing

This analysis pipeline is part of the Button project - a system for automatically generating full AI system implementations from proposals.

License

Internal research project - Anthropic

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.resources		.resources
outputs		outputs
prompts		prompts
visualizations		visualizations
.gitignore		.gitignore
AI_SYSTEM_ITERATION_CHALLENGES_REPORT.md		AI_SYSTEM_ITERATION_CHALLENGES_REPORT.md
README.md		README.md
analyze.py		analyze.py
dashboard.html		dashboard.html
requirements.txt		requirements.txt
serve_dashboard.py		serve_dashboard.py
utils.py		utils.py
visualize.py		visualize.py

DistylAI/button-data-proposal-visualizer

Folders and files

Latest commit

History

Repository files navigation

AI System Proposal Analysis Pipeline

Overview

Key Features

Quick Start

1. Clone Both Repositories

2. Install Dependencies

3. Open the Dashboard

Dashboard Screenshots

Interactive Dashboard

Recomputing Data

1. Set Up API Key

2. Validate Environment

3. Run Analysis

4. Command-Line Options

5. Generate Static Visualizations (Optional)

Analysis Pipeline

Phase 1: Extraction

Phase 2: Business Use Case Clustering

Phase 3: Technical Architecture Classification

Phase 4: Implementation Complexity Classification

Phase 5: Summary Generation

Classification Dimensions

📊 BUSINESS USE CASE (1 dimension)

Business Use Case Type

🏗️ TECHNICAL ARCHITECTURE (7 dimensions)

1. System Architecture Pattern

2. Reasoning Pattern

3. Execution Pattern

4. Knowledge Representation

5. Input Modalities

6. Tool Integration Level

7. Human Oversight Level

🔧 IMPLEMENTATION COMPLEXITY (12 dimensions)

8. Data Complexity

9. Integration Complexity

10. Prompt Complexity

11. Chain Depth

12. Schema Complexity

13. State Management

14. Error Handling Requirements

15. Evaluation Complexity

16. Domain Expertise Depth

17. Latency Requirements

18. Regulatory Requirements

19. Rerepresentation Type

Output Files

Raw Data

Summaries

Visualizations

Project Structure

button-data-proposal-visualizer/ (This Repo)

button-data/ (Data Repo)

Configuration

Environment Variables

Model Configuration

Key Findings

Business Use Cases

Architecture Patterns

Reasoning

Human Oversight

Extending the Analysis

Adding New Dimensions

Customizing Classification

Troubleshooting

"Data directory not found" Error

"ANTHROPIC_API_KEY not found" Error

API Errors During Analysis

Parse Errors

Dashboard Won't Load

Contributing

License

About

Resources

Uh oh!

Stars

Packages