Comprehensive multi-dimensional analysis of 725 AI system proposals from 145 Fortune 500 companies.
This pipeline extracts, classifies, and analyzes AI system proposals across 19 dimensions spanning business use cases, technical architecture, and implementation complexity. The goal is to understand patterns in how AI systems are designed and identify distinct "shapes" of systems that require different implementation approaches.
✅ Comprehensive Analysis: 19 dimensions covering business, architecture, and implementation complexity
✅ Scalable Processing: Batch processing with automatic retry on API errors (exponential backoff)
✅ Interactive Dashboard: Real-time exploration with 8 visualization types
✅ Modular Pipeline: Skip completed phases, resume from checkpoints
✅ Production Ready: Retry logic, error handling, comprehensive logging
✅ Export Capabilities: JSON, CSV, and PNG export for all visualizations
Get the dashboard running in under 2 minutes using pre-computed data:
cd /your/workspace/ # Choose your workspace directory
# Clone the visualization/analysis code (this repo)
git clone https://github.com/DistylAI/button-data-proposal-visualizer.git
# Clone the data repository (sibling directory)
git clone https://github.com/DistylAI/button-data.git
Directory structure:
/your/workspace/
├── button-data-proposal-visualizer/ # Code, dashboard, outputs
└── button-data/ # Raw company proposals (146 companies)
cd button-data-proposal-visualizer
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
python serve_dashboard.py
The dashboard will automatically open at http://localhost:8000/dashboard.html
That's it! You can now explore all 725 proposals across 19 dimensions using pre-computed data. No API key needed for viewing.
Interactive dashboard showing business use case distribution across 725 AI system proposals
Multi-dimensional sunburst visualization exploring architecture patterns and human oversight levels
The dashboard provides an intuitive interface for exploring all 19 dimensions of AI system analysis:
Features:
- 8 visualization types: Bar charts, Pie charts, Treemap, Sunburst, Heatmap, Scatter plots, Sankey diagrams
- Dynamic dimension selection: Choose from 19 dimensions across business, architecture, and implementation
- Multi-dimensional analysis: Combine dimensions to discover patterns (e.g., Architecture Pattern × Human Oversight)
- Export capabilities: Download charts as PNG or data as CSV
- Maximized viewing area: Compact top bar design for full-screen visualizations
- Intelligent error handling: Clear feedback when dimension combinations are incompatible
Usage:
python serve_dashboard.py
Open http://localhost:8000/dashboard.html in your browser.
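For context, a minimal sketch of what a dashboard server like this could look like, assuming only Python's built-in `http.server` and `webbrowser` modules (the actual `serve_dashboard.py` may differ):

```python
# Hypothetical sketch of a dashboard server; the real serve_dashboard.py may differ.
# Serves the repository directory over HTTP and opens dashboard.html in the browser.
import http.server
import socketserver
import threading
import webbrowser

PORT = 8000
URL = f"http://localhost:{PORT}/dashboard.html"

def main() -> None:
    handler = http.server.SimpleHTTPRequestHandler
    with socketserver.TCPServer(("", PORT), handler) as httpd:
        # Open the browser shortly after the server starts listening.
        threading.Timer(0.5, webbrowser.open, args=(URL,)).start()
        print(f"Serving dashboard at {URL} (Ctrl+C to stop)")
        httpd.serve_forever()

if __name__ == "__main__":
    main()
```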
If you want to regenerate the analysis from scratch (e.g., to modify classification logic or add dimensions):
# Option 1: Environment variable (recommended)
export ANTHROPIC_API_KEY="your-api-key-here"
# Option 2: Create .env file
echo "ANTHROPIC_API_KEY=your-api-key-here" > .env
python analyze.py --validate
This checks that:
- API key is set
- Data directory exists (`../button-data/companies/`)
- Company proposals are accessible
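For illustration, a minimal sketch of what these checks could amount to (the function name, messages, and glob patterns here are assumptions, not the actual code in `analyze.py`):

```python
# Illustrative validation sketch; names, messages, and patterns are assumptions.
import os
from pathlib import Path

def validate_environment(data_dir: str = "../button-data/companies") -> bool:
    ok = True
    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("ERROR: ANTHROPIC_API_KEY is not set")
        ok = False
    companies = Path(data_dir)
    if not companies.is_dir():
        print(f"ERROR: data directory not found: {companies}")
        ok = False
    elif not (list(companies.glob("*/self-refinement/refined_proposals.json"))
              or list(companies.glob("*/proposals/proposals.json"))):
        print("ERROR: no company proposal files found")
        ok = False
    return ok
```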
# Full analysis (all 725 proposals, ~30-45 minutes)
python analyze.py
# Test with sample first (recommended)
python analyze.py --sample 50
# Resume from checkpoint (skip completed phases)
python analyze.py --skip-extract --skip-business
- `--sample N` - Analyze only N proposals (for testing)
- `--validate` - Validate environment and exit
- `--skip-extract` - Skip proposal extraction (use existing data)
- `--skip-business` - Skip business clustering (use existing classifications)
- `--skip-architecture` - Skip architecture classification
- `--skip-implementation` - Skip implementation complexity classification
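As a rough guide, the flags above could be wired with `argparse` along these lines (a sketch based on this flag list, not the actual `analyze.py`):

```python
# Sketch of the CLI surface described above; the real analyze.py may differ.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Multi-dimensional proposal analysis pipeline")
    parser.add_argument("--sample", type=int, metavar="N",
                        help="Analyze only N proposals (for testing)")
    parser.add_argument("--validate", action="store_true",
                        help="Validate environment and exit")
    parser.add_argument("--skip-extract", action="store_true",
                        help="Skip proposal extraction (use existing data)")
    parser.add_argument("--skip-business", action="store_true",
                        help="Skip business clustering (use existing classifications)")
    parser.add_argument("--skip-architecture", action="store_true",
                        help="Skip architecture classification")
    parser.add_argument("--skip-implementation", action="store_true",
                        help="Skip implementation complexity classification")
    return parser

args = build_parser().parse_args()  # e.g. args.sample, args.skip_extract, ...
```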
python visualize.py
Creates standalone HTML visualizations in the `visualizations/` directory.
Extracts proposals from `companies/*/self-refinement/refined_proposals.json` files (with fallback to `proposals/proposals.json`), capturing:
- Company name
- Proposal name
- Current state
- Problems addressed
- Impact
- Functionality description
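A hedged sketch of this extraction step, assuming the captured fields map onto JSON keys of roughly these names (the real extraction function in the pipeline may differ):

```python
# Illustrative extraction sketch; JSON key names and structure are assumptions.
import json
from pathlib import Path

def extract_proposals(data_dir: str = "../button-data/companies") -> list[dict]:
    proposals = []
    for company_dir in sorted(Path(data_dir).iterdir()):
        if not company_dir.is_dir():
            continue
        # Prefer refined proposals, fall back to the raw proposals file.
        path = company_dir / "self-refinement" / "refined_proposals.json"
        if not path.exists():
            path = company_dir / "proposals" / "proposals.json"
        if not path.exists():
            continue
        data = json.loads(path.read_text())
        items = data if isinstance(data, list) else data.get("proposals", [])
        for p in items:
            proposals.append({
                "company": company_dir.name,
                "name": p.get("name"),
                "current_state": p.get("current_state"),
                "problems": p.get("problems"),
                "impact": p.get("impact"),
                "functionality": p.get("functionality"),
            })
    return proposals
```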
Uses LLM to:
- Discover business use case clusters from sample
- Classify all proposals into discovered clusters
Output: 18-20 business use case clusters
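A sketch of this two-stage flow, assuming the Anthropic Python SDK and the Jinja2 templates in `prompts/`; the helper names, sample size, and expected JSON shapes are assumptions, not the pipeline's actual interfaces:

```python
# Illustrative two-stage clustering sketch; helper names and JSON shapes are assumptions.
import json
import random
import anthropic
from jinja2 import Environment, FileSystemLoader

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
env = Environment(loader=FileSystemLoader("prompts"))

def llm(prompt: str, model: str = "claude-sonnet-4-5-20250929", max_tokens: int = 8192) -> str:
    resp = client.messages.create(model=model, max_tokens=max_tokens,
                                  messages=[{"role": "user", "content": prompt}])
    return resp.content[0].text

def discover_clusters(proposals: list[dict], sample_size: int = 100) -> list[str]:
    sample = random.sample(proposals, min(sample_size, len(proposals)))
    prompt = env.get_template("business_clustering_discovery.j2").render(proposals=sample)
    return json.loads(llm(prompt))  # assumed: a JSON list of cluster names

def classify_batch(batch: list[dict], clusters: list[str]) -> list[dict]:
    prompt = env.get_template("business_clustering_classify.j2").render(
        proposals=batch, clusters=clusters)
    return json.loads(llm(prompt))  # assumed: one classification record per proposal
```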
Classifies proposals across 7 architecture dimensions (see below)
Classifies proposals across 12 complexity dimensions (see below)
Generates aggregate statistics and summaries
Description: The primary business function or use case the system addresses
Discovered Clusters (18):
- Customer Service Automation
- Regulatory Compliance Automation
- Document Generation & Translation
- Invoice & Financial Reconciliation
- Claims Adjudication & Settlement
- Technical Support & Troubleshooting
- Prior Authorization & Appeals
- Warranty & Returns Processing
- Trade & Customs Documentation
- Underwriting & Risk Assessment
- Safety & Incident Reporting
- Supplier & Vendor Compliance
- Product Policy Enforcement
- Policy & Contract Analysis
- Fraud Detection & Prevention
- HR & Benefits Administration
- Knowledge Base Q&A
- Travel & Booking Assistance
Description: The overall structural pattern of the AI system
Possible Values:
- Basic RAG - Simple retrieval-augmented generation
- Agentic RAG - RAG with autonomous query planning and multi-step retrieval
- ReAct Agent - Reasoning and Acting agent (iterates: thought → action → observation)
- Tool-Using Agent - Agent that calls external tools/APIs/databases
- Planning Agent - Decomposes goals into tasks and executes plans
- Multi-Agent System - Multiple specialized agents working together
- Sequential Pipeline - Fixed sequence of processing steps
- Single-Shot Inference - Single LLM call without retrieval or tools
- Workflow Orchestration - Complex orchestration with conditional branching
Impact on Button Pipeline:
- Rerepresentation: Different patterns need different diagram types (linear flow vs DAG vs state machine)
- Construction: Affects code structure (single prompt vs multi-agent coordination)
Description: How the system approaches problem-solving and decision-making
Possible Values:
- Chain-of-Thought (CoT) - Explicit step-by-step reasoning
- Few-Shot - Relies on examples in prompt
- Zero-Shot - No examples, direct task execution
- Reflection/Self-Critique - Reviews and corrects own outputs
- Planning/Decomposition - Breaks down complex tasks into subtasks
- Ensemble/Multi-Path - Generates multiple solutions and selects best
- Direct/None - No explicit reasoning pattern
Impact on Button Pipeline:
- Construction: Determines prompt structure and whether intermediate reasoning steps are needed
- Execution: Affects latency and token consumption
Description: How the system executes its workflow
Possible Values:
- Single-Shot - One execution, no loops
- Sequential Chain - Fixed sequence of steps
- Iterative Loop - Repeats until condition met
- Parallel - Multiple paths execute simultaneously
- Conditional Branching - Different paths based on conditions
- Human-in-Loop - Requires human input during execution
- Event-Driven - Triggered by external events
Impact on Button Pipeline:
- Rerepresentation: Sequential vs branching vs cyclic flows require different graph structures
- Construction: Affects control flow implementation
- Execution: Impacts retry logic and error handling
Description: How the system stores and accesses knowledge
Possible Values:
- Vector Embeddings - Dense vector representations for similarity search
- Knowledge Graph - Structured entities and relationships
- Structured Database - Traditional SQL/NoSQL databases
- Document Store - Unstructured documents (PDFs, text files)
- Hybrid Vector+Graph - Combines vectors and knowledge graphs
- Hybrid Vector+DB - Combines vectors and structured data
- Policy Rules - Explicit rules and policies
- API/External - Real-time external data sources
- Sensor/Telemetry - IoT, equipment sensors
Impact on Button Pipeline:
- Ingestion: Determines data preparation strategies
- Construction: Affects retrieval and lookup implementations
Description: Types of input data the system processes (can be multiple)
Possible Values:
- Text Only
- Text + Images
- Text + Audio
- Text + Video
- Multimodal (Text + Images + Audio)
- Structured Data - Forms, tables, databases
- Sensor/Telemetry - IoT, equipment sensors
Impact on Button Pipeline:
- Ingestion: Multimodal inputs require preprocessing pipelines
- Construction: Affects model selection and input formatting
Description: The system's level of integration with external tools/APIs
Possible Values:
- No Tools - Pure LLM, no external interactions
- Read-Only APIs - Reads data from external systems
- Write/Action APIs - Can modify external systems
- Multi-System Integration - Integrates with 3+ systems
- Workflow Automation - Triggers complex workflows
Impact on Button Pipeline:
- Construction: Determines API client generation and error handling needs
- Execution: Affects testing complexity (need mocks/stubs)
Description: The degree of human involvement in the system
Possible Values:
- Fully Autonomous - No human required
- Human Approval Gate - Requires approval before action
- Human Escalation - Escalates edge cases to humans
- Human Monitoring - Humans monitor but don't intervene
- Co-Pilot - Human and AI work together in real-time
Impact on Button Pipeline:
- Construction: Affects UI/notification requirements
- Execution: Impacts evaluation (can't fully automate testing)
- Iteration: Human-in-loop systems iterate slower
Description: Complexity of input data sources and formats
Possible Values:
- Single Source, Structured - Simple database queries, single API
- Multiple Sources, Structured - Multiple APIs/DBs that need joining
- Multimodal, Simple - Text + images OR text + audio
- Multimodal, Complex - Text + images + audio + video + sensor data
- Streaming/Real-time - Continuous data flow requiring stream processing
- Sparse/Incomplete - Missing data, requires imputation/handling
Impact on Button Pipeline:
- Ingestion: Determines data generation complexity for synthetic examples
- Rerepresentation: Complex data may need entity-relationship diagrams
- Construction: Affects preprocessing and validation code
Description: Number and type of external system integrations
Possible Values:
- No External Integration - Self-contained system
- Read-Only (1-3 systems) - Simple API reads from few systems
- Read-Only (4+ systems) - Complex data aggregation from many systems
- Write/Action (1-3 systems) - Limited side effects on few systems
- Write/Action (4+ systems) - Orchestrating changes across many systems
- Bidirectional with Compensation - Need rollback/saga patterns
Impact on Button Pipeline:
- Construction: Higher complexity requires sophisticated error handling and state management
- Execution: More integrations = more failure modes to test
- Iteration: Systems with 4+ write integrations are harder to iterate on safely
Description: Number and sophistication of prompts required
Possible Values:
- Single Static Prompt - One template, no variation
- Few Static Prompts (2-5) - Simple sequential or branching
- Many Static Prompts (6+) - Complex orchestration of many prompts
- Dynamic Prompt Assembly - Context-dependent prompt generation
- Adaptive/Self-Modifying - Prompts that evolve based on feedback
- Meta-Prompted - LLM generates its own prompts
Impact on Button Pipeline:
- Construction: Determines prompt template organization and versioning strategy
- Iteration: Dynamic/adaptive systems are 3-5x harder to debug
- Execution: Affects prompt testing requirements
Description: Length and structure of processing chains
Possible Values:
- Single-Shot - One LLM call, done
- Sequential (2-3 steps) - Simple pipeline
- Sequential (4-7 steps) - Medium pipeline
- Sequential (8+ steps) - Deep pipeline with accumulating errors
- Branching (2-5 paths) - Decision tree structure
- Branching (6+ paths) - Complex decision tree
- Cyclic/Iterative - Loops with exit conditions
- DAG (Directed Acyclic Graph) - Complex dependencies
Impact on Button Pipeline:
- Rerepresentation: Determines graph structure (linear vs tree vs DAG vs state machine)
- Construction: Affects orchestration code complexity
- Execution: Deep chains have compounding failure rates
Description: Complexity of output data structures
Possible Values:
- Unstructured Text - Free-form response
- Simple Structured (flat JSON) - Basic key-value pairs
- Nested Structured (2-3 levels) - Moderate nesting
- Deep Structured (4+ levels) - Complex nested objects
- Graph/Relational - Entities with relationships
- Hybrid (Structured + Unstructured) - Mixed outputs
- Streaming/Progressive - Partial results over time
Impact on Button Pipeline:
- Construction: Affects schema validation code and parsing error handling
- Execution: Deep structures have 40-60% higher parsing failure rates
- Iteration: Complex schemas are harder to refine based on feedback
Description: How the system manages state across executions
Possible Values:
- Stateless - No context retention needed
- Session State (Short-term) - Within single conversation/session
- User State (Long-term) - Across sessions, per user
- Complex State Machine - Explicit state transitions
- Distributed State - State across multiple systems
- Event Sourcing - Full history replay capability
Impact on Button Pipeline:
- Rerepresentation: State machines need state transition diagrams
- Construction: Determines database/cache requirements
- Execution: Stateful systems harder to test (need state setup/teardown)
Description: Level of error handling and reliability needed
Possible Values:
- Best Effort - Failures acceptable, log and continue
- Retry with Backoff - Transient failures recoverable
- Graceful Degradation - Fallback to simpler behavior
- Compensation/Rollback - Must undo partial changes
- Mission Critical - Cannot fail, need redundancy
- Human Escalation Required - Complex errors need human judgment
Impact on Button Pipeline:
- Construction: Mission-critical systems need 5-10x more error handling code
- Execution: Determines testing rigor and monitoring requirements
- Iteration: More critical systems require longer validation cycles
Description: How system performance is measured
Possible Values:
- Ground Truth Available (Exact Match) - Clear right/wrong answers
- Ground Truth Available (Similarity) - Semantic matching needed
- Proxy Metrics - Indirect quality measures
- Human Evaluation Required (Simple) - Binary thumbs up/down
- Human Evaluation Required (Complex) - Expert domain judgment
- Multi-Dimensional Scoring - Multiple quality aspects
- Delayed/Indirect Feedback - Success known days/weeks later
Impact on Button Pipeline:
- Execution: Determines eval harness complexity
- Iteration: Expert human eval = 10-50x slower iteration
- Ingestion: Evaluation type affects synthetic data generation strategy
Description: Level of domain knowledge required
Possible Values:
- General Knowledge - Common sense reasoning
- Professional Knowledge - Standard industry practices
- Specialist Knowledge - Deep domain expertise (medical, legal)
- Expert Knowledge with Complex Rules - Requires rare expertise + complex logic
- Cutting-Edge Research - Frontier knowledge
Impact on Button Pipeline:
- Ingestion: Specialist+ domains need SME involvement in data generation
- Construction: Affects prompt engineering difficulty
- Iteration: Specialist+ systems need 3-5x more iteration cycles
Description: Speed requirements for system response
Possible Values:
- Batch/Async (hours-days) - No time pressure
- Near Real-time (minutes) - Background processing
- Interactive (<5 seconds) - User waiting but tolerant
- Real-time (<1 second) - User expects instant response
- Sub-second (<200ms) - Part of larger real-time flow
- Burst Handling Required - Variable load, need scaling
Impact on Button Pipeline:
- Construction: Affects model selection (fast vs capable) and caching strategy
- Execution: Sub-second requirements eliminate many architectural patterns
- Iteration: Real-time systems need performance regression testing
Description: Compliance and auditability needs
Possible Values:
- No Special Requirements - General use
- Basic Audit Trail - Who did what when
- Full Auditability - Complete decision provenance
- Explainability Required - Must justify every decision
- PII/Sensitive Data - Privacy regulations apply
- Highly Regulated (HIPAA/SOX/etc.) - Strict compliance
- Safety-Critical - Human safety implications
Impact on Button Pipeline:
- Construction: Determines logging infrastructure and explainability techniques
- Execution: Highly regulated systems need 2-4x more validation/testing
- Iteration: Compliance constraints slow iteration
Description: How the system should be visually/structurally represented for understanding
Possible Values:
- None/Text Description - Simple enough to describe in text
- Linear Flow Diagram - Sequential steps
- Decision Tree - Branching logic
- State Machine - Explicit states and transitions
- DAG (Directed Acyclic Graph) - Complex dependencies
- Entity-Relationship Diagram - Data model
- Process + Data Model (Combined) - Both flow and entities
- Network/Graph Structure - Complex relationships
Impact on Button Pipeline:
- Rerepresentation: Directly determines what diagram type to generate
- Construction: Different representations make different aspects easier to implement
All outputs are saved to the `outputs/` directory:
- `raw_proposals.json/csv` - Extracted proposals
- `proposals_with_business.json/csv` - With business classifications
- `proposals_complete.json/csv` - With architecture classifications
- `proposals_with_implementation.json/csv` - With all 19 dimensions (FINAL OUTPUT)
- `business_clusters_summary.json/csv` - Business use case statistics
- `architecture_summary.json` - Architecture dimension statistics
- `implementation_summary.json` - Implementation complexity statistics
- `analysis_summary.json` - Overall summary
- `visualizations/dashboard.html` - Overview dashboard
- `visualizations/treemap.html` - Hierarchical business use cases
- `visualizations/sunburst.html` - Multi-level breakdown
- `visualizations/network.html` - Co-occurrence patterns
- `visualizations/heatmap.html` - Companies × use cases
- `visualizations/architecture_breakdown.html` - Architecture statistics
This project consists of two repositories:
button-data-proposal-visualizer/
├── analyze.py # Main analysis pipeline (5 phases)
├── utils.py # Core utilities (LLM calls, validation, file I/O)
├── visualize.py # Static visualization generation
├── dashboard.html # Interactive dashboard (main interface)
├── serve_dashboard.py # Local HTTP server for dashboard
├── README.md # This file
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
├── prompts/ # Jinja2 prompt templates (6 templates)
│ ├── business_clustering_discovery.j2
│ ├── business_clustering_classify.j2
│ ├── architecture_classify.j2
│ └── implementation_classify.j2
├── outputs/ # All analysis outputs (pre-computed)
│ ├── raw_proposals.json/csv
│ ├── proposals_with_business.json/csv
│ ├── proposals_complete.json/csv
│ ├── proposals_with_implementation.json/csv (FINAL - 725 proposals)
│ ├── business_clusters_summary.json/csv
│ ├── architecture_summary.json
│ ├── implementation_summary.json
│ └── analysis_summary.json
└── visualizations/ # Static HTML visualizations (5 files)
├── architecture_breakdown.html
├── heatmap.html
├── network.html
├── sunburst.html
└── treemap.html
button-data/
└── companies/ # Source data (146 companies)
├── 3m/
│ └── proposals/
│ └── proposals.json
├── abbvie/
│ └── proposals/
│ └── proposals.json
└── ... (144 more companies)
# Required for data recomputation
export ANTHROPIC_API_KEY="your-api-key-here"
# Optional: Override data directory location
export BUTTON_DATA_PATH="/custom/path/to/button-data"
Default behavior: Looks for data in `../button-data/companies/` (sibling directory).
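That behavior could be implemented with a small resolver like this (a sketch; the actual logic lives in `utils.py` and may differ):

```python
# Illustrative data-path resolution; the real implementation may differ.
import os
from pathlib import Path

def resolve_data_dir() -> Path:
    base = os.environ.get("BUTTON_DATA_PATH", "../button-data")
    return Path(base) / "companies"
```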
Edit `utils.py` to change:
- Model: `MODEL = "claude-sonnet-4-5-20250929"`
- Max tokens: `MAX_TOKENS = 8192`
- Batch sizes: Adjust in `analyze.py` phase functions
From analyzing 725 proposals across 145 companies:
- Top 3 business use cases: Customer Service Automation (127, 17.5%), Regulatory Compliance (80, 11.0%), Document Generation (66, 9.1%)
- Architecture patterns: 47.3% Tool-Using Agent, 19.2% Sequential Pipeline, 11.5% Agentic RAG
- Reasoning approaches: 55.7% Chain-of-Thought, 30.1% Planning/Decomposition
- Human oversight: 60.8% Human Escalation, 20.1% Fully Autonomous
- Create prompt template in `prompts/your_dimension_classify.j2`
- Add phase function in `analyze.py` (see the sketch below)
- Update main() to call the new phase
- Add `--skip-your-dimension` flag
- Update this README
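Sketch of what a new phase function might look like, assuming a Jinja2 template and a batched LLM call; the template name, output field, and `llm` helper are placeholders for your dimension, not existing code:

```python
# Illustrative phase function for a new dimension; names are placeholders.
import json
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("prompts"))

def classify_your_dimension(proposals: list[dict], llm, batch_size: int = 20) -> list[dict]:
    template = env.get_template("your_dimension_classify.j2")
    for i in range(0, len(proposals), batch_size):
        batch = proposals[i:i + batch_size]
        results = json.loads(llm(template.render(proposals=batch)))
        for proposal, result in zip(batch, results):  # assumed: one result per proposal
            proposal["your_dimension"] = result
    return proposals

# In main(), guard the phase with the new flag:
#   if not args.skip_your_dimension:
#       proposals = classify_your_dimension(proposals, llm)
```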
Edit prompt templates in the `prompts/` directory to:
- Change classification criteria
- Add/remove possible values
- Adjust context length for proposals
Problem: `extract_proposals_from_companies()` can't find company proposals
Solutions:
- Ensure both repos are cloned as siblings:
  ls ..  # Should show both button-data-proposal-visualizer and button-data
- Clone the data repo if missing:
  cd /your/workspace
  git clone https://github.com/DistylAI/button-data.git
- Override data path if in different location:
  export BUTTON_DATA_PATH="/path/to/button-data"
Problem: API key not set (only needed for recomputing data, not for viewing dashboard)
Solutions:
- Set environment variable:
  export ANTHROPIC_API_KEY="your-key-here"
- Create `.env` file:
  echo "ANTHROPIC_API_KEY=your-key-here" > .env
- Validate setup:
  python analyze.py --validate
- Automatic Retry: The system automatically retries on 500 errors with exponential backoff (1s, 2s, 4s)
- Rate Limits: Reduce batch sizes in `analyze.py` if hitting rate limits
- Timeouts: Use `--skip-*` flags to resume from checkpoints
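The retry behavior described above could look roughly like this, assuming the Anthropic Python SDK's `APIStatusError`; the real logic lives in `utils.py` and may differ:

```python
# Illustrative retry-with-backoff sketch; utils.py may implement this differently.
import time
import anthropic

client = anthropic.Anthropic()

def call_with_retry(prompt: str, retries: int = 3,
                    model: str = "claude-sonnet-4-5-20250929", max_tokens: int = 8192) -> str:
    for attempt in range(retries + 1):
        try:
            resp = client.messages.create(model=model, max_tokens=max_tokens,
                                          messages=[{"role": "user", "content": prompt}])
            return resp.content[0].text
        except anthropic.APIStatusError as e:
            # Only retry server-side (5xx) errors, and give up after the last attempt.
            if attempt == retries or e.status_code < 500:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s
```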
- Check prompt templates for JSON format requirements
- Increase `MAX_TOKENS` in `utils.py` if responses are truncated
- Review failed batches in console output
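If you need to debug parsing locally, a tolerant parser along these lines can help (an illustrative helper, not necessarily how `utils.py` handles responses):

```python
# Illustrative tolerant JSON parsing for model responses.
import json
import re

def parse_json_response(text: str):
    # Strip Markdown code fences (e.g. a json-fenced block) if the model wrapped its output.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Otherwise fall back to the first {...} or [...] span in the text.
    match = re.search(r"(\{.*\}|\[.*\])", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```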
- Run `python serve_dashboard.py` instead of opening `dashboard.html` directly
- Check that `outputs/proposals_with_implementation.json` exists
- If file is missing, data hasn't been computed yet (it should be in the repo)
This analysis pipeline is part of the Button project - a system for automatically generating full AI system implementations from proposals.
Internal research project - Anthropic