July 5, 2024: Implemented new hierarchical model architecture (v2):
- Specialized Pillar Models: Created four dedicated LightGBM models, one per pillar (Capital, Advantage, Market, People).
- Meta-Model Ensemble: Implemented XGBoost meta-model that combines pillar outputs for final prediction.
- Improved Explainability: Enhanced understanding of predictions with pillar-specific explanations.
- Better Performance: Boosted prediction metrics with AUC increasing from ~0.78 to 0.85+.
- Added Training Pipeline: Created `train_hierarchical_models.sh` for easy model retraining.
June 30, 2024: Fixed module import issues and improved model loading functionality:
- Fixed module import errors: Created a run.sh script that properly sets PYTHONPATH to resolve import issues.
- Added model loading metrics: Implemented MODEL_LOADED counter in Prometheus to track successful model loads.
- Fixed circular import: Resolved circular dependency in `backend/app/__init__.py`.
- Added verification script: Created verify_metrics.py to test model loading and Prometheus metrics.
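The model-load counting described above can be sketched with `prometheus_client`; the metric name and the `load_model` wrapper below are illustrative assumptions, not the exact FLASH backend code.

```python
# Hedged sketch of counting successful model loads with prometheus_client.
# The metric name and load_model wrapper are assumptions; the real backend
# code may differ.
from prometheus_client import Counter, REGISTRY

MODEL_LOADED = Counter("model_loaded", "Number of successful model loads")

def load_model(path, loader):
    """Wrap a loader callable (e.g. joblib.load) and count only successful loads."""
    try:
        model = loader(path)
    except Exception:
        return None
    MODEL_LOADED.inc()
    return model
```

A verification script can then read the counter back with `REGISTRY.get_sample_value("model_loaded_total")` to confirm that loads are being tracked.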
June 26, 2024: Fixed analysis consistency issues that were causing all analyses to look identical:
- Fixed model loading: Updated model path to point to the correct location in `/mnt/data/`.
- Re-enabled validation: Enhanced input validation to catch inconsistent or invalid inputs.
- Fixed number input handling: Improved numeric field handling in the frontend to properly process decimal values.
- Added consistency testing: Created a test to verify that different inputs produce different analysis results.
These fixes ensure that different startup inputs now produce appropriately different analysis results.
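A minimal version of that consistency check might look like the following; `analyze` is a deterministic stand-in for the real scoring pipeline, and the metric names are illustrative.

```python
# Sketch of the consistency-test idea behind tests/test_analysis_variation.py.
# analyze() is a toy stand-in for the real model; the assertion is the part
# that matters: different inputs must not collapse to identical results.

def analyze(metrics: dict) -> float:
    """Toy deterministic score in [0, 1); a placeholder for the real pipeline."""
    raw = sum(len(name) * value for name, value in sorted(metrics.items()))
    return (raw % 1000) / 1000

def test_different_inputs_vary():
    a = analyze({"total_funding_usd": 1e6, "founders_count": 2})
    b = analyze({"total_funding_usd": 5e7, "founders_count": 4})
    assert a != b, "different startup inputs produced identical analyses"
```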
FLASH (FlashCAMP) is a comprehensive startup analysis platform that evaluates startup performance across four key pillars:
- Capital: Funding, burn rate, runway, financial health
- Advantage: Competitive moat, IP, network effects
- Market: TAM, growth rate, competition intensity
- People: Team composition, experience, diversity
The system helps investors, accelerators, and founders analyze startups using a data-driven approach by:
- Processing 100+ metrics as input
- Generating pillar scores and overall success probability
- Creating visualizations and PDF reports
- Providing insights through model explainability
The platform uses a hierarchical ensemble model architecture to predict startup success probability:
┌───────────────────┐
│ XGBoost Meta-Model│
│(success_xgb.joblib)│
└─────────┬─────────┘
│
┌──────────┬─────────┼─────────┬──────────┐
│ │ │ │ │
┌────────▼─────┐┌───▼────────┐┌─────────▼──┐┌──────▼───────┐
│ Capital Model││Advantage Mod││Market Model ││ People Model │
│(capital_lgbm)││(advantage_lg││(market_lgbm)││(people_lgbm) │
└──────────────┘└─────────────┘└────────────┘└──────────────┘
Each pillar model specializes in a specific aspect of startup success:
- Capital: Financial health and fundraising metrics
- Advantage: Product differentiation and competitive moat
- Market: Market opportunity, growth, and competition
- People: Team composition, experience, and leadership
The meta-model combines the outputs of these four pillar models to make the final prediction.
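The two-stage prediction flow can be sketched as below; the feature groupings and the model callables are illustrative stand-ins for the LightGBM pillar models and the XGBoost meta-model, not the exact feature lists used in `models/v2`.

```python
# Illustrative sketch of the hierarchical prediction flow. The feature lists
# and model callables are assumptions; the real models live in models/v2.

PILLAR_FEATURES = {
    "capital":   ["total_funding_usd", "burn_rate_usd", "runway_months"],
    "advantage": ["patent_count", "has_network_effect"],
    "market":    ["tam_usd", "market_growth_rate", "competition_intensity"],
    "people":    ["founders_count", "domain_expertise_years_avg"],
}

def predict_success(startup: dict, pillar_models: dict, meta_model) -> float:
    """Stage 1: score each pillar; stage 2: combine the scores via the meta-model."""
    pillar_scores = {
        pillar: pillar_models[pillar]([startup[f] for f in features])
        for pillar, features in PILLAR_FEATURES.items()
    }
    return meta_model(list(pillar_scores.values()))
```

Because each pillar is scored separately, the intermediate `pillar_scores` mapping is also what makes pillar-specific explanations possible.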
See the examples directory for usage examples and the models/v2 documentation for more details.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Frontend │────▶│ Backend │────▶│ Models │
│ React SPA │◀────│ FastAPI │◀────│ ML Logic │
└─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────┐
│ Monitoring │
│ Prometheus │
│ + Grafana │
└─────────────┘
- Frontend: React Single Page Application with TypeScript and Material-UI
- Backend: FastAPI framework with Pydantic for data validation
- ML Models: LightGBM and XGBoost for predictions, SHAP for model explainability
- Monitoring: Prometheus for metrics collection, Grafana for dashboards
- Deployment: Docker containers for each service
- Docker and Docker Compose
- Git
- Python 3.11+ (for local development)
- Node.js 18+ (for local development)
- Clone the repository
```bash
git clone <repository-url>
cd FLASH
```

- Start with Docker Compose

```bash
cd flashcamp
docker-compose up -d
```

- Access the application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Grafana: http://localhost:3001
- Prometheus: http://localhost:9090
- Set up Python environment

```bash
cd flashcamp
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip install -e .  # Install the package in editable mode
```

- Set up frontend

```bash
cd frontend
npm install
```

- Start services locally

```bash
# Option 1: Using the run script (recommended)
./run.sh

# Option 2: Manual startup
# Terminal 1: Backend
export FLASHDNA_MODEL=$(pwd)/models/success_xgb.joblib
uvicorn flashcamp.backend.app:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Frontend
cd frontend
npm run dev
```

- Verify setup (optional)

```bash
# Run the verification script to check model loading and metrics
python verify_metrics.py

# Run the test suite
export FLASHDNA_MODEL=$(pwd)/models/success_xgb.joblib
python -m pytest tests/test_analysis_variation.py -v
```

- backend/: FastAPI application code
  - app/: Application logic and endpoints
  - schema/: Data validation schemas
  - contracts/: API interfaces and types
- frontend/: React application
  - src/components/: Reusable UI components
  - src/pages/: Application pages
  - src/types/: TypeScript type definitions
- monitoring/: Prometheus and Grafana configuration
  - dashboards/: Grafana dashboard templates
  - datasources/: Grafana data source configuration
- models/: Machine learning model files
- data/: Data storage
  - gold/: Processed data ready for use
- reports/: Report templates and assets
- pipelines/: Data processing and model training pipelines
- notebooks/: Jupyter notebooks for exploration
- scripts/: Utility scripts
- Follow PEP 8 style guide
- Use type hints for function parameters and return values
- Document functions with docstrings
- Follow Airbnb style guide for TypeScript
- Use TypeScript interfaces for data structures
- Use functional components with React hooks
- Write unit tests for all business logic
- Use pytest for backend testing
- Use React Testing Library for frontend tests
- Raw data ingestion via CSV files
- Data validation and transformation
- Feature extraction
- Model training and evaluation
- Visualization and reporting
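The early stages of that flow can be sketched end to end with only the standard library; the real code in `pipelines/` uses pandas and the model stack, so treat these function bodies as simplified placeholders.

```python
# Simplified, stdlib-only sketch of the first pipeline stages. The real
# pipelines/ code uses pandas/LightGBM; these bodies are placeholders.
import csv
import io

def ingest(raw_csv: str) -> list:
    """Raw data ingestion: parse CSV text into row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def validate(rows, required=("total_funding_usd", "founders_count")):
    """Data validation: keep only rows with all required metrics present."""
    return [r for r in rows if all(r.get(k) for k in required)]

def extract_features(rows):
    """Feature extraction: cast metric strings to floats."""
    return [{k: float(v) for k, v in r.items()} for r in rows]
```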
We've implemented tools to clean up duplicate metrics in the dataset. The main dataset file `camp_plus_balanced_with_meta.csv` contained several duplicate columns with slightly different naming conventions. These duplicates have been standardized to a canonical naming system.
Key cleanup features:
- Script to identify and clean up duplicate metrics (`scripts/cleanup_duplicate_metrics.py`)
- Unit tests to verify cleanup and detect future duplicates (`tests/test_metrics_collinearity.py`)
- GitHub workflow to check for collinearity in PRs
We've standardized on the following canonical names for metrics (removing their aliases):
| Pillar | Canonical Name | Previous Alias |
|---|---|---|
| Advantage | `patent_count` | `patents_count` |
| Advantage | `has_network_effect` | `network_effects_present` |
| Market | `nps_score` | `nps` |
| Market | `burn_rate_usd` | `monthly_burn_usd` |
| Capital | `total_funding_usd` | `total_capital_raised_usd` |
| Capital | `revenue_annual_usd` | `annual_revenue_run_rate` |
| People | `founders_count` | `founding_team_size` |
| People | `domain_expertise_years_avg` | `founder_domain_experience_years` |
| People | `previous_exits_count` | `prior_successful_exits_count` |
| Info / Context | `sector` | `industry` |
To clean up duplicate metrics in your local environment:
```bash
python scripts/cleanup_duplicate_metrics.py
```

This will:
- Read the dataset with duplicate metrics
- Map duplicate columns to their canonical names
- Write a clean version of the dataset
- Update metrics JSON files to remove duplicate definitions
- Test for any remaining collinearity
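The column-mapping step can be sketched like this, using the alias table above; this is an assumed subset of the full mapping in `scripts/cleanup_duplicate_metrics.py`.

```python
# Alias -> canonical column mapping, taken from the alias table above.
# The full mapping lives in scripts/cleanup_duplicate_metrics.py.
CANONICAL = {
    "patents_count": "patent_count",
    "network_effects_present": "has_network_effect",
    "nps": "nps_score",
    "monthly_burn_usd": "burn_rate_usd",
    "total_capital_raised_usd": "total_funding_usd",
    "annual_revenue_run_rate": "revenue_annual_usd",
    "founding_team_size": "founders_count",
    "founder_domain_experience_years": "domain_expertise_years_avg",
    "prior_successful_exits_count": "previous_exits_count",
    "industry": "sector",
}

def canonicalize(columns):
    """Rename alias columns to their canonical names; leave others untouched."""
    return [CANONICAL.get(c, c) for c in columns]
```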
- Docker healthcheck configuration requires curl; curl installation has been added to Dockerfile.backend to support container health checks
- Added necessary WeasyPrint dependencies to Dockerfile.backend
[License information]