July 5, 2024: Implemented new hierarchical model architecture (v2):
- Specialized Pillar Models: Created four dedicated LightGBM models, one per pillar (Capital, Advantage, Market, People).
- Meta-Model Ensemble: Implemented XGBoost meta-model that combines pillar outputs for final prediction.
- Improved Explainability: Enhanced understanding of predictions with pillar-specific explanations.
- Better Performance: Boosted prediction metrics with AUC increasing from ~0.78 to 0.85+.
- Added Training Pipeline: Created `train_hierarchical_models.sh` for easy model retraining.
June 30, 2024: Fixed module import issues and improved model loading functionality:
- Fixed module import errors: Created a run.sh script that properly sets PYTHONPATH to resolve import issues.
- Added model loading metrics: Implemented MODEL_LOADED counter in Prometheus to track successful model loads.
- Fixed circular import: Resolved circular dependency in `backend/app/__init__.py`.
- Added verification script: Created verify_metrics.py to test model loading and Prometheus metrics.
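The model-load counting described above can be sketched with `prometheus_client`; the metric name and the `load_model` wrapper below are illustrative assumptions, not the exact FLASH backend code.

```python
# Hedged sketch of counting successful model loads with prometheus_client.
# The metric name and load_model wrapper are assumptions; the real backend
# code may differ.
from prometheus_client import Counter, REGISTRY

MODEL_LOADED = Counter("model_loaded", "Number of successful model loads")

def load_model(path, loader):
    """Wrap a loader callable (e.g. joblib.load) and count only successful loads."""
    try:
        model = loader(path)
    except Exception:
        return None
    MODEL_LOADED.inc()
    return model
```

A verification script can then read the counter back with `REGISTRY.get_sample_value("model_loaded_total")` to confirm that loads are being tracked.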
June 26, 2024: Fixed analysis consistency issues that were causing all analyses to look identical:
- Fixed model loading: Updated model path to point to the correct location in `/mnt/data/`.
- Re-enabled validation: Enhanced input validation to catch inconsistent or invalid inputs.
- Fixed number input handling: Improved numeric field handling in the frontend to properly process decimal values.
- Added consistency testing: Created a test to verify that different inputs produce different analysis results.
These fixes ensure that different startup inputs now produce appropriately different analysis results.
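A minimal version of that consistency check might look like the following; `analyze` is a deterministic stand-in for the real scoring pipeline, and the metric names are illustrative.

```python
# Sketch of the consistency-test idea behind tests/test_analysis_variation.py.
# analyze() is a toy stand-in for the real model; the assertion is the part
# that matters: different inputs must not collapse to identical results.

def analyze(metrics: dict) -> float:
    """Toy deterministic score in [0, 1); a placeholder for the real pipeline."""
    raw = sum(len(name) * value for name, value in sorted(metrics.items()))
    return (raw % 1000) / 1000

def test_different_inputs_vary():
    a = analyze({"total_funding_usd": 1e6, "founders_count": 2})
    b = analyze({"total_funding_usd": 5e7, "founders_count": 4})
    assert a != b, "different startup inputs produced identical analyses"
```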
FLASH (FlashCAMP) is a comprehensive startup analysis platform that evaluates startup performance across four key pillars:
- Capital: Funding, burn rate, runway, financial health
- Advantage: Competitive moat, IP, network effects
- Market: TAM, growth rate, competition intensity
- People: Team composition, experience, diversity
The system helps investors, accelerators, and founders analyze startups using a data-driven approach by:
- Processing 100+ metrics as input
- Generating pillar scores and overall success probability
- Creating visualizations and PDF reports
- Providing insights through model explainability
The platform uses a hierarchical ensemble model architecture to predict startup success probability:
┌───────────────────┐
│ XGBoost Meta-Model│
│(success_xgb.joblib)│
└─────────┬─────────┘
│
┌──────────┬─────────┼─────────┬──────────┐
│ │ │ │ │
┌────────▼─────┐┌───▼────────┐┌─────────▼──┐┌──────▼───────┐
│ Capital Model││Advantage Mod││Market Model ││ People Model │
│(capital_lgbm)││(advantage_lg││(market_lgbm)││(people_lgbm) │
└──────────────┘└─────────────┘└────────────┘└──────────────┘
Each pillar model specializes in a specific aspect of startup success:
- Capital: Financial health and fundraising metrics
- Advantage: Product differentiation and competitive moat
- Market: Market opportunity, growth, and competition
- People: Team composition, experience, and leadership
The meta-model combines the outputs of these four pillar models to make the final prediction.
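The two-stage prediction flow can be sketched as below; the feature groupings and the model callables are illustrative stand-ins for the LightGBM pillar models and the XGBoost meta-model, not the exact feature lists used in `models/v2`.

```python
# Illustrative sketch of the hierarchical prediction flow. The feature lists
# and model callables are assumptions; the real models live in models/v2.

PILLAR_FEATURES = {
    "capital":   ["total_funding_usd", "burn_rate_usd", "runway_months"],
    "advantage": ["patent_count", "has_network_effect"],
    "market":    ["tam_usd", "market_growth_rate", "competition_intensity"],
    "people":    ["founders_count", "domain_expertise_years_avg"],
}

def predict_success(startup: dict, pillar_models: dict, meta_model) -> float:
    """Stage 1: score each pillar; stage 2: combine the scores via the meta-model."""
    pillar_scores = {
        pillar: pillar_models[pillar]([startup[f] for f in features])
        for pillar, features in PILLAR_FEATURES.items()
    }
    return meta_model(list(pillar_scores.values()))
```

Because each pillar is scored separately, the intermediate `pillar_scores` mapping is also what makes pillar-specific explanations possible.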
See the examples directory for usage examples and the models/v2 documentation for more details.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Frontend │────▶│ Backend │────▶│ Models │
│ React SPA │◀────│ FastAPI │◀────│ ML Logic │
└─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────┐
│ Monitoring │
│ Prometheus │
│ + Grafana │
└─────────────┘
- Frontend: React Single Page Application with TypeScript and Material-UI
- Backend: FastAPI framework with Pydantic for data validation
- ML Models: LightGBM and XGBoost for predictions, SHAP for model explainability
- Monitoring: Prometheus for metrics collection, Grafana for dashboards
- Deployment: Docker containers for each service
- Docker and Docker Compose
- Git
- Python 3.11+ (for local development)
- Node.js 18+ (for local development)
- Clone the repository
```bash
git clone <repository-url>
cd FLASH
```

- Start with Docker Compose

```bash
cd flashcamp
docker-compose up -d
```

- Access the application
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- Grafana: http://localhost:3001
- Prometheus: http://localhost:9090
- Set up Python environment

```bash
cd flashcamp
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip install -e .  # Install the package in editable mode
```

- Set up frontend

```bash
cd frontend
npm install
```

- Start services locally

```bash
# Option 1: Using the run script (recommended)
./run.sh

# Option 2: Manual startup
# Terminal 1: Backend
export FLASHDNA_MODEL=$(pwd)/models/success_xgb.joblib
uvicorn flashcamp.backend.app:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2: Frontend
cd frontend
npm run dev
```

- Verify setup (optional)

```bash
# Run the verification script to check model loading and metrics
python verify_metrics.py

# Run the test suite
export FLASHDNA_MODEL=$(pwd)/models/success_xgb.joblib
python -m pytest tests/test_analysis_variation.py -v
```

- backend/: FastAPI application code
  - app/: Application logic and endpoints
  - schema/: Data validation schemas
  - contracts/: API interfaces and types
- frontend/: React application
  - src/components/: Reusable UI components
  - src/pages/: Application pages
  - src/types/: TypeScript type definitions
- monitoring/: Prometheus and Grafana configuration
  - dashboards/: Grafana dashboard templates
  - datasources/: Grafana data source configuration
- models/: Machine learning model files
- data/: Data storage
  - gold/: Processed data ready for use
- reports/: Report templates and assets
- pipelines/: Data processing and model training pipelines
- notebooks/: Jupyter notebooks for exploration
- scripts/: Utility scripts
- Follow PEP 8 style guide
- Use type hints for function parameters and return values
- Document functions with docstrings
- Follow Airbnb style guide for TypeScript
- Use TypeScript interfaces for data structures
- Use functional components with React hooks
- Write unit tests for all business logic
- Use pytest for backend testing
- Use React Testing Library for frontend tests
- Raw data ingestion via CSV files
- Data validation and transformation
- Feature extraction
- Model training and evaluation
- Visualization and reporting
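The early stages of that flow can be sketched end to end with only the standard library; the real code in `pipelines/` uses pandas and the model stack, so treat these function bodies as simplified placeholders.

```python
# Simplified, stdlib-only sketch of the first pipeline stages. The real
# pipelines/ code uses pandas/LightGBM; these bodies are placeholders.
import csv
import io

def ingest(raw_csv: str) -> list:
    """Raw data ingestion: parse CSV text into row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def validate(rows, required=("total_funding_usd", "founders_count")):
    """Data validation: keep only rows with all required metrics present."""
    return [r for r in rows if all(r.get(k) for k in required)]

def extract_features(rows):
    """Feature extraction: cast metric strings to floats."""
    return [{k: float(v) for k, v in r.items()} for r in rows]
```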
We've implemented tools to clean up duplicate metrics in the dataset. The main dataset file `camp_plus_balanced_with_meta.csv` contained several duplicate columns with slightly different naming conventions. These duplicates have been standardized to a canonical naming system.
Key cleanup features:
- Script to identify and clean up duplicate metrics (`scripts/cleanup_duplicate_metrics.py`)
- Unit tests to verify cleanup and detect future duplicates (`tests/test_metrics_collinearity.py`)
- GitHub workflow to check for collinearity in PRs
We've standardized on the following canonical names for metrics (removing their aliases):
| Pillar | Canonical Name | Previous Alias |
|---|---|---|
| Advantage | `patent_count` | `patents_count` |
| Advantage | `has_network_effect` | `network_effects_present` |
| Market | `nps_score` | `nps` |
| Market | `burn_rate_usd` | `monthly_burn_usd` |
| Capital | `total_funding_usd` | `total_capital_raised_usd` |
| Capital | `revenue_annual_usd` | `annual_revenue_run_rate` |
| People | `founders_count` | `founding_team_size` |
| People | `domain_expertise_years_avg` | `founder_domain_experience_years` |
| People | `previous_exits_count` | `prior_successful_exits_count` |
| Info / Context | `sector` | `industry` |
To clean up duplicate metrics in your local environment:
```bash
python scripts/cleanup_duplicate_metrics.py
```

This will:
- Read the dataset with duplicate metrics
- Map duplicate columns to their canonical names
- Write a clean version of the dataset
- Update metrics JSON files to remove duplicate definitions
- Test for any remaining collinearity
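The column-mapping step can be sketched like this, using the alias table above; this is an assumed subset of the full mapping in `scripts/cleanup_duplicate_metrics.py`.

```python
# Alias -> canonical column mapping, taken from the alias table above.
# The full mapping lives in scripts/cleanup_duplicate_metrics.py.
CANONICAL = {
    "patents_count": "patent_count",
    "network_effects_present": "has_network_effect",
    "nps": "nps_score",
    "monthly_burn_usd": "burn_rate_usd",
    "total_capital_raised_usd": "total_funding_usd",
    "annual_revenue_run_rate": "revenue_annual_usd",
    "founding_team_size": "founders_count",
    "founder_domain_experience_years": "domain_expertise_years_avg",
    "prior_successful_exits_count": "previous_exits_count",
    "industry": "sector",
}

def canonicalize(columns):
    """Rename alias columns to their canonical names; leave others untouched."""
    return [CANONICAL.get(c, c) for c in columns]
```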
- Docker healthcheck configuration requires curl; curl installation has been added to Dockerfile.backend to support container health checks
- Added necessary WeasyPrint dependencies to Dockerfile.backend
[License information]