# AI-Driven Company Intelligence System
## SDS DATATHON 2026 - Category A
### Team Fournity - Complete Implementation Documentation

---

This notebook walks through our AI-driven company intelligence system end to end, covering the implementation, the methodology behind each stage, and the results.

## Table of Contents

1. [Project Overview](#1-project-overview)
2. [System Architecture](#2-system-architecture)
3. [Data Loading & Exploration](#3-data-loading--exploration)
4. [Data Preprocessing](#4-data-preprocessing)
5. [Feature Engineering](#5-feature-engineering)
6. [Clustering Analysis](#6-clustering-analysis)
7. [Statistical Analysis](#7-statistical-analysis)
8. [Machine Learning Models](#8-machine-learning-models)
9. [Visualization & Results](#9-visualization--results)
10. [Business Insights](#10-business-insights)
11. [Conclusions](#11-conclusions)

---
## 1. Project Overview

### Problem Statement
Organizations need to understand company landscapes to make informed decisions about partnerships, investments, market positioning, and competitive strategies. Our system automates the discovery of meaningful company segments and generates actionable insights from complex datasets.

### Solution
We developed a comprehensive machine learning system that:
- Automatically segments companies based on multiple attributes
- Identifies patterns and anomalies
- Generates predictive models
- Provides statistical validation
- Creates actionable business insights

### Key Features
1. **Intelligent Data Processing**: Automatic column detection, feature engineering, outlier handling
2. **Advanced Clustering**: 4-phase Latent-Sparse Clustering with multiple algorithms
3. **Machine Learning**: Logistic regression (cluster prediction) and linear regression (performance forecasting)
4. **Statistical Analysis**: Chi-square tests, ANOVA, VIF calculation
5. **Visualizations**: PCA plots, violin plots, heatmaps, interactive 3D visualizations
6. **LLM Integration**: Optional OpenAI/DeepSeek for natural language insights

---
## 2. System Architecture

### Module Structure

```
Fournity/
├── company_intelligence.py       # Main analysis engine (3100+ lines)
├── clustering_analysis.py        # 4-phase clustering (2700+ lines)
├── process_champions_data.py     # Data preprocessing
├── generate_report.py            # Report generation
├── visualization_improvements.py # Enhanced visualizations
└── project_documentation.ipynb   # This notebook
```

### Technology Stack
- **Data Processing**: pandas, numpy, openpyxl
- **Machine Learning**: scikit-learn, scikit-learn-extra
- **Clustering**: K-Means, K-Medoids, DBSCAN, HDBSCAN, GMM
- **Dimensionality Reduction**: PCA, t-SNE, UMAP, FAMD
- **Visualization**: matplotlib, seaborn, plotly
- **Statistical Analysis**: scipy, statsmodels
- **Explainability**: SHAP
- **LLM Integration**: OpenAI or DeepSeek (optional)

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Import our custom modules
from company_intelligence import CompanyIntelligence
from clustering_analysis import LatentSparseClustering
import os
from dotenv import load_dotenv

# Load API keys from a local .env file (if present) so os.getenv can find them
load_dotenv()

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', None)

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ Libraries imported successfully")

---
## 3. Data Loading & Exploration

### Dataset: Champions Group Company Data
The dataset contains information about companies including:
- Financial metrics (revenue, market value)
- Workforce data (employees, locations)
- Technology infrastructure (IT spending, equipment)
- Industry classifications (SIC, NAICS, NACE)
- Company characteristics (entity type, ownership)

In [None]:
# Initialize the analyzer
data_path = 'champions_group_data.xlsx'
api_key = os.getenv('OPENAI_API_KEY') or os.getenv('DEEPSEEK_API_KEY')  # Optional

print("Initializing Company Intelligence Analyzer...")
print(f"Data file: {data_path}")
print(f"LLM integration: {'Enabled' if api_key else 'Disabled (using rule-based insights)'}")
print()

analyzer = CompanyIntelligence(data_path, api_key=api_key)

print(f"\n{'='*70}")
print("DATA LOADED SUCCESSFULLY")
print(f"{'='*70}")
print(f"Total companies: {len(analyzer.df)}")
print(f"Total features: {len(analyzer.df.columns)}")

In [None]:
# Explore the data
print("\n" + "="*70)
print("DATA EXPLORATION")
print("="*70)

df_explored = analyzer.explore_data()

# Display first few rows
print("\nFirst 5 rows of the dataset:")
display(analyzer.df.head())

In [None]:
# Check data types and missing values
print("\n" + "="*70)
print("DATA QUALITY ASSESSMENT")
print("="*70)

print("\nData Types:")
print(analyzer.df.dtypes)

print("\nMissing Values:")
missing = analyzer.df.isnull().sum()
missing_pct = (missing / len(analyzer.df)) * 100
missing_df = pd.DataFrame({
    'Missing Count': missing,
    'Missing %': missing_pct
})
display(missing_df[missing_df['Missing Count'] > 0].sort_values('Missing Count', ascending=False))

---
## 4. Data Preprocessing

### Preprocessing Pipeline
Our preprocessing pipeline includes:
1. **Column Detection**: Intelligent pattern matching to find relevant columns
2. **Feature Engineering**: Calculate derived metrics (company age, ratios, indices)
3. **Missing Value Imputation**: Median imputation for numeric features
4. **Outlier Handling**: IQR-based winsorization (caps outliers without removing data)
5. **Feature Scaling**: StandardScaler standardization (zero mean, unit variance)
6. **Text Processing**: TF-IDF vectorization for industry descriptions
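
Steps 3 to 5 of the pipeline can be sketched as follows. This is a minimal illustration on hypothetical toy data, not the code in `process_champions_data.py`; the helper name `winsorize_iqr` and the 1.5 × IQR fence are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def winsorize_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Hypothetical toy data: one missing value and one extreme outlier.
toy = pd.DataFrame({
    "revenue": [10.0, 12.0, 11.0, np.nan, 13.0, 500.0],
    "employees": [5.0, 7.0, 6.0, 8.0, 9.0, 7.0],
})

imputed = toy.fillna(toy.median())       # step 3: median imputation
capped = imputed.apply(winsorize_iqr)    # step 4: IQR-based winsorization
scaled = pd.DataFrame(                   # step 5: zero mean, unit variance
    StandardScaler().fit_transform(capped),
    columns=capped.columns,
)

print(capped)
print(scaled.round(2))
```

Winsorization keeps all six rows: the extreme revenue value is pulled back to the upper fence rather than removed, so no company is lost before clustering.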

### Key Business Indicators Calculated
1. Market Value/Revenue Ratio (Price-to-Sales)
2. IT Investment Intensity
3. Revenue per Employee
4. Market Value per Employee
5. IT Spend Ratio
6. Technology Density Index
7. Company Age
8. Growth Potential Index

In [None]:
# Preprocess the data
print("\n" + "="*70)
print("DATA PREPROCESSING")
print("="*70)

# Preprocess with business indicators
analyzer.preprocess_data(
    calculate_indicators=True,
    include_key_indicators_in_clustering=True
)

print("\n✓ Preprocessing complete")
print(f"Processed features: {len(analyzer.feature_names)}")
print(f"Companies after filtering: {len(analyzer.df_processed)}")

In [None]:
# Display processed features
print("\n" + "="*70)
print("PROCESSED FEATURES")
print("="*70)
print(f"\nTotal features used for clustering: {len(analyzer.feature_names)}")
print("\nFeature list:")
for i, feature in enumerate(analyzer.feature_names, 1):
    print(f"{i:3d}. {feature}")

In [None]:
# Display statistics of processed data
print("\n" + "="*70)
print("PROCESSED DATA STATISTICS")
print("="*70)

if analyzer.df_processed is not None:
    # Show statistics for first 10 features
    n_show = min(10, analyzer.df_processed.shape[1], len(analyzer.feature_names))
    preview = pd.DataFrame(analyzer.df_processed[:, :n_show],
                           columns=analyzer.feature_names[:n_show])
    display(preview.describe())

---
## 5. Feature Engineering

### Derived Business Metrics
We calculate 10+ business indicators to enhance segmentation quality:

#### Financial Indicators
- **Market Value/Revenue Ratio**: Identifies growth vs. value companies
- **Revenue per Employee**: Measures productivity
- **Market Value per Employee**: Indicates market perception of workforce value

#### Technology Indicators
- **IT Investment Intensity**: Tech-forward vs. traditional segmentation
- **Technology Density Index**: IT infrastructure per employee
- **Technology Sophistication Index**: Composite IT maturity score

#### Operational Indicators
- **Single-Site Concentration**: Geographic strategy indicator
- **Employees per Site**: Operational scale indicator
- **Workforce Technology Ratio**: Knowledge economy indicator

#### Strategic Indicators
- **Company Age**: Maturity indicator
- **Growth Potential Index**: Composite growth score
- **Maturity Stage**: Startup/Growth/Mature/Established classification
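
To make the derived metrics concrete, here is a minimal sketch of three of them. The input column names (`revenue`, `market_value`, `employees`, `year_founded`), the helper `safe_ratio`, and the reference year are assumptions for illustration; the actual logic lives in `calculate_business_indicators()`.

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two columns, returning NaN where the denominator is zero."""
    return numerator / denominator.replace(0, np.nan)


# Hypothetical toy companies; real column names may differ.
companies = pd.DataFrame({
    "revenue": [2_000_000.0, 500_000.0, 0.0],
    "market_value": [6_000_000.0, 250_000.0, 100_000.0],
    "employees": [20, 0, 5],
    "year_founded": [2001, 2019, 1985],
})
reference_year = 2026  # assumption: age is measured against the competition year

companies["market_value_to_revenue_ratio"] = safe_ratio(
    companies["market_value"], companies["revenue"]
)
companies["revenue_per_employee"] = safe_ratio(
    companies["revenue"], companies["employees"]
)
companies["company_age"] = reference_year - companies["year_founded"]

print(companies)
```

Mapping zero denominators to NaN avoids infinite ratios, which would otherwise distort scaling and distance computations downstream.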

In [None]:
# Calculate additional business indicators
print("\n" + "="*70)
print("BUSINESS INDICATOR CALCULATION")
print("="*70)

analyzer.calculate_business_indicators()

# Display indicator summary
indicator_cols = [
    'market_value_to_revenue_ratio',
    'it_investment_intensity',
    'single_site_concentration',
    'workforce_tech_ratio',
    'tech_sophistication_index',
    'growth_potential_index',
    'maturity_stage',
    'revenue_scale',
    'employee_scale'
]

available_indicators = [col for col in indicator_cols if col in analyzer.df.columns]
if available_indicators:
    print(f"\n✓ Calculated {len(available_indicators)} business indicators")
    
    # Show numeric indicators
    # is_numeric_dtype also catches float32/int32 and nullable numeric columns
    numeric_indicators = [col for col in available_indicators
                         if pd.api.types.is_numeric_dtype(analyzer.df[col])]
    if numeric_indicators:
        print("\nNumeric Indicators Summary:")
        display(analyzer.df[numeric_indicators].describe())
    
    # Show categorical indicators
    categorical_indicators = ['maturity_stage', 'revenue_scale', 'employee_scale']
    for col in categorical_indicators:
        if col in analyzer.df.columns:
            print(f"\n{col.replace('_', ' ').title()} Distribution:")
            print(analyzer.df[col].value_counts())
else:
    print("\n⚠ No business indicators calculated")

---
## 6. Clustering Analysis

### 4-Phase Latent-Sparse Clustering Workflow

#### Phase 1: Context & Meta-Data Analysis
- Feature profiling (numerical vs. categorical)
- Sparsity checks (missing value analysis)
- Metric mapping (distance metric selection)

#### Phase 2: Filtering & Encoding Engine
- Redundancy pruning (removes highly correlated features)
- Automated scaling (RobustScaler vs. StandardScaler)
- Dimensionality reduction (FAMD/PCA)
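
Redundancy pruning can be illustrated with a simple correlation filter. This is a sketch under assumptions: the function name `drop_correlated`, the 0.9 threshold, and the toy columns are hypothetical, and the pruning rule in `clustering_analysis.py` may differ.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9):
    """Drop the later column of every pair whose |Pearson r| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Hypothetical toy data: two near-duplicate revenue columns plus an unrelated one.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "revenue_usd": base,
    "revenue_eur": base * 0.9 + rng.normal(scale=0.01, size=200),
    "employees": rng.normal(size=200),
})

pruned, dropped = drop_correlated(toy, threshold=0.9)
print("Dropped:", dropped)
print("Kept:", list(pruned.columns))
```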

#### Phase 3: Iterative Clustering Loop
- Hyperparameter optimization
- Multi-algorithm testing (K-Means, K-Medoids, DBSCAN, HDBSCAN)
- Validation (Silhouette, Davies-Bouldin, Calinski-Harabasz)
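
The validation step can be sketched as a loop over candidate values of K that records all three metrics. The example below uses synthetic blobs and K-Means only, and picks K by silhouette score alone; how the project combines the three metrics into one decision is not shown here and is an assumption left open.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for the scaled feature matrix: four well-separated groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.6,
    random_state=42,
)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }

best_k = max(scores, key=lambda k: scores[k]["silhouette"])
for k, s in scores.items():
    print(k, {name: round(value, 3) for name, value in s.items()})
print("Best K by silhouette:", best_k)
```

Note that the metrics point in different directions (Davies-Bouldin is minimized, the other two are maximized), so any combined score has to flip or rank them consistently.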

#### Phase 4: Interpretability & Insights
- Feature importance (Random Forest, SHAP)
- Cluster profiling
- Persona generation
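
One common way to obtain feature importance for a clustering is to train a surrogate classifier on the cluster labels. The sketch below does this with a Random Forest on hypothetical toy features (`it_spend`, `noise`); it illustrates the idea and is not the project's implementation, which also uses SHAP.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: `it_spend` separates the two clusters, `noise` does not.
rng = np.random.default_rng(0)
n = 300
features = pd.DataFrame({
    "it_spend": np.concatenate([rng.normal(0, 1, n // 2), rng.normal(6, 1, n // 2)]),
    "noise": rng.normal(0, 1, n),
})
cluster_labels = np.repeat([0, 1], n // 2)

# Surrogate model: predict the cluster label from the features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(features, cluster_labels)

importance = pd.Series(
    forest.feature_importances_, index=features.columns
).sort_values(ascending=False)

# Cluster profile: mean of every feature per cluster.
profile = features.groupby(cluster_labels).mean()

print(importance)
print(profile)
```

Features that rank high in `importance` and differ strongly in `profile` are natural candidates for describing a cluster persona.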

In [None]:
# Determine optimal number of clusters
print("\n" + "="*70)
print("OPTIMAL CLUSTER DETERMINATION")
print("="*70)

optimal_k = analyzer.determine_optimal_clusters(
    max_k=10,
    practical_threshold=0.70
)

print(f"\n✓ Optimal number of clusters: {optimal_k}")
print(f"  Based on multi-metric validation (Silhouette, Davies-Bouldin, Calinski-Harabasz)")
print(f"  Balanced for business practicality (K=4-8 preferred)")

In [None]:
# Perform clustering
print("\n" + "="*70)
print("CLUSTERING EXECUTION")
print("="*70)

analyzer.perform_clustering(n_clusters=optimal_k)

print(f"\n✓ Clustering complete")
print(f"Number of clusters: {len(set(analyzer.clusters))}")

# Display cluster distribution
if 'Cluster' in analyzer.df.columns:
    cluster_dist = analyzer.df['Cluster'].value_counts().sort_index()
    print("\nCluster Distribution:")
    for cluster_id, count in cluster_dist.items():
        pct = (count / len(analyzer.df)) * 100
        print(f"  Cluster {cluster_id}: {count:4d} companies ({pct:5.1f}%)")

In [None]:
# Visualize cluster distribution
if 'Cluster' in analyzer.df.columns:
    plt.figure(figsize=(12, 6))
    
    # Bar plot
    plt.subplot(1, 2, 1)
    cluster_dist = analyzer.df['Cluster'].value_counts().sort_index()
    cluster_dist.plot(kind='bar', color=plt.cm.Set3(np.linspace(0, 1, len(cluster_dist))))
    plt.title('Company Distribution Across Clusters', fontsize=14, fontweight='bold')
    plt.xlabel('Cluster ID', fontsize=12)
    plt.ylabel('Number of Companies', fontsize=12)
    plt.xticks(rotation=0)
    plt.grid(True, alpha=0.3, axis='y')
    
    # Pie chart
    plt.subplot(1, 2, 2)
    plt.pie(cluster_dist.values, labels=[f'Cluster {i}' for i in cluster_dist.index],
            autopct='%1.1f%%', colors=plt.cm.Set3(np.linspace(0, 1, len(cluster_dist))),
            startangle=90)
    plt.title('Cluster Size Distribution', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()

In [None]:
# Analyze cluster characteristics
print("\n" + "="*70)
print("CLUSTER ANALYSIS")
print("="*70)

cluster_analysis = analyzer.analyze_clusters()

# Display detailed analysis for each cluster
for cluster_id, info in cluster_analysis.items():
    print(f"\n{'='*70}")
    print(f"CLUSTER {cluster_id}: {info['size']} companies ({info['percentage']:.1f}%)")
    print(f"{'='*70}")
    
    # Show top characteristics
    print("\nTop Characteristics:")
    if 'top_features' in info:
        for i, (feature, value) in enumerate(info['top_features'].items(), 1):
            print(f"  {i}. {feature}: {value:.2f}")
            if i >= 5:
                break

---
## 7. Statistical Analysis

### Chi-Square Tests
We perform chi-square tests to examine relationships between categorical variables and cluster membership:
- Tests statistical significance of associations
- Validates assumptions (expected frequency ≥ 5)
- Applies multiple testing correction (FDR-BH method)
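
The three steps above can be sketched with `scipy` and `statsmodels`. The toy columns (`entity_type`, `ownership`) and the result keys `p_adjusted` and `significant` are assumptions for illustration; `perform_chi_square_test` may structure its output differently.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from statsmodels.stats.multitest import multipletests

# Hypothetical toy data: `entity_type` depends on the cluster, `ownership` is random.
rng = np.random.default_rng(1)
clusters = np.repeat([0, 1], 100)
toy = pd.DataFrame({
    "Cluster": clusters,
    "entity_type": np.where(
        clusters == 0,
        rng.choice(["Parent", "Branch"], 200, p=[0.8, 0.2]),
        rng.choice(["Parent", "Branch"], 200, p=[0.2, 0.8]),
    ),
    "ownership": rng.choice(["Public", "Private"], 200),
})

results = {}
for col in ["entity_type", "ownership"]:
    table = pd.crosstab(toy[col], toy["Cluster"])
    chi2, p_value, dof, expected = chi2_contingency(table)
    results[col] = {
        "chi2": chi2,
        "p_value": p_value,
        "dof": dof,
        # Assumption check: every expected cell count should be at least 5.
        "valid": bool((expected >= 5).all()),
    }

# Benjamini-Hochberg correction across all tests performed.
reject, p_adjusted, _, _ = multipletests(
    [r["p_value"] for r in results.values()], alpha=0.05, method="fdr_bh"
)
for (col, r), adj, sig in zip(results.items(), p_adjusted, reject):
    r["p_adjusted"] = float(adj)
    r["significant"] = bool(sig)

for col, r in results.items():
    print(col, r)
```

When a correction is applied, significance is best judged on the adjusted p-values rather than the raw ones.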

### Analysis of Variance (ANOVA)
We test if numerical features differ significantly across clusters:
- F-statistic and p-value for each feature
- Effect size calculation
- Multiple testing correction
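
A one-way ANOVA for a single feature can be sketched as below. The source does not say which effect size is reported; eta squared (between-group sum of squares divided by total sum of squares) is used here as an assumption, and the function name `anova_by_cluster` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway


def anova_by_cluster(df: pd.DataFrame, feature: str, cluster_col: str = "Cluster") -> dict:
    """One-way ANOVA of `feature` across clusters, with eta squared as effect size."""
    groups = [g[feature].dropna().to_numpy() for _, g in df.groupby(cluster_col)]
    f_stat, p_value = f_oneway(*groups)

    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return {
        "f_stat": float(f_stat),
        "p_value": float(p_value),
        "eta_squared": float(ss_between / ss_total),
    }


# Hypothetical toy data: three clusters with different means.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Cluster": np.repeat([0, 1, 2], 50),
    "revenue_per_employee": np.concatenate([
        rng.normal(0, 1, 50), rng.normal(1, 1, 50), rng.normal(5, 1, 50),
    ]),
})

result = anova_by_cluster(toy, "revenue_per_employee")
print(result)
```

Running this once per numerical feature yields a list of p-values that can then go through the same multiple testing correction as the chi-square tests.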

In [None]:
# Perform chi-square tests
print("\n" + "="*70)
print("CHI-SQUARE STATISTICAL TESTS")
print("="*70)

chi_square_results = analyzer.perform_chi_square_test(correction_method='fdr_bh')

if chi_square_results:
    print(f"\n✓ Chi-square tests completed")
    print(f"  Tests performed: {len(chi_square_results)}")
    print(f"  Multiple testing correction: FDR-BH")
    
    # Display results
    print("\nResults:")
    for feature, result in chi_square_results.items():
        if result['valid']:
            sig = "***" if result['p_value'] < 0.001 else "**" if result['p_value'] < 0.01 else "*" if result['p_value'] < 0.05 else "ns"
            print(f"\n{feature}:")
            print(f"  Chi-square statistic: {result['chi2']:.2f}")
            print(f"  p-value: {result['p_value']:.4f} {sig}")
            print(f"  Degrees of freedom: {result['dof']}")
        else:
            print(f"\n{feature}: Test invalid (assumption violation)")
else:
    print("\n⚠ No categorical variables available for chi-square tests")

In [None]:
# Compare clusters across features
print("\n" + "="*70)
print("CLUSTER COMPARISON")
print("="*70)

comparison = analyzer.compare_clusters()

if comparison is not None:
    print("\nCluster Comparison Summary:")
    display(comparison)
else:
    print("\n⚠ Cluster comparison not available")

---
## 8. Machine Learning Models

### Logistic Regression (Cluster Prediction)
Predicts which cluster a company belongs to based on its features:
- **Purpose**: Segment membership prediction for new companies
- **Method**: Multinomial logistic regression with L2 regularization
- **Validation**: 5-fold cross-validation
- **Output**: Interpretable coefficients showing feature importance
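
The setup described above can be sketched with scikit-learn. Synthetic blobs stand in for the company features and cluster labels; the pipeline layout and hyperparameters are assumptions, not the settings used by `train_logistic_regression`.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three clusters in a three-feature space.
X, y = make_blobs(
    n_samples=300,
    centers=[[0, 0, 0], [5, 5, 0], [0, 5, 5]],
    cluster_std=1.0,
    random_state=0,
)

# LogisticRegression applies L2 regularization by default and fits a
# multinomial model when there are more than two classes.
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))

# 5-fold cross-validation; scaling is refit inside each fold by the pipeline.
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {cv_scores.mean():.2%} (±{cv_scores.std():.2%})")

model.fit(X, y)
coefficients = model.named_steps["logisticregression"].coef_
print("Coefficient matrix shape (classes x features):", coefficients.shape)
```

Because the features are standardized, the magnitude of each coefficient is comparable across features, which is what makes the coefficients usable as a rough importance ranking.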

### Linear Regression (Performance Forecasting)
Predicts company performance metrics (revenue, market value):
- **Purpose**: Forecast financial outcomes
- **Method**: Ridge regression (handles multicollinearity)
- **Validation**: Train/test split with proper scaling
- **Output**: R² score, RMSE, feature importance
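
The sketch below pairs a multicollinearity check with a Ridge fit on hypothetical toy data. The VIF is computed from the identity that VIF values are the diagonal of the inverse correlation matrix; this is a NumPy shortcut chosen for brevity and may not match how `train_linear_regression` computes it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one nearly duplicated predictor.
rng = np.random.default_rng(0)
n = 400
employees = rng.normal(100, 20, n)
X = pd.DataFrame({
    "employees": employees,
    "headcount_copy": employees + rng.normal(0, 1, n),  # nearly collinear
    "it_spend": rng.normal(50, 10, n),
})
y = 3.0 * X["employees"] + 2.0 * X["it_spend"] + rng.normal(0, 5, n)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of the
# inverse of the predictors' correlation matrix.
vif = pd.Series(np.diag(np.linalg.inv(X.corr().to_numpy())), index=X.columns)
print(vif.round(2))

# Fit the scaler on the training split only (the pipeline handles this).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R² (test): {r2:.4f}, RMSE (test): {rmse:.2f}")
```

The two collinear columns show a very high VIF, yet the Ridge penalty keeps the fit stable, which is the motivation for preferring Ridge over ordinary least squares here.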

In [None]:
# Train logistic regression model
print("\n" + "="*70)
print("LOGISTIC REGRESSION (Cluster Prediction)")
print("="*70)

lr_results = analyzer.train_logistic_regression(
    use_original_features=True,
    cv_folds=5
)

if lr_results:
    print(f"\n✓ Logistic regression model trained")
    print(f"\nModel Performance:")
    print(f"  Training accuracy: {lr_results['train_accuracy']:.2%}")
    print(f"  Test accuracy: {lr_results['test_accuracy']:.2%}")
    print(f"  Cross-validation score: {lr_results['cv_score_mean']:.2%} (±{lr_results['cv_score_std']:.2%})")
    
    # Display classification report
    print("\nClassification Report:")
    print(lr_results['classification_report'])
    
    # Display top features
    print("\nTop 10 Most Important Features:")
    if 'feature_importance' in lr_results:
        for i, (feature, importance) in enumerate(lr_results['feature_importance'][:10], 1):
            print(f"  {i:2d}. {feature}: {importance:.4f}")
else:
    print("\n⚠ Logistic regression training failed")

In [None]:
# Train linear regression model
print("\n" + "="*70)
print("LINEAR REGRESSION (Revenue Forecasting)")
print("="*70)

linear_results = analyzer.train_linear_regression(
    target_feature='revenue',
    regularization='ridge',
    check_multicollinearity=True
)

if linear_results:
    print(f"\n✓ Linear regression model trained")
    print(f"\nModel Performance:")
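    # Default to NaN so a missing metric is not mistaken for a real score of 0
    print(f"  R² score (train): {linear_results.get('r2_train', float('nan')):.4f}")
    print(f"  R² score (test): {linear_results.get('r2_test', float('nan')):.4f}")
    print(f"  RMSE (test): {linear_results.get('rmse_test', float('nan')):.2f}")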
    
    # Display VIF if available
    if 'vif_scores' in linear_results:
        print("\nMulticollinearity Check (VIF):")
        vif_high = {k: v for k, v in linear_results['vif_scores'].items() if v > 5}
        if vif_high:
            print("  Features with high VIF (>5):")
            for feature, vif in sorted(vif_high.items(), key=lambda x: x[1], reverse=True)[:5]:
                print(f"    {feature}: {vif:.2f}")
        else:
            print("  ✓ No high multicollinearity detected")
    
    # Display top features
    print("\nTop 10 Most Important Features:")
    if 'feature_importance' in linear_results:
        for i, (feature, importance) in enumerate(linear_results['feature_importance'][:10], 1):
            print(f"  {i:2d}. {feature}: {importance:.4f}")
else:
    print("\n⚠ Linear regression training failed")

---
## 9. Visualization & Results

### Visualization Suite
We generate multiple visualizations to understand the clustering results:

1. **PCA Scatter Plot**: 2D projection showing cluster separation
2. **Feature Comparison**: Violin plots with statistical annotations
3. **Cluster Heatmap**: Normalized mean values across features
4. **Interactive 3D Plot**: Plotly-based explorable visualization
5. **Explained Variance**: PCA component contribution
6. **Confusion Matrix**: Logistic regression predictions
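
The data behind the PCA scatter plot and the explained-variance chart can be computed as in this sketch. Synthetic blobs stand in for the processed feature matrix; plotting is omitted, and the variable names are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processed features: 300 companies, 6 features.
X, labels = make_blobs(n_samples=300, n_features=6, centers=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)
coords = pca.transform(X_scaled)           # (x, y) positions for the scatter plot
explained = pca.explained_variance_ratio_  # share of variance per component

print("Projected shape:", coords.shape)
print("Explained variance per component:", explained.round(3))
print(f"Total variance retained in 2D: {explained.sum():.1%}")
```

The retained-variance figure is worth reporting next to the plot: if two components explain only a small share of the variance, clusters that overlap in 2D may still be well separated in the full feature space.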

In [None]:
# Generate all visualizations
print("\n" + "="*70)
print("VISUALIZATION GENERATION")
print("="*70)

analyzer.visualize_results()

print("\n✓ Visualizations generated and saved")
print("\nGenerated files:")
print("  - optimal_clusters.png")
print("  - pca_clusters.png")
print("  - pca_clusters_enhanced.png")
print("  - feature_comparison.png")
print("  - feature_comparison_enhanced.png")
print("  - cluster_heatmap_enhanced.png")
print("  - interactive_clusters.html")

In [None]:
# Display PCA visualization (if available)
from IPython.display import Image, display as ipy_display
import os

if os.path.exists('pca_clusters_enhanced.png'):
    print("\nPCA Cluster Visualization:")
    ipy_display(Image('pca_clusters_enhanced.png'))
elif os.path.exists('pca_clusters.png'):
    print("\nPCA Cluster Visualization:")
    ipy_display(Image('pca_clusters.png'))

In [None]:
# Display feature comparison (if available)
if os.path.exists('feature_comparison_enhanced.png'):
    print("\nFeature Comparison Across Clusters:")
    ipy_display(Image('feature_comparison_enhanced.png'))
elif os.path.exists('feature_comparison.png'):
    print("\nFeature Comparison Across Clusters:")
    ipy_display(Image('feature_comparison.png'))

---
## 10. Business Insights

### Insight Generation
We generate actionable business insights using:
- **LLM Integration** (if available): Natural language insights from OpenAI/DeepSeek
- **Rule-Based System** (fallback): Statistical pattern analysis
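
A rule-based fallback can be as simple as flagging features whose cluster mean deviates from the overall mean. The sketch below is one possible rule set, with a hypothetical function name and a 0.5 standard-deviation threshold chosen for illustration; it is not the logic used by `generate_llm_insights`.

```python
import pandas as pd


def rule_based_insights(df: pd.DataFrame, cluster_col: str = "Cluster",
                        z_threshold: float = 0.5) -> dict:
    """Describe each cluster by the features that deviate most from the overall mean."""
    numeric = df.drop(columns=[cluster_col]).select_dtypes("number")
    overall_mean = numeric.mean()
    overall_std = numeric.std(ddof=0)

    insights = {}
    for cluster_id, group in numeric.groupby(df[cluster_col]):
        # Deviation of the cluster mean, in units of overall standard deviation.
        z = (group.mean() - overall_mean) / overall_std
        notable = z[z.abs() >= z_threshold].sort_values(key=abs, ascending=False)
        insights[cluster_id] = [
            f"{feature} is {'above' if value > 0 else 'below'} "
            f"the overall average ({value:+.2f} SD)"
            for feature, value in notable.items()
        ]
    return insights


# Hypothetical toy data: revenue differs by cluster, employees does not.
toy = pd.DataFrame({
    "Cluster": [0, 0, 0, 1, 1, 1],
    "revenue": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "employees": [5.0, 6.0, 7.0, 5.0, 6.0, 7.0],
})

for cluster_id, lines in rule_based_insights(toy).items():
    print(f"Cluster {cluster_id}:")
    for line in lines:
        print("  -", line)
```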

### Insights Include
1. **Cluster Characterization**: What defines each segment?
2. **Key Differentiators**: What separates the segments?
3. **Business Recommendations**: How to leverage these insights?
4. **Risk Factors**: What to watch out for?
5. **Growth Opportunities**: Where to focus efforts?

In [None]:
# Identify patterns
print("\n" + "="*70)
print("PATTERN IDENTIFICATION")
print("="*70)

patterns = analyzer.identify_patterns()

if patterns:
    print("\n✓ Patterns identified")
    
    # Display outliers
    if 'outliers' in patterns and patterns['outliers']:
        print("\nTop Outlier Features:")
        for i, outlier in enumerate(patterns['outliers'][:5], 1):
            print(f"  {i}. {outlier['feature']}: {outlier['count']} companies ({outlier['percentage']:.1f}%)")
    
    # Display trends
    if 'trends' in patterns and patterns['trends']:
        print("\nKey Trends:")
        for i, trend in enumerate(patterns['trends'][:5], 1):
            print(f"  {i}. {trend}")
else:
    print("\n⚠ No patterns identified")

In [None]:
# Generate insights
print("\n" + "="*70)
print("BUSINESS INSIGHTS GENERATION")
print("="*70)

insights = analyzer.generate_llm_insights(cluster_analysis, patterns)

print("\n" + insights)

---
## 11. Conclusions

### Summary
Our AI-driven company intelligence system successfully:

1. **Processed and analyzed** a complex dataset of company attributes
2. **Identified meaningful segments** using advanced clustering algorithms
3. **Generated predictive models** for segment membership and performance forecasting
4. **Provided statistical validation** through multiple testing methods
5. **Created actionable insights** for business decision-making

### Key Achievements
- ✓ Comprehensive data preprocessing with intelligent feature engineering
- ✓ 4-phase Latent-Sparse Clustering with multi-algorithm support
- ✓ Business-optimized cluster selection (K=4-8)
- ✓ Statistical validation with proper assumption checking
- ✓ Interpretable machine learning models
- ✓ Enhanced visualizations with statistical annotations
- ✓ Optional LLM-powered natural language insights, with a rule-based fallback

### Business Value
This system enables organizations to:
- **Segment companies** effectively for targeted strategies
- **Identify opportunities** in specific market segments
- **Assess risks** based on company characteristics
- **Predict performance** using validated models
- **Make data-driven decisions** with statistical confidence

In [None]:
# Generate comprehensive report
print("\n" + "="*70)
print("REPORT GENERATION")
print("="*70)

report = analyzer.generate_report(cluster_analysis, patterns, insights)

print("\n✓ Comprehensive report generated and saved")
print("  File: company_intelligence_report.txt")

In [None]:
# Export results
print("\n" + "="*70)
print("RESULTS EXPORT")
print("="*70)

# Export data with cluster labels
output_file = 'companies_with_segments.csv'
analyzer.df.to_csv(output_file, index=False)

print(f"\n✓ Results exported to: {output_file}")
print(f"  Companies: {len(analyzer.df)}")
print(f"  Features: {len(analyzer.df.columns)}")
print(f"  Clusters: {len(set(analyzer.clusters))}")

In [None]:
# Display sample of results
print("\n" + "="*70)
print("SAMPLE RESULTS")
print("="*70)

# Select key columns to display
display_cols = ['Cluster']
for col in analyzer.df.columns[:5]:
    if col != 'Cluster':
        display_cols.append(col)

print("\nFirst 10 companies with cluster assignments:")
display(analyzer.df[display_cols].head(10))

---
## Appendix: System Specifications

### Code Statistics
- **Total Lines of Code**: ~6,500
- **Main Analysis Engine**: 3,100+ lines (`company_intelligence.py`)
- **Clustering Module**: 2,700+ lines (`clustering_analysis.py`)
- **Preprocessing**: 271 lines (`process_champions_data.py`)
- **Visualizations**: 412 lines (`visualization_improvements.py`)

### Features Implemented
- **10+ Clustering Algorithms**, including K-Means, K-Medoids, DBSCAN, HDBSCAN, GMM
- **5+ Dimensionality Reduction**: PCA, t-SNE, UMAP, TruncatedSVD, FAMD
- **3+ Regression Models**: Logistic, Ridge, Lasso, ElasticNet
- **5+ Statistical Tests and Validation Metrics**: Chi-square, ANOVA, VIF, Silhouette, Davies-Bouldin
- **10+ Business Indicators**: Market ratios, technology indices, maturity stages

### Dependencies
- Core: pandas, numpy, scikit-learn
- Advanced: prince, umap-learn, hdbscan, shap, gower
- Visualization: matplotlib, seaborn, plotly
- Statistical: scipy, statsmodels
- LLM: openai (optional)

### Execution Time
- Data loading: < 1 second
- Preprocessing: 2-5 seconds
- Clustering: 5-15 seconds
- ML models: 3-10 seconds
- Visualization: 5-10 seconds
- **Total**: < 1 minute (typical dataset)

---

## Thank You!

**Team Fournity**  
SDS DATATHON 2026 - Category A  
January 2026