# 🔥 ENIGMA 2027 - Innovative Networking Compatibility Model

## 📚 Executive Summary

This solution approaches professional networking compatibility as a **MULTI-DIMENSIONAL VALUE EXCHANGE** problem rather than simple similarity matching. 

### 🎯 Key Innovation: Reciprocal Value Exchange Model

The fundamental insight is that effective professional networking is **NOT about finding similar people**, but about finding people who can provide **MUTUAL VALUE** to each other.

**Traditional Approach (What We're NOT Doing):**
- Cosine similarity between profile embeddings
- Simple feature matching
- "Similar people should network"

**Our Innovative Approach:**
- **Reciprocal Value Exchange**: Does Person A offer what Person B needs, and vice versa?
- **Objective Complementarity**: "Hiring" matches with "Job Seeking" - these are COMPLEMENTARY, not similar!
- **Constraint Satisfaction**: Respecting what people DON'T want is as important as matching what they want
- **Role Dynamics**: Founders naturally connect with Investors, CTOs with Engineers

### 💡 Why This Approach is Enterprise-Ready:

1. **INTERPRETABLE**: Every compatibility score can be explained ("High score because: Person A is hiring, Person B is job seeking, both in FinTech")
2. **CONFIGURABLE**: Weights can be adjusted for different event types (startup pitch events vs. corporate networking)
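3. **SCALABLE**: Each pair's features come from set operations and dictionary lookups over per-profile sets, and parsed profiles and embeddings can be cached; scoring all pairs still grows quadratically with the number of attendees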
4. **ACTIONABLE**: System can tell users WHY they should meet someone

---

## 🧠 Conceptual Foundation: Why Similarity-Based Matching Fails

### The Problem with Cosine Similarity in Networking

Consider two investors at a networking event. Traditional similarity would score them HIGH because they have similar:
- Role: Both "Investment Analyst"
- Interests: Both interested in "Venture Capital"
- Objectives: Both "Looking for deals"

**But should two investors network with each other?** Probably not - they're competing for the same deals!
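
To make the problem concrete, here is a minimal sketch with two hypothetical investor profiles. The field values and the one-entry `complementary` table are illustrative assumptions, not taken from the dataset. Plain set overlap rates the two investors a perfect match even though neither offers what the other needs.

```python
# Two hypothetical investor profiles, flattened into token sets.
investor_a = {"investment analyst", "venture capital", "looking for deals"}
investor_b = {"investment analyst", "venture capital", "looking for deals"}

def jaccard(a, b):
    """Set overlap |A ∩ B| / |A ∪ B|, used here as a simple similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Illustrative table of objective pairs that exchange value (assumed values).
complementary = {("seeking funding", "looking for deals"): 1.0}

def value_exchange(obj_a, obj_b):
    """Look the pair up in both directions; 0.0 means no exchange."""
    return max(complementary.get((obj_a, obj_b), 0.0),
               complementary.get((obj_b, obj_a), 0.0))

similarity = jaccard(investor_a, investor_b)
investor_exchange = value_exchange("looking for deals", "looking for deals")
founder_exchange = value_exchange("seeking funding", "looking for deals")
print(similarity, investor_exchange, founder_exchange)
```

The identical profiles maximize similarity, while the lookup finds an exchange only for the founder and investor pairing.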

### The Value Exchange Paradigm

In professional networking, **VALUE FLOWS** between people:

```
   FOUNDER ──────────────────► INVESTOR
      │    "Seeking funding"         │
      │                              │
      │    "Looking for deals"       │
      ◄──────────────────────────────┘
```

This creates **complementary** matches that traditional similarity would miss:
- Founder ↔ Investor (funding exchange)
- HR Manager ↔ Job Seeker (employment exchange)
- Senior Professional ↔ Junior (mentorship exchange)
- Startup ↔ Enterprise (partnership exchange)

### Our Multi-Dimensional Compatibility Framework

We model compatibility across **6 dimensions**:

1. **Interest Overlap** (Jaccard similarity) - Do they share common ground?
2. **Objective Complementarity** - Can they help each other achieve goals?
3. **Role Synergy** - Do their professional roles naturally connect?
4. **Constraint Satisfaction** - Does neither violate the other's boundaries?
5. **Context Alignment** - Industry, location, company stage compatibility
6. **Seniority Dynamics** - Mentorship and hierarchical value exchange
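
One way to picture how the six dimension scores might combine is a weighted sum of five of them, scaled by constraint satisfaction. This is only a sketch: the weights and the `combine_dimensions` helper are hypothetical, and in this notebook the dimensions are passed to a model as features rather than combined by hand.

```python
# Hypothetical per-dimension weights; a real system would tune or learn these.
WEIGHTS = {
    "interest_overlap": 0.15,
    "objective_complementarity": 0.35,
    "role_synergy": 0.20,
    "context_alignment": 0.15,
    "seniority_dynamics": 0.15,
}

def combine_dimensions(scores, constraint_satisfaction):
    """Weighted sum of five dimension scores in [0, 1], scaled by the
    constraint-satisfaction score so that violations pull the total down."""
    base = sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
    return base * constraint_satisfaction

# Assumed scores for a single pair of profiles.
pair_scores = {
    "interest_overlap": 0.5,
    "objective_complementarity": 1.0,
    "role_synergy": 0.9,
    "context_alignment": 1.0,
    "seniority_dynamics": 0.0,
}

unconstrained = combine_dimensions(pair_scores, 1.0)
penalized = combine_dimensions(pair_scores, 0.5)
print(round(unconstrained, 4), round(penalized, 4))
```

Treating constraint satisfaction as a multiplier rather than one more additive term means a severe violation can suppress an otherwise strong match.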

---

## 📦 Section 1: Import Required Libraries

We use a combination of:
- **pandas/numpy**: Data manipulation
- **scikit-learn**: ML models and cross-validation
- **LightGBM/XGBoost**: Gradient boosting for non-linear patterns
- **collections**: `defaultdict` and `Counter` helpers for grouping and counting

In [None]:
import pandas as pd
import numpy as np
from collections import defaultdict, Counter
import os
import re
import time
import warnings
warnings.filterwarnings('ignore')

# Machine Learning
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import mean_squared_error

# Try to import gradient boosting libraries
try:
    import lightgbm as lgb
    HAS_LIGHTGBM = True
    print("✅ LightGBM available")
except ImportError:
    HAS_LIGHTGBM = False
    print("⚠️ LightGBM not available, using sklearn")

try:
    from xgboost import XGBRegressor
    HAS_XGBOOST = True
    print("✅ XGBoost available")
except ImportError:
    HAS_XGBOOST = False

from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

print("\n🚀 Libraries loaded successfully!")

## 📊 Section 2: Load and Explore Dataset

### Understanding the Data Structure

The dataset represents **professional profiles** at networking events with:
- **Demographic attributes**: Age, Gender, Role, Seniority
- **Company context**: Company name, size, industry, location
- **Professional preferences**: Business Interests, Objectives, Constraints

The **target.csv** contains pairwise compatibility scores that we need to learn and predict.

In [None]:
# Configure data directory
DATA_DIR = '/kaggle/input/enigma26/Engima26_Dataset'
ALTERNATE_DIRS = ['/kaggle/input/enigma26', '/Users/likhith./Desktop/enigma', '.']

for path in [DATA_DIR] + ALTERNATE_DIRS:
    if os.path.exists(path):
        DATA_DIR = path
        break

print(f"📂 Data directory: {DATA_DIR}")

def load_data(filename):
    """Load data from xlsx or csv format"""
    xlsx_path = os.path.join(DATA_DIR, filename.replace('.csv', '.xlsx'))
    csv_path = os.path.join(DATA_DIR, filename)
    
    if os.path.exists(xlsx_path):
        return pd.read_excel(xlsx_path)
    elif os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    else:
        raise FileNotFoundError(f"Could not find {filename}")

# Load datasets
train_df = load_data('train.csv')
test_df = load_data('test.csv')
target_df = load_data('target.csv')

print(f"\n📈 Dataset Statistics:")
print(f"   Training profiles: {len(train_df)}")
print(f"   Test profiles: {len(test_df)}")
print(f"   Training pairs: {len(target_df)}")
print(f"   Test pairs to predict: {len(test_df) * len(test_df)}")

# Display sample data
print("\n📋 Sample Training Profile:")
display(train_df.head(3))

## 🔧 Section 3: Feature Parsing & Normalization

### Thought Process: Why Normalization Matters

The multi-value fields (Business_Interests, Business_Objectives, Constraints) are semicolon-separated strings. We need to:

1. **Parse them into sets** for efficient intersection/union operations
2. **Normalize text** to handle case differences ("AI" vs "ai")
3. **Create lookup dictionaries** for O(1) profile access

This preprocessing is critical for computing our multi-dimensional compatibility features efficiently.

In [None]:
def normalize_text(text):
    """Normalize text for consistent comparison"""
    if pd.isna(text) or str(text).strip().lower() in ('', 'nan', 'none'):
        return ''
    text = str(text).lower().strip()
    text = re.sub(r'\s+', ' ', text)  # Normalize whitespace
    return text

def parse_multi_value(val):
    """Parse semicolon-separated values into a frozenset"""
    if pd.isna(val) or str(val).strip().lower() in ('', 'nan', 'none'):
        return frozenset()
    items = [normalize_text(item) for item in str(val).split(';')]
    return frozenset(item for item in items if item)

# Parse multi-value columns for all profiles
print("🔄 Parsing multi-value fields...")

for df in [train_df, test_df]:
    # Parse sets
    df['BI_set'] = df['Business_Interests'].apply(parse_multi_value)
    df['BO_set'] = df['Business_Objectives'].apply(parse_multi_value)
    df['CO_set'] = df['Constraints'].apply(parse_multi_value)
    df['ALL_set'] = df.apply(lambda r: r['BI_set'] | r['BO_set'] | r['CO_set'], axis=1)
    
    # Normalize categorical columns
    df['Role_norm'] = df['Role'].apply(normalize_text)
    df['Industry_norm'] = df['Industry'].apply(normalize_text)
    df['Location_norm'] = df['Location_City'].apply(normalize_text)
    df['Seniority_norm'] = df['Seniority_Level'].apply(normalize_text)
    df['Company_norm'] = df['Company_Name'].apply(normalize_text)

# Create quick lookup dictionaries
train_lookup = {row['Profile_ID']: row for _, row in train_df.iterrows()}
test_lookup = {row['Profile_ID']: row for _, row in test_df.iterrows()}
all_lookup = {**train_lookup, **test_lookup}

print(f"✅ Parsed {len(train_lookup)} train + {len(test_lookup)} test profiles")

# Show parsed example
sample_id = list(train_lookup.keys())[0]
sample = train_lookup[sample_id]
print(f"\n📝 Sample Parsed Profile (ID: {sample_id}):")
print(f"   Business Interests: {sample['BI_set']}")
print(f"   Business Objectives: {sample['BO_set']}")
print(f"   Constraints: {sample['CO_set']}")
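
As a quick sanity check of the parsing rules above, the following standalone sketch re-implements them without pandas. `parse_field` is a simplified stand-in written for this example, not the notebook's `parse_multi_value`, and the input strings are made up. It shows that case, spacing, and empty entries collapse to the same set members.

```python
import re

def parse_field(raw):
    """Simplified stand-in: split on ';', lowercase, collapse whitespace."""
    if raw is None or str(raw).strip().lower() in ("", "nan", "none"):
        return frozenset()
    items = (
        re.sub(r"\s+", " ", part.strip().lower())
        for part in str(raw).split(";")
    )
    # Drop empty entries such as the one produced by ';;'.
    return frozenset(item for item in items if item)

messy = parse_field("AI;  Machine   Learning ;FinTech")
clean = parse_field("ai;fintech;;Machine Learning")
print(messy == clean)
print(sorted(messy))
```

Because the result is a `frozenset`, it is hashable and supports the `&` and `|` operations that the later feature functions rely on.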

## 🎯 Section 4: Domain Knowledge - Complementary Relationship Mappings

### Thought Process: The Key Insight That Drives Our Solution

This is the **CORE INNOVATION** of our approach. We manually define complementary relationships based on domain knowledge of professional networking.

**Why This Matters:**
- A person whose objective is "Hiring for current or future roles" should match HIGHLY with someone "Exploring new job opportunities"
- This is a **COMPLEMENTARY** relationship, NOT a similarity relationship
- Traditional similarity-based methods would completely miss this!

**Our Approach:**
1. Define objective complementarity mappings (hiring ↔ job seeking, mentoring ↔ learning)
2. Define role complementarity mappings (founder ↔ investor, HR ↔ candidates)
3. Use these as **explicit features** that the model can learn to weight

This makes our solution **INTERPRETABLE** - we can explain WHY two people should network.

In [None]:
# =============================================================================
# COMPLEMENTARY OBJECTIVE PAIRS
# =============================================================================
# When Person A has objective X and Person B has objective Y, this creates VALUE

COMPLEMENTARY_OBJECTIVES = {
    # Hiring ↔ Job Seeking (Employment Value Exchange)
    ('hiring for current or future roles', 'exploring new job opportunities'): 1.0,
    ('hiring for current or future roles', 'looking for internship opportunities'): 0.9,
    ('hiring for current or future roles', 'career transition planning'): 0.8,
    ('hiring for current or future roles', 'exploring freelance or contract work'): 0.8,
    
    # Mentorship ↔ Learning (Knowledge Value Exchange)
    ('mentorship and guidance', 'looking for internship opportunities'): 0.9,
    ('mentorship and guidance', 'career transition planning'): 0.8,
    ('mentorship and guidance', 'exploring new job opportunities'): 0.7,
    ('mentorship and guidance', 'learning about industry trends'): 0.7,
    
    # Founder ↔ Investor (Capital Value Exchange)
    ('seeking startup or founder connections', 'understanding investor expectations'): 0.9,
    ('understanding investor expectations', 'seeking startup or founder connections'): 0.9,
    
    # Partnership Value Exchange
    ('exploring partnerships or collaborations', 'seeking startup or founder connections'): 0.9,
    ('exploring partnerships or collaborations', 'hiring for current or future roles'): 0.7,
    
    # Visibility ↔ Networking
    ('building professional visibility', 'networking with industry peers'): 0.7,
}

# Make relationships bidirectional
for (obj1, obj2), score in list(COMPLEMENTARY_OBJECTIVES.items()):
    COMPLEMENTARY_OBJECTIVES[(obj2, obj1)] = score

# =============================================================================
# ROLE COMPLEMENTARITY MATRIX
# =============================================================================
# Some role pairs naturally benefit from connecting

ROLE_COMPLEMENTARITY = {
    # Investment relationships
    ('founder', 'investment analyst'): 0.9,
    ('co-founder', 'investment analyst'): 0.9,
    
    # Startup ecosystem
    ('founder', 'co-founder'): 0.8,  # Looking for co-founders
    ('founder', 'consultant'): 0.7,
    
    # Technical hierarchy
    ('cto', 'software engineer'): 0.7,
    ('product manager', 'software engineer'): 0.7,
    ('product manager', 'data scientist'): 0.7,
    
    # Recruiting
    ('hr manager', 'student'): 0.8,
    ('hr manager', 'software engineer'): 0.7,
    ('hr manager', 'data scientist'): 0.7,
    
    # Marketing & Content
    ('marketing manager', 'content creator'): 0.8,
    
    # Analytics
    ('business analyst', 'data scientist'): 0.7,
}

# Make bidirectional
for (r1, r2), score in list(ROLE_COMPLEMENTARITY.items()):
    ROLE_COMPLEMENTARITY[(r2, r1)] = score

print(f"✅ Defined {len(COMPLEMENTARY_OBJECTIVES)} objective complementarity relationships")
print(f"✅ Defined {len(ROLE_COMPLEMENTARITY)} role complementarity relationships")

# Show examples
print("\n💡 Example Complementary Relationships:")
print("   'Hiring' ↔ 'Job Seeking' → HIGH compatibility")
print("   'Mentorship' ↔ 'Internship' → HIGH compatibility")
print("   'Founder' ↔ 'Investor' → HIGH compatibility")
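
The mirroring loops in this cell store every relationship twice. An alternative sketch, assuming every relationship is meant to be symmetric, keys the table on an unordered pair so each relationship is stored once. The `SYMMETRIC_ROLES` table and `role_score` helper are hypothetical and reuse three of the role scores defined above.

```python
# Each key is an unordered pair, so (a, b) and (b, a) hit the same entry.
SYMMETRIC_ROLES = {
    frozenset({"founder", "investment analyst"}): 0.9,
    frozenset({"cto", "software engineer"}): 0.7,
    frozenset({"hr manager", "student"}): 0.8,
}

def role_score(role_a, role_b):
    """Order-independent lookup; unknown pairs score 0.0."""
    return SYMMETRIC_ROLES.get(frozenset({role_a, role_b}), 0.0)

print(role_score("founder", "investment analyst"))
print(role_score("investment analyst", "founder"))
print(role_score("founder", "founder"))
```

The trade-off is that this form cannot express asymmetric scores, which the tuple-keyed dictionaries could support if the mirroring step were dropped.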

## 📐 Section 5: Mathematical Foundations

### Our Compatibility Scoring Framework

We combine multiple mathematical concepts:

1. **Jaccard Similarity** - For set overlap (interests, objectives)
   $$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

2. **Asymmetric Value Score** - What can A offer B?
   $$V(A \rightarrow B) = \sum_i w_i \cdot \text{match}_i(A.\text{offerings}, B.\text{needs})$$

3. **Mutual Compatibility** - Bidirectional value exchange
   $$M(A,B) = \alpha \cdot V(A \rightarrow B) + \beta \cdot V(B \rightarrow A) + \gamma \cdot \text{sim}(A,B)$$

4. **Constraint Satisfaction** - Penalty for violations
   $$C(A,B) = 1 - \max(\text{penalty}(A.\text{constraints}, B), \text{penalty}(B.\text{constraints}, A))$$
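
A worked example may help fix the notation. All numbers here are assumed for illustration: interests $A = \{\text{ai}, \text{fintech}, \text{saas}\}$ and $B = \{\text{fintech}, \text{saas}, \text{healthtech}, \text{edtech}\}$, value scores $V(A \rightarrow B) = 1.0$ and $V(B \rightarrow A) = 0.5$, weights $\alpha = \beta = 0.4$ and $\gamma = 0.2$ with the Jaccard score standing in for $\text{sim}(A,B)$, and constraint penalties of $0$ and $0.3$.

$$J(A, B) = \frac{|\{\text{fintech}, \text{saas}\}|}{|\{\text{ai}, \text{fintech}, \text{saas}, \text{healthtech}, \text{edtech}\}|} = \frac{2}{5} = 0.4$$

$$M(A,B) = 0.4 \cdot 1.0 + 0.4 \cdot 0.5 + 0.2 \cdot 0.4 = 0.68$$

$$C(A,B) = 1 - \max(0, 0.3) = 0.7$$

With these weights the two directional value terms account for 0.6 of the 0.68, so the exchange dominates the score and shared interests contribute comparatively little.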

In [None]:
def jaccard_similarity(set1, set2):
    """Standard Jaccard similarity: |A∩B| / |A∪B| (0.0 when both sets are empty)"""
    union = set1 | set2
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)

def asymmetric_containment(set1, set2):
    """What fraction of set1 is contained in set2?
    Useful for: Does person B's expertise cover person A's needs?
    """
    if not set1:
        return 0.0
    return len(set1 & set2) / len(set1)

def compute_objective_complementarity(profile1, profile2):
    """
    Compute how well the objectives COMPLEMENT each other.
    
    This is NOT about similarity - it's about VALUE EXCHANGE.
    Someone 'hiring' matches well with someone 'job seeking'.
    """
    obj1 = profile1['BO_set']
    obj2 = profile2['BO_set']
    
    if not obj1 or not obj2:
        return 0.0
    
    max_score = 0.0
    total_score = 0.0
    count = 0
    
    for o1 in obj1:
        for o2 in obj2:
            pair_score = COMPLEMENTARY_OBJECTIVES.get((o1, o2), 0.0)
            if pair_score > 0:
                total_score += pair_score
                max_score = max(max_score, pair_score)
                count += 1
    
    # Return weighted combination of max and mean
    if count > 0:
        return 0.6 * max_score + 0.4 * (total_score / count)
    return 0.0

def compute_role_complementarity(profile1, profile2):
    """Compute role-based synergy score"""
    role1 = profile1['Role_norm']
    role2 = profile2['Role_norm']
    
    if not role1 or not role2:
        return 0.0
    
    return ROLE_COMPLEMENTARITY.get((role1, role2), 0.0)

print("‚úÖ Core mathematical functions defined")
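
To see what the 0.6/0.4 blend of max and mean does in `compute_objective_complementarity`, here is a compact standalone sketch of the same scoring rule applied to plain sets. It copies three entries from `COMPLEMENTARY_OBJECTIVES`; the `PAIRS` table and the `blend_score` helper are names assumed for this example.

```python
# Three entries copied from COMPLEMENTARY_OBJECTIVES (not mirrored here).
PAIRS = {
    ("hiring for current or future roles", "exploring new job opportunities"): 1.0,
    ("hiring for current or future roles", "career transition planning"): 0.8,
    ("mentorship and guidance", "career transition planning"): 0.8,
}

def blend_score(objectives_a, objectives_b, table=PAIRS):
    """0.6 * best matching pair + 0.4 * mean over all matching pairs."""
    hits = [
        table[(a, b)]
        for a in objectives_a
        for b in objectives_b
        if (a, b) in table
    ]
    if not hits:
        return 0.0
    return 0.6 * max(hits) + 0.4 * (sum(hits) / len(hits))

recruiter = {"hiring for current or future roles"}
candidate = {"exploring new job opportunities", "career transition planning"}

forward = blend_score(recruiter, candidate)   # pairs found: 1.0 and 0.8
reverse = blend_score(candidate, recruiter)   # no pairs in this direction
print(round(forward, 2), reverse)
```

The forward score is 0.6 × 1.0 + 0.4 × 0.9 = 0.96: the best pair carries most of the weight, and the mean tempers it when weaker pairs are also present. The reverse order scores 0.0 only because this small table is not mirrored, which is the case the notebook's bidirectional loop guards against.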

## 🛡️ Section 6: Constraint-Aware Matching Module

### Thought Process: Why Constraints Are Critical

Most networking recommendation systems ignore **what people DON'T want**. This is a mistake.

**Real-world scenario:**
- Person A: Software Engineer with constraint "Only interested in technical discussions"
- Person B: Sales Executive looking to network

A naive similarity system might match them based on shared company size or location. But Person A has **explicitly stated** that they only want technical discussions, so a match with a non-technical contact should be penalized.

**Our Innovation:**
- Parse constraints as NEGATIVE signals
- Create a **constraint violation penalty** that reduces compatibility
- This makes our system respect user preferences and creates better matches

This is **enterprise-critical** - users will trust a system that respects their boundaries.
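
A minimal sketch of how such a penalty could be computed, following the constraint-satisfaction formula from Section 5. Everything here is hypothetical: the `TECHNICAL_ROLES` grouping, the rule table, the 0.8 penalty weight, and the `penalty` and `constraint_satisfaction` helpers are assumptions for illustration, and the notebook's actual constraint handling may differ.

```python
# Assumed grouping of roles that count as "technical".
TECHNICAL_ROLES = {"software engineer", "data scientist", "cto"}

def violates_technical_only(other):
    """True if the other profile's role is outside the technical group."""
    return other["role"] not in TECHNICAL_ROLES

# Hypothetical rules: normalized constraint text -> (violation check, penalty).
CONSTRAINT_RULES = {
    "only interested in technical discussions": (violates_technical_only, 0.8),
}

def penalty(constraints, other):
    """Largest penalty among the constraints that `other` violates."""
    worst = 0.0
    for text in constraints:
        rule = CONSTRAINT_RULES.get(text)
        if rule is None:
            continue  # Unknown constraint text: no penalty in this sketch.
        check, weight = rule
        if check(other):
            worst = max(worst, weight)
    return worst

def constraint_satisfaction(a, b):
    """C(A,B) = 1 - max(penalty of A's constraints on B, and of B's on A)."""
    return 1.0 - max(penalty(a["constraints"], b), penalty(b["constraints"], a))

engineer = {"role": "software engineer",
            "constraints": {"only interested in technical discussions"}}
sales = {"role": "sales executive", "constraints": set()}
scientist = {"role": "data scientist", "constraints": set()}

with_sales = constraint_satisfaction(engineer, sales)
with_scientist = constraint_satisfaction(engineer, scientist)
print(round(with_sales, 2), with_scientist)
```

Taking the maximum of the two directions means one person's violated constraint is enough to lower the score for the pair, regardless of how the other person feels about the match.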