# 🏢 Interactive Employee Explorer

**Explore employee populations with interactive widgets**

This notebook provides an interactive way to:
- Generate employee populations with different parameters
- Filter and search for specific employee profiles
- Visualize population distributions
- Find cases like "Level 5, £80,692.50, Exceeding performance"

---

In [2]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, clear_output
import warnings

warnings.filterwarnings("ignore")

# Set up plotting
plt.style.use("default")
sns.set_palette("husl")
%matplotlib inline

print("📦 Libraries imported successfully!")
print("🎯 Ready to explore employee populations")

📦 Libraries imported successfully!
🎯 Ready to explore employee populations


In [3]:
# Import the employee simulation system
try:
    from employee_simulation_orchestrator import EmployeeSimulationOrchestrator

    print("✅ Employee Simulation System loaded successfully!")
except ImportError as e:
    print(f"❌ Could not import employee simulation system: {e}")
    print("Make sure you're running from the correct directory")

✅ Employee Simulation System loaded successfully!


## 🎛️ Interactive Population Generator

Use the controls below to generate different employee populations:

In [4]:
class InteractiveEmployeeExplorer:
    def __init__(self):
        self.current_data = None
        self.tracked_stories = {}
        self.setup_widgets()

    def setup_widgets(self):
        """Setup interactive widgets"""

        # Population generation controls
        self.population_size = widgets.IntSlider(
            value=200, min=100, max=5000, step=100, description="Population:", style={"description_width": "initial"}
        )

        self.random_seed = widgets.IntSlider(
            value=42, min=1, max=999, step=1, description="Random Seed:", style={"description_width": "initial"}
        )

        # Level distribution controls
        self.enable_level_skew = widgets.Checkbox(
            value=False, description="Enable Level Distribution Skewing", style={"description_width": "initial"}
        )

        self.level3_percentage = widgets.FloatSlider(
            value=20.0,
            min=5.0,
            max=50.0,
            step=1.0,
            description="Level 3 %:",
            style={"description_width": "initial"},
            disabled=True,
        )

        # Gender pay gap controls
        self.enable_gender_gap = widgets.Checkbox(
            value=False, description="Enable Gender Pay Gap Simulation", style={"description_width": "initial"}
        )

        self.gender_gap_percent = widgets.FloatSlider(
            value=15.8,
            min=0.0,
            max=30.0,
            step=0.1,
            description="Pay Gap %:",
            style={"description_width": "initial"},
            disabled=True,
        )

        # Setup widget interactions
        self.enable_level_skew.observe(self.on_level_skew_change, names="value")
        self.enable_gender_gap.observe(self.on_gender_gap_change, names="value")

        self.generate_btn = widgets.Button(
            description="🔄 Generate Population", button_style="success", layout=widgets.Layout(width="200px")
        )
        self.generate_btn.on_click(self.generate_population)

        # Search/filter controls
        self.target_level = widgets.IntSlider(
            value=5, min=1, max=6, step=1, description="Target Level:", style={"description_width": "initial"}
        )

        self.target_salary = widgets.FloatText(
            value=80692.50, description="Target Salary (£):", style={"description_width": "initial"}
        )

        self.salary_tolerance = widgets.FloatSlider(
            value=5000,
            min=1000,
            max=20000,
            step=1000,
            description="Salary Tolerance (£):",
            style={"description_width": "initial"},
        )

        self.performance_filter = widgets.SelectMultiple(
            options=["Not met", "Partially met", "Achieving", "High Performing", "Exceeding"],
            value=["Exceeding", "High Performing"],
            description="Performance:",
            style={"description_width": "initial"},
        )

        self.gender_filter = widgets.SelectMultiple(
            options=["Male", "Female"],
            value=["Male", "Female"],
            description="Gender:",
            style={"description_width": "initial"},
        )

        self.search_btn = widgets.Button(
            description="🔍 Search Employees", button_style="info", layout=widgets.Layout(width="200px")
        )
        self.search_btn.on_click(self.search_employees)

        # Output areas
        self.output_area = widgets.Output()

    def on_level_skew_change(self, change):
        """Enable/disable level distribution controls"""
        self.level3_percentage.disabled = not change["new"]

    def on_gender_gap_change(self, change):
        """Enable/disable gender pay gap controls"""
        self.gender_gap_percent.disabled = not change["new"]

    def generate_population(self, btn):
        """Generate new employee population"""
        with self.output_area:
            clear_output(wait=True)
            print("🔄 Generating employee population...")

            # Build level distribution if enabled
            level_distribution = None
            if self.enable_level_skew.value:
                level3_pct = self.level3_percentage.value / 100.0
                # Distribute the remaining share among the other levels in the
                # ratio 25:25:15:10:5, so that all six shares sum to exactly 1.0
                remaining = 1.0 - level3_pct
                level_distribution = [
                    remaining * 0.3125,  # L1: 25/80 of remaining
                    remaining * 0.3125,  # L2: 25/80 of remaining
                    level3_pct,  # L3: user-specified share
                    remaining * 0.1875,  # L4: 15/80 of remaining
                    remaining * 0.1250,  # L5: 10/80 of remaining
                    remaining * 0.0625,  # L6: 5/80 of remaining
                ]
                print(f"📊 Custom Level Distribution: L3={level3_pct:.1%}, Others distributed proportionally")

            # Build gender pay gap if enabled
            gender_pay_gap_percent = None
            if self.enable_gender_gap.value:
                gender_pay_gap_percent = self.gender_gap_percent.value
                print(f"⚖️ Gender Pay Gap: {gender_pay_gap_percent:.1f}%")

            config = {
                "population_size": self.population_size.value,
                "random_seed": self.random_seed.value,
                "max_cycles": 2,  # Shorter for faster generation
                "level_distribution": level_distribution,
                "gender_pay_gap_percent": gender_pay_gap_percent,
                # Enable story tracking
                "enable_story_tracking": True,
                "tracked_employee_count": 15,
                "export_story_data": False,
                # Disable file outputs for notebook use
                "generate_interactive_dashboard": False,
                "create_individual_story_charts": False,
                "export_formats": [],
                "story_export_formats": [],
                "generate_visualizations": False,
                "export_individual_files": False,
                "export_comprehensive_report": False,
                "generate_summary_report": False,
                "log_level": "ERROR",  # Minimal logging
                "enable_progress_bar": False,
            }

            try:
                orchestrator = EmployeeSimulationOrchestrator(config=config)
                raw_results = orchestrator.run_with_story_tracking()

                # Handle results safely
                if isinstance(raw_results, tuple):
                    results = raw_results[0] if raw_results else {}
                else:
                    results = raw_results or {}

                # Get population data
                population_data = results.get("population_data", [])

                if not population_data:
                    # Try to load from files
                    files = results.get("files_generated", {})
                    pop_file = files.get("population")
                    if pop_file:
                        import json
                        from pathlib import Path

                        if Path(pop_file).exists():
                            with open(pop_file, "r") as f:
                                population_data = json.load(f)

                if population_data:
                    self.current_data = pd.DataFrame(population_data)
                    self.tracked_stories = results.get("employee_stories", {})

                    print(f"✅ Generated {len(self.current_data)} employees successfully!")
                    print(
                        f"📚 Tracked {sum(len(stories) for stories in self.tracked_stories.values())} interesting employees"
                    )

                    # Show quick overview
                    self.show_population_overview()

                else:
                    print("❌ No population data generated")

            except Exception as e:
                print(f"❌ Generation failed: {e}")
                import traceback

                traceback.print_exc()

    def show_population_overview(self):
        """Show a quick overview of the generated population"""
        if self.current_data is None:
            return

        df = self.current_data

        print("\n📊 POPULATION OVERVIEW")
        print("-" * 30)
        print(f"Total Employees: {len(df):,}")
        print(f"Salary Range: £{df['salary'].min():,.0f} - £{df['salary'].max():,.0f}")
        print(f"Average Salary: £{df['salary'].mean():,.0f}")

        print("\nLevel Distribution:")
        level_dist = df["level"].value_counts().sort_index()
        for level, count in level_dist.items():
            pct = (count / len(df)) * 100
            print(f"  Level {level}: {count} ({pct:.1f}%)")

        print("\nPerformance Distribution:")
        perf_dist = df["performance_rating"].value_counts()
        for perf, count in perf_dist.items():
            pct = (count / len(df)) * 100
            print(f"  {perf}: {count} ({pct:.1f}%)")

        # Show gender pay gap if applicable
        gender_stats = df.groupby("gender")["salary"].median()
        if len(gender_stats) >= 2:
            male_median = gender_stats.get("Male", 0)
            female_median = gender_stats.get("Female", 0)
            if male_median > 0 and female_median > 0:
                gap_pct = ((male_median - female_median) / male_median) * 100
                print("\nGender Pay Analysis:")
                print(f"  Male median: £{male_median:,.0f}")
                print(f"  Female median: £{female_median:,.0f}")
                print(f"  Pay gap: {gap_pct:.1f}%")

    def search_employees(self, btn):
        """Search for employees matching criteria"""
        with self.output_area:
            clear_output(wait=True)

            if self.current_data is None:
                print("❌ No population data available. Generate a population first!")
                return

            print("🔍 Searching for employees matching your criteria...")
            print(
                f"Target: Level {self.target_level.value}, £{self.target_salary.value:,.2f} (±£{self.salary_tolerance.value:,.0f})"
            )
            print(f"Performance: {list(self.performance_filter.value)}")
            print(f"Gender: {list(self.gender_filter.value)}")
            print()

            # Apply filters
            filtered_df = self.current_data[
                (self.current_data["level"] == self.target_level.value)
                & (abs(self.current_data["salary"] - self.target_salary.value) <= self.salary_tolerance.value)
                & (self.current_data["performance_rating"].isin(self.performance_filter.value))
                & (self.current_data["gender"].isin(self.gender_filter.value))
            ]

            if len(filtered_df) == 0:
                print("❌ No employees match your exact criteria")
                print("💡 Try adjusting the salary tolerance or performance filters")

                # Show closest matches
                level_employees = self.current_data[self.current_data["level"] == self.target_level.value]
                if len(level_employees) > 0:
                    level_employees = level_employees.copy()
                    level_employees["salary_diff"] = abs(level_employees["salary"] - self.target_salary.value)
                    closest = level_employees.nsmallest(5, "salary_diff")

                    print(f"\n🎯 Closest Level {self.target_level.value} employees:")
                    for i, (_, emp) in enumerate(closest.iterrows(), 1):
                        is_tracked = self.is_employee_tracked(emp["employee_id"])
                        tracked_indicator = f" 📚 {is_tracked}" if is_tracked else ""
                        print(
                            f"  {i}. Employee {emp['employee_id']}: £{emp['salary']:,.0f}, {emp['performance_rating']}, {emp['gender']}{tracked_indicator}"
                        )
                return

            # Sort by closest salary match
            filtered_df = filtered_df.copy()
            filtered_df["salary_diff"] = abs(filtered_df["salary"] - self.target_salary.value)
            filtered_df = filtered_df.sort_values("salary_diff")

            print(f"✅ Found {len(filtered_df)} employees matching your criteria!")
            print()

            # Show results
            for i, (_, emp) in enumerate(filtered_df.iterrows(), 1):
                diff = emp["salary_diff"]
                is_tracked = self.is_employee_tracked(emp["employee_id"])
                tracked_indicator = f" 📚 TRACKED: {is_tracked}" if is_tracked else ""

                print(f"🎯 Match {i}: Employee {emp['employee_id']}")
                print(f"   Level: {emp['level']}")
                print(f"   Salary: £{emp['salary']:,.2f} (±£{diff:.0f} from target)")
                print(f"   Performance: {emp['performance_rating']}")
                print(f"   Gender: {emp['gender']}{tracked_indicator}")
                print()

            # Create visualizations
            self.create_search_visualizations(filtered_df)

    def is_employee_tracked(self, employee_id):
        """Return the tracking category for an employee, or None if untracked"""
        for category, stories in self.tracked_stories.items():
            for story in stories:
                # Stories may be plain dicts or objects with an employee_id attribute
                if isinstance(story, dict):
                    story_emp_id = story.get("employee_id")
                else:
                    story_emp_id = getattr(story, "employee_id", None)
                if story_emp_id == employee_id:
                    return category.replace("_", " ").title()
        return None

    def create_search_visualizations(self, matches_df):
        """Create visualizations for search results"""
        if self.current_data is None or len(matches_df) == 0:
            return

        print("📊 SEARCH RESULTS VISUALIZATIONS")
        print("-" * 40)

        # Create interactive plot with plotly
        fig = make_subplots(
            rows=2,
            cols=2,
            subplot_titles=[
                "Salary Distribution (Your Matches vs Population)",
                "Performance vs Salary (Matches Highlighted)",
                "Gender Distribution",
                "Matches in Context",
            ],
            specs=[[{"secondary_y": False}, {"secondary_y": False}], [{"type": "pie"}, {"secondary_y": False}]],
        )

        # 1. Salary distribution comparison
        level_data = self.current_data[self.current_data["level"] == self.target_level.value]

        fig.add_trace(
            go.Histogram(x=level_data["salary"], name=f"All Level {self.target_level.value}", opacity=0.7, nbinsx=20),
            row=1,
            col=1,
        )

        fig.add_trace(go.Histogram(x=matches_df["salary"], name="Your Matches", opacity=0.8, nbinsx=10), row=1, col=1)

        # 2. Performance vs Salary scatter
        perf_mapping = {"Not met": 1, "Partially met": 2, "Achieving": 3, "High Performing": 4, "Exceeding": 5}

        # Population background
        fig.add_trace(
            go.Scatter(
                x=[perf_mapping.get(p, 3) for p in level_data["performance_rating"]],
                y=level_data["salary"],
                mode="markers",
                name=f"All Level {self.target_level.value}",
                opacity=0.4,
                marker=dict(size=5, color="lightgray"),
            ),
            row=1,
            col=2,
        )

        # Matches highlighted
        fig.add_trace(
            go.Scatter(
                x=[perf_mapping.get(p, 3) for p in matches_df["performance_rating"]],
                y=matches_df["salary"],
                mode="markers",
                name="Your Matches",
                marker=dict(size=12, color="red", symbol="star"),
            ),
            row=1,
            col=2,
        )

        # 3. Gender pie chart
        gender_counts = matches_df["gender"].value_counts()
        fig.add_trace(go.Pie(labels=gender_counts.index, values=gender_counts.values, name="Gender"), row=2, col=1)

        # 4. Context view - where matches fit in overall population
        fig.add_trace(
            go.Scatter(
                x=self.current_data["level"],
                y=self.current_data["salary"],
                mode="markers",
                name="Full Population",
                opacity=0.3,
                marker=dict(size=4, color="lightblue"),
            ),
            row=2,
            col=2,
        )

        fig.add_trace(
            go.Scatter(
                x=matches_df["level"],
                y=matches_df["salary"],
                mode="markers",
                name="Your Matches",
                marker=dict(size=15, color="red", symbol="diamond"),
            ),
            row=2,
            col=2,
        )

        # Update layout (barmode="overlay" draws the two histograms on top of each other)
        fig.update_layout(
            height=800,
            title_text=f"Employee Search Results Analysis - {len(matches_df)} Matches Found",
            showlegend=True,
            barmode="overlay",
        )

        # Update x-axes
        fig.update_xaxes(title_text="Salary (£)", row=1, col=1)
        fig.update_xaxes(
            title_text="Performance Rating",
            tickvals=list(perf_mapping.values()),
            ticktext=list(perf_mapping.keys()),
            row=1,
            col=2,
        )
        fig.update_xaxes(title_text="Level", row=2, col=2)

        # Update y-axes
        fig.update_yaxes(title_text="Count", row=1, col=1)
        fig.update_yaxes(title_text="Salary (£)", row=1, col=2)
        fig.update_yaxes(title_text="Salary (£)", row=2, col=2)

        fig.show()

    def display_interface(self):
        """Display the complete interface"""

        # Population generation section
        gen_box = widgets.VBox(
            [
                widgets.HTML("<h3>🎛️ Population Generation</h3>"),
                widgets.HBox([self.population_size, self.random_seed]),
                widgets.HTML("<h4>🎚️ Advanced Options</h4>"),
                widgets.VBox(
                    [
                        self.enable_level_skew,
                        widgets.HBox([self.level3_percentage]),
                        self.enable_gender_gap,
                        widgets.HBox([self.gender_gap_percent]),
                    ]
                ),
                self.generate_btn,
            ]
        )

        # Search section
        search_box = widgets.VBox(
            [
                widgets.HTML("<h3>🔍 Employee Search</h3>"),
                widgets.HBox([self.target_level, self.target_salary]),
                widgets.HBox([self.salary_tolerance]),
                widgets.HBox([self.performance_filter, self.gender_filter]),
                self.search_btn,
            ]
        )

        # Main interface
        main_interface = widgets.VBox(
            [
                widgets.HTML(
                    "<h1>🏢 Interactive Employee Explorer</h1>"
                    "<p>Generate employee populations and search for specific profiles interactively!</p>"
                    "<p><strong>New Features:</strong></p>"
                    "<ul>"
                    "<li>📊 <strong>Level Distribution Skewing</strong>: Customize the percentage of Level 3 employees</li>"
                    "<li>⚖️ <strong>Gender Pay Gap Simulation</strong>: Apply realistic pay gaps (2024 UK average: 15.8%)</li>"
                    "</ul>"
                ),
                gen_box,
                search_box,
                self.output_area,
            ]
        )

        display(main_interface)


# Create and display the explorer
explorer = InteractiveEmployeeExplorer()
explorer.display_interface()

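The level-skew option fixes the Level 3 share and spreads the remainder over the other five levels. The comments in `generate_population` point to base shares of roughly 25/25/15/10/5 for Levels 1, 2, 4, 5 and 6. The sketch below assumes those base weights and rescales them so the six shares always sum to 1. `build_level_distribution` is a hypothetical helper written for illustration; it is not part of the simulation package.

```python
def build_level_distribution(level3_share, base_weights=(25, 25, 15, 10, 5)):
    """Return shares for Levels 1-6 given a fixed Level 3 share.

    base_weights are the assumed relative sizes of Levels 1, 2, 4, 5 and 6.
    """
    if not 0.0 <= level3_share <= 1.0:
        raise ValueError("level3_share must be between 0 and 1")

    remaining = 1.0 - level3_share
    total_weight = sum(base_weights)
    # Rescale the base weights so they consume exactly the remaining share
    others = [remaining * weight / total_weight for weight in base_weights]

    # Order: L1, L2, L3, L4, L5, L6
    return others[:2] + [level3_share] + others[2:]


distribution = build_level_distribution(0.20)
print([round(share, 4) for share in distribution])
print(round(sum(distribution), 10))
```

Dividing by the sum of the weights, rather than hard-coding rounded fractions, keeps the result a valid distribution for any Level 3 share.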

## 📊 Population Analysis Dashboard

After generating a population above, run the cell below and click the button to see a comprehensive population analysis:

In [None]:
def create_population_dashboard(df, tracked_stories=None):
    """Create comprehensive population analysis dashboard"""

    if df is None or len(df) == 0:
        print("❌ No population data available. Generate a population first!")
        return

    print(f"📊 POPULATION DASHBOARD - {len(df):,} Employees")
    print("=" * 60)

    # Create comprehensive visualizations
    fig = make_subplots(
        rows=3,
        cols=2,
        subplot_titles=[
            "Salary Distribution by Level",
            "Performance Rating Distribution",
            "Gender Distribution by Level",
            "Salary vs Performance Correlation",
            "Employee Count by Level",
            "Tracked Employees Overview",
        ],
        specs=[
            [{"secondary_y": False}, {"type": "pie"}],
            [{"secondary_y": False}, {"secondary_y": False}],
            [{"secondary_y": False}, {"type": "pie"}],
        ],
    )

    # 1. Salary by Level (Box Plot)
    for level in sorted(df["level"].unique()):
        level_data = df[df["level"] == level]
        fig.add_trace(go.Box(y=level_data["salary"], name=f"Level {level}"), row=1, col=1)

    # 2. Performance Distribution (Pie)
    perf_counts = df["performance_rating"].value_counts()
    fig.add_trace(go.Pie(labels=perf_counts.index, values=perf_counts.values, name="Performance"), row=1, col=2)

    # 3. Gender by Level (Stacked Bar)
    gender_level = pd.crosstab(df["level"], df["gender"])
    for gender in gender_level.columns:
        fig.add_trace(go.Bar(x=gender_level.index, y=gender_level[gender], name=gender), row=2, col=1)

    # 4. Salary vs Performance Scatter
    perf_mapping = {"Not met": 1, "Partially met": 2, "Achieving": 3, "High Performing": 4, "Exceeding": 5}

    fig.add_trace(
        go.Scatter(
            x=[perf_mapping.get(p, 3) for p in df["performance_rating"]],
            y=df["salary"],
            mode="markers",
            marker=dict(size=6, color=df["level"], colorscale="Viridis", showscale=True, colorbar=dict(title="Level")),
            name="Employees",
        ),
        row=2,
        col=2,
    )

    # 5. Level Distribution
    level_counts = df["level"].value_counts().sort_index()
    fig.add_trace(go.Bar(x=level_counts.index, y=level_counts.values, name="Employees by Level"), row=3, col=1)

    # 6. Tracked Employees (if available)
    if tracked_stories:
        tracked_counts = {
            category.replace("_", " ").title(): len(stories) for category, stories in tracked_stories.items() if stories
        }

        if tracked_counts:
            fig.add_trace(
                go.Pie(
                    labels=list(tracked_counts.keys()), values=list(tracked_counts.values()), name="Tracked Stories"
                ),
                row=3,
                col=2,
            )

    # Update layout (barmode="stack" stacks the gender-by-level bars)
    fig.update_layout(
        height=1200,
        title_text=f"Comprehensive Employee Population Analysis ({len(df):,} Employees)",
        showlegend=True,
        barmode="stack",
    )

    # Update axis labels
    fig.update_yaxes(title_text="Salary (£)", row=1, col=1)
    fig.update_xaxes(title_text="Level", row=2, col=1)
    fig.update_yaxes(title_text="Number of Employees", row=2, col=1)
    fig.update_xaxes(
        title_text="Performance Rating",
        tickvals=list(perf_mapping.values()),
        ticktext=list(perf_mapping.keys()),
        row=2,
        col=2,
    )
    fig.update_yaxes(title_text="Salary (£)", row=2, col=2)
    fig.update_xaxes(title_text="Level", row=3, col=1)
    fig.update_yaxes(title_text="Number of Employees", row=3, col=1)

    fig.show()

    # Print key insights
    print("\n🔍 KEY INSIGHTS:")
    print("-" * 20)

    # Salary insights
    salary_by_level = df.groupby("level")["salary"].mean()
    highest_level = salary_by_level.idxmax()
    print(f"💰 Highest average salary: Level {highest_level} (£{salary_by_level.max():,.0f})")

    # Performance insights
    exceeding_count = len(df[df["performance_rating"] == "Exceeding"])
    exceeding_pct = (exceeding_count / len(df)) * 100
    print(f"🏆 Exceeding performers: {exceeding_count} employees ({exceeding_pct:.1f}%)")

    # Gender insights
    gender_salary = df.groupby("gender")["salary"].median()
    if len(gender_salary) >= 2:
        gap = abs(gender_salary.iloc[0] - gender_salary.iloc[1])
        gap_pct = (gap / gender_salary.max()) * 100
        print(f"⚖️ Median gender pay gap: £{gap:.0f} ({gap_pct:.1f}%)")

    # Tracking insights
    if tracked_stories:
        total_tracked = sum(len(stories) for stories in tracked_stories.values())
        tracked_pct = (total_tracked / len(df)) * 100
        print(f"📚 Tracked interesting cases: {total_tracked} employees ({tracked_pct:.1f}%)")


# Button to run dashboard analysis
dashboard_btn = widgets.Button(
    description="📊 Generate Dashboard", button_style="info", layout=widgets.Layout(width="200px")
)
# Route callback output to an Output widget so prints and figures render below the button
dashboard_output = widgets.Output()


def run_dashboard(btn):
    with dashboard_output:
        clear_output(wait=True)
        if getattr(explorer, "current_data", None) is not None:
            create_population_dashboard(explorer.current_data, explorer.tracked_stories)
        else:
            print("❌ No population data available. Generate a population first using the controls above!")


dashboard_btn.on_click(run_dashboard)
display(dashboard_btn, dashboard_output)
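
The population overview reports the gender pay gap as the difference between the male and female median salaries, expressed as a percentage of the male median. A minimal sketch of that calculation on a toy DataFrame follows. It assumes the `gender` and `salary` columns used throughout this notebook; `median_pay_gap_percent` is a hypothetical helper name, and the sample salaries are made up for illustration.

```python
import pandas as pd


def median_pay_gap_percent(df, reference="Male", comparison="Female"):
    """Median pay gap as a percentage of the reference group's median.

    A positive value means the reference group's median is higher.
    Returns None when either group is missing or the reference median is not positive.
    """
    medians = df.groupby("gender")["salary"].median()
    if reference not in medians.index or comparison not in medians.index:
        return None
    reference_median = medians[reference]
    if reference_median <= 0:
        return None
    return (reference_median - medians[comparison]) / reference_median * 100


# Illustrative data only
sample = pd.DataFrame(
    {
        "gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
        "salary": [50000, 60000, 70000, 45000, 51000, 60000],
    }
)

gap = median_pay_gap_percent(sample)
print(f"Median pay gap: {gap:.1f}%")
```

Keeping the sign of the difference shows which group the gap favours, which an absolute difference would hide.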

## 💡 Quick Tips

**To find your specific case (Level 5, £80,692.50, Exceeding performance):**

1. **Generate Population**: Use 1000+ employees with different random seeds (42, 123, 789)
2. **Set Filters**: 
   - Target Level: 5
   - Target Salary: 80692.50
   - Salary Tolerance: 5000 (adjust as needed)
   - Performance: Select 'Exceeding' and/or 'High Performing'
3. **Search**: Click 'Search Employees' to find matches
4. **Analyze**: Use the dashboard to understand population patterns
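
The filter steps above can also be run without the widgets. The sketch below applies the same four conditions to a toy DataFrame and sorts the matches by distance from the target salary. It assumes the column names used in this notebook (`employee_id`, `level`, `salary`, `performance_rating`, `gender`); `find_matches` is a hypothetical helper and the five sample rows are made up for illustration.

```python
import pandas as pd

# Toy population using the column names the notebook's search relies on
population = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5],
        "level": [5, 5, 5, 4, 5],
        "salary": [80692.50, 78000.00, 91000.00, 80000.00, 84000.00],
        "performance_rating": ["Exceeding", "Achieving", "Exceeding", "Exceeding", "High Performing"],
        "gender": ["Female", "Male", "Male", "Female", "Male"],
    }
)


def find_matches(df, level, target_salary, tolerance, ratings, genders):
    """Return rows that satisfy every filter, closest salary first."""
    mask = (
        (df["level"] == level)
        & ((df["salary"] - target_salary).abs() <= tolerance)
        & df["performance_rating"].isin(ratings)
        & df["gender"].isin(genders)
    )
    matches = df[mask].copy()
    matches["salary_diff"] = (matches["salary"] - target_salary).abs()
    return matches.sort_values("salary_diff")


matches = find_matches(
    population,
    level=5,
    target_salary=80692.50,
    tolerance=5000,
    ratings=["Exceeding", "High Performing"],
    genders=["Male", "Female"],
)
print(matches[["employee_id", "salary", "performance_rating", "salary_diff"]])
```

In this toy data, employees 1 and 5 pass all four filters. Employee 2 fails the performance filter, employee 3 falls outside the salary tolerance, and employee 4 is at the wrong level.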

**Features:**
- 🔄 **Interactive Generation**: Try different population sizes and seeds
- 🔍 **Smart Search**: Filter by level, salary range, performance, gender
- 📊 **Visual Analytics**: Interactive plots with Plotly
- 📚 **Story Tracking**: Shows which employees have interesting career patterns
- 🎯 **Targeted Analysis**: Focuses on employees matching your criteria

---

*This interactive notebook provides a clean, visual way to explore employee populations without generating any files - everything runs in-memory for immediate analysis.*