# ⚽ FIFA Analytics Dashboard - Complete Setup & Usage Guide

## 🚀 Professional Sports Analytics with Python

Welcome to the comprehensive guide for the **FIFA Analytics Dashboard** - a modern, machine learning-powered analytics platform for football data analysis. This notebook provides step-by-step instructions, commands, and best practices for setting up and using the dashboard effectively.

### 📋 What You'll Learn

- **Environment Setup**: Create and manage Python virtual environments
- **Dependency Management**: Install and troubleshoot required packages
- **Data Validation**: Ensure your FIFA data is properly structured
- **Application Commands**: Run multiple Streamlit dashboards simultaneously
- **Performance Optimization**: Handle large datasets efficiently
- **Troubleshooting**: Solve common setup and runtime issues
- **Deployment**: Prepare for production environments

### 🎯 Prerequisites

- Python 3.9 or higher (the pinned pandas 2.1.3 and numpy 1.25.2 used later in this guide require at least 3.9)
- Basic command line knowledge
- FIFA player dataset (CSV format)
- 4GB+ RAM (for large datasets)

---

**Let's build a world-class football analytics platform together!** 🏆

# 🔧 Section 1: Environment Setup and Virtual Environment Creation

## Understanding Virtual Environments

Virtual environments are isolated Python environments that allow you to manage dependencies for different projects separately. This prevents conflicts between package versions and ensures reproducible deployments.

### Why Use Virtual Environments?
- **Isolation**: Keep project dependencies separate
- **Reproducibility**: Ensure consistent environments across different machines
- **Version Control**: Manage different package versions for different projects
- **Clean System**: Avoid cluttering your global Python installation
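
The same isolation can also be set up from Python itself through the standard-library `venv` module. The sketch below is only an illustration: the `create_project_venv` helper and the `demo_env` directory name are hypothetical, and `with_pip=False` is chosen purely to keep the demo fast (the command-line workflow in the next section installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_project_venv(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # venv.create is the programmatic counterpart of `python -m venv <env_dir>`
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


# Build a throwaway environment inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = create_project_venv(Path(tmp) / "demo_env")
    venv_created = cfg_path.exists()
    print(f"pyvenv.cfg present: {venv_created}")
```

The presence of `pyvenv.cfg` is what marks a directory as a virtual environment, which is why the helper returns that path.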

## Creating Virtual Environments

### Method 1: Using Python venv (Recommended)

In [None]:
# Windows Environment Setup Commands
# Run these in Command Prompt or PowerShell

# 1. Navigate to your project directory
# cd "D:\Big Data\Kişisel öğrenim\fifa_dash"

# 2. Create virtual environment
# python -m venv venv

# 3. Activate virtual environment (Windows Command Prompt)
# venv\Scripts\activate

# 4. Activate virtual environment (Windows PowerShell)
# venv\Scripts\Activate.ps1

# 5. Verify Python version and virtual environment
import sys
import os

def check_environment():
    """Check current Python environment details"""
    # venv changes sys.prefix; legacy virtualenv sets sys.real_prefix instead
    in_venv = (hasattr(sys, 'real_prefix') or 
               (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix))
    
    print("🐍 Python Environment Information")
    print("=" * 40)
    print(f"Python Version: {sys.version}")
    print(f"Python Executable: {sys.executable}")
    print(f"Virtual Environment: {'Yes' if in_venv else 'No'}")
    print(f"Current Working Directory: {os.getcwd()}")
    print(f"First sys.path entry: {sys.path[0]}")
    
    if in_venv:
        print("✅ Running in virtual environment")
    else:
        print("⚠️  Running in global Python environment")
    
    return in_venv

# Execute the check
environment_status = check_environment()

In [None]:
# macOS/Linux Environment Setup Commands
# Run these in Terminal

# 1. Navigate to your project directory
# cd ~/fifa_dash

# 2. Create virtual environment
# python3 -m venv venv

# 3. Activate virtual environment
# source venv/bin/activate

# 4. Alternative: Using conda (if Anaconda/Miniconda is installed)
# conda create -n fifa_dash python=3.11  # the pinned numpy==1.25.2 supports Python 3.9-3.11
# conda activate fifa_dash

# Cross-platform environment verification
import sys  # imported here so this cell also runs on its own

def verify_python_setup():
    """Verify Python setup is suitable for the FIFA Dashboard"""
    print("🔍 Python Setup Verification")
    print("=" * 35)
    
    # Check Python version
    python_version = sys.version_info
    min_version = (3, 9)  # the pinned pandas 2.1.3 and numpy 1.25.2 require Python 3.9+
    
    print(f"Python Version: {python_version.major}.{python_version.minor}.{python_version.micro}")
    
    if python_version >= min_version:
        print("✅ Python version is compatible")
    else:
        print(f"❌ Python version too old. Minimum required: {min_version[0]}.{min_version[1]}")
        return False
    
    # Check the standard-library modules that the helper cells in this guide rely on
    required_modules = ['sys', 'os', 'subprocess']
    missing_modules = []
    
    for module in required_modules:
        try:
            __import__(module)
            print(f"✅ {module} available")
        except ImportError:
            missing_modules.append(module)
            print(f"❌ {module} missing")
    
    if not missing_modules:
        print("🎉 Python setup verified successfully!")
        return True
    else:
        print(f"⚠️  Missing modules: {missing_modules}")
        return False

# Run verification
setup_ok = verify_python_setup()

# 📦 Section 2: Installing Required Dependencies

## Understanding the Requirements

The FIFA Analytics Dashboard requires several Python packages for data processing, visualization, and machine learning. Here's a breakdown of the key dependencies:

### Core Libraries
- **streamlit**: Web application framework for the dashboard
- **pandas**: Data manipulation and analysis
- **numpy**: Numerical computing foundation
- **plotly**: Interactive visualizations
- **scikit-learn**: Machine learning algorithms

### Visualization & Analysis
- **seaborn**: Statistical data visualization
- **matplotlib**: Plotting library
- **scipy**: Scientific computing
- **statsmodels**: Statistical modeling

### Additional Tools
- **openpyxl**: Excel file support
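- **requests**: HTTP library for API calls
- **psutil**: Process and memory inspection (used by the port and performance helper cells in this guide)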

## Installation Commands

In [None]:
# Installation Commands (run in terminal)
# pip install -r requirements.txt
# OR install packages individually:
# pip install streamlit==1.28.1 pandas==2.1.3 numpy==1.25.2 plotly==5.17.0

import subprocess
import sys
from importlib import metadata  # standard library since Python 3.8; replaces the deprecated pkg_resources
from typing import List, Dict

def check_package_installation(packages: List[str]) -> Dict[str, bool]:
    """Check if required packages are installed"""
    results = {}
    
    print("📋 Checking Package Installation Status")
    print("=" * 45)
    
    for package in packages:
        try:
            # Look up the version of the installed distribution
            version = metadata.version(package)
            results[package] = True
            print(f"✅ {package}: v{version}")
        except metadata.PackageNotFoundError:
            results[package] = False
            print(f"❌ {package}: Not installed")
        except Exception as e:
            results[package] = False
            print(f"⚠️  {package}: Error checking - {str(e)}")
    
    return results

def install_missing_packages(missing_packages: List[str]):
    """Install missing packages using pip"""
    if not missing_packages:
        print("🎉 All packages are already installed!")
        return
    
    print(f"\n🔧 Installing missing packages: {missing_packages}")
    
    for package in missing_packages:
        try:
            print(f"Installing {package}...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])
            print(f"✅ {package} installed successfully")
        except subprocess.CalledProcessError as e:
            print(f"❌ Failed to install {package}: {e}")

# Define required packages for FIFA Dashboard
required_packages = [
    "streamlit",
    "pandas", 
    "numpy",
    "plotly",
    "scikit-learn",
    "seaborn",
    "matplotlib",
    "scipy",
    "statsmodels"
]

# Check installation status
installation_status = check_package_installation(required_packages)

# Find missing packages
missing_packages = [pkg for pkg, installed in installation_status.items() if not installed]

print(f"\n📊 Installation Summary:")
print(f"   Total packages: {len(required_packages)}")
print(f"   Installed: {len(required_packages) - len(missing_packages)}")
print(f"   Missing: {len(missing_packages)}")

if missing_packages:
    print(f"\n⚠️  Missing packages: {missing_packages}")
    print("Run the following command to install them:")
    print(f"pip install {' '.join(missing_packages)}")
else:
    print("\n🎉 All required packages are installed!")

# 📊 Section 3: Data Structure Validation and Loading

## Understanding FIFA Data Structure

The FIFA Analytics Dashboard expects specific data formats and columns. Proper data validation ensures smooth operation and prevents runtime errors.

### Required CSV Files
- `male_players.csv` - Male player statistics
- `female_players.csv` - Female player statistics

### Essential Columns
- **player_id**: Unique identifier (integer)
- **long_name**: Full player name (string)
- **overall**: Overall rating 0-100 (integer)
- **potential**: Potential rating 0-100 (integer)
- **value_eur**: Market value in Euros (float)
- **age**: Player age (integer)
- **pace, shooting, passing, dribbling, defending, physic**: Skill ratings (integer)

### Optional but Recommended
- **club_name**: Current club (string)
- **player_positions**: Playing positions (string)
- **nationality_name**: Player nationality (string)
- **league_name**: Current league (string)
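
Before loading a multi-gigabyte file with pandas, the header row alone is enough to tell whether the essential columns are present. The following is a minimal sketch using only the standard library; the `find_missing_columns` helper and the two-line sample are hypothetical, while the column names are the ones listed above.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "player_id", "long_name", "overall", "potential", "value_eur", "age",
    "pace", "shooting", "passing", "dribbling", "defending", "physic",
]


def find_missing_columns(csv_text: str, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# Hypothetical sample that deliberately omits the `physic` column
sample_csv = (
    "player_id,long_name,overall,potential,value_eur,age,"
    "pace,shooting,passing,dribbling,defending\n"
    "1,Example Player,80,85,1000000.0,24,78,70,75,80,40\n"
)
missing = find_missing_columns(sample_csv)
print(missing)  # ['physic']
```

For a file on disk, the same check works by passing the first line of the file instead of an in-memory string.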

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path
from typing import List, Dict, Tuple, Optional

def validate_data_structure(file_path: str) -> Dict:
    """Validate FIFA data file structure and quality"""
    
    validation_results = {
        'file_exists': False,
        'file_readable': False,
        'required_columns_present': False,
        'data_quality': {},
        'recommendations': []
    }
    
    print(f"🔍 Validating: {file_path}")
    print("=" * 50)
    
    # Check if file exists
    if not Path(file_path).exists():
        print(f"❌ File not found: {file_path}")
        validation_results['recommendations'].append(f"Create or move {file_path} to the correct location")
        return validation_results
    
    validation_results['file_exists'] = True
    print(f"✅ File exists: {file_path}")
    
    try:
        # Try to read the file
        df = pd.read_csv(file_path, low_memory=False, nrows=1000)  # Read first 1000 rows for validation
        validation_results['file_readable'] = True
        print(f"✅ File readable: {len(df)} rows sampled")
        
        # Check required columns
        required_columns = [
            'player_id', 'long_name', 'overall', 'potential', 
            'value_eur', 'age', 'pace', 'shooting', 'passing', 
            'dribbling', 'defending', 'physic'
        ]
        
        missing_columns = [col for col in required_columns if col not in df.columns]
        
        if not missing_columns:
            validation_results['required_columns_present'] = True
            print("✅ All required columns present")
        else:
            print(f"❌ Missing columns: {missing_columns}")
            validation_results['recommendations'].append(f"Add missing columns: {missing_columns}")
        
        # Data quality checks
        quality_checks = {}
        
        # Check for null values in critical columns
        for col in ['player_id', 'long_name', 'overall']:
            if col in df.columns:
                null_count = df[col].isnull().sum()
                null_percentage = (null_count / len(df)) * 100
                quality_checks[f'{col}_null_percentage'] = null_percentage
                
                if null_percentage == 0:
                    print(f"✅ {col}: No null values")
                elif null_percentage < 5:
                    print(f"⚠️  {col}: {null_percentage:.1f}% null values (acceptable)")
                else:
                    print(f"❌ {col}: {null_percentage:.1f}% null values (problematic)")
                    validation_results['recommendations'].append(f"Clean null values in {col}")
        
        # Check data types
        if 'overall' in df.columns:
            if df['overall'].dtype in ['int64', 'float64']:
                overall_range = (df['overall'].min(), df['overall'].max())
                quality_checks['overall_range'] = overall_range
                
                if 0 <= overall_range[0] and overall_range[1] <= 100:
                    print(f"✅ Overall ratings in valid range: {overall_range}")
                else:
                    print(f"⚠️  Overall ratings outside expected range: {overall_range}")
                    validation_results['recommendations'].append("Check overall rating values")
        
        # Check for duplicates (only within the rows sampled above)
        if 'player_id' in df.columns:
            duplicate_count = df['player_id'].duplicated().sum()
            quality_checks['duplicate_players'] = duplicate_count
            
            if duplicate_count == 0:
                print("✅ No duplicate player IDs in the sampled rows")
            else:
                print(f"⚠️  {duplicate_count} duplicate player IDs found in the sampled rows")
                validation_results['recommendations'].append("Remove duplicate player records")
        
        validation_results['data_quality'] = quality_checks
        
        # File size info
        file_size_mb = Path(file_path).stat().st_size / (1024 * 1024)
        print(f"📁 File size: {file_size_mb:.1f} MB")
        
        if file_size_mb > 100:
            validation_results['recommendations'].append("Consider data sampling for better performance")
        
    except Exception as e:
        print(f"❌ Error reading file: {str(e)}")
        validation_results['recommendations'].append(f"Fix file format or encoding issues: {str(e)}")
    
    return validation_results

def load_fifa_data(data_path: str = "data/") -> Optional[pd.DataFrame]:
    """Load and combine FIFA player data"""
    
    print("🚀 Loading FIFA Player Data")
    print("=" * 30)
    
    male_file = Path(data_path) / "male_players.csv"
    female_file = Path(data_path) / "female_players.csv"
    
    # Validate files
    male_validation = validate_data_structure(str(male_file))
    female_validation = validate_data_structure(str(female_file))
    
    print("\n" + "="*50)
    
    # Load data if validation passes
    dataframes = []
    
    if male_validation['file_readable'] and male_validation['required_columns_present']:
        try:
            male_df = pd.read_csv(male_file, low_memory=False)
            male_df['gender'] = 'Male'
            dataframes.append(male_df)
            print(f"✅ Loaded {len(male_df):,} male players")
        except Exception as e:
            print(f"❌ Error loading male players: {e}")
    
    if female_validation['file_readable'] and female_validation['required_columns_present']:
        try:
            female_df = pd.read_csv(female_file, low_memory=False)
            female_df['gender'] = 'Female'
            dataframes.append(female_df)
            print(f"✅ Loaded {len(female_df):,} female players")
        except Exception as e:
            print(f"❌ Error loading female players: {e}")
    
    if dataframes:
        combined_df = pd.concat(dataframes, ignore_index=True)
        print(f"🎉 Total players loaded: {len(combined_df):,}")
        
        # Basic data info
        print(f"\n📊 Dataset Overview:")
        print(f"   Shape: {combined_df.shape}")
        print(f"   Memory usage: {combined_df.memory_usage(deep=True).sum() / 1024**2:.1f} MB")
        print(f"   Age range: {combined_df['age'].min()}-{combined_df['age'].max()}")
        print(f"   Overall range: {combined_df['overall'].min()}-{combined_df['overall'].max()}")
        
        return combined_df
    else:
        print("❌ No valid data files found")
        return None

# Execute data validation and loading
fifa_data = load_fifa_data()

# 🚀 Section 4: Running Streamlit Applications

## Dashboard Application Commands

The FIFA Analytics Dashboard consists of three specialized applications, each serving different analytical purposes. Here are the exact commands to run each application:

### Application Overview

1. **Main Dashboard** (`app.py`): 
   - Overview statistics and distributions
   - Individual player analysis
   - Player comparisons
   - Basic ML insights

2. **Advanced Analytics** (`advanced_analytics.py`):
   - Comprehensive player reports
   - Market analysis and trends
   - Advanced ML predictions
   - League comparisons

3. **Tactical Analysis** (`tactical_analysis.py`):
   - Player clustering analysis
   - Formation optimization
   - Team building tools
   - Strategic insights

## Running Commands

In [None]:
import subprocess
import time
import webbrowser
from pathlib import Path

class StreamlitManager:
    """Manage multiple Streamlit applications"""
    
    def __init__(self, project_path: str = "."):
        self.project_path = Path(project_path)
        self.processes = {}
        self.apps = {
            "main": {
                "file": "app.py",
                "port": 8501,
                "name": "Main Dashboard",
                "description": "Core analytics and player comparison"
            },
            "advanced": {
                "file": "advanced_analytics.py", 
                "port": 8502,
                "name": "Advanced Analytics",
                "description": "ML predictions and market analysis"
            },
            "tactical": {
                "file": "tactical_analysis.py",
                "port": 8503,
                "name": "Tactical Analysis", 
                "description": "Team building and formations"
            }
        }
    
    def show_commands(self):
        """Display all available Streamlit commands"""
        print("⚽ FIFA Dashboard - Streamlit Commands")
        print("=" * 45)
        
        print("\n🎯 Individual Application Commands:")
        print("-" * 35)
        
        for app_key, app_info in self.apps.items():
            print(f"\n📊 {app_info['name']}:")
            print(f"   File: {app_info['file']}")
            print(f"   Port: {app_info['port']}")
            print(f"   Description: {app_info['description']}")
            print(f"   Command: streamlit run {app_info['file']} --server.port {app_info['port']}")
            print(f"   URL: http://localhost:{app_info['port']}")
        
        print(f"\n🔧 Advanced Commands:")
        print("-" * 20)
        print("🎨 Custom theme:")
        print("   streamlit run app.py --theme.primaryColor '#1f77b4'")
        
        print("\n🌐 Network access:")
        print("   streamlit run app.py --server.address 0.0.0.0")
        
        print("\n🐛 Debug mode:")
        print("   streamlit run app.py --logger.level debug")
        
        print("\n📤 Upload size limit (MB):")
        print("   streamlit run app.py --server.maxUploadSize 1024")
        
        print("\n🛑 Stop all applications:")
        print("   Use Ctrl+C in each terminal or close terminal windows")
    
    def check_file_exists(self, filename: str) -> bool:
        """Check if Streamlit app file exists"""
        file_path = self.project_path / filename
        exists = file_path.exists()
        
        if exists:
            print(f"✅ Found: {filename}")
        else:
            print(f"❌ Missing: {filename}")
            print(f"   Expected path: {file_path}")
        
        return exists
    
    def validate_apps(self):
        """Validate all application files exist"""
        print("🔍 Validating Application Files")
        print("=" * 35)
        
        all_valid = True
        for app_key, app_info in self.apps.items():
            valid = self.check_file_exists(app_info['file'])
            if not valid:
                all_valid = False
        
        if all_valid:
            print("\n✅ All application files found!")
            print("🚀 Ready to launch dashboards!")
        else:
            print("\n❌ Some application files are missing.")
            print("📁 Please ensure you're in the correct directory.")
        
        return all_valid
    
    def get_launch_instructions(self):
        """Get step-by-step launch instructions"""
        instructions = [
            "🚀 Step-by-Step Launch Instructions",
            "=" * 40,
            "",
            "1️⃣ Open three separate terminal/command prompt windows",
            "",
            "2️⃣ Navigate to project directory in each:",
            f'   cd "{self.project_path}"',
            "",
            "3️⃣ Activate virtual environment in each:",
            "   Windows: venv\\Scripts\\activate",
            "   macOS/Linux: source venv/bin/activate",
            "",
            "4️⃣ Run applications in separate terminals:",
            ""
        ]
        
        for i, (app_key, app_info) in enumerate(self.apps.items(), 1):
            instructions.extend([
                f"   Terminal {i} - {app_info['name']}:",
                f"   streamlit run {app_info['file']} --server.port {app_info['port']}",
                ""
            ])
        
        instructions.extend([
            "5️⃣ Access dashboards in browser:",
            ""
        ])
        
        for app_key, app_info in self.apps.items():
            instructions.append(f"   {app_info['name']}: http://localhost:{app_info['port']}")
        
        instructions.extend([
            "",
            "💡 Pro Tips:",
            "   • Use Ctrl+C to stop an application",
            "   • Streamlit detects saved code changes and offers to rerun the app",
            "   • Check browser console for any errors",
            "   • Use different browsers for multiple apps"
        ])
        
        return "\n".join(instructions)

# Initialize Streamlit Manager
streamlit_manager = StreamlitManager()

# Show all available commands
streamlit_manager.show_commands()

print("\n" + "="*60)

# Validate applications
apps_valid = streamlit_manager.validate_apps()

print("\n" + "="*60)

# Show launch instructions
if apps_valid:
    print(streamlit_manager.get_launch_instructions())
else:
    print("⚠️  Fix missing files before launching applications")

# 🌐 Section 5: Port Management and Configuration

## Understanding Port Configuration

When running multiple Streamlit applications simultaneously, each needs its own port to avoid conflicts. The FIFA Dashboard uses three different ports:

- **Main Dashboard**: Port 8501 (default)
- **Advanced Analytics**: Port 8502
- **Tactical Analysis**: Port 8503

## Port Management Strategies

### Default Port Usage
Streamlit uses port 8501 by default. When this port is occupied, Streamlit automatically tries the next available port (8502, 8503, etc.).
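
The fallback described above can be imitated in a few lines of standard-library Python. This is a sketch of the idea, not Streamlit's own implementation; `is_port_free` and `next_free_port` are hypothetical helper names, and the scan window of 20 ports is an arbitrary assumption.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically "address already in use"
            return False


def next_free_port(start: int = 8501, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port that can be bound."""
    for port in range(start, start + attempts):
        if is_port_free(port):
            return port
    raise RuntimeError(f"No free port in {start}-{start + attempts - 1}")


print(f"First free port from 8501: {next_free_port()}")
```

A successful bind only shows that the port was free at that moment; another process can still claim it before the dashboard starts.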

### Explicit Port Configuration
For consistent access URLs, explicitly specify ports using the `--server.port` parameter.
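
As an illustration, the explicit-port commands for the three apps can be assembled as argument lists that are ready for `subprocess.Popen`. The `build_streamlit_command` helper is hypothetical; the file names and ports are the ones used throughout this guide, and `python -m streamlit run` is used so that the active environment's interpreter is picked up.

```python
import sys

# App files and ports as listed in this guide
APPS = {
    "app.py": 8501,
    "advanced_analytics.py": 8502,
    "tactical_analysis.py": 8503,
}


def build_streamlit_command(app_file: str, port: int) -> list:
    """Build an argument list for launching one Streamlit app on a fixed port."""
    return [
        sys.executable, "-m", "streamlit", "run", app_file,
        "--server.port", str(port),
    ]


commands = {app: build_streamlit_command(app, port) for app, port in APPS.items()}
for app, cmd in commands.items():
    # Skip the interpreter path when printing to keep the output readable
    print("python " + " ".join(cmd[1:]))
```

Passing a list rather than a single string avoids shell quoting problems when the project path contains spaces.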

### Port Conflict Resolution
If you request a port explicitly with `--server.port` and it is already in use, Streamlit reports an error instead of falling back to another port. Here's how to handle common scenarios:

In [None]:
import socket
import psutil
from typing import List, Dict

class PortManager:
    """Manage ports for multiple Streamlit applications"""
    
    def __init__(self):
        self.default_ports = {
            'main_dashboard': 8501,
            'advanced_analytics': 8502, 
            'tactical_analysis': 8503
        }
        self.port_range = range(8501, 8520)  # Check ports 8501-8519
    
    def check_port_availability(self, port: int) -> bool:
        """Check if a port is available"""
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(1)
                result = sock.connect_ex(('localhost', port))
                return result != 0  # Port is available if connection fails
        except Exception:
            return False
    
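    def find_process_on_port(self, port: int) -> Dict:
        """Find which process is using a specific port (returns None if none is found)"""
        for proc in psutil.process_iter(['pid', 'name', 'cmdline']):
            try:
                # psutil 6.0 renamed Process.connections() to net_connections()
                get_connections = getattr(proc, 'net_connections', None) or proc.connections
                for conn in get_connections():
                    # laddr can be empty for sockets that are not bound yet
                    if conn.laddr and conn.laddr.port == port:
                        return {
                            'pid': proc.info['pid'],
                            'name': proc.info['name'],
                            'cmdline': ' '.join(proc.info['cmdline']) if proc.info['cmdline'] else 'N/A'
                        }
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
        return None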
    
    def scan_ports(self) -> Dict:
        """Scan all relevant ports and return status"""
        print("🔍 Port Availability Scan")
        print("=" * 30)
        
        port_status = {}
        
        for app_name, port in self.default_ports.items():
            available = self.check_port_availability(port)
            port_status[port] = {
                'app': app_name,
                'available': available,
                'process': None if available else self.find_process_on_port(port)
            }
            
            status_icon = "✅" if available else "❌"
            print(f"{status_icon} Port {port} ({app_name}): {'Available' if available else 'In Use'}")
            
            if not available:
                process_info = port_status[port]['process']
                if process_info:
                    print(f"   🔧 Process: {process_info['name']} (PID: {process_info['pid']})")
                    if 'streamlit' in process_info['cmdline'].lower():
                        print(f"   📊 Streamlit app detected")
        
        return port_status
    
    def find_available_ports(self, count: int = 3) -> List[int]:
        """Find available ports for applications"""
        available_ports = []
        
        for port in self.port_range:
            if self.check_port_availability(port):
                available_ports.append(port)
                if len(available_ports) >= count:
                    break
        
        return available_ports
    
    def generate_port_commands(self) -> Dict[str, str]:
        """Generate Streamlit commands with appropriate ports"""
        port_status = self.scan_ports()
        commands = {}
        
        print(f"\n🚀 Recommended Commands:")
        print("-" * 25)
        
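        # Keep default ports that are free; otherwise take the next free port that
        # is not another app's default, so two apps never share a port
        reserved_ports = set(self.default_ports.values())
        fallback_ports = [
            port for port in self.port_range
            if port not in reserved_ports and self.check_port_availability(port)
        ]
        port_assignments = {}
        
        for app_name, default_port in self.default_ports.items():
            if port_status[default_port]['available']:
                assigned_port = default_port
            elif fallback_ports:
                assigned_port = fallback_ports.pop(0)
            else:
                assigned_port = default_port + 10  # last resort; availability not verified
            
            port_assignments[app_name] = assigned_port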
        
        # Generate commands
        app_files = {
            'main_dashboard': 'app.py',
            'advanced_analytics': 'advanced_analytics.py',
            'tactical_analysis': 'tactical_analysis.py'
        }
        
        for app_name, port in port_assignments.items():
            file_name = app_files[app_name]
            command = f"streamlit run {file_name} --server.port {port}"
            commands[app_name] = {
                'command': command,
                'port': port,
                'url': f"http://localhost:{port}"
            }
            
            print(f"\n📊 {app_name.replace('_', ' ').title()}:")
            print(f"   Command: {command}")
            print(f"   URL: http://localhost:{port}")
        
        return commands
    
    def create_batch_files(self, commands: Dict) -> None:
        """Create batch files for easy application launching"""
        print(f"\n📁 Creating Launch Scripts:")
        print("-" * 30)
        
        # Windows batch files
        for app_name, cmd_info in commands.items():
            batch_content = f"""@echo off
echo Starting {app_name.replace('_', ' ').title()}...
echo URL: {cmd_info['url']}
echo Press Ctrl+C to stop the application
echo.
{cmd_info['command']}
pause"""
            
            batch_file = f"run_{app_name}.bat"
            try:
                with open(batch_file, 'w') as f:
                    f.write(batch_content)
                print(f"✅ Created: {batch_file}")
            except Exception as e:
                print(f"❌ Failed to create {batch_file}: {e}")
        
        # PowerShell script for all apps
        ps_content = """# FIFA Dashboard Launcher
Write-Host "🚀 FIFA Analytics Dashboard Launcher" -ForegroundColor Green
Write-Host ("=" * 45) -ForegroundColor Green

$apps = @{"""
        
        for app_name, cmd_info in commands.items():
            ps_content += f"""
    "{app_name}" = @{{
        "command" = "{cmd_info['command']}"
        "url" = "{cmd_info['url']}"
    }}"""
        
        ps_content += """
}

Write-Host "Choose an application to launch:" -ForegroundColor Yellow
$i = 1
foreach ($app in $apps.Keys) {
    Write-Host "$i. $($app.Replace('_', ' '))" -ForegroundColor Cyan
    $i++
}

$choice = Read-Host "Enter your choice (1-3)"
$appNames = @($apps.Keys)
$selectedApp = $appNames[$choice - 1]

if ($selectedApp) {
    $appInfo = $apps[$selectedApp]
    Write-Host "🚀 Launching $($selectedApp.Replace('_', ' '))..." -ForegroundColor Green
    Write-Host "📍 URL: $($appInfo.url)" -ForegroundColor Yellow
    Write-Host "⏹️  Press Ctrl+C to stop" -ForegroundColor Red
    Write-Host ""
    
    # Execute the command
    Invoke-Expression $appInfo.command
} else {
    Write-Host "❌ Invalid choice" -ForegroundColor Red
}"""
        
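        try:
            # utf-8-sig writes a BOM so Windows PowerShell decodes the emoji correctly;
            # the platform default encoding may fail to encode them at all
            with open("launch_dashboard.ps1", 'w', encoding='utf-8-sig') as f:
                f.write(ps_content)
            print("✅ Created: launch_dashboard.ps1")
        except Exception as e:
            print(f"❌ Failed to create PowerShell script: {e}")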

# Initialize Port Manager
port_manager = PortManager()

# Scan current port status
current_status = port_manager.scan_ports()

print("\n" + "="*50)

# Generate optimized commands
optimized_commands = port_manager.generate_port_commands()

print("\n" + "="*50)

# Create convenience scripts
port_manager.create_batch_files(optimized_commands)

print(f"\n💡 Pro Tips:")
print("- Use the generated batch files for easy launching")
print("- Run each app in a separate terminal window")
print("- Keep terminals open while using the dashboards")
print("- Use Ctrl+C to gracefully stop applications")

# ⚡ Section 6: Performance Optimization Techniques

## Understanding Performance Challenges

The FIFA Analytics Dashboard handles large datasets (potentially 10M+ player records). Without proper optimization, this can lead to:

- **Slow loading times**: Long waits for data processing
- **Memory issues**: Application crashes or system slowdowns  
- **Poor user experience**: Laggy interactions and timeouts
- **Resource consumption**: High CPU and RAM usage

## Optimization Strategies

### 1. Data Sampling
For exploration and development, use representative samples of your data rather than the full dataset.

### 2. Lazy Loading
Load data only when needed, rather than loading everything upfront.

### 3. Efficient Data Types
Use appropriate pandas data types to reduce memory usage.

### 4. Pagination
Display data in chunks rather than all at once.
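
A minimal sketch of the pagination idea, using a plain list as a hypothetical stand-in for player records; the `paginate` helper is an illustrative name, not part of the dashboard. With a pandas DataFrame the same slicing would be expressed through `df.iloc[start:start + page_size]`.

```python
from typing import Iterator, List, Sequence


def paginate(rows: Sequence, page_size: int) -> Iterator[List]:
    """Yield consecutive pages containing at most `page_size` rows each."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(rows), page_size):
        yield list(rows[start:start + page_size])


# Hypothetical stand-in for player records
players = [f"player_{i}" for i in range(7)]
pages = list(paginate(players, page_size=3))
print([len(page) for page in pages])  # [3, 3, 1]
```

Because `paginate` is a generator, a page is only materialized when the interface actually asks for it.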

### 5. Caching
Store processed results to avoid repeated calculations.
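
The effect of caching can be shown with `functools.lru_cache` from the standard library. This is a sketch under assumptions: the `average_rating` function and its input are hypothetical, and inside a Streamlit app the analogous tool would be the `st.cache_data` decorator rather than `lru_cache`.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=32)
def average_rating(ratings: tuple) -> float:
    """Stand-in for an expensive aggregation; arguments must be hashable."""
    global call_count
    call_count += 1
    return sum(ratings) / len(ratings)


ratings = (91, 88, 85, 79)
first = average_rating(ratings)
second = average_rating(ratings)  # served from the cache, body is not re-run
print(first, second, call_count)
```

The second call returns the stored result, so `call_count` stays at 1 even though the function was requested twice.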

In [None]:
import pandas as pd
import numpy as np
import time
from typing import Dict, List, Optional, Tuple
import psutil
import gc

class PerformanceOptimizer:
    """Tools for optimizing FIFA dashboard performance"""

    def __init__(self):
        self.memory_usage_history = []

    def get_memory_usage(self) -> float:
        """Get current memory usage in MB"""
        process = psutil.Process()
        return process.memory_info().rss / 1024 / 1024

    def optimize_dataframe_dtypes(self, df: pd.DataFrame) -> pd.DataFrame:
        """Optimize DataFrame data types to reduce memory usage"""
        print("🔧 Optimizing DataFrame Data Types")
        print("=" * 35)

        initial_memory = df.memory_usage(deep=True).sum() / 1024**2
        print(f"📊 Initial memory usage: {initial_memory:.1f} MB")

        optimized_df = df.copy()

        # Optimize integer columns (bounds are inclusive: uint8 holds 0..255, int8 holds -128..127)
        int_columns = df.select_dtypes(include=['int64']).columns
        for col in int_columns:
            max_val = df[col].max()
            min_val = df[col].min()

            if min_val >= 0:  # Unsigned integers
                if max_val <= 255:
                    optimized_df[col] = df[col].astype('uint8')
                elif max_val <= 65535:
                    optimized_df[col] = df[col].astype('uint16')
                elif max_val <= 4294967295:
                    optimized_df[col] = df[col].astype('uint32')
            else:  # Signed integers
                if min_val >= -128 and max_val <= 127:
                    optimized_df[col] = df[col].astype('int8')
                elif min_val >= -32768 and max_val <= 32767:
                    optimized_df[col] = df[col].astype('int16')
                elif min_val >= -2147483648 and max_val <= 2147483647:
                    optimized_df[col] = df[col].astype('int32')

        # Optimize float columns
        float_columns = df.select_dtypes(include=['float64']).columns
        for col in float_columns:
            optimized_df[col] = pd.to_numeric(df[col], downcast='float')

        # Convert categorical columns
        object_columns = df.select_dtypes(include=['object']).columns
        categorical_threshold = 0.5  # Convert to category if unique ratio < 50%

        for col in object_columns:
            unique_ratio = df[col].nunique() / len(df)
            if unique_ratio < categorical_threshold:
                optimized_df[col] = df[col].astype('category')
                print(f"📝 Converted {col} to category (unique ratio: {unique_ratio:.2%})")

        final_memory = optimized_df.memory_usage(deep=True).sum() / 1024**2
        memory_reduction = (initial_memory - final_memory) / initial_memory * 100

        print(f"📊 Final memory usage: {final_memory:.1f} MB")
        print(f"✅ Memory reduction: {memory_reduction:.1f}%")

        return optimized_df

    def create_sample_dataset(self, df: pd.DataFrame, sample_size: int = 10000,
                            strategy: str = 'random') -> pd.DataFrame:
        """Create a representative sample of the dataset"""
        print(f"🎯 Creating Sample Dataset ({strategy})")
        print("=" * 35)

        if len(df) <= sample_size:
            print(f"📊 Dataset already small enough: {len(df)} rows")
            return df

        if strategy == 'random':
            sample_df = df.sample(n=sample_size, random_state=42)
            print(f"🎲 Random sampling: {len(sample_df)} rows")

        elif strategy == 'stratified':
            # Stratified sampling by overall rating
            if 'overall' in df.columns:
                # Bin ratings in a separate Series so the caller's DataFrame is not modified
                overall_bin = pd.cut(df['overall'], bins=10, labels=False).rename('overall_bin')
                sample_df = df.groupby(overall_bin, group_keys=False).apply(
                    lambda x: x.sample(min(len(x), sample_size // 10), random_state=42)
                ).reset_index(drop=True)
                print(f"📊 Stratified sampling: {len(sample_df)} rows")
            else:
                sample_df = df.sample(n=sample_size, random_state=42)
                print(f"🎲 Fallback to random sampling: {len(sample_df)} rows")

        elif strategy == 'top_performers':
            # Sample top performers by overall rating
            if 'overall' in df.columns:
                sample_df = df.nlargest(sample_size, 'overall')
                print(f"🏆 Top performers sampling: {len(sample_df)} rows")
            else:
                sample_df = df.head(sample_size)
                print(f"📊 Head sampling: {len(sample_df)} rows")

        else:
            sample_df = df.head(sample_size)
            print(f"📊 Head sampling: {len(sample_df)} rows")

        # Show sample characteristics
        if 'overall' in sample_df.columns:
            print(f"📈 Overall range: {sample_df['overall'].min()}-{sample_df['overall'].max()}")
            print(f"📊 Average overall: {sample_df['overall'].mean():.1f}")

        return sample_df

    def implement_lazy_loading(self, file_path: str, chunk_size: int = 10000) -> None:
        """Demonstrate lazy loading with pandas chunks"""
        print(f"⚡ Lazy Loading Demo (chunk_size={chunk_size})")
        print("=" * 45)

        try:
            chunk_reader = pd.read_csv(file_path, chunksize=chunk_size, low_memory=False)

            chunk_count = 0
            total_rows = 0
            memory_usage = []

            start_time = time.time()

            for chunk in chunk_reader:
                chunk_count += 1
                total_rows += len(chunk)
                current_memory = self.get_memory_usage()
                memory_usage.append(current_memory)

                # Process chunk (example: calculate statistics)
                if 'overall' in chunk.columns:
                    chunk_avg = chunk['overall'].mean()
                    print(f"📊 Chunk {chunk_count}: {len(chunk)} rows, avg overall: {chunk_avg:.1f}")

                # Memory management
                if chunk_count % 10 == 0:
                    gc.collect()  # Force garbage collection
                    print(f"🧹 Memory cleanup at chunk {chunk_count}")

                # Stop after a few chunks for demo
                if chunk_count >= 5:
                    print(f"⏹️  Stopping demo after {chunk_count} chunks")
                    break

            end_time = time.time()
            processing_time = end_time - start_time

            print("\n📊 Lazy Loading Results:")
            print(f"   Chunks processed: {chunk_count}")
            print(f"   Total rows: {total_rows:,}")
            print(f"   Processing time: {processing_time:.2f} seconds")
            print(f"   Average memory: {np.mean(memory_usage):.1f} MB")
            print(f"   Peak memory: {max(memory_usage):.1f} MB")

        except Exception as e:
            print(f"❌ Error during lazy loading: {e}")

    def benchmark_operations(self, df: pd.DataFrame) -> Dict:
        """Benchmark common operations on the dataset"""
        print("🏃‍♂️ Performance Benchmarking")
        print("=" * 30)

        benchmarks = {}

        operations = [
            ('Data Loading', lambda: df.copy()),
            ('Basic Statistics', lambda: df.describe()),
            ('Filtering', lambda: df[df['overall'] > 80] if 'overall' in df.columns else df.head()),
            ('Sorting', lambda: df.sort_values('overall') if 'overall' in df.columns else df),
            ('Grouping', lambda: df.groupby('player_positions', observed=True)['overall'].mean() if all(col in df.columns for col in ['player_positions', 'overall']) else None)
        ]

        for operation_name, operation_func in operations:
            try:
                start_memory = self.get_memory_usage()
                start_time = time.time()

                result = operation_func()

                end_time = time.time()
                end_memory = self.get_memory_usage()

                execution_time = end_time - start_time
                memory_delta = end_memory - start_memory

                benchmarks[operation_name] = {
                    'time': execution_time,
                    'memory_delta': memory_delta,
                    'result_size': len(result) if hasattr(result, '__len__') else 'N/A'
                }

                print(f"⏱️  {operation_name}: {execution_time:.3f}s, Memory: {memory_delta:+.1f}MB")

            except Exception as e:
                print(f"❌ {operation_name}: Error - {str(e)}")
                benchmarks[operation_name] = {'error': str(e)}

        return benchmarks

    def get_optimization_recommendations(self, df: pd.DataFrame) -> List[str]:
        """Generate performance optimization recommendations"""
        recommendations = []

        # Check dataset size
        dataset_size_mb = df.memory_usage(deep=True).sum() / 1024**2
        row_count = len(df)

        if dataset_size_mb > 100:
            recommendations.append(f"🔧 Large dataset ({dataset_size_mb:.1f}MB) - Consider sampling for development")

        if row_count > 100000:
            recommendations.append(f"📊 Many rows ({row_count:,}) - Implement pagination in UI")

        # Check data types
        memory_inefficient_cols = df.select_dtypes(include=['int64', 'float64']).columns
        if len(memory_inefficient_cols) > 5:
            recommendations.append("🔧 Optimize data types with downcast methods")

        # Check for categorical data
        object_cols = df.select_dtypes(include=['object']).columns
        for col in object_cols:
            if df[col].nunique() / len(df) < 0.5:
                recommendations.append(f"📝 Convert '{col}' to categorical data type")

        # Check for missing values
        missing_data = df.isnull().sum().sum()
        if missing_data > len(df) * 0.1:
            recommendations.append("🧹 High missing data - implement efficient handling")

        return recommendations

# Example usage with mock data if FIFA data not available
print("🚀 Performance Optimization Demo")
print("=" * 35)

# Create a mock dataset for demonstration
np.random.seed(42)
mock_data = {
    'player_id': range(50000),
    'overall': np.random.randint(50, 100, 50000),
    'potential': np.random.randint(50, 100, 50000),
    'age': np.random.randint(16, 40, 50000),
    'value_eur': np.random.exponential(1000000, 50000),
    'player_positions': np.random.choice(['ST', 'CM', 'CB', 'GK', 'LW'], 50000),
    'league_name': np.random.choice(['Premier League', 'La Liga', 'Serie A', 'Bundesliga'], 50000)
}

mock_df = pd.DataFrame(mock_data)
print(f"📊 Created mock dataset: {len(mock_df):,} rows")

# Initialize optimizer
optimizer = PerformanceOptimizer()

# Run optimizations
print("\n" + "="*50)
optimized_df = optimizer.optimize_dataframe_dtypes(mock_df)

print("\n" + "="*50)
sample_df = optimizer.create_sample_dataset(optimized_df, sample_size=5000)

print("\n" + "="*50)
benchmark_results = optimizer.benchmark_operations(sample_df)

print("\n" + "="*50)
recommendations = optimizer.get_optimization_recommendations(mock_df)
print("💡 Optimization Recommendations:")
for rec in recommendations:
    print(f"   {rec}")

print("\n✅ Performance optimization demo completed!")

# 💾 Section 7: Data Caching Implementation

## Understanding Streamlit Caching

Streamlit's caching system is crucial for performance, especially with large datasets. The `@st.cache_data` decorator stores function results and serves cached copies when the same inputs are provided.

## Benefits of Caching

- **Faster Load Times**: Avoid reprocessing data on every interaction
- **Better User Experience**: Immediate responses for cached operations
- **Resource Efficiency**: Reduce CPU and memory usage
- **Scalability**: Handle more users with the same resources

## Caching Best Practices

### 1. Cache Data Loading
Always cache expensive data loading operations.

### 2. Cache Preprocessing
Store results of data cleaning and transformation.

### 3. Cache Model Training
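Save trained machine learning models so they are not retrained on every rerun. In Streamlit, `@st.cache_resource` is the decorator intended for objects such as models, while `@st.cache_data` suits DataFrames and other serializable results.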

### 4. Manage Cache Size
Clear cache when data changes or memory is low.
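
One way to approach "clear cache when data changes" is to fold the file's size and modification time into the cache key, so that an updated CSV naturally produces a cache miss. The sketch below uses only the standard library; `file_cache_key` is a hypothetical helper, and the temporary file merely stands in for `data/players.csv`.

```python
import hashlib
import os
import tempfile


def file_cache_key(path: str) -> str:
    """Build a cache key from the file path, size, and modification time."""
    stat = os.stat(path)
    raw = f"{os.path.abspath(path)}|{stat.st_size}|{stat.st_mtime_ns}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Demo with a temporary file standing in for data/players.csv
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "players.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("player_id,overall\n1,90\n")
    key_before = file_cache_key(csv_path)

    with open(csv_path, "a", encoding="utf-8") as f:
        f.write("2,85\n")  # the underlying data changed
    key_after = file_cache_key(csv_path)

print(key_before != key_after)  # True
```

In a Streamlit app, passing such a key as an extra argument to a cached loader would likely have the same effect, since cached functions are keyed on their arguments.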

In [None]:
# Note: This demonstrates caching patterns used in the FIFA Dashboard
# In an actual Streamlit app, these would be used with @st.cache_data decorator

import hashlib
import pickle
import time
from functools import wraps
from typing import Any, Callable, Dict, List, Optional
import os

import numpy as np
import pandas as pd

class CacheManager:
    """Simulate Streamlit's caching behavior for demonstration"""

    def __init__(self, cache_dir: str = ".cache"):
        self.cache_dir = cache_dir
        self.cache_stats = {'hits': 0, 'misses': 0}
        os.makedirs(cache_dir, exist_ok=True)

    @staticmethod
    def _fingerprint(value: Any) -> str:
        """Summarize one argument for the cache key.

        pandas objects are hashed by content, because str() of a large
        DataFrame is truncated and two different frames could collide.
        """
        if isinstance(value, (pd.DataFrame, pd.Series)):
            row_hashes = pd.util.hash_pandas_object(value, index=True)
            return hashlib.md5(row_hashes.to_numpy().tobytes()).hexdigest()
        return repr(value)

    def _get_cache_key(self, func_name: str, args: tuple, kwargs: dict) -> str:
        """Generate a unique cache key for function call"""
        arg_part = [self._fingerprint(a) for a in args]
        kwarg_part = [(k, self._fingerprint(v)) for k, v in sorted(kwargs.items())]
        key_data = f"{func_name}_{arg_part}_{kwarg_part}"
        return hashlib.md5(key_data.encode()).hexdigest()

    def _get_cache_path(self, cache_key: str) -> str:
        """Get file path for cached result"""
        return os.path.join(self.cache_dir, f"{cache_key}.cache")

    def cache_data(self, ttl: Optional[int] = None, show_spinner: bool = True):
        """Decorator to cache function results (simulates @st.cache_data)"""
        def decorator(func: Callable) -> Callable:
            @wraps(func)
            def wrapper(*args, **kwargs):
                # Generate cache key
                cache_key = self._get_cache_key(func.__name__, args, kwargs)
                cache_path = self._get_cache_path(cache_key)

                # Check if cached result exists and is valid
                if os.path.exists(cache_path):
                    if ttl is None or (time.time() - os.path.getmtime(cache_path)) < ttl:
                        try:
                            with open(cache_path, 'rb') as f:
                                result = pickle.load(f)
                            self.cache_stats['hits'] += 1
                            print(f"✅ Cache hit for {func.__name__}")
                            return result
                        except Exception as e:
                            print(f"⚠️  Cache read error: {e}")

                # Cache miss - execute function
                self.cache_stats['misses'] += 1
                print(f"🔄 Cache miss for {func.__name__} - executing...")

                if show_spinner:
                    print(f"⏳ Running {func.__name__}...")

                start_time = time.time()
                result = func(*args, **kwargs)
                execution_time = time.time() - start_time

                print(f"⚡ {func.__name__} completed in {execution_time:.2f}s")

                # Save result to cache
                try:
                    with open(cache_path, 'wb') as f:
                        pickle.dump(result, f)
                    print(f"💾 Cached result for {func.__name__}")
                except Exception as e:
                    print(f"⚠️  Cache write error: {e}")

                return result

            return wrapper
        return decorator

    def clear_cache(self, pattern: Optional[str] = None):
        """Clear cached results"""
        cleared_count = 0

        for filename in os.listdir(self.cache_dir):
            if filename.endswith('.cache'):
                if pattern is None or pattern in filename:
                    file_path = os.path.join(self.cache_dir, filename)
                    try:
                        os.remove(file_path)
                        cleared_count += 1
                    except Exception as e:
                        print(f"❌ Error removing {filename}: {e}")

        print(f"🧹 Cleared {cleared_count} cache files")
        self.cache_stats = {'hits': 0, 'misses': 0}

    def get_cache_stats(self) -> Dict[str, Any]:
        """Get cache performance statistics"""
        total_requests = self.cache_stats['hits'] + self.cache_stats['misses']
        hit_rate = (self.cache_stats['hits'] / total_requests * 100) if total_requests > 0 else 0

        cache_files = [f for f in os.listdir(self.cache_dir) if f.endswith('.cache')]
        total_cache_size = sum(os.path.getsize(os.path.join(self.cache_dir, f)) for f in cache_files)

        return {
            'hits': self.cache_stats['hits'],
            'misses': self.cache_stats['misses'],
            'hit_rate': hit_rate,
            'cache_files': len(cache_files),
            'total_size_mb': total_cache_size / 1024**2
        }

# Initialize cache manager
cache_manager = CacheManager()

# Example cached functions for FIFA Dashboard

@cache_manager.cache_data(ttl=3600)  # Cache for 1 hour
def load_fifa_data_cached(file_path: str) -> pd.DataFrame:
    """Cached data loading function"""
    print(f"📁 Loading data from {file_path}...")
    time.sleep(2)  # Simulate loading time

    # In real implementation, this would load actual FIFA data
    np.random.seed(42)
    data = {
        'player_id': range(10000),
        'long_name': [f'Player_{i}' for i in range(10000)],
        'overall': np.random.randint(50, 100, 10000),
        'potential': np.random.randint(50, 100, 10000),
        'value_eur': np.random.exponential(1000000, 10000)
    }

    return pd.DataFrame(data)

@cache_manager.cache_data(ttl=1800)  # Cache for 30 minutes
def calculate_player_statistics(df: pd.DataFrame) -> Dict[str, float]:
    """Cached statistics calculation"""
    print("📊 Calculating player statistics...")
    time.sleep(1)  # Simulate processing time

    stats = {
        'total_players': len(df),
        'avg_overall': df['overall'].mean(),
        'avg_potential': df['potential'].mean(),
        'avg_value': df['value_eur'].mean(),
        'max_overall': df['overall'].max(),
        'min_overall': df['overall'].min()
    }

    return stats

@cache_manager.cache_data(ttl=7200)  # Cache for 2 hours
def train_value_prediction_model(df: pd.DataFrame) -> Dict[str, Any]:
    """Cached ML model training"""
    print("🤖 Training value prediction model...")
    time.sleep(3)  # Simulate training time

    # In real implementation, this would train an actual ML model
    model_info = {
        'model_type': 'RandomForestRegressor',
        'features': ['overall', 'potential'],
        'accuracy': 0.85,
        'training_samples': len(df),
        'trained_at': time.time()
    }

    return model_info

@cache_manager.cache_data(ttl=600)  # Cache for 10 minutes
def filter_players_by_criteria(df: pd.DataFrame, min_overall: int = 80,
                              max_age: int = 30) -> pd.DataFrame:
    """Cached filtering operation"""
    print(f"🔍 Filtering players (overall >= {min_overall}, age <= {max_age})...")
    time.sleep(0.5)  # Simulate processing time

    # Add age column for filtering demo (assign returns a copy, so the input is not mutated)
    df = df.assign(age=np.random.randint(16, 40, len(df)))

    filtered_df = df[
        (df['overall'] >= min_overall) &
        (df['age'] <= max_age)
    ]

    return filtered_df

# Demonstrate caching behavior
print("🧪 Caching Demonstration")
print("=" * 25)

# Start from an empty cache so the first calls below are guaranteed misses
cache_manager.clear_cache()

# First calls - cache misses
print("\n🔄 First execution (cache misses):")
print("-" * 35)

data_df = load_fifa_data_cached("data/players.csv")
stats = calculate_player_statistics(data_df)
model = train_value_prediction_model(data_df)
filtered = filter_players_by_criteria(data_df, min_overall=85)

print(f"\n📊 Statistics: {stats}")
print(f"🤖 Model info: {model['model_type']}, Accuracy: {model['accuracy']}")
print(f"🔍 Filtered players: {len(filtered)}")

# Second calls - cache hits
print("\n⚡ Second execution (cache hits):")
print("-" * 35)

data_df = load_fifa_data_cached("data/players.csv")
stats = calculate_player_statistics(data_df)
model = train_value_prediction_model(data_df)
filtered = filter_players_by_criteria(data_df, min_overall=85)

# Show cache statistics
print("\n📈 Cache Performance:")
print("-" * 20)
cache_stats = cache_manager.get_cache_stats()
for key, value in cache_stats.items():
    if key == 'hit_rate':
        print(f"   {key}: {value:.1f}%")
    elif key == 'total_size_mb':
        print(f"   {key}: {value:.2f} MB")
    else:
        print(f"   {key}: {value}")

# Cache management examples
print("\n🧹 Cache Management:")
print("-" * 20)

# Show cache optimization recommendations
def get_cache_recommendations(stats: Dict) -> List[str]:
    recommendations = []

    if stats['hit_rate'] < 50:
        recommendations.append("🔧 Low hit rate - consider longer TTL values")

    if stats['total_size_mb'] > 100:
        recommendations.append("💾 Large cache size - consider periodic cleanup")

    if stats['cache_files'] > 50:
        recommendations.append("🗂️  Many cache files - implement cache rotation")

    return recommendations

recommendations = get_cache_recommendations(cache_stats)
if recommendations:
    print("💡 Cache Optimization Recommendations:")
    for rec in recommendations:
        print(f"   {rec}")
else:
    print("✅ Cache performance is optimal")

print("\n🎯 Caching Best Practices for FIFA Dashboard:")
print("   1. Cache data loading operations (@st.cache_data)")
print("   2. Use appropriate TTL values (1-24 hours for data)")
print("   3. Cache ML model training (longer TTL)")
print("   4. Cache expensive calculations and aggregations")
print("   5. Clear cache when underlying data changes")
print("   6. Monitor cache hit rates and sizes")
print("   7. Use hash_funcs for complex objects if needed")

# 🛠️ Section 8: Error Handling and Troubleshooting

## Common Issues and Solutions

### 1. Import Errors
```bash
# Issue: ModuleNotFoundError
# Solution: Install missing packages
pip install streamlit pandas plotly scikit-learn

# Issue: Version conflicts
# Solution: Use virtual environment and specific versions
pip install streamlit==1.28.1
```

### 2. Data Loading Problems
```python
# Issue: File not found
# Solution: Check file paths and working directory
import os
import pandas as pd

print("Current directory:", os.getcwd())
print("Files in data/:", os.listdir("data/"))

# Issue: Encoding problems (UnicodeDecodeError)
# Solution: Specify the file's actual encoding when reading CSV.
# UTF-8 is already the pandas default; if it fails, try 'latin-1' or 'cp1252'.
df = pd.read_csv("data/players.csv", encoding='utf-8')
```
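
When the encoding of a CSV is unknown, one possible approach is to try a short list of candidates in order and use the first that decodes cleanly. The sketch below is illustrative: `detect_encoding` and its candidate list are assumptions, not part of the dashboard, and it reads the whole file into memory, so it suits modest file sizes.

```python
import os
import tempfile
from typing import Sequence


def detect_encoding(path: str,
                    candidates: Sequence[str] = ("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first candidate encoding that decodes the whole file without error."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes; try the next one
        return encoding
    raise ValueError(f"none of {list(candidates)} could decode {path}")


# Demo: a small file written as Latin-1, which is not valid UTF-8
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "players.csv")
    with open(csv_path, "w", encoding="latin-1") as f:
        f.write("long_name\nKylian Mbappé\n")
    encoding = detect_encoding(csv_path)

print(encoding)  # cp1252
```

The result can be passed to `pd.read_csv(path, encoding=encoding)`. A clean decode does not prove the guess is right, so it is worth spot-checking a few accented names afterwards.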

### 3. Memory Issues
```python
# Issue: Out of memory
# Solutions:
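# - Use data sampling (never request more rows than exist)
df_sample = df.sample(n=min(10_000, len(df)), random_state=42)

# - Optimize data types
df = df.astype({'player_id': 'uint32', 'overall': 'uint8'})

# - Use chunked processing
# (process_chunk is a placeholder for your own per-chunk logic)
for chunk in pd.read_csv("large_file.csv", chunksize=1000):
    process_chunk(chunk)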
```
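
The chunked loop above leaves the per-chunk work undefined. As one illustration of what it might do, the sketch below accumulates a mean across chunks without keeping the rows in memory. `RunningMean` is a hypothetical helper, and plain lists stand in for the `overall` column of successive chunks.

```python
from typing import Iterable


class RunningMean:
    """Accumulate a mean over many chunks without storing the values."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, values: Iterable[float]) -> None:
        for value in values:
            self.total += value
            self.count += 1

    @property
    def mean(self) -> float:
        if self.count == 0:
            raise ValueError("no values seen yet")
        return self.total / self.count


# Two small lists stand in for the 'overall' column of successive chunks
chunks = [[80, 90], [70]]
running = RunningMean()
for chunk_values in chunks:
    running.update(chunk_values)
print(running.mean)  # 80.0
```

Tracking the sum and count separately matters because averaging the per-chunk means would give 77.5 here, which is wrong whenever chunks differ in size.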

### 4. Port Conflicts
```bash
# Issue: Address already in use
# Solutions:
streamlit run app.py --server.port 8502
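# OR kill the process that is holding the port
# Windows (Command Prompt): find the PID in the last column, then stop it
netstat -ano | findstr :8501
taskkill /PID <PID> /F
# macOS/Linux
lsof -ti:8501 | xargs kill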
```
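
Instead of guessing the next port by hand, a launcher script could probe for a free one. The following is a minimal standard-library sketch; `find_free_port` is a hypothetical helper, and the port it reports could in principle be taken by another process before Streamlit starts.

```python
import socket


def find_free_port(start: int = 8501, max_tries: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + max_tries) that can be bound on host."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken (or not permitted); try the next one
            return port
    raise RuntimeError(f"no free port found in {start}-{start + max_tries - 1}")


port = find_free_port()
print(f"streamlit run app.py --server.port {port}")
```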

# ⚡ Section 9: Application Testing and Validation

## Testing Checklist

### Data Validation Tests
```python
def test_data_integrity(df):
    """Test data quality and integrity"""
    tests = []
    
    # Check required columns
    required_cols = ['player_id', 'long_name', 'overall']
    missing_cols = [col for col in required_cols if col not in df.columns]
    tests.append(("Required columns", len(missing_cols) == 0, missing_cols))
    
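    # Check for duplicates (skipped if the ID column is missing, to avoid a KeyError)
    if 'player_id' in df.columns:
        duplicates = df['player_id'].duplicated().sum()
        tests.append(("No duplicates", duplicates == 0, f"{duplicates} duplicates"))
    
    # Check value ranges (skipped if the rating column is missing)
    if 'overall' in df.columns:
        valid_overall = df['overall'].between(0, 100).all()
        tests.append(("Overall range 0-100", valid_overall, "Invalid overall values"))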
    
    return tests
```
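
The function above returns a list of `(name, passed, detail)` tuples but does not report them. A small reporter along these lines could turn that list into a pass/fail summary; `summarize_tests` is a hypothetical name, and the sample results are hand-written rather than produced from real data.

```python
from typing import Any, List, Tuple


def summarize_tests(tests: List[Tuple[str, bool, Any]]) -> bool:
    """Print one line per check and return True only if every check passed."""
    all_passed = True
    for name, passed, detail in tests:
        if passed:
            print(f"PASS  {name}")
        else:
            all_passed = False
            print(f"FAIL  {name}: {detail}")
    return all_passed


# Hand-written results in the same shape test_data_integrity returns
example_results = [
    ("Required columns", True, []),
    ("No duplicates", False, "3 duplicates"),
]
ok = summarize_tests(example_results)
print(ok)  # False
```

Returning a single boolean makes it easy to stop the dashboard from loading a dataset that fails validation.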

### Performance Tests
```python
def test_performance_benchmarks():
    """Test application performance"""
    import time
    
    # Test data loading speed
    # (load_fifa_data is assumed to be the dashboard's own data-loading function)
    start = time.time()
    df = load_fifa_data()
    load_time = time.time() - start
    
    print(f"Data loading: {load_time:.2f}s {'✅' if load_time < 10 else '❌'}")
    
    # Test filtering speed
    start = time.time()
    filtered = df[df['overall'] > 80]
    filter_time = time.time() - start
    
    print(f"Filtering: {filter_time:.2f}s {'✅' if filter_time < 1 else '❌'}")
```
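
The repeated start/stop timing pattern above can be factored into a context manager. This is a sketch under the assumption that wall-clock time is the metric of interest; `timed` is a hypothetical helper, and it uses `time.perf_counter`, which is better suited to short intervals than `time.time`.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the duration even if the timed block raises
        results[label] = time.perf_counter() - start


timings: Dict[str, float] = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```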

# 🚀 Section 10: Deployment Configuration

## Production Deployment Setup

### 1. Environment Variables
```python
# config.py
import os
from typing import Dict, Any

class Config:
    # Data settings
    DATA_PATH = os.getenv('FIFA_DATA_PATH', 'data/')
    SAMPLE_SIZE = int(os.getenv('FIFA_SAMPLE_SIZE', '10000'))
    
    # Performance settings
    CACHE_TTL = int(os.getenv('CACHE_TTL', '3600'))
    MAX_WORKERS = int(os.getenv('MAX_WORKERS', '4'))
    
    # Streamlit settings
    SERVER_PORT = int(os.getenv('PORT', '8501'))
    SERVER_ADDRESS = os.getenv('SERVER_ADDRESS', '0.0.0.0')
    
    @classmethod
    def get_streamlit_config(cls) -> Dict[str, Any]:
        return {
            'server.port': cls.SERVER_PORT,
            'server.address': cls.SERVER_ADDRESS,
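            # Caution: turning off XSRF protection lowers security;
            # re-enable it unless your hosting setup requires otherwise
            'server.enableCORS': False,
            'server.enableXsrfProtection': False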
        }
```
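
The `int(os.getenv(...))` calls above raise `ValueError` at import time if a variable is set to something that is not a number. A more forgiving reader might look like the sketch below; `env_int` and its `minimum` parameter are hypothetical, and the variable is set in-process purely for demonstration.

```python
import os


def env_int(name: str, default: int, minimum: int = 0) -> int:
    """Read an integer environment variable, falling back to default if unset or invalid."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        print(f"Ignoring {name}={raw!r}: not an integer, using {default}")
        return default
    if value < minimum:
        print(f"Ignoring {name}={value}: below {minimum}, using {default}")
        return default
    return value


os.environ["FIFA_SAMPLE_SIZE"] = "five thousand"  # deliberately malformed
print(env_int("FIFA_SAMPLE_SIZE", 10000))  # 10000
os.environ["FIFA_SAMPLE_SIZE"] = "5000"
print(env_int("FIFA_SAMPLE_SIZE", 10000))  # 5000
```

Falling back with a visible message keeps a typo in a deployment setting from crashing the app at startup while still surfacing the problem in the logs.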

### 2. Docker Configuration
```dockerfile
# Dockerfile
FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8501

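# The slim base image does not ship curl, so probe the health endpoint with Python instead
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8501/_stcore/health')"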

CMD ["streamlit", "run", "app.py", "--server.address", "0.0.0.0"]
```

### 3. Cloud Deployment Commands

#### Streamlit Cloud
```bash
# 1. Push to GitHub
git add .
git commit -m "Deploy FIFA Dashboard"
git push origin main

# 2. Connect repository in Streamlit Cloud
# 3. Set environment variables in dashboard
```

#### Heroku Deployment
```bash
# 1. Create Procfile (single quotes keep $PORT literal so Heroku expands it at runtime)
echo 'web: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0' > Procfile

# 2. Deploy to Heroku
heroku create fifa-dashboard
git push heroku main
heroku open
```

#### Docker Deployment
```bash
# 1. Build image
docker build -t fifa-dashboard .

# 2. Run container
docker run -p 8501:8501 -e FIFA_SAMPLE_SIZE=5000 fifa-dashboard

# 3. Deploy to cloud
docker tag fifa-dashboard your-registry/fifa-dashboard
docker push your-registry/fifa-dashboard
```

## 🎉 Conclusion and Next Steps

### Summary of Achievements
✅ **Complete Setup Guide**: Environment, dependencies, and data validation  
✅ **Command Reference**: All Streamlit application commands  
✅ **Performance Optimization**: Caching, sampling, and memory management  
✅ **Error Handling**: Comprehensive troubleshooting guide  
✅ **Production Ready**: Deployment configurations and best practices  

### Next Steps for Enhancement
1. **Real-time Data Integration**: Connect to live football APIs
2. **Advanced ML Models**: Implement neural networks and ensemble methods
3. **User Authentication**: Add login and personalization features
4. **Mobile Optimization**: Responsive design improvements
5. **Multi-language Support**: Internationalization features

### Resources for Continued Learning
- **Streamlit Documentation**: https://docs.streamlit.io/
- **FIFA Data Sources**: https://www.kaggle.com/datasets/stefanoleone992/fifa-23-complete-player-dataset
- **Performance Optimization**: https://docs.streamlit.io/library/advanced-features/caching
- **Deployment Guides**: https://docs.streamlit.io/streamlit-community-cloud

---

**🚀 Your FIFA Analytics Dashboard is now ready for professional deployment!**

*Happy analyzing! ⚽📊*