# ⚽ FIFA World Cup 2026 Draw Simulator

## 🎲 What group will Argentina get?

This notebook simulates the FIFA World Cup 2026 draw **respecting ALL official restrictions**:

- ✅ 12 groups (A-L) with 4 teams each
- ✅ Fixed hosts: Mexico (A), Canada (B), USA (D)
- ✅ Confederation restrictions: at most 2 UEFA teams per group, and at most 1 team from any other confederation
- ✅ Draw order: Pot 1 → Pot 4 → Pot 3 → Pot 2

---

### 🚀 Quick Start:
**Just click `Runtime → Run all` and wait ~1-2 minutes for results!**

---

📂 **Full project:** [GitHub Repository](https://github.com/aschwartz97/world-cup-2026-draw-simulation)

---

## 📦 Step 1: Setup

First, let's install dependencies and download the data files from GitHub.

In [None]:
# Install required libraries
!pip install pandas numpy -q

# Download data files from GitHub
!wget -q https://raw.githubusercontent.com/aschwartz97/world-cup-2026-draw-simulation/main/data/bombos.csv -O bombos.csv
!wget -q https://raw.githubusercontent.com/aschwartz97/world-cup-2026-draw-simulation/main/data/confederaciones.csv -O confederaciones.csv

print("✅ Setup complete!")

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import random
from collections import defaultdict, Counter
import time

print("✅ Libraries imported")

## 📊 Step 2: Load Data

Let's load the teams organized by pots and their confederations.

In [None]:
# Load data
df_pots = pd.read_csv('bombos.csv')
df_confederations = pd.read_csv('confederaciones.csv')

print("📊 POTS:")
print(f"   Total teams: {len(df_pots)}")
print(f"\n   Distribution:")
print(df_pots['Pot'].value_counts().sort_index())

print("\n🌍 CONFEDERATIONS:")
print(df_confederations['Confederation'].value_counts().sort_index())

# Show Argentina's pot
argentina_pot = df_pots[df_pots['Team'] == 'Argentina']['Pot'].values[0]
print(f"\n🇦🇷 Argentina is in: {argentina_pot}")
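
Before simulating, it can help to check that the loaded tables are consistent with the tournament structure. The sketch below is not part of the original notebook: `validate_draw_inputs` is a hypothetical helper, it assumes the `Team`, `Pot` and `Confederation` column names used in this notebook, and the toy rows are illustrative rather than the real pots.

```python
import pandas as pd

def validate_draw_inputs(df_pots, df_confederations, num_groups=12):
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems = []
    # Each pot should supply exactly one team to every group.
    for pot, count in df_pots['Pot'].value_counts().sort_index().items():
        if count != num_groups:
            problems.append(f"{pot} has {count} teams, expected {num_groups}")
    # A team listed twice would be drawn twice.
    duplicated = df_pots.loc[df_pots['Team'].duplicated(), 'Team'].tolist()
    if duplicated:
        problems.append(f"Duplicated teams: {duplicated}")
    # Teams with no confederation entry would be placed without any restriction check.
    missing = sorted(set(df_pots['Team']) - set(df_confederations['Team']))
    if missing:
        problems.append(f"No confederation for: {missing}")
    return problems

# Toy input with two groups (illustrative pot assignments, not the real ones)
toy_pots = pd.DataFrame({'Team': ['Argentina', 'Spain', 'Japan', 'Ghana'],
                         'Pot': ['Pot 1', 'Pot 1', 'Pot 2', 'Pot 2']})
toy_confs = pd.DataFrame({'Team': ['Argentina', 'Spain', 'Japan'],
                          'Confederation': ['CONMEBOL', 'UEFA', 'AFC']})
print(validate_draw_inputs(toy_pots, toy_confs, num_groups=2))
```

On the real data the call would be `validate_draw_inputs(df_pots, df_confederations)`. The missing-confederation check matters because the restriction logic in Step 5 accepts any team that has no confederation data.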

## 🔧 Step 3: Configuration

Set up the simulation parameters.

In [None]:
# Simulation parameters
NUM_SIMULATIONS = 10000  # 10K simulations (faster for demo, you can increase to 100K)
TARGET_TEAM = 'Argentina'
RANDOM_SEED = 42

# Tournament structure
NUM_GROUPS = 12
GROUPS = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']

# Confederation restrictions
MAX_UEFA_PER_GROUP = 2
MAX_OTHER_CONF_PER_GROUP = 1

# Fixed hosts
FIXED_HOSTS = {
    'Mexico': 'A',
    'Canada': 'B',
    'United States': 'D'
}

# Set random seed
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

print(f"⚙️ Configuration:")
print(f"   • Simulations: {NUM_SIMULATIONS:,}")
print(f"   • Target team: {TARGET_TEAM}")
print(f"   • Random seed: {RANDOM_SEED}")

## 🎲 Step 4: Prepare Data Structures

Create dictionaries for efficient lookup during simulation.

In [None]:
# Create confederation dictionary
conf_dict = {}
for team in df_confederations['Team'].unique():
    confederations = df_confederations[
        df_confederations['Team'] == team
    ]['Confederation'].tolist()
    conf_dict[team] = confederations

# Create pot dictionary
pot_dict = {}
for pot in df_pots['Pot'].unique():
    teams = df_pots[df_pots['Pot'] == pot]['Team'].tolist()
    pot_dict[pot] = teams

print(f"✅ Data structures ready:")
print(f"   • Teams with confederation data: {len(conf_dict)}")
print(f"   • Pots: {len(pot_dict)}")
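
The Step 6 loop finds each team's pot by filtering `df_pots`, which repeats a DataFrame scan for every team in every simulation. A reverse lookup in the same spirit as `conf_dict` and `pot_dict` could avoid that. The sketch below is an assumption, not code from the original notebook: `team_to_pot` and `pot_dict_example` are hypothetical names, and the toy pots are illustrative.

```python
# Toy stand-in for pot_dict (pot name -> list of teams); illustrative data only
pot_dict_example = {
    'Pot 1': ['Argentina', 'Spain'],
    'Pot 2': ['Japan', 'Morocco'],
}

# Invert pot -> [teams] into team -> pot once, so each later lookup is a single dict access.
team_to_pot = {team: pot for pot, teams in pot_dict_example.items() for team in teams}

print(team_to_pot['Japan'])
```

With the real data, the same comprehension over `pot_dict` would give a mapping that can replace the `df_pots[df_pots['Team'] == team]['Pot'].values[0]` lookup.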

## 🧮 Step 5: Simulation Functions

Core logic for simulating the draw.

In [None]:
def can_team_go_to_group(team, group, groups_formed, conf_dict):
    """Check if a team can be assigned to a group based on confederation restrictions."""
    team_confederations = conf_dict.get(team, [])
    
    if not team_confederations:
        return True
    
    teams_in_group = groups_formed.get(group, [])
    
    conf_counter = defaultdict(int)
    for existing_team in teams_in_group:
        existing_confs = conf_dict.get(existing_team, [])
        for conf in existing_confs:
            conf_counter[conf] += 1
    
    for conf in team_confederations:
        if conf == 'UEFA':
            if conf_counter[conf] >= MAX_UEFA_PER_GROUP:
                return False
        else:
            if conf_counter[conf] >= MAX_OTHER_CONF_PER_GROUP:
                return False
    
    return True


def get_available_groups(team, pot_name, groups_formed, groups_filled_by_pot, conf_dict):
    """Get list of groups where a team can be assigned."""
    available = []
    
    for group in GROUPS:
        if group in groups_filled_by_pot.get(pot_name, set()):
            continue
        
        if can_team_go_to_group(team, group, groups_formed, conf_dict):
            available.append(group)
    
    return available


def simulate_single_draw(df_pots, conf_dict, pot_dict, max_attempts=1000):
    """Simulate a single complete draw."""
    for attempt in range(max_attempts):
        groups_formed = {group: [] for group in GROUPS}
        groups_filled_by_pot = {
            'Pot 1': set(),
            'Pot 2': set(),
            'Pot 3': set(),
            'Pot 4': set()
        }
        
        # Pot 1: Assign fixed hosts first
        pot1_teams = pot_dict['Pot 1'].copy()
        
        for host, fixed_group in FIXED_HOSTS.items():
            if host in pot1_teams:
                groups_formed[fixed_group].append(host)
                groups_filled_by_pot['Pot 1'].add(fixed_group)
                pot1_teams.remove(host)
        
        random.shuffle(pot1_teams)
        
        for team in pot1_teams:
            available = get_available_groups(team, 'Pot 1', groups_formed, groups_filled_by_pot, conf_dict)
            if not available:
                break
            
            assigned_group = random.choice(available)
            groups_formed[assigned_group].append(team)
            groups_filled_by_pot['Pot 1'].add(assigned_group)
        else:
            # for/else: this branch runs only if every Pot 1 team was placed (no break above)
            draw_successful = True
            
            # Pots 4, 3, 2
            for pot_name in ['Pot 4', 'Pot 3', 'Pot 2']:
                pot_teams = pot_dict[pot_name].copy()
                random.shuffle(pot_teams)
                
                for team in pot_teams:
                    available = get_available_groups(team, pot_name, groups_formed, groups_filled_by_pot, conf_dict)
                    if not available:
                        draw_successful = False
                        break
                    
                    assigned_group = random.choice(available)
                    groups_formed[assigned_group].append(team)
                    groups_filled_by_pot[pot_name].add(assigned_group)
                
                if not draw_successful:
                    break
            
            if draw_successful:
                return groups_formed
    
    return None

print("✅ Simulation functions defined")
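
The restart-on-dead-end idea in `simulate_single_draw` is easier to see at toy scale. The following is a minimal sketch under simplified assumptions, not the notebook's implementation: `toy_draw`, the six made-up teams and the single "no shared confederation" rule are all hypothetical. It places teams pot by pot and starts over whenever a team has no legal group left.

```python
import random

def toy_draw(pots, conf_of, groups, max_attempts=1000, rng=random):
    """Place one team per pot in every group; start over whenever a team has no legal group."""
    for attempt in range(1, max_attempts + 1):
        assignment = {g: [] for g in groups}
        stuck = False
        for pot in pots:
            teams = list(pot)
            rng.shuffle(teams)
            open_groups = list(groups)  # groups still waiting for a team from this pot
            for team in teams:
                # Toy rule: no two teams from the same confederation in one group.
                legal = [g for g in open_groups
                         if all(conf_of[other] != conf_of[team] for other in assignment[g])]
                if not legal:
                    stuck = True  # dead end: discard this attempt
                    break
                chosen = rng.choice(legal)
                assignment[chosen].append(team)
                open_groups.remove(chosen)
            if stuck:
                break
        if not stuck:
            return assignment, attempt
    return None, max_attempts

# Hypothetical teams: the letter is the confederation, the digit is the pot
conf_of = {'A1': 'X', 'B1': 'Y', 'C1': 'Z', 'A2': 'X', 'B2': 'Y', 'C2': 'Z'}
pots = [['A1', 'B1', 'C1'], ['A2', 'B2', 'C2']]
groups = ['G1', 'G2', 'G3']

assignment, attempts = toy_draw(pots, conf_of, groups, rng=random.Random(42))
print(assignment, f"(attempts: {attempts})")
```

In this toy, a dead end occurs when the last Pot 2 team is left with only the group that already holds its own confederation. Restarting guarantees a valid draw, but it does not by itself guarantee that every valid draw is equally likely, so the resulting probabilities are best read as estimates for this particular procedure.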

## 🚀 Step 6: Run Mass Simulation

Let's simulate thousands of draws and collect statistics!

**This will take ~1-2 minutes. Please wait...**

In [None]:
print("🎲 Running simulation...\n")
print("="*70)

combinations_counter = defaultdict(int)
successful_sims = 0
failed_sims = 0

start_time = time.time()

for i in range(1, NUM_SIMULATIONS + 1):
    if i % 1000 == 0:
        elapsed = time.time() - start_time
        speed = i / elapsed
        remaining = (NUM_SIMULATIONS - i) / speed
        print(f"Progress: {i:,}/{NUM_SIMULATIONS:,} ({i/NUM_SIMULATIONS*100:.1f}%) | "
              f"Speed: {speed:.0f} sim/sec | Remaining: {remaining:.0f}s")
    
    result = simulate_single_draw(df_pots, conf_dict, pot_dict)
    
    if result:
        successful_sims += 1
        
        for group, teams in result.items():
            if TARGET_TEAM in teams:
                pot2_team = None
                pot3_team = None
                pot4_team = None
                
                for team in teams:
                    pot = df_pots[df_pots['Team'] == team]['Pot'].values[0]
                    if pot == 'Pot 2':
                        pot2_team = team
                    elif pot == 'Pot 3':
                        pot3_team = team
                    elif pot == 'Pot 4':
                        pot4_team = team
                
                combination = (pot2_team, pot3_team, pot4_team)
                combinations_counter[combination] += 1
                break
    else:
        failed_sims += 1

total_time = time.time() - start_time

print("\n" + "="*70)
print("✅ SIMULATION COMPLETED")
print("="*70)
print(f"Total time: {total_time:.2f} seconds")
print(f"Average speed: {NUM_SIMULATIONS/total_time:.0f} simulations/second")
print(f"\n📊 Statistics:")
print(f"   • Successful: {successful_sims:,} ({successful_sims/NUM_SIMULATIONS*100:.2f}%)")
print(f"   • Failed: {failed_sims:,}")
print(f"   • Unique combinations: {len(combinations_counter):,}")
print("="*70)

## 📈 Step 7: Analyze Results

Let's see the most likely groups for Argentina!

In [None]:
# Create results DataFrame
results_list = []

for combination, frequency in combinations_counter.items():
    pot2_team, pot3_team, pot4_team = combination
    probability = (frequency / successful_sims) * 100
    
    results_list.append({
        'Pot 1': TARGET_TEAM,
        'Pot 2': pot2_team,
        'Pot 3': pot3_team,
        'Pot 4': pot4_team,
        'Frequency': frequency,
        'Probability (%)': probability
    })

df_results = pd.DataFrame(results_list)
df_results = df_results.sort_values('Probability (%)', ascending=False).reset_index(drop=True)
df_results['Probability (%)'] = df_results['Probability (%)'].round(4)

print(f"✅ Results DataFrame created with {len(df_results)} combinations")
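
Each probability above is a Monte Carlo estimate, so it carries sampling error that shrinks as the number of simulations grows. One way to gauge that error, shown here as a sketch that is not part of the original notebook, is a Wilson score interval for a proportion. `wilson_interval` is a hypothetical helper and the counts in the example are made up.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)

# Made-up example: a combination seen 50 times in 10,000 successful simulations
low, high = wilson_interval(successes=50, trials=10_000)
print(f"Estimate: {50 / 10_000:.2%}, 95% interval: {low:.2%} to {high:.2%}")
```

For this made-up count the interval runs from roughly 0.38% to 0.66%, which is wide relative to the 0.50% estimate. With the notebook's variables, the call for one row would be `wilson_interval(frequency, successful_sims)`.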

### 🏆 TOP 10 Most Likely Complete Groups for Argentina

In [None]:
print("="*70)
print(f"🏆 TOP 10 MOST LIKELY GROUPS FOR {TARGET_TEAM.upper()}")
print("="*70)

top_10 = df_results.head(10)

for idx, row in top_10.iterrows():
    print(f"\n#{idx+1} - Probability: {row['Probability (%)']:.4f}%")
    print(f"   {row['Pot 1']}")
    print(f"   {row['Pot 2']}")
    print(f"   {row['Pot 3']}")
    print(f"   {row['Pot 4']}")

print("\n" + "="*70)

### 🥇 Most Frequent Opponents by Pot

In [None]:
print("="*70)
print("🥇 MOST LIKELY OPPONENTS BY POT")
print("="*70)

for pot_col in ['Pot 2', 'Pot 3', 'Pot 4']:
    print(f"\n{pot_col}:")
    
    team_frequencies = df_results.groupby(pot_col)['Frequency'].sum().sort_values(ascending=False)
    team_probabilities = (team_frequencies / successful_sims * 100).round(2)
    
    top_5 = team_probabilities.head(5)
    
    for i, (team, prob) in enumerate(top_5.items(), 1):
        print(f"   {i}. {team:40s} {prob:6.2f}%")

print("\n" + "="*70)

### 📊 Concentration Metrics

In [None]:
# Calculate concentration metrics
prob_top_1 = df_results['Probability (%)'].iloc[0]
prob_top_10 = df_results.head(10)['Probability (%)'].sum()
prob_top_20 = df_results.head(20)['Probability (%)'].sum()

print("="*70)
print("📊 CONCENTRATION METRICS")
print("="*70)
print(f"\n   • Highest probability (Top 1): {prob_top_1:.4f}%")
print(f"   • Cumulative probability Top 10: {prob_top_10:.2f}%")
print(f"   • Cumulative probability Top 20: {prob_top_20:.2f}%")
print(f"   • Total unique combinations: {len(df_results):,}")

print(f"\n💡 INTERPRETATION:")
if prob_top_20 < 10:
    print("   ⚠️ HIGH RANDOMNESS: Very difficult to predict the exact group.")
    print("   The draw creates a very wide possibility space.")
elif prob_top_20 < 20:
    print("   📊 MODERATE RANDOMNESS: Some scenarios are more likely than others.")
    print("   Most frequent teams have statistical relevance.")
else:
    print("   🎯 LOW RANDOMNESS: Strong concentration in certain scenarios.")
    print("   Results are quite predictive.")

print("\n" + "="*70)
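
The 10% and 20% thresholds are easier to judge against a baseline. As a rough sketch, not part of the original notebook: with 12 groups there are 12 teams per pot, so Argentina's opponents from Pots 2, 3 and 4 form at most 12 × 12 × 12 = 1,728 triples. If all of them were equally likely, the Top 20 would hold only a small share. This ignores the confederation restrictions, which rule some triples out, and `uniform_top_k_share` is a hypothetical helper.

```python
def uniform_top_k_share(teams_per_pot, k, pots_drawn=3):
    """Cumulative probability (%) of any k combinations if all were equally likely."""
    total_combinations = teams_per_pot ** pots_drawn
    return min(k, total_combinations) / total_combinations * 100

# 12 groups means 12 teams per pot; opponents come from three pots.
print(f"Possible triples: {12 ** 3}")
print(f"Uniform Top 20 share: {uniform_top_k_share(12, 20):.2f}%")
```

The uniform Top 20 share comes out near 1.16%. With 10,000 simulations spread over as many as 1,728 combinations, each combination is observed only a handful of times on average, so the highest observed frequencies are likely to overstate the true probabilities somewhat.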

## 🎯 Summary

### Key Findings:

1. **The draw is highly unpredictable** - No single combination dominates
2. **UEFA teams are more likely** in Pot 2 due to confederation flexibility
3. **CONCACAF teams dominate** Pots 3 and 4 (Panama, Haiti, Curaçao)
4. **Argentina could face very diverse groups** - From easy to extremely difficult

---

### 🔗 Want to run more simulations?

Check out the full project with 100,000+ simulations:

**GitHub Repository:** https://github.com/aschwartz97/world-cup-2026-draw-simulation

---

### 📧 Questions or feedback?

Open an issue on GitHub or contact me:
- Twitter: @ari_schwartz
- LinkedIn: /arielschwartz97

---

**⭐ If you found this useful, please star the repository!**