# Artifact 5

## Building off a simple scheduling problem

I chose this problem because, as the NHL season comes to an end, I realized how much of an advantage it is to play teams in your own conference late in the season. As the season progresses, the standings can become very tight, especially within the conferences, which directly determine playoff qualification and seeding. Playing direct rivals toward the end of the season gives teams greater control over their playoff destinies: winning these games can directly improve their position in the standings or knock out close competitors.

To find the schedule that has each team play the teams in its conference as late as possible, I came up with a very simple example.

Let's consider 4 teams:

a) Vancouver <br>
b) Toronto <br>
c) Boston <br>
d) Chicago <br>

Let's consider two divisions:

1) Canadian Division (Vancouver, Toronto)
2) American Division (Boston, Chicago)

For simplicity, let's say each team plays the other team in its division twice and each team in the other division once. Hence each team plays 4 games. The team with the most wins makes it to the championship.

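The objective is to maximize the total preference score of the schedule.

Our decision is which teams play each other in each week. Let $x_{ijw} = 1$ if teams $i$ and $j$ play each other in week $w$, and 0 otherwise; for example, $x_{VT1} = 1$ means Vancouver plays Toronto in week 1.

Let's first write our constraints formally. Teams in the same division meet twice and teams in different divisions meet once:

$x_{VT1} + x_{VT2} + x_{VT3} + x_{VT4} = 2$ <br>
$x_{BC1} + x_{BC2} + x_{BC3} + x_{BC4} = 2$ <br>
$x_{VB1} + x_{VB2} + x_{VB3} + x_{VB4} = 1$ <br>
$x_{VC1} + x_{VC2} + x_{VC3} + x_{VC4} = 1$ <br>
$x_{TB1} + x_{TB2} + x_{TB3} + x_{TB4} = 1$ <br>
$x_{TC1} + x_{TC2} + x_{TC3} + x_{TC4} = 1$ <br>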

The following constraint ensures that a team plays exactly one game per week (here Vancouver in the first week).
The same constraint needs to be set up for every team and every week: <br>
$x_{VT1} + x_{VB1} + x_{VC1} = 1$ <br>
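
Writing the weekly constraint out for every team and week by hand is repetitive, so here is a minimal sketch that generates all 16 of them as text. The single-letter team codes and the `pairs` list are shorthand taken from the equations above; the function name `weekly_constraint` is just an illustrative choice.

```python
teams = ["V", "T", "B", "C"]  # Vancouver, Toronto, Boston, Chicago
weeks = [1, 2, 3, 4]
# Unordered pairs, named as in the constraints above.
pairs = ["VT", "BC", "VB", "VC", "TB", "TC"]

def weekly_constraint(team, week):
    """Return the 'exactly one game this week' constraint for one team."""
    terms = [f"x_{pair}{week}" for pair in pairs if team in pair]
    return " + ".join(terms) + " = 1"

# One constraint per (week, team) combination: 4 weeks x 4 teams = 16.
constraints = [weekly_constraint(team, week) for week in weeks for team in teams]
for line in constraints:
    print(line)
```

The first printed line is the Vancouver, week 1 constraint shown above.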

Now we need to assign preference scores. The later in the season two teams from the same division meet, the bigger the advantage, so let's assign preference scores that double each week:

If teams in the same division play in the first week: score = 1 <br>
If teams in the same division play in the second week: score = 2 <br>
If teams in the same division play in the third week: score = 4 <br>
If teams in the same division play in the fourth week: score = 8 <br>


This gives us the following objective function:

$\text{maximize } x_{VT1} + 2x_{VT2} + 4x_{VT3} + 8x_{VT4} + x_{BC1} + 2x_{BC2} + 4x_{BC3} + 8x_{BC4}$
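
Since the scores double every week, the same objective can be written more compactly. This is only a restatement; the symbols $D$ (the set of same-division pairs) and $W$ (the number of weeks) are introduced here for illustration:

$$\text{maximize} \sum_{p \in D} \sum_{w=1}^{W} 2^{\,w-1} \, x_{p,w}, \qquad D = \{VT, BC\}, \quad W = 4$$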



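Let's solve this in Python. The problem is small enough that the code below simply enumerates every possible schedule and keeps the best one; PuLP is imported here for use later on.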

In [10]:
import pandas as pd
from itertools import permutations
import pulp as pl
import random

In [2]:
# Teams and divisions (A = Vancouver, B = Toronto, C = Boston, D = Chicago)
teams = ['A', 'B', 'C', 'D']
divisions = {('A', 'B'), ('B', 'A'), ('C', 'D'), ('D', 'C')}
weeks = [1, 2, 3, 4]  # Weeks of the season

# Preference scores for intra-division matches
preference_scores = {1: 1, 2: 2, 3: 4, 4: 8}

# All possible matches, including intra-division matches twice and inter-division matches once
matches = [('A', 'B'), ('A', 'B'), ('C', 'D'), ('C', 'D'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]

def valid_schedule(schedule):
    """ Check if a schedule is valid: each team plays exactly once per week """
    for week in schedule:
        teams_this_week = [team for match in week for team in match]
        if len(set(teams_this_week)) != 4:
            return False  # Some team appears more than once in a week or is missing
    return True

def generate_schedules():
    """ Generate all possible schedules of matches across 4 weeks """
    for potential_schedule in permutations(matches):
        # Split the flat permutation into weeks of 2 matches each
        schedule = [potential_schedule[i:i+2] for i in range(0, len(potential_schedule), 2)]
        if valid_schedule(schedule):
            yield schedule

def score_schedule(schedule):
    """ Calculate the preference score for a given schedule """
    score = 0
    for week_number, week_matches in enumerate(schedule, start=1):
        for match in week_matches:
            if match in divisions:
                score += preference_scores[week_number]
    return score

# Evaluate all valid schedules and find the one with the highest score
best_score = 0
best_schedule = None
for schedule in generate_schedules():
    score = score_schedule(schedule)
    if score > best_score:
        best_score = score
        best_schedule = schedule

# Print the best schedule and its score
if best_schedule:
    print("Best Schedule:")
    for week_index, week in enumerate(best_schedule, start=1):
        games = ", ".join(f"{match[0]} vs {match[1]}" for match in week)
        print(f"Week {week_index}: {games}")
    print("Total Preference Score:", best_score)
else:
    print("No valid schedule found.")


Best Schedule:
Week 1: A vs C, B vs D
Week 2: A vs D, B vs C
Week 3: A vs B, C vs D
Week 4: A vs B, C vs D
Total Preference Score: 24
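
For comparison, the formal model written out earlier can also be handed to PuLP directly instead of enumerating permutations. The following is a minimal sketch under a few assumptions: it uses the team initials V, T, B, C from the equations rather than the A to D labels of the code above, and it relies on the CBC solver that ships with PuLP.

```python
import pulp as pl

teams = ["V", "T", "B", "C"]  # Vancouver, Toronto, Boston, Chicago
weeks = [1, 2, 3, 4]
# Number of meetings required for each unordered pair.
games_required = {"VT": 2, "BC": 2, "VB": 1, "VC": 1, "TB": 1, "TC": 1}
intra_division = ["VT", "BC"]
score = {1: 1, 2: 2, 3: 4, 4: 8}

prob = pl.LpProblem("small_schedule", pl.LpMaximize)
# x[pair][week] = 1 if that pair plays in that week.
x = pl.LpVariable.dicts("x", (list(games_required), weeks), cat="Binary")

# Objective: reward same-division games more the later they are played.
prob += pl.lpSum(score[w] * x[p][w] for p in intra_division for w in weeks)

# Each pair meets the required number of times over the season.
for p, n in games_required.items():
    prob += pl.lpSum(x[p][w] for w in weeks) == n

# Each team plays exactly one game per week.
for t in teams:
    for w in weeks:
        prob += pl.lpSum(x[p][w] for p in games_required if t in p) == 1

prob.solve(pl.PULP_CBC_CMD(msg=False))
print(pl.LpStatus[prob.status], pl.value(prob.objective))
for w in weeks:
    print(f"Week {w}:", [p for p in games_required if x[p][w].value() > 0.5])
```

If the model is right, the solver should reach the same total score as the enumeration, with both same-division pairings pushed into weeks 3 and 4.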


## Same idea, larger problem

I wanted to apply the same approach to more data. I gave my code to ChatGPT and asked it to generate a version that works on a larger set of teams. However, I stopped the run because it was taking far too long. I wasn't able to find a way to make it run faster, so I took another approach for the bigger set of teams.

In [11]:
from itertools import permutations

# Define teams and their conferences
eastern_conference = ['Boston Bruins', 'New York Rangers', 'Toronto Maple Leafs', 'Montreal Canadiens']
western_conference = ['Los Angeles Kings', 'San Jose Sharks', 'Chicago Blackhawks', 'Vegas Golden Knights']
teams = eastern_conference + western_conference

# Define intra-conference divisions
eastern_divisions = {(a, b) for a in eastern_conference for b in eastern_conference if a != b}
western_divisions = {(a, b) for a in western_conference for b in western_conference if a != b}
divisions = eastern_divisions.union(western_divisions)

# Number of weeks in the season
weeks = list(range(1, 9))  

# Preference scores for intra-conference matches
preference_scores = {week: 2**week for week in weeks}

# Generate all possible matches
matches = []
for i in teams:
    for j in teams:
        if i != j:
            matches.append((i, j))

def valid_schedule(schedule):
    """ Check if a schedule is valid: each team plays exactly once per week """
    for week in schedule:
        teams_this_week = [team for match in week for team in match]
        if len(set(teams_this_week)) != len(teams):
            return False  # Some team appears more than once in a week or is missing
    return True

def generate_schedules():
    """ Generate all possible schedules of matches across the season """
    for potential_schedule in permutations(matches, 4 * len(weeks)):  # 4 matches per week for 8 teams
        schedule = [potential_schedule[i:i+4] for i in range(0, len(potential_schedule), 4)]
        if valid_schedule(schedule):
            yield schedule

def score_schedule(schedule):
    """ Calculate the preference score for a given schedule """
    score = 0
    for week_number, week_matches in enumerate(schedule, start=1):
        for match in week_matches:
            if match in divisions:
                score += preference_scores[week_number]
    return score

# Evaluate all valid schedules and find the one with the highest score
best_score = 0
best_schedule = None
for schedule in generate_schedules():
    score = score_schedule(schedule)
    if score > best_score:
        best_score = score
        best_schedule = schedule

# Print the best schedule and its score
print("Best Schedule:")
for week_index, week in enumerate(best_schedule, start=1):
    games = ", ".join(f"{match[0]} vs {match[1]}" for match in week)
    print(f"Week {week_index}: {games}")
print("Total Preference Score:", best_score)


KeyboardInterrupt: 

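The run time is extremely long because the algorithm is very inefficient: it enumerates every ordering of matches and only afterwards checks whether the schedule is valid, so the number of candidates grows factorially with the number of matches. I looked into how to replace this algorithm with something that runs faster and could be used for every team in the NHL.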
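
To get a feel for why the run never finishes, one can count how many candidate schedules each brute-force loop passes to `valid_schedule`. This is a small sketch using only the standard library; the variable names are illustrative, and the counts follow from the `permutations(...)` calls in the two code cells above.

```python
import math

# 4-team example: permutations of the 8 listed matches.
small_candidates = math.factorial(8)

# 8-team example: ordered selections of 32 matches (4 per week, 8 weeks)
# out of the 8 * 7 = 56 ordered (team, team) pairs.
n_matches = 8 * 7
slots = 4 * 8
large_candidates = math.perm(n_matches, slots)

print(f"4-team search space: {small_candidates:,}")
print(f"8-team search space: {large_candidates:.3e}")
```

The second count is on the order of $10^{51}$, so no amount of waiting would let the enumeration finish.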

## Another way to optimize schedules that works better with big data

Realistically, I think it is hard to schedule based on when it is best to play each team. Easier constraints to work with are the amount of rest between games and the travel distance between teams.

I have set up a problem that is much easier to run.

I take all the teams in the NHL and split them by conference.

I want to minimize the total travel distance across all teams.

The constraints are as follows:

- Every team must play each team in its own conference 4 times: 2 of those games at home and 2 away.
- Every team must play each team in the other conference 2 times: 1 of those games at home and 1 away.
- There must be a minimum of 3 days of rest between games for every team.
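
As a quick sanity check on these rules, the arithmetic below works out how many games they imply and how long a season would need to be if the rest rule were enforced for each team across all of its games. It is a back-of-the-envelope sketch, assuming 16 teams per conference as in the team lists used in this section; the variable names are illustrative.

```python
teams_per_conference = 16
rest_days = 3

intra_games = 4 * (teams_per_conference - 1)  # 4 games vs each of 15 conference rivals
inter_games = 2 * teams_per_conference        # 2 games vs each of 16 other-conference teams
games_per_team = intra_games + inter_games

# Every game involves two teams, so halve the per-team total.
total_games = 2 * teams_per_conference * games_per_team // 2

# With one game every (rest_days + 1) days at best, the shortest possible season
# starts on day 1 and ends on the day of the last game.
min_season_days = (games_per_team - 1) * (rest_days + 1) + 1

print(games_per_team, total_games, min_season_days)
```

Under these assumptions a strict per-team rest rule needs about a year of calendar days, which is worth keeping in mind when choosing the number of days in the model.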

Let's write this problem formally.

Our decision variable is $x_{i, j, k}$, where the first index $i$ is the home team, the second index $j$ is the away team, and $k$ is the day index.

$x_{i, j, k} = 1$ if team $i$ hosts team $j$ on day $k$, and 0 otherwise.

Our objective function, with $d_{i, j}$ the travel distance for a game hosted by team $i$ against team $j$, becomes:

$$\text{minimize} \sum_{i, j, k} d_{i, j} \, x_{i, j, k}$$


Our constraints are:

4 games between every pair of teams in the same conference:

$$\sum_{k} (x_{ijk} + x_{jik}) = 4 \quad \text{for all } i \neq j \text{ in the same conference}$$

2 games between every pair of teams in different conferences:

$$\sum_{k} (x_{ijk} + x_{jik}) = 2 \quad \text{for all } i, j \text{ in different conferences}$$



### Home and Away Constraints
Each team must play half of its games against each opponent at home and the other half away. The constraints differ based on whether the games are intra-conference or inter-conference.

- **Intra-Conference Games**:
  For every pair of teams $i \neq j$ in the same conference, team $i$ hosts team $j$ twice:
  $$\sum_{k} x_{ijk} = 2$$

- **Inter-Conference Games**:
  For every pair of teams $i$ and $j$ in different conferences, team $i$ hosts team $j$ once:
  $$\sum_{k} x_{ijk} = 1$$
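
One consequence of the home and away constraints is worth checking: they fix how many times each team hosts each opponent, so the sum of $d_{i,j} x_{i,j,k}$ appears to take the same value for every schedule that satisfies them, regardless of which days are chosen. The sketch below illustrates this on a hypothetical 4-team league with placeholder distances; the team names, helper functions, and seeds are all assumptions made for the example.

```python
import random

# Hypothetical 4-team league with two conferences of two teams each.
conference = {"A": "East", "B": "East", "C": "West", "D": "West"}
teams = list(conference)

rng = random.Random(0)  # placeholder distances, keyed by (home, away)
dist = {(i, j): rng.randint(300, 3000) for i in teams for j in teams if i != j}

def home_games(i, j):
    """Number of times team i hosts team j under the home/away constraints."""
    return 2 if conference[i] == conference[j] else 1

def random_schedule(rng, n_days=200):
    """Place every required game on arbitrary days (rest rules ignored)."""
    return [(i, j, day)
            for (i, j) in dist
            for day in rng.sample(range(n_days), home_games(i, j))]

def objective(schedule):
    """Sum of d_ij over all scheduled games, i.e. the sum of d_ij * x_ijk."""
    return sum(dist[i, j] for i, j, _ in schedule)

value_1 = objective(random_schedule(random.Random(1)))
value_2 = objective(random_schedule(random.Random(2)))
fixed = sum(d * home_games(i, j) for (i, j), d in dist.items())
print(value_1, value_2, fixed)  # the three values agree
```

If this holds for the full model too, the solver is effectively only looking for a feasible schedule. For distances to influence the result, the model would probably need to track road trips, where a team travels from the city of its previous game rather than from home.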

### Rest Constraints
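To ensure adequate rest, there should be at least 3 days of rest between games for each team. The constraint used in the code is written per home/away pairing:

$$x_{i,j,k} + x_{i,j,k+1} + x_{i,j,k+2} + x_{i,j,k+3} \leq 1 \quad \text{for all } i, j, k$$

As written, this only prevents team $i$ from hosting the same opponent $j$ more than once in any 4 consecutive days; it does not limit games against other opponents. To enforce the rest for each team as a whole, the sum has to run over all opponents and over both home and away games:

$$\sum_{d=k}^{k+3} \sum_{j \neq i} (x_{i,j,d} + x_{j,i,d}) \leq 1 \quad \text{for all } i, k$$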

### Schedule Feasibility
A team cannot be in two places at once, so each team can play at most one game per day:

- **Single Game per Team per Day**:
  $$\sum_{j \neq i} (x_{ijk} + x_{jik}) \leq 1 \quad \text{for all } i, k$$

This ensures that each team has at most one game scheduled on any given day, whether as the home or away team. The full-league code in this section does not add this constraint yet, which is why its output shows the same team in several games on the same day.
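
To show what a per-team version of the rest and single-game rules could look like in PuLP, here is a minimal sketch on a hypothetical 4-team league. The team names, the 40-day horizon, and the seeded placeholder distances are assumptions made for the example. A window of `REST_DAYS + 1` consecutive days with at most one game per team also implies at most one game per day, so no separate daily constraint is added.

```python
import random
import pulp as pl

# Hypothetical tiny league: two conferences of two teams each.
conference = {"A": "East", "B": "East", "C": "West", "D": "West"}
teams = list(conference)
days = list(range(40))  # assumed horizon, long enough for 8 games per team
REST_DAYS = 3           # days of rest required between two games of a team

rng = random.Random(0)  # placeholder distances, keyed by (home, away)
dist = {(i, j): rng.randint(300, 3000) for i in teams for j in teams if i != j}

prob = pl.LpProblem("tiny_league", pl.LpMinimize)
x = pl.LpVariable.dicts("match", (teams, teams, days), cat="Binary")

prob += pl.lpSum(dist[i, j] * x[i][j][k]
                 for i in teams for j in teams if i != j for k in days)

# Home games: 2 per ordered pair inside a conference, 1 across conferences.
for i in teams:
    for j in teams:
        if i != j:
            home_games = 2 if conference[i] == conference[j] else 1
            prob += pl.lpSum(x[i][j][k] for k in days) == home_games

# Per-team rest: at most one game (home or away, against any opponent)
# in every window of REST_DAYS + 1 consecutive days.
for t in teams:
    for k in days[:len(days) - REST_DAYS]:
        prob += pl.lpSum(x[t][o][d] + x[o][t][d]
                         for o in teams if o != t
                         for d in range(k, k + REST_DAYS + 1)) <= 1

prob.solve(pl.PULP_CBC_CMD(msg=False))
print(pl.LpStatus[prob.status])

schedule = sorted((k, i, j) for i in teams for j in teams if i != j
                  for k in days if x[i][j][k].value() > 0.5)
for k, home, away in schedule:
    print(f"Day {k}: {away} at {home}")
```

Scaling this to the full league would need a much longer horizon than the tiny example, since every team then has far more games to space out.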



In [6]:
# Team information (Conferences and Divisions)
eastern_teams = [
    "Tampa Bay Lightning", "Toronto Maple Leafs", "Boston Bruins", "Montreal Canadiens",
    "Florida Panthers", "Buffalo Sabres", "Detroit Red Wings", "Ottawa Senators",
    "New York Islanders", "New York Rangers", "New Jersey Devils", "Philadelphia Flyers",
    "Pittsburgh Penguins", "Washington Capitals", "Carolina Hurricanes", "Columbus Blue Jackets"
]
western_teams = [
    "Chicago Blackhawks", "St. Louis Blues", "Nashville Predators", "Minnesota Wild",
    "Dallas Stars", "Winnipeg Jets", "Colorado Avalanche", "Arizona Coyotes",
    "Edmonton Oilers", "Calgary Flames", "San Jose Sharks", "Los Angeles Kings",
    "Anaheim Ducks", "Vancouver Canucks", "Vegas Golden Knights", "Seattle Kraken"
]

teams = eastern_teams + western_teams
team_indices = {team: idx for idx, team in enumerate(teams)}

# Days in the season
days = list(range(200))

# Example of distances (in miles, approximations)
# Normally, you would use actual data here. This is a placeholder approach.
distances = {}
for team1 in teams:
    for team2 in teams:
        if team1 != team2:
            # Assign a random distance between 300 to 3000 miles
            distances[(team1, team2)] = random.randint(300, 3000)
# Problem
prob = pl.LpProblem("NHL_Travel_Minimization", pl.LpMinimize)

# Decision Variables
x = pl.LpVariable.dicts("match", (teams, teams, days), cat='Binary')

# Objective
prob += pl.lpSum(x[i][j][k] * distances[(i, j)] 
                 for i in teams for j in teams if i != j for k in days)

# Constraints
for i in teams:
    for j in teams:
        if i != j:
            conference_games = 4 if (i in eastern_teams and j in eastern_teams) or (i in western_teams and j in western_teams) else 2
            home_games = 2 if conference_games == 4 else 1
            # Match constraints
            prob += pl.lpSum(x[i][j][k] + x[j][i][k] for k in days) == conference_games
            # Home and away constraints
            prob += pl.lpSum(x[i][j][k] for k in days) == home_games
            
            # Rest constraints (per home/away pairing only: i hosts j at most once in any 4 consecutive days)
            for k in days[:-3]:
                prob += x[i][j][k] + x[i][j][k+1] + x[i][j][k+2] + x[i][j][k+3] <= 1

# Solve Problem
prob.solve()

# Output results: print every match variable that the solver switched on
for v in prob.variables():
    if v.varValue is not None and v.varValue > 0.5:
        print(v.name, "=", v.varValue)

match_Anaheim_Ducks_Arizona_Coyotes_1 = 1.0
match_Anaheim_Ducks_Arizona_Coyotes_100 = 1.0
match_Anaheim_Ducks_Boston_Bruins_0 = 1.0
match_Anaheim_Ducks_Buffalo_Sabres_1 = 1.0
match_Anaheim_Ducks_Calgary_Flames_1 = 1.0
match_Anaheim_Ducks_Calgary_Flames_100 = 1.0
match_Anaheim_Ducks_Carolina_Hurricanes_0 = 1.0
match_Anaheim_Ducks_Chicago_Blackhawks_1 = 1.0
match_Anaheim_Ducks_Chicago_Blackhawks_10 = 1.0
match_Anaheim_Ducks_Colorado_Avalanche_1 = 1.0
match_Anaheim_Ducks_Colorado_Avalanche_100 = 1.0
match_Anaheim_Ducks_Columbus_Blue_Jackets_0 = 1.0
match_Anaheim_Ducks_Dallas_Stars_1 = 1.0
match_Anaheim_Ducks_Dallas_Stars_100 = 1.0
match_Anaheim_Ducks_Detroit_Red_Wings_1 = 1.0
match_Anaheim_Ducks_Edmonton_Oilers_1 = 1.0
match_Anaheim_Ducks_Edmonton_Oilers_10 = 1.0
match_Anaheim_Ducks_Florida_Panthers_1 = 1.0
match_Anaheim_Ducks_Los_Angeles_Kings_1 = 1.0
match_Anaheim_Ducks_Los_Angeles_Kings_10 = 1.0
match_Anaheim_Ducks_Minnesota_Wild_1 = 1.0
match_Anaheim_Ducks_Minnesota_Wild_100 = 1.0
...

How to interpret the results:

match_Anaheim_Ducks_Arizona_Coyotes_1 = 1.0

The first team (Anaheim_Ducks) is the home team and the second (Arizona_Coyotes) is the away team. The trailing 1 is the day index (days are numbered from 0, so this is the second day), and the value 1.0 means that the game is scheduled on that day.

## Improvements

Ideally, I wanted to include team preferences in the later optimization problem. Unfortunately, I had a lot of trouble implementing that.

Additionally, I have rest-day constraints but no constraint on the time period within which all the games have to be played.

Lastly, I am using random distances in order to focus on the optimization problem. Next steps would include getting the actual distances.
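
One possible way to replace the random distances is to compute great-circle distances from coordinates with the haversine formula. The sketch below assumes approximate city-centre coordinates for three teams, purely for illustration; real arena coordinates for all 32 teams would have to be collected separately, and `haversine_miles` is a helper written for this example rather than a library function.

```python
import math

# Approximate city-centre coordinates (latitude, longitude in degrees).
# Illustrative values only, not exact arena locations.
team_coords = {
    "Toronto Maple Leafs": (43.65, -79.38),
    "Vancouver Canucks": (49.28, -123.12),
    "Boston Bruins": (42.36, -71.06),
}

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

# Same (team1, team2) -> distance layout as the random placeholder dictionary.
distances = {
    (t1, t2): haversine_miles(c1, c2)
    for t1, c1 in team_coords.items()
    for t2, c2 in team_coords.items()
    if t1 != t2
}

for (t1, t2), d in distances.items():
    print(f"{t1} -> {t2}: {d:.0f} miles")
```

Unlike the random placeholders, these distances are symmetric, so the trip from one city to another costs the same in both directions.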