In [2]:
import numpy as np
import random
import cvxpy as cp
import matplotlib.pyplot as plt



The following is a summary of a research paper:
 
Beveridge, Andrew, and Stan Wagon. “The Sorting Hat Goes to College.” Mathematics Magazine, vol. 87, no. 4, Oct. 2014, pp. 243–251, https://doi.org/10.4169/math.mag.87.4.243. 

## Problem 

Assign first-year students to their desired courses (about 500 students must be placed into about 35 first-year courses), maximizing overall student satisfaction subject to each student being assigned to a course that they ranked. 

## Solution

### Hungarian Algorithm (minimum weight perfect matching algorithm)

A weighted bipartite graph has n vertices in each part. A perfect matching pairs every vertex in one part with exactly one vertex in the other, and the Hungarian algorithm finds one of minimum total weight. The approach has two steps:

- create a bipartite graph that enforces the constraints on class size and demographics
- choose a weighting scheme that ensures trade-offs are consistent with the goals and priorities of the AP office

The Hungarian algorithm works well, but sometimes the constraints cannot be met with it. Integer linear programming (ILP) works better because it handles a much wider variety of constraints.
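The matching formulation can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' code. Each course is expanded into one vertex per seat, which turns capacities into a one-to-one matching. A ranked course costs `alpha**rank` (1, α, α², α³, matching the ILP objective), and an unranked course costs a hypothetical large `penalty`. The solver is the textbook Hungarian method with dual potentials for a rectangular cost matrix (rows <= columns, i.e. at least as many seats as students). Demographic constraints are not modeled here, and that gap is where the ILP becomes more convenient.

```python
# Sketch (our assumptions, not the paper's code): capacities are handled by
# giving each course one column per seat; ranked courses cost alpha**rank,
# unranked courses cost a hypothetical large `penalty`.

def hungarian(cost):
    """Min-cost assignment for a rows <= cols cost matrix; returns col per row."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-based)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by delta
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            col[p[j] - 1] = j - 1
    return col


def assign_with_capacities(prefs, caps, alpha=2, penalty=10**6):
    """Return the course of each student; prefs[i] lists courses best-first."""
    seats = [j for j, cap in enumerate(caps) for _ in range(cap)]  # seat -> course
    cost = [[alpha ** P_i.index(j) if j in P_i else penalty for j in seats]
            for P_i in prefs]
    return [seats[k] for k in hungarian(cost)]


# Preferences and capacities of the notebook's first CVXPY toy example
prefs = [[1, 0, 2, 3], [0, 1, 2, 3], [2, 1, 0, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
print(assign_with_capacities(prefs, caps=[5, 4, 3, 2]))  # [1, 0, 2, 0, 3]
```

Duplicating seats keeps the graph bipartite with a one-to-one matching. The cost is that the column count grows to the total number of seats, which is still small (about 550) at the paper's scale.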


### Integer Programming problem

**Parameters:**
- students $S_i, 1 \leq i \leq n$
- courses $C_j, 1 \leq j \leq m$
- M, F, I: the sets of male, female, and international students
- preferences $P_i$: the set of courses ranked by student $S_i$
- $X_j = \{i : j \in P_i\}$: the set of students who list $C_j$ among their choices
 
\* In the worst case, no student ranks $C_j$, so $X_j = \emptyset$.
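The sets $X_j$ can be built directly from the preference lists. The sketch below uses a hypothetical helper `build_X` and made-up data. It also shows why the formulation only defines $x_{i,j}$ for ranked pairs: the number of variables is the total length of the preference lists, not $n \cdot m$.

```python
def build_X(prefs, m):
    """X[j] = ids of the students i with j in P_i (students who ranked C_j)."""
    X = [[] for _ in range(m)]
    for i, P_i in enumerate(prefs):
        for j in P_i:
            X[j].append(i)
    return X


# Hypothetical toy data: 3 students, each ranking 2 of 4 courses
prefs = [[1, 0], [0, 2], [2, 1]]
X = build_X(prefs, m=4)
print(X)                          # [[0, 1], [0, 2], [1, 2], []]  (X_3 is empty)
print(sum(len(Xj) for Xj in X))   # 6 variables x_{i,j}, instead of 3 * 4 = 12
```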

**Decision Variables**
- $x_{i,j}$: for each pair $(S_i, C_j)$ with $j \in P_i$, $x_{i,j} = 1$ if $S_i$ is assigned to $C_j$ and 0 otherwise, so $x_{i,j} \in \{0,1\}$
- each student is assigned to exactly one ranked course: $\sum_{j \in P_i} x_{i,j} = 1$ for each $i \leq n$

**Objective Function**
$$\text{min } \sum_{i=1}^n x_{i,j_{i,1}} + \alpha x_{i,j_{i,2}} + \alpha^2 x_{i,j_{i,3}}  + \alpha^3 x_{i,j_{i,4}}$$ 
where $j_{i,1}, ..., j_{i,4}$ are student $S_i$'s four preferences and $\alpha > 1$ 

* $\alpha$ is chosen according to the trade-offs involved in swapping students between classes.
The weight of placing a student in a course increases the further that course is from the student's first choice.
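For a fixed assignment, the objective is simple to evaluate. The hypothetical helper below takes rank $r$ (1-based) to weight $\alpha^{r-1}$, as in the formula above. The toy preferences are the ones used in the notebook's first CVXPY example.

```python
def assignment_cost(prefs, assignment, alpha):
    """Objective value: a student in their r-th choice contributes alpha**(r-1)."""
    total = 0
    for P_i, j in zip(prefs, assignment):
        total += alpha ** P_i.index(j)  # ValueError if course j is unranked
    return total


prefs = [[1, 0, 2, 3], [0, 1, 2, 3], [2, 1, 0, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
print(assignment_cost(prefs, [1, 0, 2, 0, 3], alpha=2))  # 5: everyone gets 1st choice
print(assignment_cost(prefs, [0, 0, 2, 0, 3], alpha=2))  # 6: student 0 gets 2nd choice
```

With $\alpha = 2$, demoting one student from 1st to 3rd choice adds $\alpha^2 - 1 = 3$. That is the same as demoting three students from 1st to 2nd choice ($3(\alpha - 1) = 3$). A larger $\alpha$ makes the single deep demotion relatively more expensive.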

**Constraints**
1. Each class size should be from 10 to 16 students
2. The demographics of each class should be roughly comparable to the entire student body (60% female and 11% international student)
* Sometimes these constraints make the problem infeasible, in which case some students are placed into a course they did not rank.

**Course size constraint**

The number of students enrolled in each course $C_j$ must lie between L and U:
- $L \leq \sum_{i \in X_j} x_{i,j} \leq U$ for each $j \leq m$

Typically $(L, U) = (12, 16)$, so a course has 12 to 16 students. However, some courses are commonly allowed to have more than 16 or fewer than 12 students. To model this, we introduce binary variables $q_{j,s}$ for $9 \leq s \leq 17$, where $q_{j,s} = 1$ if $C_j$ has exactly $s$ students and 0 otherwise. Let $Q_{17}, Q_{11}, Q_{10}, Q_9$ be the maximum numbers of courses allowed to have 17, 11, 10, and 9 students, respectively. These limits become:
* $\sum_j q_{j,17} \leq Q_{17}$ 
* $\sum_j q_{j,11} \leq Q_{11}$ 
* $\sum_j q_{j,10} \leq Q_{10}$ 
* $\sum_j q_{j,9} \leq Q_{9}$ 

Each course must take exactly one size between 9 and 17:
$$\sum_{s=9}^{17} q_{j,s} = 1 \quad \forall j$$

The enrollment of $C_j$ is then tied to the selected size. Because exactly one $q_{j,s}$ equals 1, the sum $\sum_{s} s\,q_{j,s}$ equals that size $s$:
$$\sum_{i \in X_j} x_{i,j} = \sum_{s=9}^{17} sq_{j,s} \quad \forall j$$ 
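For fixed enrollments the indicators $q_{j,s}$ are fully determined, so the three constraint families can be checked directly. The sketch below is plain Python with hypothetical helper names. The caps in `Q` are made-up values, not the paper's. In the actual ILP the solver chooses `q` together with `x`.

```python
SIZES = range(9, 18)  # allowed course sizes s = 9, ..., 17


def size_indicators(enrollment):
    """q[j][s] = 1 if course j enrolls exactly s students, else 0."""
    return [{s: int(e == s) for s in SIZES} for e in enrollment]


def satisfies_size_model(enrollment, Q):
    """Check the q-constraints for fixed enrollments; Q maps size s -> Q_s."""
    q = size_indicators(enrollment)
    one_size = all(sum(qj.values()) == 1 for qj in q)        # sum_s q_{j,s} = 1
    linked = all(e == sum(s * v for s, v in qj.items())      # enrollment = sum_s s*q_{j,s}
                 for e, qj in zip(enrollment, q))
    capped = all(sum(qj[s] for qj in q) <= cap               # sum_j q_{j,s} <= Q_s
                 for s, cap in Q.items())
    return one_size and linked and capped


Q = {17: 2, 11: 1, 10: 1, 9: 0}  # hypothetical caps Q_17, Q_11, Q_10, Q_9
print(satisfies_size_model([16, 14, 17, 12], Q))  # True
print(satisfies_size_model([17, 17, 17, 12], Q))  # False: three courses of size 17
```

An enrollment outside 9..17 (for example 18) has no indicator set to 1, so it fails $\sum_s q_{j,s} = 1$. This is how the encoding also enforces the overall size bounds.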


**Gender constraint**

Each course has at least 4 male and 4 female students:

* $\sum_{i \in M \cap X_j} x_{i,j} \geq 4$
* $\sum_{i \in F \cap X_j} x_{i,j} \geq 4$

The idea used for the course size constraint can also be applied to the gender constraint, to loosen the requirement of at least 4 males and 4 females.

The paper introduces binary variables $y_{m,j}$, where $y_{m,j} = 1$ if $C_j$ has exactly m male students and 0 otherwise. Each course takes exactly one male count:
$$\sum_{m=0}^{17} y_{m,j} = 1 \quad \forall j$$
and the number of male students enrolled in $C_j$ equals that count:
$$\sum_{i \in M \cap X_j} x_{i,j} = \sum_{m=0}^{17} my_{m,j} \quad \forall j$$

Similarly, a separate set of indicators $y_{f,j}$ records the number f of female students in a course:
$$\sum_{i \in F \cap X_j} x_{i,j} = \sum_{f=0}^{17} fy_{f,j} \quad \forall j$$

**International constraint**

Each course contains at most B international students:

$$\sum_{i \in I \cap X_j} x_{i,j} \leq B \quad \forall j$$
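For a fixed assignment, the gender and international constraints reduce to per-course counts. The male and female counts play the role of the indices m and f with $y_{m,j} = 1$ and $y_{f,j} = 1$. The checker below is a hypothetical plain-Python helper, not from the paper. It reports which constraints an assignment violates, uses the minimum of 4 from the text, and takes B as a parameter. The example data are made up.

```python
def check_demographics(assignment, students, m, B, min_each=4):
    """Return the violated gender/international constraints, one string each.

    assignment[i] is the course of student i; students[i] is a pair
    (female, international) of 0/1 flags.
    """
    male, female, intl = [0] * m, [0] * m, [0] * m
    for j, (is_female, is_intl) in zip(assignment, students):
        if is_female:
            female[j] += 1
        else:
            male[j] += 1
        intl[j] += is_intl
    violations = []
    for j in range(m):
        # male[j] acts as the index m with y_{m,j} = 1 (female[j] likewise for y_{f,j})
        if male[j] < min_each:
            violations.append(f"course {j}: {male[j]} male (< {min_each})")
        if female[j] < min_each:
            violations.append(f"course {j}: {female[j]} female (< {min_each})")
        if intl[j] > B:
            violations.append(f"course {j}: {intl[j]} international (> {B})")
    return violations


# Hypothetical data: course 0 gets the first 9 students, course 1 the last 9
students = [(1, 0)] * 5 + [(0, 0)] * 3 + [(0, 1)] + [(1, 1)] * 3 + [(1, 0)] * 5 + [(0, 0)]
assignment = [0] * 9 + [1] * 9
print(check_demographics(assignment, students, m=2, B=2))
# ['course 1: 1 male (< 4)', 'course 1: 3 international (> 2)']
```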





## Building Solution

**Writing the objective function**

First, we test the objective on a small dataset in which every student can be assigned to their first choice:

In [244]:
### try obvious dataset where each student can be assigned to their first choice
# [0,1,1,0,1,2,3] means student id 0, female, international student with preference of course 0,1,2,3 from first to last choice
s = np.array([[0, 1, 1, 1, 0, 2, 3],
              [1, 1, 0, 0, 1, 2, 3],
              [2, 0, 1, 2, 1, 0, 3],
              [3, 1, 1, 0, 1, 2, 3],
              [4, 1, 0, 3, 2, 1, 0]])

P = s[:, -4:]  # P[i] = preferences of student i, first choice first

# [0,5] means course id 0 with maximum 5 students in the class
c = np.array([[0, 5], [1, 4], [2, 3], [3, 2]])

a = 2  # alpha, the base of the penalty for lower-ranked choices
n = len(s)
m = len(c)
x = cp.Variable((n, m), integer=True)  # set a variable that can only have integer values

# temporary constraints to check objectives
constraints = [
    cp.sum(x, axis=1) == 1,        # Each student must be assigned to exactly one course
    cp.sum(x, axis=0) <= c[:, 1],  # Each course can have at most its maximum number of students
    x >= 0                         # with integrality and the row sums, this makes x binary
]

# objective function
# P[i, 0] is student i's first-ranked course, P[i, 1] the second, and so on
objective = cp.Minimize(cp.sum([x[i, P[i, 0]]
                                + a * x[i, P[i, 1]]
                                + a**2 * x[i, P[i, 2]]
                                + a**3 * x[i, P[i, 3]]
                                for i in range(n)]))

prob = cp.Problem(objective, constraints)
prob.solve(verbose=True)
x.value

                                     CVXPY                                     
                                     v1.4.2                                    
(CVXPY) Apr 07 01:18:22 AM: Your problem has 20 variables, 3 constraints, and 0 parameters.
(CVXPY) Apr 07 01:18:22 AM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Apr 07 01:18:22 AM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Apr 07 01:18:22 AM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Apr 07 01:18:22 AM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Apr 07 01:18:22 AM: Compiling problem (target solver=SCIPY).
(CV

array([[ 0.,  1., -0., -0.],
       [ 1., -0., -0., -0.],
       [-0., -0.,  1., -0.],
       [ 1., -0., -0., -0.],
       [-0., -0., -0.,  1.]])

In `x.value`, rows are students and columns are courses (the `-0.` entries are zeros). The solution assigns student 0 to course 1, students 1 and 3 to course 0, student 2 to course 2, and student 4 to course 3. These are exactly the students' first-ranked courses, so the solution returned by CVXPY is correct. 

**Adding course constraint and editing the dataset**

Since we are using a much smaller dataset, we scale the course-size constraint down so that each course has a size $s$ between 0 and 3 (the four columns of `q`). We also cap how many courses may reach size 3:
$$\sum_{s=0}^{3} q_{j,s} = 1 \quad \forall j$$
$$\sum_j q_{j,3} \leq Q_{3}$$

The following dataset has more students (10) than regular seats (4 courses with 2 seats each, so 8 seats), so some courses must be allowed to grow. Eight students fit into courses of size 2, and the other two need extra seats. We therefore let $Q_3 = 2$, which allows two of the courses to take 3 students.
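The choice $Q_3 = 2$ follows from simple seat counting. The hypothetical helper below computes the fewest courses that must grow so that every student has a seat. This is only a necessary condition, because it ignores whether students actually ranked the courses that grow.

```python
import math


def min_expanded_courses(n_students, base_caps, extra_per_course=1):
    """Fewest courses that must grow by extra_per_course seats to seat everyone."""
    shortfall = n_students - sum(base_caps)
    return max(0, math.ceil(shortfall / extra_per_course))


# 10 students, 4 courses of 2 seats each: 8 seats, 2 short
print(min_expanded_courses(10, [2, 2, 2, 2]))  # 2, so Q_3 must be at least 2
```

With $Q_3 = 1$ there would be only $3 \cdot 2 + 3 = 9$ seats for 10 students, so that model would be infeasible.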

In [251]:
s = np.array([[0, 1, 1, 0, 1, 2, 3],
               [1, 1, 0, 0, 1, 2, 3],
               [2, 0, 1, 1, 2, 0, 3],
                [3, 1, 1, 1, 2, 0, 3],
                [4, 1, 0, 2, 3, 1, 0],
                [5, 1, 0, 2, 3, 1, 0],
                [6, 1, 0, 3, 2, 1, 0],
                [7, 1, 0, 3, 2, 1, 0],
                [8, 1, 0, 3, 2, 1, 0],
                [9, 1, 0, 3, 2, 1, 0]])
P = s[:,-4:] # preference P of student i
c = np.array([[0,2],[1,2],[2,2],[3,2]])
n = len(s)
m = len(c)
x = cp.Variable((n, m), integer = True) # set a variable that can only have integer values

In [252]:
# create new variable q: q[j, s] = 1 if course j has exactly s students (s = 0..3)
q = cp.Variable((m, 4), integer = True)

# build X_j: the list of students who ranked course j
Xj = []
for j in range(m):  # for each course
    check_student = []
    # check whether each student has that course among their preferences
    for i in range(n):
        if j in P[i]:
            check_student.append(i)
    Xj.append(check_student)

$$\sum_{i \in X_j} x_{i,j} = \sum_{s=0}^{3} s\,q_{j,s} \quad \forall j$$ 

In [261]:
course_constraint = []

# each course takes exactly one size s in {0, 1, 2, 3}
q_constraint = [cp.sum(q[j, :]) == 1 for j in range(m)]
# at most Q_3 = 2 courses may grow to size 3
Q_constraint = [cp.sum(q[:, 3]) <= 2]

course_constraint += q_constraint + Q_constraint
# link enrollment to the chosen size: sum_{i in X_j} x_ij == sum_s s * q_js
course_constraint += [cp.sum([x[i, j] for i in Xj[j]]) == cp.sum([size * q[j, size] for size in range(4)]) for j in range(m)]

In [262]:
objective = cp.Minimize(cp.sum([cp.sum(x[i, P[i,0]] 
                               + cp.multiply(a,x[i, P[i,1]]) 
                               + cp.multiply(a**2,x[i, P[i,2]]) 
                               + cp.multiply(a**3,x[i, P[i,3]]))
                               for i in range(n)]))
prob = cp.Problem(objective, course_constraint)
prob.solve(verbose=True)
x.value

                                     CVXPY                                     
                                     v1.4.2                                    
(CVXPY) Apr 07 01:25:47 AM: Your problem has 56 variables, 9 constraints, and 0 parameters.
(CVXPY) Apr 07 01:25:47 AM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Apr 07 01:25:47 AM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Apr 07 01:25:47 AM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Apr 07 01:25:47 AM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Apr 07 01:25:48 AM: Compiling problem (target solver=SCIPY).
(CV

Solver terminated with message: The problem is unbounded or infeasible. (HiGHS Status 9: model_status is Primal infeasible or unbounded; primal_status is At lower/fixed bound)


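SolverError: Solver 'SCIPY' failed. Try another solver, or solve with verbose=True for more information.

The solver fails because the model in these cells is incomplete. Unlike the first example, the constraint list omits `cp.sum(x, axis=1) == 1` (each student in exactly one course) and `x >= 0`, and nothing restricts `q` to 0/1 values. Both `x` and `q` are therefore unbounded integers. Their entries can be made arbitrarily negative, which drives the objective toward minus infinity, and HiGHS reports this as infeasible or unbounded.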
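To see what a correctly specified version of this toy model should return, the sketch below replaces the ILP with an exhaustive branch-and-bound search in plain Python. This is a swapped-in technique that is only practical at toy sizes; it is neither the paper's method nor CVXPY. It enforces the intended constraints:

- every student gets exactly one ranked course;
- a course normally holds at most `base_cap` students;
- at most `max_big` courses (the role of $Q_3$) may grow to `big_cap`.

The function `solve_toy` and its parameter names are hypothetical.

```python
def solve_toy(prefs, base_cap, big_cap, max_big, alpha):
    """Exhaustive branch-and-bound over ranked courses (toy sizes only).

    Mirrors q_{j,s} with s <= big_cap and sum_j q_{j,big_cap} <= max_big.
    Returns (cost, assignment), or (inf, None) if no assignment is feasible.
    """
    n = len(prefs)
    m = 1 + max(j for P_i in prefs for j in P_i)
    counts = [0] * m
    assign = [None] * n
    best = {"cost": float("inf"), "assign": None}

    def search(i, cost):
        if cost >= best["cost"]:  # bound: cannot beat the incumbent
            return
        if i == n:
            best["cost"], best["assign"] = cost, assign.copy()
            return
        for rank, j in enumerate(prefs[i]):
            if counts[j] == big_cap:
                continue
            counts[j] += 1
            if sum(c > base_cap for c in counts) <= max_big:
                assign[i] = j
                search(i + 1, cost + alpha ** rank)
            counts[j] -= 1

    search(0, 0)
    return best["cost"], best["assign"]


# Preferences (best first) of the ten students in the second toy dataset
prefs = [[0, 1, 2, 3]] * 2 + [[1, 2, 0, 3]] * 2 + [[2, 3, 1, 0]] * 2 + [[3, 2, 1, 0]] * 4
print(solve_toy(prefs, base_cap=2, big_cap=3, max_big=2, alpha=2))
# (11, [0, 0, 1, 1, 2, 2, 3, 3, 3, 2])
print(solve_toy(prefs, base_cap=2, big_cap=3, max_big=1, alpha=2))
# (inf, None): only 9 seats for 10 students
```

With $Q_3 = 2$, nine students get their first choice and one gets their second, for a cost of $9 \cdot 1 + 2 = 11$. With $Q_3 = 1$ no feasible assignment exists.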

### Defining the dataset for 544 students and 35 courses (in progress)

We create a dummy dataset of 544 students and 35 courses. Each student is classified as male or female and as international or not, and has an ordered list of preferred courses. Each course also needs a limit on the number of students. 

In [179]:
n = 544 # number of students
m = 35 # number of courses

**Create courses dataset**

The course dataset will have 35 courses each with an id from 0 to 34 and a course limit ranging from 9 to 17. 

In [174]:
# dictionary mapping course id -> {'max': maximum class size}
courses = {}

for i in range(m): # iterate to fill m number of courses
    # draw a random limit from 9 to 17, with 16 the most likely size
    size_probs = [0.02, 0.02, 0.02, 0.03, 0.03, 0.05, 0.13, 0.42, 0.28]
    max_students = np.random.choice(range(9, 18), p=size_probs)
    # i is the course id, from 0 to 34
    courses[i] = {'max': max_students}

print(len(courses))


for i in list(courses.items())[:5]:
    print(i)

35
(0, {'max': 17})
(1, {'max': 17})
(2, {'max': 17})
(3, {'max': 16})
(4, {'max': 15})


The total number of seats should be at least the total number of students.

In [175]:
# check how many empty seats
total_seats = 0
for course in courses.values():
    total_seats += course['max']

print(total_seats)

555


**Create student dataset**

The student dataset should not be completely random: according to the paper, Macalester College has 60% female students and 11% international students.

- Let gender be 1 for female and 0 otherwise. 
- Let international be 1 for international students and 0 otherwise.
- Let preferences be an ordered list of courses

In [180]:
# create a dictionary to store data
course_id = list(courses.keys())
students = {}
for i in range(n): # iterate to fill n students
    rand_gender = random.random()
    rand_international = random.random()

    # probability of having a female student is 60%
    gender = 1 if (rand_gender < 0.6) else 0
    # probability of an international student is 11%
    international = 1 if (rand_international < 0.11) else 0

    # create an ordered list of 4 distinct preferred courses
    preference = random.sample(course_id, 4)
    # i is the student id, from 0 to 543
    students[i] = {'gender': gender, 'international': international,
                   'preference': preference}

In [181]:
for i in list(students.items())[:5]:
    print(i)

(0, {'gender': 1, 'international': 0, 'preference': [6, 18, 15, 19]})
(1, {'gender': 0, 'international': 0, 'preference': [10, 29, 27, 8]})
(2, {'gender': 0, 'international': 0, 'preference': [25, 28, 30, 6]})
(3, {'gender': 0, 'international': 1, 'preference': [3, 26, 33, 28]})
(4, {'gender': 0, 'international': 0, 'preference': [27, 26, 34, 5]})


In [182]:
students = np.array([[i, students[i]["gender"], students[i]["international"]] + students[i]["preference"] for i in range(n)])
courses = np.array([[course_id, courses[course_id]["max"]] for course_id in range(m)])

**Set the objective function**

In [183]:
x = cp.Variable((n, m), integer = True) # set a variable that can only have integer values

P = students[:,3:7] # preference P of student i

# temporary constraints to check objectives
constraints = [   
    cp.sum(x, axis=1) == 1,  # Each student must be assigned to exactly one course
    cp.sum(x, axis=0) <= courses[:, 1],  # Each course can have at most its maximum number of students
    x >= 0
]

# objective function
# P[i, 0] is student i's first-ranked course, P[i, 1] the second, and so on
objective = cp.Minimize(cp.sum([cp.sum(x[i, P[i,0]] 
                               + cp.multiply(a,x[i, P[i,1]]) 
                               + cp.multiply(a**2,x[i, P[i,2]]) 
                               + cp.multiply(a**3,x[i, P[i,3]]))
                               for i in range(n)]))

prob = cp.Problem(objective, constraints)
prob.solve(verbose=True)
x.value

                                     CVXPY                                     
                                     v1.4.2                                    
(CVXPY) Apr 06 01:43:38 AM: Your problem has 19040 variables, 3 constraints, and 0 parameters.
(CVXPY) Apr 06 01:43:38 AM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Apr 06 01:43:38 AM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Apr 06 01:43:38 AM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Apr 06 01:43:38 AM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Apr 06 01:43:38 AM: Compiling problem (target solver=SCIPY).


array([[-0., -0., -0., ..., -0., -0., -0.],
       [-0., -0., -0., ..., -0., -0., -0.],
       [-0., -0., -0., ..., -0., -0.,  1.],
       ...,
       [-0., -0., -0., ..., -0., -0., -0.],
       [-0., -0., -0., ..., -0., -0., -0.],
       [-0., -0., -0., ..., -0., -0., -0.]])

In [184]:
# every row of x should sum to 1, since each student is assigned to exactly one course
check = all(np.isclose(row.sum(), 1) for row in x.value)

print(check)

True


Since the check prints True, every row of `x` sums to 1, so each student is assigned to exactly one course as the temporary constraint requires.

**Set course constraints**
$$\sum_{s=9}^{17} q_{j,s} = 1$$
$$\sum_{i \in X_j} x_{i,j} = \sum_{s=9}^{17} sq_{j,s}$$ 

**Set gender constraints**

$$\sum_{m=0}^{17} y_{m,j} = 1$$
$$\sum_{f=0}^{17} y_{f,j} = 1$$
$$\sum_{i \in M \cap X_j} x_{i,j} = \sum_{m=0}^{17} my_{m,j}$$
$$\sum_{i \in F \cap X_j} x_{i,j} = \sum_{f=0}^{17} fy_{f,j}$$

**Set international students constraints**

$$\sum_{i \in I \cap X_j} x_{i,j} \leq 8$$

**Solve**