# Phase II Model With Real Data
First attempt at a Phase II model, one which creases the master schedule and assigns students to courses simultaneously.

$S$ -- Set of all students

$C$ -- Set of all courses

$T$ -- Set of all periods {1,2,3,4,7,8} 

$I$ -- Set of all instructors

$R$ -- set of all rooms 

**Variables:**

$x_{i,j}$ for  $i \in S, j \in C$ -- Binary, 1 if student $i$ assigned to course $j$ 

$c_{j,t}$ for $j \in C, t \in T$ -- Binary, 1 if course $j$ to be offered in period $t$

$r_{j,s,t}$ for $j \in C$, $s \in R$ and $t \in T$ -- binary, describes if course $j$ to be held in room $s$ durring period $t$.

$u_{i,j,t}$ for $i \in S$, $j \in C$, $t \in T$ -- binary, used for constraint, no practical meaning.

** Parameters:**

$P_{i,j}$ -- Preference for student $i$ on course $j$

$S_i$ -- Seniority constant, e.g., higher for seniors

$D_{i,j}$ -- Binary, 1 if course $i$ and $j$ are in the same department, i.e., if they meet the same requirement, e.g., highschool math

$Ta_{i,j}$ -- Binary, 1 if teacher $i$ is teaching course $j$

$Cap_j$ -- Capacity of course $j$

$Min_j$ -- Minimum number of students needed for course $j$

$Db_j$ -- Indicates if course $j$ is a double period (1 or 0)


** Constraints: **

$\sum_{j} x_{i,j} =6 \quad \forall i \in S$ -- Says students can be assigned to six courses (full course load).

$ x_{i,j} \leq P_{i,j} \; \forall i \in S, j \in C$ -- Says students not assigned to courses they didn't preference. 


$x_{i,j} =\sum_{t} u_{i,j,t} \; \forall i \in S, j \in C$ -- Used to ensure $u$ takes correct value of $x$ and $c$

$c_{j,t} \geq u_{i,j,t} \; \forall i \in S, j \in C, t \in T$ -- This, in conjunction with the above are sufficient to say that no student is enrolled in more than one course per period. 

$ \text{lower bound}_{i,j} \leq \sum_{i \in C} D_{i,j}x_{k,i} \leq \text{upper bound}_{i,j} \; \forall k \in S, j \in C$ -- Says tha students take within a certain number of courses in the department, where the department is defined by the course $j$. ** new, but still likely unworkable department constraint--not in code **

$\sum_{t \in T} c_{j,t} = 1 \quad \forall j \in C$ -- Says that each course taught only once

$\sum_{i \in S} x_{i,j} \leq Cap_j \quad \forall j \in C$ -- Course capacity constraint

$\sum_{j \in C} c_{j,t} Ta_{k,j} \leq 1  \quad \forall k \in I, \forall t \in T$ -- Teacher constraint (a teacher can teach at most one course per period), where $Ta_{k,j}$ is a parameter, not a variable

*Double Period Constraints*---In terms of courses, we only add constraints if the parameter value $Db_j=1$, note, this *is* linear, as the parameter is not a variable: 
$c_{j,t} \leq c_{j+1, t+1} \; \forall j:Db_j =1$ -- cannot schedule the first half without also doing the second  
$c_{j,8} = 0, c_{j,4} = 0 \; \forall j:Db_j = 1$ -- cannot schedule the first half in the pre-lunch of last periods.  
$x_{i,j+1} \geq x_{i,j} \; \forall i \in S, j:Db_j = 1$ -- students must be enrolled in both parts of double. 


$ \sum_{s \in R} r_{j,s,t} = c_{j,t} $ -- if a course is offered, it is given exactly one room
    
$\sum_{j \in C} r_{j,s,t} \leq 1 $ for $s \in R$ and $t \in T$ -- Each room can have at most one class in it at a time (we should make the gym two different rooms).

TODO: Formulate room constraint for departnemnts, i.e., at most 3 science courses per period.

** Objective: **

$ \text{max }\sum_{i \in S} \sum_{j \in C} x_{i,j} P_{i,j} $ -- Assuming preferences take a higher value if they are a student's preferred choice, this will give a higher weight to higher assignments (at this point, I am leaving out the seniority multiplier).

In [1]:
from pyscipopt import Model, quicksum
import numpy as np
import pandas as pd
from os import system

In [25]:
# read in data
prefs = pd.read_csv("Resources/FlatChoicesBinary.csv")
courses = pd.read_csv("Resources/FlatCourseSize.csv")
#prox = pd.read_csv("Resources/Proximity.csv")
prox = pd.read_csv("Resources/Proximity.csv")
teacher = pd.read_csv("Resources/Teacher_Info.csv", header=None)

# clean it up
prefs.rename(columns={"Unnamed: 0": "Student"}, inplace=True)
courses.rename(columns={"0":"Class"}, inplace=True)
courses.drop("Unnamed: 0", axis=1, inplace=True)

In [26]:
# Extract sets
S = prefs["Student"].tolist() # list of all students (once we get ID make dictionary)

Cd = {} # Course dictionary
for i in courses.index:
    Cd[i] = courses["Class"].iloc[i]
C = range(len(Cd))
    
T = [1,2,3,4,7,8] # Periods

## Instructors and corerspondence
I = list(set(teacher[0]))
DW_courses = list(set(teacher[1]))

# Need matrix with instructors as rows, all courses as columns, and 1 if teaching that course
I_C_dict = {}
for i in I:
    I_C_dict[i] = []
    for index in range(teacher.shape[0]):
        if teacher.iloc[index][0] == i:
            l = I_C_dict[i]
            l.append(teacher.iloc[index][1])
            I_C_dict[i] = l

In [27]:
# Teacher_Course_Matrix 
courses_list = list(Cd.values())
Teacher_Course_Matrix = np.zeros(len(courses_list))
for i in I:
    t = np.zeros(len(courses_list))
    for j in Cd:
        if Cd[j] in I_C_dict[i]:
            # print(i, "is teaching:", (40-len(i)-12)*".", Cd[j])
            t[j] =1
    Teacher_Course_Matrix = np.vstack([Teacher_Course_Matrix, np.matrix(t)])

Ta = np.array(Teacher_Course_Matrix[1:]) # matrix tying teachers to courses they teach

In [28]:
# Room Data (we will eventually need to tie this to subject, i.e. for science?)
R = ["U1", "Steve", "U2", "U3", "U4/5", "U7", "U7", "L2", "L3", "Library", "Art", "L4", 
        "L6", "Sci A", "Sci B", "Sci C", "Music Room", "Gym", "Gym2", "OtherRoom", "EmptyRoom"]

In [29]:
# Extract Preferences
P = prefs.drop("Student", axis=1).as_matrix()
#P = np.ones([len(S),len(C)]) # all 1's as test (student will take any course)

In [30]:
# Double periods
Db = courses["Double"].fillna(0).astype(int)

In [31]:
# Proximity Matrix
D = prox.drop("0", axis=1).as_matrix()

In [32]:
# Create Proximity dictionary {subject:proximity vector}
prox_dict = {}
Subjects = list(prox.columns)[1:]
for subj in list(prox.columns)[1:]:
    prox_dict[subj] = prox[subj]

In [33]:
# Course Sizes (min and max)
MIN = courses["Min"]
MAX = courses["Max"]

# To check feasibility:
MIN = [0]*len(C)
MAX = [100]*len(C)

In [34]:
# Setup model
m = Model("PhaseTwo")

In [20]:
Cd

{0: '6th Grade Art',
 1: '7th-8th Grade Art',
 2: '8th Grade Science',
 3: 'Adaptive PE',
 4: 'Advanced Algebra and Trigonometry',
 5: 'Advanced Algebra and Trigonometry 2',
 6: 'Advanced Spanish',
 7: 'Advanced/In-Depth French',
 8: 'African Studies',
 9: 'African Studies 2',
 10: 'Algebra A',
 11: 'Algebra A 2',
 12: 'Algebra B',
 13: 'Algebra B 2',
 14: 'American Studies/Global Perspective',
 15: 'American Studies/Global Perspective 2',
 16: 'Animal Bio',
 17: 'Asia Studies',
 18: 'Banned Books',
 19: 'Beginning Algebra and Geometry',
 20: 'Beginning Algebra and Geometry 2',
 21: 'Celtic Band',
 22: 'Chemistry',
 23: 'Childhood in Conflict',
 24: 'Childhood in Conflict 2',
 25: 'Choir',
 26: 'Community Service Class',
 27: 'Computer Literacy',
 28: 'Constitutional Law/Government',
 29: 'Dark Fiction',
 30: 'Drawing and Painting',
 31: 'Ecology',
 32: 'Economics',
 33: 'Empty',
 34: 'English Seminar',
 35: 'Evolutionary Biology',
 36: 'Facing History',
 37: 'Facing History 2',
 38: '

In [35]:
# Trackers--to verify what SCIP says
num_vars = 0
num_cons = 0

In [36]:
# Add Student Variables (X)
X = {}
for i in S:
    for j in range(len(C)):
        name = "Student " + str(i) + " in course " + str(j)
        X[i,j] = m.addVar(name, vtype='B')
        num_vars += 1

In [37]:
# Add Course Variable
Course = {} # Variable dictionary
for j in range(len(C)):
    for t in T:
        name = "Course " + str(j) + " in period " + str(t)
        Course[j,t] = m.addVar(name, vtype='B')
        num_vars += 1

In [38]:
# Create the u variable
U = {}
for i in S:
    for j in range(len(C)):
        for t in T:
            name = "min " + str(i) + ", " + str(j) + ", " + str(t)
            U[i,j,t] = m.addVar(name, vtype='B')
            num_vars += 1

$\sum_{j \in C} u_{i,j,t} = 1 \; \forall i\in S, j \in T$ -- Student is in exactly one course per period.

In [40]:
# Force student in one course per period
for i in S:
    for t in T:
        m.addCons(quicksum(U[i,j,t] for j in C) == 1) # one course per period
        num_cons += 1

It should be either the constraint above, or the one below, but we shouldn't need both. That being said, neither work. *Hopefully now they do*

In [41]:
# Add Student assignment constraint (must have two classes)
## WE CAN ELIMINATE THIS, IF WE FORCE A STUDENT IN ONE COURSE PER PERIOD
# for i in S:
#         m.addCons(quicksum(X[i,j] for j in C) == 6) # one per period
#         num_cons += 1

$x_{i,j} \leq P{i,j} \; \forall i \in S, j \in C$ -- Students only given courses they have put a preference over.

In [42]:
# Students only given courses they preferenced
for i in S:
    for j in C:
        #m.addCons((1 - X[i,j]) + P[i][j] >= 1)
        m.addCons(X[i,j] <= P[i][j])
        num_cons += 1

$\sum_{t \in T} u_{i,j,t} = x_{i,j} \; \forall i \in S, j \in C$ -- Ties the $u$ varible to the $x$ variable for course and student. 

$u_{i,j,t} \leq c_{j,t} \; \forall i \in S, j \in C, t \in T$ -- Ties $u$ to $c$ ensuring a student is not signed up for a course in a period in which the course will not be offered.

In [43]:
# "AND" Constraint--no more than one course per period for a student
for i in S:
    for j in C:
        m.addCons(X[i,j] == quicksum(U[i,j,t] for t in T))
        num_cons += 1
        for t in T:
            m.addCons(Course[j,t] >= U[i,j,t])
            num_cons += 1

In [44]:
## This is the old way pre-frans, just left so we can keep track if we want to go back
# for i in S:
#     for j in C:
#         for t in T:
#             m.addCons(U[i,j,t] >= X[i,j] + Course[j,t] - 1)
#             m.addCons(U[i,j,t] <= X[i,j])
#             m.addCons(U[i,j,t] <= Course[j,t])
#             num_cons += 3

$\text{min} \geq \sum_{i \in S} x_{i,j} \leq \text{max} \; \forall j \in C$ -- course size constraints.

In [45]:
# Add capacity and minimum constraint
for j in range(len(C)):
    #m.addCons(quicksum(X[i,j] for i in S) <= MAX[j])
    m.addCons(quicksum(X[i,j] for i in S) <= 100)
    #m.addCons(quicksum(X[i,j] for i in S) >= 0)
    num_cons += 2

## Possible Proximity Fix

The following outlines a possible fix for the proximity constraints:

Firstly, we need to define proximity matricies **for each subject**. These will be a list of *every* course, and the entry for that course will have a 1 if it is in that subject/counts for the requirement. 

Next, we need minimum and maximums for eachs student, by subject. So we want a minimum list for each subject, with an entry for each student corresponding to either the minimum of maximum number of courses in that subject for the student. 

As far as the encoding. We only want to include the constraint for minimum if it is $>0$ for that student. Is there any similar shortcut for maximums?

In [46]:
# Setup proximity min and max dicts (temp untill we generate more granular data)
min_sub_dict = {}
max_sub_dict = {}
for subj in Subjects:
    min_sub_dict[subj] = np.ones(len(S))*0
    max_sub_dict[subj] = np.ones(len(S))*8

$\text{min}_{\text{$i$, subject}}\leq \sum_{j \in C} D_{\text{subject}} x_{i,j} \leq \text{max}_{\text{$i$, subject}} \; \forall i \in S, \text{subjects}$ -- Says student within max and min of number of courses they should be taking in a given subject.

In [47]:
# proximity by subject
for subject in Subjects:
    for i in S:
        if min_sub_dict[subject][i] > 0:
            m.addCons(quicksum(prox_dict[subject][j]*X[i,j] for j in range(len(C))) >= min_sub_dict[subject][i])
        # do we always need a max?
        m.addCons(quicksum(prox_dict[subject][j]*X[i,j] for j in range(len(C))) <= max_sub_dict[subject][i])

$\sum_{j \in C} c_{j,t} Ta_{k,j} <= 1 \; \forall j \in C, t \in T, k \in I$ -- Each teacher is teaching at most one course per period. 

In [48]:
# Teacher teaching at most one course per period
for k in range(len(I)):
    for t in T:
        m.addCons(quicksum(Course[j,t]*Ta[k][j] for j in C) <= 1)
        num_cons += 1

$\sum_{t \in T} c_{j,t} = 1 \; \forall j \in C$ -- Each course taugh only once

In [50]:
# Course Taught only once Constraint
for j in range(len(C)):
    m.addCons(quicksum(Course[j,t] for t in T) == 1)
    num_cons += 1

In [51]:
# # THIS WAS THE OLD, PRE-FRANS WAY
# # Double Period--Consecutive Constraint
# for j in range(len(C))[:-1]:
#     for t in T[:-1]: # need the :-1 to ensure don't go over bounds below
#         if t != 4 and t != 8:
#             m.addCons(2 - Db[j] - Course[j,t] + Course[j+1, t+1] >= 1)
#             num_cons += 1

$c_{j,t} = c_{j+1, t+1} \; \forall j \in C : Db_j = 1$ -- if the first half of the double is taugh in period $t$ then the second half must be taugh in period $t + 1$.

In [52]:
# Double period--consecutive constraints
for j in range(len(C)):
    if Db[j] == 1: # if double period
        for t in T:
            if t != 4 and t != 8:
                m.addCons(Course[j,t] == Course[j+1, t+1]) # change to == from >= 
                num_cons += 1

$c_{j,4} = 0, c_{j,8} = 0 \; \forall j \in C : Db_j = 1$ -- If a course is a double, then the first half cannot be taught in period 4 or 8. 

In [53]:
# Double Period--not 4th or 8th
for j in range(len(C)):
    if Db[j] == 1:
        m.addCons(Course[j,4] == 0)
        m.addCons(Course[j,8] == 0)
        num_cons += 2

$x_{i, j} = x_{i, j+1} \; \forall i \in S, j \in C:Db_{j} = 1$ -- If $j$ is a double, then student $i$ must either be in both $j$ and $j+1$ or in neither. 

In [54]:
# Double Period--Student in both
for i in S:
    for j in range(len(C)):
        if Db[j] == 1:
            m.addCons(X[i,j+1] == X[i,j]) # this was >= but == is better?
            num_cons += 1

In [34]:
# Define r  room variable (over course j in room r durring period t)
R = {}
for j in range(len(C)):
    for s in R:
        for t in T:
            name = "Course " + str(j) + " in room " + str(r) + " durring period " + str(t)
            R[j,s,t] = m.addVar(name, vtype='B')
            num_vars += 1

$\sum_{s \in R} r_{j,s,t} = c_{j,t} \; \forall j \in C, t \in T$ -- If a course is taught, i.e., $c_{j,t} = 1$, then it must get exactly one room.

In [35]:
# If course taught, gets one room
for j in range(len(C)):
    for t in T:
        m.addCons(quicksum(R[j,s,t] for s in R) == Course[j,t])

### Possible way of dealing with "Other" Room issue.

In [None]:
# find j's corresponding to "Other" and "Empty"
other_indicies = [] # dictinoary with index for the "other" mapping to the number next to it
empty_index = 0
for j in Cb:
    if "Other" in Cb[j]:
        #period = Cb[j][-1]
        #other_indicies{j} = period
        other_indicies.append(j)
    if "Empty" in Cb[j]:
        empty_index = 0
 
t = 1
for j in other_indicies:
    m.addCons(Course[j,t] == t) # other t taugh in period t
    m.addCons(R[j, "Other Room", t] == 1) # in "Other Room" durring this period
    if t == 4:
        t += 2
    else:
        t += 1   
        

In [36]:
### SHOULDN"T NEED THIS IN LIGHT OF THE ABOVE
## NEED ROOM DATA
# Room Constraint--Each room gets at most one course
# for s in R:
#     for t in T:
#         m.addCons(quicksum(r[j,s,t] for j in C) <= 1)

In [37]:
## TEMPORARY CONSTAINT
## NO MORE THAN 8 CLASSES PER PERIOD
# for t in T:
#     m.addCons(quicksum(Course[j,t] for j in C) <= 20)

In [55]:
# Set objective
#m.setObjective(quicksum(X[i,j]*P[i][j] for i in S for j in C), "maximize")
m.setObjective(X[1,1]*0, "maximize") # just find a feasible solution

In [56]:
print(str(num_vars), "Variables")
print(str(num_cons), "Constraints")

196713 Variables
228003 Constraints


In [None]:
# Solve model
m.optimize() # NOTE: solver info printed to terminal

In [None]:
# Print Information on Solve
m.printStatistics() # NOTE: this will only print to terminal (note notebook)

In [None]:
m.printBestSol() # prints the soltuion to terminal

In [None]:
if m.getStatus() == "optimal":
    print("We found an optimal solution!")
else:
    print("The problem is", m.getStatus())

# determine which courses are offered in which period
offered = {}
for t in T:
    class_list = []
    for j in range(len(C)):
        if m.getVal(Course[j,t]) == 1:
            class_list.append(Cd[j])
    offered[t] = class_list

# How many courses per period is each student assigned
for t in T:
    max_courses = 0
    min_courses = 1
    for i in S:
        num_courses = 0
        for j in C:
            if m.getVal(X[i,j]) == 1 and m.getVal(Course[j,t]) ==1:
                num_courses += 1
        if num_courses > max_courses:
            max_courses = num_courses
        elif num_courses < min_courses:
            min_courses = num_courses
    print("In period", t, "max courses for any student is", max_courses, "and min courses is", min_courses)

In [None]:
offered # lists periods, and the courses offered in each

## Current issues:
- I think it is the rooms that are creating feasbility issues
    - Specifically, I think the fact that we have all these "other" courses that we are not deal with
- How do we deal with empty? Ask justina how that works

# TODO
- Make a new "Other" room 
- Force all the "Other#" courses to be in the "Other" Room, and to take place durring their numbered period
- Fine out how to deal with subject course matchings

The following block of code is meant to help better understand SCIP. It instantiates a Model instance, then parses through each of its public methods and fields, looking for, and printing, their docstrings.

In [None]:
# Figure out wtf SCIP is doing

# initialize a Model instance
mod = Model("what?")

# get methods for the model
methods = dir(mod)
i = 0
for x in methods:
    if x[0]=="_":
        i = methods.index(x)
methods = methods[i+1:] # only want the public methods

# print out each method and its info
for m in methods:
    print(m + ":")
    print(getattr(mod, m).__doc__)
    print("\n")

In [None]:
import pyscipopt as scip

In [None]:
dir(scip)