# Scheduling Decisions

## Notebook Description

In this notebook we implement how the Neural Network chooses an action for a given state, represented by its input data.<br>
From there, we develop schedules according to the policy the network induces.<br>

With this, we can estimate data for higher numbers of Jobs: we take a Job Scheduling Problem with <i>n</i> Jobs, create the successor state for every possible action, and estimate the optimal future costs from there with the Neural Network that has already been trained on <i>n-1</i> Jobs. By repeating this process, we can arbitrarily increase the number of Jobs on which we train our model.<br>

## Code

In [None]:
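import os
import pickle #used below to save the estimated data
import numpy as np #used below for array handling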
import tensorflow
from tensorflow import keras
import import_ipynb
from Jobs_and_Machines import *
import States_and_Policies
from States_and_Policies import *
import Data_for_NN
from Data_for_NN import *
from Action_Pointer import *

### Random Job Scheduling Problems

The following notebook defines the environmental conditions as global variables.

In [None]:
import Global_Variables

We want to simulate random Job Scheduling Problems according to these environmental conditions. Since the numbers of Jobs and Machines may change over the course of this notebook, we need a function that updates these global variables in all imported notebooks.

In [None]:
import Random_Generator
from Random_Generator import *

In [None]:
#simulate random Job Scheduling Problem and pass environmental parameters as global variables to other notebooks
def create_JS_environment():
    
    #these variables define the problem environment and get randomly reassigned for every Job Scheduling Problem
    global max_runtime, max_init_runtime, list_jobs, list_machines
    environment = generate_random_environment()
    max_runtime, max_init_runtime, list_jobs, list_machines = environment
    #create dictionary of all environmental variables of this Job Scheduling Problem
    list_env = ['max_runtime', 'max_init_runtime', 'list_jobs', 'list_machines']
    dict_env = dict(zip(list_env, environment))
    #pass them as global variables, so that imported notebooks can access them
    Global_Variables.set_var_to_global(dict_env)
    change_global_var_of_module(States_and_Policies)
    change_global_var_of_module(Data_for_NN)

We want to estimate data for Job Scheduling Problems with higher numbers of Machines, and we also want to test how well our Neural Network generalizes by letting it create schedules for problems with more Jobs and Machines. We therefore need functions that update the global variables for both the number of Jobs and the number of Machines.

In [None]:
#increase number of Jobs globally
def increase_n(new_n):
    global n
    n = new_n
    Global_Variables.n = n
    Random_Generator.n = n
    States_and_Policies.n = n
    Data_for_NN.n = n
    return n
    
#increase number of Machines globally
def increase_m(new_m):
    global m
    m = new_m
    Global_Variables.m = m
    Random_Generator.m = m
    States_and_Policies.m = m
    Data_for_NN.m = m
    return m
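
The two functions above repeat the same assignment for every module. This is needed because a name obtained through `from module import *` is bound separately in each namespace, so rebinding it in one place does not change it elsewhere. A minimal sketch of the same idea in generic form follows; `set_everywhere`, `module_a` and `module_b` are hypothetical names used only for illustration and are not part of the notebooks.

```python
import types

# Hypothetical stand-ins for the imported notebooks (illustration only).
module_a = types.ModuleType("module_a")
module_b = types.ModuleType("module_b")


def set_everywhere(name, value, modules):
    """Bind `value` to the attribute `name` on every module in `modules`."""
    for module in modules:
        setattr(module, name, value)
    return value


# Set the number of Jobs to 9 in both stand-in modules.
set_everywhere("n", 9, [module_a, module_b])
print(module_a.n, module_b.n)
```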

### Schedule based on Neural Network

We now want to derive a scheduling rule from the policy induced by the Neural Network.<br>
First, we need to be able to load any previously trained version of our network.<br>
The user needs to set the path to wherever the network is stored.

In [None]:
#load desired version of Neural Network
def load_NN(NN_name,path):
    
    NN = keras.models.load_model(f'{path}/{NN_name}.h5', custom_objects={'FeedForward': FeedForward, 'Pointer': Pointer, 'MSE_with_Softmax': MSE_with_Softmax, 'costs':costs})
    NN.run_eagerly = True
    
    return NN

Now we need a function that defines how to act on the outputs of the Neural Network for a Job Scheduling Problem. In other words, we define the policy it induces and thereby a network-based scheduling rule.

In [None]:
#how to act in a given state according to Neural Network
def act(NN, state):
    
    n_state = sum(state.jobs_remaining) #get job number of state
    m_state = sum(state.machines_on_duty) #get machine number of state
    
    #object list of remaining Jobs
    remaining_jobs = [job for job in list_jobs if state.jobs_remaining[job.index] == 1]
    #append action of turning off machine
    remaining_jobs.append(None)
    #extract network compatible data from state
    data = [np.expand_dims(state.input[0],axis=0), np.expand_dims(state.input[1],axis=0)]
    #estimate Q-values for every action
    act_values = NN.predict(data, verbose=0)
    
    #since outputs are normalized, action associated with maximum value is estimated to produce minimal costs
    if m_state > 1:
        action = np.argmax(act_values[0])
    #cannot turn off last machine
    else:
        action = np.argmax(act_values[0][:-1])
    #if recommended action is assigning a Job, we have to consider the permutation of jobs
    if action < n_state:
        action = state.permutation[0][action]
    
    #job associated with recommended action.
    job = remaining_jobs[action] #is "None" if action==n_state corresponds to machine shutdown
    
    return action, job
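
The selection rule inside `act` can be tried out without a trained network. The sketch below mirrors only the masking and permutation steps on plain arrays; `select_action` is a hypothetical helper, and the assumption that the last output entry stands for the machine shutdown is taken from the comments in the cell above.

```python
import numpy as np


def select_action(act_values, n_state, m_state, permutation):
    """Return the index of the preferred action for toy network outputs.

    act_values  : one value per remaining job plus one for machine shutdown
    n_state     : number of remaining jobs
    m_state     : number of machines still on duty
    permutation : maps an output position to a position among the remaining jobs
    """
    values = np.asarray(act_values, dtype=float)
    # The last machine cannot be turned off, so the shutdown entry is ignored.
    if m_state <= 1:
        values = values[:-1]
    action = int(np.argmax(values))
    # Job assignments have to be mapped back through the permutation.
    if action < n_state:
        action = int(permutation[action])
    return action


# Two remaining jobs; the shutdown entry has the largest value.
print(select_action([0.1, 0.2, 0.7], n_state=2, m_state=2, permutation=[1, 0]))
print(select_action([0.1, 0.2, 0.7], n_state=2, m_state=1, permutation=[1, 0]))
```

With two machines on duty the shutdown action (index 2) is selected; with only one machine left, the best job assignment is selected instead and mapped through the permutation.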

Cost distributions might change towards the end of a schedule. At the same time, computing the optimal policy by brute force becomes very cheap for the last few decisions. To relieve our Neural Network and let it focus on learning more complex scheduling situations, we did not train it on states with fewer than 3 pending Jobs, or on states with only 1 Machine still working and no more than 8 remaining Jobs. We therefore add the option of computing the optimal costs for these states with the following functions.

In [None]:
#compute the set of all possible successor states from current state on
def compute_remaining_states(current_state):
        
    #initialize list of all states
    list_states = []
    
    #list of current states
    current_states = [current_state]
    
    #go through every current state, save their successors, add current states to list of all states
    #then define successor states as current states, clear list of successor states and repeat until done
    while current_states:
        
        #empty list of all successor states
        successor_states = []
        
        #create and add all successor states of current states
        for state in current_states:
            
            #list of all successors of this state
            state_successors = []
            
            #check if state is not final yet
            if sum(state.jobs_remaining) > 0:
                
                #create one state for every remaining job assigned to the free machine with lowest index
                machine = list_machines[state.machine]
                
                #loop through the jobs that still have to be done
                list_jobs_remaining = [job for job in list_jobs if state.jobs_remaining[job.index] == 1]
                for job in list_jobs_remaining:
                    #assign job to machine
                    state_successors.append(assign_job(state,job,machine))
                
                #check if turning it off is an option
                if sum(state.machines_on_duty) > 1:                       
                    #add successor state created by shutting down machine
                    state_successors.append(turn_off_machine(state, machine))
                        
                
            #add successor list to the attributes of state
            state.successors = state_successors
            
            #add successors of this state to the list of all successors of all current states
            successor_states += state_successors
        
        #add current states to list of all states
        list_states += current_states
        
        #the successor states then become the current states
        current_states = successor_states
        
    return list_states
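
The level-by-level expansion used above can be illustrated on a toy problem that does not depend on the state class. In this sketch a state is simply the set of pending jobs, and `expand_levelwise` and `toy_successors` are hypothetical names introduced for the example.

```python
def expand_levelwise(root, successors):
    """Collect the root and every state reachable from it, one level at a time."""
    all_states = []
    current = [root]
    while current:
        next_level = []
        for state in current:
            next_level.extend(successors(state))
        # Store the finished level, then continue with its successors.
        all_states.extend(current)
        current = next_level
    return all_states


def toy_successors(pending):
    """Toy transition: assigning a job removes it from the pending set."""
    return [pending - {job} for job in sorted(pending)]


states = expand_levelwise(frozenset({"a", "b", "c"}), toy_successors)
print(len(states))
```

For three toy jobs this visits 1 + 3 + 6 + 6 = 16 states, because states reached through different job orders are not merged. This growth is the reason the exhaustive computation is only used for the last few decisions.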

In [None]:
#optimal policy for remaining states
def compute_remaining_policy(current_state):
    
    #compute all remaining possible successor states from current state on
    remaining_states = create_all_states(from_state=current_state)
    #calculate their true Q-values
    backtracking(remaining_states)
    #deduce the optimal policy with a greedy approach
    remaining_policy = optimal_policy(remaining_states)
    return remaining_policy

We can now construct a policy for a given Job Scheduling Problem based on our Neural Network and the policy it induces.<br>

Among other aims, we want to train our Neural Network on estimated data sets with higher numbers of Jobs. However, our Neural Network has already been trained on the ground-truth Q-values of Job Scheduling Problems with 8 Jobs and 4 Machines, and we assume these weights, learned in a supervised fashion, are very likely superior to any we could obtain with reinforcement learning techniques. Consequently, we switch to them as soon as a Job Scheduling Problem has no more than 8 remaining Jobs.

In [None]:
def create_policy(NN, Sup_Tar_NN, state, opt_end=True, n_min=3, m_min=1):
    
    state_list = []
    action_list = []

    
    n_state = sum(state.jobs_remaining) #get job number of state
    m_state = sum(state.machines_on_duty) #get machine number of state
    
    #minimum number of jobs and machines depends on whether optimal schedule shall be computed in the end or not
    n_min = max(n_min*opt_end,1)
    m_min = m_min*opt_end
    
    while n_state >= n_min and (m_state >= m_min or n_state > 8):
        
        #create data input for NN. Saved under state.input
        if not state.input:
            seq_data(state)
        
        #use uptrained target network when more than 8 Jobs remain
        if n_state > 8:
            action, job = act(NN, state)
        #use supervised target network when 8 or less Jobs remain
        else:
            action, job = act(Sup_Tar_NN, state)
        
        #currently free machine as object
        machine = list_machines[state.machine]
        
        #if job shall get assigned
        if job:
            next_state = assign_job(state, job, machine)
        #if machine shall get turned off
        else:
            next_state = turn_off_machine(state, machine)
        
        action_list.append((action,state.machine))
        state_list.append(state)
        
        #go to successor state
        state = next_state
        n_state = sum(state.jobs_remaining) #get job number of state
        m_state = sum(state.machines_on_duty) #get machine number of state
        
    #if desired, compute optimal policy for end of schedule
    if opt_end:
        remaining_policy = compute_remaining_policy(state)
        action_list += remaining_policy[0]
        state_list += remaining_policy[1]
    #elsewise, append final state
    else:
        state_list.append(state)
        
    return action_list, state_list
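
To see which states are handled by a network and which are left to the exhaustive computation, the loop condition of `create_policy` can be isolated. The following sketch copies that condition into two small hypothetical helpers, `network_decides` and `pick_network`; the threshold of 8 Jobs is the one used in the cell above.

```python
def network_decides(n_state, m_state, opt_end=True, n_min=3, m_min=1):
    """Mirror the while-condition of create_policy for a single state."""
    n_min = max(n_min * opt_end, 1)
    m_min = m_min * opt_end
    return n_state >= n_min and (m_state >= m_min or n_state > 8)


def pick_network(n_state, uptrained="uptrained", supervised="supervised"):
    """Uptrained network above 8 remaining jobs, supervised network otherwise."""
    return uptrained if n_state > 8 else supervised


for n_state in (10, 8, 3, 2):
    print(n_state, network_decides(n_state, m_state=4), pick_network(n_state))
```

With the default arguments, a state with 2 remaining Jobs is no longer decided by a network, while setting `opt_end=False` keeps the network in charge until no Job is left.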

Having a policy, we can determine the costs its induced schedule produces.

In [None]:
def policy_costs(policy):
    pol_costs = sum(state.costs for state in policy[1])
    return pol_costs

### Estimate New Data

Since we can now create a schedule with our Neural Network, we can use it to estimate new data with higher numbers of Jobs.<br>
We have already trained our Neural Network on Job Scheduling Problems with 8 Jobs and 4 Machines. We will call this network the Supervised Target Network. The idea is to estimate data for problems with 9 Jobs and 4 Machines. To do this, we randomly create such problem environments. Then, for each of the 9 possible Job assignments in the initial state, we compute the transition costs and the successor state. Since this successor state only has 8 pending Jobs, we can estimate the remaining optimal schedule costs from there with our Neural Network: we create the schedule it considers optimal and calculate the corresponding costs.<br>
For the action of turning off a Machine, we compute the transition costs and the corresponding successor state as well. From there we continue iteratively until we reach the state with 9 Jobs and 1 working Machine. In this state, deactivating the currently free Machine is no longer an option, so the set of feasible actions consists of the 9 Job assignments. Proceeding as described above, we can compute the estimated scheduling costs for each of these actions. Recursively, we can then compute all these costs for the states with 9 Jobs and 2 Machines, 9 Jobs and 3 Machines, and finally 9 Jobs and 4 Machines.

We can then use this data to train our network, obtaining a new Target Network, and iteratively estimate data for increasing numbers of Jobs. To estimate the optimal actions whenever more than 8 Jobs are left, we use this uptrained Target Network. As soon as the number of Jobs drops to 8 or fewer, we use the Supervised Target Network. Whenever we obtain estimated data for a higher number of Jobs than before, we uptrain the Target Network.

We will first give a function to estimate the costs of every Job assignment in the way described above.

In [None]:
#estimate action costs for every job assignment with Neural Network
def estim_assignment_costs(Target_NN, Sup_Tar_NN, state):
    for action, job in enumerate(list_jobs):
        #currently free Machine as object
        machine = list_machines[state.machine]
        #create successor state for every job assignment
        successor_state = assign_job(state,job,machine)
        #get transition costs
        trans_costs = successor_state.costs
        #estimate optimal policy from there with uptrained and supervised target NN
        policy = create_policy(Target_NN, Sup_Tar_NN, successor_state)
        #get the associated estimated optimal future costs
        future_costs = policy_costs(policy)
        #estimated action values are transition costs + estimated optimal future costs
        state.Qvalues[action] = trans_costs + future_costs

As stated, by computing all successor states that result from turning off Machines and applying the previous function to each of them to estimate the action values of the Job assignments, we also obtain the estimated Q-values for deactivating the currently free Machine.

In [None]:
#estimate all Q-values
def estim_Qvalues(Target_NN, Sup_Tar_NN, state):
    
    #estimate Q-values for all job assignments of initial state
    estim_assignment_costs(Target_NN, Sup_Tar_NN, state)
    #number of Machines of current state
    m_state = sum(state.machines_on_duty)
    #as long as more than 1 Machine is active, shutting it down is a feasible action
    if m_state > 1:
        #currently free Machine as object
        machine = list_machines[state.machine]
        #compute successor state of turning it off
        turn_off_state = turn_off_machine(state, machine)
        #estimate all Q-values
        estim_Qvalues(Target_NN, Sup_Tar_NN, turn_off_state)
        #transition costs to this successor state 
        trans_costs = turn_off_state.costs
        #recursively define Q-value of shutting down free Machine
        state.Qvalues[-1] = trans_costs + min(turn_off_state.Qvalues)
    else:
        #if only 1 Machine is active, only Job assignments are feasible actions
        state.Qvalues = state.Qvalues[:-1]
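
The recursion over machine shutdowns may be easier to follow with concrete numbers. The sketch below replaces the network estimates by made-up values: `assign_q` and `off_cost` are toy inputs, and `estimate_q` is a hypothetical stand-in for `estim_Qvalues` that returns a list instead of writing to a state object.

```python
def estimate_q(m_state, assign_q, off_cost):
    """Return toy Q-values for a state with `m_state` active machines.

    assign_q[m] : made-up estimates for the job assignments with m machines
    off_cost[m] : made-up transition cost of switching off one of m machines
    """
    q_values = list(assign_q[m_state])
    if m_state > 1:
        # Shutdown value = transition cost + best value in the successor state.
        after_off = estimate_q(m_state - 1, assign_q, off_cost)
        q_values.append(off_cost[m_state] + min(after_off))
    return q_values


assign_q = {1: [10.0, 12.0], 2: [7.0, 9.0], 3: [8.0, 6.5]}
off_cost = {2: 0.5, 3: 0.25}
print(estimate_q(3, assign_q, off_cost))  # [8.0, 6.5, 7.25]
```

The last entry, 7.25, is the shutdown cost 0.25 plus the best value 7.0 available with two machines. With a single machine the list has no shutdown entry at all.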

Being able to estimate the Q-values, we can now estimate new data. We want to create it for the states with 9 Jobs and 4, 3 and 2 Machines.

In [None]:
#estimate data of increased number of Jobs
def estim_data(data_dictionary, Target_NN, Sup_Tar_NN, init_state):
    #initial state represents Job Scheduling Problem
    state = init_state
    #start with m Machines
    m_state = m
    #estimate Q_values for this state and all machine turn off successor states
    estim_Qvalues(Target_NN, Sup_Tar_NN, state)
    #list of states whose data we estimate
    states = []
    while m_state>1:
        #compute input data of state
        state_input(state)
        #add to dictionary
        data_dictionary[(n,m_state)][0].append(state.input)
        #compute estimated target data of state 
        state_target(state)
        #add to dictionary
        data_dictionary[(n,m_state)][1].append(state.target[2])
        #transition to successor state by turning off Machine
        state = state.transition_dic[(n,state.machine)]
        #update number of working Machines
        m_state -= 1

We want to do this for several simulated Job Scheduling Problems and create a dictionary of estimated data from them.

In [None]:
#create dictionary of estimated data for simulated problems of increased job number
def create_estim_data(Target_NN, Sup_Tar_NN, num_data):
    #init dictionary for increased number of jobs and 2 up to m machines
    data_dictionary = dict(((n,m_state),[[],[]]) 
                       for m_state in range(2,m+1))
    #num_data stands for number of job scheduling problems we want to simulate to estimate the corresponding data
    for _ in range(num_data):
        #create random list of n jobs and m machines
        create_JS_environment()
        #initial state represents this job scheduling problem
        init_state = create_initial_state()
        #estimate its data
        estim_data(data_dictionary, Target_NN, Sup_Tar_NN, init_state)
        
    return data_dictionary
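
The layout of the resulting dictionary can be inspected in isolation. This sketch rebuilds only the empty structure for 9 Jobs and 4 Machines; `init_data_dictionary` is a hypothetical helper, and the strings are placeholders for the real input and target arrays.

```python
def init_data_dictionary(n, m):
    """One entry per (jobs, machines) pair: [list of inputs, list of targets]."""
    return {(n, m_state): [[], []] for m_state in range(2, m + 1)}


data = init_data_dictionary(n=9, m=4)
data[(9, 4)][0].append("input placeholder")
data[(9, 4)][1].append("target placeholder")
print(sorted(data))
```

There is no key for a single Machine, which matches the loop in `estim_data` that stops once only one Machine is left.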

Now, we only need a function to save this dictionary.

In [None]:
#save estimated dictionary
def save_estim_dictionary(data_dictionary, path, data_ind):
    #save it under path and give the data an index
    with open(os.path.join(path, f'estim_data_{n}_Jobs_{data_ind}.pickle'), 'wb') as f:
        pickle.dump(data_dictionary, f, pickle.HIGHEST_PROTOCOL)
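
A saved dictionary can be read back with `pickle.load`. The following round trip is a minimal sketch using a temporary directory and a toy dictionary; `save_and_reload` and the file name are assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_and_reload(data_dictionary, directory, file_name):
    """Write the dictionary to disk and read it back again."""
    file_path = os.path.join(directory, file_name)
    with open(file_path, "wb") as f:
        pickle.dump(data_dictionary, f, pickle.HIGHEST_PROTOCOL)
    with open(file_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    toy = {(9, 4): [[1, 2], [0.3]]}
    restored = save_and_reload(toy, tmp_dir, "estim_data_9_Jobs_0.pickle")

print(restored == toy)
```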

### Compare Schedules

We need a measure to evaluate how good the schedules produced by our Neural Network are.<br>
For up to 8 Jobs, we can compare them to the optimal costs. However, for higher numbers of Jobs this becomes too expensive. Therefore, we define a competitive heuristic scheduling algorithm. As for the Neural Network, we give the option to compute the optimal schedule by brute force as soon as fewer than 3 Jobs remain, or only 1 Machine is still working while no more than 8 Jobs remain.

In [None]:
#heuristic algorithm as comparison
def comparative_metric(initial_state, opt_end=True, n_min=3, m_min=1):
    state_list = []
    action_list = []
    state = initial_state
    
    n_state = sum(state.jobs_remaining) #get job number of state
    m_state = sum(state.machines_on_duty) #get machine number of state
    
    #minimum number of jobs and machines depends on whether optimal schedule shall be computed in the end or not
    n_min = max(n_min*opt_end,1)
    m_min = m_min*opt_end
    
    #loop through schedule until finished
    while n_state >= n_min and (m_state >= m_min or n_state > 8):
        
        #create permutations
        if not state.input:
            seq_data(state)
            
        #list of remaining jobs
        remaining_jobs = [job for job in list_jobs if state.jobs_remaining[job.index] == 1]
        #currently free machine as object
        machine = list_machines[state.machine]
        #we will see if we assign a job in this situation
        job_assigned = False
        
        #jobs get sorted first
        for action in state.permutation[0]:
            #iteratively try all jobs until one fits
            job = remaining_jobs[action]
            #processing time on the currently free machine
            proc_time = job.processing_time[state.machine]
            #time until the job would be finished on any other machine is their occupation time + the processing time
            alt_times = [state.machine_runtimes[i] + job.processing_time[i] for i in range(m) if state.machines_on_duty[i]==1]
            #if job would not finish earlier on any other machine:
            if not min(alt_times) < proc_time:
                #assign job to currently free machine
                next_state = assign_job(state, job, machine)
                #a job got assigned
                job_assigned = True
                #stop at the first job that fits, so that action and next_state belong together
                break
                         
        #if no job got assigned, machine has to be turned off
        if not job_assigned:
            next_state = turn_off_machine(state, machine)
            action = len(remaining_jobs)
        
        action_list.append((action,state.machine))
        state_list.append(state)
        
        state = next_state
        n_state = sum(state.jobs_remaining) #get job number of state
        m_state = sum(state.machines_on_duty) #get machine number of state
        
    
    #if desired, compute optimal policy for end of schedule
    if opt_end:
        remaining_policy = compute_remaining_policy(state)
        action_list += remaining_policy[0]
        state_list += remaining_policy[1]
    #elsewise, append final state
    else:
        state_list.append(state)
        
    return action_list, state_list
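
The assignment test at the core of the heuristic can be checked on plain lists. In this sketch `finishes_no_later_here` is a hypothetical helper that reproduces the comparison from the cell above; the processing times and runtimes are made up.

```python
def finishes_no_later_here(job_times, machine_runtimes, machines_on_duty, free_machine):
    """True if no active machine would finish the job earlier than the free one.

    job_times[i]        : processing time of the job on machine i
    machine_runtimes[i] : time until machine i becomes free
    machines_on_duty[i] : 1 if machine i is still active, else 0
    """
    proc_time = job_times[free_machine]
    alt_times = [machine_runtimes[i] + job_times[i]
                 for i in range(len(job_times)) if machines_on_duty[i] == 1]
    return not min(alt_times) < proc_time


# Machine 1 is busy for 2 more time units but handles the job in 1 unit.
print(finishes_no_later_here([4, 1], [0, 2], [1, 1], free_machine=0))
# Same situation, but machine 1 has been turned off.
print(finishes_no_later_here([4, 1], [0, 2], [1, 0], free_machine=0))
```

In the first call the job would be done after 3 time units on machine 1 instead of 4 on the free machine, so it is not assigned. In the second call machine 1 is no longer available and the job is assigned.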

We now give a function to compare the scheduling costs of several scheduling approaches over a number of Job Scheduling Problems.<br>
We can set the following options:

| option | meaning |
| :---: | :---: |
| comp_opt_pol | Do we want to compute the optimal policy? |
| always_comp_opt_end | Do we only want to use the versions of the scheduling algorithms with the optimal end computed, or both versions? |
| use_uptr_tar_NN | Do we want to use an uptrained target Network (from estimated data of higher job numbers) as well? |

The number of Job Scheduling Problems we want to simulate is then given by <i>num_schedules</i>.

In [None]:
#compare scheduling costs. NN is the uptrained target NN
def compare_schedules(NN, Sup_Tar_NN, num_schedules, comp_opt_pol=False, always_comp_opt_end=True, use_uptr_tar_NN=False):
    
    #we can compute at most 7 types of policies
    avg_cost_ratios = np.array([0]*7,dtype=np.float64)
    
    for _ in range(num_schedules):
        #create random problem environment
        create_JS_environment()
        #create job scheduling problem from it
        initial_state = create_initial_state()
        #init list of scheduling costs
        scheduling_costs = [0]*7
        
        #if optimal schedule shall be computed
        if comp_opt_pol == True:
            #create all states
            all_states = create_all_states()
            #compute all Q-values
            backtracking(all_states)
            #policy
            opt_policy = optimal_policy(all_states[0])
            #costs
            scheduling_costs[0] = policy_costs(opt_policy)
            
        #policy and costs for Supervised NN policy with end being optimally computed
        Sup_Tar_NN_policy_opt = create_policy(Sup_Tar_NN, Sup_Tar_NN, initial_state)
        scheduling_costs[1] = policy_costs(Sup_Tar_NN_policy_opt)

        #if we do not always want to compute the optimal end
        if always_comp_opt_end == False:
            #policy and costs for Supervised NN policy without end being optimally computed
            Sup_Tar_NN_policy = create_policy(Sup_Tar_NN, Sup_Tar_NN, initial_state, opt_end=False)
            scheduling_costs[2] = policy_costs(Sup_Tar_NN_policy)

        #if we want to use an uptrained NN as well
        if use_uptr_tar_NN == True:
            
            #policy and costs for Uptrained NN policy with end being optimally computed
            Uptr_NN_policy_opt = create_policy(NN, Sup_Tar_NN, initial_state)
            scheduling_costs[3] = policy_costs(Uptr_NN_policy_opt)

            #if we do not always want to compute the optimal end
            if always_comp_opt_end == False:
                #policy and costs for Uptrained NN policy without end being optimally computed
                Uptr_NN_policy = create_policy(NN, Sup_Tar_NN, initial_state, opt_end=False)
                scheduling_costs[4] = policy_costs(Uptr_NN_policy)
        
        #policy and costs for comparative metric algorithm with end being optimally computed
        comp_metric_policy_opt = comparative_metric(initial_state)
        scheduling_costs[5] = policy_costs(comp_metric_policy_opt)
        
        #if we do not always want to compute the optimal end
        if always_comp_opt_end == False:
            #policy and costs for comparative metric algorithm without end being optimally computed
            comp_metric_policy = comparative_metric(initial_state, opt_end=False)
            scheduling_costs[6] = policy_costs(comp_metric_policy)
        

        scheduling_costs = np.array(scheduling_costs)
        #scale either by optimal costs or by costs produced by supervised target NN with optimal end
        if comp_opt_pol == True:
            avg_cost_ratios += scheduling_costs / scheduling_costs[0]
        else:
            avg_cost_ratios += scheduling_costs / scheduling_costs[1]
    
    #scale by number of simulated job scheduling problems
    avg_cost_ratios /= num_schedules
    
    #print results
    print(f"{n} Jobs and {m} Machines average costs ratio for {num_schedules} Problems:")
    policy_names = ["Optimal Policy", "Supervised Network with optimal End", "Supervised Network", "Uptrained Network with optimal End", "Uptrained Network", "Heuristic Algo with optimal End", "Heuristic Algo"]
    for i, avg_cost_ratio in enumerate(avg_cost_ratios):
        if avg_cost_ratio > 0:
            print(round(avg_cost_ratio,2), "-", policy_names[i])
    print("")
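
The averaging step at the end of `compare_schedules` can be reproduced on toy numbers. The sketch below assumes one row of scheduling costs per simulated problem and uses the hypothetical helper `average_cost_ratios`; a cost of 0 marks a policy that was not computed, as in the function above.

```python
import numpy as np


def average_cost_ratios(cost_rows, reference_index):
    """Scale every row by its reference entry and average over all rows."""
    cost_rows = np.asarray(cost_rows, dtype=np.float64)
    ratios = cost_rows / cost_rows[:, [reference_index]]
    return ratios.mean(axis=0)


# Two toy problems, three policies; policy 0 was not computed.
costs = [[0.0, 10.0, 12.0],
         [0.0, 20.0, 22.0]]
print(average_cost_ratios(costs, reference_index=1))
```

Here the third policy is on average 15 percent more expensive than the reference policy in column 1, while the skipped policy keeps a ratio of 0 and would therefore not be printed.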