# Data Creation

## Notebook Description

For a given set of states, corresponding to the decision time points of schedules for a Job Scheduling Problem, we want to extract information about each state as data that satisfies the Markovian requirements and can be fed to a Neural Network.<br> 
The corresponding code is given in this notebook.

## Code

In [None]:
import numpy as np
import pickle
import os
import random
import time
#import necessary notebooks
import import_ipynb
#from Jobs_and_Machines import *
#from States_and_Policies import *
from Global_Variables import *

### Input

First, we create the Input. For any state, it consists of the remaining Jobs and Machines. Due to the Markovian requirements, we do not consider past Jobs or Machines. For the same reason, we do not pass the time explicitly; instead, we give this information implicitly by defining the earliness of every remaining Job and Machine as the time left until its deadline, i.e. as the maximum of zero and its deadline minus the current time.<br>
For every pending Job, we pass its processing times on the working Machines, its earliness and its weight. The Jobs are sorted in ascending order of their processing time on the currently free Machine.<br>
For every working Machine, we pass its remaining occupation time, its earliness and its weight. The Machines are sorted in ascending order of their remaining occupation time, which puts the currently free Machine in first place.<br>
If two Jobs or Machines are equal with regard to the respective sorting rule, their order is given by the original order of the Jobs and Machines, which by construction is already sorted by weight and deadline.<br>
Moreover, the input is normalized. Every weight is normalized statically by dividing it by the maximum allowed weight. The Job processing times, Machine occupation times and all earlinesses are normalized dynamically with regard to their state, i.e. they are divided by the greatest time-related value occurring in that state. The permutations applied to the Jobs and Machines are stored as a state attribute as well.
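To make the earliness and the two kinds of normalization concrete, here is a minimal sketch on a toy state with made-up values. It works on plain NumPy arrays instead of the notebook's Job and Machine objects; `current_time`, `max_weight` and all arrays are hypothetical stand-ins, not values from the notebook's data.

```python
import numpy as np

# Hypothetical toy state (values are made up for illustration)
current_time = 4.0
max_weight = 10.0  # assumed maximum allowed weight

# Remaining Jobs: processing times on the two working Machines, deadlines, weights
job_proc_times = np.array([[3.0, 5.0],
                           [2.0, 6.0]])
job_deadlines = np.array([10.0, 3.0])
job_weights = np.array([4.0, 8.0])

# Working Machines: remaining occupation times, deadlines, weights
machine_runtimes = np.array([0.0, 2.0])
machine_deadlines = np.array([12.0, 20.0])
machine_weights = np.array([5.0, 1.0])

# Earliness = time left until the deadline, clipped at zero
job_earliness = np.maximum(job_deadlines - current_time, 0.0)          # [6., 0.]
machine_earliness = np.maximum(machine_deadlines - current_time, 0.0)  # [8., 16.]

# Dynamic scale: the greatest time-related value occurring in this state
max_time = max(job_proc_times.max(), job_earliness.max(),
               machine_runtimes.max(), machine_earliness.max())        # 16.0

# Time-related columns are scaled dynamically, weights statically
jobs_norm = np.column_stack([job_proc_times / max_time,
                             job_earliness / max_time,
                             job_weights / max_weight])
machines_norm = np.column_stack([machine_runtimes / max_time,
                                 machine_earliness / max_time,
                                 machine_weights / max_weight])
print(jobs_norm)
print(machines_norm)
```

Because every time-related value is divided by the same state-wise maximum, all entries end up in the interval [0, 1] while their ratios, and thus the relative timing information, are preserved.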

In [None]:
#take information of state to create normalized data for Neural Network
def state_input(state):
    
    #remaining jobs: processing times on working machines, earliness, normalized weight
    jobs_data = np.asarray([[proc_time for i, proc_time in enumerate(job.processing_time)
                             if state.machines_on_duty[i] == 1]
                            + [max(job.deadline-state.time,0),
                               job.weight/max_weight]
                            for job in list_jobs if state.jobs_remaining[job.index] == 1], dtype=np.float32)
    
    #working machines: remaining runtime, earliness, normalized weight
    machines_data = np.asarray([[state.machine_runtimes[machine.index],
                                 max(machine.deadline-state.time,0),
                                 machine.weight/max_weight]
                                for machine in list_machines if state.machines_on_duty[machine.index] == 1], dtype=np.float32)
    
    #greatest time-related value of the state (:-1 because the last column is the weight)
    max_time = max(np.max(jobs_data[:,:-1]), np.max(machines_data[:,:-1]))
    
    #normalize time-related data dynamically
    jobs_data[:,:-1] /= max_time
    machines_data[:,:-1] /= max_time
    
    #sort machines by remaining runtime (stable sort, so ties keep their original order)
    machines_perm = machines_data[:,0].argsort(kind='stable')
    m_state = len(machines_perm)
    #reorder the processing-time columns of the jobs by the new order of machines
    jobs_data[:,:m_state] = jobs_data[:,machines_perm]
    #sort jobs by processing time on the currently free machine (now column 0), again stable
    jobs_perm = jobs_data[:,0].argsort(kind='stable')
    jobs_data = jobs_data[jobs_perm]
    machines_data = machines_data[machines_perm]
    
    #store input and permutations as state attributes
    state.input = [jobs_data, machines_data]
    state.permutation = [jobs_perm, machines_perm]
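The sorting step is the least obvious part of `state_input`. The following sketch replays it on small, already normalized toy arrays (hypothetical values: three pending Jobs and two working Machines, of which the second one is free), so that the resulting permutations can be checked by hand.

```python
import numpy as np

# Hypothetical normalized data: 3 pending Jobs, 2 working Machines (A, B)
# Job columns: proc. time on A, proc. time on B, earliness, weight
jobs_data = np.array([[0.50, 0.20, 0.9, 0.1],
                      [0.30, 0.60, 0.4, 0.5],
                      [0.40, 0.10, 0.7, 0.3]], dtype=np.float32)
# Machine columns: remaining occupation time, earliness, weight
# Machine B (row 1) is free, i.e. its remaining occupation time is 0
machines_data = np.array([[0.25, 1.0, 0.2],
                          [0.00, 0.8, 0.6]], dtype=np.float32)

m_state = machines_data.shape[0]

# Order Machines by remaining occupation time; the free Machine comes first
machines_perm = machines_data[:, 0].argsort(kind="stable")  # [1, 0]
# Reorder every Job's processing-time columns in the same way
jobs_data[:, :m_state] = jobs_data[:, machines_perm]
# Order Jobs by their processing time on the free Machine (now column 0)
jobs_perm = jobs_data[:, 0].argsort(kind="stable")          # [2, 0, 1]
jobs_data = jobs_data[jobs_perm]
machines_data = machines_data[machines_perm]
print(jobs_data)
print(machines_data)
```

Machine B moves to the front, so every Job's processing time on it becomes column 0, and the Jobs are then ordered by that column. Using a stable sort guarantees that ties keep the order in which the Jobs and Machines were passed in.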

We also provide a function, used in later notebooks, that merges the information about the Jobs and Machines.<br>
For every Job, the processing times are split from its earliness and weight. Then, each processing time is extended by the full information of the associated Machine, i.e. its occupation time, earliness and weight.<br>
As a result, every Job consists of
- a vector of dimension <i>2</i>, consisting of the normalized earliness and weight
- a sequence of length <i>m_state</i>, each element being a vector of dimension <i>4</i> that consists of the processing time and the associated Machine information (i.e. an array of shape <i>(m_state x 4)</i>)

where <i>m_state</i> stands for the number of working Machines in the respective state. This sequence can later be fed into an LSTM to create an embedded representation of fixed dimension for each Job.
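The following sketch shows the resulting layout on random toy arrays (hypothetical shapes with `n_state = 2` and `m_state = 3`). It first repeats the `np.insert` construction used in `seq_data`, then builds the same tensor by broadcasting each Machine row next to the matching processing time. The broadcasting variant is an alternative formulation, not the notebook's method, and serves only as a cross-check.

```python
import numpy as np

n_state, m_state = 2, 3
rng = np.random.default_rng(0)
# Hypothetical sorted inputs: jobs (n_state, m_state + 2), machines (m_state, 3)
jobs_data = rng.random((n_state, m_state + 2)).astype(np.float32)
machines_data = rng.random((m_state, 3)).astype(np.float32)

# As in seq_data: insert each Machine's 3 values right after its processing time
idxs = [ind + 1 for ind in range(m_state) for _ in range(3)]
data = np.insert(jobs_data, idxs, machines_data.flatten(), axis=1)
resource_info = data[:, :-2].reshape((n_state, m_state, 4))
urgency_info = data[:, -2:]

# Alternative: pair every processing time with the row of its Machine
proc_times = jobs_data[:, :m_state, None]                            # (n_state, m_state, 1)
machine_block = np.broadcast_to(machines_data, (n_state, m_state, 3))
resource_alt = np.concatenate([proc_times, machine_block], axis=2)   # (n_state, m_state, 4)
print(resource_info.shape, urgency_info.shape)
```

Each Job thus sees the same Machine sequence, differing only in the first feature (its own processing time on that Machine), which is what makes the sequence suitable for a shared LSTM encoder.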

In [None]:
#create input data of state in sequential form (for LSTM)
def seq_data(state):
    
    #create the input
    state_input(state)
    #number of jobs and machines in given state
    n_state = sum(state.jobs_remaining)
    m_state = sum(state.machines_on_duty)
    
    #insertion indices: each Machine's three values (runtime, earliness, weight) go right after its processing-time column
    idxs = [ind+1 for ind in range(m_state) for _ in range(3)] #index ind+1 is repeated three times for Machine ind
    #insert the machine information into every job's row
    data = np.insert(state.input[0],idxs,state.input[1].flatten(), axis=1)
    
    #For every job, the machine environment is the time-series of resources successively becoming available 
    resource_info = data[:,:-2].reshape((n_state,m_state,4))
    #For every job, its earliness and weight denotes its urgency
    urgency_info = data[:,-2:]
    
    state.input = [resource_info, urgency_info]

### Target

Next, we create the Target Values for every state. These correspond to the Q-values. If there are <i>n_state</i> pending Jobs in a state <i>s</i>, the <i>j</i>-th entry denotes the Q-value <i>Q<sup>*</sup>(s,j)</i> of assigning Job <i>j</i> to the currently free Machine, while the <i>(n_state+1)</i>-th entry stands for the Q-value of deactivating that Machine. The Job entries are reordered by the same permutation as the Jobs in the input.<br>
The Target Values are then normalized dynamically with regard to their state as well. For this, they are divided by the smallest (and therefore optimal) Q-value, and the inverse of the resulting quotient is taken. Consequently, all Target Values lie between 0 and 1, with the optimal action receiving the value 1 (provided that all Q-values are strictly positive).
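A small worked example of the target forms for a single state, using hypothetical Q-values for three Jobs followed by the deactivation action. Since the Q-values are costs here, the normalized target is the minimum Q-value divided by each Q-value: it equals 1 for the optimal action and is smaller for all others.

```python
import numpy as np

# Hypothetical Q-values (costs): three Jobs, then deactivating the Machine
q_values = np.array([12.0, 8.0, 10.0, 16.0], dtype=np.float32)

# One-hot vector of the optimal (minimal-cost) action
one_hot = np.eye(q_values.shape[0], dtype=np.float32)[np.argmin(q_values)]
# Divide by the minimum, then invert: min / Q
normalized = np.min(q_values) / q_values   # 8/12, 8/8, 8/10, 8/16
print(one_hot, normalized)
```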

In [None]:
#create normalized target for states as data for Neural Network
def state_target(state):
    
    #number of remaining jobs
    n_state = sum(state.jobs_remaining)
    #Q-values of all feasible actions (remaining jobs first, machine deactivation last)
    target = np.array([qvalue for qvalue in state.Qvalues if qvalue is not None], dtype=np.float32)
    #reorder the job entries by the input permutation; the deactivation entry, if feasible, stays last
    target[:n_state] = target[state.permutation[0]]
    #in case that different NN approaches shall be tested, we save the targets in different forms
    state.target = [target, #raw Q-values
                    np.eye(target.shape[0], dtype=np.float32)[np.argmin(target)], #one-hot vector of optimal action
                    np.min(target)/target] #normalized targets: minimum Q-value divided by each Q-value

### Data Dictionary

We now give a function that creates a dictionary containing the data of the states of a given Job Scheduling Problem.<br>
The keys are the 2-tuples (<i>n_state</i>,<i>m_state</i>) of the number of remaining Jobs <i>n_state</i> and working Machines <i>m_state</i> of each state. Only states with at least 3 remaining Jobs and 2 working Machines are considered.<br>
To avoid unnecessary computational costs and to balance the training data set, we do not create data for every state. Instead, for every (<i>n_state</i>,<i>m_state</i>)-combination (i.e. for every key) there is an upper limit of states, denoted by the variable <i>data_points_max</i>. Among these, for every feasible action <i>1 &leq; a &leq; n_state+1</i>, we select at most <i>data_points_max/(n_state+1)</i> states in which action <i>a</i> is optimal.<br>

We do not wish to balance the validation and test sets. To create them, set <i>training=False</i>.
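The selection rule can be isolated from the state objects. The sketch below is a hypothetical helper, `select_balanced`, that applies the limits described above to a stream of `(n_state, m_state, opt_action, n_actions)` tuples. It assumes the optimal action and the number of feasible actions of every state are already known, and it is not part of the notebook.

```python
from collections import defaultdict

def select_balanced(samples, data_points_max, training=True):
    """Hypothetical helper: return the indices of the samples that would be kept.

    samples: iterable of (n_state, m_state, opt_action, n_actions) tuples,
    where n_actions is the number of feasible actions of the state.
    """
    per_action = defaultdict(int)  # (n_state, m_state, opt_action) -> count
    per_key = defaultdict(int)     # (n_state, m_state) -> count
    kept = []
    for idx, (n_state, m_state, opt_action, n_actions) in enumerate(samples):
        if training:
            # balanced: every optimal action gets an equal share of the per-key limit
            if per_action[(n_state, m_state, opt_action)] >= data_points_max / n_actions:
                continue
        elif per_key[(n_state, m_state)] >= data_points_max:
            # unbalanced: only the per-key limit applies
            continue
        per_action[(n_state, m_state, opt_action)] += 1
        per_key[(n_state, m_state)] += 1
        kept.append(idx)
    return kept

# Five states where action 0 is optimal, then five where action 1 is optimal
samples = [(3, 2, 0, 4)] * 5 + [(3, 2, 1, 4)] * 5
print(select_balanced(samples, 8))
print(select_balanced(samples, 8, training=False))
```

With `data_points_max = 8` and four feasible actions, each optimal action may occur at most twice per key in the balanced case, whereas the unbalanced case keeps the first eight states of the key.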

In [None]:
#create data
def create_data(all_states, data_points_max, training=True, save=False):
    
    #measure start time
    st = time.time()
    
    #minimum number of jobs and machines a state must have for its data to be saved
    n_min = 3
    m_min = 2
    
    #every value is a tuple consisting of an inputs list and a targets list
    data_dictionary = dict(((n_state,m_state),([],[])) 
                           for n_state in range(n_min,n+1) for m_state in range(m_min,m+1))
    #counter of how many data points already exist in which action i is optimal (i = n_state: machine shut down)
    data_points_counter = dict(((n_state,m_state,i),0) 
                               for n_state in range(n_min,n+1) for m_state in range(m_min,m+1) for i in range(n_state+1))
    
    #for every machine, the order of all jobs by their processing time on it (stable, like in state_input)
    #max_runtime+1 is appended as last entry, so that index n (= turning off the machine) is always last
    permutations = [np.argsort([job.processing_time[i] for job in list_jobs]+[max_runtime+1], kind='stable') for i in range(m)]
    
    #create data for states
    for state in all_states:
        #key
        n_state = sum(state.jobs_remaining)
        m_state = sum(state.machines_on_duty)
        if n_state >= n_min and m_state >= m_min:
            #sorted Q-values of the feasible actions, reversed so that ties are resolved towards higher indices
            rev_perm_target = np.array([qvalue for qvalue in np.array(state.Qvalues)[permutations[state.machine]][::-1] if qvalue is not None])
            #index of the optimal action among the n_state jobs + machine shut down
            opt_action = len(rev_perm_target) - np.argmin(rev_perm_target) - 1
            if training:
                #balanced training data: limit the number of states per optimal action
                cond = data_points_counter[(n_state,m_state,opt_action)]
                upper_lim = data_points_max/len(rev_perm_target)
            else:
                #unbalanced test/validation data: limit only the number of states per key
                cond = len(data_dictionary[(n_state,m_state)][1])
                upper_lim = data_points_max
            #only add the state if the respective limit has not been reached yet
            if cond < upper_lim:
                #create input
                state_input(state)
                #add input to dictionary
                data_dictionary[(n_state,m_state)][0].append(state.input)
                #create target values
                state_target(state)
                #add normalized target values to dictionary
                data_dictionary[(n_state,m_state)][1].append(state.target[2])
                #update counter
                data_points_counter[(n_state,m_state,opt_action)] += 1
                
    #measure end time
    et = time.time()
    
    #tell how much time the entire process took
    print(round(et-st,2), "seconds to compute", sum(len(data_dictionary[key][0]) for key in data_dictionary), "data points.")
    
    #if desired, save the data dictionary
    if save:
        with open('data.pickle', 'wb') as f:
            pickle.dump(data_dictionary, f, pickle.HIGHEST_PROTOCOL)
            
    return data_dictionary
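`create_data` determines the optimal action on the reversed, sorted Q-values. The sketch below isolates this expression in a hypothetical helper `last_argmin` to show that it returns the *last* index attaining the minimum, so on ties the action with the higher sorted index (e.g. the machine shut down) is counted as optimal.

```python
import numpy as np

def last_argmin(values):
    # same expression as in create_data: index of the LAST occurrence of the minimum
    values = np.asarray(values)
    return len(values) - np.argmin(values[::-1]) - 1

# Hypothetical sorted Q-values with a tie; the last entry is the machine shut down
q_sorted = np.array([5.0, 3.0, 7.0, 3.0])
print(last_argmin(q_sorted), np.argmin(q_sorted))
```

Note that the one-hot target in `state_target` uses plain `np.argmin`, which picks the *first* minimum; since `create_data` only stores the normalized target `state.target[2]`, this difference does not affect the stored data.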

The following function creates the data dictionary and stores it as a pickle file, indexed by its data set and data number.

In [None]:
#create and store data dictionary
def store_data(all_states, data_points_max, DS, DN, training=True):
    
    #create data dictionary (use training=False for validation or test data)
    data = create_data(all_states, data_points_max, training=training)
    
    #index the Job Scheduling Problem: DS stands for Data Set, DN for Data Number (zero-padded)
    DS_str = str(DS).zfill(2)
    DN_str = str(DN).zfill(4)
    path = f'Data/DataSet_{DS_str}/data_{DS_str}_{DN_str}.pickle'
    #make sure the target directory exists
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
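To read a stored dictionary back, a counterpart to `store_data` could look as follows. `data_path` and `load_data` are hypothetical helpers that reproduce the path scheme above; the round trip runs in a temporary directory with a toy dictionary instead of real data.

```python
import os
import pickle
import tempfile

def data_path(DS, DN, root="Data"):
    # hypothetical helper: same naming scheme as store_data (2-digit set, 4-digit number)
    DS_str = str(DS).zfill(2)
    DN_str = str(DN).zfill(4)
    return os.path.join(root, f"DataSet_{DS_str}", f"data_{DS_str}_{DN_str}.pickle")

def load_data(DS, DN, root="Data"):
    # hypothetical counterpart to store_data: read one stored data dictionary
    with open(data_path(DS, DN, root), "rb") as f:
        return pickle.load(f)

# Round trip with a toy dictionary in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    toy = {(3, 2): ([], [])}
    path = data_path(1, 7, root=tmp)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(toy, f, pickle.HIGHEST_PROTOCOL)
    restored = load_data(1, 7, root=tmp)
print(restored)
```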