# Deploying A Schedule Building Algorithm
> Automating a manual process!

- toc:true
- badges: true
- comments: true
- author: David De Sa
- categories: [jupyter]

# Context

## Goal
Deploy an algorithm for schedulers to use that is:
-  easy to use and learn
-  transparent
-  quick
-  flexible

## Motivation
In a 24/7 manufacturing environment, the weekend shifts are covered mostly by overtime, which is scheduled according to employee availability, subject to constraints outlined in the labour collective agreement. Due to changing production needs as well as staff availability, the schedule must be re-drafted many times, often on short notice and under tight time constraints. Drafting it is tedious, error-prone, and time consuming. It could be automated.

## Challenges
-Data being manually entered in a variety of formats or not available in machine readable form. (e.g. total hours, employee type, employee availability, individualized job restrictions, outlier reasons for non-eligibility such as consecutive days worked) This is probably the main challenge!!

 - Algorithm ambiguity. The collective agreement defines the constraints that each assignment decision is subject to, but doesn't strictly specify all aspects, allowing for arbitrary choice on schedulers part
 - Many esoteric rules and edge cases around assignments being valid or not, which are also subject to change at time of contract renegotiation.
 - Usability. The deployment must be available to all schedulers, and have a very low barrier to entry w.r.t. training and usability.

## A Bit of Lore
The notion of automating the process solution has been bouncing around my head for over a year. I always felt the main challenge was the situation posed by the data... the bad formatting relegated to excel sheets, necessarily made that way from human input and usage modality not being the same as what is best for machine readability. I am very confident I could make something that worked in VBA, but the nature of that language makes it such a pain to develop with, particularly with bad formatting. I knew python was a better solution, but didn't have the bridge between the two to make something that worked. Finally when following the FastAI course I came across the HFS+Gradio wombo combo for sharing python scripts publicly via a great UI. This discovery got me to finally choose to commit to that solution path

# Solution
## Features
The program should take in various required inputs and return a completed weekend staffing schedule, with a separate text output with the sequence of assignments made. Facilitated by the Blocks functionality within Gradio/HFS will be the possibility of feeding in an assignment number and returning a partially completed schedule at that step.

## Codebase
Using Gradio hosted via HFS for making a python algorithm available with easy integration of inputs and outputs. At first blush I thought that Pandas DataFrames would be the best input mode for tabular data, but ruled that out when HFS didn't allow for bulk copying and pasting. Maybe that was for the best because this pushed me to figure out how to work with the generic File input/output mode. It might be a little more painful to program (have to define methods to identify the right tables within the excel file), but a lot nice on the end user experience (drag and drop relevant files and go!). I was concerned about the extra steps of processing the excel file, but as with everything Python there is a library for that! I started off with some basic tests to ensure that what I wanted/needed to do was possible.

### File Manipulation
Here is my proof of concept for File manipulation. If copied into an empty Gradio space on HFS, it takes in an excel file, and adds a new table to the spreadsheet. This was all i needed to know that this could be done...

In [None]:
import gradio as gr
import openpyxl as pyxl #openPyXl allows for excel file manipulation in python

def myFunction(fl,txt):
    myWb=pyxl.load_workbook(fl.name) #Load excel file
    tab = pyxl.worksheet.table.Table(displayName="Table3", ref="E1:F5") #Define new table
    style = pyxl.worksheet.table.TableStyleInfo(name="TableStyleMedium9",showRowStripes=True, showColumnStripes=True)
    tab.tableStyleInfo = style #Assign style to table
    ws=myWb.active 
    ws.add_table(tab) #Add defined table to sheet within the loaded workbook
    otpt_fl_name='try.xlsx' 
    myWb.save(otpt_fl_name) #Save file
    return otpt_fl_name #Define output for HFS interface

demo = gr.Interface(
    myFunction, #Func to take in file and text
    [
        gr.File(
        ),
        gr.Textbox(
            label="Initial text",
            lines=3,
            value="The quick brown fox jumped over the lazy dogs.",
        ),
    ],
    gr.File(),
    description="Enter refusal files",
)
demo.launch()

### Retrieving Disparate Tables

As mentioned previously, one challenge would be to pull data from tables scattered in an unpredictable way throughout the sheet. Here I had to remember that sometimes the easiest way to rob a bank is through the front door, not trying to break through the wall... I simply changed the existing excel template files (filled in by end user) so that the data tables were actually defined as 'Tables' by excel... this made them reference-able by the openPyXl tools. Some further data type transformations were required. Example with a blank book containing a trivial data table called 'tstTbl' in Excel:

In [15]:
import openpyxl as pyxl
import pandas as pd
import numpy as np
myWb=pyxl.load_workbook('../images/Other_Files/TblTestBook.xlsx') 
#Didn't think the .. parent directory would work but it does!
ws=myWb['Sheet1']
tab=ws.tables['tstTbl'] #Pull out table
def tbl_to_df(tab):
    ref=tab.ref #Pull cell reference to string for display
    tab=[[x.value for x in sublist] for sublist in ws[tab.ref]] #Convert to list of lists (each sublist as row of excel table)
    return pd.DataFrame(tab) #Convert nested lists to Dataframe
print('Table cells reference is "'+str(ref)+'":')
print(tbl_to_df(tab))

Table cells reference is "A9":
         0        1        2
0  myHead1  myHead2  myHead3
1        1        a        .
2        2        b        ,
3        3        c        ]


And pulling info from multiple tables in a sheet

In [16]:
for t in ws.tables:
    tab=ws.tables[t]
    tab=tbl_to_df(tab)
    print('Table cells reference is "'+str(ref)+'":')
    print(tab)
    print('')

Table cells reference is "A9":
         0        1        2
0  myHead1  myHead2  myHead3
1        1        a        .
2        2        b        ,
3        3        c        ]

Table cells reference is "A9":
       0      1
0  Names  Hours
1  Alice      4
2    Bob     20
3  Clark      8
4   Dave     15

Table cells reference is "A9":
         0      1
0    Names  Hours
1   Arnold      4
2     Bill     60
3  Charles     53
4     Dick     10

Table cells reference is "A9":
        0      1
0   Names  Hours
1  Arthur     24
2  Blaire     70
3   Chuck     22
4  Darryl     12



At this point I can say I am constantly resisting the urge to just run away with the coding! Trying to enforce a best practice of starting off with creating not just an abstract understanding of the problem, but a particular and specified framework in which I am operating, that is, figuring out the specific nature of the inputs I will have before I go nuts building my tower of babel! Next is to mock up a way to retrieve data when a worksheet has a single 'table' not defined in Excel. That is, manually entered data in a tabular format that due to legacy sheet formatting is not able to be defined as a native Excel Table, precluding the use of table indexing seen in the previous example... My approach assumes a known top left cell, and knowing in my framework that only certain columns will be required here. 

In [6]:
ws=myWb['Arb_Tbl']
df = pd.DataFrame(ws.values)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,,,,,,,,,
1,,,,,,,,,
2,Name,id,Attr1,Attr2,Attr3,Attr4,Attr5,Attr6,Attr7
3,Bob Back,0,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5
4,Jeff Jahl,1,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5
5,Hodge Hoss,2,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5
6,Kev Kroll,3,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5
7,Tim Tin,4,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5,=RAND()*5


Above we can see the loose table in the wild... ws.values pulls the whole sheet, which would be good, except it grabs formulas. Unfortunately, per the docs, openPyXl will never evaluate formulas! Time to work smart, not hard. I choose to simply use the 'mouse wriggle' technique shared at [this webpage](https://trumpexcel.com/convert-formulas-to-values-excel/) to manually convert formulas to values before passing my workbook into my functions. Though it *is* sad that the user experience won't be as smooth as dragging and dropping files.

In [18]:
#Reload new book with static data
myWb=pyxl.load_workbook('../images/Other_Files/TblAsValues.xlsx') 
ws=myWb['Arb_Tbl']
#1st find bottom row with data
for i in range(3,200): #Loop up to arbitrary number, prefer to have defined end for infinte loop stopgap
    ref="A"+str(i) 
    if ws[ref].internal_value==None: 
        #Condition met when end of data found.
        btmRow=i-1
        break
tab=[[x.internal_value for x in sublist] for sublist in ws['A3:I'+str(btmRow)]]
df_IdNameHours=pd.DataFrame(tab) #Assuming column I is end of useful data
print(df_IdNameHours)

            0   1         2         3         4         5         6         7  \
0        Name  id     Attr1     Attr2     Attr3     Attr4     Attr5     Attr6   
1    Bob Back   0  0.666338  0.778057  4.822291  1.632892  2.374052  2.922761   
2   Jeff Jahl   1  3.865002  4.639129  4.989458  3.552441  0.809309   0.61727   
3  Hodge Hoss   2  3.088418  0.360548  0.416228  1.700045  3.059734  0.443541   
4   Kev Kroll   3  3.466984  3.967881  2.390047   0.40636   4.68963  0.376537   
5     Tim Tin   4  2.910975  1.203692  0.146427  1.411585  0.963408  0.467773   

          8  
0     Attr7  
1   3.91101  
2  4.264042  
3  4.720742  
4  1.797408  
5  0.369525  


Of course, the actual code deployed is more complex than this... In particular is the case of converting a data table indicating who is trained on what in human readable form to one that is more machine readable. The existing process has a table with one row for each staff person, with one column for each job, and a 1 or 0 if the person is trained or not

In [20]:
ws=myWb['Skills_Matrix']
dataArr=np.array(pd.DataFrame(ws.values)) #Convert data table (skill matrix format) into data table (skills record format)
skills=[] #Initiate new container
for individual in dataArr[1:]: #iterate over all data rows
    for skl in range(1,len(individual)): #iterate over indices not containing the name
        if individual[skl]==1:
            skills.append([individual[0],dataArr[0][skl]])
dataArr=pd.DataFrame(skills)
print(tbl_to_df(ws.tables['Skills_Mtx']))
print('')
print(dataArr)

         0     1       2     3     4       5
0  Column1  Brew  Filter  Pack  Ship  Manage
1   Alfred     1       0     0     1       0
2     Bill     0       1     0     0       1
3    Chris     1       1     1     0       0
4    Dante     1       0     0     0       1
5    Edgar     0       0     1     1       0

         0       1
0   Alfred    Brew
1   Alfred    Ship
2     Bill  Filter
3     Bill  Manage
4    Chris    Brew
5    Chris  Filter
6    Chris    Pack
7    Dante    Brew
8    Dante  Manage
9    Edgar    Pack
10   Edgar    Ship


Now that data tables are removed from excel, we need a means of filtering/sorting them. Unfortunately I could not find good means to do this with existing tools (numpy arrays, pandas DataFrames). Fortunately, this meant learning something new! Here I bring in sqlite3, which allows for running a sql table locally. SQL is Structured Query Language, a language and toolset all about tabular data. It will allow us to do any sort of filter, sort, or view of a table that we could want. The following is a little sample taken modified from [the docs] (https://docs.python.org/2/library/sqlite3.html)

In [41]:
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS stocks (date text, trans text, symbol text, qty, price)''') # Create table.
c.execute('''DELETE FROM stocks''') #if table already existed, will have data... delete existing data to refresh with new.
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
            ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
            ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
        ]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
conn.commit()
c.execute('SELECT * FROM stocks ORDER BY price')
listbackTable=c.fetchall()
pd.DataFrame(listbackTable)

Unnamed: 0,0,1,2,3,4
0,2006-03-28,BUY,IBM,1000.0,45.0
1,2006-04-06,SELL,IBM,500.0,53.0
2,2006-04-05,BUY,MSFT,1000.0,72.0


And we can easily beautify the process by making our own mini API functions so that the SQL language will be hidden when reading through the algorithm, and this will also make typing these pesky commands a one-off.

In [56]:
def addTBL(tblName,fields="",data=[],addOn=False):
    """Create table if not already existing, optionally with data, optionally clearing out old data if present. Fields as list of strings"""
    conn = sqlite3.connect('example.db')
    c = conn.cursor()
    listedFields=''
    for f in fields:
        listedFields=listedFields+', '+ f
    listedFields='('+listedFields[2:]+''')''' #Add leading and closing bracket, remove naively added comma,space from leading field
    c.execute('''CREATE TABLE IF NOT EXISTS'''+tblName+listedFields) # Create table.
    if addOn==False:
        c.execute('''DELETE FROM '''+tblName)
    if data!=[]:
        c.executemany('INSERT INTO '+tblName+' VALUES (?,?,?,?,?)', data)
    conn.commit()

def isNumeric(n):
    try:
        n=int(n)
        return True
    except ValueError:
        try:
            n=float(n)
            return True
        except:
            return False

def viewTBL(tblName,fields=None,sortBy=None,filterOn=None):
    """return np array of table with optional select fields, filtered, sorted. Sort syntax=[(field1,asc/desc),(field2,asc/desc)...] Filter syntax=[(field1,value),(field2,value)...]"""
    conn = sqlite3.connect('example.db')
    c = conn.cursor()
    stmnt='SELECT '
    if fields!=None: 
        flds=''
        for f in fields:
            flds=flds+', '+f
        stmnt=stmnt+flds[2:]+ ' FROM ' +tblName+' '
    else: stmnt=stmnt+'* FROM '+tblName+' ' #unspecified, select all
    if filterOn!=None:
        filt='WHERE '
        for f in filterOn:
            if isNumeric(f[1]): filt=filt+f[0]+' = '+ str(f[1])+' AND '
            else: filt=filt+f[0]+' = "'+ f[1]+'" AND '
        filt=filt[:-4] #Remove naively added final " and "
        stmnt=stmnt+filt
    if sortBy!=None:
        srt='ORDER BY '
        for s in sortBy:
            srt=srt+s[0]+' '+s[1]+', '
        srt=srt[:-2]
        stmnt=stmnt+srt
    stmnt=stmnt+';'
    #return stmnt
    c.execute(stmnt)
    return np.array(c.fetchall())

In [55]:
viewTBL('stocks')

array([['2006-03-28', 'BUY', 'IBM', '1000.0', '45.0'],
       ['2006-04-05', 'BUY', 'MSFT', '1000.0', '72.0'],
       ['2006-04-06', 'SELL', 'IBM', '500.0', '53.0']], dtype='<U32')

In [54]:
viewTBL('stocks',['symbol','price'],[('price','asc')],[('symbol','IBM')])

array([['IBM', '45.0'],
       ['IBM', '53.0']], dtype='<U32')

In [57]:
viewTBL('stocks',['symbol','price'],[('price','asc')],[('qty',1000)])

array([['IBM', '45.0'],
       ['MSFT', '72.0']], dtype='<U32')

These custom functions should allow us to very easily perform lookups and filters within the python framework. The key use-case here is that we will sort employees by hours ascending when going in priority sequence of voluntary assignment, but we will sort by seniority descending when retrieving the priority sequence for forcing assignments in vacant slots. Other use cases are filtering the training data to identify what someone is trained on. At the time of writing this, I haven't yet figured if I will need to make this a truly relational database to make that work. I suspect it wont be necessary as it may be easier to simply perform simple lookups, retrieving key values and then plugging them into where needed using plain python. Time will tell!

### Algorithm Structure


The preceding section is all about just getting the data we need into our hands in a readily manipulable format. Now the fun begins! 

The following is the process to be carried out in progressively greater detail.. a fun exercise in breaking a larger problem into sub-problems until a programmable level is reached...
1. Assign individual to a timeslot
1. Repeat until all slots filled

That's accurate but useless to make anything happening... I've been thinking about the problem for a while and can list off design questions and answers I've reached pre-development.

#### Preliminary Brainstorm
 - Which timeslot/job combination should be assigned when multiple are available?
     - Based on [this video](https://www.youtube.com/watch?v=d1KyYyLmGpA) I learned that to have the best algorithm performance for this kind of problem, you want to perform a depth-first search where you make assignments to the most constrained variables first. From the perspective of assigning timeslots to staff, the sequence in which staff are assigned slots is not optional within the Collective Agreement. With a change in perspective, however, one can see this problem as that of assigning staff to timeslots. From this perspective, a single staff person has a subset of timelsots for which they are eligible to work. Within that subset, this heuristic of assigning to the most constrained slot can be applied. My idea is to maintain a tracking of how many eligible people there are for each job/timeslot combination in the voluntary or forced category, and always assigning people to slots for which they are eligible, for which the fewest people are available. I do have some concerns about whether or not it will work without further consideration as described in the video. Namely, the process of removing assignments from consideration which would leave no assignment available for downstream decisions. I will forgo further worrying about the problem for the time being since I think my problem case is sufficiently different from the one explored in the video. In this situation, it is acceptable and possible that a staff person with priority over another not be assigned, based on their voluntary hours and/or training.
 - How can eligibility criteria for timeslots be made flexible so that the program is useful even after a change to the CBA?
     - My solution to the problem of varied esoteric constraints that are difficult to define, and for which the supplementary data to test constraint criteria is not available, is to ignore these constraints in decision making within the algorithm, instead allowing the user to address these constraints by passing in a lit of enforced assertion statements to enforce some assignments, or disallow other assignments.
     - My solution to the problem of decision making criteria needing to be transparent and easily modifiable is to have the eligibility criteria for assignments be passed as an input to the system within a file (i.e. python module with functions defined within that will determine assignment eligibility)
 - How can the time slots be represented in if every weekend has different jobs to fill?
     - Again, the idea is to use a file that will be the same most of the time to provide the time slots to be filled as an input to the program
 - How can the algorithm be proofread by a human user to ensure results are valid?
     - It is a distributed process. Each staff person is responsible for communicating if they have been assigned to a slot that they are not eligible for. If the schedulign teams recieves this feedback, or if in schedule review they identify an assignment that seems to be contrary to policy, the program should be able to provide a step by step list of the assignments that were made, as well as being able to visualize a partially done schedule, at an arbitrarily selected step, to give the human scheduler the information required to validate if a generated schedule was valid or not, or if a new assertion should be entered into the input to prevent an invalid assignment from being made.     
         - Assertion Types:
            - Do Not Staff: DNS( slot )
            - Assign: A( eeid, slot, job, Type)
                - Type can be WWF, Voluntary, or Forced
            - Disallow Assignment: N( eeid, slot, optional job )

While in an ideal world, back tracking could be used so as not to recompute an entire schedule after a new assertion is made, the details of implementation would be very complex, and the reduction in computation time and energy would likely not be worth it, so we'll forgo that for now.
#### V1
With these ideas in mind, the algorithm becomes:
 - Review assertion list and make all prescribed assignments (such as dedicated weekend staff)
 - Iterate through staff in sequence defined by CBA
     - Apply eligibility constraint functions to timeslots to generate subset of eligible slots. If none, go to next staff person. If one, assign. If multiple, evaluate to see which is most constrained and assign to that.
         - Default constraint functions, applied in sequence of most to least constraining:
            1. trained on job
            1. volunteered for slot
            1. minimum 8 hours off between shifts
            1. <60 total hours worked in week
                -  On a long weekend, assuming 32 hours worked going into weekend
            1. max 12 total hours worked consecutively
 - If slots remain after all staff are iterated through, then:
     - Iterate through all staff from least to most senior, applying modified constraints to determine who must be forced:
        1. trained on job
        1. less than 48 hours worked in the week
        1. less than 8 hours forced
        1. minimum 12 hours off between shifts
        1. <60 total hours worked in week
        1. max 12 total hours worked consecutively

If in the end there is no one available for forcing, then the scheduling team will have to determine whether or not would most likely decide to change which slots are being assigned to move that gap to a different role for which a gap can be sustained in production circumstances. That would be another item to add to the assertion list. 

#### Focusing on Simple
While the above algorithm is what is prescribed by the collective agreement, the rules also state that a given employee has priority selection over another corresponding to which shift they were staffed in the week prior. Carrying out the above algorithm would result in lots of folks being assigned to a shift they volunteered for but don't have first dibs on; for example the most common preference is day shift but only 1/3 staff were on day shift at a given time. Following the algorithm as initially defined then would lead to significant computational efficiency and the need for more complicated programming to create a cascading/recursive bump management script whereby if someone were assigned to a shift they don't have priority selection for and it turns out another person has rights to it, the former would be removed, and, since their already being assigned implies they had greater priority to recieve any assignment in general, another slot would have to be sought for them, including slots taken by someone with lesser priority.. and so on. For this reason the actual algorithm is modified to eliminate the need for backtracking (this is analogous to the actual process carried out manually at this time): Instead of iterating through staff in sequence determined by CBA (# of hours), create a separate list of staff, separated by crew and employment type. Carry out the above simple algorithm on each subset of these with the corresponding reduced set of available slots each time. 

Summarized:
- Carry out above algorithm with on-shift full-timers, for each shift (C/A/B) (in existing data input format, probationaries are included in FT list)
- Carry out above algorithm with off-shift full-timers, for each shift (C/A/B) (following the C->A->B->C->A priority selection format)
- Carry out above algorithm with on-shift Temp staff, for each shift (C/A/B)
- Carry out above algorithm with off-shift Temp staff, for each shift (C/A/B)

To conclude, the outer most loop is across the different sets of staff groups, and then the inner loop is across shifts. And once again, this is done so that there should never be a situation where an individual is being bumped out of their slot by another later in the assignment process, simplifying the program implementation overall. 

#### Exploring Problems
A problem that comes to mind. Postulate: the 'most constrained first' assignment heuristic, as defined, could generate a schedule structure where someone is passed over for a slot because of the minimum shift gap constraint, when a different valid assignment earlier would ahve created the possibility of a longer contiguous shift, which in fact is what should happen.

Example:
Worker A is interested in working an 8 hour shift between 7a and 7p and their priority is daytime (7a-3p). Worker B is interested in working 4 hours between 7a and 3p and their priority is daytime (7a-3p). Higher priority individual B due to the 'most-constrained first' assignment heuristic may be assigned the 11a-3p slot, leaving A to be assigned either the 7a-3p, or 3p-7p slot. The problem is that if B were assigned the first slot of the day, the following two slots could both be covered by A, preventing the need to force anyone for the 4 hour gap posed by the former arrangement. 

The question is whether or not the postulate/thought experiment bears out... further thoughts follow; in this situation where forcing would be required, that would imply that no one else was available to fill the slots. Looking at the most-constrained heuristic in greater detail, this would mean that when B is assigned, slot 1 and 2 are both tied for 2 potential assignees (A or B), whereas slot 3 is most constrained with 1 potential assignee, A. It seems to me that this is a scenario I could leave under the umbrella of 'manual review + assignments' but my gut tells me that is instead a problem of providing decision criteria for when there is a tie in the 'most constrained' heuristic between slots. Proceeding with this, it seems obvious to me from the example thought experiment that the criteria should then be a comparison on the number of potential assignees for the slots neighbouring to the one being considered. The challenge in circumventing this problem is in creating a decision criteria where the cure isn't worse than the disease in terms of code implementation... The central challenge of the entire scheduling problem looms large in this small decision case, which is that the state that assignment variables will take later in the process can't be known except by carrying out the whole process to get there. The idea that leads me to is to check for a shift-splitting situation as described. That can be done simply by performing the following check: remove the worker whose assignment is being made from the pool of eligible assignees. Observe, then, if there are any sets of 2 or 3 contiguous slots with only 1 and the same worker eligible. Remove those slots from the pool of eligible assignments for the worker whose assignment is being made. Assign to remaining slot pool according to most constrained criteria. If >0 slots are available but insufficient to complete the persons voluntary shift, then return those removed shifts to the eligible pool and connect them. If 0 slots are available when the other were removed, then return them to the eligible pool but assign only slots from the edge of the group.

If one neighbouring slot are unassigned and each have only one and the same potential assignee after removing the assignee in question from the pool. Applied to the same thought experiment, slot one would not be identified as a shift splitting assignment since the previous slot would be assigned. Slot two would be assigned as a shift splitting assignment since slots 1 and 3 each have only A as the eligible assignee after B is assigned to slot 2. Omitting the shift splitting decision from the pool leaves only slot 1 to be assigned to B. Bear in mind that this assumes If both available slots have this criteria, then the fact of which slot comes first can arbitrarily be used to break the tie... This is because when it comes down to brass tacks, that individual B is entitled to their OT selection in that scenario even if it forces that shift split and leads to someone being forced, or a gap. The problem of a gap can be addressed outside the context of the program.

The former problem posed by someone selecting a small 4 hour block also brings to mind the other problem of people volunteering for 12 hour blocks. The challenge is that the algorithm is defined as looking first at each shift (8 hour blocks), but staff are eligible for 12 hour blocks across shifts, or 8 hour blocks straddling shifts. 

I'm at the point in this thought experiment now where I think that trying to implement a problem specific solution here has too great a risk of introducing unintended consequences that bring failure. For the sake of time and simplicity I'll proceed with a version 1 that leaves the resolution of these issues in the hands of the user via the forced assignments function.

### Data Structure

To make the program easy to maintain, debug, and code, the data structure of classes/objects/attributes and their relationships should be carefully constructed to facilitate the intended actions. My goal is to have a rigorously modular/generalized system, where almost every process in the final algorithm is a method on an object. This will make the code readable, flexible, and easier to debug in development and deployment. With the above algorithm in mind, I made a dummy script to let inuition guide the insight as to necessity of what classes/attributes/methods would be required:

In [None]:
CollectData():
    pullTables()  # Per code in above sections
    configData()  # Define timeslots objects, worker objects, collections of workers per shift/type
S=Sched(date)
Sched.preFill() # Iterate through & enact prescribed assignments
Sched.VolunteerFill():
    for eeTypePool in (onShiftFT,offShiftFT,onShiftTemp,offShiftTemp):
        for shift in shiftSet: #shiftSet is built based on what days selected to schedule. Always seq last to first.
            namePool=poolPicker(shift,eeTypePool)
            for person in namePool: #Idea: Define a generator function to yield the next person, across ee categories
                slotPool=filterSlots: #per sequence above, evaluate each criteria in sequence and remove slot from pool if fails any criteria
                    isTrained, Volunteered, shiftGapOK, wklyTotOK, maxShiftLenOK
                if poolEmpty: next person
                s=pickSlot #Most constrained slot (if *only* person avail for off-shift, assign there. If only person for multiple, assign first chronological. If candidates>1, take most constrained on-shift slot. If tied, take first chronological) 
                assignSlot(s,type=voluntary) #Perform necessary functions i.e. removing tally of op from no-longer compatible shifts.
Sched.forceFill():
    for slot in Sched.unassigned.chronologicalSeq:
        assignee=lowMan(slot):
                    filter all ee for training, sort seniority low-hi, check if already worked 8+ hours, check if already forced 8 hours
        #if no assignee, flag slot as 'no staff'
        result=unassignAsNecessary(assignee,slot)
                    check if the forcing would conflict with other constraint (inter-shift gap, weekly total hrs, shiftduration, etc)
                    if the conflicting slot requiring unassignment is from the assertion list, then return an error flag for printout
        assignSlot(slot,type=forced)
#Following forcing, some slots may be available if staff were unassigned from slots due to total hour limits triggered by forcing.
#Followup with the s

In [None]:
class Schedule():
    def __init__(self,slots):
        #self.Slots=   #A collection of Slot objects that compose this schedule
        pass
    def opnSlt(self):
        """Returns a collection of all unassigned slots in this schedule,"""
        pass

    def preFill():
        """Enter all predefined assignments into the schedule"""
        

class Slot():
    """A single 4 hour time slot for a single job, to be filled by 1 person"""
    def __init__(self,seq,dispNm,trnNm):
        self.trnNm=trnNm #to be used when filtering out staff for training
        self.dispNm=dispNm #to be used for printouts
        #self.datetime=   #Determined based on seq. Used in printout of assignments
        self.eeid=None #To store eeid of assignee for printout
        self.sLen
        
