# MATH 441 Group 5 Project

**Aziz, Mika, Spock**

# Project Title: Assigning Employees to Shifts for a Bank

## 1. Introduction
**Problem Statement:**

How can we assign employees to suitable shifts in a financial institution to minimize labor costs while considering employee skill levels and salary?

**Relevant Real-world Examples:**

* Study how employees are assigned to shifts in banks or financial institutions.
* Explore existing optimization algorithms applied to employee scheduling problems.

**Data and Computations:**

 Data:
* Employee skill level.
* Employee training level.
* Employee salary.
* Shift days.
* Number of employees demanded in each shift.

**Note**: In search of a suitable dataset to apply this optimization on, we will attempt to find relevant employee scheduling information from local organizations like banks, investment firms and other financial institutions. There are of course some difficulties when it comes to obtaining said data:
1. Companies might not release such data publicly
2. Needed data might not be formatted as expected
3. A lack of data in general on employee scheduling

We will attempt to remedy this through a few methods, such as reaching out to companies to obtain anonymized data or generating data based on a few known parameter distributions.


## 2. Defining the parameters and variables

Before solving the integer programming problem at the core of our project, we first need to define our data. The variables that we identify also need to be enumerated so that we can use a Python solver.

First, we define the employee skill level, denoted $l$ with $0 \leq l \leq 3$, which also determines which shifts an employee is authorized to do.

* 0 - entry-level
* 1 - junior
* 2 - senior
* 3 - manager

Next, we need the types of shifts, $S$, and the skill level required for an employee to do each shift.
1. Account opening - entry-level
2. Credit card application - entry-level
3. Loan Application - junior
4. Mortgage Consultation - senior
5. Retirement planning - senior
6. Financial advising - senior
7. Wealth management - manager

We will need to assign the employees to a shift such that:
- An employee with entry-level skill cannot do a shift that requires junior level or higher
- An employee with junior-level skill cannot do a shift that requires senior level or higher
- An employee with senior-level skill cannot do a shift that requires manager level
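
The three rules above amount to a single comparison between an employee's level and the level a shift requires. Below is a minimal sketch, assuming (as the rules imply) that an employee may take any shift at or below their own level. The names `REQUIRED_LEVEL` and `is_authorized` are hypothetical and not part of the project code.

```python
# Minimum skill level per shift type, as listed above
# (0 = entry-level, 1 = junior, 2 = senior, 3 = manager).
REQUIRED_LEVEL = {
    "Account opening": 0,
    "Credit card application": 0,
    "Loan Application": 1,
    "Mortgage Consultation": 2,
    "Retirement planning": 2,
    "Financial advising": 2,
    "Wealth management": 3,
}


def is_authorized(employee_level, shift):
    """Return True if an employee at `employee_level` may work `shift`."""
    return employee_level >= REQUIRED_LEVEL[shift]


print(is_authorized(0, "Loan Application"))   # entry-level employee, junior shift
print(is_authorized(2, "Loan Application"))   # senior employee, junior shift
```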

Our plan is to schedule employees into shifts on a weekly basis. Since we assume that all employees work full-time, only five of the seven days in a week are scheduled. We define days as $0 \leq t \leq 4$, corresponding to Monday, Tuesday, Wednesday, Thursday, and Friday respectively.

Since different types of shifts need to be filled, we define $d_{jt}$ as the number of employees needed for shift $j$ on day $t$.

The data for each employee is an array consisting of the following three elements:<br>

**Skill level**

A number ranging from 0-3, representing what shifts they are authorized to do<br>

**Training level**

An array of 7 numbers, each ranging from 1-100, showing the employee's training level for each shift type.<br>
For shifts that the employee is not authorized to do, the training level is set to 0.<br>
For example:<br>
An entry-level employee will be something like [32,15,0,0,0,0,0];<br>
and a senior employee will be something like [45,72,61,13,4,80,0].

**Salary**

The salary of each employee is related to their training level.<br>
Below is an illustrative salary formula that we made up:<br>
$$
\text{Salary}=K + a\times (\text{Sum of training levels of shifts 1, 2}) + b\times (\text{Training level of shift 3}) + c\times (\text{Sum of training levels of shifts 4, 5, 6}) + d\times (\text{Training level of shift 7})
$$
$$
\text{where $K,a,b,c,d$ are all real numbers, with $K$ as the base salary and $a,b,c,d$ as the weights paid for each group of shifts.}
$$
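
As a quick check of the formula, the sketch below evaluates it for the two example employees given under **Training level**. The weights $K=1000$, $a=3$, $b=5$, $c=10$, $d=35$ are the ones used in the data-generation code in the Data section. The function name `salary` is only illustrative.

```python
# Weights taken from the data-generation code in the Data section.
WEIGHTS = {"K": 1000, "a": 3, "b": 5, "c": 10, "d": 35}


def salary(training, w=WEIGHTS):
    """Salary from a 7-element training-level array, following the formula above."""
    return (
        w["K"]
        + w["a"] * (training[0] + training[1])                 # shifts 1, 2
        + w["b"] * training[2]                                 # shift 3
        + w["c"] * (training[3] + training[4] + training[5])   # shifts 4, 5, 6
        + w["d"] * training[6]                                 # shift 7
    )


print(salary([32, 15, 0, 0, 0, 0, 0]))      # entry-level example: 1141
print(salary([45, 72, 61, 13, 4, 80, 0]))   # senior example: 2626
```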

**To summarize our problem:<br>
we have a certain number of shifts of each of the 7 types, and each shift must be assigned exactly 1 employee.**

**We want to assign employees to shifts such that, for each shift,<br>
we select the employees with the highest training level for that shift,<br>
then assign the lowest-paid employee among them to the shift,<br>
aiming to minimize overall labor costs.**
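
The selection rule for a single shift can be sketched in a few lines. This is only an illustration of the rule in isolation: it assumes employees are stored as `(skill_label, training_levels, salary)` tuples, treats a training level of 0 as "not authorized", and ignores the question of whether an employee is already assigned elsewhere. The function name `pick_employee` is hypothetical, and the sample rows are copied from the generated data printed in the Data section.

```python
def pick_employee(employees, shift_index):
    """Pick the best-trained employee for one shift type, breaking ties by lowest salary."""
    # A training level of 0 marks a shift the employee is not authorized to do.
    eligible = [e for e in employees if e[1][shift_index] > 0]
    if not eligible:
        return None
    best_training = max(e[1][shift_index] for e in eligible)
    top_trained = [e for e in eligible if e[1][shift_index] == best_training]
    return min(top_trained, key=lambda e: e[2])


sample = [
    ("senior", [74, 2, 10, 22, 34, 49, 0], 2328),
    ("junior", [26, 64, 26, 0, 0, 0, 0], 1400),
    ("entry-level", [11, 3, 0, 0, 0, 0, 0], 1042),
    ("senior", [28, 96, 82, 57, 57, 17, 0], 3092),
    ("junior", [7, 87, 60, 0, 0, 0, 0], 1582),
]

print(pick_employee(sample, 0))  # Account opening
print(pick_employee(sample, 6))  # Wealth management: nobody in the sample qualifies
```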

## Data

Below is the data that we generate for employees and shifts.

In [20]:
import numpy as np
import random
import csv

def generate_employee_data_custom_distribution(num_employees, skill_levels, tasks_with_min_levels, salary_weight, distribution):
    employees_data = []
    
    # Calculate the number of employees in each skill level based on the distribution
    num_employees_distribution = {level: int(pct * num_employees) for level, pct in distribution.items()}
    
    # Adjust for any rounding differences to ensure the total count matches num_employees
    while sum(num_employees_distribution.values()) < num_employees:
        num_employees_distribution[random.choice(list(num_employees_distribution.keys()))] += 1
        
    # Generate data for each employee based on the distribution
    for skill_level_label, count in num_employees_distribution.items():
        skill_level = skill_levels[skill_level_label]
        for _ in range(count):
            training_level_array = []
            for task, min_level in tasks_with_min_levels.items():
                if skill_level >= min_level:
                    training_level = random.randint(1, 100)  # Training levels range from 1 to 100
                else:
                    training_level = 0
                training_level_array.append(training_level)
            salary = salary_weight["K"] + salary_weight["a"] * (training_level_array[0]+training_level_array[1]) \
            + salary_weight["b"] * (training_level_array[2]) \
            + salary_weight["c"] * (training_level_array[3]+training_level_array[4]+training_level_array[5]) \
            + salary_weight["d"] * (training_level_array[6])
            employees_data.append((skill_level_label, training_level_array, salary))
    
    # Shuffle the data to mix skill levels
    random.shuffle(employees_data)
    
    return employees_data

# Example usage
skill_levels = {"entry-level": 0, "junior": 1, "senior": 2, "manager": 3}
tasks_with_min_levels = {
    "Account opening": 0,
    "Credit card application": 0,
    "Loan Application": 1,
    "Mortgage Consultation": 2,
    "Retirement planning": 2,
    "Financial advising": 2,
    "Wealth management": 3
}

distribution = {
    "entry-level": 0.4,
    "junior": 0.3,
    "senior": 0.2,
    "manager": 0.1
}

salary_weight = {
    "K": 1000,
    "a": 3,
    "b": 5,
    "c": 10,
    "d": 35,
}

num_employees = 100
employees_data = generate_employee_data_custom_distribution(num_employees, skill_levels, tasks_with_min_levels, salary_weight, distribution)

# Output the first and last five employees
first_five = employees_data[:5]
last_five = employees_data[-5:]

print("First five employees:")
for i, (skill, training_level, salary) in enumerate(first_five):
    print(f"Employee {i+1}: Skill Level - {skill}, Training level - {training_level}, Salary - {salary}")

print("\nLast five employees:")
for i, (skill, training_level, salary) in enumerate(last_five):
    print(f"Employee {i+1}: Skill Level - {skill}, Training level - {training_level}, Salary - {salary}")

# Save to CSV file
csv_file_path = "employees_data.csv"
with open(csv_file_path, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Skill Level", "Training level", "Salary"])
    for skill, training_level, salary in employees_data:
        writer.writerow([skill, training_level, salary])


First five employees:
Employee 1: Skill Level - senior, Training level - [74, 2, 10, 22, 34, 49, 0], Salary - 2328
Employee 2: Skill Level - junior, Training level - [26, 64, 26, 0, 0, 0, 0], Salary - 1400
Employee 3: Skill Level - entry-level, Training level - [11, 3, 0, 0, 0, 0, 0], Salary - 1042
Employee 4: Skill Level - senior, Training level - [28, 96, 82, 57, 57, 17, 0], Salary - 3092
Employee 5: Skill Level - junior, Training level - [7, 87, 60, 0, 0, 0, 0], Salary - 1582

Last five employees:
Employee 1: Skill Level - junior, Training level - [3, 20, 2, 0, 0, 0, 0], Salary - 1079
Employee 2: Skill Level - entry-level, Training level - [77, 43, 0, 0, 0, 0, 0], Salary - 1360
Employee 3: Skill Level - entry-level, Training level - [89, 24, 0, 0, 0, 0, 0], Salary - 1339
Employee 4: Skill Level - senior, Training level - [35, 75, 95, 73, 1, 50, 0], Salary - 3045
Employee 5: Skill Level - junior, Training level - [18, 7, 85, 0, 0, 0, 0], Salary - 1500


For each day, we need to decide how many customers request an appointment. Banks usually use data from previous years to predict how many customers will need an employee for shift $j$ on a given day. Since we could not find any such data, we generate a random number of customers for each shift according to a fixed distribution. We chose shares of 0.2, 0.15, 0.2, 0.1, 0.2, 0.1, and 0.05 for the seven shift types respectively.
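
Turning these shares into whole-number counts needs a rounding rule, because the shares rarely divide a day's total evenly. The sketch below uses the largest-remainder rule, a deterministic alternative to handing out leftover units at random. It is not the method used in the project code, and the name `split_by_shares` is hypothetical.

```python
import math


def split_by_shares(total, shares):
    """Split `total` into integer counts proportional to `shares` (largest-remainder rule)."""
    exact = {k: total * p for k, p in shares.items()}
    counts = {k: math.floor(v) for k, v in exact.items()}
    leftover = total - sum(counts.values())
    # Give the leftover units to the entries with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts


shares = {
    "Account opening": 0.2,
    "Credit card application": 0.15,
    "Loan Application": 0.2,
    "Mortgage Consultation": 0.1,
    "Retirement planning": 0.2,
    "Financial advising": 0.1,
    "Wealth management": 0.05,
}

counts = split_by_shares(47, shares)
print(counts, sum(counts.values()))
```

Each count ends up within one unit of its exact share, and the counts always add up to the requested total.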

In [27]:
def generate_customer_data_custom_distribution(num_customer, days, distribution):
    customer_data = []

    for day, index in days.items():
        # Draw a different number of customers for each day, from 1 to num_customer.
        # A separate variable keeps num_customer from shrinking from one day to the next.
        daily_customers = np.random.randint(1, num_customer + 1)

        # calculate the number of customers in each shift based on the distribution
        num_customer_distribution = {shift: int(pct * daily_customers) for shift, pct in distribution.items()}
        
        # Adjust for any rounding differences to ensure the total count matches daily_customers
        while sum(num_customer_distribution.values()) < daily_customers:
            num_customer_distribution[random.choice(list(num_customer_distribution.keys()))] += 1

        # for each shift, get the distribution
        for shift_label, count in num_customer_distribution.items():
            customer_data.append((day, shift_label, count))
    
    # Shuffle the data to mix days and shifts
    random.shuffle(customer_data)
    
    return customer_data

# Example usage
days = {"Monday": 0, 
        "Tuesday": 1, 
        "Wednesday": 2, 
        "Thursday": 3,
        "Friday": 4}

distribution = {
    "Account opening": 0.2,
    "Credit card application": 0.15,
    "Loan Application": 0.2,
    "Mortgage Consultation": 0.1,
    "Retirement planning": 0.2,
    "Financial advising": 0.1,
    "Wealth management": 0.05
}

num_customer = 100

customer_data = generate_customer_data_custom_distribution(num_customer, days, distribution)

# Output the first and last five shift-demand records
first_five = customer_data[:5]
last_five = customer_data[-5:]

print("First five shift demand:")
for i, (day, shift, shift_demand) in enumerate(first_five):
    print(f"Shift {i+1}: Day - {day}, Shift - {shift}, Shift Demand - {shift_demand}")

print("\nLast five shift demand:")
for i, (day, shift, shift_demand) in enumerate(last_five):
    print(f"Shift {i+1}: Day - {day}, Shift - {shift}, Shift Demand - {shift_demand}")

# Save to CSV file
csv_file_path = "customer_data.csv"
with open(csv_file_path, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Day", "Shift", "Shift Demand"])
    for day, shift, shift_demand in customer_data:
        writer.writerow([day, shift, shift_demand])

First five shift demand:
Shift 1: Day - Thursday, Shift - Credit card application, Shift Demand - 1
Shift 2: Day - Tuesday, Shift - Wealth management, Shift Demand - 2
Shift 3: Day - Thursday, Shift - Financial advising, Shift Demand - 1
Shift 4: Day - Friday, Shift - Retirement planning, Shift Demand - 1
Shift 5: Day - Wednesday, Shift - Financial advising, Shift Demand - 2

Last five shift demand:
Shift 1: Day - Wednesday, Shift - Mortgage Consultation, Shift Demand - 1
Shift 2: Day - Thursday, Shift - Account opening, Shift Demand - 3
Shift 3: Day - Tuesday, Shift - Retirement planning, Shift Demand - 10
Shift 4: Day - Thursday, Shift - Retirement planning, Shift Demand - 2
Shift 5: Day - Tuesday, Shift - Loan Application, Shift Demand - 9
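
The `(day, shift, demand)` records can be arranged into the demand table $d_{jt}$ defined in Section 2. The sketch below is one possible layout, assuming 0-based indices for both shift type $j$ and day $t$. The names `SHIFTS`, `DAYS` and `build_demand_matrix` are hypothetical, and the sample records are copied from the output above.

```python
SHIFTS = [
    "Account opening",
    "Credit card application",
    "Loan Application",
    "Mortgage Consultation",
    "Retirement planning",
    "Financial advising",
    "Wealth management",
]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def build_demand_matrix(records):
    """Return d, where d[j][t] is the demand for shift type j on day t."""
    d = [[0] * len(DAYS) for _ in SHIFTS]
    for day, shift, demand in records:
        d[SHIFTS.index(shift)][DAYS.index(day)] = demand
    return d


sample_records = [
    ("Thursday", "Credit card application", 1),
    ("Tuesday", "Wealth management", 2),
    ("Tuesday", "Retirement planning", 10),
    ("Tuesday", "Loan Application", 9),
]

d = build_demand_matrix(sample_records)
print(d[4][1])  # Retirement planning on Tuesday: 10
```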


## 3. Formulate the problem as an integer linear programming problem

**Decision variables:**<br>
Let $x_{ij} \in \{0,1\}$ be a binary variable, where $x_{ij} = 1$ if employee $i$ is assigned to shift $j$, and $0$ otherwise.

**Objective:**<br>
Minimize the total labor cost, which is the sum of the salaries of the assigned employees:
$$
\sum_i \sum_j x_{ij} \times \text{Salary$_i$}
$$
$$
\text{where Salary$_i$ is the salary of employee $i$.}
$$

**Constraints:**
1) Each shift must be assigned to exactly one employee:
$$
\sum_i x_{ij} = 1 \ , \ \ \text{for all shifts} \ j
$$

2) An employee can be assigned to at most one shift:
$$
\sum_j x_{ij}  \leq  1 \ , \ \ \text{for all employees} \ i
$$

3) Ensure that only employees with the required skill level can be assigned to shifts:
$$
x_{ij} = 0 \ , \ \ \text{if employee $i$ does not have the required skill level for shift $j$}
$$
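
On a tiny instance, constraints 1 to 3 can be checked by plain enumeration, which gives a reference answer to compare a solver's output against. The sketch below is a brute-force check, not a substitute for the integer program: it only works for a handful of employees and shifts. The function name `cheapest_assignment` and the toy numbers are hypothetical.

```python
from itertools import permutations


def cheapest_assignment(salaries, levels, required):
    """Return (cost, assignment), where assignment[j] is the employee chosen for shift j."""
    best = None
    # Each permutation picks one distinct employee per shift,
    # which enforces constraints 1 and 2 by construction.
    for combo in permutations(range(len(salaries)), len(required)):
        # Constraint 3: skip assignments that use an under-qualified employee.
        if any(levels[i] < required[j] for j, i in enumerate(combo)):
            continue
        cost = sum(salaries[i] for i in combo)
        if best is None or cost < best[0]:
            best = (cost, combo)
    return best


salaries = [1042, 1400, 2328, 3092]   # one salary per employee
levels = [0, 1, 2, 2]                 # entry-level, junior, senior, senior
required = [0, 1, 2]                  # one entry, one junior, one senior shift

print(cheapest_assignment(salaries, levels, required))  # (4770, (0, 1, 2))
```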

Now, incorporating the additional requirement of selecting the employee with the highest training level and lowest salary for each shift, we add the following constraints:

4) For each shift, select the employee with the highest training level:
$$
\sum_i t_{ij}\times x_{ij} = \max{\{t_{i'j}| i'\text{ has the required skill level for shift } j\}}
$$
$$
\text{where $t_{ij}$ is the training level of employee $i$ for shift $j$.}
$$

5) Ensure that the selected employee has the lowest salary among those with the highest training level for each shift:
$$
\sum_i s_{i}\times x_{ij} = \min{\{s_{i'}| i'\text{ has the highest training level for shift } j\}}
$$
$$
\text{where $s_{i}$ is the salary of employee $i$.}
$$





In [7]:
pip install pulp




In [3]:
import pulp

# Create a LP minimization problem
prob = pulp.LpProblem("Shift_Assignment", pulp.LpMinimize)

In [4]:
num_shifts=100

In [5]:
# Create the shifts data set
# The shifts data set is an array of 7 integers summing to num_shifts,
# indicating how many employees need to be assigned for each of the seven shift types
import numpy as np

totals = num_shifts

a = np.random.random(7)
a = a / np.sum(a) * totals

a = np.round(a).astype(int)  # transform them into integers
remaining = totals - int(np.sum(a))  # rounding can leave the sum off by a few units
step = 1 if remaining > 0 else -1
while remaining != 0:
    i = np.random.randint(7)
    if a[i] + step >= 0:  # never let a count go negative
        a[i] += step
        remaining -= step

shifts_data = a
print(shifts_data)

In [81]:
# Define binary decision variables
# x_ij = 1 if employee i is assigned to shift j, 0 otherwise
x = pulp.LpVariable.dicts("Assignment", 
                          [(i, j) for i in range(num_employees) for j in range(num_shifts)], 
                          cat='Binary')

In [77]:
salaries = [employee[2] for employee in employees_data]
# print(salaries)

In [34]:
# Per-employee and per-shift lookups derived from the generated data
employee_skill = [skill_levels[employee[0]] for employee in employees_data]  # numeric level 0-3
training_levels = [employee[1] for employee in employees_data]  # one entry per shift type
min_levels = list(tasks_with_min_levels.values())
# shift_type[j] is the type (0-6) of shift j, expanded from the counts in shifts_data
shift_type = [k for k, count in enumerate(shifts_data) for _ in range(int(count))]
required_skill_levels = [min_levels[k] for k in shift_type]

# Define the objective function (total labor cost)
prob += pulp.lpSum(x[i, j] * salaries[i] for i in range(num_employees) for j in range(num_shifts))

# Constraint: Each shift must be assigned to exactly one employee
for j in range(num_shifts):
    prob += pulp.lpSum(x[i, j] for i in range(num_employees)) == 1

# Constraint: An employee can be assigned to at most one shift
for i in range(num_employees):
    prob += pulp.lpSum(x[i, j] for j in range(num_shifts)) <= 1

# Constraint: Ensure that only employees with the required skill level can be assigned to shifts
for i in range(num_employees):
    for j in range(num_shifts):
        if employee_skill[i] < required_skill_levels[j]:
            prob += x[i, j] == 0

# Constraints 4 and 5 restrict every shift of a given type to the same top-trained employee(s),
# which can conflict with "at most one shift" above and make the model infeasible.
for j in range(num_shifts):
    k = shift_type[j]
    eligible = [i for i in range(num_employees) if employee_skill[i] >= required_skill_levels[j]]
    best_training = max(training_levels[i][k] for i in eligible)
    lowest_salary = min(salaries[i] for i in eligible if training_levels[i][k] == best_training)

    # Constraint: For each shift, select the employee with the highest training level
    prob += pulp.lpSum(training_levels[i][k] * x[i, j] for i in range(num_employees)) == best_training

    # Constraint: Ensure that the selected employee has the lowest salary among those with the highest training level
    prob += pulp.lpSum(salaries[i] * x[i, j] for i in range(num_employees)) == lowest_salary

In [23]:
employees_data

[('senior', [89, 70, 81, 38, 15, 37, 0], 2782),
 ('junior', [76, 67, 13, 0, 0, 0, 0], 1494),
 ('manager', [98, 54, 67, 21, 41, 52, 67], 5276),
 ('junior', [17, 44, 45, 0, 0, 0, 0], 1408),
 ('junior', [84, 85, 60, 0, 0, 0, 0], 1807),
 ('manager', [11, 93, 97, 71, 57, 96, 1], 4072),
 ('entry-level', [96, 80, 0, 0, 0, 0, 0], 1528),
 ('junior', [96, 82, 46, 0, 0, 0, 0], 1764),
 ('manager', [25, 80, 10, 14, 81, 7, 26], 3295),
 ('junior', [68, 2, 17, 0, 0, 0, 0], 1295),
 ('entry-level', [70, 13, 0, 0, 0, 0, 0], 1249),
 ('entry-level', [79, 31, 0, 0, 0, 0, 0], 1330),
 ('entry-level', [37, 91, 0, 0, 0, 0, 0], 1384),
 ('junior', [73, 37, 75, 0, 0, 0, 0], 1705),
 ('entry-level', [60, 19, 0, 0, 0, 0, 0], 1237),
 ('senior', [5, 91, 66, 52, 67, 60, 0], 3408),
 ('entry-level', [21, 18, 0, 0, 0, 0, 0], 1117),
 ('manager', [13, 5, 14, 96, 79, 43, 84], 6244),
 ('entry-level', [98, 37, 0, 0, 0, 0, 0], 1405),
 ('entry-level', [20, 46, 0, 0, 0, 0, 0], 1198),
 ('entry-level', [24, 3, 0, 0, 0, 0, 0], 1081),