# MATH 441 Group 5 Project

**Aziz, Mika, Spock**

## 1) Project Title: Assigning Employees to Shifts for a Financial Institution

**Problem Statement:**

How can we assign employees to suitable shifts in a financial institution to minimize labor costs while considering employee skill levels and salary?

**Relevant Real-world Examples:**

* Study how employees are assigned to shifts in banks or financial institutions.
* Explore existing optimization algorithms applied to employee scheduling problems.

**Data and Computations:**

 Data:
* Employee skill level.
* Employee training level.
* Employee salary.

**Note**: In search of a suitable dataset to apply this optimization on, we will attempt to find relevant employee scheduling information from local organizations like banks, investment firms and other financial institutions. There are of course some difficulties when it comes to obtaining said data:
1. Companies might not release such data publicly
2. Needed data might not be formatted as expected
3. A lack of data in general on employee scheduling

We will attempt to remedy this through a few methods, such as reaching out to companies to obtain anonymized data or generating data based on a few known parameter distributions.


## 1.5) Setting Up the Environment

Before we can start with the data generation and LP solving procedures, we need to import a few libraries and modules and perform any first-time setup necessary for the project.

First, install any required libraries.

In [1]:
! pip3 install -r requirements.txt



Now, let's import our common libraries.

In [2]:
import numpy as np
import random
import csv
import pulp

Finally, let's define the constants and variables that are used globally within the project.

In [3]:
employees_data = []
num_employees = 100
num_shifts = 100

skill_levels = {"entry-level": 0, "junior": 1, "senior": 2, "manager": 3}
tasks_with_min_levels = {
    "Account opening": 0,
    "Credit card application": 0,
    "Loan Application": 1,
    "Mortgage Consultation": 2,
    "Retirement planning": 2,
    "Financial advising": 2,
    "Wealth management": 3
}

distribution = {
    "entry-level": 0.4,
    "junior": 0.3,
    "senior": 0.2,
    "manager": 0.1
}

salary_weight = {
    "K": 1000,
    "a": 3,
    "b": 5,
    "c": 10,
    "d": 35,
}

## 2) Defining the parameters and variables of the scheduling problem

Before we begin solving the integer programming problem at the core of our project, we first need to define our parameters and variables. The variables that we identify also need to be enumerated so that we can use a Python solver.

First, we define the employee skill level, denoted $l$ with $0 \leq l \leq 3$, which also determines which shifts an employee is authorized to work.
- skill levels l:
    - 0 - entry-level
    - 1 - junior
    - 2 - senior
    - 3 - manager


**Defining the types of shift S and what skill levels are needed**

1. Account opening - entry
2. Credit card application - entry
3. Loan Application - junior
4. Mortgage Consultation - senior
5. Retirement planning - senior
6. Financial advising - senior
7. Wealth management - manager

**Defining employee skill levels**



**Constraints**

- An employee with entry-level skill cannot do a shift that requires junior level or higher
- An employee with junior-level skill cannot do a shift that requires senior level or higher
- An employee with senior-level skill cannot do a shift that requires manager level
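The three rules above amount to a single comparison between an employee's level and a shift type's minimum level. A minimal sketch, assuming the level numbering and shift list from this section (the helper name `eligible_shifts` is ours, purely for illustration):

```python
# Minimum skill level required for each shift type, as listed in this section.
MIN_LEVEL = {
    "Account opening": 0,
    "Credit card application": 0,
    "Loan Application": 1,
    "Mortgage Consultation": 2,
    "Retirement planning": 2,
    "Financial advising": 2,
    "Wealth management": 3,
}


def eligible_shifts(level):
    """Return the shift types an employee at the given skill level may work."""
    return [shift for shift, required in MIN_LEVEL.items() if level >= required]


# A junior employee (level 1) may work the first three shift types only.
print(eligible_shifts(1))
# ['Account opening', 'Credit card application', 'Loan Application']
```

A manager (level 3) is eligible for all seven shift types, while an entry-level employee is limited to the first two.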

**Defining labor cost by training level of employee** 

Each employee record is an array consisting of the following three elements:<br>

#### Skill level:
A number ranging from 0-3, representing what shifts they are authorized to do<br>
#### Training level:
An array consisting of 7 numbers ranging from 1-100, showing the training level of the employee for each shift type.<br>
For shift types that the employee is not authorized to work, the training level is set to 0.<br>
For example:<br>
An entry-level employee will be something like [32,15,0,0,0,0,0];<br>
and a senior employee will be something like [45,72,61,13,4,80,0].
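The rule for training arrays can be checked mechanically. The sketch below is only an illustration under the conventions of this section; the function name `is_valid_training` is hypothetical and not part of the project code.

```python
# Minimum skill level for shift types 1..7, in the order listed earlier.
MIN_LEVELS = [0, 0, 1, 2, 2, 2, 3]


def is_valid_training(level, training):
    """Check that a training array follows the rules for the given skill level."""
    if len(training) != len(MIN_LEVELS):
        return False
    for required, value in zip(MIN_LEVELS, training):
        if level >= required:
            # Authorized shift type: training level must be between 1 and 100.
            if not 1 <= value <= 100:
                return False
        elif value != 0:
            # Unauthorized shift type: training level must be 0.
            return False
    return True


print(is_valid_training(0, [32, 15, 0, 0, 0, 0, 0]))     # True
print(is_valid_training(2, [45, 72, 61, 13, 4, 80, 0]))  # True
print(is_valid_training(0, [32, 15, 7, 0, 0, 0, 0]))     # False
```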

#### Salary:
The salary of each employee is related to their training level.<br>
The salary formula below is one we made up for this project:<br>
$$
\text{Salary}=K + a\times (\text{Sum of training levels of shifts 1, 2}) + b\times (\text{Training level of shift 3}) + c\times (\text{Sum of training levels of shifts 4, 5, 6}) + d\times (\text{Training level of shift 7})
$$
$$
\text{where $K,a,b,c,d$ are all real numbers, with $K$ as the base salary and $a,b,c,d$ as the weights paid for each group of shift types.}
$$
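As a worked example, here is the formula evaluated with the weights from the setup cell ($K=1000$, $a=3$, $b=5$, $c=10$, $d=35$). The function name `salary` is chosen here for illustration; the two training arrays are taken from the sample output shown later in the notebook.

```python
# Weights from the setup cell: base salary K and per-group weights a, b, c, d.
K, a, b, c, d = 1000, 3, 5, 10, 35


def salary(training):
    """Salary for a 7-element training array, following the formula above."""
    return (K
            + a * (training[0] + training[1])
            + b * training[2]
            + c * (training[3] + training[4] + training[5])
            + d * training[6])


# Entry-level employee: 1000 + 3 * (94 + 37) = 1393
print(salary([94, 37, 0, 0, 0, 0, 0]))       # 1393
# Manager: 1000 + 3 * 73 + 5 * 99 + 10 * 168 + 35 * 74 = 5984
print(salary([30, 43, 99, 55, 17, 96, 74]))  # 5984
```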

**To summarize our problem:<br>
we have a certain number of shifts of each of the 7 types, and each shift must be assigned exactly 1 employee.**

**We want to assign shifts to employees in such a way that, for each shift,<br>
we select the employees with the highest training level for that shift,<br>
then assign the one with the lowest salary among them to the shift,<br>
aiming to minimize overall labor costs.**
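For a single shift, the selection rule described above (highest training level first, lowest salary as the tie-breaker) can be sketched as follows. This only illustrates the rule, not the integer program itself; the candidate tuples and the name `pick_employee` are hypothetical.

```python
def pick_employee(candidates):
    """Pick one employee for a single shift.

    candidates: list of (employee_id, training_level, salary) tuples.
    Returns the id with the highest training level; ties go to the lowest salary.
    """
    # Sorting key: negate training so that "highest training" becomes "smallest key",
    # then compare salaries in ascending order.
    best = min(candidates, key=lambda c: (-c[1], c[2]))
    return best[0]


# Hypothetical candidates for one shift: E2 and E3 tie on training, E3 is cheaper.
candidates = [("E1", 80, 1500), ("E2", 95, 2100), ("E3", 95, 1900)]
print(pick_employee(candidates))  # E3
```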

## 3) Data

**Employee data**

In [4]:
def generate_employee_data_custom_distribution(num_employees, skill_levels, tasks_with_min_levels, salary_weight, distribution):

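    # Build the result in a local list so repeated calls do not keep appending to the global one
    employees_data = []

    # Calculate the number of employees in each skill level based on the distribution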
    num_employees_distribution = {level: int(pct * num_employees) for level, pct in distribution.items()}
    
    # Adjust for any rounding differences to ensure the total count matches num_employees
    while sum(num_employees_distribution.values()) < num_employees:
        num_employees_distribution[random.choice(list(num_employees_distribution.keys()))] += 1
        
    # Generate data for each employee based on the distribution
    for skill_level_label, count in num_employees_distribution.items():
        skill_level = skill_levels[skill_level_label]
        for _ in range(count):
            training_level_array = []
            for task, min_level in tasks_with_min_levels.items():
                if skill_level >= min_level:
                    training_level = random.randint(1, 100)  # Training levels range from 1 to 100
                else:
                    training_level = 0
                training_level_array.append(training_level)
            salary = salary_weight["K"] + salary_weight["a"] * (training_level_array[0]+training_level_array[1]) \
            + salary_weight["b"] * (training_level_array[2]) \
            + salary_weight["c"] * (training_level_array[3]+training_level_array[4]+training_level_array[5]) \
            + salary_weight["d"] * (training_level_array[6])
            employees_data.append((skill_level_label, training_level_array, salary))
    
    # Shuffle the data to mix skill levels
    random.shuffle(employees_data)
    
    return employees_data

employees_data = generate_employee_data_custom_distribution(num_employees, skill_levels, tasks_with_min_levels, salary_weight, distribution)

# Output the first and last five employees
first_five = employees_data[:5]
last_five = employees_data[-5:]

print("First five employees:")
for i, (skill, training_level, salary) in enumerate(first_five):
    print(f"Employee {i+1}: Skill Level - {skill}, Training level - {training_level}, Salary - {salary}")

print("\nLast five employees:")
for i, (skill, training_level, salary) in enumerate(last_five):
    print(f"Employee {i+1}: Skill Level - {skill}, Training level - {training_level}, Salary - {salary}")

# Save to CSV file
csv_file_path = "employees_data.csv"
with open(csv_file_path, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Skill Level", "Training level", "Salary"])
    for skill, training_level, salary in employees_data:
        writer.writerow([skill, training_level, salary])


First five employees:
Employee 1: Skill Level - entry-level, Training level - [94, 37, 0, 0, 0, 0, 0], Salary - 1393
Employee 2: Skill Level - junior, Training level - [52, 77, 90, 0, 0, 0, 0], Salary - 1837
Employee 3: Skill Level - manager, Training level - [30, 43, 99, 55, 17, 96, 74], Salary - 5984
Employee 4: Skill Level - entry-level, Training level - [49, 23, 0, 0, 0, 0, 0], Salary - 1216
Employee 5: Skill Level - entry-level, Training level - [93, 76, 0, 0, 0, 0, 0], Salary - 1507

Last five employees:
Employee 1: Skill Level - entry-level, Training level - [69, 4, 0, 0, 0, 0, 0], Salary - 1219
Employee 2: Skill Level - senior, Training level - [48, 47, 15, 87, 12, 45, 0], Salary - 2800
Employee 3: Skill Level - entry-level, Training level - [73, 45, 0, 0, 0, 0, 0], Salary - 1354
Employee 4: Skill Level - junior, Training level - [18, 87, 59, 0, 0, 0, 0], Salary - 1610
Employee 5: Skill Level - junior, Training level - [99, 20, 45, 0, 0, 0, 0], Salary - 1582


**Customer demand data**

For each day, we need to decide how many customers request an appointment. Banks usually use data from previous years to predict how many customers will need an employee for shift j on a given day. Since we could not find any such data, we generate a random number of customers for each shift according to a fixed distribution. We set the shares of the seven shift types to 0.2, 0.15, 0.2, 0.1, 0.2, 0.1, 0.05 respectively.

In [5]:
def generate_customer_data_custom_distribution(num_customer, days, distribution):
    customer_data = []

    for day, index in days.items():
        # Draw a different number of customers for each day, from 1 to num_customer - 1.
        # A separate variable keeps the upper bound from being overwritten between days.
        daily_customers = np.random.randint(1, num_customer)

        # Calculate the number of customers per shift type based on the distribution
        num_customer_distribution = {shift: int(pct * daily_customers) for shift, pct in distribution.items()}
        
        # Adjust for any rounding differences to ensure the total count matches daily_customers
        while sum(num_customer_distribution.values()) < daily_customers:
            num_customer_distribution[random.choice(list(num_customer_distribution.keys()))] += 1

        # for each shift, get the distribution
        for shift_label, count in num_customer_distribution.items():
            customer_data.append((day, shift_label, count))
    
    # Shuffle the data to mix days and shift types
    random.shuffle(customer_data)
    
    return customer_data

# Example usage
days = {"Monday": 0, 
        "Tuesday": 1, 
        "Wednesday": 2, 
        "Thursday": 3,
        "Friday": 4}

# Use a separate name so the employee skill-level `distribution` defined earlier is not overwritten
shift_distribution = {
    "Account opening": 0.2,
    "Credit card application": 0.15,
    "Loan Application": 0.2,
    "Mortgage Consultation": 0.1,
    "Retirement planning": 0.2,
    "Financial advising": 0.1,
    "Wealth management": 0.05
}

num_customer = 100

customer_data = generate_customer_data_custom_distribution(num_customer, days, shift_distribution)

# Output the first and last five customer
first_five = customer_data[:5]
last_five = customer_data[-5:]

print("First five shift demand:")
for i, (day, shift, shift_demand) in enumerate(first_five):
    print(f"Shift {i+1}: Day - {day}, Shift - {shift}, Shift Demand - {shift_demand}")

print("\nLast five shift demand:")
for i, (day, shift, shift_demand) in enumerate(last_five):
    print(f"Shift {i+1}: Day - {day}, Shift - {shift}, Shift Demand - {shift_demand}")

# Save to CSV file
csv_file_path = "customer_data.csv"
with open(csv_file_path, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Day", "Shift", "Shift Demand"])
    for day, shift, shift_demand in customer_data:
        writer.writerow([day, shift, shift_demand])

First five shift demand:
Shift 1: Day - Monday, Shift - Wealth management, Shift Demand - 4
Shift 2: Day - Wednesday, Shift - Financial advising, Shift Demand - 3
Shift 3: Day - Monday, Shift - Financial advising, Shift Demand - 7
Shift 4: Day - Thursday, Shift - Account opening, Shift Demand - 2
Shift 5: Day - Monday, Shift - Retirement planning, Shift Demand - 14

Last five shift demand:
Shift 1: Day - Wednesday, Shift - Retirement planning, Shift Demand - 4
Shift 2: Day - Tuesday, Shift - Account opening, Shift Demand - 7
Shift 3: Day - Monday, Shift - Credit card application, Shift Demand - 10
Shift 4: Day - Wednesday, Shift - Account opening, Shift Demand - 3
Shift 5: Day - Tuesday, Shift - Credit card application, Shift Demand - 5
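The generator above fixes rounding gaps by adding customers to randomly chosen shift types. If a reproducible split is preferred, one alternative is largest-remainder rounding, sketched below. This is a different technique from the one used in the notebook; the shares are the same ones, written as integer percentages to avoid floating-point rounding, and `split_demand` is a hypothetical helper name.

```python
# Shares of daily demand per shift type, written as integer percentages.
SHARES = {
    "Account opening": 20,
    "Credit card application": 15,
    "Loan Application": 20,
    "Mortgage Consultation": 10,
    "Retirement planning": 20,
    "Financial advising": 10,
    "Wealth management": 5,
}


def split_demand(total):
    """Split `total` customers across shift types using largest-remainder rounding."""
    # Whole customers each shift type gets before distributing the leftover.
    counts = {shift: pct * total // 100 for shift, pct in SHARES.items()}
    # Fractional part of each quota, in hundredths of a customer.
    remainders = {shift: pct * total % 100 for shift, pct in SHARES.items()}
    leftover = total - sum(counts.values())
    # Give the leftover customers to the shift types with the largest remainders.
    for shift in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[shift] += 1
    return counts


print(split_demand(14))
```

With this rule the per-type counts always add up to the daily total, and the same total always yields the same split.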


## 4) Formulate the problem as a linear programming problem

**Decision variables:**<br>
Let $x_{ij} \in \{0,1\}$ be a binary variable, where $x_{ij} = 1$ if employee $i$ is assigned to shift $j$, and 0 otherwise.

**Objective:**<br>
Minimize the total labor cost, which is the sum of the salaries of the assigned employees:
$$
\sum_i \sum_j x_{ij} \times \text{Salary$_i$}
$$
$$
\text{where Salary$_i$ is the salary of employee $i$.}
$$

**Constraints:**

1) Each shift must be assigned to exactly one employee:
$$
\sum_i x_{ij} = 1 \ , \ \ \text{for all shifts} \ j
$$

2) An employee can be assigned to at most one shift:
$$
\sum_j x_{ij}  \leq  1 \ , \ \ \text{for all employees} \ i
$$

3) Ensure that only employees with the required skill level can be assigned to shifts:
$$
x_{ij} = 0 \ , \ \ \text{if employee $i$ does not have the required skill level for shift $j$}
$$
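To see the objective and constraints 1 to 3 at work, the sketch below enumerates every assignment for a tiny instance and keeps the cheapest feasible one. It is a brute-force check rather than the LP solver, and all numbers in it (three employees, two shifts) are hypothetical.

```python
from itertools import permutations

toy_salaries = [1200, 1800, 3000]  # hypothetical salaries of employees 0, 1, 2
employee_level = [0, 1, 2]         # entry-level, junior, senior
required_level = [0, 1]            # shift 0 needs entry-level, shift 1 needs junior

best_cost, best_assignment = None, None
# assignment[j] is the employee placed on shift j. Each permutation gives every
# shift a distinct employee, so constraints 1 and 2 hold by construction.
for assignment in permutations(range(len(toy_salaries)), len(required_level)):
    # Constraint 3: skip assignments that put an under-qualified employee on a shift.
    if any(employee_level[i] < required_level[j] for j, i in enumerate(assignment)):
        continue
    # Objective: total salary of the assigned employees.
    cost = sum(toy_salaries[i] for i in assignment)
    if best_cost is None or cost < best_cost:
        best_cost, best_assignment = cost, assignment

print(best_assignment, best_cost)  # (0, 1) 3000
```

Enumeration like this is only practical for very small instances, which is why the full problem is handed to a solver.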

Now, incorporating the additional requirement of selecting the employee with the highest training level and lowest salary for each shift, we add the following constraints:

4) For each shift, select the employee with the highest training level:
$$
\sum_i t_{ij}\times x_{ij} = \max{\{t_{i'j}| i'\text{ has the required skill level for shift } j\}}
$$
$$
\text{where $t_{ij}$ is the training level of employee $i$ for shift $j$.}
$$

5) Ensure that the selected employee has the lowest salary among those with the highest training level for each shift:
$$
\sum_i s_{i}\times x_{ij} = \min{\{s_{i'}| i'\text{ has the highest training level for shift } j\}}
$$
$$
\text{where $s_{i}$ is the salary of employee $i$.}
$$





In [None]:
# Create a LP minimization problem
prob = pulp.LpProblem("Shift_Assignment", pulp.LpMinimize)

In [None]:
# Create the shifts data set
# The shifts data set is an array of 7 integers summing to num_shifts,
# indicating how many employees need to be assigned to each of the seven shift types

def generate_shifts(num_shifts=100):
    proportions = np.random.random(7)
    proportions /= proportions.sum()  # Normalize to sum to 1
    shifts = np.round(proportions * num_shifts).astype(int)  # Scale, round, and store as integers

    while shifts.sum() != num_shifts:
        difference = num_shifts - shifts.sum()
        indices = np.arange(7)
        np.random.shuffle(indices)  # Shuffle indices to distribute adjustments randomly
        for i in indices:
            if difference > 0 and shifts[i] < num_shifts:  # Need to add to the total
                shifts[i] += 1
                difference -= 1
            elif difference < 0:  # Need to subtract from the total
                if shifts[i] > 0:  # Avoid making any shift negative
                    shifts[i] -= 1
                    difference += 1
            if difference == 0:
                break

    return shifts

In [None]:
# Generate the shifts data

shifts_data = generate_shifts(num_shifts)
print(shifts_data)

In [None]:
# Define binary decision variables
# x_ij = 1 if employee i is assigned to shift j, 0 otherwise
x = pulp.LpVariable.dicts("Assignment", 
                          [(i, j) for i in range(num_employees) for j in range(num_shifts)], 
                          cat='Binary')

In [None]:
salaries = [employee[2] for employee in employees_data]
# print(salaries)

In [None]:
# Define the objective function (total labor cost)
prob += pulp.lpSum(x[i, j] * salaries[i] for i in range(num_employees) for j in range(num_shifts))

# Constraint: Each shift must be assigned to exactly one employee
for j in range(num_shifts):
    prob += pulp.lpSum(x[i, j] for i in range(num_employees)) == 1

# Constraint: An employee can be assigned to at most one shift
for i in range(num_employees):
    prob += pulp.lpSum(x[i, j] for j in range(num_shifts)) <= 1

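# Look-up tables for the constraints below. Shift slot j belongs to one of the seven shift types;
# slots are laid out type by type using the per-type counts in shifts_data.
min_levels = list(tasks_with_min_levels.values())
shift_type = [t for t, count in enumerate(shifts_data) for _ in range(int(count))]
employee_levels = [skill_levels[employee[0]] for employee in employees_data]
required_skill_levels = [min_levels[t] for t in shift_type]
training_levels = [[employee[1][t] for t in shift_type] for employee in employees_data]

# Constraint: Ensure that only employees with the required skill level can be assigned to shifts
for i in range(num_employees):
    for j in range(num_shifts):
        if employee_levels[i] < required_skill_levels[j]:
            prob += x[i, j] == 0

# Constraint: For each shift, select the employee with the highest training level
# Note: combined with the "at most one shift per employee" constraint, this equality can make
# the model infeasible when the same employee is the top-trained candidate for several shifts.
for j in range(num_shifts):
    best_training = max(training_levels[i][j] for i in range(num_employees) if employee_levels[i] >= required_skill_levels[j])
    prob += pulp.lpSum(training_levels[i][j] * x[i, j] for i in range(num_employees)) == best_training

# Constraint: Ensure that the selected employee has the lowest salary among those with the highest training level
for j in range(num_shifts):
    best_training = max(training_levels[i][j] for i in range(num_employees) if employee_levels[i] >= required_skill_levels[j])
    lowest_salary = min(salaries[i] for i in range(num_employees) if training_levels[i][j] == best_training)
    prob += pulp.lpSum(salaries[i] * x[i, j] for i in range(num_employees)) == lowest_salary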

In [None]:
employees_data