### Importing libraries :

In [259]:
import pandas as pd
import numpy as np
import warnings 

warnings.filterwarnings("ignore")

In [260]:
from pulp import *

### Importing the previously created CSV files: 

In [261]:
user_eligibility =  pd.read_csv("Job_scoring_users.csv")
user_preferences = pd.read_csv("user_preferences.csv")

#### Looking at the user scores, which are based purely on their eligibility for the job (irrespective of location):

In [262]:
user_eligibility.head(5)

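| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 2 | 2 | 2 | 2 | 2 |
| 1 | B | 4 | 4 | 4 | 4 | 4 |
| 2 | C | 5 | 5 | 5 | 5 | 5 |
| 3 | D | 2 | 2 | 2 | 2 | 2 |
| 4 | E | 6 | 6 | 6 | 6 | 6 |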


#### Looking at the user preferences based on location :

In [263]:
user_preferences.head(5)

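| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 1 | 2 | 5 | 4 | 3 |
| 1 | B | 2 | 1 | 5 | 4 | 3 |
| 2 | C | 2 | 3 | 1 | 5 | 4 |
| 3 | D | 2 | 3 | 5 | 1 | 4 |
| 4 | E | 1 | 2 | 5 | 4 | 3 |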


### Deciding the scoring method: 

In [264]:
users = list(user_eligibility.Candidate.values)
jobs = list(user_eligibility.columns[1:])
user_eligibility_array = user_eligibility[jobs].values

In [265]:
print( "  Maximum user score is : " ,user_eligibility_array.max())

  Maximum user score is :  8


The maximum eligibility score (which ignores location preferences) is 8.

If the preference scores are built so that the gap between any two consecutive preference levels is larger than this maximum eligibility score, location preference takes priority in the matching: for a given location, a candidate who ranks it higher always gets a larger combined score than a candidate who ranks it lower, however low the first candidate's eligibility score is.
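The dominance argument can be checked numerically. A minimal sketch, assuming ranks 1 (best) to 5 (worst) are mapped to `(6 - rank) * level_gap` as in the next cells and that eligibility scores lie between 0 and 8; `combined_score` and `preference_dominates` are hypothetical helpers, not part of the notebook. The check is pairwise only: the LPP itself still maximizes the total over all candidates.

```python
MAX_ELIGIBILITY = 8  # largest eligibility score in the data
LEVEL_GAP = 10       # plays the role of max_score in the notebook


def combined_score(rank, eligibility, level_gap=LEVEL_GAP):
    """Rank 1 (most preferred) .. 5 (least preferred) maps to (6 - rank) * level_gap."""
    return (6 - rank) * level_gap + eligibility


def preference_dominates(level_gap=LEVEL_GAP, max_eligibility=MAX_ELIGIBILITY):
    """True if a better rank always beats the next-worse rank, for any eligibility in [0, max]."""
    for rank in range(1, 5):  # compare each rank with the next-worse one
        for e_better in range(max_eligibility + 1):
            for e_worse in range(max_eligibility + 1):
                if combined_score(rank, e_better, level_gap) <= combined_score(rank + 1, e_worse, level_gap):
                    return False
    return True


print(preference_dominates())             # True: a gap of 10 exceeds 8
print(preference_dominates(level_gap=8))  # False: a gap of 8 can be cancelled out
```

Checking adjacent ranks is enough, because ranks further apart are separated by an even larger gap.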

In [266]:
max_score = 10   # gap between consecutive preference levels; must exceed the maximum eligibility score (8)

In [267]:
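# Rescale ranks so that rank 1 (most preferred) scores highest: 1 -> 50, 2 -> 40, ..., 5 -> 10
user_preferences = user_preferences.replace({
    1: 5 * max_score,
    2: 4 * max_score,
    3: 3 * max_score,
    4: 2 * max_score,
    5: 1 * max_score})
# Select the location columns by name, in the same order as in the eligibility table
user_preferences_num_cols = user_preferences[jobs]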

#### Looking at the modified user preferences table: 

In [268]:
user_preferences.head(5)

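| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 50 | 40 | 10 | 20 | 30 |
| 1 | B | 40 | 50 | 10 | 20 | 30 |
| 2 | C | 40 | 30 | 50 | 10 | 20 |
| 3 | D | 40 | 30 | 10 | 50 | 20 |
| 4 | E | 50 | 40 | 10 | 20 | 30 |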
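The `replace` dictionary above follows the rule `rank -> (6 - rank) * max_score`. A sketch that builds the same mapping programmatically, assuming ranks always run from 1 to 5, applied to candidates A and B from the original preference table (the small DataFrame `prefs` is a hypothetical stand-in for `user_preferences`):

```python
import pandas as pd

max_score = 10
n_levels = 5  # ranks 1 (most preferred) .. 5 (least preferred)

# rank 1 -> 50, rank 2 -> 40, ..., rank 5 -> 10
rank_to_score = {rank: (n_levels + 1 - rank) * max_score for rank in range(1, n_levels + 1)}

jobs = ["Panchkula", "Ambala", "Faridabad", "Gurgaon", "Panipat"]
prefs = pd.DataFrame({
    "Candidate": ["A", "B"],
    "Panchkula": [1, 2], "Ambala": [2, 1], "Faridabad": [5, 5],
    "Gurgaon": [4, 4], "Panipat": [3, 3],
})

# Only map the location columns, so no other column is touched
prefs[jobs] = prefs[jobs].replace(rank_to_score)
print(prefs)
```

Restricting the replacement to the location columns avoids rewriting any other column that happens to contain the values 1 to 5.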


In [269]:
user_preferences_0_array =  user_preferences_num_cols.values

We now have two matrices representing the user preferences and the user eligibility. 

In [270]:
print ("User preferences matrix : \n" , user_preferences_0_array, "\n")

print ("User eligibility matrix : \n" , user_eligibility_array)

User preferences matrix : 
 [[50 40 10 20 30]
 [40 50 10 20 30]
 [40 30 50 10 20]
 [40 30 10 50 20]
 [50 40 10 20 30]
 [50 50 20 30 20]
 [40 30 10 20 50]
 [40 50 10 20 30]
 [50 40 10 20 30]
 [50 40 10 20 30]
 [50 40 10 20 30]
 [40 30 50 10 20]
 [40 30 10 50 20]
 [50 40 10 20 30]
 [50 40 10 20 30]
 [50 40 10 20 30]] 

User eligibility matrix : 
 [[2 2 2 2 2]
 [4 4 4 4 4]
 [5 5 5 5 5]
 [2 2 2 2 2]
 [6 6 6 6 6]
 [2 2 2 2 2]
 [4 4 4 4 4]
 [8 8 8 8 8]
 [4 4 4 4 4]
 [3 3 3 3 3]
 [5 5 5 5 5]
 [7 7 7 7 7]
 [2 2 2 2 2]
 [4 4 4 4 4]
 [6 6 6 6 6]
 [5 5 5 5 5]]


### Decision variable: 

Creating a binary decision variable `y[(i, j)]` that is 1 if user i is allotted a job at location j and 0 otherwise.

In [271]:
prob = LpProblem("Matching Jobs", LpMaximize)
y = LpVariable.dicts("pair", [(i,j)  for i in range(len(users)) for j in range(len(jobs)) ] ,cat='Binary')

`y` can be visualized as a matrix with the same shape as the user preferences and user eligibility matrices, except that its entries can only be 0 or 1. Each row is a user to whom a job may be allocated and each column is a location: a value of 1 means that the user has been allocated a job in that city, and 0 means they have not.
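Viewed as a matrix, the two kinds of constraints are just row sums and column sums of `y`. A sketch of a feasibility check with a hypothetical helper `is_feasible`, using NumPy and the required counts per location from the constraints cell below; the allocation `y_opt` is rebuilt from the nonzero cells of `scores_array` shown later in this notebook.

```python
import numpy as np

# Users required per location, in column order:
# Panchkula, Ambala, Faridabad, Gurgaon, Panipat
required = np.array([4, 2, 2, 2, 3])


def is_feasible(y, required):
    """Check a users x locations 0/1 allocation matrix against both constraint sets."""
    y = np.asarray(y)
    binary = np.isin(y, (0, 1)).all()
    at_most_one_job = (y.sum(axis=1) <= 1).all()   # every row sums to <= 1
    demand_met = (y.sum(axis=0) == required).all()  # every column sums to its requirement
    return bool(binary and at_most_one_job and demand_met)


# Solver's allocation as (user row, location column) pairs, from the nonzero cells of scores_array
chosen = [(1, 1), (2, 2), (3, 3), (4, 0), (6, 4), (7, 1), (8, 4),
          (10, 0), (11, 2), (12, 3), (13, 4), (14, 0), (15, 0)]
y_opt = np.zeros((16, 5), dtype=int)
for i, j in chosen:
    y_opt[i, j] = 1

print(is_feasible(y_opt, required))  # True

y_bad = y_opt.copy()
y_bad[0, 0] = 1  # send user A to Panchkula as a fifth user
print(is_feasible(y_bad, required))  # False
```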

### Maximization problem (objective function) : 

Setting the objective function: choose the allocation that maximizes the total score, where allocating user i to location j contributes the sum of that user's preference score and eligibility score for location j.

In [272]:
prob += lpSum([ (user_preferences_0_array[i][j] + user_eligibility_array[i][j]) * y[(i,j)] for i in range(len(users)) for j in range(len(jobs)) ])
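In matrix form, the objective is the sum of the element-wise product of the combined score matrix and `y`. A sketch that evaluates it with NumPy for the allocation the solver returns, using the preference and eligibility matrices printed above (eligibility is constant across locations, so it is broadcast across each row):

```python
import numpy as np

# User preference matrix (rescaled ranks), as printed above: 16 users x 5 locations
P = np.array([
    [50, 40, 10, 20, 30], [40, 50, 10, 20, 30], [40, 30, 50, 10, 20], [40, 30, 10, 50, 20],
    [50, 40, 10, 20, 30], [50, 50, 20, 30, 20], [40, 30, 10, 20, 50], [40, 50, 10, 20, 30],
    [50, 40, 10, 20, 30], [50, 40, 10, 20, 30], [50, 40, 10, 20, 30], [40, 30, 50, 10, 20],
    [40, 30, 10, 50, 20], [50, 40, 10, 20, 30], [50, 40, 10, 20, 30], [50, 40, 10, 20, 30],
])
# Eligibility is the same for every location, so one value per user is enough
eligibility = np.array([2, 4, 5, 2, 6, 2, 4, 8, 4, 3, 5, 7, 2, 4, 6, 5])
combined = P + eligibility[:, None]  # broadcast each user's eligibility across the row

# Solver's allocation as (user row, location column) pairs
chosen = [(1, 1), (2, 2), (3, 3), (4, 0), (6, 4), (7, 1), (8, 4),
          (10, 0), (11, 2), (12, 3), (13, 4), (14, 0), (15, 0)]
Y = np.zeros_like(P)
for i, j in chosen:
    Y[i, j] = 1

objective = int((combined * Y).sum())
print(objective)  # 672
```

The result matches the total of 672 reported after solving.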

### Constraints:

Setting the constraints. There are two sets of constraints:
- Each person gets at most one job: jobs from multiple locations must not be allocated to the same person, i.e. each row of `y` sums to at most 1.
- Each location gets exactly the number of users it requires. For example, Panchkula requires 4 jobs, so 4 users have to be allocated to Panchkula, i.e. the column of `y` that represents Panchkula must sum to 4.
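For reference, the model built in the next cell can be written as the following 0/1 integer program. The symbols are introduced here for notation only: $p_{ij}$ and $e_{ij}$ are the preference and eligibility scores of user $i$ for location $j$, $n = 16$ users, $m = 5$ locations, and $r = (4, 2, 2, 2, 3)$ are the users required at Panchkula, Ambala, Faridabad, Gurgaon and Panipat.

```latex
\begin{aligned}
\max_{y} \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} \left(p_{ij} + e_{ij}\right) y_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{m} y_{ij} \le 1, \qquad i = 1, \dots, n \\
& \sum_{i=1}^{n} y_{ij} = r_j, \qquad j = 1, \dots, m \\
& y_{ij} \in \{0, 1\}
\end{aligned}
```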


In [273]:
## Each person should be given at most one job

for i in range(len(users)):
    prob += lpSum(y[(i,j)] for j in range(len(jobs))) <= 1

## Place constraints: number of users required at each location
jobs_required = {"Panchkula": 4, "Ambala": 2, "Faridabad": 2, "Gurgaon": 2, "Panipat": 3}
for j, job in enumerate(jobs):
    prob += lpSum(y[(i,j)] for i in range(len(users))) == jobs_required[job]

## Solving the LPP
prob.solve()

1

`prob.solve()` returned `1`, which is PuLP's status code for an optimal solution (`LpStatus[prob.status]` evaluates to `'Optimal'`).
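Because the model is small, its logic can also be sanity-checked by exhaustive search on a toy instance. The sketch below does not use PuLP: it enumerates every way of giving each of 4 hypothetical users either no job or one job (so "at most one job per user" holds by construction), keeps the allocations that fill each location's demand exactly, and returns the best total. The scores are made-up combined (preference + eligibility) values.

```python
from itertools import product

# Hypothetical combined scores (preference + eligibility): scores[i][j] for user i at location j
scores = [
    [52, 42],  # user 0 prefers location 0
    [44, 54],  # user 1 prefers location 1
    [58, 48],  # user 2 prefers location 0 and is more eligible than user 0
    [43, 53],  # user 3 prefers location 1
]
required = [1, 1]  # one job at each location
n_users, n_jobs = len(scores), len(required)

best_total, best_choice = None, None
# choice[i] is the location given to user i, or None for no job
for choice in product([None] + list(range(n_jobs)), repeat=n_users):
    counts = [sum(1 for c in choice if c == j) for j in range(n_jobs)]
    if counts != required:
        continue  # some location does not get exactly its required number of users
    total = sum(scores[i][c] for i, c in enumerate(choice) if c is not None)
    if best_total is None or total > best_total:
        best_total, best_choice = total, choice

print(best_total, best_choice)  # 112 (None, 1, 0, None)
```

Brute force checks (m + 1)^n allocations for n users and m locations, which is why an LP solver is used for the real data.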

## Results:

##### Creating a matrix which shows the matched values: 

In [274]:
def create_matches_df(y): 

    # Build a users x locations 0/1 matrix from the solved decision variables;
    # round() guards against solver values such as 0.9999999 for a binary variable
    matches_array = np.array(
        [[round(y[(i,j)].varValue) for j in range(len(jobs))] for i in range(len(users))],
        dtype=int)

    # Same column layout as user_eligibility: Candidate first, then the locations
    matches_df = pd.DataFrame(matches_array, columns=jobs)
    matches_df.insert(0, "Candidate", user_eligibility["Candidate"])
    return matches_df

In [275]:
matches_df = create_matches_df(y)
matches_df

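| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 0 | 0 | 0 | 0 | 0 |
| 1 | B | 0 | 1 | 0 | 0 | 0 |
| 2 | C | 0 | 0 | 1 | 0 | 0 |
| 3 | D | 0 | 0 | 0 | 1 | 0 |
| 4 | E | 1 | 0 | 0 | 0 | 0 |
| 5 | F | 0 | 0 | 0 | 0 | 0 |
| 6 | G | 0 | 0 | 0 | 0 | 1 |
| 7 | H | 0 | 1 | 0 | 0 | 0 |
| 8 | I | 0 | 0 | 0 | 0 | 1 |
| 9 | J | 0 | 0 | 0 | 0 | 0 |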


The above table shows which users have been matched to which jobs; it is the optimal `y` matrix. For example, user E has been matched to Panchkula and to no other location, while users A, F and J have not been allocated any job.

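##### Visualizing the scoring summation :

Each cell of the matrix below is the combined score (preference + eligibility) multiplied by the corresponding value of `y`, so only the cells selected in the optimal solution are nonzero: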

In [276]:
scores_array = user_preferences_0_array.copy()
scores_array[:,:] = 0

for i in range(len(users)):
    for j in range(len(jobs)):
        scores_array[i,j] = (user_preferences_0_array[i][j] + user_eligibility_array[i][j]) * round(y[(i,j)].varValue)

scores_array

array([[ 0,  0,  0,  0,  0],
       [ 0, 54,  0,  0,  0],
       [ 0,  0, 55,  0,  0],
       [ 0,  0,  0, 52,  0],
       [56,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 54],
       [ 0, 58,  0,  0,  0],
       [ 0,  0,  0,  0, 34],
       [ 0,  0,  0,  0,  0],
       [55,  0,  0,  0,  0],
       [ 0,  0, 57,  0,  0],
       [ 0,  0,  0, 52,  0],
       [ 0,  0,  0,  0, 34],
       [56,  0,  0,  0,  0],
       [55,  0,  0,  0,  0]], dtype=int64)

In [277]:
print(f"The sum of the values in this matrix, {scores_array.sum()}, is the maximum total score achievable under the constraints. This is the quantity the LPP optimizes.")

The sum of the values in this matrix, 672, is the maximum total score achievable under the constraints. This is the quantity the LPP optimizes.


### Results with provided scores :

This shows, for each candidate and location, the eligibility score and the rescaled preference score (as `eligibility,preference`), together with the matching.

In [278]:
## Concatenating the eligibility and preference scores as "eligibility,preference" strings:
results_df = user_eligibility.copy()
for colname in jobs:
    results_df[colname] = user_eligibility[colname].astype('str') + "," + user_preferences[colname].astype('str')
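The loop above can also be written without iterating over columns, because string addition on DataFrames works element-wise. A sketch on candidates A and B, with the small frames `eligibility` and `preferences` as hypothetical stand-ins for `user_eligibility` and the rescaled `user_preferences`:

```python
import pandas as pd

jobs = ["Panchkula", "Ambala", "Faridabad", "Gurgaon", "Panipat"]
# Hypothetical stand-ins for user_eligibility and the rescaled user_preferences (candidates A and B)
eligibility = pd.DataFrame({"Candidate": ["A", "B"], **{job: [2, 4] for job in jobs}})
preferences = pd.DataFrame({
    "Candidate": ["A", "B"],
    "Panchkula": [50, 40], "Ambala": [40, 50], "Faridabad": [10, 10],
    "Gurgaon": [20, 20], "Panipat": [30, 30],
})

results = eligibility.copy()
# Element-wise string concatenation over all location columns at once
results[jobs] = eligibility[jobs].astype(str) + "," + preferences[jobs].astype(str)
print(results)
```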

#####  Results with highlighted cells : 

In [279]:
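def highlight_df(results_df, matches_df):
    # CSS for each cell: yellow where the candidate is matched to that location, empty otherwise
    style_df = (
            matches_df == 1                  # Compare DataFrames
    ).replace({
        True: 'background-color:yellow',  # True Styles
        False: ''                      # False Styles
    })

    # axis=None hands the whole DataFrame to the function, which returns the precomputed styles
    styled = results_df.style.apply(lambda _: style_df, axis=None)
    return styled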
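The style matrix handed to `Styler.apply` is just a DataFrame of CSS strings with the same index and columns as the data. A sketch that builds it with `numpy.where` on the location columns only, using a hypothetical two-candidate `matches` frame; rendering is left out because pandas' `Styler` needs the optional `jinja2` package.

```python
import numpy as np
import pandas as pd

jobs = ["Panchkula", "Ambala", "Faridabad", "Gurgaon", "Panipat"]
# Hypothetical two-candidate matches table: B is matched to Ambala
matches = pd.DataFrame({
    "Candidate": ["A", "B"],
    "Panchkula": [0, 0], "Ambala": [0, 1], "Faridabad": [0, 0],
    "Gurgaon": [0, 0], "Panipat": [0, 0],
})

# Same index and columns as the data; the Candidate column is never highlighted
style = pd.DataFrame("", index=matches.index, columns=matches.columns)
style[jobs] = np.where(matches[jobs] == 1, "background-color: yellow", "")
print(style)
# In the notebook this frame would be passed on as:
# results_df.style.apply(lambda _: style, axis=None)
```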

In [280]:
highlight_df(results_df,matches_df)

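| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 2,50 | 2,40 | 2,10 | 2,20 | 2,30 |
| 1 | B | 4,40 | 4,50 | 4,10 | 4,20 | 4,30 |
| 2 | C | 5,40 | 5,30 | 5,50 | 5,10 | 5,20 |
| 3 | D | 2,40 | 2,30 | 2,10 | 2,50 | 2,20 |
| 4 | E | 6,50 | 6,40 | 6,10 | 6,20 | 6,30 |
| 5 | F | 2,50 | 2,50 | 2,20 | 2,30 | 2,20 |
| 6 | G | 4,40 | 4,30 | 4,10 | 4,20 | 4,50 |
| 7 | H | 8,40 | 8,50 | 8,10 | 8,20 | 8,30 |
| 8 | I | 4,50 | 4,40 | 4,10 | 4,20 | 4,30 |
| 9 | J | 3,50 | 3,40 | 3,10 | 3,20 | 3,30 |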


## Caste constraints : 

Caste details can be introduced as an additional constraint to the LPP based on whatever conditions are required.

Assuming that users A, F and J belong to the SC (Scheduled Caste) category:  

In [281]:
user_SC_index =  [0,5,9]

Assuming that **2** of the 13 jobs are reserved for SC candidates:

In [282]:
SC_required =  2
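The unconstrained solution above gives no job to A, F or J, so it violates this reservation. A sketch of the left-hand side of the new constraint, with a hypothetical helper `sc_count`, evaluated on that first allocation (rebuilt from the nonzero cells of `scores_array`):

```python
import numpy as np


def sc_count(y, sc_index):
    """Left-hand side of the SC constraint: number of jobs allocated to SC candidates."""
    return int(np.asarray(y)[sc_index, :].sum())


# Unconstrained allocation as (user row, location column) pairs, from the nonzero cells of scores_array
chosen = [(1, 1), (2, 2), (3, 3), (4, 0), (6, 4), (7, 1), (8, 4),
          (10, 0), (11, 2), (12, 3), (13, 4), (14, 0), (15, 0)]
y_opt = np.zeros((16, 5), dtype=int)
for i, j in chosen:
    y_opt[i, j] = 1

user_SC_index = [0, 5, 9]  # A, F, J
SC_required = 2

print(sc_count(y_opt, user_SC_index))                 # 0
print(sc_count(y_opt, user_SC_index) == SC_required)  # False
```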

### Solving the LPP, same as above except for the additional constraint 

#### Decision variable: 

Same as above

In [283]:
prob = LpProblem("Matching Jobs", LpMaximize)
y = LpVariable.dicts("pair", [(i,j)  for i in range(len(users)) for j in range(len(jobs)) ] ,cat='Binary')

####  Maximization problem (objective function) : 

Same as above

In [284]:
prob += lpSum([ (user_preferences_0_array[i][j] + user_eligibility_array[i][j]) * y[(i,j)] for i in range(len(users)) for j in range(len(jobs)) ])

### Constraints:

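An additional constraint for caste is included here: exactly `SC_required` of the selected users must come from `user_SC_index`. Because the constraint uses `==`, it fixes the number of SC candidates; `>=` would make it a minimum instead.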
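Written out, with $y_{ij} = 1$ when user $i$ is allocated a job at location $j$ (out of $m$ locations), $S$ the set of SC candidates (`user_SC_index`) and $k$ = `SC_required`, the added constraint is:

```latex
\sum_{i \in S} \sum_{j=1}^{m} y_{ij} = k, \qquad S = \{A, F, J\}, \quad k = 2
```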

In [285]:
## Each person should be given at most one job

for i in range(len(users)):
    prob += lpSum(y[(i,j)] for j in range(len(jobs))) <= 1

## Place constraints: number of users required at each location
jobs_required = {"Panchkula": 4, "Ambala": 2, "Faridabad": 2, "Gurgaon": 2, "Panipat": 3}
for j, job in enumerate(jobs):
    prob += lpSum(y[(i,j)] for i in range(len(users))) == jobs_required[job]

## SC constraint: exactly SC_required jobs go to SC candidates
prob += lpSum(y[(i,j)] for i in user_SC_index for j in range(len(jobs))) == SC_required

## Solving the LPP
prob.solve()

1

### Looking at the results:

In [286]:
matched_df_SC = create_matches_df(y)
matched_df_SC


### Displaying old and new results side by side: 

On the left is the result without the caste constraint; on the right is the result with it. A, F and J are the SC users, and 2 jobs are reserved for SC.

In [287]:
from IPython.display import display_html
from itertools import chain, cycle

def display_side_by_side(*args, titles=cycle([''])):
    # Render each DataFrame/Styler as HTML inside an inline-block <div> so they sit next to each other
    html_str = ''
    for df, title in zip(args, chain(titles, cycle(['</br>']))):
        html_str += '<div style="display:inline-block; vertical-align:top; margin-right:2em">'
        html_str += f'<h2 style="text-align: center;">{title}</h2>'
        html_str += df.to_html()
        html_str += '</div>'
    display_html(html_str, raw=True)

display_side_by_side(highlight_df(results_df, matches_df), highlight_df(results_df, matched_df_SC))

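| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 2,50 | 2,40 | 2,10 | 2,20 | 2,30 |
| 1 | B | 4,40 | 4,50 | 4,10 | 4,20 | 4,30 |
| 2 | C | 5,40 | 5,30 | 5,50 | 5,10 | 5,20 |
| 3 | D | 2,40 | 2,30 | 2,10 | 2,50 | 2,20 |
| 4 | E | 6,50 | 6,40 | 6,10 | 6,20 | 6,30 |
| 5 | F | 2,50 | 2,50 | 2,20 | 2,30 | 2,20 |
| 6 | G | 4,40 | 4,30 | 4,10 | 4,20 | 4,50 |
| 7 | H | 8,40 | 8,50 | 8,10 | 8,20 | 8,30 |
| 8 | I | 4,50 | 4,40 | 4,10 | 4,20 | 4,30 |
| 9 | J | 3,50 | 3,40 | 3,10 | 3,20 | 3,30 |

| | Candidate | Panchkula | Ambala | Faridabad | Gurgaon | Panipat |
|---|---|---|---|---|---|---|
| 0 | A | 2,50 | 2,40 | 2,10 | 2,20 | 2,30 |
| 1 | B | 4,40 | 4,50 | 4,10 | 4,20 | 4,30 |
| 2 | C | 5,40 | 5,30 | 5,50 | 5,10 | 5,20 |
| 3 | D | 2,40 | 2,30 | 2,10 | 2,50 | 2,20 |
| 4 | E | 6,50 | 6,40 | 6,10 | 6,20 | 6,30 |
| 5 | F | 2,50 | 2,50 | 2,20 | 2,30 | 2,20 |
| 6 | G | 4,40 | 4,30 | 4,10 | 4,20 | 4,50 |
| 7 | H | 8,40 | 8,50 | 8,10 | 8,20 | 8,30 |
| 8 | I | 4,50 | 4,40 | 4,10 | 4,20 | 4,30 |
| 9 | J | 3,50 | 3,40 | 3,10 | 3,20 | 3,30 |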
