# Data Envelopment Analysis (DEA)

DEA is commonly used to evaluate the efficiency of a number of producers, commonly referred to as decision making units (DMUs). Each DMU uses a set of inputs to produce a set of outputs. The inputs and outputs can have very different units. DEA assumes that the inputs and outputs have been correctly identified; therefore, it is vitally important that any DEA study specify them correctly.

The underlying assumption of DEA is that if a given DMU can produce its outputs with its inputs, the other DMUs should be able to do the same; a DMU that cannot is **inefficient**. Given multiple efficient DMUs, their inputs and outputs can theoretically be combined to form a virtual, composite DMU. For each real DMU, DEA attempts to find the "best" virtual, composite DMU to compare it against.

# Example 

There are three hospitals and we want to know if each one is efficient or inefficient. Each hospital uses two inputs: (1) capital, measured in hundreds of hospital beds, and (2) labor, measured in thousands of labor hours per month. The outputs produced by each hospital are: (1) hundreds of patient-days during the month for patients under the age of 14, (2) hundreds of patient-days during the month for patients between 14 and 65, and (3) hundreds of patient-days during the month for patients over 65.

The inputs and outputs for each hospital are shown in the table below.


| Hospital | Capital | Labor | &#124; | <=14 | 14-65 | >=65 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 5 | 14 | &#124; | 9 | 4 | 16 |
| 2 | 8 | 15 | &#124; | 5 | 7 | 10 |
| 3 | 7 | 12 | &#124; | 4 | 9 | 13 |

The efficiency of a given hospital is $$\frac{\textrm{value of hospital's outputs}}{\textrm{cost of hospital's inputs}}$$

To formulate this problem, we need to define the variables.

| | | |
| --- | --- | --- |
| Let | | |
| $x_{i}$ | = | the "cost" of one unit of input $i$ |
| $y_{j}$ | = | the value of one unit of output $j$ |

We can now write each hospital's efficiency mathematically as follows:

$$
\textrm{Hospital 1 Efficiency} = \frac{9y_{1} + 4y_{2} + 16y_{3}}{5x_{1}+14x_{2}}
$$

$$
\textrm{Hospital 2 Efficiency} = \frac{5y_{1} + 7y_{2} + 10y_{3}}{8x_{1}+15x_{2}}
$$

$$
\textrm{Hospital 3 Efficiency} = \frac{4y_{1} + 9y_{2} + 13y_{3}}{7x_{1}+12x_{2}}
$$
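
As a quick numeric illustration, these ratios can be evaluated for any trial weights. The sketch below is plain Python; the names `HOSPITALS` and `efficiency` and the trial weights (every cost and value set to 1) are assumptions chosen only for illustration, not values from the analysis.

```python
from fractions import Fraction

# Inputs (capital, labor) and outputs (three age groups) from the table above
HOSPITALS = {
    1: {"inputs": (5, 14), "outputs": (9, 4, 16)},
    2: {"inputs": (8, 15), "outputs": (5, 7, 10)},
    3: {"inputs": (7, 12), "outputs": (4, 9, 13)},
}

def efficiency(hospital, x, y):
    """Return (value of outputs) / (cost of inputs) for the given weights."""
    data = HOSPITALS[hospital]
    value = sum(b * w for b, w in zip(data["outputs"], y))
    cost = sum(a * w for a, w in zip(data["inputs"], x))
    return Fraction(value) / Fraction(cost)

# Hypothetical trial weights: every input cost and output value equals 1
trial_x = (1, 1)
trial_y = (1, 1, 1)
ratios = {h: efficiency(h, trial_x, trial_y) for h in HOSPITALS}
for h, r in ratios.items():
    print(f"Hospital {h}: {float(r):.3f}")
```

With these trial weights, hospitals 1 and 3 score above 1, which is why the formulation must restrict the weights so that no hospital's ratio exceeds 1.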

The DEA approach uses the following ideas to determine if a hospital is efficient.

1. No hospital can be more than 100% efficient.
2. We attempt to choose output "prices" and input "costs" that maximize efficiency. If the efficiency equals 1, then the hospital is efficient. If the efficiency is less than 1, then it is inefficient.
3. To simplify computations, we may scale the input costs so that the cost of a hospital's inputs equals 1.
4. Sometimes, we want to ensure that each input cost and output value/price is strictly positive.
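
Ideas 1 to 3 can be written as a single linear program. The notation here is an assumption introduced for illustration: $a_{ki}$ is the amount of input $i$ used by DMU $k$, $b_{kj}$ is the amount of output $j$ it produces, and $t$ is the DMU being evaluated. Because idea 3 fixes the denominator of the efficiency ratio at 1, maximizing the ratio reduces to maximizing its numerator:

$$
\begin{aligned}
\max \quad & \sum_{j} b_{tj}\, y_{j} \\
\textrm{s.t.} \quad & \sum_{i} a_{ki}\, x_{i} - \sum_{j} b_{kj}\, y_{j} \ge 0 \quad \textrm{for every DMU } k \\
& \sum_{i} a_{ti}\, x_{i} = 1 \\
& x_{i} \ge 0, \quad y_{j} \ge 0
\end{aligned}
$$

Idea 4, when it applies, would replace the lower bounds of 0 with a small positive constant.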

We need to create an LP for each of the DMUs (e.g., hospitals) and solve them.
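
Since the LPs differ only in which DMU is being evaluated, their coefficients can be generated from the data table rather than typed by hand. The following is a minimal, solver-independent sketch; the names `INPUTS`, `OUTPUTS`, and `build_dea_lp` are hypothetical and are not used elsewhere in this notebook.

```python
# Rows: hospitals 1-3. Data taken from the table above.
INPUTS = [(5, 14), (8, 15), (7, 12)]             # capital, labor
OUTPUTS = [(9, 4, 16), (5, 7, 10), (4, 9, 13)]   # <=14, 14-65, >=65

def build_dea_lp(target):
    """Return the LP coefficients for the DMU at 0-based index `target`.

    Variables are ordered x_1, x_2, y_1, y_2, y_3.
    """
    n_in, n_out = len(INPUTS[0]), len(OUTPUTS[0])
    # Objective: maximize the value of the target's outputs
    objective = [0] * n_in + list(OUTPUTS[target])
    # One ">= 0" row per DMU: input cost minus output value
    no_more_than_100 = [
        list(inp) + [-b for b in out] for inp, out in zip(INPUTS, OUTPUTS)
    ]
    # Normalization: the target's total input cost equals 1
    total_cost = list(INPUTS[target]) + [0] * n_out
    return objective, no_more_than_100, total_cost

for k in range(len(INPUTS)):
    objective, rows, total_cost = build_dea_lp(k)
    print(f"Hospital {k + 1}: max {objective}, total-cost row {total_cost} = 1")
```

Only the objective and the total-cost row change from one hospital to the next; the three "no more than 100% efficient" rows are the same in every LP.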

## Formulation 

### Hospital 1

| | | |
| --- | --- | --- |
| Let | | |
| $x_{1}$ | = | the "cost" of one unit of capital input |
| $x_{2}$ | = | the "cost" of one unit of labor input |
| $y_{1}$ | = | the value of one unit of <=14 patient-days output |
| $y_{2}$ | = | the value of one unit of 14-65 patient-days output |
| $y_{3}$ | = | the value of one unit of >=65 patient-days output |

| | | | | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\max$ | $9y_{1}$ | $+$ | $4y_{2}$ | $+$ | $16y_{3}$ | |
| s.t. | $-9y_{1}$ | $-$ | $4y_{2}$ | $-$ | $16y_{3}$ | $+$ | $5x_{1}$ | $+$ | $14x_{2}$ |$\ge$ | $0$ | {Hospital 1 No More Than 100% Efficient} |
| | $-5y_{1}$ | $-$ | $7y_{2}$ | $-$ | $10y_{3}$ | $+$ | $8x_{1}$ | $+$ | $15x_{2}$ |$\ge$ | $0$ | {Hospital 2 No More Than 100% Efficient} |
| | $-4y_{1}$ | $-$ | $9y_{2}$ | $-$ | $13y_{3}$ | $+$ | $7x_{1}$ | $+$ | $12x_{2}$ |$\ge$ | $0$ | {Hospital 3 No More Than 100% Efficient} |
| | | | | | | | $5x_{1}$ | $+$ | $14x_{2}$ |$=$ | $1$ | {Total Input Cost of 1} |

### Hospital 2
| | | | | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\max$ | $5y_{1}$ | $+$ | $7y_{2}$ | $+$ | $10y_{3}$ | |
| s.t. | $-9y_{1}$ | $-$ | $4y_{2}$ | $-$ | $16y_{3}$ | $+$ | $5x_{1}$ | $+$ | $14x_{2}$ |$\ge$ | $0$ | {Hospital 1 No More Than 100% Efficient} |
| | $-5y_{1}$ | $-$ | $7y_{2}$ | $-$ | $10y_{3}$ | $+$ | $8x_{1}$ | $+$ | $15x_{2}$ |$\ge$ | $0$ | {Hospital 2 No More Than 100% Efficient} |
| | $-4y_{1}$ | $-$ | $9y_{2}$ | $-$ | $13y_{3}$ | $+$ | $7x_{1}$ | $+$ | $12x_{2}$ |$\ge$ | $0$ | {Hospital 3 No More Than 100% Efficient} |
| | | | | | | | $8x_{1}$ | $+$ | $15x_{2}$ |$=$ | $1$ | {Total Input Cost of 1} |

### Hospital 3
| | | | | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\max$ | $4y_{1}$ | $+$ | $9y_{2}$ | $+$ | $13y_{3}$ | |
| s.t. | $-9y_{1}$ | $-$ | $4y_{2}$ | $-$ | $16y_{3}$ | $+$ | $5x_{1}$ | $+$ | $14x_{2}$ |$\ge$ | $0$ | {Hospital 1 No More Than 100% Efficient} |
| | $-5y_{1}$ | $-$ | $7y_{2}$ | $-$ | $10y_{3}$ | $+$ | $8x_{1}$ | $+$ | $15x_{2}$ |$\ge$ | $0$ | {Hospital 2 No More Than 100% Efficient} |
| | $-4y_{1}$ | $-$ | $9y_{2}$ | $-$ | $13y_{3}$ | $+$ | $7x_{1}$ | $+$ | $12x_{2}$ |$\ge$ | $0$ | {Hospital 3 No More Than 100% Efficient} |
| | | | | | | | $7x_{1}$ | $+$ | $12x_{2}$ |$=$ | $1$ | {Total Input Cost of 1} |


In [None]:
# import necessary modules/packages
import gurobipy as gp
from gurobipy import GRB

import pandas as pd

In [None]:
# define a function to return the sensitivity analysis for the variables
def get_SA_vars(the_model):
    ''' 
    This is a helper function that collects all the sensitivity analysis 
    for the "variable section" that you would see in the sensitivity
    report from Excel and returns it as a pandas DataFrame.

    the_model : an instance of gp.Model 

    returns a pandas DataFrame
    '''
    var_sensitivity = []
    for v in the_model.getVars():
        var_sensitivity.append([v.VarName, v.X, v.RC, v.Obj, v.SAObjLow, v.SAObjUp])

    retValue = pd.DataFrame(var_sensitivity)
    retValue.columns = ['variable', 'final_value', 'reduced_cost', 'obj_fn_coeff', 'range_opt_low', 'range_opt_up']

    return retValue

In [None]:
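# define a function to return the sensitivity analysis for the constraints
def get_SA_constraints(the_model):
    '''
    This is a helper function that collects all the sensitivity analysis
    for the "constraint section" that you would see in the sensitivity
    report from Excel and returns it as a pandas DataFrame.

    the_model : an instance of gp.Model

    returns a pandas DataFrame
    '''
    constr_sensitivity = []
    for c in the_model.getConstrs():
        constr_sensitivity.append([c.ConstrName, c.RHS, the_model.getRow(c).getValue(), c.Slack, c.Pi, c.SARHSLow, c.SARHSUp])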

    retValue = pd.DataFrame(constr_sensitivity)
    retValue.columns = ['constraint', 'RHS', 'final_value', 'slack', 'shadow_price', 'range_feasibility_low', 'range_feasibility_up']

    return retValue

In [None]:
####                ####
#      HOSPITAL 1      #
####                ####
# Create the model object
h1 = gp.Model('hospital_1')

# Specify how to optimize
h1.ModelSense = GRB.MAXIMIZE

# You can set the time limit (seconds) for the solve;
# unnecessary for this small problem
#h1.setParam('TimeLimit', 600)

# Create decision variables
# We tell the solver that the variables are continuous,
#   their names, and their lower bounds
# INPUTS
x_1 = h1.addVar(vtype=GRB.CONTINUOUS, name='capital', lb=0.0)
x_2 = h1.addVar(vtype=GRB.CONTINUOUS, name='labor', lb=0.0)

# OUTPUTS
y_1 = h1.addVar(vtype=GRB.CONTINUOUS, name='under_14', lb=0.0)
y_2 = h1.addVar(vtype=GRB.CONTINUOUS, name='14_65', lb=0.0)
y_3 = h1.addVar(vtype=GRB.CONTINUOUS, name='over_65', lb=0.0)

# Add the objective function
h1.setObjective(9*y_1 + 4*y_2 + 16*y_3)

# Add the constraints
# We can simply write out the constraints for the first parameter
# The second parameter names the constraint
h1.addConstr(5*x_1 + 14*x_2 - 9*y_1 - 4*y_2 - 16*y_3 >= 0, name='h1_no_more_100%')
h1.addConstr(8*x_1 + 15*x_2 - 5*y_1 - 7*y_2 - 10*y_3 >= 0, name='h2_no_more_100%')
h1.addConstr(7*x_1 + 12*x_2 - 4*y_1 - 9*y_2 - 13*y_3 >= 0, name='h3_no_more_100%')
h1.addConstr(5*x_1 + 14*x_2 == 1, name='total_cost_1')

# update the model
h1.update()

# solve
h1.optimize()

### Hospital 1 Efficient?

Yes! We see that the objective function value is 1, meaning that hospital 1 is efficient.
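
One way to double-check this result without the solver is to exhibit a feasible set of weights whose objective value is 1. The weights below were picked by hand for illustration (all of the cost on capital, all of the value on the over-65 output); they are an assumption and need not match the solution Gurobi reports, since the optimal weights may not be unique.

```python
from fractions import Fraction

# Hand-picked weights (assumed for illustration, not solver output)
x = (Fraction(1, 5), Fraction(0))                 # capital, labor costs
y = (Fraction(0), Fraction(0), Fraction(1, 16))   # output values

inputs = [(5, 14), (8, 15), (7, 12)]
outputs = [(9, 4, 16), (5, 7, 10), (4, 9, 13)]

def dot(coeffs, weights):
    """Sum of coefficient * weight over matching positions."""
    return sum(c * w for c, w in zip(coeffs, weights))

# Slack of each "no more than 100% efficient" constraint
slacks = [dot(i, x) - dot(o, y) for i, o in zip(inputs, outputs)]
# Feasible if every slack is nonnegative and hospital 1's input cost is 1
feasible = all(s >= 0 for s in slacks) and dot(inputs[0], x) == 1
objective = dot(outputs[0], y)

print("feasible:", feasible)
print("objective:", objective)
```

Because the first constraint caps the objective at hospital 1's input cost, which is fixed at 1, any feasible weights that reach 1 are optimal.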

In [None]:
# get sensitivity for variables
get_SA_vars(h1)

In [None]:
# Get sensitivity for constraints
get_SA_constraints(h1)

In [None]:
####                ####
#      HOSPITAL 2      #
####                ####
# Create the model object
h2 = gp.Model('hospital_2')

# Specify how to optimize
h2.ModelSense = GRB.MAXIMIZE

# Create decision variables
# We tell the solver that the variables are continuous,
#   their names, and their lower bounds
# INPUTS
x_1 = h2.addVar(vtype=GRB.CONTINUOUS, name='capital', lb=0.0)
x_2 = h2.addVar(vtype=GRB.CONTINUOUS, name='labor', lb=0.0)

# OUTPUTS
y_1 = h2.addVar(vtype=GRB.CONTINUOUS, name='under_14', lb=0.0)
y_2 = h2.addVar(vtype=GRB.CONTINUOUS, name='14_65', lb=0.0)
y_3 = h2.addVar(vtype=GRB.CONTINUOUS, name='over_65', lb=0.0)

# Add the objective function
h2.setObjective(5*y_1 + 7*y_2 + 10*y_3)

# Add the constraints
# We can simply write out the constraints for the first parameter
# The second parameter names the constraint
h2.addConstr(5*x_1 + 14*x_2 - 9*y_1 - 4*y_2 - 16*y_3 >= 0, name='h1_no_more_100%')
h2.addConstr(8*x_1 + 15*x_2 - 5*y_1 - 7*y_2 - 10*y_3 >= 0, name='h2_no_more_100%')
h2.addConstr(7*x_1 + 12*x_2 - 4*y_1 - 9*y_2 - 13*y_3 >= 0, name='h3_no_more_100%')
h2.addConstr(8*x_1 + 15*x_2 == 1, name='total_cost_1')

# update the model
h2.update()

# solve
h2.optimize()

### Hospital 2 Efficient?

You tell me

In [None]:
# get sensitivity for variables
get_SA_vars(h2)

In [None]:
# Get sensitivity for constraints
get_SA_constraints(h2)

In [None]:
####                ####
#      HOSPITAL 3      #
####                ####
# Create the model object
h3 = gp.Model('hospital_3')

# Specify how to optimize
h3.ModelSense = GRB.MAXIMIZE

# Create decision variables
# We tell the solver that the variables are continuous,
#   their names, and their lower bounds
# INPUTS
x_1 = h3.addVar(vtype=GRB.CONTINUOUS, name='capital', lb=0.0)
x_2 = h3.addVar(vtype=GRB.CONTINUOUS, name='labor', lb=0.0)

# OUTPUTS
y_1 = h3.addVar(vtype=GRB.CONTINUOUS, name='under_14', lb=0.0)
y_2 = h3.addVar(vtype=GRB.CONTINUOUS, name='14_65', lb=0.0)
y_3 = h3.addVar(vtype=GRB.CONTINUOUS, name='over_65', lb=0.0)

# Add the objective function
h3.setObjective(4*y_1 + 9*y_2 + 13*y_3)

# Add the constraints
# We can simply write out the constraints for the first parameter
# The second parameter names the constraint
h3.addConstr(5*x_1 + 14*x_2 - 9*y_1 - 4*y_2 - 16*y_3 >= 0, name='h1_no_more_100%')
h3.addConstr(8*x_1 + 15*x_2 - 5*y_1 - 7*y_2 - 10*y_3 >= 0, name='h2_no_more_100%')
h3.addConstr(7*x_1 + 12*x_2 - 4*y_1 - 9*y_2 - 13*y_3 >= 0, name='h3_no_more_100%')
h3.addConstr(7*x_1 + 12*x_2 == 1, name='total_cost_1')

# update the model
h3.update()

# solve
h3.optimize()

### Hospital 3 Efficient?

You tell me

In [None]:
# get sensitivity for variables
get_SA_vars(h3)

In [None]:
# Get sensitivity for constraints
get_SA_constraints(h3)

## Looking at Inefficiency

We saw that hospital 2 was inefficient, i.e., the objective function value was less than 1. What can we say about its inefficiency?

We would like to create a composite hospital by combining the efficient hospitals in some fashion. We can do this by looking at the nonzero "shadow prices" (also called dual values) of the other hospitals. Let's pull that information out again to see it.

In [None]:
# Get sensitivity for constraints, storing it in a variable
h2_sens = get_SA_constraints(h2)
h2_sens

## Creating a Composite Hospital

Both hospital 1 and hospital 3 have nonzero shadow prices; hospital 2's own constraint is not binding, so its shadow price is 0. We can use the absolute values of these shadow prices as weights to get a composite hospital. We want to use those values to get a weighted output vector and a weighted input vector.
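
A minimal sketch of the composite calculation follows. The weights are an assumption: they come from solving by hand for the combination of hospitals 1 and 3 that exactly matches hospital 2's first two outputs, so compare them with the `shadow_price` column of `h2_sens` before relying on them. The name `composite` is hypothetical.

```python
from fractions import Fraction

inputs = {1: (5, 14), 3: (7, 12)}            # capital, labor
outputs = {1: (9, 4, 16), 3: (4, 9, 13)}     # three age groups

# Assumed weights (hand-derived; check against the shadow prices in h2_sens)
weights = {1: Fraction(17, 65), 3: Fraction(43, 65)}

def composite(vectors, weights):
    """Weighted sum of the hospitals' vectors, component by component."""
    length = len(next(iter(vectors.values())))
    return tuple(
        sum(weights[h] * vectors[h][k] for h in weights) for k in range(length)
    )

comp_inputs = composite(inputs, weights)
comp_outputs = composite(outputs, weights)

h2_inputs, h2_outputs = (8, 15), (5, 7, 10)
print("composite inputs :", [float(v) for v in comp_inputs])
print("composite outputs:", [float(v) for v in comp_outputs])
print("labor relative to hospital 2:", float(comp_inputs[1] / h2_inputs[1]))
```

Under these assumed weights, the composite hospital produces at least as much of every output as hospital 2 while using less capital and less labor. If the weights agree with the shadow prices, the labor ratio should match hospital 2's objective function value.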

**&copy; 2024 - Present: Matthew D. Dean, Ph.D.   
Clinical Full Professor of Business Analytics at William \& Mary.**