<font size=18>Project 1: Report</font>

Use this Jupyter notebook to summarize the details of this project organized in the following sections. Note, there is also a presentation notebook that accompanies this project. 

The file `Airfares.xlsx` contains real data that were collected between Q3-1996 and Q2-1997. The first sheet contains variable descriptions while the second sheet contains the data.  A csv file of the data is also provided (called *Airfares.csv*).

**To get full credit your code should all run and produce correct answers if the data in the file `Airfares.xlsx` is changed**. That means you can't type in coefficients for your linear models, but will have to store them in variables instead.

# **P1.1** - Introduction

Summarize the problem statement, establishing the context and methods used in this project. (Write an introduction that says what you're going to do and how you're going to do it!)

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
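The objective of this project is to find the values of three controllable route characteristics that maximize the average fare paid by passengers: the average number of coupons (COUPON), the Herfindahl Index of market concentration (HI), and the distance between endpoint airports (DISTANCE). Linear programming will be used to produce an actionable, prescriptive solution. A linear program is bounded by constraints that define the feasible region. Some constraints come directly from the airline's business requirements, such as the allowed ranges for HI and DISTANCE. Others involve quantities the airline does not set directly: the number of passengers (PAX) and the average personal incomes of the starting and ending cities (S_INCOME and E_INCOME). Multiple linear regression will be used to estimate these quantities, along with FARE itself, as linear functions of COUPON, HI, and DISTANCE, and the fitted coefficients will become the coefficients of the linear program's objective and constraints.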
</font>

# **P1.2** - Linear Regression Models

Provide a brief summary of the linear regression models used to estimate coefficients that will be used in the linear programming problem.  Explain why the multiple regression equations had to be fitted through the origin (consider the assumptions of linear programming).

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

In [1]:
# code for linear regression models goes here

from sklearn.linear_model import LinearRegression    
import pandas as pd

# read in data
airfares = pd.read_csv("data/Airfares.csv")

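# helper: fit a no-intercept regression and return {lowercase predictor name: coefficient}
def fareModel(resp, preds):
    # fit_intercept=False forces the fitted plane through the origin (required by LP proportionality)
    lmodel = LinearRegression(fit_intercept=False)
    lmodel.fit(preds, resp)
    coefs = {col.lower(): float(coef) for col, coef in zip(preds.columns, lmodel.coef_)}

    return coefs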


<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
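Multiple linear regression was used to fit four models, each using the average number of coupons (COUPON), the Herfindahl Index (HI), and the distance between endpoint airports (DISTANCE) as predictors. The responses were the average fare on the route (FARE), the number of passengers (PAX), the starting city's average personal income (S_INCOME), and the ending city's average personal income (E_INCOME). The FARE coefficients form the objective function, and the other three sets of coefficients form the left-hand sides of the PAX, S_INCOME, and E_INCOME constraints. The models had to be fitted through the origin (no intercept) because linear programming assumes proportionality: each decision variable's contribution to the objective and to each constraint must be directly proportional to its value. An intercept would add a constant term that does not scale with COUPON, HI, or DISTANCE, violating that assumption.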
</font>
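To make the proportionality argument concrete, here is a minimal sketch on synthetic data. The column values, the "true" coefficients, and the offset of 25 are made up for illustration; this is not the Airfares data. It repeats the notebook's `fareModel` idea, then checks that doubling every input exactly doubles a no-intercept prediction, while a model fitted with an intercept does not scale that way. `predict_no_intercept` is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fareModel(resp, preds):
    # mirrors the notebook helper: no-intercept fit, coefficients keyed by lowercase column name
    lmodel = LinearRegression(fit_intercept=False)
    lmodel.fit(preds, resp)
    return {col.lower(): float(coef) for col, coef in zip(preds.columns, lmodel.coef_)}


# hypothetical, noise-free synthetic data (NOT the Airfares data)
rng = np.random.default_rng(0)
preds = pd.DataFrame({
    "COUPON": rng.uniform(1.0, 2.0, 50),
    "HI": rng.uniform(1000, 10000, 50),
    "DISTANCE": rng.uniform(100, 3000, 50),
})
true_coefs = {"coupon": 40.0, "hi": 0.01, "distance": 0.07}
resp = sum(true_coefs[c.lower()] * preds[c] for c in preds.columns)

coefs = fareModel(resp, preds)
print(coefs)  # approximately equal to true_coefs


def predict_no_intercept(c, x):
    # no constant term: the prediction is a pure weighted sum of the inputs
    return sum(c[k] * v for k, v in x.items())


x1 = {"coupon": 1.2, "hi": 5000.0, "distance": 800.0}
x2 = {k: 2 * v for k, v in x1.items()}
# proportionality: doubling every input doubles the prediction
print(predict_no_intercept(coefs, x1), predict_no_intercept(coefs, x2))

# contrast: with an intercept, the constant term does not scale with the inputs
with_int = LinearRegression(fit_intercept=True).fit(preds, resp + 25.0)
X1 = pd.DataFrame([x1]).rename(columns=str.upper)
X2 = pd.DataFrame([x2]).rename(columns=str.upper)
gap = with_int.predict(X2)[0] - 2 * with_int.predict(X1)[0]
print(with_int.intercept_, gap)  # gap equals minus the intercept instead of 0
```

The `gap` printed at the end is the part of the prediction that a linear program built from these coefficients could not represent as a proportional contribution.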

# **P1.3** - Optimal LP Solution

The optimal value of the airfare and for which values of COUPON, HI, and DISTANCE it occurs. 

<font color = "blue"> *** 8 points -  answer in cell below *** (don't delete this cell) </font>

In [2]:
# code for Pyomo and nicely formatted output goes here
from pyomo.environ import *

# fit one no-intercept model per response, all using the same three predictors
predictors = airfares[['COUPON', 'HI', 'DISTANCE']]
fares = fareModel(airfares['FARE'], predictors)
pax = fareModel(airfares['PAX'], predictors)
sIncome = fareModel(airfares['S_INCOME'], predictors)
eIncome = fareModel(airfares['E_INCOME'], predictors)

# construct model
model = ConcreteModel()

# decision variables
model.coupon= Var(domain=NonNegativeReals)
model.hi = Var(domain=NonNegativeReals)
model.distance = Var(domain=NonNegativeReals)

# objective function
model.max_fares = Objective(expr = (fares['coupon'] * model.coupon) + (fares['hi'] * model.hi) + (fares['distance'] * model.distance) , sense=maximize)

# constraint list
model.paxC = Constraint(expr = (pax['coupon'] * model.coupon) + (pax['hi'] * model.hi) + (pax['distance'] * model.distance) <= 20000)
model.sIncomeC = Constraint(expr = (sIncome['coupon'] * model.coupon) + (sIncome['hi'] * model.hi) + (sIncome['distance'] * model.distance) <= 30000)
model.eIncomeC = Constraint(expr = (eIncome['coupon'] * model.coupon) + (eIncome['hi'] * model.hi) + (eIncome['distance'] * model.distance) >= 30000)
model.couponC = Constraint(expr = model.coupon <= 1.5)
model.hiC1 = Constraint(expr = model.hi >= 4000)
model.hiC2 = Constraint(expr = model.hi <= 8000)
model.distanceC1 = Constraint(expr = model.distance >= 500)
model.distanceC2 = Constraint(expr = model.distance <= 1000)

# solve and display
solver = SolverFactory('glpk')
solver.solve(model)

# display solution
print(f"Max fare = ${model.max_fares():,.2f}\n")
print(f"Optimal Coupons: {model.coupon():.2f}")
print(f"Optimal HI: {model.hi():.2f}")
print(f"Optimal Distance: {model.distance():.2f}")


Max fare = $203.55

Optimal Coupons: 1.14
Optimal HI: 8000.00
Optimal Distance: 1000.00
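As an optional cross-check that does not depend on GLPK, the same LP can be sketched with SciPy's `linprog` (HiGHS backend). This is a swapped-in solver, not the method used above, and the coefficients below are hypothetical placeholders; in the notebook you would pass the `fares`, `pax`, `sIncome`, and `eIncome` dictionaries returned by `fareModel` instead. `solve_fare_lp` is a hypothetical helper name. Because `linprog` only accepts upper-bound inequality rows, the E_INCOME `>=` constraint is negated into `<=` form.

```python
from scipy.optimize import linprog

VARS = ["coupon", "hi", "distance"]


def solve_fare_lp(fares, pax, s_income, e_income,
                  pax_max=20000, s_max=30000, e_min=30000):
    # linprog minimizes, so negate the FARE coefficients to maximize
    c = [-fares[v] for v in VARS]
    A_ub = [
        [pax[v] for v in VARS],        # PAX <= pax_max
        [s_income[v] for v in VARS],   # S_INCOME <= s_max
        [-e_income[v] for v in VARS],  # E_INCOME >= e_min, rewritten as -E_INCOME <= -e_min
    ]
    b_ub = [pax_max, s_max, -e_min]
    bounds = [(0, 1.5), (4000, 8000), (500, 1000)]  # COUPON, HI, DISTANCE limits from the Pyomo model
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, dict(zip(VARS, res.x))


# hypothetical coefficients for illustration only (NOT fitted from Airfares data)
fares = {"coupon": 50.0, "hi": 0.01, "distance": 0.08}
pax = {"coupon": 1000.0, "hi": 1.0, "distance": 5.0}
s_income = {"coupon": 10000.0, "hi": 1.0, "distance": 10.0}
e_income = {"coupon": 10000.0, "hi": 2.0, "distance": 15.0}

max_fare, x = solve_fare_lp(fares, pax, s_income, e_income)
print(f"Max fare = ${max_fare:,.2f}")
for name, value in x.items():
    print(f"Optimal {name}: {value:.2f}")
```

With these made-up numbers the optimum puts HI and DISTANCE at their upper bounds and stops COUPON at 1.2, where the S_INCOME constraint becomes binding, which is the same structure as the GLPK solution above.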


# **P1.4** - Sensitivity Report

From the sensitivity report, explain which constraints are binding for the number of passengers on that route (PAX), the starting city’s average personal income (S_INCOME), and the ending city’s average personal income (E_INCOME). If the constraint is binding, interpret the shadow price in the context of the problem.  If the constraint is not binding, interpret the slack in the context of the problem.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

In [3]:
# code to generate and display sensitivity report goes here
# write the model to a sensitivity report
model.write('model.lp', io_options={'symbolic_solver_labels': True})
!glpsol -m model.lp --lp --ranges sensit.sen
# widen browser and/or close TOC to see sensitivity report
# print the report that GLPK wrote
with open('sensit.sen', 'r') as f:
    print(f.read())

GLPSOL: GLPK LP/MIP Solver, v4.65
Parameter(s) specified in the command line:
 -m model.lp --lp --ranges sensit.sen
Reading problem data from 'model.lp'...
9 rows, 4 columns, 15 non-zeros
56 lines were read
GLPK Simplex Optimizer, v4.65
9 rows, 4 columns, 15 non-zeros
Preprocessing...
2 rows, 3 columns, 6 non-zeros
Scaling...
 A: min|aij| =  1.020e+00  max|aij| =  2.091e+04  ratio =  2.050e+04
GM: min|aij| =  7.309e-01  max|aij| =  1.368e+00  ratio =  1.872e+00
EQ: min|aij| =  5.342e-01  max|aij| =  1.000e+00  ratio =  1.872e+00
Constructing initial basis...
Size of triangular part is 2
      0: obj =   8.885866366e+01 inf =   2.215e+04 (1)
      3: obj =   1.739717779e+02 inf =   0.000e+00 (0)
*     4: obj =   2.035540468e+02 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
Time used:   0.0 secs
Memory used: 0.0 Mb (40412 bytes)
Write sensitivity analysis report to 'sensit.sen'...
GLPK 4.65 - SENSITIVITY ANALYSIS REPORT                                                                   

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
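The S_INCOME constraint is binding. Its shadow price is 0.00108: each \$1 increase in the cap on the starting city's predicted average personal income (currently \$30,000) raises the maximum average fare by about \$0.00108, or roughly \$1.08 per \$1,000, as long as the change stays within the allowable range for that right-hand side. The PAX and E_INCOME constraints are not binding. PAX has a slack of 7,938.24, meaning the predicted number of passengers at the optimal solution is about 7,938 below the 20,000 limit; the limit could be lowered by up to that amount before the constraint became binding, and raising it would not change the optimal solution. The E_INCOME constraint is a lower bound (at least \$30,000) with no upper limit (reported as infinity), so the predicted ending-city income can exceed \$30,000 by any amount without affecting the optimal solution.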

</font>
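The slack and shadow-price interpretations above can be checked numerically. The sketch below reuses hypothetical coefficients as a toy stand-in (not the fitted Airfares coefficients) and SciPy's `linprog` rather than GLPK. It computes the slack of each regression-based constraint at the optimum and estimates the S_INCOME shadow price by finite differences: re-solve with the cap raised by 1,000 and divide the change in the optimal fare by 1,000. This estimate is only meaningful while the step stays inside the right-hand-side range over which the shadow price holds. `solve` and `predicted` are hypothetical helper names.

```python
from scipy.optimize import linprog

VARS = ["coupon", "hi", "distance"]
# hypothetical coefficients for illustration only (NOT fitted from Airfares data)
FARES = {"coupon": 50.0, "hi": 0.01, "distance": 0.08}
PAX = {"coupon": 1000.0, "hi": 1.0, "distance": 5.0}
S_INC = {"coupon": 10000.0, "hi": 1.0, "distance": 10.0}
E_INC = {"coupon": 10000.0, "hi": 2.0, "distance": 15.0}
BOUNDS = [(0, 1.5), (4000, 8000), (500, 1000)]


def solve(s_max=30000, pax_max=20000, e_min=30000):
    # maximize FARE by minimizing its negative; the >= row is negated into <= form
    res = linprog(
        [-FARES[v] for v in VARS],
        A_ub=[[PAX[v] for v in VARS], [S_INC[v] for v in VARS], [-E_INC[v] for v in VARS]],
        b_ub=[pax_max, s_max, -e_min],
        bounds=BOUNDS,
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x


def predicted(coefs, x):
    # value of a regression-based constraint row at the solution x
    return sum(coefs[v] * xv for v, xv in zip(VARS, x))


z0, x0 = solve()
pax_slack = 20000 - predicted(PAX, x0)    # unused PAX capacity
e_surplus = predicted(E_INC, x0) - 30000  # amount E_INCOME exceeds its floor
s_slack = 30000 - predicted(S_INC, x0)    # about 0 means S_INCOME is binding

# finite-difference shadow price: re-solve with the S_INCOME cap raised by 1,000
z1, _ = solve(s_max=31000)
shadow_s = (z1 - z0) / 1000

print(f"PAX slack = {pax_slack:,.2f}, E_INCOME surplus = {e_surplus:,.2f}, S_INCOME slack = {s_slack:.2f}")
print(f"estimated S_INCOME shadow price = {shadow_s:.5f} fare dollars per income dollar")
```

In this toy problem the binding S_INCOME row has zero slack and a positive shadow price, while the PAX and E_INCOME rows have positive slack, mirroring the pattern described above.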

# **P1.5** - Allowable Ranges

Interpret the allowable ranges (objective coefficient range) for COUPON, HI, and DISTANCE in the context of the problem.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
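The allowable ranges (objective coefficient ranges) for COUPON, HI, and DISTANCE are:
<table>
    <tr>
        <th>Variable</th><th>Allowable range of FARE coefficient</th>
    </tr>
    <tr>
        <td>COUPON</td><td>0 to 221.32046</td>
    </tr>
    <tr>
        <td>HI</td><td>0.00120 to infinity</td>
    </tr>
    <tr>
        <td>DISTANCE</td><td>-0.00306 to infinity</td>
    </tr>
    </table>
Each range gives the values that variable's coefficient in the FARE objective could take, holding the other coefficients fixed, while the current optimal values of COUPON, HI, and DISTANCE remain optimal (the optimal fare itself would change). For example, the COUPON coefficient could take any value from 0 to 221.32 and the optimal plan would still be 1.14 coupons, an HI of 8,000, and a distance of 1,000 miles. HI and DISTANCE have no upper limit because both are already at their maximum allowed values, so making them more profitable cannot push them higher. HI stays at its maximum as long as its coefficient is at least 0.00120, and DISTANCE stays at its maximum even if its coefficient drops slightly below zero, down to -0.00306. If a coefficient moves outside its range, the optimal solution changes.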
</font>
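The meaning of an objective coefficient range can be seen by re-solving with one coefficient changed. In this sketch (hypothetical coefficients again, SciPy `linprog` instead of GLPK), only COUPON's FARE coefficient is varied. For these made-up numbers the range works out by hand to 0 to 80: values inside it leave the optimal plan unchanged, while a value above it shifts the plan. `optimal_plan` is a hypothetical helper name.

```python
from scipy.optimize import linprog

VARS = ["coupon", "hi", "distance"]
# hypothetical coefficients for illustration only (NOT fitted from Airfares data)
FARES = {"coupon": 50.0, "hi": 0.01, "distance": 0.08}
PAX = {"coupon": 1000.0, "hi": 1.0, "distance": 5.0}
S_INC = {"coupon": 10000.0, "hi": 1.0, "distance": 10.0}
E_INC = {"coupon": 10000.0, "hi": 2.0, "distance": 15.0}
BOUNDS = [(0, 1.5), (4000, 8000), (500, 1000)]


def optimal_plan(coupon_fare_coef):
    # copy FARES with only COUPON's objective coefficient replaced
    fares = dict(FARES, coupon=coupon_fare_coef)
    res = linprog(
        [-fares[v] for v in VARS],  # negate to maximize
        A_ub=[[PAX[v] for v in VARS], [S_INC[v] for v in VARS], [-E_INC[v] for v in VARS]],
        b_ub=[20000, 30000, -30000],
        bounds=BOUNDS,
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


base = optimal_plan(FARES["coupon"])
results = {}
for coef in (10.0, 70.0, 100.0):
    x = optimal_plan(coef)
    unchanged = all(abs(a - b) < 1e-6 for a, b in zip(x, base))
    results[coef] = (x, unchanged)
    print(f"COUPON coef {coef:>5}: plan {[round(float(v), 2) for v in x]} -> {'unchanged' if unchanged else 'changed'}")
```

At a coefficient of 100 (outside the toy range), COUPON rises to its 1.5 limit and DISTANCE drops to 700 so that S_INCOME stays within its cap.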

# **P1.6** - Conclusion

Briefly summarize the main conclusion of this project, state what you see as any limitations of the methods used here, and suggest other possible methods of addressing the maximizing of airfare in this problem scenario.

<font color = "blue"> *** 7 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
<p>The model suggests an optimal value of 1.14 for the average number of coupons, the maximum allowed value of the Herfindahl Index (8,000), and the maximum allowed distance between endpoint airports (1,000 miles). These values yield a predicted average fare of \$203.55, about \$42.67 (roughly 26.5%) above the average fare in the provided data (\$160.88).
</p>
<p>
An increase in average fare does not guarantee an increase in total profit. A more comprehensive study should account for operating costs and overhead, as well as the opportunity cost of serving only the routes that maximize average fare. It may not be reasonable to operate only routes between airports that are 1,000 miles apart or that have the optimal Herfindahl Index. This model is likely an oversimplification of the business needs and should be extended with additional constraints informed by domain expertise.
    </p>
</font>

# **P1.7** - Appendix

Show the mathematical formulation for the linear programming problem used in this project.

You can either use LaTeX and markdown or take a clean, cropped picture of neatly handwritten equations and drag-n-drop it here.

<font color = "blue"> *** 5 points -  answer in cell below *** (don't delete this cell) </font>

<font color = "green">
Let $a_1$, $a_2$, $a_3$ = the coefficients generated from the linear regression model predicting FARE for COUPON, HI, and DISTANCE, respectively.<br><br>
Let $b_1$, $b_2$, $b_3$ = the coefficients generated from the linear regression model predicting PAX for COUPON, HI, and DISTANCE, respectively.<br><br>
Let $c_1$, $c_2$, $c_3$ = the coefficients generated from the linear regression model predicting S_INCOME for COUPON, HI, and DISTANCE, respectively.<br><br>
Let $d_1$, $d_2$, $d_3$ = the coefficients generated from the linear regression model predicting E_INCOME for COUPON, HI, and DISTANCE, respectively.<br><br>
Let $C$ = COUPON<br>
Let $H$ = HI<br>
Let $D$ = DISTANCE<br>
Let $Z$ = predicted average fare (FARE), the quantity being maximized<br>
    
---
    
Objective:
    
Maximize $Z = a_1C + a_2H + a_3D$ 

subject to 

$b_1C + b_2H + b_3D \leq 20000$
    
$c_1C + c_2H + c_3D \leq 30000$

$d_1C + d_2H + d_3D \geq 30000$
    
$C \leq 1.5$
    
$4000 \leq H \leq 8000$
    
$500 \leq D \leq 1000$
    
and $C, H, D \geq 0$
</font>