In [1]:
%%bash
rm -fv ipopt*
apt install -q -yy libgfortran5
sleep 1
wget -q https://www.ieor.iitb.ac.in/files/faculty/amahajan/ipopt
wget -q https://www.ieor.iitb.ac.in/files/faculty/amahajan/test.nl
chmod u+rx ipopt

Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
  libgfortran5
0 upgraded, 1 newly installed, 0 to remove and 37 not upgraded.
Need to get 589 kB of archives.
After this operation, 2,660 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 libgfortran5 amd64 8.4.0-1ubuntu1~18.04 [589 kB]
Fetched 589 kB in 0s (6,300 kB/s)
Selecting previously unselected package libgfortran5:amd64.

In [2]:
!pip install -q pyomo


In [3]:
import numpy as np
from pyomo.environ import *
import pandas as pd


#### Squared residual model

Let\
P = [175000, 150000, 125000, 90000, 80000, 120000]\
E = [250, 400, 450, 100, 200, 50]\
L = [1800, 2800, 3500, 800, 700, 500]

Let $P_i$, $L_i$, and $E_i$ denote the $i$-th elements of P, L, and E respectively.

The problem can be formulated as:

Decision variables: $b_0,\ b_1,\ b_2$\
$\text{minimize} \quad \sum_{i=1}^{6} \left(P_i - (b_0 + b_1 L_i + b_2 E_i)\right)^2$\
subject to\
$b_0 \geq 0$
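
Without the bound on $b_0$, this is an ordinary least-squares problem, so the formulation can be sanity-checked with plain NumPy. The sketch below is an illustration only: it hard-codes the three lists given above (rather than reading `ex3.csv`), and the names `A`, `b_ols`, `sse_ols`, and `bound_inactive` are chosen for this example. If the unconstrained solution already satisfies $b_0 \geq 0$, it also solves the constrained problem.

```python
import numpy as np

# Data copied from the lists P, E and L given above.
P = np.array([175000, 150000, 125000, 90000, 80000, 120000], dtype=float)
E = np.array([250, 400, 450, 100, 200, 50], dtype=float)
L = np.array([1800, 2800, 3500, 800, 700, 500], dtype=float)

# Design matrix with columns [1, L_i, E_i], matching b_0 + b_1*L_i + b_2*E_i.
A = np.column_stack([np.ones_like(P), L, E])

# Ordinary least squares, ignoring the bound b_0 >= 0.
b_ols, *_ = np.linalg.lstsq(A, P, rcond=None)
sse_ols = float(np.sum((P - A @ b_ols) ** 2))

# The unconstrained fit solves the constrained problem only if it
# already satisfies the bound on the intercept.
bound_inactive = bool(b_ols[0] >= 0)

print("b =", b_ols)
print("sum of squared residuals =", sse_ols)
print("bound b_0 >= 0 inactive:", bound_inactive)
```

When `bound_inactive` is true, these coefficients should agree with the solver's result up to its tolerance.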

In [4]:
df = pd.read_csv('ex3.csv',index_col=0)


In [5]:
n = 4

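# Column names are used exactly as they appear in ex3.csv
# (note the leading space in ' Pi').
pri = df[' Pi'].values   # P_i
lot = df['Li'].values    # L_i
elev = df['Ei'].values   # E_i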

In [6]:
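m4 = ConcreteModel()
m4.b = Var(np.arange(3))   # b[0], b[1], b[2]
m4.b[0].setlb(0)           # b_0 >= 0

# Sum of squared residuals. Named sse rather than exp so that it does not
# shadow the exp() function pulled in by `from pyomo.environ import *`.
sse = 0
for i in range(6):
  sse += (pri[i] - (m4.b[0] + m4.b[1]*lot[i] + elev[i]*m4.b[2]))**2

In [7]:
m4.obj = Objective(expr = sse)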

In [8]:
solver = SolverFactory('ipopt')
res = solver.solve(m4)

## 1)

In [10]:
print('b0 =', m4.b[0]())
print('b1 =', m4.b[1]())
print('b2 =', m4.b[2]())

b0 = 104904.4164371955
b1 = 38.702075831391646
b2 = -193.3223893242956


In [11]:
print('Min sum of square residuals is ',m4.obj())

Min sum of square residuals is  4291134028.8074555


#### Absolute error model
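
In the same notation as the squared model, the code in the next cell corresponds to the following linear program. The auxiliary variables $z_i$ come from the code (`m3.z`); writing them with indices $1,\dots,6$ is a notational choice made here.

Decision variables: $b_0,\ b_1,\ b_2$ and $z_1,\ \dots,\ z_6$

$$\text{minimize} \quad \sum_{i=1}^{6} z_i$$

subject to

$$z_i \geq (b_0 + b_1 L_i + b_2 E_i) - P_i, \qquad i = 1,\dots,6$$

$$z_i \geq P_i - (b_0 + b_1 L_i + b_2 E_i), \qquad i = 1,\dots,6$$

$$z_i \geq 0, \qquad b_0 \geq 0$$

The two inequalities together force $z_i \geq |P_i - (b_0 + b_1 L_i + b_2 E_i)|$, and because the objective pushes each $z_i$ down, at an optimum $z_i$ equals that absolute residual.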

In [12]:
m3 = ConcreteModel()
m3.b = Var(np.arange(3))
m3.z = Var(np.arange(6),domain=NonNegativeReals)
m3.b[0].setlb(0)

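# z[i] bounds the absolute residual of point i from above,
# written as two linear constraints per point.
m3.cons = ConstraintList()
for i in range(6):
    m3.cons.add(m3.z[i]>= m3.b[0] + lot[i]*m3.b[1]+ elev[i]*m3.b[2] - pri[i])
    m3.cons.add(-m3.z[i]<= m3.b[0] + lot[i]*m3.b[1]+ elev[i]*m3.b[2] - pri[i])

# Minimize the sum of the absolute residuals.
m3.res = Objective(expr = summation(m3.z))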

In [13]:
opt1 = solver.solve(m3)
# display solution
print('\nOptimal value of sum of absolute residuals is = ', m3.res())

print('\nDecision Variables:')
for i in range(3):
    print('Value of b%d is'%i, m3.b[i].value)


Optimal value of sum of absolute residuals is =  125454.53917274893

Decision Variables:
Value of b0 is 104181.81748910129
Value of b1 is 67.27272456364885
Value of b2 is -356.3636194183321


In [14]:
print('Max deviation in squared residue model is ',max([ abs(pri[i] - lot[i]*m4.b[1]() - elev[i]*m4.b[2]() - m4.b[0]()) for i in range(6)]))

Max deviation in squared residue model is  48762.444397373445


In [15]:
print('Max deviation in absolute error model is ',max([ abs(pri[i] - lot[i]*m3.b[1]() - elev[i]*m3.b[2]() - m3.b[0]()) for i in range(6)]))

Max deviation in absolute error model is  54272.724723622814


The maximum deviation is larger in the absolute error model (about 54,273) than in the squared residual model (about 48,762).

In the squared residual model, the penalty on a data point grows with the square of its distance from the regression line, whereas in the absolute error model it grows only linearly with that distance. The most outlying point therefore has more influence in the squared residual model: the regression line is pulled closer to it, which reduces its deviation.

*(For a distance smaller than 1, squaring shrinks the penalty, so the influence in the squared residual model would be smaller. That is not the case in this problem, so this case is ignored.)
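
The comparison can be made concrete by evaluating both fits on the same data. The sketch below is illustrative: it hard-codes the data lists and the coefficients exactly as printed in the solver outputs above, and the helper `summarize` and the names `b_sq`, `b_abs`, `sq`, and `ab` are introduced only for this example.

```python
import numpy as np

# Data copied from the lists P, E and L given earlier.
P = np.array([175000, 150000, 125000, 90000, 80000, 120000], dtype=float)
E = np.array([250, 400, 450, 100, 200, 50], dtype=float)
L = np.array([1800, 2800, 3500, 800, 700, 500], dtype=float)
X = np.column_stack([np.ones_like(P), L, E])

# Coefficients [b0, b1, b2] as printed in the outputs above.
b_sq = np.array([104904.4164371955, 38.702075831391646, -193.3223893242956])
b_abs = np.array([104181.81748910129, 67.27272456364885, -356.3636194183321])

def summarize(b):
    """Return residual statistics for coefficient vector b."""
    r = P - X @ b
    return {
        "sum_sq": float(np.sum(r ** 2)),       # objective of the squared model
        "sum_abs": float(np.sum(np.abs(r))),   # objective of the absolute model
        "max_abs": float(np.max(np.abs(r))),   # largest single deviation
    }

sq = summarize(b_sq)
ab = summarize(b_abs)

for name, stats in (("squared", sq), ("absolute", ab)):
    print(name, stats)
```

Each fit wins on its own objective, while the squared fit gives up some total absolute error in exchange for a smaller largest deviation.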