In [1]:
import numpy as np
import copy
import matplotlib.pyplot as plt
import time

import sys
sys.path.append('../MoitraRohatgi/')
sys.path.append('../')
import auditor_tools
import algorithms
import experiments
import examples
import our_experiments

We first load the data from Martinez's "How Much Should We Trust the Dictator’s GDP Growth Estimates?" and compute the coefficient of interest (the last entry of the OLS coefficient vector) on the full sample, i.e., with every weight set to 1.

In [2]:
X,Y = our_experiments.LoadMartinezData()

In [3]:
algorithms.ols(X,Y,np.ones(len(Y)))[-1]

0.02163764726005013
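To make the role of the weight vector concrete, here is a minimal sketch of weighted least squares. It assumes that `algorithms.ols(X, Y, w)` returns the weighted OLS coefficient vector, which is only inferred from how the notebook calls it; `weighted_ols`, `X_toy`, and `Y_toy` are hypothetical names used for illustration.

```python
import numpy as np

def weighted_ols(X, Y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i @ beta)**2.

    Hypothetical stand-in for algorithms.ols; the real function may differ.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    Xw = X * w[:, None]  # scale each row x_i by its weight w_i
    # Solve the normal equations (X^T W X) beta = X^T W Y.
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)

# Toy data: an intercept column plus one regressor, with y = 1 + 2x exactly.
X_toy = np.column_stack([np.ones(5), np.arange(5.0)])
Y_toy = 1.0 + 2.0 * X_toy[:, 1]

beta_toy = weighted_ols(X_toy, Y_toy, np.ones(len(Y_toy)))
print(beta_toy[-1])  # coefficient on the last column
```

A weight of 1 keeps an observation and a weight of 0 drops it, so removing samples amounts to zeroing entries of `w`.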

Next, we use Gurobi to solve the fractional version of the problem, in which observations may receive fractional weights. The solver has a 10s time limit, but the total run time is significantly longer because of the relatively high model setup cost.
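One way to write such a fractional problem is sketched below. This is an assumption about the formulation, not a transcription of `auditor_tools.solve_regression_fractional`: $x_i$ and $y_i$ denote the rows of `X` and entries of `Y`, $w_i$ the weight of observation $i$, $\beta$ the coefficient vector, and $k$ the index of the coefficient of interest.

```latex
\begin{aligned}
\max_{w,\,\beta}\quad & \sum_{i=1}^{n} w_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} w_i\, x_i \left( y_i - x_i^{\top}\beta \right) = 0, \\
& \beta_k \le 0, \\
& 0 \le w_i \le 1, \qquad i = 1,\dots,n .
\end{aligned}
```

The first constraint is the set of weighted normal equations, which are bilinear in $(w, \beta)$. That would be consistent with the solver log reporting quadratic constraints and a non-convex continuous model. Under this formulation, the number of removed observations is $n - \sum_i w_i$.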

In [4]:
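# Note: the printed label is stale; the solver limit actually passed below is 10 seconds.
print("Integer Programming (1 min cutoff):")
timer = time.time()

# Get the fractional bound and the fractional weights
# (w holds Gurobi variables; their values are read via .X in later cells).
bound_frac, val_frac, w, model = auditor_tools.solve_regression_fractional(
    X, Y, time_limit=10, verbose=True)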

print('time taken: ', time.time()-timer)

Integer Programming (1 min cutoff):
Set parameter Username
Academic license - for non-commercial use only - expires 2023-08-04
set residual constraints
Set parameter NonConvex to value 2
Set parameter TimeLimit to value 10
start solving
Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (mac64[rosetta2])
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 7790 rows, 4106 columns and 7790 nonzeros
Model fingerprint: 0x3e6c8bb7
Model has 212 quadratic constraints
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  QMatrix range    [5e-06, 1e+03]
  QLMatrix range   [6e-02, 1e+03]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 7790 rows and 0 columns

Continuous model is non-convex -- solving as a MIP

Found heuristic solution: objective -0.0000000
Presolve removed 7790 rows and 0 columns
Presolve time: 0.08s
Presolved: 102452 rows, 29667 columns, 425278 nonzeros
Presol

In [5]:
# Check that the fractional weights indeed make the coefficient of interest non-positive
# (here it is zero up to numerical precision)
algorithms.ols(X,Y,[x.X for x in w])[-1]

-1.4850343177386094e-12

In [6]:
# We round all weights that are smaller than .999 to 0, i.e., effectively we round every fractional value
# (up to numerical errors) to 0; the resulting weights are not guaranteed to give a negative coefficient of
# interest, so we double-check that thereafter.
weights = [1 if x.X>.999 else 0 for x in w]

In [7]:
algorithms.ols(X,Y,weights)[-1]

-9.123871833338626e-05

It remains to check the objective: how many observations did we need to remove?

In [8]:
len(X)-sum(weights)

110
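The round-then-verify steps above can be sketched on toy data as follows. The helper names (`weighted_ols`, `round_and_check`) and the toy data are hypothetical, and the weighted OLS helper is an assumed stand-in for `algorithms.ols`; only the `0.999` threshold and the three checks mirror the cells above.

```python
import numpy as np

def weighted_ols(X, Y, w):
    """Assumed stand-in for algorithms.ols: weighted least-squares coefficients."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)

def round_and_check(X, Y, frac_w, threshold=0.999):
    """Round fractional weights to {0, 1}, then report removals and the coefficient."""
    frac_w = np.asarray(frac_w, dtype=float)
    # Any weight not (numerically) equal to 1 is rounded down to 0.
    weights = (frac_w > threshold).astype(float)
    coef = weighted_ols(X, Y, weights)[-1]        # coefficient of interest (last entry)
    removed = int(len(weights) - weights.sum())   # number of dropped observations
    return weights, removed, coef

# Toy data: points on the line y = -0.5 x, plus one outlier that flips the slope's sign.
x = np.linspace(-1.0, 1.0, 21)
X_toy = np.column_stack([np.ones_like(x), x])
Y_toy = -0.5 * x
Y_toy[-1] = 20.0

frac_w = np.ones(len(x))
frac_w[-1] = 0.4  # pretend the solver down-weighted the outlier fractionally
weights_toy, removed_toy, coef_toy = round_and_check(X_toy, Y_toy, frac_w)
print(removed_toy, coef_toy)
```

Rounding can change the fitted coefficient, so the sign check has to be repeated on the rounded weights rather than taken from the fractional solution.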

### Solutions obtained via different implementations of ZAM
We first run our implementation of the usual ZAM algorithm (upper bound of 136), then our resolving implementation (upper bound of 110), and finally MR22's resolving implementation (upper bound of 173). Our resolving implementation matches the best solution found by Gurobi above (110 removals); the other two do not.

In [9]:
timer = time.time()
t1,w1=auditor_tools.ZAMinfluence_upper_bound(X,Y)
print('time taken: ', time.time()-timer)

time taken:  49.583979845047


In [10]:
algorithms.ols(X,Y,w1)[-1]

-0.00013245489510183006

In [11]:
print('number of samples removed by ZAMinfluence: ',len(X)-t1)

number of samples removed by ZAMinfluence:  136


In [12]:
timer = time.time()
t2,w2=auditor_tools.ZAMinfluence_resolving(X,Y)
print('time taken: ', time.time()-timer)

time taken:  121.63717913627625


In [13]:
algorithms.ols(X,Y,w2)[-1]

-8.576076709232439e-06

In [14]:
print('number of samples removed by ZAMinfluence with resolving: ',len(X)-t2)

number of samples removed by ZAMinfluence with resolving:  110
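The idea of re-solving can be sketched as a greedy loop: drop the sample whose first-order influence on the coefficient is largest, refit, and repeat until the sign flips. This is a simplified illustration under stated assumptions, not the implementation in `auditor_tools`; `greedy_resolving` and `weighted_ols` are hypothetical names, the full-sample coefficient is assumed positive, and returning the number of retained samples first only mirrors how the notebook computes `len(X)-t2`.

```python
import numpy as np

def weighted_ols(X, Y, w):
    """Assumed stand-in for algorithms.ols: weighted least-squares coefficients."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)

def greedy_resolving(X, Y, k=-1):
    """Greedily drop samples until coefficient k is no longer positive.

    Uses the first-order influence d beta_k / d w_i = [A^{-1} x_i]_k * r_i,
    where A = X^T W X and r_i is the current residual, recomputed after
    every removal.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    w = np.ones(n)
    beta = weighted_ols(X, Y, w)
    # Stop once the sign has flipped, or when too few samples remain to fit.
    while beta[k] > 0 and w.sum() > p + 1:
        A_inv = np.linalg.inv((X * w[:, None]).T @ X)
        resid = Y - X @ beta
        infl = (X @ A_inv[k]) * resid   # influence of each sample on beta_k
        infl[w == 0] = -np.inf          # already-removed samples are not candidates
        w[int(np.argmax(infl))] = 0.0   # removing the top sample lowers beta_k the most
        beta = weighted_ols(X, Y, w)    # re-solve on the remaining samples
    return int(w.sum()), w, beta[k]

# Toy data: points on the line y = -0.5 x, plus one outlier that flips the slope's sign.
x = np.linspace(-1.0, 1.0, 21)
X_toy = np.column_stack([np.ones_like(x), x])
Y_toy = -0.5 * x
Y_toy[-1] = 20.0

kept_toy, w_toy, coef_toy = greedy_resolving(X_toy, Y_toy)
print("removed:", len(X_toy) - kept_toy, "coefficient:", coef_toy)
```

A variant without re-solving would rank all samples once using the full-sample influences. Recomputing them after each removal costs more time but tracks how the fit changes as samples are dropped.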


In [15]:
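# Reverse the column order so that the coefficient of interest moves from the last
# position to the first (presumably the position algorithms.sensitivity expects);
# the value below should match the full-sample coefficient computed earlier.
X_flipped = copy.deepcopy(np.flip(X,axis=1))
algorithms.ols(X_flipped,Y,np.ones(len(Y)))[0]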

0.021637647259993287
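The effect of the column flip can be checked on toy data: reversing the columns of the design matrix reverses the order of the fitted coefficients, so the last coefficient becomes the first. The arrays below are hypothetical, and `np.linalg.lstsq` is used here in place of `algorithms.ols`.

```python
import numpy as np

# Toy design matrix with three columns and an exactly linear response.
X_toy = np.column_stack([np.ones(6), np.arange(6.0), np.arange(6.0) ** 2])
Y_toy = X_toy @ np.array([1.0, -2.0, 0.5])

beta = np.linalg.lstsq(X_toy, Y_toy, rcond=None)[0]

# np.flip returns a view; .copy() gives an independent array, which plays the
# same role as copy.deepcopy in the cell above.
X_flip = np.flip(X_toy, axis=1).copy()
beta_flip = np.linalg.lstsq(X_flip, Y_toy, rcond=None)[0]

print(beta[-1], beta_flip[0])  # same coefficient, different position
```

The two values agree only up to floating-point error, which matches the small difference between the two full-sample outputs in the notebook.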

In [None]:
print("KZC21 as implemented by MR:")
timer = time.time()
print("upper bound: " + str(algorithms.sensitivity(X_flipped,Y)))
print("total time: " + str(time.time() - timer))

KZC21 as implemented by MR:
