## Homework 5 for ECON-GA 3002

**{Name, ID, Email}:** {Arnav Sood, N11193569, asood@nyu.edu}

**Overview**: Show the downward bias in the standard OLS estimate of the lag coefficient by running a large number of trials for a few (alpha, n) pairs, and then plotting. Use Julia. Optimize for speed.

**Note on Speed**: When I checked this before, it looked like code ran about twice as slowly on the Ubuntu VM as on OS X. There's also a phenomenon where I observe greater memory allocation, and quicker speed, in serial on the VM than in parallel. Notes should be factored in when considering the speed of the code.

> Define parameters. We use the `const` keyword wherever possible, because that allows the compiler to optimize (otherwise, it would have to permit any range of types and values). We also use the `global` keyword -- the downside is a global variable, but the upside is a pre-allocated 10k array.

In [1]:
const alphas = collect(0.5:0.1:0.9);
const ns = collect(50:50:500);
const beta = 1;
global biases = zeros(Float64,10000);

> Define a function to calculate the bias for one run, given parameters. Note the use of a `vals` reference in the argument signature -- this allows us to sidestep declaring a new array of zeros each time we evaluate one run.

In [2]:
function bias(alpha, n, beta,vals)
    vals[1] = randn()*100;
     for i in 2:n
        vals[i] = vals[i-1]*alpha + randn()  + beta;
     end
     xs = vals[1:n-1];
     ys = vals[2:n];
     sumx = sum(xs);
     sumy = sum(ys);
     sumxsq = round(Int,norm(xs,2)^2);
     sumxy = dot(xs,ys);
     val = (sumxy-(n-1)*(sumx/(n-1))*(sumy/(n-1)))/(sumxsq-(n-1)*((sumx/(n-1))^2)) - alpha;
     # For Type Stability -- Apparently the compiler likes this.
     return typeof(val) == Float64 ? val : 0.0;
end

bias (generic function with 1 method)

> Notice the ternary operator we use in the return statement. The reason for this construction is that the Julia compiler optimizes, in large part, based on type -- a type-stable function allows the compiler more certainty about the function's behavior, which translates into quicker code.

> Now, we define a function `aggregatebias`, which runs and aggregates the biases from 10,000 experiments.

In [3]:
function aggregatebias(alpha, n, beta)
     vals = zeros(n);
     for i in 1:10000
        biases[i] = bias(alpha, n, beta,vals);
     end
     avg = mean(biases);
     return avg;
end

aggregatebias (generic function with 1 method)

> Now, initialize and declare a `results` vector, to hold the biases across each of the 50 parameter tuples. 

In [4]:
results = zeros(Float64,50);

> Create a `main` function, which will evaluate and compile the results.  

In [5]:
function main()
   for i in 1:50
        results[i] =
        aggregatebias(alphas[round(Int,ceil(i/10))],ns[round(Int,ceil(i/5))],beta);
    end
    print(results)
    return results
end

main (generic function with 1 method)

> Run and time.

In [6]:
@time main()

[-0.004539565973038508,-0.0046865799074916215,-0.004366197099032988,-0.00468809043635621,-0.004250835423886926,-0.0033909003337137863,-0.0027969991614667416,-0.0022664471588514537,-0.002389420597869988,-0.0029123067082936055,-0.002478034698421887,-0.0027895196155986036,-0.0029707752825583944,-0.002234149283055882,-0.0028195422274161996,-0.0019975455305236574,-0.0021483171340686406,-0.0020494721299255296,-0.002366144756182333,-0.0020244696569242937,-0.0022577984874553514,-0.002012056666747344,-0.002006783360241693,-0.00251986967560708,-0.002107942624269211,-0.0021032607829150765,-0.002142027748736898,-0.002267330384641003,-0.00214932425491374,-0.0019697630134741244,-0.0019745475086671268,-0.002256311351313352,-0.0021298788456212887,-0.002022302709758398,-0.0019423070703335412,-0.0018330623496171448,-0.0020261068068975915,-0.0019270282931204853,-0.0017249174379351821,-0.0018186366982917619,-0.002039788652178863,-0.001947320651550936,-0.001919650175332911,-0.0018124499089069918,-0.0018373

50-element Array{Float64,1}:
 -0.00453957
 -0.00468658
 -0.0043662 
 -0.00468809
 -0.00425084
 -0.0033909 
 -0.002797  
 -0.00226645
 -0.00238942
 -0.00291231
 -0.00247803
 -0.00278952
 -0.00297078
  ⋮         
 -0.00172492
 -0.00181864
 -0.00203979
 -0.00194732
 -0.00191965
 -0.00181245
 -0.00183733
 -0.00179985
 -0.00167056
 -0.00174537
 -0.00196214
 -0.00165411

2.766692 seconds (2.26 M allocations: 2.177 GB, 8.94% gc time)


> Plot. [Handle this later].

### What We See

Recall what we observed in the previous assignment:

> Consistency: There is a clear asymptotic trend to 0 bias. This is because the OLS estimator is a consistent one, which means that it has exactly this property.

> Downward Bias: All the data points are negative, which confirms our supposition that the OLS estimator is downward-biased.

> Proportionality: This isn't a formal statistical term, but roughly speaking, we see that the magnitude of the bias increases with the absolute value of alpha.

> Convergence Rate: We see a logarithmic-type convergence rate, which implies diminishing returns to additional data points, and a clear steep portion to avoid when running simulation.