In [1]:
library(magrittr)
library(MASS)

## Refactoring

Refactoring is a central practice in programming. To "refactor" a piece of code means to change how it is implemented without changing what it does.

If your goal in writing code is simply to "get something to work" (once), then refactoring is not useful. If, however, your goal is to write elegant code that can be used reliably, extended, and modified, then you will always need to refactor!
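One way to confirm that a refactor really leaves functionality unchanged is to run the old and new implementations on the same input and compare the results. A minimal sketch, using the averaging step `sum(inside) / M / 3` from `avg_simulations` as the "before" version; `avg_before` and `avg_after` are hypothetical names used only for illustration.

```r
# Before: hard-codes the number of coefficients (3)
avg_before <- function(inside, M) sum(inside) / M / 3

# After: same result for a 3 x M matrix, with nothing hard-coded
avg_after <- function(inside) mean(inside)

# A refactor should not change outputs, so compare both on the same input
set.seed(1)
M <- 10
inside <- matrix(runif(3 * M) < 0.9, nrow = 3)
stopifnot(isTRUE(all.equal(avg_before(inside, M), avg_after(inside))))
```

`all.equal()` is used instead of `==` because dividing in a different order can change the last bit of a floating-point result.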

In [2]:
# We will refactor the code from the "Functions in R" lecture. 

# Using the ideas from the slides, make "generate_data" a polymorphic function.
# It should operate on two classes, "univariate_params" and "multivariate_params",
# with a separate method implemented for each class.
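In R's S3 system, a polymorphic function is a generic that calls `UseMethod()`. R then looks for a method named `generic.class` that matches the class of the first argument, and falls back to `generic.default` if no class matches. A minimal sketch with a hypothetical generic `describe_params`, which is not part of the exercise:

```r
# A hypothetical generic, used only to illustrate S3 dispatch
describe_params <- function(params) UseMethod("describe_params")

describe_params.univariate_params <- function(params) {
    "univariate: one regressor"
}

describe_params.multivariate_params <- function(params) {
    paste("multivariate:", length(params[[1]]), "regressors")
}

# Fallback for any other class: fail loudly instead of silently misbehaving
describe_params.default <- function(params) {
    stop("no method for class ", class(params)[1])
}

uni <- structure(list(1, .5), class = "univariate_params")
multi <- structure(list(c(1, 2, 1), c(0, 0, 0), c(.2, .5, .3), .5),
                   class = "multivariate_params")
describe_params(uni)    # "univariate: one regressor"
describe_params(multi)  # "multivariate: 3 regressors"
```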

In [272]:
# The implementation of the class is up to you! So write the params class to
# match your design. 

# beta should be c(1,2,1)
# mean of X's should be c(0,0,0)
# sd of X's should be c(.2,.5,.3)
# noise sd should be .5

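# Multivariate parameters, in the order list(beta, mean of X's, sd of X's, noise sd)
params <- list(c(1,2,1), c(0,0,0), c(.2,.5,.3), .5)
class(params) <- "multivariate_params"

# Univariate parameters, in the order list(beta, noise sd); beta must be a scalar
# params <- list(1, .5)
# class(params) <- "univariate_params"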

library(MASS)

generate_data <- function(params, N) {
    UseMethod("generate_data")
}    

generate_data.multivariate_params <- function(params, N) { 
    beta <- params[[1]]
    sd <- params[[4]]
    # mvrnorm() takes a covariance matrix, so square the standard deviations
    x <- MASS::mvrnorm(N, params[[2]], diag(params[[3]]^2))
    eps <- rnorm(N, 0, sd)
    y <- x %*% beta + eps
    list(x=x, y=y)
}

generate_data.univariate_params <- function(params, N) {
    beta <- params[[1]]
    sd <- params[[2]]
    x <- rnorm(N, 0, 1)
    eps <- rnorm(N, 0, sd)
    y <- beta*x + eps
    list(x = x, y = y)
}
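Since the class design is left open, one option is to build params objects through small constructor functions that check their inputs before attaching the class. A sketch under that assumption: `new_univariate_params` and `new_multivariate_params` are hypothetical names, and the list elements are named but kept in the positional order that `generate_data` reads with `params[[1]]`, `params[[2]]`, and so on.

```r
# Hypothetical constructors for the two params classes. Elements are named,
# but kept in the positional order that generate_data() reads with [[i]].
new_univariate_params <- function(beta, sd) {
    # univariate: a single scalar coefficient and a positive noise sd
    stopifnot(length(beta) == 1, length(sd) == 1, sd > 0)
    structure(list(beta = beta, sd = sd), class = "univariate_params")
}

new_multivariate_params <- function(beta, x_mean, x_sd, sd) {
    # multivariate: one mean and one sd per regressor, plus a positive noise sd
    stopifnot(length(x_mean) == length(beta), length(x_sd) == length(beta),
              length(sd) == 1, sd > 0)
    structure(list(beta = beta, x_mean = x_mean, x_sd = x_sd, sd = sd),
              class = "multivariate_params")
}

p_multi <- new_multivariate_params(c(1, 2, 1), c(0, 0, 0), c(.2, .5, .3), .5)
p_uni <- new_univariate_params(1, .5)
class(p_multi)  # "multivariate_params"
```

With this check, passing `beta = c(1, 2, 1)` to the univariate constructor fails immediately, instead of being silently recycled by `beta*x` inside `generate_data`.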

In [273]:
calc_coef <- function(params, y, x) {
    UseMethod("calc_coef")
}

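# Multivariate: least squares without an intercept
calc_coef.multivariate_params <- function(params, y, x) coef(lm(y ~ x - 1))

# Univariate: the slope formula from the slides
calc_coef.univariate_params <- function(params, y, x) cov(x, y) / var(x)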
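The two `calc_coef` methods do not fit quite the same model. `cov(x, y) / var(x)` is exactly the least-squares slope of a regression with an intercept, while `lm(y ~ x - 1)` fits one without an intercept, whose slope is `sum(x * y) / sum(x^2)`. Because the simulated x's have mean zero, the two estimates should be close but not identical. A quick check on simulated data; the seed and sample size are arbitrary choices.

```r
set.seed(1)
N <- 200
x <- rnorm(N)
y <- 1 * x + rnorm(N, 0, .5)

# Slope of a model WITH an intercept: identical to cov(x, y) / var(x)
slope_with_intercept <- unname(coef(lm(y ~ x))[2])
stopifnot(isTRUE(all.equal(slope_with_intercept, cov(x, y) / var(x))))

# Slope of a model WITHOUT an intercept: what lm(y ~ x - 1) estimates
slope_no_intercept <- unname(coef(lm(y ~ x - 1)))
stopifnot(isTRUE(all.equal(slope_no_intercept, sum(x * y) / sum(x^2))))

# Close, since mean(x) is near zero, but not the same number
c(slope_with_intercept, slope_no_intercept)
```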

In [274]:
# calc_se <- function(params, y, x, coef) {
#     UseMethod("calc_se")
# }

# calc_se.multivariate_params <- function(params, y, x, coef) {
#     n <- length(y)
#     eps <- y - x %*% coef
#     e_sd <- mean(eps^2)
#     se <- sqrt(e_sd / diag(n*var(x)))
#     se
# }

# calc_se.univariate_params <- function(params, y, x, coef) {
#     n <- length(y)
#     eps <- y - x * coef
#     e_sd <- mean(eps^2)
#     se <- sqrt(e_sd / (n*var(x)))
#     se
# }

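calc_se <- function(params, y, x, coef) {
    # Class-agnostic version: treat univariate x as an N x 1 matrix so the
    # same matrix formula works for both params classes
    x <- as.matrix(x)
    n <- length(y)
    eps <- y - x %*% coef                    # residuals
    e_var <- mean(eps^2)                     # residual variance estimate
    se <- sqrt(e_var / (n * diag(var(x))))   # one standard error per column
    se
}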
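`calc_se` estimates the variance of coefficient j as `mean(eps^2) / (n * var(x_j))`. This ignores correlations between regressors and divides the residual sum of squares by n rather than n - p, so it is an approximation to the textbook OLS formula `sqrt(diag(s^2 (X'X)^{-1}))`. A sketch comparing the two on data drawn like the multivariate case (the seed and N = 500 are arbitrary):

```r
library(MASS)

set.seed(42)
N <- 500
x <- MASS::mvrnorm(N, c(0, 0, 0), diag(c(.2, .5, .3)^2))
y <- drop(x %*% c(1, 2, 1)) + rnorm(N, 0, .5)

fit <- lm(y ~ x - 1)
eps <- residuals(fit)

# Approximation used in calc_se(): mean squared residual over n * var(x_j)
se_approx <- sqrt(mean(eps^2) / diag(N * var(x)))

# Textbook OLS standard errors: sqrt(diag(s^2 (X'X)^{-1})), s^2 = RSS / (N - p)
s2 <- sum(eps^2) / (N - ncol(x))
se_ols <- sqrt(diag(s2 * solve(crossprod(x))))

# The textbook version matches what summary.lm() reports
stopifnot(isTRUE(all.equal(unname(se_ols),
                           unname(summary(fit)$coefficients[, "Std. Error"]))))

# For independent regressors and large N, the two differ only slightly
round(se_approx / se_ols, 3)
```

With small N the gap between the two grows, which is worth keeping in mind when reading the coverage checks below.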


In [275]:
run_regression <- function(params, y, x) {
    # params is passed in explicitly (instead of read from the global
    # environment) so calc_coef and calc_se can dispatch on its class
    coef <- calc_coef(params, y, x)
    se <- calc_se(params, y, x, coef)
    list(coef=coef, se=se)
}

In [276]:
eval_model <- function(coef, se, beta, conf = 1.96) {
    up <- coef + se*conf
    down <- coef - se*conf
    beta > down & beta < up
}

In [277]:
# simulate <- function(N, beta, params, sd) {
#     d <- generate_data(N, beta, params, sd)
#     m <- run_regression(d$y, d$x)
#     eval_model(m$coef, m$se, beta)
# }
simulate <- function(N, params) {
    d <- generate_data(params, N)
    m <- run_regression(params, d$y, d$x)
    eval_model(m$coef, m$se, params[[1]])
}
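`eval_model`, called at the end of `simulate`, uses `conf = 1.96` as its critical value. That is the 97.5% quantile of the standard normal, so each check is a nominal 95% confidence interval. For small samples, a t quantile with N - p degrees of freedom is larger. A sketch, assuming N = 20 and p = 3 as in the first test below:

```r
# The default conf = 1.96 is the 97.5% quantile of the standard normal
round(qnorm(0.975), 2)   # 1.96

# A t quantile with N - p degrees of freedom is larger for small samples
N <- 20
p <- 3
conf_t <- qt(0.975, df = N - p)
round(conf_t, 2)         # 2.11
```

Passing `conf_t` as the `conf` argument of `eval_model` would widen the intervals at N = 20. This is only one possible adjustment; the coverage actually reached also depends on how `calc_se` estimates the residual variance.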

In [278]:
# avg_simulations <- function(M, N, beta, params, sd) {
#     inside <- sapply(1:M, function (x) {
#         simulate(N, beta, params, sd)
#     })
#     sum(inside) / M / 3
# }
avg_simulations <- function(M, N, params) {
    inside <- sapply(1:M, function (x) {
        simulate(N, params)
    })
    mean(inside)  # share of TRUEs, without hard-coding the number of coefficients
}
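`replicate(M, expr)` is a shorthand for `sapply(1:M, function(x) expr)` when the loop index is not used, and it simplifies length-3 results into a 3 x M matrix. A sketch with a hypothetical `fake_simulate()` standing in for `simulate()`, so it runs without the rest of the notebook:

```r
# Hypothetical stand-in for simulate(): one TRUE/FALSE per coefficient,
# each TRUE with probability 0.9
fake_simulate <- function() runif(3) < 0.9

set.seed(123)
M <- 1000
inside <- replicate(M, fake_simulate())  # simplifies to a 3 x M logical matrix

dim(inside)   # 3 1000
mean(inside)  # fraction of TRUEs over all coefficients and replications

# Same number as the hard-coded version, without the "3"
stopifnot(isTRUE(all.equal(mean(inside), sum(inside) / M / 3)))
```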

In [279]:
a <- avg_simulations(1000, 20, params)
stopifnot(round(a, 1) == .9)

In [239]:
a <- avg_simulations(1000, 500, params)
stopifnot(a > .92)

In [None]:
# BONUS

# The calc_coef and calc_se functions, in their multivariate form, are
# naturally a generalization of the univariate ones, and work for univariate
# data too.

# But let's pretend they're not.

# Write them as polymorphic functions, such that when your simulation is called
# with univariate data, the univariate formula (from the slides) is used,
# and the multivariate formula (from your exercises) is used when
# multivariate parameters are given to the simulation.

# Note: be creative in your design! Add more classes if you think it helps!
# Think about which functions you want to be class-agnostic and which to be
# class-aware! 
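One possible design for the bonus, sketched under assumptions: each params object carries two classes, its specific class plus a shared parent class `regression_params` (a hypothetical name). Class-aware formulas then live in the child-class methods, and a class-agnostic fallback lives in the parent-class method. The generic is called `calc_se_sketch` (also hypothetical) so that running this cell does not overwrite the notebook's own `calc_se`.

```r
# Hypothetical helper: every params object gets its specific class plus a
# shared parent class, e.g. c("univariate_params", "regression_params")
make_params <- function(..., type) {
    structure(list(...), class = c(type, "regression_params"))
}

# Hypothetical generic, named so it does not overwrite the notebook's calc_se
calc_se_sketch <- function(params, y, x, coef) UseMethod("calc_se_sketch")

# Class-aware: scalar formula for a single regressor
calc_se_sketch.univariate_params <- function(params, y, x, coef) {
    eps <- y - x * coef
    sqrt(mean(eps^2) / (length(y) * var(x)))
}

# Class-aware: matrix formula for several regressors
calc_se_sketch.multivariate_params <- function(params, y, x, coef) {
    eps <- y - x %*% coef
    sqrt(mean(eps^2) / diag(length(y) * var(x)))
}

# Class-agnostic fallback, reached by any other subclass of regression_params
calc_se_sketch.regression_params <- function(params, y, x, coef) {
    stop("calc_se_sketch: no formula for class ", class(params)[1])
}

set.seed(7)
uni <- make_params(1, .5, type = "univariate_params")
x1 <- rnorm(50)
y1 <- x1 + rnorm(50, 0, .5)
se1 <- calc_se_sketch(uni, y1, x1, cov(x1, y1) / var(x1))    # one standard error

multi <- make_params(c(1, 2, 1), c(0, 0, 0), c(.2, .5, .3), .5,
                     type = "multivariate_params")
x3 <- matrix(rnorm(150), ncol = 3)
y3 <- drop(x3 %*% c(1, 2, 1)) + rnorm(50, 0, .5)
se3 <- calc_se_sketch(multi, y3, x3, coef(lm(y3 ~ x3 - 1)))  # three standard errors
```

Adding a new model type then means adding one child class and its methods; code that only needs to know "this is some regression" can dispatch on `regression_params`.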