## OLS Regression Modeling in R

As you know, regression modeling is quite powerful. In this lab, we will fit several OLS models and walk through how to build well-designed tables that present the results of multiple models side by side.

As always, let's start by installing necessary packages and loading them. 

In [None]:
# Install pacman once if it is not already installed
# install.packages("pacman")

# p_load() loads each package, installing any that are missing
pacman::p_load(tidyverse, tidymodels, modelsummary, stargazer)


Now, load the data from GitHub. Let's again use our `fake.csv` dataset.

In [None]:
fake <- read_csv("https://raw.githubusercontent.com/bowendc/510_labs/main/fake.csv")

Let's create our regression models. We haven't done much of this so far in the class, but you can adjust for a confounding variable by adding it as another term on the right-hand side of the formula inside `lm()`. Here, we store each set of model results under the names `m1`, `m2`, and `m3`.

In [None]:
m1 <- lm(y ~ x, data = fake)
m2 <- lm(y ~ x + z, data = fake)
m3 <- lm(y ~ x + z + w, data = fake)
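To see why adding a confounder as a term matters, here is a minimal sketch using simulated data. The data frame `sim`, the true slope of 0.5, and the other coefficients are assumptions made up for this illustration; they are not taken from `fake.csv`.

```r
# Simulated data (hypothetical; not the lab's fake.csv)
set.seed(510)
n <- 1000
z <- rnorm(n)                          # the confounder
x <- 0.8 * z + rnorm(n)                # x depends on z
y <- 1 + 0.5 * x + 2 * z + rnorm(n)    # true slope on x is 0.5
sim <- data.frame(y, x, z)

unadjusted <- lm(y ~ x, data = sim)      # omits the confounder
adjusted   <- lm(y ~ x + z, data = sim)  # adjusts for the confounder

coef(unadjusted)["x"]  # biased upward: x also picks up the effect of z
coef(adjusted)["x"]    # close to the true slope of 0.5
```

Because `z` drives both `x` and `y`, leaving it out pushes its effect into the slope on `x`. Adding `z` to the formula separates the two.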

The `tidymodels` package contains some useful functions for evaluating regression results (they come from `broom`, which `tidymodels` loads for you). Here we use `tidy()` to view the model estimates and hypothesis tests and `glance()` to view goodness-of-fit statistics. For comparison, we can see how `summary()` presents the output as well.

In [None]:
tidy(m1)
glance(m1)
summary(m1)
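One advantage of `tidy()` and `glance()` over `summary()` is that they return data frames, so you can subset or plot the results like any other data. Here is a minimal sketch, assuming the built-in `mtcars` data rather than `fake.csv`; the object names `fit`, `coefs`, and `fit_stats` are just for illustration.

```r
library(broom)  # attached for you by tidymodels; loaded directly here

fit <- lm(mpg ~ wt, data = mtcars)    # mtcars ships with R

coefs <- tidy(fit, conf.int = TRUE)   # one row per term, plus 95% confidence intervals
coefs[, c("term", "estimate", "conf.low", "conf.high")]

fit_stats <- glance(fit)              # one row per model
fit_stats[, c("r.squared", "adj.r.squared", "sigma", "nobs")]
```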

### Creating regression tables using `modelsummary` and `stargazer`

Regression results are typically presented, at least in academic circles, using regression tables. Typically, we display the results of multiple models in the same table. Let's walk through how to do this using `modelsummary`. 

In [None]:
models <- list(m1, m2, m3) # you could also pass list(m1, m2, m3) directly to modelsummary()
modelsummary(models)

This appears to work just fine, but we can keep tweaking the display. Let's place all of our statistics in a single cell:

In [None]:
# The curly brackets refer to model statistics: {estimate} is the coefficient,
#   {stars} marks p-value thresholds, and {std.error} is the coefficient's
#   standard error. By default, each term gets two rows: one for the estimate
#   and one for a statistic (the standard error unless you ask for t scores
#   or CIs). Setting statistic = NULL drops the second row.

modelsummary(models,
              estimate = "{estimate}{stars} ({std.error})", 
              statistic = NULL)

Great! Now let's rename the models. Let's also move the standard errors into their own cells, placed beside the coefficients rather than below them.

In [None]:
models <- list("First Model" = m1, "Second Model" = m2, "Third Model" = m3)
modelsummary(models,
              estimate = c("Coef." = "{estimate}{stars}"), # labels the estimate "Coef."
              statistic = c("S.E." = "({std.error})"),     # labels the statistic "S.E."
              shape = term ~ model + statistic)            # moves the statistic into its own
                                                           # column instead of its own row. Syntax:
                                                           # row item(s) ~ column item(s).
                                                           # Columns are grouped first by model
                                                           # and then by statistic.

Finally, we can rename our variables (just in the table), add a title, and export to a CSV file.

In [None]:
modelsummary(models,
              coef_rename = c("x" = "Some X",              # renames vars in table
                              "w" = "A W",
                              "z" = "This Z",
                              "(Intercept)" = "Constant"),
              estimate = c("Coef." = "{estimate}{stars}"), # renames columns
              statistic = c("S.E." = "({std.error})"),
              shape = term ~ model + statistic,
              title = "Model Results, OLS",                # adds title
              output = "table.csv")                        # saves to csv file
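Before writing a file, it can help to inspect the table as a plain data frame. Here is a minimal sketch, assuming the built-in `mtcars` data rather than `fake.csv`; the model labels and the object names `fits` and `tab` are made up for this illustration. Setting `output = "data.frame"` asks `modelsummary` to return the table rather than render it.

```r
library(modelsummary)

# Two illustrative models on mtcars (not the lab's m1, m2, m3)
fits <- list("Weight only" = lm(mpg ~ wt, data = mtcars),
             "Weight + HP" = lm(mpg ~ wt + hp, data = mtcars))

tab <- modelsummary(fits,
                    estimate = "{estimate}{stars} ({std.error})",
                    statistic = NULL,
                    output = "data.frame")  # return a data frame instead of a rendered table

head(tab)  # one column per model, one row per term or fit statistic
```

Once the data frame looks right, swapping `output` for a file name such as `"table.csv"` writes the same table to disk.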

If you don't like `modelsummary`, or it does not accept the model type that you need to present, consider using `stargazer`. Here is an example. 

In [None]:
stargazer(m1,m2,m3,
            type = "text",
            title = "Model Results, OLS",
            single.row = TRUE)            # places the SEs in the same row as the coefs
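`stargazer` can also write the table to a file through its `out` argument. Here is a minimal sketch, assuming the built-in `mtcars` data rather than `fake.csv`; the names `fit1`, `fit2`, and `out_file` and the temporary file location are just for illustration.

```r
library(stargazer)

# Two illustrative models on mtcars (not the lab's m1, m2, m3)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Write to a temporary location so the example does not clutter your project folder
out_file <- file.path(tempdir(), "stargazer_table.html")

stargazer(fit1, fit2,
          type = "html",                 # "text" and "latex" are the other options
          title = "Model Results, OLS",
          single.row = TRUE,             # SEs in the same row as the coefs
          out = out_file)                # also prints the table to the console
```

An HTML file like this can be opened in a browser and pasted into a word processor.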