# Ordinary least squares (OLS) regressions and stargazer tables using R

**Note:** This computational notebook replicates and extends some of the findings of the following paper: Ingram, M. C., & da Costa, M. M. (2019). [Political geography of violence: Municipal politics and homicide in Brazil](https://doi.org/10.1016/j.worlddev.2019.06.016). World Development, 124, 104592. Replication materials are available at https://doi.org/10.7910/DVN/NX5QIU

## Install and load libraries

In [2]:
install.packages(c("sf", "stargazer"))
# Note: Installing sf may take about 10 minutes

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘proxy’, ‘e1071’, ‘wk’, ‘classInt’, ‘s2’, ‘units’
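
Because the installation is slow, it can help to install only the packages that are missing. The following is a minimal sketch, not part of the original notebook: `missing_packages()` is a hypothetical helper name, and it relies only on base R's `requireNamespace()`.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages this notebook installs explicitly
to_install <- missing_packages(c("sf", "stargazer"))
print(to_install)

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the cell has no side effects until the list has been checked.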




In [3]:
# Load libraries
library(tidyverse) # Data manipulation (dplyr, readr, etc.)
library(sf)        # Reading and handling spatial vector data
library(stargazer) # Regression and summary tables

# Optional: suppress all warnings for the rest of the session.
# Note that this also hides warnings that may point to real problems.
options(warn = -1)

Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE


Please cite as: 


 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
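
Setting `options(warn = -1)` silences every warning for the rest of the session. As a hedged alternative (an assumption about what may be wanted here, not the notebook's approach), warnings can be silenced for a single expression, or the global setting can be restored once it is no longer needed:

```r
# Silence warnings for a single expression only
x <- suppressWarnings(as.numeric(c("1.5", "n/a")))  # "n/a" becomes NA without a warning
print(x)

# Or change the global option temporarily and restore the previous value
old_opts <- options(warn = -1)  # options() returns the previous settings
# ... code whose warnings should be hidden goes here ...
options(old_opts)               # back to the earlier warning level
```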




## List of variables

| Variable | Column name | Description |
|----------|-------------|-------------|
| HR Change | DfHRElc | Difference between the two-year average of homicide rates in 2011-2012 and the two-year average in 2007-2008 |
| Margin of Victory | margin | Margin of victory in the mayoral election, calculated as the difference between the percentage of votes obtained by the winner and the percentage obtained by the second-place candidate |
| Abstention | Abstntn | Percent of voters who abstained from three consecutive elections as of 2010 |
| Alignment | stalign | Party alignment of mayor and governor (1 if the party of the mayor elected in 2008 was the same as the party of the governor; 0 otherwise) |
| PT | PT | Mayors from PT (Workers' Party) |
| PSDB | PSDB | Mayors from PSDB (Brazilian Social Democracy Party) |
| PMDB | PMDB | Mayors from PMDB (Brazilian Democratic Movement Party) |
| PopDensity | lppdnst | Total population divided by the territorial area of the municipality (logged) |
| YoungMalePct | lpctppy | Percent of the population consisting of males ages 15-29 (logged) |
| GINI | GINI | GINI index (continuous variable; 0 for perfect income equality; 1 for totally concentrated income) |
| HDI | IDHM | Municipal Human Development Index (continuous variable; 0 low development; 1 high development) |
| SingleMotherHH | HHsinpr | Percent of households headed by mothers with no education and a child below 15 years old |
| Employment | Ocp18ml | Percent of residents age 18 or over who are employed (i.e., adult employment rate) |
| BolsaFamilia | CoverBF | Percent of poor families eligible for Bolsa Familia who are actually covered by it (i.e., coverage rate) |


## Load data

In [4]:
# Read the GeoJSON file directly from its URL with sf::st_read().
# The result is a simple features (sf) data frame: a regular data frame plus a geometry column.
gdf <- st_read("https://gist.github.com/cmg777/fe858a48ff7191b9c2a3aff7d6ddfd6f/raw/858272ce03d1b818429df3e7cead9e03ef7ec3b9/homicidesBRA_WD_20190318.geojson")

# Uncomment to inspect the first rows and the structure of the data
# head(gdf)
# str(gdf)

Reading layer `homicidesBRA_WD_20190318' from data source 
  `https://gist.github.com/cmg777/fe858a48ff7191b9c2a3aff7d6ddfd6f/raw/858272ce03d1b818429df3e7cead9e03ef7ec3b9/homicidesBRA_WD_20190318.geojson' 
  using driver `GeoJSON'
Simple feature collection with 5562 features and 48 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -73.99094 ymin: -33.75158 xmax: -34.79333 ymax: 5.272156
Geodetic CRS:  WGS 84
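
Before fitting any model, it can be worth confirming that the columns named in the variable list are present in the loaded data. A minimal sketch follows; `missing_columns()` and the toy data frame are hypothetical and only illustrate the check.

```r
# Columns that the regressions in this notebook rely on
model_vars <- c("DfHRElc", "margin", "stalign", "Abstntn", "PMDB", "PSDB", "PT",
                "lppdnst", "lpctppy", "GINI", "IDHM", "HHsinpr", "Ocp18ml", "CoverBF")

# Hypothetical helper: expected columns that are absent from a data frame
missing_columns <- function(df, cols) setdiff(cols, names(df))

# Toy example with made-up values; GINI is deliberately left out
toy <- data.frame(DfHRElc = c(1.2, -0.4), margin = c(0.10, 0.35))
print(missing_columns(toy, c("DfHRElc", "margin", "GINI")))
```

With the data loaded above, `missing_columns(gdf, model_vars)` should return an empty character vector if every variable is present, and `colSums(is.na(sf::st_drop_geometry(gdf)[model_vars]))` counts the missing values per variable.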


## OLS regressions

In [5]:
# Model 1 includes all three party indicators (PMDB, PSDB, PT);
# models 2-4 include one party indicator at a time. All other terms are identical.
mod1 <- lm(DfHRElc ~ margin + stalign + Abstntn + PMDB + PSDB + PT + lppdnst + lpctppy + GINI + IDHM + HHsinpr + Ocp18ml + CoverBF, data = gdf)
mod2 <- lm(DfHRElc ~ margin + stalign + Abstntn + PMDB             + lppdnst + lpctppy + GINI + IDHM + HHsinpr + Ocp18ml + CoverBF, data = gdf)
mod3 <- lm(DfHRElc ~ margin + stalign + Abstntn        + PSDB      + lppdnst + lpctppy + GINI + IDHM + HHsinpr + Ocp18ml + CoverBF, data = gdf)
mod4 <- lm(DfHRElc ~ margin + stalign + Abstntn               + PT + lppdnst + lpctppy + GINI + IDHM + HHsinpr + Ocp18ml + CoverBF, data = gdf)
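
The four specifications share the same controls and differ only in which party indicators enter. As an alternative sketch (not the notebook's own code), the formulas can be generated with base R's `reformulate()`; the object names `base_terms`, `controls`, `party_sets`, and `formulas` are illustrative.

```r
# Terms shared by every model, split so the party indicators keep their position
base_terms <- c("margin", "stalign", "Abstntn")
controls   <- c("lppdnst", "lpctppy", "GINI", "IDHM", "HHsinpr", "Ocp18ml", "CoverBF")

# One element per model: all three parties, then each party on its own
party_sets <- list(c("PMDB", "PSDB", "PT"), "PMDB", "PSDB", "PT")

# Build one formula per party set with the same response variable
formulas <- lapply(party_sets, function(parties) {
  reformulate(c(base_terms, parties, controls), response = "DfHRElc")
})

print(formulas[[2]])
```

Assuming `gdf` is loaded as above, `models <- lapply(formulas, function(f) lm(f, data = gdf))` would fit all four models. The term order matches the handwritten formulas, so position-based coefficient labels remain valid.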

In [None]:
# You can view model summaries individually
# summary(mod1)
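
Beyond printing `summary()`, the coefficient table can be extracted as a matrix for further use. The sketch below uses R's built-in `mtcars` data purely as a stand-in (an assumption made so the example runs without the homicide data); the same calls should apply to `mod1`.

```r
# Stand-in model on a built-in data set (not the notebook's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
print(round(coef_table, 3))

# 95% confidence intervals for each coefficient
print(confint(fit, level = 0.95))

# The t value is the estimate divided by its standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
print(all.equal(t_manual, coef_table[, "t value"]))
```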

In [15]:
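# Print the four models side by side as a plain-text table.
# covariate.labels are applied by position, so they must follow the order
# in which the terms appear in the model formulas (the intercept comes last).
stargazer(mod1, mod2, mod3, mod4,
          type = "text",
          title = "Regression Results",
          align = TRUE,
          covariate.labels = c("Margin of Victory", "State Alignment", "Abstention", "PMDB Mayor", "PSDB Mayor", "PT Mayor",
                               "Pop Density (log)", "Young Male Pct (log)", "GINI Index", "HDI", "Single Mother HH",
                               "Employment Rate", "Bolsa Familia Coverage", "Intercept"),
          no.space = TRUE # Reduces vertical space in the table
         )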


Regression Results
                                                               Dependent variable:                                         
                       ----------------------------------------------------------------------------------------------------
                                                                     DfHRElc                                               
                                 (1)                       (2)                      (3)                      (4)           
---------------------------------------------------------------------------------------------------------------------------
Margin of Victory               -0.730                   -0.702                    -0.707                   -0.744         
                               (0.807)                   (0.807)                  (0.807)                  (0.808)         
State Alignment                 -0.765                   -0.812*                   -0.509                   -0.6
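
The standard errors in parentheses and the p-values are two views of the same test. As a rough check under a large-sample normal approximation (an assumption; the exact residual degrees of freedom are not shown in the output above), the Margin of Victory estimate in model 1 can be converted by hand:

```r
# Values copied from column (1) of the text table above
estimate  <- -0.730
std_error <- 0.807

# Test statistic for the null hypothesis that the coefficient is zero
t_stat <- estimate / std_error

# Two-sided p-value, using the standard normal as a large-sample approximation
p_value <- 2 * pnorm(-abs(t_stat))
print(round(c(t = t_stat, p = p_value), 3))
```

The result should land close to the p = 0.366 that the LaTeX version of the table reports for the same coefficient.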

In [7]:
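# Generate and print the regression table in LaTeX format
# type = "latex" generates LaTeX code
# header = FALSE omits the comment header that stargazer prints above the table
# align = TRUE aligns numbers at the decimal point; the LaTeX document must load the dcolumn package
# report = "vc*p" reports p-values below the coefficients, as in the output shown here
# (this argument is inferred from that output; without it stargazer prints standard errors)
stargazer(mod1, mod2, mod3, mod4,
          type = "latex",
          header = FALSE,
          title = "Regression Results",
          align = TRUE,
          dep.var.labels = c("Change in Homicide Rate (DfHRElc)"),
          report = "vc*p",
          no.space = TRUE
         )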


\begin{table}[!htbp] \centering 
  \caption{Regression Results} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lD{.}{.}{-3} D{.}{.}{-3} D{.}{.}{-3} D{.}{.}{-3} } 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{4}{c}{\textit{Dependent variable:}} \\ 
\cline{2-5} 
\\[-1.8ex] & \multicolumn{4}{c}{Change in Homicide Rate (DfHRElc)} \\ 
\\[-1.8ex] & \multicolumn{1}{c}{(1)} & \multicolumn{1}{c}{(2)} & \multicolumn{1}{c}{(3)} & \multicolumn{1}{c}{(4)}\\ 
\hline \\[-1.8ex] 
 margin & -0.730 & -0.702 & -0.707 & -0.744 \\ 
  & p = 0.366 & p = 0.385 & p = 0.382 & p = 0.357 \\ 
  stalign & -0.765 & -0.812^{*} & -0.509 & -0.613 \\ 
  & p = 0.135 & p = 0.085 & p = 0.304 & p = 0.189 \\ 
  Abstntn & 1.078^{***} & 1.077^{***} & 1.051^{***} & 1.052^{***} \\ 
  & p = 0.005 & p = 0.005 & p = 0.006 & p = 0.006 \\ 
  PMDB & 1.241^{**} & 1.360^{***} &  &  \\ 
  & p = 0.015 & p = 0.005 &  &  \\ 
  PSDB & -0.082 &  & -0.406 &  \\ 
  & p = 0.896 &  & p = 0.490 &  \\ 
  PT & -0.751 &  &  & -1.042 \\ 
  