## Python and R

This setup allows you to use *Python* and *R* in the same notebook.

To set up a similar notebook, see quickstart instructions here:

https://github.com/dmil/jupyter-quickstart

Some thoughts on why I like this setup and how I use it are at the [end](notebook.ipynb#Thoughts) of this notebook.

In [None]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore")  # Ignore all warnings, from Python and from R
# warnings.filterwarnings("ignore", category=RRuntimeWarning)  # Alternative: ignore only R runtime warnings, show the rest

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

# always show all columns
pd.set_option('display.max_columns', None)
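
The two `filterwarnings` calls in the setup cell differ in scope: one silences every warning, the commented-out one silences a single category. A minimal sketch of that difference follows. `FakeRWarning` is a hypothetical stand-in for `RRuntimeWarning`, defined here only so the example runs without rpy2, and `emit_both` and `count_shown` are illustrative helpers.

```python
import warnings


class FakeRWarning(RuntimeWarning):
    """Hypothetical stand-in for rpy2's RRuntimeWarning."""


def emit_both():
    # One warning of each kind, so we can see which filter lets what through
    warnings.warn("message from R", FakeRWarning)
    warnings.warn("message from Python", UserWarning)


def count_shown(category=None):
    """Count warnings that survive an 'ignore' filter (None = ignore everything)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        if category is None:
            warnings.filterwarnings("ignore")  # blanket ignore
        else:
            warnings.filterwarnings("ignore", category=category)
        emit_both()
    return len(caught)


shown_blanket = count_shown()                # blanket ignore
shown_selective = count_shown(FakeRWarning)  # ignore only the R-like category
print(shown_blanket, shown_selective)
```

The selective form is handy while debugging, because ordinary Python warnings (for example from pandas) stay visible.
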

This is a Python notebook, but the cell below is an R cell: the `%%R` magic on its first line tells Jupyter to run the rest of the cell as R code.

In [None]:
%%R

# My commonly used R packages
# library() stops with an error if the package is missing; require() only warns

library(tidyverse)

## Read Data

In [None]:
df = pd.read_csv('cleaned_and_merged_data.csv', dtype={'fips': str})
df

## Exploratory Data Visualization (single variable)

In [None]:
%%R -i df

# Define variables 
# 👇 modify these only and re-run the cell
#    to see different single-variable regressions
x_var <- "median_income"
y_var <- "republican_pct"

# Create formula dynamically
formula <- as.formula(paste(y_var, "~", x_var))

# Show the model
model <- lm(formula, data = df)
print(summary(model))

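# Show the ggplot
# (aes_string() is deprecated in ggplot2; the .data pronoun looks columns up by name)
ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point(alpha = 0.2) +
    geom_smooth() +
    labs(x = x_var, y = y_var) +
    theme_minimal()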

## 👉 Build a better model (do a multivariable linear regression here)

In [None]:
%%R 

# keep modifying this line 👇 and re-running the whole notebook...
model <- lm(republican_pct ~ median_income, data=df)
summary(model)
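
One way to see what a multivariable formula such as `republican_pct ~ median_income + pct_white` asks `lm` to do is to solve the same least-squares problem by hand. The sketch below uses NumPy on synthetic data with known coefficients. Picking `pct_white` as the second predictor is only an example, not a recommendation from this notebook, and the coefficient values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (not the real county data)
median_income = rng.uniform(30_000, 90_000, n)
pct_white = rng.uniform(20, 95, n)

# Synthetic outcome built from known, invented coefficients plus noise
true_beta = np.array([40.0, -0.0003, 0.4])  # intercept, income, pct_white
X = np.column_stack([np.ones(n), median_income, pct_white])
republican_pct = X @ true_beta + rng.normal(0, 2.0, n)

# Least-squares estimate of the coefficients
beta_hat, *_ = np.linalg.lstsq(X, republican_pct, rcond=None)

# R^2 = 1 - RSS / TSS
fitted = X @ beta_hat
ss_res = np.sum((republican_pct - fitted) ** 2)
ss_tot = np.sum((republican_pct - republican_pct.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

names = ["(Intercept)", "median_income", "pct_white"]
for name, value in zip(names, beta_hat):
    print(f"{name:>15}: {value: .6f}")
print(f"R^2 = {r_squared:.3f}")
```

Because the data were generated from `true_beta`, the estimates should land close to those values, which makes this a quick way to build intuition before adding predictors to the real model.
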

## Model Diagnostics

In [None]:
%%R -w 1000 -h 1000

par(mfrow = c(2, 2))  # 2x2 layout
plot(model)

- Video on model diagnostics: https://www.youtube.com/watch?v=jd7x-ww7da4
- Some notes on the Q-Q plot: https://www.youtube.com/watch?v=okjYjClSjOg
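
By default, `plot(model)` on an `lm` fit draws four panels: residuals vs. fitted, a normal Q-Q plot, scale-location, and residuals vs. leverage. A rough sketch of the quantities behind those panels, computed with NumPy on synthetic data, is below. The variable names are illustrative, and the plotting positions `(i - 0.5) / n` are one common convention rather than a claim about what R uses internally.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, n)  # synthetic, well-behaved data

X = np.column_stack([np.ones(n), x])
p = X.shape[1]  # number of coefficients, intercept included

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
resid = y - fitted

# Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'
hat_diag = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))

# Standardized residuals: e_i / (s * sqrt(1 - h_i)), with s^2 = RSS / (n - p)
s = np.sqrt(np.sum(resid ** 2) / (n - p))
std_resid = resid / (s * np.sqrt(1 - hat_diag))

# Theoretical normal quantiles for the Q-Q panel
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical_q = np.array([NormalDist().inv_cdf(pr) for pr in probs])

# (x, y) pairs for each of the four panels
resid_vs_fitted = (fitted, resid)
qq = (theoretical_q, np.sort(std_resid))
scale_location = (fitted, np.sqrt(np.abs(std_resid)))
resid_vs_leverage = (hat_diag, std_resid)
```

Each tuple can be passed to `plt.scatter(*pair)` to reproduce a panel; points with high leverage and large standardized residuals are the ones worth a closer look.
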


## Residual Analysis

In [None]:
%%R -o df

# Add model residuals and fitted values to the dataframe
df <- df %>%
    mutate(
        residuals = model$residuals,
        fitted = model$fitted.values,
        residuals_z = scale(residuals)  # z-score; scale() returns a one-column matrix
    ) %>%
    arrange(residuals) %>%
    ungroup() %>%
    # Ensure all columns are plain 1D vectors before export (scale() returns a matrix)
    mutate(across(everything(), as.vector))

df %>% head()
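
The `residuals_z` column is each residual minus the mean residual, divided by the sample standard deviation, which is what R's `scale()` computes by default. A small standard-library sketch of that calculation and of the ascending sort follows. The five residuals and the county labels are made up.

```python
from statistics import mean, stdev

# Made-up residuals (observed minus fitted) for five hypothetical counties
residuals = {"A": -12.0, "B": -3.0, "C": 0.0, "D": 5.0, "E": 10.0}

values = list(residuals.values())
center = mean(values)
spread = stdev(values)  # sample SD (divides by n - 1)

# z-score of each residual
residuals_z = {name: (r - center) / spread for name, r in residuals.items()}

# Sort ascending, like arrange(residuals)
ranked = sorted(residuals_z.items(), key=lambda item: item[1])
most_overpredicted = ranked[0]    # most negative: observed far below fitted
most_underpredicted = ranked[-1]  # most positive: observed far above fitted

for name, z in ranked:
    print(f"{name}: z = {z:+.2f}")
```

Rows at either end of the sorted table are the counties the model fits worst, which is why the cells that follow look at the head and the tail.
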

In [None]:
# Columns to show when inspecting residuals
cols = [
    'state', 'county', 'total_pop', 'year', 'median_income',
    'pct_black', 'pct_hispanic', 'pct_white', 'pct_native',
    'density_per_sqkm', 'region', 'democratic_pct', 'republican_pct',
    'fitted', 'residuals', 'residuals_z',
]

# Most negative residuals: the model predicted a higher republican_pct than observed
df[cols].head(10)

In [None]:
# Most positive residuals: the model predicted a lower republican_pct than observed
df[cols].tail(10)