# Recreating Section 7.3.1 Analysis on SOF data

**Background:** The textbook analyzes whether change in BMD is related to age at menopause using a non-multilevel model (which the authors say is not the correct way to analyze these data). I could not work out what the predictors in that model were, so I recreate the analysis here.

**Purpose:** To recreate the analysis performed in section 7.3.1 to better understand what each predictor means.

**Methods:**
>1. Introduction
>2. Inits
>3. Load SOF data
>4. Read textbook's Stata code
>5. Recreate Stata code in R
>6. Cross-check my results with the textbook's

**Conclusions:**
* The fit results didn't match perfectly but they were quite close to what was shown in the textbook

# Inits

## Imports

In [None]:
# splines provides bs() for building the spline basis
library(splines)
# ggplot2 is used for the prediction plot
# (ISLR is not needed: the data are read from ./data/sof.csv below)
library(ggplot2)

## Definitions

## Funcs

# Load SOF data

In [None]:
sof_df = read.csv('./data/sof.csv')
sof_df$id = factor(sof_df$id)

In [None]:
str(sof_df)

# Read textbook's Stata code

use sof.dta, clear

gen meno_ov_52=meno_age>52

replace meno_ov_52=. if meno_age==.

mkspline visit_spl=visit, cubic nknots(3)

save sof2.dta, replace

regress totbmd i.meno_ov visit_spl* i.meno_ov#c.visit_spl*

predict pred_spl


# Recreate Stata code in R

First, I need to create the `meno_age_gt_52` variable and deal with its missing (`NA`) values. Then I fit a spline model using `meno_age_gt_52` and `visit`, and finally print the summary.

## Create `meno_age_gt_52` variable

In [None]:
sof_df$meno_age_gt_52 = sof_df$meno_age > 52

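## Remove `meno_age_gt_52` NA values

No extra step is needed in R. The comparison `sof_df$meno_age > 52` evaluates to `NA` wherever `meno_age` is `NA`, whereas Stata treats a missing value as larger than any number (hence the `replace` line in the Stata code). `lm()` then drops rows containing `NA` by default.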
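
A quick check of this behaviour, using a few made-up ages rather than the SOF data (the name `meno_age_demo` is hypothetical):

```r
# Made-up menopause ages (not taken from sof.csv), including one missing value
meno_age_demo <- c(48, 55, NA, 52)

# The comparison propagates NA instead of treating it as a very large number
gt_52_demo <- meno_age_demo > 52
print(gt_52_demo)
```

The third element stays `NA`, so it never ends up in the "over 52" group by accident.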

## Fit spline model using `meno_age_gt_52` as well as `visit`

In [None]:
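# Caveat: knots=3 with degree=1 is a linear spline with one interior knot at visit == 3, not Stata's 3-knot restricted cubic spline
fit = lm(totbmd ~ bs(visit, knots=3, degree=1)*meno_age_gt_52, data = sof_df)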
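
For comparison, here is a sketch of what I think is a closer R analogue of `mkspline visit_spl=visit, cubic nknots(3)`. It uses a natural cubic spline (`ns()`) with knots at the 10th, 50th and 90th percentiles of `visit`, which are the locations Stata documents for `nknots(3)`. The data below are synthetic stand-ins and the names `demo_df`, `k` and `fit_ns` are my own. Stata builds its spline variables from a different basis, so I would expect the individual coefficients to differ even where the fitted curves agree, and the percentile calculation may not be identical between the two programs.

```r
library(splines)

# Synthetic stand-in for the SOF data (made-up values, for illustration only)
set.seed(1)
demo_df <- data.frame(
    visit = rep(c(2, 4, 5, 6, 8), times = 40),
    meno_age_gt_52 = rep(c(TRUE, FALSE), each = 100)
)
demo_df$totbmd <- 0.75 - 0.005 * demo_df$visit + rnorm(nrow(demo_df), sd = 0.05)

# Knot locations at the 10th, 50th and 90th percentiles of visit
k <- quantile(demo_df$visit, probs = c(0.10, 0.50, 0.90), na.rm = TRUE)

# Natural cubic spline: interior knot at the median, boundary knots at the outer percentiles
fit_ns <- lm(
    totbmd ~ ns(visit, knots = k[2], Boundary.knots = k[c(1, 3)]) * meno_age_gt_52,
    data = demo_df
)

# Intercept + 2 spline terms + group indicator + 2 interaction terms
print(length(coef(fit_ns)))
```

Like the Stata model, this has two spline terms plus their interactions with the group indicator, so the coefficient table has the same shape as the textbook's.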

## Print out summary

In [None]:
summary(fit)

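**These results don't match perfectly but they are quite close to what was shown in the textbook.** A likely reason for the difference is the spline basis: the R call uses a linear spline with a single knot, while the Stata code uses a restricted cubic spline with three knots.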

# Cross-check my results with the textbook's

## Plot predictions

First, I need to generate a prediction for each value of `visit` to plot.

In [None]:
pred_results = cbind(sof_df)
pred_results$totbmd.pred = predict(fit, newdata = pred_results)
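
An alternative I considered is to predict on a small grid with one row per combination of `visit` and group, rather than one row per observation. A minimal sketch on synthetic data (`toy_df`, `toy_fit` and `pred_grid` are hypothetical names, and the values are made up):

```r
library(splines)

# Synthetic stand-in for sof_df (made-up values, for illustration only)
set.seed(2)
toy_df <- data.frame(
    visit = rep(1:5, times = 20),
    meno_age_gt_52 = rep(c(TRUE, FALSE), each = 50)
)
toy_df$totbmd <- 0.75 - 0.005 * toy_df$visit + rnorm(nrow(toy_df), sd = 0.05)

# Same model form as the notebook's fit
toy_fit <- lm(totbmd ~ bs(visit, knots = 3, degree = 1) * meno_age_gt_52, data = toy_df)

# One row per combination of visit and group
pred_grid <- expand.grid(
    visit = sort(unique(toy_df$visit)),
    meno_age_gt_52 = c(FALSE, TRUE)
)
pred_grid$totbmd.pred <- predict(toy_fit, newdata = pred_grid)
print(pred_grid)
```

The grid avoids plotting many overlapping points and contains no rows with missing predictors.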

Plot the predicted `totbmd` against `visit`, split by `meno_age_gt_52`.

In [None]:
options(repr.plot.width=7, repr.plot.height=3)
(
    ggplot(pred_results, aes(x=visit, y=totbmd.pred, color=meno_age_gt_52)) 
    + geom_line() 
    + geom_point()
    + ggtitle('Predicted BMD vs. visits, split by meno_age > 52')
)