# 5 Annotation

The Stata script below is broken into smaller code blocks, each with notes, so you can run them separately in a Jupyter Notebook with the Stata kernel.

### Block 1: Setup and Logging

```stata
// Close any existing log files
capture log close

// Open a new log file to record the output
log using jamascript.log, replace
```

```
--------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/apollo/Documents/Melody/pystata/jamascript.log
  log type:  text
 opened on:   5 Jul 2024, 14:12:25
```

### Block 2: Define Globals

```stata
// Set global macros for the repository URL and local directory path
global repo https://github.com/muzaale/forum/raw/main/
global dir ~/documents/melody/local
```

### Block 3: Load Data

```stata
// Load the dataset from the repository
use ${repo}esrdRisk_t02tT, clear
```

```
(Live Kidney Donors + NHANES III Nondonors, Unmatched time_tT + CMS)
```



### Block 4: Initial Data Exploration

```stata
// Cross-tabulate donor status against the outcome variable
tab donor rSMGJcEdF_d

// Generate a new variable 'entry' based on the 'rSMGJcEdF_t0' variable
g entry = rSMGJcEdF_t0
```

```
                     |           rSMGJcEdF_d
               donor |  Censored       ESRD       Died |     Total
---------------------+---------------------------------+----------
               Donor |    95,184         99        934 |    96,217
     HealthyNondonor |     8,570         17        777 |     9,364
NotSoHealthyNondonor |     4,694        154      3,228 |     8,076
---------------------+---------------------------------+----------
               Total |   108,448        270      4,939 |   113,657
```
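
The three groups differ greatly in size, so the raw counts are hard to compare directly. A minimal sketch, assuming the dataset from Block 3 is still in memory, that asks `tabulate` for row percentages instead of frequencies:

```stata
// Within each donor group, show the percentage censored, with ESRD, and died.
// These are crude proportions: they ignore differences in follow-up time.
tab donor rSMGJcEdF_d, row nofreq
```

From the counts above, deaths make up roughly 1.0% of donors (934 of 96,217), 8.3% of healthy nondonors (777 of 9,364), and 40.0% of the not-so-healthy nondonors (3,228 of 8,076).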





<iframe src="./jamascript-m.pdf" width="100%" height="600" style="border:none"></iframe>
<iframe src="./jamascript-e.pdf" width="100%" height="600" style="border:none"></iframe>




### Block 5: Data Cleaning and Adjustment
```stata
// Linkage for donors after 2011 is untrustworthy
replace rSMGJcEdF_d = 0 if rSMGJcEdF_tT > d(31dec2011)
replace rSMGJcEdF_tT = d(31dec2011) if rSMGJcEdF_tT > d(31dec2011)

// Linkage before 1994 is untrustworthy
replace entry = d(01jan1994) if entry < d(01jan1994) & rSMGJcEdF_tT > d(01jan1994)
```
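
Before moving on, it can help to confirm that the date adjustments did what was intended. The checks below are a sketch, assuming Block 5 has just run on the data in memory. Note that Stata treats a missing value as larger than any number, so the `replace` statements in Block 5 also reset any records with a missing `rSMGJcEdF_tT`.

```stata
// No follow-up should extend past the administrative censoring date
assert rSMGJcEdF_tT <= d(31dec2011)

// Entry should not precede 1994 for anyone still followed after 01jan1994
assert entry >= d(01jan1994) if rSMGJcEdF_tT > d(01jan1994)

// Count records whose follow-up ends on or before entry;
// these contribute no time at risk once the data are stset
count if rSMGJcEdF_tT <= entry & !missing(entry)
```

If either `assert` fails, Stata stops with an error and reports how many observations contradict the condition.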

### Block 6: Survival Analysis Setup
```stata
// Set up the survival-time data
stset rSMGJcEdF_tT, origin(rSMGJcEdF_t0) entry(entry) fail(rSMGJcEdF_d==2) scale(365.25)
```
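
After `stset`, a quick look at what Stata recorded is a useful check on the `origin()`, `entry()`, and `scale()` choices. A short sketch using two built-in commands for survival-time data (the grouping by `donor` is a choice made here for illustration, not part of the original script):

```stata
// Time at risk, incidence rate, and number of subjects in each donor group
stsum, by(donor)

// Entry and exit times, gaps, and failures in the stset records
stdescribe
```

Because of `scale(365.25)`, the times reported are in years since `rSMGJcEdF_t0`.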

### Block 7: Generate Kaplan-Meier Estimates
```stata
// Generate Kaplan-Meier survival estimates and save the results
sts list, fail by(donor) at(5 12 15) saving(km, replace)
```

### Block 8: Summarize Failure Rates
```stata
preserve
    use km, clear
    replace failure = failure * 100
    
    // Summarize failure rates for living donors at 5, 12, and 15 years
    sum failure if donor == 1 & time == 5
    local don5y: di %3.2f r(mean)
    
    sum failure if donor == 1 & time == 12
    local don12y: di %3.2f r(mean)
    
    sum failure if donor == 1 & time == 15
    local don15y: di %3.2f r(mean)

    // Summarize failure rates for healthy nondonors at 5, 12, and 15 years
    sum failure if donor == 2 & time == 5
    local hnd5y: di %3.2f r(mean)
    
    sum failure if donor == 2 & time == 12
    local hnd12y: di %3.2f r(mean)
    
    sum failure if donor == 2 & time == 15
    local hnd15y: di %3.2f r(mean)

    // Summarize failure rates for the general population at 5, 12, and 15 years
    sum failure if donor == 3 & time == 5
    local gpop5y: di %3.2f r(mean)
    
    sum failure if donor == 3 & time == 12
    local gpop12y: di %3.2f r(mean)
    
    sum failure if donor == 3 & time == 15
    local gpop15y: di %3.2f r(mean)
restore
```
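
The nine `sum`/`local` pairs above follow a single pattern, so they can also be written as a nested loop. This is an alternative sketch, not the original author's code. It assumes `km.dta` contains the variables `donor`, `time`, and `failure`, as Block 8 already does, and it builds the same macro names (`don5y`, `hnd12y`, `gpop15y`, and so on).

```stata
preserve
    use km, clear
    replace failure = failure * 100

    // Macro-name prefixes, in the order of the donor codes 1, 2, 3
    local groups don hnd gpop

    forvalues g = 1/3 {
        // Pick the prefix that belongs to donor code `g'
        local name : word `g' of `groups'
        foreach t in 5 12 15 {
            quietly summarize failure if donor == `g' & time == `t'
            // Define, for example, local don5y with two decimal places
            local `name'`t'y : display %3.2f r(mean)
        }
    }
restore
```

The loop makes it easier to add a time point or a group later, at the cost of being slightly less explicit than the original.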

### Block 9: Kaplan-Meier Survival Plot
```stata
// Create a Kaplan-Meier survival plot with risk tables
sts graph, by(donor) fail per(100) ///
    xlab(0(3)15) ylab(0(10)40, format(%2.0f)) tmax(15) ///
    risktable(, color(stc1) group(1) order(3 " " 2 " " 1 " ") ti("#")) ///
    risktable(, color(stc2) group(2)) ///
    risktable(, color(stc3) group(3)) ///
    legend(on ring(0) pos(11) order(3 2 1) ///
        lab(3 "General population") lab(2 "Healthy nondonor") lab(1 "Living donor")) ///
    ti("Morte") ///
    text(`don5y' 5 "`don5y'%", col(stc1)) ///
    text(`don12y' 12 "`don12y'%", col(stc1)) ///
    text(`don15y' 15 "`don15y'%", col(stc1)) ///
    text(`hnd5y' 5 "`hnd5y'%", col(stc2)) ///
    text(`hnd12y' 12 "`hnd12y'%", col(stc2)) ///
    text(`hnd15y' 15 "`hnd15y'%", col(stc2)) ///
    text(`gpop5y' 5 "`gpop5y'%", col(stc3)) ///
    text(`gpop12y' 12 "`gpop12y'%", col(stc3)) ///
    text(`gpop15y' 15 "`gpop15y'%", col(stc3))
```

### Block 10: Export Graph
```stata
// Export the survival plot as a PNG file
graph export ${dir}/jamascript.png, replace
```

### Block 11: Save Processed Data
```stata
// Keep relevant variables for further analysis
keep _* entry age_t0 female race donor
rename age_t0 age

// Save the processed dataset
save ${dir}/jamascript.dta, replace
```

### Block 12: Cox Proportional Hazards Model
```stata
// Fit a Cox proportional hazards model
noi stcox i.donor, basesurv(s0)

// List the baseline survival estimates
noi list s0 _t donor in 1/10
```

### Block 13: Save Baseline Survival and Coefficients
```stata
// Define matrix 'b' with model coefficients
matrix define b = e(b)

// Keep baseline survival estimates
keep s0 _t 

// Sort and list baseline survival estimates
sort _t s0
list in 1/10

// Save baseline survival estimates
save ${dir}/s0.dta, replace
export delimited using ${dir}/s0.csv, replace

// Save model coefficients
matrix beta = e(b)
svmat beta
keep beta*
drop if missing(beta1)
list 
save ${dir}/b.dta, replace
export delimited using ${dir}/b.csv, replace
```
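
The two saved files are enough to rebuild group-specific survival curves, because under a Cox model the survival function for covariate pattern x is S(t | x) = S0(t)^exp(xb). The sketch below rests on assumptions that are not confirmed by the source: that `donor` is coded 1, 2, 3 with 1 as the base level, so that `beta1` is the base-level coefficient (zero) and `beta2` and `beta3` are the log hazard ratios for groups 2 and 3. The names `b_hnd`, `b_gpop`, `s_hnd`, and `s_gpop` are introduced here for illustration only.

```stata
// Read the saved coefficients into scalars, which survive loading another dataset
use ${dir}/b.dta, clear
scalar b_hnd  = beta2[1]    // assumed: log hazard ratio, donor == 2 vs. base
scalar b_gpop = beta3[1]    // assumed: log hazard ratio, donor == 3 vs. base

// Load the baseline survival (base group, all covariates at zero)
use ${dir}/s0.dta, clear

// Apply S(t | x) = S0(t)^exp(xb) for the two non-base groups
generate s_hnd  = s0^exp(scalar(b_hnd))
generate s_gpop = s0^exp(scalar(b_gpop))

list _t s0 s_hnd s_gpop in 1/10
```

Under these assumptions, `s0` itself is the predicted survival for the base group, and `100 * (1 - s_hnd)` is on the same percentage scale as the Kaplan-Meier failure estimates in Block 8.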

### Block 14: Close Log File
```stata
// Close the log file
log close
```

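You can run these blocks sequentially in your Jupyter Notebook with the Stata kernel. Each block performs a distinct part of the analysis, which makes the workflow easier to debug and understand. Blocks 8 and 9 communicate through local macros, which Stata scopes to the do-file or program that defines them, so if your kernel does not carry locals across cells, run those two blocks in a single cell.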