
ACTG320 exploratory analysis #3

Open
bcjaeger opened this issue Jun 15, 2022 · 2 comments

bcjaeger (Owner) commented Jun 15, 2022

aorsf works very well on ACTG320 mortality prediction, and it would be great to figure out why.

Plan:

  • Fit several models
  • Use different node sizes
  • Death outcome
  • Use the out-of-bag error estimate from aorsf to compare the models' prediction accuracy (see the sketch after this list)
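
A minimal sketch of that comparison (not the final analysis): `actg` stands in for the actg320 analysis data, `days` and `death` are placeholder endpoint columns, and reading the OOB statistic from `fit$eval_oobag` is my understanding of where orsf() stores its out-of-bag evaluation, so worth checking against the aorsf docs.

```r
library(aorsf)
library(survival)

# candidate node sizes (values are illustrative)
leaf_min_events_grid <- c(1, 5, 10, 20)

oob_cstat <- sapply(leaf_min_events_grid, function(lme) {
  fit <- orsf(
    data = actg,                      # placeholder for the actg320 analysis data
    formula = Surv(days, death) ~ .,  # in practice, exclude the AIDS endpoint columns from the predictors
    n_tree = 500,
    leaf_min_events = lme,
    oobag_pred_horizon = 350          # assumed prediction horizon
  )
  # out-of-bag evaluation statistic (Harrell's C-statistic by default)
  as.numeric(fit$eval_oobag$stat_values)
})

data.frame(leaf_min_events = leaf_min_events_grid, oob_cstat = oob_cstat)
```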

Here is a little synopsis of what I would like to check with the actg320 data:

The data have two main endpoints, death and AIDS diagnosis. For both endpoints, I want to see how well aorsf performs with a number of different hyperparameter values. In other words, I am guessing that the performance of aorsf on this dataset is going to depend on how well we tune it. The main tuning parameters for aorsf are below (copied from ?aorsf::orsf). I think we could set up a simple experiment where we make a dataset with one column for each tuning parameter, with each row holding a specific set of inputs for orsf(), and then assess the performance of each set of inputs using cross-validation, probably with just 3 folds because the event count is low. This would be a great exercise and should also provide some useful info for us, e.g., we may change the default values of orsf() for datasets with smaller event counts.
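
A rough sketch of that grid-plus-CV setup, using the same placeholder names (`actg`, `days`, `death`) as above; the grid values, the 350-day horizon, and the use of survival::concordance() to score each held-out fold are illustrative choices, not a final design.

```r
library(aorsf)
library(survival)

set.seed(320)

# one row per combination of tuning-parameter inputs for orsf()
grid <- expand.grid(
  leaf_min_events = c(1, 5, 10),
  split_min_obs   = c(10, 20, 40),
  mtry            = c(2, 4, 6)
)

# 3 folds because the event count is low
n_folds <- 3
folds <- sample(rep(seq_len(n_folds), length.out = nrow(actg)))

cv_cstat <- function(params) {
  fold_stat <- numeric(n_folds)
  for (k in seq_len(n_folds)) {
    train <- actg[folds != k, ]
    test  <- actg[folds == k, ]
    fit <- orsf(
      data = train,
      formula = Surv(days, death) ~ .,  # exclude the other endpoint's columns in practice
      leaf_min_events = params$leaf_min_events,
      split_min_obs   = params$split_min_obs,
      mtry            = params$mtry
    )
    # predicted risk at a 350-day horizon
    test$risk <- as.numeric(
      predict(fit, new_data = test, pred_horizon = 350, pred_type = "risk")
    )
    # higher predicted risk should pair with earlier events, hence reverse = TRUE
    fold_stat[k] <- concordance(
      Surv(days, death) ~ risk, data = test, reverse = TRUE
    )$concordance
  }
  mean(fold_stat)
}

grid$cstat <- vapply(seq_len(nrow(grid)), function(i) cv_cstat(grid[i, ]), numeric(1))
grid[order(-grid$cstat), ]
```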

@kristinlenoir, would you like to help me with this?

bcjaeger self-assigned this Jun 15, 2022
kristinlenoir (Contributor) commented

I would love to help with this!

bcjaeger added the question label Jun 15, 2022
kristinlenoir (Contributor) commented

First analysis: observe how performance varies across leaf_min_events.
I used 3-fold cross-validation with 10 repeats and a prediction horizon of 350 days.

This is the c-statistic for the AIDS endpoint; this trend has been consistent.
[figure: c-statistic across leaf_min_events, AIDS endpoint]

Death endpoint (fewer events): the performance was rather inconsistent before I set a seed. A line is fit, but there is really little trend (leaf_min_events = 4 may be an outlier). The c-statistic, however, is not bad across the board.
[figure: c-statistic across leaf_min_events, death endpoint]

Future plans:

  • Amend the code to make it more efficient, with a better output format
  • Add the Brier score (see the sketch after this list)
  • Vary split_min_obs
  • Perhaps try Monte Carlo cross-validation in addition to k-fold CV (increase the number of repeats?)
  • Maybe try varying mtry down the line (probably not n_retry)
  • Add a curved (smoothed) line to the performance graphs, in anticipation of a point of optimal performance
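
For the Brier score item above, one option is riskRegression::Score(), which can take a matrix of predicted risks at the evaluation times; `train`, `test`, `days`, and `death` below are the same placeholders as in the earlier sketches, and the shape of the returned tables is worth checking against the riskRegression documentation.

```r
library(aorsf)
library(riskRegression)
library(survival)

# fit on the training fold (tuning parameters omitted here for brevity)
fit <- orsf(data = train, formula = Surv(days, death) ~ .)

# predicted risk at the 350-day horizon, one column per evaluation time
risk_350 <- predict(fit, new_data = test, pred_horizon = 350, pred_type = "risk")

scores <- Score(
  object  = list(aorsf = risk_350),
  formula = Surv(days, death) ~ 1,  # censoring model used for IPCW weights
  data    = test,
  times   = 350,
  metrics = c("auc", "brier")
)

scores$Brier$score  # Brier score (with a null-model reference)
scores$AUC$score    # time-dependent AUC at 350 days
```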
