Note
For both regression and classification tasks, using either random forests or
elastic-net, the variable to be predicted (y) is now named `endpoint`.
Check `environment.yml` for the required dependencies.
The data table should
- have samples as rows and features as columns
- contain a column named `endpoint` (either class labels or a continuous variable)
- contain a column named `sample_id`, a unique id for each sample/row
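
As a quick sanity check before running either script, the column requirements above can be tested from the command line. This is only a sketch: `check_table` is a made-up helper name, not something shipped with this folder, and it assumes a plain comma-separated file with a header row and no commas inside quoted fields.

```shell
# Hypothetical helper, not part of this repository.
# Assumes a simple CSV: comma-separated, header in the first row,
# no commas inside quoted fields.
check_table() {
  file="$1"
  # Split the header into one column name per line, dropping quotes and CRs.
  header=$(head -n 1 "$file" | tr -d '\r"' | tr ',' '\n')

  # Both required columns must be present.
  for name in endpoint sample_id; do
    if ! printf '%s\n' "$header" | grep -qx "$name"; then
      echo "missing column: $name" >&2
      return 1
    fi
  done

  # Find the position of sample_id, then look for repeated ids.
  col=$(printf '%s\n' "$header" | grep -nx 'sample_id' | head -n 1 | cut -d: -f1)
  dups=$(tail -n +2 "$file" | tr -d '\r' | cut -d, -f"$col" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicated sample_id values: $dups" >&2
    return 1
  fi
  echo "ok: $file"
}
```

After defining the function in the current shell, it could be called as `check_table demo_iris.csv`. It returns a non-zero status if a required column is missing or a `sample_id` value is repeated.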
- navigate to the directory of this folder: `/your/path/rf_classifier`
- run the main script with the data table filename (e.g. `demo_iris.csv`) as input: `Rscript main_rf.R demo_iris.csv`
- the trained model, intermediate files, and some output plots are stored under `outs/[filename]`
- configure: `NUM_TREE` (number of trees), `RF_METRIC` (metric used to select the best model), `PCT_PARTITION` (percentage used to split the data into train and test sets)
- run: `Rscript main_rf.R [filename].csv`
- configure: `FAMILY="binomial"` (for classification)
- important: `endpoint` must have numerical entries, e.g. `1`: "Survived", `0`: "Died"
- run: `Rscript main_enet.R [filename].csv`
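
If the class labels are stored as text (such as "Survived" and "Died" in the example above), they need to be recoded to numbers before the binomial model is run. One possible way to do this from the command line is sketched below. `recode_endpoint` is a hypothetical helper, not part of this folder, and it assumes a simple comma-separated file with unquoted values.

```shell
# Hypothetical helper, not part of this repository.
# Assumes a simple CSV with unquoted values and no embedded commas.
# Usage: recode_endpoint INPUT.csv POSITIVE_LABEL > OUTPUT.csv
recode_endpoint() {
  awk -F',' -v OFS=',' -v pos="$2" '
    # Drop a trailing carriage return so Windows line endings do not
    # end up inside the last field.
    { sub(/\r$/, "") }
    NR == 1 {
      # Locate the endpoint column from the header row.
      for (i = 1; i <= NF; i++) if ($i == "endpoint") col = i
      if (!col) { print "no endpoint column" > "/dev/stderr"; exit 1 }
      print
      next
    }
    # The positive label becomes 1; every other value (including blanks)
    # becomes 0.
    { $col = ($col == pos) ? 1 : 0; print }
  ' "$1"
}
```

For example, `recode_endpoint labels.csv Survived > labels_numeric.csv` would write a copy in which "Survived" is `1` and everything else is `0`; both file names here are placeholders. Because every non-matching value is mapped to `0`, it is worth checking for missing or misspelled labels first.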
- configure: `FAMILY="gaussian"` (for regression)
- `endpoint` is continuous, e.g. in `demo_mpg.csv`
- run: `Rscript main_enet.R [filename].csv`