Latest commit e24bd6b Dec 15, 2016 @deguhath deguhath committed on GitHub Added a commnent in conventions section
Added comment: The utility reads in a data-frame into memory and starts the modeling process. The training data-frame will have to be saved as "trainDF" prior to using it with this utility.

Readme.md

Product Information

The TDSP Automated Modeling and Reporting (AMR) tool creates an automated workflow for generating and comparing multiple modeling approaches on a data-set.

Currently available in R, it uses the caret package to run multiple models on the data with a set of input parameters, which users specify in a YAML file. The accuracies of the models are then output so that users can compare them and evaluate which modeling approach may be best for building a final model for their predictive problem. The importance of the variables in each model is also output, so that users can examine which variables matter most for model accuracy.

Prerequisites

You must have the following installed on your machine:

• R 3.2.3 or newer. The Data Science VM on Azure has R 3.2.3 (Linux) or R 3.2.5 (Windows) installed by default.
• RStudio
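
As a quick check of the R version requirement above, a minimal sketch along these lines can be run in the R console. The variable names `min_version` and `meets_minimum` are illustrative and are not part of the AMR tool.

```r
# Minimum R version stated in the prerequisites above.
min_version <- "3.2.3"

# getRversion() returns a version object that can be compared
# directly against a version string.
meets_minimum <- getRversion() >= min_version

if (meets_minimum) {
  message("R ", as.character(getRversion()),
          " satisfies the minimum of ", min_version)
} else {
  warning("R ", as.character(getRversion()),
          " is older than ", min_version, "; please upgrade.")
}
```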

How to run the AMR tool

Details of how to run the AMR tool are provided here. Briefly, first specify your model parameters and the path to your data file in a YAML file. Then navigate to the directory that contains the markdown file you want to run (currently there is one for binary classification and another for regression), and start the run with one of the following two commands:

o Regression: rmarkdown::render("RegressionModelSelection.rmd")
o Binary classification: rmarkdown::render("BinaryModelSelection.rmd")

This will prompt you to provide the location of the YAML file, after which the run will progress to completion. The time the run takes depends on factors such as the size of the data-set, the number of cross-validation folds, and the number of parameters to sweep over.
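
The two commands above could be wrapped in a small helper so that the task type selects the report file. This is only a sketch: `amr_report_file`, `run_amr`, and the `amr_dir` argument are hypothetical names that do not exist in the AMR tool; only the two `.rmd` file names and `rmarkdown::render()` come from the instructions above.

```r
# Hypothetical helper: maps a task name to the report file named above.
amr_report_file <- function(task = c("regression", "binary")) {
  task <- match.arg(task)
  switch(task,
    regression = "RegressionModelSelection.rmd",
    binary     = "BinaryModelSelection.rmd"
  )
}

# Hypothetical wrapper: amr_dir is assumed to be the directory that
# holds the .rmd files. Stops early if the file cannot be found.
run_amr <- function(task, amr_dir = ".") {
  rmd <- file.path(amr_dir, amr_report_file(task))
  if (!file.exists(rmd)) {
    stop("Cannot find ", rmd)
  }
  # Requires the rmarkdown package; the run still prompts for the
  # YAML file location as described above.
  rmarkdown::render(rmd)
}
```

For example, `run_amr("binary")` would render the binary classification report from the current directory.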

After the run is finished, you will get an HTML report with the accuracy of each model that was run and variable importance information for each model. For details of the output, please see this markdown file.

R packages

The following R packages are used in the AMR tool:

glmnet
yaml
randomForest
xgboost
lattice
shiny
gridExtra
lme4
RODBC
pbkrtest
caret
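
To see which of the packages listed above are not yet installed, something like the following sketch may help. The names `amr_packages`, `missing_packages`, and `install_missing` are illustrative only; the tool itself may handle package installation differently.

```r
# Packages listed above as used by the AMR tool.
amr_packages <- c("glmnet", "yaml", "randomForest", "xgboost", "lattice",
                  "shiny", "gridExtra", "lme4", "RODBC", "pbkrtest", "caret")

# Returns the subset of pkgs not found in the current R library paths.
missing_packages <- function(pkgs = amr_packages) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Installs whatever is missing from the configured repository and
# invisibly returns the names it attempted to install.
install_missing <- function(pkgs = amr_packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling `missing_packages()` only reports what is absent; `install_missing()` performs the installation.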

Licensing

Use of the software is subject to acceptance of the License Agreement.