Home
Welcome to the Wiki of autoML; we do hope you enjoy your time here!
Whilst you are here, consider checking out my other library, autoEDA: https://github.com/XanderHorn/autoEDA
Note that autoML is a work in progress.
Please take a moment to consider contributing to the future development and support of autoML, autoEDA and any future work by checking out my PayPal and Patreon accounts.
Patreon: https://patreon.com/XanderHorn
PayPal: https://www.paypal.me/XanderHorn
```r
library(devtools)
install_github("XanderHorn/autoML")
```
Some users might experience issues when installing via GitHub; please see this link for a potential fix: https://github.com/r-lib/devtools/issues/1900
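If devtools keeps failing, the remotes package provides the same install_github function and may serve as a workaround (a suggestion, not an official fix from this project):

```r
# Alternative install via remotes (suggestion only; see the linked
# devtools issue for other fixes)
install.packages("remotes")
remotes::install_github("XanderHorn/autoML")
```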
autoML uses a parallel back end by default, which unfortunately means that autoLearn will never produce exactly the same models when you run autoLearn or autoML (a wrapper of autoLearn) multiple times on the same dataset. This does not affect predictions.
Whilst the library consists of a number of functions, there are a few which are intended to be used the most:
The autoML function is a wrapper around autoPreProcess, autoLearn and autoInterpret, with some of the more advanced settings hidden and set automatically. It should be used to train models for binary classification, multi-class classification, regression and unsupervised problems.
The function returns the final training set, all models that were trained, as well as a results table comparing all trained models.
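A minimal sketch of a call (the argument names train and target are assumptions here, not the documented signature; see ?autoML):

```r
library(autoML)

# Hypothetical call: argument names are assumptions, check ?autoML
res <- autoML(train = iris, target = "Species")

res$results  # assumed: results table comparing the trained models
res$models   # assumed: list of trained mlr models
```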
Because autoML utilises the brilliant mlr library, all of the models trained are mlr model objects, meaning that all of the functionality in mlr can be applied to the models produced by autoML.
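For example, mlr's own predict, performance and getLearnerModel functions work directly on these objects. A small sketch, assuming the models are returned in res$models as in the example above:

```r
library(mlr)

mod <- res$models[[1]]                 # assumed structure: first trained model
preds <- predict(mod, newdata = iris)  # mlr predict on a WrappedModel
performance(preds, measures = acc)     # compute any mlr performance measure
getLearnerModel(mod)                   # extract the underlying fitted model
```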
The autoPreProcess function will clean your dataset and perform various methods of feature engineering on the data. Various settings can be adjusted to control how much cleaning and feature engineering takes place. The function also produces a preProcess function, which can be used to re-create all of the steps that were taken during execution.
The code produced is not intended to be used directly, but rather to be fed to either autoLearn or autoML to create production code functions for each of the models trained.
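A sketch of such a call (argument and element names here are assumptions, not the documented interface):

```r
library(autoML)

# Hypothetical call: argument names are assumptions
pp <- autoPreProcess(data = iris, target = "Species")

# The cleaned training set and generated code would then be passed on to
# autoLearn or autoML (the element name "data" is assumed)
head(pp$data)
```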
autoLearn will automate the process of model training, model validation, parameter tuning, target creation and target experimentation. Various settings can be configured, ranging from tuning methods and training modes to the models that should be trained on the data. When output from autoPreProcess is provided to autoLearn, each model will return a production function unique to that model.
As with autoML, because autoLearn utilises the brilliant mlr library, all of the models trained are mlr model objects, and all of the functionality in mlr can be applied to the models produced by autoLearn.
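A sketch of chaining the two functions (again, argument and element names are assumptions):

```r
library(autoML)

# Hypothetical: feed autoPreProcess output into autoLearn; names assumed
models <- autoLearn(train = pp$data, target = "Species")

# Each trained model is expected to carry its own production function
names(models)
```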
autoInterpret will automatically produce model interpretability methods for any mlr-trained model and its training set. Since all of the models trained in the autoML library are mlr model objects, autoInterpret can be used with them.
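For instance, something along these lines (argument names are assumptions):

```r
library(autoML)

# Hypothetical call: interpret one trained mlr model against its training
# set; argument names are assumptions
autoInterpret(model = res$models[[1]], data = iris)
```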
The library also allows the user to save the code that is generated by autoPreProcess, autoLearn and autoML to an R script. The code generated from autoLearn and autoML can then be embedded into an API for rapid model productionisation.
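As an illustration of the API idea, a generated production function could be exposed with the plumber package (plumber is just one option, not something autoML prescribes; the file and function names below are placeholders):

```r
# plumber.R - illustrative endpoint wrapping a generated production function.
# "model_production.R" and predictModel() are placeholder names, not files
# or functions produced by autoML itself.
library(plumber)
source("model_production.R")

#* @post /predict
function(req) {
  newdata <- jsonlite::fromJSON(req$postBody)
  predictModel(newdata)
}
```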
This library aims to automate various aspects of the traditional machine learning cycle, automatically performing the following actions on any dataset:
- Data cleaning
  - Encoding of incorrect missing values, e.g. Unknown => NA
  - Removal of duplicate observations
  - Removal of constant features, duplicate features and features containing only missing values
  - Correcting of feature/variable names
  - Formatting of features/variables, e.g. character => numeric
- Feature engineering
  - Imputation and outlier clipping
  - Correction of sparse categories in categorical features
  - Categorical feature engineering, e.g. one-hot encoding, proportional encoding
  - Flagging/tracking features, e.g. keeping track of where missing data was observed
  - Date and text feature engineering
  - Numerical feature transformations, e.g. square root transformation
  - Numerical feature interactions, e.g. x1 / x2
  - Unsupervised feature creation using k-means clustering
  - Feature scaling
- Model training
  - Automated target generation, e.g. regression, unsupervised learning
  - Automated test and validation set creation
  - Resampling of tuned models, e.g. k-fold cross-validation
  - Tuning of models, e.g. random search
  - Different training targets, e.g. balanced vs original vs reduced features
  - Optimisation of various performance metrics, e.g. AUC, Brier score
  - Training plots, e.g. learning curve, threshold, calibration, etc.
  - Parallel processing / multicore processing
  - Various models included, e.g. xgboost, lasso, knn
  - Probability cutoffs included for classification models
- Model interpretation
  - Partial dependence plots
  - Feature importance plots
  - Local model interpretability plots
  - Model feature/variable interaction plots
- Code generation
  - Code is generated whilst functions execute
  - Code is adapted to each model that is trained and is ready for production
  - Code is easily interpreted, to lessen the black-box feeling
- Lower level functionality
  - Most functions utilised in the main functions are also available individually for more flexibility