Laurae's Data Science R Package

LauraeDS: Laurae's Data Science Package

This package is the sequel to the Laurae2/Laurae R package.

It is meant to require fewer dependencies and to be more robust.

Installation

devtools::install_github("Laurae2/LauraeDS", dep = FALSE)

Dependencies installation:

install.packages(c("Matrix", "sparsio", "fst", "data.table", "pbapply")) # "parallel" ships with base R and needs no installation
devtools::install_github("fstpackage/fst@e060e62")
devtools::install_github("Laurae2/ez_xgb/R-package@2017-02-15-v1")
devtools::install_github("Microsoft/LightGBM/R-package@fc59fce") # Jul 14 2017, v2.0.4

TO-DO

  • add fold generation
  • add sparse handling
  • add parallel fast csv/fst converter
  • add parallel handling (cluster)
  • add parallel xgboost
  • add parallel LightGBM
  • add metrics
  • add metric optimizations
  • xgb.DMatrix generation
  • lgb.Dataset generation
  • xgboost trainer
  • LightGBM trainer
  • easy GLM (xgboost)
  • easy Random Forest (xgboost)
  • easy Random Forest (LightGBM)
  • easy Gradient Boosted Trees (xgboost)
  • easy Gradient Boosted Trees (LightGBM)
  • grid learning ("grid search")
  • Random Patches feature generation (Subsampling + Colsampling from feature groups)
  • stacker
  • add a lot more

Available functions


Parallel functions

Parallel functions help R make full use of multi-core and multi-socket systems, provided enough RAM is available.

| Function | Packages | Description |
| --- | --- | --- |
| parallel.csv | data.table, fst, parallel | Parallelizes and multithreads the reading of CSV files and writes them to the fst file format for fast reading. |
| parallel.threading | parallel | Sets processor affinity correctly on Windows machines. Provides a boost of up to 200% in memory-bound applications. |
| parallel.destroy | parallel | Stops a parallel cluster, or destroys any available clusters bound to the current R session. |
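
As a rough illustration of the cluster lifecycle these helpers manage, the sketch below uses only the base `parallel` package. It does not call `parallel.csv`, `parallel.threading`, or `parallel.destroy`, whose arguments are not documented here, and the worker count of 2 is an arbitrary assumption.

```r
library(parallel)

# Start a small socket cluster (2 workers is an arbitrary choice for this sketch)
cl <- makeCluster(2)

# Split the work into chunks and process each chunk on a worker
chunks <- split(1:100, rep(1:4, each = 25))
partial_sums <- parLapply(cl, chunks, function(x) sum(x^2))

# Always stop the cluster to release the worker processes
stopCluster(cl)

# Combine the per-chunk results on the main process
total <- sum(unlist(partial_sums))
```

Stopping the cluster explicitly matters: forgotten workers keep their memory until the R session ends.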

I/O functions

I/O functions allow sparse matrices to be read from and written to files quickly.

| Function | Packages | Description |
| --- | --- | --- |
| sparse.read | sparsio, Matrix | Reads SVMLight file format (sparse matrices) |
| sparse.write | sparsio, Matrix | Writes SVMLight file format (sparse matrices) |
Fold functions

Fold functions generate folds for cross-validation very quickly.

| Function | Packages | Description |
| --- | --- | --- |
| kfold | None | Generates cross-validation folds (stratified, treatment, pseudo-random, random) |
| nkfold | None | Generates repeated cross-validation folds (stratified, treatment, pseudo-random, random) |
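
To make the stratified case concrete, here is a minimal base R sketch of stratified fold assignment. The function name `stratified_folds` and its arguments are hypothetical and are not the signature of `kfold`; the package may use a different scheme.

```r
# Assign each observation to one of k folds, keeping class proportions similar
stratified_folds <- function(y, k = 5, seed = 1) {
  set.seed(seed)
  fold_id <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle within the class, then deal indices round-robin across folds
    idx <- idx[sample.int(length(idx))]
    fold_id[idx] <- rep_len(seq_len(k), length(idx))
  }
  # Return a list of held-out indices, one element per fold
  split(seq_along(y), fold_id)
}

# 80 negatives and 20 positives: each of the 5 folds should hold 16 and 4
y <- rep(c(0, 1), times = c(80, 20))
folds <- stratified_folds(y, k = 5)
```

Repeating the call with different seeds gives repeated cross-validation, which is the idea behind the repeated variant.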

Optimized Metrics

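Optimized metrics report the best achievable value of a binary classification metric (its maximum, or its minimum for error rates), which can give you an edge.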

| Function | Packages | Description |
| --- | --- | --- |
| metrics.acc.max | data.table | Maximum Binary Accuracy |
| metrics.f1.max | data.table | Maximum F1 Score (harmonic mean of Precision and Sensitivity) |
| metrics.fallout.max | data.table | Minimum Fall-Out (False Positive Rate) |
| metrics.kappa.max | data.table | Maximum Kappa Statistic |
| metrics.mcc.max | data.table | Maximum Matthews Correlation Coefficient |
| metrics.missrate.max | data.table | Minimum Miss Rate (False Negative Rate) |
| metrics.precision.max | data.table | Maximum Precision (Positive Predictive Value) |
| metrics.sensitivity.max | data.table | Maximum Sensitivity (True Positive Rate) |
| metrics.specifity.max | data.table | Maximum Specificity (True Negative Rate) |
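
One plausible way to obtain such a maximum is to scan candidate thresholds over the predicted probabilities and keep the best one. The base R sketch below does this for accuracy. The name `max_accuracy` is hypothetical, and the package functions rely on data.table and may work differently.

```r
# Find the threshold on predicted probabilities that maximizes binary accuracy
max_accuracy <- function(preds, labels) {
  # Candidate thresholds: every distinct prediction, plus one above the maximum
  thresholds <- c(sort(unique(preds)), Inf)
  acc <- vapply(thresholds, function(t) {
    # Predict the positive class when the probability reaches the threshold
    mean((preds >= t) == (labels == 1))
  }, numeric(1))
  best <- which.max(acc)  # first threshold reaching the maximum
  list(accuracy = acc[best], threshold = thresholds[best])
}

preds  <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)
labels <- c(0,   0,   1,    1,   1,   0)
res <- max_accuracy(preds, labels)
```

This scan is quadratic in the number of distinct predictions. Sorting once and using cumulative counts brings it down to a single pass, which matters on large data.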

Metric Computation/Solving

Computing or solving a metric can help you understand which default prediction values are best for it.

| Function | Packages | Description |
| --- | --- | --- |
| metrics.logloss | None | Logarithmic Loss (logloss) |
| metrics.logloss.unsafe | None | Logarithmic Loss (logloss) without bound checking |
| metrics.logloss.solve | stats | Logarithmic Loss Solver |
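
The sketch below illustrates the three ideas in base R: a bounded logloss, an unbounded one, and a solver for the constant prediction that minimizes the loss. All function names are hypothetical, the clipping bound of `1e-15` is an assumption, and the use of `stats::optimize` is only one possible solver, not necessarily the one the package uses.

```r
# Logarithmic loss with predictions clipped away from 0 and 1 (bound is an assumption)
logloss <- function(preds, labels, eps = 1e-15) {
  preds <- pmin(pmax(preds, eps), 1 - eps)
  -mean(labels * log(preds) + (1 - labels) * log(1 - preds))
}

# Same formula without clipping: faster, but gives Inf or NaN if preds hit 0 or 1
logloss_unsafe <- function(preds, labels) {
  -mean(labels * log(preds) + (1 - labels) * log(1 - preds))
}

# Numerically find the constant prediction that minimizes the logloss
best_constant <- function(labels) {
  stats::optimize(function(p) logloss(rep(p, length(labels)), labels),
                  interval = c(0, 1))$minimum
}

labels <- c(1, 0, 0, 0)
p_star <- best_constant(labels)  # lands close to mean(labels)
```

The solver result agrees with the known closed form: for a constant prediction, logloss is minimized at the mean of the labels.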

Machine Learning, Binary Matrices

Generating binary matrices has never been easier: you can pass lists and data.frames directly.

| Function | Packages | Description |
| --- | --- | --- |
| Laurae.xgb.dmat | xgboost, Matrix | Wrapper for extensible xgb.DMatrix generation. |
| Laurae.lgb.dmat | lightgbm, Matrix | Wrapper for extensible lgb.Dataset generation. |

Machine Learning, Supervised

Can't remember every hyperparameter? Press Tab to autocomplete them.

| Function | Packages | Description |
| --- | --- | --- |
| Laurae.xgb.train | xgboost, Matrix | Wrapper for xgboost Models |

Machine Learning, Loss/Metrics Helpers

Creating losses and metrics can be tedious without templates. These wrappers serve as templates: write only the loss or metric itself, then wrap it quickly.

| Function | Packages | Description |
| --- | --- | --- |
| xgb.wrap.loss | xgboost | Wrapper to quickly build an xgboost loss function. |
| xgb.wrap.metric | xgboost | Wrapper to quickly build an xgboost metric function. |
| lgb.wrap.loss | LightGBM | Wrapper to quickly build a LightGBM loss function. |
| lgb.wrap.metric | LightGBM | Wrapper to quickly build a LightGBM metric function. |
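
To show what such a template might look like, the sketch below builds xgboost-style callbacks from plain functions of predictions and labels. It relies on the xgboost R convention that a custom objective returns `list(grad, hess)` and a custom metric returns `list(metric, value)`. The names `wrap_loss`, `wrap_metric`, and the `get_labels` hook are hypothetical and are not the signatures of `xgb.wrap.loss` or `xgb.wrap.metric`.

```r
# Build an xgboost-style objective from plain gradient/hessian functions.
# get_labels is a hypothetical hook; by default it reads labels from an xgb.DMatrix.
wrap_loss <- function(grad_fun, hess_fun,
                      get_labels = function(d) xgboost::getinfo(d, "label")) {
  function(preds, dtrain) {
    labels <- get_labels(dtrain)
    list(grad = grad_fun(preds, labels), hess = hess_fun(preds, labels))
  }
}

# Build an xgboost-style evaluation metric from a plain function(preds, labels)
wrap_metric <- function(metric_fun, name,
                        get_labels = function(d) xgboost::getinfo(d, "label")) {
  function(preds, dtrain) {
    list(metric = name, value = metric_fun(preds, get_labels(dtrain)))
  }
}

# Squared error with the usual 1/2 factor: gradient (pred - label), hessian 1
sq_loss <- wrap_loss(function(p, y) p - y, function(p, y) rep(1, length(p)),
                     get_labels = function(d) d$label)
rmse <- wrap_metric(function(p, y) sqrt(mean((p - y)^2)), "rmse",
                    get_labels = function(d) d$label)

fake_data <- list(label = c(1, 2, 3))  # stand-in for an xgb.DMatrix in this sketch
out <- sq_loss(c(1.5, 2, 2), fake_data)
score <- rmse(c(1.5, 2, 2), fake_data)
```

With real data, drop the `get_labels` override so the default reads labels from the `xgb.DMatrix`. LightGBM expects a slightly different return shape for metrics, so a LightGBM wrapper would need its own variant.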

Machine Learning, Loss/Metrics Functions

Need functions that compute metrics quickly? Here are some.

| Function | Packages | Description |
| --- | --- | --- |
| metrics.logloss | None | Computes the logarithmic loss. |
| metrics.logloss.unsafe | None | Computes the logarithmic loss faster by skipping out-of-bounds checks. |
| metrics.logloss.solve | stats | Solves for a parameter involving the logarithmic loss (minimal loss, constant prediction value, ratio). |