About H2O

In H2O Docs

About this workshop

WeCodeFest slides


About the algorithms

Generalized Linear Models (GLM)

In H2O Docs

Introduction to Generalized Linear Models

Demo H2O World

Generalized Linear Models (GLM) estimate regression models for outcomes following exponential family distributions. In addition to the Gaussian (i.e., normal) distribution, these include the Poisson, binomial, and gamma distributions. Each serves a different purpose and, depending on the choice of distribution and link function, can be used for either regression or classification.
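To make the role of the distribution and link function concrete, here is a minimal sketch in plain Python (not the H2O API). It assumes the canonical link for each family: identity for Gaussian, log for Poisson, and logit for binomial. Gamma is left out for brevity. The names `INVERSE_LINKS` and `predict_mean` are hypothetical helpers made up for this illustration.

```python
import math

# Inverse of the canonical link for each family (an assumption of this
# sketch): it maps the linear predictor eta to the mean of the outcome.
INVERSE_LINKS = {
    "gaussian": lambda eta: eta,                           # identity link
    "poisson": lambda eta: math.exp(eta),                  # log link
    "binomial": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logit link
}

def predict_mean(family, intercept, coefs, x):
    """Return the predicted mean for one row x under the given family."""
    # Linear predictor: eta = intercept + sum of coefficient * feature.
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    return INVERSE_LINKS[family](eta)
```

With the same coefficients, the Gaussian family returns the linear predictor itself (regression), the Poisson family returns a positive expected count, and the binomial family returns a probability between 0 and 1 that can be thresholded for classification.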


  • Datasets are commonly split into training, testing, and validation sets.
    • A training dataset is a set of examples used for learning, that is, to fit the parameters of, for example, a classifier.
    • A validation dataset is a set of examples used to tune the hyperparameters of a classifier. Like the test set, it should follow the same probability distribution as the training dataset.
    • A test dataset is a dataset that is independent of the training dataset, but that follows the same probability distribution as the training dataset.
  • K-fold cross-validation is used to validate a model internally, i.e., to estimate model performance without sacrificing a validation split. It also avoids the statistical risk of relying on a single validation split, which might be a “lucky” one, especially for imbalanced data. Good values for K are around 5 to 10. Before “trusting” the main model, it is always a good idea to compare the K validation metrics to check the stability of the estimate.
  • Seed: This option specifies the random number generator (RNG) seed for algorithms that depend on randomization. When a seed is defined, the algorithm will behave deterministically.
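The three ideas above (splitting, K-fold cross-validation, and seeding) can be sketched together in plain Python. This is an illustration only and not how H2O implements them. The function names `split_rows` and `k_fold`, the 70/15/15 ratios, and the seed value are assumptions chosen for the example.

```python
import random

def split_rows(n_rows, ratios=(0.7, 0.15), seed=42):
    """Shuffle row indices with a seeded RNG and split them three ways."""
    rng = random.Random(seed)      # same seed gives the same shuffle
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_train = round(n_rows * ratios[0])
    n_valid = round(n_rows * ratios[1])
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # whatever is left over
    return train, valid, test

def k_fold(indices, k=5):
    """Yield (train, holdout) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        holdout = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, holdout

train, valid, test = split_rows(100, seed=42)
cv_pairs = list(k_fold(train, k=5))
```

Calling `split_rows` twice with the same seed returns identical splits, which is the deterministic behavior the seed option refers to. In `k_fold`, every row is held out exactly once, so all K validation metrics can be compared before relying on the final model.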


Word2vec

In H2O Docs

The Word2vec algorithm takes a text corpus as an input and produces the word vectors as output. The algorithm first creates a vocabulary from the training text data and then learns vector representations of the words. The vector space can include hundreds of dimensions, with each unique word in the sample corpus being assigned a corresponding vector in the space. In addition, words that share similar contexts in the corpus are placed in close proximity to one another in the space.
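The idea that words sharing similar contexts end up close together can be illustrated without training a model. The sketch below is not Word2vec: it swaps in simple co-occurrence count vectors as a stand-in for learned vectors and compares them with cosine similarity. The toy corpus, the window size of 1, and the function names are all assumptions made up for this example.

```python
import math
from collections import Counter

def build_vocab(sentences):
    """Collect the unique words of the corpus in a stable order."""
    counts = Counter(word for sent in sentences for word in sent)
    return sorted(counts)

def context_vectors(sentences, vocab, window=1):
    """Count, for each word, which words appear within `window` positions."""
    index = {word: i for i, word in enumerate(vocab)}
    vecs = {word: [0] * len(vocab) for word in vocab}
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][index[sent[j]]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus (hypothetical): "cat" and "dog" occur in the same contexts.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases mice".split(),
    "the dog chases cars".split(),
]
vocab = build_vocab(corpus)
vecs = context_vectors(corpus, vocab, window=1)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])
```

Here `sim_cat_dog` is larger than `sim_cat_milk` because "cat" and "dog" have the same neighbors in this corpus. Word2vec learns dense vectors with far fewer dimensions than the vocabulary size, but the same proximity property is what makes its output useful.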


GLM Booklet R Vignette.


H2O Workshop for WeCode 2018






