# Machine Learning with H2O - Tutorial 3a: Regression Models (Basics)

<hr>

**Objective**:

- This tutorial explains how to build regression models with four different H2O algorithms.

<hr>

**Wine Quality Dataset:**

- Source: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
- CSV (https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv)

<hr>
    
**Algorithms**:

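1. Generalized Linear Model (GLM)
2. Distributed Random Forest (DRF)
3. Gradient Boosting Machine (GBM)
4. Deep Neural Network (DNN)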


<hr>

**Full Technical Reference:**

- http://docs.h2o.ai/h2o/latest-stable/h2o-r/h2o_package.pdf

<br>


In [None]:
# Start and connect to a local H2O cluster
suppressPackageStartupMessages(library(h2o))
h2o.init(nthreads = -1)

<br>

In [None]:
# Import wine quality data from a local CSV file
# (download winequality-white.csv from the URL above into the working directory first)
wine = h2o.importFile("winequality-white.csv")
head(wine, 5)

In [None]:
# Define features (or predictors)
features = colnames(wine)  # we want to use all the information
features = setdiff(features, 'quality')    # we need to exclude the target 'quality'
features

In [None]:
# Split the H2O data frame into training/test sets
# so we can evaluate performance on held-out data
wine_split = h2o.splitFrame(wine, ratios = 0.8, seed = 1234)

wine_train = wine_split[[1]] # using about 80% for training
wine_test = wine_split[[2]]  # using the remaining 20% or so for hold-out evaluation

In [None]:
dim(wine_train)

In [None]:
dim(wine_test)
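
The row counts printed above are typically close to, but not exactly, 80% and 20% of the data. To the best of our understanding of the H2O documentation, `h2o.splitFrame` assigns rows probabilistically rather than cutting at an exact row count. The base R sketch below illustrates that idea only; it is not H2O's implementation, and `n_rows`, `in_train`, `train_idx` and `test_idx` are hypothetical names chosen for this example.

```r
# Illustrative only: a probabilistic 80/20 split in base R
set.seed(1234)
n_rows <- 1000                      # hypothetical number of rows

# Each row independently lands in the training set with probability 0.8
in_train <- runif(n_rows) < 0.8
train_idx <- which(in_train)
test_idx <- which(!in_train)

# The realised training fraction is near 0.8 but rarely exactly 0.8
train_fraction <- length(train_idx) / n_rows
print(train_fraction)
```

Because every row is assigned independently, the two sets never overlap and together cover all rows, while the exact sizes vary with the seed.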

<br>

## Generalized Linear Model

In [None]:
# Build a Generalized Linear Model (GLM) with default settings
glm_default = h2o.glm(x = features,
                      y = 'quality',
                      training_frame = wine_train,
                      family = 'gaussian', 
                      model_id = 'glm_default')

In [None]:
# Print the model summary, which includes performance on the training dataset
glm_default

In [None]:
# Check the model performance on the test dataset
h2o.performance(glm_default, wine_test)
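
The performance summary for a regression model includes error metrics such as MSE, RMSE and MAE. As a rough guide to what those numbers mean, here is a minimal base R sketch that computes them from a vector of actual values and a vector of predictions. The function name `regression_metrics` and the toy vectors are hypothetical, and the formulas are the standard textbook definitions, which may differ in detail from how H2O computes its reported values.

```r
# Standard regression error metrics computed from actual and predicted values
regression_metrics <- function(actual, predicted) {
  residuals <- actual - predicted
  c(
    mse  = mean(residuals^2),               # mean squared error
    rmse = sqrt(mean(residuals^2)),         # root mean squared error
    mae  = mean(abs(residuals)),            # mean absolute error
    r2   = 1 - sum(residuals^2) / sum((actual - mean(actual))^2)
  )
}

# Toy example with made-up quality scores
toy_actual <- c(5, 6, 7)
toy_predicted <- c(5, 7, 6)
toy_metrics <- regression_metrics(toy_actual, toy_predicted)
print(toy_metrics)

# With a running H2O cluster, individual metrics can also be read from the
# performance object, for example:
#   perf <- h2o.performance(glm_default, wine_test)
#   h2o.rmse(perf); h2o.mae(perf); h2o.r2(perf)
```

Lower MSE, RMSE and MAE indicate a better fit. RMSE and MAE are expressed in the same units as the target, here wine quality points.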

<br>

## Distributed Random Forest

In [None]:
# Build a Distributed Random Forest (DRF) model with default settings
drf_default = h2o.randomForest(x = features,
                               y = 'quality',
                               training_frame = wine_train,
                               seed = 1234,
                               model_id = 'drf_default')

In [None]:
# Check the DRF model summary
drf_default

In [None]:
# Check the model performance on the test dataset
h2o.performance(drf_default, wine_test)

<br>

## Gradient Boosting Machines

In [None]:
# Build a Gradient Boosting Machine (GBM) model with default settings
gbm_default = h2o.gbm(x = features,
                      y = 'quality',
                      training_frame = wine_train,
                      seed = 1234,
                      model_id = 'gbm_default')

In [None]:
# Check the GBM model summary
gbm_default

In [None]:
# Check the model performance on the test dataset
h2o.performance(gbm_default, wine_test)

<br>

## H2O Deep Learning

In [None]:
# Build a Deep Learning (Deep Neural Network, DNN) model with default settings
# Note: unlike the DRF and GBM examples, no seed is set here,
# so results will differ between runs
dnn_default = h2o.deeplearning(x = features,
                               y = 'quality',
                               training_frame = wine_train,
                               model_id = 'dnn_default')

In [None]:
# Check the DNN model summary
dnn_default

In [None]:
# Check the model performance on the test dataset
h2o.performance(dnn_default, wine_test)

<br>

## Making Predictions

In [None]:
# Use the GLM model to make predictions
yhat_test_glm = h2o.predict(glm_default, wine_test)
head(yhat_test_glm)

In [None]:
# Use the DRF model to make predictions
yhat_test_drf = h2o.predict(drf_default, wine_test)
head(yhat_test_drf)

In [None]:
# Use the GBM model to make predictions
yhat_test_gbm = h2o.predict(gbm_default, wine_test)
head(yhat_test_gbm)

In [None]:
# Use the DNN model to make predictions
yhat_test_dnn = h2o.predict(dnn_default, wine_test)
head(yhat_test_dnn)
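
Once predictions from several models are available, it can be convenient to rank the models by their error on the same test set. The sketch below is one possible way to do that in base R. The helper `compare_models` and the toy data are hypothetical, and the commented usage assumes the prediction frames have a `predict` column, as H2O regression predictions normally do.

```r
# Rank models by RMSE given the actual values and a named list of predictions
compare_models <- function(actual, predictions) {
  rmse <- sapply(predictions, function(p) sqrt(mean((actual - p)^2)))
  result <- data.frame(model = names(predictions),
                       rmse = unname(rmse),
                       stringsAsFactors = FALSE)
  result[order(result$rmse), ]      # best (lowest RMSE) model first
}

# Toy example with made-up values
toy_actual <- c(5, 6, 7, 5)
toy_predictions <- list(model_a = c(5, 6, 7, 6),
                        model_b = c(6, 7, 6, 6))
comparison <- compare_models(toy_actual, toy_predictions)
print(comparison)

# With the objects from this tutorial (requires the running H2O cluster):
#   compare_models(as.vector(wine_test$quality),
#                  list(glm = as.vector(yhat_test_glm$predict),
#                       drf = as.vector(yhat_test_drf$predict),
#                       gbm = as.vector(yhat_test_gbm$predict),
#                       dnn = as.vector(yhat_test_dnn$predict)))
```

Converting with `as.vector` pulls the data from the H2O cluster into R memory. That is fine for a test set of this size but may be impractical for very large frames.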

<br>