# Learning the Function f(x) = x^2

In this example we will use the *pyml_ensemble* library to create an ensemble of Artificial Neural Networks (ANNs) which will be used to learn the x^2 function.

To begin we'll import a few libraries that will be useful for creating the dataset, training the ensemble models, and analyzing the predictive capabilities of the ensemble.

In [4]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import metrics

Next we import the functionality we need from the *pyml_ensemble* library, namely:

+ Ensemble - the core of the package, which holds the ensemble models and provides ensemble interaction,
+ MeanAggregator - an aggregator which combines the output of the ensemble models by returning the average predicted value,
+ ANNModel - a built-in *pyml_ensemble* ANN model.

In [2]:
from pyml_ensemble import Ensemble
from pyml_ensemble.aggregator import MeanAggregator
from pyml_ensemble.model import ANNModel

Using TensorFlow backend.


With that we should have everything we need to build the ensemble of ANNs.

## Creating the Dataset

We will create the *x* and *y* values for the function *y = x^2* manually. We only use values of *x >= 0*, since the function is easier to learn on that domain; specifically, *x* is constrained to *0 <= x <= 20*. The *x* values are generated with the *linspace* function from *numpy*.

For the *y* values, we define a function that returns the element-wise square of a *numpy* array:

In [3]:
def y(x):
    return x**2
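
As a quick check of the two pieces described above, the sketch below generates a handful of *x* values with *linspace* and squares them. It uses 5 points only to keep the output readable; the number of points is an illustrative choice, not the one used to train the ensemble.

```python
import numpy as np

def y(x):
    return x**2

# 5 equally spaced values between 0 and 20, both endpoints included
x = np.linspace(0, 20, num=5)

print(x.tolist())       # [0.0, 5.0, 10.0, 15.0, 20.0]
print(y(x).tolist())    # [0.0, 25.0, 100.0, 225.0, 400.0]
```

Because `x**2` is applied element-wise, the same function works unchanged on the full 2,000-point array.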

## Building the Ensemble

With the preliminaries taken care of we are ready to create an ensemble using the *pyml_ensemble* package.

To begin, we define a few variables that form the core of the ensemble and choose which aggregator to use. In the *pyml_ensemble* library an **aggregator** is the component that combines the predictions of the individual ensemble models. The package initially came with two predefined aggregators:

+ MeanAggregator - returns the average predicted value
+ ModeAggregator - returns the most frequently predicted value

Users can define custom aggregators by creating a class that implements the [abstract base class](https://docs.python.org/3/library/abc.html) *pyml_ensemble.aggregator.Aggregator*, which only requires implementing the *combine()* method. Creating custom aggregators and models is covered in a different example.
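
To make the two strategies concrete, the sketch below reproduces mean and mode aggregation with plain *numpy* and shows the shape a custom aggregator might take. The `Aggregator` class here is a local stand-in defined for illustration, not the library's class, and the idea that `combine()` receives one sequence of predictions per model is an assumption; check the *pyml_ensemble* source for the actual signature.

```python
from abc import ABC, abstractmethod
from collections import Counter

import numpy as np


class Aggregator(ABC):
    """Hypothetical stand-in for pyml_ensemble.aggregator.Aggregator."""

    @abstractmethod
    def combine(self, predictions):
        """Assumed input: one sequence of predictions per model."""


class MeanSketch(Aggregator):
    def combine(self, predictions):
        # average across models (axis 0), giving one value per sample
        return np.mean(np.asarray(predictions), axis=0)


class ModeSketch(Aggregator):
    def combine(self, predictions):
        # most frequent value per sample; ties go to the value seen first
        columns = np.asarray(predictions).T
        return np.array([Counter(col.tolist()).most_common(1)[0][0] for col in columns])


class MedianSketch(Aggregator):
    """A custom aggregator: less sensitive to one badly wrong model."""

    def combine(self, predictions):
        return np.median(np.asarray(predictions), axis=0)


regression_preds = [[1.0, 4.0], [3.0, 4.0], [2.0, 10.0]]   # 3 models, 2 samples
class_preds = [[0, 1], [0, 2], [1, 2]]                     # 3 models, 2 samples

mean_out = MeanSketch().combine(regression_preds)      # [2.0, 6.0]
median_out = MedianSketch().combine(regression_preds)  # [2.0, 4.0]
mode_out = ModeSketch().combine(class_preds)           # [0, 2]
```

Averaging suits regression targets such as *x^2*, while the mode is the natural choice when the models predict discrete class labels.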

Below, the ensemble, aggregator, and dataset are created, and the dataset is split into training and testing data.

In [6]:
def y(x):
    return x**2

ensemble = Ensemble()                  # create the ensemble object
aggregator = MeanAggregator()          # create the aggregation object
ensemble.set_aggregator(aggregator)    # set the aggregator of the ensemble

x = np.linspace(0, 20, num=2000)       # create the x values: 2,000 equally spaced values between 0 and 20
                                       # (both endpoints included)
y = y(x)                               # generate y values; note this rebinds the name y from the
                                       # function to the resulting array

# below we use the built-in train_test_split(...) function from sklearn to split the dataset into
# training and testing x and y datasets (33% of the samples are held out for testing)
trainx, testx, trainy, testy = train_test_split(x, y, test_size=0.33)

num_models = 2                         # define the number of models in the ensemble
for i in range(num_models):
    # create the models and add them to the ensemble, for clarity we define the parameters as variables
    input_size = 1                     # x is the only input
    num_hidden_layers = 2
    hidden_layer_sizes = [5, 5]        # each hidden layer is 5 nodes wide
    output_size = 1                    # y is the only output
    epochs = 1500                      # number of training epochs
    batch_size = 16
    # more named parameters are available to the ANNModel and can be found in the documentation
    ann = ANNModel(input_size, num_hidden_layers, hidden_layer_sizes, output_size, 
                   epochs=epochs, batch_size=batch_size, fit_verbose=0)
    # to be able to save the model weights later we need to set the weight file on each ensemble member
    ann.set_weight_filename("weights_model" + str(i) + ".hdf5")
    
    ensemble.add_model(ann)            # add the model to the ensemble
    
# here we create a list holding the training inputs for each model: trainx_data_list[0] is used to
# train the model in ensemble.models[0]. In this example every model is trained on the same data, but
# the data can be segmented however you'd like to train the individual models
trainx_data_list = [trainx for _ in range(num_models)]

# similarly the same target data is used for each model
trainy_data_list = [trainy for _ in range(num_models)]

# train the ensemble models
print("Training ensemble...")
ensemble.train(trainx_data_list, trainy_data_list)

# get predictions, aggregation is automatic
print("Getting predictions...")
y_hat = ensemble.predict(testx)

# calculate and display the MSE
print(metrics.mean_squared_error(testy, y_hat))

# call_all invokes the named method on every model in the ensemble. Here it saves each model's weights.
ensemble.call_all("save_weights")

Training ensemble...
Getting predictions...
4169.145685709779


[None, None]
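
An MSE is expressed in squared units of *y*, which makes a value like the one printed above hard to judge at a glance. Taking the square root gives the RMSE, about 64.6 here, in the same units as *y*, which ranges from 0 to 400 on this dataset. The sketch below computes both quantities with plain *numpy*; the arrays are made-up illustrative values, not output from the ensemble.

```python
import numpy as np

# made-up targets and predictions, for illustration only
y_true = np.array([0.0, 25.0, 100.0, 225.0, 400.0])
y_pred = np.array([2.0, 21.0, 104.0, 221.0, 402.0])

errors = y_true - y_pred        # [-2, 4, -4, 4, -2]
mse = np.mean(errors ** 2)      # (4 + 16 + 16 + 16 + 4) / 5 = 11.2
rmse = np.sqrt(mse)             # back in the same units as y

print(mse, rmse)
```

For one-dimensional inputs like these, `np.mean(errors ** 2)` is the same quantity that `metrics.mean_squared_error` returns.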

### Full Code No Fluff

Below is the full Python code required to build, train, and test the ensemble. The comments and fluff have been stripped to demonstrate how simple it is to create, train, and use an ensemble using the *pyml_ensemble* package.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import metrics

from pyml_ensemble import Ensemble
from pyml_ensemble.aggregator import MeanAggregator
from pyml_ensemble.model import ANNModel

def y(x):
    return x**2

ensemble = Ensemble()
aggregator = MeanAggregator()
ensemble.set_aggregator(aggregator)

x = np.linspace(0, 20, num=2000)
y = y(x)
trainx, testx, trainy, testy = train_test_split(x, y, test_size=0.33)

num_models = 2
for i in range(num_models):
    ann = ANNModel(1, 2, [5, 5], 1, epochs=1500, batch_size=16, fit_verbose=0,
                   weight_file="weights_model" + str(i) + ".hdf5")
    ensemble.add_model(ann)

ensemble.train([trainx for _ in range(num_models)], [trainy for _ in range(num_models)])

y_hat = ensemble.predict(testx)
print(metrics.mean_squared_error(testy, y_hat))
ensemble.call_all("save_weights")
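
The training cell above notes that each model may be given its own training data. One common way to do that is bootstrap resampling (bagging), where every model sees a sample drawn with replacement from the training set. The sketch below only builds the two lists with *numpy*; the stand-in data, the names `rng` and `idx`, and the seed are illustrative choices. The resulting lists would be passed to `ensemble.train(...)` in place of the repeated-copy lists used above.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the resampling is repeatable

trainx = np.linspace(0, 20, num=100)    # stand-in for the training split
trainy = trainx ** 2
num_models = 2

trainx_data_list, trainy_data_list = [], []
for _ in range(num_models):
    # draw len(trainx) indices with replacement
    idx = rng.integers(0, len(trainx), size=len(trainx))
    # index x and y with the same idx so the pairs stay aligned
    trainx_data_list.append(trainx[idx])
    trainy_data_list.append(trainy[idx])
```

Whether resampling helps on a noise-free function like *x^2* is something to verify experimentally rather than assume.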