This notebook shows an example in which the model is fed two features. In the training data, $y = \sin(2\pi X_1)$ plus a small amount of Gaussian noise. $X_2$ has no relationship to $y$ whatsoever and is just random numbers. We are testing how the model degrades when one of the features contains no information.

First we import the required libraries. `numpy` lets us manipulate arrays efficiently. `pandas` gives us access to pandas DataFrames, which are the preferred way of storing our data. `matplotlib.pyplot` lets us plot graphs of our data. `twinlab` is the main library we are using. Some of the libraries are renamed using `as` for convenience.

In [None]:
# Third-party imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Project imports
import twinlab as tl

At the top of this cell we define the names of our dataset and model (campaign).

Next, we define the training data:
- $X_1$ is an array of 10 fixed values between 0 and 1.
- $X_2$ is an array of 10 random numbers between 0 and 1.
- $y$ is $\sin(2\pi X_1)$ plus Gaussian noise with a standard deviation of 0.05. It has no dependency on $X_2$ whatsoever.

At the bottom of the cell we put these arrays into a pandas DataFrame with the corresponding column headings.

In [None]:
dataset_id = "second_random_input.csv"
campaign_id = "second_random_input"

# Training data
X1 = np.array([
    0.6964691855978616,
    0.28613933495037946,
    0.2268514535642031, 
    0.5513147690828912, 
    0.7194689697855631, 
    0.42310646012446096, 
    0.9807641983846155, 
    0.6848297385848633, 
    0.48093190148436094, 
    0.3921175181941505
])
X2 = np.random.rand(10)
y = np.sin(X1*2.*np.pi) + np.random.normal(0, 0.05, 10)

train_data = pd.DataFrame({'X1': X1, 'X2': X2, 'y': y})
print(train_data)
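As a quick sanity check on the setup above, the sketch below regenerates data of the same form and measures how strongly each feature tracks $y$. It is illustrative only: the seed, the `rng` generator, and the use of 1,000 evenly spaced $X_1$ values (instead of the 10 used in this notebook) are assumptions, chosen so that the result is repeatable and not dominated by small-sample noise.

```python
import numpy as np

# Assumed seed so the check is repeatable; the notebook itself is unseeded
rng = np.random.default_rng(42)

n = 1000  # assumed sample size, larger than the notebook's 10 points
X1 = np.linspace(0.0, 1.0, n)
X2 = rng.random(n)  # uninformative feature: uniform random numbers in [0, 1)
y = np.sin(2.0 * np.pi * X1) + rng.normal(0.0, 0.05, n)

# Pearson correlation of each feature with y
corr_x1 = np.corrcoef(X1, y)[0, 1]
corr_x2 = np.corrcoef(X2, y)[0, 1]
print(f"corr(X1, y) = {corr_x1:+.3f}")
print(f"corr(X2, y) = {corr_x2:+.3f}")
```

The correlation for $X_2$ should sit close to zero, while the one for $X_1$ is clearly non-zero. Pearson correlation only measures linear association, so it understates how strongly a sinusoid depends on $X_1$.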

In this cell we set the parameters to be used for training the model.

In [None]:
# Define the parameters used to train the model
prediction_params = {
    "filename": dataset_id,
    "inputs" : ["X1", "X2"],
    "outputs": ["y"],
}
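Before uploading anything, it can be worth confirming that every name listed under `inputs` and `outputs` really is a column in the training data. The sketch below shows one way to do that. `missing_columns` is a hypothetical helper written for this example and is not part of twinLab; it accepts any collection of column names, so `train_data.columns` could be passed in directly.

```python
def missing_columns(params, columns):
    """Return the input/output names in params that are absent from columns."""
    wanted = list(params["inputs"]) + list(params["outputs"])
    available = set(columns)
    return [name for name in wanted if name not in available]


# Same structure as the parameter dictionary used in this notebook
params = {
    "filename": "second_random_input.csv",
    "inputs": ["X1", "X2"],
    "outputs": ["y"],
}

print(missing_columns(params, ["X1", "X2", "y"]))  # prints []
print(missing_columns(params, ["X1", "y"]))        # prints ['X2']
```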

This cell creates the input values for which the model will predict outputs.
- The $X_1$ values are 101 equally spaced numbers between 0 and 1.
- The $X_2$ values are 101 random values between 0 and 1.

We then create a pandas DataFrame from these values with the corresponding column headings.

In [None]:
input_dict = {
    "X1": np.linspace(0, 1, 101),
    "X2": np.random.rand(101)
}

prediction_inputs = pd.DataFrame(input_dict)
print(prediction_inputs)
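To make "101 equally spaced numbers" concrete: `np.linspace` includes both endpoints by default, so 101 points on $[0, 1]$ are spaced 0.01 apart. The short check below is only a sketch, and `x1_grid` is a name chosen for this example.

```python
import numpy as np

x1_grid = np.linspace(0, 1, 101)

# Both endpoints are included, so there are 100 intervals of width 0.01
print(len(x1_grid), x1_grid[0], x1_grid[-1])
print(np.allclose(np.diff(x1_grid), 0.01))
```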

We now upload the training data (the values of $X_1$, $X_2$ and $y$ defined above) to the twinLab cloud.

Whenever `verbose=True` is passed as an argument, the function reports what it is doing to the user. This generates the grey text below the cells when they are run.

In [None]:
tl.upload_dataset(train_data, dataset_name=dataset_id, verbose=True)

- `tl.list_datasets()` lets us check that the dataset we uploaded is in the right place.
- `tl.query_dataset()` lets us view summary statistics for the data in our dataset.

In [None]:
_ = tl.list_datasets(verbose=True)
tl.query_dataset(dataset_id)

This cell trains the model on the dataset we uploaded, using the parameters we defined above.

In [None]:
tl.train_campaign(prediction_params, campaign_id, verbose=True)

This lists the models currently on the twinLab cloud.

In [None]:
_ = tl.list_campaigns(verbose=True)

This displays information about the model we are using.

In [None]:
_ = tl.query_campaign(campaign_id, verbose=True)

Here we pass in $X_1$ and $X_2$ values and, based on the training data, the model predicts the corresponding $y$ values. `df_mean` is the mean value that the model predicts, while `df_std` is an estimate of the model's uncertainty around `df_mean`. Assuming the predictive distribution is Gaussian, there is roughly a 68% chance that the true value lies within `df_std` of `df_mean`, rising to roughly 95% within `2*df_std`.

In [None]:
df_mean, df_std = tl.predict_campaign(prediction_inputs, campaign_id, verbose=True)
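The 68% and 95% figures quoted for the uncertainty bands follow from the Gaussian distribution: the probability of lying within $n$ standard deviations of the mean is $\operatorname{erf}(n/\sqrt{2})$. The sketch below evaluates this with the standard library and builds a band from a mean and a standard deviation. `gaussian_coverage` and `band` are hypothetical helpers written for this example, the numbers passed to `band` are placeholders and not twinLab output, and the calculation assumes the model's predictive distribution is Gaussian.

```python
import math


def gaussian_coverage(nsig):
    """Probability that a normal variable lies within nsig standard deviations of its mean."""
    return math.erf(nsig / math.sqrt(2.0))


def band(mean, std, nsig):
    """Lower and upper edges of the band mean +/- nsig * std."""
    return mean - nsig * std, mean + nsig * std


for nsig in (1, 2):
    print(f"{nsig} sigma: {gaussian_coverage(nsig):.4f}")
# prints "1 sigma: 0.6827" then "2 sigma: 0.9545"

# Placeholder mean and standard deviation, not model output
lower, upper = band(0.5, 0.1, 2)
print(lower, upper)
```

The same `band` arithmetic applies element-wise to `df_mean` and `df_std`, since pandas supports these operations on whole columns.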

We now plot $y$ against $X_1$, and then $y$ against $X_2$.
- The black dots on the graph are the training data we gave the model.
- The darkest blue line in the graph is the `df_mean` value.
- The shaded blue bands on either side show the range of uncertainty in the prediction, at one and two `df_std` around `df_mean`.

On the first graph ($y$ against $X_1$), the model has become more uncertain about its predictions of $y$ because of the introduction of $X_2$.

On the second graph, we can see there is no correlation between $X_2$ and $y$.

In [None]:
# Plot parameters
nsigs = [1, 2]
color = "blue"
alpha = 0.5
plot_training_data = True
plot_model_mean = True
plot_model_bands = True

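for X, Xlabel in zip(["X1", "X2"], ["$X_1$", "$X_2$"]):
    # Sort by the plotted feature so the line and bands are drawn left to right
    # (the X2 prediction inputs are random and therefore unsorted)
    order = np.argsort(prediction_inputs[X].to_numpy())
    grid = prediction_inputs[X].to_numpy()[order]
    mean = df_mean["y"].to_numpy()[order]
    err = df_std["y"].to_numpy()[order]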
    if plot_model_bands:
        label = "Model prediction"
        plt.fill_between(grid, np.nan, np.nan, lw=0, color=color, alpha=alpha, label=label)
        for isig, nsig in enumerate(nsigs):
            plt.fill_between(grid, mean-nsig*err, mean+nsig*err, lw=0, color=color, alpha=alpha/(isig+1))
    if plot_model_mean:
        label = "Model prediction" if not plot_model_bands else None
        plt.plot(grid, mean, color=color, alpha=alpha, label=label)
    if plot_training_data:
        plt.plot(train_data[X], train_data["y"], ".", color="black", label="Training data")
    plt.xlim((0., 1.))
    plt.xlabel(Xlabel)
    plt.ylabel("$y$")
    plt.legend()
    plt.show()

In [None]:
# Delete campaign and dataset (if desired)
tl.delete_campaign(campaign_id, verbose=True)
tl.delete_dataset(dataset_id, verbose=True)