## **Quickstart Guide**

This guide covers the standard usage pattern and basic functionality to help you get started with twinLab. In this Jupyter notebook we will:

1. Upload a dataset to twinLab.
2. Use `tl.train_campaign` to create a surrogate model.
3. Use the model to make a prediction with `tl.predict_campaign`.
4. Visualise the results and their uncertainty.

In [None]:
# Third-party imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Project imports
import twinlab as tl

### **API setup**

The first time you use twinLab you will have to enter your API key and confirm you have the correct URL set.

In [None]:
# Set the API key
tl.set_api_key("my_api_key")

# Check which URL is being used
tl.get_server_url()

# Set the server URL
tl.set_server_url("http://twinlab.digilab.co.uk")
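
Hard-coding a key in a notebook makes it easy to share by accident. One common alternative, sketched below, is to read the key from an environment variable. The variable name `TWINLAB_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of the twinLab API shown in this guide.

```python
import os


def load_api_key(var_name="TWINLAB_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


# Usage (uncomment once the variable is set in your shell):
# tl.set_api_key(load_api_key())
```

Failing loudly when the variable is missing avoids sending an empty key to the server.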

### **Your twinLab information**

Confirm your twinLab version:

In [None]:
tl.get_versions()

You can also view your user information, including how many credits you have.

In [None]:
tl.get_user_information()

### **Upload a dataset**

Datasets must be supplied either as a `pandas.DataFrame` object or as a file path pointing to a csv file that can be parsed into a `pandas.DataFrame` object. **Both must be formatted with clearly labelled columns.** Here, we will label the input (predictor) variable `x` and the output variable `y`. In `twinlab`, data is expected to be in column-feature format, meaning each row represents a single data sample and each column represents a data feature.

Datasets must be uploaded with a `dataset_id`, which is used to access them from the cloud.


In [None]:
x = [0.6964691855978616,
0.28613933495037946,
0.2268514535642031,
0.5513147690828912,
0.7194689697855631,
0.42310646012446096,
0.9807641983846155,
0.6848297385848633,
0.48093190148436094,
0.3921175181941505]

y = [-0.8173739564129022,
0.8876561174050408,
0.921552660721474,
-0.3263338765412979,
-0.8325176123242133,
0.4006686354731812,
-0.16496626502368078,
-0.9607643657025954,
0.3401149876855609,
0.8457949914442409]

# Creating the dataframe using the above arrays
df = pd.DataFrame({'x': x, 'y': y})

# View the dataset before uploading
display(df)

# Define the name of the dataset
dataset_id = "example_data"

# Upload the dataset to the cloud
tl.upload_dataset(df, dataset_id, verbose=True)

If your data is stored in a csv file, you must pass the file path string to `tl.upload_dataset`.

In [None]:
df_filepath = "example_data_folder/example_data.csv"

# Define the name of the dataset
dataset_id = "example_data"

# Upload the dataset to the cloud
tl.upload_dataset(df_filepath, dataset_id, verbose=True)
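
If you do not yet have such a file, one way to produce it is to write a DataFrame with `pandas.DataFrame.to_csv`. The sketch below assumes that twinLab expects a header row of column names and no separate index column. That matches the column-feature format described earlier but is not confirmed here. The helper `write_dataset_csv`, the stand-in data, and the temporary folder are illustrative only.

```python
import os
import tempfile

import pandas as pd


def write_dataset_csv(df, filepath):
    """Write `df` to `filepath` with a header row and no index column."""
    folder = os.path.dirname(filepath)
    if folder:
        os.makedirs(folder, exist_ok=True)
    df.to_csv(filepath, index=False)
    return filepath


# Small stand-in dataset; in the notebook you would pass `df` instead
example_df = pd.DataFrame({"x": [0.0, 0.5, 1.0], "y": [1.0, 0.0, -1.0]})
example_path = os.path.join(
    tempfile.mkdtemp(), "example_data_folder", "example_data.csv"
)
write_dataset_csv(example_df, example_path)

# Reading the file back should recover the same labelled columns
round_trip = pd.read_csv(example_path)
print(round_trip)
```

Passing `index=False` keeps the row index out of the file, so it is not read back as an extra unnamed column.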

### **Train a campaign**

The `campaign` class is used to train and implement your surrogate models. As with datasets, you define an ID, the `campaign_id`, under which the model is saved in the cloud. Training arguments are passed as a dictionary, which we name `campaign_params` here. To train the model, call `tl.train_campaign` with this dictionary and the `campaign_id`.

In [None]:
campaign_id = "example_campaign"

campaign_params = {
    "dataset_id": dataset_id,   # Points the campaign to the uploaded dataset
    "inputs": ["x"],            # Input columns, named by the dataset's column headers
    "outputs": ["y"],           # Output columns, named by the dataset's column headers
    "test_train_ratio": 0.8,    # Fraction of data used for training: 80% trains the model, 20% tests it
}

# Start a new campaign and train a surrogate model
tl.train_campaign(campaign_params, campaign_id, verbose=True)
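
To make the `test_train_ratio` setting concrete, the sketch below applies an 80/20 split to ten row indices in plain NumPy. It only illustrates what such a split means. How twinLab selects the rows internally is not described in this guide, so the shuffling, the seed, and the helper `split_indices` are assumptions.

```python
import numpy as np


def split_indices(n_rows, train_ratio, seed=0):
    """Shuffle row indices and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_train = int(round(train_ratio * n_rows))
    return shuffled[:n_train], shuffled[n_train:]


# Ten rows, as in the example dataset, with 80% reserved for training
train_idx, test_idx = split_indices(n_rows=10, train_ratio=0.8)
print(len(train_idx), len(test_idx))
```

With only ten samples, two rows are held back for testing, so any test metric on a dataset this small will be noisy.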

### **Predicting with a campaign**

The surrogate model is now trained and saved to the cloud under the `campaign_id`, so it can be used to make predictions. First define a dataset of inputs for which you want to find outputs; ensure that this is a `pandas.DataFrame` object or a file path to a correctly formatted csv. Then call `tl.predict_campaign`, passing the evaluation dataset and the `campaign_id` of the model you wish to use.

In [None]:
# Define the inputs for the dataset
x_eval = np.linspace(0, 1, 128)

# Convert to a dataframe
df_eval = pd.DataFrame({'x': x_eval})
display(df_eval)

# Predict the results
df_mean, df_std = tl.predict_campaign(df_eval, campaign_id)

Alternatively, the evaluation dataset can be supplied as the file path of a correctly formatted csv.

In [None]:
# Define the file path of the csv
df_eval_filepath = "example_data_folder/example_eval_data.csv"

# Predict the results, passing the file path instead of a dataframe
df_mean, df_std = tl.predict_campaign(df_eval_filepath, campaign_id)

### **Viewing the results**

`tl.predict_campaign` returns the mean prediction for each input and its standard deviation, which makes it straightforward to visualise the uncertainty in the results.


In [None]:
# Plot parameters
nsigs = [1, 2]
# nsigs = [0.674, 1.960, 2.576]
color = "blue"
alpha = 0.5
plot_training_data = True
plot_model_mean = True
plot_model_bands = True

# Plot results
grid = df_eval["x"]
mean = df_mean["y"]
err = df_std["y"]
if plot_model_bands:
    label = r"Model prediction"
    plt.fill_between(grid, np.nan, np.nan, lw=0, color=color, alpha=alpha, label=label)
    for isig, nsig in enumerate(nsigs):
        plt.fill_between(grid, mean-nsig*err, mean+nsig*err, lw=0, color=color, alpha=alpha/(isig+1))
if plot_model_mean:
    label = r"Model prediction" if not plot_model_bands else None
    plt.plot(grid, mean, color=color, alpha=alpha, label=label)
if plot_training_data:
    plt.plot(df["x"], df["y"], ".", color="black", label="Training data")
plt.xlim((0.0, 1.0))
plt.xlabel(r"$x$")
plt.ylabel(r"$y$")
plt.legend()
plt.show()
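
The commented-out `nsigs` values in the plotting cell are band widths that correspond to round-number coverage levels. As a quick check, assuming the predictive distribution at each point is Gaussian (an assumption about the model, not something this guide states), the probability inside mean ± n standard deviations is erf(n/√2). The helper `gaussian_coverage` is illustrative.

```python
import math


def gaussian_coverage(nsig):
    """Probability that a Gaussian variable lies within nsig standard deviations of its mean."""
    return math.erf(nsig / math.sqrt(2.0))


# Band widths used (or commented out) in the plotting cell
for nsig in [1, 2, 0.674, 1.960, 2.576]:
    print(f"{nsig:>5} sigma -> {100 * gaussian_coverage(nsig):.1f}%")
```

Under that assumption, the 1 and 2 sigma bands cover roughly 68% and 95% of the probability, while 0.674, 1.960 and 2.576 give bands of about 50%, 95% and 99%.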

### **Deleting datasets and campaigns**

To keep your cloud storage tidy, delete your datasets and campaigns when you are finished with them. Note that `tl.delete_campaign` and `tl.delete_dataset` delete them from cloud storage only.

In [None]:
# Delete dataset
tl.delete_dataset(dataset_id)

# Delete campaign
tl.delete_campaign(campaign_id)