## **Quickstart**

Get started with twinLab in five easy steps! 

In this Jupyter Notebook, you'll: 

1. [Get all set up with `twinLab`.](#1-setting-up-twinlab)
2. [Upload a dataset to your `twinLab` cloud account.](#2-upload-a-dataset)
3. [View datasets on your `twinLab` cloud account.](#3-view-datasets)
4. [Use `Emulator.train` to train your surrogate model.](#4-train-an-emulator)
5. [View emulators on your `twinLab` cloud account.](#5-view-emulators) 


Start by importing all the necessary packages you need for this tutorial.

In [1]:
from pprint import pprint

import twinlab as tl


          Version     : 2.13.0
          User        : michelle@digilab.co.uk
          Server      : https://twinlab.digilab.co.uk/v3
          Environment : /Users/michellebieger/Documents/digi/twinLab/tutorials/.env



### **1. Setting up twinLab**

First things first, set up your API key. If you don't already have one, visit [https://www.digilab.co.uk/contact](https://www.digilab.co.uk/contact). If you've forgotten it, visit the [Portal](https://portal.twinlab.ai/), where you can check what your API key is. 

For security and convenience, we normally recommend performing this step with a `.env` file or `secrets`. You will notice that the `TwinLab Client` was initialised when `twinlab` was imported above. If you have set up a `.env` file, your username appears in the client initialisation report. To set up a `.env` file, copy the `.env.example` file provided in this repository and set `TWINLAB_USER` and `TWINLAB_API_KEY` to your username and API key, respectively. Note that you will need to restart your Jupyter kernel if you modify your `.env` file after running the first cell of this notebook.
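
As a minimal sketch of what a `.env` check might look like, the snippet below parses `.env`-style text with the standard library and reports which of the two variables named above are missing. The helper names `parse_env_text` and `missing_credentials` are hypothetical, the example values are placeholders, and the parsing rules are a simplification; twinLab's own loader may handle `.env` files differently.

```python
# Hypothetical helpers for illustration; not part of the twinlab package.
REQUIRED_KEYS = ("TWINLAB_USER", "TWINLAB_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example_env = """
# Placeholder values, replace with your own
TWINLAB_USER=<YOUR_USERNAME>
TWINLAB_API_KEY=
"""

env_values = parse_env_text(example_env)
print(missing_credentials(env_values))
```

Running a check like this before importing `twinlab` can help you spot an incomplete `.env` file early.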


Alternatively, these variables can be set within your Python code. Uncomment the code below and plug in your twinLab username and API key.


In [2]:
# tl.set_user("<YOUR_USERNAME>")
# tl.set_api_key("<YOUR_API_KEY>")
# tl.set_url("https://twinlab.digilab.co.uk/v3")

You can also view your user information. 

In [3]:
tl.user_information()

{'User': 'jamie@digilab.co.uk'}

Find versioning information with the function below.

In [4]:
tl.versions()

{'cloud': '3.2.0',
 'modal': '1.2.0',
 'library': '2.1.0',
 'image': 'twinlab-prod'}

### **2. Upload a dataset**

Your dataset must be presented as a `pandas.DataFrame` object, or a filepath which points to a `csv` file that can be parsed to a `pandas.DataFrame` object. 

Your dataset must be formatted with clearly labelled columns. In twinLab, data is expected to be in column-feature format, meaning each row represents a single data sample, and each column represents a data feature (also known as a parameter).
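
To make the column-feature layout concrete, here is a small sketch that builds CSV text with a labelled header row and one sample per row, then reads it back with Python's standard `csv` module. The values are the first three rows of the "quickstart" example dataset used in this notebook; the variable names are illustrative.

```python
import csv
import io

# Header row labels the columns; each following row is one data sample.
csv_text = """x,y
0.696469,-0.817374
0.286139,0.887656
0.226851,0.921553
"""

reader = csv.DictReader(io.StringIO(csv_text))
samples = [{name: float(value) for name, value in row.items()} for row in reader]

# Column labels (features) and the number of samples (rows)
column_names = list(samples[0].keys())
print(column_names)
print(len(samples))
```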

You can check out some example datasets to help you get started. 

In [5]:
# List example datasets included in the twinLab cloud
tl.list_example_datasets()

['tritium-desorption-small',
 'biscuits',
 'tritium-desorption',
 'advancedstart',
 'jet-confinement',
 'gardening',
 'quickstart',
 'tritium-desorption-temperature-grid']

In this tutorial, you'll use the example dataset "quickstart", which has a single input variable "x" and a single output (or response) variable "y". In data science circles, input variables are also referred to as features, predictors, or parameters.

In [2]:
# Download the example dataset
df = tl.load_example_dataset("quickstart")

# Check the dataframe before uploading
display(df)

|   | x | y |
|---|---|---|
| 0 | 0.696469 | -0.817374 |
| 1 | 0.286139 | 0.887656 |
| 2 | 0.226851 | 0.921553 |
| 3 | 0.551315 | -0.326334 |
| 4 | 0.719469 | -0.832518 |
| 5 | 0.423106 | 0.400669 |
| 6 | 0.980764 | -0.164966 |
| 7 | 0.68483 | -0.960764 |
| 8 | 0.480932 | 0.340115 |
| 9 | 0.392118 | 0.845795 |


twinLab contains a `Dataset` class with attributes and methods to process, view, and summarise the dataset. Each `Dataset` is created with an `id`, which is used to access it. Using the `upload` method, you can then upload the dataset to the twinLab cloud.

In [3]:
# Initialise a Dataset object and give it a name
dataset = tl.Dataset("quickstart")

# Upload the dataset, passing in the dataframe
dataset.upload(df)

### **3. View datasets**

Once you've uploaded a dataset, you can access it easily using the built-in twinLab functions. You can see a list of your uploaded datasets with the `tl.list_datasets` function. The built-in example datasets can be listed in a similar way with `tl.list_example_datasets`.

In [8]:
# List all of your datasets on cloud
tl.list_datasets()

['quickstart']

You can view an individual dataset with `Dataset.view()`. This is a method of the `Dataset` class, so it can only be called on an instance of `Dataset`.

In [9]:
# View the dataset
dataset.view()

|   | x | y |
|---|---|---|
| 0 | 0.696469 | -0.817374 |
| 1 | 0.286139 | 0.887656 |
| 2 | 0.226851 | 0.921553 |
| 3 | 0.551315 | -0.326334 |
| 4 | 0.719469 | -0.832518 |
| 5 | 0.423106 | 0.400669 |
| 6 | 0.980764 | -0.164966 |
| 7 | 0.68483 | -0.960764 |
| 8 | 0.480932 | 0.340115 |
| 9 | 0.392118 | 0.845795 |


You can summarise an individual dataset with the `Dataset.summarise` method. It returns a `pandas.DataFrame` of fundamental statistics, which gives you an idea of your dataset's overall characteristics, such as its range and spread.

In [10]:
# Get a column-wise statistical summary of the dataset
dataset.summarise()

|   | x | y |
|---|---|---|
| count | 10.0 | 10.0 |
| mean | 0.544199 | 0.029383 |
| std | 0.229352 | 0.748191 |
| min | 0.226851 | -0.960764 |
| 25% | 0.399865 | -0.694614 |
| 50% | 0.516123 | 0.087574 |
| 75% | 0.693559 | 0.734513 |
| max | 0.980764 | 0.921553 |
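
The x column of the summary above can be cross-checked with Python's standard `statistics` module. This is a sketch that assumes the summary follows the usual `pandas` conventions (sample standard deviation and linearly interpolated quartiles), which is consistent with the numbers shown but is not confirmed by this notebook.

```python
import statistics

# x column of the "quickstart" dataset, copied from the table shown earlier
x = [0.696469, 0.286139, 0.226851, 0.551315, 0.719469,
     0.423106, 0.980764, 0.68483, 0.480932, 0.392118]

summary = {
    "count": len(x),
    "mean": statistics.mean(x),
    "std": statistics.stdev(x),  # sample standard deviation (n - 1)
    "min": min(x),
    "50%": statistics.median(x),
    "max": max(x),
}
# Quartiles with linear interpolation between neighbouring sorted values
q25, _, q75 = statistics.quantiles(x, n=4, method="inclusive")
summary["25%"] = q25
summary["75%"] = q75

for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

Comparing a hand-computed summary against the cloud summary is a quick way to confirm that the upload preserved your data.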


### **4. Train an emulator**

The `Emulator` class is used to train and implement your surrogate models. Just like datasets, emulators are defined with an `id`, a unique identifier under which your emulator is saved in the twinLab cloud.

In [10]:
# Initialise emulator
emulator = tl.Emulator("quickstart-model")

When training an emulator, optional arguments are passed in using a `TrainParams` object. `TrainParams` contains the parameters you can tweak when training your model; you can find the defaults in the documentation.

To train the emulator, use the `Emulator.train` method, passing in the `TrainParams` object as an argument.

In [11]:
# Define the training parameters for the emulator.
# For example, here we set train_test_ratio to 1.0, meaning that the entire dataset is used for training.
params = tl.TrainParams(train_test_ratio=1.0, output_retained_dimensions=1)

In [12]:
# Train the emulator using the train method
emulator.train(dataset=dataset, inputs=["x"], outputs=["y"], params=params)

Emulator 'quickstart-model' has begun training.
0:00:00: Job status: processing
0:00:01: Job status: success
Training of emulator quickstart-model is complete!
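
To illustrate what `train_test_ratio` controls, the sketch below splits a list of rows into training and test subsets by a ratio. The function `split_rows` is hypothetical and is not twinLab's implementation; the shuffling and rounding behaviour here are assumptions made for illustration.

```python
import random


def split_rows(rows, train_test_ratio, seed=42):
    """Shuffle rows reproducibly, then split them by the given ratio."""
    if not 0.0 < train_test_ratio <= 1.0:
        raise ValueError("train_test_ratio must be in (0, 1]")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_test_ratio * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


rows = list(range(10))  # stand-ins for the ten samples in "quickstart"

train, test = split_rows(rows, train_test_ratio=1.0)
print(len(train), len(test))  # every row is used for training

train, test = split_rows(rows, train_test_ratio=0.8)
print(len(train), len(test))
```

With a ratio of 1.0 no rows are held back, so there is no test set left to evaluate the emulator against; a smaller ratio trades training data for held-out data.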


### **5. View emulators**

Just as with datasets, all saved emulators can be listed using the `tl.list_emulators` function.

In [13]:
# List emulators
tl.list_emulators()

['quickstart-model']

You can also view the arguments used to create an emulator, and the value specified for each, with the `Emulator.view()` method.

The output is a dictionary of all the arguments and their corresponding values, as set by you when you created or trained the emulator (or the default values, if you didn't specify them).

In [14]:
# View an emulator's parameters
emulator.view()

{'meta_data': {'author': 'jamie@digilab.co.uk',
  'version': '3.2.0',
  'campaign': 'personal',
  'description': 'A twinLab emulator.',
  'organization': 'digiLab',
  'timestamp': '2024-08-27 13:49:02'},
 'emulator_params': {'inputs': ['x'],
  'outputs': ['y'],
  'estimator': 'gaussian_process_regression',
  'estimator_params': {'detrend': False,
   'kernel': None,
   'estimator_type': 'single_task_gp'},
  'fidelity': None,
  'class_column': None,
  'decompose_inputs': False,
  'decompose_outputs': False,
  'input_explained_variance': None,
  'input_retained_dimensions': None,
  'output_explained_variance': None,
  'output_retained_dimensions': None},
 'training_params': {'dataset_id': 'quickstart',
  'dataset_std_id': None,
  'train_test_ratio': 1.0,
  'model_selection': False,
  'model_selection_kwargs': {'seed': None,
   'evaluation_metric': 'MSLL',
   'val_ratio': 0.2,
   'base_kernels': 'restricted',
   'depth': 1,
   'beam': 2},
  'shuffle': True,
  'seed': 42}}
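
Because the output is a nested dictionary, individual settings can be read with ordinary key lookups. The sketch below uses a hand-copied subset of the output above rather than a live call, so it runs without a twinLab account; the variable name `emulator_info` is illustrative.

```python
# Subset of the dictionary shown above, copied by hand for illustration
emulator_info = {
    "emulator_params": {
        "inputs": ["x"],
        "outputs": ["y"],
        "estimator": "gaussian_process_regression",
    },
    "training_params": {
        "dataset_id": "quickstart",
        "train_test_ratio": 1.0,
        "seed": 42,
    },
}

# Read individual settings with nested key lookups
estimator = emulator_info["emulator_params"]["estimator"]
ratio = emulator_info["training_params"]["train_test_ratio"]
print(f"{estimator} trained with train_test_ratio={ratio}")
```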

You can also get a summary of the statistical details of your emulator with the `Emulator.summarise()` method. The summary includes the learnt parameters of the kernel function, the mean function, and the noise variances.

To get a more detailed summary of your emulator, set the `detailed` parameter to `True`.

In [15]:
# View a summary of the trained emulator
pprint(emulator.summarise())

{'kernel': {'kernel_function_used': 'ScaleKernel(  (base_kernel): '
                                    'MaternKernel(    (lengthscale_prior): '
                                    'GammaPrior()    '
                                    '(raw_lengthscale_constraint): Positive()  '
                                    ')  (outputscale_prior): GammaPrior()  '
                                    '(raw_outputscale_constraint): Positive())',
            'lengthscale': [[0.4234508763532827]],
            'outputscale': 1.7115599219254776},
 'mean': {'mean': 0.21062309969363022, 'mean_function_used': 'ConstantMean()'},
 'properties': {'covariance_noise': [0.030319220370676556]}}
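
To give a feel for the learnt kernel parameters, the sketch below evaluates a scaled Matérn covariance using the `lengthscale` and `outputscale` values from the summary above. It assumes a smoothness of nu = 2.5, which is not stated in the summary output, so treat the numbers as illustrative rather than as the emulator's exact covariance. The function name `matern52` is hypothetical.

```python
import math

# Learnt values copied from the summary above
LENGTHSCALE = 0.4234508763532827
OUTPUTSCALE = 1.7115599219254776


def matern52(x1, x2, lengthscale=LENGTHSCALE, outputscale=OUTPUTSCALE):
    """Scaled Matern covariance between two scalar inputs, assuming nu = 2.5."""
    r = math.sqrt(5.0) * abs(x1 - x2) / lengthscale
    return outputscale * (1.0 + r + r * r / 3.0) * math.exp(-r)


# Covariance is largest for identical inputs and decays with distance
for distance in (0.0, 0.2, 0.4, 0.8):
    print(f"distance {distance}: covariance {matern52(0.0, distance):.4f}")
```

Under this assumption, the output scale sets the covariance at zero distance, and the lengthscale sets how quickly inputs that are further apart become uncorrelated.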


### **Deleting datasets and emulators**

You can delete your datasets and emulators to keep your twinLab cloud account storage tidy. 

Please be aware that this is permanent. Always consider keeping your data locally backed up. 

`Emulator.delete` and `Dataset.delete` delete the emulator and the dataset respectively.


In [16]:
# Delete the emulator
emulator.delete()

# Delete the dataset
dataset.delete()
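
Since deletion is permanent, one option is to write a local copy of your data before deleting anything. The sketch below saves rows to a CSV file with the standard library and reads the file back to confirm the copy is complete. The helper `backup_rows` and the file name are hypothetical; if your data is already in a `pandas.DataFrame`, `DataFrame.to_csv` serves the same purpose.

```python
import csv
import tempfile
from pathlib import Path


def backup_rows(rows, fieldnames, path):
    """Write rows (a list of dicts) to a CSV file and return the row count read back."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with open(path, newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))


rows = [{"x": 0.696469, "y": -0.817374}, {"x": 0.286139, "y": 0.887656}]

# A temporary folder keeps this example tidy; use a permanent location
# for a real backup.
with tempfile.TemporaryDirectory() as folder:
    backup_path = Path(folder) / "quickstart_backup.csv"
    rows_saved = backup_rows(rows, ["x", "y"], backup_path)

print(rows_saved == len(rows))
```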