# Experiment Template <a class="tocSkip">

To use this template, duplicate this notebook via the [file tree](./) and move the copy into your repository folder. The following sections contain example content only; adapt them to your experiment implementation.

**In this notebook:**

* Describe your notebook here in a few bullet points, e.g.:
* Method xyz on dataset abc --> Key insight: xyz works pretty well
* Modification zyx --> Dead end

**Todo:**

* List all todos that are related to this notebook here, e.g.:
* Apply xyz to another dataset

This could be some more general information on method xyz (e.g. a link to a paper).

## Dependencies
Install, load, and initialize all required dependencies for this experiment.

### Install Dependencies
- _Please use a Python 3 kernel for the notebook_

In [None]:
# Install any packages that are not included in the workspace.
# It should be possible to run the notebook independently of anything else.
# If a dependency cannot be installed via pip, either:
# - download & install it via %%bash, or
# - at least mention the dependency in this section
import sys

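# sys.executable points to the Python interpreter that is running in your kernel
# The PyPI package is named scikit-learn; the "sklearn" name on PyPI is deprecated
!{sys.executable} -m pip install -q scikit-learn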

### Import Dependencies

In [None]:
# System libraries
from __future__ import absolute_import, division, print_function
import logging, os, sys

# Enable logging
logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.INFO, stream=sys.stdout)

# Re-import packages if they change
%load_ext autoreload
%autoreload 2

# Initialize tqdm to always use the notebook progress bar
import tqdm
tqdm.tqdm = tqdm.tqdm_notebook

# Third-party libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (12,6)
%config InlineBackend.figure_format='retina'  # adapt plots for retina displays

# Lab libraries
from lab_client import Environment

### Initialize Environment

In [None]:
# Initialize environment
env = Environment(project="test",  # Lab project you want to work on
                  # Only required in stand-alone workspace deployments
                  # lab_endpoint="LAB_ENDPOINT", # Lab endpoint url: e.g. http://10.2.3.45:8091
                  # lab_api_token="LAB_API_TOKEN"
                 ) 

# Initialize experiment
exp = env.create_experiment('Experiment Template')

## Load Data
Download, explore, and prepare all required data for the experiment in this section.

### Download & Read Data

In [None]:
# Get data from remote storage of ML Lab only if it does not exist locally
dataset_path = env.get_file('YOUR_DATASET_KEY')

# Read data into basic data structures (e.g. dict, list, dataframe).
# For example: read csv via pandas
df = pd.read_csv(dataset_path, sep=";")

### Optional: Explore Data

In [None]:
# Do data exploration, statistics, and visualization (pandas profiling, qgrid, facets...)
# For example: pandas profiling
import pandas_profiling
pandas_profiling.ProfileReport(df)

### Transform Data

In [None]:
# Configure the dataset
dataset_config = {
    'test_size':0.20
}

# Add dataset configuration to experiment parameters
exp.log_params(dataset_config)

# Data preprocessing
# <YOUR DATA PREPROCESSING CODE HERE>

# Split the dataset into train (80%) and test (20%) based on the dataset configuration
train_df, test_df = np.split(df.sample(frac=1, random_state=1), [int((1-dataset_config['test_size'])*len(df))])

print('Train corpus size: '+str(len(train_df)))
print('Test corpus size: '+str(len(test_df)))

# Add dataframes to the experiment (they will be logged and accessible within the experiment)
exp.add_artifact("train_data", train_df)
exp.add_artifact("test_data", test_df)
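
The split index is the number of rows multiplied by one minus `test_size`. The following is a minimal sketch of that computation on a plain Python list, so that it runs without pandas. The helper name `split_rows` is hypothetical and not part of the template or of `lab_client`.

```python
import random


def split_rows(rows, test_size, seed=1):
    """Shuffle rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Number of training rows: (1 - test_size) * total, rounded down
    split_index = int((1 - test_size) * len(shuffled))
    return shuffled[:split_index], shuffled[split_index:]


train_rows, test_rows = split_rows(range(10), test_size=0.20)
print("Train size:", len(train_rows))
print("Test size:", len(test_rows))
```

Note the parentheses around `1 - test_size`: without them, the multiplication binds first and the index becomes negative for any non-trivial dataset.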

## Train Model
Implementation, configuration, and evaluation of the experiment.

### Define Experiment

In [None]:
# Define a function with the required code to run the experiment (e.g. train model)
def train(exp, params, artifacts):
    # The arguments are provided automatically when running via run_exp but are not required:
    # exp (= Experiment instance)
    # params (= parameter dictionary)
    # artifacts (= dictionary of added artifacts)
    
    # Get artifacts for the experiment run
    train_df = artifacts["train_data"]
    test_df = artifacts["test_data"]
    
    # Experiment Implementation
    # <YOUR EXPERIMENT CODE HERE>
    # model_instance = <THE TRAINED MODEL INSTANCE>
    
    # Use experiment to get a path to store the trained model within the dedicated experiment folder
    model_path = exp.create_file_path("trained.model")
    # <SAVE YOUR ARTIFACTS HERE>
    
    # Add trained model instance to experiment, so it can be accessed after the experiment run is finished
    # exp.add_artifact("trained_model", model_instance)
    
    # Evaluate trained model
    score = 1
    
    # log a metric to the current experiment
    # <LOG YOUR METRICS HERE>
    exp.log_metric("accuracy", score)
    
    # optional: return the most descriptive metric for the experiment (main objective of the experiment)
    return score
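
The `score = 1` line above is only a placeholder. As one possible way to fill it, here is a minimal sketch of an accuracy computation in plain Python. The function name `accuracy` and the example labels are illustrative assumptions; a real experiment would pass the model's predictions on `test_df`.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(1 for predicted, actual in zip(predictions, labels) if predicted == actual)
    return correct / len(labels)


# Illustrative values only; three of the four predictions are correct
score = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
print(score)
```

The resulting value could then be passed to `exp.log_metric("accuracy", score)` as in the template.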

### Run Experiment

In [None]:
# Define parameter configuration for experiment run
params = {
    'param': 0, # value should be string, int or float
}

# Run experiment and sync all metadata
exp.run_exp(train, params)
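
The comment in the parameter configuration asks for string, int, or float values. A small check along the following lines could catch other types before a run is started. This is a sketch only: `find_invalid_params` is a hypothetical helper, not a `lab_client` function.

```python
def find_invalid_params(params):
    """Return the names of parameters whose values are not str, int, or float."""
    allowed_types = (str, int, float)
    # bool values pass this check too, because bool is a subclass of int
    return [name for name, value in params.items() if not isinstance(value, allowed_types)]


example_params = {"param": 0, "learning_rate": 0.1, "layers": [64, 32]}
invalid = find_invalid_params(example_params)
print(invalid)
```

In this example only `layers` is reported, since a list is not one of the accepted value types.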

### Optional: Evaluate Model

In [None]:
# Do evaluation, e.g. visualisations  

## Deploy Model
Wrap the model with the Unified Model API and upload it to the remote storage.

### Create Unified Model
You can find information on how to create a self-contained executable model file in the [unified model library](https://github.com/SAP/machine-learning-lab/tree/master/libraries/unified-model).

In [None]:
# Create unified model instance here

### Upload Unified Model

In [None]:
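# model_path must point to the unified model file created in the previous step.
# Note: the model_path variable inside train() is local to that function and is not defined here.
env.upload_file(model_path, data_type="model")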

## Further Information
This section provides some additional information and guidelines for building high-quality reusable notebooks. **Please remove this section** for your experiment.

### Guidelines

- All cells should be executable in order (with run all and restart & run all).
- Every notebook should be self-contained and executable without any prior knowledge. 
- Regularly refactor cell logic into functions and move these functions to separate `.py` files at regular intervals. A notebook run should consist mainly of function calls. This prevents your notebook from becoming a giant pudding of global variables.
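
As a rough illustration of the last guideline, the sketch below wraps two preprocessing steps in functions with explicit inputs and outputs, so that the cell itself only contains function calls. The function names and the sample rows are hypothetical.

```python
def clean_rows(rows, min_length=1):
    """Strip surrounding whitespace and drop rows shorter than min_length."""
    stripped = [row.strip() for row in rows]
    return [row for row in stripped if len(row) >= min_length]


def count_tokens(rows):
    """Count whitespace-separated tokens over all rows."""
    return sum(len(row.split()) for row in rows)


# The notebook cell is reduced to function calls on explicit inputs
raw_rows = ["  first example ", "", "second"]
rows = clean_rows(raw_rows)
n_tokens = count_tokens(rows)
print(rows, n_tokens)
```

Once such functions are stable, they can be moved into a `.py` module and imported; the `%autoreload 2` setting from the import cell then picks up later changes to that module.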

### Naming Conventions for Headings

#### Option 1: First Implementation

In [None]:
# Execute either this...

#### Option 2: Second Implementation

In [None]:
# ...or this.

#### Optional: Whatever

In [None]:
# Execute this for an optional feature.