SKLL can be used in two ways:

---
1. *Command Line*
    - Get data in [SKLL compatible format](https://skll.readthedocs.io/en/latest/run_experiment.html#file-formats).
    - Create a [Python configuration file](https://skll.readthedocs.io/en/latest/run_experiment.html#create-config).
    - Run the experiment using the [run_experiment](https://skll.readthedocs.io/en/latest/run_experiment.html) command.
    - Examine the results using the [utility](https://skll.readthedocs.io/en/latest/utilities.html) commands provided.
---    
2. *Python API*

# Command Line

In [None]:
!pwd

In [None]:
!ls

### Dataset Manipulation

We use the IRIS dataset for this tutorial. It is a simple 3-class classification problem with a single set of 4 features.

The utility Python script *make_iris_example_data.py* loads the IRIS dataset from scikit-learn and pre-processes it to create *train* and *test* sub-directories within the *iris* directory.

Each of the generated sub-directories (*iris/train* and *iris/test*) contains a feature file in SKLL compatible *jsonlines* format.
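
To make the *jsonlines* layout concrete, here is a minimal sketch using only the standard library. It assumes the shape described in the SKLL file-format documentation: one JSON object per line, with an `id`, a label `y`, and a feature dictionary `x`. The ids and feature names (`EXAMPLE_0`, `f0`, ...) are hypothetical placeholders and may not match what the script actually generates.

```python
import json

# Hypothetical records: the ids and feature names are placeholders,
# not necessarily what make_iris_example_data.py writes.
records = [
    {"id": "EXAMPLE_0", "y": "setosa",
     "x": {"f0": 5.1, "f1": 3.5, "f2": 1.4, "f3": 0.2}},
    {"id": "EXAMPLE_1", "y": "setosa",
     "x": {"f0": 4.9, "f1": 3.0, "f2": 1.4, "f3": 0.2}},
]


def to_jsonlines(items):
    """Serialize each record as one JSON object per line."""
    return "".join(json.dumps(item) + "\n" for item in items)


def from_jsonlines(text):
    """Parse jsonlines text back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jsonlines_text = to_jsonlines(records)
print(jsonlines_text)
```

Each line stands alone as valid JSON, so the file can be read one example at a time.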

In [None]:
!python3 make_iris_example_data.py

In [None]:
import os

def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * (level)
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in sorted(files):
            print('{}{}'.format(subindent, f))

list_files('iris')

In [None]:
!head -5 iris/train/example_iris_features.jsonlines

The *[skll_convert](https://skll.readthedocs.io/en/latest/utilities.html#skll-convert)* command can be used to convert between [SKLL feature file formats](https://skll.readthedocs.io/en/latest/run_experiment.html#feature-file-formats). 

In [None]:
!skll_convert iris/train/example_iris_features.jsonlines iris/train/example_iris_features.csv 
print()
!ls iris/train
print()
!head -5 iris/train/example_iris_features.csv
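
Conceptually, converting *jsonlines* to CSV means flattening each record's feature dictionary into columns. The sketch below illustrates that idea with the standard library only. It is not how *skll_convert* is implemented, and the column order and the hypothetical names (`EXAMPLE_0`, `f0`, `f1`) may differ from the file produced above.

```python
import csv
import io
import json

# Two hypothetical records in jsonlines form (one JSON object per line).
jsonlines_text = (
    '{"id": "EXAMPLE_0", "y": "setosa", "x": {"f0": 5.1, "f1": 3.5}}\n'
    '{"id": "EXAMPLE_1", "y": "setosa", "x": {"f0": 4.9, "f1": 3.0}}\n'
)


def jsonlines_to_csv(text):
    """Flatten id / features / label records into CSV text."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Collect every feature name so all rows share the same columns.
    feature_names = sorted({name for rec in records for name in rec["x"]})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", *feature_names, "y"], lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        writer.writerow({"id": rec["id"], "y": rec["y"], **rec["x"]})
    return buffer.getvalue()


csv_text = jsonlines_to_csv(jsonlines_text)
print(csv_text)
```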

### Configuration File

At the core of SKLL experiments is the configuration file which is executed with the *run_experiment* command. 
SKLL configuration files are standard Python configuration files (similar in format to Windows INI files).

The 4 expected sections in a configuration file are :
1. [General](https://skll.readthedocs.io/en/latest/run_experiment.html#general)
    - Defines *experiment_name* and *task* (both compulsory fields)
    - 5 tasks are supported :
        1. cross_validate
        2. evaluate
        3. predict
        4. train
        5. learning_curve
2. [Input](https://skll.readthedocs.io/en/latest/run_experiment.html#input)
    - Defines the *learners* list (compulsory)
    - Additionally, one of the *train_directory* or *train_file* fields must be defined.
    - All other fields are optional.
3. [Tuning](https://skll.readthedocs.io/en/latest/run_experiment.html#tuning)
    - Contains fields related to tuning the models such as *objectives*, *grid_search* etc.
    - All the fields in this section are optional.
4. [Output](https://skll.readthedocs.io/en/latest/run_experiment.html#output)
    - Contains fields related to the output produced after model training such as *probability*, *metrics*, *results* etc.
    - All the fields in this section are optional.
    
    
An example config file for the IRIS dataset is shown here.

In [None]:
with open('iris/cross_val.cfg', 'r') as config_file:
    print(config_file.read())
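
Since the file uses the INI-style format, the four-section layout can also be assembled with Python's `configparser`. The following is a rough sketch rather than a complete SKLL configuration. It fills in only fields named in the list above, and every value (experiment name, directory names, learner) is an assumption made for illustration. A real experiment also needs to say which feature files to use.

```python
import configparser
import io

config = configparser.ConfigParser()

# All values below are illustrative assumptions, not the contents
# of iris/cross_val.cfg.
config["General"] = {
    "experiment_name": "Iris_CV",
    "task": "cross_validate",
}
config["Input"] = {
    "train_directory": "train",
    "learners": '["LogisticRegression"]',
}
config["Tuning"] = {
    "grid_search": "true",
    "objectives": '["accuracy"]',
}
config["Output"] = {
    "results": "output",
}

# Write to an in-memory buffer so that nothing on disk is touched.
buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()
print(config_text)
```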

### run_experiment

After defining the configuration file, we can run it with the [run_experiment CONFIGURATION_FILE](https://skll.readthedocs.io/en/latest/run_experiment.html#using-run-experiment) command. Although most of the parameters are defined in the config file, some options are passed as command-line arguments to *run_experiment* (--ablation, --local etc.).

Here we try out the cross validation configuration shown earlier.

In [None]:
!run_experiment --local --verbose iris/cross_val.cfg
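
Outside a notebook, the same command can be launched from a Python script. The sketch below uses only the two flags shown in the cell above. The helper names `build_command` and `run` are hypothetical, and the command is executed only if *run_experiment* is found on the `PATH`.

```python
import shutil
import subprocess


def build_command(config_path, local=True, verbose=True):
    """Assemble the run_experiment argument list for a config file."""
    command = ["run_experiment"]
    if local:
        command.append("--local")
    if verbose:
        command.append("--verbose")
    command.append(config_path)
    return command


def run(config_path):
    """Run the experiment if the run_experiment executable is available."""
    command = build_command(config_path)
    if shutil.which(command[0]) is None:
        print("run_experiment not found on PATH; is SKLL installed?")
        return None
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(command, check=True)


print(" ".join(build_command("iris/cross_val.cfg")))
```

Calling `run('iris/cross_val.cfg')` would then start the experiment, assuming SKLL is installed and the config file exists.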

### Analysing Output

In [None]:
list_files('iris')

In [None]:
!cat iris/output/Iris_CV_example_iris_LogisticRegression.results

In [None]:
!head -5 iris/output/Iris_CV_example_iris_LogisticRegression_predictions.tsv

In [None]:
!cat iris/output/Iris_CV_example_iris_LogisticRegression.log

In [None]:
import pandas as pd

summary_df = pd.read_csv('iris/output/Iris_CV_summary.tsv', sep='\t')
print(summary_df.columns)

In [None]:
print(summary_df[['learner_name', 'accuracy', 'score', 'fold', 'featureset_name']])

### Saving Models

Here we modify the cross-validation configuration to use the *train* task and save the trained models by setting the *models* field in the Output section.

In [None]:
import configparser

template_config = configparser.ConfigParser()
template_config.read('iris/cross_val.cfg')
print(template_config.sections())
template_config.set('General', 'task', 'train')
template_config.set('Output', 'models', 'iris/models')

with open('iris/train.cfg', 'w') as configfile:
    template_config.write(configfile)
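
The same edit-and-save pattern can be wrapped in a small helper when several variants of one template are needed. This is a sketch under stated assumptions: `derive_config` is a hypothetical name, and the template text and override values are made up for illustration rather than copied from *iris/cross_val.cfg*.

```python
import configparser

# Made-up template text standing in for an existing config file.
template_text = """
[General]
experiment_name = Iris_CV
task = cross_validate

[Output]
results = output
"""


def derive_config(source_text, overrides):
    """Return a parser built from source_text with overrides applied.

    overrides maps (section, option) pairs to new string values.
    """
    parser = configparser.ConfigParser()
    parser.read_string(source_text)
    for (section, option), value in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, option, value)
    return parser


train_config = derive_config(
    template_text,
    {("General", "task"): "train", ("Output", "models"): "models"},
)
print(dict(train_config["General"]))
print(dict(train_config["Output"]))
```

Fields that are not overridden, such as *experiment_name* here, carry over unchanged from the template.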

In [None]:
!run_experiment --local --verbose iris/train.cfg

In [None]:
list_files('iris')