# About

Here we create experiments with different configurations and save each one as a Git tag.
We also add new features as a separate experiment.

The metrics of the different experiments can then be shown and compared using _DVC_.

# Prerequisites

Make sure that you have completed stage 3 (_notebook step3_pipelines_automation.ipynb_) and that the DVC pipeline exists.

# Set PYTHONPATH

From the project root, run:

```bash
export PYTHONPATH=.
```
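A quick way to confirm that the setting took effect is to ask Python whether it can locate the project package. The sketch below is only a sanity check and assumes that it is run from the project root and that the pipeline code lives in a top-level `src` package (as in `src/features/features.py`); the helper name `is_importable` is made up for this example.

```python
import importlib.util
import os


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the given top-level module or package."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<not set>"))
    # Assumption: `src` is the top-level package that holds the pipeline code.
    print("src importable:", is_importable("src"))
```

If the second line prints `False`, the shell that runs the pipeline probably does not have `PYTHONPATH` exported.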

# Experiment 1 - Tune LogisticRegression

## Create branch for experiment

```bash
git checkout -b exp1-tune-logreg
git branch
```

## Update config/pipeline_config.yml file

Add options for the **C** hyperparameter in section `train:estimators:logreg:param_grid`:

```yaml
...
      param_grid:
        C: [0.1,1.0,10]
...
```

As a result, the LogisticRegression config should look like this:

```yaml
...
train:
  cv: 3
  estimator_name: logreg

  estimators:

    logreg: # sklearn.linear_model.LogisticRegression
      param_grid: # params of GridSearchCV constructor
        C: [0.1,1.0,10]
        max_iter: [100]
        solver: ['lbfgs']
        multi_class: ['multinomial']
...

```
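To get a feel for what this grid costs, the combinations can be enumerated by hand. The sketch below assumes that the training stage passes `param_grid` and `cv` to `GridSearchCV` unchanged (as the config comment suggests); `expand_grid` is a hypothetical helper written for this example, not part of the project.

```python
from itertools import product

# param_grid copied from the `logreg` section of the config above
param_grid = {
    "C": [0.1, 1.0, 10],
    "max_iter": [100],
    "solver": ["lbfgs"],
    "multi_class": ["multinomial"],
}
cv = 3  # `train: cv` from the same config


def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


candidates = expand_grid(param_grid)
total_fits = len(candidates) * cv

for params in candidates:
    print(params)
print(f"{len(candidates)} candidates x {cv} folds = {total_fits} fits")
```

Only `C` has more than one value here, so the search evaluates three candidates. Under the same assumption, adding values to any other key multiplies the number of fits.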


## Run experiment and save results 

Reproduce the pipeline with the new parameters:

```bash
dvc repro 
```

## Commit experiment results

```bash
git add .
git commit -m "Experiment 1 with LogisticRegression hyperparameters"
git tag -a "exp1_tune_logreg" -m "Experiment 1 with LogisticRegression hyperparameters"
```

## Show metrics 

```bash
dvc metrics show
```

## Merge results 

```bash
git checkout experiments
git merge exp1-tune-logreg && git branch -d exp1-tune-logreg
git branch
git tag --list
```


## Check out the specific tag

Checking out a tag puts Git in a detached HEAD state, which `git branch` reports:

```bash
git checkout exp1_tune_logreg
git branch
```

## Switch back to `experiments`:

```bash
git checkout experiments
```
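Every experiment in this tutorial follows the same cycle: branch, reproduce, commit, tag, show metrics, merge. As a rough sketch, the cycle can be written down as a list of commands. The function `experiment_commands` and its parameters are hypothetical names for this example; it only builds and prints the commands, and the manual config or code edit still has to happen between creating the branch and running `dvc repro`.

```python
import shlex


def experiment_commands(branch, tag, message, base="experiments"):
    """Return the git/DVC commands for one experiment, in the order used above."""
    return [
        ["git", "checkout", "-b", branch],
        # ... edit the config or code here before reproducing ...
        ["dvc", "repro"],
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "tag", "-a", tag, "-m", message],
        ["dvc", "metrics", "show"],
        ["git", "checkout", base],
        ["git", "merge", branch],
        ["git", "branch", "-d", branch],
    ]


commands = experiment_commands(
    branch="exp1-tune-logreg",
    tag="exp1_tune_logreg",
    message="Experiment 1 with LogisticRegression hyperparameters",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments means it could later be passed to `subprocess.run(cmd, check=True)` without shell quoting problems.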

# Experiment 2 - Use SVM

## Create branch for experiment

```bash
git checkout -b exp2-svm
git branch
```

## Add SVC config to config/pipeline_config.yml file

Set `estimator_name` to `svm` and add an `svm` entry with its own `param_grid` in section `train:estimators`:

```yaml
...
    svm: # sklearn.svm.SVC
      param_grid:
        C: [0.1,1.0,10]
...
```

As a result, the SVC config should look like this:

```yaml
...
train:
  cv: 3
  estimator_name: svm
  estimators:
        
    svm: # sklearn.svm.SVC
      param_grid:
        C: [0.1,1.0,10]
        kernel: ["rbf", "linear"]
        gamma: ["scale"]
        degree: [3, 5]
...
```
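One detail of this grid is worth checking: in scikit-learn's `SVC`, `degree` is only used by the `"poly"` kernel, and `gamma` is not used by the `"linear"` kernel. The sketch below counts how many candidates the grid produces and how many of them are actually distinct models. It assumes the grid is passed to `GridSearchCV` unchanged; `expand_grid` and `effective_params` are hypothetical helpers for this example.

```python
from itertools import product

# param_grid copied from the `svm` section of the config above
param_grid = {
    "C": [0.1, 1.0, 10],
    "kernel": ["rbf", "linear"],
    "gamma": ["scale"],
    "degree": [3, 5],
}


def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def effective_params(params):
    """Drop settings that the chosen kernel ignores."""
    kept = dict(params)
    if kept["kernel"] != "poly":
        kept.pop("degree", None)  # degree is only used by the "poly" kernel
    if kept["kernel"] == "linear":
        kept.pop("gamma", None)  # gamma is used by "rbf", "poly" and "sigmoid"
    return kept


candidates = expand_grid(param_grid)
distinct = {tuple(sorted(effective_params(p).items())) for p in candidates}
print(len(candidates), "candidates,", len(distinct), "distinct models")
```

With the grid as written, half of the candidates repeat a model that was already evaluated. Removing `degree`, or adding `"poly"` to `kernel`, would make every candidate meaningful.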


## Run experiment and save results 

Reproduce the pipeline with the new parameters:

```bash
dvc repro
```

## Commit experiment results

```bash
git add .
git commit -m "Experiment 2 with SVM estimator"
git tag -a "exp2_svm" -m "Experiment 2 with SVM estimator"
```

## Show metrics 

```bash
dvc metrics show
```

## Merge results 

```bash
git checkout experiments
git merge exp2-svm && git branch -d exp2-svm
git branch
```

# Experiment 3 - Add new features

## Create branch for experiment

```bash
git checkout -b exp3-squared-features
git branch
```

## Edit featurization code

* open module `src/features/features.py`;
* in function `extract_features` after lines:

```python
    dataset['sepal_length_to_sepal_width'] = dataset['sepal_length'] / dataset['sepal_width']
    dataset['petal_length_to_petal_width'] = dataset['petal_length'] / dataset['petal_width']
```

add lines:

```python
    # Add new features - squares of old features
    for col in dataset.drop('target', axis=1).columns:
        dataset[col+'_squared'] = dataset[col] ** 2
```

* then replace lines:

```python
    
    dataset = dataset[[
        'sepal_length', 'sepal_width', 'petal_length', 'petal_width',
        'sepal_length_to_sepal_width', 'petal_length_to_petal_width',
        'target'
    ]]
```

with the lines:

```python
    target = dataset.pop('target')
    dataset['target'] = target
```
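After these edits, the feature table should contain the original measurements, the two ratios, one squared column for each of those, and `target` as the last column. The sketch below spells out the expected column order so it can be compared with the actual output. It assumes that the input dataset holds only the four measurements and `target`.

```python
base = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
ratios = ["sepal_length_to_sepal_width", "petal_length_to_petal_width"]

# Columns that exist when the loop runs, excluding `target`.
# Assumption: the input dataset has no columns other than `base` and `target`.
before_loop = base + ratios

# The loop adds one `<name>_squared` column per existing feature.
squared = [col + "_squared" for col in before_loop]

# `dataset.pop('target')` followed by re-assignment moves `target` to the end.
expected_columns = before_loop + squared + ["target"]

print(len(expected_columns), "columns")
for name in expected_columns:
    print(name)
```

Note that the loop also squares the two ratio features, because they are already in the dataset when it runs.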

## Reproduce pipeline with new features

The `-f` flag forces the stages to run again even if DVC detects no changed dependencies:

```bash
dvc repro -f
```

## Commit experiment results

```bash
git add .
git commit -m "Experiment 3 with new features"
git tag -a "exp3_squared_features" -m "Experiment 3 with squared features"
```

## Show metrics 

```bash
dvc metrics show
```

## Merge results 

```bash
git checkout experiments
git merge exp3-squared-features && git branch -d exp3-squared-features
git branch
```

## Push artifacts to DVC remote

```bash
dvc push
```

# Compare experiments

## List experiments

```bash
git tag --list
```

## Select experiment (tag)

```bash
git checkout exp2_svm
```

## Reproduce

```bash
dvc repro
```

## Switch back to `experiments`:

```bash
git checkout experiments
```

## View and compare metrics

### Last experiment metrics:

```bash
dvc metrics show
```

### View and compare metrics for all experiments:

```bash
dvc metrics show -a
```

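### Control metrics view (JSON output)

In recent DVC releases this flag is spelled `--json`; check `dvc metrics show --help` for your version.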

```bash
dvc metrics show --show-json -a
```

### View and compare metrics for all tags:

```bash
dvc metrics show -T
```
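Once the metrics for each tag are known, picking the best experiment is a simple sort. The sketch below assumes a metric where higher is better; the metric name `f1_score`, the helper `rank_experiments` and the numbers are all placeholders for illustration, not output from this project.

```python
def rank_experiments(metrics_by_tag, metric):
    """Return (tag, value) pairs sorted from best to worst for one metric.

    Tags that do not report the metric are skipped.
    """
    rows = [
        (tag, values[metric])
        for tag, values in metrics_by_tag.items()
        if metric in values
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder numbers only: these are NOT results from this project.
example = {
    "exp1_tune_logreg": {"f1_score": 0.1},
    "exp2_svm": {"f1_score": 0.3},
    "exp3_squared_features": {"f1_score": 0.2},
}

ranking = rank_experiments(example, "f1_score")
for tag, value in ranking:
    print(f"{tag:<24} {value}")
```

In practice the dictionary would be filled from the values reported by `dvc metrics show -T`.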

### Try it yourself: Use kNN estimator

#### TODO
- Run an experiment with a kNN estimator, following the same steps as for SVM;
- use your own version of `param_grid`.
