<div align="center">
<h1><img width="30" src="https://madewithml.com/static/images/rounded_logo.png">&nbsp;<a href="https://madewithml.com/">Made With ML</a></h1>
Applied ML · MLOps · Production
<br>
Join 30K+ developers in learning how to responsibly <a href="https://madewithml.com/about/">deliver value</a> with ML.
    <br>
</div>

<br>

<div align="center">
    <a target="_blank" href="https://newsletter.madewithml.com"><img src="https://img.shields.io/badge/Subscribe-30K-brightgreen"></a>&nbsp;
    <a target="_blank" href="https://github.com/GokuMohandas/MadeWithML"><img src="https://img.shields.io/github/stars/GokuMohandas/MadeWithML.svg?style=social&label=Star"></a>&nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/goku"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>&nbsp;
    <a target="_blank" href="https://twitter.com/GokuMohandas"><img src="https://img.shields.io/twitter/follow/GokuMohandas.svg?label=Follow&style=social"></a>
    <br>
    🔥&nbsp; Among the <a href="https://github.com/topics/mlops" target="_blank">top MLOps</a> repositories on GitHub
</div>

<br>
<hr>

# Optimize (GPU)

Use this notebook to run hyperparameter optimization on Google Colab and take advantage of its free GPUs.
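
Before running the cells below, it can help to confirm that the Colab runtime actually has a GPU attached. The following is a minimal sketch, assuming that `nvidia-smi` is on the `PATH` whenever a GPU runtime is active; the helper name `gpu_available` is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA tools on PATH, so this is likely a CPU runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the detected GPUs, one per line
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this prints `False` on Colab, switching the runtime type to GPU and re-running the notebook is usually the first thing to try.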

## Clone repository

In [1]:
# Load repository
!git clone https://github.com/GokuMohandas/MLOps.git mlops

Cloning into 'mlops'...
remote: Enumerating objects: 597, done.
remote: Counting objects: 100% (597/597), done.
remote: Compressing objects: 100% (363/363), done.
remote: Total 597 (delta 284), reused 491 (delta 185), pack-reused 0
Receiving objects: 100% (597/597), 3.01 MiB | 17.05 MiB/s, done.
Resolving deltas: 100% (284/284), done.


In [2]:
# Files
%cd mlops
!ls

/content/mlops
app	    docs		mkdocs.yml	README.md	  streamlit
config	    great_expectations	model		requirements.txt  tagifai
data	    LICENSE		notebooks	setup.py	  tests
Dockerfile  Makefile		pyproject.toml	stores


## Setup

In [None]:
# Install Python 3.7
!apt-get install -y python3.7
!python3.7 --version

In [None]:
# Set up
!python3.7 -m pip install --upgrade pip
!python3.7 -m pip install -e . --no-cache-dir

## Download data

We're going to download data directly from GitHub since our blob stores are local. But you can easily load the correct data versions from your cloud blob store using the *.json.dvc pointer files in the [data directory](https://github.com/GokuMohandas/MLOps/tree/main/data).

In [5]:
from tagifai import main

In [6]:
# Load auxiliary data
main.load_data()

[04/09/21 20:19:43] INFO     ✅ Data downloaded!                       cli.py:49


In [7]:
# Check if data downloaded
!ls data

projects.json  projects.json.dvc  tags.json  tags.json.dvc
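
Beyond listing the files, a quick sanity check is to parse each JSON file and look at its top-level shape. This is a minimal sketch that assumes each file holds a JSON list or object; the helper `summarize_json` is hypothetical, and nothing here depends on the actual schema of the data.

```python
import json
from pathlib import Path


def summarize_json(fp):
    """Return (type name, length) of the top-level JSON value stored in `fp`."""
    with open(fp) as f:
        data = json.load(f)
    # Assumes the top-level value is a list or dict, so that len() is defined
    return type(data).__name__, len(data)


# Paths mirror the `ls data` output above; missing files are reported, not raised
for name in ["projects.json", "tags.json"]:
    fp = Path("data", name)
    if fp.exists():
        print(name, summarize_json(fp))
    else:
        print(name, "not found")
```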


## Compute features

In [None]:
# Compute features
main.compute_features()

In [None]:
# Computed features
!ls data

## Optimize

Now we're going to perform hyperparameter optimization using the objective and parameter distributions defined in the [main script](https://github.com/GokuMohandas/MLOps/blob/main/tagifai/main.py). The best parameters will be written to [config/params.json](https://raw.githubusercontent.com/GokuMohandas/MLOps/main/config/params.json) which will be used to train the best model below.
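
To make the idea of "an objective plus parameter distributions" concrete, here is a minimal random-search sketch using only the standard library. It is not the implementation in `tagifai/main.py`: the parameter names, their ranges, the `toy_objective` function, and the `save_params` helper are all assumptions made for illustration.

```python
import json
import random
from pathlib import Path


def sample_params(rng):
    """Draw one candidate from hypothetical parameter distributions."""
    return {
        "lr": 10 ** rng.uniform(-5, -2),     # log-uniform learning rate
        "dropout_p": rng.uniform(0.1, 0.8),  # uniform dropout probability
        "embedding_dim": rng.choice([128, 256, 512]),
    }


def toy_objective(params):
    """Stand-in score (higher is better); a real objective would train a model."""
    return -abs(params["lr"] - 1e-3) - abs(params["dropout_p"] - 0.5)


def random_search(objective, num_trials, seed=1234):
    """Evaluate `num_trials` random candidates and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def save_params(params, fp):
    """Write the chosen parameters to a JSON file."""
    Path(fp).parent.mkdir(parents=True, exist_ok=True)
    with open(fp, "w") as f:
        json.dump(params, f, indent=2)


best_params, best_score = random_search(toy_objective, num_trials=100)
print(best_params, best_score)
```

Calling `save_params(best_params, "params_sketch.json")` would persist the result, which mirrors how the best parameters end up in a params file for the training step.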

In [None]:
# Optimize
main.optimize(num_trials=100)

Streaming output truncated to the last 5000 lines.
[01/26/21 18:00:18] INFO     Epoch: 8 | train_loss: 0.00160,        train.py:146
                             val_loss: 0.00169, lr: 4.35E-04,                   
                             _patience: 10                                      
[01/26/21 18:00:22] INFO     Epoch: 9 | train_loss: 0.00142,        train.py:146
                             val_loss: 0.00164, lr: 4.35E-04,                   
                             _patience: 10                                      
[01/26/21 18:00:26] INFO     Epoch: 10 | train_loss: 0.00125,       train.py:146
                             val_loss: 0.00161, lr: 4.35E-04,                   
                             _patience: 10                                      
[01/26/21 18:00:30] INFO     Epoch: 11 | train_loss: 0.00104,       train.py:146
                             val_loss: 0.00159, lr: 4.35E-04,                   
                             _patience: 10  
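
The log lines above report `val_loss` together with a `_patience` value that stays at 10 while the validation loss keeps improving, which is consistent with an early-stopping counter. Assuming that is what it is, the logic can be sketched as follows; the function name and its exact stopping rule are assumptions, not the code in `train.py`.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_val_loss) for a sequence of validation losses.

    The counter resets to `patience` on every improvement and decreases by one
    otherwise; training stops once the counter reaches zero.
    """
    best_val_loss = float("inf")
    remaining = patience
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            remaining = patience  # improvement: reset the counter
        else:
            remaining -= 1  # no improvement: use up one unit of patience
        if remaining == 0:
            return epoch, best_val_loss
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_val_loss


# Validation losses copied from the log excerpt above (epochs 8 to 11)
print(early_stopping_epoch([0.00169, 0.00164, 0.00161, 0.00159], patience=10))
```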

## Train

Once we've identified the best hyperparameters, we're ready to train our best model and save the corresponding artifacts (label encoder, tokenizer, etc.). Note that the best parameters from optimization are saved in [config/params.json](https://raw.githubusercontent.com/GokuMohandas/MLOps/main/config/params.json) (or in a different params file, if you passed one in for optimization).

In [8]:
# Train best model
main.train_model()

INFO: 'best' does not exist. Creating a new experiment
[04/09/21 20:26:42] INFO     Parameters: {                           main.py:101
                               "seed": 1234,                                    
                               "cuda": true,                                    
                               "shuffle": true,                                 
                               "num_samples": null,                             
                               "min_tag_freq": 30,                              
                               "lower": true,                                   
                               "stem": false,                                   
                               "train_size": 0.7,                               
                               "char_level": true,                              
                               "max_filter_size": 10,                           
                               "batch_size": 128,     

100%|██████████| 217/217 [00:00<00:00, 23915.18it/s]


[04/09/21 20:27:54] INFO     {                                        cli.py:125
                               "precision": 0.8277560698107038,                 
                               "recall": 0.6042553191489362,                    
                               "f1": 0.6802454998163262,                        
                               "num_samples": 217.0                             
                             }                                                  
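
For readers who want to see how precision, recall, and F1 relate for a multilabel tagging task, here is a minimal micro-averaged sketch in plain Python. The reported `f1` above is not the harmonic mean of the reported `precision` and `recall`, so the repository likely averages differently (for example per class); the function `micro_prf` and the sample tags below are assumptions used only for illustration.

```python
def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over lists of tag collections."""
    tp = fp = fn = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_tags, pred_tags = set(true_tags), set(pred_tags)
        tp += len(true_tags & pred_tags)  # tags predicted and correct
        fp += len(pred_tags - true_tags)  # tags predicted but wrong
        fn += len(true_tags - pred_tags)  # tags missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 2 true positives, 1 false positive, 2 false negatives
y_true = [["nlp", "transformers"], ["cv"], ["nlp"]]
y_pred = [["nlp"], ["cv", "nlp"], []]
print(micro_prf(y_true, y_pred))
```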


## Change metadata

In order to transfer our trained model and its artifacts to our local model registry, we need to change the paths in the run metadata so that they point to the local registry location.

In [29]:
from pathlib import Path
from config import config
import yaml

In [30]:
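def change_artifact_metadata(fp):
    """Point the artifact paths in a meta.yaml file at the local model registry."""
    # Relies on the global `model_registry` defined in the next cell
    with open(fp) as f:
        # yaml.load without a Loader is unsafe and fails on PyYAML 6+
        metadata = yaml.safe_load(f)
    for key in ["artifact_location", "artifact_uri"]:
        if key in metadata:
            metadata[key] = metadata[key].replace(
                str(config.MODEL_REGISTRY), model_registry)
    with open(fp, "w") as f:
        yaml.dump(metadata, f)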

In [31]:
# Location of your model store
model_registry = "/Users/goku/Documents/madewithml/mlops/stores/model"

In [32]:
# Change metadata in all meta.yaml files
experiment_dir = Path(config.MODEL_REGISTRY, "1")
for fp in list(Path(experiment_dir).glob("**/meta.yaml")):
    change_artifact_metadata(fp=fp)
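
To see what the path rewrite does without touching any files, the same replacement can be applied to an in-memory dictionary. This is a minimal sketch: the helper `rewrite_artifact_paths`, the run ID, and both registry paths are hypothetical values chosen for illustration.

```python
def rewrite_artifact_paths(metadata, old_root, new_root):
    """Return a copy of `metadata` with artifact paths moved to `new_root`."""
    updated = dict(metadata)  # shallow copy so the input stays unchanged
    for key in ("artifact_location", "artifact_uri"):
        if key in updated:
            updated[key] = updated[key].replace(old_root, new_root)
    return updated


# Hypothetical metadata as it might look after training on Colab
colab_meta = {
    "artifact_uri": "/content/mlops/stores/model/1/abc123/artifacts",
    "run_id": "abc123",
}
local_meta = rewrite_artifact_paths(
    colab_meta,
    old_root="/content/mlops/stores/model",
    new_root="/Users/me/mlops/stores/model",
)
print(local_meta["artifact_uri"])
```

Only the keys that hold paths are rewritten; everything else in the metadata is carried over as-is.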

## Download

Download and transfer the trained model's files to your local model registry. If you have existing runs, just transfer the new run's directory.

In [22]:
from google.colab import files

In [33]:
# Download
!zip -r model.zip model
!zip -r run.zip stores/model/1
files.download("run.zip")

  adding: stores/model/1/ (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/ (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/ (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/recall (deflated 3%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/precision (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/best_val_loss (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/slices_f1 (deflated 23%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/f1 (deflated 3%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/metrics/behavioral_score (deflated 37%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/params/ (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/params/batch_size (stored 0%)
  adding: stores/model/1/dce5cc211fbb474e9b86af40939be0ca/params/seed (stored 0%)
  adding: stores/model/1/
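
Outside of Colab, or on the receiving machine, the same archive-and-restore step can be done with the Python standard library instead of the `zip` command. This is a minimal sketch; `zip_run` and `unzip_run` are hypothetical helpers, and the paths in the usage note are placeholders.

```python
import shutil
import zipfile
from pathlib import Path


def zip_run(run_dir, archive_stem):
    """Zip `run_dir` (keeping its folder name) into `<archive_stem>.zip`."""
    run_dir = Path(run_dir)
    archive = shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=str(run_dir.parent),  # archive paths are relative to the parent
        base_dir=run_dir.name,         # so entries start with the run folder name
    )
    return Path(archive)


def unzip_run(archive, registry_dir):
    """Extract an archive created by `zip_run` into `registry_dir`."""
    registry_dir = Path(registry_dir)
    registry_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(registry_dir)
    return registry_dir
```

For example, `zip_run("stores/model/1", "run")` would produce `run.zip`, and `unzip_run("run.zip", "stores/model")` on the local machine would restore the `1/` experiment directory inside the local registry.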
