# Verifying the MLOps environment on GCP

This notebook verifies the MLOps environment provisioned on GCP:
1. Test using the local MLflow server in the AI Notebooks instance to log entries to Cloud SQL.
2. Test deploying and running an Airflow workflow on Composer that uses the MLflow server on GKE to log entries to Cloud SQL.

## 1. Running a local MLflow experiment
We implement a simple Scikit-learn model training routine, then examine the entries logged to Cloud SQL and the artifacts produced in Cloud Storage through MLflow tracking.

In [None]:
import os
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression
import pymysql

In [None]:
mlflow.set_tracking_uri("http://localhost:80")
mlflow_tracking_uri = mlflow.get_tracking_uri()
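# get_artifact_uri() starts a run implicitly when none is active,
# so end that run to keep a later explicit start_run() from colliding with it.
mlflow_artifact_uri = mlflow.get_artifact_uri()
mlflow.end_run()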

print("MLflow tracking uri: {}".format(mlflow_tracking_uri))
print("MLflow artifacts store: {}".format(mlflow_artifact_uri))

### 1.1. Training a simple Scikit-learn model

In [None]:
# set_experiment() makes "env-test" the active experiment;
# start_run() then logs to it without an explicit experiment_id.
mlflow.set_experiment("env-test")

with mlflow.start_run():
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    print("Score: %s" % score)
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(lr, "model")
    print("Model saved in run %s" % mlflow.active_run().info.run_uuid)

### 1.2. Query the MLflow entries from Cloud SQL

In [None]:
connection = pymysql.connect(
    host='127.0.0.1',
    port=3306,
    database='mlflow',
    user="root",
    passwd="mlflow"
)
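
The cell above only opens the connection. A minimal sketch of reading entries back through it, assuming the MLflow SQL backend exposes a `runs` table with `run_uuid`, `experiment_id`, `status`, and `start_time` columns. The helper name `fetch_recent_runs` is hypothetical, and the schema may differ between MLflow versions:

```python
def fetch_recent_runs(connection, limit=5):
    """Return the most recent rows of the MLflow `runs` table.

    `connection` is any DB-API 2.0 connection, for example the pymysql
    connection to Cloud SQL. The table and column names are assumptions
    about the MLflow schema, not something this notebook defines.
    """
    query = (
        "SELECT run_uuid, experiment_id, status, start_time "
        "FROM runs ORDER BY start_time DESC LIMIT {:d}".format(int(limit))
    )
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if the query fails.
        cursor.close()
```

Calling `fetch_recent_runs(connection)` and printing each row should show the run logged in section 1.1 near the top, if the schema assumption holds. Close the connection with `connection.close()` when done.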

### 1.3. List the artifacts in Cloud Storage

In [None]:
!gsutil ls {mlflow_artifact_uri}/notebooks
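
To check the listing from Python rather than through `gsutil`, the artifact URI first has to be split into a bucket name and an object prefix. A small sketch using only the standard library; the helper name `split_gcs_uri` and the example bucket are hypothetical:

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split a gs:// URI into (bucket, object_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("Not a Cloud Storage URI: {!r}".format(uri))
    # Drop the leading slash so the prefix matches object names in the bucket.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI, shown only to illustrate the split.
bucket, prefix = split_gcs_uri("gs://example-bucket/experiments/notebooks")
print(bucket, prefix)
```

The resulting pair can then be handed to a Cloud Storage client, for instance `list_blobs(bucket, prefix=prefix)` in the `google-cloud-storage` package, assuming that package is installed in the notebook instance.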

## 2. Submitting a workflow to Composer

We implement a one-step Airflow workflow that trains a Scikit-learn model, then examine the entries logged to Cloud SQL and the artifacts produced in Cloud Storage through MLflow tracking.

In [None]:
COMOSER_NAME='ks-composer-dev'
REGION='us-central1'

### 2.1. Writing the Airflow workflow

In [None]:
%%writefile test-workflow.py

import airflow
import numpy as np
from datetime import timedelta
from airflow.operators.python_operator import PythonOperator
from sklearn.linear_model import LogisticRegression

def train_model(**kwargs):
    print("Train lr model step started...")
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    print("Score: %s" % score)
    print("Train lr model step finished.")
    
default_args = {
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'start_date': airflow.utils.dates.days_ago(0)
}

with airflow.DAG(
    'simple_sklearn_mlflow',
    default_args=default_args,
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=20)) as dag:
    
    train_model_op = PythonOperator(
        task_id='train_sklearn_model',
        provide_context=True,
        python_callable=train_model
    )
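
Errors in a DAG file typically surface only after Composer has picked it up, so a local syntax check before uploading can save a round trip. A minimal sketch, assuming the file was written to `test-workflow.py` by the cell above. The helper `check_dag_source` is hypothetical; it only parses the file and looks for the DAG id as a string literal, and it neither imports Airflow nor validates the DAG itself:

```python
import ast


def check_dag_source(path, dag_id):
    """Parse a DAG file and report whether `dag_id` appears as a string literal.

    Raises SyntaxError if the file is not valid Python.
    """
    with open(path, encoding="utf-8") as handle:
        tree = ast.parse(handle.read(), filename=path)
    # Collect every string constant in the file and look for the DAG id.
    literals = {
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    }
    return dag_id in literals
```

For the workflow above, `check_dag_source("test-workflow.py", "simple_sklearn_mlflow")` is expected to return `True`. A `False` result would suggest the DAG id was mistyped in either the file or the trigger command.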

### 2.2. Uploading the Airflow workflow

In [None]:
!gcloud composer environments storage dags import \
  --environment {COMOSER_NAME}  --location {REGION} \
  --source test-workflow.py

In [None]:
!gcloud composer environments storage dags list \
  --environment {COMOSER_NAME}  --location {REGION}

### 2.3. Triggering the workflow

In [None]:
!gcloud composer environments run {COMOSER_NAME} \
    --location {REGION} trigger_dag -- simple_sklearn_mlflow
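
The same trigger can be issued from Python when the step needs to be scripted. A sketch under the assumption that the `gcloud` CLI is on the path and that the environment runs Airflow 1.x, where the subcommand is `trigger_dag` (Airflow 2 uses `dags trigger` instead). Both helper names are hypothetical:

```python
import subprocess


def build_trigger_command(environment, location, dag_id):
    """Build the gcloud argument list that triggers `dag_id` (Airflow 1.x CLI)."""
    return [
        "gcloud", "composer", "environments", "run", environment,
        "--location", location,
        "trigger_dag", "--", dag_id,
    ]


def trigger_dag(environment, location, dag_id):
    """Run the trigger command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(
        build_trigger_command(environment, location, dag_id),
        check=True,
    )
```

With the variables defined earlier in this section, the call would be `trigger_dag(COMOSER_NAME, REGION, "simple_sklearn_mlflow")`. Passing the arguments as a list avoids shell quoting issues.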

### 2.4. Query the MLflow entries from Cloud SQL

### 2.5. List the artifacts in Cloud Storage

In [None]:
!gsutil ls {mlflow_artifact_uri}