# Metaflow and the MLOps ecosystem

_Human-centricity_ is a foundational principle of Metaflow. As a result, Metaflow strives to be compatible with the other ML tools you already use (and ones you may want to use!). In this lesson, we'll show how to incorporate two _types of tools_, those for
* experiment tracking and
* data validation.

We'll be using Weights & Biases for the former and Great Expectations for the latter, but keep in mind that Metaflow is agnostic with respect to the other tools you use. Let's jump in:

## Experiment Tracking

[TO-DO: provide brief intro to experiment tracking]

Note that I've already logged into wandb using my terminal. 

[TO DO: include instructions on this, or a link, or instructions on putting credentials as env vars]
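
One way to handle credentials without an interactive login is to read them from environment variables. W&B reads `WANDB_API_KEY`, `WANDB_ENTITY`, and `WANDB_PROJECT` from the environment, and running `wandb login` once in a terminal is the interactive alternative. The sketch below only illustrates the idea: the helper name `wandb_settings_from_env` and the fallback project name are assumptions made for this example, not part of Metaflow or W&B.

```python
import os


def wandb_settings_from_env(environ=None):
    """Collect W&B settings from environment variables (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    api_key = environ.get("WANDB_API_KEY")
    if not api_key:
        # Fail early with a readable message instead of failing inside a step
        raise RuntimeError(
            "WANDB_API_KEY is not set; export it or run `wandb login` first."
        )
    return {
        "entity": environ.get("WANDB_ENTITY"),
        # Assumed fallback: the project name used later in this lesson
        "project": environ.get("WANDB_PROJECT", "mf-rf-wandb"),
    }


# Example with an explicit mapping so the snippet runs without real credentials
settings = wandb_settings_from_env(
    {"WANDB_API_KEY": "dummy-key", "WANDB_ENTITY": "my-team"}
)
print(settings)
```

Inside a step, the resulting dictionary could be passed along as `wandb.init(**settings, name="mf-tutorial-iris")`, which avoids hard-coding an entity in the flow file.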

```python
%%writefile ../flows/rf_flow_monitor.py
from metaflow import FlowSpec, step, card


class ClassificationFlow(FlowSpec):
    """
    Train a random forest on the iris dataset and log diagnostics
    to Weights & Biases.
    """

    @card
    @step
    def start(self):
        """
        Load the data and split it into training and test sets.
        """
        # Import inside the step so the dependency is only needed at run time
        from sklearn import datasets
        from sklearn.model_selection import train_test_split

        # Load the dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.labels = self.iris['target_names']

        # Hold out 20% of the rows for testing
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            self.X, self.y, test_size=0.2)
        self.next(self.rf_model)

    @step
    def rf_model(self):
        """
        Build the random forest model.
        """
        from sklearn.ensemble import RandomForestClassifier

        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

    @step
    def train(self):
        """
        Train the model and store its predictions as artifacts.
        """
        self.clf.fit(self.X_train, self.y_train)
        self.y_pred = self.clf.predict(self.X_test)
        self.y_probs = self.clf.predict_proba(self.X_test)
        self.next(self.monitor)

    @step
    def monitor(self):
        """
        Plot diagnostics using an experiment tracker.
        """
        import wandb

        # Replace the entity with your own W&B user or team name
        wandb.init(project="mf-rf-wandb", entity="hugobowne", name="mf-tutorial-iris")

        wandb.sklearn.plot_class_proportions(self.y_train, self.y_test, self.labels)
        wandb.sklearn.plot_learning_curve(self.clf, self.X_train, self.y_train)
        wandb.sklearn.plot_roc(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_precision_recall(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_feature_importances(self.clf)

        # iris has three classes, so this is not a binary problem
        wandb.sklearn.plot_classifier(self.clf,
                              self.X_train, self.X_test,
                              self.y_train, self.y_test,
                              self.y_pred, self.y_probs,
                              self.labels,
                              is_binary=False,
                              model_name='RandomForest')

        wandb.finish()
        self.next(self.end)

    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()
```


Overwriting ../flows/rf_flow_monitor.py


Execute the above from the command line with

```bash
python ../flows/rf_flow_monitor.py run
```

From a notebook cell, prefix the same command with `!`.

```python
! python ../flows/rf_flow_monitor.py run
```

```
Metaflow 2.5.0 executing ClassificationFlow for user:hba
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2022-03-23 10:58:51.705 Workflow starting (run-id 1647993531702023):
2022-03-23 10:58:51.714 [1647993531702023/start/1 (pid 2066)] Task is starting.
2022-03-23 10:58:53.619 [1647993531702023/start/1 (pid 2066)] Task finished successfully.
2022-03-23 10:58:53.628 [1647993531702023/rf_model/2 (pid 2075)] Task is starting.
2022-03-23 10:58:54.597 [1647993531702023/rf_model/2 (pid 2075)] Task finished successfully.
2022-03-23 10:58:54.607 [1647993531702023/train/3 (pid 2079)] Ta
```

```python
%wandb hugobowne/mf-rf-wandb
```
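
The panels in the dashboard come from the `wandb.sklearn` calls in the `monitor` step. As a rough, library-free illustration of the kind of summary behind the class-proportions panel, the sketch below counts how often each label appears in the training and test targets. The function name `class_proportions` and the toy label lists are assumptions made for this example; W&B's own implementation may well differ.

```python
from collections import Counter


def class_proportions(y_train, y_test, labels):
    """Return per-class counts for the train and test targets (illustrative only)."""
    train_counts = Counter(y_train)
    test_counts = Counter(y_test)
    # Class i in the targets corresponds to labels[i], as with iris target_names
    return {
        label: {"train": train_counts.get(idx, 0), "test": test_counts.get(idx, 0)}
        for idx, label in enumerate(labels)
    }


# Toy targets standing in for self.y_train / self.y_test from the flow
y_train = [0, 0, 1, 1, 1, 2, 2, 2]
y_test = [0, 1, 2, 2]
labels = ["setosa", "versicolor", "virginica"]

for label, counts in class_proportions(y_train, y_test, labels).items():
    print(f"{label}: train={counts['train']} test={counts['test']}")
```

A heavily skewed split would show up here as very different train and test counts for the same class, which is the situation the panel is meant to make visible.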