In [12]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# MLflow Classification Recipe Notebook

This notebook runs the MLflow Classification Recipe on Databricks and inspects its results. For more information about the MLflow Classification Recipe, including usage examples, see the [Classification Recipe overview documentation](https://mlflow.org/docs/latest/recipes.html#classification-recipe) and the [Classification Recipe API documentation](https://mlflow.org/docs/latest/python_api/mlflow.recipes.html#module-mlflow.recipes.classification.v1.recipe).

In [13]:
from mlflow.recipes import Recipe

r = Recipe(profile="local")

2022/11/12 01:00:29 INFO mlflow.recipes.recipe: Creating MLflow Recipe 'recipes-classification-template' with profile: 'local'


In [14]:
r.inspect()

In [15]:
r.run("ingest")

2022/11/12 01:00:30 INFO mlflow.recipes.step: Running step ingest...


name,type
mean radius,number
mean texture,number
mean perimeter,number
mean area,number
mean smoothness,number
mean compactness,number
mean concavity,number
mean concave points,number
mean symmetry,number
mean fractal dimension,number

mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.0134,0.01389,0.003532,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.00615,0.04006,0.03832,0.02058,0.0225,0.004571,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,1.156,3.445,27.23,0.00911,0.07458,0.05661,0.01867,0.05963,0.009208,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.01885,0.01756,0.005115,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [16]:
r.run("split")

2022/11/12 01:00:30 INFO mlflow.recipes.utils.execution: ingest: No changes. Skipping.


2022/11/12 01:00:31 INFO mlflow.recipes.step: Running step split...
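
The log above shows `ingest` being skipped ("No changes") when `split` is requested. The sketch below illustrates one way such skip-if-unchanged behaviour can work, using fingerprints of step configuration. It is a hypothetical illustration, not MLflow's implementation: the `CachedPipeline` class, its methods, and the placeholder configs are all invented for this example.

```python
import hashlib


class CachedPipeline:
    """Hypothetical step runner that skips steps whose inputs are unchanged."""

    def __init__(self, steps):
        # steps: ordered list of (name, config) pairs; configs are placeholders.
        self.steps = steps
        self.fingerprints = {}  # step name -> fingerprint from its last execution
        self.log = []

    def _fingerprint(self, index):
        # Cover this step's config and every upstream config, so that an
        # upstream change also invalidates the downstream steps.
        payload = repr(self.steps[: index + 1]).encode()
        return hashlib.sha256(payload).hexdigest()

    def run(self, target):
        names = [name for name, _ in self.steps]
        # Visit every step up to and including the requested one, in order.
        for index in range(names.index(target) + 1):
            name = names[index]
            fingerprint = self._fingerprint(index)
            if self.fingerprints.get(name) == fingerprint:
                self.log.append(f"{name}: skipped")
            else:
                self.fingerprints[name] = fingerprint
                self.log.append(f"{name}: ran")


pipeline = CachedPipeline([("ingest", {"version": 1}), ("split", {"version": 1})])
pipeline.run("ingest")
pipeline.run("split")
print(pipeline.log)
```

Requesting `split` after `ingest` has already run records `ingest` as skipped and only executes `split`, which mirrors the order of messages in the log above.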


In [17]:
training_data = r.get_artifact("training_data")
training_data.describe()

Statistic,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
count,410.0,410.0,410.0,410.0,410.0,410.0,410.0,410.0,410.0,410.0,...,410.0,410.0,410.0,410.0,410.0,410.0,410.0,410.0,410.0,410.0
mean,14.244073,19.391293,92.752683,663.897805,0.096651,0.105241,0.089447,0.049832,0.181465,0.06277,...,25.797415,108.08639,891.061707,0.132701,0.25882,0.275265,0.116329,0.29137,0.084303,0.614634
std,3.465521,4.352865,23.919776,347.986779,0.014211,0.053345,0.078927,0.038719,0.027456,0.007046,...,6.311756,33.001822,557.259419,0.02276,0.164261,0.210304,0.066064,0.063839,0.01867,0.487276
min,7.691,9.71,47.92,170.4,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,12.02,54.49,223.6,0.07117,0.02729,0.0,0.0,0.1565,0.05504,0.0
25%,11.8175,16.2175,76.11,429.65,0.086575,0.065307,0.0295,0.020682,0.161925,0.057485,...,21.065,85.1,522.0,0.1172,0.1485,0.116475,0.065028,0.25005,0.071557,0.0
50%,13.535,19.04,87.48,564.2,0.095895,0.095275,0.066145,0.037135,0.17995,0.0614,...,25.485,99.165,705.9,0.1316,0.21485,0.22655,0.1016,0.2831,0.080075,1.0
75%,15.7725,21.9075,104.025,782.2,0.1054,0.1306,0.13225,0.074022,0.1966,0.06621,...,30.2175,125.775,1083.5,0.1467,0.342575,0.38755,0.1662,0.3205,0.092202,1.0
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.2906,0.09575,...,49.54,251.2,4254.0,0.2226,1.058,1.17,0.291,0.5774,0.2075,1.0
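
The `target` column of the summary above also gives the class balance of the training split. A minimal sketch, assuming `target` only takes the values 0 and 1 (consistent with its min, max, and quartiles) and that `describe()` reports the sample standard deviation; the variable names are chosen for this example.

```python
import math

# Values copied from the `target` column of the describe() output above.
n_rows = 410
target_mean = 0.614634
target_std = 0.487276

# For a 0/1 column, the mean is the fraction of rows labelled 1.
n_positive = round(target_mean * n_rows)
n_negative = n_rows - n_positive

# Sample standard deviation (ddof=1) of a binary column:
# sqrt(n / (n - 1) * p * (1 - p)), where p is the fraction of ones.
p = n_positive / n_rows
expected_std = math.sqrt(n_rows / (n_rows - 1) * p * (1 - p))

print(f"label 1: {n_positive} rows, label 0: {n_negative} rows")
print(f"std implied by the counts: {expected_std:.6f} (reported: {target_std})")
```

The standard deviation implied by the counts agrees with the reported one, which is a quick sanity check that the column is binary and the split is moderately imbalanced toward label 1.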


In [18]:
r.run("transform")

2022/11/12 01:00:32 INFO mlflow.recipes.utils.execution: ingest, split: No changes. Skipping.


2022/11/12 01:00:32 INFO mlflow.recipes.step: Running step transform...


Name,Type
mean radius,float64
mean texture,float64
mean perimeter,float64
mean area,float64
mean smoothness,float64
mean compactness,float64
mean concavity,float64
mean concave points,float64
mean symmetry,float64
mean fractal dimension,float64

Name,Type
mean radius,float64
mean texture,float64
mean perimeter,float64
mean area,float64
mean smoothness,float64
mean compactness,float64
mean concavity,float64
mean concave points,float64
mean symmetry,float64
mean fractal dimension,float64

mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.0134,0.01389,0.003532,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.00615,0.04006,0.03832,0.02058,0.0225,0.004571,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.01885,0.01756,0.005115,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0
12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,0.07613,0.3345,0.8902,2.217,27.19,0.00751,0.03345,0.03672,0.01137,0.02165,0.005082,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244,0


In [19]:
r.run("train")

2022/11/12 01:00:33 INFO mlflow.recipes.utils.execution: ingest, split, transform: No changes. Skipping.


2022/11/12 01:00:33 INFO mlflow.recipes.step: Running step train...
2022/11/12 01:00:34 INFO mlflow.recipes.steps.train: Training data has less than 5000 rows, skipping rebalancing.
[flaml.automl: 11-12 01:00:34] {2600} INFO - task = classification
[flaml.automl: 11-12 01:00:34] {2602} INFO - Data split method: stratified
[flaml.automl: 11-12 01:00:34] {2605} INFO - Evaluation method: cv
[flaml.automl: 11-12 01:00:34] {2727} INFO - Minimizing error metric: 1-accuracy
[flaml.automl: 11-12 01:00:34] {2869} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'lrl1']
[flaml.automl: 11-12 01:00:34] {3164} INFO - iteration 0, current learner lgbm
[flaml.automl: 11-12 01:00:34] {3297} INFO - Estimated sufficient time budget=264s. Estimated necessary time budget=6s.
[flaml.automl: 11-12 01:00:34] {3344} INFO -  at 0.0s,	estimator lgbm's best error=0.0902,	best estimator lgbm's best error=0.0902
[flaml.automl: 11-12 01:00:34] {3164} INFO - iterati

Metric,training,validation
accuracy_score,1.0,0.945205
custom_metric,0.5,0.5
example_count,410.0,73.0
f1_score,1.0,0.955556
false_negatives,0.0,2.0
false_positives,0.0,2.0
log_loss,0.000220952,0.166907
precision_recall_auc,1.0,0.995552
precision_score,1.0,0.955556
recall_score,1.0,0.955556

Name,Type
mean radius,double
mean texture,double
mean perimeter,double
mean area,double
mean smoothness,double
mean compactness,double
mean concavity,double
mean concave points,double
mean symmetry,double
mean fractal dimension,double

Name,Type
-,"Tensor('int64', (-1,))"

absolute_error,prediction,target,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
False,1,1,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,0.1587,0.05884,0.3857,1.428,2.548,19.15,0.007189,0.00466,0.0,0.0,0.02676,0.002783,9.456,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039
False,1,1,9.72,18.22,60.73,288.1,0.0695,0.02344,0.0,0.0,0.1653,0.06447,0.3539,4.885,2.23,21.69,0.001713,0.006736,0.0,0.0,0.03799,0.001688,9.968,20.83,62.25,303.8,0.07117,0.02729,0.0,0.0,0.1909,0.06559
False,1,1,12.81,13.06,81.29,508.8,0.08739,0.03774,0.009193,0.0133,0.1466,0.06133,0.2889,0.9899,1.778,21.79,0.008534,0.006364,0.00618,0.007408,0.01065,0.003351,13.63,16.15,86.7,570.7,0.1162,0.05445,0.02758,0.0399,0.1783,0.07319
False,0,0,21.09,26.57,142.7,1311.0,0.1141,0.2832,0.2487,0.1496,0.2395,0.07398,0.6298,0.7629,4.414,81.46,0.004253,0.04759,0.03872,0.01567,0.01798,0.005295,26.68,33.48,176.5,2089.0,0.1491,0.7584,0.678,0.2903,0.4098,0.1284
False,0,0,15.7,20.31,101.2,766.6,0.09597,0.08799,0.06593,0.05189,0.1618,0.05549,0.3699,1.15,2.406,40.98,0.004626,0.02263,0.01954,0.009767,0.01547,0.00243,20.11,32.82,129.3,1269.0,0.1414,0.3547,0.2902,0.1541,0.3437,0.08631
False,0,0,15.28,22.41,98.92,710.6,0.09057,0.1052,0.05375,0.03263,0.1727,0.06317,0.2054,0.4956,1.344,19.53,0.00329,0.01395,0.01774,0.006009,0.01172,0.002575,17.8,28.03,113.8,973.1,0.1301,0.3299,0.363,0.1226,0.3175,0.09772
False,1,1,10.08,15.11,63.76,317.5,0.09267,0.04695,0.001597,0.002404,0.1703,0.06048,0.4245,1.268,2.68,26.43,0.01439,0.012,0.001597,0.002404,0.02538,0.00347,11.87,21.18,75.39,437.0,0.1521,0.1019,0.00692,0.01042,0.2933,0.07697
False,0,0,18.31,18.58,118.6,1041.0,0.08588,0.08468,0.08169,0.05814,0.1621,0.05425,0.2577,0.4757,1.817,28.92,0.002866,0.009181,0.01412,0.006719,0.01069,0.001087,21.31,26.36,139.2,1410.0,0.1234,0.2445,0.3538,0.1571,0.3206,0.06938
False,1,1,11.81,17.39,75.27,428.9,0.1007,0.05562,0.02353,0.01553,0.1718,0.0578,0.1859,1.926,1.011,14.47,0.007831,0.008776,0.01556,0.00624,0.03139,0.001988,12.57,26.48,79.57,489.5,0.1356,0.1,0.08803,0.04306,0.32,0.06576
False,1,1,12.3,15.9,78.83,463.7,0.0808,0.07253,0.03844,0.01654,0.1667,0.05474,0.2382,0.8355,1.687,18.32,0.005996,0.02212,0.02117,0.006433,0.02025,0.001725,13.35,19.59,86.65,546.7,0.1096,0.165,0.1423,0.04815,0.2482,0.06306

Metric,Latest
Model Rank,> 0
accuracy_score,0.945205
custom_metric,0.5
f1_score,0.955556
false_negatives,2
false_positives,2
precision_score,0.955556
recall_score,0.955556
true_negatives,26
true_positives,43
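
The scores in this table follow directly from the four confusion-matrix counts on the 73 validation examples. The sketch below recomputes them using the standard definitions of accuracy, precision, recall, and F1; the variable names are chosen for this example.

```python
# Confusion-matrix counts copied from the table above (validation set).
tp, tn, fp, fn = 43, 26, 2, 2
total = tp + tn + fp + fn

accuracy = (tp + tn) / total   # fraction of all predictions that are correct
precision = tp / (tp + fp)     # fraction of predicted positives that are correct
recall = tp / (tp + fn)        # fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"examples:  {total}")
print(f"accuracy:  {accuracy:.6f}")
print(f"precision: {precision:.6f}")
print(f"recall:    {recall:.6f}")
print(f"f1:        {f1:.6f}")
```

Because false positives and false negatives are both 2 here, precision, recall, and F1 coincide.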


In [20]:
trained_model = r.get_artifact("model")
print(trained_model)

mlflow.pyfunc.loaded_model:
  artifact_path: train/model
  flavor: mlflow.sklearn
  run_id: 62f37034b8b545b8a2da98d80e631487



In [21]:
r.run("evaluate")

2022/11/12 01:00:51 INFO mlflow.recipes.utils.execution: ingest, split, transform, train: No changes. Skipping.


2022/11/12 01:00:51 INFO mlflow.recipes.step: Running step evaluate...
2022/11/12 01:00:52 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2022/11/12 01:00:52 INFO mlflow.models.evaluation.default_evaluator: The evaluation dataset is inferred as binary dataset, positive label is 1, negative label is 0.
2022/11/12 01:00:53 INFO mlflow.models.evaluation.default_evaluator: Shap explainer _PatchedKernelExplainer is used.

100%|██████████| 10/10 [00:01<00:00,  5.73it/s]
elementwise comparison failed;

Metric,validation,test
accuracy_score,0.945205,0.965116
custom_metric,0.5,0.5
example_count,73.0,86.0
f1_score,0.955556,0.975207
false_negatives,2.0,1.0
false_positives,2.0,2.0
log_loss,0.166907,0.148585
precision_recall_auc,0.995552,0.994189
precision_score,0.955556,0.967213
recall_score,0.955556,0.983333
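
The `example_count` rows reported by the train and evaluate steps (410 training, 73 validation, 86 test) show how the dataset was divided. A small sketch that turns those counts into proportions; the configured split ratios are not shown in this notebook, so the code only reports what was observed.

```python
# Row counts copied from the example_count rows of the step outputs.
counts = {"training": 410, "validation": 73, "test": 86}
total = sum(counts.values())

# Observed share of the full dataset that landed in each split.
fractions = {name: count / total for name, count in counts.items()}

print(f"total rows: {total}")
for name, fraction in fractions.items():
    print(f"{name:>10}: {counts[name]:4d} rows ({fraction:.1%})")
```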

metric,greater_is_better,value,threshold,validated
accuracy_score,True,0.965116,0.5,✅
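
The validation table above compares a metric value against a threshold, taking `greater_is_better` into account. A minimal sketch of that kind of check follows. The helper `meets_threshold` is hypothetical, and whether MLflow uses a strict or non-strict comparison is an assumption here; the sketch uses a non-strict one.

```python
def meets_threshold(value, threshold, greater_is_better):
    """Hypothetical check: does a metric value satisfy its threshold?"""
    if greater_is_better:
        # Higher is better, so the value must reach the threshold.
        return value >= threshold
    # Lower is better (for example, a loss), so the value must not exceed it.
    return value <= threshold


# Values copied from the validation table above.
validated = meets_threshold(value=0.965116, threshold=0.5, greater_is_better=True)
print("accuracy_score validated:", validated)
```

With a test accuracy of 0.965116 and a threshold of 0.5, the criterion passes by a wide margin, so a stricter threshold would be a more informative gate.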


In [22]:
r.run("register")

2022/11/12 01:00:58 INFO mlflow.recipes.utils.execution: ingest, split, transform, train, evaluate: No changes. Skipping.


2022/11/12 01:00:59 INFO mlflow.recipes.step: Running step register...
Registered model 'model' already exists. Creating a new version of this model...
2022/11/12 01:00:59 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: model, version 4
Created version '4' of model 'model'.
