<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>FirstImpressionOnMLflow</title>
<style type="text/css">
body {
font-family: Helvetica, arial, sans-serif;
font-size: 14px;
line-height: 1.6;
padding-top: 10px;
padding-bottom: 10px;
background-color: white;
padding: 30px; }
body > *:first-child {
margin-top: 0 !important; }
body > *:last-child {
margin-bottom: 0 !important; }
a {
color: #4183C4; }
a.absent {
color: #cc0000; }
a.anchor {
display: block;
padding-left: 30px;
margin-left: -30px;
cursor: pointer;
position: absolute;
top: 0;
left: 0;
bottom: 0; }
h1, h2, h3, h4, h5, h6 {
margin: 20px 0 10px;
padding: 0;
font-weight: bold;
-webkit-font-smoothing: antialiased;
cursor: text;
position: relative; }
h1:hover a.anchor, h2:hover a.anchor, h3:hover a.anchor, h4:hover a.anchor, h5:hover a.anchor, h6:hover a.anchor {
background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAA09pVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMy1jMDExIDY2LjE0NTY2MSwgMjAxMi8wMi8wNi0xNDo1NjoyNyAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNiAoMTMuMCAyMDEyMDMwNS5tLjQxNSAyMDEyLzAzLzA1OjIxOjAwOjAwKSAgKE1hY2ludG9zaCkiIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OUM2NjlDQjI4ODBGMTFFMTg1ODlEODNERDJBRjUwQTQiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OUM2NjlDQjM4ODBGMTFFMTg1ODlEODNERDJBRjUwQTQiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo5QzY2OUNCMDg4MEYxMUUxODU4OUQ4M0REMkFGNTBBNCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo5QzY2OUNCMTg4MEYxMUUxODU4OUQ4M0REMkFGNTBBNCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PsQhXeAAAABfSURBVHjaYvz//z8DJYCRUgMYQAbAMBQIAvEqkBQWXI6sHqwHiwG70TTBxGaiWwjCTGgOUgJiF1J8wMRAIUA34B4Q76HUBelAfJYSA0CuMIEaRP8wGIkGMA54bgQIMACAmkXJi0hKJQAAAABJRU5ErkJggg==) no-repeat 10px center;
text-decoration: none; }
h1 tt, h1 code {
font-size: inherit; }
h2 tt, h2 code {
font-size: inherit; }
h3 tt, h3 code {
font-size: inherit; }
h4 tt, h4 code {
font-size: inherit; }
h5 tt, h5 code {
font-size: inherit; }
h6 tt, h6 code {
font-size: inherit; }
h1 {
font-size: 28px;
color: black; }
h2 {
font-size: 24px;
border-bottom: 1px solid #cccccc;
color: black; }
h3 {
font-size: 18px; }
h4 {
font-size: 16px; }
h5 {
font-size: 14px; }
h6 {
color: #777777;
font-size: 14px; }
p, blockquote, ul, ol, dl, li, table, pre {
margin: 15px 0; }
hr {
background: transparent url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAYAAAAECAYAAACtBE5DAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyJpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEzNDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBNYWNpbnRvc2giIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OENDRjNBN0E2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OENDRjNBN0I2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo4Q0NGM0E3ODY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo4Q0NGM0E3OTY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PqqezsUAAAAfSURBVHjaYmRABcYwBiM2QSA4y4hNEKYDQxAEAAIMAHNGAzhkPOlYAAAAAElFTkSuQmCC) repeat-x 0 0;
border: 0 none;
color: #cccccc;
height: 4px;
padding: 0;
}
body > h2:first-child {
margin-top: 0;
padding-top: 0; }
body > h1:first-child {
margin-top: 0;
padding-top: 0; }
body > h1:first-child + h2 {
margin-top: 0;
padding-top: 0; }
body > h3:first-child, body > h4:first-child, body > h5:first-child, body > h6:first-child {
margin-top: 0;
padding-top: 0; }
a:first-child h1, a:first-child h2, a:first-child h3, a:first-child h4, a:first-child h5, a:first-child h6 {
margin-top: 0;
padding-top: 0; }
h1 p, h2 p, h3 p, h4 p, h5 p, h6 p {
margin-top: 0; }
li p.first {
display: inline-block; }
li {
margin: 0; }
ul, ol {
padding-left: 30px; }
ul :first-child, ol :first-child {
margin-top: 0; }
dl {
padding: 0; }
dl dt {
font-size: 14px;
font-weight: bold;
font-style: italic;
padding: 0;
margin: 15px 0 5px; }
dl dt:first-child {
padding: 0; }
dl dt > :first-child {
margin-top: 0; }
dl dt > :last-child {
margin-bottom: 0; }
dl dd {
margin: 0 0 15px;
padding: 0 15px; }
dl dd > :first-child {
margin-top: 0; }
dl dd > :last-child {
margin-bottom: 0; }
blockquote {
border-left: 4px solid #dddddd;
padding: 0 15px;
color: #777777; }
blockquote > :first-child {
margin-top: 0; }
blockquote > :last-child {
margin-bottom: 0; }
table {
padding: 0;border-collapse: collapse; }
table tr {
border-top: 1px solid #cccccc;
background-color: white;
margin: 0;
padding: 0; }
table tr:nth-child(2n) {
background-color: #f8f8f8; }
table tr th {
font-weight: bold;
border: 1px solid #cccccc;
margin: 0;
padding: 6px 13px; }
table tr td {
border: 1px solid #cccccc;
margin: 0;
padding: 6px 13px; }
table tr th :first-child, table tr td :first-child {
margin-top: 0; }
table tr th :last-child, table tr td :last-child {
margin-bottom: 0; }
img {
max-width: 100%; }
span.frame {
display: block;
overflow: hidden; }
span.frame > span {
border: 1px solid #dddddd;
display: block;
float: left;
overflow: hidden;
margin: 13px 0 0;
padding: 7px;
width: auto; }
span.frame span img {
display: block;
float: left; }
span.frame span span {
clear: both;
color: #333333;
display: block;
padding: 5px 0 0; }
span.align-center {
display: block;
overflow: hidden;
clear: both; }
span.align-center > span {
display: block;
overflow: hidden;
margin: 13px auto 0;
text-align: center; }
span.align-center span img {
margin: 0 auto;
text-align: center; }
span.align-right {
display: block;
overflow: hidden;
clear: both; }
span.align-right > span {
display: block;
overflow: hidden;
margin: 13px 0 0;
text-align: right; }
span.align-right span img {
margin: 0;
text-align: right; }
span.float-left {
display: block;
margin-right: 13px;
overflow: hidden;
float: left; }
span.float-left span {
margin: 13px 0 0; }
span.float-right {
display: block;
margin-left: 13px;
overflow: hidden;
float: right; }
span.float-right > span {
display: block;
overflow: hidden;
margin: 13px auto 0;
text-align: right; }
code, tt {
margin: 0 2px;
padding: 0 5px;
white-space: nowrap;
border: 1px solid #eaeaea;
background-color: #f8f8f8;
border-radius: 3px; }
pre code {
margin: 0;
padding: 0;
white-space: pre;
border: none;
background: transparent; }
.highlight pre {
background-color: #f8f8f8;
border: 1px solid #cccccc;
font-size: 13px;
line-height: 19px;
overflow: auto;
padding: 6px 10px;
border-radius: 3px; }
pre {
background-color: #f8f8f8;
border: 1px solid #cccccc;
font-size: 13px;
line-height: 19px;
overflow: auto;
padding: 6px 10px;
border-radius: 3px; }
pre code, pre tt {
background-color: transparent;
border: none; }
sup {
font-size: 0.83em;
vertical-align: super;
line-height: 0;
}
kbd {
display: inline-block;
padding: 3px 5px;
font-size: 11px;
line-height: 10px;
color: #555;
vertical-align: middle;
background-color: #fcfcfc;
border: solid 1px #ccc;
border-bottom-color: #bbb;
border-radius: 3px;
box-shadow: inset 0 -1px 0 #bbb
}
* {
-webkit-print-color-adjust: exact;
}
@media screen and (min-width: 914px) {
body {
width: 854px;
margin:0 auto;
}
}
@media print {
table, pre {
page-break-inside: avoid;
}
pre {
word-wrap: break-word;
}
}
</style>
</head>
<body>
<h1 id="toc_0">First Impression on MLflow</h1>
<p><a href="https://mlflow.org/"><strong><em>MLflow</em></strong></a> is one of the latest open source projects added to the <a href="https://spark.apache.org/"><strong>Apache Spark</strong></a> ecosystem by <a href="https://databricks.com/">Databricks</a>. It made its debut at the <a href="https://databricks.com/session/unifying-data-and-ai-for-better-data-products">Spark + AI Summit 2018</a>. The source code is hosted on <a href="https://github.com/databricks/mlflow">GitHub</a> and the project is still in the alpha release stage. The current version is 0.4.1, released on 08/03/2018.</p>
<p>Blogs and meetups from databricks describe <em>MLflow</em> and its roadmap, including <a href="https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html">Introducing MLflow: an Open Source Machine Learning Platform</a> and <a href="https://www.slideshare.net/databricks/mlflow-infrastructure-for-a-complete-machine-learning-life-cycle">MLflow: Infrastructure for a Complete Machine Learning Life Cycle</a>. Users and developers can find useful information to try out <em>MLflow</em> and further contribute to the project.</p>
<p>This blog, however, will dig further and describe some internals of <em>MLflow</em> based on firsthand experience and a study of the source code. It will also suggest areas where <em>MLflow</em> may be improved.</p>
<h2 id="toc_1">What is MLflow</h2>
<p><strong><em>MLflow</em></strong> is positioned as an open source platform for the complete machine learning lifecycle. A complete machine learning lifecycle includes at least raw data ingestion, data analysis and preparation, model training, model evaluation, model deployment and, finally, model maintenance. <em>MLflow</em> is built as a Python package and provides open REST APIs and commands to</p>
<ul>
<li>log important parameters, metrics and other data that matter to the machine learning model</li>
<li>track the environment a model is run on </li>
<li>run any machine learning code in that environment</li>
<li>deploy and export models to various platforms with multiple packaging formats </li>
</ul>
<p><em>MLflow</em> is implemented as several modules, each supporting a specific function.</p>
<h3 id="toc_2">MLflow components</h3>
<p>Currently <em>MLflow</em> has three components, as follows (source: <a href="https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html">Introducing MLflow: an Open Source Machine Learning Platform</a>)
<img src="https://databricks.com/wp-content/uploads/2018/06/mlflow.png" alt="*MLflow* components"></p>
<p>Further description of each component can be found in the blog mentioned above and in the <a href="https://mlflow.org/docs/latest/index.html"><em>MLflow</em> Documentation</a>. The rest of this section gives a high-level overview of the internals and implementation of each component.</p>
<h4 id="toc_3">Tracking</h4>
<p>The <code>Tracking</code> component implements REST APIs and a UI for logging and viewing parameters, metrics, artifacts and sources. The backend is implemented with <a href="http://flask.pocoo.org/">Flask</a> and runs on the <a href="http://gunicorn.org/">gunicorn</a> HTTP server, while the UI is implemented with <a href="https://reactjs.org/">React</a>.</p>
<p>The Python module for tracking is <code>mlflow.tracking</code>.</p>
<p>Every time users train a model on the machine learning platform, <em>MLflow</em> creates a <code>Run</code> and saves the <code>RunInfo</code> metadata to disk. Python APIs are provided to log parameters and metrics for a <code>Run</code>. Outputs of the run, such as the model, are saved as <code>artifacts</code> of the <code>Run</code>. Individual <code>Run</code>s are grouped into an <code>Experiment</code>. The following class diagram shows the classes defined in <em>MLflow</em> to support the tracking function.
<img src="images/mlflowObjects.jpg" alt="*MLflow* objects"></p>
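<p>To make the diagram concrete, the tracking objects can be sketched as simple Python data classes. This is an illustration only; the field names below are invented for this sketch and are not <em>MLflow</em>'s exact definitions.</p>
<div><pre><code class="language-python">from dataclasses import dataclass, field

# Simplified stand-ins for the tracking objects in the diagram above.
@dataclass
class RunInfo:
    run_uuid: str
    experiment_id: int
    status: str = "RUNNING"

@dataclass
class Run:
    info: RunInfo
    params: dict = field(default_factory=dict)    # logged via log_param
    metrics: dict = field(default_factory=dict)   # logged via log_metric
    artifacts: list = field(default_factory=list) # logged via log_artifact

@dataclass
class Experiment:
    experiment_id: int
    name: str
    runs: list = field(default_factory=list)      # Runs grouped by Experiment

exp = Experiment(0, "Default")
run = Run(RunInfo("7003d550", 0))
run.params["alpha"] = "0.4"
exp.runs.append(run)</code></pre></div>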
<p>The model training source code needs to call <em>MLflow</em> APIs to log the data to be tracked: for example, <code>log_metric</code> to log metrics and <code>log_param</code> to log parameters.</p>
<p>The <em>MLflow</em> tracking server currently uses the file system to persist all <code>Experiment</code> data. The directory structure looks like this:</p>
<div><pre><code class="language-none">mlruns
└── 0
├── 7003d550294e4755a65569dd846a7ca6
│ ├── artifacts
│ │ └── test.txt
│ ├── meta.yaml
│ ├── metrics
│ │ └── foo
│ └── params
│ └── param1
└── meta.yaml</code></pre></div>
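<p>The layout above can be reproduced with a few lines of plain Python, which also shows how cheap reading a run back is: it is just a walk over the directory tree. This mimics, but does not use, <em>MLflow</em>'s file store; the run id and file contents below are made up for illustration.</p>
<div><pre><code class="language-python">import tempfile
from pathlib import Path

# Mimic the on-disk layout of the file-based tracking store shown above.
root = Path(tempfile.mkdtemp()) / "mlruns" / "0"         # experiment 0
run = root / "7003d550294e4755a65569dd846a7ca6"          # one Run
for sub in ("artifacts", "metrics", "params"):
    (run / sub).mkdir(parents=True, exist_ok=True)
(root / "meta.yaml").write_text("name: Default\n")       # experiment metadata
(run / "meta.yaml").write_text("status: FINISHED\n")     # run metadata
(run / "params" / "param1").write_text("0.1")            # one file per param
(run / "metrics" / "foo").write_text("1533600000 2.5\n") # timestamp and value

# Reading a run back is just walking the directory tree.
params = {p.name: p.read_text() for p in (run / "params").iterdir()}</code></pre></div>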
<p>Every <code>Run</code> can be viewed through a browser UI that connects to the tracking server.
<img src="images/mlflow-ui.jpg" alt="*MLflow* UI"></p>
<p>Users can search and filter models by <code>metrics</code> and <code>params</code>, and compare and retrieve model details.</p>
<h4 id="toc_4">Projects</h4>
<p>The <code>Projects</code> component defines the specification for how to run the model training code. It includes the platform configuration, the dependencies, the source code, the data and anything else that allows the model training to be executed through <em>MLflow</em>. Following is an example provided by <em>MLflow</em>:</p>
<div><pre><code class="language-none">name: tutorial
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"</code></pre></div>
<p>The <code>mlflow run</code> command looks for the <code>MLproject</code> file for the spec, downloads the dependencies if needed, then runs the model training with the source code and data specified in the <code>MLproject</code>.</p>
<div><pre><code class="language-none">mlflow run mlflow/example/tutorial -P alpha=0.4</code></pre></div>
<p>The <code>MLproject</code> file specifies the command to run the source code; therefore, the source code can be in any language, including Python. Projects can run on many machine learning platforms, including tensorflow, pyspark, scikit-learn and others. If the dependent Python packages are available for download through Anaconda, they can be added to the <code>conda.yaml</code> file and <em>MLflow</em> will set up the packages automatically.</p>
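<p>As a rough sketch of what <code>mlflow run ... -P alpha=0.4</code> has to do with an entry-point spec like the one above, the following merges user-supplied parameters with declared defaults and formats the command string. This is illustrative only; the real CLI does considerably more (type checking, conda setup, path handling).</p>
<div><pre><code class="language-python"># The entry-point spec from the MLproject example, parsed into a dict.
entry_point = {
    "parameters": {
        "alpha": {"type": "float"},
        "l1_ratio": {"type": "float", "default": 0.1},
    },
    "command": "python train.py {alpha} {l1_ratio}",
}

def build_command(spec, user_params):
    """Merge -P overrides with declared defaults, then format the command."""
    values = {}
    for name, decl in spec["parameters"].items():
        if name in user_params:
            values[name] = user_params[name]
        elif "default" in decl:
            values[name] = decl["default"]
        else:
            raise ValueError(f"missing required parameter {name}")
    return spec["command"].format(**values)

# Equivalent of: mlflow run ... -P alpha=0.4
cmd = build_command(entry_point, {"alpha": 0.4})</code></pre></div>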
<h4 id="toc_5">Models</h4>
<p>The <code>Models</code> component defines the general model format in the <code>MLmodel</code> file, as follows:</p>
<div><pre><code class="language-none">artifact_path: model
flavors:
  python_function:
    data: model.pkl
    loader_module: mlflow.sklearn
  sklearn:
    pickled_model: model.pkl
    sklearn_version: 0.19.1
run_id: 0927ac17b2954dc0b4d944e6834817fd
utc_time_created: '2018-08-06 18:38:16.294557'</code></pre></div>
<p>It specifies different <code>flavors</code> for different tools to deploy and load the model. This allows the model to be saved in the original binary persistence format output by the platform that trained it. For example, in scikit-learn the model is serialized with the Python <code>pickle</code> package. The model can then be deployed to an environment which understands this format. With the <code>sklearn</code> flavor, if the environment has scikit-learn installed, it can directly load the model and serve it. Otherwise, with the <code>python_function</code> flavor, <em>MLflow</em> provides the <code>mlflow.sklearn</code> Python module as a helper to load the model.</p>
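<p>The flavor mechanism can be illustrated with a short sketch (not <em>MLflow</em>'s actual code): given the MLmodel contents above parsed into a dict, a deployment tool picks the most specific flavor it can handle and falls back to <code>python_function</code> otherwise.</p>
<div><pre><code class="language-python"># The MLmodel example above, represented as a parsed dict.
mlmodel = {
    "artifact_path": "model",
    "flavors": {
        "python_function": {"data": "model.pkl",
                            "loader_module": "mlflow.sklearn"},
        "sklearn": {"pickled_model": "model.pkl",
                    "sklearn_version": "0.19.1"},
    },
}

def pick_flavor(mlmodel, preferred):
    """Return the first flavor in `preferred` that the model provides."""
    for name in preferred:
        if name in mlmodel["flavors"]:
            return name, mlmodel["flavors"][name]
    raise ValueError("no supported flavor found")

# An environment with scikit-learn installed loads the native flavor...
name, conf = pick_flavor(mlmodel, ["sklearn", "python_function"])
# ...while a generic environment falls back to python_function.
generic, _ = pick_flavor(mlmodel, ["python_function"])</code></pre></div>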
<p>So far <em>MLflow</em> supports model loading, saving and deployment with the scikit-learn, tensorflow, sagemaker, h2o, azure and spark platforms.</p>
<p>With <em>MLflow</em>’s modular design, the current <code>Tracking</code>, <code>Projects</code> and <code>Models</code> components touch most parts of the machine learning lifecycle. Users can also choose to use one component without the others if they like. With its REST APIs, these components can also be easily integrated into other machine learning workflows.</p>
<h2 id="toc_6">Experiencing MLflow</h2>
<p>Installing <em>MLflow</em> is quick and easy if <a href="https://anaconda.org/">Anaconda</a> has been installed and a virtual env has been created. <code>pip install mlflow</code> will install the latest <em>MLflow</em> release.</p>
<p>To train the model with <code>tensorflow</code>, run <code>pip install tensorflow</code> to install the latest version of <code>tensorflow</code>.</p>
<p>Here is a simple example that trains a tensorflow model:
<a href="https://github.com/adrian555/DocsDump/files/tf-example/tf-example.py">tf-example.py</a></p>
<div><pre><code class="language-python">import tensorflow as tf
from tensorflow import keras
import numpy as np
import mlflow
from mlflow import tracking
# load dataset
dataset = np.loadtxt("/Users/wzhuang/housing.csv", delimiter=",")
# save the data as artifact
mlflow.log_artifact("/Users/wzhuang/housing.csv")
# split the features and label
X = dataset[:, 0:15]
Y = dataset[:, 15]
# define the model
first_layer_dense = 64
second_layer_dense = 64
model = keras.Sequential([
    keras.layers.Dense(first_layer_dense, activation=tf.nn.relu,
                       input_shape=(X.shape[1],)),
    keras.layers.Dense(second_layer_dense, activation=tf.nn.relu),
    keras.layers.Dense(1)
])
# log some parameters
mlflow.log_param("First_layer_dense", first_layer_dense)
mlflow.log_param("Second_layer_dense", second_layer_dense)
optimizer = tf.train.RMSPropOptimizer(0.001)
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['mae'])
# train
model.fit(X, Y, epochs=500, validation_split=0.2, verbose=0)
# log the model artifact
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
mlflow.log_artifact("model.json")</code></pre></div>
<p>The first call to a <code>tracking</code> API starts a run, and the data sent through the current and subsequent API calls are logged against it. The logged data can then be viewed in the <em>MLflow</em> UI. As the example above shows, it is quite easy to call the logging APIs at any place users want to track.</p>
<p>Packaging this project is also very simple: just create an <a href="https://github.com/adrian555/DocsDump/files/tf-example/MLproject">MLproject</a> file such as:</p>
<div><pre><code class="language-none">name: tf-example
conda_env: conda.yaml
entry_points:
  main:
    command: "python tf-example.py"</code></pre></div>
<p>with <a href="https://github.com/adrian555/DocsDump/files/tf-example/conda.yaml">conda.yaml</a></p>
<div><pre><code class="language-none">name: tf-example
channels:
- defaults
dependencies:
- python=3.6
- numpy=1.14.3
- pip:
  - mlflow
  - tensorflow</code></pre></div>
<p>Then <code>mlflow run tf-example</code> will run the project in any environment. It first creates a <code>conda</code> environment with the required Python packages installed and then runs <a href="https://github.com/adrian555/DocsDump/files/tf-example/tf-example.py">tf-example.py</a> inside that virtual env. As expected, the run results are also logged to the <em>MLflow</em> tracking server.</p>
<p><em>MLflow</em> also comes with a server implementation where <code>sklearn</code> and other types of models can be deployed and served. The <a href="https://github.com/mlflow/mlflow"><em>MLflow</em> GitHub README.md</a> illustrates the usage. However, deploying and serving the model built by the above example requires new code that understands Keras models, which is beyond this blog’s scope.</p>
<p>To summarize, the experience with <em>MLflow</em> was smooth. There were several bugs here and there, but overall the project delivers what it claims. Of course, since <em>MLflow</em> is still in its alpha phase, bugs and missing features are to be expected.</p>
<h2 id="toc_7">Where <em>MLflow</em> can be enhanced</h2>
<p><em>MLflow</em> so far provides an open source solution to track the data science process and to package and deploy machine learning models. As it claims, it targets the management of the machine learning lifecycle. The current alpha version releases the <code>Tracking</code>, <code>Projects</code> and <code>Models</code> components, which tackle individual stages of the machine learning workflow. The tool is compactly implemented in Python while providing APIs and a UI that can be integrated with any machine learning platform easily.</p>
<p>However, there are still many areas where <em>MLflow</em> may be improved. There are also new features required for the tool to fully manage and monitor all aspects of the machine learning lifecycle.</p>
<p>At the Databricks <a href="https://databricks.com/blog/2018/07/25/bay-area-apache-spark-meetup-summary-databricks-hq.html">meetup</a> on 07/19/2018, several items were mentioned in the longer-term roadmap of <em>MLflow</em>, according to the <a href="https://www.slideshare.net/databricks/mlflow-infrastructure-for-a-complete-machine-learning-life-cycle">presentation</a>. They fall into four categories: improving the current components, a new MLflow Data component, hyperparameter tuning, and language and library integrations. Some items are important enough to deserve extra explanation.</p>
<p>Implementing a database backend for the <code>Tracking</code> component is included in the first category. As mentioned above, the <em>MLflow</em> tracking server logs every run's info to the local file system. This looks like a quick-and-easy implementation, but a better solution would be to use a database as the tracking store: as the number of machine learning runs grows, a database has obvious advantages for data queries and retrieval.</p>
<p>Model metadata support is also included in the first category, and it is extremely important. The current <code>Tracking</code> component does not describe the model, and all runs are viewed as a flat list ordered by date. The tool allows searching on parameters and metrics, but that is far from enough. Users certainly would like to quickly retrieve models by model name, algorithm, platform, etc. This requires metadata input when a model training run is tracked. The tracking server logs the file name of the source code, which does not help identify a model; instead, users should be able to input a description of the model. Furthermore, access control is also essential and can be part of the metadata, and model management should have versioning support.</p>
<p>In the second category, <em>MLflow</em> will introduce a new <code>Data</code> component. It will build on top of <a href="https://spark.apache.org/">Spark</a>’s Data Source API and allow projects to load data from many formats. This can be viewed as an effort to tighten <em>MLflow</em>'s relationship with <em>Spark</em>. A further step would of course be maintaining metadata for the data as well.</p>
<p>In the fourth category, integration with R and Java is also important. Although Python has become the most widely adopted language in machine learning, many data scientists still use R and other languages. <em>MLflow</em> needs to provide R and Java APIs so those machine learning workflows can be managed as well.</p>
<p>There are other important features not included in the current roadmap. From this blog’s viewpoint, the following items are also desirable and may help complete <em>MLflow</em> as a full machine learning data and model management tool.</p>
<ul>
<li><p>Register APIs<br>
<em>MLflow</em> provides APIs to log run info. These APIs have to be called inside the model training source code, and they are invoked at runtime. This becomes inconvenient when users want to track previous runs that did not call these APIs, or runs whose source code is not accessible. To solve this problem, a set of REST APIs that can be called after a run to register the run info would be very helpful. The run info, such as parameters, metrics and artifacts, could be part of a JSON input.</p></li>
<li><p>UI view enhancement<br>
In the <code>Experiments</code> UI view, the <code>Parameters</code> and <code>Metrics</code> columns display all parameters and metrics for all runs. The rows become unwieldy and difficult to view as more types of parameters and metrics are tracked. Instead, for each run, the view should just display a hyperlink to the detailed run info, where the parameters and metrics are shown for that run only.</p></li>
<li><p>Artifact location<br>
<em>MLflow</em> can take artifacts from either the local file system or GitHub. It would be a great improvement to support loading and saving data, source code and models from other sources, such as S3 object storage, HDFS, Nexus, etc.</p></li>
<li><p>Import and export<br>
Once the tracking store is implemented with a database backend, the next step will be to support importing and exporting experiments across different databases.</p></li>
<li><p>Run projects remotely<br>
The <code>Projects</code> component specifies the command to run the project, and the command is displayed in the tracking UI. But since the project may only run on a specific machine learning platform, which can be different from the tracking server, users still have to connect to that platform remotely and issue the command line. The <code>MLproject</code> specification should include the platform information, such as hostname and credentials. With this info, the tracking UI could add an action to kick off the run through the UI.</p></li>
<li><p>Tuning<br>
Adding parameter tuning functionality through the tracking UI would be an important feature. Users would be allowed to change the parameters and kick off a run, provided the project is tracked by the <code>Projects</code> component.</p></li>
<li><p>Common model format<br>
The <code>Models</code> component defines <code>flavors</code> for a model. However, every model is still stored in its original format, understood only by the tool that trained it, so there is still a gap between model development and production. The <a href="http://dmg.org/pfa/docs/motivation/">Portable Format for Analytics</a> (PFA) is a specification that can help bridge this gap. <code>MLmodel</code> could be improved to understand PFA and/or convert models into PFA, making it easy to deploy models to PFA-enabled platforms.</p></li>
<li><p>Pipeline integration<br>
A complete machine learning lifecycle also includes data preparation and other pipelines. <em>MLflow</em> so far only tracks the training step. The <code>MLproject</code> spec may be enhanced to include the specification of other pipelines. Some pipelines may be shared among projects as well.</p></li>
</ul>
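<p>To make the first item on this list concrete, a post-run registration call might carry a payload like the following. The endpoint and field names here are entirely hypothetical, invented for illustration; no such API exists in <em>MLflow</em> today.</p>
<div><pre><code class="language-none">POST /api/register-run        (hypothetical endpoint)
{
  "experiment_id": 0,
  "source_name": "tf-example.py",
  "params": {"First_layer_dense": 64, "Second_layer_dense": 64},
  "metrics": {"mae": 2.5},
  "artifacts": ["model.json"]
}</code></pre></div>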
</body>
</html>