-
Notifications
You must be signed in to change notification settings - Fork 3
/
RMLproject.html
433 lines (353 loc) · 15 KB
/
RMLproject.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>RMLproject</title>
<style type="text/css">
body {
font-family: Helvetica, arial, sans-serif;
font-size: 14px;
line-height: 1.6;
padding-top: 10px;
padding-bottom: 10px;
background-color: white;
padding: 30px; }
body > *:first-child {
margin-top: 0 !important; }
body > *:last-child {
margin-bottom: 0 !important; }
a {
color: #4183C4; }
a.absent {
color: #cc0000; }
a.anchor {
display: block;
padding-left: 30px;
margin-left: -30px;
cursor: pointer;
position: absolute;
top: 0;
left: 0;
bottom: 0; }
h1, h2, h3, h4, h5, h6 {
margin: 20px 0 10px;
padding: 0;
font-weight: bold;
-webkit-font-smoothing: antialiased;
cursor: text;
position: relative; }
h1:hover a.anchor, h2:hover a.anchor, h3:hover a.anchor, h4:hover a.anchor, h5:hover a.anchor, h6:hover a.anchor {
background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAA09pVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMy1jMDExIDY2LjE0NTY2MSwgMjAxMi8wMi8wNi0xNDo1NjoyNyAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNiAoMTMuMCAyMDEyMDMwNS5tLjQxNSAyMDEyLzAzLzA1OjIxOjAwOjAwKSAgKE1hY2ludG9zaCkiIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OUM2NjlDQjI4ODBGMTFFMTg1ODlEODNERDJBRjUwQTQiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OUM2NjlDQjM4ODBGMTFFMTg1ODlEODNERDJBRjUwQTQiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo5QzY2OUNCMDg4MEYxMUUxODU4OUQ4M0REMkFGNTBBNCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo5QzY2OUNCMTg4MEYxMUUxODU4OUQ4M0REMkFGNTBBNCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PsQhXeAAAABfSURBVHjaYvz//z8DJYCRUgMYQAbAMBQIAvEqkBQWXI6sHqwHiwG70TTBxGaiWwjCTGgOUgJiF1J8wMRAIUA34B4Q76HUBelAfJYSA0CuMIEaRP8wGIkGMA54bgQIMACAmkXJi0hKJQAAAABJRU5ErkJggg==) no-repeat 10px center;
text-decoration: none; }
h1 tt, h1 code {
font-size: inherit; }
h2 tt, h2 code {
font-size: inherit; }
h3 tt, h3 code {
font-size: inherit; }
h4 tt, h4 code {
font-size: inherit; }
h5 tt, h5 code {
font-size: inherit; }
h6 tt, h6 code {
font-size: inherit; }
h1 {
font-size: 28px;
color: black; }
h2 {
font-size: 24px;
border-bottom: 1px solid #cccccc;
color: black; }
h3 {
font-size: 18px; }
h4 {
font-size: 16px; }
h5 {
font-size: 14px; }
h6 {
color: #777777;
font-size: 14px; }
p, blockquote, ul, ol, dl, li, table, pre {
margin: 15px 0; }
hr {
background: transparent url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAYAAAAECAYAAACtBE5DAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyJpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEzNDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBNYWNpbnRvc2giIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OENDRjNBN0E2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OENDRjNBN0I2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo4Q0NGM0E3ODY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo4Q0NGM0E3OTY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PqqezsUAAAAfSURBVHjaYmRABcYwBiM2QSA4y4hNEKYDQxAEAAIMAHNGAzhkPOlYAAAAAElFTkSuQmCC) repeat-x 0 0;
border: 0 none;
color: #cccccc;
height: 4px;
padding: 0;
}
body > h2:first-child {
margin-top: 0;
padding-top: 0; }
body > h1:first-child {
margin-top: 0;
padding-top: 0; }
body > h1:first-child + h2 {
margin-top: 0;
padding-top: 0; }
body > h3:first-child, body > h4:first-child, body > h5:first-child, body > h6:first-child {
margin-top: 0;
padding-top: 0; }
a:first-child h1, a:first-child h2, a:first-child h3, a:first-child h4, a:first-child h5, a:first-child h6 {
margin-top: 0;
padding-top: 0; }
h1 p, h2 p, h3 p, h4 p, h5 p, h6 p {
margin-top: 0; }
li p.first {
display: inline-block; }
li {
margin: 0; }
ul, ol {
padding-left: 30px; }
ul :first-child, ol :first-child {
margin-top: 0; }
dl {
padding: 0; }
dl dt {
font-size: 14px;
font-weight: bold;
font-style: italic;
padding: 0;
margin: 15px 0 5px; }
dl dt:first-child {
padding: 0; }
dl dt > :first-child {
margin-top: 0; }
dl dt > :last-child {
margin-bottom: 0; }
dl dd {
margin: 0 0 15px;
padding: 0 15px; }
dl dd > :first-child {
margin-top: 0; }
dl dd > :last-child {
margin-bottom: 0; }
blockquote {
border-left: 4px solid #dddddd;
padding: 0 15px;
color: #777777; }
blockquote > :first-child {
margin-top: 0; }
blockquote > :last-child {
margin-bottom: 0; }
table {
padding: 0;border-collapse: collapse; }
table tr {
border-top: 1px solid #cccccc;
background-color: white;
margin: 0;
padding: 0; }
table tr:nth-child(2n) {
background-color: #f8f8f8; }
table tr th {
font-weight: bold;
border: 1px solid #cccccc;
margin: 0;
padding: 6px 13px; }
table tr td {
border: 1px solid #cccccc;
margin: 0;
padding: 6px 13px; }
table tr th :first-child, table tr td :first-child {
margin-top: 0; }
table tr th :last-child, table tr td :last-child {
margin-bottom: 0; }
img {
max-width: 100%; }
span.frame {
display: block;
overflow: hidden; }
span.frame > span {
border: 1px solid #dddddd;
display: block;
float: left;
overflow: hidden;
margin: 13px 0 0;
padding: 7px;
width: auto; }
span.frame span img {
display: block;
float: left; }
span.frame span span {
clear: both;
color: #333333;
display: block;
padding: 5px 0 0; }
span.align-center {
display: block;
overflow: hidden;
clear: both; }
span.align-center > span {
display: block;
overflow: hidden;
margin: 13px auto 0;
text-align: center; }
span.align-center span img {
margin: 0 auto;
text-align: center; }
span.align-right {
display: block;
overflow: hidden;
clear: both; }
span.align-right > span {
display: block;
overflow: hidden;
margin: 13px 0 0;
text-align: right; }
span.align-right span img {
margin: 0;
text-align: right; }
span.float-left {
display: block;
margin-right: 13px;
overflow: hidden;
float: left; }
span.float-left span {
margin: 13px 0 0; }
span.float-right {
display: block;
margin-left: 13px;
overflow: hidden;
float: right; }
span.float-right > span {
display: block;
overflow: hidden;
margin: 13px auto 0;
text-align: right; }
code, tt {
margin: 0 2px;
padding: 0 5px;
white-space: nowrap;
border: 1px solid #eaeaea;
background-color: #f8f8f8;
border-radius: 3px; }
pre code {
margin: 0;
padding: 0;
white-space: pre;
border: none;
background: transparent; }
.highlight pre {
background-color: #f8f8f8;
border: 1px solid #cccccc;
font-size: 13px;
line-height: 19px;
overflow: auto;
padding: 6px 10px;
border-radius: 3px; }
pre {
background-color: #f8f8f8;
border: 1px solid #cccccc;
font-size: 13px;
line-height: 19px;
overflow: auto;
padding: 6px 10px;
border-radius: 3px; }
pre code, pre tt {
background-color: transparent;
border: none; }
sup {
font-size: 0.83em;
vertical-align: super;
line-height: 0;
}
kbd {
display: inline-block;
padding: 3px 5px;
font-size: 11px;
line-height: 10px;
color: #555;
vertical-align: middle;
background-color: #fcfcfc;
border: solid 1px #ccc;
border-bottom-color: #bbb;
border-radius: 3px;
box-shadow: inset 0 -1px 0 #bbb
}
* {
-webkit-print-color-adjust: exact;
}
@media screen and (min-width: 914px) {
body {
width: 854px;
margin:0 auto;
}
}
@media print {
table, pre {
page-break-inside: avoid;
}
pre {
word-wrap: break-word;
}
}
</style>
</head>
<body>
<h1 id="toc_0">Set up and run a R project in MLflow</h1>
<p>In this <a href="https://medium.com/@dsml4real/tracking-machine-learning-models-in-r-with-mlflow-9ccce342ce91">story</a> I have described how to use <a href="https://mlflow.org/"><em>MLflow</em></a> to track machine learning model training. <em>MLflow</em> also comes with a <a href="https://mlflow.org/docs/latest/projects.html"><code>Projects</code></a> component that packs data, source code with commands, parameters and execution environment setup together as a self-contained specification. Once a <em>MLproject</em> is defined, users can run it everywhere. Currently <em>MLproject</em> can run Python code or shell command. It can also set up the Python environment for the project specified in the <code>conda.yaml</code> file defined by users.</p>
<p>For R users, it is common to load some packages in the R source codes. These packages need to installed for the R code to run. In the future, it could be a good enhancement for <em>MLflow</em> to add something similar to <code>conda.yaml</code> to set up R package dependencies. But we do not have to wait for it. I will show how to create a <em>MLproject</em> containing R source code and run it with <code>mlflow run</code> command.</p>
<p>First, create a directory and copy the data and R source codes to that directory. For example,</p>
<div><pre><code class="language-text">.
└── R
├── MLproject
├── prep.R
├── rpart-example.R
└── wine-quality.csv</code></pre></div>
<p>In this example, the data to be learned is <a href="https://github.com/adrian555/DocsDump/raw/dev/files/mlflow-projects/R/wine-quality.csv"><code>wine-quality.csv</code></a>. The example is to run the <a href="https://github.com/adrian555/DocsDump/raw/dev/files/mlflow-projects/R/rpart-example.R"><code>rpart-example.R</code></a> to fit a tree model:</p>
<div><pre><code class="language-r"># Source prep.R file to install the dependencies
source("prep.R")
# Import mlflow python package for tracking
library(reticulate)
mlflow <- import("mlflow")
# Load rpart to build a tree model
library(rpart)
# Read in data
wine <- read.csv("wine-quality.csv")
# Build the model
fit <- rpart(quality ~ ., wine)
# Save the model that can be loaded later
saveRDS(fit, "fit.rpart")
# Save the model to mlflow tracking server
mlflow$log_artifact("fit.rpart")
# Plot
jpeg("rplot.jpg")
par(xpd=TRUE)
plot(fit)
text(fit, use.n=TRUE)
dev.off()
# Save the plot to mlflow tracking server
mlflow$log_artifact("rplot.jpg")</code></pre></div>
<p>The R code above includes three parts: the model training, the artifacts logging through <em>MLflow</em>, and the R package dependencies installation. In this example, these two R packages, <code>reticulate</code> and <code>rpart</code>, are required for the code to run. To pack these codes into a self-contained project, some sort of script should be run to automatically install these packages if the platform does not have them installed. </p>
<p>With our approach, any specific R package needed for the project is going to be installed through <a href="https://github.com/adrian555/DocsDump/raw/dev/files/mlflow-projects/R/prep.R"><code>prep.R</code></a> with these codes:</p>
<div><pre><code class="language-r"># Accept parameters, args[6] is the R package repo url
args <- commandArgs()
# All installed packages
pkgs <- installed.packages()
# List of required packages for this project
reqs <- c("reticulate", "rpart")
# Try to install the dependencies if not installed
sapply(reqs, function(x){
if (!x %in% rownames(pkgs)) {
install.packages(x, repos=c(args[6]))
}
})</code></pre></div>
<p>Before packaging these into a <em>MLproject</em>, try to test by directly invoking <code>Rscript</code> command as follow:</p>
<div><pre><code class="language-commandline">Rscript rpart-example.R https://cran.r-project.org/</code></pre></div>
<p>From the <em>MLflow</em> UI, you should see this run been tracked like this screen snapshot:<img src="https://github.com/adrian555/DocsDump/raw/dev/images/r-mlproject-bare.png" alt="snapshot"></p>
<p>Now let’s write the spec and pack this project into a <em>MLproject</em> that <em>MLflow</em> knows to run. All needed to be done is creating the <a href="https://github.com/adrian555/DocsDump/raw/dev/files/mlflow-projects/R/MLproject"><code>MLproject</code></a> file in the same directory.</p>
<div><pre><code class="language-yaml">name: r_example
entry_points:
main:
parameters:
r-repo: {type: string, default: "https://cran.r-project.org/"}
command: "Rscript rpart-example.R {r-repo}"</code></pre></div>
<p>In this file, it defines a <code>r_example</code> project with a <code>main</code> entry point. The entry point specifies the command and parameters to be executed by the <code>mlflow run</code>. For this project, <code>Rscript</code> is the shell command to invoke the R source code. <code>r-repo</code> parameter provides the URL string where the dependent packages can be installed from. A default value is set. This parameter is passed to the command running the R source code.</p>
<p>Now that all are set so the project can be checked in and pushed to github repository. With following command, it can be run on any platform that has R installed.</p>
<div><pre><code class="language-commandline">mlflow run https://github.com/adrian555/DocsDump#files/mlflow-projects/R</code></pre></div>
<p>The project can also be viewed from the <em>MLflow</em> tracking UI like this screen snapshot: <img src="https://github.com/adrian555/DocsDump/raw/dev/images/r-mlproject.png" alt="snapshot-project"></p>
<p>The differences between this view and previous run without <code>Mlproject</code> spec are the <code>Run Command</code> which captures the exact command to run the project, and the <code>Parameters</code> which automatically logs any parameters passed to entry points.</p>
<p>This is exactly what <code>Projects</code> component of <em>MLflow</em> is designed for, to define the project and make it easily to be rerun. R users can quickly set up their projects and enjoy the easiness of tracking and running projects with <em>MLflow</em> once going through this example.</p>
</body>
</html>