-
Notifications
You must be signed in to change notification settings - Fork 0
/
sharing_and_publishing.Rmd
447 lines (371 loc) · 20.3 KB
/
sharing_and_publishing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
---
title: "03 Sharing and Using Trained AI/Models"
author: "Florian Berding, Julia Pargmann, Andreas Slopinski, Elisabeth Riebenbauer, Karin Rebmann"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{03 Sharing and Using Trained AI/Models}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup, include = FALSE}
library(aifeducation)
```
# 1 Introduction
In the educational and social sciences, it is a common practice to share
research instruments such as questionnaires or tests. For example, the
[Open Test Archive](https://www.testarchiv.eu/en) provides researchers
and practitioners access to a large number of open access instruments.
*aifeducation* assumes AI-based classifiers
should be shareable, similarly to research instruments, to empower
educational and social science researchers and to support the
application of AI for educational purposes. Thus, *aifeducation* aims to
make the sharing process as easy as possible.
For this aim, every object generated with *aifeducation* can be prepared
for publication in a few basic steps. In this vignette, we would like to
show you how to make your AI ready for publication and how to use models
from other persons.
Now we will start with a guide on preparing text embedding models.
# 2 Text Embedding Models
## 2.1 Adding Model Descriptions
Every object of class `TextEmbeddingModel` comes with several methods
allowing you to provide important information for potential users of
your model.
First, every model needs a clear description how it was developed,
modified and how it can be used. You can add a description via the
method `set_model_description`.
```{r, include = TRUE, eval=FALSE}
example_model$set_model_description(
eng=NULL,
native=NULL,
abstract_eng=NULL,
abstract_native=NULL,
keywords_eng=NULL,
keywords_native=NULL)
```
This method allows you to provide a description in English and in the
native language of your model to make the distribution of your model
easier. You can write your description in HTML which allows you to
add links to other sources or publications, to add tables or to
highlight important aspects of your model.
**We would like to recommend that you write at least an English
description to allow a wider community to recognize your work**.
Furthermore, your description should include:
- Which kind of data was used to create the model.
- How much data was used to create the model.
- Which steps were performed and which method was used.
- For which kinds of tasks or materials the model can be used.
With `abstract_eng` and `abstract_native` you can provide a summary of
your description. This is very important if you would like to share your
work on a repository. With `keywords_eng` and `keywords_native` you can
set a vector of keywords which help to find your work with search
engines. We would like to recommend that you at least provide this
information in English.
You can access a model's description by using the method
`get_model_description`
```{r, include = TRUE, eval=FALSE}
example_model$get_model_description()
```
Besides a description of your work it is necessary to provide
information about other people who were involved in creating a model.
This can be done with the method `set_publication_info`.
```{r, include = TRUE, eval=FALSE}
example_model$set_publication_info(
type,
authors,
citation,
url=NULL)
```
First of all you have to decide the type of information you would like
to add. You have two choices: "developer", and "modifier",
which you set with `type`.
`type="developer"` stores all information about the people involved and
the process of developing the model. If you use a transformer model from
[Hugging Face](https://huggingface.co/), the people and their description
of the model should be entered as developers. In all other cases you can
use this type for providing a description of how you developed the
model.
In some cases you might wish to modify an existing model. This might be
the case if you use a transformer model and you adapt the model to a
specific domain or task. In this case you rely on the work of other
people and modify their work. In these cases you can describe your
modifications by setting `type=modifier`.
For every type of person you can add the relevant individuals via
`authors`. Please use the *R*'s function `personList()` for this. With
`citation`you can provide a free text how to cite the work of the
different persons. With `url` you can provide a link to relevant sites
of the model.
You can access the information by using `get_publication_info`.
```{r, include = TRUE, eval=FALSE}
example_model$get_publication_info()
```
Finally, you must provide a license for using your model. This can be
done with `set_software_license` and `get_software_license`.
```{r, include = TRUE, eval=FALSE}
example_model$set_software_license("GPL-3")
```
Please note, that in most cases the license for your model must be "GPL-3" since
some of the software used to create your model are licensed under "GPL-3". Thus,
derivative work must also be licensed under "GPL-3".
The documentation of your work is not part of the software. Here you can set
other licenses such as Creative Common (CC) or Free Documentation License (FDL).
You can set the license for your documentation by using the method
'set_documentation_license'.
```{r, include = TRUE, eval=FALSE}
example_model$set_documentation_license("CC BY-SA")
```
Now you are able to share your work. Please remember to save your now
fully described object as described in the following section 2.2.
## 2.2 Saving and Loading
Saving a created text embedding model is very easy by using the
function `save_ai_model`. This function provides a unique interface for all
text embedding models. For saving your work you can pass your model to `model` and
the directory where to save the model to `model_dir`. Please do only pass
the path of a directory and not the path of a file to this function. Internally
the function creates a new folder in the directory where all files belonging
to a model are stored.
```{r, include = TRUE, eval=FALSE}
save_ai_model(
model=topic_modeling,
model_dir="text_embedding_models",
append_ID=FALSE)
save_ai_model(
model=global_vector_clusters_modeling,
model_dir="text_embedding_models",
append_ID=FALSE)
save_ai_model(
model=bert_modeling,
model_dir="text_embedding_models",
append_ID=FALSE)
```
As you can see all three text embedding models are saved within the same
directory named "text_embedding_models". Within this directory the function
creates a unique folder for every model. The name of this folder is
specified with `dir_name`.
If you set `dir_name=NULL` and `append_ID=FALSE` the the name of the folder is created
by using the models' names.
If you change the argument `append_ID` to `append_ID=TRUE` and set `dir_name=NULL`
the unique ID of the model is added to the directory. The ID is added automatically to ensure
that every model has a unique name. This is important if you would like to share your
work with other persons.
If you want to load your model, just call the function `load_ai_model` and you can continue
using your model. The following code assumes that you have specified `dir_name` manually.
```{r, include = TRUE, eval=FALSE}
topic_modeling<-load_ai_model(
model_dir="text_embedding_models/model_topic_modeling",
ml_framework=aifeducation_config$get_framework())
global_vector_clusters_modeling<-load_ai_model(
model_dir="text_embedding_models/model_global_vectors",
ml_framework=aifeducation_config$get_framework())
bert_modeling<-load_ai_model(
model_dir="text_embedding_models/model_transformer_bert",
ml_framework=aifeducation_config$get_framework())
```
In the case you set `dir_name=NULL` and `append_ID=TRUE` loading the models may
look as shown in the following code snippet:
```{r, include = TRUE, eval=FALSE}
topic_modeling<-load_ai_model(
model_dir="text_embedding_models/topic_model_embedding_ID_DfO25E1Guuaqw7tM")
global_vector_clusters_modeling<-load_ai_model(
model_dir="text_embedding_models/global_vector_clusters_embedding_ID_5Tu8HFHegIuoW14l")
bert_modeling<-load_ai_model(
model_dir="text_embedding_models/bert_embedding_ID_CmyAQKtts5RdlLaS")
```
**Please note that you have to add the name of the model to the directory path.**
In our example we have stored three models in the directory "text_embedding_models". Each
model is saved within its own folder. The folder's name is created automatically
with the help of the name of the model. Thus, for loading a model you must specify
which model you want to load by adding the model's name to the directory path as
shown above.
At this point you may wonder why there is an ID in model's name although you
did not enter an ID on model's creation. The ID is added automatically to ensure
that every model has a unique name. This is important if you would like to share your
work with other persons. During saving the ID is appended automatically by
setting `append_ID=TRUE`.
Now you are ready to share your work. Just provide all files within your model
folder. For the BERT model in the example above this would be the folder
`"text_embedding_models/model_transformer_bert"` or
`"text_embedding_models/bert_embedding_ID_CmyAQKtts5RdlLaS"` depending on how
you saved the model.
# 3 Classifiers
## 3.1 Adding Model Descriptions
Adding the model description of a classifier is similar to
`TextEmbeddingModel`s. With the methods `set_model_description` and
`get_model_description` you can provide a detailed description
(parameter `eng` and `native`) of your classifier in English and the
native language of your classifier. With `abstract_eng` and
`abstract_native` you can provide the corresponding abstract of your
descriptions while `keywords_eng` and `keywords_native` take a vector
with the corresponding keywords.
```{r, include = TRUE, eval=FALSE}
example_classifier$set_model_description(
eng="This classifier targets the realization of the need for competence from
the self-determination theory of motivation by Deci and Ryan in lesson plans
and materials. It describes a learner’s need to perceive themselves as capable.
In this classifier, the need for competence can take on the values 0 to 2.
A value of 0 indicates that the learners have no space in the lesson plan to
perceive their own learning progress and that there is no possibility for
self-comparison. At level 1, competence growth is made visible implicitly,
e.g. by demonstrating the ability to carry out complex exercises or peer
control. At level 2, the increase in competence is made explicit by giving
each learner insights into their progress towards the competence goal. For
example, a comparison between the target vs. actual development towards the
learning objectives of the lesson can be made, or the learners receive
explicit feedback on their competence growth from the teacher. Self-assessment
is also possible. The classifier was trained using 790 lesson plans, 298
materials and up to 1,400 textbook tasks. Two people who received coding
training were involved in the coding and the inter-coder reliability for the
need for competence increased from a dynamic iota value of 0.615 to 0.646 over
two rounds of training. The Krippendorffs alpha value, on the other hand,
decreased from 0.516 to 0.484. The classifier is suitable for use in all
settings in which lesson plans and materials are to be reviewed with regard
to their implementation of the need for competence.",
native="Dieser Classifier bewertet Unterrichtsentwürfe und Lernmaterial danach,
ob sie das Bedürfnis nach Kompetenzerleben aus der Selbstbestimmungstheorie
der Motivation nach Deci und Ryan unterstützen. Das Kompetenzerleben stellt
das Bedürfnis dar, sich als wirksam zu erleben. Der Classifer unterteilt es
in drei Stufen, wobei 0 bedeutet, dass die Lernenden im Unterrichtsentwurf
bzw. Material keinen Raum haben, ihren eigenen Lernfortschritt wahrzunehmen
und auch keine Möglichkeit zum Selbstvergleich besteht. Bei einer Ausprägung
von 1 wird der Kompetenzzuwachs implizit, also z.B. durch die Durchführung
komplexer Übungen oder einer Peer-Kontrolle ermöglicht. Auf Stufe 2 wird der
Kompetenzzuwachs explizit aufgezeigt, indem jede:r Lernende einen objektiven
Einblick erhält. So kann hier bspw. ein Soll-Ist-Vergleich mit den Lernzielen
der Stunde erfolgen oder die Lernenden erhalten dezidiertes Feedback zu ihrem
Kompetenzzuwachs durch die Lehrkraft. Auch eine Selbstbewertung ist möglich.
Der Classifier wurde anhand von 790 Unterrichtsentwürfen, 298 Materialien und
bis zu 1400 Schulbuchaufgaben traniert. Es waren an der Kodierung zwei Personen
beteiligt, die eine Kodierschulung erhalten haben und die
Inter-Coder-Reliabilität für das Kompetenzerleben würde über zwei
Trainingsrunden von einem dynamischen Iota-Wert von 0,615 auf 0,646 gesteigert.
Der Krippendorffs Alpha-Wert sank hingegen von 0,516 auf 0,484. Er eignet sich
zum Einsatz in allen Settings, in denen Unterrichtsentwürfe und Lernmaterial
hinsichtlich ihrer Umsetzung des Kompetenzerlebens überprüft werden sollen.",
abstract_eng="This classifier targets the realization of the need for
competence from Deci and Ryan’s self-determination theory of motivation in l
esson plans and materials. It describes a learner’s need to perceive themselves
as capable. The variable need for competence is assessed by a scale of 0-2.
The classifier was developed using 790 lesson plans, 298 materials and up to
1,400 textbook tasks. A coding training was conducted and the inter-coder
reliabilities of different measures
(i.e. Krippendorff’s Alpha and Dynamic Iota Index) of the individual categories
were calculated at different points in time.",
abstract_native="Dieser Classifier bewertet Unterrichtsentwürfe und
Lernmaterial danach, ob sie das Bedürfnis nach Kompetenzerleben aus der
Selbstbestimmungstheorie der Motivation nach Deci & Ryan unterstützen. Das
Kompetenzerleben stellt das Bedürfnis dar, sich als wirksam zu erleben. Der
Classifer unterteilt es in drei Stufen und wurde anhand von 790
Unterrichtsentwürfen, 298 Materialien und bis zu 1400 Schulbuchaufgaben
entwickelt. Es wurden stets Kodierschulungen durchgeführt und die
Inter-Coder-Reliabilitäten der einzelnen Kategorien zu verschiedenen
Zeitpunkten berechnet.",
keywords_eng=c("Self-determination theory", "motivation", "lesson planning", "business didactics"),
keywords_native=c("Selbstbestimmungstheorie", "Motivation", "Unterrichtsplanung", "Wirtschaftsdidaktik")
```
In the case of a classifier, the description should include:
- A short reference to the theoretical models that guided the
development.
- A clear and detailed description of every single category/class.
- A short statement where the classifier can be used.
- A description of the kind and quantity of data used for training.
- Information on potential bias in the data.
- If possible, information about the inter-coder-reliability of the
coding process providing the data.
- If possible, provide a link to the corresponding text embedding
model.
Again, you can provide your description in HTML to include tables (e.g.
for reporting the reliability of the initial coding process) or links to
other sources and publications.
**Please do not report the performance values of your classifier in the
description.** These can be accessed directly via
`example_classifier$reliability$test_metric_mean`.
With the methods `set_publication_info` and `get_publication_info` you
can provide the bibliographic information of your classifier.
```{r, include = TRUE, eval=FALSE}
example_classifier$set_publication_info(
authors,
citation,
url=NULL)
```
In contrast to TextEmbeddingModels there are not different types of
author groups.
Finally, you can manage the license for using your classifier via
`set_software_license` and `get_software_license`.
```{r, include = TRUE, eval=FALSE}
example_classifier$set_software_license("GPL-3")
```
Similar to TextEmbeddingModels your classifier has to be licensed via "GPL-3" since
some of the software used for creating a classifier applies this license. For the
documentation you can choose between further license since the documentation is
not part of the software. For setting and receivinf the license you can call
the methods 'set_documentation_license' and 'get_documentation_license'
```{r, include = TRUE, eval=FALSE}
example_classifier$set_documentation_license("CC BY-SA")
```
Now you are ready for sharing your classifier. Please remember to save
your changes as described in the following section 3.2.
## 3.2 Saving and Loading
If you have created a classifier, saving and loading is very easy due to
the functions `save_ai_model` and `load_ai_model`. The process
for saving a model is similar to the process for text embedding models. You only have
to pass the model and a directory path to the function `save_ai_model`. The folder
name is set with `dir_name`.
```{r, include = TRUE, eval=FALSE}
save_ai_model(
model=classifier,
model_dir="classifiers",
dir_name="movie_review_classifier",
save_format = "default",
append_ID = FALSE)
```
In contrast to text embedding models you can specify the additional argument `save_format`.
In the case of pytorch models this arguments allows you to choose between
`save_format = "safetensors"` and `save_format = "pt"`.
We recommend to chose `save_format = "safetensors"` since this is a safer method
to save your models.
In the case of tensorflow models this arguments allows you to choose between
`save_format = "keras"`, `save_format = "tf"` and `save_format = "h5"`.
We recommend to chose `save_format = "keras"` since this is the recommended format
by keras.
If you set `save_format = "default"` .safetensors is used for pytorch models and
.keras is used for tensorflow models.
If you would like to load a model you can call the function `load_ai_model`.
```{r, include = TRUE, eval=FALSE}
classifier<-load_ai_model(
model_dir="classifiers/movie_review_classifier")
```
> **Note:** Classifiers depend on the framework which was used during creation.
Thus, a classifier is always initalized with its original framework. The argument
`ml_framework` has no effect.
In the case you would like to share your classifier with a broader audience we recommend
to set `dir_name=NULL` and `append_ID = TRUE`. This will create a folder name
automatically using the classifier's name and unique ID.
Similar to text embedding models ID is added to the name during the creation
of the classifier ensuring a unique name for your model. With this options
the folder name may look like `"movie_review_classifier_ID_oWsaNEB7b09A1pPB"`.
If you would like to share your classifier with other persons you have to
provide all files within this folder `"classifiers/movie_review_classifier_ID_oWsaNEB7b09A1pPB"`.
Since all files are stored with a specific structure do **not** change or
edit the files manually.
Please note that you need the `TextEmbeddingModel` that was used for
training in order to predict new data with the classifier. You can
request the name, label, and configuration of that model with
`example_classifier$get_text_embedding_model()$model`. Thus, if you would like to
share your classifier, ensure that you also share the corresponding text
embedding model.
If you would like to apply your classifier to new data, two steps are
necessary. First, you must transform the raw text into a numerical
expression by using *exactly* the same text embedding model that was
used to train your classifier. The resulting object can be passed to the
method `predict` and you will receive the predictions together with an
estimate of certainty for each class/category.
More information can be found in the vignette [02a Using Aifeducation Studio](https://fberding.github.io/aifeducation/articles/gui_aife_studio.html) or in
[02 classification tasks](https://fberding.github.io/aifeducation/articles/classification_tasks.html).