---
title: "01 Get started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{01 Get started}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# 1) Installation and Technical Requirements
## Introduction
Several packages allow users to use machine learning directly in *R*
such as [nnet](https://cran.r-project.org/package=nnet) for single layer
neural nets, [rpart](https://CRAN.R-project.org/package=rpart) for
decision trees, and [ranger](https://CRAN.R-project.org/package=ranger)
for random forests. Furthermore, with
[mlr3verse](https://CRAN.R-project.org/package=mlr3verse) a series of
packages exists for managing different algorithms with a unified
interface.
These packages run on a 'normal' computer and are easy to install. For
natural language processing, however, these approaches are currently
limited. State-of-the-art approaches rely on neural nets with multiple
layers and a huge number of parameters, which makes them computationally
demanding. With specialized libraries such as keras, PyTorch, and
tensorflow, graphics processing units (gpus) can speed up computations
significantly. However, many of these specialized libraries for machine
learning are written in python. Fortunately, an interface to python is
provided via the *R* package
[reticulate](https://cran.r-project.org/package=reticulate).
The R package *Artificial Intelligence for Education (aifeducation)*
aims to provide educators, educational researchers, and social
researchers a convenient interface to these state-of-the-art models for
natural language processing and tries to address the special needs and
challenges of the educational and social sciences. The package currently
supports the application of Artificial Intelligence (AI) for tasks such
as text embedding, classification, and question answering.
Since state-of-the-art approaches in natural language processing rely on
large models compared to classical statistical methods (e.g., latent
class analysis, structural equation modeling) and are based largely on
python, some additional installation steps are necessary.
If you would like to train and develop your own models and AIs, a
compatible graphic device is necessary. Even a low-performing graphic
device can speed up computations significantly. If you prefer using
pre-trained models, however, this is **not** necessary. In most of these
cases, a 'normal' office computer without a graphic device should be
sufficient.
## Step 1 - Install the R Package
In order to use the package, you first need to install it. This can be
done by:
```{r, include = TRUE, eval=FALSE}
install.packages("aifeducation")
```
With this command, all necessary *R* packages are installed on your
machine.
## Step 2 - Install Python
Since natural language processing with neural nets is based on
models which are computationally intensive, *keras*, *PyTorch*, and *tensorflow* are
used within this package together with some other specialized python
libraries. To install them, you need to install python on your machine
first. This may take some time.
```{r, include = TRUE, eval=FALSE}
reticulate::install_python()
```
You can check if everything is working by using the function
`reticulate::py_available()`. This should return `TRUE`.
```{r, include = TRUE, eval=FALSE}
reticulate::py_available(initialize = TRUE)
```
## Step 3 - Install Miniconda
The next step is to install miniconda since *aifeducation* uses conda
environments for managing the different modules.
```{r, include = TRUE, eval=FALSE}
reticulate::install_miniconda()
```
## Step 4 - Install Support for Graphic Devices
PyTorch and tensorflow, the underlying machine learning backends, run on MacOS,
Linux, and Windows. However, there are some limitations on accelerating
computations with graphic cards. The following table provides an overview.
*Table: Possible gpu acceleration by operating system*
|Operating System|PyTorch|tensorflow|
|----------------|-------|----------|
|MacOS |No |No |
|Linux |Yes |Yes |
|Windows |Yes |<= version 2.10|
|Windows with WSL|Yes |Yes |
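The table can also be read as a small lookup. The following is a minimal
sketch in base *R*; `gpu_acceleration()` is a hypothetical helper used only
for illustration and is not part of *aifeducation*. It restates the table
and adds no further information.

```r
# Hypothetical helper (not part of aifeducation): restates the table above.
gpu_acceleration <- function(os, framework) {
  # One named vector per framework, one entry per operating system
  support <- list(
    pytorch = c(
      "MacOS" = "no",
      "Linux" = "yes",
      "Windows" = "yes",
      "Windows with WSL" = "yes"
    ),
    tensorflow = c(
      "MacOS" = "no",
      "Linux" = "yes",
      "Windows" = "only tensorflow <= 2.10",
      "Windows with WSL" = "yes"
    )
  )
  # Stop early if the combination is not covered by the table
  stopifnot(
    framework %in% names(support),
    os %in% names(support[[framework]])
  )
  support[[framework]][[os]]
}

gpu_acceleration("Windows", "tensorflow")
gpu_acceleration("Linux", "pytorch")
```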
If you have a suitable machine and would like to use a graphic card for computations, you need to install
some further software. If not, you can skip this step. If you would like to use tensorflow as machine
learning framework, a list with links to downloads can be found here:
https://www.tensorflow.org/install/pip#linux
If you would like to use PyTorch as framework, you can find further
information here: https://pytorch.org/get-started/locally/
In general, you need:
- NVIDIA GPU Drivers
- CUDA Toolkit
- cuDNN SDK
Except for the gpu drivers, all components will be installed automatically in step 5.
If you would like to use Windows with WSL (Windows Subsystem for Linux), installing
gpu acceleration is a more complex topic. In this case, please refer to the specific
Windows or Ubuntu documentation.
## Step 5 - Install Specialized Python Libraries
If everything is working, you can now install the remaining python
libraries. For convenience, *aifeducation* comes with an auxiliary
function `install_py_modules()` doing that for you.
```{r, include = TRUE, eval=FALSE}
#For Linux
aifeducation::install_py_modules(envname="aifeducation",
                                 install="all",
                                 remove_first=FALSE,
                                 tf_version="<=2.15",
                                 pytorch_cuda_version = "12.1",
                                 cpu_only=FALSE)
#For Windows and MacOS
aifeducation::install_py_modules(envname="aifeducation",
                                 install="all",
                                 remove_first=FALSE,
                                 tf_version="<=2.15",
                                 pytorch_cuda_version = "12.1",
                                 cpu_only=TRUE)
```
> With `install` you can decide which machine learning framework should be
installed. Use `install="all"` to request the installation of both 'PyTorch' and
'tensorflow'. If you would like to install only 'PyTorch' or 'tensorflow', set
`install="pytorch"` or `install="tensorflow"`. For *aifeducation* a version of
tensorflow between 2.13 and 2.15 is necessary.
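The version range for tensorflow can be expressed as a simple check. The
following is a minimal sketch in base *R*; `tf_version_supported()` is a
hypothetical helper and not part of *aifeducation*. It assumes that only the
major and minor version matter, so that a patch release such as 2.15.1 still
counts as 2.15.

```r
# Hypothetical helper (not part of aifeducation): checks whether a
# tensorflow version string lies in the range 2.13 to 2.15.
tf_version_supported <- function(version) {
  # Compare only major and minor version, so "2.15.1" counts as 2.15
  parts <- strsplit(version, ".", fixed = TRUE)[[1]]
  major_minor <- suppressWarnings(as.integer(parts[1:2]))
  # Strings without a numeric major and minor part are rejected
  if (anyNA(major_minor)) {
    return(FALSE)
  }
  major_minor[1] == 2L && major_minor[2] >= 13L && major_minor[2] <= 15L
}

tf_version_supported("2.15.1")
tf_version_supported("2.16.0")
```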
It is very important that you call `install_py_modules()` *before* loading the
package for the first time. If you load the library without the necessary
modules installed, an error may occur.
`install_py_modules()` installs the following python modules:

**Both frameworks:**

- transformers
- tokenizers
- datasets
- codecarbon

**PyTorch:**

- torch
- torcheval
- safetensors
- accelerate
- pandas

**tensorflow:**

- keras
- tensorflow

and their dependencies in the environment "aifeducation".
If you would like to use *aifeducation* with other packages or within
other environments, please ensure that these python modules are
available. For gpu support some further packages are installed.
With `check_aif_py_modules()` you can check whether all modules, or those
of a specific machine learning framework, are successfully installed.
```{r, include = TRUE, eval=FALSE}
aifeducation::check_aif_py_modules(print=TRUE,
check="pytorch")
aifeducation::check_aif_py_modules(print=TRUE,
check="tensorflow")
```
Now everything is ready to use the package.
> **Important note:** When you start a new *R* session, please note that you have to call
`reticulate::use_condaenv(condaenv = "aifeducation")` **before** loading the library
to make the python modules available for work.
# 2) Configuration of Tensorflow
In general, educators and educational researchers neither have access to
high performance computing nor own computers with a powerful graphic
device for their work. Thus, some additional configuration can be done
to get computations working on your machine.
If your computer has a graphic device but you would like to use the cpu only,
you can disable the graphic device support of tensorflow with the function
`set_config_cpu_only()`.
```{r, include = TRUE, eval=FALSE}
aifeducation::set_config_cpu_only()
```
Now your machine uses only cpus for computations.
If your machine has a graphic card but with limited memory, it is
recommended to change the configuration of the memory usage with
`set_config_gpu_low_memory()`.
```{r, include = TRUE, eval=FALSE}
aifeducation::set_config_gpu_low_memory()
```
This enables your machine to compute 'large' models with limited
resources. For 'small' models, this option is not recommended since it
decreases the computational speed.
Finally, in some cases you might want to prevent tensorflow from printing
information to the console. You can change this behavior with the
function `set_config_tf_logger()`.
```{r, include = TRUE, eval=FALSE}
aifeducation::set_config_tf_logger()
```
You can choose between five levels
"FATAL", "ERROR", "WARN", "INFO", and "DEBUG", setting the minimal level
for logging.
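What a minimal level means can be illustrated with a short sketch in base
*R*. It assumes the usual severity ordering, in which a message is shown
only if it is at least as severe as the chosen level; `log_levels` and
`is_logged()` are hypothetical names for illustration and are not part of
*aifeducation* or tensorflow.

```r
# Levels ordered from least to most severe (assumed ordering)
log_levels <- c("DEBUG", "INFO", "WARN", "ERROR", "FATAL")

# Hypothetical helper: would a message be shown under a given minimal level?
is_logged <- function(message_level, minimal_level) {
  stopifnot(
    message_level %in% log_levels,
    minimal_level %in% log_levels
  )
  # A message passes if its position is at or above the minimal level
  match(message_level, log_levels) >= match(minimal_level, log_levels)
}

is_logged("FATAL", minimal_level = "ERROR")
is_logged("INFO", minimal_level = "ERROR")
```

Under this assumption, choosing "ERROR" hides warnings, information, and
debug messages, while "DEBUG" shows everything.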
# 3) Starting a New Session
Before you can work with *aifeducation* you must set up a new *R*
session. First, you must set up python via reticulate. Second, you load
the library. In case you installed python as suggested in this vignette,
you may start a new session like this:
```{r, include = TRUE, eval=FALSE}
reticulate::use_condaenv(condaenv = "aifeducation")
library(aifeducation)
set_transformers_logger("ERROR")
```
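If the conda environment is missing, the calls above can fail with an
error that is hard to interpret. The following sketch wraps the start of a
session in a check. `start_aif_session()` is a hypothetical helper and not
part of *aifeducation*; it assumes that the environment was created with
the name "aifeducation" as suggested in this vignette.

```r
# Hypothetical helper (not part of aifeducation): checks that the conda
# environment exists before activating it and loading the library.
start_aif_session <- function(envname = "aifeducation") {
  # conda_list() raises an error if no conda installation is found;
  # in that case the environment is treated as missing
  envs <- tryCatch(
    reticulate::conda_list()$name,
    error = function(e) character(0)
  )
  if (!envname %in% envs) {
    stop(
      "Conda environment '", envname, "' not found. ",
      "Please install the python modules first."
    )
  }
  # Activate the environment before the library is loaded
  reticulate::use_condaenv(condaenv = envname, required = TRUE)
  library(aifeducation)
  invisible(TRUE)
}

# Usage at the beginning of a new R session:
# start_aif_session()
```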
Next you have to choose the machine learning framework you would like to use.
You can set the framework for the complete session with
```{r, include = TRUE, eval=FALSE}
#For tensorflow
aifeducation_config$set_global_ml_backend("tensorflow")
#For PyTorch
aifeducation_config$set_global_ml_backend("pytorch")
```
You can change the framework at any time during a session by calling this method
again or by passing the framework to the `ml_framework` argument of a function
or method. Please note that not all models are available for both frameworks and
that the weights of trained models cannot be shared across frameworks for all models.
If you would like to use tensorflow, now is a good time to
configure that backend, since some configurations
can only be done **before** tensorflow is used for the first time.
```{r, include = TRUE, eval=FALSE}
#if you would like to use only cpus
set_config_cpu_only()
#if you have a graphic device with low memory
set_config_gpu_low_memory()
#if you would like to reduce the tensorflow output to errors
set_config_os_environ_logger(level = "ERROR")
```
> **Note:** Please remember: Every time you start a new session in *R* you have
to set the correct conda environment, to load the library *aifeducation*, and
to choose your machine learning framework.
# 4) Tutorials and Guides
A guide on how to use the graphical user interface can be found in vignette
[02a classification tasks](https://fberding.github.io/aifeducation/articles/gui_aife_studio.html).
A short introduction to the
package with examples for classification tasks can be found in vignette
[02b classification tasks](https://fberding.github.io/aifeducation/articles/classification_tasks.html).
Documenting and sharing your work is described in vignette
[03 sharing and using trained AI/models](sharing_and_publishing.html).
# 5) Update *aifeducation*
If you already use *aifeducation* and want to update to a newer version
of this package, it is recommended to also update the python libraries it uses. The
easiest way is to remove the conda environment "aifeducation" and to install the
libraries into a fresh environment. This can be done by setting `remove_first=TRUE`
in `install_py_modules()`.
```{r, include = TRUE, eval=FALSE}
#For Linux
aifeducation::install_py_modules(envname="aifeducation",
                                 install="all",
                                 remove_first=TRUE,
                                 tf_version="<=2.14",
                                 pytorch_cuda_version = "12.1",
                                 cpu_only=FALSE)
#For Windows with gpu support
aifeducation::install_py_modules(envname="aifeducation",
                                 install="all",
                                 remove_first=TRUE,
                                 tf_version="<=2.10",
                                 pytorch_cuda_version = "12.1",
                                 cpu_only=FALSE)
#For Windows without gpu support
aifeducation::install_py_modules(envname="aifeducation",
                                 install="all",
                                 remove_first=TRUE,
                                 tf_version="<=2.14",
                                 pytorch_cuda_version = "12.1",
                                 cpu_only=TRUE)
#For MacOS
aifeducation::install_py_modules(envname="aifeducation",
                                 install="all",
                                 remove_first=TRUE,
                                 tf_version="<=2.14",
                                 pytorch_cuda_version = "12.1",
                                 cpu_only=TRUE)
```
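The four calls above differ only in `tf_version` and `cpu_only`. The
following sketch collects the arguments depending on the operating system.
`update_args()` is a hypothetical helper and not part of *aifeducation*;
the argument `use_gpu` is an assumption introduced here to distinguish the
cases with and without gpu support.

```r
# Hypothetical helper (not part of aifeducation): collects the arguments
# of the update calls above for a given operating system.
update_args <- function(sysname = Sys.info()[["sysname"]], use_gpu = FALSE) {
  # As in the calls above, gpu support is only considered on Linux and
  # Windows; MacOS is reported by Sys.info() as "Darwin"
  gpu <- use_gpu && sysname %in% c("Linux", "Windows")
  # Windows with gpu support is limited to tensorflow <= 2.10
  tf_version <- if (gpu && sysname == "Windows") "<=2.10" else "<=2.14"
  list(
    envname = "aifeducation",
    install = "all",
    remove_first = TRUE,
    tf_version = tf_version,
    pytorch_cuda_version = "12.1",
    cpu_only = !gpu
  )
}

args <- update_args(use_gpu = TRUE)
str(args)

# The list can then be passed on, for example with:
# do.call(aifeducation::install_py_modules, args)
```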