From f62267028ea920115f0293a508671ad771221ff4 Mon Sep 17 00:00:00 2001 From: Florian Berding Date: Wed, 24 Apr 2024 12:21:32 +0200 Subject: [PATCH] Final version of 0.3.3 (2) --- docs/index.html | 6 +++--- docs/news/index.html | 2 +- docs/pkgdown.yml | 2 +- docs/search.json | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/index.html b/docs/index.html index fb0b052..4c039f5 100644 --- a/docs/index.html +++ b/docs/index.html @@ -268,7 +268,7 @@

Training AI under Challenging Conditions
  • imbalanced data: Finally, data in the educational and social sciences often occur in an imbalanced pattern, as several empirical studies show (Bloemen 2011; Stütz et al. 2022). Imbalanced means that some categories or characteristics of a data set have very high absolute frequencies compared to the others. Imbalance during AI training guides algorithms to focus on and prioritize the categories and characteristics with high absolute frequencies, increasing the risk of missing categories/characteristics with low frequencies (Haixiang et al. 2017). This can lead an AI to favor particular groups of people/material, to imply false recommendations and conclusions, or to miss rare categories or characteristics.
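    For illustration, such imbalance shows up immediately in a simple frequency count (the label vector codings and its counts are hypothetical):

    # Hypothetical codings of 2,000 documents; "pos" is the rare category.
    table(codings)
    #> neg  pos
    #> 1750  250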
-    In order to deal with the problem of imbalanced data sets, the package integrates the Synthetic Minority Oversampling Technique into the learning process. Currently, the Basic Synthetic Minority Oversampling Technique (Chawla et al. 2002), Density-Bases Synthetic Minority Oversampling Technique (Bunkhumpornpat, Sinapiromsaran & Lursinsap 2012), and Adaptive Synthetic Sampling Approach for Imbalanced Learning (Hem Garcia & Li 2008) are implemented via the R package smotefamiliy.

+    In order to deal with the problem of imbalanced data sets, the package integrates the Synthetic Minority Oversampling Technique into the learning process. Currently, the Basic Synthetic Minority Oversampling Technique (Chawla et al. 2002), the Density-Based Synthetic Minority Oversampling Technique (Bunkhumpornpat, Sinapiromsaran & Lursinsap 2012), and the Adaptive Synthetic Sampling Approach for Imbalanced Learning (He, Garcia & Li 2008) are implemented via the R package smotefamily.
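    To make the idea concrete, here is a minimal sketch of what such oversampling looks like when calling smotefamily directly (the feature data frame train_features and label vector train_labels are hypothetical; within aifeducation this happens inside the training loop, so no manual call is needed):

    # Minimal sketch: generating synthetic minority cases with smotefamily.
    # train_features (numeric data frame) and train_labels (class labels)
    # are hypothetical stand-ins for embedded texts and their codings.
    library(smotefamily)
    balanced <- SMOTE(X = train_features,
                      target = train_labels,
                      K = 5,        # neighbors used to interpolate synthetic cases
                      dup_size = 0) # 0 asks for roughly balancing the classes
    table(balanced$data$class)      # original plus synthetic cases combined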

    In order to address the problem of small data sets, the training loops of the AI integrate pseudo-labeling (e.g., Lee 2013). Pseudo-labeling is a semi-supervised technique: educators and researchers rate a part of a data set and train the AI with this part only. The remainder of the data is not processed by humans; instead, the AI uses this part of the data to learn on its own. Thus, educators and researchers only have to provide additional data for the AI’s learning process without coding it themselves. This offers the possibility to add more data to the training process and to reduce labor costs.
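    The underlying loop can be pictured in a few lines of R (a purely conceptual sketch; fit_model() and predict_proba() are hypothetical placeholders, not aifeducation functions — the package implements this internally via the bpl_* arguments of a classifier’s train method):

    # Conceptual sketch of pseudo-labeling (Lee 2013); fit_model() and
    # predict_proba() are hypothetical placeholders, not package functions.
    model <- fit_model(x = labeled_x, y = labeled_y)
    for (step in 1:5) {
      probs <- predict_proba(model, unlabeled_x)       # certainty per case
      confident <- which(apply(probs, 1, max) >= 0.95) # keep only sure cases
      pseudo_y <- colnames(probs)[max.col(probs[confident, , drop = FALSE])]
      model <- fit_model(x = rbind(labeled_x, unlabeled_x[confident, , drop = FALSE]),
                         y = c(labeled_y, pseudo_y))
    }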

    @@ -286,7 +286,7 @@

    Evaluating Performance

  • Cohen’s Kappa with squared weights
  • Fleiss’ Kappa for multiple raters without exact estimation
-    In Addition the some traditional measures from the machine learning literature are also available:

+    In addition, some traditional measures from the machine learning literature are also available (see the sketch after this list):

    • Precision
    • Recall
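    For a two-class case, both measures can be read off a confusion matrix in base R (a minimal sketch; truth and predicted are hypothetical factors of equal length — the package computes these measures for you after training):

    # Precision and recall for the class "pos" from a confusion matrix.
    # truth and predicted are hypothetical factors over c("neg", "pos").
    cm <- table(truth = truth, predicted = predicted)
    precision <- cm["pos", "pos"] / sum(cm[, "pos"]) # TP / (TP + FP)
    recall    <- cm["pos", "pos"] / sum(cm["pos", ]) # TP / (TP + FN)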
    @@ -297,7 +297,7 @@

      Evaluating Performance

      Sharing Trained AI

-    Since the package is based on keras, tensorflow, and the transformer libraries, every trained AI can be shared with other educators and researchers. The package supports an easy use of pre-trained AI within R, but also provides the possibility to export trained AI to other environments.

+    Since the package is based on torch, tensorflow, and the transformer libraries, every trained AI can be shared with other educators and researchers. The package supports easy use of pre-trained AI within R, but also provides the possibility to export trained AI to other environments.

    Using a pre-trained AI for classification only requires the classifier and the corresponding text embedding model. Use Aifeducation Studio, or load both into R and start predicting, as sketched below. Vignette 02a Using Aifeducation Studio describes how to use the user interface. Vignette 02b Classification Tasks describes how to save and load the objects with R syntax. In vignette 03 Sharing and Using Trained AI/Models you can find a detailed guide on how to document and share your models.
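    A rough sketch of this workflow (the directory names and new_data are hypothetical, and the exact prediction arguments may differ — see vignette 02b):

    # Load a shared text embedding model and classifier, then predict.
    bert_modeling <- load_ai_model(
      model_dir = "text_embedding_models/model_transformer_bert")
    classifier <- load_ai_model(
      model_dir = "classifiers/movie_review_classifier")
    new_embeddings <- bert_modeling$embed(
      raw_text = new_data$text,
      doc_id   = new_data$id)
    predictions <- classifier$predict(newdata = new_embeddings)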

    diff --git a/docs/news/index.html b/docs/news/index.html index 31ef799..dd3b848 100644 --- a/docs/news/index.html +++ b/docs/news/index.html @@ -57,7 +57,7 @@
-    aifeducation 0.3.3

+    aifeducation 0.3.3

    CRAN release: 2024-04-22

    Graphical User Interface Aifeducation Studio

    • Fixed a bug concerning the ids of .pdf and .csv files. Now the ids are correctly saved within a text collection file.
    • Fixed a bug while checking for the selection of at least one file type during creation of a text collection.
    • diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml index 0095231..5a42de0 100644 --- a/docs/pkgdown.yml +++ b/docs/pkgdown.yml @@ -6,5 +6,5 @@ articles: classification_tasks: classification_tasks.html gui_aife_studio: gui_aife_studio.html sharing_and_publishing: sharing_and_publishing.html -last_built: 2024-04-19T10:01Z +last_built: 2024-04-24T10:18Z diff --git a/docs/search.json b/docs/search.json index 2c6aa43..8c9e4fa 100644 --- a/docs/search.json +++ b/docs/search.json @@ -1 +1 @@ -[{"path":[]},{"path":"/articles/aifeducation.html","id":"introduction","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Introduction","title":"01 Get started","text":"Several packages allow users use machine learning directly R nnet single layer neural nets, rpart decision trees, ranger random forests. Furthermore, mlr3verse series packages exists managing different algorithms unified interface. packages can used ‘normal’ computer provide easy installation. terms natural language processing, approaches currently limited. State---art approaches rely neural nets multiple layers consist huge number parameters making computationally demanding. specialized libraries keras, PyTorch tensorflow, graphical processing units (gpu) can help speed computations significantly. However, many specialized libraries machine learning written python. Fortunately, interface python provided via R package reticulate. R package Artificial Intelligence Education (aifeducation) aims provide educators, educational researchers, social researchers convincing interface state---art models natural language processing tries address special needs challenges educational social sciences. package currently supports application Artificial Intelligence (AI) tasks text embedding, classification, question answering. Since state---art approaches natural language processing rely large models compared classical statistical methods (e.g., latent class analysis, structural equation modeling) based largely python, additional installation steps necessary. like train develop models AIs, compatible graphic device necessary. Even low performing graphic device can speed computations significantly. prefer using pre-trained models however, necessary. case ‘normal’ office computer without graphic device sufficient cases.","code":""},{"path":"/articles/aifeducation.html","id":"step-1---install-the-r-package","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 1 - Install the R Package","title":"01 Get started","text":"order use package, first need install . can done : command, necessary R packages installed machine.","code":"install.packages(\"aifeducation\")"},{"path":"/articles/aifeducation.html","id":"step-2---install-python","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 2 - Install Python","title":"01 Get started","text":"Since natural language processing neural nets based models computationally intensive, keras, PyTorch, tensorflow used within package together specialized python libraries. install , need install python machine first. may take time. can check everything working using function reticulate::py_available(). 
return TRUE.","code":"reticulate::install_python() reticulate::py_available(initialize = TRUE)"},{"path":"/articles/aifeducation.html","id":"step-3---install-miniconda","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 3 - Install Miniconda","title":"01 Get started","text":"next step install miniconda since aifeducation uses conda environments managing different modules.","code":"reticulate::install_miniconda()"},{"path":"/articles/aifeducation.html","id":"step-4---install-support-for-graphic-devices","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 4 - Install Support for Graphic Devices","title":"01 Get started","text":"PyTorch tensorflow underlying machine learning backend run MacOS, Linux, Windows. However, limitations accelerate computations graphic cards. following table provides overview. Table: Possible gpu acceleration operating system suitable machine like use graphic card computations need install software. can skip step. list links downloads can found like use tensorflow machine learning framework: https://www.tensorflow.org/install/pip#linux like use PyTorch framework can find information : https://pytorch.org/get-started/locally/ general need NVIDIA GPU Drivers CUDA Toolkit cuDNN SDK Except gpu drivers components installed step 5 automatically. like use Windows WSL (Windows Subsystem Linux) installing gpu acceleration complex topic. case please refer specific Windows Ubuntu documentations.","code":""},{"path":"/articles/aifeducation.html","id":"step-5---install-specialized-python-libraries","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 5 - Install Specialized Python Libraries","title":"01 Get started","text":"everything working, can now install remaining python libraries. convenience, aifeducation comes auxiliary function install_py_modules() . install=\"\" can decide machine learning framework installed. Use install=\"\" request installation ‘PyTorch’ ‘tensorflow’. like install ‘PyTorch’ ‘tensorflow’ set install=\"pytorch\" install=\"tenorflow\". aifeducation version tensorflow 2.13 2.15 necessary. important call function loading package first time. load library without installing necessary modules error may occur. function installs following python modules: frameworks: - transformers, - tokenizers, - datasets, - codecarbon Pytorch - torch, - torcheval, - safetensors, - accelerate - pandas Tensorflow - keras, - tensorflow dependencies environment “aifeducation”. like use aifeducation packages within environments, please ensure python modules available. gpu support packages installed. check_aif_py_modules() can check, modules successfully installed specific machine learning framework. Now everything ready use package. 
Important note: start new R session, please note call reticulate::use_condaenv(condaenv = \"aifeducation\") loading library make python modules available work.","code":"#For Linux aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=FALSE, tf_version=\"<=2.15\", pytorch_cuda_version = \"12.1\" cpu_only=FALSE) #For Windows and MacOS aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=FALSE, tf_version=\"<=2.15\", pytorch_cuda_version = \"12.1\" cpu_only=TRUE) aifeducation::check_aif_py_modules(print=TRUE, check=\"pytorch\") aifeducation::check_aif_py_modules(print=TRUE, check=\"tensorflow\")"},{"path":"/articles/aifeducation.html","id":"configuration-of-tensorflow","dir":"Articles","previous_headings":"","what":"2) Configuration of Tensorflow","title":"01 Get started","text":"general, educators educational researchers neither access high performance computing computers performing graphic device work. Thus, additional configuration can done get computations working machine. use computer graphic device, like use cpu can disable graphic device support tensorflow function set_config_cpu_only(). Now machine uses cpus computations. machine graphic card limited memory, recommended change configuration memory usage set_config_gpu_low_memory() enables machine compute ‘large’ models limited resources. ‘small’ models, option relevant since decreases computational speed. Finally, cases might want disable tensorflow print information console. can change behavior function set_config_tf_logger(). can choose five levels “FATAL”, “ERROR”, “WARN”, “INFO”, “DEBUG”, setting minimal level logging.","code":"aifeducation::set_config_cpu_only() aifeducation::set_config_gpu_low_memory() aifeducation::set_config_tf_logger()"},{"path":"/articles/aifeducation.html","id":"starting-a-new-session","dir":"Articles","previous_headings":"","what":"3 Starting a New Session","title":"01 Get started","text":"can work aifeducation must set new R session. First, necessary load library. Second, must set python via reticulate. case installed python suggested vignette may start new session like : Next choose machine learning framework like use. can set framework complete session can change framework anytime session calling method passing framework ml_framework argument function method. Please note models available frameworks weights trained models shared across frameworks models. case like use tensorflow now good time configure backend, since configurations can done tensorflow used first time. Note: Please remember: Every time start new session R set correct conda environment, load library aifeducation, chose machine learning framework.","code":"reticulate::use_condaenv(condaenv = \"aifeducation\") library(aifeducation) set_transformers_logger(\"ERROR\") #For tensorflow aifeducation_config$set_global_ml_backend(\"tensorflow\") #For PyTorch aifeducation_config$set_global_ml_backend(\"pytorch\") #if you would like to use only cpus set_config_cpu_only() #if you have a graphic device with low memory set_config_gpu_low_memory() #if you would like to reduce the tensorflow output to errors set_config_os_environ_logger(level = \"ERROR\")"},{"path":"/articles/aifeducation.html","id":"tutorials-and-guides","dir":"Articles","previous_headings":"","what":"4) Tutorials and Guides","title":"01 Get started","text":"guide use graphical user interface can found vignette 02a classification tasks. 
short introduction package examples classification tasks can found vignette 02b classification tasks. Documenting sharing work described vignette 03 sharing using trained AI/models","code":""},{"path":"/articles/aifeducation.html","id":"update-aifeducation","dir":"Articles","previous_headings":"","what":"5) Update aifeducation","title":"01 Get started","text":"case already use aifeducation want update newer version package recommended update used python libraries. easiest way remove conda environment “aifeducation” install libraries fresh environment. can done setting remove_first=TRUE install_py_modules.","code":"#For Linux aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.14\", pytorch_cuda_version = \"12.1\" cpu_only=FALSE) #For Windows with gpu support aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.10\", pytorch_cuda_version = \"12.1\" cpu_only=FALSE) #For Windows without gpu support aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.14\", pytorch_cuda_version = \"12.1\" cpu_only=TRUE) #For MacOS aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.14\", pytorch_cuda_version = \"12.1\" cpu_only=TRUE)"},{"path":"/articles/classification_tasks.html","id":"introduction-and-overview","dir":"Articles","previous_headings":"","what":"1 Introduction and Overview","title":"02b Text Embedding and Classification Tasks","text":"educational social sciences, assignment observation scientific concepts important task allows researchers understand observation, generate new insights, derive recommendations research practice. educational science, several areas deal kind task. example, diagnosing students’ characteristics important aspect teachers’ profession necessary understand promote learning. Another example use learning analytics, data students used provide learning environments adapted individual needs. another level, educational institutions schools universities can use information data-driven performance decisions (Laurusson & White 2014) well improve . case, real-world observation aligned scientific models use scientific knowledge technology improved learning instruction. Supervised machine learning one concept allows link real-world observations existing scientific models theories (Berding et al. 2022). educational sciences great advantage allows researchers use existing knowledge insights applications AI. drawback approach training AI requires information real world observations information corresponding alignment scientific models theories. valuable source data educational science written texts, since textual data can found almost everywhere realm learning teaching (Berding et al. 2022). example, teachers often require students solve task provide written form. Students create solution tasks often document short-written essay presentation. data can used analyze learning teaching. Teachers’ written tasks students may provide insights quality instruction students’ solutions may provide insights learning outcomes prerequisites. AI can helpful assistant analyzing textual data since analysis textual data challenging time-consuming task humans. vignette, like show create AI can help tasks using package aifedcuation. Please note introduction content analysis, natural language processing machine learning beyond scope vignette. 
like learn , please refer cited literature. start necessary introduce definition understanding basic concepts since applying AI educational contexts means combine knowledge different scientific disciplines using different, sometimes overlapping concepts. Even within research area, concepts unified. Figure 1 illustrates package’s understanding. Since aifeducation looks application AI classification tasks perspective empirical method content analysis, overlapping concepts content analysis machine learning. content analysis, phenomenon like performance colors can described scale/dimension made several categories (e.g. Schreier 2012 pp. 59). example, exam’s performance (scale/dimension) “good”, “average” “poor”. terms colors (scale/dimension) categories “blue”, “green”, etc. Machine learning literature uses words describe kind data. machine learning, “scale” “dimension” correspond term “label” “categories” refer term “classes” (Chollet, Kalinowski & Allaire 2022, p. 114). clarifications, classification means text assigned correct category scale text labeled correct class. Figure 2 illustrates, two kinds data necessary train AI classify text line supervised machine learning principles. providing AI textual data input data corresponding information class target data, AI can learn texts imply specific class category. exam example, AI can learn texts imply “good”, “average” “poor” judgment. training, AI can applied new texts predict likely class every new text. generated class can used statistical analysis derive recommendations learning teaching. achieve support artificial intelligence, several steps necessary. Figure 3 provides overview integrating functions objects aifeducation. first step transform raw texts form computers can use. , raw texts must transformed numbers. modern approaches, usually done word embeddings. Campesato (2021, p. 102) describes “collective name set language modeling feature learning techniques (…) words phrases vocabulary mapped vectors real numbers.” definition word vector similar: „Word vectors represent semantic meaning words vectors context training corpus.” (Lane, Howard & Hapke 2019, p. 191) Campesato (2021, pp. 112) clusters approaches creating word embeddings three groups, reflecting ability provide context-sensitive numerical representations. Approaches group one account context. Typical methods rely bag--words assumptions. Thus, normally able provide word embedding single words. Group two consists approaches word2vec, GloVe (Pennington, Socher & Manning 2014) fastText, able provide one embedding word regardless context. Thus, account one context. last group consists approaches BERT (Devlin et al. 2019), able produce multiple word embeddings depending context words. different groups, aifedcuation implements several methods. Topic Modeling: Topic modeling approach uses frequencies tokens within text. frequencies tokens models observable variables one latent topic (Campesato 2021, p. 113). estimation topic model often based Latent Dirichlet Analysis (LDA) describes text distribution topics. topics described distribution words/tokens (Campesato 2021, p. 114). relationship texts, words, topics can used create text embedding computing relative amount every topic text based every token text. GlobalVectorClusters: GlobalVectors newer approach utilizes co-occurrence words/tokens compute GlobalVectors (Campesato 2021, p. 110). vectors generated way tokens/words similar meaning located close (Pennington, Socher & Manning 2014). 
order create text embedding word embeddings, aifeducation groups tokens clusters based vectors. Thus, tokens similar meaning members cluster. text embedding, tokens text counted every cluster frequencies every cluster text used numerical representation text. Transformers: Transformers current state---art approach many natural language tasks (Tunstall, von Werra & Wolf 2022, p. xv). help self-attention mechanism (Vaswani et al. 2017), able produce context-sensitive word embeddings (Chollet, Kalinowski & Allaire, 2022, pp. 366). approaches managed used unified interface provided object TextEmbeddingModel. object can easily convert raw texts numerical representation, can use different classification tasks time. makes possible reduce computational time. created text embedding stored object class EmbeddedText. object additionally contains information text embedding model created object. best case can apply existing text embedding model using transformer Huggingface using model colleagues. , aifeducation provides several functions allowing create models. Depending approach like use, different steps necessary. case Topic Modeling GlobalVectorClusters, must first create draft vocabulary two functions bow_pp_create_vocab_draft() bow_pp_create_basic_text_rep(). calling functions, determine central properties resulting model. case transformers, first configure train vocabulary create_xxx_model() next step can train model train_tune_xxx_model(). Every step explained next chapters. Please note xxx stands different architectures transformers supported aifedcuation. object class TextEmbeddingModel can create input data supervised machine learning process. Additionally, need target data must named factor containing classes/categories text. kinds data, able create new object class TextEmbeddingClassifierNeuralNet classifier. train classifier several options cover detail chapter 3. training classifier can share researchers apply new texts. Please note application new texts requires text transformed numbers exactly text embedding model passing text classifier. Please note: pass raw texts classifier, embedded texts work! next chapters, guide complete process, starting creation text embedding models. Please note creation new text embedding model necessary rely existing model rely pre-trained transformer.","code":""},{"path":[]},{"path":"/articles/classification_tasks.html","id":"starting-a-new-session","dir":"Articles","previous_headings":"","what":"2.1 Starting a New Session","title":"02b Text Embedding and Classification Tasks","text":"can work aifeducation must set new R session. First, necessary load library. Second, must set python via reticulate. case installed python suggested vignette 01 Get started may start new session like : Next choose machine learning framework like use. can set framework complete session Setting global machine learning framework convenience. can change framework time session calling method setting argument ‘ml_framework’ methods functions manually. case like use tensorflow now good time configure backend, since configurations can done tensorflow used first time. 
Note: Please remember: Every time start new session R set correct conda environment, load library aifeducation, chose machine learning framework.","code":"reticulate::use_condaenv(condaenv = \"aifeducation\") library(aifeducation) #For tensorflow aifeducation_config$set_global_ml_backend(\"tensorflow\") set_transformers_logger(\"ERROR\") #For PyTorch aifeducation_config$set_global_ml_backend(\"pytorch\") set_transformers_logger(\"ERROR\") #if you would like to use only cpus set_config_cpu_only() #if you have a graphic device with low memory set_config_gpu_low_memory() #if you would like to reduce the tensorflow output to errors set_config_os_environ_logger(level = \"ERROR\")"},{"path":"/articles/classification_tasks.html","id":"reading-texts-into-r","dir":"Articles","previous_headings":"","what":"2.2 Reading Texts into R","title":"02b Text Embedding and Classification Tasks","text":"applications aifeducation ’s necessary read text like use R. task, several packages available CRAN. experience good package readtext since allows process different kind sources textual data. Please refer readtext’s documentation details. installed package machine, can request example, stored texts excel sheet two columns (texts texts id texts’ id) can read data crucial pass file path file name column texts text_field name column id docid_field. cases may stored text separate file (e.g., .txt .pdf). cases can pass directory files read data. following example files stored directory “data”. read texts sever files need specify arguments docid_field text_field. id texts automatically set file names. text read recommend text cleaning. Please refer documentation function readtext within readtext library information. Now everything ready start preparation tasks.","code":"install.packages(\"readtext\") #for excel files textual_data<-readtext::readtext( file=\"text_data.xlsx\", text_field = \"texts\", docid_field = \"id\" ) #read all files with the extension .txt in the directory data textual_data<-readtext::readtext( file=\"data/*.txt\" ) #read all files with the extension .pdf in the directory data textual_data<-readtext::readtext( file=\"data/*.pdf\" ) #remove multiple spaces and new lines textual_data$text=stringr::str_replace_all(textual_data$text,pattern = \"[:space:]{1,}\",replacement = \" \") #remove hyphenation textual_data$text=stringr::str_replace_all(textual_data$text,pattern = \"-(?=[:space:])\",replacement = \"\")"},{"path":[]},{"path":"/articles/classification_tasks.html","id":"example-data-for-this-vignette","dir":"Articles","previous_headings":"3 Preparation Tasks","what":"3.1 Example Data for this Vignette","title":"02b Text Embedding and Classification Tasks","text":"illustrate steps vignette, use data educational settings since data generally protected privacy policies. Therefore, use data set data_corpus_moviereviews package quanteda.textmodels illustrate usage package. quanteda.textmodels automatically installed install aifeducation. now data set three columns. first contains ID movie review, second contains rating movie (positive negative), third column contains raw texts. can see, data balanced. 1,000 reviews imply positive rating movie 1,000 imply negative rating. tutorial, modify data set setting half negative positive reviews NA, indicating reviews labeled. Furthermore, bring imbalance setting 250 positive reviews NA. 
now use data show use different objects functions aifeducation.","code":"example_data<-data.frame( id=quanteda::docvars(quanteda.textmodels::data_corpus_moviereviews)$id2, label=quanteda::docvars(quanteda.textmodels::data_corpus_moviereviews)$sentiment) example_data$text<-as.character(quanteda.textmodels::data_corpus_moviereviews) table(example_data$label) #> #> neg pos #> 1000 1000 example_data$label[c(1:500,1001:1500)]=NA summary(example_data$label) #> neg pos NA's #> 500 500 1000 example_data$label[1501:1750]=NA summary(example_data$label) #> neg pos NA's #> 500 250 1250"},{"path":"/articles/classification_tasks.html","id":"topic-modeling-and-globalvectorclusters","dir":"Articles","previous_headings":"3 Preparation Tasks","what":"3.2 Topic Modeling and GlobalVectorClusters","title":"02b Text Embedding and Classification Tasks","text":"like create new text embedding model Topic Modeling GlobalVectorClusters, first create draft vocabulary. can calling function bow_pp_create_vocab_draft(). main input function vector texts. function’s aims create list tokens texts, reduce tokens tokens carry semantic meaning, provide lemma every token. Since Topic Modeling depends bag--word approach, reason pre-process step reduce tokens tokens really carry semantic meaning. general, tokens words either nouns, verbs adjectives (Papilloud & Hinneburg 2018, p. 32). example data, application function : can see, additional parameter: path_language_model. must insert path udpipe pre-trained language model since function uses udpipe package part--speech tagging lemmataziation. collection pre-trained models 65 languages can found [https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3131]. Just download relevant model machine provide path model. parameter upos can select tokens selected. example, tokens represent noun, adjective verb remain analysis. list possible tags can found : [https://universaldependencies.org/u/pos/index.html]. Please forget provide label udpipe model use please also provide language analyzing. information important since transferred text embedding model. researchers/users need information decide model help work. next step, can use draft vocabulary create basic text representation function bow_pp_create_basic_text_rep(). function takes raw texts draft vocabulary main input. function aims remove tokens referring stopwords, clean data (e.g., removing punctuation, numbers), lower case tokens requested, remove tokens specific minimal frequency, remove tokens occur many documents create document-feature-matrix (dfm), create feature-co-occurrence-matrix (fcm). Applied example, call function look like : data takes raw texts vocab_draft takes draft vocabulary created first step. main goal create document-feature-matrix(dfm) feature-co- occurrence-matrix (fcm). dfm matrix reports texts rows number tokens columns. matrix later used create text embedding model based topic modeling. dfm reduced tokens correspond part--speech tags vocabulary draft. Punctuation, symbols, numbers etc. removed matrix set corresponding parameter TRUE. set use_lemmata = TRUE can reduce dimensionality matrix using lemmas instead tokens (Papilloud & Hinneburg 2018, p.33). set to_lower = TRUE tokens transformed lower case. end get matrix tries represent semantic meaning text smallest possible number tokens. applies fcm. , tokens/features reduced way. However, features reduced, token’s co-occurrence calculated. aim window used shifted across text, counting tokens left right token investigation. 
size window can determined window. weights can provide weights counting. example, tokens far away token investigation count less tokens closer token investigation. fcm later used create text embedding model based GlobalVectorClusters. may notice, dfm counts words text. Thus, position text within sentence matter. lower-case tokens use lemmas, syntactic information lost advantage dfm lower dimensionality losing little semantic meaning. contrast, fcm matrix describes often different tokens occur together. Thus, fcm recovers part position words sentence text. Now, everything ready create new text embedding model based Topic Modeling GlobalVectorClusters. show create new model, look preparation new transformer.","code":"vocab_draft<-bow_pp_create_vocab_draft( path_language_model=\"language_model/english-gum-ud-2.5-191206.udpipe\", data=example_data$text, upos=c(\"NOUN\", \"ADJ\",\"VERB\"), label_language_model=\"english-gum-ud-2.5-191206\", language=\"english\", trace=TRUE) basic_text_rep<-bow_pp_create_basic_text_rep( data = example_data$text, vocab_draft = vocab_draft, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, remove_separators = TRUE, split_hyphens = FALSE, split_tags = FALSE, language_stopwords=\"en\", use_lemmata = FALSE, to_lower=FALSE, min_termfreq = NULL, min_docfreq= NULL, max_docfreq=NULL, window = 5, weights = 1 / (1:5), trace=TRUE)"},{"path":"/articles/classification_tasks.html","id":"creating-a-new-transformer","dir":"Articles","previous_headings":"3 Preparation Tasks","what":"3.3 Creating a New Transformer","title":"02b Text Embedding and Classification Tasks","text":"general, recommended use pre-trained model since creation new transformer requires large data set texts computationally intensive. vignette illustrate process BERT model. However, many transformers, process . creation new transformer requires least two steps. First, must decide architecture transformer. includes creation set vocabulary. aifedcuation can calling function create_bert_model(). example look like : First, function receives machine learning framework chose start session. However, can change setting ml_framework=\"tensorflow\" ml_framework=\"pytorch\". function work, must provide path directory new transformer saved. Furthermore, must provide raw texts. texts used training transformer training vocabulary. maximum size vocabulary determined vocab_size. Please provide size 50,000 60,000 since kind vocabulary works differently approaches described section 2.2. Modern tokenizers WordPiece (Wu et al. 2016) use algorithms splits tokens smaller elements, allowing build huge number words small number elements. Thus, even small number 30,000 tokens, able represent large number words. consequence, kinds vocabularies many times smaller vocabularies build section 2.2. parameters allow customize BERT model. example, increase number hidden layers 12 24 reduce hidden size 768 256, allowing build test larger smaller transformers. Please note max_position_embeddings determine many tokens transformer can process. text tokens tokenization, tokens ignored. However, like analyze long documents, please avoid increase number significantly computational time increase linear way quadratic (Beltagy, Peters & Cohan 2020). long documents can use another architecture BERT (e.g. Longformer Beltagy, Peters & Cohan 2020) split long document several chunks used sequentially classification (e.g., Pappagari et al. 2019). Using chunks supported aifedcuation. 
Since creating transformer model energy consuming aifeducation allows estimate ecological impact help python library codecarbon. Thus, sustain_track set TRUE default. use sustainability tracker must provide alpha-3 code country computer located (e.g., “CAN”=“Canada”, “Deu”=“Germany”). list codes can found wikipedia. reason different countries use different sources techniques generating energy resulting specific impact CO2 emissions. USA Canada can additionally specify region setting sustain_region. Please refer documentation codecarbon information. calling function, find new model model directory. next step train model calling train_tune_bert_model(). important provide path directory new transformer stored. Furthermore, important provide another directory trained transformer saved avoid reading writing collisions. Now, provided raw data used train model using Masked Language Modeling. First, can set length token sequences chunk_size. whole_word can choose masking single tokens masking complete words (Please remember modern tokenizers split words several tokens. Thus, tokens words forced match directly). p_mask can determine many tokens masked. Finally, val_size, set many chunks used validation sample. Please remember set correct alpha-3 code tracking ecological impact training model (sustain_iso_code). work machine graphic device small memory, please reduce batch size significantly. also recommend change usage memory set_config_gpu_low_memory() beginning session. training finishes, can find transformer ready use output_directory. Now able create text embedding model. can change machine learning framework setting ml_framework=\"tensorflow\" ml_framework=\"pytorch\". change argument framework chose beginning used.","code":"create_bert_model( ml_framework=aifeducation_config$get_framework(), model_dir = \"my_own_transformer\", vocab_raw_texts=example_data$text, vocab_size=30522, vocab_do_lower_case=FALSE, max_position_embeddings=512, hidden_size=768, num_hidden_layer=12, num_attention_heads=12, intermediate_size=3072, hidden_act=\"gelu\", hidden_dropout_prob=0.1, sustain_track=TRUE, sustain_iso_code=\"DEU\", sustain_region=NULL, sustain_interval=15, trace=TRUE) train_tune_bert_model( ml_framework=aifeducation_config$get_framework(), output_dir = \"my_own_transformer_trained\", model_dir_path = \"my_own_transformer\", raw_texts = example_data$text, p_mask=0.15, whole_word=TRUE, val_size=0.1, n_epoch=1, batch_size=12, chunk_size=250, n_workers=1, multi_process=FALSE, sustain_track=TRUE, sustain_iso_code=\"DEU\", sustain_region=NULL, sustain_interval=15, trace=TRUE)"},{"path":[]},{"path":"/articles/classification_tasks.html","id":"introduction","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.1 Introduction","title":"02b Text Embedding and Classification Tasks","text":"aifedcuation, text embedding model stored object class TextEmbeddingModel. object contains relevant information transforming raw texts numeric representation can used machine learning. aifedcuation, transformation raw texts numbers separate step downstream tasks classification. reduce computational time machines low performance. separating text embedding tasks, text embedding calculated can used different tasks time. Another advantage training downstream tasks involves downstream tasks parameters embedding model, making training less time-consuming, thus decreasing computational intensity. Finally, approach allows analysis long documents applying algorithm different parts. 
text embedding model provides unified interface: creating model different methods, handling model always . following show use object. start Topic Modeling.","code":""},{"path":[]},{"path":"/articles/classification_tasks.html","id":"topic-modeling","dir":"Articles","previous_headings":"4 Text Embedding > 4.2 Creating Text Embedding Models","what":"4.2.1 Topic Modeling","title":"02b Text Embedding and Classification Tasks","text":"creating new text embedding model based Topic Modeling, need basic text representation generated function bow_pp_create_basic_text_rep() (see section 2.2). Now can create new instance text embedding model calling TextEmbeddingModel$new(). First provide name new model (model_name). unique short name without spaces. model_label can provide label model freedom. important provide version model case want create improved version future. model_language provide users information language model designed. important plan share model wider community. method determine approach used model. like use Topic Modeling, set method = \"lda\". number topics set via bow_n_dim. example like create topic model twelve topics. number topics also determines dimensionality text embedding. Consequently, every text characterized twelve topics. Please forget pass basic text representation bow_basic_text_rep. model estimated, stored topic_modeling example.","code":"topic_modeling<-TextEmbeddingModel$new( model_name=\"topic_model_embedding\", model_label=\"Text Embedding via Topic Modeling\", model_version=\"0.0.1\", model_language=\"english\", method=\"lda\", bow_basic_text_rep=basic_text_rep, bow_n_dim=12, bow_max_iter=500, bow_cr_criterion=1e-8, trace=TRUE )"},{"path":"/articles/classification_tasks.html","id":"globalvectorclusters","dir":"Articles","previous_headings":"4 Text Embedding > 4.2 Creating Text Embedding Models","what":"4.2.2 GlobalVectorClusters","title":"02b Text Embedding and Classification Tasks","text":"creation text embedding model based GlobalVectorClusters similar model based Topic Modeling. two differences. First, request model based GlobalVectorCluster setting method=\"glove_cluster\". Second, determine dimensionality global vectors bow_n_dim number clusters bow_n_cluster. creating new text embedding model, global vector token calculated based feature-co-occurrence-matrix (fcm) provide basic_text_rep. token, vector calculated length bow_n_dim. Since vectors word embeddings text embeddings, additional step necessary create text embeddings. aifedcuation word embeddings used group words clusters. number clusters set bow_n_cluster. Now, text embedding produced counting tokens every cluster every text. final model stored global_vector_clusters_modeling.","code":"global_vector_clusters_modeling<-TextEmbeddingModel$new( model_name=\"global_vector_clusters_embedding\", model_label=\"Text Embedding via Clusters of GlobalVectors\", model_version=\"0.0.1\", model_language=\"english\", method=\"glove_cluster\", bow_basic_text_rep=basic_text_rep, bow_n_dim=96, bow_n_cluster=384, bow_max_iter=500, bow_max_iter_cluster=500, bow_cr_criterion=1e-8, trace=TRUE )"},{"path":"/articles/classification_tasks.html","id":"transformers","dir":"Articles","previous_headings":"4 Text Embedding > 4.2 Creating Text Embedding Models","what":"4.2.3 Transformers","title":"02b Text Embedding and Classification Tasks","text":"Using transformer creating text embedding model similar two approaches. request model based transformer must set method accordingly. Since use BERT model example, set method = \"bert\". 
Next, provide directory model stored. example bert_model_dir_path=\"my_own_transformer_trained. course can use pre-trained model Huggingface addresses needs. Using BERT model text embedding problem since text provide tokens transformer can process. maximal value set configuration transformer (see section 2.3). text produces tokens last tokens ignored. instances might want analyze long texts. situations, reducing text first tokens (e.g. first 512 tokens) result problematic loss information. deal situations can configure text embedding model aifecuation split long texts several chunks processed transformer. maximal number chunks set chunks. example , text embedding model split text consisting 1024 tokens two chunks every chunk consisting 512 tokens. every chunk text embedding calculated. result, receive sequence embeddings. first embeddings characterizes first part text second embedding characterizes second part text (). Thus, example text embedding model able process texts 4*512=2048 tokens. approach inspired work Pappagari et al. (2019). Since transformers able account context, may useful interconnect every chunk bring context calculations. can done overlap determine many tokens end prior chunk added next.example last 30 tokens prior chunks added beginning following chunk. can help add correct context text sections analysis. Altogether, example model can analyse maximum 512+(4-1)*(512-30)=1958 tokens text. Finally, decide hidden layer layers embeddings drawn. emb_layer_min emb_layer_max can decide layers average value every token calculated. Please note calculation considers layers emb_layer_min emb_layer_max. initial work, Devlin et al. (2019) used hidden states different layers classification. emb_pool_type decide tokens used pooling within every layer. case emb_pool_type=\"cls\" cls token used. case emb_pool_type=\"average\" tokens within layer averaged except padding tokens. deciding configuration, can use model. Note: version 0.3.1 aifeducation every transformer can used machine learning frameworks. Even pre-trained weights can used across backends. However, future models implemented available specific framework.","code":"bert_modeling<-TextEmbeddingModel$new( ml_framework=aifeducation_config$get_framework(), model_name=\"bert_embedding\", model_label=\"Text Embedding via BERT\", model_version=\"0.0.1\", model_language=\"english\", method = \"bert\", max_length = 512, chunks=4, overlap=30, emb_layer_min=\"middle\", emb_layer_max=\"2_3_layer\", emb_pool_type=\"average\", model_dir=\"my_own_transformer_trained\" )"},{"path":"/articles/classification_tasks.html","id":"transforming-raw-texts-into-embedded-texts","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.3 Transforming Raw Texts into Embedded Texts","title":"02b Text Embedding and Classification Tasks","text":"Although mechanics within text embedding model different, usage always . transform raw text numeric representation use embed method model. , must provide raw texts raw_text. addition, necessary provide character vector containing ID every text. IDs must unique. method embedcreates object class EmbeddedText. just data.frame consisting embedding every text. Depending method, data.frame different meaning: Topic Modeling: Regarding topic modeling, rows represent texts columns represent percentage every topic within text. GlobalVectorClusters: , rows represent texts columns represent absolute frequencies tokens belonging semantic cluster. 
Transformer - Bert: BERT, rows represent texts columns represents contextualized text embedding BERT’s understanding relevant text chunk. Please note case transformer models, embeddings every chunks interlinked. embedded texts now input train new classifier apply pre-trained classifier predicting categories/classes. next chapter show use classifiers. start, show save load model.","code":"topic_embeddings<-topic_modeling$embed( raw_text=example_data$text, doc_id=example_data$id, trace = TRUE) cluster_embeddings<-global_vector_clusters_modeling$embed( raw_text=example_data$text, doc_id=example_data$id, trace = TRUE) bert_embeddings<-bert_modeling$embed( raw_text=example_data$text, doc_id=example_data$id, trace = TRUE)"},{"path":"/articles/classification_tasks.html","id":"saving-and-loading-text-embedding-models","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.4 Saving and Loading Text Embedding Models","title":"02b Text Embedding and Classification Tasks","text":"Saving created text embedding model easy aifeducation using function save_ai_model. function provides unique interface text embedding models. saving work can pass model model directory save model model_dir. Please pass path directory path file function. Internally function creates new folder directory files belonging model stored. can see three text embedding models saved within directory named “text_embedding_models”. Within directory function creates unique folder every model. name folder specified dir_name. set dir_name=NULL append_ID=FALSE name folder created using models’ names. change argument append_ID append_ID=TRUE set dir_name=NULL unique ID model added directory. ID added automatically ensure every model unique name. important like share work persons. Since files stored special structure please change files manually. want load model, just call function load_ai_model can continue using model. ml_framework can decide framework model use. set ml_framework=\"auto\" models initialized framework saving model. Please note moment implemented text embedding models can used frameworks. However, may change future. Please note add name model directory path. example stored three models directory “text_embedding_models”. model saved within folder. folder’s name created automatically help name model. Thus, loading model must specify model want load adding model’s name directory path shown . 
Now can use text embedding model.","code":"save_ai_model( model=topic_modeling, model_dir=\"text_embedding_models\", dir_name=\"model_topic_modeling\", save_format=\"default\", append_ID=FALSE) save_ai_model( model=global_vector_clusters_modeling, model_dir=\"text_embedding_models\", dir_name=\"model_global_vectors\", save_format=\"default\", append_ID=FALSE) save_ai_model( model=bert_modeling, model_dir=\"text_embedding_models\", dir_name=\"model_transformer_bert\", save_format=\"default\", append_ID=FALSE) topic_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_topic_modeling\", ml_framework=aifeducation_config$get_framework()) global_vector_clusters_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_global_vectors\", ml_framework=aifeducation_config$get_framework()) bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_transformer_bert\", ml_framework=aifeducation_config$get_framework())"},{"path":"/articles/classification_tasks.html","id":"sustainability","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.5 Sustainability","title":"02b Text Embedding and Classification Tasks","text":"case underlying model trained active sustainability tracker (section 3.3) can receive table showing energy consumption, CO2 emissions, hardware used training calling bert_modeling$get_sustainability_data().","code":""},{"path":[]},{"path":"/articles/classification_tasks.html","id":"creating-a-new-classifier","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.1 Creating a New Classifier","title":"02b Text Embedding and Classification Tasks","text":"aifedcuation, classifiers based neural nets stored objects class TextEmbeddingClassifierNeuralNet. can create new classifier calling TextEmbeddingClassifierNeuralNet$new(). Similar text embedding model provide name (name) label (label) new classifier. text_embeddings provide embedded text. like recommend use embedding like use training. continue example use embedding produced BERT model. targets takes target data supervised learning. Please omit cases category/class since can used special training technique show later. important provide target data factors. Otherwise error occur. also important name factor. , entries factor mus names correspond IDs corresponding texts. Without names method match input data (text embeddings) target data. parameters decide structure classifier. Figure 4 illustrates . hidden takes vector integers, determining number layers number neurons. example, dense layers. rec also takes vector integers determining number size Gated Recurrent Unit (gru). example, use one layer 256 neurons. Since classifiers aifeducation use standardized scheme creation, dense layers used gru layers. want omit gru layers dense layers, set corresponding argument NULL. use text embedding model processes one chunk like recommend use recurrent layers since able use sequential structure data. cases can rely dense layers . use text embeddings one chunk, good idea try self-attention layering order take context chunks account. add self-attention two choices: - can use attention mechanism used classic transformer models multihead attention (Vaswani et al. 2017). variant set attention_type=\"multihead\", repeat_encoder value least 1, self_attention_headsto value least 1. - Furthermore can use attention mechanism described Lee-Thorp et al. (2021) FNet model allows much fast computations low accuracy costs. use kind attention setattention_type=“fourierandrepeat_encoder` value least 1. 
repeat_encoder can chose many times encoder layer added. encoder implemented described Chollet, Kalinowski, Allaire (2022, pp. 373) variants attention. can extend abilities network adding positional embeddings. Positional embeddings take care order chunks. Thus, adding layer may increase performance order information important. can add layer setting add_pos_embedding=TRUE. layer created described Chollet, Kalinowski, Allaire (2022, pp. 378) Masking, normalization, creation input layer well output layer done automatically. created new classifier, can begin training. Note: contrast text embedding models decision machine learning framework important since classifier can used framework created trained model.","code":"example_targets<-as.factor(example_data$label) names(example_targets)=example_data$id classifier<-TextEmbeddingClassifierNeuralNet$new( ml_framework=aifeducation_config$get_framework(), name=\"movie_review_classifier\", label=\"Classifier for Estimating a Postive or Negative Rating of Movie Reviews\", text_embeddings=bert_embeddings, targets=example_targets, hidden=NULL, rec=c(256), self_attention_heads=2, intermediate_size=512, attention_type=\"fourier\", add_pos_embedding=TRUE, rec_dropout=0.1, repeat_encoder=1, dense_dropout=0.4, recurrent_dropout=0.4, encoder_dropout=0.1, optimizer=\"adam\")"},{"path":"/articles/classification_tasks.html","id":"training-a-classifier","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.2 Training a Classifier","title":"02b Text Embedding and Classification Tasks","text":"start training classifier, call train method. Similarly, creation classifier, must provide text embedding data_embeddings categories/classes target data data_targets. Please remember data_targets expects named factor names correspond IDs corresponding text embeddings. Text embeddings target data matched omitted training. train classifier, necessary provide path dir_checkpoint. directory stores best set weights training epoch. training, weights automatically used final weights classifier. performance estimation, training splits data several chunks based cross-fold validation. number folds set data_n_test_samples. every case, one fold used training serves test sample. remaining data used create training validation sample. performance values saved trained classifier refer test sample. data never used training provides realistic estimation classifier`s performance. Since aifedcuation tries address special needs educational social science, special training steps integrated method. Baseline: interested training classifier without applying additional statistical techniques, set use_baseline = TRUE. case, classifier trained provided data . Cases missing values target data omitted. Even like apply statistical adjustments, makes sense compute baseline model comparing effect modified training process unmodified training. using bsl_val_size can determine much data used training data much used validation data. Balanced Synthetic Cases: case imbalanced data, recommended set use_bsc=TRUE. training, number synthetic units created via different techniques. Currently can request Basic Synthetic Minority Oversampling Technique, Density-Bases Synthetic Minority Oversampling Technique, Adaptive Synthetic Sampling Approach Imbalanced Learning. aim create new cases fill gap majority class. Multi-class problems reduced two class problem (class investigation vs. ) generating units. can even request several techniques . 
number synthetic units original minority units exceeds number cases majority class, random sample drawn. technique allows set number neighbors generation, k = bsc_max_k used. Balanced Pseudo-Labeling: technique relevant labeled target data large number unlabeled target data. different parameter starting “bpl_”, can request different implementations pseudo-labeling, example based work Lee (2013) Cascante-Bonilla et al. (2020). turn pseudo-labeling, set use_bpl=TRUE. request pseudo-labeling based Cascante-Bonilla et al. (2020), following parameters set: bpl_max_steps = 5 (splits unlabeled data five chunks) bpl_dynamic_inc = TRUE (ensures number used chunks increases every step) bpl_model_reset = TRUE (re-initializes model every step) bpl_epochs_per_step=30 (number training epochs within step) bpl_balance=FALSE (ensures cases highest certainty added training regardless absolute frequencies classes) bpl_weight_inc=0.00 bpl_weight_start=1.00 (ensures labeled unlabeled data weight training) bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00 (ensures unlabeled data considered training cases highest certainty used training.) request original pseudo-labeling proposed Lee (2013), set following parameters: bpl_max_steps=30 (steps must treated epochs) bpl_dynamic_inc=FALSE (ensures pseudo-labeled cases used) bpl_model_reset=FALSE (model allowed re-initialized) bpl_epochs_per_step=1 (steps treated epochs must one) bpl_balance=FALSE (ensures cases added regardless absolute frequencies classes) bpl_weight_inc=0.02 bpl_weight_start=0.00 (gives pseudo labeled data increasing weight every step) bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00 (ensures pseudo labeled cases used training. bpl_anchor affect calculations) Please note Lee (2013) suggests recalculate pseudo-labels unlabeled data every weight actualization, aifeducation, pseudo-labels recalculated every epoch. bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00 used describe certainty prediction. 0 refers random guessing 1 refers perfect certainty. bpl_anchor used reference value. distance bpl_anchor calculated every case. , sorted increasing distance bpl_anchor. resulting order cases relevant set bpl_dynamic_inc=TRUE bpl_balance=TRUE. Figure 5 illustrates training loop cases three options set TRUE. example applies algorithm proposed Cascante-Bonilla et al. (2020). training classifier labeled data, unlabeled data introduced training. classifier predicts potential labels unlabeled data adds 20% cases highest certainty pseudo-labels training. classifier re-initialized trained . training, classifier predicts potential labels originally unlabeled data adds 40% pseudo-labeled data training data. model re-initialized trained unlabeled data used training. Since training neural net energy consuming aifeducation allows estimate ecological impact help python library codecarbon. Thus, sustain_track set TRUE default. use sustainability tracker must provide alpha-3 code country computer located (e.g., “CAN”=“Canada”, “Deu”=“Germany”). list codes can found wikipedia. reason different countries use different sources techniques generating energy resulting specific impact CO2 emissions. USA Canada can additionally specify region setting sustain_region. Please refer documentation codecarbon information. Finally, trace, view_metrics, keras_trace allow control much information training progress printed console. Please note training classifier can take time. Please note performance estimation, final training classifier makes use data available. 
","code":"example_targets<-as.factor(example_data$label) names(example_targets)=example_data$id classifier$train( data_embeddings = bert_embeddings, data_targets = example_targets, data_n_test_samples=5, use_baseline=TRUE, bsl_val_size=0.33, use_bsc=TRUE, bsc_methods=c(\"dbsmote\"), bsc_max_k=10, bsc_val_size=0.25, use_bpl=TRUE, bpl_max_steps=5, bpl_epochs_per_step=30, bpl_dynamic_inc=TRUE, bpl_balance=FALSE, bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00, bpl_weight_inc=0.00, bpl_weight_start=1.00, bpl_model_reset=TRUE, epochs=30, batch_size=8, sustain_track=TRUE, sustain_iso_code=\"DEU\", sustain_region=NULL, sustain_interval=15, trace=TRUE, view_metrics=FALSE, keras_trace=0, n_cores=2, dir_checkpoint=\"training/classifier\")"},{"path":"/articles/classification_tasks.html","id":"evaluating-classifiers-performance","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.3 Evaluating Classifier’s Performance","title":"02b Text Embedding and Classification Tasks","text":"After finishing the training, you can evaluate the performance of the classifier. For every fold, the classifier is applied to the test sample and the results are compared to the true categories/classes. Since the test sample is never part of the training, all performance measures provide a realistic idea of the classifier's performance. To support researchers in judging the quality of the predictions, aifeducation utilizes several measures and concepts from content analysis. These are: the Iota Concept of the Second Generation (Berding & Pargmann 2022), Krippendorff's Alpha (Krippendorff 2019), Percentage Agreement, Gwet's AC1/AC2 (Gwet 2014), Kendall's coefficient of concordance W, Cohen's Kappa unweighted, Cohen's Kappa with equal weights, Cohen's Kappa with squared weights, and Fleiss' Kappa for multiple raters without exact estimation. You can access the concrete values via the field reliability, which stores all relevant information. In this list you find the reliability values for every fold and for every requested training configuration. In addition, the reliability for every step within balanced pseudo-labeling is reported. The central estimates of the reliability values can be found via reliability$test_metric_mean. For our example: We now have a table with all relevant values. Of particular interest are the alpha values from the Iota Concept, since they represent a measure of reliability that is independent of the frequency distribution of the classes/categories. The alpha values describe the probability that a case of a specific class is recognized as that specific class. As you can see, compared to the baseline model, applying Balanced Synthetic Cases increases the minimal value of alpha, reducing the risk of missing cases that belong to the rare class (see row “BSC”). On the contrary, the alpha values of the major category decrease slightly, thus losing their unjustified bonus from the high number of cases in the training set. This provides a more realistic performance estimation of the classifier. Furthermore, you can see that the application of pseudo-labeling further increases the alpha values of the minor class, here in step 3. Finally, you can plot a coding stream scheme showing how the cases of the different classes are labeled. Here we use the package iotarelr. You can see that only a small number of negative reviews is treated as a good review, while a larger number of positive reviews is treated as a bad review. Thus, the data of the major class (negative reviews) is more reliable and valid than the data of the minor class (positive reviews). Evaluating the performance of a classifier is a complex task that goes beyond the scope of this vignette. Instead, we would like to refer you to the cited literature on content analysis and machine learning if you would like to dive deeper into this topic.
","code":"classifier$reliability$test_metric_mean test_metric_mean #> iota_index min_iota2 avg_iota2 max_iota2 min_alpha avg_alpha #> Baseline 0.6320000 0.10294118 0.3877251 0.6725090 0.136 0.549 #> BSC 0.4346667 0.06895416 0.2676750 0.4663959 0.072 0.512 #> BPL 0.6293333 0.51019563 0.6401731 0.7701506 0.580 0.756 #> Final 0.6293333 0.51019563 0.6401731 0.7701506 0.580 0.756 #> max_alpha static_iota_index dynamic_iota_index kalpha_nominal #> Baseline 0.962 0.5455732 0.5281005 -0.04487101 #> BSC 0.952 0.3785559 0.3744595 -0.25488654 #> BPL 0.932 0.3846018 0.5242565 0.54678492 #> Final 0.932 0.3846018 0.5242565 0.54678492 #> kalpha_ordinal kendall kappa2 kappa_fleiss kappa_light #> Baseline -0.04487101 0.5531797 0.10142922 0.10142922 0.10142922 #> BSC -0.25488654 0.5199922 0.02869454 0.02869454 0.02869454 #> BPL 0.54678492 0.7827658 0.55104690 0.55104690 0.55104690 #> Final 0.54678492 0.7827658 0.55104690 0.55104690 0.55104690 #> percentage_agreement gwet_ac #> Baseline 0.6866667 0.543828 #> BSC 0.4920000 0.106742 #> BPL 0.8146667 0.686684 #> Final 0.8146667 0.686684 library(iotarelr) iotarelr::plot_iota2_alluvial(test_classifier$reliability$iota_object_end_free)"},{"path":"/articles/classification_tasks.html","id":"sustainability-1","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.4 Sustainability","title":"02b Text Embedding and Classification Tasks","text":"In case the classifier was trained with an active sustainability tracker, you can receive information on its sustainability by calling classifier$get_sustainability_data().","code":"sustainability_data #> $sustainability_tracked #> [1] TRUE #> #> $date #> [1] \"Thu Oct 5 11:20:53 2023\" #> #> $sustainability_data #> $sustainability_data$duration_sec #> [1] 7286.503 #> #> $sustainability_data$co2eq_kg #> [1] 0.05621506 #> #> $sustainability_data$cpu_energy_kwh #> [1] 0.08602103 #> #> $sustainability_data$gpu_energy_kwh #> [1] 0.05598303 #> #> $sustainability_data$ram_energy_kwh #> [1] 0.01180879 #> #> $sustainability_data$total_energy_kwh #> [1] 0.1538128 #> #> #> $technical #> $technical$tracker #> [1] \"codecarbon\" #> #> $technical$py_package_version #> [1] \"2.3.1\" #> #> $technical$cpu_count #> [1] 12 #> #> $technical$cpu_model #> [1] \"12th Gen Intel(R) Core(TM) i5-12400F\" #> #> $technical$gpu_count #> [1] 1 #> #> $technical$gpu_model #> [1] \"1 x NVIDIA GeForce RTX 4070\" #> #> $technical$ram_total_size #> [1] 15.84258 #> #> #> $region #> $region$country_name #> [1] \"Germany\" #> #> $region$country_iso_code #> [1] \"DEU\" #> #> $region$region #> [1] NA"}
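As a quick plausibility check of the report above, the emission factor implied by the tracked values can be recomputed by hand (the numbers are taken from the example output; the result reflects the German energy mix):

```r
# CO2-equivalent per kWh implied by the example values above.
co2eq_kg         <- 0.05621506
total_energy_kwh <- 0.1538128
co2eq_kg / total_energy_kwh  # ~0.365 kg CO2-eq per kWh
```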
,{"path":"/articles/classification_tasks.html","id":"saving-and-loading-a-classifier","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.5 Saving and Loading a Classifier","title":"02b Text Embedding and Classification Tasks","text":"Once you have created a classifier, saving and loading is easy. The process of saving a model is similar to the process for text embedding models. Just pass the model and a directory path to the function save_ai_model. In contrast to text embedding models, you can specify the additional argument save_format. In the case of pytorch models, this argument allows you to choose between save_format = \"safetensors\" and save_format = \"pt\". We recommend choosing save_format = \"safetensors\" since this is the safer method for saving models. In the case of tensorflow models, this argument allows you to choose between save_format = \"keras\", save_format = \"tf\", and save_format = \"h5\". We recommend choosing save_format = \"keras\" since this is the recommended format for keras. If you set save_format = \"default\", .safetensors is used for pytorch models and .keras is used for tensorflow models. If you would like to load a model, you can call the function load_ai_model. Note: Classifiers depend on the framework that was used during their creation. Thus, a classifier is always initialized with its original framework, and the argument ml_framework has no effect.","code":"save_ai_model( model=classifier, model_dir=\"classifiers\", dir_name=\"movie_classifier\", save_format = \"default\", append_ID=FALSE) classifier<-load_ai_model( model_dir=\"classifiers/movie_classifier\")"},{"path":"/articles/classification_tasks.html","id":"predicting-new-data","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.6 Predicting New Data","title":"02b Text Embedding and Classification Tasks","text":"If you would like to apply your classifier to new data, two steps are necessary. You must first transform the raw texts into a numerical expression by using exactly the same text embedding model that was used to train your classifier. In the case of our example classifier, we use our BERT model. To transform the raw texts into a numeric representation, just pass the raw texts and the IDs of every text to the method embed of the loaded model. This is easy if you used the package readtext to read the raw texts from disk, since the object resulting from readtext always stores the texts in the column “texts” and the IDs in the column “doc_id”. Depending on your machine, embedding the raw texts may take some time. In case you use a machine with a graphic device, it is possible that a “memory” error occurs. In this case, reduce the batch size. If the error still occurs, restart the R session, switch to cpu-mode directly after loading the library with aifeducation::set_config_cpu_only(), and request the embedding again. In our example, the text embeddings are stored in text_embedding. Since embedding texts may take some time, it is a good idea to save the embeddings for future analysis (use the save function of R). This allows you to load the embeddings without the need to apply the text embedding model to the raw texts again. The resulting object can then be passed to the method predict of the classifier to get the predictions together with an estimate of certainty for each class/category. After the classifier finishes the prediction, the estimated categories/classes are stored in predicted_categories. This object is a data.frame containing the texts' IDs in the rows and the probabilities of the different categories/classes in the columns. The last column, with the name expected_category, represents the category that is assigned to a text due to its highest probability. These estimates can be used for further analysis with common methods of the educational and social sciences, such as correlation analysis, regression analysis, structural equation modeling, latent class analysis, or analysis of variance.","code":"# If our model is not loaded bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/bert_embedding\") # Create a numerical representation of the text text_embeddings<-bert_modeling$embed( raw_text = textual_data$texts, doc_id = textual_data$doc_id, batch_size=8, trace=TRUE) # If your classifier is not loaded classifier<-load_ai_model( model_dir=\"classifiers/movie_review_classifier\") # Predict the classes of new texts predicted_categories<-classifier$predict( newdata = text_embeddings, batch_size=8, verbose=0)"}
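As suggested above, storing the embeddings with base R avoids applying the text embedding model twice; a minimal sketch (the file path is invented for illustration):

```r
# Save the embeddings once they have been computed ...
save(text_embeddings, file = "embeddings/movie_review_embeddings.rda")

# ... and restore them in a later session instead of embedding again.
load("embeddings/movie_review_embeddings.rda")

# Quick look at the distribution of the predicted classes.
table(predicted_categories$expected_category)
```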
,{"path":"/articles/classification_tasks.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"02b Text Embedding and Classification Tasks","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://doi.org/10.48550/arXiv.2004.05150 Berding, F., & Pargmann, J. (2022). Iota Reliability Concept of the Second Generation. Berlin: Logos. https://doi.org/10.30819/5581 Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education, 1–21. https://doi.org/10.3389/feduc.2022.818365 Campesato, O. (2021). Natural Language Processing Fundamentals for Developers. Mercury Learning & Information. https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=6647713 Cascante-Bonilla, P., Tan, F., Qi, Y., & Ordonez, V. (2020). Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning. https://doi.org/10.48550/arXiv.2001.06001 Chollet, F., Kalinowski, T., & Allaire, J. J. (2022). Deep learning with R (Second edition). Manning Publications Co. https://learning.oreilly.com/library/view/-/9781633439849/?ar Dai, Z., Lai, G., Yang, Y., & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. https://doi.org/10.48550/arXiv.2006.03236 Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423 Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (Fourth edition). Gaithersburg: STATAXIS. He, P., Liu, X., Gao, J., & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654 Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th ed.). Los Angeles: SAGE. Lane, H., Howard, C., & Hapke, H. M. (2019). Natural language processing in action: Understanding, analyzing, and generating text with Python. Shelter Island: Manning. Larusson, J. A., & White, B. (Eds.). (2014). Learning Analytics: From Research to Practice. New York: Springer. https://doi.org/10.1007/978-1-4614-3305-7 Lee, D.‑H. (2013). Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. ICML 2013 Workshop: Challenges in Representation Learning. Lee-Thorp, J., Ainslie, J., Eckstein, I., & Ontanon, S. (2021). FNet: Mixing Tokens with Fourier Transforms. https://doi.org/10.48550/arXiv.2105.03824 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692 Papilloud, C., & Hinneburg, A. (2018). Qualitative Textanalyse mit Topic-Modellen: Eine Einführung für Sozialwissenschaftler. Wiesbaden: Springer. https://doi.org/10.1007/978-3-658-21980-2 Pappagari, R., Zelasko, P., Villalba, J., Carmiel, Y., & Dehak, N. (2019). Hierarchical Transformers for Long Document Classification. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 838–844). IEEE. https://doi.org/10.1109/ASRU46091.2019.9003958 Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.org/D14-1162.pdf Schreier, M. (2012). Qualitative Content Analysis in Practice. Los Angeles: SAGE. Tunstall, L., Werra, L. von, Wolf, T., & Géron, A. (2022). Natural language processing with transformers: Building language applications with hugging face (Revised edition). Heidelberg: O’Reilly. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762 Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., . . . Dean, J. (2016).
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. https://doi.org/10.48550/arXiv.1609.08144","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"preface","dir":"Articles","previous_headings":"1 Introduction and Overview","what":"1.1 Preface","title":"02a Using Aifeducation Studio","text":"This vignette introduces Aifeducation - Studio, a graphical user interface for creating, training, documenting, analyzing, and applying artificial intelligence (AI). It is made for users who are unfamiliar with R or who lack coding skills in the relevant languages (e.g., python). For experienced users, the interface provides a convenient way of working with AI in an educational context. This article overlaps with the vignette 02b Classification Tasks, which explains how to use the package with R syntax. We assume that aifeducation is installed as described in the vignette 01 Get Started. This introduction starts with a brief explanation of basic concepts, which are necessary to work with this package.","code":""},{"path":"/articles/gui_aife_studio.html","id":"basic-concepts","dir":"Articles","previous_headings":"1 Introduction and Overview","what":"1.2 Basic Concepts","title":"02a Using Aifeducation Studio","text":"In the educational and social sciences, assigning an observation to scientific concepts is an important task that allows researchers to understand an observation, to generate new insights, and to derive recommendations for research and practice. In educational science, several areas deal with this kind of task. For example, diagnosing students' characteristics is an important aspect of a teacher's profession and is necessary to understand and promote learning. Another example is the use of learning analytics, where data about students is used to provide learning environments adapted to their individual needs. On another level, educational institutions such as schools and universities can use this information for data-driven performance decisions (Larusson & White 2014) as well as to improve them. In every case, a real-world observation is aligned with scientific models in order to use scientific knowledge as a technology for improved learning and instruction. Supervised machine learning is one concept that allows a link between real-world observations and existing scientific models and theories (Berding et al. 2022). For the educational sciences, this is a great advantage because it allows researchers to use existing knowledge and insights when applying AI. The drawback of this approach is that the training of AI requires both information about the real-world observations and information about the corresponding alignment with scientific models and theories. A valuable source of data in the educational sciences is written texts, since textual data can be found almost everywhere in the realm of learning and teaching (Berding et al. 2022). For example, teachers often require students to solve a task and to provide the solution in written form. Students create solutions for tasks and often document them in a short written essay or a presentation. This data can be used to analyze learning and teaching. Teachers' written tasks for their students may provide insights into the quality of instruction, while students' solutions may provide insights into their learning outcomes and prerequisites. AI can be a helpful assistant in analyzing textual data, since the analysis of textual data is a challenging and time-consuming task for humans. Please note that an introduction into content analysis, natural language processing, or machine learning is beyond the scope of this vignette. If you would like to learn more, please refer to the cited literature. Before we start, it is necessary to introduce a definition of our understanding of some basic concepts, since applying AI to educational contexts means combining the knowledge of different scientific disciplines that use different, sometimes overlapping concepts. Even within a single research area, concepts are not unified. Figure 1 illustrates this package's understanding. Since aifeducation looks at the application of AI for classification tasks from the perspective of the empirical method of content analysis, there are overlapping concepts of content analysis and machine learning.
In content analysis, a phenomenon like performance or colors can be described with a scale/dimension that is made up of several categories (e.g. Schreier 2012, pp. 59). For example, an exam's performance (scale/dimension) could be “good”, “average”, or “poor”. In terms of colors (scale/dimension), the categories could be “blue”, “green”, etc. The machine learning literature uses other words to describe this kind of data. In machine learning, “scale” and “dimension” correspond to the term “label”, while the “categories” are referred to with the term “classes” (Chollet, Kalinowski & Allaire 2022, p. 114). With these clarifications, classification means that a text is assigned to the correct category of a scale or, respectively, that a text is labeled with the correct class. As Figure 2 illustrates, two kinds of data are necessary to train an AI to classify text in line with supervised machine learning principles. By providing the AI with the textual data as input data and the corresponding information about the class as target data, the AI can learn which texts imply a specific class or category. In the exam example, the AI can learn which texts imply a “good”, an “average”, or a “poor” judgment. After training, the AI can be applied to new texts to predict the most likely class of every new text. The generated classes can be used for further statistical analysis or to derive recommendations for learning and teaching. For the use cases described in this vignette, AI has to “understand” natural language: „Natural language processing is an area of research in computer science and artificial intelligence (AI) concerned with processing natural languages such as English or Mandarin. This processing generally involves translating natural language into data (numbers) that a computer can use to learn about the world. (…)” (Lane, Howard & Hapke 2019, p. 4) Thus, the first step is to transform raw texts into a form that is usable for a computer, hence raw texts must be transformed into numbers. In modern approaches, this is usually done with word embeddings. Campesato (2021, p. 102) describes them as the “collective name for a set of language modeling and feature learning techniques (…) where words or phrases from the vocabulary are mapped to vectors of real numbers.” The definition of a word vector is similar: „Word vectors represent the semantic meaning of words as vectors in the context of the training corpus.” (Lane, Howard & Hapke 2019, p. 191). In the next step, the word or text embeddings can be used as input data and the labels as target data for training an AI to classify a text. In aifeducation, these steps are covered by three different types of models, as shown in Figure 3. Base Models: The base models are the models that contain the capacities to understand natural language. In general, these are transformers such as BERT, RoBERTa, etc. A huge number of pre-trained models can be found on Huggingface. Text Embedding Models: These models are built on top of a base model and store the directions on how to use the base model for converting raw texts into sequences of numbers. Please note that a single base model can be used to create different text embedding models. Classifiers: Classifiers are used on top of a text embedding model. They are used to classify a text into categories/classes based on the numeric representation provided by the corresponding text embedding model. Please note that a single text embedding model can be used to create different classifiers (e.g. one classifier for colors, one classifier to estimate the quality of a text, etc.). With the help of this overview, we can start with the introduction of Aifeducation Studio.","code":""},{"path":"/articles/gui_aife_studio.html","id":"starting-aifeducation-studio","dir":"Articles","previous_headings":"","what":"2 Starting Aifeducation Studio","title":"02a Using Aifeducation Studio","text":"We recommend to start with a clean R session. You can then start Aifeducation Studio by entering the following into the console: Please note that this can take a moment. At the beginning you will see the start page (Figure 4). Here you can configure your current session. First, it is important to choose the machine learning framework you would like to use during the session (box Machine Learning Framework). This choice cannot be changed after the session has started. To change the framework, you have to restart Aifeducation Studio. Depending on the chosen framework, you can activate further settings (box Machine Learning Framework).
If you would like to use tensorflow on a computer whose graphic device has low memory, we recommend to activate the option low memory. For pytorch, no further configuration is necessary. On the right side of the start page, you can decide whether the energy consumption should be recorded during training AI (box Sustainability Tracking). Tracking the energy consumption allows you to estimate the CO2 emissions of using AI. Since the world faces the challenge of climate change, we recommend to enable this option. In this case you have to choose a country in order to allow an accurate estimation of your model's sustainability impact. If you are ready, you can press the start button (box Start Session), which directs you to the home page.","code":"library(aifeducation) start_aifeducation_studio()"},{"path":[]},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"collection-of-raw-texts","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.1 Preparing Data","what":"3.1.1 Collection of Raw Texts","title":"02a Using Aifeducation Studio","text":"The first step in working with AI is to gather and structure data. Within the scope of aifeducation, this data can either be a collection of raw texts, sequences of numbers representing texts (text embeddings), or the texts' labels. Collections of raw texts are necessary in two cases: First, to train or fine-tune base models. Second, to transform texts into text embeddings, which can be used as input for training a classifier or for predicting texts' labels via a classifier. To create a collection of raw texts, choose Data Preparation on the right side of the page as shown in Figure 5. On the resulting page (see Figure 6), you first choose the directory where your texts are stored (box Text Sources). We recommend to store all texts you would like to use in a single folder. Within this folder, you can structure your data with sub-folders. In case you use sub-folders, please ensure to include them when creating the collection of raw texts. In the next step, you can decide which file formats should be included (box File Types). Currently, aifeducation supports .pdf, .csv, and .xlsx files. If enabled, all files of the requested file format are included in the data collection. In case you would like to consider .xlsx files, the files must contain one column with the texts and one column with the texts' IDs, as shown in Figure 7. The names of the corresponding columns must be identical for all files, which provide their column names in the first row (Figure 7). In the last step you choose the folder where the collection of raw texts should be saved. Please select a folder and provide a name for the file (box Text Output). Finally, you can start creating the collection (box Start Process). Please note that this can take some time. After the process finishes, the single resulting file can be used for all further tasks. The file contains a data.table which stores the texts together with their IDs. In the case of .xlsx files, the texts' IDs are set to the IDs stored in the corresponding column ID. In the case of .pdf and .csv files, the file names are used as IDs (without the file extension, see Figure 8). Please note that, as a consequence, two files such as text_01.csv and text_01.pdf would receive the same ID, which is not allowed. Please ensure that you use unique IDs across all file formats. The IDs are important since they are used to match every text with its corresponding class/category, if available.","code":""},{"path":"/articles/gui_aife_studio.html","id":"collections-of-texts-labels","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.1 Preparing Data","what":"3.1.2 Collections of Texts’ Labels","title":"02a Using Aifeducation Studio","text":"Labels are necessary if you would like to train a classifier. The easiest way is to create a table that contains one column with the texts' IDs and one or multiple columns that contain the texts' categories/classes. Supported file formats are .xlsx, .csv, and .rda/.rdata. Figure 9 illustrates an example with an .xlsx file. In this case, the table must contain a column with the name “id” which stores the texts' IDs. All other columns must also have unique names. Please pay attention to the exact spelling, i.e. whether you use “id”, “ID”, or “Id”.","code":""}
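Such a label table can, for example, be written from R; a minimal sketch (the file name and category values are invented for illustration):

```r
# One row per text: the 'id' column must match the IDs of the raw text collection.
labels <- data.frame(
  id = c("text_01", "text_02", "text_03"),
  rating = c("good", "poor", "average"))

write.csv(labels, file = "labels/example_labels.csv", row.names = FALSE)
```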
,{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"overview","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.1 Overview","title":"02a Using Aifeducation Studio","text":"Base models are the foundation of all models in aifeducation. At the moment, these are the transformer models BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), DeBERTa version 2 (He et al. 2020), Funnel-Transformer (Dai et al. 2020), and Longformer (Beltagy, Peters & Cohan 2020). In general, these models are trained on a large corpus of general texts in a first step. In a next step, the models are fine-tuned with domain-specific texts and/or fine-tuned for specific tasks. Since the creation of base models requires a huge number of texts and results in high computational time, it is recommended to use pre-trained models. These can be found on Huggingface. Sometimes, however, it is more straightforward to create a new model that fits a specific purpose. Aifeducation Studio supports the opportunity to create and to train/fine-tune base models.","code":""},{"path":"/articles/gui_aife_studio.html","id":"creation-of-base-models","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.2 Creation of Base Models","title":"02a Using Aifeducation Studio","text":"In order to create a new base model, choose the option Create in the tab Language Modeling - Base Models on the right side of the app (Figure 5). Figure 10 shows the corresponding page. Figure 10: Language Modeling - Create Transformer (click image to enlarge) Every transformer model is composed of two parts: 1) a tokenizer, which splits raw texts into smaller pieces in order to model a large number of words with a limited, small number of tokens, and 2) a neural network, which is used to model the capabilities for understanding natural language. At the beginning you can choose between the different supported transformer architectures (box Model Architecture). Depending on the architecture, you have different options for determining the shape of the neural network. In the middle, you find a box named Vocabulary. Here you must provide a path to a file that contains a collection of raw texts. These raw texts are used to calculate the vocabulary of the transformer. If the file was created with Aifeducation Studio, compatibility is ensured. See section 3.1.1 for more details. It is important that you provide a number for how many tokens the vocabulary should include. Depending on the transformer method, you can set additional options affecting the transformer's vocabulary. Transform Lower: If this option is enabled, all words in the raw text are transformed into lower case. For instance, the resulting tokens for Learners and learners would then be the same. If disabled, Learners and learners result in a different tokenization. Add Prefix Spaces: If enabled, a space is added to the first word if there is not already one. Thus, enabling this option leads to a similar tokenization of the word learners in both of the following cases: 1) “learners need high motivation for high achievement.” and 2) “high motivation is necessary for learners to achieve high performance.”. Trim Offsets: If this option is enabled, the white spaces produced by the offsets are trimmed. In the last step you choose the folder where the new base model should be saved (box Creation). Finally, you can start the creation of the model by clicking the button “Start Creation”. The creation of the model may take some time.","code":""}
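For users who later switch to R syntax: the same step is available as a function call. The following is only a rough, abridged sketch; create_bert_model() is part of aifeducation, but the argument values here are illustrative and the full signature should be taken from the package reference manual:

```r
# Sketch: create a new BERT base model from a collection of raw texts.
create_bert_model(
  ml_framework = aifeducation_config$get_framework(),
  model_dir = "my_own_transformer",      # folder where the new model is saved
  vocab_raw_texts = textual_data$texts,  # raw texts used to compute the vocabulary
  vocab_size = 30522,                    # number of tokens in the vocabulary
  vocab_do_lower_case = FALSE,           # corresponds to 'Transform Lower' in the GUI
  sustain_track = TRUE,
  sustain_iso_code = "DEU")
```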
,{"path":"/articles/gui_aife_studio.html","id":"traintune-a-base-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.3 Train/Tune a Base Model","title":"02a Using Aifeducation Studio","text":"If you would like to train a new base model (see section 3.2.2) for the first time, or if you want to adapt a pre-trained model to a domain-specific language or task, click on “Train/Tune” on the right side of the app. You can find this option via “Language Modeling” as shown in Figure 5. Figure 11: Language Modeling - Train/Tune a Transformer (click image to enlarge) In the first step, you choose the base model you would like to train/tune (box Base Model). Please note that every base model consists of several files. Thus, you provide neither a single file nor multiple files. Instead you provide the folder that stores the entire model. Compatible models are all base models created with Aifeducation Studio. In addition, you can use any model from Huggingface that uses an architecture implemented in aifeducation, such as BERT, DeBERTa, etc. After choosing a base model, new boxes appear as shown in Figure 11. To train the model, you must first provide a collection of raw texts (box Raw Texts). We recommend to create this collection of texts as described in section 3.1.1. Next you can configure the training of the base model (box Train Tune Settings): Chunk Size: For training and validating a base model, the raw texts are split into several smaller texts. This value determines the maximum length of these smaller text pieces in number of tokens. The value cannot exceed the maximum size set during the creation of the base model. Minimal Sequence Length: This value determines the minimal length a text chunk must have in order to be part of the training and validation data. Full Sequences Only: If this option is enabled, only text chunks with a number of tokens equal to “chunk size” are included in the data. Disable this option if you have a lot of small text chunks you would like to use for training and validation. Probability of Token Masking: This option determines how many tokens of every sequence are masked. Whole Word Masking: If this option is activated, all tokens belonging to a single word are masked together. If this option is disabled or not available, only plain token masking is used. Validation Size: This option determines how many sequences are used for validating the performance of the base model. Sequences used for validation are not available for training. Batch Size: This option determines how many sequences are processed at the same time. Please adjust this value to the computation capacities of your machine. n Epochs: The maximum number of epochs for training. After training, the model with the best validation loss is saved to disk and used as the final model. Learning Rate: The initial learning rate. In the last step, you provide a directory where the trained model is saved during and after training (box Start Training/Tuning). The corresponding folder will also contain the checkpoints of the training. It is important that this directory is not the same directory as the one where the original model is stored. By clicking the button “Start Training/Tuning”, the training starts. Please note that the training of a base model can last days or even weeks, depending on the size and kind of the model, the amount of data, and the capacities of your machine.","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"create-a-text-embedding-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.3 Text Embedding Models","what":"3.3.1 Create a Text Embedding Model","title":"02a Using Aifeducation Studio","text":"A text embedding model is the interface to R in aifeducation. In order to create a new model, you need a base model which provides the ability to understand natural language. You can open the creation page by clicking on “Create” in the section “Text Embedding Model” via “Language Modeling” (see Figure 5). Figure 12 shows the corresponding page. Figure 12: Text Embedding Model - Create (click image to enlarge) First choose the base model which should form the foundation of your new text embedding model. Please select the folder that contains the entire model, not single files (box Base Model). After choosing a model, new boxes appear that allow you to customize the interface (box Interface Setting). It is important to give the model a unique name and label. The difference between Name and Label is that Name is used by the computer while Label is used by users. Thus, Name should not contain spaces or special characters. Label has no such restrictions; think of Label as the title of a book or paper. With Version you can provide a version number if you create a newer version of your model. In case you create a completely new model, we recommend to use “0.0.1”. With Language, it is necessary to choose the language the model is created for, such as English, French, German, etc. On the right side of the box Interface Setting you can set how the interface processes raw text: N Chunks: Sometimes texts are too long. With this value, you can decide into how many chunks longer texts should be divided. The maximum length of every chunk is determined by the value provided for “Maximal Sequence Length”. Maximal Sequence Length: This value determines the maximum number of tokens the model processes for every chunk. N Token Overlap: This value determines how many tokens of the prior chunk are included in the current chunk. The overlap can be useful to provide the correct context for every chunk. Layers for Embeddings - Min: Base models transform raw data into a sequence of numbers using the hidden states of different layers. With this option you can decide which is the first layer to use. Layers for Embeddings - Max: With this option you can decide which is the last layer to use. The hidden states of the layers between min and max are averaged to form the embedding of a text chunk.
Pooling Type: With this option you can decide whether only the hidden states of the cls-token should be used for the embedding. If you set this option to “average”, the hidden states of all tokens are averaged within a layer, except the hidden states of padding tokens. The maximum number of tokens the model can process and provide for downstream tasks can be calculated as \[ MaxTokens = NChunks \times MaximalSequenceLength - (NChunks - 1) \times NOverlap \] If a text is longer, the remaining tokens are ignored and lost for the analysis. Please note that you can create multiple text embedding models with different configurations based on the same base model. In the last step you provide a name and a folder for saving the model (box Creation).","code":""}
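To make the formula concrete, a quick check with illustrative values (4 chunks with a maximal sequence length of 512 tokens and an overlap of 30 tokens):

```r
# Worked example of the maximum-tokens formula; the values are illustrative.
n_chunks    <- 4
max_seq_len <- 512
n_overlap   <- 30

n_chunks * max_seq_len - (n_chunks - 1) * n_overlap
#> [1] 1958
```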
,{"path":"/articles/gui_aife_studio.html","id":"using-a-text-embedding-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.3 Text Embedding Models","what":"3.3.2 Using a Text Embedding Model","title":"02a Using Aifeducation Studio","text":"Using a text embedding model is a central aspect of applying artificial intelligence with aifeducation. The corresponding page can be found by clicking on “Use” in the tab “Language Modeling”. At the start you choose the model you would like to use. Please select the folder that contains the entire model instead of selecting single files. After selecting and loading a model, a new box appears that shows the different aspects of the model you can use. The tab Model Description (Figure 13) provides the documentation of the model. Figure 13: Text Embedding Model - Description (click image to enlarge) The tab Training shows the development of the loss and validation loss during the last training of the corresponding base model. The plot is only displayed if history data is available. The tab Create Text Embeddings (Figure 14) allows you to transform raw texts into a numerical representation of these texts, called text embeddings. These text embeddings can be used for downstream tasks such as classifying texts. In order to transform raw texts into embedded texts, you first select a collection of raw texts. We recommend to create this collection according to section 3.1.1. Next you provide a folder where the embeddings should be stored, together with a name for them. With Batch Size you can determine how many raw texts are processed simultaneously. Please adjust this value to your machine's capacities. By clicking the button “Start Embed”, the transformation of the texts begins. Figure 14: Text Embedding Model - Embeddings (click image to enlarge) The tab Encode/Decode/Tokenize (Figure 15) offers insights into the way the text embedding model processes data. In the box Encode you can insert a raw text, and by clicking Encode you can see how the text is divided into tokens and their corresponding IDs. These IDs are passed to the base model and used to generate the numeric representation of the text. The box Decode allows you to reverse this process. You can insert a sequence of numbers (separated by a comma and spaces), and by clicking Decode, the corresponding tokens and the raw text appear. Figure 15: Text Embedding Model - Encode/Decode/Tokenize (click image to enlarge) Finally, the tab Fill Mask (Figure 16) allows you to request the underlying base model of the text embedding model to calculate a solution for a fill-in-the-blank text. In the box Text you can insert a raw text. A gap is signaled by inserting the corresponding masking token. This token can be found in the table in the row “mask_token”. If you insert a gap/mask_token, please ensure its correct spelling. With “N Solutions per Mask” you can determine how many tokens the model should calculate for every gap/mask_token. After clicking “Calculate Tokens”, you will find an image on the right side of the box, showing the most reasonable tokens for the selected gaps. The tokens are ordered by certainty; from the perspective of the model, the most reasonable tokens are at the top while less reasonable tokens are at the bottom. Figure 16: Text Embedding Model - Fill Mask (click image to enlarge)","code":""},{"path":"/articles/gui_aife_studio.html","id":"documenting-a-text-embedding-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.3 Text Embedding Models","what":"3.3.3 Documenting a Text Embedding Model","title":"02a Using Aifeducation Studio","text":"Creating a “good” AI model requires a lot of effort. Thus, sharing your work with other users is important to support the progress of the discipline, and for this a meaningful documentation is required. In addition, a well written documentation makes your AI model transparent, allowing others to understand how the AI model generated its solutions. This is also important in order to judge the limitations of the model. To support developers in documenting their work, Aifeducation Studio provides an easy way to add a comprehensive description. You find this part of the app by clicking on “Document” in the tab Language Modeling. First, choose the text embedding model you would like to document (not the base model!). After choosing the model, a new box appears, allowing you to insert the necessary information. Via the tabs Developers and Modifiers, you can provide the names and email addresses of the relevant contributors. Developers refers to the people who were involved in creating the model, while Modifiers refers to the people who adapted a pre-trained model to another domain or task. Figure 17: Text Embedding Model - Documentation (click image to enlarge) In the tabs Abstract and Description, you can provide an abstract and a detailed description of your work in English and/or in the native language of your text embedding model (e.g., French, German, etc.), allowing you to reach a broader audience (Figure 17). In all four tabs you can provide your documentation as plain text, html, and/or markdown, allowing you to insert tables and to highlight important parts of your documentation. If you would like to see how your documentation will look on the internet, you can click the button “Preview”. Saving your changes is possible by clicking Save. For more information on how to document your model, please refer to the vignette 03 Sharing and Using Trained AI/Models.","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"create-a-classifier","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.4 Classifiers","what":"3.4.1 Create a Classifier","title":"02a Using Aifeducation Studio","text":"Classifiers are built on top of a text embedding model. To create a classifier, click on “Create and Train” in the tab Classification (see Figure 5). Figure 18 shows the corresponding page. Figure 18: Classifier - Creation Part 1 (click image to enlarge) Creating a classifier requires two kinds of data. First, a text embedding of a collection of texts. These embeddings are created with a text embedding model as described in section 3.3.2. Second, a table with the labels for every text. This kind of data can be created as described in section 3.1.2. You can provide the text embeddings by opening the corresponding file in the first box (Input Data). After selecting the embeddings, you will see a summary of the underlying text embedding model that generated the embeddings. In addition, you can see how many documents are in the file. Please note that a classifier is bound to the text embedding model that generated its embeddings. That is, the classifier can only be used with access to the corresponding text embedding model, which is necessary to transform raw texts into a format the classifier can understand. In the second box (Target Data), you can select the file that contains the corresponding labels. After loading the file, you can select the column of the table you would like to use as target data for training. In addition, you can see a short summary of the absolute frequencies of the single classes/categories. Please note that you can create multiple classifiers with different target data based on the same text embedding model. Thus, you do not need to create a new text embedding model for every new classifier. In particular, you can use the same text embeddings for training different classifiers. In the third box (Architecture) you create the architecture of your neural network (Figure 19). It is important that you provide a name and a label for your model in the section General. The Model Name is used for internal purposes on your machine, while the Model Label is used as the title of your classifier for users. Thus, the Model Name should not contain spaces or special characters. For the Model Label, there are no restrictions. Figure 19: Classifier - Creation Part 2 (click image to enlarge) You can expand the different sections by clicking on “+” on the right side. Since a detailed explanation of every option is beyond the scope of this introduction, we only provide an overview. Positional Embedding: By activating this option, you add a positional embedding to the classifier. This provides the neural network with the ability to take the order within a sequence into account.
Encoding Layers: These layers are similar to the encoding layers used in transformer models, allowing the calculation of context-sensitive text embeddings. Here, they provide the classifier with the ability to take the surrounding text chunks (see section 3.3.1) of a sequence into account. Recurrent Layers: This section allows you to add recurrent layers to the classifier. These layers are able to account for the order within a sequence. In order to add layers, just pass numbers into the input field Recurrent Layers, separated by a comma or space. Every number represents a layer, and the number itself determines the number of neurons. Below the field you can see how Aifeducation Studio interprets your input. This is helpful to avoid an invalid specification of the layers. Dense Layers: In this section you can add dense layers to your network. The process of adding layers is similar to the process for the recurrent layers. Optimizer: Here you can choose between different optimizers for training. The next box (Training Settings) contains the settings for the training of the classifier (Figure 19). Going into detail is beyond the scope of this introduction, so we only provide an overview. Section: General Setting Balance Class Weights: If this option is enabled, the loss is adjusted to the absolute frequencies of the classes/categories according to the ‘Inverse Class Frequency’ method. This option should be activated if you deal with imbalanced data. Number of Folds: The number of folds used for estimating the performance of the classifier. Proportion for Validation Sample: The percentage of cases within each fold that is used as the validation sample. This sample is used to determine the state of the model that generalizes best. Epochs: The maximal number of epochs. During training, the model with the best balanced accuracy is saved and used. Batch Size: The number of cases that are processed simultaneously. Please adjust this value to your machine's capacities. Please note that the batch size can have an impact on the classifier's performance. Section: Baseline Model Calculate Baseline Model: If active, the performance of a baseline model is estimated. This model does not include Balanced Pseudo-Labeling or Balanced Synthetic Cases. Section: Balanced Synthetic Cases Number of Cores: The number of cores that can be used for generating synthetic cases. A higher number can speed up the training process. Method: The method used for generating the cases. Max k: The maximum number of neighbors used for generating synthetic cases. The algorithm will create cases for each k and draw a random sample from them. Proportion for Validation Sample: The percentage of synthetic cases that is added to the validation sample instead of the training sample. Add all Synthetic Cases: If enabled, all synthetic cases are added. If disabled, only the given number of cases is added to the sample in order to ensure a balanced frequency of the classes/categories. Section: Balanced Pseudo-Labeling Add Pseudo Labeling: If activated, pseudo-labeling is used during training. The way pseudo-labeling is applied can be configured with the following parameters: Max Steps: The number of steps for pseudo-labeling. For example, in the first step, 1/Max Steps of the pseudo-labeled cases are added, in the second step 2/Max Steps of the used chunks, etc. Which cases are added can be influenced with Balance Pseudo-Labels, Certainty Anchor, Max Certainty Value, and Min Certainty Value. Balance Pseudo-Labels: If this option is active, the same number of pseudo-labeled cases is added for every class/category. In general, this number is determined by the class with the smallest absolute frequency. Certainty Anchor: This value determines the reference point for choosing the pseudo-labeled cases, where 1 refers to perfect certainty and 0 refers to a certainty similar to random guessing. The cases selected are those closest to this value. Max Certainty Value: Pseudo-labeled cases whose certainty exceeds this value are not included in training. Min Certainty Value: Pseudo-labeled cases whose certainty falls below this value are not included in training. Reset Model after Every Step: If enabled, the classifier is set back to its untrained state before every step. This can prevent over-fitting. Dynamic Weight Increase: If enabled, the sample weights of the pseudo-labeled cases increase with every step. The weights are determined by Start Weights and Weight Increase per Step. Start Weights: The initial value of the sample weights for the included pseudo-labeled cases. Weight Increase per Step: The value determining how much the sample weights of the included pseudo-labeled cases increase with every step.
We recommend to use pseudo-labeling as described by Cascante-Bonilla et al. (2020). Therefore, the following parameters should be set: bpl_max_steps = 5 (splits the unlabeled data into five chunks), bpl_dynamic_inc = TRUE (ensures that the number of used chunks increases with every step), bpl_model_reset = TRUE (re-initializes the model for every step), bpl_epochs_per_step=30 (the number of training epochs within every step), bpl_balance=FALSE (ensures that the cases with the highest certainty are added to the training, regardless of the absolute frequencies of the classes), bpl_weight_inc=0.00 and bpl_weight_start=1.00 (ensures that labeled and unlabeled data have the same weight during training), bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00 (ensures that all unlabeled data is considered for training and that the cases with the highest certainty are used for training). Figure 20: Classifier - Creation Part 3 (click image to enlarge) In the last box (Figure 20), you provide a directory where you would like to save the model. The name of the folder created within this directory can be set with Folder Name. Before you start the training, you can check how many cases can be matched between the text embeddings and the target data by clicking the button Test Data Matching (box Model Saving). This allows you to check whether the structure of your data is working. If everything is okay, you can start the training of the model by clicking Start Training. Please note that the training of a classifier can take several hours.","code":""},{"path":"/articles/gui_aife_studio.html","id":"using-a-classifier","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.4 Classifiers","what":"3.4.2 Using a Classifier","title":"02a Using Aifeducation Studio","text":"In case you have trained a classifier, or if you want to use a classifier trained by other users, you can analyze the model's performance or use the model to classify new texts. For this, select “Use” in the tab Classification. Similar to the other functions of the app, you first select the classifier by providing the folder that contains the entire model. Please note that a classifier is made up of several files. Thus, Aifeducation Studio asks you to select the folder containing all files, not single files. After loading the classifier, a new box appears. In the first tab, Model Description (Figure 21), you find the documentation of the model. Figure 21: Classifier - Description (click image to enlarge) In the second tab, Training (Figure 22), you receive a summary of the training process of the model. This includes a visualization of the loss, accuracy, and balanced accuracy for every fold and every epoch. Depending on the applied training techniques (such as Balanced Pseudo-Labeling), you can request additional images. Figure 22: Classifier - Training (click image to enlarge) The third tab, Reliability (Figure 23), provides information on the quality of the model. Here you find visualizations giving insights into how well the classifier is able to generate reliable results. In addition, measures from content analysis as well as from machine learning allow you to analyze specific aspects of the model's performance. Figure 23: Classifier - Reliability (click image to enlarge) The last tab, Prediction (Figure 24), allows you to apply the trained model to new data. That is, you can use the trained model to assign classes/categories to new texts. For this purpose, you must first provide a file that contains the text embeddings of the documents you would like to classify. You can create these embeddings with the same text embedding model that was used for providing the training data of the classifier. The necessary steps are described in section 3.3.2. Figure 24: Classifier - Prediction (click image to enlarge) The embeddings must be created with the same text embedding model that created the text embeddings for training. If not, an error will occur. See sections 3.4.1 and 3.3.2 for more details. In the next step you provide a folder where you would like to save the predictions, together with a file name. In the default case, the predictions are stored as an .rda file, allowing you to load the data directly into R for further analysis. However, you can additionally save the results as a .csv file, allowing you to export the predictions to other programs. The resulting data table may look like the one shown in Figure 25. Figure 25: Classifier - Prediction Results (click image to enlarge)","code":""}
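If the analysis continues in R, loading the stored predictions is a one-liner; a minimal sketch (the file names are invented, and the name of the object restored from the .rda file depends on what was saved):

```r
# Predictions stored as .rda can be restored directly into the R session ...
load("predictions/my_predictions.rda")

# ... while the optional .csv export can be read with base R.
predictions <- read.csv("predictions/my_predictions.csv")
head(predictions)
```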
,{"path":"/articles/gui_aife_studio.html","id":"documenting-a-classifier","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.4 Classifiers","what":"3.4.3 Documenting a Classifier","title":"02a Using Aifeducation Studio","text":"Documenting a classifier is similar to documenting a text embedding model (section 3.3.3, see Figure 18). To support developers in documenting their work, Aifeducation Studio provides an easy way to add a comprehensive description. You find this part of the app by clicking on “Document” in the tab Classification. First, choose the classifier you would like to document. After choosing the model, a new box appears, allowing you to insert the necessary information. Via the tab Developers, you can provide the names and email addresses of the relevant contributors. In the tabs Abstract and Description, you can provide an abstract and a detailed description of your work in English and/or in the native language of your classifier (e.g., French, German, etc.), allowing you to reach a broader audience. In all four tabs you can provide your documentation as plain text, html, and/or markdown, allowing you to insert tables and to highlight important parts of your documentation. If you would like to see how your documentation will look on the internet, you can click the button “Preview”. Saving your changes is possible by clicking Save. For more information on how to document your model, please refer to the vignette 03 Sharing and Using Trained AI/Models.","code":""},{"path":"/articles/gui_aife_studio.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"02a Using Aifeducation Studio","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://doi.org/10.48550/arXiv.2004.05150 Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education, 1–21. https://doi.org/10.3389/feduc.2022.818365 Campesato, O. (2021). Natural Language Processing Fundamentals for Developers. Mercury Learning & Information. https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=6647713 Cascante-Bonilla, P., Tan, F., Qi, Y., & Ordonez, V. (2020). Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning. https://doi.org/10.48550/arXiv.2001.06001 Chollet, F., Kalinowski, T., & Allaire, J. J. (2022). Deep learning with R (Second edition). Manning Publications Co. https://learning.oreilly.com/library/view/-/9781633439849/?ar Dai, Z., Lai, G., Yang, Y., & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. https://doi.org/10.48550/arXiv.2006.03236 Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423 He, P., Liu, X., Gao, J., & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654 Lane, H., Howard, C., & Hapke, H. M. (2019). Natural language processing in action: Understanding, analyzing, and generating text with Python. Shelter Island: Manning. Larusson, J. A., & White, B. (Eds.). (2014). Learning Analytics: From Research to Practice. New York: Springer. https://doi.org/10.1007/978-1-4614-3305-7 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692 Schreier, M. (2012).
Qualitative Content Analysis in Practice. Los Angeles: SAGE.","code":""},{"path":"/articles/sharing_and_publishing.html","id":"introduction","dir":"Articles","previous_headings":"","what":"1 Introduction","title":"03 Sharing and Using Trained AI/Models","text":"In the educational and social sciences, it is common practice to share research instruments such as questionnaires or tests. For example, the Open Test Archive provides researchers and practitioners access to a large number of open access instruments. aifeducation assumes that AI-based classifiers should be shareable, similarly to research instruments, to empower educational and social science researchers and to support the application of AI for educational purposes. Thus, aifeducation aims to make the sharing process as easy as possible. For this aim, every object generated with aifeducation can be prepared for publication in a few basic steps. In this vignette, we would like to show you how to make your AI ready for publication and how to use models from other persons. We start with a guide for preparing text embedding models.","code":""},{"path":[]},{"path":"/articles/sharing_and_publishing.html","id":"adding-model-descriptions","dir":"Articles","previous_headings":"2 Text Embedding Models","what":"2.1 Adding Model Descriptions","title":"03 Sharing and Using Trained AI/Models","text":"Every object of class TextEmbeddingModel comes with several methods allowing you to provide important information for potential users of your model. First, every model needs a clear description of how it was developed, how it was modified, and how it can be used. You can add such a description via the method set_model_description. This method allows you to provide a description in English and in the native language of your model to make the distribution of your model easier. You can write the description in HTML, which allows you to add links to other sources or publications, to add tables, and to highlight important aspects of your model. We would like to recommend to write at least an English description to allow a wider community to recognize your work. Furthermore, the description should include: what kind of data was used to create the model, how much data was used to create the model, which steps were performed and which methods were used, and for what kinds of tasks or materials the model can be used. With abstract_eng and abstract_native you can provide a summary of the description. This is important if you would like to share your work in a repository. With keywords_eng and keywords_native you can set a vector of keywords that help others to find your work with search engines. We would like to recommend to provide this information at least in English. You can access a model's description with the method get_model_description. Besides the description of your work, it is necessary to provide information about the people who were involved in creating the model. This can be done with the method set_publication_info. First you have to decide which type of information you would like to add. There are two choices, “developer” and “modifier”, set via type. type=\"developer\" stores all information about the people who were involved in the process of developing the model. If you use a transformer model from Hugging Face, the people who provide the description of that model should be entered as the developers. In all other cases you can use this type for providing a description of how you developed the model. In some cases you might wish to modify an existing model. This might be the case if you use a transformer model and adapt it to a specific domain or task. In this case you rely on the work of other people and modify their work. You can describe these modifications by setting type=\"modifier\". For every type of person you can add the relevant individuals via authors. Please use R's function personList() for this. With citation you can provide a free text describing how to cite the work of the different persons. With url you can provide a link to relevant sites of the model. You can access this information with get_publication_info. Finally, you must provide a license for using the model. This can be done with set_software_license and get_software_license. Please note that, in most cases, the license for the model must be “GPL-3” since the software used to create the model is licensed under “GPL-3”; thus, derivative work must also be licensed under “GPL-3”. The documentation of your work is not part of the software, so you can set other licenses for the documentation, such as a Creative Commons (CC) license or the Free Documentation License (FDL). You can set the license of the documentation with the method set_documentation_license. Now you are able to share your work.
Please remember to save the now fully described object as described in the following section 2.2.","code":"example_model$set_model_description( eng=NULL, native=NULL, abstract_eng=NULL, abstract_native=NULL, keywords_eng=NULL, keywords_native=NULL) example_model$get_model_description() example_model$set_publication_info( type, authors, citation, url=NULL) example_model$get_publication_info() example_model$set_software_license(\"GPL-3\") example_model$set_documentation_license(\"CC BY-SA\")"}
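As a concrete illustration of set_publication_info, here is a minimal sketch; the person, citation, and URL are placeholders, and authors is built with R's personList() as recommended above:

```r
# Hypothetical developer information for a text embedding model.
example_model$set_publication_info(
  type = "developer",
  authors = personList(
    person(given = "Jane", family = "Doe", email = "jane.doe@example.org")),
  citation = "Doe, J. (2024). Example BERT for educational texts.",
  url = "https://example.org/example_bert")
```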
,{"path":"/articles/sharing_and_publishing.html","id":"saving-and-loading","dir":"Articles","previous_headings":"2 Text Embedding Models","what":"2.2 Saving and Loading","title":"03 Sharing and Using Trained AI/Models","text":"Saving a created text embedding model is easy with the function save_ai_model. This function provides a unique interface for all text embedding models. For saving your work, pass your model to model and the directory where to save the model to model_dir. Please pass the path of a directory, not the path of a file, to the function. Internally, the function creates a new folder in that directory in which all files belonging to the model are stored. As you can see, the three example text embedding models are saved within a directory named “text_embedding_models”. Within this directory the function creates a unique folder for every model. The name of this folder is specified with dir_name. If you set dir_name=NULL and append_ID=FALSE, the name of the folder is created from the model's name. If you change the argument to append_ID=TRUE and set dir_name=NULL, the unique ID of the model is appended to the directory name. If you want to load a model, just call the function load_ai_model and you can continue using the model. The following code assumes that you specified dir_name manually. In the case that you set dir_name=NULL and append_ID=TRUE, loading the models may look as shown in the second code snippet below. Please note that you have to add the name of the model to the directory path. In our example we stored three models in the directory “text_embedding_models”. Every model is saved within its own folder whose name was created automatically with the help of the model's name. Thus, for loading a model you must specify which model you want to load by adding the model's name to the directory path as shown below. At this point you may wonder why there is an ID in the model's name although you did not enter an ID during the model's creation. The ID is added automatically to ensure that every model has a unique name. This is important if you would like to share your work with other persons, and during saving the ID is appended automatically if you set append_ID=TRUE. Now you are ready to share your work. Just provide all files within the model folder. For our BERT model example this is the folder \"text_embedding_models/model_transformer_bert\" or \"text_embedding_models/bert_embedding_ID_CmyAQKtts5RdlLaS\", depending on how you saved the model.","code":"save_ai_model( model=topic_modeling, model_dir=\"text_embedding_models\", append_ID=FALSE) save_ai_model( model=global_vector_clusters_modeling, model_dir=\"text_embedding_models\", append_ID=FALSE) save_ai_model( model=bert_modeling, model_dir=\"text_embedding_models\", append_ID=FALSE) topic_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_topic_modeling\", ml_framework=aifeducation_config$get_framework()) global_vector_clusters_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_global_vectors\", ml_framework=aifeducation_config$get_framework()) bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_transformer_bert\", ml_framework=aifeducation_config$get_framework()) topic_modeling<-load_ai_model( model_dir=\"text_embedding_models/topic_model_embedding_ID_DfO25E1Guuaqw7tM\") global_vector_clusters_modeling<-load_ai_model( model_dir=\"text_embedding_models/global_vector_clusters_embedding_ID_5Tu8HFHegIuoW14l\") bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/bert_embedding_ID_CmyAQKtts5RdlLaS\")"},{"path":[]},{"path":"/articles/sharing_and_publishing.html","id":"adding-model-descriptions-1","dir":"Articles","previous_headings":"3 Classifiers","what":"3.1 Adding Model Descriptions","title":"03 Sharing and Using Trained AI/Models","text":"Adding a model description to a classifier is similar to TextEmbeddingModels. With the methods set_model_description and get_model_description you can provide a detailed description (parameters eng and native) of your classifier in English and in the native language of the classifier. With abstract_eng and abstract_native you can provide a corresponding abstract of these descriptions, while keywords_eng and keywords_native take a vector of corresponding keywords. In the case of classifiers, the description should include: a short reference to the theoretical models that guided the development, a clear and detailed description of every single category/class, a short statement of where the classifier can be used, a description of the kind and quantity of data used for training, information on potential bias in the data, and, if possible, information about the inter-coder-reliability of the coding process providing the data and a link to the corresponding text embedding model. Again, you can provide the description in HTML and include tables (e.g. for reporting the reliability of the initial coding process) or links to other sources or publications. Please also report the performance values of your classifier in the description. They can be accessed directly via example_classifier$reliability$test_metric_mean. With the methods set_publication_info and get_publication_info you can provide bibliographic information on your classifier. In contrast to TextEmbeddingModels, there are no different types of author groups. Finally, you can manage the license for using your classifier via set_software_license and get_software_license. Similarly to TextEmbeddingModels, a classifier has to be licensed via “GPL-3” since the software used for creating a classifier applies this license. For the documentation you can choose another license since the documentation is not part of the software. For setting and receiving this license you can call the methods set_documentation_license and get_documentation_license. Now you are ready to share your classifier. Please remember to save the changes as described in the following section 3.2.","code":"example_classifier$set_model_description( eng=\"This classifier targets the realization of the need for competence from the self-determination theory of motivation by Deci and Ryan in lesson plans and materials. It describes a learner’s need to perceive themselves as capable. In this classifier, the need for competence can take on the values 0 to 2.
Krippendorff’s Alpha and Dynamic Iota Index) of the individual categories were calculated at different points in time.\", abstract_native=\"Dieser Classifier bewertet Unterrichtsentwürfe und Lernmaterial danach, ob sie das Bedürfnis nach Kompetenzerleben aus der Selbstbestimmungstheorie der Motivation nach Deci & Ryan unterstützen. Das Kompetenzerleben stellt das Bedürfnis dar, sich als wirksam zu erleben. Der Classifier unterteilt es in drei Stufen und wurde anhand von 790 Unterrichtsentwürfen, 298 Materialien und bis zu 1400 Schulbuchaufgaben entwickelt. Es wurden stets Kodierschulungen durchgeführt und die Inter-Coder-Reliabilitäten der einzelnen Kategorien zu verschiedenen Zeitpunkten berechnet.\", keywords_eng=c(\"Self-determination theory\", \"motivation\", \"lesson planning\", \"business didactics\"), keywords_native=c(\"Selbstbestimmungstheorie\", \"Motivation\", \"Unterrichtsplanung\", \"Wirtschaftsdidaktik\")) example_classifier$set_publication_info( authors, citation, url=NULL) example_classifier$set_software_license(\"GPL-3\") example_classifier$set_documentation_license(\"CC BY-SA\")"},{"path":"/articles/sharing_and_publishing.html","id":"saving-and-loading-1","dir":"Articles","previous_headings":"3 Classifiers","what":"3.2 Saving and Loading","title":"03 Sharing and Using Trained AI/Models","text":"After you have created a classifier, saving and loading is easy due to the functions save_ai_model and load_ai_model. The process for saving a model is similar to the process for text embedding models: you pass the model and a directory path to the function save_ai_model. The folder name is set with dir_name. In contrast to text embedding models, you can specify the additional argument save_format. In the case of pytorch models this argument allows you to choose save_format = \"safetensors\" or save_format = \"pt\". We recommend to choose save_format = \"safetensors\", since this is the safer method for saving models. In the case of tensorflow models this argument allows you to choose save_format = \"keras\", save_format = \"tf\", or save_format = \"h5\". We recommend to choose save_format = \"keras\", since this is the recommended format for keras. If you set save_format = \"default\", .safetensors is used for pytorch models and .keras is used for tensorflow models. If you would like to load a model, you can call the function load_ai_model. Note: Classifiers depend on the framework which was used during their creation. Thus, a classifier is always initialized with its original framework and the argument ml_framework has no effect. In case you would like to share your classifier with a broader audience, we recommend to set dir_name=NULL and append_ID = TRUE. This creates the folder name automatically by using the classifier’s name and its unique ID. Similar to text embedding models, the ID is added to the name during the creation of the classifier, ensuring a unique name for the model. With these options the folder name may look like \"movie_review_classifier_ID_oWsaNEB7b09A1pPB\". If you would like to share your classifier with other persons, provide all files within the folder \"classifiers/movie_review_classifier_ID_oWsaNEB7b09A1pPB\". Since the files are stored with a specific structure, do not change or edit the files manually. Please note that you need the TextEmbeddingModel that was used during training in order to predict new data with the classifier. You can request the name, label, and configuration of this model via example_classifier$get_text_embedding_model()$model. Thus, if you would like to share your classifier, ensure that you also share the corresponding text embedding model. If you would like to apply your classifier to new data, two steps are necessary (a sketch follows below). First, you must transform the raw text into a numerical expression by using exactly the same text embedding model that was used to train the classifier. The resulting object can then be passed to the method predict, and you receive the predictions together with an estimate of certainty for each class/category. More information can be found in the vignettes 02a Using Aifeducation Studio and 02b Classification Tasks.","code":"save_ai_model( model=classifier, model_dir=\"classifiers\", dir_name=\"movie_review_classifier\", save_format = \"default\", append_ID = FALSE) classifier<-load_ai_model( model_dir=\"classifiers/movie_review_classifier\")"},
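Sketched in code, the two prediction steps look roughly as follows. The method names embed() and predict() with these argument names are assumptions based on the workflow described above, not verbatim from this vignette; check the package reference for the exact signatures.

```r
# Step 1: transform the raw texts with the SAME text embedding model
# that was used for training the classifier (method name assumed).
new_embeddings <- bert_modeling$embed(raw_text = new_texts)

# Step 2: pass the embeddings to the classifier's predict method
# (argument name assumed) to receive predictions with certainty estimates.
predictions <- example_classifier$predict(newdata = new_embeddings)
```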
{"path":"/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Berding Florian. Author, maintainer. Pargmann Julia. Contributor. Riebenbauer Elisabeth. Contributor. Rebmann Karin. Contributor. Slopinski Andreas. Contributor.","code":""},{"path":"/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Florian Berding, Julia Pargmann, Elisabeth Riebenbauer, Karin Rebmann, Andreas Slopinski (2023). AI for Education (aifeducation). An R package for educators and researchers of the educational and social sciences. URL=https://fberding.github.io/aifeducation/index.html","code":"@Manual{, title = {AI for Education (aifeducation). An R package for educators and researchers of the educational and social sciences.}, author = {Florian Berding and Julia Pargmann and Elisabeth Riebenbauer and Karin Rebmann and Andreas Slopinski}, year = {2023}, url = {https://fberding.github.io/aifeducation/index.html}, }"},{"path":"/index.html","id":"aifeducation","dir":"","previous_headings":"","what":"Artificial Intelligence for Education","title":"Artificial Intelligence for Education","text":"The R package Artificial Intelligence for Education (aifeducation) is designed for the special requirements of educators, educational researchers, and social researchers. The target audience of this package are educators and researchers with and without coding skills who would like to develop their own models, as well as people who would like to use models created by other researchers/educators. The package supports the application of Artificial Intelligence (AI) for Natural Language Processing tasks such as text embedding and classification under the special conditions of the educational and social sciences.","code":""},{"path":"/index.html","id":"features-overview","dir":"","previous_headings":"","what":"Features Overview","title":"Artificial Intelligence for Education","text":"Simple usage of artificial intelligence by providing routines for the most important tasks for educators and researchers in the social and educational sciences. Provides a graphical user interface (Aifeducation Studio), allowing people to work with AI without coding skills. Supports the 'PyTorch' and 'Tensorflow' machine learning frameworks. Implements the advantages of the python library 'datasets', increasing computational speed and allowing the use of large datasets. Uses safetensors for saving models in 'PyTorch'. Supports the usage of trained models with both frameworks, providing a high level of flexibility. Supports pre-trained language models from Hugging Face. Supports BERT, RoBERTa, DeBERTa, Longformer, and Funnel Transformer for creating context-sensitive text embeddings. Makes sharing pre-trained models easy. Integrates sustainability tracking. Integrates special statistical techniques for dealing with data structures common in the social and educational sciences. Supports the classification of long text documents. Currently, the package focuses on classification tasks, which can either be used to diagnose characteristics of learners from written material or to estimate the properties of learning and teaching material. In the future, more tasks will be implemented.","code":""},
{"path":"/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Artificial Intelligence for Education","text":"You can install the latest stable version of the package from CRAN with: You can install the development version of aifeducation from GitHub with: The minimal version includes the functions with only a limited use of transformers. The full version additionally includes Aifeducation Studio (the graphical user interface) and the older approaches (GlobalVectors, Topic Modeling). Further instructions for installation can be found in the vignette 01 Get Started. Please note that an update of the version of aifeducation may require an update of the python libraries. Refer to 01 Get Started for details.","code":"#Minimal version install.packages(\"aifeducation\") #Full version install.packages(\"aifeducation\",dependencies=TRUE) #Minimal version install.packages(\"devtools\") devtools::install_github(repo=\"FBerding/aifeducation\", ref=\"master\", dependencies = \"Imports\") #Full version install.packages(\"devtools\") devtools::install_github(repo=\"FBerding/aifeducation\", ref=\"master\", dependencies = TRUE)"},{"path":"/index.html","id":"graphical-user-interface-aifeducation-studio","dir":"","previous_headings":"","what":"Graphical User Interface Aifeducation Studio","title":"Artificial Intelligence for Education","text":"The package ships with a shiny app that serves as a graphical user interface. Figure 1: Aifeducation Studio. Aifeducation Studio allows users to easily develop, train, apply, document, and analyse AI models without coding skills. See the corresponding vignette for details: 02a Using Aifeducation Studio.","code":""},
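The user interface is launched from within R. A minimal sketch; the launcher name start_aifeducation_studio() is an assumption here, as the function itself is not part of this patch excerpt:

```r
library(aifeducation)

# Launch the graphical user interface in the default browser
# (function name assumed; not shown in this patch excerpt).
start_aifeducation_studio()
```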
{"path":"/index.html","id":"sustainability","dir":"","previous_headings":"","what":"Sustainability","title":"Artificial Intelligence for Education","text":"Training AI models consumes time and energy. To help researchers estimate the ecological impact of their work, a sustainability tracker is implemented. It is based on the python library 'codecarbon' by Courty et al. (2023). This tracker allows to estimate the energy consumption of CPUs, GPUs, and RAM during training and derives a value for the CO2 emission. This value is based on the energy mix of the country where the computer is located.","code":""},{"path":"/index.html","id":"pytorch-and-tensorflow-compatibility","dir":"","previous_headings":"","what":"PyTorch and Tensorflow Compatibility","title":"Artificial Intelligence for Education","text":"The package allows all supported models to be based either on 'PyTorch' or 'tensorflow', thus providing a high level of flexibility. Even pre-trained models can be used with both frameworks in most cases. The following table provides details: Table: Framework compatibility. Please note that tensorflow is currently supported only in the following versions: 2.13-2.15.","code":""},{"path":[]},{"path":"/index.html","id":"transforming-texts-into-numbers","dir":"","previous_headings":"Classification Tasks","what":"Transforming Texts into Numbers","title":"Artificial Intelligence for Education","text":"Classification tasks require the transformation of raw texts into a representation with numbers. For this step, aifeducation supports both newer approaches such as BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), DeBERTa version 2 (He et al. 2020), Funnel-Transformer (Dai et al. 2020), and Longformer (Beltagy, Peters & Cohan 2020), and older approaches such as GlobalVectors (Pennington, Socher & Manning 2014) or Latent Dirichlet Allocation/Topic Modeling for classification tasks. aifeducation supports the use of pre-trained transformer models provided by Hugging Face and the creation of new transformers, allowing educators and researchers to develop specialized and domain-specific models. The package also supports the analysis of long texts. Depending on the method, long texts are transformed into vectors at once or, if they are too long, are split into several chunks, which results in a sequence of vectors.","code":""},
{"path":"/index.html","id":"training-ai-under-challenging-conditions","dir":"","previous_headings":"Classification Tasks","what":"Training AI under Challenging Conditions","title":"Artificial Intelligence for Education","text":"In the second step within a classification task, aifeducation integrates important statistical and mathematical methods for dealing with the main challenges of the educational and social sciences when applying AI. These are: digital data availability: In the educational and social sciences, data is often available only in handwritten form. For example, in schools and universities, students often solve tasks by creating handwritten documents. Thus, educators and researchers first have to transform the analogue data into a digital form, involving human action. This makes data generation financially expensive and time-consuming, leading to small data sets. high privacy policy standards: Furthermore, in the educational and social sciences, data often refers to humans and/or their actions. These kinds of data are protected by the privacy policies of many countries, limiting access to and usage of data, which also results in small data sets. long research tradition: The educational and social sciences have a long research tradition in generating insights into social phenomena as well as into learning and teaching. These insights have to be incorporated into applications of AI (e.g., Luan et al. 2020; Wong et al. 2019). This makes supervised machine learning an important technology, since it provides a link between educational and social theories and models on the one hand and machine learning on the other hand (Berding et al. 2022). However, this kind of machine learning requires humans to generate a valid data set for the training process, leading to small data sets. complex constructs: Compared to classification tasks where, for instance, AI has to differentiate between a ‘good’ and a ‘bad’ movie review, the constructs of the educational and social sciences are more complex. For example, some research instruments of motivational psychology require to infer personal motifs from written essays (e.g., Gruber & Kreuzpointner 2013). A reliable and valid interpretation of this kind of information requires well qualified human raters, making data generation expensive. This also limits the size of a data set. imbalanced data: Finally, data in the educational and social sciences often occurs in an imbalanced pattern as several empirical studies show (Bloemen 2011; Stütz et al. 2022). Imbalanced means that some categories or characteristics of a data set have very high absolute frequencies compared to other categories and characteristics. Imbalance during AI training guides algorithms to focus and prioritize the categories and characteristics with high absolute frequencies, increasing the risk to miss categories/characteristics with low frequencies (Haixiang et al. 2017). This can lead AI to prefer special groups of people/material, imply false recommendations and conclusions, or to miss rare categories or characteristics. In order to deal with the problem of imbalanced data sets, the package integrates the Synthetic Minority Oversampling Technique into the learning process. Currently, the Basic Synthetic Minority Oversampling Technique (Chawla et al. 2002), Density-Based Synthetic Minority Oversampling Technique (Bunkhumpornpat, Sinapiromsaran & Lursinsap 2012), and Adaptive Synthetic Sampling Approach for Imbalanced Learning (He, Garcia & Li 2008) are implemented via the R package smotefamily. In order to address the problem of small data sets, the training loops of the AI integrate pseudo-labeling (e.g., Lee 2013). Pseudo-labeling is a technique which can be used for supervised learning. More specifically, educators and researchers rate a part of a data set and train the AI with this very part. The remainder of the data is not processed by humans. Instead, the AI uses this part of the data to learn on its own. Thus, educators and researchers only have to provide additional data for the AI’s learning process without coding it themselves. This offers the possibility to add more data to the training process and to reduce labor costs.","code":""},{"path":"/index.html","id":"evaluating-performance","dir":"","previous_headings":"Classification Tasks","what":"Evaluating Performance","title":"Artificial Intelligence for Education","text":"Classification tasks in machine learning are comparable to the empirical method of content analysis from the social sciences. This method looks back on a long research tradition and an ongoing discussion on how to evaluate the reliability and validity of generated data. In order to provide a link to this research tradition and to provide educators as well as educational and social researchers with performance measures they are familiar with, every AI trained with this package is evaluated with the following measures and concepts: Iota Concept of the Second Generation (Berding & Pargmann 2022), Krippendorff’s Alpha (Krippendorff 2019), Percentage Agreement, Gwet’s AC1/AC2 (Gwet 2014), Kendall’s coefficient of concordance W, Cohen’s Kappa unweighted, Cohen’s Kappa with equal weights, Cohen’s Kappa with squared weights, Fleiss’ Kappa for multiple raters without exact estimation. In addition, some traditional measures from the machine learning literature are also available: Precision, Recall, F1-Score","code":""},
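For the traditional machine learning measures, the package's helper calc_standard_classification_measures() (documented in the reference part of this file) can be applied to a pair of factors; a minimal sketch:

```r
# Two factors with the true and the predicted categories.
true_values      <- factor(c("low", "high", "low", "high", "low"))
predicted_values <- factor(c("low", "high", "high", "high", "low"))

# Returns a matrix with the categories in the rows and
# precision, recall, and f1 in the columns.
calc_standard_classification_measures(
  true_values = true_values,
  predicted_values = predicted_values)
```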
{"path":"/index.html","id":"sharing-trained-ai","dir":"","previous_headings":"","what":"Sharing Trained AI","title":"Artificial Intelligence for Education","text":"Since the package is based on keras, tensorflow, and the transformer libraries, every trained AI can be shared with other educators and researchers. The package supports an easy use of pre-trained AI within R, but also provides the possibility to export trained AI to other environments. Using a pre-trained AI for classification only requires the classifier and the corresponding text embedding model. Use Aifeducation Studio or just load both into R and start predictions. Vignette 02a Using Aifeducation Studio describes how to use the user interface. Vignette 02b Classification Tasks describes how to save and load the objects with R syntax. In vignette 03 Sharing and Using Trained AI/Models you can find a detailed guide on how to document and share your models.","code":""},{"path":"/index.html","id":"tutorial-and-guides","dir":"","previous_headings":"","what":"Tutorial and Guides","title":"Artificial Intelligence for Education","text":"Installation and configuration of the package: 01 Get Started. Introduction to the graphical user interface Aifeducation Studio: 02a Using Aifeducation Studio. A short introduction to the package with examples for classification tasks: 02b Classification Tasks. A description of sharing models: 03 Sharing and Using Trained AI/Models","code":""},{"path":"/index.html","id":"references","dir":"","previous_headings":"","what":"References","title":"Artificial Intelligence for Education","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://doi.org/10.48550/arXiv.2004.05150 Berding, F., & Pargmann, J. (2022). Iota Reliability Concept of the Second Generation. Berlin: Logos. https://doi.org/10.30819/5581 Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education, 1-21. https://doi.org/10.3389/feduc.2022.818365 Bloemen, A. (2011). Lernaufgaben in Schulbüchern der Wirtschaftslehre: Analyse, Konstruktion und Evaluation von Lernaufgaben für die Lernfelder industrieller Geschäftsprozesse. Hampp. Bunkhumpornpat, C., Sinapiromsaran, K., & Lursinsap, C. (2012). DBSMOTE: Density-Based Synthetic Minority Over-sampling Technique. Applied Intelligence, 36(3), 664–684. https://doi.org/10.1007/s10489-011-0287-y Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique.
Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953 Courty, B., Schmidt, V., Goyal-Kamal, Coutarel, M., Feld, B., Lecourt, J., & … (2023). mlco2/codecarbon: v2.2.7. https://doi.org/10.5281/zenodo.8181237 Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. https://doi.org/10.48550/arXiv.2006.03236 Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423 Gruber, N., & Kreuzpointner, L. (2013). Measuring the reliability of picture story exercises like the TAT. PloS One, 8(11), e79450. https://doi.org/10.1371/journal.pone.0079450 Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (Fourth edition). STATAXIS. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220–239. https://doi.org/10.1016/j.eswa.2016.12.035 He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (pp. 1322–1328). IEEE. https://doi.org/10.1109/IJCNN.2008.4633969 He, P., Liu, X., Gao, J. & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654 Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th Ed.). SAGE. Lee, D.‑H. (2013). Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. ICML 2013 Workshop: Challenges in Representation Learning. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692 Luan, H., Geczy, P., Lai, H., Gobert, J., Yang, S. J. H., Ogata, H., Baltes, J., Guerra, R., Li, P., & Tsai, C.‑C. (2020). Challenges and Future Directions of Big Data and Artificial Intelligence in Education. Frontiers in Psychology, 11, 1–11. https://doi.org/10.3389/fpsyg.2020.580820 Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.org/D14-1162.pdf Stütz, S., Berding, F., Reincke, S., & Scheper, L. (2022). Characteristics of learning tasks in accounting textbooks: An AI assisted analysis. Empirical Research in Vocational Education and Training, 14(1). https://doi.org/10.1186/s40461-022-00138-2 Wong, J., Baars, M., Koning, B. B. de, van der Zee, T., Davis, D., Khalil, M., Houben, G.‑J., & Paas, F. (2019). Educational Theories and Learning Analytics: From Data to Knowledge. In D. Ifenthaler, D.-K. Mah, & J. Y.-K. Yau (Eds.), Utilizing Learning Analytics to Support Study Success (pp. 3–25). Springer. https://doi.org/10.1007/978-3-319-64792-0_1","code":""},{"path":"/reference/AifeducationConfiguration.html","id":null,"dir":"Reference","previous_headings":"","what":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"R6 class for setting the global machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"This function has nothing to return. It is used for its side effects.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"R6 class for setting the global machine learning framework to 'PyTorch' or 'tensorflow'.","code":""},{"path":[]},{"path":[]},{"path":"/reference/AifeducationConfiguration.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"AifeducationConfiguration$get_framework() AifeducationConfiguration$set_global_ml_backend() AifeducationConfiguration$global_framework_set() AifeducationConfiguration$clone()","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-get-framework-","dir":"Reference","previous_headings":"","what":"Method get_framework()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Method for requesting the used machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$get_framework()"},{"path":"/reference/AifeducationConfiguration.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Returns a string containing the used machine learning framework for TextEmbeddingModels as well as for TextEmbeddingClassifierNeuralNet.","code":""},
— AifeducationConfiguration","text":"R6 class settting global machine learning framework. R6 class settting global machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"function nothing return. used side effects.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"R6 class setting global machine learning framework 'PyTorch' 'tensorflow'.","code":""},{"path":[]},{"path":[]},{"path":"/reference/AifeducationConfiguration.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"AifeducationConfiguration$get_framework() AifeducationConfiguration$set_global_ml_backend() AifeducationConfiguration$global_framework_set() AifeducationConfiguration$clone()","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-get-framework-","dir":"Reference","previous_headings":"","what":"Method get_framework()","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"Method requesting used machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$get_framework()"},{"path":"/reference/AifeducationConfiguration.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"Returns string containing used machine learning framework TextEmbeddingModels well TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-set-global-ml-backend-","dir":"Reference","previous_headings":"","what":"Method set_global_ml_backend()","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"Method setting global machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$set_global_ml_backend(backend)"},{"path":"/reference/AifeducationConfiguration.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"backend string Framework use training inference. backend=\"tensorflow\" 'tensorflow' backend=\"pytorch\" 'PyTorch'.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for settting the global machine learning framework. — AifeducationConfiguration","text":"method nothing return. 
{"path":"/reference/AifeducationConfiguration.html","id":"method-global-framework-set-","dir":"Reference","previous_headings":"","what":"Method global_framework_set()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Method for checking if the global ml framework is set.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$global_framework_set()"},{"path":"/reference/AifeducationConfiguration.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Returns TRUE if the global machine learning framework is set. Otherwise FALSE.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"The objects of this class are cloneable with this method.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$clone(deep = FALSE)"},{"path":"/reference/AifeducationConfiguration.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"deep Whether to make a deep clone.","code":""},
— AifeducationConfiguration","text":"","code":"library(aifeducation) #Example for setting the global machine learning framework #aifeducation_config is the object created during loading the package #For using PyTorch aifeducation_config$set_global_ml_backend(\"pytorch\") #> Global Backend set to: pytorch #For using Tensorflow aifeducation_config$set_global_ml_backend(\"pytorch\") #> Global Backend set to: pytorch #Example for requesting the global machine learning framework aifeducation_config$get_framework() #> $global_ml_framework #> [1] \"pytorch\" #> #> $TextEmbeddingFramework #> [1] \"pytorch\" #> #> $ClassifierFramework #> [1] \"pytorch\" #> #Example for checking if the global macheine learning framework is set aifeducation_config$global_framework_set() #> [1] TRUE"},{"path":"/reference/aifeducation_config.html","id":null,"dir":"Reference","previous_headings":"","what":"R6 object of class AifeducationConfiguration — aifeducation_config","title":"R6 object of class AifeducationConfiguration — aifeducation_config","text":"Object managing setting machine learning framework session.","code":""},{"path":"/reference/aifeducation_config.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 object of class AifeducationConfiguration — aifeducation_config","text":"","code":"aifeducation_config"},{"path":"/reference/aifeducation_config.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"R6 object of class AifeducationConfiguration — aifeducation_config","text":"object class aifeducationConfiguration (inherits R6) length 5.","code":""},{"path":[]},{"path":"/reference/array_to_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Array to matrix — array_to_matrix","title":"Array to matrix — array_to_matrix","text":"Function transforming array matrix.","code":""},{"path":"/reference/array_to_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Array to matrix — array_to_matrix","text":"","code":"array_to_matrix(text_embedding)"},{"path":"/reference/array_to_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Array to matrix — array_to_matrix","text":"text_embedding array containing text embedding. array created via object class TextEmbeddingModel.","code":""},{"path":"/reference/array_to_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Array to matrix — array_to_matrix","text":"Returns matrix contains cases rows columns represent features sequences. 
{"path":"/reference/bow_pp_create_basic_text_rep.html","id":null,"dir":"Reference","previous_headings":"","what":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"This function prepares raw texts for use with a TextEmbeddingModel.","code":""},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"","code":"bow_pp_create_basic_text_rep( data, vocab_draft, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, remove_separators = TRUE, split_hyphens = FALSE, split_tags = FALSE, language_stopwords = \"de\", use_lemmata = FALSE, to_lower = FALSE, min_termfreq = NULL, min_docfreq = NULL, max_docfreq = NULL, window = 5, weights = 1/(1:5), trace = TRUE )"},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"data vector containing the raw texts. vocab_draft Object created with bow_pp_create_vocab_draft. remove_punct bool TRUE if punctuation should be removed. remove_symbols bool TRUE if symbols should be removed. remove_numbers bool TRUE if numbers should be removed. remove_url bool TRUE if urls should be removed. remove_separators bool TRUE if separators should be removed. split_hyphens bool TRUE if hyphens should be split into several tokens. split_tags bool TRUE if tags should be split. language_stopwords string Abbreviation of the language whose stopwords should be removed. use_lemmata bool TRUE if lemmas instead of the original tokens should be used. to_lower bool TRUE if tokens or lemmas should be used in lower case. min_termfreq int Minimum frequency of a token to be part of the vocabulary. min_docfreq int Minimum appearance of a token in documents to be part of the vocabulary. max_docfreq int Maximum appearance of a token in documents to be part of the vocabulary. window int Size of the window for creating the feature-co-occurrence matrix. weights vector Weights for the corresponding window. The length of the vector must be equal to the window size. trace bool TRUE if information on the progress should be printed to the console.","code":""},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"Returns a list of class basic_text_rep with the following components. dfm: Document-Feature-Matrix. The rows correspond to the documents. The columns represent the number of tokens in each document. fcm: Feature-Co-Occurrence-Matrix. information: A list containing information on the used vocabulary. These are: n_sentence: Number of sentences, n_document_segments: Number of document segments/raw texts, n_token_init: Number of initial tokens, n_token_final: Number of final tokens, n_lemmata: Number of lemmas. configuration: A list containing information on whether the vocabulary was created with lower cases and whether the vocabulary uses the original tokens or lemmas. language_model: A list containing information on the applied language model. These are: model: the udpipe language model, label: the label of the udpipe language model, upos: the applied universal part-of-speech tags, language: the language, vocab: a data.frame with the original vocabulary.","code":""},{"path":[]},
{"path":"/reference/bow_pp_create_vocab_draft.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"Function for creating a first draft of a vocabulary. This function creates a list of tokens which refer to specific universal part-of-speech tags (UPOS) and provides the corresponding lemmas.","code":""},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"","code":"bow_pp_create_vocab_draft( path_language_model, data, upos = c(\"NOUN\", \"ADJ\", \"VERB\"), label_language_model = NULL, language = NULL, chunk_size = 100, trace = TRUE )"},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"path_language_model string Path to the udpipe language model that should be used for tagging and lemmatization. data vector containing the raw texts. upos vector containing the universal part-of-speech tags which should be used to build the vocabulary. label_language_model string Label for the used udpipe language model. language string Name of the language (e.g., English, German). chunk_size int Number of raw texts which should be processed at once. trace bool TRUE if information on the progress should be printed to the console.","code":""},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"A list with the following components. vocab: A data.frame containing the tokens, lemmas, tokens in lower case, and lemmas in lower case. ud_language_model: The udpipe language model that is used for tagging. label_language_model: Label of the udpipe language model. language: Language of the raw texts. upos: The used universal part-of-speech tags. n_sentence: int Estimated number of sentences in the raw texts. n_token: int Estimated number of tokens in the raw texts. n_document_segments: int Estimated number of document segments/raw texts.","code":""},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"A list of possible tags can be found at: https://universaldependencies.org/u/pos/index.html. A huge number of models can be found at: https://ufal.mff.cuni.cz/udpipe/2/models.","code":""},{"path":[]},
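The two bag-of-words functions documented here form a two-step pipeline: draft the vocabulary first, then build the basic text representation from it. A minimal sketch, in which the udpipe model path, its label, and example_texts are placeholders:

```r
# Step 1: draft the vocabulary with a udpipe language model
# (the file path and label are placeholders).
draft <- bow_pp_create_vocab_draft(
  path_language_model = "udpipe_models/english-ewt.udpipe",
  data = example_texts,                 # assumed character vector of raw texts
  upos = c("NOUN", "ADJ", "VERB"),
  label_language_model = "english-ewt",
  language = "english")

# Step 2: create the basic text representation from the draft.
basic_rep <- bow_pp_create_basic_text_rep(
  data = example_texts,
  vocab_draft = draft,
  language_stopwords = "en",
  window = 5,
  weights = 1/(1:5))
```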
{"path":"/reference/calc_standard_classification_measures.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate standard classification measures — calc_standard_classification_measures","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"Function for calculating recall, precision, and f1.","code":""},{"path":"/reference/calc_standard_classification_measures.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"","code":"calc_standard_classification_measures(true_values, predicted_values)"},{"path":"/reference/calc_standard_classification_measures.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"true_values factor containing the true labels/categories. predicted_values factor containing the predicted labels/categories.","code":""},{"path":"/reference/calc_standard_classification_measures.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"Returns a matrix which contains the categories in the rows and the measures (precision, recall, f1) in the columns.","code":""},{"path":[]},{"path":"/reference/check_aif_py_modules.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if all necessary python modules are available — check_aif_py_modules","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"This function checks if all python modules necessary for the package aifeducation to work are available.","code":""},{"path":"/reference/check_aif_py_modules.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"","code":"check_aif_py_modules(trace = TRUE, check = \"all\")"},{"path":"/reference/check_aif_py_modules.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"trace bool TRUE if a list with all modules and their availability should be printed to the console. check string determining the machine learning framework to check for. check=\"pytorch\" for 'pytorch', check=\"tensorflow\" for 'tensorflow', and check=\"all\" for both frameworks.","code":""},{"path":"/reference/check_aif_py_modules.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"This function prints a table with all relevant packages and shows which modules are available or unavailable. If all relevant modules are available, the function returns TRUE. In all other cases it returns FALSE.","code":""},{"path":[]},
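Taken together with the configuration object documented above, a typical session start might look like this (a minimal sketch using only functions shown in this file):

```r
library(aifeducation)

# Print a table of the required python modules for both frameworks.
modules_ok <- check_aif_py_modules(trace = TRUE, check = "all")

# If everything is available, choose the global framework for the session.
if (modules_ok) {
  aifeducation_config$set_global_ml_backend("pytorch")
}
```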
{"path":"/reference/check_embedding_models.html","id":null,"dir":"Reference","previous_headings":"","what":"Check of compatible text embedding models — check_embedding_models","title":"Check of compatible text embedding models — check_embedding_models","text":"This function checks if different objects are based on the same text embedding model. This is necessary to ensure that classifiers are used only with data generated by compatible embedding models.","code":""},{"path":"/reference/check_embedding_models.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check of compatible text embedding models — check_embedding_models","text":"","code":"check_embedding_models(object_list, same_class = FALSE)"},{"path":"/reference/check_embedding_models.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check of compatible text embedding models — check_embedding_models","text":"object_list list of objects of class EmbeddedText or TextEmbeddingClassifierNeuralNet. same_class bool TRUE if every object must be of the same class.","code":""},{"path":"/reference/check_embedding_models.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check of compatible text embedding models — check_embedding_models","text":"Returns TRUE if all objects refer to the same text embedding model. FALSE in all other cases.","code":""},{"path":[]},{"path":"/reference/clean_pytorch_log_transformers.html","id":null,"dir":"Reference","previous_headings":"","what":"Clean pytorch log of transformers — clean_pytorch_log_transformers","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"Function for preparing and cleaning the log created by an object of class Trainer from the python library 'transformers'.","code":""},{"path":"/reference/clean_pytorch_log_transformers.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"","code":"clean_pytorch_log_transformers(log)"},{"path":"/reference/clean_pytorch_log_transformers.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"log data.frame containing the log.","code":""},{"path":"/reference/clean_pytorch_log_transformers.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"Returns a data.frame containing the epochs, loss, and val_loss.","code":""},{"path":[]},{"path":"/reference/combine_embeddings.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine embedded texts — combine_embeddings","title":"Combine embedded texts — combine_embeddings","text":"Function for combining embedded texts of the same model.","code":""},{"path":"/reference/combine_embeddings.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine embedded texts — combine_embeddings","text":"","code":"combine_embeddings(embeddings_list)"},{"path":"/reference/combine_embeddings.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine embedded texts — combine_embeddings","text":"embeddings_list list of objects of class EmbeddedText.","code":""},{"path":"/reference/combine_embeddings.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine embedded texts — combine_embeddings","text":"Returns an object of class EmbeddedText which contains all unique cases of the input objects.","code":""},{"path":[]},
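Since combine_embeddings() only makes sense for compatible inputs, a cautious workflow can first run check_embedding_models(). A minimal sketch, assuming embeddings_part_1 and embeddings_part_2 are EmbeddedText objects produced in separate batches by the same text embedding model:

```r
# Combine batch-wise embeddings only if they stem from the same model.
if (check_embedding_models(
      object_list = list(embeddings_part_1, embeddings_part_2))) {
  all_embeddings <- combine_embeddings(
    embeddings_list = list(embeddings_part_1, embeddings_part_2))
}
```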
objects.","code":""},{"path":[]},{"path":"/reference/create_bert_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on BERT — create_bert_model","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"function creates transformer configuration based BERT base architecture vocabulary based WordPiece using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_bert_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"","code":"create_bert_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, vocab_do_lower_case = FALSE, max_position_embeddings = 512, hidden_size = 768, num_hidden_layer = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_bert_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. vocab_do_lower_case bool TRUE words/tokens lower case. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function. hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_bert_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"function return object. 
{"path":"/reference/create_deberta_v2_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"This function creates a transformer configuration based on the DeBERTa-V2 base architecture and a vocabulary based on the SentencePiece tokenizer, using the python libraries 'transformers' and 'tokenizers'.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"","code":"create_deberta_v2_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 128100, do_lower_case = FALSE, max_position_embeddings = 512, hidden_size = 1536, num_hidden_layer = 24, num_attention_heads = 24, intermediate_size = 6144, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_deberta_v2_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"ml_framework string Framework to use for training and inference. ml_framework=\"tensorflow\" for 'tensorflow' and ml_framework=\"pytorch\" for 'pytorch'. model_dir string Path to the directory where the model should be saved. vocab_raw_texts vector containing the raw texts for creating the vocabulary. vocab_size int Size of the vocabulary. do_lower_case bool TRUE if all characters should be transformed to lower case. max_position_embeddings int Number of maximal position embeddings. This parameter also determines the maximum length of a sequence which can be processed with the model. hidden_size int Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding. num_hidden_layer int Number of hidden layers. num_attention_heads int Number of attention heads. intermediate_size int Number of neurons in the intermediate layer of the attention mechanism. hidden_act string Name of the activation function. hidden_dropout_prob double Ratio of dropout. attention_probs_dropout_prob double Ratio of dropout for the attention probabilities. sustain_track bool If TRUE, the energy consumption is tracked during training via the python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) of the country. This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within the country. Only available for the USA and Canada. See the documentation of codecarbon for more information: https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval in seconds for measuring power usage. trace bool TRUE if information on the progress should be printed to the console. pytorch_safetensors bool If TRUE, a 'pytorch' model is saved in safetensors format. If FALSE or if 'safetensors' is not available, it is saved in the standard pytorch format (.bin). Only relevant for pytorch models.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"This function does not return an object. Instead, the configuration and the vocabulary of the new model are saved on disk.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"To train the model, pass the directory of the model to the function train_tune_deberta_v2_model. For this model a WordPiece tokenizer is created. The standard implementation of DeBERTa version 2 from HuggingFace uses a SentencePiece tokenizer. Thus, please use AutoTokenizer from the 'transformers' library to use this model.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"He, P., Liu, X., Gao, J. & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. doi:10.48550/arXiv.2006.03654 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/deberta-v2#debertav2","code":""},{"path":[]},
{"path":"/reference/create_funnel_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"This function creates a transformer configuration based on the Funnel Transformer base architecture and a vocabulary based on WordPiece, using the python libraries 'transformers' and 'tokenizers'.","code":""},{"path":"/reference/create_funnel_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"","code":"create_funnel_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, vocab_do_lower_case = FALSE, max_position_embeddings = 512, hidden_size = 768, target_hidden_size = 64, block_sizes = c(4, 4, 4), num_attention_heads = 12, intermediate_size = 3072, num_decoder_layers = 2, pooling_type = \"mean\", hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, activation_dropout = 0, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_funnel_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"ml_framework string Framework to use for training and inference. ml_framework=\"tensorflow\" for 'tensorflow' and ml_framework=\"pytorch\" for 'pytorch'. model_dir string Path to the directory where the model should be saved. vocab_raw_texts vector containing the raw texts for creating the vocabulary. vocab_size int Size of the vocabulary. vocab_do_lower_case bool TRUE if all words/tokens should be in lower case. max_position_embeddings int Number of maximal position embeddings. This parameter also determines the maximum length of a sequence which can be processed with the model. hidden_size int Initial number of neurons in each layer. target_hidden_size int Number of neurons in the final layer. This parameter determines the dimensionality of the resulting text embedding. block_sizes vector of int determining the number and sizes of each block. num_attention_heads int Number of attention heads. intermediate_size int Number of neurons in the intermediate layer of the attention mechanism. num_decoder_layers int Number of decoding layers. pooling_type string \"mean\" for pooling with mean and \"max\" for pooling with maximum values. hidden_act string Name of the activation function. hidden_dropout_prob double Ratio of dropout. attention_probs_dropout_prob double Ratio of dropout for the attention probabilities. activation_dropout float Dropout probability between the layers of the feed-forward blocks. sustain_track bool If TRUE, the energy consumption is tracked during training via the python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) of the country. This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within the country. Only available for the USA and Canada. See the documentation of codecarbon for more information: https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval in seconds for measuring power usage. trace bool TRUE if information on the progress should be printed to the console. pytorch_safetensors bool If TRUE, a 'pytorch' model is saved in safetensors format. If FALSE or if 'safetensors' is not available, it is saved in the standard pytorch format (.bin). Only relevant for pytorch models.","code":""},{"path":"/reference/create_funnel_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"This function does not return an object. Instead, the configuration and the vocabulary of the new model are saved on disk.","code":""},{"path":"/reference/create_funnel_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"This model uses the configuration truncate_seq=TRUE to avoid implementation problems with tensorflow. To train the model, pass the directory of the model to the function train_tune_funnel_model. The model is created with separate_cls=TRUE, truncate_seq=TRUE, and pool_q_only=TRUE. This model uses a WordPiece tokenizer like BERT and can be trained with whole word masking. The transformer library may show a warning which can be ignored.","code":""},{"path":"/reference/create_funnel_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. doi:10.48550/arXiv.2006.03236 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/funnel#funnel-transformer","code":""},{"path":[]},
{"path":"/reference/create_iota2_mean_object.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an iota2 object — create_iota2_mean_object","title":"Create an iota2 object — create_iota2_mean_object","text":"This function creates an object of class iotarelr_iota2 which can be used with the package iotarelr. This function is for internal use only.","code":""},{"path":"/reference/create_iota2_mean_object.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an iota2 object — create_iota2_mean_object","text":"","code":"create_iota2_mean_object( iota2_list, free_aem = FALSE, call = \"aifeducation::te_classifier_neuralnet\", original_cat_labels )"},{"path":"/reference/create_iota2_mean_object.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an iota2 object — create_iota2_mean_object","text":"iota2_list list of objects of class iotarelr_iota2. free_aem bool TRUE if the iota2 objects were estimated without forcing the assumption of weak superiority. call string characterizing the source of the estimation, that is, the function within which the object was estimated. original_cat_labels vector containing the original labels of each category.","code":""},{"path":"/reference/create_iota2_mean_object.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create an iota2 object — create_iota2_mean_object","text":"Returns an object of class iotarelr_iota2 as a mean iota2 object.","code":""},{"path":[]},
relevant pytorch models.","code":""},{"path":"/reference/create_funnel_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_funnel_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"model uses configuration truncate_seq=TRUE avoid implementation problems tensorflow. train model, pass directory model function train_tune_funnel_model. Model created separate_cls=TRUE, truncate_seq=TRUE, pool_q_only=TRUE. model uses WordPiece Tokenizer like BERT can trained whole word masking. Transformer library may show warning can ignored.","code":""},{"path":"/reference/create_funnel_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering Sequential Redundancy Efficient Language Processing. doi:10.48550/arXiv.2006.03236 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/funnel#funnel-transformer","code":""},{"path":[]},{"path":"/reference/create_iota2_mean_object.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an iota2 object — create_iota2_mean_object","title":"Create an iota2 object — create_iota2_mean_object","text":"Function creates object class iotarelr_iota2 can used package iotarelr. function internal use .","code":""},{"path":"/reference/create_iota2_mean_object.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an iota2 object — create_iota2_mean_object","text":"","code":"create_iota2_mean_object( iota2_list, free_aem = FALSE, call = \"aifeducation::te_classifier_neuralnet\", original_cat_labels )"},{"path":"/reference/create_iota2_mean_object.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an iota2 object — create_iota2_mean_object","text":"iota2_list list objects class iotarelr_iota2. free_aem bool TRUE iota2 objects estimated without forcing assumption weak superiority. call string characterizing source estimation. , function within object estimated. 
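The following is a minimal, illustrative sketch of a create_funnel_model() call; it is not part of the package reference. The output directory and the raw-text vector are hypothetical placeholders, and the remaining values mirror the documented defaults:

```r
# Illustrative sketch only: "my_texts" and the directory are hypothetical.
library(aifeducation)

create_funnel_model(
  ml_framework = "pytorch",            # or "tensorflow"
  model_dir = "models/funnel_base",    # hypothetical target directory
  vocab_raw_texts = my_texts,          # character vector with raw texts
  vocab_size = 30522,
  block_sizes = c(4, 4, 4),
  target_hidden_size = 64,             # dimensionality of the text embedding
  sustain_track = TRUE,
  sustain_iso_code = "DEU"             # required whenever sustain_track = TRUE
)
# Nothing is returned; configuration and vocabulary are written to model_dir.
# Training happens afterwards by passing model_dir to train_tune_funnel_model.
```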
original_cat_labels vector containing original labels category.","code":""},{"path":"/reference/create_iota2_mean_object.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create an iota2 object — create_iota2_mean_object","text":"Returns object class iotarelr_iota2 mean iota2 object.","code":""},{"path":[]},{"path":"/reference/create_longformer_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on Longformer — create_longformer_model","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"function creates transformer configuration based Longformer base architecture vocabulary based Byte-Pair Encoding (BPE) tokenizer using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_longformer_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"","code":"create_longformer_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, add_prefix_space = FALSE, trim_offsets = TRUE, max_position_embeddings = 512, hidden_size = 768, num_hidden_layer = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, attention_window = 512, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_longformer_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. add_prefix_space bool TRUE additional space insert leading words. trim_offsets bool TRUE trims whitespaces produced offsets. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function. hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. attention_window int Size window around token attention mechanism every layer. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. 
FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_longformer_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_longformer_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"train model, pass directory model function train_tune_longformer_model.","code":""},{"path":"/reference/create_longformer_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"Beltagy, ., Peters, M. E., & Cohan, . (2020). Longformer: Long-Document Transformer. doi:10.48550/arXiv.2004.05150 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/longformer#transformers.LongformerConfig","code":""},{"path":[]},{"path":"/reference/create_roberta_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on RoBERTa — create_roberta_model","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"function creates transformer configuration based RoBERTa base architecture vocabulary based Byte-Pair Encoding (BPE) tokenizer using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_roberta_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"","code":"create_roberta_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, add_prefix_space = FALSE, trim_offsets = TRUE, max_position_embeddings = 512, hidden_size = 768, num_hidden_layer = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_roberta_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. add_prefix_space bool TRUE additional space insert leading words. trim_offsets bool TRUE post processing trims offsets avoid including whitespaces. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function. 
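By way of illustration, a hedged sketch of a create_longformer_model() call that raises the sequence length, which is the typical reason for choosing this architecture; the path and raw-text vector are placeholders, and the non-default values are illustrative choices, not package recommendations:

```r
# Sketch (placeholders: "my_texts", directory).
create_longformer_model(
  ml_framework = "tensorflow",
  model_dir = "models/longformer_base",
  vocab_raw_texts = my_texts,
  max_position_embeddings = 4096,  # illustrative: longer processable sequences
  attention_window = 512,          # local attention window around each token
  sustain_track = FALSE            # no energy tracking in this run
)
# As noted above, training is done by passing the directory to
# train_tune_longformer_model.
```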
hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_roberta_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_roberta_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"train model, pass directory model function train_tune_roberta_model.","code":""},{"path":"/reference/create_roberta_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: Robustly Optimized BERT Pretraining Approach. doi:10.48550/arXiv.1907.11692 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaConfig","code":""},{"path":[]},{"path":"/reference/create_synthetic_units.html","id":null,"dir":"Reference","previous_headings":"","what":"Create synthetic units — create_synthetic_units","title":"Create synthetic units — create_synthetic_units","text":"Function creating synthetic cases order balance data training TextEmbeddingClassifierNeuralNet. auxiliary function use get_synthetic_cases allow parallel computations.","code":""},{"path":"/reference/create_synthetic_units.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create synthetic units — create_synthetic_units","text":"","code":"create_synthetic_units(embedding, target, k, max_k, method, cat, cat_freq)"},{"path":"/reference/create_synthetic_units.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create synthetic units — create_synthetic_units","text":"embedding Named data.frame containing text embeddings. cases object taken EmbeddedText$embeddings. target Named factor containing labels/categories corresponding cases. k int number nearest neighbors sampling process. max_k int maximum number nearest neighbors sampling process. method vector containing strings requested methods generating new cases. Currently \"smote\",\"dbsmote\", \"adas\" package smotefamily available. cat string category new cases created. 
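A further sketch, this time of create_roberta_model() with the focus on the sustainability-tracking and save-format arguments; the directory and raw-text vector are again hypothetical placeholders:

```r
# Sketch with hypothetical placeholders for the directory and raw texts.
create_roberta_model(
  ml_framework = "pytorch",
  model_dir = "models/roberta_base",
  vocab_raw_texts = my_texts,
  add_prefix_space = FALSE,
  pytorch_safetensors = TRUE,   # prefer safetensors; falls back to .bin
  sustain_track = TRUE,
  sustain_iso_code = "DEU",     # Alpha-3 country code, required for tracking
  sustain_interval = 15         # seconds between power measurements
)
```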
cat_freq Object class \"table\" containing absolute frequencies every category/label.","code":""},{"path":"/reference/create_synthetic_units.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create synthetic units — create_synthetic_units","text":"Returns list contains text embeddings new synthetic cases named data.frame labels named factor.","code":""},{"path":[]},{"path":"/reference/EmbeddedText.html","id":null,"dir":"Reference","previous_headings":"","what":"Embedded text — EmbeddedText","title":"Embedded text — EmbeddedText","text":"Object class R6 stores text embeddings generated object class TextEmbeddingModel via method embed().","code":""},{"path":"/reference/EmbeddedText.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Embedded text — EmbeddedText","text":"Returns object class EmbeddedText. objects used storing managing text embeddings created objects class TextEmbeddingModel. Objects class EmbeddedText serve input classifiers class TextEmbeddingClassifierNeuralNet. main aim class provide structured link embedding models classifiers. Since objects class save information text embedding model created text embedding ensures embedding generated embedding model combined. Furthermore, stored information allows classifiers check embeddings correct text embedding model used training predicting.","code":""},{"path":[]},{"path":"/reference/EmbeddedText.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Embedded text — EmbeddedText","text":"embeddings ('data.frame()') data.frame containing text embeddings chunks. Documents rows. Embedding dimensions columns.","code":""},{"path":[]},{"path":"/reference/EmbeddedText.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Embedded text — EmbeddedText","text":"EmbeddedText$new() EmbeddedText$get_model_info() EmbeddedText$get_model_label() EmbeddedText$clone()","code":""},{"path":"/reference/EmbeddedText.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Embedded text — EmbeddedText","text":"Creates new object representing text embeddings.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$new( model_name = NA, model_label = NA, model_date = NA, model_method = NA, model_version = NA, model_language = NA, param_seq_length = NA, param_chunks = NULL, param_overlap = NULL, param_emb_layer_min = NULL, param_emb_layer_max = NULL, param_emb_pool_type = NULL, param_aggregation = NULL, embeddings )"},{"path":"/reference/EmbeddedText.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Embedded text — EmbeddedText","text":"model_name string Name model generates embedding. model_label string Label model generates embedding. model_date string Date embedding generating model created. model_method string Method underlying embedding model. model_version string Version model generated embedding. model_language string Language model generated embedding. param_seq_length int Maximum number tokens processes generating model chunk. param_chunks int Maximum number chunks supported generating model. param_overlap int Number tokens added beginning sequence next chunk model. param_emb_layer_min int string determining first layer included creation embeddings. 
param_emb_layer_max int string determining last layer included creation embeddings. param_emb_pool_type string determining method pooling token embeddings within layer. param_aggregation string Aggregation method hidden states. Deprecated. included backward compatibility. embeddings data.frame containing text embeddings.","code":""},{"path":"/reference/EmbeddedText.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"Embedded text — EmbeddedText","text":"Returns object class EmbeddedText stores text embeddings produced objects class TextEmbeddingModel. object serves input objects class TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/EmbeddedText.html","id":"method-get-model-info-","dir":"Reference","previous_headings":"","what":"Method get_model_info()","title":"Embedded text — EmbeddedText","text":"Method retrieving information model generated embedding.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$get_model_info()"},{"path":"/reference/EmbeddedText.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"Embedded text — EmbeddedText","text":"list contains saved information underlying text embedding model.","code":""},{"path":"/reference/EmbeddedText.html","id":"method-get-model-label-","dir":"Reference","previous_headings":"","what":"Method get_model_label()","title":"Embedded text — EmbeddedText","text":"Method retrieving label model generated embedding.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$get_model_label()"},{"path":"/reference/EmbeddedText.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"Embedded text — EmbeddedText","text":"string Label corresponding text embedding model","code":""},{"path":"/reference/EmbeddedText.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"Embedded text — EmbeddedText","text":"objects class cloneable method.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$clone(deep = FALSE)"},{"path":"/reference/EmbeddedText.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Embedded text — EmbeddedText","text":"deep Whether make deep clone.","code":""},{"path":"/reference/generate_id.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate ID suffix for objects — generate_id","title":"Generate ID suffix for objects — generate_id","text":"Function generating ID suffix objects class TextEmbeddingModel TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/generate_id.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate ID suffix for objects — generate_id","text":"","code":"generate_id(length = 16)"},{"path":"/reference/generate_id.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate ID suffix for objects — generate_id","text":"length int determining length id suffix.","code":""},{"path":"/reference/generate_id.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate 
ID suffix for objects — generate_id","text":"Returns string requested length","code":""},{"path":[]},{"path":"/reference/get_coder_metrics.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate reliability measures based on content analysis — get_coder_metrics","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"function calculates different reliability measures based empirical research method content analysis.","code":""},{"path":"/reference/get_coder_metrics.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"","code":"get_coder_metrics( true_values = NULL, predicted_values = NULL, return_names_only = FALSE )"},{"path":"/reference/get_coder_metrics.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"true_values factor containing true labels/categories. predicted_values factor containing predicted labels/categories. return_names_only bool TRUE returns names resulting vector. Use FALSE request computation values.","code":""},{"path":"/reference/get_coder_metrics.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"return_names_only=FALSE returns vector following reliability measures: iota_index: Iota Index Iota Reliability Concept Version 2. min_iota2: Minimal Iota Iota Reliability Concept Version 2. avg_iota2: Average Iota Iota Reliability Concept Version 2. max_iota2: Maximum Iota Iota Reliability Concept Version 2. min_alpha: Minimal Alpha Reliability Iota Reliability Concept Version 2. avg_alpha: Average Alpha Reliability Iota Reliability Concept Version 2. max_alpha: Maximum Alpha Reliability Iota Reliability Concept Version 2. static_iota_index: Static Iota Index Iota Reliability Concept Version 2. dynamic_iota_index: Dynamic Iota Index Iota Reliability Concept Version 2. kalpha_nominal: Krippendorff's Alpha nominal variables. kalpha_ordinal: Krippendorff's Alpha ordinal variables. kendall: Kendall's coefficient concordance W. kappa2_unweighted: Cohen's Kappa unweighted. kappa2_equal_weighted: Weighted Cohen's Kappa equal weights. kappa2_squared_weighted: Weighted Cohen's Kappa squared weights. kappa_fleiss: Fleiss' Kappa multiple raters without exact estimation. percentage_agreement: Percentage Agreement. balanced_accuracy: Average accuracy within class. gwet_ac: Gwet's AC1/AC2 agreement coefficient. 
return_names_only=TRUE returns names vector elements.","code":""},{"path":[]},{"path":"/reference/get_folds.html","id":null,"dir":"Reference","previous_headings":"","what":"Create cross-validation samples — get_folds","title":"Create cross-validation samples — get_folds","text":"Function creates cross-validation samples ensures relative frequency every category/label within fold equals relative frequency category/label within initial data.","code":""},{"path":"/reference/get_folds.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create cross-validation samples — get_folds","text":"","code":"get_folds(target, k_folds)"},{"path":"/reference/get_folds.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create cross-validation samples — get_folds","text":"target Named factor containing relevant labels/categories. Missing cases declared NA. k_folds int number folds.","code":""},{"path":"/reference/get_folds.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create cross-validation samples — get_folds","text":"Return list following components: val_sample: vector strings containing names cases validation sample. train_sample: vector strings containing names cases train sample. n_folds: int Number realized folds. unlabeled_cases: vector strings containing names unlabeled cases.","code":""},{"path":"/reference/get_folds.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create cross-validation samples — get_folds","text":"parameter target allows cases missing categories/labels. declared NA. cases ignored creating different folds. names saved within component unlabeled_cases. cases can used Pseudo Labeling. function checks absolute frequencies every category/label. absolute frequency sufficient ensure least four cases every fold, number folds adjusted. cases, warning printed console. least four cases per fold necessary ensure training TextEmbeddingClassifierNeuralNet works well options turned .","code":""},{"path":[]},{"path":"/reference/get_n_chunks.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the number of chunks/sequences for each case — get_n_chunks","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"Function calculating number chunks/sequences every case","code":""},{"path":"/reference/get_n_chunks.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"","code":"get_n_chunks(text_embeddings, features, times)"},{"path":"/reference/get_n_chunks.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"text_embeddings data.frame array containing text embeddings. features int Number features within sequence. 
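A small sketch with made-up toy factors showing the two documented ways of calling get_coder_metrics():

```r
# Toy data, for illustration only.
true_values      <- factor(c("pos", "neg", "pos", "neg", "pos"))
predicted_values <- factor(c("pos", "neg", "neg", "neg", "pos"))

# Only the names of the measures, without computing them:
get_coder_metrics(return_names_only = TRUE)

# The full vector of reliability measures listed above:
get_coder_metrics(
  true_values = true_values,
  predicted_values = predicted_values
)
```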
times int Number sequences","code":""},{"path":"/reference/get_n_chunks.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"Named vector integers representing number chunks/sequences every case.","code":""},{"path":[]},{"path":"/reference/get_n_chunks.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"","code":"test_array<-array(data=c(1,1,1,0,0,0,0,0,0, 2,1,1,2,5,6,0,0,0, 1,2,5,6,1,2,0,4,2), dim=c(3,3,3)) test_array #> , , 1 #> #> [,1] [,2] [,3] #> [1,] 1 0 0 #> [2,] 1 0 0 #> [3,] 1 0 0 #> #> , , 2 #> #> [,1] [,2] [,3] #> [1,] 2 2 0 #> [2,] 1 5 0 #> [3,] 1 6 0 #> #> , , 3 #> #> [,1] [,2] [,3] #> [1,] 1 6 0 #> [2,] 2 1 4 #> [3,] 5 2 2 #> #test array has shape (batch,times,features) with #times=3 and features=3 #Slices where all values are zero are padded. get_n_chunks(text_embeddings=test_array,features=3,times=3) #> [1] 2 3 3 #The length of case 1 is 2, case 2 is 3, and case 3 is 3."},{"path":"/reference/get_stratified_train_test_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a stratified random sample — get_stratified_train_test_split","title":"Create a stratified random sample — get_stratified_train_test_split","text":"function creates stratified random sample. difference get_train_test_split function require text embeddings split text embeddings train validation sample.","code":""},{"path":"/reference/get_stratified_train_test_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a stratified random sample — get_stratified_train_test_split","text":"","code":"get_stratified_train_test_split(targets, val_size = 0.25)"},{"path":"/reference/get_stratified_train_test_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a stratified random sample — get_stratified_train_test_split","text":"targets Named vector containing labels/categories case. val_size double Value 0 1 indicating many cases label/category part validation sample.","code":""},{"path":"/reference/get_stratified_train_test_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a stratified random sample — get_stratified_train_test_split","text":"list contains names cases belonging train sample validation sample.","code":""},{"path":[]},{"path":"/reference/get_synthetic_cases.html","id":null,"dir":"Reference","previous_headings":"","what":"Create synthetic cases for balancing training data — get_synthetic_cases","title":"Create synthetic cases for balancing training data — get_synthetic_cases","text":"function creates synthetic cases balancing training object class TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/get_synthetic_cases.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create synthetic cases for balancing training data — get_synthetic_cases","text":"","code":"get_synthetic_cases( embedding, times, features, target, method = c(\"smote\"), max_k = 6 )"},{"path":"/reference/get_synthetic_cases.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create synthetic cases for balancing training data — get_synthetic_cases","text":"embedding Named data.frame containing text embeddings. cases, object taken EmbeddedText$embeddings. 
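A toy sketch of get_stratified_train_test_split(); since this entry does not document the component names of the returned list, the result is only inspected with str():

```r
# Toy data: eight named cases in two categories.
targets <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
names(targets) <- paste0("case_", 1:8)

# 25% of each category goes into the validation sample.
split <- get_stratified_train_test_split(targets = targets, val_size = 0.25)
str(split)  # inspect the returned list of case names
```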
times int number sequences/times. features int number features within sequence. target Named factor containing labels corresponding embeddings. method vector containing strings requested methods generating new cases. Currently \"smote\",\"dbsmote\", \"adas\" package smotefamily available. max_k int maximum number nearest neighbors sampling process.","code":""},{"path":"/reference/get_synthetic_cases.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create synthetic cases for balancing training data — get_synthetic_cases","text":"list following components. syntetic_embeddings: Named data.frame containing text embeddings synthetic cases. syntetic_targets Named factor containing labels corresponding synthetic cases. n_syntetic_units table showing number synthetic cases every label/category.","code":""},{"path":[]},{"path":"/reference/get_train_test_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for splitting data into a train and validation sample — get_train_test_split","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"function creates train validation sample based stratified random sampling. relative frequencies category train validation sample equal relative frequencies initial data (proportional stratified sampling).","code":""},{"path":"/reference/get_train_test_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"","code":"get_train_test_split(embedding = NULL, target, val_size)"},{"path":"/reference/get_train_test_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"embedding Object class EmbeddedText. target Named factor containing labels every case. val_size double Ratio 0 1 indicating relative frequency cases used validation sample.","code":""},{"path":"/reference/get_train_test_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"Returns list following components. target_train: Named factor containing labels training sample. embeddings_train: Object class EmbeddedText containing text embeddings training sample target_test: Named factor containing labels validation sample. embeddings_test: Object class EmbeddedText containing text embeddings validation sample","code":""},{"path":[]},{"path":"/reference/imdb_movie_reviews.html","id":null,"dir":"Reference","previous_headings":"","what":"Stanford Movie Review Dataset — imdb_movie_reviews","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"data.frame consisting subset 100 negative 200 positive movie reviews dataset provided Maas et al. (2011). data.frame consists three columns. first column 'text' stores movie review. second stores labels (0 = negative, 1 = positive). last column stores id. 
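An illustrative sketch of get_synthetic_cases(); "embeddings_df" and "labels" are hypothetical stand-ins for a data.frame taken from EmbeddedText$embeddings and its named factor of labels, and times/features must match the shape of those embeddings:

```r
# Sketch: balance the training data with basic SMOTE.
# "embeddings_df" and "labels" are hypothetical placeholders.
res <- get_synthetic_cases(
  embedding = embeddings_df,  # data.frame from EmbeddedText$embeddings
  times = 1,                  # number of sequences per case
  features = 768,             # features per sequence
  target = labels,            # named factor with the labels
  method = c("smote"),
  max_k = 6
)
res$n_syntetic_units  # documented: synthetic cases created per label
```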
purpose data illustration vignettes.","code":""},{"path":"/reference/imdb_movie_reviews.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"","code":"imdb_movie_reviews"},{"path":"/reference/imdb_movie_reviews.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"data.frame","code":""},{"path":"/reference/imdb_movie_reviews.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"Maas, . L., Daly, R. E., Pham, P. T., Huang, D., Ng, . Y., & Potts, C. (2011). Learning Word Vectors Sentiment Analysis. D. Lin, Y. Matsumoto, & R. Mihalcea (Eds.), Proceedings 49th Annual Meeting Association Computational Linguistics: Human Language Technologies (pp. 142–150). Association Computational Linguistics. https://aclanthology.org/P11-1015","code":""},{"path":"/reference/install_py_modules.html","id":null,"dir":"Reference","previous_headings":"","what":"Installing necessary python modules to an environment — install_py_modules","title":"Installing necessary python modules to an environment — install_py_modules","text":"Function installing necessary python modules","code":""},{"path":"/reference/install_py_modules.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Installing necessary python modules to an environment — install_py_modules","text":"","code":"install_py_modules( envname = \"aifeducation\", install = \"pytorch\", tf_version = \"2.15\", pytorch_cuda_version = \"12.1\", python_version = \"3.9\", remove_first = FALSE, cpu_only = FALSE )"},{"path":"/reference/install_py_modules.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Installing necessary python modules to an environment — install_py_modules","text":"envname string Name environment packages installed. install character determining machine learning frameworks installed. install=\"\" 'pytorch' 'tensorflow'. install=\"pytorch\" 'pytorch', install=\"tensorflow\" 'tensorflow'. tf_version string determining desired version 'tensorflow'. pytorch_cuda_version string determining desired version 'cuda' 'PyTorch'. python_version string Python version use. remove_first bool TRUE removes environment completely recreating environment installing packages. FALSE packages installed existing environment without prior changes. cpu_only bool TRUE installs cpu version machine learning frameworks.","code":""},{"path":"/reference/install_py_modules.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Installing necessary python modules to an environment — install_py_modules","text":"Returns values objects. 
Function used installing necessary python libraries conda environment.","code":""},{"path":[]},{"path":"/reference/is.null_or_na.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if NULL or NA — is.null_or_na","title":"Check if NULL or NA — is.null_or_na","text":"Function checking object NULL NA","code":""},{"path":"/reference/is.null_or_na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if NULL or NA — is.null_or_na","text":"","code":"is.null_or_na(object)"},{"path":"/reference/is.null_or_na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if NULL or NA — is.null_or_na","text":"object object test.","code":""},{"path":"/reference/is.null_or_na.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if NULL or NA — is.null_or_na","text":"Returns FALSE object NULL NA. Returns TRUE cases.","code":""},{"path":[]},{"path":"/reference/load_ai_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Loading models created with 'aifeducation' — load_ai_model","title":"Loading models created with 'aifeducation' — load_ai_model","text":"Function loading models created 'aifeducation'.","code":""},{"path":"/reference/load_ai_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Loading models created with 'aifeducation' — load_ai_model","text":"","code":"load_ai_model(model_dir, ml_framework = aifeducation_config$get_framework())"},{"path":"/reference/load_ai_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Loading models created with 'aifeducation' — load_ai_model","text":"model_dir Path directory model stored. ml_framework string Determines machine learning framework using model. Possible ml_framework=\"pytorch\" 'pytorch', ml_framework=\"tensorflow\" 'tensorflow', ml_framework=\"auto\". using framework used saving model.","code":""},{"path":"/reference/load_ai_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Loading models created with 'aifeducation' — load_ai_model","text":"Returns object class TextEmbeddingClassifierNeuralNet TextEmbeddingModel.","code":""},{"path":[]},{"path":"/reference/matrix_to_array_c.html","id":null,"dir":"Reference","previous_headings":"","what":"Reshape matrix to array — matrix_to_array_c","title":"Reshape matrix to array — matrix_to_array_c","text":"Function written C++ reshaping matrix containing sequential data array use keras.","code":""},{"path":"/reference/matrix_to_array_c.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Reshape matrix to array — matrix_to_array_c","text":"","code":"matrix_to_array_c(matrix, times, features)"},{"path":"/reference/matrix_to_array_c.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Reshape matrix to array — matrix_to_array_c","text":"matrix matrix containing sequential data. times uword Number sequences. features uword Number features within sequence.","code":""},{"path":"/reference/matrix_to_array_c.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Reshape matrix to array — matrix_to_array_c","text":"Returns array. 
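A sketch of a one-time environment setup with install_py_modules(); the values shown are the documented defaults apart from cpu_only, which is an illustrative choice:

```r
# Sketch: install the python dependencies into a conda environment.
install_py_modules(
  envname = "aifeducation",
  install = "pytorch",     # which machine learning framework to install
  python_version = "3.9",
  remove_first = FALSE,    # keep an existing environment, only add packages
  cpu_only = TRUE          # illustrative: cpu-only builds of the frameworks
)
```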
first dimension corresponds cases, second times, third features.","code":""},{"path":[]},{"path":"/reference/matrix_to_array_c.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Reshape matrix to array — matrix_to_array_c","text":"","code":"#matrix has shape (batch,times*features) matrix<-matrix(data=c(1,1,1,2,2,2, 2,2,2,3,3,3, 1,1,1,1,1,1), nrow=3, byrow=TRUE) matrix #> [,1] [,2] [,3] [,4] [,5] [,6] #> [1,] 1 1 1 2 2 2 #> [2,] 2 2 2 3 3 3 #> [3,] 1 1 1 1 1 1 #Transform matrix to an array #array has shape (batch,times,features) matrix_to_array_c(matrix=matrix,times=2,features=3) #> , , 1 #> #> [,1] [,2] #> [1,] 1 2 #> [2,] 2 3 #> [3,] 1 1 #> #> , , 2 #> #> [,1] [,2] #> [1,] 1 2 #> [2,] 2 3 #> [3,] 1 1 #> #> , , 3 #> #> [,1] [,2] #> [1,] 1 2 #> [2,] 2 3 #> [3,] 1 1 #>"},{"path":"/reference/save_ai_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Saving models created with 'aifeducation' — save_ai_model","title":"Saving models created with 'aifeducation' — save_ai_model","text":"Function saving models created 'aifeducation'.","code":""},{"path":"/reference/save_ai_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Saving models created with 'aifeducation' — save_ai_model","text":"","code":"save_ai_model( model, model_dir, dir_name = NULL, save_format = \"default\", append_ID = TRUE )"},{"path":"/reference/save_ai_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Saving models created with 'aifeducation' — save_ai_model","text":"model Object class TextEmbeddingClassifierNeuralNet TextEmbeddingModel saved. model_dir Path directory model stored. dir_name Name folder created model_dir. If dir_name=NULL model's name used. additionally append_ID=TRUE model's name ID used generating name directory. save_format relevant TextEmbeddingClassifierNeuralNet. Format saving model. 'tensorflow'/'keras' models \"keras\" 'Keras v3 format', \"tf\" SavedModel \"h5\" HDF5. 'pytorch' models \"safetensors\" 'safetensors' \"pt\" 'pytorch via pickle'. Use \"default\" standard format. keras 'tensorflow'/'keras' models safetensors 'pytorch' models. append_ID bool TRUE ID appended model directory saving purposes. FALSE .","code":""},{"path":"/reference/save_ai_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Saving models created with 'aifeducation' — save_ai_model","text":"Function return value. saves model disk. return value, called side effects.","code":""},{"path":[]},{"path":"/reference/set_config_cpu_only.html","id":null,"dir":"Reference","previous_headings":"","what":"Setting cpu only for 'tensorflow' — set_config_cpu_only","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"function configures 'tensorflow' use cpus.","code":""},{"path":"/reference/set_config_cpu_only.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"","code":"set_config_cpu_only()"},{"path":"/reference/set_config_cpu_only.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"function return anything. 
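To show how save_ai_model() and load_ai_model() fit together, a hedged round-trip sketch; "classifier" and all paths are hypothetical placeholders:

```r
# Sketch: "classifier" stands for an existing trained model object.
save_ai_model(
  model = classifier,
  model_dir = "models",      # hypothetical parent directory
  dir_name = NULL,           # use the model's name (plus ID, see append_ID)
  save_format = "default",   # keras for tensorflow, safetensors for pytorch
  append_ID = TRUE
)

# Later, or on another machine:
classifier_2 <- load_ai_model(
  model_dir = "models/my_classifier_ID",  # hypothetical resulting folder
  ml_framework = "auto"                   # framework used when saving
)
```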
used side effects.","code":""},{"path":"/reference/set_config_cpu_only.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"os$environ$setdefault(\"CUDA_VISIBLE_DEVICES\",\"-1\")","code":""},{"path":[]},{"path":"/reference/set_config_gpu_low_memory.html","id":null,"dir":"Reference","previous_headings":"","what":"Setting gpus' memory usage — set_config_gpu_low_memory","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"function changes memory usage gpus allow computations machines small memory. function, computations large models may possible speed computation decreases.","code":""},{"path":"/reference/set_config_gpu_low_memory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"","code":"set_config_gpu_low_memory()"},{"path":"/reference/set_config_gpu_low_memory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"function return anything. used side effects.","code":""},{"path":"/reference/set_config_gpu_low_memory.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"function sets TF_GPU_ALLOCATOR \"cuda_malloc_async\" sets memory growth TRUE.","code":""},{"path":[]},{"path":"/reference/set_config_os_environ_logger.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"function changes level logging information 'tensorflow' via os environment. function must called importing 'tensorflow'.","code":""},{"path":"/reference/set_config_os_environ_logger.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"","code":"set_config_os_environ_logger(level = \"ERROR\")"},{"path":"/reference/set_config_os_environ_logger.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"level string Minimal level printed console. Four levels available: INFO, WARNING, ERROR NONE.","code":""},{"path":"/reference/set_config_os_environ_logger.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"function return anything. used side effects.","code":""},{"path":[]},{"path":"/reference/set_config_tf_logger.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the level for logging information in tensor flow. — set_config_tf_logger","title":"Sets the level for logging information in tensor flow. — set_config_tf_logger","text":"function changes level logging information 'tensorflow'.","code":""},{"path":"/reference/set_config_tf_logger.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the level for logging information in tensor flow. 
— set_config_tf_logger","text":"","code":"set_config_tf_logger(level = \"ERROR\")"},{"path":"/reference/set_config_tf_logger.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the level for logging information in tensor flow. — set_config_tf_logger","text":"level string Minimal level printed console. Five levels available: FATAL, ERROR, WARN, INFO, DEBUG.","code":""},{"path":"/reference/set_config_tf_logger.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sets the level for logging information in tensor flow. — set_config_tf_logger","text":"function return anything. used side effects.","code":""},{"path":[]},{"path":"/reference/set_transformers_logger.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"function changes level logging information 'transformers' library. influences output printed console creating training transformer models well TextEmbeddingModels.","code":""},{"path":"/reference/set_transformers_logger.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"","code":"set_transformers_logger(level = \"ERROR\")"},{"path":"/reference/set_transformers_logger.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"level string Minimal level printed console. Four levels available: INFO, WARNING, ERROR DEBUG","code":""},{"path":"/reference/set_transformers_logger.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"function return anything. used side effects.","code":""},{"path":[]},{"path":"/reference/split_labeled_unlabeled.html","id":null,"dir":"Reference","previous_headings":"","what":"Split data into labeled and unlabeled data — split_labeled_unlabeled","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"function splits data labeled unlabeled data.","code":""},{"path":"/reference/split_labeled_unlabeled.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"","code":"split_labeled_unlabeled(embedding, target)"},{"path":"/reference/split_labeled_unlabeled.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"embedding Object class EmbeddedText. target Named factor containing cases labels missing labels.","code":""},{"path":"/reference/split_labeled_unlabeled.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"Returns list following components embeddings_labeled: Object class EmbeddedText containing cases labels. embeddings_unlabeled: Object class EmbeddedText containing cases labels. 
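Taken together, the configuration helpers above are typically called once at the start of a session; the following sketch shows one plausible setup (the position of the os-environment logger follows the note that it must be called before 'tensorflow' is imported):

```r
# Sketch of a session setup; pick the config functions that fit the hardware.
set_config_os_environ_logger(level = "ERROR")  # before importing 'tensorflow'
set_config_tf_logger(level = "ERROR")
set_transformers_logger(level = "ERROR")

set_config_cpu_only()           # restrict 'tensorflow' to cpus, or ...
# set_config_gpu_low_memory()   # ... reduce gpu memory usage instead
```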
targets_labeled: Named factor containing labels relevant cases.","code":""},{"path":[]},{"path":"/reference/start_aifeducation_studio.html","id":null,"dir":"Reference","previous_headings":"","what":"Aifeducation Studio — start_aifeducation_studio","title":"Aifeducation Studio — start_aifeducation_studio","text":"Function starts shiny app represents Aifeducation Studio","code":""},{"path":"/reference/start_aifeducation_studio.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Aifeducation Studio — start_aifeducation_studio","text":"","code":"start_aifeducation_studio()"},{"path":"/reference/start_aifeducation_studio.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Aifeducation Studio — start_aifeducation_studio","text":"function nothing return. used start shiny app.","code":""},{"path":"/reference/summarize_tracked_sustainability.html","id":null,"dir":"Reference","previous_headings":"","what":"Summarizing tracked sustainability data — summarize_tracked_sustainability","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"Function summarizing tracked sustainability data tracker python library 'codecarbon'.","code":""},{"path":"/reference/summarize_tracked_sustainability.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"","code":"summarize_tracked_sustainability(sustainability_tracker)"},{"path":"/reference/summarize_tracked_sustainability.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"sustainability_tracker Object class codecarbon.emissions_tracker.OfflineEmissionsTracker python library codecarbon.","code":""},{"path":"/reference/summarize_tracked_sustainability.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"Returns list contains tracked sustainability data.","code":""},{"path":[]},{"path":"/reference/test_classifier_sustainability.html","id":null,"dir":"Reference","previous_headings":"","what":"Sustainability data for an example classifier — test_classifier_sustainability","title":"Sustainability data for an example classifier — test_classifier_sustainability","text":"list length 5 containing used energy consumption co2 emissions classifier training. purpose data illustration vignettes.","code":""},{"path":"/reference/test_classifier_sustainability.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sustainability data for an example classifier — test_classifier_sustainability","text":"","code":"test_classifier_sustainability"},{"path":"/reference/test_classifier_sustainability.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Sustainability data for an example classifier — test_classifier_sustainability","text":"list","code":""},{"path":"/reference/test_metric_mean.html","id":null,"dir":"Reference","previous_headings":"","what":"Test metric for an example classifier — test_metric_mean","title":"Test metric for an example classifier — test_metric_mean","text":"matrix 4 rows 17 columns containing test metrics example classifier. 
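A toy sketch of split_labeled_unlabeled(), where unlabeled cases are marked with NA; "embedded_texts" is a hypothetical EmbeddedText object whose cases match the names of the factor:

```r
# Toy targets: cases 2 and 4 are unlabeled (NA) and can later be used
# for pseudo-labeling.
targets <- factor(c("a", NA, "b", NA, "a"))
names(targets) <- paste0("case_", 1:5)

parts <- split_labeled_unlabeled(embedding = embedded_texts, target = targets)
parts$embeddings_labeled    # EmbeddedText with the labeled cases
parts$embeddings_unlabeled  # EmbeddedText with the unlabeled cases
parts$targets_labeled       # labels of the labeled cases
```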
purpose data illustration vignettes.","code":""},{"path":"/reference/test_metric_mean.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test metric for an example classifier — test_metric_mean","text":"","code":"test_metric_mean"},{"path":"/reference/test_metric_mean.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Test metric for an example classifier — test_metric_mean","text":"matrix","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":null,"dir":"Reference","previous_headings":"","what":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Abstract class neural nets 'keras'/'tensorflow' 'pytorch'.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Objects class used assigning texts classes/categories. creation training classifier object class EmbeddedText factor necessary. object class EmbeddedText contains numerical text representations (text embeddings) raw texts generated object class TextEmbeddingModel. factor contains classes/categories every text. Missing values (unlabeled cases) supported. predictions object class EmbeddedText used created text embedding model training.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"model ('tensorflow_model()') Field storing tensorflow model loading. model_config ('list()') List storing information configuration model. information used predict new data. model_config$n_rec: Number recurrent layers. model_config$n_hidden: Number dense layers. model_config$target_levels: Levels target variable. change manually. model_config$input_variables: Order name input variables. change manually. model_config$init_config: List storing parameters passed method new(). last_training ('list()') List storing history results last training. information overwritten new training started. last_training$learning_time: Duration training process. config$history: History last training. config$data: Object class table storing initial frequencies passed data. config$data_pb: Matrix storing number additional cases (test training) added balanced pseudo-labeling. rows refer folds final training. columns refer steps pseudo-labeling. config$data_bsc_test: Matrix storing number cases category used testing phase balanced synthetic units. Please note frequencies include original synthetic cases. case number original synthetic cases exceeds limit majority classes, frequency represents number cases created cluster analysis. config$date: Time last training finished. config$config: List storing kind estimation requested last training. config$config$use_bsc: TRUE balanced synthetic cases requested. FALSE . config$config$use_baseline: TRUE baseline estimation requested. FALSE . config$config$use_bpl: TRUE balanced, pseudo-labeling cases requested. FALSE . reliability ('list()') List storing central reliability measures last training. reliability$test_metric: Array containing reliability measures validation data every fold, method, step (case pseudo-labeling). 
reliability$test_metric_mean: Array containing reliability measures validation data every method step (case pseudo-labeling). values represent mean values every fold. reliability$raw_iota_objects: List containing iota_object generated package iotarelr every fold start end last training. reliability$raw_iota_objects$iota_objects_start: List objects class iotarelr_iota2 containing estimated iota reliability second generation baseline model every fold. estimation baseline model requested, list set NULL. reliability$raw_iota_objects$iota_objects_end: List objects class iotarelr_iota2 containing estimated iota reliability second generation final model every fold. Depending requested training method values refer baseline model, trained model basis balanced synthetic cases, balanced pseudo labeling combination balanced synthetic cases pseudo labeling. reliability$raw_iota_objects$iota_objects_start_free: List objects class iotarelr_iota2 containing estimated iota reliability second generation baseline model every fold. estimation baseline model requested, list set NULL. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$raw_iota_objects$iota_objects_end_free: List objects class iotarelr_iota2 containing estimated iota reliability second generation final model every fold. Depending requested training method, values refer baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$iota_object_start: Object class iotarelr_iota2 mean individual objects every fold. estimation baseline model requested, list set NULL. reliability$iota_object_start_free: Object class iotarelr_iota2 mean individual objects every fold. estimation baseline model requested, list set NULL. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$iota_object_end: Object class iotarelr_iota2 mean individual objects every fold. Depending requested training method, object refers baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. reliability$iota_object_end_free: Object class iotarelr_iota2 mean individual objects every fold. Depending requested training method, object refers baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$standard_measures_end: Object class list containing final measures precision, recall, f1 every fold. Depending requested training method, values refer baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. 
reliability$standard_measures_mean: matrix containing mean measures precision, recall, f1 end every fold.","code":""},{"path":[]},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"TextEmbeddingClassifierNeuralNet$new() TextEmbeddingClassifierNeuralNet$train() TextEmbeddingClassifierNeuralNet$predict() TextEmbeddingClassifierNeuralNet$check_embedding_model() TextEmbeddingClassifierNeuralNet$get_model_info() TextEmbeddingClassifierNeuralNet$get_text_embedding_model() TextEmbeddingClassifierNeuralNet$set_publication_info() TextEmbeddingClassifierNeuralNet$get_publication_info() TextEmbeddingClassifierNeuralNet$set_software_license() TextEmbeddingClassifierNeuralNet$get_software_license() TextEmbeddingClassifierNeuralNet$set_documentation_license() TextEmbeddingClassifierNeuralNet$get_documentation_license() TextEmbeddingClassifierNeuralNet$set_model_description() TextEmbeddingClassifierNeuralNet$get_model_description() TextEmbeddingClassifierNeuralNet$save_model() TextEmbeddingClassifierNeuralNet$load_model() TextEmbeddingClassifierNeuralNet$get_package_versions() TextEmbeddingClassifierNeuralNet$get_sustainability_data() TextEmbeddingClassifierNeuralNet$get_ml_framework() TextEmbeddingClassifierNeuralNet$clone()","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Creating new instance class.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$new( ml_framework = aifeducation_config$get_framework(), name = NULL, label = NULL, text_embeddings = NULL, targets = NULL, hidden = c(128), rec = c(128), self_attention_heads = 0, intermediate_size = NULL, attention_type = \"fourier\", add_pos_embedding = TRUE, rec_dropout = 0.1, repeat_encoder = 1, dense_dropout = 0.4, recurrent_dropout = 0.4, encoder_dropout = 0.1, optimizer = \"adam\" )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch' name Character Name new classifier. Please refer common name conventions. Free text can used parameter label. label Character Label new classifier. can use free text. text_embeddings object class TextEmbeddingModel. targets factor containing target values classifier. hidden vector containing number neurons dense layer. length vector determines number dense layers. want dense layer, set parameter NULL. rec vector containing number neurons recurrent layer. length vector determines number recurrent layers. want recurrent layer, set parameter NULL. self_attention_heads integer determining number attention heads self-attention layer. relevant attention_type=\"multihead\" intermediate_size int determining size projection layer within transformer encoder. attention_type string Choose relevant attention type. 
Possible values \"fourier\" \"multihead\". add_pos_embedding bool TRUE positional embedding used. rec_dropout double ranging 0 lower 1, determining dropout bidirectional GRU layers. repeat_encoder int determining many times encoder added network. dense_dropout double ranging 0 lower 1, determining dropout dense layers. recurrent_dropout double ranging 0 lower 1, determining recurrent dropout recurrent layer. relevant keras models. encoder_dropout double ranging 0 lower 1, determining dropout dense projection within encoder layers. optimizer Object class keras.optimizers.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns object class TextEmbeddingClassifierNeuralNet ready training.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-train-","dir":"Reference","previous_headings":"","what":"Method train()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method training neural net.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$train( data_embeddings, data_targets, data_n_test_samples = 5, balance_class_weights = TRUE, use_baseline = TRUE, bsl_val_size = 0.25, use_bsc = TRUE, bsc_methods = c(\"dbsmote\"), bsc_max_k = 10, bsc_val_size = 0.25, bsc_add_all = FALSE, use_bpl = TRUE, bpl_max_steps = 3, bpl_epochs_per_step = 1, bpl_dynamic_inc = FALSE, bpl_balance = FALSE, bpl_max = 1, bpl_anchor = 1, bpl_min = 0, bpl_weight_inc = 0.02, bpl_weight_start = 0, bpl_model_reset = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, epochs = 40, batch_size = 32, dir_checkpoint, trace = TRUE, keras_trace = 2, pytorch_trace = 2, n_cores = 2 )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"data_embeddings Object class TextEmbeddingModel. data_targets Factor containing labels cases stored data_embeddings. Factor must named use names used data_embeddings. data_n_test_samples int determining number cross-fold samples. balance_class_weights bool TRUE class weights generated based frequencies training data method 'Inverse Class Frequency'. FALSE class weight 1. use_baseline bool TRUE calculation baseline model requested. option relevant use_bsc=TRUE use_bpl=TRUE. FALSE, baseline model calculated. bsl_val_size double 0 1, indicating proportion cases class used validation sample estimation baseline model. remaining cases part training data. use_bsc bool TRUE estimation integrate balanced synthetic cases. FALSE . bsc_methods vector containing methods generating synthetic cases via 'smotefamily'. Multiple methods can passed. Currently bsc_methods=c(\"adas\"), bsc_methods=c(\"smote\") bsc_methods=c(\"dbsmote\") possible. bsc_max_k int determining maximal number k used creating synthetic units. bsc_val_size double 0 1, indicating proportion cases class used validation sample estimation synthetic cases. 
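A minimal end-to-end sketch combining the new() and train() calls documented here; the embedding object, target factor, and checkpoint directory are assumptions for illustration, not fixed names.

library(aifeducation)
# Assumed inputs: 'embeddings' (class EmbeddedText) and 'targets' (a named factor).
classifier <- TextEmbeddingClassifierNeuralNet$new(
  ml_framework = "tensorflow",
  name = "example_classifier",
  label = "Example classifier",
  text_embeddings = embeddings,
  targets = targets,
  hidden = c(128),
  rec = c(128)
)
classifier$train(
  data_embeddings = embeddings,
  data_targets = targets,
  data_n_test_samples = 5,
  use_bsc = TRUE,
  bsc_methods = c("dbsmote"),
  use_bpl = TRUE,
  epochs = 40,
  batch_size = 32,
  dir_checkpoint = "checkpoints" # created if it does not exist
)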
bsc_add_all bool FALSE synthetic cases necessary fill gap class major class added data. TRUE generated synthetic cases added data. use_bpl bool TRUE estimation integrate balanced pseudo-labeling. FALSE . bpl_max_steps int determining maximum number steps pseudo-labeling. bpl_epochs_per_step int Number training epochs within every step. bpl_dynamic_inc bool TRUE, specific percentage cases included step. percentage determined \(step/bpl_max_steps\). FALSE, cases used. bpl_balance bool TRUE, number cases every category/class pseudo-labeled data used training. number cases determined minor class/category. bpl_max double 0 1, setting maximal level confidence considering case pseudo-labeling. bpl_anchor double 0 1 indicating reference point sorting new cases every label. See notes details. bpl_min double 0 1, setting minimal level confidence considering case pseudo-labeling. bpl_weight_inc double value much sample weights increased cases pseudo-labels every step. bpl_weight_start double Starting value weights unlabeled cases. bpl_model_reset bool TRUE, model re-initialized every step. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada. See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. epochs int Number training epochs. batch_size int Size batches. dir_checkpoint string Path directory checkpoint training saved. directory exist, created. trace bool TRUE, information estimation phase printed console. keras_trace int keras_trace=0 print information training process keras console. pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_trace=2 prints one line information every epoch. n_cores int Number cores used creating synthetic units.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"bsc_max_k: values 2 bsc_max_k successively used. number bsc_max_k high, value reduced number allows calculating synthetic units. bpl_anchor: help value, new cases sorted. aim, distance anchor calculated cases arranged ascending order.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
changes object trained classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-predict-","dir":"Reference","previous_headings":"","what":"Method predict()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method predicting new data trained neural net.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$predict(newdata, batch_size = 32, verbose = 1)"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-2","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"newdata Object class TextEmbeddingModel data.frame predictions made. batch_size int Size batches. verbose int verbose=0 cat information training process keras console. verbose=1 prints progress bar. verbose=2 prints one line information every epoch.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns data.frame containing predictions probabilities different labels case.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-check-embedding-model-","dir":"Reference","previous_headings":"","what":"Method check_embedding_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method checking provided text embeddings created TextEmbeddingModel classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$check_embedding_model(text_embeddings)"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-3","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"text_embeddings Object class EmbeddedText.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-3","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"TRUE underlying TextEmbeddingModel . 
FALSE models differ.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-model-info-","dir":"Reference","previous_headings":"","what":"Method get_model_info()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting model information","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-4","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_model_info()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-4","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list relevant model information","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-text-embedding-model-","dir":"Reference","previous_headings":"","what":"Method get_text_embedding_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting text embedding model information","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-5","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_text_embedding_model()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-5","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list relevant model information text embedding model underlying classifier","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-publication-info-","dir":"Reference","previous_headings":"","what":"Method set_publication_info()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting publication information classifier","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-6","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_publication_info( authors, citation, url = NULL )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-4","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"authors List authors. citation Free text citation. url URL corresponding homepage.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-6","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private members publication information.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-publication-info-","dir":"Reference","previous_headings":"","what":"Method get_publication_info()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting bibliographic information classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-7","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_publication_info()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-7","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list saved bibliographic information.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-software-license-","dir":"Reference","previous_headings":"","what":"Method set_software_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting license classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-8","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_software_license(license = \"GPL-3\")"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-5","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-8","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private member software license model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-software-license-","dir":"Reference","previous_headings":"","what":"Method get_software_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method getting license classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-9","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_software_license()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-6","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-9","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"string representing license software.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-documentation-license-","dir":"Reference","previous_headings":"","what":"Method set_documentation_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting license classifier's documentation.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-10","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_documentation_license( license = \"CC BY-SA\" )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-7","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-10","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private member documentation license model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-documentation-license-","dir":"Reference","previous_headings":"","what":"Method get_documentation_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method getting license classifier's documentation.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-11","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_documentation_license()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-8","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-11","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns license string.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-model-description-","dir":"Reference","previous_headings":"","what":"Method set_model_description()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting description classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-12","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_model_description( eng = NULL, native = NULL, abstract_eng = NULL, abstract_native = NULL, keywords_eng = NULL, keywords_native = NULL )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-9","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"eng string text describing training learner, theoretical empirical background, different output labels English. native string text describing training learner, theoretical empirical background, different output labels native language classifier. abstract_eng string text providing summary description English. abstract_native string text providing summary description native language classifier. keywords_eng vector keyword English. keywords_native vector keyword native language classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-12","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private members description model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-model-description-","dir":"Reference","previous_headings":"","what":"Method get_model_description()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting model description.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-13","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_model_description()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-13","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list description classifier English native language.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-save-model-","dir":"Reference","previous_headings":"","what":"Method save_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method saving model 'Keras v3 format', 'tensorflow' SavedModel format h5 format.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-14","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$save_model(dir_path, save_format = \"default\")"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-10","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"dir_path string() Path directory model saved. save_format Format saving model. 'tensorflow'/'keras' models \"keras\" 'Keras v3 format', \"tf\" SavedModel \"h5\" HDF5. 'pytorch' models \"safetensors\" 'safetensors' \"pt\" 'pytorch' via pickle. Use \"default\" standard format. keras 'tensorflow'/'keras' models safetensors 'pytorch' models.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-14","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. saves model disk.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-load-model-","dir":"Reference","previous_headings":"","what":"Method load_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method importing model 'Keras v3 format', 'tensorflow' SavedModel format h5 format.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-15","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$load_model(dir_path, ml_framework = \"auto\")"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-11","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"dir_path string() Path directory model saved. ml_framework string Determines machine learning framework using model. 
Possible ml_framework=\"pytorch\" 'pytorch', ml_framework=\"tensorflow\" 'tensorflow', ml_framework=\"auto\".","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-15","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. used load weights model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-package-versions-","dir":"Reference","previous_headings":"","what":"Method get_package_versions()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting summary R python packages' versions used creating classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-16","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_package_versions()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-16","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns list containing versions relevant R python packages.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-sustainability-data-","dir":"Reference","previous_headings":"","what":"Method get_sustainability_data()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting summary tracked energy consumption training estimate resulting CO2 equivalents kg.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-17","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_sustainability_data()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-17","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns list containing tracked energy consumption, CO2 equivalents kg, information tracker used, technical information training infrastructure.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-ml-framework-","dir":"Reference","previous_headings":"","what":"Method get_ml_framework()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting machine learning framework used classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-18","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_ml_framework()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-18","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns string describing machine learning framework used classifier","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method 
clone()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"objects class cloneable method.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-19","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$clone(deep = FALSE)"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-12","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"deep Whether make deep clone.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":null,"dir":"Reference","previous_headings":"","what":"Text embedding model — TextEmbeddingModel","title":"Text embedding model — TextEmbeddingModel","text":"R6 class stores text embedding model can used tokenize, encode, decode, embed raw texts. object provides unique interface different text processing methods.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Text embedding model — TextEmbeddingModel","text":"Objects class TextEmbeddingModel transform raw texts numerical representations can used downstream tasks. aim objects class allow tokenize raw texts, encode tokens sequences integers, decode sequences integers back tokens.","code":""},{"path":[]},{"path":"/reference/TextEmbeddingModel.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Text embedding model — TextEmbeddingModel","text":"last_training ('list()') List storing history results last training. information overwritten new training started.","code":""},{"path":[]},{"path":"/reference/TextEmbeddingModel.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Text embedding model — TextEmbeddingModel","text":"TextEmbeddingModel$new() TextEmbeddingModel$load_model() TextEmbeddingModel$save_model() TextEmbeddingModel$encode() TextEmbeddingModel$decode() TextEmbeddingModel$get_special_tokens() TextEmbeddingModel$embed() TextEmbeddingModel$fill_mask() TextEmbeddingModel$set_publication_info() TextEmbeddingModel$get_publication_info() TextEmbeddingModel$set_software_license() TextEmbeddingModel$get_software_license() TextEmbeddingModel$set_documentation_license() TextEmbeddingModel$get_documentation_license() TextEmbeddingModel$set_model_description() TextEmbeddingModel$get_model_description() TextEmbeddingModel$get_model_info() TextEmbeddingModel$get_package_versions() TextEmbeddingModel$get_basic_components() TextEmbeddingModel$get_bow_components() TextEmbeddingModel$get_transformer_components() TextEmbeddingModel$get_sustainability_data() TextEmbeddingModel$get_ml_framework() TextEmbeddingModel$clone()","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Text embedding model — TextEmbeddingModel","text":"Method creating new text embedding model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$new( model_name = NULL, model_label = NULL, model_version = NULL, model_language = NULL, method = NULL, ml_framework = 
aifeducation_config$get_framework()$TextEmbeddingFramework, max_length = 0, chunks = 1, overlap = 0, emb_layer_min = \"middle\", emb_layer_max = \"2_3_layer\", emb_pool_type = \"average\", model_dir, bow_basic_text_rep, bow_n_dim = 10, bow_n_cluster = 100, bow_max_iter = 500, bow_max_iter_cluster = 500, bow_cr_criterion = 1e-08, bow_learning_rate = 1e-08, trace = FALSE )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"model_name string containing name new model. model_label string containing label/title new model. model_version string version model. model_language string containing language model represents (e.g., English). method string determining kind embedding model. Currently following models supported: method=\"bert\" Bidirectional Encoder Representations Transformers (BERT), method=\"roberta\" Robustly Optimized BERT Pretraining Approach (RoBERTa), method=\"longformer\" Long-Document Transformer, method=\"funnel\" Funnel-Transformer, method=\"deberta_v2\" Decoding-enhanced BERT Disentangled Attention (DeBERTa V2), method=\"glove\" GlobalVector Clusters, method=\"lda\" topic modeling. See details information. ml_framework string Framework use model. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. relevant transformer models. max_length int determining maximum length token sequences used transformer models. relevant methods. chunks int Maximum number chunks. relevant transformer models. overlap int determining number tokens added beginning next chunk. relevant BERT models. emb_layer_min int string determining first layer included creation embeddings. integer corresponds layer number. first layer number 1. Instead integer following strings possible: \"start\" first layer, \"middle\" middle layer, \"2_3_layer\" layer two-third layer, \"last\" last layer. emb_layer_max int string determining last layer included creation embeddings. integer corresponds layer number. first layer number 1. Instead integer following strings possible: \"start\" first layer, \"middle\" middle layer, \"2_3_layer\" layer two-third layer, \"last\" last layer. emb_pool_type string determining method pooling token embeddings within layer. \"cls\" embedding CLS token used. \"average\" token embedding tokens averaged (excluding padding tokens). model_dir string path directory BERT model stored. bow_basic_text_rep object class basic_text_rep created via function bow_pp_create_basic_text_rep. relevant method=\"glove_cluster\" method=\"lda\". bow_n_dim int Number dimensions GlobalVector number topics LDA. bow_n_cluster int Number clusters created basis GlobalVectors. Parameter relevant method=\"lda\" method=\"bert\" bow_max_iter int Maximum number iterations fitting GlobalVectors Topic Models. bow_max_iter_cluster int Maximum number iterations fitting cluster method=\"glove\". bow_cr_criterion double convergence criterion GlobalVectors. bow_learning_rate double initial learning rate GlobalVectors. trace bool TRUE prints information progress. FALSE .","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Text embedding model — TextEmbeddingModel","text":"method: case method=\"bert\", method=\"roberta\", method=\"longformer\", pretrained transformer model must supplied via model_dir. 
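As an illustration of this note, a sketch of creating a transformer-based text embedding model from a locally stored BERT model; the directory path and the configuration values are assumptions, not prescribed settings.

# Assumed: a pretrained BERT model saved under "models/bert_base".
embedding_model <- TextEmbeddingModel$new(
  model_name = "example_bert_embedding",
  model_label = "Example BERT embedding model",
  model_version = "0.0.1",
  model_language = "english",
  method = "bert",
  ml_framework = "tensorflow",
  max_length = 512,
  chunks = 4,
  overlap = 30,
  emb_layer_min = "middle",
  emb_layer_max = "2_3_layer",
  emb_pool_type = "average",
  model_dir = "models/bert_base"
)
# Embeddings can then be created as documented for embed():
# embeddings <- embedding_model$embed(raw_text = texts, doc_id = ids)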
method=\"glove\" method=\"lda\" new model created based data provided via bow_basic_text_rep. original algorithm GlobalVectors provides word embeddings, text embeddings. achieve text embeddings words clustered based word embeddings kmeans.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns object class TextEmbeddingModel.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-load-model-","dir":"Reference","previous_headings":"","what":"Method load_model()","title":"Text embedding model — TextEmbeddingModel","text":"Method loading transformers model R.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$load_model(model_dir, ml_framework = \"auto\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"model_dir string containing path relevant model directory. ml_framework string Determines machine learning framework using model. Possible ml_framework=\"pytorch\" 'pytorch', ml_framework=\"tensorflow\" 'tensorflow', ml_framework=\"auto\".","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used loading saved transformer model R interface.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-save-model-","dir":"Reference","previous_headings":"","what":"Method save_model()","title":"Text embedding model — TextEmbeddingModel","text":"Method saving transformer model disk.Relevant transformer models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$save_model(model_dir, save_format = \"default\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-2","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"model_dir string containing path relevant model directory. save_format Format saving model. 'tensorflow'/'keras' models \"h5\" HDF5. 'pytorch' models \"safetensors\" 'safetensors' \"pt\" 'pytorch' via pickle. Use \"default\" standard format. h5 'tensorflow'/'keras' models safetensors 'pytorch' models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. 
used saving transformer model disk.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-encode-","dir":"Reference","previous_headings":"","what":"Method encode()","title":"Text embedding model — TextEmbeddingModel","text":"Method encoding words raw texts integers.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$encode( raw_text, token_encodings_only = FALSE, to_int = TRUE, trace = FALSE )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-3","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"raw_text vector containing raw texts. token_encodings_only bool TRUE, token encodings returned. FALSE, complete encoding returned important BERT models. to_int bool TRUE integer ids tokens returned. FALSE tokens returned. Argument applies transformer models token_encodings_only==TRUE. trace bool TRUE, information progress printed. FALSE requested.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-3","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list containing integer sequences raw texts special tokens.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-decode-","dir":"Reference","previous_headings":"","what":"Method decode()","title":"Text embedding model — TextEmbeddingModel","text":"Method decoding sequence integers tokens","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-4","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$decode(int_seqence, to_token = FALSE)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-4","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"int_seqence list containing integer sequences transformed tokens plain text. to_token bool FALSE plain text returned. TRUE sequence tokens returned. 
Argument relevant model based transformer.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-4","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list token sequences","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-special-tokens-","dir":"Reference","previous_headings":"","what":"Method get_special_tokens()","title":"Text embedding model — TextEmbeddingModel","text":"Method receiving special tokens model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-5","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_special_tokens()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-5","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns matrix containing special tokens rows type, token, id columns.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-embed-","dir":"Reference","previous_headings":"","what":"Method embed()","title":"Text embedding model — TextEmbeddingModel","text":"Method creating text embeddings raw texts case using GPU running memory reduce batch size restart R switch use cpu via set_config_cpu_only.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-6","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$embed( raw_text = NULL, doc_id = NULL, batch_size = 8, trace = FALSE )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-5","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"raw_text vector containing raw texts. doc_id vector containing corresponding IDs every text. batch_size int determining maximal size every batch. trace bool TRUE, information progression printed console.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-6","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Method returns R6 object class EmbeddedText. object contains embeddings data.frame information model creating embeddings.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-fill-mask-","dir":"Reference","previous_headings":"","what":"Method fill_mask()","title":"Text embedding model — TextEmbeddingModel","text":"Method calculating tokens behind mask tokens.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-7","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$fill_mask(text, n_solutions = 5)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-6","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"text string Text containing mask tokens. n_solutions int Number estimated tokens every mask.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-7","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list containing data.frame every mask. 
data.frame contains solutions rows reports score, token id, token string columns.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-publication-info-","dir":"Reference","previous_headings":"","what":"Method set_publication_info()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting bibliographic information model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-8","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_publication_info(type, authors, citation, url = NULL)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-7","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"type string Type information changed/added. type=\"developer\", type=\"modifier\" possible. authors List people. citation string Citation free text. url string Corresponding URL applicable.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-8","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used set private members publication information model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-publication-info-","dir":"Reference","previous_headings":"","what":"Method get_publication_info()","title":"Text embedding model — TextEmbeddingModel","text":"Method getting bibliographic information model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-9","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_publication_info()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-9","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list bibliographic information.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-software-license-","dir":"Reference","previous_headings":"","what":"Method set_software_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting license model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-10","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_software_license(license = \"GPL-3\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-8","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-10","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. 
used setting private member software license model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-software-license-","dir":"Reference","previous_headings":"","what":"Method get_software_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting license model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-11","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_software_license()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-11","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"string License model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-documentation-license-","dir":"Reference","previous_headings":"","what":"Method set_documentation_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting license models' documentation.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-12","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_documentation_license(license = \"CC BY-SA\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-9","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-12","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used set private member documentation license model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-documentation-license-","dir":"Reference","previous_headings":"","what":"Method get_documentation_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method getting license models' documentation.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-13","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_documentation_license()"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-10","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-model-description-","dir":"Reference","previous_headings":"","what":"Method set_model_description()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting description model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-14","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_model_description( eng = NULL, native = NULL, abstract_eng = NULL, abstract_native = NULL, keywords_eng = NULL, keywords_native = NULL )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-11","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"eng string text describing training classifier, theoretical empirical background, different output labels English. 
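A sketch of how the documentation methods above might be combined for a model; the author, citation, and description texts are illustrative only, and the format of the authors list is an assumption.

embedding_model$set_publication_info(
  type = "developer",
  authors = personList(person("Jane", "Doe")), # assumed format for the list of people
  citation = "Doe, J. (2024). Example embedding model.",
  url = NULL
)
embedding_model$set_software_license("GPL-3")
embedding_model$set_documentation_license("CC BY-SA")
embedding_model$set_model_description(
  eng = "Description of the training corpus and intended use in English.",
  abstract_eng = "Short English summary of the model.",
  keywords_eng = c("text embedding", "education")
)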
native string text describing training classifier, theoretical empirical background, different output labels native language model. abstract_eng string text providing summary description English. abstract_native string text providing summary description native language classifier. keywords_eng vector keywords English. keywords_native vector keywords native language classifier.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-13","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used set private members description model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-model-description-","dir":"Reference","previous_headings":"","what":"Method get_model_description()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting model description.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-15","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_model_description()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-14","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list description model English native language.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-model-info-","dir":"Reference","previous_headings":"","what":"Method get_model_info()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting model information","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-16","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_model_info()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-15","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list relevant model information","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-package-versions-","dir":"Reference","previous_headings":"","what":"Method get_package_versions()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting summary R python packages' versions used creating classifier.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-17","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_package_versions()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-16","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list containing versions relevant R python packages.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-basic-components-","dir":"Reference","previous_headings":"","what":"Method get_basic_components()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting part interface's configuration necessary models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-18","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — 
TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_basic_components()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-17","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-bow-components-","dir":"Reference","previous_headings":"","what":"Method get_bow_components()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting part interface's configuration necessary bag--words models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-19","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_bow_components()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-18","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-transformer-components-","dir":"Reference","previous_headings":"","what":"Method get_transformer_components()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting part interface's configuration necessary transformer models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-20","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_transformer_components()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-19","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-sustainability-data-","dir":"Reference","previous_headings":"","what":"Method get_sustainability_data()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting log tracked energy consumption training estimate resulting CO2 equivalents kg.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-21","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_sustainability_data()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-20","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns matrix containing tracked energy consumption, CO2 equivalents kg, information tracker used, technical information training infrastructure every training run.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-ml-framework-","dir":"Reference","previous_headings":"","what":"Method get_ml_framework()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting machine learning framework used classifier.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-22","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_ml_framework()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-21","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns string describing machine learning framework used 
classifier","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"Text embedding model — TextEmbeddingModel","text":"objects class cloneable method.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-23","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$clone(deep = FALSE)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-12","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"deep Whether make deep clone.","code":""},{"path":"/reference/to_categorical_c.html","id":null,"dir":"Reference","previous_headings":"","what":"Transforming classes to one-hot encoding — to_categorical_c","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"Function written C++ transforming vector classes (int) binary class matrix.","code":""},{"path":"/reference/to_categorical_c.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"","code":"to_categorical_c(class_vector, n_classes)"},{"path":"/reference/to_categorical_c.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"class_vector vector containing integers every class. integers must range 0 n_classes-1. n_classes int Total number classes.","code":""},{"path":"/reference/to_categorical_c.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"Returns matrix containing binary representation every class.","code":""},{"path":[]},{"path":"/reference/train_tune_bert_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a BERT model — train_tune_bert_model","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"function can used train fine-tune transformer based BERT architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"","code":"train_tune_bert_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, whole_word = TRUE, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.003, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_bert_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. 
raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. whole_word bool TRUE whole word masking applied. FALSE token masking used. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"models uses WordPiece Tokenizer like BERT can trained whole word masking. Transformer library may show warning can ignored. Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_bert_model. Training model makes use dynamic masking contrast original paper static masking applied.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training Deep Bidirectional Transformers Language Understanding. J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings 2019 Conference North (pp. 4171--4186). Association Computational Linguistics. 
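As a hedged illustration of the usage documented above, the call below fine-tunes an existing BERT model with mostly default settings; the directory paths and the ISO code are placeholders, and raw_texts is assumed to be a prepared character vector of training texts.

train_tune_bert_model(
  ml_framework = "pytorch",
  output_dir = "models/bert_tuned",        # placeholder path
  model_dir_path = "models/bert_original", # placeholder path
  raw_texts = textual_data$text,           # assumed character vector of texts
  n_epoch = 2,
  sustain_track = TRUE,
  sustain_iso_code = "DEU"
)
# Nothing is returned; the fine-tuned model is written to output_dir.

The train_tune_*_model() functions for DeBERTa-V2, Funnel Transformer, Longformer, and RoBERTa documented below follow the same pattern.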
doi:10.18653/v1/N19-1423 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/bert#transformers.TFBertForMaskedLM","code":""},{"path":[]},{"path":"/reference/train_tune_deberta_v2_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"function can used train fine-tune transformer based DeBERTa-V2 architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"","code":"train_tune_deberta_v2_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, whole_word = TRUE, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.03, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_deberta_v2_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. whole_word bool TRUE whole word masking applied. FALSE token masking used. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. 
pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_deberta_v2_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"He, P., Liu, X., Gao, J. & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT Disentangled Attention. doi:10.48550/arXiv.2006.03654 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/deberta-v2#debertav2","code":""},{"path":[]},{"path":"/reference/train_tune_funnel_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"function can used train fine-tune transformer based Funnel Transformer architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"","code":"train_tune_funnel_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, whole_word = TRUE, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, min_seq_len = 50, full_sequences_only = FALSE, learning_rate = 0.003, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_funnel_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. whole_word bool TRUE whole word masking applied. FALSE token masking used. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. min_seq_len int relevant full_sequences_only=FALSE. 
Value determines minimal sequence length inclusion training process. full_sequences_only bool TRUE token sequences length equal chunk_size used training. learning_rate double Learning rate adam optimizer. n_workers int Number workers. multi_process bool TRUE multiple processes activated. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"aug_vocab_by > 0 raw text used training WordPiece tokenizer. end process, additional entries added vocabulary part original vocabulary. experimental state. Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_funnel_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering Sequential Redundancy Efficient Language Processing. 
doi:10.48550/arXiv.2006.03236 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/funnel#funnel-transformer","code":""},{"path":[]},{"path":"/reference/train_tune_longformer_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"function can used train fine-tune transformer based Longformer architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"","code":"train_tune_longformer_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.03, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_longformer_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). 
relevant pytorch models.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_longformer_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: Long-Document Transformer. doi:10.48550/arXiv.2004.05150 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/longformer#transformers.LongformerConfig","code":""},{"path":[]},{"path":"/reference/train_tune_roberta_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"function can used train fine-tune transformer based RoBERTa architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"","code":"train_tune_roberta_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.03, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_roberta_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". 
multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_roberta_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: Robustly Optimized BERT Pretraining Approach. doi:10.48550/arXiv.1907.11692 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaConfig","code":""},{"path":[]},{"path":"/reference/update_aifeducation_progress_bar.html","id":null,"dir":"Reference","previous_headings":"","what":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"function updates master progress bar aifeducation shiny app. progress bar reports current state overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"","code":"update_aifeducation_progress_bar(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"value int Value describing current step process. total int Total number steps process. 
title string Title displaying top progress bar.","code":""},{"path":"/reference/update_aifeducation_progress_bar.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"Function nothing returns. updates progress bar id \"pgr_bar_aifeducation\".","code":""},{"path":[]},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":null,"dir":"Reference","previous_headings":"","what":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"function updates epoch progress bar aifeducation shiny app. progress bar reports current state overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"","code":"update_aifeducation_progress_bar_epochs(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"value int Value describing current step process. total int Total number steps process. title string Title displaying top progress bar.","code":""},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"Function nothing returns. updates progress bar id \"pgr_bar_aifeducation_epochs\".","code":""},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"function called often training model. Thus, function check requirements updating progress bar reduce computational time. check fulfilling necessary conditions must implemented separately.","code":""},{"path":[]},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":null,"dir":"Reference","previous_headings":"","what":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"function updates step/batch progress bar aifeducation shiny app. progress bar reports current state overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"","code":"update_aifeducation_progress_bar_steps(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"value int Value describing current step process. total int Total number steps process. 
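The three update_aifeducation_progress_bar*() helpers documented in this part of the reference share the signature (value, total, title = NULL); a minimal sketch with illustrative values, assuming it is called from within the running shiny app:

# Report step 3 of 10 on the master progress bar of Aifeducation Studio.
update_aifeducation_progress_bar(value = 3, total = 10, title = "Overall progress")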
title string Title displaying top progress bar.","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"Function nothing returns. updates progress bar id \"pgr_bar_aifeducation_steps\".","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"function called often training model. Thus, function check requirements updating progress bar reduce computational time. check fulfilling necessary conditions must implemented separately.","code":""},{"path":[]},{"path":"/news/index.html","id":"aifeducation-033","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.3","title":"aifeducation 0.3.3","text":"Graphical User Interface Aifeducation Studio Fixed bug concerning ids .pdf .csv files. Now ids correctly saved within text collection file. Fixed bug checking selection least one file type creation text collection. TextEmbeddingClassifiers Fixed process checking TextEmbeddingModels compatible. Python Installation Fixed bug caused installation incompatible versions keras Tensorflow. Changes Removed quanteda.textmodels necessary library testing package. Added dataset testing package based Maas et al. (2011).","code":""},{"path":"/news/index.html","id":"aifeducation-032","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.2","title":"aifeducation 0.3.2","text":"CRAN release: 2024-03-15 TextEmbeddingClassifiers Fixed bug GlobalAveragePooling1D_PT. Now layer makes correct pooling. change effect PyTorch models trained version 0.3.1. TextEmbeddingModel Replaced parameter ‘aggregation’ three new parameters allowing explicitly choose start end layer included creation embeddings. Furthermore, two options pooling method within layer added (“cls” “average”). Added support reporting training validation loss training corresponding base model. Transformer Models Fixed bug creation transformer models except funnel. Now choosing number layers working. file ‘history.log’ now saved within model’s folder reporting loss validation loss training epoch. EmbeddedText Changed process validating EmbeddedTexts compatible. Now model’s unique name used validation. Added new fields updated methods account new options creating embeddings (layer selection pooling type). Graphical User Interface Aifeducation Studio Adapted interface according changes made version. Improved read raw texts. Reading now reduces multiple spaces characters one single space character. Hyphenation removed. Python Installation Updated installation account new version keras.","code":""},{"path":"/news/index.html","id":"aifeducation-031","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.1","title":"aifeducation 0.3.1","text":"CRAN release: 2024-02-18 Graphical User Interface Aifeducation Studio Added shiny app package serves graphical user interface. Transformer Models Fixed bug transformers except BERT concerning unk_token. Switched SentencePiece tokenizer WordPiece tokenizer DeBERTa_V2. Add possibility train DeBERTa_V2 FunnelTransformer models Whole Word Masking. TextEmbeddingModel Added method ‘fill-mask’. Added new argument method ‘encode’, allowing choose encoding token ids token strings. 
Added new argument method ‘decode’, allowing choose decoding single tokens plain text. Fixed bug embedding texts using pytorch. fix decrease computational time enables gpu support (available machine). Fixed two missing columns saving results sustainability tracking machines without gpu. Implemented advantages datasets python library ‘datasets’ increasing computational speed allowing use large datasets. TextEmbeddingClassifiers Adding support pytorch without need kerasV3 keras-core. Classifiers pytorch now implemented native pytorch. Changed architecture new classifiers extended abilities neural nets adding possibility add positional embedding. Changed architecture new classifiers extended abilities neural nets adding alternative method self-attention mechanism via fourier transformation (similar FNet). Added balanced_accuracy new metric determining state model predicts classes best. Fixed error training history saved correctly. Added record metric test dataset training history pytorch. Added option balance class weights calculating training loss according Inverse Frequency method. Balance class weights activated default. Added method checking compatibility underlying TextEmbeddingModels classifier object class EmbeddedText. Added precision, recall, f1-score new metrics. Python Installation Added argument ‘install_py_modules’, allowing choose machine learning framework installed. Updated ‘check_aif_py_modules’. Changes Setting machine learning framework start session longer necessary. function setting global ml_framework remains active convenience. ml_framework can now switched time session. Updated documentation.","code":""},{"path":"/news/index.html","id":"aifeducation-030","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.0","title":"aifeducation 0.3.0","text":"CRAN release: 2023-10-10 Added DeBERTa Funnel-Transformer support. Fixed issues installing required python packages. Fixed issues training transformer models. Fixed issue calculating final iota values classifiers pseudo labeling active. Added support PyTorch Tensorflow transformer models. Added support PyTorch classifier objects via keras 3 future. Removed augmentation vocabulary training BERT models. Updated documentation. Changed reported values kappa.","code":""},{"path":"/news/index.html","id":"aifeducation-020","dir":"Changelog","previous_headings":"","what":"aifeducation 0.2.0","title":"aifeducation 0.2.0","text":"CRAN release: 2023-08-15 First release CRAN","code":""}]
+[{"path":[]},{"path":"/articles/aifeducation.html","id":"introduction","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Introduction","title":"01 Get started","text":"Several packages allow users use machine learning directly R nnet single layer neural nets, rpart decision trees, ranger random forests. Furthermore, mlr3verse series packages exists managing different algorithms unified interface. packages can used ‘normal’ computer provide easy installation. terms natural language processing, approaches currently limited. State---art approaches rely neural nets multiple layers consist huge number parameters making computationally demanding. specialized libraries keras, PyTorch tensorflow, graphical processing units (gpu) can help speed computations significantly. However, many specialized libraries machine learning written python. Fortunately, interface python provided via R package reticulate. 
R package Artificial Intelligence Education (aifeducation) aims provide educators, educational researchers, social researchers convincing interface state---art models natural language processing tries address special needs challenges educational social sciences. package currently supports application Artificial Intelligence (AI) tasks text embedding, classification, question answering. Since state---art approaches natural language processing rely large models compared classical statistical methods (e.g., latent class analysis, structural equation modeling) based largely python, additional installation steps necessary. like train develop models AIs, compatible graphic device necessary. Even low performing graphic device can speed computations significantly. prefer using pre-trained models however, necessary. case ‘normal’ office computer without graphic device sufficient cases.","code":""},{"path":"/articles/aifeducation.html","id":"step-1---install-the-r-package","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 1 - Install the R Package","title":"01 Get started","text":"order use package, first need install . can done : command, necessary R packages installed machine.","code":"install.packages(\"aifeducation\")"},{"path":"/articles/aifeducation.html","id":"step-2---install-python","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 2 - Install Python","title":"01 Get started","text":"Since natural language processing neural nets based models computationally intensive, keras, PyTorch, tensorflow used within package together specialized python libraries. install , need install python machine first. may take time. can check everything working using function reticulate::py_available(). return TRUE.","code":"reticulate::install_python() reticulate::py_available(initialize = TRUE)"},{"path":"/articles/aifeducation.html","id":"step-3---install-miniconda","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 3 - Install Miniconda","title":"01 Get started","text":"next step install miniconda since aifeducation uses conda environments managing different modules.","code":"reticulate::install_miniconda()"},{"path":"/articles/aifeducation.html","id":"step-4---install-support-for-graphic-devices","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 4 - Install Support for Graphic Devices","title":"01 Get started","text":"PyTorch tensorflow underlying machine learning backend run MacOS, Linux, Windows. However, limitations accelerate computations graphic cards. following table provides overview. Table: Possible gpu acceleration operating system suitable machine like use graphic card computations need install software. can skip step. list links downloads can found like use tensorflow machine learning framework: https://www.tensorflow.org/install/pip#linux like use PyTorch framework can find information : https://pytorch.org/get-started/locally/ general need NVIDIA GPU Drivers CUDA Toolkit cuDNN SDK Except gpu drivers components installed step 5 automatically. like use Windows WSL (Windows Subsystem Linux) installing gpu acceleration complex topic. 
case please refer specific Windows Ubuntu documentations.","code":""},{"path":"/articles/aifeducation.html","id":"step-5---install-specialized-python-libraries","dir":"Articles","previous_headings":"1) Installation and Technical Requirements","what":"Step 5 - Install Specialized Python Libraries","title":"01 Get started","text":"everything working, can now install remaining python libraries. convenience, aifeducation comes auxiliary function install_py_modules() . install=\"all\" can decide machine learning framework installed. Use install=\"all\" request installation ‘PyTorch’ ‘tensorflow’. like install ‘PyTorch’ ‘tensorflow’ set install=\"pytorch\" install=\"tensorflow\". aifeducation version tensorflow 2.13 2.15 necessary. important call function loading package first time. load library without installing necessary modules error may occur. function installs following python modules: frameworks: - transformers, - tokenizers, - datasets, - codecarbon Pytorch - torch, - torcheval, - safetensors, - accelerate - pandas Tensorflow - keras, - tensorflow dependencies environment “aifeducation”. like use aifeducation packages within environments, please ensure python modules available. gpu support packages installed. check_aif_py_modules() can check, modules successfully installed specific machine learning framework. Now everything ready use package. Important note: start new R session, please note call reticulate::use_condaenv(condaenv = \"aifeducation\") loading library make python modules available work.","code":"#For Linux aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=FALSE, tf_version=\"<=2.15\", pytorch_cuda_version = \"12.1\", cpu_only=FALSE) #For Windows and MacOS aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=FALSE, tf_version=\"<=2.15\", pytorch_cuda_version = \"12.1\", cpu_only=TRUE) aifeducation::check_aif_py_modules(print=TRUE, check=\"pytorch\") aifeducation::check_aif_py_modules(print=TRUE, check=\"tensorflow\")"},{"path":"/articles/aifeducation.html","id":"configuration-of-tensorflow","dir":"Articles","previous_headings":"","what":"2) Configuration of Tensorflow","title":"01 Get started","text":"general, educators educational researchers neither access high performance computing computers performing graphic device work. Thus, additional configuration can done get computations working machine. use computer graphic device, like use cpu can disable graphic device support tensorflow function set_config_cpu_only(). Now machine uses cpus computations. machine graphic card limited memory, recommended change configuration memory usage set_config_gpu_low_memory() enables machine compute ‘large’ models limited resources. ‘small’ models, option relevant since decreases computational speed. Finally, cases might want disable tensorflow print information console. can change behavior function set_config_tf_logger(). can choose five levels “FATAL”, “ERROR”, “WARN”, “INFO”, “DEBUG”, setting minimal level logging.","code":"aifeducation::set_config_cpu_only() aifeducation::set_config_gpu_low_memory() aifeducation::set_config_tf_logger()"},{"path":"/articles/aifeducation.html","id":"starting-a-new-session","dir":"Articles","previous_headings":"","what":"3 Starting a New Session","title":"01 Get started","text":"can work aifeducation must set new R session. First, necessary load library. Second, must set python via reticulate. 
case installed python suggested vignette may start new session like : Next choose machine learning framework like use. can set framework complete session can change framework anytime session calling method passing framework ml_framework argument function method. Please note models available frameworks weights trained models shared across frameworks models. case like use tensorflow now good time configure backend, since configurations can done tensorflow used first time. Note: Please remember: Every time start new session R set correct conda environment, load library aifeducation, choose machine learning framework.","code":"reticulate::use_condaenv(condaenv = \"aifeducation\") library(aifeducation) set_transformers_logger(\"ERROR\") #For tensorflow aifeducation_config$set_global_ml_backend(\"tensorflow\") #For PyTorch aifeducation_config$set_global_ml_backend(\"pytorch\") #if you would like to use only cpus set_config_cpu_only() #if you have a graphic device with low memory set_config_gpu_low_memory() #if you would like to reduce the tensorflow output to errors set_config_os_environ_logger(level = \"ERROR\")"},{"path":"/articles/aifeducation.html","id":"tutorials-and-guides","dir":"Articles","previous_headings":"","what":"4) Tutorials and Guides","title":"01 Get started","text":"guide use graphical user interface can found vignette 02a classification tasks. short introduction package examples classification tasks can found vignette 02b classification tasks. Documenting sharing work described vignette 03 sharing using trained AI/models","code":""},{"path":"/articles/aifeducation.html","id":"update-aifeducation","dir":"Articles","previous_headings":"","what":"5) Update aifeducation","title":"01 Get started","text":"case already use aifeducation want update newer version package recommended update used python libraries. easiest way remove conda environment “aifeducation” install libraries fresh environment. can done setting remove_first=TRUE install_py_modules.","code":"#For Linux aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.14\", pytorch_cuda_version = \"12.1\", cpu_only=FALSE) #For Windows with gpu support aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.10\", pytorch_cuda_version = \"12.1\", cpu_only=FALSE) #For Windows without gpu support aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.14\", pytorch_cuda_version = \"12.1\", cpu_only=TRUE) #For MacOS aifeducation::install_py_modules(envname=\"aifeducation\", install=\"all\", remove_first=TRUE, tf_version=\"<=2.14\", pytorch_cuda_version = \"12.1\", cpu_only=TRUE)"},{"path":"/articles/classification_tasks.html","id":"introduction-and-overview","dir":"Articles","previous_headings":"","what":"1 Introduction and Overview","title":"02b Text Embedding and Classification Tasks","text":"educational social sciences, assignment observation scientific concepts important task allows researchers understand observation, generate new insights, derive recommendations research practice. educational science, several areas deal kind task. example, diagnosing students’ characteristics important aspect teachers’ profession necessary understand promote learning. Another example use learning analytics, data students used provide learning environments adapted individual needs. 
another level, educational institutions schools universities can use information data-driven performance decisions (Larusson & White 2014) well improve . case, real-world observation aligned scientific models use scientific knowledge technology improved learning instruction. Supervised machine learning one concept allows link real-world observations existing scientific models theories (Berding et al. 2022). educational sciences great advantage allows researchers use existing knowledge insights applications AI. drawback approach training AI requires information real world observations information corresponding alignment scientific models theories. valuable source data educational science written texts, since textual data can found almost everywhere realm learning teaching (Berding et al. 2022). example, teachers often require students solve task provide written form. Students create solution tasks often document short-written essay presentation. data can used analyze learning teaching. Teachers’ written tasks students may provide insights quality instruction students’ solutions may provide insights learning outcomes prerequisites. AI can helpful assistant analyzing textual data since analysis textual data challenging time-consuming task humans. vignette, like show create AI can help tasks using package aifeducation. Please note introduction content analysis, natural language processing machine learning beyond scope vignette. like learn , please refer cited literature. start necessary introduce definition understanding basic concepts since applying AI educational contexts means combine knowledge different scientific disciplines using different, sometimes overlapping concepts. Even within research area, concepts unified. Figure 1 illustrates package’s understanding. Since aifeducation looks application AI classification tasks perspective empirical method content analysis, overlapping concepts content analysis machine learning. content analysis, phenomenon like performance colors can described scale/dimension made several categories (e.g. Schreier 2012 pp. 59). example, exam’s performance (scale/dimension) “good”, “average” “poor”. terms colors (scale/dimension) categories “blue”, “green”, etc. Machine learning literature uses words describe kind data. machine learning, “scale” “dimension” correspond term “label” “categories” refer term “classes” (Chollet, Kalinowski & Allaire 2022, p. 114). clarifications, classification means text assigned correct category scale text labeled correct class. Figure 2 illustrates, two kinds data necessary train AI classify text line supervised machine learning principles. providing AI textual data input data corresponding information class target data, AI can learn texts imply specific class category. exam example, AI can learn texts imply “good”, “average” “poor” judgment. training, AI can applied new texts predict likely class every new text. generated class can used statistical analysis derive recommendations learning teaching. achieve support artificial intelligence, several steps necessary. Figure 3 provides overview integrating functions objects aifeducation. first step transform raw texts form computers can use. , raw texts must transformed numbers. modern approaches, usually done word embeddings. Campesato (2021, p. 
102) describes “collective name set language modeling feature learning techniques (…) words phrases vocabulary mapped vectors real numbers.” definition word vector similar: „Word vectors represent semantic meaning words vectors context training corpus.” (Lane, Howard & Hapke 2019, p. 191) Campesato (2021, pp. 112) clusters approaches creating word embeddings three groups, reflecting ability provide context-sensitive numerical representations. Approaches group one account context. Typical methods rely bag--words assumptions. Thus, normally able provide word embedding single words. Group two consists approaches word2vec, GloVe (Pennington, Socher & Manning 2014) fastText, able provide one embedding word regardless context. Thus, account one context. last group consists approaches BERT (Devlin et al. 2019), able produce multiple word embeddings depending context words. different groups, aifeducation implements several methods. Topic Modeling: Topic modeling approach uses frequencies tokens within text. frequencies tokens models observable variables one latent topic (Campesato 2021, p. 113). estimation topic model often based Latent Dirichlet Analysis (LDA) describes text distribution topics. topics described distribution words/tokens (Campesato 2021, p. 114). relationship texts, words, topics can used create text embedding computing relative amount every topic text based every token text. GlobalVectorClusters: GlobalVectors newer approach utilizes co-occurrence words/tokens compute GlobalVectors (Campesato 2021, p. 110). vectors generated way tokens/words similar meaning located close (Pennington, Socher & Manning 2014). order create text embedding word embeddings, aifeducation groups tokens clusters based vectors. Thus, tokens similar meaning members cluster. text embedding, tokens text counted every cluster frequencies every cluster text used numerical representation text. Transformers: Transformers current state---art approach many natural language tasks (Tunstall, von Werra & Wolf 2022, p. xv). help self-attention mechanism (Vaswani et al. 2017), able produce context-sensitive word embeddings (Chollet, Kalinowski & Allaire, 2022, pp. 366). approaches managed used unified interface provided object TextEmbeddingModel. object can easily convert raw texts numerical representation, can use different classification tasks time. makes possible reduce computational time. created text embedding stored object class EmbeddedText. object additionally contains information text embedding model created object. best case can apply existing text embedding model using transformer Huggingface using model colleagues. , aifeducation provides several functions allowing create models. Depending approach like use, different steps necessary. case Topic Modeling GlobalVectorClusters, must first create draft vocabulary two functions bow_pp_create_vocab_draft() bow_pp_create_basic_text_rep(). calling functions, determine central properties resulting model. case transformers, first configure train vocabulary create_xxx_model() next step can train model train_tune_xxx_model(). Every step explained next chapters. Please note xxx stands different architectures transformers supported aifeducation. object class TextEmbeddingModel can create input data supervised machine learning process. Additionally, need target data must named factor containing classes/categories text. kinds data, able create new object class TextEmbeddingClassifierNeuralNet classifier. train classifier several options cover detail chapter 3. 
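Since the classifier expects embedded texts as input data and a named factor as target data, a minimal sketch of preparing the target data may help; it assumes the example_data data frame introduced in section 3.1 below.

# Target data: a factor of the classes, named with the texts' ids.
target_data <- factor(example_data$label)
names(target_data) <- example_data$id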
training classifier can share researchers apply new texts. Please note application new texts requires text transformed numbers exactly text embedding model passing text classifier. Please note: pass raw texts classifier, embedded texts work! next chapters, guide complete process, starting creation text embedding models. Please note creation new text embedding model necessary rely existing model rely pre-trained transformer.","code":""},{"path":[]},{"path":"/articles/classification_tasks.html","id":"starting-a-new-session","dir":"Articles","previous_headings":"","what":"2.1 Starting a New Session","title":"02b Text Embedding and Classification Tasks","text":"can work aifeducation must set new R session. First, necessary load library. Second, must set python via reticulate. case installed python suggested vignette 01 Get started may start new session like : Next choose machine learning framework like use. can set framework complete session Setting global machine learning framework convenience. can change framework time session calling method setting argument ‘ml_framework’ methods functions manually. case like use tensorflow now good time configure backend, since configurations can done tensorflow used first time. Note: Please remember: Every time start new session R set correct conda environment, load library aifeducation, choose machine learning framework.","code":"reticulate::use_condaenv(condaenv = \"aifeducation\") library(aifeducation) #For tensorflow aifeducation_config$set_global_ml_backend(\"tensorflow\") set_transformers_logger(\"ERROR\") #For PyTorch aifeducation_config$set_global_ml_backend(\"pytorch\") set_transformers_logger(\"ERROR\") #if you would like to use only cpus set_config_cpu_only() #if you have a graphic device with low memory set_config_gpu_low_memory() #if you would like to reduce the tensorflow output to errors set_config_os_environ_logger(level = \"ERROR\")"},{"path":"/articles/classification_tasks.html","id":"reading-texts-into-r","dir":"Articles","previous_headings":"","what":"2.2 Reading Texts into R","title":"02b Text Embedding and Classification Tasks","text":"applications aifeducation ’s necessary read text like use R. task, several packages available CRAN. experience good package readtext since allows process different kind sources textual data. Please refer readtext’s documentation details. installed package machine, can request example, stored texts excel sheet two columns (texts texts id texts’ id) can read data crucial pass file path file name column texts text_field name column id docid_field. cases may stored text separate file (e.g., .txt .pdf). cases can pass directory files read data. following example files stored directory “data”. read texts several files need specify arguments docid_field text_field. id texts automatically set file names. text read recommend text cleaning. Please refer documentation function readtext within readtext library information. 
Now everything is ready to start the preparation tasks.","code":"install.packages(\"readtext\") #for excel files textual_data<-readtext::readtext( file=\"text_data.xlsx\", text_field = \"texts\", docid_field = \"id\" ) #read all files with the extension .txt in the directory data textual_data<-readtext::readtext( file=\"data/*.txt\" ) #read all files with the extension .pdf in the directory data textual_data<-readtext::readtext( file=\"data/*.pdf\" ) #remove multiple spaces and new lines textual_data$text=stringr::str_replace_all(textual_data$text,pattern = \"[:space:]{1,}\",replacement = \" \") #remove hyphenation textual_data$text=stringr::str_replace_all(textual_data$text,pattern = \"-(?=[:space:])\",replacement = \"\")"},{"path":[]},{"path":"/articles/classification_tasks.html","id":"example-data-for-this-vignette","dir":"Articles","previous_headings":"3 Preparation Tasks","what":"3.1 Example Data for this Vignette","title":"02b Text Embedding and Classification Tasks","text":"To illustrate the steps in this vignette, we cannot use data from educational settings since this data is generally protected by privacy policies. Therefore, we use the data set data_corpus_moviereviews from the package quanteda.textmodels to illustrate the usage of this package. quanteda.textmodels is automatically installed when you install aifeducation. We now have a data set with three columns. The first contains the ID of the movie review, the second contains the rating of the movie (positive or negative), and the third column contains the raw texts. As you can see, the data is balanced: 1,000 reviews imply a positive rating of a movie and 1,000 imply a negative rating. For this tutorial, we modify this data set by setting half of the negative and positive reviews to NA, indicating that these reviews are not labeled. Furthermore, we bring some imbalance into the data by setting 250 of the positive reviews to NA. We now use this data to show you how to use the different objects and functions of aifeducation.","code":"example_data<-data.frame( id=quanteda::docvars(quanteda.textmodels::data_corpus_moviereviews)$id2, label=quanteda::docvars(quanteda.textmodels::data_corpus_moviereviews)$sentiment) example_data$text<-as.character(quanteda.textmodels::data_corpus_moviereviews) table(example_data$label) #> #> neg pos #> 1000 1000 example_data$label[c(1:500,1001:1500)]=NA summary(example_data$label) #> neg pos NA's #> 500 500 1000 example_data$label[1501:1750]=NA summary(example_data$label) #> neg pos NA's #> 500 250 1250"},{"path":"/articles/classification_tasks.html","id":"topic-modeling-and-globalvectorclusters","dir":"Articles","previous_headings":"3 Preparation Tasks","what":"3.2 Topic Modeling and GlobalVectorClusters","title":"02b Text Embedding and Classification Tasks","text":"If you would like to create a new text embedding model with Topic Modeling or GlobalVectorClusters, you must first create a draft of a vocabulary. You can do this by calling the function bow_pp_create_vocab_draft(). The main input of this function is a vector of texts. The function's aims are to create a list of the tokens of the texts, to reduce the tokens to those that carry semantic meaning, and to provide a lemma for every token. Since Topic Modeling depends on a bag-of-words approach, the reason for this pre-processing step is to reduce the tokens to those that really carry semantic meaning. In general, these are words which are either nouns, verbs or adjectives (Papilloud & Hinneburg 2018, p. 32). For our example data, the application of the function is: As you can see, there is an additional parameter: path_language_model. Here you must insert the path to an udpipe pre-trained language model, since the function uses the udpipe package for part-of-speech tagging and lemmatization. A collection of pre-trained models for 65 languages can be found at [https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3131]. Just download the relevant model to your machine and provide the path to the model. With the parameter upos you can select which kinds of tokens should be selected. In this example, only tokens which represent a noun, an adjective or a verb remain in the analysis.
A list of all possible tags can be found at [https://universaldependencies.org/u/pos/index.html]. Please do not forget to provide a label for the udpipe model you use and please also provide the language you are analyzing. This information is important since it is transferred to the text embedding model. Other researchers/users need this information to decide if this model can help them with their work. In the next step, you can use the draft vocabulary to create a basic text representation with the function bow_pp_create_basic_text_rep(). The function takes the raw texts and the draft vocabulary as its main input. The function's aims are to remove tokens referring to stopwords, to clean the data (e.g., removing punctuation and numbers), to lower case all tokens if requested, to remove tokens below a specific minimal frequency, to remove tokens that occur in too many documents, to create a document-feature-matrix (dfm), and to create a feature-co-occurrence-matrix (fcm). Applied to our example, the call of the function looks like this: data takes the raw texts while vocab_draft takes the draft vocabulary we created in the first step. The main goal is to create a document-feature-matrix (dfm) and a feature-co-occurrence-matrix (fcm). The dfm is a matrix that reports the texts in the rows and the number of tokens in the columns. This matrix is later used to create a text embedding model based on topic modeling. The dfm is reduced to tokens that correspond to the part-of-speech tags of the vocabulary draft. Punctuation, symbols, numbers etc. are removed from this matrix if you set the corresponding parameters to TRUE. If you set use_lemmata = TRUE you can reduce the dimensionality of the matrix by using the lemmas instead of the tokens (Papilloud & Hinneburg 2018, p. 33). If you set to_lower = TRUE all tokens are transformed to lower case. At the end you get a matrix that tries to represent the semantic meaning of the texts with the smallest possible number of tokens. The same applies to the fcm. Here, the tokens/features are reduced in the same way. However, before the features are reduced, every token's co-occurrence is calculated. For this aim a window is shifted across the text, counting the tokens to the left and to the right of the token under investigation. The size of this window can be determined with window. With weights you can provide weights for the counting. For example, tokens that are far away from the token under investigation may count less than tokens that are closer to it. The fcm is later used to create a text embedding model based on GlobalVectorClusters. As you may notice, the dfm only counts the words of a text. Thus, the position of the words within the text or within a sentence does not matter. If you lower-case the tokens and use the lemmas, syntactic information is lost. The advantage of a dfm is its lower dimensionality while losing only little semantic meaning. In contrast, the fcm is a matrix that describes how often different tokens occur together. Thus, an fcm recovers a part of the position of words in a sentence and a text. Now, everything is ready to create a new text embedding model based on Topic Modeling or GlobalVectorClusters. Before we show you how to create such a model, we will have a look at the preparation of a new transformer.","code":"vocab_draft<-bow_pp_create_vocab_draft( path_language_model=\"language_model/english-gum-ud-2.5-191206.udpipe\", data=example_data$text, upos=c(\"NOUN\", \"ADJ\",\"VERB\"), label_language_model=\"english-gum-ud-2.5-191206\", language=\"english\", trace=TRUE) basic_text_rep<-bow_pp_create_basic_text_rep( data = example_data$text, vocab_draft = vocab_draft, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, remove_separators = TRUE, split_hyphens = FALSE, split_tags = FALSE, language_stopwords=\"en\", use_lemmata = FALSE, to_lower=FALSE, min_termfreq = NULL, min_docfreq= NULL, max_docfreq=NULL, window = 5, weights = 1 / (1:5), trace=TRUE)"},{"path":"/articles/classification_tasks.html","id":"creating-a-new-transformer","dir":"Articles","previous_headings":"3 Preparation Tasks","what":"3.3 Creating a New Transformer","title":"02b Text Embedding and Classification Tasks","text":"In general, it is recommended to use a pre-trained model since the creation of a new transformer requires a large data set of texts and is computationally intensive. In this vignette we illustrate the process with a BERT model. However, for many other transformers the process is the same.
The creation of a new transformer requires at least two steps. First, you must decide on the architecture of your transformer. This includes the creation of a corresponding vocabulary. In aifeducation you can do this by calling the function create_bert_model(). In our example this could look like this: First, the function receives the machine learning framework you chose at the start of the session. However, you can change this by setting ml_framework=\"tensorflow\" or ml_framework=\"pytorch\". For the function to work, you must provide a path to a directory where your new transformer should be saved. Furthermore, you must provide raw texts. These texts are not used for training the transformer but for training the vocabulary. The maximum size of the vocabulary is determined by vocab_size. Please do not provide a size above 50,000 to 60,000 since this kind of vocabulary works differently from the approaches described in section 2.2. Modern tokenizers such as WordPiece (Wu et al. 2016) use algorithms that split tokens into smaller elements, allowing them to build a huge number of words from a small number of elements. Thus, even with a small number of about 30,000 tokens, they are able to represent a very large number of words. As a consequence, these kinds of vocabularies are many times smaller than the vocabularies built in section 2.2. The other parameters allow you to customize your BERT model. For example, you could increase the number of hidden layers from 12 to 24 or reduce the hidden size from 768 to 256, allowing you to build and test larger or smaller transformers. Please note that with max_position_embeddings you determine how many tokens your transformer can process. If a text has more tokens after tokenization, these tokens are ignored. However, if you would like to analyze long documents, please avoid increasing this number too significantly because the computational time does not increase in a linear way but quadratically (Beltagy, Peters & Cohan 2020). For long documents you can use another architecture of BERT (e.g. Longformer, Beltagy, Peters & Cohan 2020) or you can split a long document into several chunks which are used sequentially for classification (e.g., Pappagari et al. 2019). Using chunks is supported by aifeducation. Since creating a transformer model is energy consuming, aifeducation allows you to estimate its ecological impact with the help of the python library codecarbon. Thus, sustain_track is set to TRUE by default. If you use the sustainability tracker you must provide the alpha-3 code for the country where your computer is located (e.g., \"CAN\"=\"Canada\", \"DEU\"=\"Germany\"). A list of these codes can be found on wikipedia. The reason is that different countries use different sources and techniques for generating energy, resulting in a specific impact on CO2 emissions. For the USA and Canada you can additionally specify a region by setting sustain_region. Please refer to the documentation of codecarbon for more information. After calling the function, you will find your new model in the model directory. The next step is to train your model by calling train_tune_bert_model(). Here it is important that you provide the path to the directory where your new transformer is stored. Furthermore, it is important that you provide another directory where the trained transformer should be saved to avoid reading and writing collisions. Now, the provided raw data is used to train your model with Masked Language Modeling. First, you can set the length of the token sequences with chunk_size. With whole_word you can choose between masking single tokens or masking complete words (please remember that modern tokenizers split words into several tokens, so tokens and words do not forcibly match directly). With p_mask you can determine how many tokens should be masked. Finally, with val_size, you set how many chunks should be used for the validation sample. Please remember to set the correct alpha-3 code for tracking the ecological impact of training your model (sustain_iso_code). If you work on a machine with a graphic device with small memory, please reduce the batch size significantly. We also recommend changing the usage of memory with set_config_gpu_low_memory() at the beginning of the session. After the training finishes, you can find the transformer ready to use in your output_directory. Now you are able to create a text embedding model. Again, you can change the machine learning framework by setting ml_framework=\"tensorflow\" or ml_framework=\"pytorch\".
If you do not change this argument, the framework you chose at the beginning of the session is used.","code":"create_bert_model( ml_framework=aifeducation_config$get_framework(), model_dir = \"my_own_transformer\", vocab_raw_texts=example_data$text, vocab_size=30522, vocab_do_lower_case=FALSE, max_position_embeddings=512, hidden_size=768, num_hidden_layer=12, num_attention_heads=12, intermediate_size=3072, hidden_act=\"gelu\", hidden_dropout_prob=0.1, sustain_track=TRUE, sustain_iso_code=\"DEU\", sustain_region=NULL, sustain_interval=15, trace=TRUE) train_tune_bert_model( ml_framework=aifeducation_config$get_framework(), output_dir = \"my_own_transformer_trained\", model_dir_path = \"my_own_transformer\", raw_texts = example_data$text, p_mask=0.15, whole_word=TRUE, val_size=0.1, n_epoch=1, batch_size=12, chunk_size=250, n_workers=1, multi_process=FALSE, sustain_track=TRUE, sustain_iso_code=\"DEU\", sustain_region=NULL, sustain_interval=15, trace=TRUE)"},{"path":[]},{"path":"/articles/classification_tasks.html","id":"introduction","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.1 Introduction","title":"02b Text Embedding and Classification Tasks","text":"In aifeducation, a text embedding model is stored as an object of the class TextEmbeddingModel. This object contains all relevant information for transforming raw texts into a numeric representation that can be used for machine learning. In aifeducation, the transformation of raw texts into numbers is a separate step from downstream tasks such as classification. This is to reduce computational time on machines with low performance. With the separation of the text embedding from the other tasks, the text embedding has to be calculated only once and can then be used for different tasks at the same time. Another advantage is that the training of the downstream tasks then involves only the parameters of the downstream tasks and not the parameters of the embedding model, making the training less time-consuming and thus decreasing computational intensity. Finally, this approach allows the analysis of long documents by applying the algorithm to different parts of the documents. The text embedding model provides a unified interface: after creating a model with different methods, handling the model is always the same. In the following we show you how to use these objects. We start with Topic Modeling.","code":""},{"path":[]},{"path":"/articles/classification_tasks.html","id":"topic-modeling","dir":"Articles","previous_headings":"4 Text Embedding > 4.2 Creating Text Embedding Models","what":"4.2.1 Topic Modeling","title":"02b Text Embedding and Classification Tasks","text":"For creating a new text embedding model based on Topic Modeling, we need the basic text representation generated with the function bow_pp_create_basic_text_rep() (see section 2.2). Now we can create a new instance of a text embedding model by calling TextEmbeddingModel$new(). First you have to provide a name for your new model (model_name). This should be a unique short name without spaces. With model_label you can provide a label for your model with more freedom. It is important that you provide a version for your model in case you want to create an improved version in the future. With model_language you provide users with information on the language the model is designed for. This is important if you plan to share your model with a wider community. method determines the approach used for the model. Since we would like to use Topic Modeling, we set method = \"lda\". The number of topics is set via bow_n_dim. In our example we would like to create a topic model with twelve topics. The number of topics also determines the dimensionality of our text embedding. Consequently, every text will be characterized by twelve topics. Please do not forget to pass the basic text representation to bow_basic_text_rep.
After the model is estimated, it is stored as topic_modeling in our example.","code":"topic_modeling<-TextEmbeddingModel$new( model_name=\"topic_model_embedding\", model_label=\"Text Embedding via Topic Modeling\", model_version=\"0.0.1\", model_language=\"english\", method=\"lda\", bow_basic_text_rep=basic_text_rep, bow_n_dim=12, bow_max_iter=500, bow_cr_criterion=1e-8, trace=TRUE )"},{"path":"/articles/classification_tasks.html","id":"globalvectorclusters","dir":"Articles","previous_headings":"4 Text Embedding > 4.2 Creating Text Embedding Models","what":"4.2.2 GlobalVectorClusters","title":"02b Text Embedding and Classification Tasks","text":"The creation of a text embedding model based on GlobalVectorClusters is very similar to a model based on Topic Modeling. There are only two differences. First, you request a model based on GlobalVectorClusters by setting method=\"glove_cluster\". Second, you determine the dimensionality of the global vectors with bow_n_dim and the number of clusters with bow_n_cluster. When creating the new text embedding model, a global vector is calculated for every token based on the feature-co-occurrence-matrix (fcm) you provide with basic_text_rep. For every token, a vector with the length of bow_n_dim is calculated. Since these vectors are word embeddings and not text embeddings, an additional step is necessary to create the text embeddings. In aifeducation the word embeddings are used to group the words into clusters. The number of clusters is set with bow_n_cluster. The text embedding is then produced by counting the tokens of every cluster for every text. The final model is stored as global_vector_clusters_modeling.","code":"global_vector_clusters_modeling<-TextEmbeddingModel$new( model_name=\"global_vector_clusters_embedding\", model_label=\"Text Embedding via Clusters of GlobalVectors\", model_version=\"0.0.1\", model_language=\"english\", method=\"glove_cluster\", bow_basic_text_rep=basic_text_rep, bow_n_dim=96, bow_n_cluster=384, bow_max_iter=500, bow_max_iter_cluster=500, bow_cr_criterion=1e-8, trace=TRUE )"},{"path":"/articles/classification_tasks.html","id":"transformers","dir":"Articles","previous_headings":"4 Text Embedding > 4.2 Creating Text Embedding Models","what":"4.2.3 Transformers","title":"02b Text Embedding and Classification Tasks","text":"Using a transformer for creating a text embedding model is similar to the other two approaches. To request a model based on a transformer you must set method accordingly. Since we use a BERT model in our example, we set method = \"bert\". Next, we provide the directory where our model is stored, in our example bert_model_dir_path=\"my_own_transformer_trained\". Of course you can use any pre-trained model from Huggingface that addresses your needs. Using a BERT model for text embedding raises a problem: most texts provide more tokens than the transformer can process. The maximal value is set in the configuration of the transformer (see section 2.3). If a text produces more tokens, the last tokens are ignored. In some instances you might want to analyze long texts. In these situations, reducing the text to only the first tokens (e.g. only the first 512 tokens) could result in a problematic loss of information. To deal with these situations you can configure a text embedding model in aifeducation to split long texts into several chunks which are processed by the transformer. The maximal number of chunks is set with chunks. In the example below, the text embedding model would split a text consisting of 1024 tokens into two chunks with every chunk consisting of 512 tokens. For every chunk a text embedding is calculated. As a result, you receive a sequence of embeddings. The first embedding characterizes the first part of the text, the second embedding characterizes the second part of the text, and so on. Thus, our example text embedding model is able to process texts with up to 4*512=2048 tokens. This approach is inspired by the work of Pappagari et al. (2019). Since transformers are able to account for the context, it may be useful to interconnect every chunk to bring context into the calculations. This can be done with overlap, which determines how many tokens from the end of the prior chunk should be added to the beginning of the next chunk. In our example the last 30 tokens of the prior chunk are added to the beginning of the following chunk. This can help to provide the correct context of the text sections during the analysis. Altogether, our example model can analyze a maximum of 512+(4-1)*(512-30)=1958 tokens of a text, as the sketch below illustrates.
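As a quick sanity check of this chunking logic, the maximum can also be computed directly (this is an illustrative helper in plain R, not a function of aifeducation):
# Illustration only: the first chunk contributes max_length tokens and each
# further chunk adds (max_length - overlap) new tokens.
max_tokens <- function(max_length, chunks, overlap) {
  max_length + (chunks - 1) * (max_length - overlap)
}
max_tokens(max_length = 512, chunks = 4, overlap = 30)
#> [1] 1958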
Finally, you have to decide from which hidden layer or layers the embeddings should be drawn. With emb_layer_min and emb_layer_max you can decide over which layers the average value for every token should be calculated. Please note that the calculation considers all layers between emb_layer_min and emb_layer_max. In their initial work, Devlin et al. (2019) used the hidden states of different layers for classification. With emb_pool_type you decide which tokens are used for pooling within every layer. In the case of emb_pool_type=\"cls\" only the cls token is used. In the case of emb_pool_type=\"average\" all tokens within a layer are averaged, except padding tokens. After deciding on the configuration, you can use the model. Note: Since version 0.3.1 of aifeducation every transformer can be used with both machine learning frameworks. Even the pre-trained weights can be used across the backends. However, in the future some models may be implemented that are only available for a specific framework.","code":"bert_modeling<-TextEmbeddingModel$new( ml_framework=aifeducation_config$get_framework(), model_name=\"bert_embedding\", model_label=\"Text Embedding via BERT\", model_version=\"0.0.1\", model_language=\"english\", method = \"bert\", max_length = 512, chunks=4, overlap=30, emb_layer_min=\"middle\", emb_layer_max=\"2_3_layer\", emb_pool_type=\"average\", model_dir=\"my_own_transformer_trained\" )"},{"path":"/articles/classification_tasks.html","id":"transforming-raw-texts-into-embedded-texts","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.3 Transforming Raw Texts into Embedded Texts","title":"02b Text Embedding and Classification Tasks","text":"Although the mechanics within the text embedding models are different, their usage is always the same. To transform raw texts into a numeric representation you only have to use the embed method of the model. For this, you must provide the raw texts to raw_text. In addition, it is necessary to provide a character vector containing the ID of every text. These IDs must be unique. The method embed creates an object of the class EmbeddedText. This is just a data.frame consisting of the embedding of every text. Depending on the method, the data.frame has a different meaning: Topic Modeling: Regarding topic modeling, the rows represent the texts and the columns represent the percentage of every topic within a text. GlobalVectorClusters: Here, the rows represent the texts and the columns represent the absolute frequencies of tokens belonging to a semantic cluster. Transformer - BERT: For BERT, the rows represent the texts and the columns represent the contextualized text embedding of BERT's understanding of the relevant text chunk. Please note that in the case of transformer models, the embeddings of the single chunks are interlinked. The embedded texts are now the input to train a new classifier or the input for a pre-trained classifier predicting categories/classes. In the next chapter we will show you how to use these classifiers. Before we start, we show you how to save and load your model.","code":"topic_embeddings<-topic_modeling$embed( raw_text=example_data$text, doc_id=example_data$id, trace = TRUE) cluster_embeddings<-global_vector_clusters_modeling$embed( raw_text=example_data$text, doc_id=example_data$id, trace = TRUE) bert_embeddings<-bert_modeling$embed( raw_text=example_data$text, doc_id=example_data$id, trace = TRUE)"},{"path":"/articles/classification_tasks.html","id":"saving-and-loading-text-embedding-models","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.4 Saving and Loading Text Embedding Models","title":"02b Text Embedding and Classification Tasks","text":"Saving a created text embedding model is very easy in aifeducation with the function save_ai_model. This function provides a unique interface for all text embedding models. To save your work you pass your model to model and the directory where to save the model to model_dir. Please pass the path of a directory and not the path of a file to the function. Internally, the function creates a new folder in that directory where all files belonging to your model are stored.
As you can see, the three text embedding models are saved within a directory named "text_embedding_models". Within this directory the function creates a unique folder for every model. The name of this folder is specified with dir_name. If you set dir_name=NULL and append_ID=FALSE the name of the folder is created from the model's name. If you change the argument append_ID to append_ID=TRUE and set dir_name=NULL the unique ID of the model is added to the directory. The ID is added automatically to ensure that every model has a unique name. This is important if you would like to share your work with other persons. Since the files are stored with a special structure, please do not change the files manually. If you want to load a model, just call the function load_ai_model and you can continue using your model. With ml_framework you can decide with which framework the model should be used. If you set ml_framework=\"auto\" the model is initialized with the same framework that was used when saving the model. Please note that at the moment all implemented text embedding models can be used with both frameworks. However, this may change in the future. Please note that you have to add the name of the model to the directory path. In our example we stored three models in the directory "text_embedding_models". Every model is saved within its own folder. The folder's name was created automatically with the help of the name of the model. Thus, for loading a model you must specify which model you want to load by adding the model's name to the directory path, as shown below. Now you can use your text embedding model.","code":"save_ai_model( model=topic_modeling, model_dir=\"text_embedding_models\", dir_name=\"model_topic_modeling\", save_format=\"default\", append_ID=FALSE) save_ai_model( model=global_vector_clusters_modeling, model_dir=\"text_embedding_models\", dir_name=\"model_global_vectors\", save_format=\"default\", append_ID=FALSE) save_ai_model( model=bert_modeling, model_dir=\"text_embedding_models\", dir_name=\"model_transformer_bert\", save_format=\"default\", append_ID=FALSE) topic_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_topic_modeling\", ml_framework=aifeducation_config$get_framework()) global_vector_clusters_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_global_vectors\", ml_framework=aifeducation_config$get_framework()) bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_transformer_bert\", ml_framework=aifeducation_config$get_framework())"},{"path":"/articles/classification_tasks.html","id":"sustainability","dir":"Articles","previous_headings":"4 Text Embedding","what":"4.5 Sustainability","title":"02b Text Embedding and Classification Tasks","text":"In the case that the underlying model was trained with an active sustainability tracker (section 3.3) you can receive a table showing the energy consumption, CO2 emissions, and the hardware used during training by calling bert_modeling$get_sustainability_data().","code":""},{"path":[]},{"path":"/articles/classification_tasks.html","id":"creating-a-new-classifier","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.1 Creating a New Classifier","title":"02b Text Embedding and Classification Tasks","text":"In aifeducation, classifiers based on neural nets are stored in objects of the class TextEmbeddingClassifierNeuralNet. You can create a new classifier by calling TextEmbeddingClassifierNeuralNet$new(). Similar to the text embedding model you should provide a name (name) and a label (label) for your new classifier. With text_embeddings you provide an embedded text. We recommend that you use the same embedding you would like to use in training. In our example we continue with the embedding produced by our BERT model. targets takes the target data for supervised learning. Please do not omit cases without a category/class since they can be used with a special training technique we show later. It is important that you provide the target data as a factor; otherwise an error will occur. It is also important that you name the factor. That is, the entries of the factor must have names that correspond to the IDs of the corresponding texts. Without these names the method cannot match the input data (text embeddings) to the target data. The other parameters decide on the structure of the classifier. Figure 4 illustrates this.
hidden takes a vector of integers, determining the number of layers and the number of neurons. In our example, there are no dense layers. rec also takes a vector of integers determining the number and size of the Gated Recurrent Unit (gru) layers. In our example, we use one layer with 256 neurons. Since the classifiers in aifeducation use a standardized scheme for their creation, the dense layers are used after the gru layers. If you want to omit gru layers or dense layers, set the corresponding argument to NULL. If you use a text embedding model that processes only one chunk, we recommend that you do not use recurrent layers since they are only able to use the sequential structure of the data. In these cases you can rely on dense layers alone. If you use text embeddings with more than one chunk, it is a good idea to try self-attention layers in order to take the context of all chunks into account. To add self-attention you have two choices: - You can use the attention mechanism used in classic transformer models, multihead attention (Vaswani et al. 2017). For this variant set attention_type=\"multihead\", repeat_encoder to a value of at least 1, and self_attention_heads to a value of at least 1. - Furthermore you can use the attention mechanism described by Lee-Thorp et al. (2021) in the FNet model, which allows much faster computations at low accuracy costs. To use this kind of attention set attention_type=\"fourier\" and repeat_encoder to a value of at least 1. With repeat_encoder you can choose how many times an encoder layer should be added. The encoder is implemented as described by Chollet, Kalinowski, and Allaire (2022, pp. 373) for both variants of attention. You can further extend the abilities of your network by adding positional embeddings. Positional embeddings take care of the order of the chunks. Thus, adding such a layer may increase performance if the order of the information is important. You can add this layer by setting add_pos_embedding=TRUE. The layer is created as described by Chollet, Kalinowski, and Allaire (2022, pp. 378). Masking, normalization, and the creation of the input layer as well as the output layer are done automatically. After you have created a new classifier, you can begin training. Note: In contrast to the text embedding models the decision on the machine learning framework is important since a classifier can only be used with the framework it was created and trained with.","code":"example_targets<-as.factor(example_data$label) names(example_targets)=example_data$id classifier<-TextEmbeddingClassifierNeuralNet$new( ml_framework=aifeducation_config$get_framework(), name=\"movie_review_classifier\", label=\"Classifier for Estimating a Positive or Negative Rating of Movie Reviews\", text_embeddings=bert_embeddings, targets=example_targets, hidden=NULL, rec=c(256), self_attention_heads=2, intermediate_size=512, attention_type=\"fourier\", add_pos_embedding=TRUE, rec_dropout=0.1, repeat_encoder=1, dense_dropout=0.4, recurrent_dropout=0.4, encoder_dropout=0.1, optimizer=\"adam\")"},{"path":"/articles/classification_tasks.html","id":"training-a-classifier","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.2 Training a Classifier","title":"02b Text Embedding and Classification Tasks","text":"To start the training of your classifier, call the train method. Similarly to the creation of the classifier, you must provide the text embeddings to data_embeddings and the categories/classes as target data to data_targets. Please remember that data_targets expects a named factor where the names correspond to the IDs of the corresponding text embeddings. Text embeddings and target data that cannot be matched are omitted from training. To train a classifier, it is also necessary to provide a path to dir_checkpoint. This directory stores the best set of weights during each training epoch. After training, these weights are automatically used as the final weights of the classifier. For performance estimation, training splits the data into several chunks based on cross-fold validation. The number of folds is set with data_n_test_samples. In every case, one fold is not used for training and serves as a test sample. The performance values saved with the trained classifier refer to this test sample. Since this data is never used during training, it provides a realistic estimation of the classifier's performance.
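To illustrate this splitting (plain R for illustration only, not aifeducation code), with data_n_test_samples = 5 and the 750 labeled reviews of our example, the folds could be formed like this:
# Illustration only: divide the 750 labeled cases into five folds.
# In every run, one fold is held back as the test sample.
n_labeled <- 750  # 500 negative + 250 positive labeled reviews
folds <- split(sample(seq_len(n_labeled)), rep(1:5, length.out = n_labeled))
lengths(folds)
#>   1   2   3   4   5
#> 150 150 150 150 150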
Since aifeducation tries to address the special needs of the educational and social sciences, some special training steps are integrated into this method. Baseline: If you are only interested in training a classifier without applying additional statistical techniques, set use_baseline = TRUE. In this case, the classifier is trained with the provided data as it is. Cases with missing values in the target data are omitted. Even if you would like to apply statistical adjustments, it makes sense to compute a baseline model for comparing the effects of the modified training process with an unmodified training. With bsl_val_size you can determine how much of the data should be used as training data and how much as validation data. Balanced Synthetic Cases: In the case of imbalanced data, it is recommended to set use_bsc=TRUE. Before training, a number of synthetic units is created via different techniques. Currently you can request the Basic Synthetic Minority Oversampling Technique, the Density-Based Synthetic Minority Oversampling Technique, and the Adaptive Synthetic Sampling Approach for Imbalanced Learning. The aim is to create new cases that fill the gap to the majority class. Multi-class problems are reduced to a two-class problem (the class under investigation vs. all other classes) for generating these units. You can even request several techniques at the same time. If the number of synthetic units and original minority units exceeds the number of cases of the majority class, a random sample is drawn. For the techniques that allow you to set the number of neighbors for the generation, k = bsc_max_k is used. Balanced Pseudo-Labeling: This technique is relevant if you have labeled target data and a large number of unlabeled target data. With the different parameters starting with "bpl_", you can request different implementations of pseudo-labeling, for example based on the work of Lee (2013) or Cascante-Bonilla et al. (2020). To turn on pseudo-labeling, set use_bpl=TRUE. To request pseudo-labeling based on Cascante-Bonilla et al. (2020), the following parameters have to be set: bpl_max_steps = 5 (splits the unlabeled data into five chunks), bpl_dynamic_inc = TRUE (ensures that the number of used chunks increases at every step), bpl_model_reset = TRUE (re-initializes the model for every step), bpl_epochs_per_step=30 (number of training epochs within each step), bpl_balance=FALSE (ensures that the cases with the highest certainty are added to the training regardless of the absolute frequencies of the classes), bpl_weight_inc=0.00 and bpl_weight_start=1.00 (ensure that labeled and unlabeled data have the same weight during training), bpl_max=1.00, bpl_anchor=1.00, and bpl_min=0.00 (ensure that all unlabeled data is considered for training and that the cases with the highest certainty are used first for training). To request the original pseudo-labeling proposed by Lee (2013), set the following parameters: bpl_max_steps=30 (the steps must here be treated as epochs), bpl_dynamic_inc=FALSE (ensures that all pseudo-labeled cases are used), bpl_model_reset=FALSE (the model is not allowed to be re-initialized), bpl_epochs_per_step=1 (since the steps are treated as epochs this must be one), bpl_balance=FALSE (ensures that all cases are added regardless of the absolute frequencies of the classes), bpl_weight_inc=0.02 and bpl_weight_start=0.00 (give the pseudo-labeled data an increasing weight with every step), bpl_max=1.00, bpl_anchor=1.00, and bpl_min=0.00 (ensure that all pseudo-labeled cases are used for training; bpl_anchor does not affect the calculations here). Please note that while Lee (2013) suggests recalculating the pseudo-labels of the unlabeled data after every weight actualization, in aifeducation the pseudo-labels are recalculated at every epoch. bpl_max, bpl_anchor, and bpl_min are used to describe the certainty of a prediction, where 0 refers to random guessing and 1 refers to perfect certainty. bpl_anchor is used as a reference value. The distance to bpl_anchor is calculated for every case. Then, the cases are sorted with increasing distance to bpl_anchor. The resulting order of the cases is relevant if you set bpl_dynamic_inc=TRUE or bpl_balance=TRUE.
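For illustration (plain R, not package code, with hypothetical certainty values), this ordering could look like this:
# Illustration only: certainty of predictions for four hypothetical cases,
# where 1 means perfect certainty and 0 means random guessing.
certainty <- c(0.95, 0.62, 0.88, 0.71)
distance <- abs(certainty - 1.00)  # distance to bpl_anchor = 1.00
order(distance)                    # cases with the highest certainty first
#> [1] 1 3 4 2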
Figure 5 illustrates the training loop for the case that all three options are set to TRUE. This example applies the algorithm proposed by Cascante-Bonilla et al. (2020). After training the classifier with the labeled data, the unlabeled data is introduced into the training. The classifier predicts the potential labels of the unlabeled data and adds the 20% of cases with the highest certainty as pseudo-labels to the training data. The classifier is then re-initialized and trained again. After this training, the classifier again predicts the potential labels of the originally unlabeled data and adds 40% of the pseudo-labeled data to the training data. The model is re-initialized and trained again until all unlabeled data is used for training. Since training a neural net is energy consuming, aifeducation allows you to estimate its ecological impact with the help of the python library codecarbon. Thus, sustain_track is set to TRUE by default. If you use the sustainability tracker you must provide the alpha-3 code for the country where your computer is located (e.g., \"CAN\"=\"Canada\", \"DEU\"=\"Germany\"). A list of these codes can be found on wikipedia. The reason is that different countries use different sources and techniques for generating energy, resulting in a specific impact on CO2 emissions. For the USA and Canada you can additionally specify a region by setting sustain_region. Please refer to the documentation of codecarbon for more information. Finally, trace, view_metrics, and keras_trace allow you to control how much information about the training progress is printed to the console. Please note that training a classifier can take some time. Please note that after the performance estimation, the final training of the classifier makes use of all data available. In this case, the test sample is left empty.","code":"example_targets<-as.factor(example_data$label) names(example_targets)=example_data$id classifier$train( data_embeddings = bert_embeddings, data_targets = example_targets, data_n_test_samples=5, use_baseline=TRUE, bsl_val_size=0.33, use_bsc=TRUE, bsc_methods=c(\"dbsmote\"), bsc_max_k=10, bsc_val_size=0.25, use_bpl=TRUE, bpl_max_steps=5, bpl_epochs_per_step=30, bpl_dynamic_inc=TRUE, bpl_balance=FALSE, bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00, bpl_weight_inc=0.00, bpl_weight_start=1.00, bpl_model_reset=TRUE, epochs=30, batch_size=8, sustain_track=TRUE, sustain_iso_code=\"DEU\", sustain_region=NULL, sustain_interval=15, trace=TRUE, view_metrics=FALSE, keras_trace=0, n_cores=2, dir_checkpoint=\"training/classifier\")"},{"path":"/articles/classification_tasks.html","id":"evaluating-classifiers-performance","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.3 Evaluating Classifier’s Performance","title":"02b Text Embedding and Classification Tasks","text":"After finishing the training, you can evaluate the performance of the classifier. For every fold, the classifier is applied to the test sample and the results are compared to the true categories/classes. Since the test sample is never part of the training, all performance measures provide a realistic idea of the classifier's performance. To support researchers in judging the quality of the predictions, aifeducation utilizes several measures and concepts from content analysis: Iota Concept of the Second Generation (Berding & Pargmann 2022), Krippendorff's Alpha (Krippendorff 2019), Percentage Agreement, Gwet's AC1/AC2 (Gwet 2014), Kendall's coefficient of concordance W, Cohen's Kappa unweighted, Cohen's Kappa with equal weights, Cohen's Kappa with squared weights, and Fleiss' Kappa for multiple raters without exact estimation. You can access the concrete values via the field reliability, which stores all relevant information. In this list you find the reliability values of every fold and of every requested training configuration. In addition, the reliability of every step within balanced pseudo-labeling is reported. The central estimates of the reliability values can be found via reliability$test_metric_mean. For our example this is: We now have a table with all relevant values. Of particular interest are the values of alpha from the Iota Concept since they represent a measure of reliability that is independent from the frequency distribution of the classes/categories. The alpha values describe the probability that a case of a specific class is recognized as that specific class.
As you can see, compared to the baseline model, applying Balanced Synthetic Cases increases the minimal value of alpha, reducing the risk of missing cases that belong to the rare class (see row "BSC"). On the contrary, the alpha values of the major category decrease slightly, thus losing their unjustified bonus from the high number of cases in the training set. This provides a more realistic performance estimation of the classifier. Furthermore, you can see that the application of pseudo-labeling further increases the alpha values of the minor class, here in step 3. Finally, we can plot a coding stream scheme showing how the cases of the different classes are labeled. Here we use the package iotarelr. As you can see, only a small number of negative reviews is treated as a good review while a larger number of positive reviews is treated as a bad review. Thus, the data of the major class (negative reviews) is more reliable and valid than the data of the minor class (positive reviews). Evaluating the performance of a classifier is a complex task and beyond the scope of this vignette. Instead, we would like to refer to the cited literature on content analysis and machine learning if you would like to dive deeper into this topic.","code":"classifier$reliability$test_metric_mean test_metric_mean #> iota_index min_iota2 avg_iota2 max_iota2 min_alpha avg_alpha #> Baseline 0.6320000 0.10294118 0.3877251 0.6725090 0.136 0.549 #> BSC 0.4346667 0.06895416 0.2676750 0.4663959 0.072 0.512 #> BPL 0.6293333 0.51019563 0.6401731 0.7701506 0.580 0.756 #> Final 0.6293333 0.51019563 0.6401731 0.7701506 0.580 0.756 #> max_alpha static_iota_index dynamic_iota_index kalpha_nominal #> Baseline 0.962 0.5455732 0.5281005 -0.04487101 #> BSC 0.952 0.3785559 0.3744595 -0.25488654 #> BPL 0.932 0.3846018 0.5242565 0.54678492 #> Final 0.932 0.3846018 0.5242565 0.54678492 #> kalpha_ordinal kendall kappa2 kappa_fleiss kappa_light #> Baseline -0.04487101 0.5531797 0.10142922 0.10142922 0.10142922 #> BSC -0.25488654 0.5199922 0.02869454 0.02869454 0.02869454 #> BPL 0.54678492 0.7827658 0.55104690 0.55104690 0.55104690 #> Final 0.54678492 0.7827658 0.55104690 0.55104690 0.55104690 #> percentage_agreement gwet_ac #> Baseline 0.6866667 0.543828 #> BSC 0.4920000 0.106742 #> BPL 0.8146667 0.686684 #> Final 0.8146667 0.686684 library(iotarelr) iotarelr::plot_iota2_alluvial(classifier$reliability$iota_object_end_free)"},{"path":"/articles/classification_tasks.html","id":"sustainability-1","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.4 Sustainability","title":"02b Text Embedding and Classification Tasks","text":"In the case that the classifier was trained with an active sustainability tracker, you can receive information on its sustainability by calling classifier$get_sustainability_data().","code":"sustainability_data #> $sustainability_tracked #> [1] TRUE #> #> $date #> [1] \"Thu Oct 5 11:20:53 2023\" #> #> $sustainability_data #> $sustainability_data$duration_sec #> [1] 7286.503 #> #> $sustainability_data$co2eq_kg #> [1] 0.05621506 #> #> $sustainability_data$cpu_energy_kwh #> [1] 0.08602103 #> #> $sustainability_data$gpu_energy_kwh #> [1] 0.05598303 #> #> $sustainability_data$ram_energy_kwh #> [1] 0.01180879 #> #> $sustainability_data$total_energy_kwh #> [1] 0.1538128 #> #> #> $technical #> $technical$tracker #> [1] \"codecarbon\" #> #> $technical$py_package_version #> [1] \"2.3.1\" #> #> $technical$cpu_count #> [1] 12 #> #> $technical$cpu_model #> [1] \"12th Gen Intel(R) Core(TM) i5-12400F\" #> #> $technical$gpu_count #> [1] 1 #> #> $technical$gpu_model #> [1] \"1 x NVIDIA GeForce RTX 4070\" #> #> $technical$ram_total_size #> [1] 15.84258 #> #> #> $region #> $region$country_name #> [1] \"Germany\" #> #> $region$country_iso_code #> [1] \"DEU\" #> #> $region$region #> [1]
NA"},{"path":"/articles/classification_tasks.html","id":"saving-and-loading-a-classifier","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.5 Saving and Loading a Classifier","title":"02b Text Embedding and Classification Tasks","text":"created classifier, saving loading easy. process saving model similar process text embedding models. pass model directory path function save_ai_model. contrast text embedding models can specify additional argument save_format. case pytorch models arguments allows choose save_format = \"safetensors\" save_format = \"pt\". recommend chose save_format = \"safetensors\" since safer method save models. case tensorflow models arguments allows choose save_format = \"keras\", save_format = \"tf\" save_format = \"h5\". recommend chose save_format = \"keras\" since recommended format keras. set save_format = \"default\" .safetensors used pytorch models .keras used tensorflow models. like load model can call function load_ai_model. Note: Classifiers depend framework used creation. Thus, classifier always initalized original framework. argument ml_framework effect.","code":"save_ai_model( model=classifier, model_dir=\"classifiers\", dir_name=\"movie_classifier\", save_format = \"default\", append_ID=FALSE) classifier<-load_ai_model( model_dir=\"classifiers/movie_classifier\")"},{"path":"/articles/classification_tasks.html","id":"predicting-new-data","dir":"Articles","previous_headings":"5 Using AI for Classification","what":"5.6 Predicting New Data","title":"02b Text Embedding and Classification Tasks","text":"like apply classifier new data, two steps necessary. must first transform raw text numerical expression using exactly text embedding model used training classifier. case example classifier use BERT model. transform raw texts numeric representation Just pass raw texts IDs every text method embed loaded model. easy used package readtext read raw text disk, since object resulting readtext always stores texts column “texts” IDs column “doc_id”. Depending machine, embedding raw texts may take time. case use machine graphic device, possible “memory” error occurs. case reduce batch size. error still occurs, restart R session, switch cpu-mode directly loading libraries aifeducation::set_config_cpu_only() request embedding . example , text embeddings stored text_embedding. Since embedding texts may take time, good idea save embeddings future analysis (use save function R). allows load embedding without need apply text embedding model raw texts . resulting object can passed method predict classifier get predictions together estimate certainty class/category. classifier finishes prediction, estimated categories/classes stored predicted_categories. object data.frame containing texts’ IDs rows probabilities different categories/classes columns. last column name expected_category represents category assigned text due highest probability. 
These estimates can be used for further analysis with common methods of the educational and social sciences, such as correlation analysis, regression analysis, structural equation modeling, latent class analysis, or analysis of variance.","code":"# If our model is not loaded bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_transformer_bert\") # Create a numerical representation of the text text_embeddings<-bert_modeling$embed( raw_text = textual_data$text, doc_id = textual_data$doc_id, batch_size=8, trace=TRUE) # If your classifier is not loaded classifier<-load_ai_model( model_dir=\"classifiers/movie_classifier\") # Predict the classes of new texts predicted_categories<-classifier$predict( newdata = text_embeddings, batch_size=8, verbose=0)"},{"path":"/articles/classification_tasks.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"02b Text Embedding and Classification Tasks","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://doi.org/10.48550/arXiv.2004.05150 Berding, F., & Pargmann, J. (2022). Iota Reliability Concept of the Second Generation. Berlin: Logos. https://doi.org/10.30819/5581 Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education, 1–21. https://doi.org/10.3389/feduc.2022.818365 Campesato, O. (2021). Natural Language Processing Fundamentals for Developers. Mercury Learning & Information. https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=6647713 Cascante-Bonilla, P., Tan, F., Qi, Y. & Ordonez, V. (2020). Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning. https://doi.org/10.48550/arXiv.2001.06001 Chollet, F., Kalinowski, T., & Allaire, J. J. (2022). Deep Learning with R (Second edition). Manning Publications Co. https://learning.oreilly.com/library/view/-/9781633439849/?ar Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. https://doi.org/10.48550/arXiv.2006.03236 Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423 Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (Fourth edition). Gaithersburg: STATAXIS. He, P., Liu, X., Gao, J. & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654 Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th ed.). Los Angeles: SAGE. Lane, H., Howard, C., & Hapke, H. M. (2019). Natural language processing in action: Understanding, analyzing, and generating text with Python. Shelter Island: Manning. Larusson, J. A., & White, B. (Eds.). (2014). Learning Analytics: From Research to Practice. New York: Springer. https://doi.org/10.1007/978-1-4614-3305-7 Lee, D.‑H. (2013). Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. ICML 2013 Workshop: Challenges in Representation Learning. Lee-Thorp, J., Ainslie, J., Eckstein, I. & Ontanon, S. (2021). FNet: Mixing Tokens with Fourier Transforms.
https://doi.org/10.48550/arXiv.2105.03824 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692 Papilloud, C., & Hinneburg, A. (2018). Qualitative Textanalyse mit Topic-Modellen: Eine Einführung für Sozialwissenschaftler. Wiesbaden: Springer. https://doi.org/10.1007/978-3-658-21980-2 Pappagari, R., Zelasko, P., Villalba, J., Carmiel, Y., & Dehak, N. (2019). Hierarchical Transformers for Long Document Classification. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 838–844). IEEE. https://doi.org/10.1109/ASRU46091.2019.9003958 Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.org/D14-1162.pdf Schreier, M. (2012). Qualitative Content Analysis in Practice. Los Angeles: SAGE. Tunstall, L., Werra, L. von, Wolf, T., & Géron, A. (2022). Natural language processing with transformers: Building language applications with hugging face (Revised edition). Heidelberg: O'Reilly. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762 Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., . . . Dean, J. (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. https://doi.org/10.48550/arXiv.1609.08144","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"preface","dir":"Articles","previous_headings":"1 Introduction and Overview","what":"1.1 Preface","title":"02a Using Aifeducation Studio","text":"This vignette introduces Aifeducation Studio, a graphical user interface for creating, training, documenting, analyzing, and applying artificial intelligence (AI). It is made for users who are unfamiliar with R or who do not have coding skills in relevant languages (e.g., python). For experienced users, the interface provides a convenient way of working with AI in an educational context. This article overlaps with the vignette 02b Classification Tasks, which explains the use of the package with R syntax. We assume that aifeducation is installed as described in vignette 01 Get Started. The introduction starts with a brief explanation of basic concepts, which are necessary to work with the package.","code":""},{"path":"/articles/gui_aife_studio.html","id":"basic-concepts","dir":"Articles","previous_headings":"1 Introduction and Overview","what":"1.2 Basic Concepts","title":"02a Using Aifeducation Studio","text":"In the educational and social sciences, assigning an observation to scientific concepts is an important task that allows researchers to understand an observation, to generate new insights, and to derive recommendations for research and practice. In educational science, several areas deal with this kind of task. For example, diagnosing students' characteristics is an important aspect of a teacher's profession and is necessary to understand and promote learning. Another example is the use of learning analytics, where data about students is used to provide learning environments adapted to their individual needs. On another level, educational institutions such as schools and universities can use this information for data-driven performance decisions (Larusson & White 2014) as well as to improve them. In every case, a real-world observation is aligned with scientific models to use scientific knowledge as a technology for improved learning and instruction.
Supervised machine learning is one concept that allows a link between real-world observations and existing scientific models and theories (Berding et al. 2022). For the educational sciences, this is a great advantage because it allows researchers to use existing knowledge and insights for applications of AI. The drawback of this approach is that the training of AI requires both information about the real-world observations and information about the corresponding alignment with scientific models and theories. A valuable source of data in educational science are written texts, since textual data can be found almost everywhere in the realm of learning and teaching (Berding et al. 2022). For example, teachers often require their students to solve a task and to provide the solution in written form. Students create solutions for these tasks and often document them in a short written essay or a presentation. This data can be used to analyze learning and teaching. Teachers' written tasks for their students may provide insights into the quality of instruction while students' solutions may provide insights into their learning outcomes and prerequisites. AI can be a helpful assistant in analyzing textual data since the analysis of textual data is a challenging and time-consuming task for humans. Please note that an introduction to content analysis, natural language processing or machine learning is beyond the scope of this vignette. If you would like to learn more, please refer to the cited literature. Before we start, it is necessary to introduce a definition of our understanding of some basic concepts, since applying AI to educational contexts means combining the knowledge of different scientific disciplines using different, sometimes overlapping concepts. Even within a single research area, the concepts are not unified. Figure 1 illustrates this package's understanding. Since aifeducation looks at the application of AI for classification tasks from the perspective of the empirical method of content analysis, we use the overlapping concepts of content analysis and machine learning. In content analysis, a phenomenon like performance or colors can be described as a scale/dimension that is made up of several categories (e.g. Schreier 2012, pp. 59). For example, an exam's performance (scale/dimension) can be "good", "average" or "poor". In terms of colors (scale/dimension) the categories can be "blue", "green", etc. The machine learning literature uses other words to describe this kind of data. In machine learning, "scale" and "dimension" correspond to the term "label" while "categories" refer to the term "classes" (Chollet, Kalinowski & Allaire 2022, p. 114). With these clarifications, classification means that a text is assigned to the correct category of a scale or, respectively, that a text is labeled with the correct class. As Figure 2 illustrates, two kinds of data are necessary to train an AI to classify texts in line with supervised machine learning principles. By providing the AI with the textual data as input data and the corresponding information about the class as target data, the AI can learn which texts imply a specific class or category. In the exam example, the AI can learn which texts imply a "good", an "average" or a "poor" judgment. After training, the AI can be applied to new texts to predict the most likely class of every new text. The generated classes can be used for further statistical analysis or to derive recommendations for learning and teaching. For the use cases described in this vignette, the AI has to "understand" natural language: "Natural language processing is an area of research in computer science and artificial intelligence (AI) concerned with processing natural languages such as English or Mandarin. This processing generally involves translating natural language into data (numbers) that a computer can use to learn about the world. (…)" (Lane, Howard & Hapke 2019, p. 4) Thus, the first step is to transform raw texts into a form that is usable by a computer, hence raw texts must be transformed into numbers. In modern approaches, this is usually done with word embeddings. Campesato (2021, p. 102) describes them as the "collective name for a set of language modeling and feature learning techniques (…) where words or phrases from the vocabulary are mapped to vectors of real numbers." The definition of a word vector is similar: "Word vectors represent the semantic meaning of words as vectors in the context of the training corpus." (Lane, Howard & Hapke 2019, p. 191).
In the next step, the word or text embeddings can be used as input data and the labels as target data for training an AI to classify texts. Within aifeducation, these steps are covered by three different types of models, as shown in Figure 3. Base Models: The base models are the models that contain the capacities to understand natural language. In general, these are transformers such as BERT, RoBERTa, etc. A huge number of pre-trained models can be found on Huggingface. Text Embedding Models: These models are built on top of base models and store the directions for how to use the base models for converting raw texts into sequences of numbers. Please note that the same base model can be used to create different text embedding models. Classifiers: Classifiers are used on top of a text embedding model. They are used to classify a text into categories/classes based on the numeric representation provided by the corresponding text embedding model. Please note that the same text embedding model can be used to create different classifiers (e.g. one classifier for colors, one classifier for estimating the quality of a text, etc.). With this overview in mind, we can start the introduction of Aifeducation Studio.","code":""},{"path":"/articles/gui_aife_studio.html","id":"starting-aifeducation-studio","dir":"Articles","previous_headings":"","what":"2 Starting Aifeducation Studio","title":"02a Using Aifeducation Studio","text":"We recommend starting with a clean R session. You can start Aifeducation Studio by entering the following into the console: Please note that this can take a moment. At the beginning you see the start page (Figure 4). Here you can configure your current session. First, it is important to choose the machine learning framework you would like to use for the session (box Machine Learning Framework). This choice cannot be changed after the session starts. To change the framework you have to restart Aifeducation Studio. Depending on the chosen framework, you can activate further settings (box Machine Learning Framework). If you would like to use tensorflow and your computer has a graphic device with low memory, we recommend activating the option for low memory. For pytorch, no further configuration is necessary. On the right side of the start page, you can decide if the energy consumption should be recorded during the training of an AI (box Sustainability Tracking). Tracking the energy consumption allows you to estimate the CO2 emissions of using AI. Since the world faces the challenge of climate change, we recommend enabling this option. In this case you have to choose a country in order to allow an accurate estimation of the model's sustainability impact. If everything is ready you can press the start button (box Start Session), which directs you to the home page.","code":"library(aifeducation) start_aifeducation_studio()"},{"path":[]},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"collection-of-raw-texts","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.1 Preparing Data","what":"3.1.1 Collection of Raw Texts","title":"02a Using Aifeducation Studio","text":"The first step in working with AI is to gather and structure data. Within the scope of aifeducation, data can either be a collection of raw texts, sequences of numbers representing texts (text embeddings), or the texts' labels. Collections of raw texts are necessary in two cases: First, to train or fine-tune base models. Second, to transform texts into text embeddings which can be used as input for training a classifier or for predicting texts' labels via a classifier. To create a collection of raw texts, choose the Data Preparation page on the right side as shown in Figure 5. On the resulting page (see Figure 6), first choose the directory where your texts are stored (box Text Sources). We recommend storing all texts you would like to use in a single folder. Within this folder, you can structure your data with sub-folders. In case you use sub-folders, please ensure to include them when creating the collection of raw texts. In the next step, you can decide which file formats should be included (box File Types). Currently, aifeducation supports .pdf, .csv, and .xlsx files. If enabled, all files of the requested file format are included in the data collection. In case you would like to consider .xlsx files, the files must have one column containing the texts and one column with the texts' IDs as shown in Figure 7. The names of the corresponding columns must be identical in all files and the files must provide the column names (first row in Figure 7).
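For illustration, such a file could be created from R, for example with the writexl package (writexl is not part of aifeducation; the file name, IDs, and texts here are hypothetical):
# Hypothetical example: a table with one column for the IDs and one
# column for the texts, saved as an .xlsx file for Aifeducation Studio.
example_table <- data.frame(
  id = c("text_01", "text_02"),
  text = c("This is the first document.", "This is the second document.")
)
writexl::write_xlsx(example_table, "texts.xlsx")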
In the last step, choose the folder where the collection of raw texts should be saved. Please select a folder and provide a name for the file (box Text Output). Finally, you can start creating the collection (box Start Process). Please note that this can take some time. After the process finishes, a single file can be used for the further tasks. This file contains a data.table which stores the texts together with their IDs. In the case of .xlsx files, the texts' IDs are set to the IDs stored in the corresponding column ID. In the case of .pdf and .csv files, the file names are used as the ID (without the file extension, see Figure 8). Please note that as a consequence two files such as text_01.csv and text_01.pdf would have the same ID, which is not allowed. Please ensure that you use unique IDs across the file formats. The IDs are important since they are used to match the texts to the corresponding class/category if available.","code":""},{"path":"/articles/gui_aife_studio.html","id":"collections-of-texts-labels","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.1 Preparing Data","what":"3.1.2 Collections of Texts’ Labels","title":"02a Using Aifeducation Studio","text":"Labels are necessary if you would like to train a classifier. The easiest way is to create a table that contains a column with the texts' IDs and one or multiple columns that contain the texts' categories/classes. Supported file formats are .xlsx, .csv, and .rda/.rdata. Figure 9 illustrates an example of an .xlsx file. In every case, the table must contain a column with the name "id" which contains the texts' IDs. The other columns must also have unique names. Please pay attention to use "id" and not "ID" or "Id".","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"overview","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.1 Overview","title":"02a Using Aifeducation Studio","text":"Base models are the foundation of all models in aifeducation. At the moment, these are transformer models such as BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), DeBERTa version 2 (He et al. 2020), Funnel-Transformer (Dai et al. 2020), and Longformer (Beltagy, Peters & Cohan 2020). In general, these models are trained on a large corpus of general texts in a first step. In the next step, the models are fine-tuned with domain-specific texts and/or fine-tuned for specific tasks. Since the creation of base models requires a huge number of texts and results in high computational time, it is recommended to use pre-trained models. These can be found on Huggingface. Sometimes, however, it is more straightforward to create a new model to fit a specific purpose. Aifeducation Studio supports the opportunity to create and train/fine-tune base models.","code":""},{"path":"/articles/gui_aife_studio.html","id":"creation-of-base-models","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.2 Creation of Base Models","title":"02a Using Aifeducation Studio","text":"In order to create a new base model choose the option Create in the tab Language Modeling under Base Models on the right side of the app (Figure 5). Figure 10 shows the corresponding page. Figure 10: Language Modeling - Create Transformer (click image to enlarge) Every transformer model is composed of two parts: 1) a tokenizer that splits raw texts into smaller pieces to model a large number of words with a limited, small number of tokens, and 2) a neural network that is used to model the capabilities for understanding natural language. At the beginning you can choose between different supported transformer architectures (box Model Architecture). Depending on the architecture, you have different options for determining the shape of the neural network. In the middle, you find a box named Vocabulary. Here you must provide a path to a file that contains a collection of raw texts. These raw texts are used to calculate the vocabulary of the transformer. The file should be created with Aifeducation Studio to ensure compatibility. See section 3.1.1 for more details. It is important that you provide a number for how many tokens the vocabulary should include. Depending on the transformer method, you can set additional options affecting the transformer's vocabulary. Transform to Lower: If this option is enabled, all words of the raw text are transformed into lower case. For instance, the resulting token for Learners and learners is then the same.
","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"overview","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.1 Overview","title":"02a Using Aifeducation Studio","text":"Base models are the foundation of all models in aifeducation. At the moment, these are transformer models such as BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), DeBERTa version 2 (He et al. 2020), Funnel-Transformer (Dai et al. 2020), and Longformer (Beltagy, Peters & Cohan 2020). In general, these models are trained on a large corpus of general texts in a first step. In the next step, the models are fine-tuned with domain-specific texts and/or fine-tuned for specific tasks. Since the creation of base models requires a huge number of texts and results in high computational time, it is recommended to use pre-trained models. These can be found on Huggingface. Sometimes, however, it is more straightforward to create a new model that fits a specific purpose. Aifeducation Studio supports the opportunity to create and to train/fine-tune base models.","code":""},{"path":"/articles/gui_aife_studio.html","id":"creation-of-base-models","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.2 Creation of Base Models","title":"02a Using Aifeducation Studio","text":"In order to create a new base model choose the option Create in the tab Language Modeling under Base Models on the right side of the app (Figure 5). Figure 10 shows the corresponding page. Figure 10: Language Modeling - Create Transformer (click image to enlarge) Every transformer model is composed of two parts: 1) a tokenizer which splits raw texts into smaller pieces in order to model a large number of words with a limited, small number of tokens, and 2) a neural network which is used to model the capabilities of understanding natural language. At the beginning you can choose between the different supported transformer architectures (box Model Architecture). Depending on the architecture, you have different options for determining the shape of the neural network. In the middle, you will find a box named Vocabulary. Here you must provide the path to a file that contains a collection of raw texts. These raw texts are used to calculate the vocabulary of the transformer. The file should be created with Aifeducation Studio to ensure compatibility; see section 3.1.1 for details. It is important that you provide the number of tokens the vocabulary should include. Depending on the transformer method, you can set additional options affecting the transformer’s vocabulary. Transform to Lower: If this option is enabled, all words in the raw text are transformed into lower case. For instance, the resulting token for Learners and learners would then be the same. If disabled, Learners and learners result in a different tokenization. Add Prefix Spaces: If enabled, a space is added to the first word if there is not already one. Thus, enabling this option leads to a similar tokenization of the word learners in the following two cases: 1) “learners need high motivation for high achievement.” and 2) “high motivation is necessary for learners to achieve high performance.”. Trim Offsets: If this option is enabled, the white spaces of the produced offsets are trimmed. In the last step you choose the folder where the new base model should be saved (box Creation). Finally, you can start the creation of the model by clicking the button “Start Creation”. The creation of the model may take some time.","code":""},{"path":"/articles/gui_aife_studio.html","id":"traintune-a-base-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.2 Base Models","what":"3.2.3 Train/Tune a Base Model","title":"02a Using Aifeducation Studio","text":"If you would like to train a new base model (see section 3.2.2) for the first time or if you want to adapt a pre-trained model to domain-specific language or a task, click on “Train/Tune” on the right side of the app. You can find this option via “Language Modeling” as shown in Figure 5. Figure 11: Language Modeling - Train/Tune Transformer (click image to enlarge) In the first step, choose the base model you would like to train/tune (box Base Model). Please note that every base model consists of several files. Thus, you provide neither a single file nor multiple files. Instead you provide the folder that stores the entire model. Compatible models are all base models created with Aifeducation Studio. In addition you can use any model from Huggingface that uses an architecture implemented in aifeducation, such as BERT, DeBERTa, etc. After choosing a base model, new boxes appear as shown in Figure 11. To train the model, you must first provide a collection of raw texts (box Raw Texts). We recommend creating this collection of texts as described in section 3.1.1. Next you can configure the training of the base model (box Train Tune Settings): Chunk Size: For training and validating the base model, the raw texts are split into several smaller texts. This value determines the maximum length of these smaller text pieces in number of tokens. The value cannot exceed the maximum size set during the creation of the base model. Minimal Sequence Length: This value determines the minimal length a text chunk must have in order to be part of the training and validation data. Full Sequences Only: If this option is enabled, only text chunks with a number of tokens equal to “chunk size” are included in the data. Disable this option if you have a lot of small text chunks which you would like to use for training and validation. Probability of Token Masking: This option determines how many tokens of every sequence are masked. Whole Word Masking: If this option is activated, all tokens belonging to a single word are masked. If this option is disabled or not available, plain token masking is used. Validation Size: This option determines how many sequences are used for validating the performance of the base model. Sequences used for validation are not available for training. Batch Size: This option determines how many sequences are processed at the same time. Please adjust this value to the computation capacities of your machine. n Epochs: The maximum number of epochs for training. After training, the model with the best validation loss is saved to disk and used as the final model. Learning Rate: The initial learning rate. In the last step, provide a directory where the model should be saved during training (box Start Training/Tuning). The corresponding folder will also contain the checkpoints of the training. It is important that this directory is not the directory where the original model is stored. By clicking the button “Start Training/Tuning”, the training starts. Please note that training a base model can last days or even weeks, depending on the size and kind of the model, the amount of data, and the capacities of your machine.
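Creating and tuning a base model is also possible with R syntax. The following sketch assumes the functions create_bert_model() and train_tune_bert_model(); the argument names and values shown here are illustrative assumptions and should be checked against the package's reference pages before use:

# Sketch only: creating and then tuning a BERT base model via R syntax
# (function and argument names are assumptions based on the package's
# reference; several further arguments are omitted)
create_bert_model(
  model_dir = 'my_base_model',           # hypothetical output folder
  vocab_raw_texts = raw_text_collection, # collection created as in 3.1.1
  vocab_size = 30522
)
train_tune_bert_model(
  output_dir = 'my_base_model_tuned',    # must differ from model_dir
  model_dir_path = 'my_base_model',
  raw_texts = raw_text_collection,
  chunk_size = 256,
  n_epoch = 10,
  batch_size = 12
)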
","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"create-a-text-embedding-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.3 Text Embedding Models","what":"3.3.1 Create a Text Embedding Model","title":"02a Using Aifeducation Studio","text":"A text embedding model is the interface between R and aifeducation. In order to create a new model, you need a base model which provides the ability to understand natural language. You can open the creation page by clicking on “Create” in the section “Text Embedding Model” via “Language Modeling” (see Figure 5). Figure 12 shows the corresponding page. Figure 12: Text Embedding Model - Create (click image to enlarge) First choose the base model which should form the foundation of the new text embedding model. Please select the folder that contains the entire model, not single files (box Base Model). After choosing a model, new boxes appear which allow you to customize the interface (box Interface Setting). It is important to give the model a unique name and label. The difference between Name and Label is that the Name is used by the computer while the Label is used by users. Thus, the Name should not contain spaces or special characters; for the Label there are no restrictions. Think of the Label as the title of a book or paper. With Version you can provide a version number if you create a newer version of your model. In case you create a completely new model, we recommend using “0.0.1”. With Language, it is necessary to choose the language the model is created for, such as English, French, German, etc. On the right side of the box Interface Setting you can set how the interface processes raw text: N Chunks: Sometimes texts are too long. With this value, you can decide into how many chunks longer texts are divided. The maximum length of every chunk is determined by the value provided in “Maximal Sequence Length”. Maximal Sequence Length: This value determines the maximum number of tokens the model processes for every chunk. N Token Overlap: This value determines how many tokens of the prior chunk are included in the current chunk. The overlap can be useful for providing the correct context for every chunk. Layers for Embeddings - Min: Base models transform raw data into a sequence of numbers using the hidden states of different layers. With this option you can decide which is the first layer to use. Layers for Embeddings - Max: With this option you can decide which is the last layer to use. The hidden states of the layers between min and max are averaged to form the embedding of a text chunk. Pooling Type: With this option you can decide whether only the hidden states of the cls-token are used as the embedding. If you set this option to “average”, the hidden states of all tokens are averaged within a layer, except the hidden states of padding tokens. The maximum number of tokens the model can process and provide for downstream tasks can be calculated as \\[Max Tokens = NChunks*MaximalSequenceLength-(NChunks-1)*NOverlap\\] If a text is longer, the remaining tokens are ignored and lost for the analysis (a short numeric illustration is given at the end of this section). Please note that you can create multiple text embedding models with different configurations based on the same base model. In the last step you provide a name and a folder for saving the model (box Creation).
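For instance, with the (hypothetical) settings N Chunks = 4, Maximal Sequence Length = 512, and N Token Overlap = 30, the interface can pass on \[4*512-(4-1)*30 = 2048-90 = 1958\] tokens per text; every token beyond the 1,958th would be ignored.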
","code":""},{"path":"/articles/gui_aife_studio.html","id":"using-a-text-embedding-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.3 Text Embedding Models","what":"3.3.2 Using a Text Embedding Model","title":"02a Using Aifeducation Studio","text":"Using a text embedding model is the central aspect of applying artificial intelligence with aifeducation. The corresponding page can be found by clicking on “Use” in the tab “Language Modeling”. At the start you choose the model you would like to use. Please select the folder that contains the entire model instead of selecting single files. After selecting and loading the model, a new box appears which shows the different aspects of the model you can use. The tab Model Description (Figure 13) provides the documentation of the model. Figure 13: Text Embedding Model - Description (click image to enlarge) The tab Training shows the development of the loss and validation loss during the last training of the corresponding base model. The plot is only displayed if history data is available. The tab Create Text Embeddings (Figure 14) allows you to transform raw texts into a numerical representation of these texts, called text embeddings. These text embeddings can be used for downstream tasks such as classifying texts. In order to transform raw texts into embedded texts, first select a collection of raw texts. We recommend creating this collection according to section 3.1.1. Next you provide the folder where the embeddings should be stored and a name for them. With Batch Size you can determine how many raw texts are processed simultaneously. Please adjust this value to your machine’s capacities. By clicking the button “Start Embed” the transformation of the texts begins. Figure 14: Text Embedding Model - Embeddings (click image to enlarge) The tab Encode/Decode/Tokenize (Figure 15) offers insights into the way the text embedding model processes data. In the box Encode you can insert a raw text and, by clicking Encode, you can see how the text is divided into tokens and their corresponding IDs. These IDs are passed to the base model and are used to generate the numeric representation of the text. The box Decode allows you to reverse this process. You can insert a sequence of numbers (separated by a comma and spaces) and, by clicking Decode, the corresponding tokens and the raw text appear. Figure 15: Text Embedding Model - Encode/Decode/Tokenize (click image to enlarge) Finally, the tab Fill Mask (Figure 16) allows you to request from the underlying base model of the text embedding model a solution for a fill-in-the-blank text. In the box Text you can insert a raw text. Where a gap should be signaled you insert the corresponding masking token. This token can be found in the table in the row “mask_token”. If you insert a gap/mask_token please ensure the correct spelling. With “N Solutions per Mask” you can determine how many tokens the model should calculate for every gap/mask_token. After clicking “Calculate Tokens”, you will find an image on the right side of the box, showing the most reasonable tokens for the selected gap. The tokens are ordered by certainty; from the perspective of the model, the most reasonable tokens are at the top and the less reasonable tokens at the bottom. Figure 16: Text Embedding Model - Fill Mask (click image to enlarge)
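The embedding step is also available through R syntax. As a minimal sketch, assuming a loaded TextEmbeddingModel stored in tem and a text collection with the columns id and text (the object names are hypothetical and the arguments of embed() should be checked against the reference pages):

# Sketch only: transforming raw texts into embeddings without the GUI
embeddings <- tem$embed(
  raw_text = text_collection$text, # the texts from the collection file
  doc_id = text_collection$id,     # the matching unique IDs
  batch_size = 8
)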
","code":""},{"path":"/articles/gui_aife_studio.html","id":"documenting-a-text-embedding-model","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.3 Text Embedding Models","what":"3.3.3 Documenting a Text Embedding Model","title":"02a Using Aifeducation Studio","text":"Creating “good” AI models requires a lot of effort. Thus, sharing your work with other users is important in order to support the progress of the discipline, and for this a meaningful documentation is required. In addition, a well written documentation makes an AI model transparent, allowing others to understand how the AI model generated its solutions. This is also important in order to judge the limitations of the model. To support developers in documenting their work, Aifeducation Studio provides an easy way to add a comprehensive description. You will find this part of the app by clicking on “Document” in the tab Language Modeling. First, choose the text embedding model you would like to document (not the base model!). After choosing the model, a new box appears, allowing you to insert the necessary information. Via the tabs Developers and Modifiers, you can provide the names and email addresses of the relevant contributors. Developers refers to the people who created the model, while Modifiers refers to the people who adapted a pre-trained model to another domain or task. Figure 17: Text Embedding Model - Documentation (click image to enlarge) In the tabs Abstract and Description, you can provide an abstract and a detailed description of your work in English and/or in the native language of your text embedding model (e.g., French, German, etc.), allowing you to reach a broader audience (Figure 17). In all four tabs you can provide the documentation as plain text, html, and/or markdown, allowing you to insert tables and to highlight parts of the documentation. If you would like to see how the documentation will look on the internet, you can click the button “Preview”. Saving the changes is possible by clicking on Save. For more information on how to document a model, please refer to the vignette 03 Sharing and Using Trained AI/Models.","code":""},{"path":[]},{"path":"/articles/gui_aife_studio.html","id":"create-a-classifier","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.4 Classifiers","what":"3.4.1 Create a Classifier","title":"02a Using Aifeducation Studio","text":"Classifiers are built on top of a text embedding model. To create a classifier, click on “Create and Train” in the tab Classification (see Figure 5). Figure 18 shows the corresponding page. Figure 18: Classifier - Creation Part 1 (click image to enlarge) Creating a classifier requires two kinds of data. First, the text embeddings of a collection of texts. These embeddings are created with a text embedding model as described in section 3.3.2. Second, a table with the labels for every text. This kind of data is created as described in section 3.1.2. You can provide the text embeddings by opening the corresponding file in the first box (Input Data). After selecting the embeddings, you will see a summary of the underlying text embedding model that generated the embeddings. In addition, you can see how many documents are in the file. Please note that a classifier is bound to the text embedding model that generated its embeddings. That is, the classifier can only be used with access to the corresponding text embedding model, which is necessary for transforming raw texts into a format the classifier can understand. In the second box (Target Data), you can select the file that contains the corresponding labels. After loading the file, you can select the column of the table you would like to use as target data for training. In addition, you can see a short summary of the absolute frequencies of the single classes/categories. Please note that you can create multiple classifiers with different target data based on the same text embedding model. Thus, you do not need to create a new text embedding model for every new classifier. In particular, you can use the same text embeddings for training different classifiers. In the third box (Architecture) you create the architecture of the neural network (Figure 19). It is important to provide a name and a label for the model in the section General. The Model Name is used for internal purposes on the machine while the Model Label is used as the title of the classifier for users. Thus, the Model Name should not contain spaces or special characters. For the Model Label, there are no restrictions. Figure 19: Classifier - Creation Part 2 (click image to enlarge) You can expand the different sections by clicking on the “+” on their right side. Since a detailed explanation of every option is beyond the scope of this introduction, we can only provide an overview. Positional Embedding: By activating this option, you add a positional embedding to the classifier. This provides the neural network with the ability to take the order within a sequence into account. Encoding Layers: These layers are similar to the encoding layers used in transformer models, allowing to calculate context-sensitive text embeddings. Here, they provide the classifier with the ability to take the surrounding text chunks (see section 3.3.1) of a sequence into account. Recurrent Layers: This section allows you to add recurrent layers to the classifier. These layers are able to account for the order within a sequence. In order to add layers, just pass numbers into the input field Recurrent Layers, separated by a comma or space. Every number represents a layer and determines the number of its neurons. Under the field you can see how Aifeducation Studio interprets your input. This is helpful to avoid invalid specifications of the layers. Dense Layers: In this section you can add dense layers to the network. The process of adding layers is similar to the process for recurrent layers. Optimizer: Here you can choose between different optimizers for training. The next box (Training Settings) contains the settings for training the classifier (Figure 19). Going into detail is beyond the scope of this introduction, so we can only provide an overview. Section: General Settings Balance Class Weights: If this option is enabled, the loss is adjusted to the absolute frequencies of the classes/categories according to the ‘Inverse Class Frequency’ method. This option should be activated to deal with imbalanced data. Number of Folds: The number of folds used for estimating the performance of the classifier. Proportion for Validation Sample: The percentage of cases within each fold which is used as the validation sample. This sample is used to determine the state of the model that generalizes best. Epochs: The maximal number of epochs. After training, the model with the best balanced accuracy is saved and used. Batch Size: The number of cases processed simultaneously. Please adjust this value to your machine’s capacities. Please note that the batch size can have an impact on the classifier’s performance. Section: Baseline Model Calculate Baseline Model: If active, the performance of a baseline model is estimated. This model does not include Balanced Pseudo Labeling or Balanced Synthetic Cases. Section: Balanced Synthetic Cases Number of Cores: The number of cores that can be used for generating synthetic cases. A higher number can speed up the training process. Method: The method used for generating the cases. Max k: The maximum number of neighbors used when generating synthetic cases. The algorithm creates cases and draws a random sample from them.
Proportion for Validation Sample: The percentage of synthetic cases that is added to the validation sample instead of the training sample. Add all Synthetic Cases: If enabled, all synthetic cases are added. If disabled, only the number of cases needed to ensure balanced frequencies of the classes/categories is added to the sample. Section: Balanced Pseudo-Labeling Add Pseudo Labeling: If activated, pseudo-labeling is used during training. The way pseudo-labeling is applied can be configured with the following parameters: Max Steps: The number of steps for pseudo-labeling. For example, in the first step, 1/Max Steps of the pseudo-labeled cases are added, in the second step, 2/Max Steps of the pseudo-labeled cases, etc. Which cases are added can be influenced with Balance Pseudo-Labels, Certainty Anchor, Max Certainty Value, and Min Certainty Value. Balance Pseudo-Labels: If this option is active, the same number of pseudo-labeled cases is added for every class/category. In general, this number is determined by the class with the smallest absolute frequency. Certainty Anchor: This value determines the reference point for choosing the pseudo-labeled cases. 1 refers to perfect certainty while 0 refers to a certainty similar to random guessing. The cases closest to this value are selected. Max Certainty Value: Pseudo-labeled cases whose certainty exceeds this value are not included in training. Min Certainty Value: Pseudo-labeled cases whose certainty falls below this value are not included in training. Reset Model Every Step: If enabled, the classifier is set back to an untrained state before every step. This can prevent over-fitting. Dynamic Weight Increase: If enabled, the sample weights of the pseudo-labeled cases increase with every step. The weights are determined by Start Weights and Weight Increase per Step. Start Weights: The initial value of the sample weights for the included pseudo-labeled cases. Weight Increase per Step: The value determining by how much the sample weights of the included pseudo-labeled cases are increased in every step. We recommend using pseudo-labeling as described by Cascante-Bonilla et al. (2020). For this, the following parameters should be set: bpl_max_steps = 5 (splits the unlabeled data into five chunks) bpl_dynamic_inc = TRUE (ensures that the number of used chunks increases with every step) bpl_model_reset = TRUE (re-initializes the model for every step) bpl_epochs_per_step=30 (the number of training epochs within each step) bpl_balance=FALSE (ensures that the cases with the highest certainty are added to training regardless of the absolute frequencies of the classes) bpl_weight_inc=0.00 and bpl_weight_start=1.00 (ensures that labeled and unlabeled data have the same weight during training) bpl_max=1.00, bpl_anchor=1.00, bpl_min=0.00 (ensures that all unlabeled data is considered for training and that the cases with the highest certainty are used first for training). A corresponding call in R syntax is sketched at the end of this section. Figure 20: Classifier - Creation Part 3 (click image to enlarge) In the last box (Figure 20), you provide the directory where you would like to save the model. The name of the folder created within this directory can be set with Folder Name. Before you start the training, you can check how many cases can be matched between the text embeddings and the target data by clicking the button Test Data Matching (box Model Saving). This allows you to check whether the structure of your data is working. If everything is okay you can start training the model by clicking Start Training. Please note that training a classifier can take several hours.
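As a sketch of these settings in R syntax: only the bpl_* arguments are taken from the recommendations above; the object names, use_pl, and the remaining (omitted) arguments are assumptions to be checked against the classifier's documentation.

# Sketch only: training with the recommended pseudo-labeling settings
classifier$train(
  data_embeddings = embeddings, # embeddings of labeled and unlabeled texts
  data_targets = targets,       # labels; unlabeled cases are set to NA
  use_pl = TRUE,
  bpl_max_steps = 5,
  bpl_dynamic_inc = TRUE,
  bpl_model_reset = TRUE,
  bpl_epochs_per_step = 30,
  bpl_balance = FALSE,
  bpl_weight_inc = 0.00,
  bpl_weight_start = 1.00,
  bpl_max = 1.00,
  bpl_anchor = 1.00,
  bpl_min = 0.00
)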
","code":""},{"path":"/articles/gui_aife_studio.html","id":"using-a-classifier","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.4 Classifiers","what":"3.4.2 Using a Classifier","title":"02a Using Aifeducation Studio","text":"In case you have trained your own classifier or you are using a classifier trained by other users, you can analyze the model’s performance or use the model to classify new texts. For this, select “Use” in the tab Classification. Similar to other functions of the app, first select the classifier by providing the folder that contains the entire model. Please note that a classifier is made up of several files. Thus, Aifeducation Studio asks you to select the folder containing all files, not single files. After loading the classifier, a new box appears. On the first tab, Model Description, (Figure 21) you find the documentation of the model. Figure 21: Classifier - Description (click image to enlarge) On the second tab, Training (Figure 22), you receive a summary of the training process of the model. This includes a visualization of the loss, accuracy, and balanced accuracy for every fold and every epoch. Depending on the applied training techniques (such as Balanced Pseudo-Labeling), you can request additional images. Figure 22: Classifier - Training (click image to enlarge) The third tab, Reliability, (Figure 23) provides information on the quality of the model. You will find visualizations giving insights into how far the classifier is able to generate reliable results. In addition, measures from content analysis as well as machine learning allow you to analyze specific aspects of the model’s performance. Figure 23: Classifier - Reliability (click image to enlarge) The last tab, Prediction (Figure 24), allows you to apply the trained model to new data, i.e., to assign classes/categories to new texts. For this purpose, you must first provide a file that contains the text embeddings of the documents you would like to classify. You can create these embeddings with the same text embedding model that was used for providing the training data of the classifier. The necessary steps are described in section 3.3.2. Figure 24: Classifier - Prediction (click image to enlarge) The embeddings must have been created with the same text embedding model that created the text embeddings for training. If not, an error will occur. See sections 3.4.1 and 3.3.2 for details. In the next step you provide the folder where you would like to save the predictions and a file name. In the default case, the predictions are stored as an .rda file, allowing you to load the data directly into R for further analysis. However, you can additionally save the results as a .csv file, allowing you to export the predictions to other programs. The resulting data table may look like the one shown in Figure 25. Figure 25: Classifier - Prediction Results (click image to enlarge)
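The prediction step can also be sketched in R syntax, assuming a loaded classifier, the matching text embedding model tem, and a table new_texts with the columns id and text (all object names are hypothetical):

# Sketch only: classifying new texts outside the GUI
new_embeddings <- tem$embed(
  raw_text = new_texts$text,
  doc_id = new_texts$id
)
# predict() returns the assigned classes together with certainty estimates
predictions <- classifier$predict(newdata = new_embeddings)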
","code":""},{"path":"/articles/gui_aife_studio.html","id":"documenting-a-classifier","dir":"Articles","previous_headings":"3 Using Aifeducation Studio > 3.4 Classifiers","what":"3.4.3 Documenting a Classifier","title":"02a Using Aifeducation Studio","text":"Documenting a classifier is similar to documenting a text embedding model (section 3.3.3, see Figure 18). To support developers in documenting their work, Aifeducation Studio provides an easy way to add a comprehensive description. You will find this part of the app by clicking on “Document” in the tab Classification. First, choose the classifier you would like to document. After choosing the model, a new box appears, allowing you to insert the necessary information. Via the tab Developers, you can provide the names and email addresses of the relevant contributors. In the tabs Abstract and Description, you can provide an abstract and a detailed description of your work in English and/or in the native language of the classifier (e.g., French, German, etc.), allowing you to reach a broader audience. In all four tabs you can provide the documentation as plain text, html, and/or markdown, allowing you to insert tables and to highlight parts of the documentation. If you would like to see how the documentation will look on the internet, you can click the button “Preview”. Saving the changes is possible by clicking on Save. For more information on how to document a model, please refer to the vignette 03 Sharing and Using Trained AI/Models.","code":""},{"path":"/articles/gui_aife_studio.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"02a Using Aifeducation Studio","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://doi.org/10.48550/arXiv.2004.05150 Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education, 1–21. https://doi.org/10.3389/feduc.2022.818365 Campesato, O. (2021). Natural Language Processing Fundamentals for Developers. Mercury Learning & Information. https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=6647713 Cascante-Bonilla, P., Tan, F., Qi, Y., & Ordonez, V. (2020). Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning. https://doi.org/10.48550/arXiv.2001.06001 Chollet, F., Kalinowski, T., & Allaire, J. J. (2022). Deep Learning with R (Second edition). Manning Publications Co. https://learning.oreilly.com/library/view/-/9781633439849/?ar Dai, Z., Lai, G., Yang, Y., & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. https://doi.org/10.48550/arXiv.2006.03236 Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423 He, P., Liu, X., Gao, J., & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654 Lane, H., Howard, C., & Hapke, H. M. (2019). Natural language processing in action: Understanding, analyzing, and generating text with Python. Shelter Island: Manning. Larusson, J. A., & White, B. (Eds.). (2014). Learning Analytics: From Research to Practice. New York: Springer. https://doi.org/10.1007/978-1-4614-3305-7 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692 Schreier, M. (2012). Qualitative Content Analysis in Practice. Los Angeles: SAGE.","code":""},{"path":"/articles/sharing_and_publishing.html","id":"introduction","dir":"Articles","previous_headings":"","what":"1 Introduction","title":"03 Sharing and Using Trained AI/Models","text":"In the educational and social sciences, it is common practice to share research instruments such as questionnaires or tests. For example, the Open Test Archive provides researchers and practitioners access to a large number of open access instruments. aifeducation assumes that AI-based classifiers should be shareable in a similar way to research instruments, to empower educational and social science researchers and to support the application of AI for educational purposes. Thus, aifeducation aims to make the sharing process as easy as possible. For this aim, every object generated with aifeducation can be prepared for publication in a few basic steps. In this vignette, we would like to show you how to make your AI ready for publication and how to use models created by other persons. We start this guide with the preparation of text embedding models.","code":""},{"path":[]},{"path":"/articles/sharing_and_publishing.html","id":"adding-model-descriptions","dir":"Articles","previous_headings":"2 Text Embedding Models","what":"2.1 Adding Model Descriptions","title":"03 Sharing and Using Trained AI/Models","text":"Every object of class TextEmbeddingModel comes with several methods allowing you to provide important information for potential users of your model. First, every model needs a clear description of how it was developed, how it was modified, and how it can be used. You can add this description via the method set_model_description. The method allows you to provide a description in English and in the native language of your model to make the distribution of your model easier. You can write the description in HTML, which allows you to add links to other sources or publications, to add tables, and to highlight important aspects of your model. We recommend writing at least an English description to allow a wider community to recognize your work. Furthermore, the description should include: Which kind of data was used to create the model. How much data was used to create the model. Which steps were performed and which method was used. For which kinds of tasks or materials the model can be used. With abstract_eng and abstract_native you can provide a summary of the description. This is important if you would like to share your work on a repository.
With keywords_eng and keywords_native you can set a vector of keywords which help others to find your work with search engines. We recommend providing this information at least in English. You can access a model’s description with the method get_model_description. Besides the description of your work it is necessary to provide information on the people who were involved in creating the model. This can be done with the method set_publication_info. First you have to decide which type of information you would like to add. There are two choices, “developer” and “modifier”, which you set with type. type=\"developer\" stores all information about the people who were involved in the process of developing the model. If you use a transformer model from Hugging Face, the people who provide the description of that model should be entered as the developers. In other cases you can use this type for describing how you developed the model. In some cases you might wish to modify an existing model. This might be the case if you use a transformer model and adapt it to a specific domain or task. In this case you rely on the work of other people and modify their work. You can describe these modifications by setting type=\"modifier\". For every type of person you can add the relevant individuals via authors. Please use R’s function personList() for this. With citation you can provide a free text describing how to cite the work of the different persons. With url you can provide a link to relevant sites of the model. You can access this information with get_publication_info. Finally, you must provide the license for using the model. This can be done with set_software_license and get_software_license. Please note that in most cases the license for the model must be “GPL-3” since the software used to create the model is licensed under “GPL-3”; derivative work must therefore also be licensed under “GPL-3”. The documentation of your work is not part of the software. For it you can use licenses such as Creative Commons (CC) or the Free Documentation License (FDL). You can set the license for the documentation with the method set_documentation_license. Now you are able to share your work. Please remember to save the now fully described object as described in the following section 2.2.","code":"example_model$set_model_description( eng=NULL, native=NULL, abstract_eng=NULL, abstract_native=NULL, keywords_eng=NULL, keywords_native=NULL) example_model$get_model_description() example_model$set_publication_info( type, authors, citation, url=NULL) example_model$get_publication_info() example_model$set_software_license(\"GPL-3\") example_model$set_documentation_license(\"CC BY-SA\")
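#A filled-in call to set_publication_info might look like the following
#sketch; the name, email, citation, and URL are purely hypothetical
#placeholders.
example_model$set_publication_info(
  type=\"developer\",
  authors=personList(
    person(given=\"Jane\", family=\"Doe\", email=\"jane.doe@example.org\")),
  citation=\"Doe, J. (2024). Example text embedding model.\",
  url=\"https://example.org/model\")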
"},{"path":"/articles/sharing_and_publishing.html","id":"saving-and-loading","dir":"Articles","previous_headings":"2 Text Embedding Models","what":"2.2 Saving and Loading","title":"03 Sharing and Using Trained AI/Models","text":"Saving a created text embedding model is very easy with the function save_ai_model. This function provides a unique interface for all text embedding models. For saving your work you pass your model to model and the directory where to save the model to model_dir. Please pass the path of a directory, not the path of a file, to the function. Internally, the function creates a new folder in the directory in which all files belonging to the model are stored. In the code below you can see that the three text embedding models are saved within the directory named “text_embedding_models”. Within this directory the function creates a unique folder for every model. The name of this folder is specified with dir_name. If you set dir_name=NULL and append_ID=FALSE, the name of the folder is created by using the model’s name. If you change the argument append_ID to append_ID=TRUE and set dir_name=NULL, the unique ID of the model is appended to the directory name. The ID is added automatically during a model’s creation to ensure that every model has a unique name; this is important if you would like to share your work with other persons. If you want to load a model, just call the function load_ai_model and you can continue using your model. The following code assumes that you specified dir_name manually. In case you set dir_name=NULL and append_ID=TRUE, loading the models may look as shown in the second code snippet. Please note that you have to add the name of the model to the directory path. In the example we stored three models in the directory “text_embedding_models”. Every model is saved within its own folder whose name was created automatically with the help of the model’s name. Thus, for loading a model you must specify which model you want to load by adding the model’s name to the directory path as shown above. At this point you may wonder why there is an ID in the model’s name although you did not enter an ID during the model’s creation. As mentioned, the ID is added automatically to ensure that every model has a unique name, which matters as soon as you share your work with other persons. During saving, the ID is appended automatically if you set append_ID=TRUE. Now you are ready to share your work. Just provide all files within the model folder. For the BERT model of the example this would be the folder \"text_embedding_models/model_transformer_bert\" or \"text_embedding_models/bert_embedding_ID_CmyAQKtts5RdlLaS\", depending on how you saved the model.","code":"save_ai_model( model=topic_modeling, model_dir=\"text_embedding_models\", append_ID=FALSE) save_ai_model( model=global_vector_clusters_modeling, model_dir=\"text_embedding_models\", append_ID=FALSE) save_ai_model( model=bert_modeling, model_dir=\"text_embedding_models\", append_ID=FALSE) topic_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_topic_modeling\", ml_framework=aifeducation_config$get_framework()) global_vector_clusters_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_global_vectors\", ml_framework=aifeducation_config$get_framework()) bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/model_transformer_bert\", ml_framework=aifeducation_config$get_framework()) topic_modeling<-load_ai_model( model_dir=\"text_embedding_models/topic_model_embedding_ID_DfO25E1Guuaqw7tM\") global_vector_clusters_modeling<-load_ai_model( model_dir=\"text_embedding_models/global_vector_clusters_embedding_ID_5Tu8HFHegIuoW14l\") bert_modeling<-load_ai_model( model_dir=\"text_embedding_models/bert_embedding_ID_CmyAQKtts5RdlLaS\")"},{"path":[]},{"path":"/articles/sharing_and_publishing.html","id":"adding-model-descriptions-1","dir":"Articles","previous_headings":"3 Classifiers","what":"3.1 Adding Model Descriptions","title":"03 Sharing and Using Trained AI/Models","text":"Adding a model description to a classifier is similar to adding one to a TextEmbeddingModel. With the methods set_model_description and get_model_description you can provide a detailed description (parameters eng and native) of your classifier in English and in the native language of the classifier. With abstract_eng and abstract_native you can provide the corresponding abstracts, while keywords_eng and keywords_native take a vector of the corresponding keywords. In the case of a classifier, the description should include: A short reference to the theoretical models that guided the development. A clear and detailed description of every single category/class. A short statement of where the classifier can be used. A description of the kind and quantity of data used for training. Information on potential bias in the data. If possible, information about the inter-coder reliability of the coding process that provided the data. If possible, a link to the corresponding text embedding model. Here again, you can provide the description in HTML, which allows you to include tables (e.g., for reporting the reliability of the initial coding process) and links to other sources or publications. Please also report the performance values of the classifier within the description. These can be accessed directly via example_classifier$reliability$test_metric_mean. With the methods set_publication_info and get_publication_info you can provide the bibliographic information of your classifier. In contrast to TextEmbeddingModels there are no different types of author groups. Finally, you can manage the license for using the classifier via set_software_license and get_software_license. Similar to TextEmbeddingModels, the classifier has to be licensed via “GPL-3” since the software used for creating the classifier applies this license. For the documentation you can choose another license since the documentation is not part of the software. For setting and receiving this license you can call the methods set_documentation_license and get_documentation_license. Now you are ready to share your classifier.
Please remember to save the changes as described in the following section 3.2.","code":"example_classifier$set_model_description( eng=\"This classifier targets the realization of the need for competence from the self-determination theory of motivation by Deci and Ryan in lesson plans and materials. It describes a learner’s need to perceive themselves as capable. In this classifier, the need for competence can take on the values 0 to 2. A value of 0 indicates that the learners have no space in the lesson plan to perceive their own learning progress and that there is no possibility for self-comparison. At level 1, competence growth is made visible implicitly, e.g. by demonstrating the ability to carry out complex exercises or peer control. At level 2, the increase in competence is made explicit by giving each learner insights into their progress towards the competence goal. For example, a comparison between the target vs. actual development towards the learning objectives of the lesson can be made, or the learners receive explicit feedback on their competence growth from the teacher. Self-assessment is also possible. The classifier was trained using 790 lesson plans, 298 materials and up to 1,400 textbook tasks. Two people who received coding training were involved in the coding and the inter-coder reliability for the need for competence increased from a dynamic iota value of 0.615 to 0.646 over two rounds of training. The Krippendorffs alpha value, on the other hand, decreased from 0.516 to 0.484. The classifier is suitable for use in all settings in which lesson plans and materials are to be reviewed with regard to their implementation of the need for competence.\", native=\"Dieser Classifier bewertet Unterrichtsentwürfe und Lernmaterial danach, ob sie das Bedürfnis nach Kompetenzerleben aus der Selbstbestimmungstheorie der Motivation nach Deci und Ryan unterstützen. Das Kompetenzerleben stellt das Bedürfnis dar, sich als wirksam zu erleben. Der Classifier unterteilt es in drei Stufen, wobei 0 bedeutet, dass die Lernenden im Unterrichtsentwurf bzw. Material keinen Raum haben, ihren eigenen Lernfortschritt wahrzunehmen und auch keine Möglichkeit zum Selbstvergleich besteht. Bei einer Ausprägung von 1 wird der Kompetenzzuwachs implizit, also z.B. durch die Durchführung komplexer Übungen oder einer Peer-Kontrolle ermöglicht. Auf Stufe 2 wird der Kompetenzzuwachs explizit aufgezeigt, indem jede:r Lernende einen objektiven Einblick erhält. So kann hier bspw. ein Soll-Ist-Vergleich mit den Lernzielen der Stunde erfolgen oder die Lernenden erhalten dezidiertes Feedback zu ihrem Kompetenzzuwachs durch die Lehrkraft. Auch eine Selbstbewertung ist möglich. Der Classifier wurde anhand von 790 Unterrichtsentwürfen, 298 Materialien und bis zu 1400 Schulbuchaufgaben trainiert. Es waren an der Kodierung zwei Personen beteiligt, die eine Kodierschulung erhalten haben, und die Inter-Coder-Reliabilität für das Kompetenzerleben wurde über zwei Trainingsrunden von einem dynamischen Iota-Wert von 0,615 auf 0,646 gesteigert. Der Krippendorffs Alpha-Wert sank hingegen von 0,516 auf 0,484. Er eignet sich zum Einsatz in allen Settings, in denen Unterrichtsentwürfe und Lernmaterial hinsichtlich ihrer Umsetzung des Kompetenzerlebens überprüft werden sollen.\", abstract_eng=\"This classifier targets the realization of the need for competence from Deci and Ryan’s self-determination theory of motivation in lesson plans and materials. It describes a learner’s need to perceive themselves as capable. 
The variable need for competence is assessed by a scale of 0-2. The classifier was developed using 790 lesson plans, 298 materials and up to 1,400 textbook tasks. A coding training was conducted and the inter-coder reliabilities of different measures (i.e. Krippendorff’s Alpha and Dynamic Iota Index) of the individual categories were calculated at different points in time.\", abstract_native=\"Dieser Classifier bewertet Unterrichtsentwürfe und Lernmaterial danach, ob sie das Bedürfnis nach Kompetenzerleben aus der Selbstbestimmungstheorie der Motivation nach Deci & Ryan unterstützen. Das Kompetenzerleben stellt das Bedürfnis dar, sich als wirksam zu erleben. Der Classifier unterteilt es in drei Stufen und wurde anhand von 790 Unterrichtsentwürfen, 298 Materialien und bis zu 1400 Schulbuchaufgaben entwickelt. Es wurden stets Kodierschulungen durchgeführt und die Inter-Coder-Reliabilitäten der einzelnen Kategorien zu verschiedenen Zeitpunkten berechnet.\", keywords_eng=c(\"Self-determination theory\", \"motivation\", \"lesson planning\", \"business didactics\"), keywords_native=c(\"Selbstbestimmungstheorie\", \"Motivation\", \"Unterrichtsplanung\", \"Wirtschaftsdidaktik\")) example_classifier$set_publication_info( authors, citation, url=NULL) example_classifier$set_software_license(\"GPL-3\") example_classifier$set_documentation_license(\"CC BY-SA\")"},{"path":"/articles/sharing_and_publishing.html","id":"saving-and-loading-1","dir":"Articles","previous_headings":"3 Classifiers","what":"3.2 Saving and Loading","title":"03 Sharing and Using Trained AI/Models","text":"If you have created a classifier, saving and loading is easy thanks to the functions save_ai_model and load_ai_model. The process for saving a model is similar to the process for text embedding models. You pass the model and a directory path to the function save_ai_model. The folder name can be set with dir_name. In contrast to text embedding models you can specify the additional argument save_format. In the case of pytorch models, this argument allows you to choose between save_format = \"safetensors\" and save_format = \"pt\". We recommend choosing save_format = \"safetensors\" since this is the safer method for saving models. In the case of tensorflow models, this argument allows you to choose between save_format = \"keras\", save_format = \"tf\" and save_format = \"h5\". We recommend choosing save_format = \"keras\" since this is the recommended format for keras. If you set save_format = \"default\", .safetensors is used for pytorch models and .keras for tensorflow models. If you would like to load a model you can call the function load_ai_model. Note: Classifiers depend on the framework that was used during their creation. Thus, a classifier is always initialized with its original framework; the argument ml_framework has no effect. In case you would like to share your classifier with a broader audience, we recommend setting dir_name=NULL and append_ID = TRUE. This creates the folder name automatically by using the classifier’s name and its unique ID. Similar to text embedding models, the ID was added to the name during the creation of the classifier to ensure a unique name for your model. With these options the folder name may look like \"movie_review_classifier_ID_oWsaNEB7b09A1pPB\". If you would like to share your classifier with other persons, provide all files within the folder \"classifiers/movie_review_classifier_ID_oWsaNEB7b09A1pPB\". Since the files are stored with a specific structure, do not change or edit the files manually. Please note that you need the TextEmbeddingModel that was used during training in order to predict new data with your classifier. You can request the name, label, and configuration of this model with example_classifier$get_text_embedding_model()$model. Thus, if you would like to share your classifier, ensure that you also share the corresponding text embedding model. If you would like to apply your classifier to new data, two steps are necessary. First, you must transform the raw texts into a numerical expression by using exactly the same text embedding model that was used for training the classifier.
The resulting object can then be passed to the method predict and you receive the predictions together with an estimate of certainty for each class/category. More information can be found in the vignettes 02a Using Aifeducation Studio and 02b Classification Tasks.","code":"save_ai_model( model=classifier, model_dir=\"classifiers\", dir_name=\"movie_review_classifier\", save_format = \"default\", append_ID = FALSE) classifier<-load_ai_model( model_dir=\"classifiers/movie_review_classifier\")"},{"path":"/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Berding Florian. Author, maintainer. Pargmann Julia. Contributor. Riebenbauer Elisabeth. Contributor. Rebmann Karin. Contributor. Slopinski Andreas. Contributor.","code":""},{"path":"/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Florian Berding, Julia Pargmann, Elisabeth Riebenbauer, Karin Rebmann, Andreas Slopinski (2023). AI for Education (aifeducation). An R package for educators and researchers of the educational and social sciences. URL=https://fberding.github.io/aifeducation/index.html","code":"@Manual{, title = {AI for Education (aifeducation). An R package for educators and researchers of the educational and social sciences.}, author = {Florian Berding and Julia Pargmann and Elisabeth Riebenbauer and Karin Rebmann and Andreas Slopinski}, year = {2023}, url = {https://fberding.github.io/aifeducation/index.html}, }"},{"path":"/index.html","id":"aifeducation","dir":"","previous_headings":"","what":"Artificial Intelligence for Education","title":"Artificial Intelligence for Education","text":"The R package Artificial Intelligence for Education (aifeducation) is designed for the special requirements of educators, educational researchers, and social researchers. The target audience of this package are educators and researchers with coding skills who would like to develop their own models, as well as people who would like to use models created by other researchers/educators. The package supports the application of Artificial Intelligence (AI) for Natural Language Processing tasks such as text embedding and classification under the special conditions of the educational and social sciences.","code":""},{"path":"/index.html","id":"features-overview","dir":"","previous_headings":"","what":"Features Overview","title":"Artificial Intelligence for Education","text":"Simple usage of artificial intelligence by providing routines for the most important tasks for educators and researchers from the social and educational sciences. Provides a graphical user interface (Aifeducation Studio), allowing people to work with AI without coding skills. Supports ‘PyTorch’ and ‘Tensorflow’ as machine learning frameworks. Implements the advantages of the python library ‘datasets’, increasing computational speed and allowing the use of large datasets. Uses safetensors for saving models in ‘PyTorch’. Supports the usage of trained models within both frameworks, providing a high level of flexibility. Supports pre-trained language models from Hugging Face. Supports BERT, RoBERTa, DeBERTa, Longformer, and Funnel Transformer for creating context-sensitive text embeddings. Makes sharing pre-trained models very easy. Integrates sustainability tracking. Integrates special statistical techniques for dealing with data structures common in the social and educational sciences. Supports the classification of long text documents. Currently, the package focuses on classification tasks which can either be used to diagnose characteristics of learners from written material or to estimate the properties of learning and teaching material.
In the future, more tasks will be implemented.","code":""},{"path":"/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Artificial Intelligence for Education","text":"You can install the latest stable version of the package from CRAN. Alternatively, you can install the development version of aifeducation from GitHub. The minimal version includes only the functions for a limited use of transformers. The full version additionally includes Aifeducation Studio (the graphical user interface) and the older approaches (GlobalVectors, Topic Modeling). Further instructions for installation can be found in the vignette 01 Get Started. Please note that an update of your version of aifeducation may require an update of your python libraries. Refer to 01 Get Started for details.","code":"#Minimal version install.packages(\"aifeducation\") #Full version install.packages(\"aifeducation\",dependencies=TRUE) #Minimal version install.packages(\"devtools\") devtools::install_github(repo=\"FBerding/aifeducation\", ref=\"master\", dependencies = \"Imports\") #Full version install.packages(\"devtools\") devtools::install_github(repo=\"FBerding/aifeducation\", ref=\"master\", dependencies = TRUE)"},{"path":"/index.html","id":"graphical-user-interface-aifeducation-studio","dir":"","previous_headings":"","what":"Graphical User Interface Aifeducation Studio","title":"Artificial Intelligence for Education","text":"The package ships with a shiny app that serves as a graphical user interface. Figure 1: Aifeducation Studio. Aifeducation Studio allows users to easily develop, train, apply, document, and analyse AI models without any coding skills. See the corresponding vignette for details: 02a Using Aifeducation Studio.","code":""},{"path":"/index.html","id":"sustainability","dir":"","previous_headings":"","what":"Sustainability","title":"Artificial Intelligence for Education","text":"Training AI models consumes time and energy. To help researchers estimate the ecological impact of their work, a sustainability tracker is implemented. It is based on the python library ‘codecarbon’ by Courty et al. (2023). This tracker allows to estimate the energy consumption of CPUs, GPUs and RAM during training and derives a value for the CO2 emissions from it. This value is based on the energy mix of the country where the computer is located.","code":""},{"path":"/index.html","id":"pytorch-and-tensorflow-compatibility","dir":"","previous_headings":"","what":"PyTorch and Tensorflow Compatibility","title":"Artificial Intelligence for Education","text":"The package allows all supported models to be based either on ‘PyTorch’ or ‘tensorflow’, thus providing a high level of flexibility. Even pre-trained models can be used with both frameworks in most cases. The following table provides details: Table: Framework compatibility. Please note that tensorflow is currently supported in the following versions: 2.13-2.15.","code":""},{"path":[]},{"path":"/index.html","id":"transforming-texts-into-numbers","dir":"","previous_headings":"Classification Tasks","what":"Transforming Texts into Numbers","title":"Artificial Intelligence for Education","text":"Classification tasks require the transformation of raw texts into a representation with numbers. For this step, aifeducation supports newer approaches such as BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), DeBERTa version 2 (He et al. 2020), Funnel-Transformer (Dai et al. 2020), and Longformer (Beltagy, Peters & Cohan 2020), as well as older approaches such as GlobalVectors (Pennington, Socher & Manning 2014) and Latent Dirichlet Allocation/Topic Modeling for classification tasks. aifeducation supports the use of pre-trained transformer models provided by Hugging Face and the creation of new transformers, allowing educators and researchers to develop specialized and domain-specific models. The package supports the analysis of long texts.
Depending on the method, long texts are transformed into vectors at once or, if they are too long, are split into several chunks which results in a sequence of vectors.","code":""},{"path":"/index.html","id":"training-ai-under-challenging-conditions","dir":"","previous_headings":"Classification Tasks","what":"Training AI under Challenging Conditions","title":"Artificial Intelligence for Education","text":"In the second step within a classification task, aifeducation integrates important statistical and mathematical methods for dealing with the main challenges of the educational and social sciences when applying AI. These are: digital data availability: In the educational and social sciences, data is often only available in handwritten form. For example, in schools and universities, students often solve tasks by creating handwritten documents. Thus, educators and researchers first have to transform analogue data into a digital form, involving human action. This makes data generation financially expensive and time-consuming, leading to small data sets. high privacy policy standards: Furthermore, in the educational and social sciences, data often refers to humans and/or their actions. These kinds of data are protected by privacy policies in many countries, limiting access to and usage of data, which also results in small data sets. long research tradition: The educational and social sciences have a long research tradition in generating insights into social phenomena as well as learning and teaching. These insights have to be incorporated into applications of AI (e.g., Luan et al. 2020; Wong et al. 2019). This makes supervised machine learning an important technology since it provides a link between educational and social theories and models on the one hand and machine learning on the other hand (Berding et al. 2022). However, this kind of machine learning requires humans to generate a valid data set for the training process, again leading to small data sets. complex constructs: Compared to classification tasks where, for instance, AI has to differentiate between a ‘good’ and a ‘bad’ movie review, the constructs of the educational and social sciences are more complex. For example, some research instruments of motivational psychology require to infer personal motifs from written essays (e.g., Gruber & Kreuzpointner 2013). A reliable and valid interpretation of this kind of information requires well qualified human raters, making data generation expensive. This also limits the size of a data set. imbalanced data: Finally, data in the educational and social sciences often occurs in an imbalanced pattern as several empirical studies show (Bloemen 2011; Stütz et al. 2022). Imbalanced means that some categories or characteristics of a data set have very high absolute frequencies compared to other categories and characteristics. Imbalance during AI training guides algorithms to focus on and prioritize the categories and characteristics with high absolute frequencies, increasing the risk of missing categories/characteristics with low frequencies (Haixiang et al. 2017). This can lead AI to prefer special groups of people/material, to imply false recommendations and conclusions, or to miss rare categories or characteristics. In order to deal with the problem of imbalanced data sets, the package integrates the Synthetic Minority Oversampling Technique into the learning process. Currently, the Basic Synthetic Minority Oversampling Technique (Chawla et al. 2002), the Density-Based Synthetic Minority Oversampling Technique (Bunkhumpornpat, Sinapiromsaran & Lursinsap 2012), and the Adaptive Synthetic Sampling Approach for Imbalanced Learning (He, Garcia & Li 2008) are implemented via the R package smotefamily.
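Independent of the package's internal training loop, the basic technique can be illustrated with smotefamily directly. This is only a sketch with made-up data:

# Sketch only: basic SMOTE with made-up data via the smotefamily package;
# aifeducation applies these techniques internally during training.
library(smotefamily)
features <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
labels <- c(rep('frequent', 90), rep('rare', 10))
balanced <- SMOTE(X = features, target = labels, K = 5)
table(balanced$data$class) # class frequencies after oversampling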
In order to address the problem of small data sets, the training loops of AI integrate pseudo-labeling (e.g., Lee 2013). Pseudo-labeling is a technique which can be used for supervised learning. More specifically, educators and researchers rate a part of a data set and train AI with this very part. The remainder of the data is not processed by humans. Instead, AI uses this part of the data to learn on its own. Thus, educators and researchers only have to provide additional data for the AI’s learning process without coding it themselves. This offers the possibility to add more data to the training process and to reduce labor costs.","code":""},{"path":"/index.html","id":"evaluating-performance","dir":"","previous_headings":"Classification Tasks","what":"Evaluating Performance","title":"Artificial Intelligence for Education","text":"Classification tasks in machine learning are comparable to the empirical method of content analysis from the social sciences. This method looks back on a long research tradition and an ongoing discussion on how to evaluate the reliability and validity of generated data. In order to provide a link to this research tradition and to provide educators as well as educational and social researchers with performance measures they are familiar with, every AI trained with this package is evaluated with the following measures and concepts: Iota Concept of the Second Generation (Berding & Pargmann 2022), Krippendorff’s Alpha (Krippendorff 2019), Percentage Agreement, Gwet’s AC1/AC2 (Gwet 2014), Kendall’s coefficient of concordance W, Cohen’s Kappa unweighted, Cohen’s Kappa with equal weights, Cohen’s Kappa with squared weights, Fleiss’ Kappa for multiple raters without exact estimation. In addition, some traditional measures from the machine learning literature are also available: Precision, Recall, F1-Score","code":""},{"path":"/index.html","id":"sharing-trained-ai","dir":"","previous_headings":"","what":"Sharing Trained AI","title":"Artificial Intelligence for Education","text":"Since the package is based on torch, tensorflow, and the transformer libraries, every trained AI can be shared with other educators and researchers. The package supports an easy use of pre-trained AI within R, but also provides the possibility to export trained AI to other environments. Using a pre-trained AI for classification only requires the classifier and the corresponding text embedding model. With Aifeducation Studio, just load both into R and start the predictions. Vignette 02a Using Aifeducation Studio describes how to use the user interface. Vignette 02b Classification Tasks describes how to save and load the objects with R syntax. In vignette 03 Sharing and Using Trained AI/Models you can find a detailed guide on how to document and share your models.","code":""},{"path":"/index.html","id":"tutorial-and-guides","dir":"","previous_headings":"","what":"Tutorial and Guides","title":"Artificial Intelligence for Education","text":"Installation and configuration of the package: 01 Get Started. Introduction to the graphical user interface Aifeducation Studio: 02a Using Aifeducation Studio. A short introduction to the package with examples for classification tasks: 02b Classification Tasks. A description of how to share models: 03 Sharing and Using Trained AI/Models","code":""},{"path":"/index.html","id":"references","dir":"","previous_headings":"","what":"References","title":"Artificial Intelligence for Education","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://doi.org/10.48550/arXiv.2004.05150 Berding, F., & Pargmann, J. (2022). Iota Reliability Concept of the Second Generation. Berlin: Logos. https://doi.org/10.30819/5581 Berding, F., Riebenbauer, E., Stütz, S., Jahncke, H., Slopinski, A., & Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings: Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education, 1-21. https://doi.org/10.3389/feduc.2022.818365 Bloemen, A. (2011). Lernaufgaben in Schulbüchern der Wirtschaftslehre: Analyse, Konstruktion und Evaluation von Lernaufgaben für die Lernfelder industrieller Geschäftsprozesse. Hampp. Bunkhumpornpat, C., Sinapiromsaran, K., & Lursinsap, C. (2012). DBSMOTE: Density-Based Synthetic Minority Over-sampling Technique. Applied Intelligence, 36(3), 664–684. https://doi.org/10.1007/s10489-011-0287-y Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. 
Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953 Courty, B., Schmidt, V., Goyal-Kamal, Coutarel, M., Feld, B., Lecourt, J., & … (2023). mlco2/codecarbon: v2.2.7. https://doi.org/10.5281/zenodo.8181237 Dai, Z., Lai, G., Yang, Y., & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. https://doi.org/10.48550/arXiv.2006.03236 Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423 Gruber, N., & Kreuzpointner, L. (2013). Measuring the reliability of picture story exercises like the TAT. PloS One, 8(11), e79450. https://doi.org/10.1371/journal.pone.0079450 Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (Fourth edition). STATAXIS. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220–239. https://doi.org/10.1016/j.eswa.2016.12.035 He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (pp. 1322–1328). IEEE. https://doi.org/10.1109/IJCNN.2008.4633969 He, P., Liu, X., Gao, J., & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654 Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th Ed.). SAGE. Lee, D.‑H. (2013). Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. ICML 2013 Workshop: Challenges in Representation Learning. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692 Luan, H., Geczy, P., Lai, H., Gobert, J., Yang, S. J. H., Ogata, H., Baltes, J., Guerra, R., Li, P., & Tsai, C.‑C. (2020). Challenges and Future Directions of Big Data and Artificial Intelligence in Education. Frontiers in Psychology, 11, 1–11. https://doi.org/10.3389/fpsyg.2020.580820 Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. https://aclanthology.org/D14-1162.pdf Stütz, S., Berding, F., Reincke, S., & Scheper, L. (2022). Characteristics of learning tasks in accounting textbooks: an AI assisted analysis. Empirical Research in Vocational Education and Training, 14(1). https://doi.org/10.1186/s40461-022-00138-2 Wong, J., Baars, M., Koning, B. B. de, van der Zee, T., Davis, D., Khalil, M., Houben, G.‑J., & Paas, F. (2019). Educational Theories and Learning Analytics: From Data to Knowledge. In D. Ifenthaler, D.-K. Mah, & J. Y.-K. Yau (Eds.), Utilizing Learning Analytics to Support Study Success (pp. 3–25). Springer. https://doi.org/10.1007/978-3-319-64792-0_1","code":""},{"path":"/reference/AifeducationConfiguration.html","id":null,"dir":"Reference","previous_headings":"","what":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","title":"R6 class for setting the global machine learning framework. 
— AifeducationConfiguration","text":"An R6 class for setting the global machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"This function has nothing to return. It is used for its side effects.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"An R6 class for setting the global machine learning framework to 'PyTorch' or 'tensorflow'.","code":""},{"path":[]},{"path":[]},{"path":"/reference/AifeducationConfiguration.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"AifeducationConfiguration$get_framework() AifeducationConfiguration$set_global_ml_backend() AifeducationConfiguration$global_framework_set() AifeducationConfiguration$clone()","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-get-framework-","dir":"Reference","previous_headings":"","what":"Method get_framework()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Method for requesting the currently used machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$get_framework()"},{"path":"/reference/AifeducationConfiguration.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Returns a string containing the machine learning framework used for TextEmbeddingModels as well as TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-set-global-ml-backend-","dir":"Reference","previous_headings":"","what":"Method set_global_ml_backend()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Method for setting the global machine learning framework.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$set_global_ml_backend(backend)"},{"path":"/reference/AifeducationConfiguration.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"backend: string. The framework to use for training and inference; backend=\"tensorflow\" for 'tensorflow' and backend=\"pytorch\" for 'PyTorch'.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"This method has nothing to return. 
used setting global configuration 'aifeducation'.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-global-framework-set-","dir":"Reference","previous_headings":"","what":"Method global_framework_set()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Method checking global ml framework set.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$global_framework_set()"},{"path":"/reference/AifeducationConfiguration.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"Return TRUE global machine learning framework set. Otherwise FALSE.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"objects class cloneable method.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"","code":"AifeducationConfiguration$clone(deep = FALSE)"},{"path":"/reference/AifeducationConfiguration.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"R6 class for setting the global machine learning framework. — AifeducationConfiguration","text":"deep Whether make deep clone.","code":""},{"path":"/reference/AifeducationConfiguration.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"R6 class for setting the global machine learning framework.
— AifeducationConfiguration","text":"","code":"library(aifeducation) #Example for setting the global machine learning framework #aifeducation_config is the object created during loading the package #For using Tensorflow aifeducation_config$set_global_ml_backend(\"tensorflow\") #> Global Backend set to: tensorflow #For using PyTorch aifeducation_config$set_global_ml_backend(\"pytorch\") #> Global Backend set to: pytorch #Example for requesting the global machine learning framework aifeducation_config$get_framework() #> $global_ml_framework #> [1] \"pytorch\" #> #> $TextEmbeddingFramework #> [1] \"pytorch\" #> #> $ClassifierFramework #> [1] \"pytorch\" #> #Example for checking if the global machine learning framework is set aifeducation_config$global_framework_set() #> [1] TRUE"},{"path":"/reference/aifeducation_config.html","id":null,"dir":"Reference","previous_headings":"","what":"R6 object of class AifeducationConfiguration — aifeducation_config","title":"R6 object of class AifeducationConfiguration — aifeducation_config","text":"Object managing setting machine learning framework session.","code":""},{"path":"/reference/aifeducation_config.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"R6 object of class AifeducationConfiguration — aifeducation_config","text":"","code":"aifeducation_config"},{"path":"/reference/aifeducation_config.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"R6 object of class AifeducationConfiguration — aifeducation_config","text":"object class aifeducationConfiguration (inherits R6) length 5.","code":""},{"path":[]},{"path":"/reference/array_to_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Array to matrix — array_to_matrix","title":"Array to matrix — array_to_matrix","text":"Function transforming array matrix.","code":""},{"path":"/reference/array_to_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Array to matrix — array_to_matrix","text":"","code":"array_to_matrix(text_embedding)"},{"path":"/reference/array_to_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Array to matrix — array_to_matrix","text":"text_embedding array containing text embedding. array created via object class TextEmbeddingModel.","code":""},{"path":"/reference/array_to_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Array to matrix — array_to_matrix","text":"Returns matrix contains cases rows columns represent features sequences.
sequences concatenated.","code":""},{"path":[]},{"path":"/reference/array_to_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Array to matrix — array_to_matrix","text":"","code":"#text embedding is an array of shape (batch,times,features) example_embedding<-c(1:24) example_embedding<-array(example_embedding,dim=c(4,3,2)) example_embedding #> , , 1 #> #> [,1] [,2] [,3] #> [1,] 1 5 9 #> [2,] 2 6 10 #> [3,] 3 7 11 #> [4,] 4 8 12 #> #> , , 2 #> #> [,1] [,2] [,3] #> [1,] 13 17 21 #> [2,] 14 18 22 #> [3,] 15 19 23 #> [4,] 16 20 24 #> #Transform array to a matrix #matrix has shape (batch,times*features) array_to_matrix(example_embedding) #> feat_1 feat_2 feat_3 feat_4 feat_5 feat_6 #> [1,] 1 13 5 17 9 21 #> [2,] 2 14 6 18 10 22 #> [3,] 3 15 7 19 11 23 #> [4,] 4 16 8 20 12 24"},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":null,"dir":"Reference","previous_headings":"","what":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"function prepares raw texts use TextEmbeddingModel.","code":""},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"","code":"bow_pp_create_basic_text_rep( data, vocab_draft, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, remove_separators = TRUE, split_hyphens = FALSE, split_tags = FALSE, language_stopwords = \"de\", use_lemmata = FALSE, to_lower = FALSE, min_termfreq = NULL, min_docfreq = NULL, max_docfreq = NULL, window = 5, weights = 1/(1:5), trace = TRUE )"},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"data vector containing raw texts. vocab_draft Object created bow_pp_create_vocab_draft. remove_punct bool TRUE punctuation removed. remove_symbols bool TRUE symbols removed. remove_numbers bool TRUE numbers removed. remove_url bool TRUE urls removed. remove_separators bool TRUE separators removed. split_hyphens bool TRUE hyphens split several tokens. split_tags bool TRUE tags split. language_stopwords string Abbreviation language stopwords removed. use_lemmata bool TRUE lemmas instead original tokens used. to_lower bool TRUE tokens lemmas used lower cases. min_termfreq int Minimum frequency token part vocabulary. min_docfreq int Minimum appearance token documents part vocabulary. max_docfreq int Maximum appearance token documents part vocabulary. window int size window creating feature-co-occurrence matrix. weights vector weights corresponding window. vector length must equal window size. trace bool TRUE information progress printed console.","code":""},{"path":"/reference/bow_pp_create_basic_text_rep.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Prepare texts for text embeddings with a bag of word approach. — bow_pp_create_basic_text_rep","text":"Returns list class basic_text_rep following components. dfm: Document-Feature-Matrix. Rows correspond documents. Columns represent number tokens document. fcm: Feature-Co-Occurrence-Matrix. information: list containing information used vocabulary. : n_sentence: Number sentences n_document_segments: Number document segments/raw texts n_token_init: Number initial tokens n_token_final: Number final tokens n_lemmata: Number lemmas configuration: list containing information vocabulary created lower cases vocabulary uses original tokens lemmas. language_model: list containing information applied language model. : model: udpipe language model label: label udpipe language model upos: applied universal part-of-speech tags language: language vocab: data.frame original vocabulary
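For orientation, a minimal sketch of a typical call follows (the vocab_draft object and the use of the bundled imdb_movie_reviews data are illustrative assumptions, not part of this reference):
#Sketch: build a basic text representation from raw texts
#basic_text_rep <- bow_pp_create_basic_text_rep(
#  data = imdb_movie_reviews$text,
#  vocab_draft = vocab_draft,
#  remove_punct = TRUE,
#  remove_numbers = TRUE,
#  language_stopwords = \"en\",
#  window = 5,
#  weights = 1/(1:5),
#  trace = TRUE)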
","code":""},{"path":[]},{"path":"/reference/bow_pp_create_vocab_draft.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"Function creating first draft vocabulary function creates list tokens refer specific universal part-of-speech tags (UPOS) provides corresponding lemmas.","code":""},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"","code":"bow_pp_create_vocab_draft( path_language_model, data, upos = c(\"NOUN\", \"ADJ\", \"VERB\"), label_language_model = NULL, language = NULL, chunk_size = 100, trace = TRUE )"},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"path_language_model string Path udpipe language model used tagging lemmatization. data vector containing raw texts. upos vector containing universal part-of-speech tags used build vocabulary. label_language_model string Label udpipe language model used. language string Name language (e.g., English, German) chunk_size int Number raw texts processed. trace bool TRUE information progress printed console.","code":""},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"list following components. vocab: data.frame containing tokens, lemmas, tokens lower case, lemmas lower case. ud_language_model udpipe language model used tagging. label_language_model Label udpipe language model. language Language raw texts. upos Used universal part-of-speech tags. n_sentence int Estimated number sentences raw texts. n_token int Estimated number tokens raw texts. n_document_segments int Estimated number document segments/raw texts.
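For orientation, a minimal sketch of a call (the udpipe model file and its label are placeholders; a model must be downloaded separately, and the use of imdb_movie_reviews is an illustrative assumption):
#vocab_draft <- bow_pp_create_vocab_draft(
#  path_language_model = \"english-ewt-ud-2.5.udpipe\",
#  data = imdb_movie_reviews$text,
#  upos = c(\"NOUN\", \"ADJ\", \"VERB\"),
#  label_language_model = \"english-ewt-ud-2.5\",
#  language = \"english\",
#  chunk_size = 100,
#  trace = TRUE)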
","code":""},{"path":"/reference/bow_pp_create_vocab_draft.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a first draft of a vocabulary\r\nThis function creates a list of tokens which refer to specific\r\nuniversal part-of-speech tags (UPOS) and provides the corresponding lemmas. — bow_pp_create_vocab_draft","text":"list possible tags can found : https://universaldependencies.org/u/pos/index.html. huge number models can found : https://ufal.mff.cuni.cz/udpipe/2/models.","code":""},{"path":[]},{"path":"/reference/calc_standard_classification_measures.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate standard classification measures — calc_standard_classification_measures","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"Function calculating recall, precision, f1.","code":""},{"path":"/reference/calc_standard_classification_measures.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"","code":"calc_standard_classification_measures(true_values, predicted_values)"},{"path":"/reference/calc_standard_classification_measures.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"true_values factor containing true labels/categories. predicted_values factor containing predicted labels/categories.","code":""},{"path":"/reference/calc_standard_classification_measures.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate standard classification measures — calc_standard_classification_measures","text":"Returns matrix contains cases categories rows measures (precision, recall, f1) columns.","code":""},{"path":[]},{"path":"/reference/check_aif_py_modules.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if all necessary python modules are available — check_aif_py_modules","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"function checks python modules necessary package aifeducation work available.","code":""},{"path":"/reference/check_aif_py_modules.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"","code":"check_aif_py_modules(trace = TRUE, check = \"all\")"},{"path":"/reference/check_aif_py_modules.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"trace bool TRUE list modules availability printed console. check string determining machine learning framework check . check=\"pytorch\" 'pytorch', check=\"tensorflow\" 'tensorflow', check=\"all\" frameworks.","code":""},{"path":"/reference/check_aif_py_modules.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if all necessary python modules are available — check_aif_py_modules","text":"function prints table relevant packages shows modules available unavailable. relevant modules available, function returns TRUE. Otherwise returns FALSE.
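For example, a typical check before starting to work with the package might look like this (the printed table depends on the local installation):
check_aif_py_modules(trace = TRUE, check = \"all\")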
","code":""},{"path":[]},{"path":"/reference/check_embedding_models.html","id":null,"dir":"Reference","previous_headings":"","what":"Check of compatible text embedding models — check_embedding_models","title":"Check of compatible text embedding models — check_embedding_models","text":"function checks different objects based text embedding model. necessary ensure classifiers used data generated compatible embedding models.","code":""},{"path":"/reference/check_embedding_models.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check of compatible text embedding models — check_embedding_models","text":"","code":"check_embedding_models(object_list, same_class = FALSE)"},{"path":"/reference/check_embedding_models.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check of compatible text embedding models — check_embedding_models","text":"object_list list object class EmbeddedText TextEmbeddingClassifierNeuralNet. same_class bool TRUE object must class.","code":""},{"path":"/reference/check_embedding_models.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check of compatible text embedding models — check_embedding_models","text":"Returns TRUE objects refer text embedding model. FALSE cases.","code":""},{"path":[]},{"path":"/reference/clean_pytorch_log_transformers.html","id":null,"dir":"Reference","previous_headings":"","what":"Clean pytorch log of transformers — clean_pytorch_log_transformers","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"Function preparing cleaning log created object class Trainer python library 'transformers'.","code":""},{"path":"/reference/clean_pytorch_log_transformers.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"","code":"clean_pytorch_log_transformers(log)"},{"path":"/reference/clean_pytorch_log_transformers.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"log data.frame containing log.","code":""},{"path":"/reference/clean_pytorch_log_transformers.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Clean pytorch log of transformers — clean_pytorch_log_transformers","text":"Returns data.frame containing epochs, loss, val_loss.","code":""},{"path":[]},{"path":"/reference/combine_embeddings.html","id":null,"dir":"Reference","previous_headings":"","what":"Combine embedded texts — combine_embeddings","title":"Combine embedded texts — combine_embeddings","text":"Function combining embedded texts model","code":""},{"path":"/reference/combine_embeddings.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Combine embedded texts — combine_embeddings","text":"","code":"combine_embeddings(embeddings_list)"},{"path":"/reference/combine_embeddings.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Combine embedded texts — combine_embeddings","text":"embeddings_list list objects class EmbeddedText.","code":""},{"path":"/reference/combine_embeddings.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Combine embedded texts — combine_embeddings","text":"Returns object class EmbeddedText contains unique cases input
objects.","code":""},{"path":[]},{"path":"/reference/create_bert_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on BERT — create_bert_model","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"function creates transformer configuration based BERT base architecture vocabulary based WordPiece using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_bert_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"","code":"create_bert_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, vocab_do_lower_case = FALSE, max_position_embeddings = 512, hidden_size = 768, num_hidden_layer = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_bert_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. vocab_do_lower_case bool TRUE words/tokens lower case. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function. hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_bert_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"function return object. 
Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_bert_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"train model, pass directory model function train_tune_bert_model. model uses WordPiece Tokenizer like BERT can trained whole word masking. Transformer library may show warning can ignored.","code":""},{"path":"/reference/create_bert_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on BERT — create_bert_model","text":"Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training Deep Bidirectional Transformers Language Understanding. J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings 2019 Conference North (pp. 4171–4186). Association Computational Linguistics. doi:10.18653/v1/N19-1423 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/bert#transformers.TFBertForMaskedLM","code":""},{"path":[]},{"path":"/reference/create_deberta_v2_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"function creates transformer configuration based DeBERTa-V2 base architecture vocabulary based SentencePiece tokenizer using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"","code":"create_deberta_v2_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 128100, do_lower_case = FALSE, max_position_embeddings = 512, hidden_size = 1536, num_hidden_layer = 24, num_attention_heads = 24, intermediate_size = 6144, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_deberta_v2_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. do_lower_case bool TRUE characters transformed lower case. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function. hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities.
sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"train model, pass directory model function train_tune_deberta_v2_model. model WordPiece tokenizer created. standard implementation DeBERTa version 2 HuggingFace uses SentencePiece tokenizer. Thus, please use AutoTokenizer 'transformers' library use model.","code":""},{"path":"/reference/create_deberta_v2_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on DeBERTa-V2 — create_deberta_v2_model","text":"He, P., Liu, X., Gao, J. & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT Disentangled Attention.
doi:10.48550/arXiv.2006.03654 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/deberta-v2#debertav2","code":""},{"path":[]},{"path":"/reference/create_funnel_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"function creates transformer configuration based Funnel Transformer base architecture vocabulary based WordPiece using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_funnel_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"","code":"create_funnel_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, vocab_do_lower_case = FALSE, max_position_embeddings = 512, hidden_size = 768, target_hidden_size = 64, block_sizes = c(4, 4, 4), num_attention_heads = 12, intermediate_size = 3072, num_decoder_layers = 2, pooling_type = \"mean\", hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, activation_dropout = 0, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_funnel_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. vocab_do_lower_case bool TRUE words/tokens lower case. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Initial number neurons layer. target_hidden_size int Number neurons final layer. parameter determines dimensionality resulting text embedding. block_sizes vector int determining number sizes block. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. num_decoder_layers int Number decoding layers. pooling_type string \"mean\" pooling mean \"max\" pooling maximum values. hidden_act string name activation function. hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. activation_dropout float Dropout probability layers feed-forward blocks. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). 
relevant pytorch models.","code":""},{"path":"/reference/create_funnel_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_funnel_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"model uses configuration truncate_seq=TRUE avoid implementation problems tensorflow. train model, pass directory model function train_tune_funnel_model. Model created separate_cls=TRUE, truncate_seq=TRUE, pool_q_only=TRUE. model uses WordPiece Tokenizer like BERT can trained whole word masking. Transformer library may show warning can ignored.","code":""},{"path":"/reference/create_funnel_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on Funnel Transformer — create_funnel_model","text":"Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering Sequential Redundancy Efficient Language Processing. doi:10.48550/arXiv.2006.03236 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/funnel#funnel-transformer","code":""},{"path":[]},{"path":"/reference/create_iota2_mean_object.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an iota2 object — create_iota2_mean_object","title":"Create an iota2 object — create_iota2_mean_object","text":"Function creates object class iotarelr_iota2 can used package iotarelr. function internal use .","code":""},{"path":"/reference/create_iota2_mean_object.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an iota2 object — create_iota2_mean_object","text":"","code":"create_iota2_mean_object( iota2_list, free_aem = FALSE, call = \"aifeducation::te_classifier_neuralnet\", original_cat_labels )"},{"path":"/reference/create_iota2_mean_object.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an iota2 object — create_iota2_mean_object","text":"iota2_list list objects class iotarelr_iota2. free_aem bool TRUE iota2 objects estimated without forcing assumption weak superiority. call string characterizing source estimation. , function within object estimated.
original_cat_labels vector containing original labels category.","code":""},{"path":"/reference/create_iota2_mean_object.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create an iota2 object — create_iota2_mean_object","text":"Returns object class iotarelr_iota2 mean iota2 object.","code":""},{"path":[]},{"path":"/reference/create_longformer_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on Longformer — create_longformer_model","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"function creates transformer configuration based Longformer base architecture vocabulary based Byte-Pair Encoding (BPE) tokenizer using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_longformer_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"","code":"create_longformer_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, add_prefix_space = FALSE, trim_offsets = TRUE, max_position_embeddings = 512, hidden_size = 768, num_hidden_layer = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, attention_window = 512, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_longformer_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. add_prefix_space bool TRUE additional space insert leading words. trim_offsets bool TRUE trims whitespaces produced offsets. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function. hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. attention_window int Size window around token attention mechanism every layer. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format.
FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_longformer_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_longformer_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"train model, pass directory model function train_tune_longformer_model.","code":""},{"path":"/reference/create_longformer_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on Longformer — create_longformer_model","text":"Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: Long-Document Transformer. doi:10.48550/arXiv.2004.05150 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/longformer#transformers.LongformerConfig","code":""},{"path":[]},{"path":"/reference/create_roberta_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for creating a new transformer based on RoBERTa — create_roberta_model","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"function creates transformer configuration based RoBERTa base architecture vocabulary based Byte-Pair Encoding (BPE) tokenizer using python libraries 'transformers' 'tokenizers'.","code":""},{"path":"/reference/create_roberta_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"","code":"create_roberta_model( ml_framework = aifeducation_config$get_framework(), model_dir, vocab_raw_texts = NULL, vocab_size = 30522, add_prefix_space = FALSE, trim_offsets = TRUE, max_position_embeddings = 512, hidden_size = 768, num_hidden_layer = 12, num_attention_heads = 12, intermediate_size = 3072, hidden_act = \"gelu\", hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, pytorch_safetensors = TRUE )"},{"path":"/reference/create_roberta_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. model_dir string Path directory model saved. vocab_raw_texts vector containing raw texts creating vocabulary. vocab_size int Size vocabulary. add_prefix_space bool TRUE additional space insert leading words. trim_offsets bool TRUE post processing trims offsets avoid including whitespaces. max_position_embeddings int Number maximal position embeddings. parameter also determines maximum length sequence can processed model. hidden_size int Number neurons layer. parameter determines dimensionality resulting text embedding. num_hidden_layer int Number hidden layers. num_attention_heads int Number attention heads. intermediate_size int Number neurons intermediate layer attention mechanism. hidden_act string name activation function.
hidden_dropout_prob double Ratio dropout. attention_probs_dropout_prob double Ratio dropout attention probabilities. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/create_roberta_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"function return object. Instead configuration vocabulary new model saved disk.","code":""},{"path":"/reference/create_roberta_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"train model, pass directory model function train_tune_roberta_model.","code":""},{"path":"/reference/create_roberta_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for creating a new transformer based on RoBERTa — create_roberta_model","text":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: Robustly Optimized BERT Pretraining Approach. doi:10.48550/arXiv.1907.11692 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaConfig","code":""},{"path":[]},{"path":"/reference/create_synthetic_units.html","id":null,"dir":"Reference","previous_headings":"","what":"Create synthetic units — create_synthetic_units","title":"Create synthetic units — create_synthetic_units","text":"Function creating synthetic cases order balance data training TextEmbeddingClassifierNeuralNet. auxiliary function use get_synthetic_cases allow parallel computations.","code":""},{"path":"/reference/create_synthetic_units.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create synthetic units — create_synthetic_units","text":"","code":"create_synthetic_units(embedding, target, k, max_k, method, cat, cat_freq)"},{"path":"/reference/create_synthetic_units.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create synthetic units — create_synthetic_units","text":"embedding Named data.frame containing text embeddings. cases object taken EmbeddedText$embeddings. target Named factor containing labels/categories corresponding cases. k int number nearest neighbors sampling process. max_k int maximum number nearest neighbors sampling process. method vector containing strings requested methods generating new cases. Currently \"smote\",\"dbsmote\", \"adas\" package smotefamily available. cat string category new cases created. 
cat_freq Object class \"table\" containing absolute frequencies every category/label.","code":""},{"path":"/reference/create_synthetic_units.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create synthetic units — create_synthetic_units","text":"Returns list contains text embeddings new synthetic cases named data.frame labels named factor.","code":""},{"path":[]},{"path":"/reference/EmbeddedText.html","id":null,"dir":"Reference","previous_headings":"","what":"Embedded text — EmbeddedText","title":"Embedded text — EmbeddedText","text":"Object class R6 stores text embeddings generated object class TextEmbeddingModel via method embed().","code":""},{"path":"/reference/EmbeddedText.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Embedded text — EmbeddedText","text":"Returns object class EmbeddedText. objects used storing managing text embeddings created objects class TextEmbeddingModel. Objects class EmbeddedText serve input classifiers class TextEmbeddingClassifierNeuralNet. main aim class provide structured link embedding models classifiers. Since objects class save information text embedding model created text embedding ensures embedding generated embedding model combined. Furthermore, stored information allows classifiers check embeddings correct text embedding model used training predicting.","code":""},{"path":[]},{"path":"/reference/EmbeddedText.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Embedded text — EmbeddedText","text":"embeddings ('data.frame()') data.frame containing text embeddings chunks. Documents rows. Embedding dimensions columns.","code":""},{"path":[]},{"path":"/reference/EmbeddedText.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Embedded text — EmbeddedText","text":"EmbeddedText$new() EmbeddedText$get_model_info() EmbeddedText$get_model_label() EmbeddedText$clone()","code":""},{"path":"/reference/EmbeddedText.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Embedded text — EmbeddedText","text":"Creates new object representing text embeddings.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$new( model_name = NA, model_label = NA, model_date = NA, model_method = NA, model_version = NA, model_language = NA, param_seq_length = NA, param_chunks = NULL, param_overlap = NULL, param_emb_layer_min = NULL, param_emb_layer_max = NULL, param_emb_pool_type = NULL, param_aggregation = NULL, embeddings )"},{"path":"/reference/EmbeddedText.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Embedded text — EmbeddedText","text":"model_name string Name model generates embedding. model_label string Label model generates embedding. model_date string Date embedding generating model created. model_method string Method underlying embedding model. model_version string Version model generated embedding. model_language string Language model generated embedding. param_seq_length int Maximum number tokens processes generating model chunk. param_chunks int Maximum number chunks supported generating model. param_overlap int Number tokens added beginning sequence next chunk model. param_emb_layer_min int string determining first layer included creation embeddings. 
param_emb_layer_max int string determining last layer included creation embeddings. param_emb_pool_type string determining method pooling token embeddings within layer. param_aggregation string Aggregation method hidden states. Deprecated. included backward compatibility. embeddings data.frame containing text embeddings.","code":""},{"path":"/reference/EmbeddedText.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"Embedded text — EmbeddedText","text":"Returns object class EmbeddedText stores text embeddings produced objects class TextEmbeddingModel. object serves input objects class TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/EmbeddedText.html","id":"method-get-model-info-","dir":"Reference","previous_headings":"","what":"Method get_model_info()","title":"Embedded text — EmbeddedText","text":"Method retrieving information model generated embedding.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$get_model_info()"},{"path":"/reference/EmbeddedText.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"Embedded text — EmbeddedText","text":"list contains saved information underlying text embedding model.","code":""},{"path":"/reference/EmbeddedText.html","id":"method-get-model-label-","dir":"Reference","previous_headings":"","what":"Method get_model_label()","title":"Embedded text — EmbeddedText","text":"Method retrieving label model generated embedding.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$get_model_label()"},{"path":"/reference/EmbeddedText.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"Embedded text — EmbeddedText","text":"string Label corresponding text embedding model","code":""},{"path":"/reference/EmbeddedText.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"Embedded text — EmbeddedText","text":"objects class cloneable method.","code":""},{"path":"/reference/EmbeddedText.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Embedded text — EmbeddedText","text":"","code":"EmbeddedText$clone(deep = FALSE)"},{"path":"/reference/EmbeddedText.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Embedded text — EmbeddedText","text":"deep Whether make deep clone.","code":""},{"path":"/reference/generate_id.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate ID suffix for objects — generate_id","title":"Generate ID suffix for objects — generate_id","text":"Function generating ID suffix objects class TextEmbeddingModel TextEmbeddingClassifierNeuralNet.","code":""},{"path":"/reference/generate_id.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate ID suffix for objects — generate_id","text":"","code":"generate_id(length = 16)"},{"path":"/reference/generate_id.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate ID suffix for objects — generate_id","text":"length int determining length id suffix.","code":""},{"path":"/reference/generate_id.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate 
ID suffix for objects — generate_id","text":"Returns string requested length.","code":""},{"path":[]},{"path":"/reference/get_coder_metrics.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate reliability measures based on content analysis — get_coder_metrics","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"function calculates different reliability measures based empirical research method content analysis.","code":""},{"path":"/reference/get_coder_metrics.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"","code":"get_coder_metrics( true_values = NULL, predicted_values = NULL, return_names_only = FALSE )"},{"path":"/reference/get_coder_metrics.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"true_values factor containing true labels/categories. predicted_values factor containing predicted labels/categories. return_names_only bool TRUE returns names resulting vector. Use FALSE request computation values.","code":""},{"path":"/reference/get_coder_metrics.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate reliability measures based on content analysis — get_coder_metrics","text":"return_names_only=FALSE returns vector following reliability measures: iota_index: Iota Index Iota Reliability Concept Version 2. min_iota2: Minimal Iota Iota Reliability Concept Version 2. avg_iota2: Average Iota Iota Reliability Concept Version 2. max_iota2: Maximum Iota Iota Reliability Concept Version 2. min_alpha: Minimal Alpha Reliability Iota Reliability Concept Version 2. avg_alpha: Average Alpha Reliability Iota Reliability Concept Version 2. max_alpha: Maximum Alpha Reliability Iota Reliability Concept Version 2. static_iota_index: Static Iota Index Iota Reliability Concept Version 2. dynamic_iota_index: Dynamic Iota Index Iota Reliability Concept Version 2. kalpha_nominal: Krippendorff's Alpha nominal variables. kalpha_ordinal: Krippendorff's Alpha ordinal variables. kendall: Kendall's coefficient concordance W. kappa2_unweighted: Cohen's Kappa unweighted. kappa2_equal_weighted: Weighted Cohen's Kappa equal weights. kappa2_squared_weighted: Weighted Cohen's Kappa squared weights. kappa_fleiss: Fleiss' Kappa multiple raters without exact estimation. percentage_agreement: Percentage Agreement. balanced_accuracy: Average accuracy within class. gwet_ac: Gwet's AC1/AC2 agreement coefficient. return_names_only=TRUE returns names vector elements.
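A short sketch of typical calls (the two factors are hypothetical illustrative ratings, not part of this reference):
#Request only the names of the measures
get_coder_metrics(return_names_only = TRUE)
#Compute the measures for two hypothetical ratings
#true_ratings <- factor(c(\"pos\", \"neg\", \"pos\", \"neg\", \"pos\"))
#predicted_ratings <- factor(c(\"pos\", \"neg\", \"neg\", \"neg\", \"pos\"))
#get_coder_metrics(true_values = true_ratings, predicted_values = predicted_ratings)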
","code":""},{"path":[]},{"path":"/reference/get_folds.html","id":null,"dir":"Reference","previous_headings":"","what":"Create cross-validation samples — get_folds","title":"Create cross-validation samples — get_folds","text":"Function creates cross-validation samples ensures relative frequency every category/label within fold equals relative frequency category/label within initial data.","code":""},{"path":"/reference/get_folds.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create cross-validation samples — get_folds","text":"","code":"get_folds(target, k_folds)"},{"path":"/reference/get_folds.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create cross-validation samples — get_folds","text":"target Named factor containing relevant labels/categories. Missing cases declared NA. k_folds int number folds.","code":""},{"path":"/reference/get_folds.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create cross-validation samples — get_folds","text":"Return list following components: val_sample: vector strings containing names cases validation sample. train_sample: vector strings containing names cases train sample. n_folds: int Number realized folds. unlabeled_cases: vector strings containing names unlabeled cases.","code":""},{"path":"/reference/get_folds.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create cross-validation samples — get_folds","text":"parameter target allows cases missing categories/labels. declared NA. cases ignored creating different folds. names saved within component unlabeled_cases. cases can used Pseudo Labeling. function checks absolute frequencies every category/label. absolute frequency sufficient ensure least four cases every fold, number folds adjusted. cases, warning printed console. least four cases per fold necessary ensure training TextEmbeddingClassifierNeuralNet works well options turned on.","code":""},{"path":[]},{"path":"/reference/get_n_chunks.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the number of chunks/sequences for each case — get_n_chunks","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"Function calculating number chunks/sequences every case","code":""},{"path":"/reference/get_n_chunks.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"","code":"get_n_chunks(text_embeddings, features, times)"},{"path":"/reference/get_n_chunks.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"text_embeddings data.frame array containing text embeddings. features int Number features within sequence.
times int Number sequences","code":""},{"path":"/reference/get_n_chunks.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"Named vector integers representing number chunks/sequences every case.","code":""},{"path":[]},{"path":"/reference/get_n_chunks.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the number of chunks/sequences for each case — get_n_chunks","text":"","code":"test_array<-array(data=c(1,1,1,0,0,0,0,0,0, 2,1,1,2,5,6,0,0,0, 1,2,5,6,1,2,0,4,2), dim=c(3,3,3)) test_array #> , , 1 #> #> [,1] [,2] [,3] #> [1,] 1 0 0 #> [2,] 1 0 0 #> [3,] 1 0 0 #> #> , , 2 #> #> [,1] [,2] [,3] #> [1,] 2 2 0 #> [2,] 1 5 0 #> [3,] 1 6 0 #> #> , , 3 #> #> [,1] [,2] [,3] #> [1,] 1 6 0 #> [2,] 2 1 4 #> [3,] 5 2 2 #> #test array has shape (batch,times,features) with #times=3 and features=3 #Slices where all values are zero are padded. get_n_chunks(text_embeddings=test_array,features=3,times=3) #> [1] 2 3 3 #The number of chunks of case 1 is 2, of case 2 is 3, and of case 3 is 3."},{"path":"/reference/get_stratified_train_test_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a stratified random sample — get_stratified_train_test_split","title":"Create a stratified random sample — get_stratified_train_test_split","text":"function creates stratified random sample. difference get_train_test_split: function does not require text embeddings does not split text embeddings train validation sample.","code":""},{"path":"/reference/get_stratified_train_test_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a stratified random sample — get_stratified_train_test_split","text":"","code":"get_stratified_train_test_split(targets, val_size = 0.25)"},{"path":"/reference/get_stratified_train_test_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a stratified random sample — get_stratified_train_test_split","text":"targets Named vector containing labels/categories case. val_size double Value 0 1 indicating many cases label/category part validation sample.","code":""},{"path":"/reference/get_stratified_train_test_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a stratified random sample — get_stratified_train_test_split","text":"list contains names cases belonging train sample validation sample.","code":""},{"path":[]},
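A minimal sketch with hypothetical labels:
#labels <- factor(c(\"pos\", \"neg\", \"pos\", \"neg\"))
#names(labels) <- c(\"case_1\", \"case_2\", \"case_3\", \"case_4\")
#split <- get_stratified_train_test_split(targets = labels, val_size = 0.25)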
times int number sequences/times. features int number features within sequence. target Named factor containing labels corresponding embeddings. method vector containing strings requested methods generating new cases. Currently \"smote\",\"dbsmote\", \"adas\" package smotefamily available. max_k int maximum number nearest neighbors sampling process.","code":""},{"path":"/reference/get_synthetic_cases.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create synthetic cases for balancing training data — get_synthetic_cases","text":"list following components. syntetic_embeddings: Named data.frame containing text embeddings synthetic cases. syntetic_targets Named factor containing labels corresponding synthetic cases. n_syntetic_units table showing number synthetic cases every label/category.","code":""},{"path":[]},{"path":"/reference/get_train_test_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for splitting data into a train and validation sample — get_train_test_split","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"function creates train validation sample based stratified random sampling. relative frequencies category train validation sample equal relative frequencies initial data (proportional stratified sampling).","code":""},{"path":"/reference/get_train_test_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"","code":"get_train_test_split(embedding = NULL, target, val_size)"},{"path":"/reference/get_train_test_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"embedding Object class EmbeddedText. target Named factor containing labels every case. val_size double Ratio 0 1 indicating relative frequency cases used validation sample.","code":""},{"path":"/reference/get_train_test_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for splitting data into a train and validation sample — get_train_test_split","text":"Returns list following components. target_train: Named factor containing labels training sample. embeddings_train: Object class EmbeddedText containing text embeddings training sample target_test: Named factor containing labels validation sample. embeddings_test: Object class EmbeddedText containing text embeddings validation sample","code":""},{"path":[]},{"path":"/reference/imdb_movie_reviews.html","id":null,"dir":"Reference","previous_headings":"","what":"Stanford Movie Review Dataset — imdb_movie_reviews","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"data.frame consisting subset 100 negative 200 positive movie reviews dataset provided Maas et al. (2011). data.frame consists three columns. first column 'text' stores movie review. second stores labels (0 = negative, 1 = positive). last column stores id. 
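A short sketch of get_train_test_split() follows; it assumes an object `embeddings` of class EmbeddedText (for example created earlier with a text embedding model) and a named factor `target`, both invented here:

```r
# Proportional stratified split: 25% of every category goes to validation.
split <- get_train_test_split(embedding = embeddings, target = target, val_size = 0.25)

split$target_train      # labels of the training sample
split$embeddings_train  # EmbeddedText object for the training sample
split$target_test       # labels of the validation sample
split$embeddings_test   # EmbeddedText object for the validation sample
```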
purpose data illustration vignettes.","code":""},{"path":"/reference/imdb_movie_reviews.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"","code":"imdb_movie_reviews"},{"path":"/reference/imdb_movie_reviews.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"data.frame","code":""},{"path":"/reference/imdb_movie_reviews.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Stanford Movie Review Dataset — imdb_movie_reviews","text":"Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning Word Vectors Sentiment Analysis. D. Lin, Y. Matsumoto, & R. Mihalcea (Eds.), Proceedings 49th Annual Meeting Association Computational Linguistics: Human Language Technologies (pp. 142–150). Association Computational Linguistics. https://aclanthology.org/P11-1015","code":""},{"path":"/reference/install_py_modules.html","id":null,"dir":"Reference","previous_headings":"","what":"Installing necessary python modules to an environment — install_py_modules","title":"Installing necessary python modules to an environment — install_py_modules","text":"Function installing necessary python modules","code":""},{"path":"/reference/install_py_modules.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Installing necessary python modules to an environment — install_py_modules","text":"","code":"install_py_modules( envname = \"aifeducation\", install = \"pytorch\", tf_version = \"2.15\", pytorch_cuda_version = \"12.1\", python_version = \"3.9\", remove_first = FALSE, cpu_only = FALSE )"},{"path":"/reference/install_py_modules.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Installing necessary python modules to an environment — install_py_modules","text":"envname string Name environment packages installed. install character determining machine learning frameworks installed. install=\"all\" 'pytorch' 'tensorflow'. install=\"pytorch\" 'pytorch', install=\"tensorflow\" 'tensorflow'. tf_version string determining desired version 'tensorflow'. pytorch_cuda_version string determining desired version 'cuda' 'PyTorch'. python_version string Python version use. remove_first bool TRUE removes environment completely recreating environment installing packages. FALSE packages installed existing environment without prior changes. cpu_only bool TRUE installs cpu version machine learning frameworks.","code":""},{"path":"/reference/install_py_modules.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Installing necessary python modules to an environment — install_py_modules","text":"Returns values objects. 
Function used installing necessary python libraries conda environment.","code":""},{"path":[]},{"path":"/reference/is.null_or_na.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if NULL or NA — is.null_or_na","title":"Check if NULL or NA — is.null_or_na","text":"Function checking object NULL NA","code":""},{"path":"/reference/is.null_or_na.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if NULL or NA — is.null_or_na","text":"","code":"is.null_or_na(object)"},{"path":"/reference/is.null_or_na.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if NULL or NA — is.null_or_na","text":"object object test.","code":""},{"path":"/reference/is.null_or_na.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if NULL or NA — is.null_or_na","text":"Returns FALSE if the object is not NULL and not NA. Returns TRUE in all other cases.","code":""},{"path":[]},{"path":"/reference/load_ai_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Loading models created with 'aifeducation' — load_ai_model","title":"Loading models created with 'aifeducation' — load_ai_model","text":"Function loading models created 'aifeducation'.","code":""},{"path":"/reference/load_ai_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Loading models created with 'aifeducation' — load_ai_model","text":"","code":"load_ai_model(model_dir, ml_framework = aifeducation_config$get_framework())"},{"path":"/reference/load_ai_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Loading models created with 'aifeducation' — load_ai_model","text":"model_dir Path directory model stored. ml_framework string Determines machine learning framework using model. Possible ml_framework=\"pytorch\" 'pytorch', ml_framework=\"tensorflow\" 'tensorflow', ml_framework=\"auto\". using framework used saving model.","code":""},{"path":"/reference/load_ai_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Loading models created with 'aifeducation' — load_ai_model","text":"Returns object class TextEmbeddingClassifierNeuralNet TextEmbeddingModel.","code":""},{"path":[]},{"path":"/reference/matrix_to_array_c.html","id":null,"dir":"Reference","previous_headings":"","what":"Reshape matrix to array — matrix_to_array_c","title":"Reshape matrix to array — matrix_to_array_c","text":"Function written C++ reshaping matrix containing sequential data array use keras.","code":""},{"path":"/reference/matrix_to_array_c.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Reshape matrix to array — matrix_to_array_c","text":"","code":"matrix_to_array_c(matrix, times, features)"},{"path":"/reference/matrix_to_array_c.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Reshape matrix to array — matrix_to_array_c","text":"matrix matrix containing sequential data. times uword Number sequences. features uword Number features within sequence.","code":""},{"path":"/reference/matrix_to_array_c.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Reshape matrix to array — matrix_to_array_c","text":"Returns array. 
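For a typical one-time setup, install_py_modules() might be called as sketched below; the values simply follow the defaults shown in the usage section, with cpu_only switched on for machines without a supported GPU:

```r
# Create/populate the conda environment "aifeducation" with 'pytorch' only.
install_py_modules(
  envname = "aifeducation",
  install = "pytorch",
  python_version = "3.9",
  remove_first = FALSE,  # keep an existing environment and install into it
  cpu_only = TRUE        # set to FALSE on machines with a supported GPU
)
```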
first dimension corresponds cases, second times, third features.","code":""},{"path":[]},{"path":"/reference/matrix_to_array_c.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Reshape matrix to array — matrix_to_array_c","text":"","code":"#matrix has shape (batch,times*features) matrix<-matrix(data=c(1,1,1,2,2,2, 2,2,2,3,3,3, 1,1,1,1,1,1), nrow=3, byrow=TRUE) matrix #> [,1] [,2] [,3] [,4] [,5] [,6] #> [1,] 1 1 1 2 2 2 #> [2,] 2 2 2 3 3 3 #> [3,] 1 1 1 1 1 1 #Transform matrix to an array #array has shape (batch,times,features) matrix_to_array_c(matrix=matrix,times=2,features=3) #> , , 1 #> #> [,1] [,2] #> [1,] 1 2 #> [2,] 2 3 #> [3,] 1 1 #> #> , , 2 #> #> [,1] [,2] #> [1,] 1 2 #> [2,] 2 3 #> [3,] 1 1 #> #> , , 3 #> #> [,1] [,2] #> [1,] 1 2 #> [2,] 2 3 #> [3,] 1 1 #>"},{"path":"/reference/save_ai_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Saving models created with 'aifeducation' — save_ai_model","title":"Saving models created with 'aifeducation' — save_ai_model","text":"Function saving models created 'aifeducation'.","code":""},{"path":"/reference/save_ai_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Saving models created with 'aifeducation' — save_ai_model","text":"","code":"save_ai_model( model, model_dir, dir_name = NULL, save_format = \"default\", append_ID = TRUE )"},{"path":"/reference/save_ai_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Saving models created with 'aifeducation' — save_ai_model","text":"model Object class TextEmbeddingClassifierNeuralNet TextEmbeddingModel saved. model_dir Path directory model stored. dir_name Name folder created model_dir. If dir_name=NULL model's name used. additionally append_ID=TRUE model's name ID used generating name directory. save_format relevant TextEmbeddingClassifierNeuralNet. Format saving model. 'tensorflow'/'keras' models \"keras\" 'Keras v3 format', \"tf\" SavedModel \"h5\" HDF5. 'pytorch' models \"safetensors\" 'safetensors' \"pt\" 'pytorch via pickle'. Use \"default\" standard format. keras 'tensorflow'/'keras' models safetensors 'pytorch' models. append_ID bool TRUE ID appended model directory saving purposes. FALSE .","code":""},{"path":"/reference/save_ai_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Saving models created with 'aifeducation' — save_ai_model","text":"Function return value. saves model disk. return value, called side effects.","code":""},{"path":[]},{"path":"/reference/set_config_cpu_only.html","id":null,"dir":"Reference","previous_headings":"","what":"Setting cpu only for 'tensorflow' — set_config_cpu_only","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"function configures 'tensorflow' use cpus.","code":""},{"path":"/reference/set_config_cpu_only.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"","code":"set_config_cpu_only()"},{"path":"/reference/set_config_cpu_only.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"function return anything. 
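Taken together, save_ai_model() and load_ai_model() allow the round trip sketched below; the `classifier` object and the paths are hypothetical:

```r
# Save a trained model into models/my_classifier (folder derived from dir_name).
save_ai_model(
  model = classifier,
  model_dir = "models",
  dir_name = "my_classifier",
  save_format = "default",  # keras for 'tensorflow', safetensors for 'pytorch'
  append_ID = FALSE
)

# Restore it later; "auto" reuses the framework the model was saved with.
classifier <- load_ai_model(model_dir = "models/my_classifier", ml_framework = "auto")
```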
used side effects.","code":""},{"path":"/reference/set_config_cpu_only.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Setting cpu only for 'tensorflow' — set_config_cpu_only","text":"os$environ$setdefault(\"CUDA_VISIBLE_DEVICES\",\"-1\")","code":""},{"path":[]},{"path":"/reference/set_config_gpu_low_memory.html","id":null,"dir":"Reference","previous_headings":"","what":"Setting gpus' memory usage — set_config_gpu_low_memory","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"function changes memory usage gpus allow computations machines small memory. function, computations large models may possible speed computation decreases.","code":""},{"path":"/reference/set_config_gpu_low_memory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"","code":"set_config_gpu_low_memory()"},{"path":"/reference/set_config_gpu_low_memory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"function return anything. used side effects.","code":""},{"path":"/reference/set_config_gpu_low_memory.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Setting gpus' memory usage — set_config_gpu_low_memory","text":"function sets TF_GPU_ALLOCATOR \"cuda_malloc_async\" sets memory growth TRUE.","code":""},{"path":[]},{"path":"/reference/set_config_os_environ_logger.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"function changes level logging information 'tensorflow' via os environment. function must called importing 'tensorflow'.","code":""},{"path":"/reference/set_config_os_environ_logger.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"","code":"set_config_os_environ_logger(level = \"ERROR\")"},{"path":"/reference/set_config_os_environ_logger.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"level string Minimal level printed console. Four levels available: INFO, WARNING, ERROR NONE.","code":""},{"path":"/reference/set_config_os_environ_logger.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sets the level for logging information in tensor flow. — set_config_os_environ_logger","text":"function return anything. used side effects.","code":""},{"path":[]},{"path":"/reference/set_config_tf_logger.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the level for logging information in tensor flow. — set_config_tf_logger","title":"Sets the level for logging information in tensor flow. — set_config_tf_logger","text":"function changes level logging information 'tensorflow'.","code":""},{"path":"/reference/set_config_tf_logger.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the level for logging information in tensor flow. 
— set_config_tf_logger","text":"","code":"set_config_tf_logger(level = \"ERROR\")"},{"path":"/reference/set_config_tf_logger.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the level for logging information in tensor flow. — set_config_tf_logger","text":"level string Minimal level printed console. Five levels available: FATAL, ERROR, WARN, INFO, DEBUG.","code":""},{"path":"/reference/set_config_tf_logger.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sets the level for logging information in tensor flow. — set_config_tf_logger","text":"function return anything. used side effects.","code":""},{"path":[]},{"path":"/reference/set_transformers_logger.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"function changes level logging information 'transformers' library. influences output printed console creating training transformer models well TextEmbeddingModels.","code":""},{"path":"/reference/set_transformers_logger.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"","code":"set_transformers_logger(level = \"ERROR\")"},{"path":"/reference/set_transformers_logger.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"level string Minimal level printed console. Four levels available: INFO, WARNING, ERROR DEBUG","code":""},{"path":"/reference/set_transformers_logger.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sets the level for logging information of the 'transformers' library. — set_transformers_logger","text":"function return anything. used side effects.","code":""},{"path":[]},{"path":"/reference/split_labeled_unlabeled.html","id":null,"dir":"Reference","previous_headings":"","what":"Split data into labeled and unlabeled data — split_labeled_unlabeled","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"function splits data labeled unlabeled data.","code":""},{"path":"/reference/split_labeled_unlabeled.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"","code":"split_labeled_unlabeled(embedding, target)"},{"path":"/reference/split_labeled_unlabeled.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"embedding Object class EmbeddedText. target Named factor containing cases labels missing labels.","code":""},{"path":"/reference/split_labeled_unlabeled.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Split data into labeled and unlabeled data — split_labeled_unlabeled","text":"Returns list following components embeddings_labeled: Object class EmbeddedText containing cases with labels. embeddings_unlabeled: Object class EmbeddedText containing cases without labels. 
targets_labeled: Named factor containing labels relevant cases.","code":""},{"path":[]},{"path":"/reference/start_aifeducation_studio.html","id":null,"dir":"Reference","previous_headings":"","what":"Aifeducation Studio — start_aifeducation_studio","title":"Aifeducation Studio — start_aifeducation_studio","text":"Function starts shiny app represents Aifeducation Studio","code":""},{"path":"/reference/start_aifeducation_studio.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Aifeducation Studio — start_aifeducation_studio","text":"","code":"start_aifeducation_studio()"},{"path":"/reference/start_aifeducation_studio.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Aifeducation Studio — start_aifeducation_studio","text":"function nothing return. used start shiny app.","code":""},{"path":"/reference/summarize_tracked_sustainability.html","id":null,"dir":"Reference","previous_headings":"","what":"Summarizing tracked sustainability data — summarize_tracked_sustainability","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"Function summarizing tracked sustainability data tracker python library 'codecarbon'.","code":""},{"path":"/reference/summarize_tracked_sustainability.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"","code":"summarize_tracked_sustainability(sustainability_tracker)"},{"path":"/reference/summarize_tracked_sustainability.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"sustainability_tracker Object class codecarbon.emissions_tracker.OfflineEmissionsTracker python library codecarbon.","code":""},{"path":"/reference/summarize_tracked_sustainability.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summarizing tracked sustainability data — summarize_tracked_sustainability","text":"Returns list contains tracked sustainability data.","code":""},{"path":[]},{"path":"/reference/test_classifier_sustainability.html","id":null,"dir":"Reference","previous_headings":"","what":"Sustainability data for an example classifier — test_classifier_sustainability","title":"Sustainability data for an example classifier — test_classifier_sustainability","text":"list length 5 containing used energy consumption co2 emissions classifier training. purpose data illustration vignettes.","code":""},{"path":"/reference/test_classifier_sustainability.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sustainability data for an example classifier — test_classifier_sustainability","text":"","code":"test_classifier_sustainability"},{"path":"/reference/test_classifier_sustainability.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Sustainability data for an example classifier — test_classifier_sustainability","text":"list","code":""},{"path":"/reference/test_metric_mean.html","id":null,"dir":"Reference","previous_headings":"","what":"Test metric for an example classifier — test_metric_mean","title":"Test metric for an example classifier — test_metric_mean","text":"matrix 4 rows 17 columns containing test metrics example classifier. 
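The configuration helpers documented above are typically called once at the start of a session, before 'tensorflow' is imported; one possible setup, purely for illustration:

```r
# All of these functions are called only for their side effects.
set_config_os_environ_logger(level = "ERROR")  # must run before importing 'tensorflow'
set_config_cpu_only()                          # restrict 'tensorflow' to cpus
# Alternatively, on a GPU machine with little memory:
# set_config_gpu_low_memory()
set_transformers_logger(level = "ERROR")       # quieten the 'transformers' library
```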
purpose data illustration vignettes.","code":""},{"path":"/reference/test_metric_mean.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test metric for an example classifier — test_metric_mean","text":"","code":"test_metric_mean"},{"path":"/reference/test_metric_mean.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Test metric for an example classifier — test_metric_mean","text":"matrix","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":null,"dir":"Reference","previous_headings":"","what":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Abstract class neural nets 'keras'/'tensorflow' 'pytorch'.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Objects class used assigning texts classes/categories. creation training classifier object class EmbeddedText factor necessary. object class EmbeddedText contains numerical text representations (text embeddings) raw texts generated object class TextEmbeddingModel. factor contains classes/categories every text. Missing values (unlabeled cases) supported. predictions object class EmbeddedText used created text embedding model training.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"model ('tensorflow_model()') Field storing tensorflow model loading. model_config ('list()') List storing information configuration model. information used predict new data. model_config$n_rec: Number recurrent layers. model_config$n_hidden: Number dense layers. model_config$target_levels: Levels target variable. change manually. model_config$input_variables: Order name input variables. change manually. model_config$init_config: List storing parameters passed method new(). last_training ('list()') List storing history results last training. information overwritten new training started. last_training$learning_time: Duration training process. config$history: History last training. config$data: Object class table storing initial frequencies passed data. config$data_pb: Matrix storing number additional cases (test training) added balanced pseudo-labeling. rows refer folds final training. columns refer steps pseudo-labeling. config$data_bsc_test: Matrix storing number cases category used testing phase balanced synthetic units. Please note frequencies include original synthetic cases. case number original synthetic cases exceeds limit majority classes, frequency represents number cases created cluster analysis. config$date: Time last training finished. config$config: List storing kind estimation requested last training. config$config$use_bsc: TRUE balanced synthetic cases requested. FALSE . config$config$use_baseline: TRUE baseline estimation requested. FALSE . config$config$use_bpl: TRUE balanced, pseudo-labeling cases requested. FALSE . reliability ('list()') List storing central reliability measures last training. reliability$test_metric: Array containing reliability measures validation data every fold, method, step (case pseudo-labeling). 
reliability$test_metric_mean: Array containing reliability measures validation data every method step (case pseudo-labeling). values represent mean values every fold. reliability$raw_iota_objects: List containing iota_object generated package iotarelr every fold start end last training. reliability$raw_iota_objects$iota_objects_start: List objects class iotarelr_iota2 containing estimated iota reliability second generation baseline model every fold. estimation baseline model requested, list set NULL. reliability$raw_iota_objects$iota_objects_end: List objects class iotarelr_iota2 containing estimated iota reliability second generation final model every fold. Depending requested training method values refer baseline model, trained model basis balanced synthetic cases, balanced pseudo labeling combination balanced synthetic cases pseudo labeling. reliability$raw_iota_objects$iota_objects_start_free: List objects class iotarelr_iota2 containing estimated iota reliability second generation baseline model every fold. estimation baseline model requested, list set NULL. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$raw_iota_objects$iota_objects_end_free: List objects class iotarelr_iota2 containing estimated iota reliability second generation final model every fold. Depending requested training method, values refer baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$iota_object_start: Object class iotarelr_iota2 mean individual objects every fold. estimation baseline model requested, list set NULL. reliability$iota_object_start_free: Object class iotarelr_iota2 mean individual objects every fold. estimation baseline model requested, list set NULL. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$iota_object_end: Object class iotarelr_iota2 mean individual objects every fold. Depending requested training method, object refers baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. reliability$iota_object_end_free: Object class iotarelr_iota2 mean individual objects every fold. Depending requested training method, object refers baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. Please note model estimated without forcing Assignment Error Matrix line assumption weak superiority. reliability$standard_measures_end: Object class list containing final measures precision, recall, f1 every fold. Depending requested training method, values refer baseline model, trained model basis balanced synthetic cases, balanced pseudo-labeling combination balanced synthetic cases pseudo-labeling. 
reliability$standard_measures_mean: matrix containing mean measures precision, recall, f1 end every fold.","code":""},{"path":[]},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"TextEmbeddingClassifierNeuralNet$new() TextEmbeddingClassifierNeuralNet$train() TextEmbeddingClassifierNeuralNet$predict() TextEmbeddingClassifierNeuralNet$check_embedding_model() TextEmbeddingClassifierNeuralNet$get_model_info() TextEmbeddingClassifierNeuralNet$get_text_embedding_model() TextEmbeddingClassifierNeuralNet$set_publication_info() TextEmbeddingClassifierNeuralNet$get_publication_info() TextEmbeddingClassifierNeuralNet$set_software_license() TextEmbeddingClassifierNeuralNet$get_software_license() TextEmbeddingClassifierNeuralNet$set_documentation_license() TextEmbeddingClassifierNeuralNet$get_documentation_license() TextEmbeddingClassifierNeuralNet$set_model_description() TextEmbeddingClassifierNeuralNet$get_model_description() TextEmbeddingClassifierNeuralNet$save_model() TextEmbeddingClassifierNeuralNet$load_model() TextEmbeddingClassifierNeuralNet$get_package_versions() TextEmbeddingClassifierNeuralNet$get_sustainability_data() TextEmbeddingClassifierNeuralNet$get_ml_framework() TextEmbeddingClassifierNeuralNet$clone()","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Creating new instance class.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$new( ml_framework = aifeducation_config$get_framework(), name = NULL, label = NULL, text_embeddings = NULL, targets = NULL, hidden = c(128), rec = c(128), self_attention_heads = 0, intermediate_size = NULL, attention_type = \"fourier\", add_pos_embedding = TRUE, rec_dropout = 0.1, repeat_encoder = 1, dense_dropout = 0.4, recurrent_dropout = 0.4, encoder_dropout = 0.1, optimizer = \"adam\" )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch' name Character Name new classifier. Please refer common name conventions. Free text can used parameter label. label Character Label new classifier. can use free text. text_embeddings object class TextEmbeddingModel. targets factor containing target values classifier. hidden vector containing number neurons dense layer. length vector determines number dense layers. want dense layer, set parameter NULL. rec vector containing number neurons recurrent layer. length vector determines number recurrent layers. want recurrent layer, set parameter NULL. self_attention_heads integer determining number attention heads self-attention layer. relevant attention_type=\"multihead\" intermediate_size int determining size projection layer within transformer encoder. attention_type string Choose relevant attention type. 
Possible values \"fourier\" multihead. add_pos_embedding bool TRUE positional embedding used. rec_dropout double ranging 0 lower 1, determining dropout bidirectional gru layers. repeat_encoder int determining many times encoder added network. dense_dropout double ranging 0 lower 1, determining dropout dense layers. recurrent_dropout double ranging 0 lower 1, determining recurrent dropout recurrent layer. relevant keras models. encoder_dropout double ranging 0 lower 1, determining dropout dense projection within encoder layers. optimizer Object class keras.optimizers.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns object class TextEmbeddingClassifierNeuralNet ready training.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-train-","dir":"Reference","previous_headings":"","what":"Method train()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method training neural net.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$train( data_embeddings, data_targets, data_n_test_samples = 5, balance_class_weights = TRUE, use_baseline = TRUE, bsl_val_size = 0.25, use_bsc = TRUE, bsc_methods = c(\"dbsmote\"), bsc_max_k = 10, bsc_val_size = 0.25, bsc_add_all = FALSE, use_bpl = TRUE, bpl_max_steps = 3, bpl_epochs_per_step = 1, bpl_dynamic_inc = FALSE, bpl_balance = FALSE, bpl_max = 1, bpl_anchor = 1, bpl_min = 0, bpl_weight_inc = 0.02, bpl_weight_start = 0, bpl_model_reset = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, epochs = 40, batch_size = 32, dir_checkpoint, trace = TRUE, keras_trace = 2, pytorch_trace = 2, n_cores = 2 )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"data_embeddings Object class TextEmbeddingModel. data_targets Factor containing labels cases stored data_embeddings. Factor must named use names used data_embeddings. data_n_test_samples int determining number cross-fold samples. balance_class_weights bool TRUE class weights generated based frequencies training data method Inverse Class Frequency'. FALSE class weight 1. use_baseline bool TRUE calculation baseline model requested. option relevant use_bsc=TRUE use_pbl=TRUE. FALSE, baseline model calculated. bsl_val_size double 0 1, indicating proportion cases class used validation sample estimation baseline model. remaining cases part training data. use_bsc bool TRUE estimation integrate balanced synthetic cases. FALSE . bsc_methods vector containing methods generating synthetic cases via 'smotefamily'. Multiple methods can passed. Currently bsc_methods=c(\"adas\"), bsc_methods=c(\"smote\") bsc_methods=c(\"dbsmote\") possible. bsc_max_k int determining maximal number k used creating synthetic units. bsc_val_size double 0 1, indicating proportion cases class used validation sample estimation synthetic cases. 
bsc_add_all bool FALSE synthetic cases necessary fill gap class major class added data. TRUE generated synthetic cases added data. use_bpl bool TRUE estimation integrate balanced pseudo-labeling. FALSE . bpl_max_steps int determining maximum number steps pseudo-labeling. bpl_epochs_per_step int Number training epochs within every step. bpl_dynamic_inc bool TRUE, specific percentage cases included step. percentage determined \\(step/bpl_max_steps\\). FALSE, cases used. bpl_balance bool TRUE, number cases every category/class pseudo-labeled data used training. , number cases determined minor class/category. bpl_max double 0 1, setting maximal level confidence considering case pseudo-labeling. bpl_anchor double 0 1 indicating reference point sorting new cases every label. See notes details. bpl_min double 0 1, setting minimal level confidence considering case pseudo-labeling. bpl_weight_inc double value much sample weights increased cases pseudo-labels every step. bpl_weight_start double Starting value weights unlabeled cases. bpl_model_reset bool TRUE, model re-initialized every step. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. epochs int Number training epochs. batch_size int Size batches. dir_checkpoint string Path directory checkpoint training saved. directory exist, created. trace bool TRUE, information estimation phase printed console. keras_trace int keras_trace=0 print information training process keras console. pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_trace=2 prints one line information every epoch. n_cores int Number cores used creating synthetic units.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"bsc_max_k: values 2 bsc_max_k successively used. number bsc_max_k high, value reduced number allows calculating synthetic units. bpl_anchor: help value, new cases sorted. aim, distance anchor calculated cases arranged ascending order.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
changes object trained classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-predict-","dir":"Reference","previous_headings":"","what":"Method predict()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method predicting new data trained neural net.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$predict(newdata, batch_size = 32, verbose = 1)"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-2","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"newdata Object class TextEmbeddingModel data.frame predictions made. batch_size int Size batches. verbose int verbose=0 cat information training process keras console. verbose=1 prints progress bar. verbose=2 prints one line information every epoch.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns data.frame containing predictions probabilities different labels case.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-check-embedding-model-","dir":"Reference","previous_headings":"","what":"Method check_embedding_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method checking provided text embeddings created TextEmbeddingModel classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$check_embedding_model(text_embeddings)"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-3","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"text_embeddings Object class EmbeddedText.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-3","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"TRUE underlying TextEmbeddingModel . 
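Combining the arguments above, a training run followed by prediction could look like the following sketch; the paths, the ISO code, and the `new_embeddings` object are invented:

```r
classifier$train(
  data_embeddings = embeddings,
  data_targets = target,
  data_n_test_samples = 5,
  use_baseline = TRUE,
  use_bsc = TRUE,
  bsc_methods = c("dbsmote"),
  use_bpl = TRUE,
  epochs = 40,
  batch_size = 32,
  sustain_track = TRUE,
  sustain_iso_code = "DEU",       # required whenever sustain_track = TRUE
  dir_checkpoint = "checkpoints"  # created if it does not exist
)

# Predictions for new, compatible embeddings:
predictions <- classifier$predict(newdata = new_embeddings, batch_size = 32)
head(predictions)  # probabilities of the different labels for every case
```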
FALSE models differ.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-model-info-","dir":"Reference","previous_headings":"","what":"Method get_model_info()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting model information","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-4","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_model_info()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-4","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list relevant model information","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-text-embedding-model-","dir":"Reference","previous_headings":"","what":"Method get_text_embedding_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting text embedding model information","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-5","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_text_embedding_model()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-5","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list relevant model information text embedding model underlying classifier","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-publication-info-","dir":"Reference","previous_headings":"","what":"Method set_publication_info()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting publication information classifier","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-6","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_publication_info( authors, citation, url = NULL )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-4","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"authors List authors. citation Free text citation. url URL corresponding homepage.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-6","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private members publication information.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-publication-info-","dir":"Reference","previous_headings":"","what":"Method get_publication_info()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting bibliographic information classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-7","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_publication_info()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-7","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list saved bibliographic information.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-software-license-","dir":"Reference","previous_headings":"","what":"Method set_software_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting license classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-8","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_software_license(license = \"GPL-3\")"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-5","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-8","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private member software license model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-software-license-","dir":"Reference","previous_headings":"","what":"Method get_software_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method getting license classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-9","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_software_license()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-6","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-9","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"string representing license software.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-documentation-license-","dir":"Reference","previous_headings":"","what":"Method set_documentation_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting license classifier's documentation.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-10","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_documentation_license( license = \"CC BY-SA\" )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-7","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-10","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private member documentation license model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-documentation-license-","dir":"Reference","previous_headings":"","what":"Method get_documentation_license()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method getting license classifier's documentation.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-11","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_documentation_license()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-8","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-11","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns license string.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-set-model-description-","dir":"Reference","previous_headings":"","what":"Method set_model_description()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method setting description classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-12","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$set_model_description( eng = NULL, native = NULL, abstract_eng = NULL, abstract_native = NULL, keywords_eng = NULL, keywords_native = NULL )"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-9","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"eng string text describing training learner, theoretical empirical background, different output labels English. native string text describing training learner, theoretical empirical background, different output labels native language classifier. abstract_eng string text providing summary description English. abstract_native string text providing summary description native language classifier. keywords_eng vector keyword English. keywords_native vector keyword native language classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-12","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. 
used setting private members description model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-model-description-","dir":"Reference","previous_headings":"","what":"Method get_model_description()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting model description.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-13","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_model_description()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-13","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"list description classifier English native language.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-save-model-","dir":"Reference","previous_headings":"","what":"Method save_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method saving model 'Keras v3 format', 'tensorflow' SavedModel format h5 format.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-14","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$save_model(dir_path, save_format = \"default\")"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-10","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"dir_path string() Path directory model saved. save_format Format saving model. 'tensorflow'/'keras' models \"keras\" 'Keras v3 format', \"tf\" SavedModel \"h5\" HDF5. 'pytorch' models \"safetensors\" 'safetensors' \"pt\" 'pytorch' via pickle. Use \"default\" standard format. keras 'tensorflow'/'keras' models safetensors 'pytorch' models.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-14","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. saves model disk.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-load-model-","dir":"Reference","previous_headings":"","what":"Method load_model()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method importing model 'Keras v3 format', 'tensorflow' SavedModel format h5 format.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-15","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$load_model(dir_path, ml_framework = \"auto\")"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-11","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"dir_path string() Path directory model saved. ml_framework string Determines machine learning framework using model. 
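In contrast to the standalone save_ai_model()/load_ai_model() functions, the two methods above operate directly on an existing R6 object; a minimal sketch with a hypothetical path:

```r
# Write the underlying neural net to disk in the default format ...
classifier$save_model(dir_path = "checkpoints/final_model", save_format = "default")

# ... and load the weights back into the existing classifier object.
classifier$load_model(dir_path = "checkpoints/final_model", ml_framework = "auto")
```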
Possible ml_framework=\"pytorch\" 'pytorch', ml_framework=\"tensorflow\" 'tensorflow', ml_framework=\"auto\".","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-15","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Function return value. used load weights model.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-package-versions-","dir":"Reference","previous_headings":"","what":"Method get_package_versions()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting summary R python packages' versions used creating classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-16","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_package_versions()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-16","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns list containing versions relevant R python packages.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-sustainability-data-","dir":"Reference","previous_headings":"","what":"Method get_sustainability_data()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting summary tracked energy consumption training estimate resulting CO2 equivalents kg.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-17","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_sustainability_data()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-17","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns list containing tracked energy consumption, CO2 equivalents kg, information tracker used, technical information training infrastructure.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-get-ml-framework-","dir":"Reference","previous_headings":"","what":"Method get_ml_framework()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Method requesting machine learning framework used classifier.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-18","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$get_ml_framework()"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"returns-18","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"Returns string describing machine learning framework used classifier","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method 
clone()","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"objects class cloneable method.","code":""},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"usage-19","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"","code":"TextEmbeddingClassifierNeuralNet$clone(deep = FALSE)"},{"path":"/reference/TextEmbeddingClassifierNeuralNet.html","id":"arguments-12","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding classifier with a neural net — TextEmbeddingClassifierNeuralNet","text":"deep Whether make deep clone.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":null,"dir":"Reference","previous_headings":"","what":"Text embedding model — TextEmbeddingModel","title":"Text embedding model — TextEmbeddingModel","text":"R6 class stores text embedding model can used tokenize, encode, decode, embed raw texts. object provides unique interface different text processing methods.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Text embedding model — TextEmbeddingModel","text":"Objects class TextEmbeddingModel transform raw texts numerical representations can used downstream tasks. aim objects class allow tokenize raw texts, encode tokens sequences integers, decode sequences integers back tokens.","code":""},{"path":[]},{"path":"/reference/TextEmbeddingModel.html","id":"public-fields","dir":"Reference","previous_headings":"","what":"Public fields","title":"Text embedding model — TextEmbeddingModel","text":"last_training ('list()') List storing history results last training. information overwritten new training started.","code":""},{"path":[]},{"path":"/reference/TextEmbeddingModel.html","id":"public-methods","dir":"Reference","previous_headings":"","what":"Public methods","title":"Text embedding model — TextEmbeddingModel","text":"TextEmbeddingModel$new() TextEmbeddingModel$load_model() TextEmbeddingModel$save_model() TextEmbeddingModel$encode() TextEmbeddingModel$decode() TextEmbeddingModel$get_special_tokens() TextEmbeddingModel$embed() TextEmbeddingModel$fill_mask() TextEmbeddingModel$set_publication_info() TextEmbeddingModel$get_publication_info() TextEmbeddingModel$set_software_license() TextEmbeddingModel$get_software_license() TextEmbeddingModel$set_documentation_license() TextEmbeddingModel$get_documentation_license() TextEmbeddingModel$set_model_description() TextEmbeddingModel$get_model_description() TextEmbeddingModel$get_model_info() TextEmbeddingModel$get_package_versions() TextEmbeddingModel$get_basic_components() TextEmbeddingModel$get_bow_components() TextEmbeddingModel$get_transformer_components() TextEmbeddingModel$get_sustainability_data() TextEmbeddingModel$get_ml_framework() TextEmbeddingModel$clone()","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-new-","dir":"Reference","previous_headings":"","what":"Method new()","title":"Text embedding model — TextEmbeddingModel","text":"Method creating new text embedding model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$new( model_name = NULL, model_label = NULL, model_version = NULL, model_language = NULL, method = NULL, ml_framework = 
aifeducation_config$get_framework()$TextEmbeddingFramework, max_length = 0, chunks = 1, overlap = 0, emb_layer_min = \"middle\", emb_layer_max = \"2_3_layer\", emb_pool_type = \"average\", model_dir, bow_basic_text_rep, bow_n_dim = 10, bow_n_cluster = 100, bow_max_iter = 500, bow_max_iter_cluster = 500, bow_cr_criterion = 1e-08, bow_learning_rate = 1e-08, trace = FALSE )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"model_name string containing name new model. model_label string containing label/title new model. model_version string version model. model_language string containing language model represents (e.g., English). method string determining kind embedding model. Currently following models supported: method=\"bert\" Bidirectional Encoder Representations Transformers (BERT), method=\"roberta\" Robustly Optimized BERT Pretraining Approach (RoBERTa), method=\"longformer\" Long-Document Transformer, method=\"funnel\" Funnel-Transformer, method=\"deberta_v2\" Decoding-enhanced BERT Disentangled Attention (DeBERTa V2), method=\"glove\" GlobalVector Clusters, method=\"lda\" topic modeling. See details information. ml_framework string Framework use model. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. relevant transformer models. max_length int determining maximum length token sequences used transformer models. relevant methods. chunks int Maximum number chunks. relevant transformer models. overlap int determining number tokens added beginning next chunk. relevant BERT models. emb_layer_min int string determining first layer included creation embeddings. integer corresponds layer number. first layer number 1. Instead integer following strings possible: \"start\" first layer, \"middle\" middle layer, \"2_3_layer\" layer two-third layer, \"last\" last layer. emb_layer_max int string determining last layer included creation embeddings. integer corresponds layer number. first layer number 1. Instead integer following strings possible: \"start\" first layer, \"middle\" middle layer, \"2_3_layer\" layer two-third layer, \"last\" last layer. emb_pool_type string determining method pooling token embeddings within layer. \"cls\" embedding CLS token used. \"average\" token embedding tokens averaged (excluding padding tokens). model_dir string path directory BERT model stored. bow_basic_text_rep object class basic_text_rep created via function bow_pp_create_basic_text_rep. relevant method=\"glove_cluster\" method=\"lda\". bow_n_dim int Number dimensions GlobalVector number topics LDA. bow_n_cluster int Number clusters created basis GlobalVectors. Parameter relevant method=\"lda\" method=\"bert\" bow_max_iter int Maximum number iterations fitting GlobalVectors Topic Models. bow_max_iter_cluster int Maximum number iterations fitting cluster method=\"glove\". bow_cr_criterion double convergence criterion GlobalVectors. bow_learning_rate double initial learning rate GlobalVectors. trace bool TRUE prints information progress. FALSE .","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Text embedding model — TextEmbeddingModel","text":"method: case method=\"bert\", method=\"roberta\", method=\"longformer\", pretrained transformer model must supplied via model_dir. 
method=\"glove\" method=\"lda\" new model created based data provided via bow_basic_text_rep. original algorithm GlobalVectors provides word embeddings, text embeddings. achieve text embeddings words clustered based word embeddings kmeans.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns object class TextEmbeddingModel.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-load-model-","dir":"Reference","previous_headings":"","what":"Method load_model()","title":"Text embedding model — TextEmbeddingModel","text":"Method loading transformers model R.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-1","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$load_model(model_dir, ml_framework = \"auto\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-1","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"model_dir string containing path relevant model directory. ml_framework string Determines machine learning framework using model. Possible ml_framework=\"pytorch\" 'pytorch', ml_framework=\"tensorflow\" 'tensorflow', ml_framework=\"auto\".","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-1","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used loading saved transformer model R interface.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-save-model-","dir":"Reference","previous_headings":"","what":"Method save_model()","title":"Text embedding model — TextEmbeddingModel","text":"Method saving transformer model disk.Relevant transformer models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-2","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$save_model(model_dir, save_format = \"default\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-2","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"model_dir string containing path relevant model directory. save_format Format saving model. 'tensorflow'/'keras' models \"h5\" HDF5. 'pytorch' models \"safetensors\" 'safetensors' \"pt\" 'pytorch' via pickle. Use \"default\" standard format. h5 'tensorflow'/'keras' models safetensors 'pytorch' models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-2","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. 
used saving transformer model disk.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-encode-","dir":"Reference","previous_headings":"","what":"Method encode()","title":"Text embedding model — TextEmbeddingModel","text":"Method encoding words raw texts integers.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-3","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$encode( raw_text, token_encodings_only = FALSE, to_int = TRUE, trace = FALSE )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-3","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"raw_text vector containing raw texts. token_encodings_only bool TRUE, token encodings returned. FALSE, complete encoding returned important BERT models. to_int bool TRUE integer ids tokens returned. FALSE tokens returned. Argument applies transformer models token_encodings_only==TRUE. trace bool TRUE, information progress printed. FALSE requested.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-3","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list containing integer sequences raw texts special tokens.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-decode-","dir":"Reference","previous_headings":"","what":"Method decode()","title":"Text embedding model — TextEmbeddingModel","text":"Method decoding sequence integers tokens","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-4","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$decode(int_seqence, to_token = FALSE)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-4","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"int_seqence list containing integer sequences transformed tokens plain text. to_token bool FALSE plain text returned. TRUE sequence tokens returned. 
Argument relevant model based transformer.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-4","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list token sequences","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-special-tokens-","dir":"Reference","previous_headings":"","what":"Method get_special_tokens()","title":"Text embedding model — TextEmbeddingModel","text":"Method receiving special tokens model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-5","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_special_tokens()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-5","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns matrix containing special tokens rows type, token, id columns.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-embed-","dir":"Reference","previous_headings":"","what":"Method embed()","title":"Text embedding model — TextEmbeddingModel","text":"Method creating text embeddings raw texts case using GPU running memory reduce batch size restart R switch use cpu via set_config_cpu_only.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-6","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$embed( raw_text = NULL, doc_id = NULL, batch_size = 8, trace = FALSE )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-5","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"raw_text vector containing raw texts. doc_id vector containing corresponding IDs every text. batch_size int determining maximal size every batch. trace bool TRUE, information progression printed console.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-6","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Method returns R6 object class EmbeddedText. object contains embeddings data.frame information model creating embeddings.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-fill-mask-","dir":"Reference","previous_headings":"","what":"Method fill_mask()","title":"Text embedding model — TextEmbeddingModel","text":"Method calculating tokens behind mask tokens.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-7","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$fill_mask(text, n_solutions = 5)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-6","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"text string Text containing mask tokens. n_solutions int Number estimated tokens every mask.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-7","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list containing data.frame every mask. 
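Taken together, encode(), decode(), and embed() might be combined as in the following sketch; `embedding_model` is the hypothetical object from above and the text is invented:

```r
texts <- c("Assessment for learning is a formative practice.")

# Encode the raw texts into sequences of integer token ids.
ids <- embedding_model$encode(
  raw_text = texts,
  token_encodings_only = TRUE,
  to_int = TRUE
)

# Decode the integer sequences back into plain text.
plain <- embedding_model$decode(ids, to_token = FALSE)

# Create text embeddings; reduce batch_size if a GPU runs out of memory.
embeddings <- embedding_model$embed(
  raw_text = texts,
  doc_id = "doc_1",
  batch_size = 8
)
# `embeddings` is an R6 object of class EmbeddedText.
```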
data.frame contains solutions rows reports score, token id, token string columns.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-publication-info-","dir":"Reference","previous_headings":"","what":"Method set_publication_info()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting bibliographic information model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-8","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_publication_info(type, authors, citation, url = NULL)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-7","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"type string Type information changed/added. type=\"developer\", type=\"modifier\" possible. authors List people. citation string Citation free text. url string Corresponding URL applicable.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-8","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used set private members publication information model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-publication-info-","dir":"Reference","previous_headings":"","what":"Method get_publication_info()","title":"Text embedding model — TextEmbeddingModel","text":"Method getting bibliographic information model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-9","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_publication_info()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-9","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list bibliographic information.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-software-license-","dir":"Reference","previous_headings":"","what":"Method set_software_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting license model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-10","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_software_license(license = \"GPL-3\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-8","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-10","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. 
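A minimal sketch of fill_mask(); the mask token depends on the underlying tokenizer and can be looked up via get_special_tokens(), so the [MASK] string below is an assumption for a BERT-style model:

```r
# Inspect the special tokens (rows of type, token, and id).
embedding_model$get_special_tokens()

# Request the five most likely tokens behind the mask.
solutions <- embedding_model$fill_mask(
  text = "Feedback is a [MASK] part of learning.",
  n_solutions = 5
)
# Returns a list with one data.frame per mask, reporting score,
# token id, and token string.
```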
used setting private member software license model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-software-license-","dir":"Reference","previous_headings":"","what":"Method get_software_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting license model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-11","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_software_license()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-11","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"string License model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-documentation-license-","dir":"Reference","previous_headings":"","what":"Method set_documentation_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting license models' documentation.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-12","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_documentation_license(license = \"CC BY-SA\")"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-9","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-12","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used set private member documentation license model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-documentation-license-","dir":"Reference","previous_headings":"","what":"Method get_documentation_license()","title":"Text embedding model — TextEmbeddingModel","text":"Method getting license models' documentation.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-13","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_documentation_license()"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-10","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"license string containing abbreviation license license text.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-set-model-description-","dir":"Reference","previous_headings":"","what":"Method set_model_description()","title":"Text embedding model — TextEmbeddingModel","text":"Method setting description model","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-14","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$set_model_description( eng = NULL, native = NULL, abstract_eng = NULL, abstract_native = NULL, keywords_eng = NULL, keywords_native = NULL )"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-11","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"eng string text describing training classifier, theoretical empirical background, different output labels English. 
native string text describing training classifier, theoretical empirical background, different output labels native language model. abstract_eng string text providing summary description English. abstract_native string text providing summary description native language classifier. keywords_eng vector keywords English. keywords_native vector keywords native language classifier.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"returns-13","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Function return value. used set private members description model.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-model-description-","dir":"Reference","previous_headings":"","what":"Method get_model_description()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting model description.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-15","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_model_description()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-14","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list description model English native language.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-model-info-","dir":"Reference","previous_headings":"","what":"Method get_model_info()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting model information","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-16","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_model_info()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-15","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"list relevant model information","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-package-versions-","dir":"Reference","previous_headings":"","what":"Method get_package_versions()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting summary R python packages' versions used creating classifier.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-17","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_package_versions()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-16","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list containing versions relevant R python packages.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-basic-components-","dir":"Reference","previous_headings":"","what":"Method get_basic_components()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting part interface's configuration necessary models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-18","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — 
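The setters above can be chained into a small documentation workflow, as in this sketch; the author, citation, URL, and description texts are invented:

```r
embedding_model$set_publication_info(
  type = "developer",
  authors = personList(
    person(given = "Jane", family = "Doe")  # hypothetical author
  ),
  citation = "Doe, J. (2024). An example embedding model.",
  url = "https://example.org"               # hypothetical URL
)
embedding_model$set_software_license("GPL-3")
embedding_model$set_documentation_license("CC BY-SA")
embedding_model$set_model_description(
  eng = "Description of the training data and theoretical background.",
  abstract_eng = "Short English abstract.",
  keywords_eng = c("education", "text embedding")
)
```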
TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_basic_components()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-17","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-bow-components-","dir":"Reference","previous_headings":"","what":"Method get_bow_components()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting part interface's configuration necessary bag--words models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-19","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_bow_components()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-18","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-transformer-components-","dir":"Reference","previous_headings":"","what":"Method get_transformer_components()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting part interface's configuration necessary transformer models.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-20","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_transformer_components()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-19","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns list.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-sustainability-data-","dir":"Reference","previous_headings":"","what":"Method get_sustainability_data()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting log tracked energy consumption training estimate resulting CO2 equivalents kg.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-21","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_sustainability_data()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-20","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns matrix containing tracked energy consumption, CO2 equivalents kg, information tracker used, technical information training infrastructure every training run.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-get-ml-framework-","dir":"Reference","previous_headings":"","what":"Method get_ml_framework()","title":"Text embedding model — TextEmbeddingModel","text":"Method requesting machine learning framework used classifier.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-22","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$get_ml_framework()"},{"path":"/reference/TextEmbeddingModel.html","id":"returns-21","dir":"Reference","previous_headings":"","what":"Returns","title":"Text embedding model — TextEmbeddingModel","text":"Returns string describing machine learning framework used 
classifier","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"method-clone-","dir":"Reference","previous_headings":"","what":"Method clone()","title":"Text embedding model — TextEmbeddingModel","text":"objects class cloneable method.","code":""},{"path":"/reference/TextEmbeddingModel.html","id":"usage-23","dir":"Reference","previous_headings":"","what":"Usage","title":"Text embedding model — TextEmbeddingModel","text":"","code":"TextEmbeddingModel$clone(deep = FALSE)"},{"path":"/reference/TextEmbeddingModel.html","id":"arguments-12","dir":"Reference","previous_headings":"","what":"Arguments","title":"Text embedding model — TextEmbeddingModel","text":"deep Whether make deep clone.","code":""},{"path":"/reference/to_categorical_c.html","id":null,"dir":"Reference","previous_headings":"","what":"Transforming classes to one-hot encoding — to_categorical_c","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"Function written C++ transforming vector classes (int) binary class matrix.","code":""},{"path":"/reference/to_categorical_c.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"","code":"to_categorical_c(class_vector, n_classes)"},{"path":"/reference/to_categorical_c.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"class_vector vector containing integers every class. integers must range 0 n_classes-1. n_classes int Total number classes.","code":""},{"path":"/reference/to_categorical_c.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Transforming classes to one-hot encoding — to_categorical_c","text":"Returns matrix containing binary representation every class.","code":""},{"path":[]},{"path":"/reference/train_tune_bert_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a BERT model — train_tune_bert_model","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"function can used train fine-tune transformer based BERT architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"","code":"train_tune_bert_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, whole_word = TRUE, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.003, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_bert_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. 
raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. whole_word bool TRUE whole word masking applied. FALSE token masking used. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"models uses WordPiece Tokenizer like BERT can trained whole word masking. Transformer library may show warning can ignored. Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_bert_model. Training model makes use dynamic masking contrast original paper static masking applied.","code":""},{"path":"/reference/train_tune_bert_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a BERT model — train_tune_bert_model","text":"Devlin, J., Chang, M.‑W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training Deep Bidirectional Transformers Language Understanding. J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings 2019 Conference North (pp. 4171--4186). Association Computational Linguistics. 
doi:10.18653/v1/N19-1423 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/bert#transformers.TFBertForMaskedLM","code":""},{"path":[]},{"path":"/reference/train_tune_deberta_v2_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"function can used train fine-tune transformer based DeBERTa-V2 architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"","code":"train_tune_deberta_v2_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, whole_word = TRUE, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.03, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_deberta_v2_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. whole_word bool TRUE whole word masking applied. FALSE token masking used. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. 
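A minimal sketch of a fine-tuning call with train_tune_bert_model(); the paths, the text vector, and the ISO code are hypothetical, and the remaining train_tune_* functions follow the same pattern:

```r
train_tune_bert_model(
  ml_framework = "pytorch",
  output_dir = "models/bert_tuned",      # created if it does not exist
  model_dir_path = "models/bert_base",   # pretrained model, e.g. from huggingface.co
  raw_texts = example_corpus,            # hypothetical character vector of raw texts
  p_mask = 0.15,
  whole_word = TRUE,
  val_size = 0.1,
  n_epoch = 2,
  batch_size = 12,
  chunk_size = 250,
  sustain_track = TRUE,
  sustain_iso_code = "DEU",
  trace = TRUE
)
# No return value: the fine-tuned model is saved to output_dir.
```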
pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_deberta_v2_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_deberta_v2_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a DeBERTa-V2 model — train_tune_deberta_v2_model","text":"He, P., Liu, X., Gao, J. & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT Disentangled Attention. doi:10.48550/arXiv.2006.03654 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/deberta-v2#debertav2","code":""},{"path":[]},{"path":"/reference/train_tune_funnel_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"function can used train fine-tune transformer based Funnel Transformer architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"","code":"train_tune_funnel_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, whole_word = TRUE, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, min_seq_len = 50, full_sequences_only = FALSE, learning_rate = 0.003, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_funnel_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. whole_word bool TRUE whole word masking applied. FALSE token masking used. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. min_seq_len int relevant full_sequences_only=FALSE. 
Value determines minimal sequence length inclusion training process. full_sequences_only bool TRUE token sequences length equal chunk_size used training. learning_rate double Learning rate adam optimizer. n_workers int Number workers. multi_process bool TRUE multiple processes activated. sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"aug_vocab_by > 0 raw text used training WordPiece tokenizer. end process, additional entries added vocabulary part original vocabulary. experimental state. Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_funnel_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_funnel_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a Funnel Transformer model — train_tune_funnel_model","text":"Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering Sequential Redundancy Efficient Language Processing. 
doi:10.48550/arXiv.2006.03236 Hugging Face documentation https://huggingface.co/docs/transformers/model_doc/funnel#funnel-transformer","code":""},{"path":[]},{"path":"/reference/train_tune_longformer_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"function can used train fine-tune transformer based Longformer architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"","code":"train_tune_longformer_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.03, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_longformer_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate double Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". 
relevant pytorch models.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_roberta_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_longformer_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a Longformer model — train_tune_longformer_model","text":"Beltagy, ., Peters, M. E., & Cohan, . (2020). Longformer: Long-Document Transformer. doi:10.48550/arXiv.2004.05150 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/longformer#transformers.LongformerConfig","code":""},{"path":[]},{"path":"/reference/train_tune_roberta_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"function can used train fine-tune transformer based RoBERTa architecture help python libraries 'transformers', 'datasets', 'tokenizers'.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"","code":"train_tune_roberta_model( ml_framework = aifeducation_config$get_framework(), output_dir, model_dir_path, raw_texts, p_mask = 0.15, val_size = 0.1, n_epoch = 1, batch_size = 12, chunk_size = 250, full_sequences_only = FALSE, min_seq_len = 50, learning_rate = 0.03, n_workers = 1, multi_process = FALSE, sustain_track = TRUE, sustain_iso_code = NULL, sustain_region = NULL, sustain_interval = 15, trace = TRUE, keras_trace = 1, pytorch_trace = 1, pytorch_safetensors = TRUE )"},{"path":"/reference/train_tune_roberta_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"ml_framework string Framework use training inference. ml_framework=\"tensorflow\" 'tensorflow' ml_framework=\"pytorch\" 'pytorch'. output_dir string Path directory final model saved. directory exist, created. model_dir_path string Path directory original model stored. raw_texts vector containing raw texts training. p_mask double Ratio determining number words/tokens masking. val_size double Ratio determining amount token chunks used validation. n_epoch int Number epochs training. batch_size int Size batches. chunk_size int Size every chunk training. full_sequences_only bool TRUE using chunks sequence length equal chunk_size. min_seq_len int relevant full_sequences_only=FALSE. Value determines minimal sequence length inclusion training process. learning_rate bool Learning rate adam optimizer. n_workers int Number workers. relevant ml_framework=\"tensorflow\". 
multi_process bool TRUE multiple processes activated. relevant ml_framework=\"tensorflow\". sustain_track bool TRUE energy consumption tracked training via python library codecarbon. sustain_iso_code string ISO code (Alpha-3-Code) country. variable must set sustainability tracked. list can found Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. sustain_region Region within country. available USA Canada See documentation codecarbon information. https://mlco2.github.io/codecarbon/parameters.html sustain_interval integer Interval seconds measuring power usage. trace bool TRUE information progress printed console. keras_trace int keras_trace=0 print information training process keras console. keras_trace=1 prints progress bar. keras_trace=2 prints one line information every epoch. relevant ml_framework=\"tensorflow\". pytorch_trace int pytorch_trace=0 print information training process pytorch console. pytorch_trace=1 prints progress bar. pytorch_safetensors bool TRUE 'pytorch' model saved safetensors format. FALSE 'safetensors' available saved standard pytorch format (.bin). relevant pytorch models.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"function return object. Instead trained fine-tuned model saved disk.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"Pre-Trained models can fine-tuned function available https://huggingface.co/. New models can created via function create_roberta_model. Training model makes use dynamic masking.","code":""},{"path":"/reference/train_tune_roberta_model.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Function for training and fine-tuning a RoBERTa model — train_tune_roberta_model","text":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: Robustly Optimized BERT Pretraining Approach. doi:10.48550/arXiv.1907.11692 Hugging Face Documentation https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaConfig","code":""},{"path":[]},{"path":"/reference/update_aifeducation_progress_bar.html","id":null,"dir":"Reference","previous_headings":"","what":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"function updates master progress bar aifeducation shiny app. progress bar reports current state overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"","code":"update_aifeducation_progress_bar(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"value int Value describing current step process. total int Total number steps process. 
title string Title displaying top progress bar.","code":""},{"path":"/reference/update_aifeducation_progress_bar.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Update master progress bar in aifeducation shiny app. — update_aifeducation_progress_bar","text":"Function nothing returns. updates progress bar id \"pgr_bar_aifeducation\".","code":""},{"path":[]},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":null,"dir":"Reference","previous_headings":"","what":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"function updates epoch progress bar aifeducation shiny app. progress bar reports current state overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"","code":"update_aifeducation_progress_bar_epochs(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"value int Value describing current step process. total int Total number steps process. title string Title displaying top progress bar.","code":""},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"Function nothing returns. updates progress bar id \"pgr_bar_aifeducation_epochs\".","code":""},{"path":"/reference/update_aifeducation_progress_bar_epochs.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Update epoch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_epochs","text":"function called often training model. Thus, function check requirements updating progress bar reduce computational time. check fulfilling necessary conditions must implemented separately.","code":""},{"path":[]},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":null,"dir":"Reference","previous_headings":"","what":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"function updates step/batch progress bar aifeducation shiny app. progress bar reports current state overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"","code":"update_aifeducation_progress_bar_steps(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"value int Value describing current step process. total int Total number steps process. 
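A minimal sketch of the three progress-bar helpers; they only have a visible effect while the aifeducation shiny app is running, and the step counts are invented:

```r
# Overall process: step 2 of 5.
update_aifeducation_progress_bar(value = 2, total = 5,
                                 title = "Training classifier")

# Current epoch: 10 of 40.
update_aifeducation_progress_bar_epochs(value = 10, total = 40,
                                        title = "Epochs")

# Current batch within the epoch: 120 of 500.
update_aifeducation_progress_bar_steps(value = 120, total = 500,
                                       title = "Batches")
```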
{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":null,"dir":"Reference","previous_headings":"","what":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"This function updates the step/batch progress bar of the aifeducation shiny app. The progress bar reports the current state of the overall process.","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"","code":"update_aifeducation_progress_bar_steps(value, total, title = NULL)"},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"value int Value describing the current step of the process. total int Total number of steps of the process. title string Title displayed at the top of the progress bar.","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"This function returns nothing. It updates the progress bar with the id \"pgr_bar_aifeducation_steps\".","code":""},{"path":"/reference/update_aifeducation_progress_bar_steps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Update step/batch progress bar in aifeducation shiny app. — update_aifeducation_progress_bar_steps","text":"This function is called very often during training of a model. Thus, the function does not check the requirements for updating the progress bar in order to reduce computational time. The check for fulfilling the necessary conditions must be implemented separately.","code":""},{"path":[]},{"path":"/news/index.html","id":"aifeducation-033","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.3","title":"aifeducation 0.3.3","text":"CRAN release: 2024-04-22 Graphical User Interface Aifeducation Studio: Fixed a bug concerning the ids of .pdf and .csv files. Now the ids are correctly saved within a text collection file. Fixed a bug in checking the selection of at least one file type during the creation of a text collection. TextEmbeddingClassifiers: Fixed the process of checking if TextEmbeddingModels are compatible. Python Installation: Fixed a bug that caused the installation of incompatible versions of keras and Tensorflow. Changes: Removed quanteda.textmodels as a necessary library for testing the package. Added a dataset for testing the package based on Maas et al. (2011).","code":""},{"path":"/news/index.html","id":"aifeducation-032","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.2","title":"aifeducation 0.3.2","text":"CRAN release: 2024-03-15 TextEmbeddingClassifiers: Fixed a bug in GlobalAveragePooling1D_PT. Now the layer performs a correct pooling. This change may have an effect on PyTorch models trained with version 0.3.1. TextEmbeddingModel: Replaced the parameter 'aggregation' with three new parameters, allowing to explicitly choose the start and end layer to be included in the creation of embeddings. Furthermore, two options for the pooling method within a layer were added (\"cls\" and \"average\"); see the sketch below. Added support for reporting the training and validation loss during training of the corresponding base model. Transformer Models: Fixed a bug in the creation of transformer models except funnel. Now choosing the number of layers is working. The file 'history.log' is now saved within the model's folder, reporting the loss and validation loss for every training epoch. EmbeddedText: Changed the process of validating if EmbeddedTexts are compatible. Now the model's unique name is used for validation. Added new fields and updated methods to account for the new options for creating embeddings (layer selection and pooling type). Graphical User Interface Aifeducation Studio: Adapted the interface according to the changes made in this version. Improved the reading of raw texts. Reading now reduces multiple space characters to one single space character. Hyphenation is removed. Python Installation: Updated the installation to account for the new version of keras.","code":""},
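To illustrate the 0.3.2 change from a single 'aggregation' parameter to explicit layer selection plus a pooling option, here is a hedged sketch of configuring a TextEmbeddingModel. The argument names emb_layer_min, emb_layer_max, and emb_pool_type, as well as the other arguments shown, are assumptions for illustration, since the release note does not spell them out; consult the reference page for the authoritative signature.

```r
# Hypothetical configuration of a TextEmbeddingModel using the layer
# selection and pooling options introduced in 0.3.2. All argument names
# are assumptions for illustration only.
embedding_model <- TextEmbeddingModel$new(
  model_name = "roberta_embedding",    # assumed identifier
  model_dir = "models/roberta_tuned",  # assumed path to the saved model
  method = "roberta",                  # assumed model type
  emb_layer_min = "middle",            # assumed: first hidden layer to include
  emb_layer_max = "2_3_layer",         # assumed: last hidden layer to include
  emb_pool_type = "average"            # assumed: pooling per layer ("cls" or "average")
)
```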
{"path":"/news/index.html","id":"aifeducation-031","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.1","title":"aifeducation 0.3.1","text":"CRAN release: 2024-02-18 Graphical User Interface Aifeducation Studio: Added a shiny app to the package that serves as a graphical user interface. Transformer Models: Fixed a bug in all transformers except BERT concerning the unk_token. Switched from the SentencePiece tokenizer to the WordPiece tokenizer for DeBERTa_V2. Added the possibility to train DeBERTa_V2 and FunnelTransformer models with Whole Word Masking. TextEmbeddingModel: Added the method 'fill-mask'. Added a new argument to the method 'encode', allowing to choose between encoding into token ids and token strings. Added a new argument to the method 'decode', allowing to choose between decoding into single tokens and plain text. Fixed a bug in embedding texts when using pytorch. The fix decreases the computational time and enables gpu support (if available on the machine). Fixed two missing columns for saving the results of sustainability tracking on machines without a gpu. Implemented the advantages of the python library 'datasets', increasing computational speed and allowing the use of large datasets. TextEmbeddingClassifiers: Added support for pytorch without the need of kerasV3 or keras-core. Classifiers for pytorch are now implemented in native pytorch. Changed the architecture of new classifiers and extended the abilities of the neural nets by adding the possibility to add a positional embedding. Changed the architecture of new classifiers and extended the abilities of the neural nets by adding an alternative method for the self-attention mechanism via fourier transformation (similar to FNet). Added balanced_accuracy as a new metric for determining which state of a model predicts all classes best. Fixed an error so that the training history is saved correctly. Added a record of metrics for the test dataset to the training history for pytorch. Added the option to balance class weights for calculating the training loss according to the Inverse Frequency method. Balancing class weights is activated by default. Added a method for checking the compatibility of the underlying TextEmbeddingModels of a classifier and an object of class EmbeddedText. Added precision, recall, and f1-score as new metrics. Python Installation: Added an argument to 'install_py_modules', allowing to choose which machine learning framework should be installed (see the sketch at the end of this changelog). Updated 'check_aif_py_modules'. Changes: Setting the machine learning framework at the start of a session is no longer necessary. The function for setting the global ml_framework remains active for convenience. The ml_framework can now be switched at any time during a session. Updated the documentation.","code":""},{"path":"/news/index.html","id":"aifeducation-030","dir":"Changelog","previous_headings":"","what":"aifeducation 0.3.0","title":"aifeducation 0.3.0","text":"CRAN release: 2023-10-10 Added DeBERTa and Funnel-Transformer support. Fixed issues in installing the required python packages. Fixed issues in training transformer models. Fixed an issue in calculating the final iota values for classifiers if pseudo labeling is active. Added support for PyTorch and Tensorflow for all transformer models. Added support for PyTorch classifier objects (via keras 3 in the future). Removed the augmentation of the vocabulary from training BERT models. Updated the documentation. Changed the reported values of kappa.","code":""},{"path":"/news/index.html","id":"aifeducation-020","dir":"Changelog","previous_headings":"","what":"aifeducation 0.2.0","title":"aifeducation 0.2.0","text":"CRAN release: 2023-08-15 First release on CRAN","code":""}]
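As referenced in the 0.3.1 Python Installation notes above, here is a minimal sketch of installing only one machine learning framework and verifying the installation afterwards. The argument name install and the value "pytorch" are assumptions based on the release note's description, not verified signatures.

```r
# Sketch: installing a single machine learning framework and checking the
# installation afterwards. The argument name "install" is an assumption
# based on the 0.3.1 release note; consult the reference pages for the
# exact signatures. Calls are commented out because they modify the system.
library(aifeducation)

# install_py_modules(install = "pytorch")  # assumed: choose the framework
# check_aif_py_modules()                   # verify the python modules
```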