Update Documentation

Bug Fix in get special tokens
FBerding · Jan 21, 2024 · 1219b4b
1 parent e9ab854

Showing 16 changed files with 1,140 additions and 799 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -12,15 +12,15 @@ Authors@R: c(
   )
 Description: In social and educational settings, the use of Artificial
     Intelligence (AI) is a challenging task. Relevant data is often only
-    available in handwritten forms or the use of data is restricted by
-    privacy policies, often leading to small data sets. Furthermore, data
-    in the educational and social sciences is often unbalanced in terms of
+    available in handwritten forms, or the use of data is restricted by
+    privacy policies. This often leads to small data sets. Furthermore, in the educational and social sciences,
+    data is often unbalanced in terms of
     frequencies. To support educators as well as educational and social
     researchers in using the potentials of AI for their work, this package
     provides a unified interface for neural nets in 'keras',
     'tensorflow', and 'pytorch' to deal with natural language problems. In addition,
-    the package ships with a shiny app providing a graphical unser interface to
-    user. This allows the usage of AI even without the need of codings skills.
+    the package ships with a shiny app, providing a graphical user interface.
+    This allows the usage of AI for people without codings skills.
     The tools integrate existing mathematical and statistical methods for dealing
     with small data sets via pseudo-labeling (e.g. Lee (2013)
     <https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks>,
@@ -32,9 +32,9 @@ Description: In social and educational settings, the use of Artificial
     familiar with (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>,
     Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019)
     <doi:10.4135/9781071878781>). Estimation of energy consumption and CO2
-    emissions during training models is done with the 'python' library
+    emissions during model training is done with the 'python' library
     'codecarbon'.  Finally, all objects created with this package allow to
-    share trained AI with other people.
+    share trained AI models with other people.
 License: GPL-3
 URL:https://fberding.github.io/aifeducation/
 BugReports: https://github.com/cran/aifeducation/issues

diff --git a/NAMESPACE b/NAMESPACE
@@ -50,6 +50,7 @@ import(shinydashboard)
 import(stats)
 importFrom(abind,abind)
 importFrom(fs,path_home)
+importFrom(methods,is)
 importFrom(methods,isClass)
 importFrom(readxl,read_xlsx)
 importFrom(reticulate,conda_create)
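
Note on the `importFrom(methods,is)` line above: the GUI code in this commit calls `methods::is(..., class2 = "try-error")` to test whether a `try()` call failed. A minimal base R sketch of that pattern follows. `risky_load()` is a hypothetical stand-in for illustration, not a function from the package.

```r
# Hypothetical stand-in for a loader that may fail (not part of aifeducation).
risky_load <- function(path) {
  if (!file.exists(path)) {
    stop("model directory not found")
  }
  path
}

# tempfile() only generates a name, so this path does not exist yet.
missing_path <- tempfile("missing_model_")

# try(..., silent = TRUE) returns an object of class "try-error" on failure.
result <- try(risky_load(missing_path), silent = TRUE)

# Namespace-qualified call, so it resolves even if 'methods' is not attached.
failed <- methods::is(result, class2 = "try-error")

# Base R's inherits() gives the same answer for this S3 class.
same <- inherits(result, "try-error")
```

Either check works for this case. Qualifying the call or importing `is` explicitly presumably avoids relying on `methods` being attached when the package code runs.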

diff --git a/NEWS.md b/NEWS.md
@@ -1,66 +1,81 @@
+---
+editor_options: 
+  markdown: 
+    wrap: 72
+---
+
 # aifeducation 0.3.1
+
 **Graphical User Interface Aifeducation Studio**
 
--   Added a shiny app to the package that servers as a graphical user interface.
+-   Added a shiny app to the package that serves as a graphical user
+    interface.
 
 **Transformer Models**
 
--   Fixed a bug of all transformers expect BERT concerning the unk_token.
+-   Fixed a bug in all transformers except BERT concerning the
+    unk_token.
 
 **TextEmbeddingModel**
 
 -   Added a method for 'fill-mask'.
--   Added a new argument to the method 'encode' allowing to chose
-    between encoding into tokens' ids or into tokens' strings.
--   Added a new argument to the method 'decode' allowing to chose
+-   Added a new argument to the method 'encode', allowing to chose
+    between encoding into token ids or into token strings.
+-   Added a new argument to the method 'decode', allowing to chose
     between decoding into single tokens or into plain text.
 -   Fixed a bug for embedding texts when using pytorch. The fix should
     decrease computational time.
-    
+
 **TextEmbeddingClassifiers**
-
--   Adding support for pytorch without the need for kerasV3
-    or keras-core. Classifiers for pytorch are now implemented in native pytorch.
--   Changed the architecture for new classifiers and extended
-    the abilities of neural nets by adding the possibility to add positional embedding.
--   Changed the architecture for new classifiers and extended
-    the abilities of neural nets by adding an alternative method for the self-attention
-    mechanism via fourier transformation (similar to FNet).
--   Added balanced_accuracy as the new metric for determining
-    which state of a model predicts classes best.
+
+-   Adding support for pytorch without the need for kerasV3 or
+    keras-core. Classifiers for pytorch are now implemented in native
+    pytorch.
+-   Changed the architecture for new classifiers and extended the
+    abilities of neural nets by adding the possibility to add positional
+    embedding.
+-   Changed the architecture for new classifiers and extended the
+    abilities of neural nets by adding an alternative method for the
+    self-attention mechanism via fourier transformation (similar to
+    FNet).
+-   Added balanced_accuracy as the new metric for determining which
+    state of a model predicts classes best.
 -   Fixed error that training history is not saved correctly.
--   Added a record metric for the test dataset to training
-    history with pytorch.
--   Added the option to balance class weights for
-    calculating training loss according to the Inverse Frequency method. 
-    Balance class weights is activated by default.
--   Added a method for checking the compatibility of the underlying TextEmbeddingModels
-    of a classifier and an object of class EmbeddedText.
+-   Added a record metric for the test dataset to training history with
+    pytorch.
+-   Added the option to balance class weights for calculating training
+    loss according to the Inverse Frequency method. Balance class
+    weights is activated by default.
+-   Added a method for checking the compatibility of the underlying
+    TextEmbeddingModels of a classifier and an object of class
+    EmbeddedText.
 -   Added precision, recall, and f1-score as new metrics.
-
-**Python Installation**
--   Adding an argument to 'install_py_modules' allowing to choose which machine
-    learning framework should be installed.
--   Updated 'check_aif_py_modules'.
-
+
+**Python Installation** - Added an argument to 'install_py_modules',
+allowing to choose which machine learning framework should be
+installed. - Updated 'check_aif_py_modules'.
+
 **Further Changes**
-
--   Setting the machine learning framework is not longer necessary at the start 
-    of a session. The function for setting the global ml_framework remains active
-    for convenience. The ml_framework can now switched at any time during a session.
--   Updated documentation
+
+-   Setting the machine learning framework at the start of a session is
+    no longer necessary. The function for setting the global
+    ml_framework remains active for convenience. The ml_framework can
+    now be switched at any time during a session.
+-   Updated documentation.
 
 # aifeducation 0.3.0
 
--   Added DeBERTa and Funnel-Transformer support
--   Fixed issues for installing the required python packages
--   Fixed issues in training transformer models
--   Fixed an issue for calculating the final iota values in classifiers if pseudo labeling is active
--   Added support for PyTorch and Tensorflow for all transformer models
--   Added support for PyTorch for classifier objects via keras 3 in the future
--   Removed augmentation of vocabulary from training BERT models
--   Updated documentation
--   changed the reported values for kappa
+-   Added DeBERTa and Funnel-Transformer support.
+-   Fixed issues for installing the required python packages.
+-   Fixed issues in training transformer models.
+-   Fixed an issue for calculating the final iota values in classifiers
+    if pseudo labeling is active.
+-   Added support for PyTorch and Tensorflow for all transformer models.
+-   Added support for PyTorch for classifier objects via keras 3 in the
+    future.
+-   Removed augmentation of vocabulary from training BERT models.
+-   Updated documentation.
+-   Changed the reported values for kappa.
 
 # aifeducation 0.2.0
 

diff --git a/R/aif_gui.R b/R/aif_gui.R
@@ -25,6 +25,7 @@
 #'@importFrom utils read.csv2
 #'@importFrom utils write.csv2
 #'@importFrom rlang .data
+#'@importFrom methods is
 #'
 #'
 #'@export
@@ -60,6 +61,11 @@ start_aifeducation_studio<-function(){
     stop("No available machine learning frameworks found.")
   }
 
+  #Exporting Functions for Python
+  #These functions must be available in the global environment
+  py_update_aifeducation_progress_bar_epochs<<-reticulate::py_func(update_aifeducation_progress_bar_epochs)
+  py_update_aifeducation_progress_bar_steps<<-reticulate::py_func(update_aifeducation_progress_bar_steps)
+
   #Start GUI--------------------------------------------------------------------
   options(shiny.reactlog=TRUE)
   #options(shiny.fullstacktrace = TRUE)
@@ -676,8 +682,8 @@ start_aifeducation_studio<-function(){
 
     #Logger for progressmodal
     log=reactiveVal(value = rep(x="",times=15))
-    py_update_aifeducation_progress_bar_epochs<<-reticulate::py_func(update_aifeducation_progress_bar_epochs)
-    py_update_aifeducation_progress_bar_steps<<-reticulate::py_func(update_aifeducation_progress_bar_steps)
+    #py_update_aifeducation_progress_bar_epochs<<-reticulate::py_func(update_aifeducation_progress_bar_epochs)
+    #py_update_aifeducation_progress_bar_steps<<-reticulate::py_func(update_aifeducation_progress_bar_steps)
 
 
 
@@ -1126,7 +1132,7 @@ start_aifeducation_studio<-function(){
                               title = as.character(all_paths[i]))
             tmp_document=readtext::readtext(file=all_paths[i])
             #File name without extension
-            text_corpus[counter,"id"]=stringi::stri_split_regex(tmp_document$doc_id,pattern=".")[[1]][1]
+            text_corpus[counter,"id"]=stringi::stri_split_fixed(tmp_document$doc_id,pattern=".")[[1]][1]
             text_corpus[counter,"text"]=tmp_document$text
             counter=counter+1
 
@@ -1564,7 +1570,7 @@ start_aifeducation_studio<-function(){
             create_longformer_model(
               ml_framework=input$config_ml_framework,
               model_dir=input$lm_save_created_model_dir_path,
-              vocab_raw_texts=vocab_raw_texts,
+              vocab_raw_texts=raw_texts$text,
               vocab_size=input$lm_vocab_size,
               add_prefix_space=input$lm_add_prefix_space,
               trim_offsets=input$lm_trim_offsets,
@@ -1589,7 +1595,7 @@ start_aifeducation_studio<-function(){
             create_funnel_model(
               ml_framework=input$config_ml_framework,
               model_dir=input$lm_save_created_model_dir_path,
-              vocab_raw_texts=vocab_raw_texts,
+              vocab_raw_texts=raw_texts$text,
               vocab_size=input$lm_vocab_size,
               vocab_do_lower_case=input$lm_vocab_do_lower_case,
               max_position_embeddings=input$lm_max_position_embeddings,
@@ -2318,7 +2324,7 @@ start_aifeducation_studio<-function(){
         model=try(load_ai_model(model_dir = model_path,
                                 ml_framework=input$config_ml_framework),
                   silent = TRUE)
-        if(is(model,class2 = "try-error")==FALSE){
+        if(methods::is(model,class2 = "try-error")==FALSE){
           if("TextEmbeddingModel"%in%class(model)){
             if(utils::compareVersion(as.character(model$get_package_versions()$aifeducation),"0.3.1")>=0){
               closeSweetAlert()
@@ -2549,7 +2555,7 @@ start_aifeducation_studio<-function(){
           n_solutions=input$lm_n_fillments_for_fill_mask),
         silent = TRUE)
 
-      if(is(solutions,class2 = "try-error")==FALSE){
+      if(methods::is(solutions,class2 = "try-error")==FALSE){
         updateNumericInput(inputId = "lm_select_mask_for_fill_mask",
                            max=length(solutions))
 
@@ -2570,7 +2576,7 @@ start_aifeducation_studio<-function(){
       plot_data=plot_data[order(plot_data$score,decreasing=FALSE),]
       plot_data$token_str=factor(plot_data$token_str,levels=(plot_data$token_str))
       plot=ggplot2::ggplot(data = plot_data)+
-        ggplot2::geom_col(ggplot2::aes(x=token_str,y=score))+
+        ggplot2::geom_col(ggplot2::aes(x=.data$token_str,y=.data$score))+
         ggplot2::coord_flip()+
         ggplot2::xlab("tokens")+
         ggplot2::ylab("score")+
@@ -2817,7 +2823,7 @@ start_aifeducation_studio<-function(){
                    type="info")
         model=try(load_ai_model(model_dir = lm_interface_for_documentation_path(),
                                 ml_framework=input$config_ml_framework),silent = TRUE)
-        if(is(model,class2 = "try-error")==FALSE){
+        if(methods::is(model,class2 = "try-error")==FALSE){
           if("TextEmbeddingModel"%in%class(model)){
             if(utils::compareVersion(as.character(model$get_package_versions()$aifeducation),"0.3.1")>=0){
               closeSweetAlert()
@@ -3185,7 +3191,7 @@ start_aifeducation_studio<-function(){
       file_path=tec_target_data_for_train_path()
       if(!is.null(file_path)){
         if(file.exists(file_path)==TRUE){
-          extension=stringi::stri_split_regex(file_path,pattern=".")[[1]]
+          extension=stringi::stri_split_fixed(file_path,pattern=".")[[1]]
           extension=stringi::stri_trans_tolower(extension[[length(extension)]])
           show_alert(title="Loading",
                      text = "Please wait",
@@ -3208,6 +3214,8 @@ start_aifeducation_studio<-function(){
             target_data=try(
               as.data.frame(target_data),
               silent = TRUE)
+          } else {
+            target_data=NA
           }
 
           #Final Check
@@ -3529,7 +3537,7 @@ start_aifeducation_studio<-function(){
         classifier<-try(load_ai_model(model_dir = model_path,
                                       ml_framework=input$config_ml_framework),
                         silent = TRUE)
-        if(is(classifier,class2 = "try-error")==FALSE){
+        if(methods::is(classifier,class2 = "try-error")==FALSE){
           if("TextEmbeddingClassifierNeuralNet"%in%class(classifier)){
             if(utils::compareVersion(as.character(classifier$get_package_versions()$r_package_versions$aifeducation),"0.3.1")>=0){
               closeSweetAlert()
@@ -4110,20 +4118,20 @@ start_aifeducation_studio<-function(){
         }
 
         plot<-ggplot2::ggplot(data=plot_data)+
-          ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$train_mean,color="train"))+
-          ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$validation_mean,color="validation"))
+          ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$train_mean,color="train"))+
+          ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$validation_mean,color="validation"))
 
         if(input$tec_performance_training_min_max==TRUE){
           plot<-plot+
-            ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$train_min,color="train"))+
-            ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$train_max,color="train"))+
+            ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$train_min,color="train"))+
+            ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$train_max,color="train"))+
             ggplot2::geom_ribbon(ggplot2::aes(x=.data$epoch,
                                               ymin=.data$train_min,
                                               ymax=.data$train_max),
                                  alpha=0.25,
                                  fill="red")+
-            ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$validation_min,color="validation"))+
-            ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$validation_max,color="validation"))+
+            ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$validation_min,color="validation"))+
+            ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$validation_max,color="validation"))+
             ggplot2::geom_ribbon(ggplot2::aes(x=.data$epoch,
                                               ymin=.data$validation_min,
                                               ymax=.data$validation_max),
@@ -4345,7 +4353,7 @@ start_aifeducation_studio<-function(){
         classifier=try(load_ai_model(model_dir = tec_interface_for_documentation_path(),
                                      ml_framework=input$config_ml_framework),
                        silent = TRUE)
-        if(is(classifier,class2 = "try-error")==FALSE){
+        if(methods::is(classifier,class2 = "try-error")==FALSE){
           if("TextEmbeddingClassifierNeuralNet"%in%class(classifier)){
             if(utils::compareVersion(as.character(classifier$get_package_versions()$r_package_versions$aifeducation),"0.3.1")>=0){
               closeSweetAlert()
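
Note on the `stri_split_regex` to `stri_split_fixed` changes in this file: with a regex split, the pattern "." matches any character, so a file name is never cut at its dots. The sketch below shows the same effect using base R's `strsplit()` as an analog. It is an illustration under that assumption, not the package code; `stringi` may return a different number of empty strings, but the first element is empty either way.

```r
file_name <- "report.final.txt"

# Regex split: "." is a metacharacter that matches every character, so each
# character is treated as a separator and only empty strings remain.
regex_parts <- strsplit(file_name, split = ".")[[1]]

# Fixed split: "." is taken literally, so the name is cut at the dots.
fixed_parts <- strsplit(file_name, split = ".", fixed = TRUE)[[1]]

# Part before the first dot, as used for the document id in the diff.
id <- fixed_parts[1]

# Part after the last dot, lower-cased, as used for the extension check.
extension <- tolower(fixed_parts[length(fixed_parts)])
```

With the regex variant, the id taken from the first element would be an empty string, which is consistent with the fix recorded in the diff.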

diff --git a/R/install_and_config.R b/R/install_and_config.R
@@ -38,7 +38,7 @@ install_py_modules<-function(envname="aifeducation",
                          "accelerate")
 
   #Check Arguments
-  if(!(check%in%c("all","pytorch","tensorflow"))){
+  if(!(install%in%c("all","pytorch","tensorflow"))){
     stop("install must be all, pytorch or tensorflow.")
   }
 

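Note on the `check` to `install` change above: judging from the error message, `install` is the argument meant to be validated. A minimal sketch of that check follows. `choose_framework()` is a hypothetical, simplified function; the real `install_py_modules()` takes more arguments and performs the actual installation.

```r
# Hypothetical, simplified validator (not the package's install_py_modules()).
choose_framework <- function(install = "all") {
  allowed <- c("all", "pytorch", "tensorflow")
  # Validate the argument that the error message names.
  if (!(install %in% allowed)) {
    stop("install must be all, pytorch or tensorflow.")
  }
  install
}

# A valid value is returned unchanged.
ok <- choose_framework("pytorch")

# An invalid value raises the error, captured here with try().
bad <- try(choose_framework("jax"), silent = TRUE)
```
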
diff --git a/R/onLoad.R b/R/onLoad.R
@@ -10,12 +10,11 @@ os<-NULL
 keras<-NULL
 accelerate<-NULL
 safetensors<-NULL
-py_update_aifeducation_progress_bar_epochs<-NULL
-py_update_aifeducation_progress_bar_steps<-NULL
 
 aifeducation_config<-NULL
 
-
+py_update_aifeducation_progress_bar_epochs<<-NULL
+py_update_aifeducation_progress_bar_steps<<-NULL
 
 .onLoad<-function(libname, pkgname){
   # use superassignment to update the global reference

diff --git a/R/te_classifier_neuralnet_model.R b/R/te_classifier_neuralnet_model.R
@@ -2082,7 +2082,6 @@ TextEmbeddingClassifierNeuralNet<-R6::R6Class(
          file_path=paste0(dir_path,"/","model_data",extension)
          if(dir.exists(dir_path)==FALSE){
            dir.create(dir_path)
-           #cat("Creating Directory\n")
          }
          self$model$save(file_path)
        } else if(private$ml_framework=="pytorch"){