Update Documentation
Bug Fix in get special tokens
FBerding committed Jan 21, 2024
1 parent e9ab854 commit 1219b4b
Showing 16 changed files with 1,140 additions and 799 deletions.
14 changes: 7 additions & 7 deletions DESCRIPTION
@@ -12,15 +12,15 @@ Authors@R: c(
)
Description: In social and educational settings, the use of Artificial
Intelligence (AI) is a challenging task. Relevant data is often only
available in handwritten forms or the use of data is restricted by
privacy policies, often leading to small data sets. Furthermore, data
in the educational and social sciences is often unbalanced in terms of
available in handwritten forms, or the use of data is restricted by
privacy policies. This often leads to small data sets. Furthermore, in the educational and social sciences,
data is often unbalanced in terms of
frequencies. To support educators as well as educational and social
researchers in using the potentials of AI for their work, this package
provides a unified interface for neural nets in 'keras',
'tensorflow', and 'pytorch' to deal with natural language problems. In addition,
the package ships with a shiny app providing a graphical unser interface to
user. This allows the usage of AI even without the need of codings skills.
the package ships with a shiny app, providing a graphical user interface.
This allows the usage of AI for people without codings skills.
The tools integrate existing mathematical and statistical methods for dealing
with small data sets via pseudo-labeling (e.g. Lee (2013)
<https://www.researchgate.net/publication/280581078_Pseudo-Label_The_Simple_and_Efficient_Semi-Supervised_Learning_Method_for_Deep_Neural_Networks>,
@@ -32,9 +32,9 @@ Description: In social and educational settings, the use of Artificial
familiar with (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>,
Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019)
<doi:10.4135/9781071878781>). Estimation of energy consumption and CO2
emissions during training models is done with the 'python' library
emissions during model training is done with the 'python' library
'codecarbon'. Finally, all objects created with this package allow to
share trained AI with other people.
share trained AI models with other people.
License: GPL-3
URL:https://fberding.github.io/aifeducation/
BugReports: https://github.com/cran/aifeducation/issues
1 change: 1 addition & 0 deletions NAMESPACE
@@ -50,6 +50,7 @@ import(shinydashboard)
import(stats)
importFrom(abind,abind)
importFrom(fs,path_home)
importFrom(methods,is)
importFrom(methods,isClass)
importFrom(readxl,read_xlsx)
importFrom(reticulate,conda_create)
101 changes: 58 additions & 43 deletions NEWS.md
@@ -1,66 +1,81 @@
---
editor_options:
markdown:
wrap: 72
---

# aifeducation 0.3.1

**Graphical User Interface Aifeducation Studio**

- Added a shiny app to the package that servers as a graphical user interface.
- Added a shiny app to the package that serves as a graphical user
interface.

**Transformer Models**

- Fixed a bug of all transformers expect BERT concerning the unk_token.
- Fixed a bug in all transformers except BERT concerning the
unk_token.

**TextEmbeddingModel**

- Added a method for 'fill-mask'.
- Added a new argument to the method 'encode' allowing to chose
between encoding into tokens' ids or into tokens' strings.
- Added a new argument to the method 'decode' allowing to chose
- Added a new argument to the method 'encode', allowing to chose
between encoding into token ids or into token strings.
- Added a new argument to the method 'decode', allowing to chose
between decoding into single tokens or into plain text.
- Fixed a bug for embedding texts when using pytorch. The fix should
decrease computational time.

**TextEmbeddingClassifiers**

- Adding support for pytorch without the need for kerasV3
or keras-core. Classifiers for pytorch are now implemented in native pytorch.
- Changed the architecture for new classifiers and extended
the abilities of neural nets by adding the possibility to add positional embedding.
- Changed the architecture for new classifiers and extended
the abilities of neural nets by adding an alternative method for the self-attention
mechanism via fourier transformation (similar to FNet).
- Added balanced_accuracy as the new metric for determining
which state of a model predicts classes best.

- Adding support for pytorch without the need for kerasV3 or
keras-core. Classifiers for pytorch are now implemented in native
pytorch.
- Changed the architecture for new classifiers and extended the
abilities of neural nets by adding the possibility to add positional
embedding.
- Changed the architecture for new classifiers and extended the
abilities of neural nets by adding an alternative method for the
self-attention mechanism via fourier transformation (similar to
FNet).
- Added balanced_accuracy as the new metric for determining which
state of a model predicts classes best.
- Fixed error that training history is not saved correctly.
- Added a record metric for the test dataset to training
history with pytorch.
- Added the option to balance class weights for
calculating training loss according to the Inverse Frequency method.
Balance class weights is activated by default.
- Added a method for checking the compatibility of the underlying TextEmbeddingModels
of a classifier and an object of class EmbeddedText.
- Added a record metric for the test dataset to training history with
pytorch.
- Added the option to balance class weights for calculating training
loss according to the Inverse Frequency method. Balance class
weights is activated by default.
- Added a method for checking the compatibility of the underlying
TextEmbeddingModels of a classifier and an object of class
EmbeddedText.
- Added precision, recall, and f1-score as new metrics.

**Python Installation**
- Adding an argument to 'install_py_modules' allowing to choose which machine
learning framework should be installed.
- Updated 'check_aif_py_modules'.


**Python Installation** - Added an argument to 'install_py_modules',
allowing to choose which machine learning framework should be
installed. - Updated 'check_aif_py_modules'.

**Further Changes**

- Setting the machine learning framework is not longer necessary at the start
of a session. The function for setting the global ml_framework remains active
for convenience. The ml_framework can now switched at any time during a session.
- Updated documentation

- Setting the machine learning framework at the start of a session is
no longer necessary. The function for setting the global
ml_framework remains active for convenience. The ml_framework can
now be switched at any time during a session.
- Updated documentation.

# aifeducation 0.3.0

- Added DeBERTa and Funnel-Transformer support
- Fixed issues for installing the required python packages
- Fixed issues in training transformer models
- Fixed an issue for calculating the final iota values in classifiers if pseudo labeling is active
- Added support for PyTorch and Tensorflow for all transformer models
- Added support for PyTorch for classifier objects via keras 3 in the future
- Removed augmentation of vocabulary from training BERT models
- Updated documentation
- changed the reported values for kappa
- Added DeBERTa and Funnel-Transformer support.
- Fixed issues for installing the required python packages.
- Fixed issues in training transformer models.
- Fixed an issue for calculating the final iota values in classifiers
if pseudo labeling is active.
- Added support for PyTorch and Tensorflow for all transformer models.
- Added support for PyTorch for classifier objects via keras 3 in the
future.
- Removed augmentation of vocabulary from training BERT models.
- Updated documentation.
- Changed the reported values for kappa.

# aifeducation 0.2.0

44 changes: 26 additions & 18 deletions R/aif_gui.R
@@ -25,6 +25,7 @@
#'@importFrom utils read.csv2
#'@importFrom utils write.csv2
#'@importFrom rlang .data
#'@importFrom methods is
#'
#'
#'@export
@@ -60,6 +61,11 @@ start_aifeducation_studio<-function(){
stop("No available machine learning frameworks found.")
}

#Exporting Functions for Python
#These functions must be available in the global environment
py_update_aifeducation_progress_bar_epochs<<-reticulate::py_func(update_aifeducation_progress_bar_epochs)
py_update_aifeducation_progress_bar_steps<<-reticulate::py_func(update_aifeducation_progress_bar_steps)

#Start GUI--------------------------------------------------------------------
options(shiny.reactlog=TRUE)
#options(shiny.fullstacktrace = TRUE)
@@ -676,8 +682,8 @@ start_aifeducation_studio<-function(){

#Logger for progressmodal
log=reactiveVal(value = rep(x="",times=15))
py_update_aifeducation_progress_bar_epochs<<-reticulate::py_func(update_aifeducation_progress_bar_epochs)
py_update_aifeducation_progress_bar_steps<<-reticulate::py_func(update_aifeducation_progress_bar_steps)
#py_update_aifeducation_progress_bar_epochs<<-reticulate::py_func(update_aifeducation_progress_bar_epochs)
#py_update_aifeducation_progress_bar_steps<<-reticulate::py_func(update_aifeducation_progress_bar_steps)



@@ -1126,7 +1132,7 @@ start_aifeducation_studio<-function(){
title = as.character(all_paths[i]))
tmp_document=readtext::readtext(file=all_paths[i])
#File name without extension
text_corpus[counter,"id"]=stringi::stri_split_regex(tmp_document$doc_id,pattern=".")[[1]][1]
text_corpus[counter,"id"]=stringi::stri_split_fixed(tmp_document$doc_id,pattern=".")[[1]][1]
text_corpus[counter,"text"]=tmp_document$text
counter=counter+1
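
The switch from `stri_split_regex` to `stri_split_fixed` in this hunk matters because an unescaped `.` is a regex wildcard. A minimal sketch of the difference, assuming the 'stringi' package is installed; the file name `report.txt` is a made-up example and does not come from the commit:

```r
# Hypothetical document id; not taken from the commit.
doc_id <- "report.txt"

# As a regex, "." matches every character, so each character acts as a
# separator and only empty strings remain.
regex_parts <- stringi::stri_split_regex(doc_id, pattern = ".")[[1]]

# As a fixed pattern, "." only matches a literal dot.
fixed_parts <- stringi::stri_split_fixed(doc_id, pattern = ".")[[1]]

# First piece: the file name without its extension.
file_stem <- fixed_parts[1]
```

With the fixed pattern, `file_stem` holds the text before the first dot, so a name that contains several dots would be cut at the first one.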

@@ -1564,7 +1570,7 @@ start_aifeducation_studio<-function(){
create_longformer_model(
ml_framework=input$config_ml_framework,
model_dir=input$lm_save_created_model_dir_path,
vocab_raw_texts=vocab_raw_texts,
vocab_raw_texts=raw_texts$text,
vocab_size=input$lm_vocab_size,
add_prefix_space=input$lm_add_prefix_space,
trim_offsets=input$lm_trim_offsets,
@@ -1589,7 +1595,7 @@ start_aifeducation_studio<-function(){
create_funnel_model(
ml_framework=input$config_ml_framework,
model_dir=input$lm_save_created_model_dir_path,
vocab_raw_texts=vocab_raw_texts,
vocab_raw_texts=raw_texts$text,
vocab_size=input$lm_vocab_size,
vocab_do_lower_case=input$lm_vocab_do_lower_case,
max_position_embeddings=input$lm_max_position_embeddings,
@@ -2318,7 +2324,7 @@ start_aifeducation_studio<-function(){
model=try(load_ai_model(model_dir = model_path,
ml_framework=input$config_ml_framework),
silent = TRUE)
if(is(model,class2 = "try-error")==FALSE){
if(methods::is(model,class2 = "try-error")==FALSE){
if("TextEmbeddingModel"%in%class(model)){
if(utils::compareVersion(as.character(model$get_package_versions()$aifeducation),"0.3.1")>=0){
closeSweetAlert()
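
The `try()` / `methods::is()` pattern in this hunk can be sketched in isolation as follows. `failing_loader` is a hypothetical stand-in for a call such as `load_ai_model()` and is not part of the package; the `inherits()` line shows a base R check that gives the same answer for this S3 class:

```r
# Hypothetical loader that always fails; stands in for a call that may error.
failing_loader <- function() stop("model could not be loaded")

# With silent = TRUE, try() returns an object of class "try-error"
# instead of stopping execution.
result <- try(failing_loader(), silent = TRUE)

# Namespaced check, written the same way as in the diff.
load_failed <- methods::is(result, class2 = "try-error")

# Base R check that gives the same answer for an S3 class like "try-error".
load_failed_base <- inherits(result, "try-error")
```

Writing `methods::is` explicitly makes the call independent of whether the 'methods' package is attached in the session.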
@@ -2549,7 +2555,7 @@ start_aifeducation_studio<-function(){
n_solutions=input$lm_n_fillments_for_fill_mask),
silent = TRUE)

if(is(solutions,class2 = "try-error")==FALSE){
if(methods::is(solutions,class2 = "try-error")==FALSE){
updateNumericInput(inputId = "lm_select_mask_for_fill_mask",
max=length(solutions))

@@ -2570,7 +2576,7 @@ start_aifeducation_studio<-function(){
plot_data=plot_data[order(plot_data$score,decreasing=FALSE),]
plot_data$token_str=factor(plot_data$token_str,levels=(plot_data$token_str))
plot=ggplot2::ggplot(data = plot_data)+
ggplot2::geom_col(ggplot2::aes(x=token_str,y=score))+
ggplot2::geom_col(ggplot2::aes(x=.data$token_str,y=.data$score))+
ggplot2::coord_flip()+
ggplot2::xlab("tokens")+
ggplot2::ylab("score")+
@@ -2817,7 +2823,7 @@ start_aifeducation_studio<-function(){
type="info")
model=try(load_ai_model(model_dir = lm_interface_for_documentation_path(),
ml_framework=input$config_ml_framework),silent = TRUE)
if(is(model,class2 = "try-error")==FALSE){
if(methods::is(model,class2 = "try-error")==FALSE){
if("TextEmbeddingModel"%in%class(model)){
if(utils::compareVersion(as.character(model$get_package_versions()$aifeducation),"0.3.1")>=0){
closeSweetAlert()
@@ -3185,7 +3191,7 @@ start_aifeducation_studio<-function(){
file_path=tec_target_data_for_train_path()
if(!is.null(file_path)){
if(file.exists(file_path)==TRUE){
extension=stringi::stri_split_regex(file_path,pattern=".")[[1]]
extension=stringi::stri_split_fixed(file_path,pattern=".")[[1]]
extension=stringi::stri_trans_tolower(extension[[length(extension)]])
show_alert(title="Loading",
text = "Please wait",
@@ -3208,6 +3214,8 @@ start_aifeducation_studio<-function(){
target_data=try(
as.data.frame(target_data),
silent = TRUE)
} else {
target_data=NA
}

#Final Check
@@ -3529,7 +3537,7 @@ start_aifeducation_studio<-function(){
classifier<-try(load_ai_model(model_dir = model_path,
ml_framework=input$config_ml_framework),
silent = TRUE)
if(is(classifier,class2 = "try-error")==FALSE){
if(methods::is(classifier,class2 = "try-error")==FALSE){
if("TextEmbeddingClassifierNeuralNet"%in%class(classifier)){
if(utils::compareVersion(as.character(classifier$get_package_versions()$r_package_versions$aifeducation),"0.3.1")>=0){
closeSweetAlert()
@@ -4110,20 +4118,20 @@ start_aifeducation_studio<-function(){
}

plot<-ggplot2::ggplot(data=plot_data)+
ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$train_mean,color="train"))+
ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$validation_mean,color="validation"))
ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$train_mean,color="train"))+
ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$validation_mean,color="validation"))

if(input$tec_performance_training_min_max==TRUE){
plot<-plot+
ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$train_min,color="train"))+
ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$train_max,color="train"))+
ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$train_min,color="train"))+
ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$train_max,color="train"))+
ggplot2::geom_ribbon(ggplot2::aes(x=.data$epoch,
ymin=.data$train_min,
ymax=.data$train_max),
alpha=0.25,
fill="red")+
ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$validation_min,color="validation"))+
ggplot2::geom_line(ggplot2::aes(x=epoch,y=.data$validation_max,color="validation"))+
ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$validation_min,color="validation"))+
ggplot2::geom_line(ggplot2::aes(x=.data$epoch,y=.data$validation_max,color="validation"))+
ggplot2::geom_ribbon(ggplot2::aes(x=.data$epoch,
ymin=.data$validation_min,
ymax=.data$validation_max),
@@ -4345,7 +4353,7 @@ start_aifeducation_studio<-function(){
classifier=try(load_ai_model(model_dir = tec_interface_for_documentation_path(),
ml_framework=input$config_ml_framework),
silent = TRUE)
if(is(classifier,class2 = "try-error")==FALSE){
if(methods::is(classifier,class2 = "try-error")==FALSE){
if("TextEmbeddingClassifierNeuralNet"%in%class(classifier)){
if(utils::compareVersion(as.character(classifier$get_package_versions()$r_package_versions$aifeducation),"0.3.1")>=0){
closeSweetAlert()
2 changes: 1 addition & 1 deletion R/install_and_config.R
@@ -38,7 +38,7 @@ install_py_modules<-function(envname="aifeducation",
"accelerate")

#Check Arguments
if(!(check%in%c("all","pytorch","tensorflow"))){
if(!(install%in%c("all","pytorch","tensorflow"))){
stop("install must be all, pytorch or tensorflow.")
}
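
After this change the condition tests the same `install` argument that the error message names. A minimal sketch of that check on its own; `validate_install` is a hypothetical helper written for illustration and is not a function of the package:

```r
# Hypothetical stand-in for the argument check; not the package's function.
validate_install <- function(install = "all") {
  allowed <- c("all", "pytorch", "tensorflow")
  # Test the argument that is actually passed in.
  if (!(install %in% allowed)) {
    stop("install must be all, pytorch or tensorflow.")
  }
  install
}

# A supported value passes through unchanged.
accepted <- validate_install("pytorch")

# An unsupported value raises an error, captured here with try().
rejected <- inherits(try(validate_install("jax"), silent = TRUE), "try-error")
```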

5 changes: 2 additions & 3 deletions R/onLoad.R
@@ -10,12 +10,11 @@ os<-NULL
keras<-NULL
accelerate<-NULL
safetensors<-NULL
py_update_aifeducation_progress_bar_epochs<-NULL
py_update_aifeducation_progress_bar_steps<-NULL

aifeducation_config<-NULL


py_update_aifeducation_progress_bar_epochs<<-NULL
py_update_aifeducation_progress_bar_steps<<-NULL

.onLoad<-function(libname, pkgname){
# use superassignment to update the global reference
1 change: 0 additions & 1 deletion R/te_classifier_neuralnet_model.R
@@ -2082,7 +2082,6 @@ TextEmbeddingClassifierNeuralNet<-R6::R6Class(
file_path=paste0(dir_path,"/","model_data",extension)
if(dir.exists(dir_path)==FALSE){
dir.create(dir_path)
#cat("Creating Directory\n")
}
self$model$save(file_path)
} else if(private$ml_framework=="pytorch"){