Changes

Most recent releases are shown at the top. Each release shows:

New: New classes, methods, functions, etc
Changed: Additional parameters, changes to inputs or outputs, etc
Fixed: Bug fixes that don't change documented behaviour

0.32.2 (2022-12-12)

new:

N/A

changed

N/A

fixed:

Changed call to NMF to use alpha_W instead of alpha, as alpha parameter was removed in scikit-learn==1.2.

0.32.1 (2022-12-11)

new:

N/A

changed

N/A

fixed:

In TensorFlow 2.11, the tf.optimizers.Optimizer base class points to the new Keras optimizer, which seems to have problems. Users should use the legacy optimizers in tf.keras.optimizers.legacy with ktrain (which evidently will never be deleted). This means that, in TF 2.11, supplying a string representation of an optimizer like "adam" to model.compile uses the new optimizer instead of the legacy optimizers. In these cases, ktrain will issue a warning and automatically recompile the model with the default tf.keras.optimizers.legacy.Adam optimizer.

0.32.0 (2022-12-08)

new:

Support for TensorFlow 2.11. For now, as recommended in the TF release notes, ktrain has been changed to use the legacy optimizers in tf.keras.optimizers.legacy. This means that, when compiling Keras models, you should supply tf.keras.optimizers.legacy.Adam() instead of the string "adam".
Support for Python 3.10. Changed references from CountVectorizer.get_feature_names to CountVectorizer.get_feature_names_out. Updated supported versions in setup.py.
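
As a minimal sketch of the optimizer change described above, assuming TensorFlow 2.11 is installed: the helper names make_legacy_adam and compile_toy_model and the toy model itself are illustrative and not part of ktrain.

```python
def make_legacy_adam(learning_rate=1e-3):
    """Return the legacy Adam optimizer instead of relying on the string "adam"."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.optimizers.legacy.Adam(learning_rate=learning_rate)


def compile_toy_model():
    """Build and compile a tiny Keras model with the legacy optimizer (illustrative only)."""
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    # Pass the optimizer object rather than optimizer="adam".
    model.compile(optimizer=make_legacy_adam(), loss="mse")
    return model
```

The point is only that an optimizer object is passed to model.compile in place of the string; any other legacy optimizer class could be substituted in the same way.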

changed

N/A

fixed:

fixed error in docs

0.31.10 (2022-10-01)

new:

N/A

changed

N/A

fixed:

Adjusted tika imports due to issue with /tmp/tika.log in multi-user scenario

0.31.9 (2022-09-24)

new:

N/A

changed

N/A

fixed:

Adjustment for kwe
Fixed problem with importing ktrain without TensorFlow installed

0.31.8 (2022-09-08)

new:

N/A

changed

N/A

fixed:

Fixed paragraph tokenization in AnswerExtractor

0.31.7 (2022-08-04)

new:

N/A

changed

re-arranged dep warnings for TF
ktrain now pinned to transformers==4.17.0. Python 3.6 users can downgrade to transformers==4.10.3 and still use ktrain.

fixed:

N/A

0.31.6 (2022-08-02)

new:

N/A

changed

updated dependencies to work with newer versions (but temporarily continue pinning to transformers==4.10.1)

fixed:

fixes for newer networkx

0.31.5 (2022-08-01)

new:

N/A

changed

N/A

fixed:

fix release

0.31.4 (2022-08-01)

new:

N/A

changed

TextPredictor.explain and ImagePredictor.explain now use a different fork of eli5: pip install https://github.com/amaiya/eli5-tf/archive/refs/heads/master.zip

fixed:

Fixed loss_fn_from_model function to work with DISABLE_V2_BEHAVIOR properly
TextPredictor.explain and ImagePredictor.explain now work with tensorflow>=2.9 and scipy>=1.9 (due to new eli5-tf fork -- see above)

0.31.3 (2022-07-15)

new:

N/A

changed

added alnum check and period check to KeywordExtractor

fixed:

fixed bug in text.qa.core caused by previous refactoring of paragraph_tokenize and tokenize

0.31.2 (2022-05-20)

new:

N/A

changed

added truncate_to argument (default: 5000) and minchars argument (default: 3) to KeywordExtractor.extract_keywords method.
added score_by argument to KeywordExtractor.extract_keywords. Default is freqpos, which means keywords are now ranked by a combination of frequency and position in document.

fixed:

N/A

0.31.1 (2022-05-17)

new:

N/A

changed

Allow for returning prediction probabilities when merging tokens in sequence-tagging (PR #445)
added basic ML pipeline test to workflow using latest TensorFlow

fixed:

N/A

0.31.0 (2022-05-07)

new:

The text.ner.models.sequence_tagger now supports word embeddings from non-BERT transformer models (e.g., roberta-base, openai-gpt). Thanks to @Niekvdplas.
Custom tokenization can now be used in sequence-tagging even when using transformer word embeddings. See custom_tokenizer argument to NERPredictor.predict.

changed

[breaking change] In the text.ner.models.sequence_tagger function, the bilstm-bert model is now called bilstm-transformer and the bert_model parameter has been renamed to transformer_model.
[breaking change] The syntok package is now used as the default tokenizer for NERPredictor (sequence-tagging prediction). To use the tokenization scheme from older versions of ktrain, you can import the re and string packages and supply this function to the custom_tokenizer argument: lambda s: re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])").sub(r" \1 ", s).split().
Code base was reformatted using black
ktrain now supports TIKA for text extraction in the text.textractor.TextExtractor package with the use_tika=True argument as default. To use the old-style text extraction based on the textract package, you can supply use_tika=False to TextExtractor.
removed warning about sentence pair classification to avoid confusion
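
The legacy tokenization scheme quoted above for NERPredictor can be written out as a named function. This sketch uses only the regular expression given in the entry; the name legacy_tokenize is chosen here for illustration.

```python
import re
import string

# Characters treated as separate tokens: ASCII punctuation plus the extra
# symbols listed in the changelog entry.
_PUNCT_PATTERN = re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])")


def legacy_tokenize(s):
    """Pad each punctuation character with spaces, then split on whitespace."""
    return _PUNCT_PATTERN.sub(r" \1 ", s).split()


tokens = legacy_tokenize("Paul Newman lives in St. Louis, Missouri.")
# tokens == ['Paul', 'Newman', 'lives', 'in', 'St', '.', 'Louis', ',', 'Missouri', '.']
```

A function like this is what would be supplied as the custom_tokenizer argument to NERPredictor.predict.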

fixed:

N/A

0.30.0 (2022-03-28)

new:

ktrain now supports simple, fast, and robust keyphrase extraction with the ktrain.text.kw.KeywordExtractor module
ktrain now only issues a warning if TensorFlow is not installed, instead of halting and preventing further use. This means that pre-trained PyTorch models (e.g., text.zsl.ZeroShotClassifier) and sklearn models (e.g., text.eda.TopicModel) in ktrain can now be used without having TensorFlow installed.
text.qa.SimpleQA and text.qa.AnswerExtractor now both support PyTorch with optional quantization (use framework='pt' for PyTorch version)
text.qa.SimpleQA and text.qa.AnswerExtractor now both support a quantize argument that can speed up
text.zsl.ZeroShotClassifier, text.translation.Translator, and text.translation.EnglishTranslator all support a quantize argument.
pretrained image-captioning and object-detection via transformers is now supported

changed

reorganized imports
localized seqeval
The half parameter to text.translation.Translator and text.translation.EnglishTranslator was renamed to quantize and now supports both CPU and GPU.

fixed:

N/A

0.29.3 (2022-03-09)

new:

NERPredictor.predict now includes a return_offsets parameter. If True, the results will include character offsets of predicted entities.

changed

In eda.TopicModel, changed lda_max_iter to max_iter and nmf_alpha to alpha
Added show_counts parameter to TopicModel.get_topics method
Changed qa.core._process_question to qa.core.process_question
In qa.core, added remove_english_stopwords and and_np parameters to process_question
The valley learning rate suggestion is now returned in learner.lr_estimate and learner.lr_plot (when suggest=True supplied to learner.lr_plot)

fixed:

save TransformerEmbedding model, tokenizer, and configuration when saving NERPredictor and reset te_model to facilitate loading NERPredictors with BERT embeddings offline (#423)
switched from keras2onnx to tf2onnx, which supports newer versions of TensorFlow

0.29.2 (2022-02-09)

new:

N/A

changed

N/A

fixed:

added get_tokenizer call to TransformersPreprocessor._load_pretrained to address issue #416

0.29.1 (2022-02-08)

new:

N/A

changed

pin to sklearn==0.24.2 due to breaking changes. eli5 fork for tf.keras updated for 0.24.2. To use scikit-learn==0.24.2, users must uninstall and re-install the eli5 fork with: pip install https://github.com/amaiya/eli5/archive/refs/heads/tfkeras_0_10_1.zip

fixed:

N/A

0.29.0 (2022-01-28)

new:

New vision models: added MobileNetV3-Small and EfficientNet. Thanks to @ilos-vigil.

changed

core.Learner.plot now supports plotting of any value that exists in the training History object (e.g., mae if previously specified as metric). Thanks to @ilos-vigil.
added raw_confidence parameter to QA.ask method to return raw confidence scores. Thanks to @ilos-vigil.

fixed:

pin to transformers==4.10.3 due to Issue #398
pin to syntok==1.3.3 due to bug with syntok==1.4.1 causing paragraph tokenization in qa module to break
properly suppress TF/CUDA warnings by default
ensure document fed to keras_bert tokenizer to avoid this issue

0.28.3 (2021-11-05)

new:

speech transcription support

changed

N/A

fixed:

N/A

0.28.2 (2021-10-17)

new:

N/A

changed

minor fix to installation due to pypi

fixed:

N/A

0.28.1 (2021-10-17)

New:

N/A

Changed

added extra_requirements to setup.py
changed imports for summarization, translation, qa, and zsl in notebooks and tests

Fixed:

N/A

0.28.0 (2021-10-13)

New:

text.AnswerExtractor is a universal information extractor powered by a Question-Answering module and capable of extracting user-specified information from texts.
text.TextExtractor is a text extraction pipeline (e.g., convert PDFs to plain text)

Changed

changed transformers pin to transformers>=4.0.0,<=4.10.3

Fixed:

N/A

0.27.3 (2021-09-03)

New:

N/A

Changed

N/A

Fixed:

SimpleQA now can load PyTorch question-answering checkpoints
change API call to support newest causalnlp

0.27.2 (2021-07-28)

New:

N/A

Changed

N/A

Fixed:

check for logits attribute when predicting using transformers
change raised Exception to warning for longer sequence lengths for transformers

0.27.1 (2021-07-20)

New:

N/A

Changed

Added method parameter to tabular.causal_inference_model.

Fixed:

N/A

0.27.0 (2021-07-20)

New:

Added tabular.causal_inference_model function for causal inference support.

Changed

N/A

Fixed:

N/A

0.26.5 (2021-07-15)

New:

N/A

Changed

added query parameter to SimpleQA.ask so that an alternative query can be used to retrieve contexts from corpus
added chardet as dependency for stellargraph

Fixed:

fixed issue with TopicModel.build when threshold=None

0.26.4 (2021-06-23)

New:

API documentation index

Changed

Added warning when a TensorFlow version of selected transformers model is not available and the PyTorch version is being downloaded and converted instead using from_pt=True.

Fixed:

Fixed utils.metrics_from_model to support alternative metrics
Check for AUC in ktrain.utils "inspect" function

0.26.3 (2021-05-19)

New:

N/A

Changed

shallownlp.ner.NER.predict processes lists of sentences in batches resulting in faster predictions
batch_size argument added to shallownlp.ner.NER.predict
added verbose parameter to ktrain.text.textutils.extract_copy to optionally see why each skipped document was skipped
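
Batched prediction of the kind described above generally follows the pattern below. This is a generic sketch, not the shallownlp implementation: batched and predict_in_batches are hypothetical names, and predict_fn stands in for any model call that accepts a list of sentences.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(sentences, predict_fn, batch_size=32):
    """Apply `predict_fn` to each batch and concatenate the results in order."""
    results = []
    for batch in batched(sentences, batch_size):
        results.extend(predict_fn(batch))
    return results


# Stand-in "model" that returns the token count of each sentence.
def count_tokens(batch):
    return [len(sentence.split()) for sentence in batch]


counts = predict_in_batches(["a b", "c", "d e f"], count_tokens, batch_size=2)
```

Larger batch sizes mean fewer model calls at the cost of more memory per call, which is why the size is exposed as an argument.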

Fixed:

Changed TextPredictor.save to save Hugging Face tokenizer files locally to ensure they can be easily reloaded when text.Transformer is supplied with local path.
For transformers models, the predictor.preproc.model_name variable is automatically updated to be new Predictor folder to avoid having users manually update model_name. Applies when a local path is supplied to text.Transformer and resultant Predictor is moved to new machine.

0.26.2 (2021-03-26)

New:

N/A

Changed

NERPredictor.predict now optionally accepts lists of sentences to make sequence-labeling predictions in batches (as all other Predictor instances already do).

Fixed:

N/A

0.26.1 (2021-03-11)

New:

N/A

Changed

expose errors from transformers in _load_pretrained
Changed TextPreprocessor.check_trained to be a warning instead of Exception

Fixed:

N/A

0.26.0 (2021-03-10)

New:

Support for transformers 4.0 and above.

Changed

added set_tokenizer to TransformerPreprocessor
show error message when original weights cannot be saved (for reset_weights method)

Fixed:

cast filename to string before concatenating with suffix in images_from_csv and images_from_df (addresses issue #330)
resolved import error for sklearn>=0.24.0, but eli5 still requires sklearn<0.24.0.
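
The filename fix above guards against non-string values, for example integer IDs read from a CSV column. A minimal sketch of the idea, where with_suffix is a hypothetical helper and not a ktrain function:

```python
def with_suffix(filename, suffix=".jpg"):
    """Append `suffix` to `filename`, casting to str first.

    Without the cast, an integer filename such as 1042 would raise
    TypeError when concatenated with a string suffix.
    """
    return str(filename) + suffix


names = [with_suffix(value) for value in (1042, "cat_01")]
```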

0.25.4 (2021-01-26)

New:

N/A

Changed

N/A

Fixed:

fixed problem with LabelEncoder not properly being stored when texts_from_df is invoked
refrain from invoking max on empty sequence (#307)
corrected issue with return_proba=True in NER predictions (#316)
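
Calling max on an empty sequence raises ValueError in Python. One common guard is the default argument, sketched below; longest_length is a hypothetical helper, and the actual fix in ktrain may differ.

```python
def longest_length(sequences, default=0):
    """Return the length of the longest sequence, or `default` if there are none."""
    # max() raises ValueError on an empty iterable unless `default` is given.
    return max((len(seq) for seq in sequences), default=default)
```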

0.25.3 (2020-12-23)

New:

N/A

Changed

A steps_per_epoch argument has been added to all *fit* methods that operate on generators
Added get_tokenizer methods to all instances of TextPreprocessor

Fixed:

propagate custom metrics to model when distilbert is chosen in text_classifier and text_regression_model functions
pin scikit-learn to 0.24.0 due to breaking change

0.25.2 (2020-12-05)

New:

N/A

Changed

N/A

Fixed:

Added custom_objects argument to load_predictor to load models with custom loss functions, etc.
Fixed bug #286 related to length computation when use_dynamic_shape=True

0.25.1 (2020-12-02)

New:

N/A

Changed

Added use_dynamic_shape parameter to text.preprocessor.hf_convert_examples, which is set to True when running predictions. This reduces the input length when making predictions, if possible.
Added warnings to some imports in imports.py to allow for slightly lighter-weight deployments
Temporarily pinning to transformers>=3.1,<4.0 due to breaking changes in v4.0.
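
The idea behind use_dynamic_shape can be illustrated as follows, under the assumption that it caps the padded length at the longest input in the batch rather than always padding to maxlen. pad_batch is a hypothetical helper, not the hf_convert_examples implementation.

```python
def pad_batch(token_id_lists, maxlen, pad_id=0, use_dynamic_shape=False):
    """Truncate and pad token ID lists to a common length.

    With use_dynamic_shape=True, the target length shrinks to the longest
    input in the batch whenever that is shorter than maxlen.
    """
    target = maxlen
    if use_dynamic_shape and token_id_lists:
        target = min(maxlen, max(len(ids) for ids in token_id_lists))
    padded = []
    for ids in token_id_lists:
        kept = ids[:target]  # truncate inputs longer than the target
        padded.append(kept + [pad_id] * (target - len(kept)))
    return padded


batch = [[101, 7, 102], [101, 7, 8, 9, 102]]
fixed = pad_batch(batch, maxlen=8)
dynamic = pad_batch(batch, maxlen=8, use_dynamic_shape=True)
```

With short inputs, the dynamic variant produces rows of length 5 here instead of 8, so the model processes fewer padding positions.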

Fixed:

Suppress progress bar in predictor.predict for keras_bert models
Fixed typo causing problems when loading predictor for Inception models
Fixes to address documented/undocumented breaking changes in transformers>=4.0. But, temporarily pinning to transformers>=3.1,<4.0 for backwards compatibility.

0.25.0 (2020-11-08)

New:

The SimpleQA.index_from_folder method now supports text extraction from many file types including PDFs, MS Word documents, and MS PowerPoint files.

Changed

The default in SimpleQA.index_from_list and SimpleQA.index_from_folder has been changed to breakup_docs=True.

Fixed:

N/A

0.24.2 (2020-11-07)

New:

N/A

Changed

ktrain.text.textutils.extract_copy now uses textract to extract text from many file types (e.g., PDF, DOC, PPT) instead of just PDFs.

Fixed:

N/A

0.24.1 (2020-11-06)

New:

N/A

Changed

N/A

Fixed:

Change exception in model ID check in Translator to warning to better allow offline language translations

0.24.0 (2020-11-05)

New:

Predictor instances now provide built-in support for exporting to TensorFlow Lite and ONNX.

Changed

N/A

Fixed:

N/A

0.23.2 (2020-10-27)

New:

N/A

Changed

Use fast tokenizers for the following Hugging Face transformers models: BERT, DistilBERT, and RoBERTa. This change affects models created with either text.Transformer(...) or text.text_classifier('distilbert', ...). BERT models created with text_classifier('bert', ...), which uses keras_bert instead of transformers, are not affected by this change.

Fixed:

N/A

0.23.1 (2020-10-26)

New:

N/A

Changed

N/A

Fixed:

Resolved issue in qa.ask method occurring with embedding computations when full answer sentences exceed 512 tokens.

0.23.0 (2020-10-16)

New:

Support for upcoming release of TensorFlow 2.4 such as removal of references to obsolete multi_gpu_model

Changed

[breaking change] TopicModel.get_docs now returns a list of dicts instead of a list of tuples. Each dict has keys: text, doc_id, topic_proba, topic_id.
added TopicModel.get_document_topic_distribution
added TopicModel.get_sorted_docs method to return all documents sorted by relevance to a given topic_id
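
Code that consumed the old tuple output of TopicModel.get_docs needs to switch to key access. The sketch below works on a hand-written list shaped like the new return value; the sample values are invented for illustration, and top_docs_for_topic is a hypothetical helper.

```python
def top_docs_for_topic(docs, topic_id, n=2):
    """Return the `n` documents for `topic_id` with the highest topic_proba."""
    matching = [doc for doc in docs if doc["topic_id"] == topic_id]
    return sorted(matching, key=lambda doc: doc["topic_proba"], reverse=True)[:n]


# Invented sample data using the documented keys.
docs = [
    {"text": "rockets and orbits", "doc_id": 0, "topic_proba": 0.91, "topic_id": 3},
    {"text": "baseball scores", "doc_id": 1, "topic_proba": 0.88, "topic_id": 5},
    {"text": "moon landing", "doc_id": 2, "topic_proba": 0.97, "topic_id": 3},
]
best = top_docs_for_topic(docs, topic_id=3)
```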

Fixed:

Changed version check warning in lr_find to a raised Exception to avoid confusion when warnings from ktrain are suppressed
Pass verbose parameter to hf_convert_examples

0.22.4 (2020-10-12)

New:

N/A

Changed

changed qa.core.display_answers to make URLs open in new tab

Fixed:

pin to seqeval==0.0.19 due to numpy version incompatibility with latest TensorFlow and to suppress errors during installation

0.22.3 (2020-10-09)

New:

N/A

Changed

N/A

Fixed:

fixed issue with missing noun phrase at end of sentence in extract_noun_phrases
fixed TensorFlow versioning issues with utils.metrics_from_model

0.22.2 (2020-10-09)

New:

added extract_noun_phrases to textutils

Changed

SimpleQA.ask now includes an include_np parameter. When True, noun phrases will be used to retrieve documents containing candidate answers.

Fixed:

N/A

0.22.1 (2020-10-08)

New:

N/A

Changed

added optional references argument to SimpleQA.index_from_list
added min_words argument to SimpleQA.index_from_list and SimpleQA.index_from_folder to prune small documents or paragraphs that are unlikely to include good answers
qa.display_answers now supports hyperlinks for document references

Fixed:

N/A

0.22.0 (2020-10-06)

New:

added breakup_docs argument to index_from_list and index_from_folder that potentially speeds up ask method substantially
added batch_size argument to ask and set default at 8 for faster answer-retrieval

Changed

refactored QA and SimpleQA for better extensibility

Fixed:

Ensure save_path is correctly processed in Learner.evaluate

0.21.4 (2020-09-24)

New:

N/A

Changed

Changed installation instructions in README.md to reflect that using ktrain with TensorFlow 2.1 will require downgrading transformers to 3.1.0.
updated requirements with keras_bert>=0.86.0 due to TensorFlow 2.3 error with older versions of keras_bert
In lr_find and lr_plot, check for TF 2.2 or 2.3 and make necessary adjustments due to TF bug 41174.

Fixed:

fixed typos in __all__ in text and graph modules (PR #250)
fixed Chinese language translation based on name changes of models with zh as source language

0.21.3 (2020-09-08)

New:

N/A

Changed

added TopicModel.get_word_weights method to retrieve the word weights for a given topic
added return_fig option to Learner.lr_plot and Learner.plot, which allows the matplotlib Figure to be returned to user

Fixed:

N/A

0.21.2 (2020-09-03)

New:

N/A

Changed

SUPPRESS_KTRAIN_WARNINGS environment variable changed to SUPPRESS_DEP_WARNINGS
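
Scripts that still set one of the older variable names can be bridged with a small shim run before ktrain is imported. This is a sketch only: migrate_warning_env is a hypothetical helper, and it simply copies the old value across without interpreting it.

```python
import os

# Older names, oldest first; SUPPRESS_TF_WARNINGS was renamed in 0.21.0.
_OLD_NAMES = ("SUPPRESS_TF_WARNINGS", "SUPPRESS_KTRAIN_WARNINGS")
_NEW_NAME = "SUPPRESS_DEP_WARNINGS"


def migrate_warning_env(env=os.environ):
    """Copy a value from an older variable name to SUPPRESS_DEP_WARNINGS if unset."""
    if _NEW_NAME in env:
        return env  # the current name always wins
    for old in reversed(_OLD_NAMES):  # prefer the more recent of the old names
        if old in env:
            env[_NEW_NAME] = env[old]
            break
    return env
```

Passing a plain dict instead of os.environ makes the helper easy to try out without touching the real environment.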

Fixed:

N/A

0.21.1 (2020-09-03)

New:

N/A

Changed

added num_beams and early_stopping arguments to translate methods in translation module that can be set to improve translation speed
added half parameter to Translator constructor

Fixed:

N/A

0.21.0 (2020-09-03)

New:

Added translate_sentences method to Translator class that translates list of sentences, where list is fed to model as single batch

Changed

Removed TensorFlow dependency from setup.py to allow users to use ktrain with any version of TensorFlow 2 they choose.
Added truncation=True to tokenization in summarization module
Require transformers>=3.1.0 due to breaking changes
SUPPRESS_TF_WARNINGS environment variable changed to SUPPRESS_KTRAIN_WARNINGS

Fixed:

Use prepare_seq2seq_batch instead of prepare_translation_batch in translation module due to breaking change in transformers==3.1.0

0.20.2 (2020-08-27)

New:

N/A

Changed

N/A

Fixed:

Always use *Auto* classes to load transformers models to prevent loading errors

0.20.1 (2020-08-25)

New:

N/A

Changed

N/A

Fixed:

Added missing torch.no_grad() scope in text.translation and text.summarization modules

0.20.0 (2020-08-24)

New:

added nli_template parameter to ZeroShotClassifier.predict to allow versatility in the kinds of labels that can be predicted
efficiency improvements to ZeroShotClassifier.predict that allow faster predictions on large sequences of documents and a large number of labels to predict
added multilabel parameter to ZeroShotClassifier.predict
added labels parameter to ZeroShotClassifier.predict, an alias to topic_strings parameter

Changed

N/A

Fixed:

Allow variations on accuracy metric such as binary_accuracy when inspecting model in is_classifier

0.19.9 (2020-08-17)

New:

N/A

Changed

N/A

Fixed:

In texts_from_array, check class_names only after preprocessing before printing classification vs. regression status.

0.19.8 (2020-08-17)

New:

N/A

Changed

N/A

Fixed:

In TextPreprocessor instances, correctly reset class_names when targets are in string format.

0.19.7 (2020-08-16)

New:

N/A

Changed

added class_weight parameter to lr_find for imbalanced datasets
removed pins for cchardet and scikitlearn from setup.py
added version check for eli5 fork
removed scipy pin from setup.py
Allow TensorFlow 2.3 for Python 3.8
Request manual installation of shap in TabularPredictor.explain instead of inclusion in setup.py

Fixed:

N/A

0.19.6 (2020-08-12)

New:

N/A

Changed

N/A

Fixed:

include metrics check in is_classifier function to support models with non-standard loss functions

0.19.5 (2020-08-11)

New:

N/A

Changed

N/A

Fixed:

Ensure transition to YTransform is backwards compatible for StandardTextPreprocessor and BertPreprocessor

0.19.4 (2020-08-10)

New:

N/A

Changed

TextPreprocessor instances now use YTransform class to transform targets
texts_from_df, texts_from_csv, and texts_from_array employ the use of either YTransformDataFrame or YTransform
images_from_df, images_from_fname, images_from_csv, and images_from_array use YTransformDataFrame or YTransform
Extra imports removed from PyTorch-based zsl.core.ZeroShotClassifier and summarization.core.TransformerSummarizer. If necessary, both can now be used without having TensorFlow installed by installing ktrain using --no-deps and importing these modules using a method like this.

Fixed:

N/A

0.19.3 (2020-08-05)

New:

N/A

Changed

NERPredictor.predict was changed to accept an optional custom_tokenizer argument

Fixed:

N/A

0.19.2 (2020-08-03)

New:

N/A

Changed

N/A

Fixed:

added missing num_classes argument to to_categorical
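
Why the num_classes argument matters: when it is omitted, keras.utils.to_categorical infers the width from the largest label present, so a batch that happens to lack the highest class yields a narrower matrix. The pure-Python stand-in below mimics that behavior for illustration; one_hot is not a Keras or ktrain function.

```python
def one_hot(labels, num_classes=None):
    """One-hot encode integer labels; infer the width from the data if not given."""
    if num_classes is None:
        num_classes = max(labels) + 1  # depends on which labels happen to be present
    return [[1 if i == label else 0 for i in range(num_classes)] for label in labels]


inferred = one_hot([0, 1])                 # class 2 absent from this batch
explicit = one_hot([0, 1], num_classes=3)  # width fixed to the true class count
```

Here the inferred rows have two columns while the explicit rows have three, which is the kind of shape mismatch the fix avoids.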

0.19.1 (2020-07-29)

New:

N/A

Changed

Adjusted no_grad scope in ZeroShotClassifier.predict

Fixed:

N/A

0.19.0 (2020-07-29)

New:

support for tabular data including explainable AI for tabular predictions
learner.validate and learner.evaluate now support regression models
added restore_weights_only flag to lr_find. When True, only the model weights will be restored after simulating training, not the optimizer weights. In at least a few observed cases, this "warm up" seems to improve performance when actual training begins. Further investigation is needed, so it is False by default.

Changed

N/A

Fixed:

added save_path argument to Learner.validate and Learner.evaluate. If print_report=False, classification report will be saved as CSV to save_path.
Use torch.no_grad with ZeroShotClassifier.predict to prevent OOM
Added max_length parameter to ZeroShotClassifier.predict to prevent errors on long documents
Added type check to TransformersPreprocessor.preprocess_train

0.18.5 (2020-07-20)

New:

N/A

Changed

N/A

Fixed:

Changed qa module to use 'Auto' classes when loading QuestionAnswering models and tokenizer
try from_pt=True for qa module if initial model-loading fails
use get_hf_model_name in qa module

0.18.4 (2020-07-17)

New:

N/A

Changed

N/A

Fixed:

return gracefully if no documents match question in qa module
tokenize question in qa module to ensure all candidate documents are returned
Added error in text.preprocessor when training set has incomplete integer labels

0.18.3 (2020-07-12)

New:

added batch_size argument to ZeroShotClassifier.predict that can be increased to speed up predictions. This is especially useful if len(topic_strings) is large.

Changed

N/A

Fixed:

fixed typo in load_predictor error message

0.18.2 (2020-07-08)

New:

N/A

Changed

updated doc comments in core module
removed unused nosave parameter from reset_weights
added warning about obsolete show_wd parameter in print_layers method
pin to scipy==1.4.1 due to TensorFlow requirement

Fixed:

N/A

0.18.1 (2020-07-07)

New:

N/A

Changed

Use tensorflow==2.1.0 if Python 3.6/3.7 and use tensorflow==2.2.0 only if on Python 3.8 due to TensorFlow v2.2.0 issues
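
The version rule above could be expressed in a setup script roughly as follows. This is a sketch, not the project's actual setup.py; tensorflow_requirement is a hypothetical helper, and raising for other Python versions is an assumption made here.

```python
import sys


def tensorflow_requirement(version_info=sys.version_info):
    """Return the TensorFlow pin described in this release for a Python version."""
    major_minor = tuple(version_info[:2])
    if major_minor == (3, 8):
        return "tensorflow==2.2.0"
    if major_minor in ((3, 6), (3, 7)):
        return "tensorflow==2.1.0"
    # Assumption: other Python versions are outside the scope of this rule.
    raise ValueError(f"no TensorFlow pin defined for Python {major_minor}")
```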

Fixed:

N/A

0.18.0 (2020-07-07)

New:

N/A

Changed

Fixes to address changes or issues in TensorFlow 2.2.0:
- created metrics_from_model function due to changes in the way metrics are extracted from compiled model
- use loss_fn_from_model function due to changes in the way loss functions are extracted from compiled model
- added **kwargs to AdamWeightDecay based on this issue
- changed TransformerTextClassLearner.predict and TextPredictor.predict to deal with tuples being returned by predict in TensorFlow 2.2.0
- changed multilabel test to use loss instead of accuracy due to TF 2.2.0 issue
- changed Learner.lr_find to use save_model and load_model to restore weights due to this TF issue and added TransformersPreprocessor.load_model_and_configure_from_data to support this

Fixed:

N/A

0.17.5 (2020-07-02)

New:

N/A

Changed

N/A

Fixed:

Explicitly supply truncate='longest_first' to prevent sentence pair classification from breaking in transformers==3.0.0
Fixed typo in encode_plus invocation
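
For context, the longest_first strategy in transformers shortens a sentence pair by removing one token at a time from whichever sequence is currently longer. A simplified pure-Python sketch of that idea follows; it ignores special tokens, and truncate_longest_first is a hypothetical name.

```python
def truncate_longest_first(tokens_a, tokens_b, max_length):
    """Trim a pair of token lists until their combined length fits max_length."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)  # do not mutate the inputs
    while len(tokens_a) + len(tokens_b) > max_length:
        # Drop the last token of the longer sequence; ties trim the second one.
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
    return tokens_a, tokens_b


a, b = truncate_longest_first(list("abcdefg"), list("xyz"), max_length=6)
```

In this example only the first sequence is shortened, since it stays the longer one until both have three tokens.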

0.17.4 (2020-07-02)

New:

N/A

Changed

N/A

Fixed:

Explicitly supply truncate='longest_first' to prevent sentence pair classification from breaking in transformers==3.0.0

0.17.3 (2020-06-26)

New:

N/A

Changed

N/A

Fixed:

Changed setup.py to open README file using encoding="utf-8" to prevent installation problems on Windows machines with cp1252 encoding
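
The pattern behind this fix is to pass the encoding explicitly rather than rely on the platform default, which is cp1252 on many Windows machines. A minimal sketch, using a temporary file in place of the real README; read_long_description is a name chosen here for illustration.

```python
import os
import tempfile


def read_long_description(path):
    """Read a README as UTF-8 regardless of the platform's default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demonstrate with a temporary file that contains non-ASCII characters.
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write("ktrain “quotes” and naïve text\n")
    long_description = read_long_description(readme)
```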

0.17.2 (2020-06-25)

New:

Added support for Russian in text.EnglishTranslator

Changed

N/A

Fixed:

N/A

0.17.1 (2020-06-24)

New:

N/A

Changed

N/A

Fixed:

Properly set device in text.Translator and use cuda when available

0.17.0 (2020-06-24)

New:

support for language translation using pretrained MarianMT models
added core.evaluate as alias to core.validate
Learner.estimate_lr method will return numerical estimates of learning rate using two different methods. Should only be called after running Learner.lr_find.

Changed

text.zsl.ZeroShotClassifier changed to use AutoModel* and AutoTokenizer in order to load any mnli model
remove external modules from ktrain.__init__.py so that they do not appear when pressing TAB in notebook
added Transformer.save_tokenizer and Transformer.get_tokenizer methods to facilitate training on machines with no internet

Fixed:

explicitly call plt.show() in LRFinder.plot_loss to resolve issues with plot not displaying in certain cases (PR #170)
suppress warning about text regression when making text regression predictions
allow xnli models for zsl module

0.16.3 (2020-06-10)

New:

added metrics parameter to text.text_classifier and text.text_regression_model functions
added metrics parameter to Transformer.get_classifier and Transformer.get_regression_model methods

Changed

metric parameter in vision.image_classifier and vision.image_regression_model functions changed to metrics

Fixed:

N/A

0.16.2 (2020-06-07)

New:

N/A

Changed

default model for summarization changed to facebook/bart-large-cnn due to breaking change in v2.11
added device argument to TransformerSummarizer constructor to control PyTorch device

Fixed:

require transformers>=2.11.0 due to breaking changes in v2.11 related to BART models

0.16.1 (2020-06-05)

New:

N/A

Changed

N/A

Fixed:

prevent transformer tokenizers from being pickled during predictor.save, as it causes problems for some community-uploaded models like bert-base-japanese-whole-word-masking.

0.16.0 (2020-06-03)

New:

support for Zero-Shot Topic Classification via the text.ZeroShotClassifier.

Changed

N/A

Fixed:

N/A

0.15.4 (2020-06-03)

New:

N/A

Changed

N/A

Fixed:

Added the procs, limitmb, and multisegment arguments to the index_from_list and index_from_folder methods in text.SimpleQA to speed up indexing when necessary. Supplying multisegment=True speeds things up significantly, for example. Defaults, however, are the same as before. Users must explicitly change values if desiring a speedup.
Load xlm-roberta* as jplu/tf-xlm-roberta* to bypass error from transformers

0.15.3 (2020-05-28)

New:

N/A

Changed

[breaking change] The multilabel argument in the text.Transformer constructor was moved to Transformer.get_classifier and now correctly allows users to forcibly configure the model for a multilabel task regardless of what the data suggests. However, it is recommended to leave this value as None.
The methods predictor.save, ktrain.load_predictor, learner.save_model, learner.load_model all now accept a path to folder where all files (e.g., model file, .preproc file) will be saved. If path does not exist, it will be created. This should not be a breaking change as the load* methods will still look for files in the old location if model or predictor was saved using an older version of ktrain.

Fixed:

N/A

0.15.2 (2020-05-15)

New:

N/A

Changed

Added n_samples argument to TextPredictor.explain to address slowness of explain on Google Colab
Lock to version 0.21.3 of scikit-learn to ensure old-style explanations are generated from TextPredictor.explain

Fixed:

added missing import pickle to ensure saved topic models can be loaded

0.15.1 (2020-05-14)

New:

N/A

Changed

Changed Transformer.preprocess* methods to accept sentence pairs for sentence pair classification

Fixed:

N/A

0.15.0 (2020-05-13)

New:

Out-of-the-box support for image regression
vision.images_from_df function to load image data from pandas DataFrames

Changed

references to fit_generator and predict_generator converted to fit and predict

Fixed:

Resolved issue with multilabel detection returning False for valid multilabel problems when data is in form of generator

0.14.7 (2020-05-10)

New:

Added TFDataset class for use as wrapper around arbitrary tf.data.Dataset objects for use in ktrain

Changed

Added NERPreprocessor.preprocess_train_from_conll2003
Removed extraneous imports from text.__init__.py and vision.__init__.py
classes argument in images_from_array changed to class_names

Fixed:

ensure NER data is properly prepared in text.ner.learner.validate
fixed typo with df reference in images_from_fname

0.14.6 (2020-05-06)

New:

If no validation data is supplied to images_from_array, training data is split to generate validation data

Changed

issue warning if Learner cannot save original weights
images_from_array accepts labels in the form of integer class IDs

Fixed:

fix pandas SettingWithCopyWarning from images_from_csv
fixed issue with return_proba=True including class labels for multilabel image classification
resolved issue with class labels not being set correctly in images_from_array
lock to cchardet==2.1.5 due to this issue
fixed y_from_data from NumpyArrayIterators in image classification

0.14.5 (2020-05-03)

New:

N/A

Changed

N/A

Fixed:

fixed issue with MobileNet model due to typo and added MobileNet example notebook

0.14.4 (2020-04-30)

New:

N/A

Changed

added merge_tokens and return_proba options to NERPredictor.predict

Fixed:

N/A

0.14.3 (2020-04-27)

New:

N/A

Changed

added textutils to text namespace and added note about sent_tokenize to sequence-tagging tutorial

Fixed:

cast dependent variable to tf.float32 instead of tf.int64 for text regression problems using transformers library

0.14.2 (2020-04-21)

New:

N/A

Changed

added suggest option to core.Learner.lr_plot

Fixed:

set interactive mode for matplotlib so plots show automatically from Python console and PyCharm
run prepare for NER sequence predictor to avoid matrix mismatch

0.14.1 (2020-04-17)

New:

N/A

Changed

N/A

Fixed:

ensure text.eda.TopicModel.visualize_documents works with bokeh v2.0.x

0.14.0 (2020-04-15)

New:

support for building Question-Answering systems
textutils now contains paragraph_tokenize function

Changed

N/A

Fixed:

resolved import issue with textutils.sent_tokenize

0.13.2 (2020-04-09)

New:

N/A

Changed

TransformerSummarizer accepts BART model_name as parameter

Fixed:

N/A

0.13.1 including 0.13.0 (2020-04-09)

New:

support for link prediction with graph neural networks
text summarization with pretrained BART (included in 0.13.1 but not in 0.13.0)
bigru method now selects pretrained word vectors based on detected language

Changed

instead of throwing error, default to English if detect_lang could not detect language from batch of texts
layers argument moved to TransformerEmbedding constructor
enforce specific version of TensorFlow due to undocumented breaking changes in newer TF versions
AdamWeightDecay optimizer is now used to support global weight decay. Used when user explicitly sets a weight decay

Fixed:

force re-instantiation of TransformerEmbedding object when sequence_tagger function is re-invoked

0.12.3 (2020-04-02)

New:

Added max_momentum and min_momentum parameters to autofit and fit_onecycle to control cyclical momentum

Changed

Prevent loading errors of previously saved NERPreprocessor objects

Fixed:

N/A

0.12.2 (2020-04-01)

New:

N/A

Changed

Require that at least TensorFlow 2.1.0 is installed in setup.py due to TF 2.0.0 bug with lr_find
Added lower bounds to scikit-learn and networkx versions

Fixed:

N/A

0.12.1 (2020-04-01)

New:

N/A

Changed

N/A

Fixed:

check and ensure AllenNLP is installed when Elmo embeddings are selected for NER

0.12.0 (2020-03-31)

New:

BERT and Elmo embeddings for NER and other downstream tasks

Changed

wv_path_or_url parameter moved from entities_from* to sequence_taggers
Added use_char parameter and ensured it is not used unless DISABLE_V2_BEHAVIOR is enabled
batch_size argument added to get_predictor and load_predictor
eval_batch_size argument added to get_learner
added val_pct argument to entities_from_array

Fixed:

properly set threshold in text.eda (PR #99)
fixed error when no validation data is supplied to entities_from_array

0.11.3 (2020-03-18)

New:

N/A

Changed:

N/A

Fixed:

prevent errors with reading word vector files on Windows by specifying encoding='utf-8'

0.11.2 (2020-03-18)

New:

N/A

Changed:

N/A

Fixed:

ktrain.text.eda.visualize_documents now properly processes filepath argument

0.11.1 (2020-03-18)

New:

entities_from_txt, entities_from_gmb, and entities_from_conll2003 functions now discover the encoding of the file automatically when encoding=None (which is the default now)

Changed:

N/A

Fixed:

N/A

0.11.0 (2020-03-18)

New:

sequence-tagging (e.g., NER) now supports ELMo embeddings with use_elmo=True argument to data-loading functions like entities_from_array and entities_from_txt
pretrained word embeddings (i.e., fasttext word2vec embeddings) can be specified by providing the URL to a .vec.gz file. The URL (or path) is supplied as wv_path_or_url argument to data-loading functions like entities_from_array and entities_from_txt
show_random_images: show random images from folder in Jupyter notebook
NERPreprocessor now includes a preprocess_test method for easier evaluation of test sets in datasets that contain a training, validation, and test set

Changed:

ensure DISABLE_V2_BEHAVIOR=True when ImagePredictor.explain is invoked
added SUPPRESS_TF_WARNINGS environment variable. Default is '1'. If set to '0', TF warnings will be displayed.
merge_entities method of ktrain.text.shallownlp.ner.NER changed to merge_tokens
moved load_predictor to constructor in ktrain.text.shallownlp.ner.NER
ktrain.text.shallownlp.ner.NER now supports predictor_path argument

Fixed:

convert class_names to strings in core.validate to prevent error from scikit-learn
fixed error arising when no data augmentation scheme is provided to the images_from* functions
fixed bug in images_from_fname to ensure supplied pattern is used
added val_folder argument to images_from_fname
raise Exception when preproc is not found in load_predictor
check for existence of preproc in text_classifier and text_regression_model
fixed text.eda so that detect_lang is called correctly after being moved to textutils

0.10.1 (2020-03-04)

New:

N/A

Changed:

shallownlp.Classifier.texts_from_folder changed to shallownlp.Classifier.load_texts_from_folder
shallownlp.Classifier.texts_from_csv changed to shallownlp.Classifier.load_texts_from_csv
In text.preprocessor, added warning that class_names is being ignored when class_names were supplied and y_train and y_test contain string labels

Fixed:

N/A

0.10.0 (2020-03-03)

New:

Transformer API in ktrain now supports using community-uploaded transformer models
added shallownlp module with out-of-the-box NER for English, Russian, and Chinese
text.eda module now supports NMF in addition to LDA

Changed:

texts_from_csv and texts_from_df now accept a single column of labels in string format and will 1-hot-encode labels automatically for classification or multi-class classification problems.
reorganized language-handling to text.textutils
more suppression of warnings due to spurious warnings from TF2 causing confusion in output
classes argument to Transformer constructor has been changed to class_names for consistency with texts_from_array

Fixed:

N/A
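
The automatic 1-hot-encoding of string labels mentioned under Changed for this release can be pictured with the minimal sketch below. It is only an illustration of the idea, not ktrain's code: the helper name one_hot_encode and the choice to order classes alphabetically are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map string labels to 1-hot rows; classes are ordered alphabetically (an assumption)."""
    class_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(class_names)}
    encoded = [
        [1 if index[label] == i else 0 for i in range(len(class_names))]
        for label in labels
    ]
    return class_names, encoded


# A single column of string labels, as it might appear in a CSV file or DataFrame.
class_names, y = one_hot_encode(["pos", "neg", "neutral", "pos"])
print(class_names)
print(y)
```

Each row of y contains exactly one 1, located at the position of that example's class in class_names.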

0.9.4 (2020-02-13)

New:

N/A

Changed:

changed pandas dependency to >=1.0.1 due to bug in pandas 1.0

Fixed:

N/A

0.9.3 (2020-02-11)

New:

N/A

Changed:

Transformed data containers for transformers, NER, and graph node classification to be instances of ktrain.data.Dataset.

Fixed:

fixed images_from_array so that y labels are correctly 1-hot-encoded when necessary
correct tokenization for bert-base-japanese Transformer models from PR 57

0.9.2 (2020-02-04)

New:

N/A

Changed:

Removed Exception when distilbert is selected in text_classifier for non-English language after Hugging Face fixed the reported bug.

Fixed:

XLNet models like xlnet-base-cased now work after casting input arrays to int32
modified TextPredictor.explain to propagate correct error message from eli5 for multilabel text classification.

0.9.1 (2020-02-01)

New:

N/A

Changed:

N/A

Fixed:

fixed utils.nclasses_from_data for ktrain.Dataset instances
prevent detect_lang failing when Pandas Series is supplied

0.9.0 (2020-01-31)

New:

support for out-of-the-box text regression in both the Transformer API and conventional API (i.e., text.text_regression_model).

Changed:

text.TextPreprocessor prints sequence length statistics

Fixed:

auto-detect language when using Transformer class to prevent printing en as default

0.8.3 (2020-01-22)

New:

N/A

Changed:

MultiArrayDataset accepts list of Numpy arrays

Fixed:

fixed incorrect activation in TextPredictor for multi-label Transformer models
fixed top_losses for regression tasks

0.8.2 (2020-01-19)

New:

initial base ktrain.Dataset class for use as a Sequence wrapper to better support custom datasets/models

Changed:

N/A

Fixed:

N/A

0.8.1 (2020-01-15)

New:

N/A

Changed:

N/A

Fixed:

fix to support multilabel text classification in Transformers
_prepare_dataset no longer breaks when validation dataset has not been supplied

0.8.0 (2020-01-14)

New:

availability of a new, simplified interface to Hugging Face transformer models
added 'distilbert' as an available model in text.text_classifier function

Changed:

preproc argument is required for text.text_classifier

Fixed:

core._load_model calls _make_predict_function before returning model
added warning when non-adam optimizer is used with cycle_momentum=True

0.7.3 (2019-12-31)

New:

N/A

Changed:

N/A

Fixed:

Fixed error when using ktrain with v0.2.x of fastprogress. ktrain can now be used with both v0.1.x and v0.2.x of fastprogress

0.7.2 (2019-12-11)

New:

All data-loading functions (e.g., texts_from_csv) accept a random_state argument that will enable consistent reproduction of the train-test split.

Changed:

perform local checks for stellargraph where needed.
removed stellargraph as dependency due to issues with it overwriting tensorflow-gpu
change setup.py to skip navigation links for pypi page

Fixed:

N/A

0.7.1 (2019-12-11)

New:

All data-loading functions (e.g., texts_from_csv) accept a random_state argument that will enable consistent reproduction of the train-test split.

Changed:

perform local checks for stellargraph where needed.
removed stellargraph as dependency due to issues with it overwriting tensorflow-gpu

Fixed:

N/A
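
The effect of the random_state argument can be illustrated with a small stand-alone sketch. This is not how ktrain splits data internally: the helper split_train_val and its default values are hypothetical. It only shows why fixing a seed makes the train-test split repeatable.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(items: Sequence[str], val_pct: float = 0.1,
                    random_state: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle with a private, seeded generator and hold out val_pct of the items."""
    rng = random.Random(random_state)  # leaves the global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_pct))
    return shuffled[n_val:], shuffled[:n_val]


docs = [f"doc{i}" for i in range(20)]
train_a, val_a = split_train_val(docs, val_pct=0.2, random_state=7)
train_b, val_b = split_train_val(docs, val_pct=0.2, random_state=7)
print(val_a == val_b)  # same seed, same held-out documents
```

Calling the helper twice with the same seed returns identical splits, which is the property the random_state argument is meant to provide.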

0.7.0 (2019-12-10)

New:

ktrain now uses tf.keras (tensorflow>=1.14,<=2.0) instead of stand-alone Keras.

Changed:

N/A

Fixed:

N/A

0.6.2 (2019-12-02)

New:

N/A

Changed:

Fixed:

added encoding argument when reading in word vectors to bypass error on Windows systems (PR #31)
Change preprocessing defaults and apply special preprocessing in text.eda.get_topic_model when non-English is detected.

0.6.1 (2019-11-16)

New:

N/A

Changed:

N/A

Fixed:

TextPredictor.explain now correctly supports non-English languages.
Parameter activation is no longer ignored in _build_bert function

0.6.0 (2019-11-12)

New:

support for learning from unlabeled or partially-labeled text data
- unsupervised topic modeling with LDA
- one-class text classification to score documents based on similarity to a set of positive examples
- document recommendation engine

Changed:

N/A

Fixed:

Removed dangling reference to external 'stellargraph' dependency from _load_model, so that we rely solely on local version of stellargraph

0.5.2 (2019-10-20)

New:

N/A

Changed:

N/A

Fixed:

Removed dangling reference to external 'stellargraph' dependency so that we rely solely on local version of stellargraph

0.5.1 (2019-10-17)

New:

N/A

Changed:

N/A

Fixed:

store a local version of stellargraph to prevent it from installing tensorflow-cpu and overriding existing tensorflow-gpu installation

0.5.0 (2019-10-16)

New:

Support for node classification in graphs with ktrain.graph module

Changed:

N/A

Fixed:

N/A

0.4.3 (2019-10-14)

New:

N/A

Changed:

N/A

Fixed:

Call reset before predict_generator for consistent ordering of view_top_losses results
Fixed incorrect reference to train_df instead of val_df in texts_from_df

0.4.2 (2019-10-01)

New:

All fit methods in ktrain now accept class_weight parameter to handle imbalanced datasets.

Changed:

N/A

Fixed:

Resolved problem with text_classifier incorrectly using uncased_L-12_H-768_A-12 to build BERT model instead of multi_cased_L-12_H-768_A-12 when non-English language was detected.
Fixed error messages related to preproc requirement in text_classifier
Fixed test script for multilingual text classification
Fixed rendering of Chinese in view_top_losses
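
For the class_weight parameter added in this release, a common way to build the weights is the "balanced" heuristic, n_samples / (n_classes * class_count). The sketch below computes such a dictionary in plain Python. The helper name balanced_class_weights is hypothetical, and the heuristic is one reasonable choice rather than something ktrain prescribes. The result maps class indices to weights, which is the dictionary format Keras accepts for class_weight.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# An imbalanced toy dataset: 90 examples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class receives the larger weight, so its examples contribute more to the loss during training.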

0.4.1 (2019-10-01)

New:

N/A

Changed:

N/A

Fixed:

Fix problem with text_classifier incorrectly using uncased_L-12_H-768_A-12 to build BERT model instead of multi_cased_L-12_H-768_A-12 when non-English language was detected.

0.4.0 (2019-09-30)

New:

Added multilingual support for text classification.
Added experimental support for tf.keras. By default, ktrain will use standalone Keras. If os.environ['TF_KERAS'] is set, ktrain will attempt to use tf.keras. Some capabilities (e.g., predictor.explain for images) are not yet supported for tf.keras

Changed:

When BERT is selected, check to make sure dataset is correctly preprocessed for BERT

Fixed:

Fixed utils.bert_data_type and ensured it does more checks to validate BERT-style data

0.3.1 (2019-09-19)

New:

N/A

Changed:

globally import tensorflow
suppress tensorflow deprecation warnings from TF 1.14.0

Fixed:

Resolved issue with text_classifier failing when BERT is selected and Preprocessor is supplied.

0.3.0 (2019-09-17)

New:

Support for sequence tagging with Bidirectional LSTM-CRF. Word embeddings can currently be either random or word2vec(cbow). If the latter is chosen, word vectors will be downloaded automatically from Facebook fasttext site.
Added ktrain.text.texts_from_df function

Changed:

Added FutureWarning in text.text_classifier, that preproc will be required argument in future.
In text.text_classifier, when preproc=None, use the maximum feature ID to populate max_features.

Fixed:

Fixed construction of custom_objects dictionary for BERT to ensure load_model works for custom BERT models
Resolved issue with pretrained bigru models failing when max_features >= total word count.

0.2.5 (2019-08-27)

New:

explain methods have been added to TextPredictor and ImagePredictor objects.
TextPredictor.predict_proba and ImagePredictor.predict_proba_* convenience methods have been added.
Added utils.is_classifier utility function

Changed:

TextPredictor.predict method can now accept a single document as input instead of always requiring a list.
Output of core.view_top_losses now includes the ground truth label of examples

Fixed:

Fixed test of data loading

0.2.4 (2019-08-20)

New:

added additional tests of ktrain

Changed:

Added classes argument to vision.images_from_folder. Only classes/subfolders matching a name in the classes list will be considered.

Fixed:

Resolved issue with using learner.view_top_losses with BERT models.

0.2.3 (2019-08-18)

New:

N/A

Changed:

Added classes argument to vision.images_from_folder. Only classes/subfolders matching a name in the classes list will be considered.

Fixed:

Fixed issue with learner.validate and learner.predict failing when validation data is in the form of an Iterator (e.g., DirectoryIterator).

0.2.2 (2019-08-16)

New:

N/A

Changed:

Added check in ktrain.lroptimize.lrfinder to stop training if learning rate exceeds a fixed maximum, which may happen when bad/dysfunctional model is supplied to learning rate finder.

Fixed:

In ktrain.text.data.texts_from_folder function, only subfolders specified in classes argument are read in as training and validation data.

0.2.1 (2019-08-15)

New:

N/A

Changed:

N/A

Fixed:

Fixed error related to validation_steps=None in call to fit_generator in ktrain.core on Google Colab.

0.2.0 (2019-08-12)

New:

Support for pretrained BERT Text Classification

Changed:

For Learner.lr_find, added optional max_epochs argument.
Changed Learner.confusion_matrix to Learner.validate and added optional val_data argument. The use_valid argument has been removed.
Removed pretrained_fpath argument to text.text_classifier. Pretrained word vectors are now downloaded automatically when 'bigru' is selected as model.

Fixed:

Further cleanup of utils.is_iter function to use type check.

0.1.10 (2019-08-02)

New:

N/A

Changed:

For Learner.lr_find, removed epochs and max_lr arguments and added lr_mult argument. The default lr_mult is 1.01, but it can be changed to control the size of the sample used to estimate the learning rate.
Changed structure of examples folder

Fixed:

Resolved issue with utils.y_from_data not working correctly with DataFrameIterator objects.
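
To see why lr_mult controls the size of the sample used by the learning rate finder, the sketch below counts how many batches an exponentially growing sweep needs before it reaches a stopping value. It is a rough illustration under stated assumptions, not ktrain's implementation: the function name batches_for_sweep and the start and maximum learning rates are hypothetical, and only the default multiplier of 1.01 comes from the entry above.

```python
import math


def batches_for_sweep(start_lr: float, max_lr: float, lr_mult: float = 1.01) -> int:
    """Smallest n such that start_lr * lr_mult**n >= max_lr."""
    if not (0 < start_lr < max_lr) or lr_mult <= 1.0:
        raise ValueError("need 0 < start_lr < max_lr and lr_mult > 1")
    n = math.ceil(math.log(max_lr / start_lr) / math.log(lr_mult))
    # Guard against floating-point rounding in the logarithms.
    while start_lr * lr_mult ** n < max_lr:
        n += 1
    return n


# Hypothetical sweep bounds, chosen only for illustration.
small_mult = batches_for_sweep(1e-7, 10.0, lr_mult=1.01)
large_mult = batches_for_sweep(1e-7, 10.0, lr_mult=1.05)
print(small_mult, large_mult)
```

A multiplier closer to 1 gives a longer sweep that visits more batches, while a larger multiplier finishes sooner with a coarser estimate.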

0.1.9 (2019-08-01)

New:

N/A

Changed:

Use class check in utils.is_iter as temporary fix
revert to epochs=5 for Learner.lr_find

Fixed:

N/A

0.1.8 (2019-06-04)

New:

N/A

Changed:

N/A

Fixed:

Learner.set_weight_decay now works correctly

0.1.7 (2019-05-24)

New:

BIGRU text classifier: Bidirectional GRU using pretrained word embeddings

Changed:

Epochs are calculated automatically in LRFinder

Fixed:

Number of epochs that Learner.lr_find runs can be explicitly set again

0.1.6 (2019-05-03)

New:

Changed:

relocated calls to tensorflow
installation instructions and reformatted examples

Fixed:

0.1.5 (2019-05-01)

New:

cycle_momentum argument for both autofit and fit_onecycle methods that will cycle momentum between 0.95 and 0.85 as described in this paper
Learner.plot method that will plot training-validation loss, LR schedule, or momentum schedule
added set_weight_decay and get_weight_decay methods to get/set "global" weight decay in Keras

Changed:

vision.data.preview_data_aug now displays images in rows by default
added multigpu flag to core.get_learner with comment that it is only supported by vision.model.image_classifier
added he_normal initialization to FastText model

Fixed:

Bug in vision.data.images_from_fname that prevented relative paths for directory argument
Bug in utils.y_from_data that returned incorrect information for array-based training/validation data
Bug in core.autofit with callback failure when validation data is not set
Bug in core.autofit and core.fit_onecycle with learning rate setting at end of cycle
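
The cyclical momentum introduced in this release can be sketched as a simple schedule that moves between 0.95 and 0.85. The function below is a hedged illustration, not ktrain's code: it assumes a triangular cycle that starts at the maximum, falls linearly to the minimum at the midpoint, and rises back by the final step. The function name cyclical_momentum is hypothetical.

```python
def cyclical_momentum(step: int, total_steps: int,
                      max_momentum: float = 0.95,
                      min_momentum: float = 0.85) -> float:
    """Triangular schedule: max -> min over the first half, min -> max over the second."""
    if total_steps < 2 or not 0 <= step < total_steps:
        raise ValueError("need total_steps >= 2 and 0 <= step < total_steps")
    half = (total_steps - 1) / 2
    distance_from_mid = abs(step - half) / half  # 1.0 at both ends, 0.0 at the midpoint
    return min_momentum + (max_momentum - min_momentum) * distance_from_mid


schedule = [cyclical_momentum(s, 11) for s in range(11)]
print([round(m, 3) for m in schedule])
```

Under this assumption, momentum is lowest in the middle of training and highest at the start and end.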

0.1.4 (2019-04-10)

Last release without CHANGELOG updates
