Merge branch 'develop'

amaiya committed Dec 2, 2020
2 parents b65dbdd + 1ee379a commit 5c9c6b3
Showing 18 changed files with 270 additions and 43 deletions.
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,23 @@ Most recent releases are shown at the top. Each release shows:
- **Changed**: Additional parameters, changes to inputs or outputs, etc
- **Fixed**: Bug fixes that don't change documented behaviour

## 0.25.1 (2020-12-02)

### New:
- N/A

### Changed:
- Added `use_dynamic_shape` parameter to `text.preprocessor.hf_convert_examples`, which is set to `True` when running predictions. This reduces the input length when making predictions, if possible.
- Added warnings to some imports in `imports.py` to allow for slightly lighter-weight deployments
- Temporarily pinning to `transformers>=3.1,<4.0` due to breaking changes in v4.0.

### Fixed:
- Suppress progress bar in `predictor.predict` for `keras_bert` models
- Fixed typo causing problems when loading predictor for Inception models
- Fixes to address documented/undocumented breaking changes in `transformers>=4.0`, with a temporary pin to `transformers>=3.1,<4.0` retained for backwards compatibility.



## 0.25.0 (2020-11-08)

### New:
89 changes: 87 additions & 2 deletions FAQ.md
@@ -59,6 +59,7 @@

- [Running `predictor.explain` for text classification is slow. How can I speed it up?](#running-predictorexplain-for-text-classification-is-slow--how-can-i-speed-it-up)

- [How do I make quantized predictions with `transformers` models?](#how-do-i-make-quantized-predictions-with-transformers-models)


---
@@ -129,7 +130,7 @@ model = ktrain.load_predictor('/tmp/my_predictor').model
learner = ktrain.get_learner(model, train_data=trn, val_data=val)
learner.fit_onecycle(2e-5, 1)
```
-Note that `preproc` here is a *Preprocessor* instance. If using a data-loading function like `texts_from_csv` or `images_from_folder`, it will be the third return value from the function. Or, if using the [Transformer API](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb) for text classification, it will be the output of invoking `text.Transformer` (i.e., `preproc = text.Transformer('bert-base-uncased', ...)`).
+Note that `preproc` here is a *Preprocessor* instance. If using a data-loading function like `texts_from_csv` or `images_from_folder`, it will be the third return value from the function. Or, if using the [Transformer API](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb) for text classification, it will be the output of invoking `text.Transformer` (i.e., `preproc = text.Transformer('bert-base-uncased', ...)`). Also, `trn` and `val` are typically the result of invoking `preproc.preprocess_train` and `preproc.preprocess_test`, respectively.
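For reference, here is a minimal sketch of how these objects are typically produced with the Transformer API (the toy data below is hypothetical):

```python
from ktrain import text

# hypothetical toy data
x_train, y_train = ['good movie', 'bad movie'], ['pos', 'neg']
x_test, y_test = ['not bad'], ['pos']

preproc = text.Transformer('bert-base-uncased', maxlen=128)
trn = preproc.preprocess_train(x_train, y_train)  # preprocessed training set
val = preproc.preprocess_test(x_test, y_test)     # preprocessed validation set
# `model` then comes from ktrain.load_predictor(...).model, as shown above
```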


#### Method 2: Using `transformers` library (if training Hugging Face Transformers model)
@@ -279,7 +280,7 @@ predictor.predict_filename('C:/temp/cats_and_dogs_filtered/validation/cats/cat.2
### How do I use ktrain without an internet connection?

When using pretrained models or pretrained word embeddings in *ktrain*, files are automatically downloaded. For instance,
-pretrained models and vocabulary files from the `transformers` library are downloaded to `<home_directory>/.cache/torch/transformers`
+pretrained models and vocabulary files from the `transformers` library are downloaded to `<home_directory>/.cache/huggingface/transformers` (or `<home_directory>/.cache/torch/transformers` in older versions)
by default. Other data like pretrained word vectors are downloaded to the `<home_directory>/ktrain_data` folder.
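If these defaults are inconvenient (e.g., when pre-seeding a cache for an air-gapped machine), one option is to relocate the download cache. This is a sketch assuming the standard `TRANSFORMERS_CACHE` environment variable honored by the `transformers` library; the path is hypothetical:

```python
# Sketch: point the transformers download cache at a pre-populated directory.
# TRANSFORMERS_CACHE must be set before transformers (or ktrain) is imported.
import os
os.environ['TRANSFORMERS_CACHE'] = '/opt/models/hf_cache'  # hypothetical path

import ktrain  # imports transformers internally, picking up the cache location
```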

In some settings, it is necessary to either train models or make predictions in environments with no internet
@@ -775,6 +776,90 @@ A number of models in **ktrain** can be used out-of-the-box on a CPU-based laptop
[[Back to Top](#frequently-asked-questions-about-ktrain)]


### How do I make quantized predictions with `transformers` models?

Quantization can improve the efficiency of neural network computations by reducing the size of the weights. For instance, when making predictions, representing weights with 8-bit integers instead of 32-bit floats can speed up inference.

TensorFlow has built-in support for quantization. Unfortunately, as of this writing, it [only works for sequential and functional](https://github.com/tensorflow/tensorflow/issues/40699) `tf.keras` models, which means it cannot be used with Hugging Face `transformers` models.

As a workaround, you can convert your saved TensorFlow model to PyTorch, quantize, and make predictions directly in PyTorch.

This code example assumes you've trained a DistilBERT model with **ktrain**, saved a `Predictor` in a folder called `'/tmp/mypredictor'`, and need to make quantized predictions on CPU:
```python
# Quantization Using PyTorch

# load the predictor, model, and tokenizer
from transformers import *
import ktrain
import numpy as np
predictor = ktrain.load_predictor('/tmp/mypredictor')
model_pt = AutoModelForSequenceClassification.from_pretrained('/tmp/mypredictor', from_tf=True)
tokenizer = predictor.preproc.get_tokenizer() # or use AutoTokenizer.from_pretrained(predictor.preproc.model_name)
maxlen = predictor.preproc.maxlen
device = 'cpu'
class_names = predictor.preproc.get_classes()

# quantize model (INT8 quantization)
import torch
model_pt_quantized = torch.quantization.quantize_dynamic(
model_pt.to(device), {torch.nn.Linear}, dtype=torch.qint8)

# make quantized predictions (x_test is a list of strings representing documents)
preds = []
for doc in x_test:
model_inputs = tokenizer(doc, return_tensors="pt", max_length=maxlen, truncation=True)
model_inputs_on_device = { arg_name: tensor.to(device)
for arg_name, tensor in model_inputs.items()}
pred = model_pt_quantized(**model_inputs_on_device)
preds.append(class_names[ np.argmax( np.squeeze( pred[0].cpu().detach().numpy() ) ) ])

```

Note that the above example employs smaller inputs by eliminating padding in addition to using a quantized model. As discussed in [this blog post](https://blog.roblox.com/2020/05/scaled-bert-serve-1-billion-daily-requests-cpus/), both of these steps can speed up predictions in CPU deployment scenarios.
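To see why dropping the padding helps, compare the input shapes the tokenizer produces with and without `padding='max_length'`. This is a small sketch reusing `tokenizer` and `maxlen` from the example above:

```python
# Without padding, the input tensor shrinks to the actual document length,
# so the (quantized) model does proportionally less work per prediction.
short_doc = 'Great monitor.'
padded = tokenizer(short_doc, return_tensors='pt', max_length=maxlen,
                   padding='max_length', truncation=True)
dynamic = tokenizer(short_doc, return_tensors='pt', max_length=maxlen,
                    truncation=True)
print(padded['input_ids'].shape)   # always (1, maxlen), e.g., (1, 500)
print(dynamic['input_ids'].shape)  # only the real tokens, e.g., (1, 5)
```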

Alternatively, you might also consider quantizing your `transformers` model with the [convert_graph_to_onnx.py](https://github.com/huggingface/transformers/blob/master/src/transformers/convert_graph_to_onnx.py) script included with the `transformers` library, which can also be used as a module, as shown below.

```python
# Converting to ONNX (from PyTorch-converted model)

# imports
import numpy as np
import ktrain
from transformers.convert_graph_to_onnx import convert, optimize, quantize
from transformers import *
from pathlib import Path

# paths
predictor_path = '/tmp/mypredictor'
pt_path = predictor_path+'_pt'
pt_onnx_path = pt_path +'_onnx/model.onnx'

# convert to ONNX
p = ktrain.load_predictor(predictor_path)
AutoModelForSequenceClassification.from_pretrained(predictor_path,
from_tf=True).save_pretrained(pt_path)
convert(framework='pt', model=pt_path, output=Path(pt_onnx_path), opset=11,
tokenizer=p.preproc.model_name, pipeline_name='sentiment-analysis')
pt_onnx_quantized_path = quantize(optimize(Path(pt_onnx_path)))

# create ONNX session and make predictions
sess = p.create_onnx_session(pt_onnx_quantized_path.as_posix())
tokenizer = p.preproc.get_tokenizer()
tokens = tokenizer.encode_plus('My computer monitor is blurry.', max_length=p.preproc.maxlen, truncation=True)
tokens = {name: np.atleast_2d(value) for name, value in tokens.items()}
print(p.get_classes()[np.argmax(sess.run(None, tokens)[0])])

# output:
# comp.graphics
```

The example above assumes the model saved at `predictor_path` was trained on a subset of the 20 Newsgroup corpus as was done in [this tutorial](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb).

You can also use **ktrain** to [create ONNX models directly from TensorFlow](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/ktrain-ONNX-TFLite-examples.ipynb) with optional quantization. Note that conversions to ONNX from TensorFlow models appear to [require a hard-coded input size](https://github.com/huggingface/transformers/issues/8227) (i.e., padding is used), whereas conversions to ONNX from PyTorch models do not appear to have this requirement.
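Concretely, this means inputs to a TensorFlow-converted ONNX model must be padded out to the fixed length the model was exported with. A short sketch, reusing `p`, `tokenizer`, and `np` from the example above:

```python
# TF-converted ONNX models expect a fixed input size, so pad to maxlen;
# PyTorch-converted models accept variable-length (unpadded) inputs.
tokens = tokenizer.encode_plus('My computer monitor is blurry.',
                               max_length=p.preproc.maxlen,
                               padding='max_length', truncation=True)
tokens = {name: np.atleast_2d(value) for name, value in tokens.items()}
```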


[[Back to Top](#frequently-asked-questions-about-ktrain)]



### What kinds of applications have been built with *ktrain*?

Examples include:
4 changes: 2 additions & 2 deletions examples/text/ktrain-ONNX-TFLite-examples.ipynb
@@ -146,7 +146,7 @@
"doc = 'My computer monitor is blurry.'\n",
"maxlen = predictor.preproc.maxlen\n",
"tokenizer = predictor.preproc.get_tokenizer()\n",
"inputs = tokenizer(doc, max_length=maxlen, padding='max_length', return_tensors=\"tf\")\n",
"inputs = tokenizer(doc, max_length=maxlen, padding='max_length', truncation=True, return_tensors=\"tf\")\n",
"interpreter.set_tensor(input_details[0]['index'], inputs['attention_mask'])\n",
"interpreter.set_tensor(input_details[1]['index'], inputs['input_ids'])\n",
"interpreter.invoke()\n",
@@ -227,7 +227,7 @@
"doc = 'I received a chest x-ray at the hospital.'\n",
"maxlen = predictor.preproc.maxlen\n",
"tokenizer = predictor.preproc.get_tokenizer()\n",
"input_dict = tokenizer(doc, max_length=maxlen, padding='max_length')\n",
"input_dict = tokenizer(doc, max_length=maxlen, padding='max_length', truncation=True)\n",
"feed = {}\n",
"feed['input_ids'] = np.array(input_dict['input_ids']).astype('int32')[None,:]\n",
"feed['attention_mask'] = np.array(input_dict['attention_mask']).astype('int32')[None,:]\n",
2 changes: 1 addition & 1 deletion ktrain/core.py
@@ -1474,7 +1474,7 @@ def load_predictor(fpath, batch_size=U.DEFAULT_BS):
elif preproc_name == 'mobilenet':
preproc.datagen.preprocessing_function = pre_mobilenet
elif preproc_name == 'inception':
-        preproc.datagen.preprocessing_function = pre_incpeption
+        preproc.datagen.preprocessing_function = pre_inception
else:
        raise Exception('Unknown preprocessing_function name: %s' % (preproc_name))

1 change: 1 addition & 0 deletions ktrain/graph/data.py
@@ -1,6 +1,7 @@
from ..imports import *
from .. import utils as U
from .preprocessor import NodePreprocessor, LinkPreprocessor
import networkx as nx


def graph_nodes_from_csv(nodes_filepath,
1 change: 1 addition & 0 deletions ktrain/graph/preprocessor.py
@@ -1,6 +1,7 @@
from ..imports import *
from .. import utils as U
from ..preprocessor import Preprocessor
import networkx as nx


class NodePreprocessor(Preprocessor):
40 changes: 26 additions & 14 deletions ktrain/imports.py
@@ -190,33 +190,45 @@
except:
    # fastprogress < v0.2.0
    from fastprogress import master_bar, progress_bar
-import keras_bert
-from keras_bert import Tokenizer as BERT_Tokenizer


import requests
# verify=False added to avoid headaches from some corporate networks
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

-# multilingual
+# text processing
+import syntok.segmenter as segmenter
+
+# multilingual text processing
import langdetect
import jieba
import cchardet as chardet

-# graphs
-import networkx as nx
-#from sklearn import preprocessing, feature_extraction, model_selection
+# 'bert' text classification model
+try:
+    import keras_bert
+    from keras_bert import Tokenizer as BERT_Tokenizer
+except ImportError:
+    warnings.warn("keras_bert is not installed - needed only for 'bert' text classification model")

-# ner
-from seqeval.metrics import classification_report as ner_classification_report
-from seqeval.metrics import f1_score as ner_f1_score
-from seqeval.metrics import accuracy_score as ner_accuracy_score
-from seqeval.metrics.sequence_labeling import get_entities
-import syntok.segmenter as segmenter
+# text.ner module
+try:
+    from seqeval.metrics import classification_report as ner_classification_report
+    from seqeval.metrics import f1_score as ner_f1_score
+    from seqeval.metrics import accuracy_score as ner_accuracy_score
+    from seqeval.metrics.sequence_labeling import get_entities
+except ImportError:
+    warnings.warn("seqeval is not installed - needed only by 'text.ner' module")


-# transformers
+# transformers for models in 'text' module
logging.getLogger("transformers").setLevel(logging.ERROR)
-import transformers
+try:
+    import transformers
+except ImportError:
+    warnings.warn("transformers not installed - needed by various models in 'text' module")


try:
2 changes: 1 addition & 1 deletion ktrain/tests/test_qa.py
@@ -27,7 +27,7 @@ def test_qa(self):
tmp_folder = tempfile.mkdtemp()
shutil.rmtree(tmp_folder)
text.SimpleQA.initialize_index(tmp_folder)
-        text.SimpleQA.index_from_list(docs, tmp_folder, commit_every=len(docs))
+        text.SimpleQA.index_from_list(docs, tmp_folder, commit_every=len(docs), multisegment=True)
qa = text.SimpleQA(tmp_folder)

answers = qa.ask('When did Cassini launch?')
4 changes: 4 additions & 0 deletions ktrain/text/data.py
@@ -309,6 +309,10 @@ def texts_from_array(x_train, y_train, x_test=None, y_test=None,
Args:
x_train(list): list of training texts
y_train(list): labels in one of the following forms:
1. list of integers representing classes (class_names is required)
2. list of strings representing classes (class_names is not needed and ignored.)
          3. a one-hot or multi-hot encoded array representing classes (class_names is required)
          4. numerical values for text regression (class_names should be left empty)
      x_test(list): list of test texts
y_test(list): labels in one of the following forms:
1. list of integers representing classes (class_names is required)
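To make the accepted label forms concrete, here is a hedged sketch of two equivalent `texts_from_array` calls on hypothetical toy data (form numbers refer to the docstring list above):

```python
from ktrain import text

# form 2: string labels (class_names inferred from the data)
trn, val, preproc = text.texts_from_array(
    x_train=['great movie', 'awful movie'], y_train=['pos', 'neg'],
    x_test=['not bad'], y_test=['pos'], maxlen=64)

# form 1: integer labels (class_names must be supplied)
trn, val, preproc = text.texts_from_array(
    x_train=['great movie', 'awful movie'], y_train=[1, 0],
    x_test=['not bad'], y_test=[1], class_names=['neg', 'pos'], maxlen=64)
```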
2 changes: 2 additions & 0 deletions ktrain/text/learner.py
@@ -150,6 +150,8 @@ def predict(self, val_data=None):
if hasattr(val, 'reset'): val.reset()
classification, multilabel = U.is_classifier(self.model)
preds = self.model.predict(self._prepare(val, train=False))
if type(preds).__name__ == 'TFSequenceClassifierOutput': # dep_fix: undocumented breaking change in transformers==4.0.0
preds = preds.logits

        # dep_fix: transformers in TF 2.2.0 returns a tuple instead of NumPy array for some reason
if isinstance(preds, tuple) and len(preds) == 1: preds = preds[0]
76 changes: 75 additions & 1 deletion ktrain/text/predictor.py
@@ -51,6 +51,9 @@ def predict(self, texts, return_proba=False):
tseq.batch_size = self.batch_size
texts = tseq.to_tfdataset(train=False)
preds = self.model.predict(texts)
if type(preds).__name__ == 'TFSequenceClassifierOutput': # dep_fix: undocumented breaking change in transformers==4.0.0
preds = preds.logits

            # dep_fix: transformers in TF 2.2.0 returns a tuple instead of NumPy array for some reason
if isinstance(preds, tuple) and len(preds) == 1: preds = preds[0]
else:
@@ -168,4 +171,75 @@ def _save_model(self, fpath):
return



def export_model_to_onnx(self, fpath, quantize=False, target_opset=None, verbose=1):
"""
Export model to onnx
Args:
fpath(str): String representing full path to model file where ONNX model will be saved.
Example: '/tmp/my_model.onnx'
            quantize(bool): If True, a total of three model files will be created using transformers.convert_graph_to_onnx:
                            1) an ONNX model (created directly using keras2onnx)
                            2) an optimized ONNX model (created by transformers library)
                            3) a quantized version of the optimized ONNX model (created by transformers library)
                            All files will be created in the parent folder of fpath.
                            Example:
                                If fpath='/tmp/model.onnx', then both /tmp/model-optimized.onnx and
                                /tmp/model-optimized-quantized.onnx will also be created.
            target_opset(int): ONNX opset version passed through to keras2onnx; if None, a default is chosen
            verbose(bool): verbosity
        Returns:
            str: string representing fpath. If quantize=True, the returned fpath will differ from the supplied fpath
"""
try:
import onnxruntime, onnxruntime_tools, onnx, keras2onnx
except ImportError:
raise Exception('This method requires ONNX libraries to be installed: '+\
'pip install -q --upgrade onnxruntime==1.5.1 onnxruntime-tools onnx keras2onnx')
from pathlib import Path
if type(self.preproc).__name__ == 'BERTPreprocessor':
raise Exception('currently_unsupported: BERT models created with text_classifier("bert",...) are not supported (i.e., keras_bert models). ' +\
'Only BERT models created with Transformer(...) are supported.')

if verbose: print('converting to ONNX format ... this may take a few moments...')
if U.is_huggingface(model=self.model):
tokenizer = self.preproc.get_tokenizer()
maxlen = self.preproc.maxlen
input_dict = tokenizer('Name', return_tensors='tf',
padding='max_length', max_length=maxlen)

if version.parse(tf.__version__) < version.parse('2.2'):
                raise Exception('export_model_to_onnx requires tensorflow>=2.2')
#self.model._set_inputs(input_spec, training=False) # for tf < 2.2
self.model._saved_model_inputs_spec = None # for tf > 2.2
self.model._set_save_spec(input_dict) # for tf > 2.2
self.model._get_save_spec()

onnx_model = keras2onnx.convert_keras(self.model, self.model.name, target_opset=target_opset)
keras2onnx.save_model(onnx_model, fpath)
return_fpath = fpath

if quantize:
from transformers.convert_graph_to_onnx import optimize, quantize
#opt_path = optimize(Path(fpath))

if U.is_huggingface(model=self.model) and\
type(self.model).__name__ in ['TFDistilBertForSequenceClassification', 'TFBertForSequenceClassification']:
try:
from onnxruntime_tools import optimizer
from onnxruntime_tools.transformers.onnx_model_bert import BertOptimizationOptions
# disable embedding layer norm optimization for better model size reduction
opt_options = BertOptimizationOptions('bert')
opt_options.enable_embed_layer_norm = False
opt_model = optimizer.optimize_model(
fpath,
'bert', # bert_keras causes error with transformers
num_heads=12,
hidden_size=768,
optimization_options=opt_options)
opt_model.save_model_to_file(fpath)
except:
warnings.warn('Could not run BERT-specific optimizations')
pass
quantize_path = quantize(Path(fpath))
return_fpath = quantize_path.as_posix()
if verbose: print('done.')
return return_fpath
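As a usage sketch for the method above (the paths are hypothetical): with `quantize=True`, the returned path points at the optimized, quantized model rather than the path originally supplied.

```python
import ktrain

predictor = ktrain.load_predictor('/tmp/mypredictor')   # hypothetical predictor folder
onnx_path = predictor.export_model_to_onnx('/tmp/model.onnx', quantize=True)
# e.g., onnx_path == '/tmp/model-optimized-quantized.onnx'
sess = predictor.create_onnx_session(onnx_path)
```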
