Merge branch 'develop'

amaiya committed Dec 2, 2020
2 parents b65dbdd + 1ee379a commit 5c9c6b3
Showing 18 changed files with 270 additions and 43 deletions.
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,23 @@ Most recent releases are shown at the top. Each release shows:
- **Changed**: Additional parameters, changes to inputs or outputs, etc
- **Fixed**: Bug fixes that don't change documented behaviour

## 0.25.1 (2020-12-02)

### New:
- N/A

### Changed:
- Added `use_dynamic_shape` parameter to `text.preprocessor.hf_convert_examples`, which is set to `True` when running predictions. This reduces the input length when making predictions, if possible.
- Added warnings to some imports in `imports.py` to allow for slightly lighter-weight deployments
- Temporarily pinning to `transformers>=3.1,<4.0` due to breaking changes in v4.0.

### Fixed:
- Suppress progress bar in `predictor.predict` for `keras_bert` models
- Fixed typo causing problems when loading predictor for Inception models
- Fixes to address documented/undocumented breaking changes in `transformers>=4.0`, with a temporary pin to `transformers>=3.1,<4.0` retained for backwards compatibility.



## 0.25.0 (2020-11-08)

### New:
89 changes: 87 additions & 2 deletions FAQ.md
@@ -59,6 +59,7 @@

- [Running `predictor.explain` for text classification is slow. How can I speed it up?](#running-predictorexplain-for-text-classification-is-slow--how-can-i-speed-it-up)

- [How do I make quantized predictions with `transformers` models?](#how-do-i-make-quantized-predictions-with-transformers-models)


---
@@ -129,7 +130,7 @@ model = ktrain.load_predictor('/tmp/my_predictor').model
learner = ktrain.get_learner(model, train_data=trn, val_data=val)
learner.fit_onecycle(2e-5, 1)
```
-Note that `preproc` here is a *Preprocessor* instance. If using a data-loading function like `texts_from_csv` or `images_from_folder`, it will be the third return value from the function. Or, if using the [Transformer API](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb) for text classification, it will be the output of invoking `text.Transformer` (i.e., `preproc = text.Transformer('bert-base-uncased', ...)`).
+Note that `preproc` here is a *Preprocessor* instance. If using a data-loading function like `texts_from_csv` or `images_from_folder`, it will be the third return value from the function. Or, if using the [Transformer API](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb) for text classification, it will be the output of invoking `text.Transformer` (i.e., `preproc = text.Transformer('bert-base-uncased', ...)`). Also, `trn` and `val` are typically the result of invoking `preproc.preprocess_train` and `preproc.preprocess_test`, respectively.
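For reference, here is a minimal sketch of how these objects are typically produced with the Transformer API (the toy data below is hypothetical):

```python
from ktrain import text

# hypothetical toy data
x_train, y_train = ['good movie', 'bad movie'], ['pos', 'neg']
x_test, y_test = ['not bad'], ['pos']

preproc = text.Transformer('bert-base-uncased', maxlen=128)
trn = preproc.preprocess_train(x_train, y_train)  # preprocessed training set
val = preproc.preprocess_test(x_test, y_test)     # preprocessed validation set
# `model` then comes from ktrain.load_predictor(...).model, as shown above
```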


#### Method 2: Using `transformers` library (if training Hugging Face Transformers model)
@@ -279,7 +280,7 @@ predictor.predict_filename('C:/temp/cats_and_dogs_filtered/validation/cats/cat.2
### How do I use ktrain without an internet connection?

When using pretrained models or pretrained word embeddings in *ktrain*, files are automatically downloaded. For instance,
-pretrained models and vocabulary files from the `transformers` library are downloaded to `<home_directory>/.cache/torch/transformers`
+pretrained models and vocabulary files from the `transformers` library are downloaded to `<home_directory>/.cache/huggingface/transformers` (or `<home_directory>/.cache/torch/transformers` in older versions)
by default. Other data like pretrained word vectors are downloaded to the `<home_directory>/ktrain_data` folder.
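If these defaults are inconvenient (e.g., when pre-seeding a cache for an air-gapped machine), one option is to relocate the download cache. This is a sketch assuming the standard `TRANSFORMERS_CACHE` environment variable honored by the `transformers` library; the path is hypothetical:

```python
# Sketch: point the transformers download cache at a pre-populated directory.
# TRANSFORMERS_CACHE must be set before transformers (or ktrain) is imported.
import os
os.environ['TRANSFORMERS_CACHE'] = '/opt/models/hf_cache'  # hypothetical path

import ktrain  # imports transformers internally, picking up the cache location
```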

In some settings, it is necessary to either train models or make predictions in environments with no internet
@@ -775,6 +776,90 @@ A number of models in **ktrain** can be used out-of-the-box on a CPU-based laptop
[[Back to Top](#frequently-asked-questions-about-ktrain)]


### How do I make quantized predictions with `transformers` models?

Quantization can improve the efficiency of neural network computations by reducing the size of the weights. For instance, when making predictions, representing weights with 8-bit integers instead of 32-bit floats can speed up inference.

TensorFlow has built-in support for quantization. Unfortunately, as of this writing, it [only works for sequential and functional](https://github.com/tensorflow/tensorflow/issues/40699) `tf.keras` models, which means it cannot be used with Hugging Face `transformers` models.

As a workaround, you can convert your saved TensorFlow model to PyTorch, quantize, and make predictions directly in PyTorch.

This code example assumes you've trained a DistilBERT model with **ktrain**, saved a `Predictor` in a folder called `'/tmp/mypredictor'`, and need to make quantized predictions on CPU:
```python
# Quantization Using PyTorch

# load the predictor, model, and tokenizer
from transformers import *
import ktrain
import numpy as np
predictor = ktrain.load_predictor('/tmp/mypredictor')
model_pt = AutoModelForSequenceClassification.from_pretrained('/tmp/mypredictor', from_tf=True)
tokenizer = predictor.preproc.get_tokenizer() # or use AutoTokenizer.from_pretrained(predictor.preproc.model_name)
maxlen = predictor.preproc.maxlen
device = 'cpu'
class_names = predictor.preproc.get_classes()

# quantize model (INT8 quantization)
import torch
model_pt_quantized = torch.quantization.quantize_dynamic(
model_pt.to(device), {torch.nn.Linear}, dtype=torch.qint8)

# make quantized predictions (x_test is a list of strings representing documents)
preds = []
for doc in x_test:
model_inputs = tokenizer(doc, return_tensors="pt", max_length=maxlen, truncation=True)
model_inputs_on_device = { arg_name: tensor.to(device)
for arg_name, tensor in model_inputs.items()}
pred = model_pt_quantized(**model_inputs_on_device)
preds.append(class_names[ np.argmax( np.squeeze( pred[0].cpu().detach().numpy() ) ) ])

```

Note that the above example employs smaller inputs by eliminating padding in addition to using a quantized model. As discussed in [this blog post](https://blog.roblox.com/2020/05/scaled-bert-serve-1-billion-daily-requests-cpus/), both of these steps can speed up predictions in CPU deployment scenarios.
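To see why dropping the padding helps, compare the input shapes the tokenizer produces with and without `padding='max_length'`. This is a small sketch reusing `tokenizer` and `maxlen` from the example above:

```python
# Without padding, the input tensor shrinks to the actual document length,
# so the (quantized) model does proportionally less work per prediction.
short_doc = 'Great monitor.'
padded = tokenizer(short_doc, return_tensors='pt', max_length=maxlen,
                   padding='max_length', truncation=True)
dynamic = tokenizer(short_doc, return_tensors='pt', max_length=maxlen,
                    truncation=True)
print(padded['input_ids'].shape)   # always (1, maxlen), e.g., (1, 500)
print(dynamic['input_ids'].shape)  # only the real tokens, e.g., (1, 5)
```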

Alternatively, you might also consider quantizing your `transformers` model with the [convert_graph_to_onnx.py](https://github.com/huggingface/transformers/blob/master/src/transformers/convert_graph_to_onnx.py) script included with the `transformers` library, which can also be used as a module, as shown below.

```python
# Converting to ONNX (from PyTorch-converted model)

# imports
import numpy as np
import ktrain
from transformers.convert_graph_to_onnx import convert, optimize, quantize
from transformers import *
from pathlib import Path

# paths
predictor_path = '/tmp/mypredictor'
pt_path = predictor_path+'_pt'
pt_onnx_path = pt_path +'_onnx/model.onnx'

# convert to ONNX
p = ktrain.load_predictor(predictor_path)
AutoModelForSequenceClassification.from_pretrained(predictor_path,
from_tf=True).save_pretrained(pt_path)
convert(framework='pt', model=pt_path, output=Path(pt_onnx_path), opset=11,
tokenizer=p.preproc.model_name, pipeline_name='sentiment-analysis')
pt_onnx_quantized_path = quantize(optimize(Path(pt_onnx_path)))

# create ONNX session and make predictions
sess = p.create_onnx_session(pt_onnx_quantized_path.as_posix())
tokenizer = p.preproc.get_tokenizer()
tokens = tokenizer.encode_plus('My computer monitor is blurry.', max_length=p.preproc.maxlen, truncation=True)
tokens = {name: np.atleast_2d(value) for name, value in tokens.items()}
print(p.get_classes()[np.argmax(sess.run(None, tokens)[0])])

# output:
# comp.graphics
```

The example above assumes the model saved at `predictor_path` was trained on a subset of the 20 Newsgroup corpus as was done in [this tutorial](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-A3-hugging_face_transformers.ipynb).

You can also use **ktrain** to [create ONNX models directly from TensorFlow](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/ktrain-ONNX-TFLite-examples.ipynb) with optional quantization. Note that conversions to ONNX from TensorFlow models appear to [require a hard-coded input size](https://github.com/huggingface/transformers/issues/8227) (i.e., padding is used), whereas conversions to ONNX from PyTorch models do not appear to have this requirement.
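Concretely, this means inputs to a TensorFlow-converted ONNX model must be padded out to the fixed length the model was exported with. A short sketch, reusing `p`, `tokenizer`, and `np` from the example above:

```python
# TF-converted ONNX models expect a fixed input size, so pad to maxlen;
# PyTorch-converted models accept variable-length (unpadded) inputs.
tokens = tokenizer.encode_plus('My computer monitor is blurry.',
                               max_length=p.preproc.maxlen,
                               padding='max_length', truncation=True)
tokens = {name: np.atleast_2d(value) for name, value in tokens.items()}
```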


[[Back to Top](#frequently-asked-questions-about-ktrain)]



### What kinds of applications have been built with *ktrain*?

Examples include:
4 changes: 2 additions & 2 deletions examples/text/ktrain-ONNX-TFLite-examples.ipynb
@@ -146,7 +146,7 @@
"doc = 'My computer monitor is blurry.'\n",
"maxlen = predictor.preproc.maxlen\n",
"tokenizer = predictor.preproc.get_tokenizer()\n",
"inputs = tokenizer(doc, max_length=maxlen, padding='max_length', return_tensors=\"tf\")\n",
"inputs = tokenizer(doc, max_length=maxlen, padding='max_length', truncation=True, return_tensors=\"tf\")\n",
"interpreter.set_tensor(input_details[0]['index'], inputs['attention_mask'])\n",
"interpreter.set_tensor(input_details[1]['index'], inputs['input_ids'])\n",
"interpreter.invoke()\n",
@@ -227,7 +227,7 @@
"doc = 'I received a chest x-ray at the hospital.'\n",
"maxlen = predictor.preproc.maxlen\n",
"tokenizer = predictor.preproc.get_tokenizer()\n",
"input_dict = tokenizer(doc, max_length=maxlen, padding='max_length')\n",
"input_dict = tokenizer(doc, max_length=maxlen, padding='max_length', truncation=True)\n",
"feed = {}\n",
"feed['input_ids'] = np.array(input_dict['input_ids']).astype('int32')[None,:]\n",
"feed['attention_mask'] = np.array(input_dict['attention_mask']).astype('int32')[None,:]\n",
2 changes: 1 addition & 1 deletion ktrain/core.py
@@ -1474,7 +1474,7 @@ def load_predictor(fpath, batch_size=U.DEFAULT_BS):
elif preproc_name == 'mobilenet':
preproc.datagen.preprocessing_function = pre_mobilenet
elif preproc_name == 'inception':
-        preproc.datagen.preprocessing_function = pre_incpeption
+        preproc.datagen.preprocessing_function = pre_inception
else:
        raise Exception('Unknown preprocessing_function name: %s' % (preproc_name))

1 change: 1 addition & 0 deletions ktrain/graph/data.py
@@ -1,6 +1,7 @@
from ..imports import *
from .. import utils as U
from .preprocessor import NodePreprocessor, LinkPreprocessor
import networkx as nx


def graph_nodes_from_csv(nodes_filepath,
1 change: 1 addition & 0 deletions ktrain/graph/preprocessor.py
@@ -1,6 +1,7 @@
from ..imports import *
from .. import utils as U
from ..preprocessor import Preprocessor
import networkx as nx


class NodePreprocessor(Preprocessor):
40 changes: 26 additions & 14 deletions ktrain/imports.py
@@ -190,33 +190,45 @@
except:
    # fastprogress < v0.2.0
    from fastprogress import master_bar, progress_bar
-import keras_bert
-from keras_bert import Tokenizer as BERT_Tokenizer


import requests
# verify=False added to avoid headaches from some corporate networks
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

-# multilingual
+# text processing
+import syntok.segmenter as segmenter
+
+# multilingual text processing
import langdetect
import jieba
import cchardet as chardet

-# graphs
-import networkx as nx
-#from sklearn import preprocessing, feature_extraction, model_selection
+# 'bert' text classification model
+try:
+    import keras_bert
+    from keras_bert import Tokenizer as BERT_Tokenizer
+except ImportError:
+    warnings.warn("keras_bert is not installed - needed only for 'bert' text classification model")

-# ner
-from seqeval.metrics import classification_report as ner_classification_report
-from seqeval.metrics import f1_score as ner_f1_score
-from seqeval.metrics import accuracy_score as ner_accuracy_score
-from seqeval.metrics.sequence_labeling import get_entities
-import syntok.segmenter as segmenter
+# text.ner module
+try:
+    from seqeval.metrics import classification_report as ner_classification_report
+    from seqeval.metrics import f1_score as ner_f1_score
+    from seqeval.metrics import accuracy_score as ner_accuracy_score
+    from seqeval.metrics.sequence_labeling import get_entities
+except ImportError:
+    warnings.warn("seqeval is not installed - needed only by 'text.ner' module")


-# transformers
+# transformers for models in 'text' module
logging.getLogger("transformers").setLevel(logging.ERROR)
-import transformers
+try:
+    import transformers
+except ImportError:
+    warnings.warn("transformers not installed - needed by various models in 'text' module")


try:
2 changes: 1 addition & 1 deletion ktrain/tests/test_qa.py
@@ -27,7 +27,7 @@ def test_qa(self):
tmp_folder = tempfile.mkdtemp()
shutil.rmtree(tmp_folder)
text.SimpleQA.initialize_index(tmp_folder)
-        text.SimpleQA.index_from_list(docs, tmp_folder, commit_every=len(docs))
+        text.SimpleQA.index_from_list(docs, tmp_folder, commit_every=len(docs), multisegment=True)
qa = text.SimpleQA(tmp_folder)

answers = qa.ask('When did Cassini launch?')
4 changes: 4 additions & 0 deletions ktrain/text/data.py
@@ -309,6 +309,10 @@ def texts_from_array(x_train, y_train, x_test=None, y_test=None,
Args:
x_train(list): list of training texts
y_train(list): labels in one of the following forms:
1. list of integers representing classes (class_names is required)
2. list of strings representing classes (class_names is not needed and ignored.)
          3. a one-hot or multi-hot encoded array representing classes (class_names is required)
          4. numerical values for text regression (class_names should be left empty)
      x_test(list): list of test texts
y_test(list): labels in one of the following forms:
1. list of integers representing classes (class_names is required)
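To make the accepted label forms concrete, here is a hedged sketch of two equivalent `texts_from_array` calls on hypothetical toy data (form numbers refer to the docstring list above):

```python
from ktrain import text

# form 2: string labels (class_names inferred from the data)
trn, val, preproc = text.texts_from_array(
    x_train=['great movie', 'awful movie'], y_train=['pos', 'neg'],
    x_test=['not bad'], y_test=['pos'], maxlen=64)

# form 1: integer labels (class_names must be supplied)
trn, val, preproc = text.texts_from_array(
    x_train=['great movie', 'awful movie'], y_train=[1, 0],
    x_test=['not bad'], y_test=[1], class_names=['neg', 'pos'], maxlen=64)
```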
2 changes: 2 additions & 0 deletions ktrain/text/learner.py
@@ -150,6 +150,8 @@ def predict(self, val_data=None):
if hasattr(val, 'reset'): val.reset()
classification, multilabel = U.is_classifier(self.model)
preds = self.model.predict(self._prepare(val, train=False))
if type(preds).__name__ == 'TFSequenceClassifierOutput': # dep_fix: undocumented breaking change in transformers==4.0.0
preds = preds.logits

        # dep_fix: transformers in TF 2.2.0 returns a tuple instead of NumPy array for some reason
if isinstance(preds, tuple) and len(preds) == 1: preds = preds[0]
76 changes: 75 additions & 1 deletion ktrain/text/predictor.py
@@ -51,6 +51,9 @@ def predict(self, texts, return_proba=False):
tseq.batch_size = self.batch_size
texts = tseq.to_tfdataset(train=False)
preds = self.model.predict(texts)
if type(preds).__name__ == 'TFSequenceClassifierOutput': # dep_fix: undocumented breaking change in transformers==4.0.0
preds = preds.logits

            # dep_fix: transformers in TF 2.2.0 returns a tuple instead of NumPy array for some reason
if isinstance(preds, tuple) and len(preds) == 1: preds = preds[0]
else:
@@ -168,4 +171,75 @@ def _save_model(self, fpath):
return



def export_model_to_onnx(self, fpath, quantize=False, target_opset=None, verbose=1):
"""
Export model to onnx
Args:
fpath(str): String representing full path to model file where ONNX model will be saved.
Example: '/tmp/my_model.onnx'
            quantize(bool): If True, a total of three model files will be created using transformers.convert_graph_to_onnx:
                            1) an ONNX model (created directly using keras2onnx)
                            2) an optimized ONNX model (created by transformers library)
                            3) a quantized version of the optimized ONNX model (created by transformers library)
                            All files will be created in the parent folder of fpath.
                            Example:
                                If fpath='/tmp/model.onnx', then both /tmp/model-optimized.onnx and
                                /tmp/model-optimized-quantized.onnx will also be created.
            target_opset(int): ONNX opset version passed through to keras2onnx; if None, a default is chosen
            verbose(bool): verbosity
        Returns:
            str: string representing fpath. If quantize=True, the returned fpath will differ from the supplied fpath
"""
try:
import onnxruntime, onnxruntime_tools, onnx, keras2onnx
except ImportError:
raise Exception('This method requires ONNX libraries to be installed: '+\
'pip install -q --upgrade onnxruntime==1.5.1 onnxruntime-tools onnx keras2onnx')
from pathlib import Path
if type(self.preproc).__name__ == 'BERTPreprocessor':
raise Exception('currently_unsupported: BERT models created with text_classifier("bert",...) are not supported (i.e., keras_bert models). ' +\
'Only BERT models created with Transformer(...) are supported.')

if verbose: print('converting to ONNX format ... this may take a few moments...')
if U.is_huggingface(model=self.model):
tokenizer = self.preproc.get_tokenizer()
maxlen = self.preproc.maxlen
input_dict = tokenizer('Name', return_tensors='tf',
padding='max_length', max_length=maxlen)

if version.parse(tf.__version__) < version.parse('2.2'):
                raise Exception('export_model_to_onnx requires tensorflow>=2.2')
#self.model._set_inputs(input_spec, training=False) # for tf < 2.2
self.model._saved_model_inputs_spec = None # for tf > 2.2
self.model._set_save_spec(input_dict) # for tf > 2.2
self.model._get_save_spec()

onnx_model = keras2onnx.convert_keras(self.model, self.model.name, target_opset=target_opset)
keras2onnx.save_model(onnx_model, fpath)
return_fpath = fpath

if quantize:
from transformers.convert_graph_to_onnx import optimize, quantize
#opt_path = optimize(Path(fpath))

if U.is_huggingface(model=self.model) and\
type(self.model).__name__ in ['TFDistilBertForSequenceClassification', 'TFBertForSequenceClassification']:
try:
from onnxruntime_tools import optimizer
from onnxruntime_tools.transformers.onnx_model_bert import BertOptimizationOptions
# disable embedding layer norm optimization for better model size reduction
opt_options = BertOptimizationOptions('bert')
opt_options.enable_embed_layer_norm = False
opt_model = optimizer.optimize_model(
fpath,
'bert', # bert_keras causes error with transformers
num_heads=12,
hidden_size=768,
optimization_options=opt_options)
opt_model.save_model_to_file(fpath)
except:
warnings.warn('Could not run BERT-specific optimizations')
pass
quantize_path = quantize(Path(fpath))
return_fpath = quantize_path.as_posix()
if verbose: print('done.')
return return_fpath
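As a usage sketch for the method above (the paths are hypothetical): with `quantize=True`, the returned path points at the optimized, quantized model rather than the path originally supplied.

```python
import ktrain

predictor = ktrain.load_predictor('/tmp/mypredictor')   # hypothetical predictor folder
onnx_path = predictor.export_model_to_onnx('/tmp/model.onnx', quantize=True)
# e.g., onnx_path == '/tmp/model-optimized-quantized.onnx'
sess = predictor.create_onnx_session(onnx_path)
```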
