TinyMLTextClassification

Experiments with TensorFlow Lite

To run a machine learning model on a microcontroller, it needs to be small and fast. TensorFlow processes numbers, not words, so the first step is to convert the text input into an array of numbers. Conventionally this is done with a big lookup table in which each word is mapped to a number. I realised that a microcontroller such as the SAMD21 (a 32-bit, low-power ARM Cortex-M0+ MCU), which is used on my target board, would not be able to store that table as well as the rest of the code and the machine learning model.
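A rough way to see the problem is to add up what a word table would cost in flash. The sketch below assumes a hypothetical layout in which each word is stored as a NUL-terminated UTF-8 string next to a 2-byte ID. A real implementation would also need pointers or an index to search it, so this is only a lower bound.

```python
def lookup_table_bytes(words, id_bytes=2):
    """Lower-bound flash cost of a word -> ID table (hypothetical layout).

    Each entry is assumed to be the word as a NUL-terminated UTF-8 string
    plus a fixed-width integer ID of `id_bytes` bytes.
    """
    return sum(len(word.encode("utf-8")) + 1 + id_bytes for word in words)


print(lookup_table_bytes(["hi", "there"]))  # 13
```

With a hypothetical vocabulary of 10,000 words averaging 7 characters, that is 10,000 × (7 + 1 + 2) = 100,000 bytes, roughly 38% of the 256 KB of flash before any model weights or application code are added.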

So my idea was to try a hash function that could easily be reproduced both in the Python training environment and on the Arduino MKR Zero.

Tokenisation using a hash function

My first challenge was to hook a fast hash function into the TensorFlow TextEncoder so that words are encoded without the need for a lookup table. I used Paul Hsieh's SuperFastHash, which has ports for both Python and C.
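The Python half of the idea can be sketched as below: a direct port of Paul Hsieh's published SuperFastHash C reference, plus a hypothetical `encode_words` helper that folds each 32-bit hash into `num_buckets` IDs, reserving 0 for padding. The lowercasing, whitespace split, bucket count and padding length are assumptions for illustration, not necessarily what this project's encoder does. Whatever choices are made, the Arduino side has to hash the same bytes and apply the same modulo for the IDs to match.

```python
MASK32 = 0xFFFFFFFF  # emulate C uint32_t wrap-around


def _get16bits(data, i):
    # Little-endian 16-bit read, as in the reference get16bits macro.
    return data[i] | (data[i + 1] << 8)


def _signed_char(b):
    # The C reference casts tail bytes to (signed char); this only matters for bytes >= 0x80.
    return b - 256 if b >= 128 else b


def super_fast_hash(data):
    """Port of Paul Hsieh's SuperFastHash over a bytes object; returns a uint32."""
    length = len(data)
    if length == 0:
        return 0
    h = length
    rem = length & 3
    i = 0
    # Main loop: consume 4 bytes per iteration.
    for _ in range(length >> 2):
        h = (h + _get16bits(data, i)) & MASK32
        tmp = ((_get16bits(data, i + 2) << 11) ^ h) & MASK32
        h = ((h << 16) ^ tmp) & MASK32
        i += 4
        h = (h + (h >> 11)) & MASK32
    # Handle the 1 to 3 trailing bytes.
    if rem == 3:
        h = (h + _get16bits(data, i)) & MASK32
        h ^= (h << 16) & MASK32
        h ^= (_signed_char(data[i + 2]) << 18) & MASK32
        h = (h + (h >> 11)) & MASK32
    elif rem == 2:
        h = (h + _get16bits(data, i)) & MASK32
        h ^= (h << 11) & MASK32
        h = (h + (h >> 17)) & MASK32
    elif rem == 1:
        h = (h + _signed_char(data[i])) & MASK32
        h ^= (h << 10) & MASK32
        h = (h + (h >> 1)) & MASK32
    # Final avalanching step from the reference implementation.
    h ^= (h << 3) & MASK32
    h = (h + (h >> 5)) & MASK32
    h ^= (h << 4) & MASK32
    h = (h + (h >> 17)) & MASK32
    h ^= (h << 25) & MASK32
    h = (h + (h >> 6)) & MASK32
    return h


def encode_words(text, num_buckets=1000, max_len=16):
    """Hypothetical hashed tokeniser: IDs 1..num_buckets-1 for words, 0 for padding."""
    ids = [1 + super_fast_hash(w.encode("utf-8")) % (num_buckets - 1)
           for w in text.lower().split()[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

Because distinct words can land in the same bucket, the bucket count trades model size against collisions: fewer buckets mean a smaller embedding table but more words sharing an ID.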

Shrinking the model

There seem to be a few slightly different approaches to building the classifier, so I've tried a couple of variations and have been tuning the parameters. I need to get the model small enough to fit into the 256 KB of flash memory while leaving space for the rest of my code.
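To get a feel for where the bytes go, the weight count of an embedding, LSTM and dense classifier can be estimated by hand. The layer sizes below are hypothetical and the architecture is an assumption based on the RNN notebooks; the formulas are the standard Keras parameter counts. The result ignores the flatbuffer overhead and the TensorFlow Lite Micro runtime code, which also need flash.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-wide vector per token ID.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Keras LSTM with bias: 4 gates, each with an input kernel,
    # a recurrent kernel and a bias vector.
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus bias.
    return input_dim * units + units


def estimate_model_weights(vocab_size, embed_dim, lstm_units, bytes_per_weight):
    """Return (weight count, bytes) for a hypothetical Embedding -> LSTM -> Dense(1) model."""
    count = (embedding_params(vocab_size, embed_dim)
             + lstm_params(embed_dim, lstm_units)
             + dense_params(lstm_units, 1))
    return count, count * bytes_per_weight


print(estimate_model_weights(1000, 16, 16, 4))  # (18129, 72516): float32
print(estimate_model_weights(1000, 16, 16, 1))  # (18129, 18129): int8
```

With these hypothetical sizes the embedding table accounts for 16,000 of the 18,129 weights, so the vocabulary (or hash bucket) count is the biggest lever, and int8 quantisation cuts the weight storage to a quarter of its float32 size.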

Building the model

I've been using tinymlgen to export the model, but I may need to dig into it and produce my own variation with more optimisations. You can follow the build process in the Jupyter notebook.
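A custom exporter could start from something as small as the sketch below, which renders the `.tflite` flatbuffer bytes as a C array in the style of `xxd -i`. This is not tinymlgen's code; the function name and the default `model_data` variable name are hypothetical.

```python
def tflite_to_c_array(model_bytes, var_name="model_data", bytes_per_line=12):
    """Render a .tflite flatbuffer as a C/C++ array definition (xxd -i style)."""
    rows = []
    for start in range(0, len(model_bytes), bytes_per_line):
        chunk = model_bytes[start:start + bytes_per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    header = f"// {len(model_bytes)} bytes, generated from a .tflite flatbuffer\n"
    # const lets the linker place the array in flash on Cortex-M targets;
    # alignas(8) follows the TensorFlow Lite Micro examples, which align the model buffer.
    decl = f"alignas(8) const unsigned char {var_name}[] = {{\n"
    body = "\n".join(rows) + "\n};\n"
    length = f"const unsigned int {var_name}_len = {len(model_bytes)};\n"
    return header + decl + body + length
```

Typical use would be reading the converted model with `open("model.tflite", "rb").read()` and writing the returned string to a header such as `model.h` for the Arduino sketch to include.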

Machine Learning Notebook

Problems running the model

A key outstanding issue is that this text classifier still fails to load on the Arduino MKR board. I suspect it is too big to run in the available memory.

Optimisation

I did some experiments with the TensorFlow Lite converter's optimisation options.

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

results in the following error when the model is initialised on the board:

Initialising...
Type FLOAT16 (10) not is not supported
Failed to initialize tensor 1
MicroAllocator: Failed to initialize.
AllocateTensors() failed

I also tried:

# From TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers
def representative_dataset_gen():
    for value in test_dataset:
        yield np.array(value,dtype=np.dtype((np.float32,8)),ndmin=2)

...

converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

but I could not work out how to get the generator to produce data in the right format.
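One way the generator could be shaped is sketched below. The TensorFlow Lite converter calls `representative_dataset` with no arguments and expects each yielded item to match the model's inputs in order, type and shape, which in practice means a list holding one batched array per input. The helper name, the float32 dtype and the 100-sample cap are assumptions. If the source data is a batched `tf.data.Dataset` of `(text, label)` pairs, as in the TensorFlow RNN text-classification tutorial, it would first need to be encoded, unbatched and stripped of its labels.

```python
import numpy as np


def make_representative_dataset(samples, max_samples=100):
    """Build a zero-argument callable for TFLiteConverter.representative_dataset.

    `samples` is assumed to be an iterable of already-encoded single inputs
    (no batch dimension), e.g. fixed-length lists of hashed word IDs.
    """
    def representative_dataset():
        for count, sample in enumerate(samples):
            if count >= max_samples:
                break
            # Assumed input dtype float32; change it if the model's input is int32.
            array = np.asarray(sample, dtype=np.float32)
            # Add a batch dimension of 1 and wrap in a list: one array per model input.
            yield [array[np.newaxis, ...]]
    return representative_dataset
```

It would then be assigned with `converter.representative_dataset = make_representative_dataset(encoded_test_samples)`, where `encoded_test_samples` is a hypothetical list of encoded inputs.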

Simpler Model

A simpler model was created using the raw USB data rather than text. This avoids the encoding issues and allows simpler models to be tried.
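If the raw USB data is taken to be keyboard HID usage IDs (an assumption; the notebook defines the actual format), text can be turned into that form with a small fixed table from the USB HID Usage Tables, so every token already fits in a byte and no hashing or vocabulary is needed. The sketch below covers only letters, digits and space; `text_to_hid_sequence` and its zero padding are hypothetical.

```python
# Keyboard/Keypad page usage IDs from the USB HID Usage Tables:
# 'a'..'z' -> 0x04..0x1D, '1'..'9' -> 0x1E..0x26, '0' -> 0x27, space -> 0x2C.
def char_to_hid_usage(ch):
    if "a" <= ch <= "z":
        return 0x04 + (ord(ch) - ord("a"))
    if "1" <= ch <= "9":
        return 0x1E + (ord(ch) - ord("1"))
    if ch == "0":
        return 0x27
    if ch == " ":
        return 0x2C
    return 0x00  # "no key"; used here for unsupported characters and padding


def text_to_hid_sequence(text, max_len):
    # Uppercase is folded to lowercase; a real report would add a Shift modifier instead.
    codes = [char_to_hid_usage(c) for c in text.lower()[:max_len]]
    return codes + [0] * (max_len - len(codes))


print(text_to_hid_sequence("ab 1", 6))  # [4, 5, 44, 30, 0, 0]
```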

Run in Google Colab: https://colab.research.google.com/github/Workshopshed/TinyMLTextClassification/blob/master/key_classification_rnn.ipynb

Classifying words

The work done on the simpler model has allowed me to redefine the more complex case of classifying words.

Run in Google Colab: https://colab.research.google.com/github/Workshopshed/TinyMLTextClassification/blob/master/text_classification_rnn_withCustomEncoder.ipynb
