- Jaejun Lee, Raphael Tang, Jimmy Lin. Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pages 91-96.
Honkling implements a residual convolutional neural network and is trained on the Speech Commands Dataset.
Honkling-node & Honkling-assistant
A Node.js implementation of Honkling is also available in the Honkling-node folder.
Details about Honkling-node and Honkling-assistant can be found in:
Honkling can be personalized to an individual user's accent. In our experiments, as few as 5 recordings of each keyword increased accuracy by up to 10%. With a GPU, personalization can be completed within 8 seconds.
Pre-trained weights are available at Honkling-models.
Please run the following command to obtain pre-trained weights:
git submodule update --init --recursive
Please refer to the honkling branch of honk to customize the keyword set or train a new model.
Once you obtain a weight file in JSON format using honk, move the file into the weights/ directory and prepend "weights[<weight_id>] = " to its contents to link it to the weights object.
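The linking step above amounts to turning the exported JSON into a JavaScript assignment on the weights object. As a minimal sketch, assuming a global object named weights and one output file per ID named `<weight_id>.js` (both assumptions), a small Node.js helper could do the wrapping. The names wrapWeights and convertFile are hypothetical and not part of Honkling.

```javascript
// Hypothetical helpers (not part of Honkling): wrap a honk-exported JSON weight
// file so that, when loaded as a script, it registers itself on a global
// `weights` object. The output file naming scheme is an assumption.
const fs = require('fs');
const path = require('path');

function wrapWeights(weightId, jsonText) {
  // Validate first so malformed JSON fails loudly instead of yielding broken JS.
  JSON.parse(jsonText);
  // JSON.stringify on the ID produces a safely quoted JS string literal;
  // the original JSON text is kept verbatim (minus surrounding whitespace).
  return `weights[${JSON.stringify(weightId)}] = ${jsonText.trim()};\n`;
}

function convertFile(weightId, inputPath, outputDir) {
  const jsonText = fs.readFileSync(inputPath, 'utf8');
  const outPath = path.join(outputDir, `${weightId}.js`);
  fs.writeFileSync(outPath, wrapWeights(weightId, jsonText));
  return outPath;
}

module.exports = { wrapWeights, convertFile };
```

For example, wrapWeights('RES8_NARROW', '{"a":[1,2]}') yields a one-line script assigning that object to weights["RES8_NARROW"]. Validating with JSON.parse but emitting the original text keeps the numeric formatting of the exported weights unchanged.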
Depending on the changes, config.js may need to be updated. A model object can then be instantiated as:
let model = new SpeechResModel(<weight_id>, commands);
It is possible to evaluate the in-browser neural network inference performance of your device on the Evaluate Performance page of Honkling.
Evaluation is conducted on a subset of the validation and test sets used in training. Once the evaluation is complete, it reports the input processing time (MFCC computation) and the inference time.
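The two reported numbers correspond to timing two stages separately for each sample and averaging. The sketch below shows that idea under stated assumptions: computeMfcc and runInference are hypothetical stand-ins passed in by the caller, not Honkling's actual functions, and the clock is the standard performance.now().

```javascript
// Minimal per-stage latency sketch. `computeMfcc` and `runInference` are
// hypothetical callbacks standing in for real input processing and inference.
// Use the browser's global performance clock when available, else Node's.
const perf = globalThis.performance ?? require('perf_hooks').performance;

function timeStages(samples, computeMfcc, runInference) {
  if (samples.length === 0) {
    throw new RangeError('need at least one sample to average over');
  }
  let mfccTotal = 0;
  let inferenceTotal = 0;
  for (const audio of samples) {
    const t0 = perf.now();
    const features = computeMfcc(audio); // stage 1: input processing
    const t1 = perf.now();
    runInference(features); // stage 2: model forward pass
    const t2 = perf.now();
    mfccTotal += t1 - t0;
    inferenceTotal += t2 - t1;
  }
  return {
    avgInputProcessingMs: mfccTotal / samples.length,
    avgInferenceMs: inferenceTotal / samples.length,
  };
}
```

Usage with trivial stubs: timeStages([new Float32Array(16000)], (a) => a.slice(0, 40), (f) => f.reduce((s, x) => s + x, 0)). Timing the stages separately, rather than end to end, makes it possible to tell whether feature extraction or the network dominates latency on a given device.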
As part of our research, we explored the network slimming technique to analyze trade-offs between accuracy and inference latency. Honkling can also evaluate the performance of pruned models.
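Network slimming (Liu et al., 2017) trains with an L1 penalty on batch-normalization scaling factors (gamma), prunes the channels whose |gamma| falls below a single global threshold chosen from the target pruning percentage, and then fine-tunes. The sketch below covers only the channel-selection step; the layer names and gamma values are hypothetical, and the rule that keeps at least one channel per layer is an added assumption rather than part of the paper.

```javascript
// Channel selection as in network slimming: one global threshold on |gamma|,
// derived from the requested pruning ratio. Returns per-layer keep masks
// (true = keep channel, false = prune it).
function slimmingMasks(gammasByLayer, pruneRatio) {
  if (!(pruneRatio >= 0 && pruneRatio < 1)) {
    throw new RangeError('pruneRatio must be in [0, 1)');
  }
  // Pool |gamma| across all layers so the threshold is global, not per layer.
  const sorted = Object.values(gammasByLayer)
    .flat()
    .map(Math.abs)
    .sort((a, b) => a - b);
  if (sorted.length === 0) return {};
  const threshold = sorted[Math.floor(sorted.length * pruneRatio)];

  const masks = {};
  for (const [layer, gammas] of Object.entries(gammasByLayer)) {
    const mask = gammas.map((g) => Math.abs(g) >= threshold);
    if (gammas.length > 0 && !mask.includes(true)) {
      // Assumed safeguard (not from the paper): never empty a layer entirely;
      // keep its channel with the largest |gamma|.
      let best = 0;
      gammas.forEach((g, i) => {
        if (Math.abs(g) > Math.abs(gammas[best])) best = i;
      });
      mask[best] = true;
    }
    masks[layer] = mask;
  }
  return masks;
}
```

For instance, slimmingMasks({ conv1: [0.9, -0.05, 0.4, 0.6], conv2: [0.01, 0.02] }, 0.5) sets the threshold to 0.4, prunes one channel of conv1, and keeps only the second channel of conv2 because of the safeguard. Using a global rather than per-layer threshold lets layers with many weak channels shrink more than others.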
The following are the evaluation results on a MacBook Pro (2017) with Firefox:
|Model|Amount Pruned (%)|Accuracy (%)|Input Processing (ms)|Inference (ms)|
|---|---|---|---|---|
- Note that WebGL is disabled on Chrome and enabled on Firefox by default
- Honkling uses RES8-NARROW
- Details on model architecture can be found in the paper
- Raphael Tang and Jimmy Lin. Deep Residual Learning for Small-Footprint Keyword Spotting. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018), pages 5484-5488.
- Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang. Learning Efficient Convolutional Networks through Network Slimming. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV 2017), pages 2755-2763.