This prototype demonstrates a fully implemented interface for a multi-function food processor. It is based on the user flow shown in the following figure.
Although the prototype does not cover every function a real multi-function food processor offers, it provides a generic platform for researching multimodal interaction through voice and touch.
The prototype can be adapted to different insights and recommendations, allowing for further research. The software architecture is shown in the following figure.
This prototype is meant to run locally or on a server such as AWS. To run it locally, please refer to the section Development.
Disclaimer: If you use the code or dataset please cite our work:
VoiceCookingAssistant. 2021. Audio-Visual-Cooking-Assistant. https://github.com/VoiceCookingAssistant/Audio-Visual-Cooking-Assistant
The frontend application is built with Svelte; a Node server acts as middleware between the frontend and the Rhasspy instance.
You need Node and npm installed. To run the prototype on your machine we recommend also installing Docker and docker-compose.
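To check whether these prerequisites are already available on your machine, you can look them up on the PATH, for example like this (a sketch; the tool names are simply the commands used in this README):

```shell
# Report which of the required tools are available on this machine.
for tool in node npm docker docker-compose; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

Any tool reported as missing should be installed before continuing.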
Install the dependencies:
# from root
cd frontend
npm install
# from root
cd server
npm install
Start with Docker (recommended):
docker-compose build
docker-compose up
Navigate to localhost:5000. You should see your app running. Edit a component file in src, save it, and reload the page to see your changes.
Start without Docker (not recommended):
# from root
cd frontend
npm run dev
# from root
cd server
npm run start-dev
Navigate to localhost:5000. You should see your app running. Edit a component file in src, save it, and reload the page to see your changes.
To run a local Rhasspy environment you need Docker and docker-compose installed.
cd rhasspy
docker-compose up
Navigate to localhost:12101. You should see the environment running.
The first time, you have to adjust the Rhasspy settings in the web UI:
- Click the Home Button
- Go to Advanced
- Paste the contents of the file rhasspy/profile.json from this repo into it and click "Save Profile"
- Click on the "Sentences-Menu-Icon" in the left Menu Bar
- Paste the contents of the file rhasspy/template.ini from this repo into it and click "Save Sentences"
The dataset in rhasspy/template.ini is provided under a CC BY 4.0 license.
- Click "Okay" in the Retrain Rhasspy Alert
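The sentence templates follow Rhasspy's ini-based grammar: each `[Section]` names an intent, each line below it is a sentence pattern, `(a | b)` marks alternatives, `[word]` marks an optional word, and `{tag}` attaches an entity tag. The intents below are illustrative only; the actual intents used by the prototype live in rhasspy/template.ini:

```ini
# Illustrative sketch only -- the real intents are defined in rhasspy/template.ini
[NextStep]
(go to | show) [the] next step

[SetTimer]
set a timer for (1..60){minutes} minutes
```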
Now you can test the prototype. For further adaptation of the Rhasspy environment, please refer to the official documentation of Rhasspy.
To create an optimised version of the app:
docker-compose -f docker-compose.yml build
docker-compose -f docker-compose.yml up
This version expects a .env file in the root directory with the following content:
#.env
PORT=3000
HOST=0.0.0.0
MQTTHOST=<YOUR_RHASSPY_HOST_IP>
RHASSPY_PORT=12183
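As a sketch, the file can be created from the repo root with a heredoc (keep the placeholder until you know the IP of the machine running Rhasspy):

```shell
# Create the .env file expected by the production build (sketch).
# Replace <YOUR_RHASSPY_HOST_IP> with the IP of the machine running Rhasspy.
cat > .env <<'EOF'
PORT=3000
HOST=0.0.0.0
MQTTHOST=<YOUR_RHASSPY_HOST_IP>
RHASSPY_PORT=12183
EOF
```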
Navigate to localhost:5000. You should see your app running.
You can use our large in-domain dataset for your spoken language understanding research.
- Training Dataset (1964 queries with 10724 running words): rhasspy/NLU/trainset.md
- Test Dataset (839 queries with 4507 running words): rhasspy/NLU/testset.md
The training and test datasets are provided under a CC BY 4.0 license.
If you're using Visual Studio Code, we recommend installing the official extension Svelte for VS Code. If you are using another editor, you may need to install a plugin to get syntax highlighting and IntelliSense.
This prototype was tested in Google Chrome.
Distributed under the Apache 2.0 License. See LICENSE for more information.
If you use or build on our work, please cite our paper related to this project:
@inproceedings{kendrick-etal-2021-audio,
  title = "Audio-Visual Recipe Guidance for Smart Kitchen Devices",
  author = "Kendrick, Caroline and
    Frohnmaier, Mariano and
    Georges, Munir",
  booktitle = "Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)",
  month = "12--13 " # nov,
  year = "2021",
  address = "Trento, Italy",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.icnlsp-1.30",
  pages = "257--261",
}