
Audio-Visual-Cooking-Assistant

Table of Contents
  1. About The Project
  2. Getting Started
  3. Development
  4. Production
  5. Dataset for Spoken Language Understanding
  6. Recommendations
  7. License
  8. Citation

About The Project

This prototype visualizes an example of a fully implemented interface within a multi-function food processor. It is based on the user flow shown in the following figure.

User flow

Although our prototype does not cover all functions that a real multi-function food processor can offer, it provides a generic solution for researching multimodal interaction through voice and touch.

The prototype is adaptable to different insights and recommendations, allowing for further research. The software architecture is shown in the following figure.

Architecture

This prototype is meant to run locally or on a server such as AWS. To run it locally, please refer to the section Development.

Disclaimer: If you use the code or dataset, please cite our work:

VoiceCookingAssistant. 2021. Audio-Visual-Cooking-Assistant. https://github.com/VoiceCookingAssistant/Audio-Visual-Cooking-Assistant

Built With

The frontend application is built with Svelte; a Node server acts as middleware between the frontend and the Rhasspy instance.
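
To make the data flow concrete, here is a minimal sketch of that middleware pattern. It is illustrative, not the repository's actual server code: it assumes the mqtt and ws npm packages and relies on Rhasspy publishing recognized intents on the Hermes MQTT topics (hermes/intent/<intentName>).

    // middleware-sketch.js -- illustrative only, not the repository's server code.
    // Assumes the `mqtt` and `ws` npm packages. Rhasspy publishes each recognized
    // intent as JSON on the Hermes MQTT topic `hermes/intent/<intentName>`.
    const mqtt = require('mqtt');
    const { WebSocketServer } = require('ws');

    const mqttUrl = `mqtt://${process.env.MQTTHOST || 'localhost'}:${process.env.RHASSPY_PORT || 12183}`;

    const wss = new WebSocketServer({ port: 3000 }); // the frontend connects here
    const client = mqtt.connect(mqttUrl);            // Rhasspy's MQTT broker

    client.on('connect', () => client.subscribe('hermes/intent/#'));

    // Forward every recognized intent to all connected frontend clients.
    client.on('message', (topic, payload) => {
      for (const ws of wss.clients) {
        if (ws.readyState === ws.OPEN) ws.send(payload.toString());
      }
    });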

Getting Started

Prerequisites

You need to have Node.js and npm installed. To run the prototype on your machine, we recommend also installing Docker and docker-compose.


Development

1. Start the Application:

Install the dependencies...

#from root
cd frontend
npm install

#from root
cd server
npm install

Start with docker (Recommended):

docker-compose build
docker-compose up

Navigate to localhost:5000. You should see your app running. Edit a component file in src, save it, and reload the page to see your changes.


Start without docker (Not recommended):

#from root
cd frontend
npm run dev

#from root
cd server
npm run start-dev

Navigate to localhost:5000. You should see your app running. Edit a component file in src, save it, and reload the page to see your changes. 
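
For orientation, a Svelte component in src looks roughly like the following. This is a hypothetical example, not a file from this repo:

    <!-- HelloStep.svelte -- hypothetical example component -->
    <script>
      // `step` would be passed in by a parent component.
      export let step = 1;
    </script>

    <h2>Step {step}</h2>

    <style>
      h2 { color: darkorange; }
    </style>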

2. Connect the Application with a local Rhasspy Environment

To run a local Rhasspy environment you need to have docker and docker-compose installed.

cd rhasspy
docker-compose up

Navigate to localhost:12101/. You should see the environment running.

The first time, you have to adjust the Rhasspy settings in the web UI:

  1. Click the Home Button
  2. Go to Advanced
  3. Paste the contents of the file rhasspy/profile.json from this repo into the profile text field and click "Save Profile".
  4. Click on the "Sentences-Menu-Icon" in the left Menu Bar
  5. Paste the contents of the file rhasspy/template.ini from this repo into the sentences text field and click "Save Sentences" (an illustrative excerpt of the format is shown after this list)

    The dataset in the file rhasspy/template.ini is provided under a CC BY 4.0 license.

  6. Click "Okay" in the Retrain Rhasspy Alert
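
The sentence templates in rhasspy/template.ini use Rhasspy's sentences.ini format: an [IntentName] header followed by one template per line, where (a | b) marks alternatives and [word] marks an optional word. An illustrative excerpt (the intent name and phrases below are hypothetical, not taken from the file):

    [NextStep]
    next step
    go [on] to the next (step | instruction)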

Now you can test the prototype. For further adaptation of the Rhasspy environment, please refer to the official documentation of Rhasspy.
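
As a quick smoke test, you can send a text query to Rhasspy's HTTP API and inspect the recognized intent. A minimal sketch, assuming Node 18+ for the built-in fetch and a query covered by template.ini; save as test-intent.mjs and run with node test-intent.mjs:

    // test-intent.mjs -- smoke test against Rhasspy's HTTP API (Node 18+).
    // POSTs a text query and prints the recognized intent as JSON.
    const res = await fetch('http://localhost:12101/api/text-to-intent', {
      method: 'POST',
      body: 'next step', // replace with a query your template.ini covers
    });
    console.log(await res.json());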

Production

To create an optimised version of the app:

    docker-compose -f docker-compose.yml build
    docker-compose -f docker-compose.yml up

This version expects a .env file in the root directory with the following content:

    #.env
    PORT=3000
    HOST=0.0.0.0
    MQTTHOST=<YOUR_RHASSPY_HOST_IP>
    RHASSPY_PORT=12183

Navigate to localhost:5000. You should see your app running.
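
For orientation, a server entry point could pick these variables up as sketched below. This assumes the dotenv npm package and is illustrative, not the repository's actual code:

    // illustrative server entry sketch -- assumes the `dotenv` npm package.
    require('dotenv').config(); // loads .env from the project root

    const host = process.env.HOST; // e.g. 0.0.0.0
    const port = process.env.PORT; // e.g. 3000
    const mqttUrl = `mqtt://${process.env.MQTTHOST}:${process.env.RHASSPY_PORT}`;

    console.log(`serving on ${host}:${port}, Rhasspy MQTT at ${mqttUrl}`);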

Dataset for Spoken Language Understanding

You can use our large in-domain datasets for your spoken language understanding research.

  • Training Dataset (1964 queries with 10724 running words): rhasspy/NLU/trainset.md
  • Test Dataset (839 queries with 4507 running words): rhasspy/NLU/testset.md

The Training Dataset and Test Dataset are provided under a CC BY 4.0 license.
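
Here, "running words" means the total number of word tokens across all queries. The counts could be reproduced roughly as follows, assuming one query per non-empty line in the dataset files (the file layout is an assumption; adjust the parsing to the actual format):

    // count-queries.js -- illustrative; assumes one query per non-empty line.
    const fs = require('fs');

    const lines = fs.readFileSync('rhasspy/NLU/trainset.md', 'utf8')
      .split('\n')
      .map((l) => l.trim())
      .filter((l) => l.length > 0);

    const words = lines.reduce((n, l) => n + l.split(/\s+/).length, 0);
    console.log(`${lines.length} queries, ${words} running words`);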

Recommendations

If you're using Visual Studio Code, we recommend installing the official extension Svelte for VS Code. If you are using other editors, you may need to install a plugin to get syntax highlighting and IntelliSense.

This prototype was tested in the Google Chrome browser.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Citation

If you use or build on our work, please cite our paper related to this project:

@inproceedings{kendrick-etal-2021-audio,
    title = "Audio-Visual Recipe Guidance for Smart Kitchen Devices",
    author = "Kendrick, Caroline  and
      Frohnmaier, Mariano  and
      Georges, Munir",
    booktitle = "Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)",
    month = "12--13 " # nov,
    year = "2021",
    address = "Trento, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.icnlsp-1.30",
    pages = "257--261",
}
