How it works

Framework

Applications & Events

This package revolves around Applications, each of which programs a particular robot behaviour. An Application provides Event callbacks for each sensory experience the robot 'perceives'. These Events include:

  • on_image for every frame captured by the robot's camera
  • on_object for each object detected in a camera frame
  • on_face for each face detected in a camera frame
  • on_person when a detected face is 'known' to the robot
  • on_new_person when a detected face is 'new' to the robot
  • on_transcript for every utterance that can be resolved into text

Components

Not every Application needs all of the available features, and some features have multiple implementations: say, an open-source Speech-to-Text engine instead of Google's. For this reason, features are packaged as Components. When you create an Application, you specify which features you need, and they will be provided to you (a sketch follows this list; a full example appears under In Practice). Available Components are:

  • FaceDetection: exposes the on_face, on_person & on_new_person events
  • ObjectDetection: exposes the on_image & on_object events
  • SpeechRecognition: exposes the on_transcript event
  • Statistics: shows live statistics in the terminal
  • VideoDisplay: shows the video feed with Object & Person overlays in the browser
  • etc.
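
As a sketch, an Application that only needs live statistics and speech recognition mixes in just those two Components (the class names and the hypothesis API are those used in the examples under In Practice below):

from pepper.framework import *
from pepper import config


class TranscriptApp(Application, StatisticsComponent, SpeechRecognitionComponent):
    # Only the Events of the chosen Components are available here
    def on_transcript(self, hypotheses, audio):
        # Print the most likely transcription of the utterance
        print(hypotheses[0].transcript)


if __name__ == '__main__':
    TranscriptApp(config.get_backend()).run()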

Backends & Devices

In order to run Applications on multiple Backends, abstractions have been made for all Backend-specific Devices:

  • AbstractCamera
  • AbstractMicrophone
  • AbstractTextToSpeech

Backend implementations have been made for:

  1. Naoqi Backend: using the Camera / Microphone / AnimatedSpeech from Pepper / Nao
  2. Laptop / PC Backend: using the built-in Webcam / Microphone and Google Text-to-Speech

Being able to run Applications on your local PC means you can test them without needing a robot, which speeds up development.

Adding new Backends

Any hardware that can provide a:

  • Video Feed
  • Audio Feed
  • Speaker/Speech Output

can be a potential Backend for our Pepper Package! Please delve into the source code and implement AbstractCamera, AbstractMicrophone & AbstractTextToSpeech for your favourite hardware, as sketched below. If you do, please send us a pull request!
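
As a rough, hypothetical skeleton (the abstract methods each Device must implement are defined in the pepper.framework source; check there for the exact signatures):

from pepper.framework import *


# Hypothetical skeleton for a new Backend: each class must implement the
# abstract methods of its base class, as defined in pepper.framework
class MyCamera(AbstractCamera):
    pass  # provide your hardware's video feed here


class MyMicrophone(AbstractMicrophone):
    pass  # provide your hardware's audio feed here


class MyTextToSpeech(AbstractTextToSpeech):
    pass  # provide your hardware's speech output here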

In Practice

Writing an Application is straightforward and requires just a few steps:

  1. Create an Application that inherits from pepper.framework.Application
  2. Add required Components by inheriting from them
    • Order matters here, because of Component dependencies
  3. Run the Application with a specific Backend

See pepper/test/app/verbose.py for a minimalist working example:

from pepper.framework import *
from pepper import config


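# Components are mixed in through inheritance; each one contributes the
# Event callbacks it exposes (their order matters, due to Component dependencies)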
class MyApplication(Application, ObjectDetectionComponent, FaceDetectionComponent, SpeechRecognitionComponent):
    def on_image(self, image):
        pass

    def on_object(self, image, objects):
        pass

    def on_face(self, faces):
        pass

    def on_person(self, persons):
        pass

    def on_new_person(self, persons):
        pass

    def on_transcript(self, hypotheses, audio):
        pass


if __name__ == '__main__':
    MyApplication(config.get_backend()).run()

Intentions

When Applications get bigger, the need for more structure arises. That is where Intentions come in: within each Application, the user programs one or several Intentions (the 'I' in BDI), which act as subgoals of the Application. An example is demonstrated below.

See pepper/test/app/intention.py for a minimalist working example.

from pepper.framework import *
from pepper import config


class MyApplication(Application, StatisticsComponent, FaceDetectionComponent, SpeechRecognitionComponent):
    pass


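# Idle subgoal: wait until a face is detected, then switch to talking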
class IdleIntention(Intention, MyApplication):
    def on_face(self, faces):
        TalkIntention(self.application)


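# Talk subgoal: greet the human, react to each utterance, and switch
# back to idle once the human says goodbye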
class TalkIntention(Intention, MyApplication):
    def __init__(self, application):
        super(TalkIntention, self).__init__(application)
        self.say("Hello, Human!")

    def on_transcript(self, hypotheses, audio):
        utterance = hypotheses[0].transcript

        if utterance == "bye bye":
            self.say("Goodbye, Human!")
            IdleIntention(self.application)
        else:
            self.say("How interesting!")


if __name__ == '__main__':

    # Initialize Application
    application = MyApplication(config.get_backend())

    # Run Intention
    IdleIntention(application)

    # Run Application
    application.run()
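
Note that switching between subgoals is done by constructing the next Intention with the running Application: here, IdleIntention hands control to TalkIntention as soon as a face is detected, and TalkIntention hands it back once the human says "bye bye".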

Structured data

In order to store knowledge in the brain, we need to parse unstructured natural language and transform it into triples. For this purpose, and following GRaSP, we have designed a JSON template that allows us to transmit this information between modules.

{
  "subject": {
    "id": "str: URI for this instance",
    "label": "str: label to refer to this instance (lower case)",
    "type": "str: one of leolani's 35 classes, or similar", 
    "confidence": "float: value between 0-1",
    "position": "str: beginPosition-endPosition"
  },
  "predicate": {
    "type": "str: one of leolani's 21 predicates, or similar", 
    "confidence": "float: value between 0-1",
    "position": "str: beginPosition-endPosition"
  },
  "object": {
    "id": "str: URI for this instance",
    "label": "str: label to refer to this instance (lower case)",
    "type": "str: one of leolani's 35 classes, or similar", 
    "confidence": "float: value between 0-1",
    "position": "str: beginPosition-endPosition"
  },
  "output_meta": {
    "type": "categorical: [subject, predicate, object]",
    "format": "categorical: [list, bool]"
  },
  "input_meta": {
    "raw": "str: original input parsed/sensed (transcript or image)",
    "type": "categorical: [statement, question, experience]",
    "author": "str: label of person producing the input",
    "chat": "str: chat ID",
    "turn": "str: turn ID",
    "date": "datestamp: date when input was produced",
    "attributions": {
      "certainty": "categorical: [certain, possible, probable, underspecified]",
      "sentiment": "categorical: [negative, positive]",
      "emotion": "categorical: [anger, disgust, fear, happiness, sadness, surprise]"
    }
  }
}

Considerations

  • Entity/Predicate types (subjects and objects): In general, recognized/parsed types are those present in the ontology. However, it is also possible to have unknown types, in which case these types will be created in the ontology.
  • Case folding: All fields (type, label, author, and attribution) should be lowercase, except for raw. Snake case (replacing spaces with underscores) should also be used where needed.
  • Positions: It is necessary to keep track of where in the raw input the subject/predicate/object was mentioned. For example, the utterance 'Piek likes pizza' should yield the positions '0-3', '5-9' and '11-15' respectively.
  • Entity URI: Use full valid URIs for IDs (e.g. http://cltl.nl/leolani/world/piek)
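
Putting these considerations together, a filled-in instance for the statement 'Piek likes pizza' could look as follows. All concrete values (the types, confidence scores, chat/turn IDs and date) are illustrative, and output_meta is omitted here for brevity:

{
  "subject": {
    "id": "http://cltl.nl/leolani/world/piek",
    "label": "piek",
    "type": "person",
    "confidence": 0.94,
    "position": "0-3"
  },
  "predicate": {
    "type": "like",
    "confidence": 0.90,
    "position": "5-9"
  },
  "object": {
    "id": "http://cltl.nl/leolani/world/pizza",
    "label": "pizza",
    "type": "food",
    "confidence": 0.88,
    "position": "11-15"
  },
  "input_meta": {
    "raw": "Piek likes pizza",
    "type": "statement",
    "author": "piek",
    "chat": "1",
    "turn": "1",
    "date": "2018-11-23",
    "attributions": {
      "certainty": "certain",
      "sentiment": "positive",
      "emotion": "happiness"
    }
  }
}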