Unlocking proactive behaviours in Pepper with deep learning

Pepper, a versatile humanoid robot designed for human interaction, has gained significant popularity in various settings. However, its default capabilities are limited: it is sensitive to lighting conditions, its human tracking is limited, and it cannot run modern computer vision techniques. To overcome these limitations, we built a framework that integrates state-of-the-art deep learning (DL) models, enabling advanced perception and interaction with humans. By leveraging deep learning, Pepper can improve its visual perception, adapt to complex real-world scenarios, and deliver more satisfactory services in a wide range of environments. Such a framework is crucial for improving Pepper's sociability, acceptance, and effectiveness in different contexts.

Framework

The framework requires 3 pieces of hardware:

Pepper
Another PC (ideally with a modern GPU, CPU, or both)
Wireless router

It solves Pepper’s compatibility issue with modern DL models by hosting two Python environments on a separate computer. The router establishes a local area network (LAN) that allows Pepper to connect to the server.

The server environment, which uses Python 2, is responsible for communicating with Pepper’s operating system (OS) to:

extract live camera footage
execute commands

The client environment, which uses Python 3, can:

control the robot through the server by sending HTTP requests to the server's Flask endpoints
execute DL models with Pepper’s camera footage as input

Finally, the Pepper robot listens for commands from the server.
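
The client-to-server hop described above can be sketched with the Python 3 standard library alone. This is a minimal illustration, not the repository's actual client code: the /command endpoint, the JSON field names, the server address, and the "rotate" command are all hypothetical placeholders.

```python
import json
import urllib.request


def build_command_request(server_url, command, params=None):
    """Build (but do not send) a POST request carrying one robot command."""
    # "/command" and the payload keys are assumptions made for illustration.
    payload = json.dumps({"command": command, "params": params or {}})
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/command",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(server_url, command, params=None, timeout=2.0):
    """Send the command to the server and return its decoded JSON reply."""
    request = build_command_request(server_url, command, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Python 2 server running, a call such as send_command("http://127.0.0.1:5000", "rotate", {"degrees": 30}) would forward one command; the server would then translate it into a NAOqi call on the robot.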

Note that this framework only works with NAOqi 2.5.

Default Follow Behaviour

Pepper’s default system, NAOqi 2.5, includes the ALPeoplePerception module, which uses Pepper’s front and 3D cameras to monitor surrounding humans. The default behaviour divides the area in front of Pepper into three engagement zones based on distance: Zone 1 spans 0 m to 1.5 m, Zone 2 spans 1.5 m to 2.5 m, and Zone 3 lies beyond 2.5 m. The maximum effective range of Pepper’s 3D camera, an ASUS Xtion, is 3.5 m, so it is reasonable to assume that Zone 3 ends there. Pepper’s default “follow” behaviour, developed using Choregraphe, waits for the key phrase “follow me” as an audio cue to initiate tracking. It follows the person closest to it and maintains a default distance of 1 m until the keyword “stop” is spoken. However, the speech recognition system often fails to register these keywords, resulting in a poor user experience. The default behaviour can be found here

To trigger Pepper's tracking mode reliably, we changed this behaviour so that it is started by the haptic sensors on Pepper's head rather than by a voice command. This modified version is available here.
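
The engagement zones described above can be expressed as a small lookup. This is only a sketch of the distance thresholds, not NAOqi's own implementation; the function name is hypothetical, the 3.5 m cut-off is the assumption stated in the text, and assigning each boundary value to the nearer zone is an arbitrary choice.

```python
ZONE_1_MAX_M = 1.5
ZONE_2_MAX_M = 2.5
ZONE_3_MAX_M = 3.5  # assumed upper limit: effective range of the 3D camera


def engagement_zone(distance_m):
    """Return 1, 2 or 3 for the zone, or None beyond the assumed range."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= ZONE_1_MAX_M:
        return 1
    if distance_m <= ZONE_2_MAX_M:
        return 2
    if distance_m <= ZONE_3_MAX_M:
        return 3
    return None  # too far away for the 3D camera to be reliable
```

Under these thresholds, a person standing at the default follow distance of 1 m falls in Zone 1.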

DL Follow Behaviour

In the DL-based follow behaviour, Pepper spins around to search for potential targets using YOLO-Pose, which provides both keypoint and bounding box data for each detected individual. The system detects a raised hand by calculating the difference between the wrist and shoulder keypoints. Once the system is sure that a potential target is intentionally raising their hand, it filters out every other detection and switches to tracking mode, where the filtered data is passed on to one of the trackers: ByteTrack, BoT-SORT, or OC-SORT. The tracker returns a tracking ID, and for every subsequent tracker output, the robot focuses only on the target with that saved ID.
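
The hand-raise check and the ID lock described above could look roughly like the sketch below. It is an illustration under several assumptions rather than the project's code: keypoints are taken to follow the COCO 17-point layout commonly used by YOLO-Pose models, the pixel margin and the number of consecutive frames are made-up values, keypoint confidence is ignored, and tracker outputs are represented as plain dictionaries with an "id" key.

```python
# COCO keypoint indices (assumed layout of the pose model's output).
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6
LEFT_WRIST, RIGHT_WRIST = 9, 10


def is_hand_raised(keypoints, margin_px=20.0):
    """True if either wrist is at least margin_px above its shoulder.

    keypoints is a sequence of (x, y) pixel pairs. Image y grows downward,
    so "above" means a smaller y value.
    """
    pairs = ((LEFT_SHOULDER, LEFT_WRIST), (RIGHT_SHOULDER, RIGHT_WRIST))
    for shoulder, wrist in pairs:
        if keypoints[shoulder][1] - keypoints[wrist][1] >= margin_px:
            return True
    return False


class HandRaiseConfirmer:
    """Confirm a raise only after it persists for several consecutive frames."""

    def __init__(self, frames_required=10):
        self.frames_required = frames_required
        self.streak = 0

    def update(self, raised):
        # Any frame without a raised hand resets the streak.
        self.streak = self.streak + 1 if raised else 0
        return self.streak >= self.frames_required


def select_target(tracks, target_id):
    """Return the track whose "id" matches target_id, or None if it is lost."""
    for track in tracks:
        if track["id"] == target_id:
            return track
    return None
```

Requiring several consecutive frames is one simple way to separate an intentional raise from a passing gesture; the ID saved at confirmation time is then the only one passed to select_target on later frames.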

A high-level diagram depicts what happens at run-time:

Setup

To use this code, you need to set up the hardware and software. Let's start with the hardware:

With the router, set up a LAN
Connect the PC and Pepper to the LAN
On Pepper's tablet, search for the version of its OS and make sure it's 2.5.x
On Pepper's tablet, in the Wi-Fi tab, find its IP address and record it somewhere
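
Once the PC and Pepper are on the same LAN, it can save time to confirm that the recorded IP address is reachable before moving on to the software. The sketch below assumes NAOqi is listening on its default port, 9559; the helper name is hypothetical and is not part of this repository.

```python
import socket

NAOQI_DEFAULT_PORT = 9559  # default NAOqi port; adjust if yours differs


def is_reachable(host, port=NAOQI_DEFAULT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling is_reachable with the IP address shown on Pepper's tablet should return True when both devices are on the LAN and the robot has finished booting.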

For the software, follow these steps:

Clone this repo and make sure the branch is pepper_skeleton
Using Anaconda, create a Python 2 virtual environment and a Python 3 virtual environment
Visit this website and download the Python 2.7 SDK that matches both your PC's OS and the NAOqi version of your Pepper robot¹
Unzip the Python SDK and copy-paste its contents into the Python 2 environment that you've just created, replacing everything
Activate the Python 2 environment, and install the packages in requirements.txt ²
Activate the Python 3 environment, and install the packages in requirements1.txt
In the same Python 3 environment, install the packages in requirements2.txt ³

Execution

Before execution, make sure that Pepper's surroundings are clear; otherwise, you risk damaging the robot.

To run the behaviour:

Turn on Pepper, and double-tap its chest button to turn off Autonomous Life. If successful, it'll assume a lifeless posture
Open two terminals, and activate the Python 2 environment in one and the Python 3 environment in the other
In the Python 2 terminal, cd into the server directory
In the Python 3 terminal, cd into the client directory
In the Python 2 terminal, run python server.py --ip *Pepper's IP address*. If successful, Pepper should stand upright again
In the Python 3 terminal, run python experiments.py, which should run the follow behaviour with OC-SORT. If successful, Pepper should start rotating to look for potential targets. At this stage, you can raise your hand to initiate its tracking mode
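
For readers curious how a script might accept the --ip argument used above, here is a minimal argparse sketch. It is not the contents of server.py; the validation step and the function names are assumptions added for illustration.

```python
import argparse
import ipaddress


def parse_ip(value):
    """Validate an IPv4/IPv6 address string and return it normalised."""
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        raise argparse.ArgumentTypeError("not a valid IP address: %r" % value)


def build_parser():
    """Create a parser that requires Pepper's IP address."""
    parser = argparse.ArgumentParser(description="Connect to a Pepper robot.")
    parser.add_argument("--ip", type=parse_ip, required=True,
                        help="Pepper's IP address on the LAN")
    return parser
```

Validating the address up front means that a typo is reported immediately instead of surfacing later as a connection timeout.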

¹ This has been tested on Linux (Ubuntu). We haven't tested it on other operating systems, but we know of no reason why it would not work on them.
² Pip may not install every dependency for you, so when you run the program later, it may prompt you to install additional packages.
³ Our project uses PyTorch, but because the right installation method depends on your hardware, you will need to work this out yourself.
