
CompRobo Final Project: Neato Soccer, Extended Edition

For this project, our ostensible goal is to program a robot (a Neato vacuum cleaner) to visually recognize a soccer ball, approach it, and kick it. The real purpose is to develop our own learning goals around object recognition via a neural network and computer vision. The project is largely split along those lines.

Project Goals

To see a description of the originally proposed project, goals, and timeline see the Project Proposal and Proposed Timeline pages.

The Story So Far

As part of documenting our learning during this project, we wrote a few incremental blog posts about the paths we took while developing the computer vision and neural network parts of the software. These story blurbs can be reached from the right-hand page menu. Like the project itself, they are divided into Neural Network, Vision Suite, and Framework groupings.

Final Project Implementation Details

Framework

Master Script: neato_soccer.py

Our simple two-state FSM tracks the current state as a class method object. This allows us to call that state attribute as a function during the main loop to run the state-specific behaviours. The class also behaves as a wrapper for a ROS node. The node publishes robot commands (Twist msgs) to /cmd_vel and subscribes to /camera/image_raw and /bump for the image and bump sensor messages, respectively. The camera topic callback simply saves the current image as an attribute for later use. The bump sensor callback is used to detect crashes; it ends the node and stops robot motion if a bump is detected.
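Below is a minimal sketch of this state-as-attribute pattern. The method names (determine_ball, align_neato) and the Bump field names are our illustrations, not the exact contents of neato_soccer.py:

```python
import rospy
from geometry_msgs.msg import Twist
from sensor_msgs.msg import Image
from neato_node.msg import Bump  # assumed bump message type


class NeatoSoccer(object):
    def __init__(self):
        rospy.init_node('neato_soccer')
        self.pub = rospy.Publisher('/cmd_vel', Twist, queue_size=10)
        rospy.Subscriber('/camera/image_raw', Image, self.on_image)
        rospy.Subscriber('/bump', Bump, self.on_bump)
        self.image = None
        self.state = self.determine_ball  # current state is a bound method

    def on_image(self, msg):
        self.image = msg  # save the latest frame for the states to use

    def on_bump(self, msg):
        # end the node and stop motion if any bumper fires
        # (field names are assumptions about the Bump message)
        if msg.leftFront or msg.rightFront or msg.leftSide or msg.rightSide:
            self.pub.publish(Twist())
            rospy.signal_shutdown('bump detected')

    def determine_ball(self):
        pass  # classify the frame; on success: self.state = self.align_neato

    def align_neato(self):
        pass  # center on the ball, then drive forward to kick/dribble

    def run(self):
        rate = rospy.Rate(10)
        while not rospy.is_shutdown():
            self.state()  # call the current state attribute as a function
            rate.sleep()


if __name__ == '__main__':
    NeatoSoccer().run()
```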

state machine flow diagram

Two states were implemented. DetermineBall calls the pretrained neural network to determine whether there is a ball within the frame. If the ball is in frame, the state transitions to the alignNeato state; if no ball is found, the Neato rotates until a ball is in frame. In the alignNeato state, one of the vision suite implementations is called to locate the size and position of the ball in the image. Currently, manual commenting and un-commenting is used to choose which implementation is called. If the center of the ball is within 20 pixels of the center of the frame, the robot drives forward to kick or dribble the soccer ball. If the ball is not centered, proportional control is used to rotate the robot and a Gaussian function is used to slowly move the robot forward. The latter was included to allow for smoother dribbling of the soccer ball.
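A minimal sketch of this alignment control law; the gains (k_p, sigma, v_max) are made-up placeholders, not our tuned values:

```python
import math


def align_command(ball_x, frame_width, k_p=0.005, sigma=40.0, v_max=0.2):
    """Return (linear, angular) velocities for the alignNeato state.

    Angular speed is proportional to the pixel error; linear speed is a
    Gaussian of that same error, so the robot only creeps forward while
    the ball is near the center of the frame (smoother dribbling).
    """
    error = frame_width / 2.0 - ball_x  # pixel offset from image center
    angular = k_p * error               # proportional rotation
    linear = v_max * math.exp(-error ** 2 / (2 * sigma ** 2))
    return linear, angular
```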

Check out the videos below of our system in action:
Extended Neato Soccer I
Extended Neato Soccer II

Neural Network

This was our first time working with neural networks or machine learning in general. Because we were so new, we took special care to build a tool-set architecture that was readily reconfigurable and to maintain a simple, unchanging interface for client code. The specific modules we created were:

- Ball Tinder, our data gathering and tagging tool
- neuralnet.py, our network configuration and training script
- balldetector.py, our client-code interface to the trained network

To support these, we also created:

- config.ini, a configuration file that drives both training and detection without code changes

In this section, we'll detail each core module, describing its role in our code and key features. We end the section with a brief description of our most successful network's configuration and efficacy.

Data Gathering with Ball Tinder

One of the first challenges we faced was the need to quickly gather and tag a series of images of the soccer ball. To do this, we created a program named 'Ball Tinder'. When run, Ball Tinder streams frames from the Neato's camera, crops/downsamples each image, and accepts user input to classify the frames as either a ball or not a ball. For the control scheme, we chose to use the arrow keys to classify incoming frames, taking inspiration from the popular app Tinder (thus the name). As a particular quirk of this module, we used the Pygame library to build our interface code, as it had powerful and easy-to-use windowing, drawing, and keyboard-input features.

Ball Tinder was a great asset to our project, as it made data gathering, formatting, and classification a short exercise. With it, we gathered, formatted, and classified over 1000 images in less than a couple of hours.

Ball Tinder in Action
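A minimal sketch of the classification loop, assuming frames arrive as Pygame surfaces; the directory names, key bindings, and 64x64 size are illustrative:

```python
import os
import pygame


def classify_frames(frames, out_dir='data', size=(64, 64)):
    """Tag each incoming frame with the arrow keys and save it to disk."""
    pygame.init()
    screen = pygame.display.set_mode(size)
    for label in ('ball', 'not_ball'):
        if not os.path.isdir(os.path.join(out_dir, label)):
            os.makedirs(os.path.join(out_dir, label))
    counts = {'ball': 0, 'not_ball': 0}
    for frame in frames:
        small = pygame.transform.smoothscale(frame, size)  # downsample
        screen.blit(small, (0, 0))
        pygame.display.flip()
        label = None
        while label is None:
            event = pygame.event.wait()
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_RIGHT:   # swipe right: ball
                    label = 'ball'
                elif event.key == pygame.K_LEFT:  # swipe left: not a ball
                    label = 'not_ball'
            elif event.type == pygame.QUIT:
                pygame.quit()
                return
        name = '%04d.png' % counts[label]
        pygame.image.save(small, os.path.join(out_dir, label, name))
        counts[label] += 1
    pygame.quit()
```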

Neural Net Configuration and Training

We entered this project highly unsure of what appropriate configurations of a neural network might be or which tools were best for creating neural nets. Largely following Daniel Nouri's excellent tutorial, we opted to use the library NoLearn, which wraps Lasagne, which wraps Theano.

Ultimately, to test different network concepts, we created a Network class that has access to a constant dictionary of available network configurations, including multi-layered perceptrons (MLPs) with varying numbers of hidden nodes and convolutional networks (conv nets). Based on the network type specified in the config.ini file, the network configuration and training script, neuralnet.py, automatically configures the network, loads in images, trains, and saves the network parameters to disk. To allow us to iterate quickly, the script makes all of the necessary configuration and filepath changes without the code being touched; it relies only on the config file. To save disk space, only the trained network parameters are uploaded, not the training images that generated them.
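The sketch below shows the configuration-dictionary pattern with a single, illustrative MLP entry; the layer sizes and training parameters follow Daniel Nouri's tutorial recipe rather than our exact configurations:

```python
from lasagne import layers, nonlinearities
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

# Constant dictionary of available configurations (illustrative subset;
# the real module also registers deeper MLPs and conv nets).
NETWORK_CONFIGS = {
    'mlp': dict(
        layers=[('input', layers.InputLayer),
                ('hidden1', layers.DenseLayer),
                ('hidden2', layers.DenseLayer),
                ('output', layers.DenseLayer)],
        input_shape=(None, 3 * 64 * 64),    # flattened 64x64 RGB image
        hidden1_num_units=100,
        hidden2_num_units=100,
        output_num_units=2,                 # ball / not ball
        output_nonlinearity=nonlinearities.softmax,
        update=nesterov_momentum,
        update_learning_rate=0.01,
        update_momentum=0.9,
        max_epochs=50,
    ),
}


def build_network(kind):
    """Build the network named by the network-type entry in config.ini."""
    return NeuralNet(**NETWORK_CONFIGS[kind])
```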

Client-Code Interface to Network

To create a simple client interface to our network that could be used outside of the Neural Net collection of modules, we created balldetector.py. It loads a saved, trained network using neuralnet.py's Network class and uses it to classify an image sent from the client code, returning a simple True/False prediction of whether or not the image contains a ball. As with the training code, all of the information about which network to load (and other parameters) is handled by the configuration file, config.ini. Effectively, this means that a client need not edit code: the config file should suffice for all changes they wish to make.
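Hypothetical client usage; the function name contains_ball is our illustration rather than the module's actual API, but the True/False contract matches the description above:

```python
import cv2
import balldetector  # loads the network named in config.ini

frame = cv2.imread('frame.png')        # any BGR image from the camera
if balldetector.contains_ball(frame):  # hypothetical entry point
    print('ball detected')
```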

Results

Ultimately, our neural network was able to detect our target soccer ball almost anywhere in an image at a distance of about two feet or less. We found our greatest success with a 5-layer multi-layered perceptron (MLP) with 100 neurons in each layer, trained on images down-sampled to a standard 64x64 RGB format. Based on our testing, we believe that this network effectively learned to detect the soccer ball by its distinctive green color. A convolutional network using grayscale images may have had the potential to become a more robust, pattern-detecting network, but we ran into difficulties training it: our laptops were unable to train it to convergence before running out of available memory and locking up.

If we were to fine-tune the network configurations, collect additional training data, and train the network on a more powerful computer, we are confident that we could scale up the accuracy and detection range of our network with no more than a few code edits and a fair amount of experimentation.

Vision Suite Implementations

The intent of the vision suite is to provide multiple methods of locating and sizing a foam soccer ball within an image taken by the Neato robot platform. Three such implementations using the OpenCV library were created:

- a color filter with center-of-mass calculation
- OpenCV's SimpleBlobDetector
- Hough circle detection

To support these, we also created:

- accuracyChecker.py, a script that scores each implementation against hand-labeled test images

AccuracyChecker

In order to quantify and understand the accuracy of the methods implemented in the vision suite, a simple script was created to both display the output of those methods on top of the input image and calculate the discrepancy between the methods' outputs and the actual location of the ball. Seven test images, hand-labeled in the form of

##_neatoSoccer.png:
  location: [x,y]
  size: int

were used. Each image is fed into each of the three vision suite implementations, which return information in the form (x,y),size, where x, y, and size are all integers. The accuracy script then calculates the Euclidean distance between the label's x,y-location and each method's x,y-location, as well as the absolute error between the label's size and each method's size.
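A minimal sketch of that error calculation, assuming the label and the guess each arrive as ((x, y), size) tuples:

```python
import math


def accuracy_errors(label, guess):
    """Return (distance error, size error) between a hand label
    and one vision suite method's ((x, y), size) output."""
    (lx, ly), lsize = label
    (gx, gy), gsize = guess
    distance_error = math.hypot(gx - lx, gy - ly)  # Euclidean distance
    size_error = abs(gsize - lsize)                # absolute radius error
    return distance_error, size_error
```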

accuracyChecker numerical output

In addition to calculating the accuracy of the vision suite's methods, the script also visualizes their outputs using OpenCV's circle function.

accuracyChecker visualized output

Shown above are the current call and output of the accuracyChecker.py script. The red circle is the label; the green, blue, and cyan circles are the results of the vision suite methods. Note that the blue circle is arbitrary because the SimpleBlobDetector returned a no-data error.

Color Filtering

A simple color filter was ultimately used for the video demo due to its reliability. As the foam ball used for this project is a bright neon green, there was a clear color difference between the ball and the rest of the room. The final boundaries were chosen to minimize noise at the cost of not capturing the full sphere. This was done to ensure there would be an accurate center of mass for the ball.

After color filtering, the image moments of the resultant binary image were calculated. From these, the x,y-coordinates of the centroid of the white pixel blob could be found. The size of the blob was determined by adding up its pixels as an estimate of area, then applying the area-of-a-circle formula to extract the radius. An additional proportionality factor was applied to the area to account for the black pentagons lost during color filtering.
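A sketch of this pipeline; the HSV bounds and the pentagon correction factor are illustrative stand-ins for our tuned values:

```python
import math
import cv2
import numpy as np

LOWER = np.array([35, 80, 80])     # illustrative HSV bounds for neon green
UPPER = np.array([85, 255, 255])
PENTAGON_CORRECTION = 1.2          # assumed factor for the filtered-out pentagons


def locate_ball(bgr_image):
    """Return ((x, y), radius) of the filtered blob, or None if empty."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)      # binary image of green pixels
    moments = cv2.moments(mask)
    if moments['m00'] == 0:
        return None                            # no ball-colored pixels found
    x = int(moments['m10'] / moments['m00'])   # centroid of the white blob
    y = int(moments['m01'] / moments['m00'])
    area = cv2.countNonZero(mask) * PENTAGON_CORRECTION
    radius = int(math.sqrt(area / math.pi))    # invert area = pi * r^2
    return (x, y), radius
```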

colorFilteredCOM accuracy result

An example of the accuracy when tested using the accuracyChecker script is shown above. Red is the image's label; green is the implementation's guess. For most of the 7 test images, the distance error was 2-4 pixels. Similarly, the sizing error was 4-5 pixels (radius). However, there were two exceptions, and these are representative of the limitations of the ball locator method: occlusion, whether from the ball being partly off camera or behind objects, causes both the number and spatial distribution of pixels to differ from those of a full sphere. Namely, the center of mass may be shifted from the sphere's center, and the calculated size is noticeably smaller than it should be.

SimpleBlobDetector

This method was unable to locate the soccer ball at all. The first type of parameter varied determines how a blob is detected; the second type filters the detected blobs. While parameter tuning successfully identified circles in a simplified example image, the soccer ball was not distinct enough to be located. We believe this to be in part due to the pattern on the ball itself and in part due to the lack of distinction between the value of the ball and the value of the background.
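For reference, the two parameter types look like this in OpenCV; the values shown are illustrative rather than our tuned set, and SimpleBlobDetector_create is the OpenCV 3 constructor:

```python
import cv2

params = cv2.SimpleBlobDetector_Params()
# first type: how a blob is detected (intensity threshold sweep)
params.minThreshold = 10
params.maxThreshold = 200
# second type: filters that cut down the detected blobs
params.filterByArea = True
params.minArea = 100
params.filterByCircularity = True
params.minCircularity = 0.6

detector = cv2.SimpleBlobDetector_create(params)
gray = cv2.imread('01_neatoSoccer.png', cv2.IMREAD_GRAYSCALE)
keypoints = detector.detect(gray)  # came back empty for our ball images
```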

Hough Circles

This method tends to detect many false positives. By tuning parameters, including min/max circle size, the minimum convergence threshold, and the contour-finding parameter, a smaller circle set was detected. Further blurring the image with a medianBlur of kernel size 9 cut out the remaining false positives, at the cost of losing the soccer ball in 1/3 of the test images.
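A sketch of that pass; the parameter values below are illustrative rather than our tuned ones, and cv2.HOUGH_GRADIENT is the OpenCV 3 flag:

```python
import cv2

frame = cv2.imread('01_neatoSoccer.png')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blurred = cv2.medianBlur(gray, 9)  # kernel size 9, as described above
circles = cv2.HoughCircles(
    blurred,
    cv2.HOUGH_GRADIENT,
    dp=1,           # accumulator resolution
    minDist=50,     # minimum distance between detected centers
    param1=100,     # upper Canny threshold (contour finding)
    param2=30,      # accumulator/convergence threshold
    minRadius=10,   # min circle size
    maxRadius=150,  # max circle size
)
```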

Final blurred Hough circle result

If more time were spent on this implementation, the strictness of the solution set would be relaxed so that the ball is consistently found, even when it is not the most certain circle. A second filter would then be applied to the circle set to remove false positives via an average-color-in-circle check.