Greetings! This conx user tutorial explains the basic procedure for creating deep machine learning networks, training them on examples, and testing them.

So let's jump right into it. The first step to using conx's Network class is to import it...

In [1]:
from conx import Network

Using gpu device 0: GeForce GTX 950 (CNMeM is disabled, cuDNN not available)


After importing the Network class, it's time to create a new instance. Creating a Network requires 3 parameters: the number of inputs into the network, the number of units in the hidden layer, and the number of outputs expected.

Let's try it now. These are the two networks we will use later...

In [2]:
#Here is the model:
#new_network = Network(inputs, hidden layer, outputs)

snetx = Network(1200, 50, 1)
cnetx = Network(1200, 50, 2)

Since conx networks can only take float values between 0 and 1 as inputs, the next step is to decide what you want the network to learn and how to represent it using only float values between 0 and 1.

For an example project, let's have a Nao recognize where a bright orange ball is and look towards it without using its tracking modules. To accomplish this, we will first have the robot take photos of the ball with both of its head joint angles set to 0 (the neutral position), then track the ball and record the resulting joint angle values in a text document. Then we will convert the photos into arrays, pair them up with their respective angle values in a training set, and pass them through the network.

To start, let's import the myNao.py file and use the photo-taking and tracking function written in the My Nao tutorial to accomplish the first half of the project.

In [3]:
from myNao import Robot

In [None]:
#save_multiple_photos(target name, image name, desired # of photos taken)

save_multiple_photos("RedBall", "ExamplePhoto", 5)

#Keep the ball in position until the console says the photo has been
#taken. Then the Nao will ask you to move it into a new location for the
#next photo. It continues this cycle until the desired number of photos
#has been taken.


Note: Although the example line takes only 5 demo photos, an effective training set for a conx network needs more than 30 items.

Next, we need to convert the photos into arrays, pair them up with their respective angles, and group all the items together in a training set. To convert photos into arrays, we will need Python's "Image" module (from the Python Imaging Library), which lets the user open, save, and construct photos, and Python's "numpy" package, which enables the creation and manipulation of arrays. To save the arrays as theano variables, we will need theano's config and shared modules as well.

In [4]:
import numpy as np
import time
import Image
from theano import config, shared

Conx networks only accept one-dimensional arrays with values ranging between 0 and 1. Since the photos the example code takes with the Nao are in RGB color, they load as 3D arrays (height x width x color channel). Grayscale photos load as 2D arrays. Here are some helpful numpy array manipulation functions for converting the 3D arrays into 1D arrays.

Note: The product of the reshape arguments must equal the number of elements in the array you are reshaping.
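
As a quick illustration of that reshape rule, here is a minimal sketch using a small made-up array. The 12-element array and the shapes are illustrative only and are not taken from the Nao photos.

```python
import numpy as np

# A made-up 1D array with 12 elements (illustrative only).
flat = np.arange(12)

# 3 * 4 == 12, so this reshape succeeds.
grid = flat.reshape((3, 4))

# -1 asks numpy to infer that dimension: 12 / 2 == 6 rows.
pairs = flat.reshape((-1, 2))

# 5 * 3 == 15, which is not 12, so numpy raises a ValueError.
try:
    flat.reshape((5, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

The (-1, 2) shape is the same one matrixer() uses below, which is why it only works on arrays with an even number of elements.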

In [5]:
def grayscale(pic): #converts 30x40 RGB photos into grayscale (3D to 2D)
    bwpic = pic.convert('L')
    pic_array = np.asarray(bwpic, dtype=config.floatX).reshape((bwpic.size[1],bwpic.size[0]))
    return pic_array

def vectorizer(array): #converts 30x40 grayscale photos to vectors (2D to 1D)
    v = array.flatten()
    return v

def matrixer(array): #converts 1D vectors into 2D arrays
    twoD = np.reshape(array, (-1, 2))
    return twoD

def twoD_arrayifier(arr, h, w): #3D to 2D function that gives option of height and width of new array
    """
    Return an array of shape (h, w) where
    h * w = arr.size

    If arr is of shape (n, nrows, ncols), n subblocks of shape (nrows, ncols),
    then the returned array preserves the "physical" layout of the subblocks.
    """
    n, nrows, ncols = arr.shape
    return (arr.reshape(h//nrows, -1, nrows, ncols)
               .swapaxes(1,2)
               .reshape(h, w))

So, after opening a photo and saving it to a variable, calling grayscale() and then vectorizer() on it and dividing the resulting values by 255 makes the array compatible with conx, or "conx-ready".

Example code:

pic = Image.open(photoname)

array = vectorizer(grayscale(pic)) / 255
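
To see what "conx-ready" means in numbers, the sketch below runs the same flatten-and-divide steps on a made-up 30x40 grayscale array instead of a real photo. The fake_gray array is an assumption for illustration, so no image file is needed. The result has 1200 values between 0 and 1, matching the 1200 inputs given to Network() earlier.

```python
import numpy as np

# Stand-in for grayscale(pic): a made-up 30x40 array of pixel values 0..255.
fake_gray = (np.arange(30 * 40) % 256).astype("float32").reshape((30, 40))

# Same steps as vectorizer(...) / 255: flatten to 1D, then scale into 0..1.
conx_ready = fake_gray.flatten() / 255.0

n_inputs = conx_ready.shape[0]    # 30 * 40 = 1200
lowest = float(conx_ready.min())
highest = float(conx_ready.max())
```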

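Here is a function that builds the training set: it converts each photo into a conx-ready array and pairs it with the head angles recorded for that photo...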

In [6]:
def dataset_maker(photoName, AnglesFile, amount):
    #returns a dataset list in which each item is a list holding a 1D
    #pic_array and the HeadYaw & HeadPitch angle values (recorded in
    #radians, rescaled here from the -1 to 1 range into 0 to 1):
    #[pic_array, [HeadYaw, HeadPitch]]
    dataset = []
    jointAngleValues = open(AnglesFile,'r').read().splitlines()
    arrays = []
    temp = []
    headAngles = []
    counter = "1"
    for index in range(0, amount):
        pic = Image.open(photoName + counter + ".jpeg")
        arrays.append(vectorizer(grayscale(pic))/255) #conx-ready photo array; its joint angles are paired with it below
        counter = str(int(counter) + 1)
    for i in range(0, len(jointAngleValues)): #for each angle pair...
        x = 0
        previous_slice = 0
        while x <= len(jointAngleValues[i]): #for each character in angle pair...
            if x == len(jointAngleValues[i]):
                temp.append((float(jointAngleValues[i][previous_slice:x])+1)/2) #finish off the pair of angles
                headAngles.append(temp) #add em to headAngles list
            elif jointAngleValues[i][x] == ",":
                temp = [((float(jointAngleValues[i][previous_slice:x]))+1)/2]
                x += 2 #hop over space
                previous_slice = x
            x += 1 #increment counter
    for i in range(0, amount):
        next_input = [arrays[i], headAngles[i]] #pair each photo array with its angles
        dataset.append(next_input) #add the pair to the dataset
    return dataset
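
The angle-parsing loop above walks through each line character by character. As a hedged alternative, the same rescaling could be sketched with str.split, assuming each line of the angles file holds "HeadYaw, HeadPitch" separated by a comma. The helper name parse_angle_line is hypothetical and is not part of conx or myNao.

```python
def parse_angle_line(line):
    """Turn a 'HeadYaw, HeadPitch' text line into two values in 0..1.

    Hypothetical helper: assumes both angles lie between -1 and 1,
    the same assumption the (angle + 1) / 2 rescaling above relies on.
    """
    yaw_text, pitch_text = line.split(",")
    # float() ignores surrounding spaces, so "0.5, -0.5" parses cleanly.
    return [(float(yaw_text) + 1) / 2, (float(pitch_text) + 1) / 2]


example = parse_angle_line("0.5, -0.5")  # yaw 0.5 -> 0.75, pitch -0.5 -> 0.25
```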

Here is a simpler variant that only notes whether the ball is on the left or right side of the photo, encoded as 0 or 1, and pairs this value with the photo arrays instead of the angles. It shows that conx can also handle a simpler task.

In [7]:
def S_dataset_maker(photoName, AnglesFile, amount):
    #returns a dataset list containing tuples of a pic_array and a
    #one-item list holding 0 (ball on the left) or 1 (ball on the right),
    #decided by the sign of the recorded HeadYaw angle.
    #[(pic_array1(), [boolean]), (pic_array2(), [boolean]),...]
    dataset = []
    jointAngleValues = open(AnglesFile,'r').read().splitlines()
    arrays = []
    headValues = []
    counter = "1"
    for index in range(0, amount):
        pic = Image.open(photoName + counter + ".jpeg")
        arrays.append(vectorizer(grayscale(pic))/255) #conx-ready photo array; its left/right value is paired with it below
        counter = str(int(counter) + 1)
    for i in range(0, len(jointAngleValues)):
        if jointAngleValues[i][0] == "-": #if HeadYaw is negative...
            temp = [1]  #head is turned to the right/ball is in right side of photo
            headValues.append(temp)
        else:
            temp = [0]  #head is turned to the left/ball is in left side of photo
            headValues.append(temp)
    for k in range(0, amount):
        next_input = (arrays[k], headValues[k])
        dataset.append(next_input)
    return dataset
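
The left/right labelling above only inspects the first character of each line. Here is a minimal sketch of the same idea on a few made-up lines. The sample values are invented for illustration, and side_label is a hypothetical helper.

```python
def side_label(line):
    """Return [1] if the recorded HeadYaw is negative (ball on the right),
    otherwise [0] (ball on the left). Hypothetical helper mirroring the
    first-character check used in S_dataset_maker."""
    return [1] if line.startswith("-") else [0]


# Made-up sample lines in "HeadYaw, HeadPitch" form.
sample_lines = ["-0.31, 0.12", "0.27, -0.05", "0.0, 0.2"]
labels = [side_label(line) for line in sample_lines]
```

Note that with this check a HeadYaw of exactly 0 is labelled as left.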

Now it's time to create our training datasets. Let's use some photos I took already for the training sets...

In [8]:
S_training_set = S_dataset_maker("naoBallPhoto", "naoheadangles.txt", 35)

C_training_set = dataset_maker("naoBallPhoto", "naoheadangles.txt", 35)

Finally, to pass these training sets through the networks, we need to give each initialized network its training set (the list returned by dataset_maker() or S_dataset_maker()) with set_inputs(), and then call train() on the network.

network.train() has a few optional parameters you can take advantage of. With report_rate, you can choose how many epochs (full runs through the training set) pass between printouts of the TSS error and percent accuracy.
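
TSS is commonly read as total sum squared error: the squared difference between each target and each output, summed over the whole training set. Assuming conx follows that common definition (an assumption, not confirmed by this tutorial), the number printed in each report could be sketched as follows. The function name tss_error and the sample numbers are made up.

```python
def tss_error(outputs, targets):
    """Sum of squared differences over every pattern and output unit."""
    total = 0.0
    for out_vec, tgt_vec in zip(outputs, targets):
        for out, tgt in zip(out_vec, tgt_vec):
            total += (tgt - out) ** 2
    return total


# Two made-up patterns with one output unit each.
outputs = [[0.5], [1.0]]
targets = [[0.0], [1.0]]
error = tss_error(outputs, targets)  # (0 - 0.5)**2 + (1 - 1)**2 = 0.25
```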

You can also set the margin of error allowed in the network's predictions with tolerance. Note that the best tolerance always depends on the goal of the training. For instance, since the S_training_set only tells whether the ball is on the left or right side of the photo with a value of 0 or 1, any float value the network returns that is less than 0.5 can be considered 0 and any value greater than 0.5 can be considered 1. So a tolerance of 0.4 is acceptable in this case.

You can set your desired prediction accuracy as well with stop_percentage, which provides the threshold at which the network should stop training. It is given as a fraction, so 1.0 means 100% correct.

You can also set the maximum number of epochs the train() call will go through with max_training_epochs.

To check what the training settings are, look at network.settings:
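
One plausible reading of tolerance, assumed here rather than taken from conx's source, is that a prediction counts as correct when every output lies within tolerance of its target. Under that assumption, the left/right example works out like this. The helper is_correct and the sample outputs are hypothetical.

```python
def is_correct(output, target, tolerance):
    """True when every output value is within `tolerance` of its target."""
    return all(abs(o - t) <= tolerance for o, t in zip(output, target))


# With tolerance 0.4, an output of 0.25 still counts as a 0 ("left"),
# but 0.75 is too far from the target 0 to count as correct.
close_enough = is_correct([0.25], [0], 0.4)
too_far = is_correct([0.75], [0], 0.4)
```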

In [21]:
snetx.settings

{'activation_function': <theano.tensor.elemwise.Elemwise at 0x7f77290b2110>,
 'epsilon': 0.1,
 'max_training_epochs': 5000,
 'momentum': 0.9,
 'report_rate': 500,
 'stop_percentage': 1.0,
 'tolerance': 0.1}

Now to call train...

In [23]:
snetx.set_inputs(S_training_set)
snetx.reset()
snetx.train(report_rate = 100, tolerance = 0.3, stop_percentage=1.0)

--------------------------------------------------
Training for max trails: 5000 ...
Epoch: 0 TSS error: 11.4807234672 %correct: 0.4
--------------------------------------------------
Epoch: 73 TSS error: 0.388495914765 %correct: 1.0


In [10]:
cnetx.set_inputs(C_training_set)
cnetx.reset()
cnetx.train(report_rate = 500, tolerance = 0.1, stop_percentage = 1.0)

--------------------------------------------------
Training for max trails: 5000 ...
Epoch: 0 TSS error: 10.6444192156 %correct: 0.0
--------------------------------------------------
Epoch: 232 TSS error: 0.0424394530855 %correct: 1.0


Now that we have successfully trained our networks, it's time to test them with these prediction functions...

In [11]:
def S_predict(pic_array):
    #accepts 1D photo array as input and predicts whether the ball in
    #the photo is in the left or right half of the photo, returning a
    #sentence with the verdict (output of 0.5 or less = left, above 0.5 = right)
    y = snetx.propagate(pic_array)
    if 1 - y[0] < y[0]:
        return "the ball is on the right side of the photo"
    else:
        return "the ball is on the left side of the photo"

def C_predict(pic_array):
    #takes in a 1D photo array and predicts the angles the head joints
    #should assume if the robot was looking directly at the ball in the
    #photo. Also scales the output back into the proper -1 to 1 range.
    x = cnetx.propagate(pic_array)
    return [(x[0] * 2)-1, (x[1] *2)-1]

<img src="hardware_headjoint_3.3.png" width="400"/>

Time to test the trained system with some of the training data photos...

Note: The head's joint angles are recorded in radians, ranging from -1 to 1.
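
The rescaling between the -1 to 1 joint-angle range and the 0 to 1 range that conx works with is applied in dataset_maker() and undone in C_predict(). Here is a minimal sketch of that round trip, with hypothetical helper names.

```python
def to_unit(angle):
    """Map an angle in -1..1 onto 0..1, as dataset_maker() does."""
    return (angle + 1) / 2.0


def from_unit(value):
    """Map a network output in 0..1 back onto -1..1, as C_predict() does."""
    return (value * 2) - 1


scaled = to_unit(-0.5)        # (-0.5 + 1) / 2 = 0.25
restored = from_unit(scaled)  # 0.25 * 2 - 1 = -0.5
```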

<img src="naoBallPhoto23.jpeg" width="250"/>

In [12]:
y = Image.open("naoBallPhoto23.jpeg")
z = vectorizer(grayscale(y))/255

In [24]:
x = S_predict(z)
print x

the ball is on the left side of the photo


In [14]:
C_predict(z)

[0.29144561290740967, -0.19938766956329346]

<img src="naoBallPhoto17.jpeg" width="250"/>

In [15]:
o = Image.open("naoBallPhoto17.jpeg")
p = vectorizer(grayscale(o))/255

In [25]:
x = S_predict(p)
print x

the ball is on the right side of the photo


In [17]:
C_predict(p)

[-0.26814043521881104, 0.18113255500793457]

<img src="naoBallPhoto11.jpeg" width="250"/>

In [18]:
q = Image.open("naoBallPhoto11.jpeg")
w = vectorizer(grayscale(q))/255

In [26]:
x = S_predict(w)
print x

the ball is on the left side of the photo


In [24]:
C_predict(w)

[0.00073981285095214844, 0.020813941955566406]

Woohoo! Looks good! For the final test, let's take a brand new photo with the Nao and test whether our networks provide accurate predictions...

To do so, we will call C_predict() and S_predict() and then pass their outputs to the move_head() and say() functions, which respectively set the Nao's head joints to the predicted angles and say whether the ball is on the left or right side of its view.

We need to make a Robot instance first, so let's use Wol:

In [27]:
wol = Robot('wol',9559)
wol.life.setState("disabled")

Now Wol should take a photo...

In [28]:
img_name = "demoPhoto1"
wol.set_neutral()
wol.save_photo(img_name)

taking photo: demoPhoto1
photo demoPhoto1 saved!


Then we open up the photo, convert it to be conx-ready, and call the prediction functions...

In [29]:
pic = Image.open(img_name + ".jpeg")

<img src="demoPhoto1.jpeg" width="250"/>

In [30]:
array = vectorizer(grayscale(pic))/255
C_newAngles = C_predict(array)
S_verdict = S_predict(array)

Let's test what we got...

In [31]:
wol.say(S_verdict)

In [32]:
wol.move_head(C_newAngles)

done!


If Wol successfully told whether the ball was on the left or right side of its view and moved its head to look at the ball, CONGRATULATIONS! It works!