Detecting objects from a set of training images by shape and color using machine learning in Python from scratch (doing all the math on only numpy arrays, no machine learning packages used).
In the example shown in the above figure, a 16 pixel image with red, blue, and green color channels in the third dimension.
We can flatten the image rows-columns wise and make a weights matrix for image processing and machine learning to train the weights to detect and track objects:
in Python by generating a numpy two-dimensional array of random numbers in which the image rows and columns are flattened into the first dimension with rgb colors for the second dimension in the weights matrix:
def init_weights(self):
return np.reshape(self.add_noise(self.rgb_dim*self.image_size**2), [self.image_size**2, self.rgb_dim])
Training the neural network weights on the following images of different object shapes and colors:
by iterating through our list of training images and adjusting our weights matrix by the difference between each image and the product of the image and weights passed through an activation function (and we add a bit of noise into the input layer for science):
def train(self, iterations, new_weights=False, learning_rate=1.0):
if new_weights:
self.weights = self.init_weights()
for _ in range(iterations):
for image_num, image in self.training_images.items():
input_layer = np.reshape(image, [self.image_size**2, self.rgb_dim]) + np.reshape(self.add_noise(self.rgb_dim*self.image_size**2)*0.01, [self.image_size**2, self.rgb_dim])
output_layer = self.activation_function(self.weights*input_layer)
_error = np.reshape(image, [self.image_size**2, self.rgb_dim]) - output_layer
weights_feedback = self.activation_function((self.weights*_error*output_layer*(1.0 - output_layer))*input_layer)
self.weights += weights_feedback*learning_rate
Here are the weights before training which are initially random:
After training our neural network for 100 iterations per image for each of the eight training images:
We can see that the weights matrix learned the eight images, and now we can test our neural network to find and detect one of the objects it was trained on in the real world via grid search and match-to-sample:
multiplying the sample of the world by the weights matrix and finding the maximum (or minimum) difference after subtracting that from the sample and from the training images:
def test(self, world_size=3, new_weights=False):
world = self.create_world(world_size)
if new_weights:
self.weights = self.init_weights()
errors = np.zeros((world.shape[0] - self.image_size, world.shape[1] - self.image_size, len(self.training_images))).astype(float)
rgb_errors = np.zeros((world.shape[0] - self.image_size, world.shape[1] - self.image_size, len(self.training_images))).astype(float)
for row in range(world.shape[0] - self.image_size):
for col in range(world.shape[1] - self.image_size):
for image_num, image in self.training_images.items():
sample_of_world = np.reshape(world[row:row + self.image_size, col:col + self.image_size, :], [self.image_size**2, self.rgb_dim])
match_image_to_sample = self.weights*sample_of_world
_error = np.reshape(sample_of_world - match_image_to_sample, [self.image_size, self.image_size, self.rgb_dim])
errors[row, col, int(image_num)] = np.sum(abs(_error)) if self.object_shapes is not None else np.sum(abs(_error*np.reshape(self.hidden_weights, [self.image_size, self.image_size, self.rgb_dim])))
rgb_errors[row, col, int(image_num)] = np.sum([np.sum(np.sum(abs(image - np.reshape(match_image_to_sample, [self.image_size, self.image_size, self.rgb_dim])), axis=_axis)) for _axis in range(2)])
found_image_row_col = [[np.where(errors[:, :, int(image_num)] == np.max(errors[:, :, int(image_num)]))[0][0], np.where(errors[:, :, int(image_num)] == np.max(errors[:, :, int(image_num)]))[1][0]] for image_num, image in self.training_images.items()][0]
found_image_num = np.where(rgb_errors[found_image_row_col[0], found_image_row_col[1], :] == np.min(rgb_errors[found_image_row_col[0], found_image_row_col[1], :]))[0][0]
By calculating the errors separately for the rgb color dimensions, the neural network weights matrix can discriminate objects of the same shape but differ in color, too:
Though this neural network is fairly simple, potential applications for using machine learning models to train weights matrixes for image processing and object detection and tracking are wide ranging. For example, suppose we wanted to find and detect where an enemy military tank is located in an image fed directly from the battle field to inform nearby friendly soldiers:
Here is our initially random weights matrix before training it on the training image:
Here is our weights matrix after 1,000 training iterations:
Here is the real world test image where we expect our neural network to find and detect the tank and its exact location from the training image:
And here is the neural network detecting the location of the tank in the image:
And this is what the sample of the world multiplied by our weights matrix looks like:
Let's use a new neural network weights matrix and train it on a different tank image:
Here is our initially random weights matrix before training it on the training image:
Here are the weights after 1,000 training iterations:
And testing it on a real world image:
Below I included the entire class and its methods:
import numpy as np
from matplotlib import pyplot as plt
class NN:
def __init__(self, object_shapes=None, object_colors=None, training_images=None):
self.rgb_dim = 3
self.object_shapes = object_shapes
self.object_colors = object_colors
if training_images is not None:
self.training_images = self.use_images(training_images)
self.image_size = self.training_images['0'].shape[0]
else:
self.image_size = 100
self.training_images = self.create_images(object_shapes, object_colors)
self.weights = self.init_weights()
@staticmethod
def add_noise(n=1):
return np.random.random(n) if n != 1 else np.random.random(n)[0]
@staticmethod
def activation_function(x):
return 1.0 / (1.0 + np.exp(-x))
def normalize_rgb_values(self, rgb, factor=255.0):
norm_rgb = (rgb - np.mean(rgb)) / np.var(rgb)**0.5
norm_rgb += abs(np.min(norm_rgb))
norm_rgb *= (factor / np.max(norm_rgb))
return np.round(norm_rgb, decimals=0).astype(int) if factor == 255.0 else np.round(norm_rgb, decimals=9).astype(float)
def insert_object(self, _image, object_shape, object_color, row_col_position=[0, 0]):
if object_shape == 'H':
_image[30 + row_col_position[0]:75 + row_col_position[0], 60 + row_col_position[1]:70 + row_col_position[1], :] = object_color
_image[50 + row_col_position[0]:55 + row_col_position[0], 40 + row_col_position[1]:60 + row_col_position[1], :] = object_color
_image[30 + row_col_position[0]:75 + row_col_position[0], 30 + row_col_position[1]:40 + row_col_position[1], :] = object_color
if object_shape == 'T':
_image[30 + row_col_position[0]:38 + row_col_position[0], 30 + row_col_position[1]:70 + row_col_position[1], :] = object_color
_image[38 + row_col_position[0]:75 + row_col_position[0], 45 + row_col_position[1]:55 + row_col_position[1], :] = object_color
if object_shape == '|':
_image[30 + row_col_position[0]:75 + row_col_position[0], 45 + row_col_position[1]:55 + row_col_position[1], :] = object_color
if object_shape == '-':
_image[50 + row_col_position[0]:55 + row_col_position[0], 40 + row_col_position[1]:60 + row_col_position[1], :] = object_color
return _image
def use_images(self, training_images):
_images = {}
for _image in training_images:
image = np.array(_image).astype(np.uint8)[:, :, :3]
_images[str(len(_images))] = self.normalize_rgb_values(image, factor=1.0)
return _images
def create_images(self, object_shapes, object_colors):
_images = {}
for _object_shape, _object_color in zip(object_shapes, object_colors):
_image = np.zeros((self.image_size, self.image_size, self.rgb_dim)).astype(float)
_image = self.insert_object(_image, _object_shape, _object_color)
_images[str(len(_images))] = _image
return _images
def init_weights(self):
_weights = np.reshape(self.add_noise(self.rgb_dim*self.image_size**2), [self.image_size**2, self.rgb_dim])
self.weights_over_training_iterations = {'0': _weights}
return _weights
def train(self, iterations, new_weights=False, learning_rate=1.0):
if new_weights:
self.weights = self.init_weights()
for _ in range(iterations):
for image_num, image in self.training_images.items():
input_layer = np.reshape(image, [self.image_size**2, self.rgb_dim]) + np.reshape(self.add_noise(self.rgb_dim*self.image_size**2)*0.01, [self.image_size**2, self.rgb_dim])
output_layer = self.activation_function(self.weights*input_layer)
_error = np.reshape(image, [self.image_size**2, self.rgb_dim]) - output_layer
weights_feedback = self.activation_function((self.weights*_error*output_layer*(1.0 - output_layer))*input_layer)
self.weights += weights_feedback*learning_rate
self.weights_over_training_iterations[str(len(self.weights_over_training_iterations))] = self.weights.copy()
self.hidden_weights = self.get_hidden_weights(self.weights)
if self.object_shapes is None:
self.weights = self.normalize_rgb_values(self.weights, factor=1.0)
def get_hidden_weights(self, _weights):
_hidden_weights = np.zeros((self.image_size**2, self.rgb_dim)).astype(float)
_hidden_weights[np.where(_weights >= np.mean(_weights)), :] = 1.0
return _hidden_weights
def create_world(self, world_size):
_world = np.zeros((int(self.image_size*world_size), int(self.image_size*world_size), self.rgb_dim)).astype(float)
_random_row_col = [np.random.randint(self.image_size*world_size - self.image_size), np.random.randint(self.image_size*world_size - self.image_size)]
_random_object = np.random.randint(len(self.object_shapes))
_object_shape, _object_color = self.object_shapes[_random_object], self.object_colors[_random_object]
return self.insert_object(_world, _object_shape, _object_color, _random_row_col)
def test(self, test_image=None, outline_color=None, world_size=3, new_weights=False):
if test_image is not None:
world = self.normalize_rgb_values(np.array(test_image).astype(np.uint8)[:, :, :3], factor=1.0)
else:
world = self.create_world(world_size)
if new_weights:
self.weights = self.init_weights()
errors = np.zeros((world.shape[0] - self.image_size, world.shape[1] - self.image_size, len(self.training_images))).astype(float)
rgb_errors = np.zeros((world.shape[0] - self.image_size, world.shape[1] - self.image_size, len(self.training_images))).astype(float)
self.plot_image(_image=world.copy(), _title='World')
for row in range(world.shape[0] - self.image_size):
for col in range(world.shape[1] - self.image_size):
for image_num, image in self.training_images.items():
sample_of_world = np.reshape(world[row:row + self.image_size, col:col + self.image_size, :], [self.image_size**2, self.rgb_dim])
match_image_to_sample = self.weights*sample_of_world
_error = np.reshape(sample_of_world - match_image_to_sample, [self.image_size, self.image_size, self.rgb_dim])
errors[row, col, int(image_num)] = np.sum(abs(_error)) if self.object_shapes is not None else np.sum(abs(_error*np.reshape(self.hidden_weights, [self.image_size, self.image_size, self.rgb_dim])))
rgb_errors[row, col, int(image_num)] = np.sum([np.sum(np.sum(abs(image - np.reshape(match_image_to_sample, [self.image_size, self.image_size, self.rgb_dim])), axis=_axis)) for _axis in range(2)])
if self.object_shapes is None:
found_image_row_col = [[np.where(errors[:, :, int(image_num)] == np.min(errors[:, :, int(image_num)]))[0][0], np.where(errors[:, :, int(image_num)] == np.min(errors[:, :, int(image_num)]))[1][0]] for image_num, image in self.training_images.items()][0]
else:
found_image_row_col = [[np.where(errors[:, :, int(image_num)] == np.max(errors[:, :, int(image_num)]))[0][0], np.where(errors[:, :, int(image_num)] == np.max(errors[:, :, int(image_num)]))[1][0]] for image_num, image in self.training_images.items()][0]
found_image_num = np.where(rgb_errors[found_image_row_col[0], found_image_row_col[1], :] == np.min(rgb_errors[found_image_row_col[0], found_image_row_col[1], :]))[0][0]
sample_of_world = world[found_image_row_col[0]:found_image_row_col[0] + self.image_size, found_image_row_col[1]:found_image_row_col[1] + self.image_size, :]
self.plot_image(_image=sample_of_world, _title=f'Sample of World at [{found_image_row_col[0]}, {found_image_row_col[1]}]')
match_image_to_sample = np.reshape(self.weights, [self.image_size, self.image_size, self.rgb_dim])*sample_of_world
_title = f'''
Sample of World at [{found_image_row_col[0]}, {found_image_row_col[1]}]
Multiplied by Weights'''
self.plot_image(_image=match_image_to_sample.copy(), _title=_title, normalize_rgb=True)
_title = f'''
Object from Image #{found_image_num + 1}
Detected in World at [{found_image_row_col[0]}, {found_image_row_col[1]}]'''
self.plot_image(_image=world.copy(), _title=_title, outline_image_row_col=found_image_row_col, outline_color=outline_color)
self.plot_image(_image=self.training_images[str(found_image_num)], _title=f'Object #{found_image_num + 1}')
def plot_image(self, _image=None, _title=None, normalize_rgb=False, outline_image_row_col=None, outline_color=None):
plot_image = self.training_images.copy() if _image is None else _image
if isinstance(plot_image, dict):
fig = plt.figure(figsize=(15 if len(self.training_images) > 3 else 10, 5))
for image_num, image in plot_image.items():
ax = plt.subplot(1, len(plot_image), int(image_num) + 1)
if outline_image_row_col is not None:
for row in range(outline_image_row_col[0], outline_image_row_col[0] + self.image_size):
for col in range(outline_image_row_col[1], outline_image_row_col[1] + self.image_size):
image[row, col, :] += 3
plt.imshow(image)
ax.set_title(f'Object #{int(image_num) + 1}' if _title is None else _title, fontsize=15, fontweight='bold')
ax.axis('off')
fig.suptitle('Training Image' if len(self.training_images) == 1 else 'Training Images', fontsize=20, fontweight='bold')
else:
fig = plt.figure(figsize=(10, 5))
ax = plt.subplot(1, 1, 1)
find_background = [np.where(self.hidden_weights == 0.0)[0][0], np.where(self.hidden_weights == 0.0)[1][0]]
outline_color = np.array([1.0, 1.0, 1.0]) - plot_image[find_background[0], find_background[1], :] if outline_color is None else outline_color
if outline_image_row_col is not None:
for row in range(outline_image_row_col[0], outline_image_row_col[0] + self.image_size):
plot_image[row, outline_image_row_col[1], :] = outline_color
plot_image[row, outline_image_row_col[1] + self.image_size, :] = outline_color
for col in range(outline_image_row_col[1], outline_image_row_col[1] + self.image_size):
plot_image[outline_image_row_col[0], col, :] = outline_color
plot_image[outline_image_row_col[0] + self.image_size, col, :] = outline_color
plt.imshow(plot_image if not normalize_rgb else self.normalize_rgb_values(plot_image))
ax.set_title('Image' if _title is None else _title, fontsize=15, fontweight='bold')
ax.axis('off')
plt.show()
def plot_weights(self, iteration=None):
if iteration is None:
iteration = len(self.weights_over_training_iterations) - 1
fig = plt.figure(figsize=(10, 5))
ax = plt.subplot(1, 1, 1)
plt.imshow(np.sum(np.reshape(self.weights_over_training_iterations[str(iteration)], [self.image_size, self.image_size, self.rgb_dim]), axis=2))
ax.set_title(f'Weights Matrix: Training Iteration #{iteration}', fontsize=15, fontweight='bold')
ax.axis('off')
plt.show()
#from PIL import Image
#training_image1 = Image.open(open('/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/IMAGES/tank_training_image1.png', 'rb'))
#training_image2 = Image.open(open('/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/IMAGES/tank_training_image2.png', 'rb'))
#test_image = Image.open(open('/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/IMAGES/tank_test_image.png', 'rb'))
nn = NN(object_shapes=['H', 'H', 'T', 'T', '|', 'H', 'H', '-'], object_colors=[[1.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.25, 0.0, 1.0], [0.25, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
#nn = NN(training_images=[training_image1])
nn.plot_image()
nn.plot_weights()
nn.train(iterations=100)
#nn.train(iterations=1000)
nn.plot_weights()
nn.test()
#nn.test(test_image=test_image, outline_color=np.array([1.0, 0.0, 0.0]))
Although this object detection machine learning image processing example is simple, it is still at least a bit impressive that the tank can be identified by the neural network while not only being camouflaged (as the tanks match the background with similar colors or rgb values opposed to the simulated images with the distinct black background in the test image) but also surrounded by other tanks in the image. The Python Neural Network class I provided above works on both simulated and real world images for object detection training and testing, and I included the ipynb (colab notebook) file and the tank training and testing images as png files in this repository so check it out!