Object detection using machine learning (ImageAI) to label, annotate and score objects in images and videos. We use ImageAI, built by the brothers Moses Olafenwa and John Olafenwa, creators of TorchFusion and authors of Introduction to Deep Computer Vision. We are excited about the release of ImageAI (v2.1.4), which is used extensively by developers, engineers, scientists and researchers across the globe.
The field of supervised learning in AI has tremendously impacted object tracking and detection mechanisms in many ways. We are fascinated by building custom models and datasets of our own, improving model accuracy, and optimizing training sessions for our operating systems. The AI concept has evolved from Alan Turing (who broke the Enigma code), through John McCarthy (American computer scientist and cognitive scientist), best known as the "Father of Artificial Intelligence", together with Marvin Minsky, Allen Newell and Herbert A. Simon. The contributions made by these great scientists have made the path forward very clear. The ACM A.M. Turing Award is an annual prize given by the Association for Computing Machinery (ACM) to an individual selected for contributions "of lasting and major technical importance to the computer field". The Turing Award is generally recognized as the highest distinction in computer science and the "Nobel Prize of computing".
The following scientists, all ACM A.M. Turing Award recipients, are core contributors to computing and AI/ML:
- Alan J. Perlis
- Maurice Wilkes
- Richard Hamming
- Marvin Minsky
- James H. Wilkinson
- John McCarthy
- Edsger W. Dijkstra
- Charles W. Bachman
- Donald E. Knuth
- Allen Newell and Herbert A. Simon
- Michael O. Rabin
- Dana S. Scott
- John Backus
- Robert W. Floyd
- Kenneth E. Iverson
- Tony Hoare
- Edgar F. Codd
- Stephen A. Cook
- Ken Thompson and Dennis M. Ritchie
- Niklaus Wirth
- Richard M. Karp
- John Hopcroft and Robert Tarjan
- John Cocke
- Ivan Sutherland
- William Kahan
- Whitfield Diffie and Martin E. Hellman
- Tim Berners-Lee
- John L. Hennessy and David A. Patterson
- Yoshua Bengio, Geoffrey Hinton and Yann LeCun
ImageAI supports a list of state-of-the-art machine learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking and image prediction training. ImageAI currently supports image prediction and training using 4 different machine learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection and object tracking using RetinaNet, YOLOv3 and TinyYOLOv3 trained on the COCO dataset. Eventually, ImageAI will provide support for wider and more specialized aspects of computer vision, including, but not limited to, image recognition in special environments and special fields.
ImageAI provides classes and methods for you to run image prediction on your own custom objects using your own model trained with the ImageAI model training class. You can use your custom models trained with SqueezeNet, ResNet50, InceptionV3 and DenseNet, together with the JSON file containing the mapping of the custom object names.
ImageAI uses the Tensorflow backbone for its computer vision operations. Tensorflow supports both CPUs and GPUs (specifically NVIDIA GPUs; you can get one for your PC or get a PC that has one) for implementing machine learning and artificial intelligence algorithms.
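As a quick sanity check (a minimal sketch of our own, not part of ImageAI), you can ask Tensorflow whether it can see a usable GPU before running heavier workloads:
# Minimal sketch (our own check, not part of ImageAI):
# ask Tensorflow 1.x whether a CUDA-enabled NVIDIA GPU is visible.
import tensorflow as tf

# Returns True when Tensorflow can run ops on a GPU; otherwise
# everything silently falls back to the CPU.
print("GPU available:", tf.test.is_gpu_available())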
Currently, we are using ImageAI (a Python library built to empower developers to build applications and systems with self-contained deep learning and computer vision capabilities using just a few simple lines of code) to track and score objects on a sports platform.
In 2018, we used the pretrained YOLOv3 model to train on logos/brands and detect them on the sports platform and at other events.
Dependencies
To use ImageAI in your application development, you must install the following dependencies before you install ImageAI:
- Python 3.5.1 (and later versions) (support for Python 2.7 coming soon)
- Tensorflow 1.4.0 (and later versions)
- OpenCV
- Keras 2.x
- Install Tensorflow, Keras and OpenCV in one step: pip install -U tensorflow keras opencv-python
Link to download our Nike model, which we trained with 300 images of the Nike logo: https://drive.google.com/file/d/1K6mscGrFylM_0kXmjsACAoVbTZn58OfB/view?usp=sharing
Installation
To install ImageAI, run the Python installation command below on the command line: pip3 install imageai --upgrade
ImageAI provides 4 different algorithms and model types to perform image prediction. To perform image prediction on any picture, take the following simple steps. The 4 algorithms provided for image prediction are SqueezeNet, ResNet, InceptionV3 and DenseNet. Each of these algorithms has an individual model file which you must use depending on your choice of algorithm. To download the model file for your choice of algorithm, click on any of the links below:
- SqueezeNet (Size = 4.82 MB, fastest prediction time and moderate accuracy)
- ResNet50 by Microsoft Research (Size = 98 MB, fast prediction time and high accuracy)
- InceptionV3 by Google Brain team (Size = 91.6 MB, slow prediction time and higher accuracy)
- DenseNet121 by Facebook AI Research (Size = 31.6 MB, slower prediction time and highest accuracy)
FirstPrediction.py
from imageai.Prediction import ImagePrediction
import os
execution_path = os.getcwd()
prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath(os.path.join(execution_path, "resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
prediction.loadModel()
predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "1.jpg"), result_count=5)
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction, " : ", eachProbability)
Multiple Images Prediction
from imageai.Prediction import ImagePrediction
import os
execution_path = os.getcwd()
multiple_prediction = ImagePrediction()
multiple_prediction.setModelTypeAsResNet()
multiple_prediction.setModelPath(os.path.join(execution_path, "resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
multiple_prediction.loadModel()
all_images_array = []
all_files = os.listdir(execution_path)
for each_file in all_files:
    if (each_file.endswith(".jpg") or each_file.endswith(".png")):
        all_images_array.append(each_file)

results_array = multiple_prediction.predictMultipleImages(all_images_array, result_count_per_image=5)

for each_result in results_array:
    predictions, percentage_probabilities = each_result["predictions"], each_result["percentage_probabilities"]
    for index in range(len(predictions)):
        print(predictions[index], " : ", percentage_probabilities[index])
    print("-----------------------")
Prediction in MultiThreading
When developing programs that run heavy tasks on the default thread, such as user interfaces (UI), you should consider running your predictions in a new thread. When running image prediction with ImageAI in a new thread, note the following:
You can create your prediction object, set its model type, and set the model path and JSON path outside the new thread. The .loadModel() call must be in the new thread, and image prediction (predictImage()) must take place in the new thread. Take a look at the sample code below for image prediction using multithreading:
from imageai.Prediction import ImagePrediction
import os
import threading
execution_path = os.getcwd()
prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath( os.path.join(execution_path, "resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
picturesfolder = os.environ["USERPROFILE"] + "\\Pictures\\"
allfiles = os.listdir(picturesfolder)
class PredictionThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        # loadModel() and predictImage() must both run inside the new thread
        prediction.loadModel()
        for eachPicture in allfiles:
            if eachPicture.endswith(".png") or eachPicture.endswith(".jpg"):
                predictions, percentage_probabilities = prediction.predictImage(picturesfolder + eachPicture, result_count=1)
                for eachPrediction, eachProbability in zip(predictions, percentage_probabilities):
                    print(eachPrediction, " : ", eachProbability)

predictionThread = PredictionThread()
predictionThread.start()
ImageAI provides a simple and powerful approach to training custom image prediction models using the state-of-the-art SqueezeNet, ResNet50, InceptionV3 and DenseNet architectures, which you can load into the imageai.Prediction.Custom.CustomImagePrediction class. This allows you to train your own model on any set of images that corresponds to any type of objects/persons. The training process generates a JSON file that maps the object types in your image dataset and saves many model files. You will then pick the model with the highest accuracy and perform custom image prediction using that model and the generated JSON file.
Custom Model Training Prediction
- Saving Full Custom Model (NEW)
- Training on the IdenProf Dataset
- Continuous Model Training (NEW)
- Transfer Learning (Training from a pre-trained model) (NEW)
To train a custom prediction model, you need to prepare the images you want to use to train the model. You will prepare the images as follows:
- Create a dataset folder with the name you will like your dataset to be called (e.g pets)
- In the dataset folder, create a folder by the name train
- In the dataset folder, create a folder by the name test
- In the train folder, create a folder for each object you want the model to predict and give the folder a name that corresponds to the respective object name (e.g dog, cat, squirrel, snake)
- In the test folder, create a folder for each object you want the model to predict and give the folder a name that corresponds to the respective object name (e.g dog, cat, squirrel, snake)
- In each folder present in the train folder, put the images of each object in its respective folder. These images are the ones to be used to train the model. To produce a model that can perform well in practical applications, we recommend about 500 or more images per object; 1000 images per object is just great
- In each folder present in the test folder, put about 100 to 200 images of each object in its respective folder. These images are the ones to be used to test the model as it trains
- Once you have done this, the structure of your image dataset folder should look like below (a short script to create these folders follows the list):
pets//train//dog//dog-train-images
pets//train//cat//cat-train-images
pets//train//squirrel//squirrel-train-images
pets//train//snake//snake-train-images
pets//test//dog//dog-test-images
pets//test//cat//cat-test-images
pets//test//squirrel//squirrel-test-images
pets//test//snake//snake-test-images
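If you prefer to script the folder setup, below is a minimal sketch (our own helper, not part of ImageAI) that creates the structure above:
# Minimal sketch (our own helper, not part of ImageAI):
# create the pets dataset folder layout described above.
import os

dataset = "pets"
object_names = ["dog", "cat", "squirrel", "snake"]
for split in ("train", "test"):
    for name in object_names:
        # exist_ok lets you re-run the script without errors
        os.makedirs(os.path.join(dataset, split, name), exist_ok=True)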
Then your training code goes as follows:
from imageai.Prediction.Custom import ModelTraining
model_trainer = ModelTraining()
model_trainer.setModelTypeAsResNet()
model_trainer.setDataDirectory("pets")
model_trainer.trainModel(num_objects=4, num_experiments=100, enhance_data=True, batch_size=32, show_network_summary=True)
In the code above, we start the training process. The parameters stated in the function are as below:
- num_objects : this is to state the number of object types in the image dataset
- num_experiments : this is to state the number of times the network will train over all the training images, which is also called epochs
- enhance_data (optional) : This is used to state if we want the network to produce modified copies of the training images for better performance.
- batch_size : This is to state the number of images the network will process at once. The images are processed in batches until they are exhausted in each experiment performed.
- show_network_summary : This is to state if the network should show the structure of the training network in the console.
When you start the training, you should see something like this in the console:
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_2 (InputLayer) (None, 224, 224, 3) 0
____________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 112, 112, 64) 9472 input_2[0][0]
____________________________________________________________________________________________________
batch_normalization_1 (BatchNorm (None, 112, 112, 64) 256 conv2d_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 112, 112, 64) 0 batch_normalization_1[0][0]
____________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 55, 55, 64) 0 activation_1[0][0]
____________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 55, 55, 64) 4160 max_pooling2d_1[0][0]
____________________________________________________________________________________________________
batch_normalization_3 (BatchNorm (None, 55, 55, 64) 256 conv2d_3[0][0]
____________________________________________________________________________________________________
activation_2 (Activation) (None, 55, 55, 64) 0 batch_normalization_3[0][0]
____________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 55, 55, 64) 36928 activation_2[0][0]
____________________________________________________________________________________________________
batch_normalization_4 (BatchNorm (None, 55, 55, 64) 256 conv2d_4[0][0]
____________________________________________________________________________________________________
activation_3 (Activation) (None, 55, 55, 64) 0 batch_normalization_4[0][0]
____________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 55, 55, 256) 16640 activation_3[0][0]
____________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 55, 55, 256) 16640 max_pooling2d_1[0][0]
____________________________________________________________________________________________________
batch_normalization_5 (BatchNorm (None, 55, 55, 256) 1024 conv2d_5[0][0]
____________________________________________________________________________________________________
batch_normalization_2 (BatchNorm (None, 55, 55, 256) 1024 conv2d_2[0][0]
____________________________________________________________________________________________________
add_1 (Add) (None, 55, 55, 256) 0 batch_normalization_5[0][0]
batch_normalization_2[0][0]
____________________________________________________________________________________________________
activation_4 (Activation) (None, 55, 55, 256) 0 add_1[0][0]
____________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 55, 55, 64) 16448 activation_4[0][0]
____________________________________________________________________________________________________
batch_normalization_6 (BatchNorm (None, 55, 55, 64) 256 conv2d_6[0][0]
____________________________________________________________________________________________________
activation_5 (Activation) (None, 55, 55, 64) 0 batch_normalization_6[0][0]
____________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 55, 55, 64) 36928 activation_5[0][0]
____________________________________________________________________________________________________
batch_normalization_7 (BatchNorm (None, 55, 55, 64) 256 conv2d_7[0][0]
____________________________________________________________________________________________________
activation_6 (Activation) (None, 55, 55, 64) 0 batch_normalization_7[0][0]
____________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 55, 55, 256) 16640 activation_6[0][0]
____________________________________________________________________________________________________
batch_normalization_8 (BatchNorm (None, 55, 55, 256) 1024 conv2d_8[0][0]
____________________________________________________________________________________________________
add_2 (Add) (None, 55, 55, 256) 0 batch_normalization_8[0][0]
activation_4[0][0]
____________________________________________________________________________________________________
activation_7 (Activation) (None, 55, 55, 256) 0 add_2[0][0]
____________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 55, 55, 64) 16448 activation_7[0][0]
____________________________________________________________________________________________________
batch_normalization_9 (BatchNorm (None, 55, 55, 64) 256 conv2d_9[0][0]
____________________________________________________________________________________________________
activation_8 (Activation) (None, 55, 55, 64) 0 batch_normalization_9[0][0]
____________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 55, 55, 64) 36928 activation_8[0][0]
____________________________________________________________________________________________________
batch_normalization_10 (BatchNor (None, 55, 55, 64) 256 conv2d_10[0][0]
____________________________________________________________________________________________________
activation_9 (Activation) (None, 55, 55, 64) 0 batch_normalization_10[0][0]
____________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 55, 55, 256) 16640 activation_9[0][0]
____________________________________________________________________________________________________
batch_normalization_11 (BatchNor (None, 55, 55, 256) 1024 conv2d_11[0][0]
____________________________________________________________________________________________________
add_3 (Add) (None, 55, 55, 256) 0 batch_normalization_11[0][0]
activation_7[0][0]
____________________________________________________________________________________________________
activation_10 (Activation) (None, 55, 55, 256) 0 add_3[0][0]
____________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 28, 28, 128) 32896 activation_10[0][0]
____________________________________________________________________________________________________
batch_normalization_13 (BatchNor (None, 28, 28, 128) 512 conv2d_13[0][0]
____________________________________________________________________________________________________
activation_11 (Activation) (None, 28, 28, 128) 0 batch_normalization_13[0][0]
____________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 28, 28, 128) 147584 activation_11[0][0]
____________________________________________________________________________________________________
batch_normalization_14 (BatchNor (None, 28, 28, 128) 512 conv2d_14[0][0]
____________________________________________________________________________________________________
activation_12 (Activation) (None, 28, 28, 128) 0 batch_normalization_14[0][0]
____________________________________________________________________________________________________
conv2d_15 (Conv2D) (None, 28, 28, 512) 66048 activation_12[0][0]
____________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 28, 28, 512) 131584 activation_10[0][0]
____________________________________________________________________________________________________
batch_normalization_15 (BatchNor (None, 28, 28, 512) 2048 conv2d_15[0][0]
____________________________________________________________________________________________________
batch_normalization_12 (BatchNor (None, 28, 28, 512) 2048 conv2d_12[0][0]
____________________________________________________________________________________________________
add_4 (Add) (None, 28, 28, 512) 0 batch_normalization_15[0][0]
batch_normalization_12[0][0]
____________________________________________________________________________________________________
activation_13 (Activation) (None, 28, 28, 512) 0 add_4[0][0]
____________________________________________________________________________________________________
conv2d_16 (Conv2D) (None, 28, 28, 128) 65664 activation_13[0][0]
____________________________________________________________________________________________________
batch_normalization_16 (BatchNor (None, 28, 28, 128) 512 conv2d_16[0][0]
____________________________________________________________________________________________________
activation_14 (Activation) (None, 28, 28, 128) 0 batch_normalization_16[0][0]
____________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, 28, 28, 128) 147584 activation_14[0][0]
____________________________________________________________________________________________________
batch_normalization_17 (BatchNor (None, 28, 28, 128) 512 conv2d_17[0][0]
____________________________________________________________________________________________________
activation_15 (Activation) (None, 28, 28, 128) 0 batch_normalization_17[0][0]
____________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, 28, 28, 512) 66048 activation_15[0][0]
____________________________________________________________________________________________________
batch_normalization_18 (BatchNor (None, 28, 28, 512) 2048 conv2d_18[0][0]
____________________________________________________________________________________________________
add_5 (Add) (None, 28, 28, 512) 0 batch_normalization_18[0][0]
activation_13[0][0]
____________________________________________________________________________________________________
activation_16 (Activation) (None, 28, 28, 512) 0 add_5[0][0]
____________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, 28, 28, 128) 65664 activation_16[0][0]
____________________________________________________________________________________________________
batch_normalization_19 (BatchNor (None, 28, 28, 128) 512 conv2d_19[0][0]
____________________________________________________________________________________________________
activation_17 (Activation) (None, 28, 28, 128) 0 batch_normalization_19[0][0]
____________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, 28, 28, 128) 147584 activation_17[0][0]
____________________________________________________________________________________________________
batch_normalization_20 (BatchNor (None, 28, 28, 128) 512 conv2d_20[0][0]
____________________________________________________________________________________________________
activation_18 (Activation) (None, 28, 28, 128) 0 batch_normalization_20[0][0]
____________________________________________________________________________________________________
conv2d_21 (Conv2D) (None, 28, 28, 512) 66048 activation_18[0][0]
____________________________________________________________________________________________________
batch_normalization_21 (BatchNor (None, 28, 28, 512) 2048 conv2d_21[0][0]
____________________________________________________________________________________________________
add_6 (Add) (None, 28, 28, 512) 0 batch_normalization_21[0][0]
activation_16[0][0]
____________________________________________________________________________________________________
activation_19 (Activation) (None, 28, 28, 512) 0 add_6[0][0]
____________________________________________________________________________________________________
conv2d_22 (Conv2D) (None, 28, 28, 128) 65664 activation_19[0][0]
____________________________________________________________________________________________________
batch_normalization_22 (BatchNor (None, 28, 28, 128) 512 conv2d_22[0][0]
____________________________________________________________________________________________________
activation_20 (Activation) (None, 28, 28, 128) 0 batch_normalization_22[0][0]
____________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 28, 28, 128) 147584 activation_20[0][0]
____________________________________________________________________________________________________
batch_normalization_23 (BatchNor (None, 28, 28, 128) 512 conv2d_23[0][0]
____________________________________________________________________________________________________
activation_21 (Activation) (None, 28, 28, 128) 0 batch_normalization_23[0][0]
____________________________________________________________________________________________________
conv2d_24 (Conv2D) (None, 28, 28, 512) 66048 activation_21[0][0]
____________________________________________________________________________________________________
batch_normalization_24 (BatchNor (None, 28, 28, 512) 2048 conv2d_24[0][0]
____________________________________________________________________________________________________
add_7 (Add) (None, 28, 28, 512) 0 batch_normalization_24[0][0]
activation_19[0][0]
____________________________________________________________________________________________________
activation_22 (Activation) (None, 28, 28, 512) 0 add_7[0][0]
____________________________________________________________________________________________________
conv2d_26 (Conv2D) (None, 14, 14, 256) 131328 activation_22[0][0]
____________________________________________________________________________________________________
batch_normalization_26 (BatchNor (None, 14, 14, 256) 1024 conv2d_26[0][0]
____________________________________________________________________________________________________
activation_23 (Activation) (None, 14, 14, 256) 0 batch_normalization_26[0][0]
____________________________________________________________________________________________________
conv2d_27 (Conv2D) (None, 14, 14, 256) 590080 activation_23[0][0]
____________________________________________________________________________________________________
batch_normalization_27 (BatchNor (None, 14, 14, 256) 1024 conv2d_27[0][0]
____________________________________________________________________________________________________
activation_24 (Activation) (None, 14, 14, 256) 0 batch_normalization_27[0][0]
____________________________________________________________________________________________________
conv2d_28 (Conv2D) (None, 14, 14, 1024) 263168 activation_24[0][0]
____________________________________________________________________________________________________
conv2d_25 (Conv2D) (None, 14, 14, 1024) 525312 activation_22[0][0]
____________________________________________________________________________________________________
batch_normalization_28 (BatchNor (None, 14, 14, 1024) 4096 conv2d_28[0][0]
____________________________________________________________________________________________________
batch_normalization_25 (BatchNor (None, 14, 14, 1024) 4096 conv2d_25[0][0]
____________________________________________________________________________________________________
add_8 (Add) (None, 14, 14, 1024) 0 batch_normalization_28[0][0]
batch_normalization_25[0][0]
____________________________________________________________________________________________________
activation_25 (Activation) (None, 14, 14, 1024) 0 add_8[0][0]
____________________________________________________________________________________________________
conv2d_29 (Conv2D) (None, 14, 14, 256) 262400 activation_25[0][0]
____________________________________________________________________________________________________
batch_normalization_29 (BatchNor (None, 14, 14, 256) 1024 conv2d_29[0][0]
____________________________________________________________________________________________________
activation_26 (Activation) (None, 14, 14, 256) 0 batch_normalization_29[0][0]
____________________________________________________________________________________________________
conv2d_30 (Conv2D) (None, 14, 14, 256) 590080 activation_26[0][0]
____________________________________________________________________________________________________
batch_normalization_30 (BatchNor (None, 14, 14, 256) 1024 conv2d_30[0][0]
____________________________________________________________________________________________________
activation_27 (Activation) (None, 14, 14, 256) 0 batch_normalization_30[0][0]
____________________________________________________________________________________________________
conv2d_31 (Conv2D) (None, 14, 14, 1024) 263168 activation_27[0][0]
____________________________________________________________________________________________________
batch_normalization_31 (BatchNor (None, 14, 14, 1024) 4096 conv2d_31[0][0]
____________________________________________________________________________________________________
add_9 (Add) (None, 14, 14, 1024) 0 batch_normalization_31[0][0]
activation_25[0][0]
____________________________________________________________________________________________________
activation_28 (Activation) (None, 14, 14, 1024) 0 add_9[0][0]
____________________________________________________________________________________________________
conv2d_32 (Conv2D) (None, 14, 14, 256) 262400 activation_28[0][0]
____________________________________________________________________________________________________
batch_normalization_32 (BatchNor (None, 14, 14, 256) 1024 conv2d_32[0][0]
____________________________________________________________________________________________________
activation_29 (Activation) (None, 14, 14, 256) 0 batch_normalization_32[0][0]
____________________________________________________________________________________________________
conv2d_33 (Conv2D) (None, 14, 14, 256) 590080 activation_29[0][0]
____________________________________________________________________________________________________
batch_normalization_33 (BatchNor (None, 14, 14, 256) 1024 conv2d_33[0][0]
____________________________________________________________________________________________________
activation_30 (Activation) (None, 14, 14, 256) 0 batch_normalization_33[0][0]
____________________________________________________________________________________________________
conv2d_34 (Conv2D) (None, 14, 14, 1024) 263168 activation_30[0][0]
____________________________________________________________________________________________________
batch_normalization_34 (BatchNor (None, 14, 14, 1024) 4096 conv2d_34[0][0]
____________________________________________________________________________________________________
add_10 (Add) (None, 14, 14, 1024) 0 batch_normalization_34[0][0]
activation_28[0][0]
____________________________________________________________________________________________________
activation_31 (Activation) (None, 14, 14, 1024) 0 add_10[0][0]
____________________________________________________________________________________________________
conv2d_35 (Conv2D) (None, 14, 14, 256) 262400 activation_31[0][0]
____________________________________________________________________________________________________
batch_normalization_35 (BatchNor (None, 14, 14, 256) 1024 conv2d_35[0][0]
____________________________________________________________________________________________________
activation_32 (Activation) (None, 14, 14, 256) 0 batch_normalization_35[0][0]
____________________________________________________________________________________________________
conv2d_36 (Conv2D) (None, 14, 14, 256) 590080 activation_32[0][0]
____________________________________________________________________________________________________
batch_normalization_36 (BatchNor (None, 14, 14, 256) 1024 conv2d_36[0][0]
____________________________________________________________________________________________________
activation_33 (Activation) (None, 14, 14, 256) 0 batch_normalization_36[0][0]
____________________________________________________________________________________________________
conv2d_37 (Conv2D) (None, 14, 14, 1024) 263168 activation_33[0][0]
____________________________________________________________________________________________________
batch_normalization_37 (BatchNor (None, 14, 14, 1024) 4096 conv2d_37[0][0]
____________________________________________________________________________________________________
add_11 (Add) (None, 14, 14, 1024) 0 batch_normalization_37[0][0]
activation_31[0][0]
____________________________________________________________________________________________________
activation_34 (Activation) (None, 14, 14, 1024) 0 add_11[0][0]
____________________________________________________________________________________________________
conv2d_38 (Conv2D) (None, 14, 14, 256) 262400 activation_34[0][0]
____________________________________________________________________________________________________
batch_normalization_38 (BatchNor (None, 14, 14, 256) 1024 conv2d_38[0][0]
____________________________________________________________________________________________________
activation_35 (Activation) (None, 14, 14, 256) 0 batch_normalization_38[0][0]
____________________________________________________________________________________________________
conv2d_39 (Conv2D) (None, 14, 14, 256) 590080 activation_35[0][0]
____________________________________________________________________________________________________
batch_normalization_39 (BatchNor (None, 14, 14, 256) 1024 conv2d_39[0][0]
____________________________________________________________________________________________________
activation_36 (Activation) (None, 14, 14, 256) 0 batch_normalization_39[0][0]
____________________________________________________________________________________________________
conv2d_40 (Conv2D) (None, 14, 14, 1024) 263168 activation_36[0][0]
____________________________________________________________________________________________________
batch_normalization_40 (BatchNor (None, 14, 14, 1024) 4096 conv2d_40[0][0]
____________________________________________________________________________________________________
add_12 (Add) (None, 14, 14, 1024) 0 batch_normalization_40[0][0]
activation_34[0][0]
____________________________________________________________________________________________________
activation_37 (Activation) (None, 14, 14, 1024) 0 add_12[0][0]
____________________________________________________________________________________________________
conv2d_41 (Conv2D) (None, 14, 14, 256) 262400 activation_37[0][0]
____________________________________________________________________________________________________
batch_normalization_41 (BatchNor (None, 14, 14, 256) 1024 conv2d_41[0][0]
____________________________________________________________________________________________________
activation_38 (Activation) (None, 14, 14, 256) 0 batch_normalization_41[0][0]
____________________________________________________________________________________________________
conv2d_42 (Conv2D) (None, 14, 14, 256) 590080 activation_38[0][0]
____________________________________________________________________________________________________
batch_normalization_42 (BatchNor (None, 14, 14, 256) 1024 conv2d_42[0][0]
____________________________________________________________________________________________________
activation_39 (Activation) (None, 14, 14, 256) 0 batch_normalization_42[0][0]
____________________________________________________________________________________________________
conv2d_43 (Conv2D) (None, 14, 14, 1024) 263168 activation_39[0][0]
____________________________________________________________________________________________________
batch_normalization_43 (BatchNor (None, 14, 14, 1024) 4096 conv2d_43[0][0]
____________________________________________________________________________________________________
add_13 (Add) (None, 14, 14, 1024) 0 batch_normalization_43[0][0]
activation_37[0][0]
____________________________________________________________________________________________________
activation_40 (Activation) (None, 14, 14, 1024) 0 add_13[0][0]
____________________________________________________________________________________________________
conv2d_45 (Conv2D) (None, 7, 7, 512) 524800 activation_40[0][0]
____________________________________________________________________________________________________
batch_normalization_45 (BatchNor (None, 7, 7, 512) 2048 conv2d_45[0][0]
____________________________________________________________________________________________________
activation_41 (Activation) (None, 7, 7, 512) 0 batch_normalization_45[0][0]
____________________________________________________________________________________________________
conv2d_46 (Conv2D) (None, 7, 7, 512) 2359808 activation_41[0][0]
____________________________________________________________________________________________________
batch_normalization_46 (BatchNor (None, 7, 7, 512) 2048 conv2d_46[0][0]
____________________________________________________________________________________________________
activation_42 (Activation) (None, 7, 7, 512) 0 batch_normalization_46[0][0]
____________________________________________________________________________________________________
conv2d_47 (Conv2D) (None, 7, 7, 2048) 1050624 activation_42[0][0]
____________________________________________________________________________________________________
conv2d_44 (Conv2D) (None, 7, 7, 2048) 2099200 activation_40[0][0]
____________________________________________________________________________________________________
batch_normalization_47 (BatchNor (None, 7, 7, 2048) 8192 conv2d_47[0][0]
____________________________________________________________________________________________________
batch_normalization_44 (BatchNor (None, 7, 7, 2048) 8192 conv2d_44[0][0]
____________________________________________________________________________________________________
add_14 (Add) (None, 7, 7, 2048) 0 batch_normalization_47[0][0]
batch_normalization_44[0][0]
____________________________________________________________________________________________________
activation_43 (Activation) (None, 7, 7, 2048) 0 add_14[0][0]
____________________________________________________________________________________________________
conv2d_48 (Conv2D) (None, 7, 7, 512) 1049088 activation_43[0][0]
____________________________________________________________________________________________________
batch_normalization_48 (BatchNor (None, 7, 7, 512) 2048 conv2d_48[0][0]
____________________________________________________________________________________________________
activation_44 (Activation) (None, 7, 7, 512) 0 batch_normalization_48[0][0]
____________________________________________________________________________________________________
conv2d_49 (Conv2D) (None, 7, 7, 512) 2359808 activation_44[0][0]
____________________________________________________________________________________________________
batch_normalization_49 (BatchNor (None, 7, 7, 512) 2048 conv2d_49[0][0]
____________________________________________________________________________________________________
activation_45 (Activation) (None, 7, 7, 512) 0 batch_normalization_49[0][0]
____________________________________________________________________________________________________
conv2d_50 (Conv2D) (None, 7, 7, 2048) 1050624 activation_45[0][0]
____________________________________________________________________________________________________
batch_normalization_50 (BatchNor (None, 7, 7, 2048) 8192 conv2d_50[0][0]
____________________________________________________________________________________________________
add_15 (Add) (None, 7, 7, 2048) 0 batch_normalization_50[0][0]
activation_43[0][0]
____________________________________________________________________________________________________
activation_46 (Activation) (None, 7, 7, 2048) 0 add_15[0][0]
____________________________________________________________________________________________________
conv2d_51 (Conv2D) (None, 7, 7, 512) 1049088 activation_46[0][0]
____________________________________________________________________________________________________
batch_normalization_51 (BatchNor (None, 7, 7, 512) 2048 conv2d_51[0][0]
____________________________________________________________________________________________________
activation_47 (Activation) (None, 7, 7, 512) 0 batch_normalization_51[0][0]
____________________________________________________________________________________________________
conv2d_52 (Conv2D) (None, 7, 7, 512) 2359808 activation_47[0][0]
____________________________________________________________________________________________________
batch_normalization_52 (BatchNor (None, 7, 7, 512) 2048 conv2d_52[0][0]
____________________________________________________________________________________________________
activation_48 (Activation) (None, 7, 7, 512) 0 batch_normalization_52[0][0]
____________________________________________________________________________________________________
conv2d_53 (Conv2D) (None, 7, 7, 2048) 1050624 activation_48[0][0]
____________________________________________________________________________________________________
batch_normalization_53 (BatchNor (None, 7, 7, 2048) 8192 conv2d_53[0][0]
____________________________________________________________________________________________________
add_16 (Add) (None, 7, 7, 2048) 0 batch_normalization_53[0][0]
activation_46[0][0]
____________________________________________________________________________________________________
activation_49 (Activation) (None, 7, 7, 2048) 0 add_16[0][0]
____________________________________________________________________________________________________
global_avg_pooling (GlobalAverag (None, 2048) 0 activation_49[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 10) 20490 global_avg_pooling[0][0]
____________________________________________________________________________________________________
activation_50 (Activation) (None, 10) 0 dense_1[0][0]
====================================================================================================
Total params: 23,608,202
Trainable params: 23,555,082
Non-trainable params: 53,120
____________________________________________________________________________________________________
Using Enhanced Data Generation
Found 4000 images belonging to 4 classes.
Found 800 images belonging to 4 classes.
JSON Mapping for the model classes saved to C:\Users\User\PycharmProjects\ImageAITest\pets\json\model_class.json
Number of experiments (Epochs) : 100
As the training progresses, you will see results like the following in the console:
Epoch 1/100
1/25 [>.............................] - ETA: 52s - loss: 2.3026 - acc: 0.2500
2/25 [=>............................] - ETA: 41s - loss: 2.3027 - acc: 0.1250
3/25 [==>...........................] - ETA: 37s - loss: 2.2961 - acc: 0.1667
4/25 [===>..........................] - ETA: 36s - loss: 2.2980 - acc: 0.1250
5/25 [=====>........................] - ETA: 33s - loss: 2.3178 - acc: 0.1000
6/25 [======>.......................] - ETA: 31s - loss: 2.3214 - acc: 0.0833
7/25 [=======>......................] - ETA: 30s - loss: 2.3202 - acc: 0.0714
8/25 [========>.....................] - ETA: 29s - loss: 2.3207 - acc: 0.0625
9/25 [=========>....................] - ETA: 27s - loss: 2.3191 - acc: 0.0556
10/25 [===========>..................] - ETA: 25s - loss: 2.3167 - acc: 0.0750
11/25 [============>.................] - ETA: 23s - loss: 2.3162 - acc: 0.0682
12/25 [=============>................] - ETA: 21s - loss: 2.3143 - acc: 0.0833
13/25 [==============>...............] - ETA: 20s - loss: 2.3135 - acc: 0.0769
14/25 [===============>..............] - ETA: 18s - loss: 2.3132 - acc: 0.0714
15/25 [=================>............] - ETA: 16s - loss: 2.3128 - acc: 0.0667
16/25 [==================>...........] - ETA: 15s - loss: 2.3121 - acc: 0.0781
17/25 [===================>..........] - ETA: 13s - loss: 2.3116 - acc: 0.0735
18/25 [====================>.........] - ETA: 12s - loss: 2.3114 - acc: 0.0694
19/25 [=====================>........] - ETA: 10s - loss: 2.3112 - acc: 0.0658
20/25 [=======================>......] - ETA: 8s - loss: 2.3109 - acc: 0.0625
21/25 [========================>.....] - ETA: 7s - loss: 2.3107 - acc: 0.0595
22/25 [=========================>....] - ETA: 5s - loss: 2.3104 - acc: 0.0568
23/25 [==========================>...] - ETA: 3s - loss: 2.3101 - acc: 0.0543
24/25 [===========================>..] - ETA: 1s - loss: 2.3097 - acc: 0.0625Epoch 00000: saving model to C:\Users\Moses\Documents\Moses\W7\AI\Custom Datasets\IDENPROF\idenprof-small-test\idenprof\models\model_ex-000_acc-0.100000.h5
25/25 [==============================] - 51s - loss: 2.3095 - acc: 0.0600 - val_loss: 2.3026 - val_acc: 0.1000
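Once training completes, you pick the saved model with the highest accuracy and use it together with the generated JSON file for custom prediction, as promised earlier. Below is a minimal sketch using the CustomImagePrediction class; the checkpoint file name is a placeholder for whichever model scored best in your own run:
# Minimal sketch: custom prediction with a model trained as above.
# The .h5 file name below is a placeholder; use your best checkpoint
# from the pets/models folder.
from imageai.Prediction.Custom import CustomImagePrediction
import os

execution_path = os.getcwd()

predictor = CustomImagePrediction()
predictor.setModelTypeAsResNet()
predictor.setModelPath(os.path.join(execution_path, "pets", "models", "model_ex-100_acc-0.900000.h5"))
predictor.setJsonPath(os.path.join(execution_path, "pets", "json", "model_class.json"))
predictor.loadModel(num_objects=4)

predictions, probabilities = predictor.predictImage(os.path.join(execution_path, "sample.jpg"), result_count=2)
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction, " : ", eachProbability)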
Continuous Model Training
ImageAI now allows you to continue training your custom model from a previously saved model. This is useful for incomplete training runs due to compute time limits or a large dataset, or when you intend to train your model further. Kindly note that continuous training is for using a previously saved model to train on the same dataset that the model was trained on. All you need to do is set the continue_from_model parameter to the path of the previously saved model in your trainModel() function. See the example code below.
from imageai.Prediction.Custom import ModelTraining
import os
trainer = ModelTraining()
trainer.setModelTypeAsDenseNet()
trainer.setDataDirectory("idenprof")
trainer.trainModel(num_objects=10, num_experiments=50, enhance_data=True, batch_size=8, show_network_summary=True, continue_from_model="idenprof_densenet-0.763500.h5")
Transfer Learning (Training from a pre-trained model)
With transfer learning, training starts from an existing pre-trained model rather than from scratch: the transfer_from_model parameter points to the pre-trained model file, while initial_num_objects states the number of classes that model was originally trained on (1000 for the ImageNet-trained ResNet50 used below).
from imageai.Prediction.Custom import ModelTraining
import os
trainer = ModelTraining()
trainer.setModelTypeAsResNet()
trainer.setDataDirectory("idenprof")
trainer.trainModel(num_objects=10, num_experiments=50, enhance_data=True, batch_size=32, show_network_summary=True,transfer_from_model="resnet50_weights_tf_dim_ordering_tf_kernels.h5", initial_num_objects=1000)
Prediction Speed
ImageAI now provides prediction speed modes for all image prediction tasks. The speed modes reduce prediction time by roughly 20% to 60% with only slight changes to the results, which remain accurate. The available prediction speeds are "normal" (default), "fast", "faster" and "fastest". All you need to do is state the speed mode you want when loading the model, as seen below.
prediction.loadModel(prediction_speed="fast")
To observe the differences between the prediction speeds, look below at each speed mode applied to the same multiple-image prediction, with the time taken to predict and the predictions given. The results below were obtained from predictions performed on a Windows 8 laptop with an Intel Celeron N2820 CPU running at 2.13 GHz.
Prediction Speed = "normal" , Prediction Time = 5.9 seconds
convertible : 52.459555864334106
sports_car : 37.61284649372101
pickup : 3.1751200556755066
car_wheel : 1.817505806684494
minivan : 1.7487050965428352
-----------------------
toilet_tissue : 13.99008333683014
jeep : 6.842949986457825
car_wheel : 6.71963095664978
seat_belt : 6.704962253570557
minivan : 5.861184373497963
-----------------------
bustard : 52.03368067741394
vulture : 20.936034619808197
crane : 10.620515048503876
kite : 10.20539253950119
white_stork : 1.6472270712256432
Prediction Speed = "fast" , Prediction Time = 3.4 seconds
sports_car : 55.5136501789093
pickup : 19.860029220581055
convertible : 17.88402795791626
tow_truck : 2.357563190162182
car_wheel : 1.8646160140633583
-----------------------
drum : 12.241223454475403
toilet_tissue : 10.96322312951088
car_wheel : 10.776633024215698
dial_telephone : 9.840480983257294
toilet_seat : 8.989936858415604
-----------------------
vulture : 52.81011462211609
bustard : 45.628002285957336
kite : 0.8065823465585709
goose : 0.3629807382822037
crane : 0.21266008261591196
-----------------------
Prediction Speed = "faster" , Prediction Time = 2.7 seconds
sports_car : 79.90474104881287
tow_truck : 9.751049429178238
convertible : 7.056044787168503
racer : 1.8735893070697784
car_wheel : 0.7379394955933094
-----------------------
oil_filter : 73.52778315544128
jeep : 11.926891654729843
reflex_camera : 7.9965077340602875
Polaroid_camera : 0.9798810817301273
barbell : 0.8661789819598198
-----------------------
vulture : 93.00530552864075
bustard : 6.636220961809158
kite : 0.15161558985710144
bald_eagle : 0.10513027664273977
crane : 0.05982434959150851
----------------------
- First Video Object Detection
- Custom Video Object Detection (Object Tracking)
- Camera / Live Stream Video Detection
- Video Analysis
- Detection Speed
- Hiding/Showing Object Name and Probability
- Frame Detection Intervals
- Video Detection Timeout (NEW)
ImageAI provides convenient, flexible and powerful methods to perform object detection on videos. The video object detection class supports only RetinaNet, YOLOv3 and TinyYOLOv3. This version of ImageAI provides commercial-grade video object detection features, which include, but are not limited to, device/IP camera inputs and per-frame, per-second, per-minute and whole-video analysis for storing in databases and/or real-time visualizations and future insights.
- RetinaNet (Size = 145 MB, high performance and accuracy, with longer detection time)
- YOLOv3 (Size = 237 MB, moderate performance and accuracy, with a moderate detection time)
- TinyYOLOv3 (Size = 34 MB, optimized for speed and moderate performance, with fast detection time)
Because video object detection is a compute-intensive task, we advise that you perform this experiment using a computer with an NVIDIA GPU and the GPU version of Tensorflow installed. Performing video object detection on a CPU will be slower than on an NVIDIA GPU-powered computer. You can use Google Colab for this experiment, as it has an NVIDIA K80 GPU available for free.
Once you download the object detection model file, copy it to the project folder where your .py files will be. Then create a Python file and give it a name; an example is FirstVideoObjectDetection.py. Then write the code below into the Python file.
FirstVideoObjectDetection.py
from imageai.Detection import VideoObjectDetection
import os
execution_path = os.getcwd()
detector = VideoObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath( os.path.join(execution_path , "resnet50_coco_best_v2.0.1.h5"))
detector.loadModel()
video_path = detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_detected"),
    frames_per_second=20, log_progress=True)
print(video_path)
Interestingly, ImageAI allows you to perform detection for one or more of the items above. That means you can customize the type of object(s) you want detected in the video. Let's take a look at the code below:
from imageai.Detection import VideoObjectDetection
import os
execution_path = os.getcwd()
detector = VideoObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath( os.path.join(execution_path , "resnet50_coco_best_v2.0.1.h5"))
detector.loadModel()
custom_objects = detector.CustomObjects(person=True, bicycle=True, motorcycle=True)
video_path = detector.detectCustomObjectsFromVideo(
    custom_objects=custom_objects,
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_custom_detected"),
    frames_per_second=20, log_progress=True)
print(video_path)
Camera / Live Stream Video Detection
ImageAI now allows live-video detection with support for camera inputs. Using OpenCV's VideoCapture() function, you can load live video streams from a device camera, cameras connected by cable, or IP cameras, and pass them into ImageAI's detectObjectsFromVideo() and detectCustomObjectsFromVideo() functions. All features that are supported for detecting objects in a video file are also available for detecting objects in a camera's live-video feed. Find below an example of detecting a live-video feed from the device camera.
from imageai.Detection import VideoObjectDetection
import os
import cv2
execution_path = os.getcwd()
camera = cv2.VideoCapture(0)
detector = VideoObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath(os.path.join(execution_path , "resnet50_coco_best_v2.0.1.h5"))
detector.loadModel()
video_path = detector.detectObjectsFromVideo(
    camera_input=camera,
    output_file_path=os.path.join(execution_path, "camera_detected_video"),
    frames_per_second=20, log_progress=True, minimum_percentage_probability=40)
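The Detection Speed, Frame Detection Intervals and Video Detection Timeout features listed earlier plug into these same functions. Below is a minimal sketch of how they combine, based on our reading of the ImageAI documentation; the parameter values are illustrative only:
# Minimal sketch (values are illustrative): combining detection speed,
# frame intervals, timeout and label display options.
from imageai.Detection import VideoObjectDetection
import os
import cv2

execution_path = os.getcwd()
camera = cv2.VideoCapture(0)

detector = VideoObjectDetection()
detector.setModelTypeAsTinyYOLOv3()
detector.setModelPath(os.path.join(execution_path, "yolo-tiny.h5"))
# trade a little accuracy for speed; "normal" is the default mode
detector.loadModel(detection_speed="fast")

video_path = detector.detectObjectsFromVideo(
    camera_input=camera,
    output_file_path=os.path.join(execution_path, "camera_fast_video"),
    frames_per_second=20,
    frame_detection_interval=5,   # run detection on every 5th frame only
    detection_timeout=120,        # stop after 120 seconds of camera feed
    display_object_name=True,     # set False to hide object labels
    display_percentage_probability=False,  # hide probabilities on the video
    minimum_percentage_probability=40,
    log_progress=True)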
Video Analysis
ImageAI now provides commercial-grade video analysis in the video object detection class, for both video file inputs and camera inputs. This feature allows developers to obtain deep insights into any video processed with ImageAI. These insights can be visualized in real time or stored in a NoSQL database for future review or analysis.
For video analysis, detectObjectsFromVideo() and detectCustomObjectsFromVideo() now allow you to supply your own functions, which will be executed for every frame, second and/or minute of the video processed, as well as a function that will be executed at the end of a video detection. Once these functions are supplied, they will receive raw but comprehensive analytical data on the index of the frame/second/minute, the objects detected (name, percentage_probability and box_points), the number of instances of each unique object detected, and the average number of occurrences of each unique object detected over a second/minute and the entire video.
To obtain the video analysis, all you need to do is write a function, give it the corresponding parameters it will receive, and pass the function name into the per_frame_function, per_second_function, per_minute_function and video_complete_function parameters of the detection function. Find below examples of video analysis functions.
from imageai.Detection import VideoObjectDetection
import os

execution_path = os.getcwd()

def forFrame(frame_number, output_array, output_count):
    print("FOR FRAME ", frame_number)
    print("Output for each object : ", output_array)
    print("Output count for unique objects : ", output_count)
    print("------------END OF A FRAME --------------")

def forSeconds(second_number, output_arrays, count_arrays, average_output_count):
    print("SECOND : ", second_number)
    print("Array for the outputs of each frame ", output_arrays)
    print("Array for output count for unique objects in each frame : ", count_arrays)
    print("Output average count for unique objects in the last second: ", average_output_count)
    print("------------END OF A SECOND --------------")

def forMinute(minute_number, output_arrays, count_arrays, average_output_count):
    print("MINUTE : ", minute_number)
    print("Array for the outputs of each frame ", output_arrays)
    print("Array for output count for unique objects in each frame : ", count_arrays)
    print("Output average count for unique objects in the last minute: ", average_output_count)
    print("------------END OF A MINUTE --------------")

video_detector = VideoObjectDetection()
video_detector.setModelTypeAsYOLOv3()
video_detector.setModelPath(os.path.join(execution_path, "yolo.h5"))
video_detector.loadModel()

video_detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_detected"),
    frames_per_second=10,
    per_second_function=forSeconds,
    per_frame_function=forFrame,
    per_minute_function=forMinute,
    minimum_percentage_probability=30
)
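The video_complete_function mentioned above works the same way. Below is a minimal sketch of such a function, assuming it receives the same three aggregate arguments as the per-minute function, but computed over the entire video:
# Minimal sketch (assumed signature, mirroring the per-minute callback):
def forFull(output_arrays, count_arrays, average_output_count):
    print("Array for the outputs of each frame ", output_arrays)
    print("Array for output count for unique objects in each frame : ", count_arrays)
    print("Output average count for unique objects in the entire video: ", average_output_count)
    print("------------END OF THE VIDEO --------------")

# passed alongside the other callbacks:
# video_detector.detectObjectsFromVideo(..., video_complete_function=forFull)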
When the detection starts on a video feed, be it from a video file or a camera input, the results will have the format below:
Results for the Frame function
FOR FRAME : 1
Output for each object : [{'box_points': (362, 295, 443, 355), 'name': 'boat', 'percentage_probability': 26.666194200515747}, {'box_points': (319, 245, 386, 296), 'name': 'boat', 'percentage_probability': 30.052968859672546}, {'box_points': (219, 308, 341, 358), 'name': 'boat', 'percentage_probability': 47.46982455253601}, {'box_points': (589, 198, 621, 241), 'name': 'bus', 'percentage_probability': 24.62330162525177}, {'box_points': (519, 181, 583, 263), 'name': 'bus', 'percentage_probability': 27.446213364601135}, {'box_points': (493, 197, 561, 272), 'name': 'bus', 'percentage_probability': 59.81815457344055}, {'box_points': (432, 187, 491, 240), 'name': 'bus', 'percentage_probability': 64.42965269088745}, {'box_points': (157, 225, 220, 255), 'name': 'car', 'percentage_probability': 21.150341629981995}, {'box_points': (324, 249, 377, 293), 'name': 'car', 'percentage_probability': 24.089913070201874}, {'box_points': (152, 275, 260, 327), 'name': 'car', 'percentage_probability': 30.341443419456482}, {'box_points': (433, 198, 485, 244), 'name': 'car', 'percentage_probability': 37.205660343170166}, {'box_points': (184, 226, 233, 260), 'name': 'car', 'percentage_probability': 38.52525353431702}, {'box_points': (3, 296, 134, 359), 'name': 'car', 'percentage_probability': 47.80363142490387}, {'box_points': (357, 302, 439, 359), 'name': 'car', 'percentage_probability': 47.94844686985016}, {'box_points': (481, 266, 546, 314), 'name': 'car', 'percentage_probability': 65.8585786819458}, {'box_points': (597, 269, 624, 318), 'name': 'person', 'percentage_probability': 27.125394344329834}]
Output count for unique objects : {'bus': 4, 'boat': 3, 'person': 1, 'car': 8}
------------END OF A FRAME --------------
Any function you pass to the per_frame_function parameter will be executed after each video frame is processed, and the following will be passed into it:
- Frame Index: This is the position number of the frame inside the video (e.g. 1 for the first frame and 20 for the twentieth frame).
- Output Array: This is an array of dictionaries. Each dictionary corresponds to a detected object in the image and contains the "name", "percentage_probability" and "box_points" (x1,y1,x2,y2) values of the object.
- Output Count: This is a dictionary that has the name of each unique object detected as its keys and the number of instances of the objects detected as the values.
Results for the Second function
FOR SECOND : 1
Array for the outputs of each frame [[{'box_points': (362, 295, 443, 355), 'name': 'boat', 'percentage_probability': 26.666194200515747}, {'box_points': (319, 245, 386, 296), 'name': 'boat', 'percentage_probability': 30.052968859672546}, {'box_points': (219, 308, 341, 358), 'name': 'boat', 'percentage_probability': 47.46982455253601}, {'box_points': (589, 198, 621, 241), 'name': 'bus', 'percentage_probability': 24.62330162525177}, {'box_points': (519, 181, 583, 263), 'name': 'bus', 'percentage_probability': 27.446213364601135}, {'box_points': (493, 197, 561, 272), 'name': 'bus', 'percentage_probability': 59.81815457344055}, {'box_points': (432, 187, 491, 240), 'name': 'bus', 'percentage_probability': 64.42965269088745}, {'box_points': (157, 225, 220, 255), 'name': 'car', 'percentage_probability': 21.150341629981995}, {'box_points': (324, 249, 377, 293), 'name': 'car', 'percentage_probability': 24.089913070201874}, {'box_points': (152, 275, 260, 327), 'name': 'car', 'percentage_probability': 30.341443419456482}, {'box_points': (433, 198, 485, 244), 'name': 'car', 'percentage_probability': 37.205660343170166}, {'box_points': (184, 226, 233, 260), 'name': 'car', 'percentage_probability': 38.52525353431702}, {'box_points': (3, 296, 134, 359), 'name': 'car', 'percentage_probability': 47.80363142490387}, {'box_points': (357, 302, 439, 359), 'name': 'car', 'percentage_probability': 47.94844686985016}, {'box_points': (481, 266, 546, 314), 'name': 'car', 'percentage_probability': 65.8585786819458}, {'box_points': (597, 269, 624, 318), 'name': 'person', 'percentage_probability': 27.125394344329834}],
[{'box_points': (316, 240, 384, 302), 'name': 'boat', 'percentage_probability': 29.594269394874573}, {'box_points': (361, 295, 441, 354), 'name': 'boat', 'percentage_probability': 36.11513376235962}, {'box_points': (216, 305, 340, 357), 'name': 'boat', 'percentage_probability': 44.89373862743378}, {'box_points': (432, 198, 488, 244), 'name': 'truck', 'percentage_probability': 22.914741933345795}, {'box_points': (589, 199, 623, 240), 'name': 'bus', 'percentage_probability': 20.545457303524017}, {'box_points': (519, 182, 583, 263), 'name': 'bus', 'percentage_probability': 24.467085301876068}, {'box_points': (492, 197, 563, 271), 'name': 'bus', 'percentage_probability': 61.112016439437866}, {'box_points': (433, 188, 490, 241), 'name': 'bus', 'percentage_probability': 65.08989334106445}, {'box_points': (352, 303, 442, 357), 'name': 'car', 'percentage_probability': 20.025095343589783}, {'box_points': (136, 172, 188, 195), 'name': 'car', 'percentage_probability': 21.571354568004608}, {'box_points': (152, 276, 261, 326), 'name': 'car', 'percentage_probability': 33.07966589927673}, {'box_points': (181, 225, 230, 256), 'name': 'car', 'percentage_probability': 35.111838579177856}, {'box_points': (432, 198, 488, 244), 'name': 'car', 'percentage_probability': 36.25282347202301}, {'box_points': (3, 292, 130, 360), 'name': 'car', 'percentage_probability': 67.55480170249939}, {'box_points': (479, 265, 546, 314), 'name': 'car', 'percentage_probability': 71.47912979125977}, {'box_points': (597, 269, 625, 318), 'name': 'person', 'percentage_probability': 25.903674960136414}],................,
[{'box_points': (133, 250, 187, 278), 'name': 'umbrella', 'percentage_probability': 21.518094837665558}, {'box_points': (154, 233, 218, 259), 'name': 'umbrella', 'percentage_probability': 23.687003552913666}, {'box_points': (348, 311, 425, 360), 'name': 'boat', 'percentage_probability': 21.015766263008118}, {'box_points': (11, 164, 137, 225), 'name': 'bus', 'percentage_probability': 32.20453858375549}, {'box_points': (424, 187, 485, 243), 'name': 'bus', 'percentage_probability': 38.043853640556335}, {'box_points': (496, 186, 570, 264), 'name': 'bus', 'percentage_probability': 63.83994221687317}, {'box_points': (588, 197, 622, 240), 'name': 'car', 'percentage_probability': 23.51653128862381}, {'box_points': (58, 268, 111, 303), 'name': 'car', 'percentage_probability': 24.538707733154297}, {'box_points': (2, 246, 72, 301), 'name': 'car', 'percentage_probability': 28.433072566986084}, {'box_points': (472, 273, 539, 323), 'name': 'car', 'percentage_probability': 87.17672824859619}, {'box_points': (597, 270, 626, 317), 'name': 'person', 'percentage_probability': 27.459821105003357}]
]
Array for output count for unique objects in each frame : [{'bus': 4, 'boat': 3, 'person': 1, 'car': 8},
{'truck': 1, 'bus': 4, 'boat': 3, 'person': 1, 'car': 7},
{'bus': 5, 'boat': 2, 'person': 1, 'car': 5},
{'bus': 5, 'boat': 1, 'person': 1, 'car': 9},
{'truck': 1, 'bus': 2, 'car': 6, 'person': 1},
{'truck': 2, 'bus': 4, 'boat': 2, 'person': 1, 'car': 7},
{'truck': 1, 'bus': 3, 'car': 7, 'person': 1, 'umbrella': 1},
{'bus': 4, 'car': 7, 'person': 1, 'umbrella': 2},
{'bus': 3, 'car': 6, 'boat': 1, 'person': 1, 'umbrella': 3},
{'bus': 3, 'car': 4, 'boat': 1, 'person': 1, 'umbrella': 2}]
Output average count for unique objects in the last second: {'truck': 0.5, 'bus': 3.7, 'umbrella': 0.8, 'boat': 1.3, 'person': 1.0, 'car': 6.6}
------------END OF A SECOND --------------
In the above result, the video was processed and saved at 10 frames per second (FPS). Any function you pass to the per_second_function parameter will be executed after each second of the video is processed, and the following will be passed into it:
- Second Index: This is the position number of the second inside the video (e.g. 1 for the first second and 20 for the twentieth second).
- Output Array: This is an array of arrays; each contained array and its position (array index + 1) corresponds to the equivalent frame in the last second of the video (in the above example, there are 10 arrays corresponding to the 10 frames contained in one second). Each contained array holds dictionaries, and each dictionary corresponds to a detected object in the image, containing the "name", "percentage_probability" and "box_points" (x1,y1,x2,y2) values of the object.
- Count Arrays: This is an array of dictionaries. Each dictionary and its position (array index + 1) corresponds to the equivalent frame in the last second of the video. Each dictionary has the name of each unique object detected as its keys and the number of instances of the objects detected as the values.
- Average Output Count: This is a dictionary that has the name of each unique object detected in the last second as its keys and the average number of instances of the objects detected across the number of frames as the values.
Results for the Minute function
The same set of 4 parameters returned for every second of the video processed is returned for every minute of the video processed. The difference is that the index returned corresponds to the minute index, the output_arrays is an array that contains FPS * 60 arrays (in the code example above, 10 frames per second (FPS) * 60 seconds = 600 frames = 600 arrays), the count_arrays is an array that contains FPS * 60 dictionaries (600 dictionaries in the example above), and the average_output_count is a dictionary that covers all the objects detected in all the frames contained in the last minute.
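To make the averaging concrete, below is a minimal sketch (our own illustration, not ImageAI's internal implementation) of how an average_output_count dictionary can be derived from the per-frame count dictionaries shown above:

def average_counts(count_arrays):
    # Sum the per-frame counts for each unique object name
    totals = {}
    for frame_counts in count_arrays:
        for name, count in frame_counts.items():
            totals[name] = totals.get(name, 0) + count
    # Divide by the number of frames to get the average per frame
    num_frames = len(count_arrays)
    return {name: total / num_frames for name, total in totals.items()}

Applied to the 10 per-frame dictionaries in the sample second above, this yields values such as {'bus': 3.7, 'car': 6.6, ...}, matching the average_output_count shown.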
Results for the Video Complete Function
ImageAI allows you to obtain a complete analysis of the entire video processed. All you need is to define a function like the forSecond or forMinute function and pass it to the video_complete_function parameter of your .detectObjectsFromVideo() or .detectCustomObjectsFromVideo() function. The same values as for the per_second_function and per_minute_function will be returned. The difference is that no index will be returned; the other 3 values will be returned, and they will cover all frames in the video. Below is a sample function:
def forFull(output_arrays, count_arrays, average_output_count):
    # Perform action on the 3 parameters returned into the function
    pass

video_detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_detected"),
    frames_per_second=10,
    video_complete_function=forFull,
    minimum_percentage_probability=30
)
FINAL NOTE ON VIDEO ANALYSIS : ImageAI allows you to obtain the detected video frame as a Numpy array in each per-frame, per-second and per-minute function. All you need to do is specify one more parameter in your function and set return_detected_frame=True in your detectObjectsFromVideo() or detectCustomObjectsFromVideo() function. Once this is set, the extra parameter you specified in your function will be the Numpy array of the detected frame. See a sample below:
def forFrame(frame_number, output_array, output_count, detected_frame):
    print("FOR FRAME " , frame_number)
    print("Output for each object : ", output_array)
    print("Output count for unique objects : ", output_count)
    print("Returned Objects is : ", type(detected_frame))
    print("------------END OF A FRAME --------------")

video_detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_detected"),
    frames_per_second=10,
    per_frame_function=forFrame,
    minimum_percentage_probability=30,
    return_detected_frame=True
)
Video Detection Speed
ImageAI now provides detection speeds for all video object detection tasks. These speed modes reduce detection time by roughly 20% - 80% while causing only slight changes to detection results. Coupled with lowering the minimum_percentage_probability parameter, results can closely match those at normal speed while reducing detection time drastically. The available detection speeds are "normal" (default), "fast", "faster", "fastest" and "flash". All you need to do is state the speed mode you desire when loading the model, as seen below.
detector.loadModel(detection_speed="fast")
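To see the trade-off in practice, here is a hedged sketch (file names and paths reuse the earlier examples) that runs the same video through each speed mode and times it:

import os
import time
from imageai.Detection import VideoObjectDetection

execution_path = os.getcwd()

for speed in ["normal", "fast", "faster", "fastest", "flash"]:
    detector = VideoObjectDetection()
    detector.setModelTypeAsYOLOv3()
    detector.setModelPath(os.path.join(execution_path, "yolo.h5"))
    detector.loadModel(detection_speed=speed)
    start = time.time()
    detector.detectObjectsFromVideo(
        input_file_path=os.path.join(execution_path, "traffic.mp4"),
        output_file_path=os.path.join(execution_path, "traffic_" + speed),
        frames_per_second=10,
        minimum_percentage_probability=30)
    print(speed, "took", round(time.time() - start, 1), "seconds")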
Frame Detection Intervals
The video object detection tasks above are optimized for frame-real-time detection, ensuring that objects in every frame of the video are detected. ImageAI gives you the option to adjust the video frame detection interval, which can speed up your video detection process. When calling .detectObjectsFromVideo() or .detectCustomObjectsFromVideo(), you can specify at which frame interval detections should be made. Setting the frame_detection_interval parameter to 5 or 20 means the object detections in the video will be updated every 5 frames or every 20 frames. If your output video frames_per_second is set to 20, the object detections will then be updated once every quarter of a second or once every second. This is useful in scenarios where the available compute is less powerful and the speeds of moving objects are low, ensuring you can have objects detected at second-real-time, half-a-second-real-time or whichever rate suits your needs. We conducted video object detection on the same input video we have been using, applying a frame_detection_interval value of 5. The results below were obtained from detections performed on an NVIDIA K80 GPU. See the results and links to download the videos below:
Link1: https://drive.google.com/file/d/10m6kXlXWGOGc-IPw6TsKxBi-SXXOH9xK/view
Link2: https://drive.google.com/open?id=17934YONVSXvd4uuJE0KwenEFks7fFYe4
Link3: https://drive.google.com/open?id=1cs_06CuhXDvZp3fHJWFpam-31eclOhc-
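For reference, a minimal sketch of the interval in use, reusing the video_detector from the examples above; with frames_per_second=20 and frame_detection_interval=5, detections refresh once every quarter of a second of output video:

video_detector.detectObjectsFromVideo(
    input_file_path=os.path.join(execution_path, "traffic.mp4"),
    output_file_path=os.path.join(execution_path, "traffic_interval_detected"),
    frames_per_second=20,
    frame_detection_interval=5,
    minimum_percentage_probability=30)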
Video Detection Timeout
ImageAI now allows you to set a timeout in seconds for detection of objects in videos or camera live feed. To set a timeout for your video detection code, all you need to do is specify the detection_timeout parameter in the detectObjectsFromVideo() function to the number of desired seconds. In the example code below, we set detection_timeout to 120 seconds (2 minutes).
from imageai.Detection import VideoObjectDetection
import os
import cv2
execution_path = os.getcwd()
camera = cv2.VideoCapture(0)
detector = VideoObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath(os.path.join(execution_path , "resnet50_coco_best_v2.0.1.h5"))
detector.loadModel()
video_path = detector.detectObjectsFromVideo(camera_input=camera,
output_file_path=os.path.join(execution_path, "camera_detected_video"),
frames_per_second=20,
log_progress=True,
minimum_percentage_probability=40,
detection_timeout=120)
ImageAI provides a simple and powerful approach to training custom object detection models using the YOLOv3 architecture, which you can load into the imageai.Detection.Custom.CustomObjectDetection class. This allows you to train your own model on any set of images that corresponds to any type of objects of interest. The training process generates a JSON file that maps the object names in your image dataset to the detection anchors, and creates many models. To choose the best model for your custom object detection task, an evaluateModel() function is provided to compute the mAP of your saved models, allowing you to state your desired IoU and Non-maximum Suppression values. You can then perform custom object detection using the model and the JSON file generated.
- Preparing your custom dataset
- Training on your custom Dataset
- Evaluating your saved detection models' mAP
To train a custom detection model, you need to prepare the images you want to use to train the model. You will prepare the images as follows:
- Decide the type of object(s) you want to detect and collect about 200 (minimum recommendation) or more pictures of each of the object(s)
- Once you have collected the images, you need to annotate the object(s) in the images. ImageAI uses the Pascal VOC format for image annotation. You can generate these annotations for your images using the easy-to-use LabelImg image annotation tool, available for Windows, Linux and MacOS systems. Open the link below to install the annotation tool. See: https://github.com/tzutalin/labelImg
- When you are done annotating your images, an annotation XML file will be generated for each image in your dataset. The annotation XML file describes each of the objects in the image. For example, if your image names are image(1).jpg, image(2).jpg, image(3).jpg up to image(z).jpg, the corresponding annotations will be image(1).xml, image(2).xml, image(3).xml up to image(z).xml. These files are what map the objects for detection into the generated JSON configuration.
- Once you have the annotations for all your images, create a folder for your dataset (e.g. headsets) and in this parent folder, create child folders train and validation
- In the train folder, create images and annotations sub-folders. Put about 70-80% of your dataset of each object's images in the images folder and put the corresponding annotations for these images in the annotations folder.
- In the validation folder, create images and annotations sub-folders. Put the rest of your dataset images in the images folder and put the corresponding annotations for these images in the annotations folder.
- Once you have done this, the structure of your image dataset folder should look like below (a short validation sketch follows the layout):
>> train >> images >> img_1.jpg (shows Object_1)
>> images >> img_2.jpg (shows Object_2)
>> images >> img_3.jpg (shows Object_1, Object_3 and Object_n)
>> annotations >> img_1.xml (describes Object_1)
>> annotations >> img_2.xml (describes Object_2)
>> annotations >> img_3.xml (describes Object_1, Object_3 and Object_n)
>> validation >> images >> img_151.jpg (shows Object_1, Object_3 and Object_n)
>> images >> img_152.jpg (shows Object_2)
>> images >> img_153.jpg (shows Object_1)
>> annotations >> img_151.xml (describes Object_1, Object_3 and Object_n)
>> annotations >> img_152.xml (describes Object_2)
>> annotations >> img_153.xml (describes Object_1)
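Before training, it can help to sanity-check the dataset. Below is a minimal sketch, assuming the folder layout above and standard Pascal VOC XML files; it verifies that every image has a matching annotation and lists the object names found:

import os
import xml.etree.ElementTree as ET

def check_split(dataset_dir, split):
    images_dir = os.path.join(dataset_dir, split, "images")
    annotations_dir = os.path.join(dataset_dir, split, "annotations")
    names = set()
    for image_file in os.listdir(images_dir):
        xml_path = os.path.join(annotations_dir, os.path.splitext(image_file)[0] + ".xml")
        if not os.path.exists(xml_path):
            print("Missing annotation for", image_file)
            continue
        # Pascal VOC stores each labelled object in an <object><name> tag
        for obj in ET.parse(xml_path).findall("object"):
            names.add(obj.find("name").text)
    print(split, ":", len(os.listdir(images_dir)), "images, objects found:", names)

for split in ["train", "validation"]:
    check_split("headsets", split)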
- You can train your custom detection model completely from scratch or use transfer learning (recommended for better accuracy) from a pre-trained YOLOv3 model. Also, we have provided a sample annotated Hololens and Headsets (Hololens and Oculus) dataset for you to train with. Download the pre-trained YOLOv3 model and the sample datasets in the link below.
Link to the source code of ImageAI by Moses Olafenwa: https://github.com/OlafenwaMoses/ImageAI/releases/tag/essential-v4
Before you start training your custom detection model, kindly take note of the following:
- The default batch_size is 4. If you are training with Google Colab, this will be fine. However, we advise you to use a more powerful GPU than the K80 offered by Colab, as the higher your batch_size (8, 16), the better the accuracy of your detection model.
- If you experience '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string' error in Google Colab, it is due to a bug in Tensorflow. You can solve this by installing Tensorflow GPU 1.13.1.
pip3 install tensorflow-gpu==1.13.1
Then your training code goes as follows:
from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
trainer.setTrainConfig(object_names_array=["hololens"], batch_size=4, num_experiments=200, train_from_pretrained_model="pretrained-yolov3.h5")
# In the above,when training for detecting multiple objects,
#set object_names_array=["object1", "object2", "object3",..."objectz"]
trainer.trainModel()
In the code above, we configured our detection model trainer. The parameters we stated in the function are as follows:
- object_names_array : this is an array containing the names of the objects in our dataset
- batch_size : this is to state the batch size for the training
- num_experiments : this is to state the number of times the network will train over all the training images, which is also called epochs
- train_from_pretrained_model (optional) : this is to train using transfer learning from a pre-trained YOLOv3 model
trainer.trainModel()
When you start the training, you should see something like this in the console:
Using TensorFlow backend.
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in hololens/json/detection_config.json
Training on: ['hololens']
Training with Batch Size: 4
Number of Experiments: 200
Epoch 1/200
- 733s - loss: 34.8253 - yolo_layer_1_loss: 6.0920 - yolo_layer_2_loss: 11.1064 - yolo_layer_3_loss: 17.6269 - val_loss: 20.5028 - val_yolo_layer_1_loss: 4.0171 - val_yolo_layer_2_loss: 7.5175 - val_yolo_layer_3_loss: 8.9683
Epoch 2/200
- 648s - loss: 11.1396 - yolo_layer_1_loss: 2.1209 - yolo_layer_2_loss: 4.0063 - yolo_layer_3_loss: 5.0124 - val_loss: 7.6188 - val_yolo_layer_1_loss: 1.8513 - val_yolo_layer_2_loss: 2.2446 - val_yolo_layer_3_loss: 3.5229
Epoch 3/200
- 674s - loss: 6.4360 - yolo_layer_1_loss: 1.3500 - yolo_layer_2_loss: 2.2343 - yolo_layer_3_loss: 2.8518 - val_loss: 7.2326 - val_yolo_layer_1_loss: 1.8762 - val_yolo_layer_2_loss: 2.3802 - val_yolo_layer_3_loss: 2.9762
Epoch 4/200
- 634s - loss: 5.3801 - yolo_layer_1_loss: 1.0323 - yolo_layer_2_loss: 1.7854 - yolo_layer_3_loss: 2.5624 - val_loss: 6.3730 - val_yolo_layer_1_loss: 1.4272 - val_yolo_layer_2_loss: 2.0534 - val_yolo_layer_3_loss: 2.8924
Epoch 5/200
- 645s - loss: 5.2569 - yolo_layer_1_loss: 0.9953 - yolo_layer_2_loss: 1.8611 - yolo_layer_3_loss: 2.4005 - val_loss: 6.0458 - val_yolo_layer_1_loss: 1.7037 - val_yolo_layer_2_loss: 1.9754 - val_yolo_layer_3_loss: 2.3667
Epoch 6/200
- 655s - loss: 4.7582 - yolo_layer_1_loss: 0.9959 - yolo_layer_2_loss: 1.5986 - yolo_layer_3_loss: 2.1637 - val_loss: 5.8313 - val_yolo_layer_1_loss: 1.1880 - val_yolo_layer_2_loss: 1.9962 - val_yolo_layer_3_loss: 2.6471
Epoch 7/200
Using TensorFlow backend.
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in hololens/json/detection_config.json
Training on: ['hololens']
Training with Batch Size: 4
Number of Experiments: 200
The above details signify the following:
- ImageAI autogenerates the best-matching detection anchor boxes for your image dataset.
- The anchor boxes and the object names mapping are saved in the json/detection_config.json path of the image dataset folder. Please note that for every new training you start, a new detection_config.json file is generated; it is only compatible with the models saved during that training.
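If you want to inspect this configuration, a hedged sketch is shown below; the exact keys inside detection_config.json (e.g. "labels", "anchors") are assumptions based on typical YOLOv3 training configs, not a documented contract:

import json

with open("hololens/json/detection_config.json") as f:
    config = json.load(f)
print(config.get("labels"))   # assumed key for the object names mapping
print(config.get("anchors"))  # assumed key for the generated anchor boxes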
Epoch 1/200
- 733s - loss: 34.8253 - yolo_layer_1_loss: 6.0920 - yolo_layer_2_loss: 11.1064 - yolo_layer_3_loss: 17.6269 - val_loss: 20.5028 - val_yolo_layer_1_loss: 4.0171 - val_yolo_layer_2_loss: 7.5175 - val_yolo_layer_3_loss: 8.9683
Epoch 2/200
- 648s - loss: 11.1396 - yolo_layer_1_loss: 2.1209 - yolo_layer_2_loss: 4.0063 - yolo_layer_3_loss: 5.0124 - val_loss: 7.6188 - val_yolo_layer_1_loss: 1.8513 - val_yolo_layer_2_loss: 2.2446 - val_yolo_layer_3_loss: 3.5229
Epoch 3/200
- 674s - loss: 6.4360 - yolo_layer_1_loss: 1.3500 - yolo_layer_2_loss: 2.2343 - yolo_layer_3_loss: 2.8518 - val_loss: 7.2326 - val_yolo_layer_1_loss: 1.8762 - val_yolo_layer_2_loss: 2.3802 - val_yolo_layer_3_loss: 2.9762
Epoch 4/200
- 634s - loss: 5.3801 - yolo_layer_1_loss: 1.0323 - yolo_layer_2_loss: 1.7854 - yolo_layer_3_loss: 2.5624 - val_loss: 6.3730 - val_yolo_layer_1_loss: 1.4272 - val_yolo_layer_2_loss: 2.0534 - val_yolo_layer_3_loss: 2.8924
Epoch 5/200
- 645s - loss: 5.2569 - yolo_layer_1_loss: 0.9953 - yolo_layer_2_loss: 1.8611 - yolo_layer_3_loss: 2.4005 - val_loss: 6.0458 - val_yolo_layer_1_loss: 1.7037 - val_yolo_layer_2_loss: 1.9754 - val_yolo_layer_3_loss: 2.3667
Epoch 6/200
- 655s - loss: 4.7582 - yolo_layer_1_loss: 0.9959 - yolo_layer_2_loss: 1.5986 - yolo_layer_3_loss: 2.1637 - val_loss: 5.8313 - val_yolo_layer_1_loss: 1.1880 - val_yolo_layer_2_loss: 1.9962 - val_yolo_layer_3_loss: 2.6471
Epoch 7/200
- The above signifies the progress of the training.
- For each experiment (epoch), the total training and validation losses (e.g. loss: 4.7582 and val_loss: 5.8313) are reported.
- For each drop in the loss after an experiment, a model is saved in the hololens/models folder. The lower the loss, the better the model.
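Since many checkpoints accumulate during training, below is a minimal sketch, assuming the saved files keep the detection_model-ex-XX--loss-Y.YY.h5 naming shown in this guide, that lists them from lowest to highest loss:

import os

models_dir = "hololens/models"

def loss_from_name(filename):
    # e.g. "detection_model-ex-60--loss-2.76.h5" -> 2.76
    return float(filename.split("--loss-")[1].replace(".h5", ""))

model_files = [f for f in os.listdir(models_dir) if f.endswith(".h5")]
for model_file in sorted(model_files, key=loss_from_name):
    print(loss_from_name(model_file), model_file)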
Once you are done training, you can refer to the custom object detection sections below for performing object detection with your custom detection model and detection_config.json file.
After training on your custom dataset, you can evaluate the mAP of your saved models by specifying your desired IoU and Non-maximum suppression values. See the details below:
- Single Model Evaluation: To evaluate a single model, simply use the example code below with the path to your dataset directory, the model file and the detection_config.json file saved during the training. In the example, we used an object_threshold of 0.3 ( percentage_score >= 30% ), IoU of 0.5 and Non-maximum suppression value of 0.5.
from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
metrics = trainer.evaluateModel(model_path="detection_model-ex-60--loss-2.76.h5", json_path="detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
Note that the trainer.evaluateModel() method prints the metrics to standard output as shown below, and also returns a list of dicts containing all the information that is displayed.
Sample Result:
Model File: hololens_detection_model-ex-09--loss-4.01.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9613
mAP: 0.9613
===============================
Here is what those metrics look like:
[{
'average_precision': {'hololens': 0.9613334437735249},
'map': 0.9613334437735249,
'model_file': 'hololens_detection_model-ex-09--loss-4.01.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
}]
- Multi Model Evaluation: To evaluate all your saved models, simply pass the path to the folder containing the models as the model_path, as seen in the example below:
from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
metrics = trainer.evaluateModel(model_path="hololens/models", json_path="hololens/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
Sample Result:
Model File: hololens/models/detection_model-ex-07--loss-4.42.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9231
mAP: 0.9231
===============================
Model File: hololens/models/detection_model-ex-10--loss-3.95.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9725
mAP: 0.9725
===============================
Model File: hololens/models/detection_model-ex-05--loss-5.26.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9204
mAP: 0.9204
===============================
Model File: hololens/models/detection_model-ex-03--loss-6.44.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.8120
mAP: 0.8120
===============================
Model File: hololens/models/detection_model-ex-18--loss-2.96.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9431
mAP: 0.9431
===============================
Model File: hololens/models/detection_model-ex-17--loss-3.10.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9404
mAP: 0.9404
===============================
Model File: hololens/models/detection_model-ex-08--loss-4.16.h5
Using IoU : 0.5
Using Object Threshold : 0.3
Using Non-Maximum Suppression : 0.5
hololens: 0.9725
mAP: 0.9725
===============================
Here is what those metrics look like:
[{
'average_precision': {'hololens': 0.9231334437735249},
'map': 0.9231334437735249,
'model_file': 'hololens/models/detection_model-ex-07--loss-4.42.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
},
{
'average_precision': {'hololens': 0.9725334437735249},
'map': 0.9725334437735249,
'model_file': 'hololens/models/detection_model-ex-10--loss-3.95.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
},
{
'average_precision': {'hololens': 0.92041334437735249},
'map': 0.92041334437735249,
'model_file': 'hololens/models/detection_model-ex-05--loss-5.26.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
},
{
'average_precision': {'hololens': 0.81201334437735249},
'map': 0.81201334437735249,
'model_file': 'hololens/models/detection_model-ex-03--loss-6.44.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
},
{
'average_precision': {'hololens': 0.94311334437735249},
'map': 0.94311334437735249,
'model_file': 'hololens/models/detection_model-ex-18--loss-2.96.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
},
{
'average_precision': {'hololens': 0.94041334437735249},
'map': 0.94041334437735249,
'model_file': 'hololens/models/detection_model-ex-17--loss-3.10.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
},
{
'average_precision': {'hololens': 0.97251334437735249},
'map': 0.97251334437735249,
'model_file': 'hololens/models/detection_model-ex-08--loss-4.16.h5',
'using_iou': 0.5,
'using_non_maximum_suppression': 0.5,
'using_object_threshold': 0.3
}
]
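Since evaluateModel() returns the list of dicts shown above, the best checkpoint can also be selected programmatically; here is a minimal sketch using the 'map' and 'model_file' keys from the sample output:

best = max(metrics, key=lambda entry: entry["map"])
print("Best model:", best["model_file"], "with mAP:", best["map"])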
- Custom Object Detection
- Object Detection, Extraction and Fine-tune
- Hiding/Showing Object Name and Probability
- Image Input & Output Types
ImageAI provides very convenient and powerful methods to perform object detection on images and extract each object from the image using your own custom YOLOv3 model and the corresponding detection_config.json generated during the training. To test the custom object detection, you can download a sample custom model we have trained to detect the Hololens headset and its detection_config.json file via the links below:
- https://github.com/OlafenwaMoses/ImageAI/releases/download/essential-v4/hololens-ex-60--loss-2.76.h5
- https://github.com/OlafenwaMoses/ImageAI/releases/download/essential-v4/detection_config.json
Once you download the custom object detection model file, copy the model file to your project folder where your .py files will be. Then create a Python file and give it a name; an example is FirstCustomDetection.py. Then write the code below into the Python file:
FirstCustomDetection.py
from imageai.Detection.Custom import CustomObjectDetection
detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("hololens-ex-60--loss-2.76.h5")
detector.setJsonPath("detection_config.json")
detector.loadModel()
detections = detector.detectObjectsFromImage(input_image="holo2.jpg", output_image_path="holo2-detected.jpg")
for detection in detections:
    print(detection["name"], " : ", detection["percentage_probability"], " : ", detection["box_points"])
Object Detection, Extraction and Fine-tune
from imageai.Detection.Custom import CustomObjectDetection
detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("hololens-ex-60--loss-2.76.h5")
detector.setJsonPath("detection_config.json")
detector.loadModel()
detections, extracted_objects_array = detector.detectObjectsFromImage(input_image="holo2.jpg", output_image_path="holo2-detected.jpg", extract_detected_objects=True)
for detection, object_path in zip(detections, extracted_objects_array):
    print(object_path)
    print(detection["name"], " : ", detection["percentage_probability"], " : ", detection["box_points"])
    print("---------------")
Hiding/Showing Object Name and Probability
detections = detector.detectObjectsFromImage(input_image=os.path.join(execution_path , "holo2.jpg"), output_image_path=os.path.join(execution_path , "holo2_nodetails.jpg"), minimum_percentage_probability=30, display_percentage_probability=False, display_object_name=False)
Image Input & Output Types
ImageAI custom object detection supports 2 types of input, which are a file path to an image file (default) and a numpy array of an image, as well as 2 types of output, which are an image file (default) and a numpy array. This means you can perform object detection in production applications such as a web server or a system that returns files in any of the above stated formats. To perform object detection with a numpy array input, you just need to state the input type in the .detectObjectsFromImage() function. See the example below.
detections = detector.detectObjectsFromImage(input_type="array", input_image=image_array , output_image_path=os.path.join(execution_path , "holo2-detected.jpg")) # For numpy array input type
To perform object detection with numpy array output you just need to state the output type in the .detectObjectsFromImage() function. See example below.
detected_image_array, detections = detector.detectObjectsFromImage(output_type="array", input_image="holo2.jpg" ) # For numpy array output type
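Both options can be combined; here is a minimal sketch, reusing the detector from the examples above, that reads an image into a numpy array with OpenCV, runs detection on the array, and receives the annotated frame back as an array:

import cv2

image_array = cv2.imread("holo2.jpg")  # numpy array in BGR order
detected_image_array, detections = detector.detectObjectsFromImage(
    input_type="array",
    input_image=image_array,
    output_type="array")
cv2.imwrite("holo2-detected.jpg", detected_image_array)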
- First Custom Video Object Detection
- Camera / Live Stream Video Detection
- Video Analysis
- Hiding/Showing Object Name and Probability
- Frame Detection Intervals
- Video Detection Timeout (NEW)
ImageAI provides convenient, flexible and powerful methods to perform object detection on videos using your own custom YOLOv3 model and the corresponding detection_config.json generated during the training.
Because video object detection is a compute-intensive task, we advise you to perform this experiment using a computer with an NVIDIA GPU and the GPU version of Tensorflow installed. Performing video object detection on a CPU will be slower than on an NVIDIA GPU powered computer. You can use Google Colab for this experiment, as it has an NVIDIA K80 GPU available for free.
FirstCustomVideoObjectDetection.py
from imageai.Detection.Custom import CustomVideoObjectDetection
import os
execution_path = os.getcwd()
video_detector = CustomVideoObjectDetection()
video_detector.setModelTypeAsYOLOv3()
video_detector.setModelPath("hololens-ex-60--loss-2.76.h5")
video_detector.setJsonPath("detection_config.json")
video_detector.loadModel()
video_detector.detectObjectsFromVideo(input_file_path="holo1.mp4",
output_file_path=os.path.join(execution_path, "holo1-detected3"),
frames_per_second=20,
minimum_percentage_probability=40,
log_progress=True)
Camera / Live Stream Video Detection
ImageAI now allows live-video detection with support for camera inputs. Using OpenCV's VideoCapture() function, you can load live-video streams from a device camera, cameras connected by cable or IP cameras, and pass them into ImageAI's detectObjectsFromVideo() function. All features that are supported for detecting objects in a video file are also available for detecting objects in a camera's live-video feed. Find below an example of detecting a live-video feed from the device camera.
from imageai.Detection.Custom import CustomVideoObjectDetection
import os
import cv2
execution_path = os.getcwd()
camera = cv2.VideoCapture(0)
video_detector = CustomVideoObjectDetection()
video_detector.setModelTypeAsYOLOv3()
video_detector.setModelPath("hololens-ex-60--loss-2.76.h5")
video_detector.setJsonPath("detection_config.json")
video_detector.loadModel()
video_detector.detectObjectsFromVideo(camera_input=camera,
output_file_path=os.path.join(execution_path, "holo1-detected3"),
frames_per_second=20,
minimum_percentage_probability=40,
log_progress=True)
The difference between the code above and the code for the detection of a video file is that we defined an OpenCV VideoCapture instance and loaded the default device camera into it. Then we passed the camera we defined into the camera_input parameter, which replaces the input_file_path that is used for a video file.
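For the IP-camera case mentioned above, a hedged sketch is shown below; the RTSP URL is a placeholder, and any stream that OpenCV's VideoCapture can open should work the same way as the device camera:

import cv2

ip_camera = cv2.VideoCapture("rtsp://username:password@192.168.1.64/stream1")
video_detector.detectObjectsFromVideo(camera_input=ip_camera,
    output_file_path=os.path.join(execution_path, "ip-camera-detected"),
    frames_per_second=20,
    minimum_percentage_probability=40,
    log_progress=True)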
Video Analysis
ImageAI now provides commercial-grade video analysis in the Custom Video Object Detection class, for both video file inputs and camera inputs. This feature allows developers to obtain deep insights into any video processed with ImageAI. These insights can be visualized in real-time or stored in a NoSQL database for future review or analysis.
For video analysis, detectObjectsFromVideo() now allows you to state your own defined functions which will be executed for every frame, second and/or minute of the video detected, as well as state a function that will be executed at the end of video detection. Once these functions are stated, they will receive raw but comprehensive analytical data on the index of the frame/second/minute, objects detected (name, percentage_probability and box_points), the number of instances of each unique object detected, and the average number of occurrences of each unique object detected over a second/minute and the entire video.
To obtain the video analysis, all you need to do is define a function, state the corresponding parameters it will be receiving, and pass the function name to the per_frame_function, per_second_function, per_minute_function and video_complete_function parameters in the detection function. Find below examples of video analysis functions.
def forFrame(frame_number, output_array, output_count):
    print("FOR FRAME " , frame_number)
    print("Output for each object : ", output_array)
    print("Output count for unique objects : ", output_count)
    print("------------END OF A FRAME --------------")

def forSeconds(second_number, output_arrays, count_arrays, average_output_count):
    print("SECOND : ", second_number)
    print("Array for the outputs of each frame ", output_arrays)
    print("Array for output count for unique objects in each frame : ", count_arrays)
    print("Output average count for unique objects in the last second: ", average_output_count)
    print("------------END OF A SECOND --------------")

def forMinute(minute_number, output_arrays, count_arrays, average_output_count):
    print("MINUTE : ", minute_number)
    print("Array for the outputs of each frame ", output_arrays)
    print("Array for output count for unique objects in each frame : ", count_arrays)
    print("Output average count for unique objects in the last minute: ", average_output_count)
    print("------------END OF A MINUTE --------------")
video_detector = CustomVideoObjectDetection()
video_detector.setModelTypeAsYOLOv3()
video_detector.setModelPath("hololens-ex-60--loss-2.76.h5")
video_detector.setJsonPath("detection_config.json")
video_detector.loadModel()
video_detector.detectObjectsFromVideo(camera_input=camera,
    output_file_path=os.path.join(execution_path, "holo1-detected3"),
    frames_per_second=20,
    per_second_function=forSeconds,
    per_frame_function=forFrame,
    per_minute_function=forMinute,
    minimum_percentage_probability=40,
    log_progress=True)
ImageAI also allows you to obtain a complete analysis of the entire video processed. All you need is to define a function like the forSecond or forMinute function and pass it to the video_complete_function parameter of your .detectObjectsFromVideo() function. The same values as for the per_second_function and per_minute_function will be returned. The difference is that no index will be returned; the other 3 values will be returned, and they will cover all frames in the video. Below is a sample function:
def forFull(output_arrays, count_arrays, average_output_count):
    # Perform action on the 3 parameters returned into the function
    pass

video_detector.detectObjectsFromVideo(camera_input=camera,
    output_file_path=os.path.join(execution_path, "holo1-detected3"),
    video_complete_function=forFull,
    minimum_percentage_probability=40,
    log_progress=True)
FINAL NOTE ON VIDEO ANALYSIS : ImageAI allows you to obtain the detected video frame as a Numpy array in each per-frame, per-second and per-minute function. All you need to do is specify one more parameter in your function and set return_detected_frame=True in your detectObjectsFromVideo() function. Once this is set, the extra parameter you specified in your function will be the Numpy array of the detected frame. See a sample below:
def forFrame(frame_number, output_array, output_count, detected_frame):
    print("FOR FRAME " , frame_number)
    print("Output for each object : ", output_array)
    print("Output count for unique objects : ", output_count)
    print("Returned Objects is : ", type(detected_frame))
    print("------------END OF A FRAME --------------")
video_detector.detectObjectsFromVideo(camera_input=camera,
output_file_path=os.path.join(execution_path, "holo1-detected3"),
per_frame_function=forFrame,
minimum_percentage_probability=40,
log_progress=True, return_detected_frame=True)
Frame Detection Intervals
The video object detection tasks above are optimized for frame-real-time detection, ensuring that objects in every frame of the video are detected. ImageAI gives you the option to adjust the video frame detection interval, which can speed up your video detection process. When calling .detectObjectsFromVideo(), you can specify at which frame interval detections should be made. Setting the frame_detection_interval parameter to 5 or 20 means the object detections in the video will be updated every 5 frames or every 20 frames. If your output video frames_per_second is set to 20, the object detections will then be updated once every quarter of a second or once every second. This is useful in scenarios where the available compute is less powerful and the speeds of moving objects are low, ensuring you can have objects detected at second-real-time, half-a-second-real-time or whichever rate suits your needs.
Custom Video Detection Timeout
ImageAI now allows you to set a timeout in seconds for detection of objects in videos or camera live feed. To set a timeout for your video detection code, all you need to do is specify the detection_timeout parameter in the detectObjectsFromVideo() function to the number of desired seconds. In the example code below, we set detection_timeout to 120 seconds (2 minutes).
from imageai.Detection.Custom import CustomVideoObjectDetection
import os
import cv2
execution_path = os.getcwd()
camera = cv2.VideoCapture(0)
video_detector = CustomVideoObjectDetection()
video_detector.setModelTypeAsYOLOv3()
video_detector.setModelPath("hololens-ex-60--loss-2.76.h5")
video_detector.setJsonPath("detection_config.json")
video_detector.loadModel()
video_detector.detectObjectsFromVideo(camera_input=camera,
output_file_path=os.path.join(execution_path, "holo1-detected3"),
frames_per_second=20, minimum_percentage_probability=40,
detection_timeout=120)
We have trained custom detection models on Nike and Adidas logos using ImageAI. For any queries, please contact:
Shilpa Jagadeeshappa
- Email: shilpatc25@gmail.com
- Facebook: https://www.facebook.com/shilpasj.25
- Github: https://github.com/ShilpaJagadeeshappa
- LinkedIn: https://www.linkedin.com/in/shilpa-s-j-5b0792130/
References
- https://github.com/OlafenwaMoses/ImageAI
- Somshubra Majumdar, DenseNet Implementation of the paper, Densely Connected Convolutional Networks in Keras https://github.com/titu1994/DenseNet
- Broad Institute of MIT and Harvard, Keras package for deep residual networks https://github.com/broadinstitute/keras-resnet
- Experiencor, Training and Detecting Objects with YOLO3 https://github.com/experiencor/keras-yolo3
- https://www.tensorflow.org/guide
- https://pythonprogramming.net/introduction-use-tensorflow-object-detection-api-tutorial/
- O Russakovsky et al, ImageNet Large Scale Visual Recognition Challenge https://arxiv.org/abs/1409.0575
- Fizyr, Keras implementation of RetinaNet object detection https://github.com/fizyr/keras-retinanet
- Forrest N. et al, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size https://arxiv.org/abs/1602.07360
- Francois Chollet, Keras code and weights files for popular deep learning models https://github.com/fchollet/deep-learning-models
- Kaiming H. et al, Deep Residual Learning for Image Recognition https://arxiv.org/abs/1512.03385
- Szegedy. et al, Rethinking the Inception Architecture for Computer Vision https://arxiv.org/abs/1512.00567
- Gao. et al, Densely Connected Convolutional Networks https://arxiv.org/abs/1608.06993
- TY Lin et al, Microsoft COCO: Common Objects in Context https://arxiv.org/abs/1405.0312
- Moses & John Olafenwa, A collection of images of identifiable professionals. https://github.com/OlafenwaMoses/IdenProf
- Joseph Redmon and Ali Farhadi, YOLOv3: An Incremental Improvement. https://arxiv.org/abs/1804.02767
- Tsung-Yi. et al, Focal Loss for Dense Object Detection https://arxiv.org/abs/1708.02002
- https://towardsdatascience.com/creating-your-own-object-detector-ad69dda69c85