# Object Detection with the Custom Vision Service

In computer vision, *object detection* goes one step beyond *image classification*. An image classification model assigns a label to a whole image based on its content. An object detection model also determines the location of one or more objects in the image, typically by predicting the coordinates of a *bounding box* that surrounds each object.

In this notebook, we'll build a model that can detect apples and carrots.

*Some of the images used in the lab are sourced from the free image library at <a href='http://www.pachd.com' target='_blank'>www.pachd.com</a>*

## Frameworks for Object Detection

There are many ways to create an object detection model, including machine learning frameworks like TensorFlow and the Microsoft Cognitive Toolkit (CNTK). The Microsoft *Custom Vision* cognitive service also provides an API that you can use for object detection, and that's what we'll use in this notebook.

First, we need to install the SDK:

In [None]:
# Install the Custom Vision SDK
! pip install azure-cognitiveservices-vision-customvision

## Sign up for a Custom Vision service account
Now you're ready to use the Custom Vision service. You'll need to sign up for an account and get your unique training and prediction keys so you can access it:
1. If you don't already have a Microsoft Azure subscription, sign up at https://azure.microsoft.com/en-us/. 
2. Go to https://customvision.ai/ and sign in using your Microsoft account. Then create a new custom vision service account in your Azure subscription.
3. Click the *Settings* (&#9881;) icon at the top right to view the *training* and *prediction* keys, endpoints, and resource IDs. Then assign the appropriate values to the variables below and run the cell:

In [None]:
TRAINING_KEY = 'YOUR_TRAINING_KEY'
PREDICTION_KEY = 'YOUR_PREDICTION_KEY'
ENDPOINT='https://YOUR_REGION.api.cognitive.microsoft.com' # Use just the base URL - https://<region>.api.cognitive.microsoft.com
PREDICTION_RESOURCE_ID="/subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/YOUR_RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/YOUR_ACCOUNT_Prediction"

## Create an Object Detection project
Next, you need to create a new Custom Vision project for object detection.

In [None]:
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateEntry, Region

trainer = CustomVisionTrainingClient(TRAINING_KEY, endpoint=ENDPOINT)

# Find the object detection domain
obj_detection_domain = next(domain for domain in trainer.get_domains() if domain.type == "ObjectDetection")

# Create a new project
print ("Creating project...")
project = trainer.create_project("Produce Detection", domain_id=obj_detection_domain.id)
print("Created project!")

## Create tags
Just as with image classification, we need to define the tags that identify the classes of object our model will detect; in this case, apples and carrots.

In [None]:
print("Creating tags...")
apple_tag = trainer.create_tag(project.id, "Apple")
carrot_tag = trainer.create_tag(project.id, "Carrot")
print('Created tags')

## Define object regions in training images
The key difference between object detection and image classification is that object detection predicts not just the *class* (tag) of each object in an image, but also a bounding box that encloses it. To accomplish this, you need to train the model using not just images but also the coordinates of the bounding boxes in those images.

There are various tools you can use to tag the bounding boxes in images, including the <a href='https://github.com/Microsoft/VoTT' target='_blank'>Visual Object Tagging Tool (VoTT)</a>.

In this example, the images have already been processed, and the coordinates of the bounding boxes for the apples and carrots they contain are stored in text files. Run the following cell to view the first few rows from each file.

In [None]:
import pandas as pd
df=pd.read_csv('data/apples.txt', sep=',',header=None)
print('Apples:')
print(df.head())
df=pd.read_csv('data/carrots.txt', sep=',',header=None)
print('Carrots:')
print(df.head())

The text files contain the following information for each tagged object:
* The file name
* The x pixel coordinate of the top-left corner
* The y pixel coordinate of the top-left corner
* The x pixel coordinate of the bottom-right corner
* The y pixel coordinate of the bottom-right corner
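
To make the five-column layout concrete, here is a minimal sketch that parses a few rows and groups the boxes by file name. The rows are hypothetical stand-ins rather than the lab's data, and the sketch assumes a file may contain more than one row for the same image, which the lab files may or may not do.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the same layout as the text files:
# file name, left x, top y, right x, bottom y (all in pixels)
sample = """data/apple_1.jpg,40,60,240,260
data/apple_2.jpg,10,20,110,150
data/apple_2.jpg,130,30,220,140
"""

# Collect every bounding box under the image it belongs to
boxes_by_file = defaultdict(list)
for file_name, left, top, right, bottom in csv.reader(io.StringIO(sample)):
    boxes_by_file[file_name].append((int(left), int(top), int(right), int(bottom)))

for file_name, boxes in boxes_by_file.items():
    print(file_name, boxes)
```

Grouping like this keeps all the objects for one image together, which is useful if you want to send each image once with several regions attached.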

## Prepare the training data
Depending on the framework you are using to train the model, you will need to convert the image bounding box data into the appropriate format. In this case, we're using the Custom Vision service, which needs the following data for each object in the training files:
* The tag ID that defines the class of the object.
* The normalized left pixel location
* The normalized top pixel location
* The normalized width of the object
* The normalized height of the object

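> The *normalized* values are pixel values expressed as a proportion of the image size: horizontal values are divided by the image width and vertical values by the image height, so each result lies between 0 and 1.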
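
As a worked example of the conversion, the sketch below turns pixel corner coordinates into the normalized left, top, width, and height values listed above. The function name `normalize_box` and the image size are hypothetical, chosen only for illustration.

```python
def normalize_box(left, top, right, bottom, img_w, img_h):
    """Convert pixel corner coordinates to normalized (left, top, width, height)."""
    n_left = left / img_w                # horizontal values scale by image width
    n_top = top / img_h                  # vertical values scale by image height
    n_width = (right - left) / img_w
    n_height = (bottom - top) / img_h
    return n_left, n_top, n_width, n_height

# Hypothetical 400 x 200 pixel image with a box from (100, 50) to (300, 150)
print(normalize_box(100, 50, 300, 150, img_w=400, img_h=200))
# → (0.25, 0.25, 0.5, 0.5)
```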

The following cell defines a function that calculates the normalized bounding box coordinates and dimensions for each object in a text file and adds them to a list of all objects. This function is then called for the apple and carrot files to define the complete collection of tagged objects.

In [None]:
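def tag_images(txtFile, tag):
    import pandas as pd
    from matplotlib import image as mpimg

    print("Tagging images from", txtFile)
    df = pd.read_csv(txtFile, sep=',', header=None)
    for row in df.values:
        # Each row holds a file name and the pixel coordinates of one bounding box
        file_name, l, t, r, b = row
        print(file_name)

        # Normalize the coordinates to the 0-1 range using the image dimensions
        # (shape[:2] also works for grayscale images, which have no channel axis)
        img = mpimg.imread(file_name)
        img_h, img_w = img.shape[:2]
        l = l / img_w
        r = r / img_w
        t = t / img_h
        b = b / img_h

        # The service expects a width and height rather than a bottom-right corner
        w = r - l
        h = b - t

        regions = [Region(tag_id=tag, left=l, top=t, width=w, height=h)]

        # Add the image and its region to the tagged_images_with_regions list defined below
        with open(file_name, mode="rb") as image_contents:
            tagged_images_with_regions.append(ImageFileCreateEntry(name=file_name, contents=image_contents.read(), regions=regions))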


tagged_images_with_regions = []
tag_images('data/apples.txt', apple_tag.id)
tag_images('data/carrots.txt', carrot_tag.id)
print('Adding images to project...')
trainer.create_images_from_files(project.id, images=tagged_images_with_regions)
print("Images added!")
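
The cell above sends every image in a single request. The service limits how many images one request may contain, so a larger data set would need to be uploaded in batches. The sketch below assumes a limit of 64 images per request; treat that number as an assumption and check the current service documentation. The `batches` helper is hypothetical and not part of the SDK.

```python
def batches(items, size=64):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in list; in the notebook this would be tagged_images_with_regions
entries = list(range(150))

for batch in batches(entries):
    # In the notebook, each batch would be passed to the upload call instead
    print(len(batch))
```

With 150 stand-in entries, this yields batches of 64, 64, and 22 items.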

## Train the project
Now that we have the object class and location information, we can train the project and create a predictive model.

In [None]:
import time

print ("Training...")
iteration = trainer.train_project(project.id)
while (iteration.status != "Completed"):
    iteration = trainer.get_iteration(project.id, iteration.id)
    print ("Training status: " + iteration.status)
    time.sleep(1)

# The iteration is now trained. Publish it to the project endpoint
trainer.publish_iteration(project.id, iteration.id, "First Iteration", PREDICTION_RESOURCE_ID)

# Make it the default iteration
iteration = trainer.update_iteration(project_id= project.id, iteration_id=iteration.id, name= "First Iteration", is_default=True)

print ("Trained!")
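
The polling loop above keeps running until the status is "Completed", so it would never exit if training ended in some other state. Here is a minimal sketch of a more defensive wait, using a stand-in status function instead of the real service. The helper name `wait_for_status` and the "Failed" status value are assumptions made for illustration.

```python
import time

def wait_for_status(get_status, done="Completed", failed=("Failed",),
                    interval=1.0, timeout=600.0):
    """Poll get_status() until it returns `done`, a failure status, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == done:
            return status
        if status in failed:
            raise RuntimeError("Training ended with status: " + status)
        if time.monotonic() >= deadline:
            raise TimeoutError("Gave up waiting; last status: " + status)
        time.sleep(interval)

# Stand-in for the service: reports "Training" twice, then "Completed"
fake_statuses = iter(["Training", "Training", "Completed"])
print(wait_for_status(lambda: next(fake_statuses), interval=0.01))
```

In the notebook, the stand-in could be replaced by a function that calls `trainer.get_iteration(project.id, iteration.id)` and returns its `status`.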

## Detect objects in a new image
Now that you have a trained model, you can use it to find objects in new images.

In [None]:
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from matplotlib import image as mpimg
from matplotlib import pyplot as plt
from PIL import Image, ImageDraw, ImageFont
import numpy as np
%matplotlib inline

# Now there is a trained endpoint that can be used to make a prediction
test_img_file = "data/produce_test.jpg"
test_img = Image.open(test_img_file)
test_img_w, test_img_h = test_img.size  # PIL reports size as (width, height), whatever the number of channels
    
predictor = CustomVisionPredictionClient(PREDICTION_KEY, endpoint=ENDPOINT)

# Open the sample image and get back the prediction results.
with open(test_img_file, mode="rb") as test_data:
    results = predictor.detect_image(project.id, iteration.name, test_data)


# Display the results.
draw = ImageDraw.Draw(test_img)

for prediction in results.predictions:
    if (prediction.probability*100) > 50:
        print ("\t" + prediction.tag_name + ": {0:.2f}%".format(prediction.probability * 100))  
        left = prediction.bounding_box.left * test_img_w 
        top = prediction.bounding_box.top * test_img_h 
        height = prediction.bounding_box.height * test_img_h
        width =  prediction.bounding_box.width * test_img_w
        points = ((left,top), (left+width,top), (left+width,top+height), (left,top+height),(left,top))
        draw.line(points, fill='magenta', width=20)
        plt.annotate(prediction.tag_name + ": {0:.2f}%".format(prediction.probability * 100),(left,top-20))
        
plt.imshow(test_img)
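
The drawing loop above does two things at once: it filters predictions by probability and scales the normalized bounding boxes back to pixels. The sketch below separates that logic into a small function so it can be checked without calling the service. The `Box` and `Prediction` tuples are stand-ins for the SDK's prediction objects, and `to_pixel_boxes` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins that mimic the attributes used in the cell above
Box = namedtuple("Box", "left top width height")
Prediction = namedtuple("Prediction", "tag_name probability bounding_box")

def to_pixel_boxes(predictions, img_w, img_h, threshold=0.5):
    """Keep predictions above `threshold` and return pixel corner coordinates."""
    results = []
    for p in predictions:
        if p.probability <= threshold:
            continue
        b = p.bounding_box
        left = round(b.left * img_w)
        top = round(b.top * img_h)
        right = round((b.left + b.width) * img_w)
        bottom = round((b.top + b.height) * img_h)
        results.append((p.tag_name, p.probability, (left, top, right, bottom)))
    return results

# Hypothetical predictions for a 400 x 200 pixel image
sample = [
    Prediction("Apple", 0.92, Box(0.25, 0.25, 0.5, 0.5)),
    Prediction("Carrot", 0.30, Box(0.1, 0.1, 0.2, 0.2)),
]
for tag, prob, box in to_pixel_boxes(sample, img_w=400, img_h=200):
    print(f"{tag}: {prob:.2%} at {box}")
```

Only the apple passes the 50% threshold here; the low-probability carrot is dropped, just as it would be in the drawing loop.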


## Learn More:
<a href='https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/python-tutorial-od' target='_blank'>Building an Object Detection solution with the Custom Vision service</a>