# OpenCV's Capture Feature
First, let's make a program that captures the camera feed and displays it in a new window.


The variable 'capture' is the video source. 'cv.VideoCapture()' accepts either a string or an integer: a string is the file name of a video (for example, one in the same directory as the code), and an integer selects a live camera such as the webcam (generally 0). The code below uses 0; I use 1 in the final program since I want to use my other webcam.

The following will be in a loop:

The variable 'frame' holds one frame from the webcam, returned by 'capture.read()' together with a boolean that tells whether the read succeeded.

The function 'cv.imshow()' displays that frame in a window. Its parameters are the title of the window and the 'frame' to display.

The function 'cv.waitKey()' waits the given number of milliseconds for a key press and returns the code of the key that was pressed. If it is set to 0, it waits indefinitely. The value 27 is the code of the 'esc' key.

To exit the program, press the 'esc' key.

After the loop, 'capture.release()' releases the camera and 'cv.destroyAllWindows()' closes the windows.

In [1]:
import cv2 as cv
# Integer for live camera, or string for file
capture = cv.VideoCapture(0)
while True:
    # 'isTrue' is a boolean that tells if a frame was successfully captured
    # 'frame' is the frame of the video
    isTrue, frame = capture.read()
    if not isTrue:  # Stop if no frame could be read
        break
    # 'imshow' shows the frame in a window titled 'Webcam'
    cv.imshow('Webcam', frame)
    if cv.waitKey(1) == 27:  # Break while loop using esc
        break
# Release the camera and close the windows
capture.release()
cv.destroyAllWindows()

# Import our Class IDs
Next, we import the files holding our weights, config, and names. The weights and config determine whether a portion of the image is potentially an identified object. Since we are using YOLO, we will use the COCO set of 80 common objects like 'person', 'cup', 'cell phone', etc. The network refers to each object only by index and has no name associated with that index, so we build the list of names below.

We first open the text file named 'coco.names' using 'with open() as f', with the first parameter being the file name and the second parameter being the mode ('r' for reading). The variable 'classNames' stores the list of names, each at its correct index. 'f.read()' reads the whole file as a string, then '.split' splits the string into a list at each newline.

In [2]:
# Text file of names
importClassNames = 'coco.names'
# List
classNames = []
# File to list
with open(importClassNames, 'r') as f:
    # strip() drops the trailing newline so the list does not end with an empty string
    classNames = f.read().strip().split('\n')

In [3]:
print(classNames) #check the list

['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa', 'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
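
To illustrate how a class ID from the network maps back to a label, here is a minimal sketch that parses the text of a names file held in a string instead of reading it from disk. The three-line sample text and the `parse_class_names` helper are hypothetical and only for illustration; the real 'coco.names' file has 80 lines.

```python
def parse_class_names(text):
    """Split names-file text into a list, one label per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Hypothetical sample: three lines of a names file, ending with a newline
sample_text = "person\nbicycle\ncar\n"
sample_names = parse_class_names(sample_text)

# A class ID is simply an index into the list
class_id = 2
print(sample_names[class_id])
```

Skipping blank lines means a trailing newline in the file does not add an empty entry at the end of the list.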


# Implement Machine Learning Algorithm
Next, we load the pretrained network using the two files found on this website: the config and the weights. We will be using the variable 'net' later.

In [3]:
import cv2 as cv
# File for the config
modelConfiguration = 'yolov3.cfg'
# File for the weights of the objects (this is already pretrained)
modelWeights = 'yolov3.weights'
# Load the pretrained network from the config and weights files
net = cv.dnn.readNet(modelConfiguration, modelWeights)

# Converting 'frame' to be Used in the Algorithm
Right now the frame cannot be processed by the network, so we convert it into something called a 'blob' using 'cv.dnn.blobFromImage()'. It takes a few parameters: image, scale factor, size, mean, swap RB, crop. We then send the blob into the network as its input. The 'layerNames' variable gets the names of all the layers in the network.
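
To make the 'blob' less abstract, the sketch below reproduces in plain NumPy roughly what the conversion does to an image that is already the target size: it scales the pixel values, swaps the blue and red channels, and reorders the axes from height, width, channels to batch, channels, height, width. This is an approximation for illustration only, not OpenCV's implementation; it skips the resize step, and the `to_blob` helper and the random test image are hypothetical.

```python
import numpy as np

def to_blob(image, scale=1 / 255, swap_rb=True):
    """Approximate the blob conversion for an image already at the network size."""
    blob = image.astype(np.float32) * scale        # scale pixels to the range 0..1
    if swap_rb:
        blob = blob[:, :, ::-1]                    # reverse channel order: BGR -> RGB
    blob = blob.transpose(2, 0, 1)                 # height, width, channels -> channels, height, width
    return np.ascontiguousarray(blob[np.newaxis])  # add the batch axis in front

# Hypothetical 320x320 BGR frame filled with random pixel values
rng = np.random.default_rng(0)
fake_frame = rng.integers(0, 256, size=(320, 320, 3), dtype=np.uint8)
fake_blob = to_blob(fake_frame)
print(fake_blob.shape, fake_blob.dtype)
```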

Next, we have to find the output layers using the function 'net.getUnconnectedOutLayers()'. Note that it only returns the indices of the output layers, so we use each index to look up the name in 'layerNames'. These indices start at 1 rather than 0, so we shift each index by '-1'. Then we pass 'outputNames' to 'net.forward()', which runs the network and returns the outputs of those layers.

The outputs now hold a large number of candidate detections, each with a bounding box and its probabilities, so the next part of the program filters those results to be displayed later.

'outputs' is what we will need to display the results.
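
The index shift can be sketched without loading a network. Depending on the OpenCV version, 'getUnconnectedOutLayers()' may return the indices nested (like `[[4], [6]]`) or flat (like `[4, 6]`), so flattening first handles both. The `output_layer_names` helper, the layer names, and the indices below are made up for illustration.

```python
import numpy as np

def output_layer_names(layer_names, out_layer_indices):
    """Map 1-based output layer indices to their names."""
    flat = np.array(out_layer_indices).flatten()
    return [layer_names[int(i) - 1] for i in flat]

# Hypothetical layer list and 1-based indices of the output layers
layers = ['conv_0', 'relu_1', 'conv_2', 'yolo_3', 'conv_4', 'yolo_5']
print(output_layer_names(layers, [[4], [6]]))  # nested form
print(output_layer_names(layers, [4, 6]))      # flat form
```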

In [5]:
import numpy as np
def getOutputs(frame):
    # Scale pixels to 0..1, resize to widthheight x widthheight, and swap the R and B channels
    blob = cv.dnn.blobFromImage(frame, 1/255, (widthheight, widthheight), [0, 0, 0], swapRB=True, crop=False)
    # The network takes the blob as input and
    # gives you outputs for the objects on the screen
    net.setInput(blob)
    layerNames = net.getLayerNames() # gets the names of all the layers
    #print(layerNames)
    #print(net.getUnconnectedOutLayers()) # this gets the indices of the output layers (does not use '0')
    # flatten() handles OpenCV versions that return the indices nested ([[i], ...]) or flat ([i, ...])
    outputNames = [layerNames[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
    #print(outputNames)
    # send the image through the network and collect the outputs
    outputs = net.forward(outputNames) # outputs holds 3 numpy arrays, one per output layer
    #print(len(outputs))         # 3
    #print(type(outputs))        # list
    #print(type(outputs[0]))     # numpy.ndarray (matrix)
    #print(outputs[0].shape)      # (300, 85)
    #print(outputs[0][0])
    #print(outputs[1].shape)      # (1200, 85)
    #print(outputs[2].shape)      # (4800, 85)
    # each row is one box: center x, center y, width, height, confidence object present, then the probability of each class
    return outputs


# Creating What to be Displayed
Now that we have the results, we will make an algorithm to display the most probable object for each bounding box. First, note that each output row consists of 85 values instead of the 80 in our name list. The first 5 values are as follows: center x, center y, width, height, and the confidence that an object is present. The remaining 80 values are the probabilities of each class. We make a new variable 'scores' consisting of only those class probabilities. We then get the class ID with the highest probability, and get that probability value by indexing 'scores' with the class ID.
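
As a small worked example of that slicing, the sketch below builds one made-up detection row of 85 values and pulls out the best class. The numbers are invented for illustration; index 41 corresponds to 'cup' in the list printed earlier.

```python
import numpy as np

# Hypothetical detection row: 4 box values, 1 objectness value, 80 class scores
row = np.zeros(85, dtype=np.float32)
row[:5] = [0.5, 0.5, 0.2, 0.4, 0.9]   # center x, center y, width, height, objectness
row[5 + 41] = 0.8                      # pretend class 41 scored highest

scores = row[5:]                       # the 80 class scores
class_id = int(np.argmax(scores))      # index of the best class
confidence = float(scores[class_id])   # its score
print(class_id, round(confidence, 2))
```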

The next thing is to check whether the probability of that object is higher than our threshold. If it is, we make a bounding box for it. A bounding box needs 4 values: the x and y coordinates of its top-left corner, its width, and its height. The network reports the center of the box as a fraction of the image size, so the code below scales the values to pixels and shifts the center x and y to the top-left corner.
The variable 'outputBox' holds the indices of the boxes that remain after the unnecessary overlapping boxes are removed by 'cv.dnn.NMSBoxes()', which is controlled by the variable 'nmsThreshold'. Finally, we draw the boxes on the frame using 'cv.rectangle()' and 'cv.putText()', and show the frame in the window using 'cv.imshow()'.
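
To show what the coordinate conversion and the box filtering do, here is a simplified sketch in plain Python. It is not OpenCV's implementation of 'cv.dnn.NMSBoxes()': it is a basic greedy non-maximum suppression that keeps the highest-scoring box and drops any box whose overlap (intersection over union) with a kept box exceeds the threshold. The helper names and the sample detections are hypothetical.

```python
def to_pixel_box(cx, cy, bw, bh, frame_w, frame_h):
    """Convert a normalized center-based box to pixel [x, y, w, h] with a top-left corner."""
    w, h = int(bw * frame_w), int(bh * frame_h)
    x, y = int(cx * frame_w - w / 2), int(cy * frame_h - h / 2)
    return [x, y, w, h]

def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def simple_nms(boxes, scores, iou_threshold):
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Hypothetical detections on a 640x480 frame: two overlapping boxes and one separate box
boxes = [to_pixel_box(0.5, 0.5, 0.2, 0.4, 640, 480),
         to_pixel_box(0.51, 0.5, 0.2, 0.4, 640, 480),
         to_pixel_box(0.15, 0.2, 0.1, 0.1, 640, 480)]
kept = simple_nms(boxes, [0.9, 0.8, 0.7], iou_threshold=0.6)
print(boxes[0], kept)
```

In this sample the second box overlaps the first almost completely, so only the first and third boxes remain. A lower threshold removes more boxes, and a higher one lets more overlapping boxes through.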

In [4]:

def displayBox(outputs):
    # 'frame' is the global variable set in the main loop
    height, width, channels = frame.shape  # height and width of the frame; channels is not needed
    boundingBox = []   # list of bounding boxes
    classIDs = []      # class ID of each box
    confidenceValues = []   # confidence value of each box
    
    for output in outputs:
        for values in output:  # values contains the box values and the predictions
            scores = values[5:] # only contains the class predictions
            classID = np.argmax(scores)  # gets the class ID with the highest score
            confidence = scores[classID] # gets the confidence value
            if confidence > confidenceThreshold:  # checks if it passes the threshold
                w, h = int(values[2] * width), int(values[3] * height) # width and height of bounding box
                x, y = int(values[0] * width - w/2), int(values[1] * height - h/2)  # top-left x y position
                boundingBox.append([x, y, w, h])  # add bounding box
                classIDs.append(classID) # add class ID
                confidenceValues.append(float(confidence)) # add confidence value
    
    # removes overlapping boxes, based on the nms threshold
    outputBox = cv.dnn.NMSBoxes(boundingBox, confidenceValues, confidenceThreshold, nmsThreshold)
    # flatten() handles OpenCV versions that return the indices nested or flat
    for index in np.array(outputBox, dtype=int).flatten():
        color = colors[classIDs[index]]
        x, y, w, h = boundingBox[index][:4]
        cv.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        #(image, point 1, point 2, color, thickness)
        cv.putText(frame, f'{classNames[classIDs[index]].capitalize()} {int(confidenceValues[index] * 100)}%',
                   (x,y-10), font, 0.6, color, 2)
        #(image, text, position, font, size, color, thickness)

# Main Function
Finally, we add these features to the capture program we made earlier, along with some constants that we can change later if we feel like it.

In [6]:
import numpy as np
widthheight = 320             # width and height (in pixels) the frame is resized to for the network
confidenceThreshold = 0.5     # minimum confidence for a detection to count as that object
nmsThreshold = 0.6            # overlap threshold used to decrease the number of bounding boxes displayed
font = cv.FONT_HERSHEY_SIMPLEX # font for the output name
colors = np.random.uniform(0, 255, size = (len(classNames),3)) # a different random color for every name
capture = cv.VideoCapture(1) # use webcam with integer, or path
while True:
    ifTrue, frame = capture.read()
    if not ifTrue: # stop if no frame could be read
        break
    # get the network outputs for the frame, then draw the boxes on it
    displayBox(getOutputs(frame))
    cv.imshow('cam', frame)
    if cv.waitKey(1) == 27: # break while loop using esc
        break
capture.release() # release the camera
cv.destroyAllWindows() # close the windows