In [1]:
import os
import time
import cv2
import numpy as np
from model.yolo_model import YOLO

Using TensorFlow backend.


In [4]:
def process_image(img):
    '''Resize, reduce and expand image.
    
    # Argument :
        img : original image
        
    # Returns : 
        image : processed image, shape (1, 416, 416, 3)
    '''
    
    image = cv2.resize(img, (416, 416), interpolation=cv2.INTER_CUBIC)
    image = np.array(image, dtype='float32')
    image /= 255
    image = np.expand_dims(image, axis=0)
    
    return image

This function takes an original image as input and performs several processing steps on it.

Resize: The cv2.resize() function resizes the input image to a fixed size of (416, 416). It uses cv2.INTER_CUBIC (bicubic) interpolation, which is slower than cv2.INTER_LINEAR but usually gives smoother results.

Data type conversion: The resized image is then converted to a NumPy array with dtype='float32'. This step ensures that the image data type is suitable for further processing, typically in machine learning or computer vision tasks.

Normalization: The pixel values of the image are normalized by dividing each pixel value by 255. This step scales the pixel values to the range [0, 1], which is a common practice for neural network input data.

Dimension expansion: Finally, np.expand_dims() is used to add an extra dimension to the processed image array. This is often necessary when dealing with batch processing in deep learning frameworks, where the first dimension represents the batch size. In this case, axis=0 indicates that the new dimension is added as the first dimension.

The function returns the processed image with shape (1, 416, 416, 3), where 1 represents the batch size, 416 and 416 represent the image dimensions, and 3 represents the number of color channels (in BGR order when the image was loaded with cv2.imread).
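The normalization and dimension-expansion steps can be checked in isolation. The following is a minimal sketch that skips the resize and uses a random array as a stand-in for an already resized 416x416 frame; the random input and the variable names are assumptions made for illustration.

```python
import numpy as np

# Assumption: random pixel values stand in for a frame already resized to 416x416.
rng = np.random.default_rng(0)
resized = rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8)

# Convert to float32, scale to [0, 1], then add the batch dimension at axis 0.
batch = np.expand_dims(resized.astype('float32') / 255, axis=0)

print(batch.shape)  # (1, 416, 416, 3)
print(batch.dtype)  # float32
print(batch.min() >= 0.0 and batch.max() <= 1.0)  # True
```

Printing the shape and dtype like this is a quick way to confirm the array matches what the model expects before calling predict.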

In [5]:
def get_classes(file):
    '''
    Get classes name
    
    # Argument:
        file : classes name for database.
        
    # Returns:
        class_names : List, classes name.
    '''
    
    with open(file) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    
    return class_names

This function is designed to read a file containing class names and return them as a list.

Opening the File: The function takes a parameter file, which is the path to the file containing the class names. It then opens this file using a with statement, which ensures that the file is properly closed after its suite finishes, even if an exception is raised.

Reading Class Names: Inside the with block, f.readlines() reads all the lines from the file and returns them as a list of strings. Each string represents a class name.

Stripping Newlines: Since readlines() includes the newline character \n at the end of each line, the list of class names may contain trailing whitespace. The list comprehension [c.strip() for c in class_names] is used to remove leading and trailing whitespace (including newlines) from each class name.

Returning Class Names: The function then returns the list of class names.

Overall, this function is a simple utility for extracting class names from a file. It's commonly used in machine learning and computer vision tasks where class names are stored externally, such as in a text file.
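The read-and-strip behaviour can be tried without the real class file. This sketch writes a small temporary file and reads it back; the helper name read_class_names and the three sample labels are hypothetical and only mirror the logic described above.

```python
import os
import tempfile


def read_class_names(path):
    """Hypothetical helper mirroring the logic above: one class name per line."""
    with open(path) as f:
        return [line.strip() for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'classes.txt')
    # Sample content with a trailing newline on every line and stray spaces on one.
    with open(path, 'w') as f:
        f.write('person\n  bicycle \ncar\n')
    names = read_class_names(path)

print(names)  # ['person', 'bicycle', 'car']
```

The position of each name in the list is what the model's class indices refer to, so the order of lines in the file matters.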

In [6]:
'''
This function, draw(image, boxes, scores, classes, all_classes), is intended to draw bounding boxes around detected objects 
on an image along with their corresponding class labels and confidence scores.
'''
def draw(image, boxes, scores, classes, all_classes):
    '''
    Draw the boxes on the image.
    
    # Argument : 
        image: original image
        boxes: ndarray, boxes of objects
        classes: ndarray, classes of objects
        scores: ndarray, scores of objects
        all_classes: all classes name
    '''
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box
        
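        # Despite the names, `top` holds the x coordinate and `left` the y coordinate
        # of the box's top-left corner, as passed to cv2.rectangle below.
        top = max(0, np.floor(x + 0.5).astype(int))
        left = max(0, np.floor(y + 0.5).astype(int))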
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))
        
        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(all_classes[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 1,
                    cv2.LINE_AA
                   )
        
        print('class: {0}, score: {1:.2f}'.format(all_classes[cl], score))
        print('box coordinate x,y,w,h: {0}'.format(box))
        
    print()

This function is designed to draw bounding boxes around detected objects on an image, along with their class labels and confidence scores.

    Iterating Over Detected Objects:
    It iterates over each detected object using a for loop and zip(boxes, scores, classes), where boxes, scores, and classes are arrays containing the bounding box coordinates, confidence scores, and class indices of detected objects, respectively.

    Bounding Box Coordinates:
    For each detected object, it unpacks the bounding box coordinates (x, y, w, h) from box.
    It calculates the top-left (top, left) and bottom-right (right, bottom) coordinates of the bounding box. These coordinates are rounded to the nearest integer, clipped to the image bounds, and then used to draw the rectangle.

    Drawing Bounding Boxes:
    Using cv2.rectangle(), it draws a rectangle around the detected object on the image. The rectangle is drawn using coordinates (top, left) and (right, bottom) with a blue color (255, 0, 0) and a thickness of 2 pixels.

    Annotating with Class Label and Score:
    It annotates the bounding box with the class label and confidence score using cv2.putText(). The label and score are formatted with the class name obtained from all_classes[cl] (where cl is the class index) and the confidence score. This annotation is placed slightly above the top-left corner of the bounding box with a red color (0, 0, 255) and a font scale of 0.6.

    Printing Information:
    For each detected object, it prints the class name and confidence score along with the bounding box coordinates.

    Blank Line:
    Finally, it prints a blank line to separate the output for different images if multiple images are being processed.

    This function is commonly used in object detection tasks to visualize the detected objects on an image for validation or debugging purposes.
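The rounding and clipping of box coordinates can be sketched on its own with NumPy. The helper name box_to_corners, the 640x480 image size, and the second (oversized) box are assumptions made for illustration; the first box is the person detection printed later in this notebook.

```python
import numpy as np


def box_to_corners(box, image_shape):
    """Hypothetical helper: convert (x, y, w, h) to clipped integer corners."""
    x, y, w, h = box
    height, width = image_shape[:2]
    # floor(v + 0.5) rounds to the nearest integer; max/min keep the box inside the image.
    x1 = max(0, int(np.floor(x + 0.5)))
    y1 = max(0, int(np.floor(y + 0.5)))
    x2 = min(width, int(np.floor(x + w + 0.5)))
    y2 = min(height, int(np.floor(y + h + 0.5)))
    return x1, y1, x2, y2


image_shape = (480, 640, 3)  # assumed image size (height, width, channels)

person_box = np.array([187.71986008, 84.54503465, 91.60767555, 304.36179113])
print(box_to_corners(person_box, image_shape))  # (188, 85, 279, 389)

oversized_box = (-3.2, 10.4, 700.0, 50.0)  # made-up box that spills past both sides
print(box_to_corners(oversized_box, image_shape))  # (0, 10, 640, 60)
```

Returning the corners as (x1, y1, x2, y2) keeps the x and y roles explicit, which makes it harder to mix them up when passing points to a drawing call.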

In [12]:
def detect_image(image, yolo, all_classes):
    '''
    Use YOLO v3 to detect images.
    
    # Argument:
        image: original image
        yolo: YOLO, yolo model
        all_classes: all classes names
        
    # Returns:
        image: processed image
    '''
    
    pimage = process_image(image)
    
    #The image is first processed into a format that the YOLO model 
    #expects, typically involving resizing, normalizing, and possibly 
    #transforming the image to fit the input shape for the YOLO model.
    
    start = time.time()
    boxes, classes, scores = yolo.predict(pimage, image.shape)
    end = time.time()
    
    print('time: {0:.2f}s'.format(end - start))

    # Here, the function calls YOLO's predict method on the processed image
    # (pimage). This method returns three things:
    #
    # boxes: coordinates of bounding boxes for detected objects.
    # classes: the class indices (e.g., 0 for 'person', 2 for 'car' in the COCO list).
    # scores: confidence scores indicating how certain the model is that an object
    #         of that class is present in the box.
    #
    # The time taken for the detection is measured using time.time() to calculate
    # the elapsed seconds between the start and end of the predict call.
    
    if boxes is not None:
        draw(image, boxes, scores, classes, all_classes)
        # If YOLO detects any objects (i.e., boxes is not None),
        # the draw function is called. This function will:
        # - draw bounding boxes on the original image,
        # - label each box with the class name (using all_classes),
        # - include the confidence score next to the label.

    return image
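The timing and the "boxes is not None" guard can be exercised without loading the real network. The sketch below uses a stub in place of the model; StubDetector and run_detection are hypothetical names, and the stub only imitates the three-value return described in the comments above.

```python
import time

import numpy as np


class StubDetector:
    """Hypothetical stand-in for the YOLO model; it returns canned results."""

    def __init__(self, detections):
        self.detections = detections

    def predict(self, batch, shape):
        return self.detections


def run_detection(batch, shape, model):
    """Time one predict call and return (detections, elapsed_seconds)."""
    start = time.time()
    boxes, classes, scores = model.predict(batch, shape)
    elapsed = time.time() - start
    if boxes is None:  # nothing detected, so there is nothing to draw
        return [], elapsed
    return list(zip(boxes, classes, scores)), elapsed


batch = np.zeros((1, 416, 416, 3), dtype='float32')
shape = (480, 640, 3)  # assumed original image shape

nothing = StubDetector((None, None, None))
one_hit = StubDetector((np.array([[10.0, 20.0, 30.0, 40.0]]),
                        np.array([0]),
                        np.array([0.9])))

empty_result, _ = run_detection(batch, shape, nothing)
single_result, elapsed = run_detection(batch, shape, one_hit)
print(len(empty_result), len(single_result))  # 0 1
```

A stub like this is handy for testing the drawing and video code paths, since a real predict call can take many seconds on a CPU.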

In [13]:
def detect_video(video, yolo, all_classes):
    """Use yolo v3 to detect video.

    # Argument:
        video: video file.
        yolo: YOLO, yolo model.
        all_classes: all classes name.
    """
    # video_path joins the base folder, the "test" subfolder, and the video filename.
    video_path = os.path.join("C:/10-Computer_Vision/YOLOv3/videos", "test", video)
    # cv2.VideoCapture(video_path) initializes the video capture object,
    # which allows reading frames from the video.
    camera = cv2.VideoCapture(video_path)
    # cv2.namedWindow creates a window named "detection" for displaying the video.
    cv2.namedWindow("detection", cv2.WINDOW_AUTOSIZE)

    # Prepare for saving the detected video.
    # sz stores the dimensions of the video (width and height),
    # which are retrieved using OpenCV's camera.get method.
    sz = (int(camera.get(cv2.CAP_PROP_FRAME_WIDTH)),
          int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    # cv2.VideoWriter_fourcc(*'mpeg') specifies the codec to be used
    # for saving the video (in this case, the 'mpeg' codec).
    fourcc = cv2.VideoWriter_fourcc(*'mpeg')

    # vout = cv2.VideoWriter() creates a video writer object, and vout.open opens
    # a new video file where the processed frames (with object detections drawn)
    # are written at 20 frames per second.
    vout = cv2.VideoWriter()
    vout.open(os.path.join("C:/10-Computer_Vision/YOLOv3/videos", "res", video), fourcc, 20, sz, True)

    while True:
        res, frame = camera.read()

        if not res:
            break

        # detect_image(frame, yolo, all_classes) (the one we discussed earlier)
        # runs detection on the frame; the processed frame (with bounding boxes
        # and labels) is stored in image.
        image = detect_image(frame, yolo, all_classes)
        # cv2.imshow("detection", image) displays the frame in the "detection" window.
        cv2.imshow("detection", image)

        # Save the video frame by frame
        vout.write(image)

        # Stop when the Esc key (code 27) is pressed.
        if cv2.waitKey(110) & 0xff == 27:
            break

    vout.release()
    camera.release()
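How the path components combine is worth checking when building video_path. This sketch uses the standard-library ntpath module, which applies Windows path rules on any operating system; the file name clip.mp4 and the C:/unrelated folder are hypothetical.

```python
import ntpath  # Windows flavour of os.path, usable on any operating system

base = "C:/10-Computer_Vision/YOLOv3/videos"

# Relative components are appended to the base folder.
relative = ntpath.join(base, "test", "clip.mp4")

# An absolute component discards everything that came before it,
# so the first argument here has no effect on the result.
absolute = ntpath.join("C:/unrelated", base + "/test", "clip.mp4")

# normpath makes the separators uniform so the two results can be compared.
print(ntpath.normpath(relative) == ntpath.normpath(absolute))  # True
```

Passing only relative folder names after the base keeps the join easy to read and avoids silently dropping earlier components.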


In [14]:
yolo = YOLO(0.6, 0.5)
file = 'C:/10-Computer_Vision/YOLOv3/data/coco_classes.txt'
all_classes = get_classes(file)

# Detecting Images

In [15]:
f = 'person.jpg'
image = cv2.imread('C:/10-Computer_Vision/YOLOv3/images/test/' + f)
image = detect_image(image, yolo, all_classes)
# cv2.imwrite returns True on success and False if the file could not be written
# (for example, when the target folder does not exist).
cv2.imwrite('C:/10-Computer_Vision/YOLOv3/images/res/' + f, image)

time: 37.50s
class: person, score: 1.00
box coordinate x,y,w,h: [187.71986008  84.54503465  91.60767555 304.36179113]
class: horse, score: 1.00
box coordinate x,y,w,h: [396.45050049 137.2821641  215.70493698 208.54855251]
class: dog, score: 1.00
box coordinate x,y,w,h: [ 61.28296375 263.38459015 145.58371544  88.16218674]



False