# You Only Look Once (YOLO) Library

The You Only Look Once (YOLO) library is used for image recognition tasks. The YOLOv8 documentation can be found at the following links.
- https://docs.ultralytics.com/
- https://github.com/ultralytics/ultralytics
- https://docs.ultralytics.com/models/yolov8

This library includes various models that have already been pretrained for different tasks (https://docs.ultralytics.com/models/yolov8/#key-features). These tasks include, but are not limited to, image detection, facial recognition, gender classification, object blurring, gender detection, counting objects, alerts, and use with custom datasets.

In [1]:
# To install YOLOv8, install the ultralytics library by uncommenting and running the line below.
#pip install ultralytics

# Python Library Loading

In [2]:
import os
# Setting hardware transforms to 0 fixes very slow USB webcam loading in OpenCV (Windows MSMF backend). Set it before importing cv2.
os.environ["OPENCV_VIDEOIO_MSMF_ENABLE_HW_TRANSFORMS"] = "0"

In [3]:
from ultralytics import YOLO
import cv2 as cv
import math

import ipywidgets as widgets
import threading
from IPython.display import display, Image

# YOLO Pre-Trained Models

The YOLO library includes several sets of models that vary in size (e.g., yolov8s.pt, yolov8m.pt, yolov8l.pt, yolov8x.pt). These pretrained models have been trained on large sets of pictures, and typically the larger the model, the better the accuracy. A pretrained model can be deployed to read a picture or a frame of a video feed and recognize the objects that it has been trained on.
- https://docs.ultralytics.com/models/yolov8/#supported-tasks-and-modes
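
As a rough illustration of the naming pattern above, the sketch below builds a weights filename from a size letter. The helper name `yolov8_weights` and the `SIZES` tuple are made up for this example; the `n` (nano) variant is included because it is the one this notebook loads.

```python
# Size letters used in YOLOv8 detection weight filenames, smallest to largest.
SIZES = ("n", "s", "m", "l", "x")

def yolov8_weights(size: str) -> str:
    """Return the detection weights filename for a size letter (hypothetical helper)."""
    if size not in SIZES:
        raise ValueError(f"Unknown size {size!r}; expected one of {SIZES}")
    return f"yolov8{size}.pt"

print([yolov8_weights(s) for s in SIZES])
```

Swapping the size letter is then a one-character change when comparing speed against accuracy on the same camera feed.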

In [4]:
model = YOLO('./input_data/yolov8n.pt') # Loads the pre-trained yolov8n.pt model from the specified directory, downloading it first if it does not exist.

#model = YOLO('./output_data/yolov8n_custom_2-PPE-COLAB-3epochs.pt') # Custom Dataset model trained in later steps.

The model.names dictionary lists the categories that the YOLOv8 model loaded above was trained to recognize. It contains a total of 80 classes, including but not limited to people, vehicles, animals, fruits, kitchen appliances, and various other objects.

In [5]:
print(model.names) # Prints classes within the Pre-Trained Model.

{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microw

In [6]:
print(model.names.keys()) # Prints keys within the Pre-Trained Model.

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79])


In [7]:
# To select only specific classes modify this cell. 
keys_to_select = model.names.keys() # Selects all keys
classes_selected = [] # Starting master list for selected classes.

for key in keys_to_select: # Iterates through keys to extract classes as a list.
    if key in model.names:
        classes_selected.append(model.names[key]) #Appends classes to master list of selected class.

classes_selected

['person',
 'bicycle',
 'car',
 'motorcycle',
 'airplane',
 'bus',
 'train',
 'truck',
 'boat',
 'traffic light',
 'fire hydrant',
 'stop sign',
 'parking meter',
 'bench',
 'bird',
 'cat',
 'dog',
 'horse',
 'sheep',
 'cow',
 'elephant',
 'bear',
 'zebra',
 'giraffe',
 'backpack',
 'umbrella',
 'handbag',
 'tie',
 'suitcase',
 'frisbee',
 'skis',
 'snowboard',
 'sports ball',
 'kite',
 'baseball bat',
 'baseball glove',
 'skateboard',
 'surfboard',
 'tennis racket',
 'bottle',
 'wine glass',
 'cup',
 'fork',
 'knife',
 'spoon',
 'bowl',
 'banana',
 'apple',
 'sandwich',
 'orange',
 'broccoli',
 'carrot',
 'hot dog',
 'pizza',
 'donut',
 'cake',
 'chair',
 'couch',
 'potted plant',
 'bed',
 'dining table',
 'toilet',
 'tv',
 'laptop',
 'mouse',
 'remote',
 'keyboard',
 'cell phone',
 'microwave',
 'oven',
 'toaster',
 'sink',
 'refrigerator',
 'book',
 'clock',
 'vase',
 'scissors',
 'teddy bear',
 'hair drier',
 'toothbrush']
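
To work with only a few classes, one option is to translate class names into their numeric ids. The sketch below uses a small stand-in dictionary instead of the full model.names so that it runs without loading a model; the helper name `ids_for` is made up for this example. Ultralytics inference calls accept a `classes` argument that takes a list of class ids, which is assumed here to be where the result would be used.

```python
# Stand-in for model.names (only a few of the 80 entries), so no model is needed here.
names = {0: 'person', 1: 'bicycle', 2: 'car', 16: 'dog'}

def ids_for(names, wanted):
    """Return the sorted class ids whose names appear in `wanted` (hypothetical helper)."""
    missing = set(wanted) - set(names.values())
    if missing:
        raise KeyError(f"Unknown class names: {sorted(missing)}")
    return sorted(key for key, name in names.items() if name in wanted)

selected_ids = ids_for(names, ['person', 'dog'])
print(selected_ids)  # [0, 16]

# With a loaded model this could be used as, for example:
# results = model(source=frame, classes=selected_ids, stream=True, verbose=False)
```

Working with ids rather than list positions keeps each label tied to the id the model reports, even when only a subset of classes is selected.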

### Image Recognition: In Jupyter Notebook

In [8]:
# WebCam input with Stop button
stopButton = widgets.ToggleButton(value = False,
                                  description = 'Stop',
                                  disabled = False,
                                  button_style= 'danger',
                                  tooltip = 'Description',
                                  icon = 'square' # (FontAwesome names without the `fa-` prefix)
                                 )

# Display function
def view(button):
    capture = cv.VideoCapture(1) # Video capture object. Camera 0 is the integrated camera while 1 is USB Camera.
    # Frame Width/Height can be commented out. Common resolutions: 1920x1080, 1600x900, 1280x800, 640x480.
    capture.set(cv.CAP_PROP_FRAME_WIDTH, 1360) 
    capture.set(cv.CAP_PROP_FRAME_HEIGHT, 960)
    
    # To save video as .avi file in the specified directory. May crash the application
    #frame_width = int(capture.get(3)) # Gets frame width from cv.VideoCapture
    #frame_height = int(capture.get(4))  # Gets frame height from cv.VideoCapture
    #out = cv.VideoWriter('./output_data/video_output.avi', cv.VideoWriter_fourcc('M', 'J', 'P', 'G'), 10, (frame_width, frame_height))

    # Specify Classes you need the model to predict if different from default.

    display_handle = display(None, display_id=True)

    #while True: # Both while statements seem to work.
    while(capture.isOpened()):
        ret, frame = capture.read() # Reads and captures the image frame
        if not ret: # Stops if the camera returned no frame
            capture.release()
            break
        frame = cv.flip(frame, 1) # Reverses image if webcam reverses
        
        results = model(source = frame,
                        stream = True, # If True, can prevent out-of-memory issues
                        verbose = False) # If True, prints information on inference speed, times, and other info

        # Extracts each bounding box and displays it in the frame coordinates
        for r in results:
            boxes = r.boxes
            for box in boxes:
                x1,y1,x2,y2 = box.xyxy[0]
                #print(x1, y1, x2, y2)
                x1,y1,x2,y2 = int(x1), int(y1), int(x2), int(y2)
                #print(x1,y1,x2,y2)
                cv.rectangle(frame, (x1,y1), (x2,y2), (255,0,255),3)
                #print(box.conf[0])
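                conf = math.ceil(float(box.conf[0]) * 100) / 100 # Confidence rounded up to two decimals
                cls = int(box.cls[0])
                class_name = model.names[cls] # Looks up by class id so that a filtered classes_selected list cannot shift the labels
                label = f'{class_name} {conf}'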
                t_size = cv.getTextSize(label, 0, fontScale=1, thickness=2)[0]
                #print(t_size)
                c2 = x1 + t_size[0], y1 - t_size[1] - 3
                cv.rectangle(frame, (x1,y1), c2, [255,0,255], -1, cv.LINE_AA)  # filled
                cv.putText(frame, label, (x1,y1-2),0,1,[255,255,255], thickness=1, lineType=cv.LINE_AA)
        #out.write(frame) # Writes video to file specified above
        
        _, frame = cv.imencode('.jpeg', frame) # Reads frame from buffer in memory
        display_handle.update(Image(data=frame.tobytes())) # Uses the Jupyter Notebook Display to show in Notebook.
        if button.value: # Stops video instance using the Stop button
            capture.release()
            cv.destroyAllWindows()
            display_handle.update(None)
            #out.release() # Uncomment together with the VideoWriter lines above

In [9]:
# Run the view function. Note: it takes about 4 minutes to load the video.
display(stopButton)
thread = threading.Thread(target = view, 
                          args = (stopButton,))
thread.start()

ToggleButton(value=False, button_style='danger', description='Stop', icon='square', tooltip='Description')

None
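
The label drawing in the loop above places the filled rectangle above the bounding box, which falls outside the frame when a box touches the top edge. The sketch below is one possible way to handle that case, assuming the text size has already been measured with cv.getTextSize; the function name `label_background` and the `pad` parameter are made up for this example.

```python
def label_background(x1, y1, text_w, text_h, pad=3):
    """Return (top_left, bottom_right) corners for the filled label rectangle."""
    if y1 - text_h - pad >= 0:
        # Enough room above the box: same placement as in the loop above.
        return (x1, y1 - text_h - pad), (x1 + text_w, y1)
    # Box touches the top of the frame: draw the label just inside the box instead.
    return (x1, y1), (x1 + text_w, y1 + text_h + pad)

print(label_background(50, 100, 80, 20))  # ((50, 77), (130, 100))
print(label_background(50, 5, 80, 20))    # ((50, 5), (130, 28))
```

The two corners can be passed to cv.rectangle in place of `(x1,y1)` and `c2`; the text origin would need the same adjustment.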

### Image Recognition: New Window

In [None]:
# Note: it takes about 5 minutes to load the image in a new window.

capture = cv.VideoCapture(1) # Video capture object. Camera 0 is the integrated camera while 1 is USB Camera.
# Frame Width/Height can be commented out. Common resolutions: 1920x1080, 1600x900, 1280x800, 640x480.
capture.set(cv.CAP_PROP_FRAME_WIDTH, 1360) 
capture.set(cv.CAP_PROP_FRAME_HEIGHT, 960)  

# To save video as .avi file in the specified directory. May crash the application
#frame_width = int(capture.get(3)) # Gets frame width from cv.VideoCapture
#frame_height = int(capture.get(4))  # Gets frame height from cv.VideoCapture
#out = cv.VideoWriter('./output_data/video_output.avi', cv.VideoWriter_fourcc('M', 'J', 'P', 'G'), 10, (frame_width, frame_height))

while(capture.isOpened()):
    ret, frame = capture.read() # Reads and captures the image frame
    if not ret: # Stops if the camera returned no frame
        break
    frame = cv.flip(frame, 1) # Reverses image if webcam reverses
    
    results = model(source = frame,
                    stream = True, # If True, can prevent out-of-memory issues
                    verbose = False) # If True, prints information on inference speed, times, and other info
    
    # Extracts each bounding box and displays it in the frame coordinates
    for r in results:
        boxes = r.boxes
        for box in boxes:
            x1,y1,x2,y2 = box.xyxy[0]
            #print(x1, y1, x2, y2)
            x1,y1,x2,y2 = int(x1), int(y1), int(x2), int(y2)
            #print(x1,y1,x2,y2)
            cv.rectangle(frame, (x1,y1), (x2,y2), (255,0,255),3)
            #print(box.conf[0])
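            conf = math.ceil(float(box.conf[0]) * 100) / 100 # Confidence rounded up to two decimals
            cls = int(box.cls[0])
            class_name = model.names[cls] # Looks up by class id so that a filtered classes_selected list cannot shift the labels
            label = f'{class_name} {conf}'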
            t_size = cv.getTextSize(label, 0, fontScale=1, thickness=2)[0]
            #print(t_size)
            c2 = x1 + t_size[0], y1 - t_size[1] - 3
            cv.rectangle(frame, (x1,y1), c2, [255,0,255], -1, cv.LINE_AA)  # filled
            cv.putText(frame, label, (x1,y1-2),0, 1,[255,255,255], thickness = 1, lineType = cv.LINE_AA)
    
    #out.write(frame) # Writes video to file specified above
    cv.imshow('Image Detection', frame) # Display in new window
    
    key = cv.waitKey(30) & 0xFF # Polls the keyboard for 30 ms; 255 means no key was pressed
    if key in (ord('q'), ord('x'), 27): # To exit, press q, x, or ESC.
        break
        
# Release the VideoCapture object
capture.release()
cv.destroyAllWindows()
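
The exit check in the loop above can also be pulled into a small function, which makes the key handling easy to try without a camera. This is a sketch under the assumption that the raw return value of cv.waitKey is passed in (it is -1 when no key was pressed); the names `EXIT_KEYS` and `should_exit` are made up for this example.

```python
# Key codes that end the display loop: q, x, and ESC (27).
EXIT_KEYS = {ord('q'), ord('x'), 27}

def should_exit(raw_key: int) -> bool:
    """Return True when the raw cv.waitKey value corresponds to an exit key."""
    if raw_key == -1:
        # No key was pressed during the wait interval.
        return False
    # Only the low byte carries the key code on some platforms.
    return (raw_key & 0xFF) in EXIT_KEYS

print(should_exit(ord('q')), should_exit(27), should_exit(-1), should_exit(ord('a')))
# True True False False
```

Inside the loop this would read `if should_exit(cv.waitKey(30)): break`.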

### Pre-Trained Model Notes
- Accuracy depends on the images that were used for training, which we may or may not have access to via documentation.
- Pre-trained models' accuracy and size may vary significantly. Typically, the more pictures used for training, the higher the accuracy; larger model variants also tend to be more accurate but slower.
- Only useful for the labels the model was trained on.

What if we need to extend the model to a new label?
We can start from scratch and train a new model with a new set of images labeled for our specific use case.
We could fine-tune a pre-trained model with a custom dataset to increase and expand its usability.
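
A minimal sketch of the fine-tuning option follows, assuming a dataset description file in the YOLO YAML format already exists. The file name `custom_data.yaml` and the helper names `build_train_args` and `fine_tune` are made up for this example, and the epoch count and image size are placeholder values rather than recommendations.

```python
def build_train_args(data_yaml, epochs=3, imgsz=640):
    """Collect the training settings in one dictionary (hypothetical helper)."""
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz}

def fine_tune(weights, train_args):
    """Fine-tune pre-trained weights on a custom dataset and return the model."""
    from ultralytics import YOLO  # Imported here so the helper above works without the library.
    model = YOLO(weights)         # Start from the pre-trained weights.
    model.train(**train_args)     # Starts training with the given settings.
    return model

train_args = build_train_args("custom_data.yaml")  # Hypothetical dataset description file.
print(train_args)
# fine_tune('./input_data/yolov8n.pt', train_args)  # Uncomment once the dataset file exists.
```

Starting from pre-trained weights usually needs far fewer labeled images than training from scratch, which is why it is often tried first.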

# NOTEBOOK END