# Introduction

This project uses several libraries for image processing and object detection. The main ones and their roles are:

- **OpenCV (`cv2`)**: The core computer vision library, used for most of the image processing operations.
- **NumPy (`np`)**: Provides multi-dimensional arrays and the numerical functions used for image manipulation.
- **Pillow (`PIL`)**: A maintained fork of the Python Imaging Library for opening, manipulating, and saving image files; used here to display image frames.
- **Matplotlib (`plt`)**: Creates plots and other visualizations of image processing results.
- **Scikit-learn (`KMeans`)**: A machine learning library; its `KMeans` algorithm is used here to cluster the y-coordinates of detected corner points.
- **Collections (`defaultdict`)**: `defaultdict` is used to count how often each corner position is detected across frames.
- **Math**: Provides basic mathematical functions, such as the square root used in distance calculations.
- **Ultralytics YOLO (`YOLO`)**: An object detection framework that can identify multiple objects in images and videos; used here to detect and track people on the court.


# Tips
On my system, `cv2.imshow` does not work reliably inside Jupyter notebooks: the display window sometimes fails to close after all image frames have been shown. As a precaution, I have saved the expected output of each section to files. These files are listed in the introduction of the corresponding section.

# Pre-preparation section
1. **Scanning JPEG Image Files**: This step scans the JPEG image files in the dataset and saves their paths to a text file, which provides a flexible way to reference the dataset.
2. **Green Percentage Calculation Function**: A function calculates the fraction of the image that is green. It is used mainly in Task 1.

Output Files
- *SINGLEframes.txt*
- *DOUBLEframes.txt*
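
The scanning step can be sketched with the standard library alone. This is a minimal sketch rather than the notebook's own loader: the helper name `write_image_list`, the folder name, and the glob pattern in the usage note are assumptions for illustration. Unlike the notebook's loader, it does not check that each file can actually be decoded.

```python
from pathlib import Path


def write_image_list(folder, pattern, out_txt):
    """Write the paths of all files in `folder` matching `pattern` to `out_txt`, one per line."""
    paths = sorted(str(p) for p in Path(folder).glob(pattern))
    with open(out_txt, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(p + "\n")
    return paths
```

A call such as `write_image_list("img", "*-SINGLE.jpeg", "SINGLEframes.txt")` would list every matching file. The sort is lexicographic, so a name containing `10` sorts before one containing `2`; if frame order matters, a numeric sort key would be needed.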

In [1]:
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from collections import defaultdict
import math
double_finder = r"DOUBLEframes.txt"
single_finder = r"SINGLEframes.txt"
single_passed_path = r"SINGLE_passed_pic_str.txt"
double_passed_path = r"DOUBLE_passed_pic_str.txt"

single_w = 0
single_h = 0
double_w = 0
double_h = 0

In [2]:
def raw_jpeg_load():
  """
  Scan the dataset and write the path of every readable JPEG to a text file.
  This only needs to be run once, before the other tasks.
  """
  #double_finder = "D:\Computer_Version_Assignment\DOUBLEframes"
  #single_finder = "D:\Computer_Version_Assignment\SINGLEframes"
  with open(r'DOUBLEframes.txt', 'w') as f:
    pass
  with open(r'SINGLEframes.txt', 'w') as f:
    pass
  #with open('/content/drive/My Drive/测试/bad_pic_str.txt','w') as uessless_file:
  #  pass
  for i in range(1,502):
    image_str = r"img\1"+str(i)+"-SINGLE.jpeg"
    image = cv2.imread(image_str)
    # cv2_imshow(Image)
    if image is None:
      continue
    with open(r'SINGLEframes.txt', 'a') as f:
      f.write(image_str+"\n")
  for i in range(562,814):
    image_str = r"img\2"+str(i)+"-DOUBLE.jpeg"
    image = cv2.imread(image_str)
    # cv2_imshow(Image)
    if image is None:
      continue
    with open(r'DOUBLEframes.txt', 'a') as f:
      f.write(image_str+"\n")
raw_jpeg_load()

SyntaxError: unterminated string literal (detected at line 14) (3079282125.py, line 14)

In [3]:
def Compute_Green_area(Image):
  """
  Return the fraction of the image that is green (used mainly in Task 1).
  """
  hsv = cv2.cvtColor(Image, cv2.COLOR_BGR2HSV)

  lower_green = np.array([55, 40, 55])  
  upper_green = np.array([88, 150, 250])  

  
  mask = cv2.inRange(hsv, lower_green, upper_green)

  kernel = np.ones((3, 3), np.uint8)
  mask = cv2.erode(mask, kernel, iterations=3)
  mask = cv2.dilate(mask, kernel, iterations=5)

  green_area = np.sum(mask == 255)

  total_area = Image.shape[0] * Image.shape[1]
  green_area_ratio = green_area / total_area

  print('Proportion of green area:', green_area_ratio)

  return green_area_ratio


# Task 1
This task uses the green-area function defined above to filter the images. Each image is cropped to its lower part (from row 285 downward), and it passes if the green fraction of that crop exceeds a threshold: 0.2 for SINGLE frames and 0.4 for DOUBLE frames. The paths of the images that pass are saved to the corresponding text files for later use.

Output Files
- *DOUBLE_passed_pic_str.txt*
- *SINGLE_passed_pic_str.txt*
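
The core of the filter, counting the pixels whose HSV values fall inside a range and comparing the fraction with a threshold, can be sketched in NumPy alone. This is a simplified illustration, not the notebook's function: the names `in_range_ratio` and `passes_green_filter` are hypothetical, the input is assumed to be an image already converted to HSV, and the erosion and dilation applied in `Compute_Green_area` are omitted.

```python
import numpy as np


def in_range_ratio(hsv, lower, upper):
    """Fraction of pixels whose H, S and V values all lie within [lower, upper]."""
    hsv = np.asarray(hsv)
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # A pixel passes only if every channel is inside its bounds (inclusive).
    mask = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return float(mask.mean())


def passes_green_filter(hsv, lower, upper, min_ratio):
    """True when the in-range fraction exceeds `min_ratio`."""
    return in_range_ratio(hsv, lower, upper) > min_ratio
```

With the bounds `(55, 40, 55)` and `(88, 150, 250)` used in the notebook, an image in which a quarter of the pixels are in range would pass the 0.2 threshold for SINGLE frames but fail the 0.4 threshold for DOUBLE frames.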


In [4]:
def task1(single_finder,double_finder,single_passed_path,double_passed_path):
  with open(single_passed_path,'w') as single_passed:
    pass
  with open(double_passed_path,'w') as double_passed:
    pass
  with open(single_finder, 'r', encoding='utf-8') as single_file:
      for line in single_file:

          line = line.strip()
          image = cv2.imread(line)
          print(line)
          Compute_Green_area(image)
          image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

          image_pil = Image.fromarray(image_rgb)

          # image_pil.show()
          start_height = 285
          clip_Image = image[start_height:, :, :]
          green_data = Compute_Green_area(clip_Image)

          if green_data > 0.2:
            approved_pic = line
            print(approved_pic)
            with open(single_passed_path, 'a') as single_passd_file:
              single_passd_file.write(approved_pic+"\n")
          else:
            print(line+" is failed")
  with open(double_finder,'r',encoding='utf-8') as double_file:
    for line_2 in double_file:
      line_2 = line_2.strip()
      image_2 = cv2.imread(line_2)
      start_height = 285
      clip_Image = image_2[start_height:, :, :]
      green_data = Compute_Green_area(clip_Image)
      if green_data > 0.4:
        approved_pic = line_2
        print(approved_pic)
        with open(double_passed_path,'a') as double_passed_file:
          double_passed_file.write(approved_pic+"\n")
      else:
        print(line_2+" is failed")


task1(single_finder,double_finder,single_passed_path,double_passed_path)

D:\Computer_Version_Assignment\SINGLEframes\frame1-SINGLE.jpeg
Proportion of green area: 0.0
Proportion of green area: 0.0
D:\Computer_Version_Assignment\SINGLEframes\frame1-SINGLE.jpeg is failed
D:\Computer_Version_Assignment\SINGLEframes\frame2-SINGLE.jpeg
Proportion of green area: 0.0
Proportion of green area: 0.0
D:\Computer_Version_Assignment\SINGLEframes\frame2-SINGLE.jpeg is failed
D:\Computer_Version_Assignment\SINGLEframes\frame3-SINGLE.jpeg
Proportion of green area: 0.0
Proportion of green area: 0.0
D:\Computer_Version_Assignment\SINGLEframes\frame3-SINGLE.jpeg is failed
D:\Computer_Version_Assignment\SINGLEframes\frame4-SINGLE.jpeg
Proportion of green area: 0.0
Proportion of green area: 0.0
D:\Computer_Version_Assignment\SINGLEframes\frame4-SINGLE.jpeg is failed
D:\Computer_Version_Assignment\SINGLEframes\frame5-SINGLE.jpeg
Proportion of green area: 0.0
Proportion of green area: 0.0
D:\Computer_Version_Assignment\SINGLEframes\frame5-SINGLE.jpeg is failed
D:\Computer_Version_

# Task 2
1. **Field Boundary Identification and Processing**: This step applies a series of image processing operations to find the boundary of the playing field. The image is first converted to HSV so that the green court can be isolated with a color filter. Morphological operations then clean up the mask before edge detection, and contour detection locates the boundary. Finally, corner points and line segments are detected to outline the field.

2. **Corner Point Identification and Perspective Transformation**: This step identifies the corner points of the badminton court and projects a court model onto the image. The detected corner points are merged when they lie close together, then classified by their vertical position and filtered. A perspective transformation matrix is computed from four of these points, and applying it to the court's real-world corner coordinates draws the court outline on the image.
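
Merging nearby corner detections can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions: the function name `merge_close_points` is hypothetical, and the 25-pixel default mirrors the distance threshold used in the notebook's `classify_and_find_extreme_points`. It needs Python 3.8 or later for `math.dist`.

```python
import math


def merge_close_points(points, max_dist=25):
    """Greedily merge points closer than `max_dist`, replacing each pair by its midpoint."""
    merged = [list(p) for p in points]
    i = 0
    while i < len(merged):
        j = i + 1
        while j < len(merged):
            if math.dist(merged[i], merged[j]) < max_dist:
                # Replace point i with the (truncated) midpoint and drop point j.
                merged[i] = [int((merged[i][0] + merged[j][0]) / 2),
                             int((merged[i][1] + merged[j][1]) / 2)]
                merged.pop(j)
            else:
                j += 1
        i += 1
    return merged
```

The merge is greedy and order-dependent: each merged midpoint is only compared with the points that follow it, so a different input order can give slightly different results.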

Output Files
- *img_2.jpg*
- *img_4.jpg*
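
Applying a perspective matrix to court coordinates means lifting each point to homogeneous coordinates, multiplying by the 3x3 matrix, and dividing by the last component. The sketch below shows this in vectorized NumPy; the name `apply_homography` is an assumption for illustration, and any matrix passed in would normally come from `cv2.getPerspectiveTransform`.

```python
import numpy as np


def apply_homography(M, points):
    """Map 2-D points through a 3x3 matrix using homogeneous coordinates."""
    pts = np.asarray(points, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    homogeneous = np.hstack([pts, ones])      # (N, 3): append w = 1
    mapped = homogeneous @ np.asarray(M, dtype=np.float64).T
    return mapped[:, :2] / mapped[:, 2:3]     # divide by w to return to 2-D
```

For example, with the made-up scaling-and-translation matrix `[[2, 0, 10], [0, 2, 20], [0, 0, 1]]`, the far corner of the 610 x 1340 doubles court, `(610, 1340)`, maps to `(1230, 2700)`.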


In [5]:
def Find_court_single(image,corner_counts,type):
    
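    """
    Locate the court in one frame and accumulate detected corner
    positions in corner_counts. Returns (left_bottom, right_top),
    or (None, None) if no suitable contour is found.
    """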

    start_height = 285
    single_image = image[start_height:, :, :]
    height, width = single_image.shape[:2]
    single_w, single_h = width, height + start_height

    orange_color = (0, 140, 255)

    orange_bar = np.full((40, single_w, 3), orange_color, dtype=np.uint8)

    
    combined_image = cv2.vconcat([single_image, orange_bar])

    hsv = cv2.cvtColor(combined_image, cv2.COLOR_BGR2HSV)
    lower_green = np.array([55, 70, 90])
    upper_green = np.array([88, 130, 188])

    mask_green = cv2.inRange(hsv, lower_green, upper_green)

    kernel = np.ones((3, 3), np.uint8)
    mask_green = cv2.dilate(mask_green, kernel, iterations=32)
    mask_green = cv2.erode(mask_green, kernel, iterations=26)

    green_edges = cv2.Canny(mask_green, 50, 150)
    contours, hierarchy = cv2.findContours(green_edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)


    max_area = -1
    max_index = -1

    for index, cnt in enumerate(contours):
        area = cv2.contourArea(cnt)
        if area > max_area:
            max_area = area
            max_index = index

    if max_index != -1:
        max_contour = contours[max_index]
        epsilon = 0.01 * cv2.arcLength(max_contour, True)
        approx = cv2.approxPolyDP(max_contour, epsilon, True)

        if len(approx) > 3 and cv2.isContourConvex(approx):
            contour_corners = [tuple(point[0]) for point in approx]
            left_bottom = list(contour_corners[0])
            right_bottom = list(contour_corners[1])
            right_top = list(contour_corners[2])
            left_top = list(contour_corners[3])

            print(len(contour_corners))
            print(contour_corners)
            print(left_bottom, right_bottom, right_top, left_top)

            left_bottom[1] = left_bottom[1] + start_height
            right_bottom[1] = right_bottom[1] + start_height
            right_top[1] = right_top[1] + start_height
            left_top[1] = left_top[1] + start_height

            left_bottom[0] = min(left_bottom[0], single_w)
            left_bottom[1] = min(left_bottom[1], single_h)
            right_top[0] = min(right_top[0], single_w)
            right_top[1] = min(right_top[1], single_h)
            right_bottom[0] = min(right_bottom[0], single_w)
            right_bottom[1] = min(right_bottom[1], single_h)
            left_top[0] = min(left_top[0], single_w)
            left_top[1] = min(left_top[1], single_h)

            points_list = [left_bottom, right_bottom, right_top, left_top]
            points_array = np.array(points_list, dtype=np.int32)
            corners_array_reshaped = points_array.reshape((-1, 1, 2))

            contour_corners_new = [tuple(point[0]) for point in corners_array_reshaped]

            # cv2.drawContours(image, [corners_array_reshaped], 0, (0, 255, 0), 3)
            
            #for corner in contour_corners_new:
            #    cv2.circle(image, corner, radius=5, color=(255, 255, 255), thickness=-1)

            points_list_white = [[174,720],[1101,720],[916,325],[359,325]]
            points_white = np.array([points_list_white])

            mask_white = np.zeros_like(image)
            cv2.fillPoly(mask_white, [points_white], (255,)*image.shape[2], cv2.LINE_AA)

            pts_marked = np.array([[402, 490], [827, 490], [823, 436], [410, 436]], dtype=np.int32)

            pts_marked = pts_marked.reshape((-1, 1, 2))

            # cv2.fillPoly
            roi = cv2.bitwise_and(image, mask_white)
            #backSub = cv2.createBackgroundSubtractorMOG2(history=150, varThreshold=17, detectShadows=False)
            #mask = backSub.apply(roi)
            #cv2.imshow("mask_1", mask)
            #_, thresh = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
            #invertedMask = cv2.bitwise_not(thresh)
            #background = cv2.bitwise_and(roi, roi, mask=invertedMask)
            #cv2.imshow("mask", background)


            cv2.fillPoly(roi, [pts_marked], color=(0, 0, 0))
            gray_image = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            _, binary_image = cv2.threshold(gray_image, 160, 255, cv2.THRESH_BINARY)
            kernel_white = np.ones((3,3), np.uint8)
            # dilated_edges = cv2.dilate(edges, kernel, iterations=1)           
            blurred = cv2.GaussianBlur(binary_image, (3, 3), 0)
            edges = cv2.Canny(blurred, 50, 150, apertureSize=5)
            closing = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel_white)
            if type == 4:
                points_cricle = [[224,720],[224,684],[1046,684],[1046,720]]
                points_cricle = np.array([points_cricle])
                mask_white = np.zeros_like(closing)
                cv2.fillPoly(closing, [points_cricle], color=(0, 0, 0))

            
            corner_track_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

            cv2.imshow("closing", closing)
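            corners = cv2.goodFeaturesToTrack(closing, mask=None, **corner_track_params)
            if corners is None:
                corners = np.empty((0, 1, 2), dtype=np.float32)  # no corners found in this frame
            corners = corners.astype(np.intp)  # np.int0 is deprecated and was removed in NumPy 2.0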
            threshold = 5
            for i in corners:
              for x, y in np.float32(i).reshape(-1, 2):
                          found = False
                          for pos in corner_counts:
                              if abs(pos[0] - x) <= threshold and abs(pos[1] - y) <= threshold:
                                  corner_counts[pos] += 1
                                  found = True
                                  break
                          if not found:
                              corner_counts[get_position((x, y))] = 1

            
            lines = cv2.HoughLinesP(closing, 1470, np.pi/180, threshold=50, minLineLength=30, maxLineGap=100)
            #cv2.imshow("edges", edges)
            print(lines)
            if lines is not None:
                for line in lines:
                    x1, y1, x2, y2 = line[0]
                    cv2.line(edges, (x1, y1), (x2, y2), (0, 0, 255), 2)
            #cv2.imshow("dilated_edges", edges)
            #cv2.imshow("roi", roi)
            #if cv2.waitKey(25) & 0xFF == ord('q'):
                # cv2.destroyAllWindows()
            image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            image_pil = Image.fromarray(image_rgb)

            #image_pil.show()

            return left_bottom, right_top
        else:
            print("The outline is not a convex polygon or has an insufficient number of vertices.")
            return None, None
    else:
        print("No profile detected.")
        return None, None

In [6]:
def get_position(p):
    return (round(p[0]), round(p[1]))

In [7]:
def classify_and_find_extreme_points(points):
    i = 0
    while i < len(points):
        
        j = i+1
        while j < len(points):
            print("Euclidean distance:",math.sqrt((points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2))
            if math.sqrt((points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2) < 25:
                avg_x = (points[i][0] + points[j][0]) / 2
                avg_y = (points[i][1] + points[j][1]) / 2
                points[i] = [int(avg_x), int(avg_y)]
                print("i: ",i,"j: ",j,"avg_x: ",avg_x,"avg_y: ",avg_y,"points: ",points[i],points[j])
                print(points[i])
                points.pop(j)
            else:
                print("i: ",i,"j: ",j,"Euclidean distance:",math.sqrt((points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2))
                j += 1
        i += 1
    Y = np.array([p[1] for p in points]).reshape(-1, 1)

    kmeans = KMeans(n_clusters=7, random_state=0)

    kmeans.fit(Y)

    labels = kmeans.labels_

    categories = [[] for _ in range(7)]
    for label, point in zip(labels, points):
        categories[label].append(point)
    leftmost_points = [None] * 7
    rightmost_points = [None] * 7
    for i, category in enumerate(categories):
        print("Category",i+1)
        print(f"Category {i+1}: {category}")
        for point in category:
            x, y = point
            if leftmost_points[i] is None or x < leftmost_points[i][0]:
                leftmost_points[i] = point
            if rightmost_points[i] is None or x > rightmost_points[i][0]:
                rightmost_points[i] = point
    
    
    return leftmost_points, rightmost_points,points

def Compute_transform_matrix_single(left_bottom, right_top):
    """
    Calculate the perspective transformation matrix
    Select the four points in the first half of the field
    """
    left_bottom_point = left_bottom[6]
    right_top_point = right_top[0]
    left_top_point = left_bottom[0]
    right_bottom_point = right_top[6]
    print(left_bottom_point,right_bottom_point, right_top_point, left_top_point)
    pts_dst = np.array([left_bottom_point,right_bottom_point, right_top_point, left_top_point])
    width, height = 610, 1264  # Adjustment of units to actual conditions
    pts_src = np.array([[0, 0], [610, 0], [610, 1264], [0, 1264]])   
    M = cv2.getPerspectiveTransform(pts_src.astype(np.float32), pts_dst.astype(np.float32))

    return M

def Compute_transform_matrix_double(left_bottom, right_top):
    """
    Calculate the perspective transformation matrix
    Pick four points in the second half
    """
    left_bottom_point = left_bottom[0]
    right_top_point = right_top[3]
    left_top_point = left_bottom[3]
    right_bottom_point = right_top[0]

    pts_dst = np.array([left_bottom_point,right_bottom_point, right_top_point, left_top_point])
    width, height = 610, 856  # Adjustment of units to actual conditions
    pts_src = np.array([[0, 0], [width, 0], [width, height], [0, height]])   
    M = cv2.getPerspectiveTransform(pts_src.astype(np.float32), pts_dst.astype(np.float32))
    return M

def compute_points(M,img):
    '''
    Calculating Badminton Court Corner Points
    '''
    court_length = 1340
    court_width_doubles = 610
    point_transformed_list = []
    points_doubles = [[0,0],[0,court_length],[court_width_doubles,court_length],[court_width_doubles,0]]
    points_singles = [[42,72],[568,72],[568,1268],[42,1268]]
    point_transformed_list_1 = []
    for point in points_doubles:
        point = np.array([point[0], point[1]], dtype=np.float32)
        point_homogeneous = np.array([point[0], point[1], 1], dtype=np.float32)
        point_transformed = M.dot(point_homogeneous)
        point_transformed = point_transformed / point_transformed[2]
        point_transformed_dot = [int(point_transformed[0]), int(point_transformed[1])]
        point_transformed_list_1.append(point_transformed_dot)
        cv2.circle(img, point_transformed_dot, 5, (0, 0, 255), -1)
        
    cv2.drawContours(img, [np.array(point_transformed_list_1, dtype=np.int32).reshape(1, -1, 2)],-1, (0, 255, 0),thickness=5)
    #cv2.fillConvexPoly(img, np.array(point_transformed_list_1, dtype=np.int32), (0, 255, 0))
    point_transformed_list_2 = []
    for point in points_singles:
        point = np.array([point[0], point[1]], dtype=np.float32)
        point_homogeneous = np.array([point[0], point[1], 1], dtype=np.float32)
        point_transformed = M.dot(point_homogeneous)
        point_transformed = point_transformed / point_transformed[2]
        point_transformed_dot = [int(point_transformed[0]), int(point_transformed[1])]
        point_transformed_list_2.append(point_transformed_dot)
        cv2.circle(img, point_transformed_dot, 5, (0, 0, 255), -1)
    print(point_transformed_list_2)
    cv2.drawContours(img, [np.array(point_transformed_list_2, dtype=np.int32).reshape(1, -1, 2)],-1, (255, 0, 0),thickness=5)
    #cv2.fillConvexPoly(img, np.array(point_transformed_list_2, dtype=np.int32), (255, 0, 0))
    point_transformed_list.append(point_transformed_list_1) # out 0
    point_transformed_list.append(point_transformed_list_2) # in 1 
    return img,point_transformed_list


single_points_results = []
double_points_results = []

def testing(corner_counts,image,type):
    sorted_corner_counts = sorted(corner_counts.items(), key=lambda item: item[1], reverse=True)
    sorted_corner_counts_filter = sorted_corner_counts[:30]
    points = []
    img = image.copy()
    for rank, (pos, count) in enumerate(sorted_corner_counts_filter, start=1):
      points.append(pos)
      print(pos)
      cv2.circle(image, pos, 5, (0, 255, 0), -1)
      print(rank, pos, count)


    leftmost_points, rightmost_points,tem_points = classify_and_find_extreme_points(points)
    if type == 2:
        M = Compute_transform_matrix_single(leftmost_points,rightmost_points)
        img,point_transformed_list = compute_points(M,img)
        single_points_results.append(point_transformed_list)
        cv2.imwrite("img_2.jpg",img)
        print(point_transformed_list)
    elif type == 4:
        M = Compute_transform_matrix_double(leftmost_points,rightmost_points)
        img,point_transformed_list = compute_points(M,img)
        double_points_results.append(point_transformed_list)
        cv2.imwrite("img_4.jpg",img)
        print(point_transformed_list)




In [8]:
saved_image_single = []
saved_image_double = []
corner_counts_single = defaultdict(int)
corner_counts_double = defaultdict(int)
def passed_pic_load():
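  """
  Load the images whose paths passed the Task 1 filter into the
  global lists saved_image_single and saved_image_double.
  """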
  with open(single_passed_path) as single_files:
    for line in single_files:
      line = line.strip()
      image = cv2.imread(line)
      (single_h, single_w) = image.shape[:2]
      #print(single_h,single_w)
      saved_image_single.append(image)
      
  with open(double_passed_path) as double_files:
    for line_2 in double_files:
      line_2 = line_2.strip()
      image_2 = cv2.imread(line_2)
      (double_h,double_w) = image_2.shape[:2]
      saved_image_double.append(image_2)

def task2(saved_image_single,saved_image_double):
  for single_img in saved_image_single:    
    single_data = Find_court_single(single_img,corner_counts_single,2)
    print(single_data)
  for double_img in saved_image_double:
    double_data = Find_court_single(double_img,corner_counts_double,4)
    print(double_data)
  testing(corner_counts_single,saved_image_single[-1],2)
  testing(corner_counts_double,saved_image_double[-1],4)

In [9]:
saved_image_single.clear()
saved_image_double.clear()
passed_pic_load()
task2(saved_image_single,saved_image_double)

4
[(113, 424), (1168, 440), (932, 39), (327, 39)]
[113, 424] [1168, 440] [932, 39] [327, 39]
[[[ 810  680 1068  676]]

 [[ 830  344  886  341]]

 [[ 321  445  331  563]]

 ...

 [[ 363  710  419  676]]

 [[ 251  596  260  682]]

 [[ 464  341  473  435]]]
([113, 709], [932, 324])
4
[(113, 424), (1168, 440), (942, 39), (327, 39)]
[113, 424] [1168, 440] [942, 39] [327, 39]


  corners = np.int0(corners)


[[[601 537 725 535]]

 [[356 370 421 340]]

 [[493 713 645 664]]

 ...

 [[612 338 631 421]]

 [[886 436 933 395]]

 [[793 436 899 344]]]
([113, 709], [942, 324])
4
[(113, 424), (1168, 440), (940, 39), (327, 39)]
[113, 424] [1168, 440] [940, 39] [327, 39]
[[[ 361  357  729  350]]

 [[ 722  435  839  341]]

 [[ 336  714  405  676]]

 ...

 [[ 207  716  229  643]]

 [[ 799  341  802  432]]

 [[ 936  711 1001  672]]]
([113, 709], [940, 324])
4
[(113, 424), (1168, 440), (940, 39), (327, 39)]
[113, 424] [1168, 440] [940, 39] [327, 39]
[[[636 358 913 353]]

 [[287 533 381 340]]

 [[344 714 419 676]]

 ...

 [[796 432 882 378]]

 [[677 712 737 675]]

 [[400 365 705 333]]]
([113, 709], [940, 324])
4
[(113, 424), (1168, 440), (940, 39), (327, 39)]
[113, 424] [1168, 440] [940, 39] [327, 39]
[[[ 891  715 1085  712]]

 [[ 637  542  813  536]]

 [[ 490  435  597  399]]

 ...

 [[ 318  451  335  550]]

 [[ 697  675  703  708]]

 [[ 620  337  634  417]]]
([113, 709], [940, 324])
4
[(113, 424), (1168,

# Task 3
1. **`roi_get` Function Definition**: This function extracts a fixed polygonal region of interest (ROI) from the input image and also returns the background, i.e. everything outside that region. The ROI can then be analyzed on its own and recombined with the background afterwards.

2. **Image Target Detection and Segmentation**: This step uses the Ultralytics YOLOv8 model (`yolov8n-seg.pt`), a recent model in the YOLO series, to detect and segment targets within the ROI.

3. **Object Detection with YOLOv8**: Detected people are annotated on the image with bounding boxes and labels that show the class name and the coordinates of the box centre, which makes the detection results easier to interpret.
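
Drawing a bounding box requires corner coordinates, whereas the `xywh` format printed in this task stores the box centre together with its width and height. The conversion can be sketched as below; `xywh_to_xyxy` is a hypothetical helper name, and the arithmetic matches what `task3` does inline.

```python
def xywh_to_xyxy(x, y, w, h):
    """Convert a centre-based box (x, y, w, h) to integer corner coordinates."""
    x_min = int(x - w / 2)
    y_min = int(y - h / 2)
    x_max = int(x + w / 2)
    y_max = int(y + h / 2)
    return x_min, y_min, x_max, y_max
```

For a box centred at `(100, 50)` with width 20 and height 10, this returns `(90, 45, 110, 55)`. Ultralytics results also expose the corners directly through `boxes.xyxy`, which may be simpler when the centre is not needed for the label.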

Output Files:
- *result2*
- *result4*
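
The ROI and background split can be illustrated without OpenCV. In this sketch the mask is assumed to be given as a boolean array, whereas `roi_get` builds it with `cv2.fillPoly`; the name `split_by_mask` is hypothetical.

```python
import numpy as np


def split_by_mask(img, mask):
    """Split `img` into (roi, background) using a boolean mask of shape (H, W)."""
    mask3 = mask[:, :, None]                 # broadcast over the colour channels
    roi = np.where(mask3, img, 0).astype(img.dtype)
    background = np.where(mask3, 0, img).astype(img.dtype)
    return roi, background
```

Because the two parts never overlap, adding them restores the original image. Task 4 relies on the same property when it recombines the annotated ROI with the background using `cv2.add`.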

In [10]:
def roi_get(img):
  mask = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
  points = np.array([[99, 717], [307, 193], [1023, 213], [1218, 715]])
  cv2.fillPoly(mask, [points], 255)
  roi = cv2.bitwise_and(img, img, mask=mask)
  mask_inverted = cv2.bitwise_not(mask)
  background = cv2.bitwise_and(img, img, mask=mask_inverted)
  return roi,background

In [11]:
from ultralytics import YOLO
from PIL import Image

# Load a model
# model = YOLO('yolov8n.pt')  # official detection model (unused: it was immediately overwritten below)
model = YOLO('yolov8n-seg.pt')  # load an official segmentation model

# Predict with the model
# results = model('https://ultralytics.com/images/bus.jpg')  # predict on an image

In [12]:
def task3(images,type):

    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 0.5
    font_thickness = 2
    j = 0
    for image in images:
        image_track,back = roi_get(image)

        results = model(image_track,conf=0.4)
        #testing_image.append(image)
        for r in results:
            img = r.plot(show = False)
            print("start")
            print(r.boxes.conf)
            print(r.boxes.cls)
            print(r.boxes.xywh)
            print("end")
            for i in range(len(r.boxes.conf)):
                if r.boxes.conf[i] > 0.5:
                    if r.boxes.cls[i] == 0:
                            x, y, w, h = r.boxes.xywh[i]
                            x = int(x)
                            y = int(y)
                            # class_name = r.boxes.cls[i]
                            class_name = "person"
                            x_min = int(x - w / 2)
                            y_min = int(y - h / 2)
                            x_max = int(x + w / 2)
                            y_max = int(y + h / 2)
                            
                            cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
                            
                            text = f'{class_name} ({x}, {y})'
                            
                            (text_width, text_height), baseline = cv2.getTextSize(text, font, font_scale, font_thickness)
                            
                            text_x = x_min
                            text_y = y_min - text_height - baseline
                            
                            cv2.rectangle(image, (text_x, text_y + baseline), (text_x + text_width, text_y - text_height),(0, 0, 255), cv2.FILLED)
                            
                            cv2.putText(image, text, (text_x, text_y), font, font_scale,  (255, 255, 255) , font_thickness)
                # merged_image = cv2.add(back, r.orig_img)

                cv2.imshow("result", image)
                cv2.waitKey(1)
        if j % 10 == 0:
            cv2.imwrite("result"+str(type)+"/result" + str(j) + ".jpg", image)
        j= j+1

In [13]:
saved_image_single.clear()
saved_image_double.clear()
passed_pic_load()
task3(saved_image_double, 4)
task3(saved_image_single, 2)


0: 384x640 3 persons, 347.9ms
Speed: 51.5ms preprocess, 347.9ms inference, 20.6ms postprocess per image at shape (1, 3, 384, 640)
start
tensor([0.8093, 0.7996, 0.4893])
tensor([0., 0., 0.])
tensor([[617.2498, 470.6701,  66.4292, 211.7120],
        [717.1749, 432.5284,  62.7539, 179.5266],
        [652.3661, 295.2518,  46.3794, 143.5387]])
end

0: 384x640 4 persons, 174.7ms
Speed: 2.0ms preprocess, 174.7ms inference, 7.9ms postprocess per image at shape (1, 3, 384, 640)
start
tensor([0.8365, 0.7850, 0.5748, 0.5445])
tensor([0., 0., 0., 0.])
tensor([[716.4196, 432.7372,  64.1096, 180.3141],
        [619.3293, 470.0941,  69.7596, 213.0625],
        [616.4698, 329.8485,  54.0104, 130.6447],
        [650.7179, 293.7844,  48.8209, 140.6550]])
end

0: 384x640 4 persons, 163.3ms
Speed: 2.0ms preprocess, 163.3ms inference, 9.0ms postprocess per image at shape (1, 3, 384, 640)
start
tensor([0.8470, 0.7818, 0.5956, 0.5357])
tensor([0., 0., 0., 0.])
tensor([[716.3541, 433.8450,  64.9688, 180.8326

# Task 4
1. **OpenCV Functionality Encapsulation**: A `VideoRecorder` class wraps OpenCV's `VideoWriter` so that creating a video file, writing frames to it, and releasing it are handled in one place.

2. **Image Series Processing for Target Tracking**: The sequence of images (the video frames) is passed through the YOLOv8 model to track targets from frame to frame. The annotated frames, with the court outlines from Task 2 drawn on top, are written to a video file for review.

Output Files:
- *output2.avi*
- *output4.avi*
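
A video writer is created with a fixed frame size, and frames of a different size or type are, as far as I am aware, typically dropped without raising an error. A small guard can catch this before writing. The helper name `frame_fits_writer` is hypothetical, and the check assumes 3-channel `uint8` frames with `frame_size` given as `(width, height)`, as in this notebook.

```python
import numpy as np


def frame_fits_writer(frame, frame_size):
    """Check that `frame` is a 3-channel uint8 image of size `frame_size` = (width, height)."""
    if frame.ndim != 3 or frame.shape[2] != 3 or frame.dtype != np.uint8:
        return False
    height, width = frame.shape[:2]   # NumPy shape is (rows, cols), i.e. (height, width)
    return (width, height) == tuple(frame_size)
```

Note the order: the array shape is `(height, width, channels)`, while the writer's frame size is `(width, height)`, so a 1280 x 720 video expects arrays of shape `(720, 1280, 3)`.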

In [14]:
class VideoRecorder:
    def __init__(self, output_path, frame_size, fps=30):
        fourcc = cv2.VideoWriter_fourcc(*'XVID')
        self.output_path = output_path
        self.frame_size = frame_size
        self.fps = fps
        self.out = cv2.VideoWriter(output_path, fourcc, fps, frame_size)

    def add_frame(self, frame):
        self.out.write(frame)

    def release(self):
        self.out.release()
        print(f"The video has been saved to {self.output_path}")

In [15]:
def iamge_track(saved_image,type):
  frame_size = (1280, 720)
  recorder = VideoRecorder('output'+str(type)+'.avi', frame_size)
  for image_track in saved_image:
      image_track,back = roi_get(image_track)
      results = model.track(image_track, conf=0.29, persist=True)  # persist=True keeps track IDs across successive frames

        # Visualize the results on the frame
      annotated_frame = results[0].plot()
      merged_image = cv2.add(annotated_frame,back)

      image_rgb = cv2.cvtColor(merged_image, cv2.COLOR_BGR2RGB)
      if type == 4:
         # Overlay the doubles court outlines computed in Task 2
         cv2.drawContours(merged_image, [np.array(double_points_results[0][0], dtype=np.int32).reshape(1, -1, 2)],-1, (0, 255, 0),thickness=5)
         cv2.drawContours(merged_image, [np.array(double_points_results[0][1], dtype=np.int32).reshape(1, -1, 2)],-1, (255, 0, 0),thickness=5)
      else:
         
         cv2.drawContours(merged_image, [np.array(single_points_results[0][0], dtype=np.int32).reshape(1, -1, 2)],-1, (0, 255, 0),thickness=5)
         cv2.drawContours(merged_image, [np.array(single_points_results[0][1], dtype=np.int32).reshape(1, -1, 2)],-1, (255, 0, 0),thickness=5)

      # image_pil = Image.fromarray(image_rgb)
      
      recorder.add_frame(merged_image)

      #image_pil.show()
      cv2.imshow("Frame", merged_image)
      if cv2.waitKey(25) & 0xFF == ord('q'):
          break
  recorder.release()
  cv2.destroyAllWindows()

In [16]:
saved_image_single.clear()
saved_image_double.clear()
passed_pic_load()
iamge_track(saved_image_double, 4)
iamge_track(saved_image_single, 2)


0: 384x640 2 persons, 265.8ms
Speed: 7.0ms preprocess, 265.8ms inference, 9.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 290.2ms
Speed: 7.0ms preprocess, 290.2ms inference, 12.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 201.4ms
Speed: 4.0ms preprocess, 201.4ms inference, 11.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 198.7ms
Speed: 2.0ms preprocess, 198.7ms inference, 24.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 213.3ms
Speed: 3.6ms preprocess, 213.3ms inference, 14.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 203.9ms
Speed: 3.0ms preprocess, 203.9ms inference, 13.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 212.4ms
Speed: 3.0ms preprocess, 212.4ms inference, 10.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 183.4ms
Speed: 3.4ms preprocess, 183.4ms inference, 8.7ms postproces