Determining focal length
The focal length 𝑓 can be determined experimentally by following these steps:

Object Measurement:

Select an object of known size, for example a ruler or a box with a known width 𝑊 (in physical units such as centimeters).
Place this object at a known distance 𝐷 from the camera (for example, 100 cm).

Image Capture:

Take an image of this object using your camera.

Image Measurement:

Measure the width of the object in the image in pixels 𝑃.

Focal Length Calculation:

Use the following formula to calculate the focal length (in pixels):


𝑓 = (𝑃 ⋅ 𝐷) / 𝑊
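
This formula follows from the pinhole camera model, assuming the object is roughly parallel to the image plane and lens distortion is negligible. The object of width 𝑊 at distance 𝐷 and its image of width 𝑃 at distance 𝑓 form similar triangles. A short derivation sketch:

```latex
\[
\frac{P}{f} = \frac{W}{D}
\quad\Longrightarrow\quad
f = \frac{P \cdot D}{W}
\]
```

Because 𝑊 and 𝐷 are measured in the same unit, that unit cancels and 𝑓 comes out in pixels.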
 
Example
Let's say you select an object that is 20 cm wide and place it 100 cm away from the camera. In the image, the width of the object is 200 pixels. Then the focal length can be calculated as follows:

𝑓 = (200 pixels ⋅ 100 cm) / 20 cm = 1000 pixels
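
The worked example above can be checked with a few lines of Python. This is a minimal sketch: the function name `focal_length_px` and its parameter names are illustrative and not part of any library.

```python
def focal_length_px(pixel_width: float, distance: float, real_width: float) -> float:
    """Return the focal length in pixels: f = (P * D) / W.

    `distance` and `real_width` must use the same unit (e.g. centimeters).
    """
    if real_width <= 0:
        raise ValueError("real_width must be positive")
    return (pixel_width * distance) / real_width


# Values from the example: P = 200 px, D = 100 cm, W = 20 cm
f = focal_length_px(pixel_width=200, distance=100, real_width=20)
print(f)  # 1000.0
```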

Using Focal Length
Once you have determined the focal length, you can use it to calculate the distance to any object of known size in the image. For example, the distance to a person can be calculated with the following formula:

𝐷 = (𝑊 ⋅ 𝑓) / 𝑃
Where:

𝐷 - distance to the object (in the same units as 𝑊)
𝑊 - real width of the object (person) in centimeters
𝑓 - focal length in pixels
𝑃 - width of the object in the image in pixels
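
As a sketch of how this formula translates to code, the hypothetical helper below (the name `distance_to_object` is an assumption, not a library function) computes 𝐷 and guards against a zero pixel width. The numbers are illustrative: a focal length of 1000 pixels, as in the example above, and a person assumed to be 50 cm wide who spans 100 pixels in the image.

```python
def distance_to_object(real_width: float, focal_length: float, pixel_width: float) -> float:
    """Return the distance D = (W * f) / P, in the same unit as `real_width`."""
    if pixel_width <= 0:
        raise ValueError("pixel_width must be positive")
    return (real_width * focal_length) / pixel_width


# Illustrative values: W = 50 cm, f = 1000 px, P = 100 px
d_cm = distance_to_object(real_width=50, focal_length=1000, pixel_width=100)
print(d_cm)        # 500.0 (centimeters)
print(d_cm / 100)  # 5.0 (meters)
```

The estimate scales linearly with the assumed real width, so a 10% error in 𝑊 produces a 10% error in 𝐷.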

In [1]:
import cv2
from ultralytics import YOLO



In [2]:
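# main video file
cap = cv2.VideoCapture("/Users/maxkucher/data_handling/yolo_destination/road_3.mp4")
ret, frame = cap.read()

# the focal length is calibrated from a 'car' detection in the first frame of this video
# (this assumes both videos come from the same camera at the same resolution)
cap_temp = cv2.VideoCapture("/Users/maxkucher/data_handling/yolo_destination/road_2.mp4")
t_ret, t_frame = cap_temp.read()
cap_temp.release()
if not t_ret:
    raise SystemExit("Could not read a frame from the calibration video.")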

model = YOLO("yolov8l")

names = model.names
threshold = 0.5

# real widths of objects in centimeters (for road_2)
real_width_car = 400
real_width_person = 50


# known distance between the camera and the 'car' in cap_temp, in centimeters
known_distance = 450

# determine f from the first confident 'car' detection
pre_results = model(t_frame)[0]

focal_length = None

for result in pre_results.boxes.data.tolist():
    x1, _, x2, _, score, class_id = result
    name = names[int(class_id)]
    if score > threshold and name == "car":
        # width of the car's bounding box in pixels
        width_in_pixels = x2 - x1
        # f = P * D / W
        focal_length = (width_in_pixels * known_distance) / real_width_car
        break

if focal_length is None:
    raise SystemExit("Focal length could not be determined: no 'car' was detected in the calibration frame.")

# run the main video with the calibrated focal length
# classes for which the distance is estimated
obj_list = ["car", "person"]

while ret:

    results = model(frame)[0]

    for result in results.boxes.data.tolist():
        x1, y1, x2, y2, score, class_id = result
        name = names[int(class_id)]
        if score > threshold and name in obj_list:

            width_in_pixels = x2 - x1

            if name == "car":
                real_width = real_width_car
            elif name == "person":
                real_width = real_width_person

            # D = (W * f) / P, converted from centimeters to meters
            # (true division keeps the fractional part shown by the .2f format)
            distance = ((real_width * focal_length) / width_in_pixels) / 100

            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
            text = f"{name}: {distance:.2f} m"
            cv2.putText(frame, text, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("Video", frame)
    ret, frame = cap.read()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


0: 384x640 1 person, 2 cars, 1 traffic light, 373.7ms
Speed: 3.8ms preprocess, 373.7ms inference, 590.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 9 cars, 443.4ms
Speed: 1.9ms preprocess, 443.4ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 cars, 350.0ms
Speed: 22.4ms preprocess, 350.0ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 cars, 343.7ms
Speed: 1.4ms preprocess, 343.7ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 cars, 343.1ms
Speed: 2.1ms preprocess, 343.1ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 cars, 345.9ms
Speed: 1.9ms preprocess, 345.9ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 cars, 342.9ms
Speed: 6.2ms preprocess, 342.9ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 cars, 335.9ms
Speed: 1.6ms preprocess, 335.9ms inference, 0.4ms postproces