Made inference faster (this is especially useful when using Yolo9000) #8009

Merged 3 commits on Aug 25, 2021

Conversation

sarimmehdi

In darknet.py, remove_negatives produces the final detections for an image using a nested loop. The outer loop iterates over every detection object returned by the neural net, and the inner loop iterates over every class name, because each detection carries an array of probabilities with one entry per class. This is not much of an issue with only 80 class names (the original COCO). With YOLO9000, however, there are 9418 class names, so iterating over all of them for every detection object causes a significant slowdown.
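To make the cost concrete, here is a minimal sketch of the nested-loop pattern described above. It is not the code from darknet.py: the `Detection` dataclass is a hypothetical stand-in for darknet's ctypes detection struct, and `remove_negatives_nested` is an illustrative name. For N detections and C classes, the inner body runs N × C times.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for darknet's detection struct."""
    bbox: Tuple[float, float, float, float]  # (x, y, w, h)
    prob: List[float]                        # one probability per class


def remove_negatives_nested(detections, class_names):
    """Collect (name, probability, bbox) for every class with probability > 0.

    The work is proportional to len(detections) * len(class_names).
    """
    predictions = []
    for det in detections:                        # outer loop: detections
        for idx, name in enumerate(class_names):  # inner loop: every class
            if det.prob[idx] > 0:
                predictions.append((name, det.prob[idx], det.bbox))
    return predictions


class_names = ["person", "dog", "car"]
dets = [
    Detection((10.0, 10.0, 4.0, 4.0), [0.0, 0.9, 0.0]),
    Detection((30.0, 30.0, 6.0, 6.0), [0.0, 0.0, 0.0]),
]
print(remove_negatives_nested(dets, class_names))
# [('dog', 0.9, (10.0, 10.0, 4.0, 4.0))]
```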

In network.c, I keep track of the class index with the highest probability. Then, in remove_negatives_faster, instead of iterating over all class indices (9418 in the case of YOLO9000), I read only that index (stored as best_class_idx). The provided NMS method is also quite slow, so I added an extra function that performs NMS in Python much faster.
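The idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in this PR: the `Detection` dataclass, the `track_best_class` helper, the `-1` sentinel for "no class above the threshold", and the `_sketch` function name are all hypothetical. The first function mimics the single pass that records the best index; the second uses that index so that each detection costs one lookup instead of one pass over all classes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for darknet's detection struct."""
    bbox: Tuple[float, float, float, float]  # (x, y, w, h)
    prob: List[float]                        # one probability per class
    best_class_idx: int = -1                 # assumed sentinel: no class found


def track_best_class(det, thresh):
    """Record the index of the highest probability above thresh (one pass)."""
    best_idx, best_prob = -1, thresh
    for idx, p in enumerate(det.prob):
        if p > best_prob:
            best_idx, best_prob = idx, p
    det.best_class_idx = best_idx


def remove_negatives_faster_sketch(detections, class_names):
    """Emit at most one prediction per detection, using the stored index."""
    predictions = []
    for det in detections:
        idx = det.best_class_idx
        if idx < 0:  # nothing exceeded the threshold for this detection
            continue
        predictions.append((class_names[idx], det.prob[idx], det.bbox))
    return predictions


class_names = ["person", "dog", "car"]
dets = [
    Detection((10.0, 10.0, 4.0, 4.0), [0.1, 0.9, 0.3]),
    Detection((30.0, 30.0, 6.0, 6.0), [0.0, 0.0, 0.0]),
]
for d in dets:
    track_best_class(d, thresh=0.5)
print(remove_negatives_faster_sketch(dets, class_names))
# [('dog', 0.9, (10.0, 10.0, 4.0, 4.0))]
```

One consequence of this design, at least in the sketch: a detection yields at most one class, whereas a loop over all classes can report several classes for the same box.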

You will notice that all the changes in this fork were already posted in this issue. I opened that issue myself, but from my work account. I will post a comment from that account to prove it. My public LinkedIn profile lists my personal GitHub account, which is why I am making this commit from it rather than from my work account.

custom_get_region_detections function now keeps track of class index with the highest probability.
Added best_class_idx to detection struct
added python code for faster negative removal and also faster non-max suppression
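For readers unfamiliar with non-max suppression, here is a minimal greedy sketch in plain Python. It is not the non_max_suppression_fast function added in this PR. It assumes predictions are (name, confidence, (x, y, w, h)) tuples with (x, y) at the box centre, treats the second argument as an IoU threshold, and ignores class labels when suppressing; all helper names are hypothetical.

```python
def to_corners(bbox):
    """Convert a centre-based (x, y, w, h) box to (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou(a, b):
    """Intersection over union of two centre-based boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(predictions, overlap_thresh):
    """Keep the most confident boxes, dropping any that overlap a kept box."""
    ordered = sorted(predictions, key=lambda p: p[1], reverse=True)
    kept = []
    for cand in ordered:
        if all(iou(cand[2], k[2]) <= overlap_thresh for k in kept):
            kept.append(cand)
    return kept


preds = [
    ("dog", 0.9, (10.0, 10.0, 4.0, 4.0)),
    ("dog", 0.8, (10.5, 10.0, 4.0, 4.0)),  # overlaps the first box heavily
    ("car", 0.7, (30.0, 30.0, 6.0, 6.0)),
]
print(nms_sketch(preds, 0.5))
# [('dog', 0.9, (10.0, 10.0, 4.0, 4.0)), ('car', 0.7, (30.0, 30.0, 6.0, 6.0))]
```

Because suppression runs only on the predictions that survive negative removal, the number of pairwise comparisons stays small even when the network has thousands of classes.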
@ghost

ghost commented Aug 17, 2021

I am adding this comment to prove that my work account (this) and the personal account that made this commit are the same.

@AlexeyAB
Owner

@sarimmehdi @iAmJuan550 Hi, Thanks!

  • Can you provide some benchmark results: what was the FPS before and after this fix for YOLO9000 on an RTX 2080 Super?

  • Shouldn't we change remove_negatives() to remove_negatives_faster()?

    predictions = remove_negatives(detections, class_names, num)

  • Is non_max_suppression_fast() in Python faster than do_nms_sort() in C?

@sarimmehdi
Author

I will answer all of these tomorrow when I am at my workplace.

@ghost

ghost commented Aug 18, 2021

@AlexeyAB I have provided the benchmarks below (NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER], Intel® Core™ i9-10900F CPU @ 2.80GHz × 20, 31.2 GB Memory):

  1. Benchmark results. Here is the code I use with remove_negatives_faster() and non_max_suppression_fast(). BATCH_SIZE is 1 because otherwise I get a CUDA out of space error, which is something to be looked into:
BATCH_SIZE = 1
network, CLASS_LABELS[:], class_colors = darknet.load_network(
    'cfg/yolo9000.cfg',
    'cfg/combine9k.data',
    'weights/yolo9000.weights',
    batch_size=BATCH_SIZE
)
i = 0
while i < len(img_files):
    images = [cv2.imread(os.path.join(root_dir, 'det_imgs', img_files[j]))
              for j in range(i, min(i+BATCH_SIZE, len(img_files)))]
    image_height, image_width, _ = check_batch_shape(images, BATCH_SIZE)
    darknet_images = prepare_batch(images, network)

    batch_detections = darknet.network_predict_batch(network, darknet_images, BATCH_SIZE, image_width,
                                                     image_height, 0.5, 0.5, None, 0, 0)
    batch_predictions = []
    for index in range(BATCH_SIZE):
        num = batch_detections[index].num
        detections = batch_detections[index].dets
        if not detections:
            continue
        start_time = time.time()
        predictions = darknet.remove_negatives_faster(detections, CLASS_LABELS, num)
        print('NUMBER OF DETECTIONS: ' + str(num))
        print('TIME TAKEN TO DETECT (SECONDS): ' + str(time.time() - start_time))
        if not predictions:
            continue
        start_time = time.time()
        detections = non_max_suppression_fast(predictions, 0.5)
        print('TIME TAKEN TO DO NMS (SECONDS): ' + str(time.time() - start_time))
    # Advance to the next batch; without this the while loop never terminates.
    i += BATCH_SIZE

Results on 6 images:

NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 0.00018095970153808594
TIME TAKEN TO DO NMS (SECONDS): 0.000225067138671875
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 9.870529174804688e-05
TIME TAKEN TO DO NMS (SECONDS): 0.00013780593872070312
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 9.465217590332031e-05
TIME TAKEN TO DO NMS (SECONDS): 0.0001399517059326172
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 9.799003601074219e-05
TIME TAKEN TO DO NMS (SECONDS): 0.0001380443572998047
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 0.00013589859008789062
TIME TAKEN TO DO NMS (SECONDS): 0.00017452239990234375
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 0.00013828277587890625
TIME TAKEN TO DO NMS (SECONDS): 0.00021123886108398438

Here is the same code but now with remove_negatives() and do_nms_sort():

BATCH_SIZE = 1
network, CLASS_LABELS[:], class_colors = darknet.load_network(
    'cfg/yolo9000.cfg',
    'cfg/combine9k.data',
    'weights/yolo9000.weights',
    batch_size=BATCH_SIZE
)
i = 0
while i < len(img_files):
    images = [cv2.imread(os.path.join(root_dir, 'det_imgs', img_files[j]))
              for j in range(i, min(i+BATCH_SIZE, len(img_files)))]
    image_height, image_width, _ = check_batch_shape(images, BATCH_SIZE)
    darknet_images = prepare_batch(images, network)

    batch_detections = darknet.network_predict_batch(network, darknet_images, BATCH_SIZE, image_width,
                                                     image_height, 0.5, 0.5, None, 0, 0)
    batch_predictions = []
    for index in range(BATCH_SIZE):
        num = batch_detections[index].num
        detections = batch_detections[index].dets
        if not detections:
            continue
        start_time = time.time()
        predictions = darknet.remove_negatives(detections, CLASS_LABELS, num)
        print('NUMBER OF DETECTIONS: ' + str(num))
        print('TIME TAKEN TO DETECT (SECONDS): ' + str(time.time() - start_time))
        if not predictions:
            continue
        start_time = time.time()
        darknet.do_nms_sort(detections, num, len(CLASS_LABELS), True)
        print('TIME TAKEN TO DO NMS (SECONDS): ' + str(time.time() - start_time))
    # Advance to the next batch; without this the while loop never terminates.
    i += BATCH_SIZE

Results:

NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 1.1554949283599854
TIME TAKEN TO DO NMS (SECONDS): 0.18290448188781738
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 1.1924941539764404
TIME TAKEN TO DO NMS (SECONDS): 0.19454574584960938
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 1.1662535667419434
TIME TAKEN TO DO NMS (SECONDS): 0.181380033493042
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 1.1705167293548584
TIME TAKEN TO DO NMS (SECONDS): 0.18292856216430664
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 1.1798851490020752
TIME TAKEN TO DO NMS (SECONDS): 0.18198132514953613
NUMBER OF DETECTIONS: 867
TIME TAKEN TO DETECT (SECONDS): 1.1416120529174805
TIME TAKEN TO DO NMS (SECONDS): 0.18628549575805664
  2. I decided not to edit the detect_image() function in darknet.py, in case someone still wants to use the original remove_negatives().
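To compare the two runs at a glance, the timings listed above can be averaged and divided. The snippet below only does arithmetic on the numbers already reported; the variable names are illustrative.

```python
from statistics import mean

# Timings in seconds, copied from the two result listings above.
fast_remove = [0.00018095970153808594, 9.870529174804688e-05,
               9.465217590332031e-05, 9.799003601074219e-05,
               0.00013589859008789062, 0.00013828277587890625]
fast_nms = [0.000225067138671875, 0.00013780593872070312,
            0.0001399517059326172, 0.0001380443572998047,
            0.00017452239990234375, 0.00021123886108398438]
slow_remove = [1.1554949283599854, 1.1924941539764404,
               1.1662535667419434, 1.1705167293548584,
               1.1798851490020752, 1.1416120529174805]
slow_nms = [0.18290448188781738, 0.19454574584960938,
            0.181380033493042, 0.18292856216430664,
            0.18198132514953613, 0.18628549575805664]

remove_speedup = mean(slow_remove) / mean(fast_remove)
nms_speedup = mean(slow_nms) / mean(fast_nms)

# Inner-loop iterations of the nested version: detections x classes.
inner_iterations = 867 * 9418

print(f"remove_negatives: {mean(slow_remove):.4f} s -> {mean(fast_remove) * 1e6:.0f} us "
      f"(about {remove_speedup:.0f}x)")
print(f"NMS:              {mean(slow_nms):.4f} s -> {mean(fast_nms) * 1e6:.0f} us "
      f"(about {nms_speedup:.0f}x)")
print(f"nested-loop iterations per image: {inner_iterations}")
```

Two caveats when reading these ratios: each average covers only six images, and the two NMS calls do not time identical work, since do_nms_sort() receives all 867 raw detections while non_max_suppression_fast() receives only the predictions that survived negative removal.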

@AlexeyAB AlexeyAB merged commit 9ff8653 into AlexeyAB:master Aug 25, 2021
juhong-rdv pushed a commit to juhong-rdv/darknet that referenced this pull request Dec 18, 2021
…AlexeyAB#8009)

* Update network.c

custom_get_region_detections function now keeps track of class index with the highest probability.

* Update darknet.h

Added best_class_idx to detection struct

* Update darknet.py

added python code for faster negative removal and also faster non-max suppression