
How to compute mAP of tiny yolo on VOC2007-test #350

Open
szm-R opened this issue Jan 25, 2018 · 18 comments


szm-R commented Jan 25, 2018

Hi everyone,
The title says it all: I want to compute the mAP of tiny YOLO on VOC2007-test. I have written a C++ program for this and get 39.78% mAP, whereas pjreddie reports 57.1% mAP on VOC2007-test.
I first downloaded the weights using:
wget https://pjreddie.com/media/files/tiny-yolo-voc.weights

Then performed detection with:
./darknet -i 0 detector valid cfg/voc.data cfg/tiny-yolo-voc.cfg models/tiny-yolo-voc.weights

I just changed detector.c code to save the results in a different format that was easier for me to read in my code.

I then count all the TPs and FPs (across all classes) and compute precision/recall at 11 thresholds (from 0 to 1), and then the AP (with the formula mentioned in the Pascal VOC paper). Here is the PR curve I get:
[attached PR curve image: prcurve_ap_39.7814]

My real purpose is to write code that computes the AP for a model trained on my own custom data, but in order to verify it I am testing it on the pretrained tiny YOLO.

Thanks in advance for your help.

@AlexeyAB (Owner)

@szm2015 Hi,

  1. Did you try to use https://github.com/AlexeyAB/darknet/blob/master/scripts/voc_eval.py and compare with your results?

  2. Which validation dataset did you use - is it voc/2007_test.txt?

  3. Did you use the following approach in your C code?

  • mAP = AVG(AP for each object class)
  • AP = AVG(Precision for each of 11 Recalls {0, 0.1, ..., 1})
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • TP = number of detections with IoU > 0.5
  • FP = number of detections with IoU <= 0.5, or detections of an object that has already been detected
  • FN = number of objects that are not detected, or are detected only with IoU <= 0.5
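
In code form, these definitions amount to something like the following (a minimal sketch, not the repository's implementation; the per-class recall/precision arrays are assumed to come from evaluating the ranked detections elsewhere):

    import numpy as np

    def eleven_point_ap(recall, precision):
        # AP = average of the interpolated precision at recalls 0, 0.1, ..., 1,
        # where the interpolated precision at recall t is the maximum precision
        # over all points with recall >= t (0 if no such point exists).
        recall, precision = np.asarray(recall), np.asarray(precision)
        ap = 0.0
        for t in np.arange(0.0, 1.1, 0.1):
            mask = recall >= t
            ap += (np.max(precision[mask]) if mask.any() else 0.0) / 11.0
        return ap

    def mean_average_precision(per_class_pr):
        # per_class_pr: {class_name: (recall_array, precision_array)}
        return float(np.mean([eleven_point_ap(r, p) for r, p in per_class_pr.values()]))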


szm-R commented Jan 25, 2018

1. I will look into it and report back, thank you.
2. Yes.
3. The number I reported was the AP taken over all classes together: for every threshold I summed the TPs, FPs, and FNs from all classes (all counted just as you described), then computed the overall precision and recall for that threshold, and then computed the AP (also as you described; for recall points with no detections, such as the recalls above 60% in the plot, I set the precision to 0).

Now I tried what you said about computing AP for every class and then averaging over them to get the mAP, here are the results:

AP of class "aeroplane": 44.0196
AP of class "bicycle": 47.6694
AP of class "bird": 28.4797
AP of class "boat": 20.7133
AP of class "bottle": 10.0164
AP of class "bus": 48.7149
AP of class "car": 48.0142
AP of class "cat": 49.9185
AP of class "chair": 14.9642
AP of class "cow": 38.3175
AP of class "diningtable": 34.7925
AP of class "dog": 41.3822
AP of class "horse": 48.7818
AP of class "motorbike": 37.5573
AP of class "person": 42.4456
AP of class "pottedplant": 15.4256
AP of class "sheep": 38.9875
AP of class "sofa": 19.6689
AP of class "train": 50.1913
AP of class "tvmonitor": 43.5746
mean Average Precision: 36.1818

Now it's even less!

@AlexeyAB (Owner)

@szm2015 Can you show your C-code for mAP?


szm-R commented Jan 26, 2018

Hello again, I have attached the code. It's a Qt project (I use the UI for plotting). Here is an overall explanation:

In lines 57 to 112 there is a for loop over 11 thresholds (from 0 to 1). Inside it (lines 63 to 108) is a loop over the txt prediction files (which I also attached). In this loop, the detections with a score above the threshold are stored in cv::Rect objects (along with their scores and labels). The function "FillEvaluationsMatrix" then evaluates the predictions against the ground truth and fills a confusion matrix that is initialized at the beginning of the threshold loop.

Outside the prediction loop (but still inside the threshold loop), the TP and FP values are computed from the confusion matrix in the "finalEval" function (I count the total number of ground-truth objects in every class and use that as the TP+FN value in the recall denominator). This function computes precision and recall and stores them in a matrix (named PRpairs) with 20 rows (number of classes) and 11 columns (number of thresholds), so that at the end of the loop each class has a PR pair for every threshold.

Finally, the "ComputeAPs" function computes the AP of every class from the PRpairs calculated before and averages them to get the mAP.

Detections.zip

YoloPRcurve.zip

@MiZhangWhuer

@szm2015 How is the issue going now? I also have a problem with the PR curve. I wonder why the recall (x axis) only reaches 60 rather than 100?


szm-R commented Jan 29, 2018

Hello everyone, I haven't had the time to work on this matter for a while. Just now I was checking voc_eval.py and came across these lines:

    if ovmax > ovthresh:
        if not R['difficult'][jmax]:
            if not R['det'][jmax]:
                tp[d] = 1.
                R['det'][jmax] = 1
            else:
                fp[d] = 1.
    else:
        fp[d] = 1.

It seems that detections are only counted as true positives if their ground truth is not "difficult" and also if not R['det'][jmax]. I haven't considered either of these in my code, though I have no idea what the second one means! I would appreciate any clarification!


AlexeyAB commented Jan 29, 2018

  • R['det'][jmax] - this flag means that this ground truth has already been detected, so a re-detection of the same object is a false positive: fp[d] = 1.

  • Yes, ground truths (objects) with the parameter difficult=1 in the PascalVOC annotation .xml file are counted neither as false positives nor as negatives:


This code is taken from the repository of the author of Faster-RCNN detector: https://github.com/rbgirshick/py-faster-rcnn/blob/781a917b378dbfdedb45b6a56189a31982da1b43/lib/datasets/voc_eval.py#L177-L189

            overlaps = inters / uni
            ovmax = np.max(overlaps)
            jmax = np.argmax(overlaps)

        if ovmax > ovthresh:
            if not R['difficult'][jmax]:
                if not R['det'][jmax]:
                    tp[d] = 1.
                    R['det'][jmax] = 1
                else:
                    fp[d] = 1.
        else:
            fp[d] = 1.

Where:

  • overlaps is an array of IoUs (Intersection over Union values)
  • ovmax is the maximum IoU
  • ovthresh is a constant IoU threshold = 0.5
  • R['difficult'][jmax] - flag that this ground truth is difficult
  • R['det'][jmax] - flag that this ground truth has already been detected
  • tp - true-positive flag
  • fp - false-positive flag

So if a ground truth is difficult, it is not taken into account in the precision (neither as a true positive nor as a false positive).
If an object whose ovmax > 0.5 is detected again, then the re-detection is a false positive (fp).
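
(For completeness, the "overlaps = inters / uni" line above is the standard IoU computation; a minimal sketch with boxes as (xmin, ymin, xmax, ymax), not the exact voc_eval.py code:)

    import numpy as np

    def iou(box, gt_boxes):
        # box: (xmin, ymin, xmax, ymax); gt_boxes: array of shape (N, 4)
        ixmin = np.maximum(gt_boxes[:, 0], box[0])
        iymin = np.maximum(gt_boxes[:, 1], box[1])
        ixmax = np.minimum(gt_boxes[:, 2], box[2])
        iymax = np.minimum(gt_boxes[:, 3], box[3])
        iw = np.maximum(ixmax - ixmin + 1.0, 0.0)   # intersection width
        ih = np.maximum(iymax - iymin + 1.0, 0.0)   # intersection height
        inters = iw * ih
        uni = ((box[2] - box[0] + 1.0) * (box[3] - box[1] + 1.0)
               + (gt_boxes[:, 2] - gt_boxes[:, 0] + 1.0)
               * (gt_boxes[:, 3] - gt_boxes[:, 1] + 1.0)
               - inters)
        return inters / uni   # one IoU per ground-truth box ("overlaps")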

10.1.1.157.5766.pdf


szm-R commented Jan 29, 2018

Thank you @AlexeyAB for your complete explanation. I already do something exactly like checking R['det'][jmax] in my code. I added the "difficult" check, but for some reason the AP got even worse; I should look into it more. Meanwhile, can you point me to the exact procedure for evaluating with voc_eval.py? Most importantly, the command line used to get the detection results, since as far as I understand there are several validation functions in detector.c.

@MiZhangWhuer

Hi @szm2015 @AlexeyAB, I also plot the PR curve following the link https://github.com/D-X-Y/caffe-faster-rcnn/blob/dev/examples/FRCNN/calculate_voc_ap.py and print the P-R values before the mAP is computed.
1. I wonder why the precision values do not approach zero. Attached are my predicted bounding box file (PredBBoxes.txt) and the corresponding ground-truth bounding box file (GTBBoxes.txt). Note that none of the bounding boxes are of the "difficult" type.
2. The P-R curve seems correct when I add the following code (see attachment: line 251 in calculate_ap.py.txt) to normalize the recall values:
rec = (rec - rec.min())/(rec.max() - rec.min())

I would much appreciate it if any of you could help me solve the problems above, and I hope for further discussion of the P-R curve issues.

GTBBoxes.txt
PredBBoxes.txt
calculate_ap.py.txt


szm-R commented Feb 12, 2018

Hi @MiZhangWhuer, as I am still struggling with this issue myself, I can't be of much help to you! But hopefully, once I solve it, I will share my results.

Now @AlexeyAB, I still don't know how to run voc_eval.py. My Python knowledge is really rusty! I first created the detection results using the following command:
./darknet -i 0 detector valid cfg/voc.data cfg/tiny-yolo-voc.cfg models/tiny-yolo-voc.weights

(Note that I'm using pjreddie version of darknet)

Then I had my detection results in a folder named voc inside the results directory, with one file per class in the format:
className.txt

Now I added these lines at the end of voc_eval.py code to be able to run it (told you my python is rusty!!!):

print "Starting here"
detpath = "/path/to/results/voc/"
annopath = "/path/to/data/voc/VOC2007/Annotations/"
imagesetfile = "/path/to/data/voc/2007_test_FileNames.txt"
classname = "/path/to/data/voc/voc.names"
cachedir = "/path/to/data/voc/VOC2007/cache/"
ovthresh = 0.7
use_07_metric = True 
voc_eval(detpath, annopath, imagesetfile, classname, cachedir, ovthresh, use_07_metric)

But detpath and the others seem to be something other than simple paths, because running the code gives me the following error:

Traceback (most recent call last):
  File "voc_eval.py", line 211, in <module>
    voc_eval(detpath, annopath, imagesetfile, classname, cachedir, ovthresh, use_07_metric)
  File "voc_eval.py", line 137, in voc_eval
    with open(detfile, 'r') as f:
IOError: [Errno 21] Is a directory: '/home/szm/Work/Research/Models_and_Codes/darknet/darknet_GPU/results/voc/'

Can you please tell me how I should pass these arguments to voc_eval.py?


szm-R commented Feb 13, 2018

Hi everyone,
I figured out my last question; now I run voc_eval.py by adding the following lines at the end:

detpath = '/path/to/darknet/darknet_GPU/results/voc/{}.txt'
annopath = '/path/to/darknet/darknet_GPU/data/voc/VOC2007/Annotations/{}.xml'
imagesetfile = '/path/to/darknet/darknet_GPU/data/voc/2007_test_FileNames.txt'
cachedir = '/path/to/darknet/darknet_GPU/data/voc/VOC2007/cache/'
ovthresh = 0.7
use_07_metric = True 
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
for classname in classes:
    rec, prec, ap = voc_eval(detpath, annopath, imagesetfile, classname, cachedir, ovthresh, use_07_metric)
    print "ClassName: %s AveragePrecision: %f" % (classname, ap)

Now I have a more fundamental question. In this code we just hand the previously generated detection files to the evaluation function, and in it we calculate only one pair of recall and precision for every class (as far as I have understood) and then calculate the AP. Shouldn't there be some kind of loop over different score thresholds (applied to the "confidence") to give us the precision-recall curve to use for the AP calculation?


AlexeyAB commented Feb 13, 2018

@szm2015 In your code use_07_metric = True.
So when use_07_metric is true, the voc_ap function uses the VOC 11-point method - this is the one used for mAP.
You get 3 values from the voc_eval function:

  1. rec
  2. prec
  3. ap

    darknet/scripts/voc_eval.py

    Lines 198 to 200 in 9c84764

    ap = voc_ap(rec, prec, use_07_metric)
    return rec, prec, ap

    The mAP is calculated in the file reval_voc.py:

    aps += [ap]
    print('AP for {} = {:.4f}'.format(cls, ap))
    with open(os.path.join(output_dir, cls + '_pr.pkl'), 'w') as f:
        cPickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f)
    print('Mean AP = {:.4f}'.format(np.mean(aps)))

Also, the mAP calculation - 11 point method for PascalVOC:

    def voc_ap(rec, prec, use_07_metric=False):
        """ ap = voc_ap(rec, prec, [use_07_metric])
        Compute VOC AP given precision and recall.
        If use_07_metric is true, uses the
        VOC 07 11 point method (default:False).
        """
        if use_07_metric:
            # 11 point metric
            ap = 0.
            for t in np.arange(0., 1.1, 0.1):
                if np.sum(rec >= t) == 0:
                    p = 0
                else:
                    p = np.max(prec[rec >= t])
                ap = ap + p / 11.

@AlexeyAB (Owner)

@szm2015 @MiZhangWhuer I just added a cmd file for Windows to calculate mAP. I got 56.6% for Tiny-Yolo 416x416 on the PascalVOC 2007 test set, which is a little bit less than the 57.1% stated on the site: https://pjreddie.com/darknet/yolo/

If you use Windows and Python >= 3.5:



Mean AP = 0.5666
~~~~~~~~
Results:
0.629
0.725
0.487
0.427
0.212
0.678
0.678
0.709
0.353
0.546
0.581
0.628
0.710
0.697
0.604
0.283
0.559
0.524
0.712
0.590
0.567
~~~~~~~~

--------------------------------------------------------------
Results computed with the **unofficial** Python eval code.
Results should be very close to the official MATLAB eval code.
-- Thanks, The Management
--------------------------------------------------------------


szm-R commented Feb 14, 2018

Thank you @AlexeyAB. After digging a little more into the code I found what I had been missing: the predicted bounding boxes are ranked according to their confidence scores, and recall/precision are computed at every one of these ranks. What I had been doing was to pick a number of thresholds (say 20), compute a PR pair for each of them (by omitting the predictions with confidence scores below the threshold in each pass), and then calculate the AP from these PR pairs. I still don't know why this way of computing AP gives such a drastically wrong result, but for now I will stick to your code. Thanks again.
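
(For anyone comparing the two approaches, here is a minimal sketch of the ranked evaluation described above: sort by confidence, accumulate TP/FP, and get one precision/recall point per rank. Simplified to a single class and not the exact voc_eval.py code:)

    import numpy as np

    def ranked_pr(scores, is_tp, num_gt):
        # scores: confidence of each detection;
        # is_tp: 1 if the detection matched a not-yet-used ground truth with
        #        IoU > 0.5, else 0 (false positive);
        # num_gt: total number of ground-truth objects (TP + FN).
        order = np.argsort(-np.asarray(scores))          # highest confidence first
        hits = np.asarray(is_tp, dtype=float)[order]
        tp = np.cumsum(hits)                             # cumulative TP per rank
        fp = np.cumsum(1.0 - hits)                       # cumulative FP per rank
        rec = tp / float(num_gt)                         # recall after each rank
        prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
        return rec, prec

These rec/prec arrays are what voc_ap() interpolates over, instead of PR pairs taken at a fixed set of score thresholds.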

@AlexeyAB (Owner)

@szm2015 @MiZhangWhuer

I added C code for calculating mAP (mean average precision) using Darknet, for the VOC dataset and for any custom dataset of yours. Just use the command: darknet.exe detector map data/voc.data tiny-yolo-voc.cfg tiny-yolo-voc.weights
where the voc.data file should point to the validation dataset: valid = 2007_test.txt

But my implementation shows a lower value than reval_voc.py + voc_eval.py. If you find an error in my code and can fix it, let me know:

void validate_detector_map(char *datacfg, char *cfgfile, char *weightfile)

I don't check the difficult flag of the ground truths as voc_eval.py does, but as far as I can see, voc_label.py already removes difficult objects at the labeling stage:

    if cls not in classes or int(difficult) == 1:
        continue


class = 0, name = aeroplane,     ap = 61.01 %
class = 1, name = bicycle,       ap = 71.18 %
class = 2, name = bird,          ap = 47.84 %
class = 3, name = boat,          ap = 40.23 %
class = 4, name = bottle,        ap = 20.88 %
class = 5, name = bus,   ap = 67.68 %
class = 6, name = car,   ap = 66.21 %
class = 7, name = cat,   ap = 70.46 %
class = 8, name = chair,         ap = 33.77 %
class = 9, name = cow,   ap = 54.15 %
class = 10, name = diningtable,          ap = 55.45 %
class = 11, name = dog,          ap = 62.47 %
class = 12, name = horse,        ap = 71.24 %
class = 13, name = motorbike,    ap = 68.72 %
class = 14, name = person,       ap = 59.28 %
class = 15, name = pottedplant,          ap = 27.54 %
class = 16, name = sheep,        ap = 54.45 %
class = 17, name = sofa,         ap = 50.07 %
class = 18, name = train,        ap = 70.83 %
class = 19, name = tvmonitor,    ap = 58.63 %

 mean average precision (mAP) = 0.556050, or 55.61 %
Total Detection Time: 56.000000 Seconds

For yolo-voc, the site and the article state 78.6% (page 4, table 3): https://arxiv.org/pdf/1612.08242v1.pdf
The value here is lower because:
* My implementation shows a somewhat lower value than reval_voc.py + voc_eval.py. If you find an error in my code, let me know.
* yolo-voc.weights was trained with the aspect ratio kept (and with some other modifications) in the original repo, so you should test it on the original repo: https://github.com/pjreddie/darknet

class = 0, name = aeroplane,     ap = 80.84 %
class = 1, name = bicycle,       ap = 84.10 %
class = 2, name = bird,          ap = 75.03 %
class = 3, name = boat,          ap = 65.30 %
class = 4, name = bottle,        ap = 55.22 %
class = 5, name = bus,   ap = 83.66 %
class = 6, name = car,   ap = 84.53 %
class = 7, name = cat,   ap = 88.20 %
class = 8, name = chair,         ap = 58.35 %
class = 9, name = cow,   ap = 80.53 %
class = 10, name = diningtable,          ap = 69.81 %
class = 11, name = dog,          ap = 84.07 %
class = 12, name = horse,        ap = 86.17 %
class = 13, name = motorbike,    ap = 83.33 %
class = 14, name = person,       ap = 78.44 %
class = 15, name = pottedplant,          ap = 50.86 %
class = 16, name = sheep,        ap = 77.36 %
class = 17, name = sofa,         ap = 71.74 %
class = 18, name = train,        ap = 82.96 %
class = 19, name = tvmonitor,    ap = 74.95 %

 mean average precision (mAP) = 0.757728, or 75.77 %
Total Detection Time: 214.000000 Seconds


szm-R commented Feb 15, 2018 via email


AlexeyAB commented Feb 15, 2018

@szm2015 Yes, I think it can have an influence.

Maybe I'll add a separate Python script voc_eval_difficult.py that creates txt files for Yolo with the labels (coordinates) of the difficult objects from the XML files of the PascalVOC dataset, and will use these txt files to exclude difficult objects from the TP and FP calculation.


AlexeyAB commented Feb 16, 2018

  1. I added a Python script that gets a list of images and labels with Difficult objects and generates the difficult_2007_test.txt file: https://github.com/AlexeyAB/darknet/blob/master/scripts/voc_label_difficult.py
    This file should be set here (without the #):
    #difficult = data/voc/difficult_2007_test.txt

  2. Then darknet.exe detector map data/voc.data tiny-yolo-voc.cfg tiny-yolo-voc.weights gives 56.21% (while reval_voc.py and voc_eval.py give 56.6%, a difference of 0.39):
class = 0, name = aeroplane,     ap = 61.05 %
class = 1, name = bicycle,       ap = 71.58 %
class = 2, name = bird,          ap = 48.26 %
class = 3, name = boat,          ap = 40.61 %
class = 4, name = bottle,        ap = 20.92 %
class = 5, name = bus,   ap = 68.13 %
class = 6, name = car,   ap = 66.48 %
class = 7, name = cat,   ap = 70.46 %
class = 8, name = chair,         ap = 35.08 %
class = 9, name = cow,   ap = 55.10 %
class = 10, name = diningtable,          ap = 58.06 %
class = 11, name = dog,          ap = 62.59 %
class = 12, name = horse,        ap = 71.42 %
class = 13, name = motorbike,    ap = 69.23 %
class = 14, name = person,       ap = 59.74 %
class = 15, name = pottedplant,          ap = 27.80 %
class = 16, name = sheep,        ap = 55.32 %
class = 17, name = sofa,         ap = 52.50 %
class = 18, name = train,        ap = 70.84 %
class = 19, name = tvmonitor,    ap = 59.13 %

 mean average precision (mAP) = 0.562140, or 56.21 %

  3. darknet.exe detector map data/voc.data yolo-voc.cfg yolo-voc.weights gives 76.94% (while reval_voc.py and voc_eval.py give 77.1%, a difference of 0.16):
class = 0, name = aeroplane,     ap = 80.89 %
class = 1, name = bicycle,       ap = 84.36 %
class = 2, name = bird,          ap = 76.10 %
class = 3, name = boat,          ap = 66.57 %
class = 4, name = bottle,        ap = 55.50 %
class = 5, name = bus,   ap = 84.11 %
class = 6, name = car,   ap = 85.80 %
class = 7, name = cat,   ap = 88.31 %
class = 8, name = chair,         ap = 61.29 %
class = 9, name = cow,   ap = 82.67 %
class = 10, name = diningtable,          ap = 72.38 %
class = 11, name = dog,          ap = 84.46 %
class = 12, name = horse,        ap = 86.54 %
class = 13, name = motorbike,    ap = 83.92 %
class = 14, name = person,       ap = 79.27 %
class = 15, name = pottedplant,          ap = 51.84 %
class = 16, name = sheep,        ap = 78.71 %
class = 17, name = sofa,         ap = 75.63 %
class = 18, name = train,        ap = 83.19 %
class = 19, name = tvmonitor,    ap = 77.15 %

 mean average precision (mAP) = 0.769353, or 76.94 %

So we still lose somewhere between 0.16 and 0.39% of mAP :)
