Reproducing MIST function and results #3
Comments
Thanks for the information. It seems correct to me, even though I didn't run this code line by line. It's strange that you can only achieve 25% mAP on VOC2007(?). Some of my thoughts here:
Yes and yes. As I lower p, the performance approaches OICR; as it increases, the performance drops. Do you do the trick from their ECCV paper, ignoring boxes with IoU < 0.1? Also, can I confirm that for this particular experiment (VGG, no regression) you used the original hyperparameters, or the new ones in the new appendix? Any warmup or other things?
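(For concreteness, a literal reading of the "ignore" trick being asked about could be sketched as below. This is an illustrative guess, not from either codebase; the function name and threshold default are assumptions.)

```python
import torch
from torchvision import ops

def ignore_low_iou(rois, gt_boxes, weights, thresh=0.1):
    # proposals whose best IoU against every pseudo-GT box is below `thresh`
    # get zero loss weight instead of being treated as background
    max_iou, _ = ops.box_iou(rois, gt_boxes).max(dim=1)
    out = weights.clone()
    out[max_iou < thresh] = 0.  # these proposals contribute nothing to the loss
    return out
```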
We tried their trick, but it doesn't help much for our algorithm. We use the same params as in the Appendix. For warm-up, we set it to 200 iters, but I don't think that's the reason, because warm-up doesn't change the results much. For other things:
FYI, the config we used (maskrcnn-benchmark format):
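(The config itself did not survive the copy here. As a rough illustration of the 200-iteration warm-up mentioned above, a linear warm-up in plain PyTorch might look like the following; maskrcnn-benchmark implements this with its own WarmupMultiStepLR, and the constants below are taken from values quoted elsewhere in this thread, not from the missing config.)

```python
import torch

def warmup_factor(it, warmup_iters=200, warmup_start=1. / 3.):
    # linearly ramp the LR multiplier from warmup_start up to 1 over warmup_iters
    if it >= warmup_iters:
        return 1.0
    alpha = it / warmup_iters
    return warmup_start * (1. - alpha) + alpha

model = torch.nn.Linear(8, 8)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_factor)  # step once per iteration
```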
I refactored, moving my sort outside the loop. Also, can I confirm that MIN_SIZE_TEST gets ignored when augmentation is enabled? I'm pretty sure it does, but I want to be sure.
Could you confirm whether all layers are at the same learning rate, and which layers are frozen? I noticed some variation when playing with the learning rate. Typically OICR/PCL would use a 10x LR for the refinement heads. I found I got better (although still not good) results for MIST using 10x LR for the WSDDN head too, and even better with the backbone at 0.001 and the heads at 0.01.
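(A sketch of the split-LR scheme described here, using parameter groups. The module names `.backbone`, `.wsddn_head`, and `.refine_heads` are illustrative, not from anyone's code; momentum and weight decay follow the OICR values quoted later in the thread.)

```python
import torch

def make_optimizer(model, base_lr=1e-3, head_lr_mult=10.):
    # backbone at base_lr, WSDDN head and refinement heads at 10x,
    # per the comment above
    param_groups = [
        {'params': model.backbone.parameters(), 'lr': base_lr},
        {'params': model.wsddn_head.parameters(), 'lr': base_lr * head_lr_mult},
        {'params': model.refine_heads.parameters(), 'lr': base_lr * head_lr_mult},
    ]
    return torch.optim.SGD(param_groups, momentum=0.9, weight_decay=5e-4)
```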
Thanks so much for the feedback~ Here are some of my thoughts. Let me know if I've missed or misunderstood anything.
Hey there, a couple more questions in this unending struggle:
Which layer has a ReLU removed? The only layers I see without a ReLU are the output layers, which are activated by softmax.
Are there any updates on an ETA for the code release? This is so far proving very difficult to reproduce. The pseudo-labelling function as you've described it now appears to boil down to "apply NMS to the top 15% of boxes in each positive class". The label assignment strategy is the same as OICR, except in the case where multiple GT boxes overlap a given proposal by the same amount, in which case you assign the proposal to the box with the highest predicted class confidence (essentially choosing the "most correct" assignment). I am still unable to exceed the performance of OICR in my replication.
Still struggling to reproduce anything even close to your results. I'm sure there must be some hidden details still missing from the paper. Below is the code for the key functions as you've described them. The network structure is identical to OICR as defined in the original Caffe implementation. The dataset is VOC at 6 scales (the 5 scales that are standard practice in WSOD, plus min_side=1000) with their horizontal flips. I have tried both including and excluding the "difficult" examples from the training set (these are typically excluded, but I saw that maskrcnn-benchmark includes them). The optimiser is SGD with the parameters in your config, plus momentum=0.9 and warmup=1/3. I've tried both with and without the modified bias parameters (no weight decay and 2x LR for biases). At test time the scores for all scales and flips are averaged for each image, NMS is applied (I've tried both 0.3, as is standard for OICR-based models, and 0.5, the maskrcnn-benchmark default), and the top 100 predictions per image are retained (the maskrcnn-benchmark default, although disabling it makes minimal difference). Overall my experiments have shown strictly worse results for increasing values of p: near the top-1 setting mAP is around 43%, and it falls to about 35% as p approaches 15%.

```python
import torch
import torch.nn.functional as F
from torchvision import ops

EPS = 1e-6  # small numerical constant (exact value not shown in the original post)


def build_loss(preds, image_labels, rois):
    # image-level multi-label BCE on the MIDN (WSDDN) stream
    midn_loss = F.binary_cross_entropy(preds['midn'].sum(0).clamp(EPS, 1 - EPS), image_labels)
    # ref0 is supervised by pseudo-GT mined from the MIDN scores (scaled by 3, as posted)
    gt_boxes, gt_labels, gt_weights = mist_label(preds['midn'] * 3., rois, image_labels)
    pseudo_labels, weights, tfm_tgt = mist_sample_rois(preds['ref0'].softmax(-1), rois, gt_boxes, gt_labels, gt_weights)
    ref0_loss = weighted_softmax_with_loss(preds['ref0'], pseudo_labels, weights)
    # each subsequent refinement head is supervised by the previous head's scores
    gt_boxes, gt_labels, gt_weights = mist_label(preds['ref0'].softmax(-1), rois, image_labels)
    pseudo_labels, weights, tfm_tgt = mist_sample_rois(preds['ref1'].softmax(-1), rois, gt_boxes, gt_labels, gt_weights)
    ref1_loss = weighted_softmax_with_loss(preds['ref1'], pseudo_labels, weights)
    gt_boxes, gt_labels, gt_weights = mist_label(preds['ref1'].softmax(-1), rois, image_labels)
    pseudo_labels, weights, tfm_tgt = mist_sample_rois(preds['ref2'].softmax(-1), rois, gt_boxes, gt_labels, gt_weights)
    ref2_loss = weighted_softmax_with_loss(preds['ref2'], pseudo_labels, weights)
    total_loss = midn_loss + ref0_loss + ref1_loss + ref2_loss
    return total_loss
```
```python
@torch.no_grad()
def mist_label(preds, rois, label, p=0.15, tau=0.2):
    # drop the background column if the branch predicts one
    preds = (preds if preds.shape[-1] == label.shape[-1] else preds[..., 1:]).clone()
    # number of proposals kept per class: the top p (15%) of all proposals
    keep_count = int(p * preds.size(0))
    klasses = label.nonzero(as_tuple=True)[0]
    gt_boxes, gt_scores, gt_labels = [], [], []
    for klass in klasses:  # only classes present in the image
        c_scores = preds[..., klass]
        sort_idxs = c_scores.argsort(dim=0, descending=True)[:keep_count]
        boxes = rois[sort_idxs]
        c_scores = c_scores[sort_idxs]
        # NMS over the top-scoring boxes yields a diverse set of pseudo-GT instances
        keep_idxs = ops.nms(boxes, c_scores, tau)
        gt_boxes.append(boxes[keep_idxs])
        gt_scores.append(c_scores[keep_idxs])
        gt_labels.append(torch.full_like(keep_idxs, int(klass) + 1))  # 1-indexed labels; 0 is background
    gt_boxes = torch.cat(gt_boxes, 0)
    gt_labels = torch.cat(gt_labels, 0)
    gt_weights = torch.cat(gt_scores, 0)  # pseudo-GT confidences double as loss weights
    return gt_boxes, gt_labels, gt_weights
```
```python
def to_xywh(boxes):
    # helper referenced but not defined in the original post (assumed):
    # convert (x1, y1, x2, y2) boxes to (cx, cy, w, h)
    wh = boxes[:, 2:] - boxes[:, :2]
    return torch.cat([boxes[:, :2] + 0.5 * wh, wh], dim=-1)


@torch.no_grad()
def mist_sample_rois(preds, rois, gt_boxes, gt_labels, gt_weights, bg_threshold=0.5):
    overlaps = ops.box_iou(rois, gt_boxes)
    max_overlaps, gt_assignment = overlaps.max(dim=1)
    # proposals below the IoU threshold against every pseudo-GT box are background
    bg = max_overlaps < bg_threshold
    fg = ~bg
    # tie-break: among pseudo-GT boxes a proposal overlaps (near-)maximally,
    # assign it to the one whose class the proposal itself scores highest
    maximally_overlapping = (max_overlaps[fg].unsqueeze(1) - overlaps[fg]) < EPS
    gt_assignment[fg] = (maximally_overlapping * preds[fg][:, gt_labels]).argmax(1)
    # construct classification labels (0 = background)
    labels = gt_labels[gt_assignment]
    labels[bg] = 0
    # each proposal inherits the confidence of its assigned pseudo-GT box
    weights = gt_weights[gt_assignment]
    # standard R-CNN box-regression targets against the assigned pseudo-GT
    G = to_xywh(gt_boxes)[gt_assignment]
    P = to_xywh(rois)
    T = torch.empty_like(rois)
    T[:, :2] = (G[:, :2] - P[:, :2]) / P[:, 2:]
    T[:, 2:] = (G[:, 2:] / P[:, 2:]).log()
    return labels, weights, T
```
```python
def weighted_softmax_with_loss(score: torch.Tensor, labels: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # per-ROI cross-entropy, weighted by the confidence of the assigned pseudo-GT
    loss = -weights * F.log_softmax(score, dim=-1).gather(-1, labels.long().unsqueeze(-1)).squeeze(-1)
    # normalise by the number of ROIs with non-zero weight (fall back to all ROIs)
    valid_sum = weights.gt(1e-12).float().sum()
    if valid_sum < EPS:
        return loss.sum() / loss.numel()
    else:
        return loss.sum() / valid_sum
```
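(For completeness, a quick smoke test of the functions above on synthetic data; the proposal count, class count, and score distributions are arbitrary, chosen only to exercise the code end to end.)

```python
# purely illustrative smoke test: 300 proposals, 20 VOC classes
torch.manual_seed(0)
R, C = 300, 20
xy = torch.rand(R, 2) * 400
wh = torch.rand(R, 2) * 100 + 1.
rois = torch.cat([xy, xy + wh], dim=1)  # valid (x1, y1, x2, y2) boxes
preds = {
    'midn': torch.rand(R, C).softmax(0) * torch.rand(R, C).softmax(1),  # WSDDN-style scores
    'ref0': torch.randn(R, C + 1),  # refinement logits incl. background
    'ref1': torch.randn(R, C + 1),
    'ref2': torch.randn(R, C + 1),
}
image_labels = torch.zeros(C)
image_labels[[2, 11]] = 1.  # two positive classes
print(build_loss(preds, image_labels, rois))  # scalar loss
```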
The results are way too strange. My OICR re-implementation got similar numbers to yours, but MIST never hurt. I'm wondering what the tricky point is that screwed everything up.
No, I have an updated repo. I will refactor my code over the weekend and try to put it up by Monday. I have claimed your paper in the PapersWithCode Reproducibility Challenge 2020, so I will need to produce a public implementation anyway. I hope that by doing this I can reproduce your results, as well as get some practice writing a paper-like document.
@jason718 |
I have also now tried using 8 GPUs with batch size 1 on each GPU, and the results are the same as with gradient accumulation or a true batch size of 8. I also noticed that you don't mention initialisation in your paper, so I tried using PyTorch's default init as opposed to the normal init used in previous works, which got me up to ~35.5 mAP.
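(For reference, the "normal init used in previous works" refers to Gaussian initialisation of the new fully-connected heads, as in the original OICR Caffe prototxt; a minimal sketch follows, with the std value an assumption rather than a quoted number.)

```python
import torch.nn as nn

def init_head(fc: nn.Linear, std: float = 0.01):
    # Gaussian init for the classification/refinement heads (std assumed),
    # as opposed to PyTorch's default Kaiming-uniform init for nn.Linear
    nn.init.normal_(fc.weight, mean=0., std=std)
    nn.init.constant_(fc.bias, 0.)
```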
A query about this part: do you use the score of the teacher branch or the student branch as the tiebreaker at this stage?
Replied through email. Please check.
Struggling to reproduce the "MIST w/o Reg" results from Table 5. From my understanding, this should be the same network structure as OICR but using the MIST algorithm (Algorithm 1) rather than the typical top-1 box. I have a working implementation of the OICR network structure; it achieves ~41% mAP with OICR and ~44% with PCL using the dilated VGG-16 backbone.
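(For contrast with mist_label above, the "typical top-1 box" labelling that MIST replaces could be sketched as below. This is a paraphrase of OICR in the same style, not the authors' code.)

```python
import torch

@torch.no_grad()
def oicr_top1_label(preds, rois, label):
    # keep only the single highest-scoring proposal per positive class
    preds = preds if preds.shape[-1] == label.shape[-1] else preds[..., 1:]
    klasses = label.nonzero(as_tuple=True)[0]
    gt_boxes, gt_labels, gt_weights = [], [], []
    for klass in klasses:
        best = preds[..., klass].argmax(dim=0)
        gt_boxes.append(rois[best].unsqueeze(0))
        gt_labels.append(torch.tensor([int(klass) + 1], device=rois.device))
        gt_weights.append(preds[best, klass].unsqueeze(0))
    return torch.cat(gt_boxes), torch.cat(gt_labels), torch.cat(gt_weights)
```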
I have tried using the original OICR and PCL hyperparameters (LR=1e-3, WD=5e-4, BS=2 or 4, 5 scales) as well as the new ones in the appendix (LR=1e-2, WD=1e-4, BS=8, 6 scales), and have been unable to break 25% with p=15%. My implementation of this function is included earlier in this thread (see mist_label).