
It took 3000ms on my computer #24

Closed
chenzx2 opened this issue Jun 29, 2023 · 14 comments

@chenzx2 commented Jun 29, 2023

It took 3000ms on my computer, I don't know what is wrong.

@garbe-github-support

Me too, it's even slower than SAM.
My code:

import cv2
import numpy as np
import torch
import matplotlib.pyplot as plt

from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def show_anns(anns):
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)

    # Transparent RGBA canvas the size of the masks
    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:, :, 3] = 0

    for ann in sorted_anns:
        m = ann['segmentation']
        color_mask = np.concatenate([np.random.random(3), [1]])  # random color, opaque
        img[m] = color_mask

    ax.imshow(img)

def runSam(path):
    sam = sam_model_registry["vit_h"](checkpoint=r"E:\model_dataset\sam_vit_h_4b8939.pth")
    device = "cuda"
    sam.to(device)
    mask_generator = SamAutomaticMaskGenerator(sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    masks = mask_generator.generate(img)

    return masks, img

def runMobileSam(path):
    from mobile_encoder.setup_mobile_sam import setup_model
    checkpoint = torch.load(r'D:\tools\MobileSAM\weights\mobile_sam.pt')
    mobile_sam = setup_model()
    mobile_sam.load_state_dict(checkpoint, strict=True)

    mask_generator = SamAutomaticMaskGenerator(mobile_sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img


def showRet(masks, img):
    print(len(masks))
    print(masks[0].keys())

    plt.figure(figsize=(20, 20))
    plt.imshow(img)
    show_anns(masks)
    plt.axis('off')
    plt.show()


if __name__ == '__main__':
    path = r'C:\Users\Admin\Desktop\test_img\2033CD8A29F6C011006F8452C53A4D89.jpg'
    masks, img = runSam(path)
    # masks, img = runMobileSam(path)
    showRet(masks, img)

My environment: Windows, PyTorch 2.0.1, CUDA 11.7, 4070

@newcoder0531 commented Jun 29, 2023

> Me too, it's even slower than SAM. My code: (quoted above)
> My environment: Windows, PyTorch 2.0.1, CUDA 11.7, 4070

It seems that your MobileSAM does not use CUDA, but SAM does.
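
For reference, a minimal sketch of the fix being suggested here, assuming the same runMobileSam defined above: move the MobileSAM model to the GPU before building the mask generator, just as runSam does.

def runMobileSam(path):
    from mobile_encoder.setup_mobile_sam import setup_model
    checkpoint = torch.load(r'D:\tools\MobileSAM\weights\mobile_sam.pt')
    mobile_sam = setup_model()
    mobile_sam.load_state_dict(checkpoint, strict=True)

    device = "cuda"
    mobile_sam.to(device)  # the missing step: without this, inference runs on the CPU
    mobile_sam.eval()

    mask_generator = SamAutomaticMaskGenerator(mobile_sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img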

@ChaoningZhang (Owner)

> It took 3000ms on my computer, I don't know what is wrong.

Without more details, it is difficult for us to help you debug.

@garbe-github-support

> It seems that your MobileSAM does not use CUDA, but SAM does.

Thank you, you are right. Now my time is half that of SAM.

@chenzx2 (Author) commented Jun 29, 2023

> Without more details, it is difficult for us to help you debug.

Here is my code, my environment: Ubuntu 18, torch 2.0.0+cu117
(screenshot: 企业微信截图_16880289877752)

@ChaoningZhang (Owner)

> Thank you, you are right. Now my time is half that of SAM.

It seems that your issues are addressed. Thanks for your interest in our work.

@ChaoningZhang (Owner)

> Here is my code, my environment: Ubuntu 18, torch 2.0.0+cu117

May I ask whether you chose anything mode or everything mode?

@SongYii commented Jun 29, 2023

Even after adding the following in the code:

device = "cuda"
mobile_sam.to(device=device)

MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 seconds
150
dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 seconds
97
dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
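
As an aside, GPU timings like these can be misleading if CUDA kernels are still queued when the clock stops, and the first call often pays one-off setup costs. A minimal timing sketch, assuming the mask generators and img from the code above (time_generate is a hypothetical helper):

import time
import torch

def time_generate(mask_generator, img, warmup=1, runs=3):
    # Warm-up pass so one-off costs (allocator, kernel caches) are excluded
    for _ in range(warmup):
        mask_generator.generate(img)
    torch.cuda.synchronize()  # ensure queued CUDA work is done before timing
    start = time.perf_counter()
    for _ in range(runs):
        masks = mask_generator.generate(img)
    torch.cuda.synchronize()
    print(f"avg: {(time.perf_counter() - start) / runs:.3f} s, {len(masks)} masks")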

@chenzx2 (Author) commented Jun 29, 2023

FastSAM is fine; I ran the notebook code.

@ChaoningZhang (Owner)

> MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and much slower than FastSAM. I don't know what the problem is.

May I ask whether you chose anything mode or everything mode?

@SongYii commented Jun 29, 2023

> May I ask whether you chose anything mode or everything mode?

I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds, MobileSAM took 1.4033191 seconds.

@ChaoningZhang (Owner)

> I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds, MobileSAM took 1.4033191 seconds.

Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8ms for the encoder and 4ms for the decoder). We mainly target the anything mode (one encoder pass and one decoder pass) rather than the everything mode (one encoder pass and 32x32 decoder passes); see the paper for the difference in definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper).

For everything mode, even though our encoder is much faster than that of the original SAM (which is close to 500ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller grid of points (like 10x10 or 5x5) so that the decoder consumes less time; many redundant masks are generated with a 32x32 grid anyway.

I hope this addresses your issues; otherwise, please kindly let us know. We are also currently trying to make the image decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please kindly let us know; we might not be able to respond in a timely manner, but we will try our best.
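
For readers who want to try the smaller-grid suggestion: SamAutomaticMaskGenerator takes a points_per_side argument (default 32), so the mitigation is a one-line change, assuming the mobile_sam and img from the code above:

# 10x10 = 100 decoder passes instead of 32x32 = 1024
mask_generator = SamAutomaticMaskGenerator(mobile_sam, points_per_side=10)
masks = mask_generator.generate(img)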

@fujianhai commented Jun 30, 2023

This work is really great; the inference time for a point is about 10ms+, but a full image is not much faster. On our GPU a full image does take about 2~3s. After all, the decoder network has not changed, so the whole-image case cannot be significantly improved.

@ChaoningZhang (Owner)

> This work is really great; the inference time for a point is about 10ms+, but a full image is not much faster. On our GPU a full image does take about 2~3s. After all, the decoder network has not changed, so the whole-image case cannot be significantly improved.

Thanks for your interest in our work. Please check our replies to others on how to mitigate this issue. Yet another way to speed it up on GPU is to do batched inference in the decoder over the 32x32 grid of prompt points. You can try implementing it and help with a pull request here if you complete it. We will also implement it ourselves, but it may take a while~~
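
A sketch of that batched-decoder idea, assuming segment_anything's SamPredictor (whose predict_torch accepts a batch of prompts; batched_grid_masks is a hypothetical helper, not part of either repo): encode the image once, then run the decoder on the whole point grid as a single batch instead of looping over the points.

import numpy as np
import torch
from segment_anything import SamPredictor

def batched_grid_masks(sam, image, points_per_side=32):
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # one encoder pass (HWC uint8 RGB input)

    # Build a points_per_side x points_per_side grid of single-point prompts
    h, w = image.shape[:2]
    xs = (np.arange(points_per_side) + 0.5) / points_per_side * w
    ys = (np.arange(points_per_side) + 0.5) / points_per_side * h
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 1, 2)  # (B, 1, 2)

    coords = predictor.transform.apply_coords_torch(
        torch.as_tensor(grid, dtype=torch.float, device=predictor.device), (h, w))
    labels = torch.ones(coords.shape[:2], dtype=torch.int, device=predictor.device)

    with torch.no_grad():  # one batched decoder call instead of B sequential ones
        masks, iou_preds, _ = predictor.predict_torch(
            point_coords=coords, point_labels=labels, multimask_output=True)
    return masks, iou_preds

In practice a full 32x32 grid is 1024 prompts, which may not fit in GPU memory in a single decoder call; chunking the batch (say, 64 prompts at a time) bounds memory while keeping most of the speedup.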
