About other datasets #18

Closed
Zoeeeing opened this issue Mar 8, 2022 · 8 comments

@Zoeeeing

Zoeeeing commented Mar 8, 2022

Hi, have you experimented on other outdoor datasets such as nuScenes? I trained SST on the nuScenes dataset and the results were not ideal. I only modified the voxel-size hyperparameters and replaced the head. Is there a problem with this setup?
Thanks!

@Abyssaledge
Collaborator

Thanks for using SST.
No, we have not tried SST on nuScenes. But if you share your config and detailed results, we may be able to help.

@Zoeeeing
Author

Zoeeeing commented Mar 8, 2022

Thanks!
The modified model is as follows:


voxel_size = (0.25, 0.25, 8)
window_shape = (16, 16, 1)
point_cloud_range = [-50, -50, -5, 50, 50, 3]
model = dict(
    type='DynamicVoxelNet',
    voxel_layer=dict(
        voxel_size=(0.25, 0.25, 8),
        max_num_points=-1,
        point_cloud_range=[-50, -50, -5, 50, 50, 3],
        max_voxels=(-1, -1)),
    voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=4,
        feat_channels=[64, 128],
        with_distance=False,
        voxel_size=(0.25, 0.25, 8),
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=[-50, -50, -5, 50, 50, 3],
        norm_cfg=dict(type='naiveSyncBN1d', eps=0.001, momentum=0.01)),
    middle_encoder=dict(
        type='SSTInputLayerV2',
        window_shape=(16, 16, 1),
        sparse_shape=(400, 400, 1),
        shuffle_voxels=True,
        debug=True,
        drop_info=({
            0: {
                'max_tokens': 100,
                'drop_range': (0, 100)
            },
            1: {
                'max_tokens': 200,
                'drop_range': (100, 200)
            },
            2: {
                'max_tokens': 250,
                'drop_range': (200, 10000)
            }
        }, {
            0: {
                'max_tokens': 100,
                'drop_range': (0, 100)
            },
            1: {
                'max_tokens': 200,
                'drop_range': (100, 200)
            },
            2: {
                'max_tokens': 256,
                'drop_range': (200, 10000)
            }
        }),
        pos_temperature=10000,
        normalize_pos=False),
    backbone=dict(
        type='SSTv2',
        d_model=[128, 128, 128, 128, 128, 128],
        nhead=[8, 8, 8, 8, 8, 8],
        num_blocks=6,
        dim_feedforward=[256, 256, 256, 256, 256, 256],
        output_shape=[400, 400],
        num_attached_conv=3,
        conv_kwargs=[
            dict(kernel_size=3, dilation=1, padding=1, stride=1),
            dict(kernel_size=3, dilation=1, padding=1, stride=1),
            dict(kernel_size=3, dilation=2, padding=2, stride=1)
        ],
        conv_in_channel=128,
        conv_out_channel=128,
        debug=True),
    neck=dict(
        type='SECONDFPN',
        norm_cfg=dict(type='naiveSyncBN2d', eps=0.001, momentum=0.01),
        in_channels=[128],
        upsample_strides=[1],
        out_channels=[384]),
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=10,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[-49.6, -49.6, -1.80032795, 49.6, 49.6, -1.80032795],
                    [-49.6, -49.6, -1.74440365, 49.6, 49.6, -1.74440365],
                    [-49.6, -49.6, -1.68526504, 49.6, 49.6, -1.68526504],
                    [-49.6, -49.6, -1.67339111, 49.6, 49.6, -1.67339111],
                    [-49.6, -49.6, -1.61785072, 49.6, 49.6, -1.61785072],
                    [-49.6, -49.6, -1.80984986, 49.6, 49.6, -1.80984986],
                    [-49.6, -49.6, -1.763965, 49.6, 49.6, -1.763965]],
            sizes=[[1.95017717, 4.60718145, 1.72270761],
                   [2.4560939, 6.73778078, 2.73004906],
                   [2.87427237, 12.01320693, 3.81509561],
                   [0.60058911, 1.68452161, 1.27192197],
                   [0.66344886, 0.7256437, 1.75748069],
                   [0.39694519, 0.40359262, 1.06232151],
                   [2.49008838, 0.48578221, 0.98297065]],
            custom_values=[0, 0],
            rotations=[0, 1.57],
            reshape_out=True),
        assigner_per_size=False,
        diff_rad_by_sin=True,
        dir_offset=0.7854,
        dir_limit_offset=0,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            iou_calculator=dict(type='BboxOverlapsNearest3D'),
            pos_iou_thr=0.6,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        allowed_border=0,
        code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_pre=1000,
        nms_thr=0.2,
        score_thr=0.05,
        min_bbox_size=0,
        max_num=500))
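
A quick way to sanity-check the grid-related values in a config like this one is to derive the voxel grid from `point_cloud_range` and `voxel_size` and compare it with `sparse_shape` and `output_shape`. The snippet below is only a minimal sketch using the numbers posted above; the helper name `grid_shape` is made up for illustration and is not part of SST or mmdet3d.

```python
# Values copied from the posted config.
voxel_size = (0.25, 0.25, 8)
point_cloud_range = [-50, -50, -5, 50, 50, 3]
window_shape = (16, 16, 1)
sparse_shape = (400, 400, 1)


def grid_shape(pc_range, voxel):
    """Number of voxels along x, y, z implied by the range and voxel size.

    Hypothetical helper for illustration only.
    """
    mins, maxs = pc_range[:3], pc_range[3:]
    return tuple(round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel))


derived = grid_shape(point_cloud_range, voxel_size)

# How many windows tile the grid along each axis.
windows_per_axis = tuple(s // w for s, w in zip(derived, window_shape))

# Upper bound on voxels (tokens) that a single window can contain.
max_tokens_per_window = window_shape[0] * window_shape[1] * window_shape[2]

print("derived grid:", derived)
print("windows per axis:", windows_per_axis)
print("max tokens per window:", max_tokens_per_window)
```

With these numbers the derived grid agrees with `sparse_shape=(400, 400, 1)` and `output_shape=[400, 400]`, and a 16 x 16 x 1 window holds at most 256 voxels. Note that the first `drop_info` entry caps its largest bucket at 250 tokens while the second uses 256; whether that difference is intended is not stated in the thread.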

After training for 24 epochs, I got the following detailed results:

pts_bbox_NuScenes/car_AP_dist_0.5: 0.4701, pts_bbox_NuScenes/car_AP_dist_1.0: 0.6067, pts_bbox_NuScenes/car_AP_dist_2.0: 0.6618, pts_bbox_NuScenes/car_AP_dist_4.0: 0.6832, pts_bbox_NuScenes/car_trans_err: 0.2372, pts_bbox_NuScenes/car_scale_err: 0.1477, pts_bbox_NuScenes/car_orient_err: 0.1317, pts_bbox_NuScenes/car_vel_err: 0.2814, pts_bbox_NuScenes/car_attr_err: 0.2252
pts_bbox_NuScenes/mATE: 0.4841, pts_bbox_NuScenes/mASE: 0.2709, pts_bbox_NuScenes/mAOE: 0.5280, pts_bbox_NuScenes/mAVE: 0.3700, pts_bbox_NuScenes/mAAE: 0.1962
pts_bbox_NuScenes/truck_AP_dist_0.5: 0.0624, pts_bbox_NuScenes/truck_AP_dist_1.0: 0.2224, pts_bbox_NuScenes/truck_AP_dist_2.0: 0.3657, pts_bbox_NuScenes/truck_AP_dist_4.0: 0.3988, pts_bbox_NuScenes/truck_trans_err: 0.5955, pts_bbox_NuScenes/truck_scale_err: 0.2285, pts_bbox_NuScenes/truck_orient_err: 0.2259, pts_bbox_NuScenes/truck_vel_err: 0.2660, pts_bbox_NuScenes/truck_attr_err: 0.2360
pts_bbox_NuScenes/trailer_AP_dist_0.5: 0.0000, pts_bbox_NuScenes/trailer_AP_dist_1.0: 0.0000, pts_bbox_NuScenes/trailer_AP_dist_2.0: 0.0073, pts_bbox_NuScenes/trailer_AP_dist_4.0: 0.0857, pts_bbox_NuScenes/trailer_trans_err: 0.9790, pts_bbox_NuScenes/trailer_scale_err: 0.2405, pts_bbox_NuScenes/trailer_orient_err: 0.9358, pts_bbox_NuScenes/trailer_vel_err: 0.3954, pts_bbox_NuScenes/trailer_attr_err: 0.1308
pts_bbox_NuScenes/bus_AP_dist_0.5: 0.0105, pts_bbox_NuScenes/bus_AP_dist_1.0: 0.1396, pts_bbox_NuScenes/bus_AP_dist_2.0: 0.3895, pts_bbox_NuScenes/bus_AP_dist_4.0: 0.4736, pts_bbox_NuScenes/bus_trans_err: 0.7881, pts_bbox_NuScenes/bus_scale_err: 0.1895, pts_bbox_NuScenes/bus_orient_err: 0.1455, pts_bbox_NuScenes/bus_vel_err: 0.6699, pts_bbox_NuScenes/bus_attr_err: 0.1602
pts_bbox_NuScenes/construction_vehicle_AP_dist_0.5: 0.0000, pts_bbox_NuScenes/construction_vehicle_AP_dist_1.0: 0.0036, pts_bbox_NuScenes/construction_vehicle_AP_dist_2.0: 0.0457, pts_bbox_NuScenes/construction_vehicle_AP_dist_4.0: 0.0629, pts_bbox_NuScenes/construction_vehicle_trans_err: 0.9470, pts_bbox_NuScenes/construction_vehicle_scale_err: 0.5084, pts_bbox_NuScenes/construction_vehicle_orient_err: 1.3642, pts_bbox_NuScenes/construction_vehicle_vel_err: 0.1244, pts_bbox_NuScenes/construction_vehicle_attr_err: 0.4645
pts_bbox_NuScenes/bicycle_AP_dist_0.5: 0.0264, pts_bbox_NuScenes/bicycle_AP_dist_1.0: 0.0287, pts_bbox_NuScenes/bicycle_AP_dist_2.0: 0.0290, pts_bbox_NuScenes/bicycle_AP_dist_4.0: 0.0298, pts_bbox_NuScenes/bicycle_trans_err: 0.1875, pts_bbox_NuScenes/bicycle_scale_err: 0.2586, pts_bbox_NuScenes/bicycle_orient_err: 0.8511, pts_bbox_NuScenes/bicycle_vel_err: 0.3377, pts_bbox_NuScenes/bicycle_attr_err: 0.0047
pts_bbox_NuScenes/motorcycle_AP_dist_0.5: 0.1205, pts_bbox_NuScenes/motorcycle_AP_dist_1.0: 0.1384, pts_bbox_NuScenes/motorcycle_AP_dist_2.0: 0.1415, pts_bbox_NuScenes/motorcycle_AP_dist_4.0: 0.1458, pts_bbox_NuScenes/motorcycle_trans_err: 0.2381, pts_bbox_NuScenes/motorcycle_scale_err: 0.2787, pts_bbox_NuScenes/motorcycle_orient_err: 0.7527, pts_bbox_NuScenes/motorcycle_vel_err: 0.6352, pts_bbox_NuScenes/motorcycle_attr_err: 0.3060
pts_bbox_NuScenes/pedestrian_AP_dist_0.5: 0.5656, pts_bbox_NuScenes/pedestrian_AP_dist_1.0: 0.5758, pts_bbox_NuScenes/pedestrian_AP_dist_2.0: 0.5854, pts_bbox_NuScenes/pedestrian_AP_dist_4.0: 0.5960, pts_bbox_NuScenes/pedestrian_trans_err: 0.1403, pts_bbox_NuScenes/pedestrian_scale_err: 0.2611, pts_bbox_NuScenes/pedestrian_orient_err: 0.3074, pts_bbox_NuScenes/pedestrian_vel_err: 0.2499, pts_bbox_NuScenes/pedestrian_attr_err: 0.0425
pts_bbox_NuScenes/traffic_cone_AP_dist_0.5: 0.0727, pts_bbox_NuScenes/traffic_cone_AP_dist_1.0: 0.0775, pts_bbox_NuScenes/traffic_cone_AP_dist_2.0: 0.0849, pts_bbox_NuScenes/traffic_cone_AP_dist_4.0: 0.1073, pts_bbox_NuScenes/traffic_cone_trans_err: 0.1638, pts_bbox_NuScenes/traffic_cone_scale_err: 0.3195, pts_bbox_NuScenes/traffic_cone_orient_err: nan, pts_bbox_NuScenes/traffic_cone_vel_err: nan, pts_bbox_NuScenes/traffic_cone_attr_err: nan
pts_bbox_NuScenes/barrier_AP_dist_0.5: 0.0680, pts_bbox_NuScenes/barrier_AP_dist_1.0: 0.2386, pts_bbox_NuScenes/barrier_AP_dist_2.0: 0.3307, pts_bbox_NuScenes/barrier_AP_dist_4.0: 0.3615, pts_bbox_NuScenes/barrier_trans_err: 0.5643, pts_bbox_NuScenes/barrier_scale_err: 0.2763, pts_bbox_NuScenes/barrier_orient_err: 0.0374, pts_bbox_NuScenes/barrier_vel_err: nan, pts_bbox_NuScenes/barrier_attr_err: nan
pts_bbox_NuScenes/NDS: 0.4278, pts_bbox_NuScenes/mAP: 0.2253
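
To see which classes drag the score down, the log above can be condensed into per-class AP and the two summary numbers. The sketch below recomputes mAP as the mean of the per-class AP over the four distance thresholds, and NDS with the nuScenes detection score definition, NDS = (1/10) * [5 * mAP + sum over the five mean TP errors of (1 - min(1, error))]. All numbers are copied from the log; the variable names are chosen here for illustration.

```python
# AP at distance thresholds 0.5, 1.0, 2.0, 4.0 m, copied from the log.
ap = {
    "car": [0.4701, 0.6067, 0.6618, 0.6832],
    "truck": [0.0624, 0.2224, 0.3657, 0.3988],
    "trailer": [0.0000, 0.0000, 0.0073, 0.0857],
    "bus": [0.0105, 0.1396, 0.3895, 0.4736],
    "construction_vehicle": [0.0000, 0.0036, 0.0457, 0.0629],
    "bicycle": [0.0264, 0.0287, 0.0290, 0.0298],
    "motorcycle": [0.1205, 0.1384, 0.1415, 0.1458],
    "pedestrian": [0.5656, 0.5758, 0.5854, 0.5960],
    "traffic_cone": [0.0727, 0.0775, 0.0849, 0.1073],
    "barrier": [0.0680, 0.2386, 0.3307, 0.3615],
}

# Mean true-positive errors, copied from the log.
tp_errors = {
    "mATE": 0.4841,
    "mASE": 0.2709,
    "mAOE": 0.5280,
    "mAVE": 0.3700,
    "mAAE": 0.1962,
}

# Per-class AP: average over the four distance thresholds.
class_ap = {name: sum(vals) / len(vals) for name, vals in ap.items()}

# mAP: average of the per-class AP values.
m_ap = sum(class_ap.values()) / len(class_ap)

# Each TP error is clipped to 1 and converted to a score in [0, 1].
tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]

# NDS weights mAP by 5 and each of the five TP scores by 1.
nds = (5.0 * m_ap + sum(tp_scores)) / 10.0

for name, value in sorted(class_ap.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {value:.4f}")
print(f"mAP {m_ap:.4f}  NDS {nds:.4f}")
```

Recomputed from the rounded log values, this gives mAP of about 0.2253 and NDS of about 0.4277, matching the reported 0.2253 and 0.4278 up to rounding. The lowest per-class AP values are trailer, construction_vehicle, and bicycle (all below 0.03), while car and pedestrian are the only classes above 0.5.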

@Abyssaledge
Collaborator

Your config looks fine to me. I am sorry that I do not have enough information to explain the poor results. We will try to run SST on nuScenes, but I cannot give a precise schedule for now.
My suggestion is to debug each component (backbone, head, etc.) using a small subset of the data. For example, swap the anchor head for the center head to check whether the head module is correct.
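
One simple way to get a small debugging set is to keep every N-th sample of the training list. The helper below is a hypothetical, framework-independent sketch (the name `subsample_uniform` and the frame names are made up); how the reduced list is fed back into the SST/mmdet3d data pipeline depends on your setup and is not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_uniform(samples: Sequence[T], interval: int) -> List[T]:
    """Keep every `interval`-th sample, starting from the first one.

    Hypothetical helper for illustration only.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(samples[::interval])


# Placeholder sample identifiers standing in for real training frames.
frames = [f"frame_{i:05d}" for i in range(1000)]

# Keep 1 frame in 20 for quick component-level debugging runs.
debug_frames = subsample_uniform(frames, interval=20)
print(len(debug_frames), debug_frames[:3])
```

Uniform striding keeps the subset spread across all scenes rather than concentrating it in the first few, which makes a quick overfitting or head-swap check somewhat more representative.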

@Zoeeeing
Author

Zoeeeing commented Mar 8, 2022

OK. I will debug the components and check the results once you have run SST on nuScenes. Thanks for your work.

@Devoe-97

Hi, do you have more recent results on nuScenes? @Zoeeeing

@Zoeeeing
Author

@Devoe-97 Sorry, I have not been able to get better results.

@gopi-erabati

@Abyssaledge did you try running experiments on the nuScenes dataset?
nuScenes has 5 times fewer samples than Waymo. Could training from scratch on so little data explain such poor results on nuScenes? (Transformers are data-hungry!)
What do you think?

@Abyssaledge
Collaborator

@gopi231091 I have not run the experiments on nuScenes yet.
To my knowledge, SST is not that data-hungry. It outperforms the PointPillars baseline with 20% of the training data on Waymo.
However, its performance on nuScenes might be a little worse than the state of the art, because pillar-based models show inferior performance on nuScenes, as many researchers have observed.
