
Train on own dataset #74

Open
javierpastorfernandez opened this issue Apr 25, 2024 · 3 comments

javierpastorfernandez commented Apr 25, 2024

Good morning,

I am trying to train the model on my own dataset. I have updated the config so that the sizes of the different elements of the network are consistent, but whenever I look at the validation results on my images during training, it is clear that something is going wrong. Furthermore, the training curves are very noisy.

I am working with images of 3848 (width) x 2168 (height), which are rescaled to 1282 (width) x 722 (height) before being fed to the network (roughly a 1:3 scale).

According to the config, the image is first downsampled from the original resolution by a factor of 3 -> 1282 (width) x 722 (height), then cropped using the crop box [0, 150, 1282, 722], which leaves an output image of 1282 (width) x 572 (height) (722 - 150 = 572). Finally, the image is resized to a final size of 576 x 352.

Some more data on the image resizing (a small sketch reproducing this arithmetic follows the list):

scaling_1: (3.001560062402496, 3.0027700831024933)
cut_height: 150
(x_min) percentage: 0.0
(x_max) percentage: 100.0
(y_min) percentage: 20.775623268698062
(y_max) percentage: 100.0
After cropping image size: (1282 x 572)
Final image size: (576 x 352)
scaling_2: (2.2256944444444446, 1.625)
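
For reference, this is a tiny Python sketch (plain arithmetic only, not project code) that reproduces the numbers above, assuming the crop box is given as [x_min, y_min, x_max, y_max]:

# Reproduce the preprocessing geometry listed above (assumes crop box = [x_min, y_min, x_max, y_max]).
dataset_w, dataset_h = 3848, 2168   # original dataset resolution
ori_w, ori_h = 1282, 722            # resolution after the first resize (~1/3 scale)
cut_height = 150                    # pixels removed from the top of the image
net_w, net_h = 576, 352             # final network input size

scaling_1 = (dataset_w / ori_w, dataset_h / ori_h)   # (3.0016, 3.0028)
crop_w, crop_h = ori_w - 0, ori_h - cut_height       # (1282, 572)
y_min_percentage = 100.0 * cut_height / ori_h        # 20.776 %
scaling_2 = (crop_w / net_w, crop_h / net_h)         # (2.2257, 1.625)

print(scaling_1, (crop_w, crop_h), y_min_percentage, scaling_2)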

This is my current config:

# ----------------------------------- General Parameters ---------------------------------------
# https://github.com/Turoad/lanedet/issues/49

test_json_file = 'data/tusimple/test_label.json'
val_json_file = 'seg_label/val_full_meta_filters_road_type-highway_meta_filters_lane_zod_as_tu_simple_with_masks.json'

dataset_img_w = 3848
dataset_img_h=2168

ori_img_w=1282
ori_img_h=722
cut_height = 150

img_width=576 # 641
img_height=352 # 361

crop_bbox = [0, cut_height, ori_img_w, ori_img_h] # cam_x_min, cam_y_min, cam_x_max, cam_y_max

# Training process for this dataset:
# 1. Crop (remove cut_height pixels from the top):
#    width stays 1282, height 722 - 150 = 572
# 2. Resize to the network input size, adjusted to multiples of 32:
#    width  1282 -> 576  (576 / 32 = 18)
#    height  572 -> 352  (352 / 32 = 11)

# Original (CULane) reference values, kept for comparison:
# 1. Crop: 590 - 270 = 320
# 2. Resize: img_height = 320, img_width = 800  # why is only the width effectively resized?

# sample_y=range(710, 150, -10)
sample_y = range(ori_img_h, cut_height, -8) 

batch_size = 8
num_lane_classes=1

# Mask size given a mask down scale of 4:
# - Width: 576/ 4 = 144
# - Height: 352/ 4 = 88
# mask_size = (1, height, width)

mask_size = (1, 88, 144)
mask_down_scale = 4
hm_down_scale = 16
num_lane_classes = 1
line_width = 3
radius = 6
nms_thr = 4
img_scale = (img_width, img_height)
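
# Sanity check (not part of the original config): keep mask_size consistent with
# img_scale and mask_down_scale whenever the input resolution changes.
assert mask_size[1] == img_height // mask_down_scale  # 352 // 4 = 88
assert mask_size[2] == img_width // mask_down_scale   # 576 // 4 = 144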

net = dict(
    type='Detector',
)

backbone = dict(
    type='ResNetWrapper',
    resnet='resnet101',
    pretrained=True,
    replace_stride_with_dilation=[False, False, False],  # do not replace stride with dilation in any of the three replaceable ResNet-101 stages
    out_conv=False,  # no additional convolutional layer is added after the backbone
    in_channels=[64, 128, 256, 512]  # number of input channels for each ResNet stage (is 64 correct for the first stage?)
)

# After the backbone, fea[-1] -> (fea[-1]) shape:  torch.Size([8, 2048, 12, 20])
aggregator = dict(
    type='TransConvEncoderModule',
    in_dim=2048,
    attn_in_dims=[2048, 256],  # input dimensions for the attention modules within the aggregator.
    attn_out_dims=[256, 256], # output dimensions for the attention modules 
    strides=[1, 1],
    ratios=[4, 4],
    # pos_shape=(batch_size, 10, 25), # defining the shape of the positional encoding used in the aggregator.
    pos_shape=(batch_size, 11, 18), # defining the shape of the positional encoding used in the aggregator.

)
# pos_shape should also be changed, it's (batch_size, img_height/32, img_width/32).
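
# Sanity check (not part of the original config): per the note above, pos_shape is
# assumed to follow the backbone's output stride of 32.
assert aggregator['pos_shape'][1] == img_height // 32  # 352 // 32 = 11
assert aggregator['pos_shape'][2] == img_width // 32   # 576 // 32 = 18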

neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 256],
    out_channels=64,
    num_outs=4,
    #trans_idx=-1,
)

loss_weights=dict(
        hm_weight=1,
        kps_weight=0.4,
        row_weight=1.,
        range_weight=1.,
    )

heads=dict(
    type='CondLaneHead',
    heads=dict(hm=num_lane_classes),
    in_channels=(64, ),
    num_classes=num_lane_classes,
    head_channels=64,
    head_layers=1,
    disable_coords=False,
    branch_in_channels=64,
    branch_channels=64,
    branch_out_channels=64,
    reg_branch_channels=64,
    branch_num_conv=1,
    hm_idx=2,
    mask_idx=0,
    compute_locations_pre=True,
    location_configs=dict(size=(batch_size, mask_size[0], mask_size[1], mask_size[2]), device='cuda:0')
)
optimizer = dict(type='AdamW', lr=3e-4, betas=(0.9, 0.999), eps=1e-8)

epochs = 16
total_iter = (88880 // batch_size) * epochs
import math
scheduler = dict(
    type = 'MultiStepLR',
    milestones=[8, 14],
    gamma=0.1
)

seg_loss_weight = 1.0
eval_ep = 1
save_ep = 1

img_norm = dict(
    mean=[75.3, 76.6, 77.6],
    std=[50.5, 53.8, 54.3]
)

# dataset_img -> Is the original resolution of the dataset
# ori -> Is the  resolution that the method takes as original

train_process = [
    dict(
        type='Alaug',
        transforms=[
            dict(type='Compose', params=dict(bboxes=False, keypoints=True, masks=False)),
            dict(type='Resize', height=ori_img_h, width=ori_img_w, p=1),  # Get the /3 which matches DDV resolution 
            dict(type='Crop', x_min=crop_bbox[0], x_max=crop_bbox[2], y_min=crop_bbox[1], y_max=crop_bbox[3], p=1),
            dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1),  # Probability of Applying the Transformation 
            dict(type='OneOf',
                 transforms=[
                     dict(type='RGBShift', r_shift_limit=10, g_shift_limit=10, b_shift_limit=10, p=1.0),
                     dict(type='HueSaturationValue',
                          hue_shift_limit=(-10, 10),
                          sat_shift_limit=(-15, 15),
                          val_shift_limit=(-10, 10),
                          p=1.0),
                 ],
                 p=0.7),
            dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
            dict(type='OneOf',
                 transforms=[dict(type='Blur', blur_limit=3, p=1.0),
                             dict(type='MedianBlur', blur_limit=3, p=1.0)],
                 p=0.2),
            dict(type='RandomBrightness', limit=0.2, p=0.6),
            dict(type='ShiftScaleRotate',
                 shift_limit=0.1,
                 scale_limit=(-0.2, 0.2),
                 rotate_limit=10,
                 border_mode=0,
                 p=0.6),
            dict(type='RandomResizedCrop',
                 height=img_scale[1],
                 width=img_scale[0],
                 scale=(0.8, 1.2),
                 ratio=(1.7, 2.7),
                 p=0.6),
            dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1),
        ]),
    dict(type='CollectLane',
         down_scale=mask_down_scale,
         hm_down_scale=hm_down_scale,
         max_mask_sample=5,
         line_width=line_width,
         radius=radius,
         keys=['img', 'gt_hm'],
         meta_keys=['gt_masks', 'mask_shape', 'hm_shape', 'down_scale', 'hm_down_scale', 'gt_points']),
    #dict(type='Resize', size=(img_width, img_height)),
    dict(type='Normalize', img_norm=img_norm),
    dict(type='ToTensor', keys=['img', 'gt_hm'], collect_keys=['img_metas']),
]


val_process = [
    dict(type='Alaug',
         transforms=[
             dict(type='Compose', params=dict(bboxes=False, keypoints=True, masks=False)),
             dict(type='Resize', height=ori_img_h, width=ori_img_w, p=1),  # Get the /3 which matches DDV resolution
             dict(type='Crop', x_min=crop_bbox[0], x_max=crop_bbox[2], y_min=crop_bbox[1], y_max=crop_bbox[3], p=1),
             dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1)
         ]),
    #dict(type='Resize', size=(img_width, img_height)),
    dict(type='Normalize', img_norm=img_norm),
    dict(type='ToTensor', keys=['img']),
]

dataset_path = '/home/autolabelling/datasets/zenseact'
dataset = dict(
    train=dict(
        type='ZodAsTuSimple',
        data_root=dataset_path,
        split='train',
        processes=train_process,
    ),
    val=dict(
        type='ZodAsTuSimple',
        data_root=dataset_path,
        split='val',
        processes=val_process,
    ),
    test=dict(
        type='ZodAsTuSimple',
        data_root=dataset_path,
        split='test',
        processes=val_process,
    )
)


workers = 12
# log_interval = 1000
log_interval = 1
lr_update_by_epoch = True


# Number of images in an epoch!
eval_ep = 1 
save_ep = 1
train_image_log_interval = save_ep * 9000

Regarding the validation results, I get predictions stuck in the top-left corner of the image all the time (see the attached example screenshot).

@SINGHxTUSHAR

Possible solution for Turoad/lanedet#49

Check 1:

Check that the model configuration is correct and consistent with the input data. The number of input channels, the dimensions of the attention modules, and the other size-dependent parameters should match the input resolution.

Check 2:

The resizing and cropping operations can distort both the image and the lane annotations, so make sure they are applied consistently to both (see the sketch after this list). Also experiment with different hyperparameters, especially the learning rate, batch size, and augmentation settings.
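
For check 2, here is a minimal sketch of how the geometric part of the pipeline could be inspected, using albumentations directly (this assumes the config's Alaug wrapper maps onto the standard albumentations transforms of the same names; the image path and the example keypoint below are placeholders):

import albumentations as A
import cv2

# Rebuild just the geometric part of the training pipeline (Resize -> Crop -> Resize)
# so its effect on the image and on a lane keypoint can be checked by eye.
pipeline = A.Compose(
    [
        A.Resize(height=722, width=1282, p=1),                   # ~1/3 downscale
        A.Crop(x_min=0, y_min=150, x_max=1282, y_max=722, p=1),  # crop_bbox
        A.Resize(height=352, width=576, p=1),                    # network input size
    ],
    keypoint_params=A.KeypointParams(format='xy', remove_invisible=False),
)

img = cv2.imread('sample.png')    # placeholder path; a 3848 x 2168 image is expected
kps = [(1900.0, 1500.0)]          # placeholder lane point in original pixel coordinates

out = pipeline(image=img, keypoints=kps)
print(out['image'].shape, out['keypoints'])
# Drawing the transformed keypoints on out['image'] should show them still lying on the
# lane; if they drift, the image and the labels are going through inconsistent geometry.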


@javierpastorfernandez (Author)

I already tried that and changed the default configuration to match my dataset resolution and size, but I still get these results during training. There might be some setting in the model that is incorrect, as the artifact is very distinctive.

@SINGHxTUSHAR

Yes, that could be the case.
