
Train on own dataset #74

Open
javierpastorfernandez opened this issue Apr 25, 2024 · 3 comments

javierpastorfernandez commented Apr 25, 2024

Good morning,

I am trying to train the model on my own dataset. I have updated the config so that the sizes of the different elements of the network are consistent, but whenever I look at the validation results on my images during training, it is clear that something is going wrong. Furthermore, the training curves are very noisy.

I am working with images of 3848 (width) x 2168 (height), which are rescaled to 1282 (width) x 722 (height) before being fed to the network (roughly a 1:3 scale).

According to the config, the image is first downsampled from the original resolution by a factor of 3 -> 1282 (width) x 722 (height), then cropped using the crop box [0, 150, 1282, 722], which leaves an output image of 1282 (width) x 572 (height) (722 - 150 = 572). Finally, the image is resized to a final size of 576 x 352.

Some more data on the image resizing (a small sketch reproducing this arithmetic follows the list):

scaling_1: (3.001560062402496, 3.0027700831024933)
cut_height: 150
(x_min) percentage: 0.0
(x_max) percentage: 100.0
(y_min) percentage: 20.775623268698062
(y_max) percentage: 100.0
After cropping image size: (1282 x 572)
Final image size: (576 x 352)
scaling_2: (2.2256944444444446, 1.625)
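
For reference, this is a tiny Python sketch (plain arithmetic only, not project code) that reproduces the numbers above, assuming the crop box is given as [x_min, y_min, x_max, y_max]:

# Reproduce the preprocessing geometry listed above (assumes crop box = [x_min, y_min, x_max, y_max]).
dataset_w, dataset_h = 3848, 2168   # original dataset resolution
ori_w, ori_h = 1282, 722            # resolution after the first resize (~1/3 scale)
cut_height = 150                    # pixels removed from the top of the image
net_w, net_h = 576, 352             # final network input size

scaling_1 = (dataset_w / ori_w, dataset_h / ori_h)   # (3.0016, 3.0028)
crop_w, crop_h = ori_w - 0, ori_h - cut_height       # (1282, 572)
y_min_percentage = 100.0 * cut_height / ori_h        # 20.776 %
scaling_2 = (crop_w / net_w, crop_h / net_h)         # (2.2257, 1.625)

print(scaling_1, (crop_w, crop_h), y_min_percentage, scaling_2)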

This is my current config:

# ----------------------------------- General Parameters ---------------------------------------
# https://github.com/Turoad/lanedet/issues/49

test_json_file = 'data/tusimple/test_label.json'
val_json_file = 'seg_label/val_full_meta_filters_road_type-highway_meta_filters_lane_zod_as_tu_simple_with_masks.json'

dataset_img_w = 3848
dataset_img_h=2168

ori_img_w=1282
ori_img_h=722
cut_height = 150

img_width=576 # 641
img_height=352 # 361

crop_bbox = [0, cut_height, ori_img_w, ori_img_h] # cam_x_min, cam_y_min, cam_x_max, cam_y_max

# Training process for this dataset:
# 1. Crop (remove cut_height pixels from the top):
#    width stays 1282, height 722 - 150 = 572
# 2. Resize to the network input size, adjusted to multiples of 32:
#    width  1282 -> 576  (576 / 32 = 18)
#    height  572 -> 352  (352 / 32 = 11)

# Original (CULane) reference values, kept for comparison:
# 1. Crop: 590 - 270 = 320
# 2. Resize: img_height = 320, img_width = 800  # why is only the width effectively resized?

# sample_y=range(710, 150, -10)
sample_y = range(ori_img_h, cut_height, -8) 

batch_size = 8
num_lane_classes=1

# Mask size given a mask down scale of 4:
# - Width: 576/ 4 = 144
# - Height: 352/ 4 = 88
# mask_size = (1, height, width)

mask_size = (1, 88, 144)
mask_down_scale = 4
hm_down_scale = 16
num_lane_classes = 1
line_width = 3
radius = 6
nms_thr = 4
img_scale = (img_width, img_height)
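
# Sanity check (not part of the original config): keep mask_size consistent with
# img_scale and mask_down_scale whenever the input resolution changes.
assert mask_size[1] == img_height // mask_down_scale  # 352 // 4 = 88
assert mask_size[2] == img_width // mask_down_scale   # 576 // 4 = 144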

net = dict(
    type='Detector',
)

backbone = dict(
    type='ResNetWrapper',
    resnet='resnet101',
    pretrained=True,
    replace_stride_with_dilation=[False, False, False],  # do not replace stride with dilation in any of the three replaceable ResNet-101 stages
    out_conv=False,  # no additional convolutional layer is added after the backbone
    in_channels=[64, 128, 256, 512]  # number of input channels for each ResNet stage (is 64 correct for the first stage?)
)

# After the backbone, fea[-1] -> (fea[-1]) shape:  torch.Size([8, 2048, 12, 20])
aggregator = dict(
    type='TransConvEncoderModule',
    in_dim=2048,
    attn_in_dims=[2048, 256],  # input dimensions for the attention modules within the aggregator.
    attn_out_dims=[256, 256], # output dimensions for the attention modules 
    strides=[1, 1],
    ratios=[4, 4],
    # pos_shape=(batch_size, 10, 25), # defining the shape of the positional encoding used in the aggregator.
    pos_shape=(batch_size, 11, 18), # defining the shape of the positional encoding used in the aggregator.

)
# pos_shape should also be changed, it's (batch_size, img_height/32, img_width/32).
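
# Sanity check (not part of the original config): per the note above, pos_shape is
# assumed to follow the backbone's output stride of 32.
assert aggregator['pos_shape'][1] == img_height // 32  # 352 // 32 = 11
assert aggregator['pos_shape'][2] == img_width // 32   # 576 // 32 = 18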

neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 256],
    out_channels=64,
    num_outs=4,
    #trans_idx=-1,
)

loss_weights=dict(
        hm_weight=1,
        kps_weight=0.4,
        row_weight=1.,
        range_weight=1.,
    )

heads=dict(
    type='CondLaneHead',
    heads=dict(hm=num_lane_classes),
    in_channels=(64, ),
    num_classes=num_lane_classes,
    head_channels=64,
    head_layers=1,
    disable_coords=False,
    branch_in_channels=64,
    branch_channels=64,
    branch_out_channels=64,
    reg_branch_channels=64,
    branch_num_conv=1,
    hm_idx=2,
    mask_idx=0,
    compute_locations_pre=True,
    location_configs=dict(size=(batch_size, mask_size[0], mask_size[1], mask_size[2]), device='cuda:0')
)
optimizer = dict(type='AdamW', lr=3e-4, betas=(0.9, 0.999), eps=1e-8)

epochs = 16
total_iter = (88880 // batch_size) * epochs
import math
scheduler = dict(
    type = 'MultiStepLR',
    milestones=[8, 14],
    gamma=0.1
)

seg_loss_weight = 1.0
eval_ep = 1
save_ep = 1

img_norm = dict(
    mean=[75.3, 76.6, 77.6],
    std=[50.5, 53.8, 54.3]
)

# dataset_img -> Is the original resolution of the dataset
# ori -> Is the  resolution that the method takes as original

train_process = [
    dict(
        type='Alaug',
        transforms=[
            dict(type='Compose', params=dict(bboxes=False, keypoints=True, masks=False)),
            dict(type='Resize', height=ori_img_h, width=ori_img_w, p=1),  # Get the /3 which matches DDV resolution 
            dict(type='Crop', x_min=crop_bbox[0], x_max=crop_bbox[2], y_min=crop_bbox[1], y_max=crop_bbox[3], p=1),
            dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1),  # Probability of Applying the Transformation 
            dict(type='OneOf',
                 transforms=[
                     dict(type='RGBShift', r_shift_limit=10, g_shift_limit=10, b_shift_limit=10, p=1.0),
                     dict(type='HueSaturationValue',
                          hue_shift_limit=(-10, 10),
                          sat_shift_limit=(-15, 15),
                          val_shift_limit=(-10, 10),
                          p=1.0),
                 ],
                 p=0.7),
            dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
            dict(type='OneOf',
                 transforms=[dict(type='Blur', blur_limit=3, p=1.0),
                             dict(type='MedianBlur', blur_limit=3, p=1.0)],
                 p=0.2),
            dict(type='RandomBrightness', limit=0.2, p=0.6),
            dict(type='ShiftScaleRotate',
                 shift_limit=0.1,
                 scale_limit=(-0.2, 0.2),
                 rotate_limit=10,
                 border_mode=0,
                 p=0.6),
            dict(type='RandomResizedCrop',
                 height=img_scale[1],
                 width=img_scale[0],
                 scale=(0.8, 1.2),
                 ratio=(1.7, 2.7),
                 p=0.6),
            dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1),
        ]),
    dict(type='CollectLane',
         down_scale=mask_down_scale,
         hm_down_scale=hm_down_scale,
         max_mask_sample=5,
         line_width=line_width,
         radius=radius,
         keys=['img', 'gt_hm'],
         meta_keys=['gt_masks', 'mask_shape', 'hm_shape', 'down_scale', 'hm_down_scale', 'gt_points']),
    #dict(type='Resize', size=(img_width, img_height)),
    dict(type='Normalize', img_norm=img_norm),
    dict(type='ToTensor', keys=['img', 'gt_hm'], collect_keys=['img_metas']),
]


val_process = [
    dict(type='Alaug',
         transforms=[
             dict(type='Compose', params=dict(bboxes=False, keypoints=True, masks=False)),
             dict(type='Resize', height=ori_img_h, width=ori_img_w, p=1),  # Get the /3 which matches DDV resolution
             dict(type='Crop', x_min=crop_bbox[0], x_max=crop_bbox[2], y_min=crop_bbox[1], y_max=crop_bbox[3], p=1),
             dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1)
         ]),
    #dict(type='Resize', size=(img_width, img_height)),
    dict(type='Normalize', img_norm=img_norm),
    dict(type='ToTensor', keys=['img']),
]

dataset_path = '/home/autolabelling/datasets/zenseact'
dataset = dict(
    train=dict(
        type='ZodAsTuSimple',
        data_root=dataset_path,
        split='train',
        processes=train_process,
    ),
    val=dict(
        type='ZodAsTuSimple',
        data_root=dataset_path,
        split='val',
        processes=val_process,
    ),
    test=dict(
        type='ZodAsTuSimple',
        data_root=dataset_path,
        split='test',
        processes=val_process,
    )
)


workers = 12
# log_interval = 1000
log_interval = 1
lr_update_by_epoch = True


# Number of images in an epoch!
eval_ep = 1 
save_ep = 1
train_image_log_interval = save_ep * 9000

Regarding the validation results, I get predictions stuck in the top-left corner of the image all the time (see the attached example screenshot).

@SINGHxTUSHAR

Possible solution for Turoad/lanedet#49

Check 1:

Check that the model configuration is correct and consistent with the input data. The number of input channels, the dimensions of the attention modules, and the other size-dependent parameters should match the input resolution.

Check 2:

The resizing and cropping operations can distort both the image and the lane annotations, so make sure they are applied consistently to both (see the sketch after this list). Also experiment with different hyperparameters, especially the learning rate, batch size, and augmentation settings.
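
For check 2, here is a minimal sketch of how the geometric part of the pipeline could be inspected, using albumentations directly (this assumes the config's Alaug wrapper maps onto the standard albumentations transforms of the same names; the image path and the example keypoint below are placeholders):

import albumentations as A
import cv2

# Rebuild just the geometric part of the training pipeline (Resize -> Crop -> Resize)
# so its effect on the image and on a lane keypoint can be checked by eye.
pipeline = A.Compose(
    [
        A.Resize(height=722, width=1282, p=1),                   # ~1/3 downscale
        A.Crop(x_min=0, y_min=150, x_max=1282, y_max=722, p=1),  # crop_bbox
        A.Resize(height=352, width=576, p=1),                    # network input size
    ],
    keypoint_params=A.KeypointParams(format='xy', remove_invisible=False),
)

img = cv2.imread('sample.png')    # placeholder path; a 3848 x 2168 image is expected
kps = [(1900.0, 1500.0)]          # placeholder lane point in original pixel coordinates

out = pipeline(image=img, keypoints=kps)
print(out['image'].shape, out['keypoints'])
# Drawing the transformed keypoints on out['image'] should show them still lying on the
# lane; if they drift, the image and the labels are going through inconsistent geometry.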


@javierpastorfernandez (Author)

I already tried that and changed the default configuration to match my dataset resolution and size, but I still get these results during training. There might be some setting in the model that is incorrect, as the artifact is very distinctive.

@SINGHxTUSHAR

Yes, that could be the case.
