Trained culane-small but could not get the performance described in the paper. #33

Open
KiveeDong opened this issue Nov 29, 2021 · 26 comments

Comments

@KiveeDong

KiveeDong commented Nov 29, 2021

I trained the culane-small model on the CULane train split (88.9K) for 16 epochs, but the F1-score on the CULane test split (34.7K) is only 77.55.
The F1-score reported in the paper is 78.14, so the gap is 0.59.
I think this gap is not small, as the F1-score gap between the small model and the medium model is only 0.6.
So, could you kindly share some details about how the model reported in the paper was trained?
By the way, I used a batch size of 32 and did not change any hyperparameters in the code.

Thank you very much~

@wangnian97

Hi, could I ask how you ran the training? I tried this on a single GPU, but it fails with an error:
CUDA_VISIBLE_DEVICES=0 PORT=29001 tools/dist_train.sh configs/condlanenet/culane/culane_small_train.py 1 --no-validate

@KiveeDong
Author

Hi, could I ask how you ran the training? I tried this on a single GPU, but it fails with an error:
CUDA_VISIBLE_DEVICES=0 PORT=29001 tools/dist_train.sh configs/condlanenet/culane/culane_small_train.py 1 --no-validate

Just train with tools/train.py directly.

@wangnian97

Could you share the exact command you used when training on the server?

@shuizaola

Me too. How do you train it?

@CongerW

CongerW commented Jan 5, 2022

Hi, how did you obtain this quantitative F1 result? I would appreciate some guidance.

@jyang68sh

Doesn't the repo come with an evaluate option?

@KiveeDong
Author

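Hi, how did you obtain this quantitative F1 result? I would appreciate some guidance.

@CongerW It saves the test results, and you can then compute the metrics with the CULane evaluate tool from the SCNN source code.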

@jyang68sh

@KiveeDong The test script has an evaluate option.

@KiveeDong
Author

@KiveeDong The test script has an evaluate option.

@jyang68sh As far as I can see, only the CurveLanes test script takes an evaluate argument; the TuSimple and CULane ones do not.

@jyang68sh

@KiveeDong That is true, but the repo defines a LaneMetricCore class, so you can just call that.

@CongerW

CongerW commented Jan 6, 2022

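@KiveeDong That is true, but the repo defines a LaneMetricCore class, so you can just call that.

Do you still have your code for the CULane quantitative evaluation? Could you share it? I spent a long time trying to set up the SCNN test environment and never got it working.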

@jyang68sh

You can take a look at LaneATT.
It has Python evaluation code, but it is very slow...

@balajiiitg

I am unable to train CULane small. The error comes from condlanenet.py: 'DataContainer' object has no attribute 'type'

@shuizaola

You can take a look at LaneATT. It has Python evaluation code, but it is very slow...

Could you share the CULane evaluate code, or your GitHub? Thanks!

@balajiiitg

balajiiitg commented Feb 16, 2022 via email

@jyang68sh

LaneATT computes the CULane metrics from the inferred lane results saved as txt files.
@balajiiitg
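
For readers following the evaluation discussion above, the F1-score is the harmonic mean of precision and recall. The sketch below only shows that final arithmetic, assuming the true-positive, false-positive, and false-negative lane counts have already been produced by an evaluation tool. The function name f1_from_counts and the example counts are made up for illustration; they are not taken from CondLaneNet, SCNN, or LaneATT.

```python
def f1_from_counts(tp, fp, fn):
    """Return the F1-score for the given lane counts (hypothetical helper)."""
    # Precision: fraction of predicted lanes that matched a ground-truth lane.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth lanes that were matched by a prediction.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall)


# Illustrative counts only; real counts come from the evaluation tool.
example_f1 = f1_from_counts(tp=80, fp=20, fn=20)
print(f"F1 = {100.0 * example_f1:.2f}")
```

How a predicted lane is matched to a ground-truth lane is decided by the evaluation tool, so scores from different tools are only comparable if they use the same matching rule.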

@balajiiitg

Thanks. When I run
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29001 tools/dist_train.sh configs/condlanenet/culane/culane_small_train.py 4
I get the error "dictionary object is not subscriptable" in this function:
def forward_train(self, img, img_metas, **kwargs):
    gt_batch_masks = [m['gt_masks'] for m in img_metas]

@jyang68sh

This error is straightforward.
Just debug img_metas.
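
One way to act on this advice is to print the nesting of img_metas before indexing into it. The helper below is a minimal sketch, not code from the repository: describe is a hypothetical name, and the small DataContainer class is only a stand-in that assumes the real wrapper keeps its payload in a .data attribute.

```python
class DataContainer:
    """Hypothetical stand-in for a wrapper that keeps its payload in `.data`."""

    def __init__(self, data):
        self.data = data


def describe(obj, depth=0, max_depth=6, max_items=3):
    """Return indented lines describing the nested types inside obj."""
    pad = "  " * depth
    if depth > max_depth:
        return [f"{pad}..."]
    if isinstance(obj, dict):
        keys = list(obj)[:max_items]
        lines = [f"{pad}dict keys={keys}"]
        for key in keys:
            lines.extend(describe(obj[key], depth + 1, max_depth, max_items))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} len={len(obj)}"]
        for item in obj[:max_items]:
            lines.extend(describe(item, depth + 1, max_depth, max_items))
        return lines
    if type(obj).__name__ == "DataContainer":
        # Assumption: the wrapper exposes its payload as `.data`.
        return [f"{pad}DataContainer"] + describe(
            obj.data, depth + 1, max_depth, max_items
        )
    return [f"{pad}{type(obj).__name__}"]


# Made-up structure for illustration; replace with the real img_metas.
img_metas = DataContainer([[{"gt_masks": [0, 1]}]])
report = describe(img_metas)
print("\n".join(report))
```

If the report shows a wrapper at the top level instead of a list of dicts, the loop in forward_train is iterating over something other than the per-image metadata.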

@balajiiitg

Yes, I debugged that; the value is at image_metas.data.data[0][0].
For output = self.backbone(img.type(torch.cuda.FloatTensor)), why does img have no attribute 'type'?

@jyang68sh

jyang68sh commented Feb 16, 2022

Is img empty?

What is the value of img?

@balajiiitg

balajiiitg commented Feb 16, 2022 via email

@jyang68sh

If img holds a tensor value, then it is not the cause of the error.

I get the error "dictionary object is not subscriptable"

What is m in "for m in img_metas"?

@balajiiitg

balajiiitg commented Feb 16, 2022 via email

@balajiiitg

balajiiitg commented Feb 16, 2022 via email

@lovelydjj

lovelydjj commented Jan 11, 2023

python tools/train.py configs/condlanenet/curvelanes/curvelanes_large_train.py
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 256, 10, 25]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

@haerrel

haerrel commented Mar 1, 2023

I am unable to train CULane small. The error comes from condlanenet.py: 'DataContainer' object has no attribute 'type'

I only get this error with dist_train.sh. I do not get it when running python tools/train.py configs/condlanenet/culane/culane_small_train.py. But with this command I get another error, the same one as @lovelydjj:

python tools/train.py configs/condlanenet/curvelanes/curvelanes_large_train.py
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 256, 10, 25]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I was able to fix it with #30.
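
The change made in #30 is not shown in this thread, so the following is only a generic sketch of how this class of error arises, assuming PyTorch is installed. ReLU keeps its output for the backward pass, so modifying that output in place invalidates the saved tensor. The two function names are hypothetical and are not taken from the CondLaneNet code.

```python
import torch


def relu_then_inplace_add(x):
    """Modify the ReLU output in place; backward() is expected to fail."""
    y = torch.relu(x)
    y += 1.0  # in-place update of a tensor that ReLU saved for backward
    return y.sum()


def relu_then_out_of_place_add(x):
    """Create a new tensor instead, leaving the saved ReLU output untouched."""
    y = torch.relu(x)
    y = y + 1.0
    return y.sum()


x = torch.randn(1, 256, 10, 25, requires_grad=True)

try:
    relu_then_inplace_add(x).backward()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = True
    print("in-place version failed:", str(err)[:80])

# The out-of-place version backpropagates normally.
relu_then_out_of_place_add(x).backward()
print("gradient shape:", tuple(x.grad.shape))
```

As the error hint says, torch.autograd.set_detect_anomaly(True) can help locate the forward operation whose backward pass fails.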
