The pretrain model on Gen1 dataset #13

Open
Orekishiro opened this issue Mar 20, 2024 · 30 comments

@Orekishiro

In your paper, you use the EMS-Res10 model and achieve 0.267 mAP on the Gen1 dataset, but when I trained on Gen1 with the framework you provided, I could not get good results.
I don't know whether something went wrong in my training stage, so could you provide the trained model for the Gen1 dataset?

@Orekishiro (Author)

(results plot attached)

@108360215

Excuse me, did you encounter the following issue when training the Gen1 data?
File "D:\ems\EMS_Origin\EMS-YOLO\g1\models\yolo.py", line 128, in _forward_once
x = m(x) # run
File "C:\Users\user\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\ems\EMS_Origin\EMS-YOLO\g1\models\common.py", line 162, in forward
return self.bn(self.conv(x))
File "C:\Users\user\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\ems\EMS_Origin\EMS-YOLO\g1\models\common.py", line 190, in forward
c1[i] = F.conv2d(input[i], weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [3, 32, 2, 2], expected input[1, 3, 256, 256] to have 32 channels, but got 3 channels instead
How did you solve this problem? Thanks! I would really appreciate any help.

@Orekishiro (Author)

> Excuse me, did you encounter the following issue when training the Gen1 data? [...] RuntimeError: Given groups=1, weight of size [3, 32, 2, 2], expected input[1, 3, 256, 256] to have 32 channels, but got 3 channels instead [...]

I didn't encounter this issue. It looks like a channel-setting problem; I suggest checking the channel configuration in the *.yaml file.
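
A minimal debugging sketch (not from this thread) of one way to locate such a channel mismatch: attach forward pre-hooks that print every Conv2d's input and weight shapes, then compare them against the channel counts declared in the *.yaml. The model constructor and config path at the bottom are hypothetical placeholders.

```python
# Debugging sketch: print each Conv2d's input/weight shapes to find where the
# channels declared in the *.yaml stop matching the actual tensor channels.
import torch
import torch.nn as nn

def report_conv_shapes(model: nn.Module):
    def make_hook(name):
        def hook(module, inputs):
            shapes = [tuple(t.shape) for t in inputs if isinstance(t, torch.Tensor)]
            print(f"{name}: input={shapes}, weight={tuple(module.weight.shape)}")
        return hook
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            module.register_forward_pre_hook(make_hook(name))

# Hypothetical usage: build the model from its yaml, hook it, and feed a dummy input
# of the same shape as in the traceback to see which layer gets 3 channels instead of 32.
# model = Model('models/ems_res10.yaml')   # placeholder constructor/path
# report_conv_shapes(model)
# model(torch.zeros(1, 3, 256, 256))
```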

@108360215

I modified yolo.py and common.py based on the YOLOv3 source code; did you change these as well?
I have a favor to ask: could you please share the code you use for training on the Gen1 data? I've been stuck on this architecture for a while and would greatly appreciate the help!
My Google Drive / Gmail address is "j88239806@gmail.com".

@zhuang5252

> In your paper, you use the EMS-Res10 model and achieve 0.267 mAP on the Gen1 dataset [...] could you provide the trained model for the Gen1 dataset?

Hello, may I ask how you downloaded the dataset? Could you please let me know?

@Orekishiro (Author)

> Hello, may I ask how you downloaded the dataset? Could you please let me know?

You can download the dataset at https://www.prophesee.ai/2020/01/24/prophesee-gen1-automotive-detection-dataset/

@108360215

@Orekishiro I can train now! But when I run val, my P, R, and mAP are all zero. Did you change val.py or anything else? Or have you already tested on your test data?

@Orekishiro (Author)

> @Orekishiro I can train now! But when I run val, my P, R, and mAP are all zero. Did you change val.py or anything else? Or have you already tested on your test data?

I did not run val.py directly; the plot shown earlier in this issue is generated automatically by the framework after training completes, and I did not modify that part. To run train_g1.py I temporarily commented out some unused code in val.py, such as the DetectMultiBackend and plotting code, but this does not affect the inference results. If your P, R, and mAP are all zero, my guess is that the target label format is being read incorrectly; also check whether the loss curve is decreasing normally, since the model may simply not have learned well.
(screenshot attached)
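
A small sanity-check sketch (not the thread author's code) for the label-format guess above, assuming the cached labels follow the usual YOLO text convention: one .txt file per sample, each line "class xc yc w h" with box center and sizes normalized to [0, 1]. The directory path is a placeholder.

```python
# Sketch: flag label lines that are not "class xc yc w h" with coordinates in [0, 1].
from pathlib import Path

def check_labels(label_dir: str) -> None:
    bad = 0
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for line in txt.read_text().splitlines():
            parts = line.split()
            if len(parts) != 5:
                bad += 1
                print(f"{txt.name}: wrong field count -> {line!r}")
                continue
            _cls, xc, yc, w, h = map(float, parts)
            if not all(0.0 <= v <= 1.0 for v in (xc, yc, w, h)):
                bad += 1
                print(f"{txt.name}: box not normalized to [0, 1] -> {line!r}")
    print("problems found:", bad)

# check_labels("gen1_cache/labels/val")   # placeholder path
```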

@108360215

Thanks for your reply! I'll go try the val part and my label format again. Would you mind adding me on Discord? My username is diopang. I'm a master's student currently researching object detection with SNNs on event cameras.

@jsckdon commented Mar 29, 2024

> Thanks for your reply! I'll go try the val part and my label format again. [...]

May I ask how the event data should be placed in the folders? This question has confused me for some time.

@jsckdon commented Mar 29, 2024

> (results plot above)

I see that in your results the mAP starts to decline after a period of training. Is this a problem with this code, or something else?

@Orekishiro (Author)

> I see that in your results the mAP starts to decline after a period of training. Is this a problem with this code, or something else?

I think that run overfitted, probably because of my hyperparameter settings. Later I switched to Res34 and got roughly 0.3+ mAP.

@jsckdon commented Mar 30, 2024

> I think that run overfitted, probably because of my hyperparameter settings. Later I switched to Res34 and got roughly 0.3+ mAP.

Oh, I see. May I ask how you arranged the event dataset? I recently started running the event-dataset code and I keep getting the dataset layout wrong.

@Orekishiro (Author)

> Oh, I see. May I ask how you arranged the event dataset? I recently started running the event-dataset code and I keep getting the dataset layout wrong.

The logic of the EMS-YOLO framework, as I understand it, is to first cache the event representations and labels with give_g1_data.py and then load them with datasets_g1T.py.
I modified the dataset-loading part: I put the data into separate train/val/test folders, pick which split to load based on the mode argument, and rewrote the loading logic.
EMS-YOLO's data loading appears to be based on datasets/gen1_od_dataset.py from https://github.com/loiccordone/object-detection-with-spiking-neural-networks/ ; the difference is that the YOLOv3 framework EMS uses needs to pre-load all labels in order to compute adaptive anchors.
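
A rough sketch (not the commenter's original code) of the reorganization described above: cached event representations and labels kept under per-split folders and selected by a mode argument. The file layout and array shapes are assumptions.

```python
# Sketch: split-aware dataset over cached .npy event representations.
# Assumed layout: <root>/<mode>/events/<name>.npy and <root>/<mode>/labels/<name>.txt
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset

class Gen1CachedDataset(Dataset):
    def __init__(self, root: str, mode: str = "train"):
        assert mode in ("train", "val", "test")
        self.event_files = sorted(Path(root, mode, "events").glob("*.npy"))
        self.label_dir = Path(root, mode, "labels")

    def __len__(self):
        return len(self.event_files)

    def __getitem__(self, idx):
        ev_path = self.event_files[idx]
        events = np.load(ev_path)                        # assumed (T, C, H, W) representation
        label_path = self.label_dir / (ev_path.stem + ".txt")
        labels = np.loadtxt(label_path, ndmin=2) if label_path.exists() else np.zeros((0, 5))
        return torch.from_numpy(events).float(), torch.from_numpy(labels).float()
```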

@jsckdon commented Mar 30, 2024

> The logic of the EMS-YOLO framework, as I understand it, is to first cache the event representations and labels with give_g1_data.py and then load them with datasets_g1T.py. [...]

Thank you very much for your reply!

@108360215

@Orekishiro So you first generate the .npy files with give_g1_data.py and then use datasets_g1T for create_dataloader, is that right?

@Orekishiro (Author) commented Mar 30, 2024

> @Orekishiro So you first generate the .npy files with give_g1_data.py and then use datasets_g1T for create_dataloader, is that right?

No, I didn't use the original reading code. I first find the label timestamps, slice out the events in that window, store them as numpy, and generate the event representation at the same time; then I wrote a new Dataset that reads the numpy files. The logic is actually the same as the original EMS.
Of course, you can also refer to https://github.com/uzh-rpg/RVT ; RVT provides h5 files for the Gen1 and 1Mpx datasets (including the raw events and a 20-channel event representation). Logically it is no different from reading the .dat files, but the h5 files take up less disk space.
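
A sketch (not the commenter's actual code) of the slicing step described above: for each label timestamp, keep the events in a fixed window ending at that timestamp and accumulate them into a simple per-polarity histogram. The window length, the field names (t, x, y, p) of the decoded event array, and the Gen1 resolution of 304x240 are all assumptions.

```python
# Sketch: cut the events preceding a label timestamp and build a 2-channel histogram.
import numpy as np

def events_to_histogram(events, t_label_us, window_us=50_000, height=240, width=304):
    # keep only events in (t_label - window, t_label]
    mask = (events["t"] > t_label_us - window_us) & (events["t"] <= t_label_us)
    ev = events[mask]
    # accumulate per-polarity counts at each pixel
    hist = np.zeros((2, height, width), dtype=np.float32)
    np.add.at(hist, (ev["p"].astype(int), ev["y"].astype(int), ev["x"].astype(int)), 1.0)
    return hist

# One could np.save() the returned arrays per sample and have the Dataset read them back.
```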

@108360215

@Orekishiro Have you looked at the corresponding image and label boxes together? When I visualize them, the boxes are all shifted.
(event_label visualization attached)

@108360215

But if I shift the label boxes toward the top-left corner, then the boxes during training also shift toward the top-left and nothing gets detected.

@Orekishiro (Author)

> But if I shift the label boxes toward the top-left corner, then the boxes during training also shift toward the top-left and nothing gets detected.

I have visualized them too, and the boxes line up. It may be a box-format problem: check whether your boxes use the top-left corner, (x1, y1, w, h), or the box center, (xc, yc, w, h). In your result the top-left corner seems to sit exactly at the object's center, so the problem is probably there.
(visualization screenshot attached)
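
A tiny sketch of the check suggested above: converting between top-left (x1, y1, w, h) boxes and center-based (xc, yc, w, h) boxes. Which convention your cached labels actually use is exactly what the comment above suggests verifying; whether the values also need normalizing by the image size depends on your loader.

```python
# Sketch: convert between top-left and center box conventions.
def tlwh_to_cxcywh(x1, y1, w, h):
    """Top-left corner box -> center box (YOLO-style labels)."""
    return x1 + w / 2.0, y1 + h / 2.0, w, h

def cxcywh_to_tlwh(xc, yc, w, h):
    """Center box -> top-left corner box."""
    return xc - w / 2.0, yc - h / 2.0, w, h
```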

@jsckdon commented Apr 2, 2024

@Orekishiro Have you ever run into this tensor problem?
File "E:\EMS-YOLO-main\models\yolo.py", line 137, in forward
input[i] = x
RuntimeError: expand(torch.cuda.FloatTensor{[2, 5, 3, 320, 320]}, size=[2, 5, 3, 320]): the number of sizes provided (4) must be greater or equal to the number of dimensions in the tensor (5)
I printed the shapes of x and input. At first they are normal, something like torch.Size([1, 3, 256, 256]) and torch.Size([3, 1, 3, 256, 256]), but once training starts the error is raised, and at that point the tensors become torch.Size([2, 5, 3, 320, 320]) and torch.Size([3, 2, 5, 3, 320]).

@108360215

@Orekishiro Thanks for your reply! May I ask whether you modified val.py? I found that quite a few parts seem to be missing.

@Orekishiro (Author)

> @Orekishiro Have you ever run into this tensor problem? [...] RuntimeError: expand(torch.cuda.FloatTensor{[2, 5, 3, 320, 320]}, size=[2, 5, 3, 320]): the number of sizes provided (4) must be greater or equal to the number of dimensions in the tensor (5) [...]

Check whether the time_window parameter is set to the same value in yolo.py and common.py. In yolo.py's forward function the input is replicated along the time dimension according to time_window, so the input shape becomes (time_window, batch_size, C, H, W), which then matches the mem_update function in common.py.
In practice, though, datasets_g1T.py returns a shape like the one in the screenshot below, so I'm not sure how they ultimately trained on the Gen1 dataset; the yolo.py they uploaded may be the version for the COCO dataset.
(screenshot of the shape returned by datasets_g1T.py attached)
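
A short sketch (not taken from the repository) of the shape logic described above: a (batch, C, H, W) input replicated along a leading time dimension into the (time_window, batch, C, H, W) layout that the spiking mem_update loop consumes. The replication call itself is an assumption about how yolo.py does it.

```python
# Sketch: replicate a frame-like batch over time_window steps -> (T, B, C, H, W).
import torch

time_window = 5                      # must match the value used in common.py
x = torch.zeros(2, 3, 320, 320)      # (B, C, H, W) batch from the dataloader
x_seq = x.unsqueeze(0).repeat(time_window, 1, 1, 1, 1)
print(x_seq.shape)                   # torch.Size([5, 2, 3, 320, 320])
```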

@Orekishiro (Author)

> @Orekishiro Thanks for your reply! May I ask whether you modified val.py? I found that quite a few parts seem to be missing.

As far as I remember, I just commented out the import of DetectMultiBackend; I don't think I changed much else, but I don't remember the details exactly.

@jsckdon commented Apr 3, 2024

> I just commented out the import of DetectMultiBackend [...]

Thank you very much for your reply; I'll keep trying.

@jsckdon commented Apr 3, 2024

@Orekishiro May I ask which initial weight file you chose?

@Orekishiro (Author)

> @Orekishiro May I ask which initial weight file you chose?

I ran it with Res10 without any pretrained weights, and the later Res34 run also had none; loading COCO weights might improve the performance a bit.

@jsckdon commented Apr 3, 2024

> I ran it with Res10 without any pretrained weights, and the later Res34 run also had none; loading COCO weights might improve the performance a bit.

Got it, thanks for your reply!

@108360215

> As far as I remember, I just commented out the import of DetectMultiBackend; I don't think I changed much else. [...]

OK, understood! But I tried it, and both my label format and training format come from the create_dataloader generated by give_g1_data.py, so they should match, yet for some reason mAP, P, and R are all 0. Posts online say this can be a CUDA/PyTorch version issue, so may I ask which versions of those two you use? Thanks.

@108360215

@Orekishiro Regarding def non_max_suppression, have you noticed that the output it returns is identical to the labels?
