
Abnormal Training Phenomena and Bad Performance #22

Open

sunpihai-up opened this issue Dec 7, 2023 · 6 comments

@sunpihai-up

Dear Luigi Piccinelli,
I hope this message finds you well. I want to express my sincere appreciation for your excellent article. Inspired by your work, I attempted to train your project on the KITTI dataset with the Eigen split.

However, during my training process, I encountered several abnormal phenomena that I would like to bring to your attention:

  1. The loss curve showed a consistent downward trend, but the evaluation metric curves plateaued very early in training.
  2. The evaluation metrics were poor, and some of them looked abnormal.

Here is a screenshot depicting the issue:
[screenshot: training loss and evaluation metric curves]

To accommodate my hardware (a single machine with four RTX 3090s and no SLURM), I changed the distributed training setup from SLURM-based launching to standard DDP (DistributedDataParallel).
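A minimal sketch of what such a SLURM-to-DDP change looks like (simplified, not my exact code; it assumes a torchrun-style launcher that sets the usual RANK/LOCAL_RANK/WORLD_SIZE environment variables):

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_ddp(model):
    # torchrun exports LOCAL_RANK for every spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    # env:// initialization; RANK/WORLD_SIZE/MASTER_ADDR come from the launcher.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # Wrap the model so gradients are synchronized across the four GPUs.
    return DDP(model, device_ids=[local_rank])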
Additionally, I made some modifications in the dataloader directory to align with the directory structure of my existing KITTI dataset. I believe these changes should not be the cause of the undesirable results, as the code correctly outputs messages such as "Loaded 23158 images. Totally 0 invalid pairs are filtered" and "Loaded 652 images. Totally 45 invalid pairs are filtered."
Furthermore, in order to track the training process using TensorBoard, I incorporated some code in the training section to generate and save log information.
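Such logging typically boils down to a standard SummaryWriter; a minimal sketch (the log directory and tag names here are illustrative, not the ones I actually used):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/kitti_eigen")  # example log directory

# Inside the training/validation loop (example tag names):
# writer.add_scalar("train/loss", loss.item(), global_step)
# writer.add_scalar("val/silog", silog_value, global_step)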
Apart from these adjustments, I have not made any additional modifications to the code. Specifically, the config file remains the same as the one you provided.

I would greatly appreciate your insights and guidance on these issues. If there are any specific details or additional information I can provide to assist in troubleshooting, please let me know. Thank you once again for your remarkable contribution to the field.

Best regards

@lpiccinelli-eth
Collaborator

Thank you for your appreciation.

In my experience, the training loss stays quite high, too. I would double-check whether the model is actually using the backbone pretrained on ImageNet, namely, does it print something like "Encoder is pretrained from..." at the beginning of training?
Another thing to check is a possible mismatch between the validation and training GT (for instance the depth_scale, which for KITTI is usually 256.0).
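As a reference point, KITTI ground-truth depth maps are stored as 16-bit PNGs, and decoding them amounts to dividing by 256.0; a minimal sketch of that convention (not the exact loader in this repository):

import numpy as np
from PIL import Image


def load_kitti_depth(path, depth_scale=256.0):
    # KITTI GT depth is stored as uint16 PNG; a value of 0 means "no measurement".
    depth_png = np.asarray(Image.open(path), dtype=np.float32)
    depth = depth_png / depth_scale  # depth in meters
    valid_mask = depth_png > 0
    return depth, valid_mask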

Any additional information may be helpful in understanding where the problem lies.
Best.

@sunpihai-up
Author


Thanks for your reply; I have investigated the code as per your suggestion.
Firstly, I verified that the program correctly loads the swin_large_22k model pretrained on ImageNet.

Since I already have the pretrained model locally, I modified the code that originally loaded the weights from a URL so that it can also load them from a local file path.

# Before Modification
if pretrained:
    print(f"\t-> Encoder is pretrained from: {pretrained}")
    pretrained_state = load_state_dict_from_url(pretrained, map_location="cpu")[
        "model"
    ]
    info = self.load_state_dict(deepcopy(pretrained_state), strict=False)
    print("Loading pretrained info:", info)

# After Modification
if pretrained:
    from urllib.parse import urlparse

    def is_url(path):
        # Check whether `pretrained` is a URL or a local file path.
        result = urlparse(path)
        return all([result.scheme, result.netloc])

    print(f"\t-> Encoder is pretrained from: {pretrained}")
    if is_url(pretrained):
        pretrained_state = load_state_dict_from_url(pretrained, map_location="cpu")[
            "model"
        ]
    else:
        pretrained_state = torch.load(pretrained, map_location="cpu")["model"]
    # The weights must be applied in both branches, otherwise the backbone
    # silently stays randomly initialized when loading from a local path.
    info = self.load_state_dict(deepcopy(pretrained_state), strict=False)
    print("Loading pretrained info:", info)
Therefore, when I run the training program, four messages (one from each process) are printed: Encoder is pretrained from: /home/sph/data/swin_transformer/swin_large_patch4_window7_224_22k.pth.

Secondly, I looked into the alignment issue you mentioned between the training set and the test set. I use the KITTI Eigen split for both the training set and the test set, and I could not find any factor in the program that could cause a mismatch between them. Loading the training set and the test set goes through the same code module (class KITTIDataset), and their depth_scale values are both set to 256.

Additionally, using test.py from the repository, I evaluated both the weights you provided and the weights I trained myself on the training set and on the test set.

Training set: 600 images randomly selected from the Eigen-split training set.

Test set: all 652 valid images from the Eigen-split test set.

| Metric | Your weights, training set | My weights, training set | Your weights, test set | My weights, test set |
| --- | --- | --- | --- | --- |
| Test/SILog | 0.38459012309710183 | 0.3662095022201537 | 0.7632721244561964 | 1.2326601183995969 |
| d05 | 0.9829 | 0.9858 | 0.8968 | 0.7809 |
| d1 | 0.997 | 0.9973 | 0.9771 | 0.9256 |
| d2 | 0.9995 | 0.9995 | 0.9973 | 0.9846 |
| d3 | 0.9999 | 0.9999 | 0.9993 | 0.9958 |
| rmse | 1.1298 | 0.9945 | 2.0665 | 3.177 |
| rmse_log | 0.0408 | 0.0371 | 0.0772 | 0.1249 |
| abs_rel | 0.0259 | 0.022 | 0.0504 | 0.081 |
| sq_rel | 0.0404 | 0.0308 | 0.1455 | 0.3827 |
| log10 | 0.0111 | 0.0095 | 0.0218 | 0.0351 |
| silog | 3.7325 | 3.6079 | 7.0735 | 11.2095 |

Both models perform similarly on the training set (my trained model even performs slightly better). However, there is a significant gap between the two models on the test set, which points to overfitting. Yet during training, the evaluation metrics never showed the typical pattern of improving at first and then deteriorating.

I look forward to hearing your further suggestions. Thank you once again for your reply. Best wishes to you!

@lpiccinelli-eth
Collaborator

You could try using the provided checkpoint and testing it on your data/code to see if the results match the ones reported.
If they match, the problem is in the training; if not, the problem is probably in the data.
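In plain PyTorch terms, the check amounts to loading the released weights into the model and running your evaluation loop over your dataloader, roughly as in this sketch (build_model, build_val_loader and compute_metrics are placeholders, not functions from this repository):

import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")  # placeholder path
model = build_model(config)  # placeholder: however you construct the network
model.load_state_dict(ckpt.get("model", ckpt), strict=True)
model.cuda().eval()

with torch.no_grad():
    for batch in build_val_loader(config):  # placeholder: your validation loader
        preds = model(batch["image"].cuda())
        compute_metrics(preds, batch["gt_depth"].cuda())  # placeholder metrics hook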
Best.

@sunpihai-up
Author


Yes, I did exactly that. The table I provided describes this work.
I wanted to evaluate the effectiveness of my training, so I tested it separately on the training set using both the checkpoint you provided and the one I trained.
I also wanted to check my data, so I tested it separately on the test set using both the checkpoint you provided and the one I trained.
However, the results were peculiar. The checkpoint you provided performed well on both the training and test sets. On the other hand, the checkpoint I trained outperformed yours on the training set but performed poorly on the test set.

So, the fact that your checkpoint performs well on both my training and test sets suggests that there is probably not a problem with my dataset.
The checkpoint I trained performs very well on the training set, which indicates that my training process does learn.
The strange thing is that my checkpoint outperforms yours on the training set, yet performs very poorly on the test set. This looks like overfitting, but the evaluation metric trends during training do not show the usual overfitting pattern.

@lpiccinelli-eth
Collaborator

Honestly, I do not know: you are not seeing any overfitting, but the model does not generalize either, since the training metrics are good while the validation ones are not. Moreover, KITTI validation and training are pretty similar, so I wonder why there is such a drop.
In addition, I was able to reproduce the results with SWin-Tiny: I checked validation after the first 1k steps and it matched my original training.

Either the training set is different from the one I used (I used the "new" Eigen split, namely the one after 2019), or the configs (i.e., augmentations, training schedule/lr, etc.) differ in something.

@sunpihai-up
Author


Thank you for your assistance! This situation is indeed perplexing. I believe we can rule out differences in the dataset and configuration: I used the kitti_eigen_test.txt and kitti_eigen_train.txt files provided in the repository, I verified that the file paths loaded by the dataset match the split exactly, and I have not modified the configuration file. The check I did is essentially the one sketched below.
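A minimal sketch of that consistency check (it assumes the first token on each split line is the image path; image_files is a hypothetical attribute standing in for whatever list the KITTIDataset instance actually stores):

# Compare the file list loaded by the dataset against the split file.
with open("kitti_eigen_train.txt") as f:
    split_files = {line.split()[0] for line in f if line.strip()}

loaded_files = set(dataset.image_files)  # hypothetical attribute name
print("missing from loader:", len(split_files - loaded_files))
print("extra in loader:", len(loaded_files - split_files))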

I would like to build further experiments on top of your work, so I will keep trying to debug this issue.

Once again, thank you for your help, and I wish you a pleasant day!
