
Different results on Evaluation with the same pretrained model #25

Open
VictorZoo opened this issue Dec 7, 2021 · 7 comments
@VictorZoo

VictorZoo commented Dec 7, 2021

Dear Author,

Sorry to bother you.

I downloaded the dataset and checkpoints with `python -m pixloc.download --select [dataset name] checkpoints`, then evaluated with `python -m pixloc.run_[7Scenes|Cambridge|Aachen|CMU|RobotCar]` (without `--from pose`). However, the results differ from the paper. They are as follows:

7Scenes

| | Chess | Fire | Heads | Office | Pumpkin | Kitchen | Stairs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pixloc_author | 2.8/0.95 | 2.4/0.95 | 1.1/0.83 | 3.5/1.04 | 5.2/1.46 | 4.5/1.48 | 6.9/1.71 |
| Pixloc_reproduce | 2.7/0.95 | 2.3/0.95 | 1.1/0.82 | 3.6/1.04 | 5.1/1.43 | 4.5/1.48 | 7.0/1.61 |
| Paper | 2/0.8 | 2/0.73 | 1/0.82 | 3/0.82 | 4/1.21 | 3/1.2 | 5/1.3 |

Pixloc_author uses the original `checkpoint_best.tar`, and Pixloc_reproduce uses the model I reproduced (trained for 18 epochs myself).

Interestingly, the Pixloc_author and Pixloc_reproduce results differ from the paper but are similar to each other.

Cambridge

| | Court | King’s | Hospital | Shop | St. Mary’s |
| --- | --- | --- | --- | --- | --- |
| Pixloc_author | 41.2/0.17 | 16.7/0.26 | 50.8/0.83 | 5.5/0.21 | 14.3/0.35 |
| Pixloc_reproduce | 37.7/0.15 | 16.3/0.25 | 45.9/0.66 | 5.5/0.21 | 14.0/0.37 |
| Paper | 30/0.12 | 14/0.24 | 16/0.32 | 5/0.23 | 10.0/0.34 |

Also different from the original results.

Aachen

[screenshot: Aachen results]

Note that the results are similar to those in issue #24.

CMU

Original:

[screenshot: CMU results]

The results are also similar to those in issue #20.


My versions are torch==1.7.1 and numpy==1.21.2. It is worth noting that the model I trained myself and the author's model give similar results (see 7Scenes), but both differ from the results listed in the paper. I am eager to know what went wrong, and I would appreciate any help.

By the way, I will also test Cambridge, Aachen, and CMU with the reproduced model, to see whether it still matches the provided best model while differing from the paper's results.

@VictorZoo
Author

VictorZoo commented Dec 7, 2021

Some details are as follows:

7Scenes

with author's model

Evaluate scene chess: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_chess.txt
[12/06/2021 16:26:38 pixloc.utils.eval INFO]
Median errors: 0.028m, 0.956deg
Percentage of test images localized within:
	1cm, 1deg : 5.10%
	2cm, 2deg : 31.15%
	3cm, 3deg : 55.80%
	5cm, 5deg : 84.70%
	25cm, 2deg : 90.25%
	50cm, 5deg : 94.40%
	500cm, 10deg : 94.85%
[12/06/2021 16:26:38 pixloc INFO] Evaluate scene fire: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_fire.txt
[12/06/2021 16:26:41 pixloc.utils.eval INFO]
Median errors: 0.024m, 0.954deg
Percentage of test images localized within:
	1cm, 1deg : 9.10%
	2cm, 2deg : 41.35%
	3cm, 3deg : 63.15%
	5cm, 5deg : 78.10%
	25cm, 2deg : 79.15%
	50cm, 5deg : 87.85%
	500cm, 10deg : 92.85%
[12/06/2021 16:26:41 pixloc INFO] Evaluate scene heads: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_heads.txt
[12/06/2021 16:26:42 pixloc.utils.eval INFO]
Median errors: 0.011m, 0.836deg
Percentage of test images localized within:
	1cm, 1deg : 39.30%
	2cm, 2deg : 79.40%
	3cm, 3deg : 84.30%
	5cm, 5deg : 86.30%
	25cm, 2deg : 82.80%
	50cm, 5deg : 86.80%
	500cm, 10deg : 87.80%
[12/06/2021 16:26:42 pixloc INFO] Evaluate scene office: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_office.txt
[12/06/2021 16:26:44 pixloc.utils.eval INFO]
Median errors: 0.035m, 1.049deg
Percentage of test images localized within:
	1cm, 1deg : 3.25%
	2cm, 2deg : 21.45%
	3cm, 3deg : 39.92%
	5cm, 5deg : 67.80%
	25cm, 2deg : 81.20%
	50cm, 5deg : 95.50%
	500cm, 10deg : 97.10%
[12/06/2021 16:26:44 pixloc INFO] Evaluate scene pumpkin: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_pumpkin.txt
[12/06/2021 16:26:46 pixloc.utils.eval INFO]
Median errors: 0.052m, 1.463deg
Percentage of test images localized within:
	1cm, 1deg : 1.85%
	2cm, 2deg : 9.35%
	3cm, 3deg : 21.05%
	5cm, 5deg : 48.75%
	25cm, 2deg : 61.15%
	50cm, 5deg : 82.40%
	500cm, 10deg : 84.95%
[12/06/2021 16:26:46 pixloc INFO] Evaluate scene redkitchen: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_redkitchen.txt
[12/06/2021 16:26:51 pixloc.utils.eval INFO]
Median errors: 0.045m, 1.482deg
Percentage of test images localized within:
	1cm, 1deg : 1.62%
	2cm, 2deg : 13.24%
	3cm, 3deg : 30.08%
	5cm, 5deg : 55.10%
	25cm, 2deg : 65.08%
	50cm, 5deg : 85.20%
	500cm, 10deg : 88.78%
[12/06/2021 16:26:51 pixloc INFO] Evaluate scene stairs: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_7scenes_stairs.txt
[12/06/2021 16:26:52 pixloc.utils.eval INFO]
Median errors: 0.069m, 1.719deg
Percentage of test images localized within:
	1cm, 1deg : 0.70%
	2cm, 2deg : 8.20%
	3cm, 3deg : 18.90%
	5cm, 5deg : 36.60%
	25cm, 2deg : 51.00%
	50cm, 5deg : 74.00%
	500cm, 10deg : 86.00%

pixloc_7scenes_chess.txt
pixloc_7scenes_fire.txt
pixloc_7scenes_heads.txt
pixloc_7scenes_office.txt
pixloc_7scenes_pumpkin.txt
pixloc_7scenes_redkitchen.txt
pixloc_7scenes_stairs.txt
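For reference, the summary that `pixloc.utils.eval` prints above (median errors plus recall at distance/angle thresholds) can be sketched as follows. This is a simplified re-implementation for illustration, not pixloc's actual code; the function name `summarize` and the toy inputs are made up here:

```python
import numpy as np

# Given per-image position errors (metres) and rotation errors (degrees),
# report the median errors and the recall at (distance, angle) thresholds,
# i.e. the percentage of images within BOTH limits of each pair.
def summarize(t_err_m, r_err_deg):
    t = np.asarray(t_err_m)
    r = np.asarray(r_err_deg)
    medians = (np.median(t), np.median(r))
    thresholds = [(0.01, 1), (0.02, 2), (0.03, 3), (0.05, 5),
                  (0.25, 2), (0.50, 5), (5.00, 10)]
    recalls = {th: float(np.mean((t <= th[0]) & (r <= th[1])) * 100)
               for th in thresholds}
    return medians, recalls

# Toy example with three "images":
med, recalls = summarize([0.02, 0.04, 0.30], [0.5, 3.0, 1.5])
```

With these toy errors, the median is 0.04 m / 1.5 deg, and all three images fall within the 50 cm / 5 deg threshold.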

@VictorZoo
Author

Cambridge

Evaluate scene OldHospital: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_Cambridge_OldHospital.txt
[12/06/2021 21:11:18 pixloc.utils.eval INFO]
Median errors: 0.508m, 0.831deg
Percentage of test images localized within:
	1cm, 1deg : 0.00%
	2cm, 2deg : 0.00%
	3cm, 3deg : 0.00%
	5cm, 5deg : 0.55%
	25cm, 2deg : 31.32%
	50cm, 5deg : 49.45%
	500cm, 10deg : 94.51%

Evaluate scene StMarysChurch: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_Cambridge_StMarysChurch.txt
[12/06/2021 21:44:31 pixloc.utils.eval INFO]
Median errors: 0.143m, 0.354deg
Percentage of test images localized within:
	1cm, 1deg : 0.38%
	2cm, 2deg : 1.70%
	3cm, 3deg : 4.15%
	5cm, 5deg : 10.57%
	25cm, 2deg : 70.19%
	50cm, 5deg : 81.32%
	500cm, 10deg : 86.60%

Evaluate scene ShopFacade: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_Cambridge_ShopFacade.txt
[12/06/2021 21:46:27 pixloc.utils.io INFO] Imported 103 images from pixloc_Cambridge_ShopFacade.txt
[12/06/2021 21:46:27 pixloc.utils.eval INFO]
Median errors: 0.055m, 0.218deg
Percentage of test images localized within:
	1cm, 1deg : 0.00%
	2cm, 2deg : 5.83%
	3cm, 3deg : 17.48%
	5cm, 5deg : 44.66%
	25cm, 2deg : 89.32%
	50cm, 5deg : 94.17%
	500cm, 10deg : 96.12%

Evaluate scene KingsCollege: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_Cambridge_KingsCollege.txt
[12/06/2021 21:22:38 pixloc.utils.io INFO] Imported 343 images from pixloc_Cambridge_KingsCollege.txt
[12/06/2021 21:22:38 pixloc.utils.eval INFO]
Median errors: 0.167m, 0.263deg
Percentage of test images localized within:
	1cm, 1deg : 0.00%
	2cm, 2deg : 0.58%
	3cm, 3deg : 1.75%
	5cm, 5deg : 7.00%
	25cm, 2deg : 66.47%
	50cm, 5deg : 82.51%
	500cm, 10deg : 97.08%

Evaluate scene GreatCourt: /home/victor/disk1/Disk_E/pixloc/outputs/results/pixloc_Cambridge_GreatCourt.txt
[12/06/2021 22:16:25 pixloc.utils.eval INFO]
Median errors: 0.412m, 0.175deg
Percentage of test images localized within:
	1cm, 1deg : 0.00%
	2cm, 2deg : 0.00%
	3cm, 3deg : 0.92%
	5cm, 5deg : 2.50%
	25cm, 2deg : 36.05%
	50cm, 5deg : 54.08%
	500cm, 10deg : 74.21%


@VictorZoo
Author

There is another possibility: the `checkpoint_best.tar` downloaded by `python -m pixloc.download --select checkpoints` is not the best model, because the `.log` file (megadepth) shows only 4 epochs. Please check into it.
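One way to check this locally might be to inspect the checkpoint's saved metadata directly. This is a hypothetical sketch: the key name `'epoch'` is an assumption about how the archive is laid out, not a confirmed part of pixloc's checkpoint format:

```python
import torch

# Hypothetical check: load the downloaded checkpoint on CPU and read its
# metadata to see at which epoch it was saved. The 'epoch' key is an
# assumption about the archive layout; print the keys if it is missing.
def trained_epoch(path):
    ckpt = torch.load(path, map_location='cpu')
    return ckpt.get('epoch')
```

Calling `trained_epoch('checkpoint_best.tar')` would then report the last saved epoch, if that key exists.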

Just a recommendation: you could run `python -m pixloc.download --select checkpoints` yourself and evaluate the downloaded `checkpoint_best.tar` to see whether it matches the results in the paper.

Sorry to bother you. Thank you!

@sarlinpe
Member

sarlinpe commented Dec 9, 2021

7Scenes

It turns out that the default path pointed to the raw SuperPoint+SuperGlue SfM model, while the results reported in the paper are based on the point cloud cleaned up with dense depth maps. This has now been fixed by 0072dc7. Here are the results that you should get:

| | Chess | Fire | Heads | Office | Pumpkin | Kitchen | Stairs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pixloc_author | 2.4/0.81 | 1.9/0.79 | 1.3/0.86 | 2.6/0.79 | 4.1/1.17 | 3.4/1.21 | 4.6/1.22 |
| Paper | 2/0.8 | 2/0.73 | 1/0.82 | 3/0.82 | 4/1.21 | 3/1.2 | 5/1.3 |

Cambridge

Let me run these numbers again.

CMU

Let's track this issue in #20

Aachen

Let's keep this in #23. I feel that this is a similar setup issue as with the CMU dataset.

@sarlinpe
Member

sarlinpe commented Dec 9, 2021

These are the results that I obtain for Cambridge Landmarks:

| | Court | King’s | Hospital | Shop | St. Mary’s |
| --- | --- | --- | --- | --- | --- |
| Pixloc code | 28.6/0.15 | 13.0/0.22 | 18.4/0.35 | 4.4/0.22 | 8.7/0.27 |
| Paper | 30/0.12 | 14/0.24 | 16/0.32 | 5/0.23 | 10.0/0.34 |

Again there seems to be a discrepancy between your setup and mine. All experiments were conducted with an RTX 2080 Ti and torch==1.10.0+cu102. Output of pip freeze

@VictorZoo
Author

Hi @skydes, thanks for your solution. I changed `reference_sfm='{scene}/sfm_superpoint+superglue+depth/'` and got results similar to the paper. Details:

| | Chess | Fire | Heads | Office | Pumpkin | Kitchen | Stairs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pixloc_author | 2.4/0.81 | 1.9/0.78 | 1.3/0.85 | 2.7/0.81 | 4.1/1.16 | 3.4/1.21 | 4.8/1.30 |
| Pixloc_reproduce | 2.4/0.80 | 1.9/0.79 | 1.2/0.87 | 2.7/0.81 | 4.2/1.18 | 3.4/1.20 | 5.2/1.20 |
| Paper | 2/0.8 | 2/0.73 | 1/0.82 | 3/0.82 | 4/1.21 | 3/1.2 | 5/1.3 |
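For context, the fix amounts to a one-line change of the reference model path. This is only a sketch using the path strings quoted in this thread; where exactly this default lives in the repo's run scripts is not shown here:

```python
# Sketch of the corrected default (path strings from this thread):
default_paths = dict(
    # was: '{scene}/sfm_superpoint+superglue/'  -- the raw SuperPoint+SuperGlue model
    reference_sfm='{scene}/sfm_superpoint+superglue+depth/',  # depth-filtered model
)
```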

As for the Cambridge datasets, I notice that your last block says "Pixloc_release". Does that mean the release version rather than the master version of this repo?

@sarlinpe
Member

The evaluation is run with the master branch of the repository.
