improve normalisation logic? #2

Open
derhuerst opened this issue Oct 3, 2021 · 2 comments

@derhuerst

Thanks for this project, I think it is very underrated!

I came across a use case where I assume the normalisation doesn't work as intended:

  1. Angermünde, Rosenstr. (53.01735, 14.00092)
  2. Rosenstrasse, Angermünde (53.01691, 14.00058)

If I rename the second one to Rosenstraße, Angermünde, the two are successfully classified as similar.

@patrickbr
Member

patrickbr commented Oct 4, 2023

Could you share your exact setup and command so that I can reproduce this issue? IIRC, normalization is not enabled by default. It may well be that the model never had the opportunity to learn the equivalence of "str." and "strasse".
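
For illustration only, here is a minimal sketch of what rule-based normalisation of the three spellings from this issue could look like. It is not statsimi's actual normalisation code; the function names `normalise` and `token_set` and the single abbreviation rule are hypothetical, chosen just to show how "Rosenstr.", "Rosenstrasse" and "Rosenstraße" might be mapped to one canonical form before comparison.

```python
import re

# Hypothetical rule, not taken from statsimi: expand a trailing "str."
# abbreviation into the canonical suffix "strasse".
STR_ABBREVIATION = re.compile(r"str\.(?=\s|,|$)")


def normalise(name: str) -> str:
    """Return a lower-cased, punctuation-free form of a stop name."""
    # casefold() lower-cases and also folds "ß" to "ss".
    s = name.casefold()
    s = STR_ABBREVIATION.sub("strasse", s)
    # Replace remaining punctuation with spaces, then collapse whitespace.
    s = re.sub(r"[^\w\s]", " ", s)
    return " ".join(s.split())


def token_set(name: str) -> frozenset:
    """Order-insensitive view of a normalised name."""
    return frozenset(normalise(name).split())


names = [
    "Angermünde, Rosenstr.",
    "Rosenstrasse, Angermünde",
    "Rosenstraße, Angermünde",
]
for n in names:
    print(n, "->", sorted(token_set(n)))
```

With rules like these, all three names reduce to the same set of tokens, so a comparison no longer depends on the model having learned that "str." and "strasse" are equivalent. A real rule set would need many more abbreviations and would have to be checked against the data it is applied to.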

@derhuerst
Author

Unfortunately, I no longer remember how I had set things up.

I assume I was looking into how statsimi works with stops from the VBB GTFS dataset (mirror with old versions).
