improve normalisation logic? #2

derhuerst · 2021-10-03T14:01:44Z

Thanks for this project, I think it is very underrated!

I came across a use case where I assume the normalisation doesn't work as intended:

Angermünde, Rosenstr. 53.01735, 14.00092
Rosenstrasse, Angermünde 53.01691, 14.00058

If I rename the second one to Rosenstraße, Angermünde, the two are successfully classified as similar.
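To make the expectation concrete, here is a minimal sketch of the kind of normalisation I had in mind. It is not statsimi's actual code: the function name `normalize_stop_name` and the rewrite rules are hypothetical, and it only covers the "str." / "strasse" / "straße" variants from the example above.

```python
import re


def normalize_stop_name(name: str) -> str:
    """Hypothetical normalisation sketch, not statsimi's implementation."""
    # casefold() lowercases and also maps "ß" to "ss",
    # so "Rosenstraße" and "Rosenstrasse" become identical.
    s = name.casefold()
    # Expand the abbreviation "str." to "strasse" (assumed rule).
    s = re.sub(r"str\.", "strasse", s)
    # Split into word tokens and sort them, so that
    # "Angermünde, Rosenstr." and "Rosenstrasse, Angermünde" line up.
    tokens = sorted(re.findall(r"\w+", s))
    return " ".join(tokens)


names = [
    "Angermünde, Rosenstr.",
    "Rosenstrasse, Angermünde",
    "Rosenstraße, Angermünde",
]
normalized = {normalize_stop_name(n) for n in names}
print(normalized)
```

With rules like these, all three spellings collapse to the same string before any similarity model sees them. Sorting tokens is a deliberate simplification and would be too aggressive for names where word order matters.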


patrickbr · 2023-10-04T14:42:58Z

Could you share your exact setup and command so that I can reproduce this issue? IIRC, normalization is not enabled by default. It may be that the model never had the opportunity to learn the equivalence of "str." and "strasse".

derhuerst · 2023-10-05T18:36:20Z

Unfortunately, I don't know anymore how I had set things up.

I assume I was looking into how statsimi works with stops from the VBB GTFS dataset (mirror with old versions).

PartTimeDataScientist mentioned this issue Sep 8, 2023

inconsistencies when comparing names on tracks #4

Open
