text="Das gibt uns Perspektive, Flexibilität, Optimismus, Engagement und Pluralität in allen Sinnesbereichen.in allen Sinnen."
normalized_text="Das gibt uns Perspektive, Flexibilität, Optimismus, Engagement und Pluralität in allen S i n n e s b e r e i c h e n punkt in allen Sinnen."
The above is expected behavior. The normalizer assumes that consecutive sentences are separated by a period followed by at least one whitespace character. The string quoted above consists of two clauses separated by a period with no whitespace after it. Adding a space after the period yields the correct normalization.
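A minimal sketch of how such input could be checked before normalization, assuming plain Python and only the standard `re` module. The helper names `find_unspaced_periods` and `add_space_after_period` are hypothetical and are not part of the normalizer. The check cannot tell a glued sentence boundary from a URL on its own (in the example above, `in` is also a valid top-level domain), so it only reports candidates and leaves the decision to the caller.

```python
import re

# A period directly between two word characters, i.e. with no whitespace after it.
_UNSPACED_PERIOD = re.compile(r"(?<=\w)\.(?=\w)")


def find_unspaced_periods(text):
    """Return (index, left_token, right_token) for every period glued between word characters."""
    hits = []
    for m in _UNSPACED_PERIOD.finditer(text):
        left = re.search(r"\w+$", text[:m.start()]).group()
        right = re.match(r"\w+", text[m.end():]).group()
        hits.append((m.start(), left, right))
    return hits


def add_space_after_period(text, index):
    """Insert one space after the period located at `index`."""
    return text[:index + 1] + " " + text[index + 1:]


if __name__ == "__main__":
    sample = "Pluralität in allen Sinnesbereichen.in allen Sinnen."
    for index, left, right in find_unspaced_periods(sample):
        print(index, left, right)
```

A caller who decides that a reported period is a sentence boundary rather than part of a URL can pass its index to `add_space_after_period` and normalize the result.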
Hi!
I found a bug in English normalization. The following code is applied:
text=
Here is mail.nasa.gov.
norm_text=
Here is mail dot nasa dot gov dot
expected output=
Here is mail dot nasa dot gov.
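Until this is fixed, one possible workaround is to detach the sentence-final period before normalization and reattach it afterwards. The sketch below assumes the normalizer can be wrapped as a callable that maps a string to a string. `normalize_fn` and `normalize_keep_final_period` are hypothetical names, and `toy_normalize` is only a stand-in that mimics the reported behavior, not the real normalizer.

```python
def normalize_keep_final_period(text, normalize_fn):
    """Normalize `text` without its final period, then put the period back.

    `normalize_fn` is assumed to be any callable str -> str wrapping the real normalizer.
    """
    stripped = text.rstrip()
    # Only handle a single trailing period; leave "..", "..." and other endings untouched.
    if stripped.endswith(".") and not stripped.endswith(".."):
        return normalize_fn(stripped[:-1]).rstrip() + "."
    return normalize_fn(text)


def toy_normalize(text):
    """Stand-in that verbalizes every period as 'dot', like the behavior reported above."""
    return " ".join(text.replace(".", " dot ").split())


if __name__ == "__main__":
    print(toy_normalize("Here is mail.nasa.gov."))
    print(normalize_keep_final_period("Here is mail.nasa.gov.", toy_normalize))
```

The wrapper treats every single trailing period as sentence punctuation, so input that ends in an abbreviation such as "etc." would need separate handling.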
A similar bug can be reproduced in German normalization. The following code is applied:
text=
Here is brettspielversand.de.
norm_text=
Here is b r e t t s p i e l v e r s a n d punkt de punkt
expected output=
Here is brettspielversand punkt de.
A similar problem occurs with text=
KIM.com-Specials.
I get the same problem with websites in Spanish and Italian text.
I also found a specific bug in Spanish normalization. The following code is applied:
text=
El texto de Li Qin en este libro ahora está disponible en forma de libro electrónico.
norm_text=
El texto de quincuagésimo primero Qin en este libro ahora está disponible en forma de libro electrónico.
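I am not sure what the expected output is, but the current norm_text looks wrong: the name "Li" seems to be read as the Roman numeral LI (51) and verbalized as an ordinal.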