Encoding name normalization does not remove year suffix #156

rossj · 2017-07-03T17:17:10Z

This line is supposed to normalize encoding names by removing non-alphanumeric characters and stripping an appended year. The current regex does not strip the year, so encoding names that include a year fail to match.

Example:
Input: iso_8859-5:1988
Output: iso885951988
Expected output: iso88595

I think the current regex fails because the colon is matched first, as a non-alphanumeric character, so the year part that follows can no longer match.
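To make the report concrete, here is a minimal reproduction sketch. The thread does not quote the original pattern, so `ASSUMED_ORIGINAL` is a guess inferred from the later "swapped the ORs" comment, and `normalizeWithAssumedOriginal` is a hypothetical helper name, not the library's API.

```javascript
// ASSUMPTION: the pattern below is inferred from the thread (generic
// non-alphanumeric branch first, year branch second). It is not copied
// from the library source.
const ASSUMED_ORIGINAL = /[^0-9a-z]|:\d{4}$/g;

// Hypothetical helper, for illustration only.
function normalizeWithAssumedOriginal(name) {
  return String(name).toLowerCase().replace(ASSUMED_ORIGINAL, "");
}

// The ":" is consumed by the first branch as a lone non-alphanumeric
// character, so ":1988" is never matched as a unit and the digits survive.
console.log(normalizeWithAssumedOriginal("iso_8859-5:1988")); // iso885951988
```

Because alternation tries branches left to right at each position, the more specific year branch has to come first (or be applied in a separate pass) to take effect.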


erikkemperman · 2017-07-05T18:26:08Z

Just noticed the same thing. Suggested fix:

.toLowerCase().replace(/:\d{4}$/, "").replace(/[^0-9a-z]/g, "");

ashtuchkin · 2017-07-05T18:30:22Z

Thanks guys, I think you're right, I'll add a fix soon.


-- Alexander Shtuchkin


erikkemperman · 2017-07-05T19:05:42Z

Actually, maybe it should be

.toLowerCase().replace(/:\d{4}[^0-9a-z]*$/, "").replace(/[^0-9a-z]/g, "");

rossj · 2017-07-05T19:44:15Z

FWIW I just swapped the ORs in my local copy

/:\d{4}$|[^0-9a-z]/g

erikkemperman · 2017-07-05T19:47:25Z

I was thinking you might want to get rid of trailing (after the year) non-alphanumerics as well?
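The three patterns proposed in this thread can be compared side by side with a small sketch. The function names are hypothetical and chosen only for this illustration; the regexes are the ones quoted in the comments above.

```javascript
// Variant A (first suggestion): strip ":YYYY" at the very end, then
// remove all remaining non-alphanumerics.
function stripYearThenClean(name) {
  return String(name).toLowerCase()
    .replace(/:\d{4}$/, "")
    .replace(/[^0-9a-z]/g, "");
}

// Variant B (second suggestion): also tolerate trailing
// non-alphanumerics after the year.
function stripYearAndTrailingThenClean(name) {
  return String(name).toLowerCase()
    .replace(/:\d{4}[^0-9a-z]*$/, "")
    .replace(/[^0-9a-z]/g, "");
}

// Variant C ("swapped the ORs"): a single pass with the year branch first.
function swappedAlternation(name) {
  return String(name).toLowerCase().replace(/:\d{4}$|[^0-9a-z]/g, "");
}

const clean = "iso_8859-5:1988";
const trailing = "iso_8859-5:1988 "; // note the trailing space

console.log(stripYearThenClean(clean));               // iso88595
console.log(stripYearAndTrailingThenClean(clean));    // iso88595
console.log(swappedAlternation(clean));               // iso88595

console.log(stripYearThenClean(trailing));            // iso885951988
console.log(stripYearAndTrailingThenClean(trailing)); // iso88595
console.log(swappedAlternation(trailing));            // iso885951988
```

All three agree on the input from the report. Only variant B still strips the year when something non-alphanumeric follows it, which is the case raised in the last comment.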

erikkemperman · 2018-04-07T08:03:54Z

Thanks!

ashtuchkin closed this as completed in 696be8a Apr 7, 2018
