Encoding name normalization does not remove year suffix #156
Comments
Just noticed the same thing. Suggested fix: .toLowerCase().replace(/:\d{4}$/, "").replace(/[^0-9a-z]/g, "");
Thanks guys, I think you're right, I'll add a fix soon.
Alexander Shtuchkin (replying by email to Erik Kemperman's comment of Wed, Jul 5, 2017, 11:26 AM)
Actually, maybe it should be .toLowerCase().replace(/:\d{4}[^0-9a-z]*$/, "").replace(/[^0-9a-z]/g, "");
FWIW I just swapped the ORs in my local copy: /:\d{4}$|[^0-9a-z]/g
I was thinking you might want to get rid of trailing (after the year) non-alphanumerics as well?
Thanks!
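The three fixes proposed in this thread can be compared side by side. The sketch below is only an illustration: the labels `twoStep`, `twoStepTrailing` and `swappedAlternation` are hypothetical names, and the second input (with a trailing space after the year) is a made-up case chosen to show where the proposals differ.

```javascript
// Hypothetical labels for the three expressions quoted in the comments.
const candidates = {
  // Strip ":YYYY" at the very end, then drop non-alphanumerics.
  twoStep: (s) =>
    s.toLowerCase().replace(/:\d{4}$/, "").replace(/[^0-9a-z]/g, ""),
  // Same, but also tolerate non-alphanumerics after the year.
  twoStepTrailing: (s) =>
    s.toLowerCase().replace(/:\d{4}[^0-9a-z]*$/, "").replace(/[^0-9a-z]/g, ""),
  // Single pass with the year branch tried before the generic branch.
  swappedAlternation: (s) =>
    s.toLowerCase().replace(/:\d{4}$|[^0-9a-z]/g, ""),
};

// The second input is an assumed edge case, not taken from the issue.
for (const input of ["iso_8859-5:1988", "iso_8859-5:1988 "]) {
  for (const [label, normalize] of Object.entries(candidates)) {
    console.log(JSON.stringify(input), label, normalize(input));
  }
}
```

All three return iso88595 for the input from the issue. For the input with a trailing space, only `twoStepTrailing` still strips the year; the other two return iso885951988, because `$` no longer sits directly after the four digits.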
This line is supposed to normalize encoding names by removing non-alphanumeric characters and stripping an appended year. The current regex does not strip the year, so encoding names that include a year suffix fail to match.
Example:
Input:
iso_8859-5:1988
Output:
iso885951988
Expected output:
iso88595
I think the current regex fails because the colon is matched by the first part as a non-alphanumeric character, so the year part that follows can no longer match.
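A minimal sketch of that explanation follows. The issue text does not quote the current regex, so the pattern here is an assumption: a single alternation with the non-alphanumeric branch first and the year branch second. The names `assumedCurrent` and `normalizeAssumedCurrent` are hypothetical.

```javascript
// Assumed form of the current pattern; the issue text does not quote it.
const assumedCurrent = /[^0-9a-z]|:\d{4}$/g;

// Hypothetical helper mirroring the normalization described above.
function normalizeAssumedCurrent(name) {
  return String(name).toLowerCase().replace(assumedCurrent, "");
}

const input = "iso_8859-5:1988";

// Each element is one substring the pattern removes. The colon is
// consumed on its own by the first branch, so ":1988" never matches.
console.log(input.match(assumedCurrent)); // [ '_', '-', ':' ]
console.log(normalizeAssumedCurrent(input)); // iso885951988
```

In a JavaScript alternation the left branch is tried first at each position, which is why the branch order decides whether the year is removed.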