New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore ctype.h character classification for non-ASCII platforms #2494

Closed
wants to merge 4 commits into
from

Conversation

Projects
None yet
2 participants
@smuehlst
Contributor

smuehlst commented Apr 13, 2018

With commit 4272a0b curl-specific
character classification macros and functions were introduced
in curl_ctype.[ch] to avoid dependencies on the locale. This broke
curl on non-ASCII, e.g. EBCDIC platforms. This change restores the
previous set of character classification macros when
CURL_DOES_CONVERSIONS is defined.

This was discovered on a z/OS machine with EBCDIC encoding, where Content-type headers could no longer be matched because leading blanks were no longer stripped off by curl.

smuehlst added some commits Apr 13, 2018

Restore ctype.h character classification for non-ASCII platforms
With commit 4272a0b curl-speficic
character classification macros and functions were introduced
in curl_ctype.[ch] to avoid dependencies on the locale. This broke
curl on non-ASCII, e.g. EBCDIC platforms. This change restores the
previous set of character classification macros when
CURL_DOES_CONVERSIONS is defined.
Fix strcpy_url for non-ASCII platforms
strcpy_url() had a hard-coded dependency on ASCII to work because of
an explicit character comparison for being greater or equal than 0x80.
@smuehlst

This comment has been minimized.

Show comment
Hide comment
@smuehlst

smuehlst Apr 16, 2018

Contributor

I added a second fix for strcpy_url to the same pull request because the strcpy_url fix depends on the character classification fix.

Contributor

smuehlst commented Apr 16, 2018

I added a second fix for strcpy_url to the same pull request because the strcpy_url fix depends on the character classification fix.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 16, 2018

Member

The red travis is a checkrsrc complaint:

./http_chunks.c:83:45: warning: space before last semicolon (SPACESEMICOLON)
         || (digit >= 0x61 && digit <= 0x66) /* a-f */;
                                             ^

While the warning looks a bit wrong there, I think you could improve the readability of that code by not having the semicolon "hidden" like that after a comment.

Member

bagder commented Apr 16, 2018

The red travis is a checkrsrc complaint:

./http_chunks.c:83:45: warning: space before last semicolon (SPACESEMICOLON)
         || (digit >= 0x61 && digit <= 0x66) /* a-f */;
                                             ^

While the warning looks a bit wrong there, I think you could improve the readability of that code by not having the semicolon "hidden" like that after a comment.

@smuehlst

This comment has been minimized.

Show comment
Hide comment
@smuehlst

smuehlst Apr 17, 2018

Contributor

Sorry, I had thought I had run "make checksrc", but apparently I did run it for the wrong sources. Fixed.

Contributor

smuehlst commented Apr 17, 2018

Sorry, I had thought I had run "make checksrc", but apparently I did run it for the wrong sources. Fixed.

@smuehlst

This comment has been minimized.

Show comment
Hide comment
@smuehlst

smuehlst Apr 17, 2018

Contributor

Test case 255 is failing in in one of the travis builds, while it is succeeding in all other builds:

https://travis-ci.org/curl/curl/jobs/367525539

What does that tell me? I doubt that the failure could be related to my change, given the fact that for ASCII platforms it has no effect.

Contributor

smuehlst commented Apr 17, 2018

Test case 255 is failing in in one of the travis builds, while it is succeeding in all other builds:

https://travis-ci.org/curl/curl/jobs/367525539

What does that tell me? I doubt that the failure could be related to my change, given the fact that for ASCII platforms it has no effect.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 19, 2018

Member

It seems to be a test that started to become flaky (for other PRs as well) and shouldn't be related. I've triggered a rerun of it now.

Member

bagder commented Apr 19, 2018

It seems to be a test that started to become flaky (for other PRs as well) and shouldn't be related. I've triggered a rerun of it now.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 23, 2018

Member

We avoid the use of ISXDIGIT to accommodate non-ASCII hosts.

So this is because you're checking ascii input so you don't want Curl_isxdigit() because that assumes the non-ascii input in your case?

In all ascii-based systems however, Curl_isxdigit_ascii() adds nothing but is simply a tad bit slower... I propose that we then introduce a #define Curl_isxdigit_ascii(x) Curl_isxdigit(x) for ascii hosts. What do you say?

Member

bagder commented Apr 23, 2018

We avoid the use of ISXDIGIT to accommodate non-ASCII hosts.

So this is because you're checking ascii input so you don't want Curl_isxdigit() because that assumes the non-ascii input in your case?

In all ascii-based systems however, Curl_isxdigit_ascii() adds nothing but is simply a tad bit slower... I propose that we then introduce a #define Curl_isxdigit_ascii(x) Curl_isxdigit(x) for ascii hosts. What do you say?

@smuehlst

This comment has been minimized.

Show comment
Hide comment
@smuehlst

smuehlst Apr 23, 2018

Contributor

We avoid the use of ISXDIGIT to accommodate non-ASCII hosts.

So this is because you're checking ascii input so you don't want Curl_isxdigit() because that assumes the non-ascii input in your case?

Correct. The above comment comes from the old code that was present in http_chunks.c before commit 4272a0b:

We avoid the use of isxdigit to accommodate non-ASCII hosts. */

I restored this code and renamed the old Curl_isxdigit() function to Curl_isxdigit_ascii() with the hope to avoid confusion.

In all ascii-based systems however, Curl_isxdigit_ascii() adds nothing but is simply a tad bit slower... I propose that we then introduce a #define Curl_isxdigit_ascii(x) Curl_isxdigit(x) for ascii hosts. What do you say?

Sounds reasonable to me. Do I understand it correctly that the current Curl_isxdigit_ascii() function shall be kept for the non-ASCII case and the define will be used for the ASCII case, like this:

#ifdef CURL_DOES_CONVERSIONS
/* Check for an ASCII hex digit.
   We avoid the use of ISXDIGIT to accommodate non-ASCII hosts. */
static bool Curl_isxdigit_ascii(char digit)
{
  return (digit >= 0x30 && digit <= 0x39) /* 0-9 */
        || (digit >= 0x41 && digit <= 0x46) /* A-F */
        || (digit >= 0x61 && digit <= 0x66); /* a-f */
}
#else
#define Curl_isxdigit_ascii(x) Curl_isxdigit(x) 
#endif
Contributor

smuehlst commented Apr 23, 2018

We avoid the use of ISXDIGIT to accommodate non-ASCII hosts.

So this is because you're checking ascii input so you don't want Curl_isxdigit() because that assumes the non-ascii input in your case?

Correct. The above comment comes from the old code that was present in http_chunks.c before commit 4272a0b:

We avoid the use of isxdigit to accommodate non-ASCII hosts. */

I restored this code and renamed the old Curl_isxdigit() function to Curl_isxdigit_ascii() with the hope to avoid confusion.

In all ascii-based systems however, Curl_isxdigit_ascii() adds nothing but is simply a tad bit slower... I propose that we then introduce a #define Curl_isxdigit_ascii(x) Curl_isxdigit(x) for ascii hosts. What do you say?

Sounds reasonable to me. Do I understand it correctly that the current Curl_isxdigit_ascii() function shall be kept for the non-ASCII case and the define will be used for the ASCII case, like this:

#ifdef CURL_DOES_CONVERSIONS
/* Check for an ASCII hex digit.
   We avoid the use of ISXDIGIT to accommodate non-ASCII hosts. */
static bool Curl_isxdigit_ascii(char digit)
{
  return (digit >= 0x30 && digit <= 0x39) /* 0-9 */
        || (digit >= 0x41 && digit <= 0x46) /* A-F */
        || (digit >= 0x61 && digit <= 0x66); /* a-f */
}
#else
#define Curl_isxdigit_ascii(x) Curl_isxdigit(x) 
#endif
@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 23, 2018

Member

Right, that's exactly what I was thinking!

Member

bagder commented Apr 23, 2018

Right, that's exactly what I was thinking!

Use standard Curl_isxdigit function on ASCII hosts
While for non-ASCII hosts a special function is needed to check for
a hex digit in ASCII encoding, on ASCII hosts the normal Curl_isxdigit
function can be used.
@smuehlst

This comment has been minimized.

Show comment
Hide comment
@smuehlst

smuehlst Apr 23, 2018

Contributor

Pull request updated accordingly.

Contributor

smuehlst commented Apr 23, 2018

Pull request updated accordingly.

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 24, 2018

Member

Thanks!

Member

bagder commented Apr 24, 2018

Thanks!

@bagder bagder closed this in dd7521b Apr 24, 2018

smuehlst referenced this pull request Apr 25, 2018

@lock lock bot locked as resolved and limited conversation to collaborators Jul 23, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.