New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ASCII dependency in strcpy_url and strlen_url #2535

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
@smuehlst
Contributor

smuehlst commented Apr 26, 2018

Commit 3c630f9 partially reverted
the changes from commit dd7521b because of
the problem that strcpy_url() was modified unilaterally without also
modifying strlen_url(). As a consequence strcpy_url() was again depending
on ASCII encoding.

This change fixes strlen_url() and strcpy_url() in parallel to use a
common host-encoding independent criterion for deciding whether an URL
character must be %-escaped.

Fix ASCII dependency in strcpy_url and strlen_url
Commit 3c630f9 partially reverted
the changes from commit dd7521b because of
the problem that strcpy_url() was modified unilaterally without also
modifying strlen_url(). As a consequence strcpy_url() was again depending
on ASCII encoding.

This change fixes strlen_url() and strcpy_url() in parallel to use a
common host-encoding independent criterion for deciding whether an URL
character must be %-escaped.
@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder Apr 26, 2018

Member

This changes behavior for redirects to URLs with "control codes" in the string, which I think is probably fine, but we need a test case or two that verify this so that we can trust the code a bit more.

Member

bagder commented Apr 26, 2018

This changes behavior for redirects to URLs with "control codes" in the string, which I think is probably fine, but we need a test case or two that verify this so that we can trust the code a bit more.

@smuehlst

This comment has been minimized.

Show comment
Hide comment
@smuehlst

smuehlst Apr 27, 2018

Contributor

This changes behavior for redirects to URLs with "control codes" in the string, which I think is probably fine, but we need a test case or two that verify this so that we can trust the code a bit more.

The current pull request does not change the behavior regarding control codes. Please check the condition in function urlchar_needs_escaping, which excludes control characters from being escaped:

static bool urlchar_needs_escaping(int c)
{
    return !(ISCNTRL(c) || ISSPACE(c) || ISGRAPH(c));
}

To actually include control codes in the set of characters to be escaped I can simplify that condition back to !ISPRINT(c), and I can drop the new implementation of ISCNTRL(c) again as it is needed nowhere else.

For writing new test cases I need some guidance please. The test case test1138 apparently is the one that tests URLs with characters beyond 0x80. Can that be extended, or is a new test case in a new file required?

The documentation in the FILEFORMAT file says:

All data for a single test case resides in a single ASCII file.

But the file test1138 directly contains UTF-8. Do I understand it correctly that I can directly enter the control characters in the file then?

Contributor

smuehlst commented Apr 27, 2018

This changes behavior for redirects to URLs with "control codes" in the string, which I think is probably fine, but we need a test case or two that verify this so that we can trust the code a bit more.

The current pull request does not change the behavior regarding control codes. Please check the condition in function urlchar_needs_escaping, which excludes control characters from being escaped:

static bool urlchar_needs_escaping(int c)
{
    return !(ISCNTRL(c) || ISSPACE(c) || ISGRAPH(c));
}

To actually include control codes in the set of characters to be escaped I can simplify that condition back to !ISPRINT(c), and I can drop the new implementation of ISCNTRL(c) again as it is needed nowhere else.

For writing new test cases I need some guidance please. The test case test1138 apparently is the one that tests URLs with characters beyond 0x80. Can that be extended, or is a new test case in a new file required?

The documentation in the FILEFORMAT file says:

All data for a single test case resides in a single ASCII file.

But the file test1138 directly contains UTF-8. Do I understand it correctly that I can directly enter the control characters in the file then?

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 3, 2018

Member

The current pull request does not change the behavior regarding control codes

You're right and I was wrong. This should maintain the same functionality on ASCII systems and yet enable non-ascii systems to do the right thing.

The test file format is actually totally encoding agnostic and can contain whatever is suitable so test 1138 has some UTF-8 sequences in there to verify exactly this sort of %-encoding. This test continues to work after your patch, as the CI builds and tests here already show.

I will merge this asap.

Member

bagder commented May 3, 2018

The current pull request does not change the behavior regarding control codes

You're right and I was wrong. This should maintain the same functionality on ASCII systems and yet enable non-ascii systems to do the right thing.

The test file format is actually totally encoding agnostic and can contain whatever is suitable so test 1138 has some UTF-8 sequences in there to verify exactly this sort of %-encoding. This test continues to work after your patch, as the CI builds and tests here already show.

I will merge this asap.

@bagder bagder closed this in 7f41432 May 3, 2018

@bagder

This comment has been minimized.

Show comment
Hide comment
@bagder

bagder May 3, 2018

Member

Thanks a lot!

Member

bagder commented May 3, 2018

Thanks a lot!

@lock lock bot locked as resolved and limited conversation to collaborators Aug 1, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.