Clean up URI parsing #1316

Closed
wants to merge 9 commits into
from

Conversation

Projects
None yet
2 participants
@sourcejedi
Contributor

sourcejedi commented May 3, 2012

I'm new to CodeIgniter, so I had a look to see how the "nice urls" worked. I was hampered a little by my understanding of HTTP etc., which CodeIgniter did not seem to match in several places. After a little research and testing, I concluded that in several places the mismatch was in CodeIgniter, and not just my understanding of it :)

This is a "clean up", in the sense that these aren't critical fixes. (But they are behavioral changes; I don't mean it's just refactoring).

Passes unit tests (ignoring database, which I don't have set up).
Passed light manual testing against the codebase of itsravenous.com (not published).

sourcejedi added some commits May 1, 2012

Support an explicit $config['uri_protocol'] = '_GET'
All the other methods can be specified manually if necessary.
If you're trying to run on such a brain-damaged server,
relying on auto-detection sounds quite fragile.
Accept any non-empty value from PATH_INFO, QUERY_STRING, or _GET
If the server says the URI is '/' (or '/index.php'), that may be
just what it is!  (If we don't believe it, we can end up serving
pages at unexpected URIs).  And if the server actually lies to us,
it'd be better to use an explicit $config['uri_protocol'].
Mainly cosmetic fix to cleaning of REQUEST_URI
parse_url() is not going to explode if we pass it '/' or ''.

OTOH, in case REQUEST_URI is e.g. all slashes and we "clean" it
to nothing, we don't want to completely forget that REQUEST_URI was set.
Try to explain $config['uri_protocol'] better
The main purpose here is to document that REQUEST_URI is a hack.
To be specific, it relies on SCRIPT_NAME.  When using Apache
mod_rewrite as suggested in the CI manual, this only "happens" to work:
it assumes the rewrite only removes one path segment.

Also imply the same about AUTO (which defaults to REQUEST_URI).

Since the non-hacky protocol is PATH_INFO, also outline why PATH_INFO is
not the default - it's because we support versions of PHP which are not
CGI-compliant - and make explicit that ORIG_PATH_INFO is a workaround for
these older versions.  <https://bugs.php.net/bug.php?id=31892>
Accept absolute URIs in REQUEST_URI (strict HTTP/1.1 compliance)
"To allow for transition to absoluteURIs in all requests in future
versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI
form in requests, even though HTTP/1.1 clients will only generate
them in requests to proxies." - RFC2616
REQUEST_URI is url-encoded, so we should be decoding it
PATH_INFO is already decoded - but if we insist on using REQUEST_URI, then
we need to decode it ourselves.
URI_test: make sure we clear $_GET before trying to test it
At this point in the test, we're assuming that $_GET is empty.
But we've been calling uri->_fetch_uri_string(), which
(for better or worse) sets $_GET under certain circumstances.

This change will make sure the test will pass when
the implementation of _fetch_uri_string() changes
(see next commit).
/index.php/0 is not /index.php (inappropriate use of empty())
Don't use empty() to test for an empty string.

php > $a = '0';
php > echo empty($a);
1
Don't try to use getenv() unless we have a good reason
There's no need to use getenv() for QUERY_STRING (or CONTENT_TYPE).
The only reason to use getenv() was to bypass a bug in PHP's handling
of PATH_INFO.
+| 'PATH_INFO' Use PATH_INFO (CGI standard, but broken in PHP < 5.2.4)
+| 'ORIG_PATH_INFO' Use ORIG_PATH_INFO (workaround, PHP < 5.2.4 only)
+| 'QUERY_STRING' Use QUERY_STRING (CGI standard)
+| '_GET' Use _GET (the php array, in case even QUERY_STRING fails)

This comment has been minimized.

Show comment Hide comment
@timw4mail

timw4mail May 3, 2012

Contributor

Have you actually observed a case where $_GET works over $_SERVER['QUERY_STRING'] ?

I'm not really seeing the point of making this an additional method of path determination.

@timw4mail

timw4mail May 3, 2012

Contributor

Have you actually observed a case where $_GET works over $_SERVER['QUERY_STRING'] ?

I'm not really seeing the point of making this an additional method of path determination.

This comment has been minimized.

Show comment Hide comment
@sourcejedi

sourcejedi May 3, 2012

Contributor

On 03/05/12 16:52, Timothy Warren wrote:

@@ -62,11 +62,12 @@
| URI string. The default setting of 'AUTO' works for most servers.
| If your links do not seem to work, try one of the other delicious flavors:
|
-| 'AUTO' Default - auto detects
-| 'PATH_INFO' Uses the PATH_INFO
-| 'QUERY_STRING' Uses the QUERY_STRING
-| 'REQUEST_URI' Uses the REQUEST_URI
-| 'ORIG_PATH_INFO' Uses the ORIG_PATH_INFO
+| 'AUTO' Default - look at several globals and take a guess
+| 'REQUEST_URI' Use REQUEST_URI (hack, to work on either version of PHP)
+| 'PATH_INFO' Use PATH_INFO (CGI standard, but broken in PHP< 5.2.4)
+| 'ORIG_PATH_INFO' Use ORIG_PATH_INFO (workaround, PHP< 5.2.4 only)
+| 'QUERY_STRING' Use QUERY_STRING (CGI standard)
+| '_GET' Use _GET (the php array, in case even QUERY_STRING fails)
Have you actually observed a case where $_GET works over $_SERVER['QUERY_STRING'] ?

I'm not really seeing the point of making this an additional method of path determination.

The code is already there in AUTO. This just lets you access it
manually. I wanted it myself for testing - so I could see what $_GET
did, in case you end up falling back to it.

I was also thinking about people relying on AUTO falling back to $_GET -
whether they knew it or not. If some change to AUTO meant we stuck with
something different, that didn't work, then they'd want to be able to
select _GET manually.

Alan

@sourcejedi

sourcejedi May 3, 2012

Contributor

On 03/05/12 16:52, Timothy Warren wrote:

@@ -62,11 +62,12 @@
| URI string. The default setting of 'AUTO' works for most servers.
| If your links do not seem to work, try one of the other delicious flavors:
|
-| 'AUTO' Default - auto detects
-| 'PATH_INFO' Uses the PATH_INFO
-| 'QUERY_STRING' Uses the QUERY_STRING
-| 'REQUEST_URI' Uses the REQUEST_URI
-| 'ORIG_PATH_INFO' Uses the ORIG_PATH_INFO
+| 'AUTO' Default - look at several globals and take a guess
+| 'REQUEST_URI' Use REQUEST_URI (hack, to work on either version of PHP)
+| 'PATH_INFO' Use PATH_INFO (CGI standard, but broken in PHP< 5.2.4)
+| 'ORIG_PATH_INFO' Use ORIG_PATH_INFO (workaround, PHP< 5.2.4 only)
+| 'QUERY_STRING' Use QUERY_STRING (CGI standard)
+| '_GET' Use _GET (the php array, in case even QUERY_STRING fails)
Have you actually observed a case where $_GET works over $_SERVER['QUERY_STRING'] ?

I'm not really seeing the point of making this an additional method of path determination.

The code is already there in AUTO. This just lets you access it
manually. I wanted it myself for testing - so I could see what $_GET
did, in case you end up falling back to it.

I was also thinking about people relying on AUTO falling back to $_GET -
whether they knew it or not. If some change to AUTO meant we stuck with
something different, that didn't work, then they'd want to be able to
select _GET manually.

Alan

This comment has been minimized.

Show comment Hide comment
@timw4mail

timw4mail May 3, 2012

Contributor

I'm just wondering if $_SERVER['QUERY_STRING'] is not more reliable than $_GET.

Personally, I've seen issues with $_POST due to bad PHP configuration settings, and since $_GET is parsed from the query string, I would think it would be more probable to have issues with $_GET in the first place, making the use of $_GET a redundancy over $_SERVER['QUERY_STRING'].

@timw4mail

timw4mail May 3, 2012

Contributor

I'm just wondering if $_SERVER['QUERY_STRING'] is not more reliable than $_GET.

Personally, I've seen issues with $_POST due to bad PHP configuration settings, and since $_GET is parsed from the query string, I would think it would be more probable to have issues with $_GET in the first place, making the use of $_GET a redundancy over $_SERVER['QUERY_STRING'].

This comment has been minimized.

Show comment Hide comment
@sourcejedi

sourcejedi May 3, 2012

Contributor

On 03/05/12 17:06, Timothy Warren wrote:

+| 'QUERY_STRING' Use QUERY_STRING (CGI standard)
+| '_GET' Use _GET (the php array, in case even QUERY_STRING fails)
I'm just wondering if $_SERVER['QUERY_STRING'] is not more reliable than $_GET.

Personally, I've seen issues with $_POST due to bad PHP configuration settings, and since $_GET is parsed from the query string, I would think it would be more probable to have issues with $_GET in the first place, making the use of $_GET a redundancy over $_SERVER['QUERY_STRING'].
Absolutely. Right, I checked out the PHP interpreter code, and I don't
see custom code for _GET in any of the SAPIs.

I withdraw _GET as an option, and I will remove the code in the
current AUTO option, which falls back to $_GET.

Thanks
Alan

@sourcejedi

sourcejedi May 3, 2012

Contributor

On 03/05/12 17:06, Timothy Warren wrote:

+| 'QUERY_STRING' Use QUERY_STRING (CGI standard)
+| '_GET' Use _GET (the php array, in case even QUERY_STRING fails)
I'm just wondering if $_SERVER['QUERY_STRING'] is not more reliable than $_GET.

Personally, I've seen issues with $_POST due to bad PHP configuration settings, and since $_GET is parsed from the query string, I would think it would be more probable to have issues with $_GET in the first place, making the use of $_GET a redundancy over $_SERVER['QUERY_STRING'].
Absolutely. Right, I checked out the PHP interpreter code, and I don't
see custom code for _GET in any of the SAPIs.

I withdraw _GET as an option, and I will remove the code in the
current AUTO option, which falls back to $_GET.

Thanks
Alan

@sourcejedi

This comment has been minimized.

Show comment Hide comment
@sourcejedi

sourcejedi May 3, 2012

Contributor

Eek, this was not the code I was looking for. (Last commit screwed up).

PHP Parse error: syntax error, unexpected ';' in /home/alan/php/CodeIgniter/system/core/URI.php on line 125

I'll regenerate this tree tomorrow (GMT), incorporating your comments thus far. Thanks.

Contributor

sourcejedi commented May 3, 2012

Eek, this was not the code I was looking for. (Last commit screwed up).

PHP Parse error: syntax error, unexpected ';' in /home/alan/php/CodeIgniter/system/core/URI.php on line 125

I'll regenerate this tree tomorrow (GMT), incorporating your comments thus far. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment