Skip to content

JordanChoo/url-normalizer

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Syntax based normalization of URI's

This normalizes URI's based on the specification RFC 3986 
http://www.apps.ietf.org/rfc/rfc3986.html

Example usage:
<code>
require_once 'URLNormalizer.php';

$url = 'eXAMPLE://a/./b/../b/%63/%7bfoo%7d';
$un = new URLNormalizer();
$un->setUrl( $url );
echo $un->normalize();

// result: "example://a/b/c/%7Bfoo%7D"
</code>

The normalization process preserves semantics so, for example, the following URL's are all equivalent:

HTTP://www.Example.com/ and http://www.example.com/
http://www.example.com/a%c2%b1b and http://www.example.com/a%C2%B1b
http://www.example.com/%7Eusername/ and http://www.example.com/~username/
http://www.example.com and http://www.example.com/
http://www.example.com:80/bar.html and http://www.example.com/bar.html
http://www.example.com/../a/b/../c/./d.html and http://www.example.com/a/c/d.html

The following normalizations are performed:

1. Converting the scheme and host to lower case
2. Capitalizing letters in escape sequences
3. Decoding percent-encoded octets of unreserved characters
4. Adding trailing /
5. Removing the default port
6. Removing dot-segments

For more information about these normalizations, please see the following Wikipedia article:

http://en.wikipedia.org/wiki/URL_normalization#Normalizations_that_Preserve_Semantics

For license information, please see LICENSE file.

TODO:

Add further scheme-based normalization steps, as detailed in section 6.2.3 of the RFC.

About

Syntax based normalization of URI's

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published