A library to assist in security-testing Unicode-enabled applications (fuzzing, XSS, SQLi, etc.).

unicode-hax

A library to assist in security-testing Unicode-enabled applications. The original intent of putting this together was threefold:

  1. To provide a reduced set of useful Unicode input to a software fuzzer
  2. To document historically problematic Unicode character sequences which might negatively affect protocols and Web applications
  3. To look up mappings for ASCII-equivalent characters

For example, the best-fit and normalization mappings can be useful when testing Web applications for cross-site scripting (XSS) or SQL injection (SQLi) vulnerabilities. They give you alternative characters which map back, or transform, to the intended ASCII input, such as "<" or "'".

Additionally, many problem characters have been pre-defined as a small set, reducing the number of iterations a fuzzer might need to perform.

Major features:

  • best fit mappings
  • Unicode normalization mappings
  • hard-coded Unicode characters useful in fuzzing

For fuzzing applications it includes:

  • ill-formed byte sequences
  • non-characters
  • private use area (PUA)
  • unassigned code points
  • code points with special meaning such as the BOM and RLO
  • half-surrogate values
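
To make the categories above concrete, here is a minimal sketch of what a reduced fuzzing set might look like. The class and member names (`FuzzSeeds`, `Strings`, `OverlongLessThan`) are hypothetical and not part of UniHax; the specific code points are illustrative picks for each category, with U+0FED and U+DEAD taken from the Fuzzer.cs excerpt in this README.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not the UniHax API.
public static class FuzzSeeds
{
    // One representative string per category from the list above.
    public static IReadOnlyDictionary<string, string> Strings() =>
        new Dictionary<string, string>
        {
            ["non-character U+FFFF"] = "\uFFFF",
            ["private use area U+E000"] = "\uE000",
            ["unassigned U+0FED"] = "\u0FED",          // per the Fuzzer.cs excerpt
            ["byte order mark U+FEFF"] = "\uFEFF",
            ["right-to-left override U+202E"] = "\u202E",
            ["lone low surrogate U+DEAD"] = "\uDEAD",  // half-surrogate with no pair
        };

    // An ill-formed byte sequence: the overlong two-byte UTF-8 form of '<' (U+003C).
    // 0xC0 0xBC carries the same payload bits as 0x3C but is invalid UTF-8.
    public static byte[] OverlongLessThan() => new byte[] { 0xC0, 0xBC };
}
```

A fuzzer could iterate over `FuzzSeeds.Strings()` and inject each value into an input field, and send `FuzzSeeds.OverlongLessThan()` as raw bytes to see whether a decoder wrongly accepts it as "<".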

/TestUniHax

This Windows Forms application loads the UniHax library, mainly to test the best-fit and normalization mappings.
If you simply input a single ASCII character, all of its equivalent characters will be displayed.

For example, if you're testing a Web application and want equivalents for the "<" character (U+003C), enter it as input and select either "best-fit mapping" equivalents, which are tied to a specific charset encoding, or "normalization" equivalents. For this character, the following are best-fits:

  • U+003B in the APL-ISO-IR-68 encoding
  • U+0014 in the CP424 encoding
  • etc...

Also, the following characters have normalization (compatibility decomposition) mappings to U+003C:

  • U+FE64 SMALL LESS-THAN SIGN
  • U+FF1C FULLWIDTH LESS-THAN SIGN
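
The normalization lookup can be approximated without the library's data files by using .NET's built-in `string.Normalize`. The following is a rough sketch under that assumption, not how UniHax's Mapping class works; `NormalizationLookup` and `FindEquivalents` are hypothetical names. It scans the Basic Multilingual Plane for characters whose normalized form is exactly the target character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not the UniHax API.
public static class NormalizationLookup
{
    // Returns BMP code points (other than the target itself) whose
    // normalized form under `form` is exactly the target character.
    public static List<int> FindEquivalents(char target, NormalizationForm form)
    {
        var matches = new List<int>();
        string wanted = target.ToString();
        for (int cp = 0; cp <= 0xFFFF; cp++)
        {
            if (cp == target) continue;
            // Lone surrogates are not valid input for normalization.
            if (cp >= 0xD800 && cp <= 0xDFFF) continue;
            try
            {
                if (((char)cp).ToString().Normalize(form) == wanted)
                    matches.Add(cp);
            }
            catch (ArgumentException)
            {
                // Some runtimes reject certain code points (e.g. non-characters); skip them.
            }
        }
        return matches;
    }
}
```

Calling `NormalizationLookup.FindEquivalents('<', NormalizationForm.FormKC)` should include U+FE64 and U+FF1C, since both carry compatibility decompositions to U+003C. The exact result set depends on the Unicode data shipped with the runtime.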

/UniHax

This library contains a small set of problematic Unicode characters in Fuzzer.cs such as the following:

        /// <summary>
        /// An unassigned code point U+0FED
        /// </summary>
        public static readonly string uUnassigned = "\u0FED";
        /// <summary>
        ///  An illegal low half-surrogate U+DEAD
        /// </summary>
        public static readonly string uDEAD = "\uDEAD";

The library also provides the following method, which returns those characters as a byte array in any supported encoding:

public byte[] GetCharacterBytes(string encoding, string character)
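
The README does not show the body of this method. As a minimal sketch of the idea, assuming the `encoding` argument is a name accepted by `System.Text.Encoding.GetEncoding`, the conversion could look like the following. `EncodingSketch.ToBytes` is a hypothetical stand-in, not the library's implementation.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for illustration only; not the UniHax implementation.
public static class EncodingSketch
{
    // Encodes `character` using the encoding registered under `encodingName`
    // (for example "utf-8" or "utf-16").
    public static byte[] ToBytes(string encodingName, string character)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        return enc.GetBytes(character);
    }
}
```

For example, `EncodingSketch.ToBytes("utf-8", "\uFF1C")` yields the three bytes EF BC 9C. Note that with the default encoder fallback, a lone surrogate such as U+DEAD is typically substituted with a replacement character rather than emitted as raw ill-formed bytes, so a real fuzzing helper may need to handle that case separately.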

There's also the following method, which returns any Unicode character as a malformed byte sequence simply by trimming the last byte of its encoded form:

public byte[] GetCharacterBytesMalformed(string encoding, string character)
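
The trimming technique described above can be sketched as follows. This is an illustration under the same assumption that the encoding name is resolved through `System.Text.Encoding.GetEncoding`; `MalformedBytes`, `TrimLastByte`, and `IsWellFormedUtf8` are hypothetical names, and the library's actual code may differ.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not the UniHax implementation.
public static class MalformedBytes
{
    // Encodes the character, then drops the final byte so that a
    // multi-byte sequence is left incomplete.
    public static byte[] TrimLastByte(string encodingName, string character)
    {
        byte[] full = Encoding.GetEncoding(encodingName).GetBytes(character);
        if (full.Length == 0) return full;
        byte[] truncated = new byte[full.Length - 1];
        Array.Copy(full, truncated, truncated.Length);
        return truncated;
    }

    // Returns true only if `bytes` decodes as strict, well-formed UTF-8.
    public static bool IsWellFormedUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try { strict.GetString(bytes); return true; }
        catch (DecoderFallbackException) { return false; }
    }
}
```

Trimming U+FF1C in UTF-8 (EF BC 9C) leaves EF BC, which a strict decoder rejects. Trimming only produces a malformed sequence when the character encodes to more than one byte: a single-byte character such as "<" in UTF-8 simply becomes an empty array.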

This project also contains the data files, pre-created in the /data folder, and a Mapping class (Mapping.cs) which can look up mapping equivalents for the following:

  • ASCII equivalent best-fit mappings across legacy character encodings
  • ASCII equivalent mappings for Unicode normalization types. For example, Web browsers commonly use a form of normalization for keeping URL content and host names compatible.

For more on Unicode Normalization see TR15: http://www.unicode.org/reports/tr15/

License

Unicode-Hax by Chris Weber is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://github.com/cweb/unicode-hax.