PawnUTF

Conversion utilities between UTF and code pages (in both directions) for Pawn. Supports only the BMP (Basic Multilingual Plane).


N-String - an array of symbols of type N with variable length in bytes (1 symbol per cell); it cannot be packed.
N-Stream - an array of symbols of type N with fixed length in bytes; it can also be packed.
CP-N - Code Page N.

General procedures

PawnUTF_gencollation_addSymbol(unicode_code, substitutuion[])

Adds a Unicode symbol to the general collation table.

  • unicode_code - Unicode code to be replaced by substitutuion.
  • substitutuion - the replacement string of Unicode symbols.

PawnUTF_GenCollation_loadFile(const filepath[])

Loads general collation table from filepath.

PawnUTF_IsValidCodeUTF8(const utf_code)

Returns true if utf_code is a valid UTF-8 symbol.

PawnUTF_StringToCollation(string[], string_size, output[], output_size, is_packed, collation_table[], collation_size)

Converts a codepage-string to another representation via a collation table.

  • string[] - source string.
  • string_size - size of string.
  • output[] - result of applying the collation table to string.
  • output_size - size of output.
  • is_packed - true if the source string is packed.
  • collation_table[] - collation table in which the index is the source character and the cell value is the collated symbol.
  • collation_size - size of collation_table.
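The table lookup described above can be sketched in C (not Pawn) as follows. This is only an illustration of the idea, not the library's code: `apply_collation` and its pass-through behaviour for characters outside the table are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map every source character through a lookup table.
 * Characters beyond the table are copied unchanged (an assumption of
 * this sketch). The output is zero-terminated, like a Pawn string.
 * Returns the number of symbols written, excluding the terminator. */
size_t apply_collation(const uint8_t *src, size_t src_len,
                       uint32_t *out, size_t out_size,
                       const uint32_t *table, size_t table_size)
{
    size_t n = 0;
    while (n < src_len && n + 1 < out_size) {
        uint8_t c = src[n];
        out[n] = (c < table_size) ? table[c] : c;
        n++;
    }
    if (out_size > 0)
        out[n] = 0;
    return n;
}

/* Usage: an identity table with a single CP1251-style override. */
void example_apply_collation(void)
{
    uint32_t table[256];
    for (unsigned i = 0; i < 256; i++)
        table[i] = i;
    table[0xC0] = 0x0410;               /* CP1251 0xC0 is U+0410 */

    const uint8_t src[] = { 'A', 0xC0 };
    uint32_t out[8];
    size_t n = apply_collation(src, 2, out, 8, table, 256);
    assert(n == 2 && out[1] == 0x0410);
}
```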

PawnUTF_UTFToStream(utf_string[], utf_string_size, output[], output_size, is_packed)

Converts a UTF-8 string to a UTF stream (the opposite of PawnUTF_StreamToUTF).

  • utf_string[] - source UTF string.
  • utf_string_size - size of source.
  • output[] - destination of UTF stream.
  • output_size - size of destination.
  • is_packed - true if the destination has to be packed.

PawnUTF_StreamToUTF(array[], array_size, output[], output_size, is_packed)

Extracts UTF-string characters from a string-stream.

  • array[] - source CP-N string.
  • array_size - size of array.
  • output[] - destination for result UTF-string.
  • output_size - size of output.
  • is_packed

stock PawnUTF_DecodeUTF_ToUnicode(utf_code)

Decodes a UTF-character to a Unicode symbol.

  • utf_code - value representing UTF symbol.
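As a rough illustration of this decoding step, here is a C sketch (not Pawn, and not the library's implementation). It assumes that a UTF code is the UTF-8 byte sequence packed big-endian into one integer, for example the bytes D0 90 stored as 0xD090; that layout and the name `utf8_packed_to_codepoint` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 1..3-byte UTF-8 sequence, packed big-endian into one
 * integer, to a BMP code point. Returns -1 if the value is not a
 * well-formed sequence. */
int32_t utf8_packed_to_codepoint(uint32_t utf_code)
{
    if (utf_code <= 0x7F)                               /* 0xxxxxxx */
        return (int32_t)utf_code;

    if ((utf_code & 0xFFFFE0C0u) == 0x0000C080u) {      /* 110xxxxx 10xxxxxx */
        int32_t cp = (int32_t)((((utf_code >> 8) & 0x1F) << 6) |
                               (utf_code & 0x3F));
        return (cp >= 0x80) ? cp : -1;                  /* reject overlong forms */
    }

    if ((utf_code & 0xFFF0C0C0u) == 0x00E08080u) {      /* 1110xxxx 10xxxxxx 10xxxxxx */
        int32_t cp = (int32_t)((((utf_code >> 16) & 0x0F) << 12) |
                               (((utf_code >> 8) & 0x3F) << 6) |
                               (utf_code & 0x3F));
        if (cp < 0x800)
            return -1;                                  /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return -1;                                  /* surrogates are not characters */
        return cp;
    }
    return -1;                                          /* stray continuation byte or 4-byte form */
}

/* Usage: Cyrillic capital A and the euro sign. */
void example_utf8_decode(void)
{
    assert(utf8_packed_to_codepoint(0xD090) == 0x0410);
    assert(utf8_packed_to_codepoint(0xE282AC) == 0x20AC);
}
```

Four-byte sequences are rejected on purpose, since the library supports only the BMP.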


Encodes a Unicode symbol as a UTF character.

  • unicode_code - value representing Unicode symbol.
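The reverse direction can be sketched the same way in C (not Pawn). As before, the packed big-endian layout of the result, the name `codepoint_to_utf8_packed` and the `UTF8_PACK_INVALID` sentinel are assumptions made for illustration, not the library's actual behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define UTF8_PACK_INVALID 0xFFFFFFFFu   /* hypothetical error value */

/* Encode a BMP code point as UTF-8 and pack the bytes big-endian into
 * one integer (U+0410 becomes 0xD090). */
uint32_t codepoint_to_utf8_packed(uint32_t cp)
{
    if (cp <= 0x7F)
        return cp;                                      /* one byte */
    if (cp <= 0x7FF)                                    /* two bytes */
        return ((0xC0u | (cp >> 6)) << 8) |
               (0x80u | (cp & 0x3F));
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return UTF8_PACK_INVALID;                       /* surrogates */
    if (cp <= 0xFFFF)                                   /* three bytes */
        return ((0xE0u | (cp >> 12)) << 16) |
               ((0x80u | ((cp >> 6) & 0x3F)) << 8) |
               (0x80u | (cp & 0x3F));
    return UTF8_PACK_INVALID;                           /* outside the BMP */
}

/* Usage: Cyrillic capital A and the euro sign. */
void example_utf8_encode(void)
{
    assert(codepoint_to_utf8_packed(0x0410) == 0xD090);
    assert(codepoint_to_utf8_packed(0x20AC) == 0xE282AC);
}
```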

PawnUTF_StringUTF_ToCodePage(utf_stream[], utf_stream_size, output[], output_size, table_from_utf[0x10000 char], table_from_utf_extra[][2], extra_size, is_packed)

Converts a UTF-string to a codepage-string via collation tables.

  • utf_stream[] - source UTF-string.
  • utf_stream_size - size of utf_stream.
  • output[] - result CP-N string.
  • output_size - size of output.
  • table_from_utf[0x10000 char] - main collation table (UTF -> CP-N).
  • table_from_utf_extra[][2] - collation table for large UTF-code values.
  • extra_size - size of table_from_utf_extra.
  • is_packed - true if output has to be packed.

PawnUTF_StringUTF_ToUnicode(utf_stream[], utf_stream_size, output[], output_size)

Converts a UTF-string to a Unicode-string.

  • utf_stream[] - source UTF-string.
  • utf_stream_size - size of utf_stream.
  • output[] - destination for Unicode-string.
  • output_size - size of output.

PawnUTF_StringUnicode_ToUTF(unicode_stream[], unicode_stream_size, output[], output_size)

Converts a Unicode-string to a UTF-string.

  • unicode_stream[] - source Unicode-string.
  • unicode_stream_size - size of unicode_stream.
  • output[] - destination for UTF-string.
  • output_size - size of output.

PawnUTF_WriteCurrentCP(filepath_input[], filepath_output[])

Writes the current CP-UTF collation (the one used by fwrite).

  • filepath_input[]
  • filepath_output[]


Reads one UTF-symbol from an opened file.

  • File:handle

UTF_LoadCharsetMapping(filepath[], table_unicode[0x10000 char], table_to_unicode[256])

Load codepage collation table.

  • filepath[] - path of file with collation table to load.
  • table_unicode[0x10000 char] - destination collation table[unicode_symbol] = CP-N code.
  • table_to_unicode[256] - destination collation table[CP-N] = unicode_symbol.
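The two tables described above are inverses of each other. A minimal C sketch (not Pawn) of how such a pair could be filled from a list of mappings is given here. The `cp_pair` type, the `build_tables` name, the '?' fallback for unmapped Unicode symbols and the U+FFFD fallback for unmapped code-page bytes are all assumptions of the example; the file format the library actually reads is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One mapping entry: code-page byte <-> BMP code point. */
struct cp_pair {
    uint8_t  cp;
    uint16_t unicode;
};

/* Fill both lookup directions from a list of pairs.
 * from_unicode: 0x10000 one-byte entries, indexed by Unicode symbol.
 * to_unicode:   256 entries, indexed by code-page byte.
 * ASCII maps to itself; everything else is unmapped until a pair
 * overrides it. */
void build_tables(const struct cp_pair *pairs, size_t count,
                  uint8_t *from_unicode, uint16_t *to_unicode)
{
    memset(from_unicode, '?', 0x10000);
    for (unsigned c = 0; c < 256; c++)
        to_unicode[c] = (c < 0x80) ? (uint16_t)c : 0xFFFD;
    for (unsigned c = 0; c < 0x80; c++)
        from_unicode[c] = (uint8_t)c;

    for (size_t i = 0; i < count; i++) {
        to_unicode[pairs[i].cp] = pairs[i].unicode;
        from_unicode[pairs[i].unicode] = pairs[i].cp;
    }
}

/* Usage: two well-known CP1251 entries. */
void example_build_tables(void)
{
    static uint8_t  from_unicode[0x10000];
    static uint16_t to_unicode[256];
    const struct cp_pair pairs[] = {
        { 0xC0, 0x0410 },   /* Cyrillic capital A */
        { 0xFF, 0x044F },   /* Cyrillic small ya  */
    };
    build_tables(pairs, 2, from_unicode, to_unicode);
    assert(from_unicode[0x0410] == 0xC0);
    assert(to_unicode[0xFF] == 0x044F);
}
```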

CP1251 example procedures


Loads the collation table from a file.

  • filepath - path of the file from which to load the collation table (CP-N -> UTF-8).

CP1251_FromUTF(utf_stream[], utf_stream_size, output[], output_size, is_packed)

Converts a UTF-string to a CP1251-string.

  • utf_stream[] - source UTF-8 string.
  • utf_stream_size - size of source UTF-8 string.
  • output[] - destination of CP1251-string.
  • output_size - size of destination output.
  • is_packed - true if output has to be packed.

CP1251_FromUnicode(unicode_string[], unicode_string_size, output[], output_size)

Convert Unicode-string to CP1251-string.

  • unicode_string[] - source Unicode-string.
  • unicode_string_size - size of unicode_string.
  • output[] - destination of result CP1251-string.
  • output_size - size of output.

CP1251_StringToUTF(array[], array_size, output[], output_size, is_packed)

Convert CP1251-string to UTF-stream.

  • array[] - source CP1251-string.
  • array_size - size of source string.
  • output[] - destination of UTF result stream.
  • output_size - size of output.
  • is_packed - true if array is packed.

CP1251_StringToUnicode(array[], array_size, output[], output_size, is_packed)

Convert CP1251-string to Unicode-string.

  • array[] - source CP1251-string.
  • array_size - size of source string.
  • output[] - destination of Unicode result string.
  • output_size - size of output.
  • is_packed - true if array is packed.


Returns the CP1251 code of a Unicode-symbol.

  • unicode_char - Unicode symbol.


Returns the Unicode-symbol code of a CP1251-code symbol.

  • character - CP1251 code.

CP1251_FileFixUTF(filepath_source[], filepath_dest[])

Fixes classic encoding problem in CP1251-based file.

  • filepath_source[] - source file.
  • filepath_dest[] - destination file.

UTF BOM-related procedures


Returns True if Unicode-symbol is BOM (Byte Order Mark).


Tries to read a UTF BOM from the start of the file. Places the file pointer right after the BOM, or at the start of the file if no known BOM sequence is found. Returns the BOM type.
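For orientation, BOM detection on a byte buffer can be sketched in C (not Pawn) as shown here. The byte sequences are the standard Unicode BOMs; the `bom_type` enum and the `detect_bom` function are hypothetical names and do not reflect the values the library returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum bom_type {
    BOM_NONE, BOM_UTF8,
    BOM_UTF16_BE, BOM_UTF16_LE,
    BOM_UTF32_BE, BOM_UTF32_LE
};

/* Inspect the first bytes of buf and report which BOM, if any, they
 * form. *bom_len receives the number of bytes to skip (0 if none). */
enum bom_type detect_bom(const uint8_t *buf, size_t len, size_t *bom_len)
{
    /* UTF-32 LE is tested before UTF-16 LE because both begin with
     * FF FE. A UTF-16 LE file whose first character is U+0000 would
     * be misread; this sketch accepts that ambiguity. */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
                    buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4; return BOM_UTF32_BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
                    buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4; return BOM_UTF32_LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3; return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2; return BOM_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2; return BOM_UTF16_LE;
    }
    *bom_len = 0;
    return BOM_NONE;
}

/* Usage: a UTF-16 LE file that starts with U+0410. */
void example_detect_bom(void)
{
    const uint8_t data[] = { 0xFF, 0xFE, 0x10, 0x04 };
    size_t skip;
    assert(detect_bom(data, sizeof data, &skip) == BOM_UTF16_LE);
    assert(skip == 2);
}
```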


Returns True if UTF BOM type is big-endian.


Returns True if UTF BOM type is little-endian.


PawnUTF_FWriteUTF16BOM(File:handle, is_bigendian)

Writes UTF-16 BOM-symbol into the file.


Reads single UTF-16(big endian) symbol from file. Returns Unicode-symbol.


Reads single UTF-16(little endian) symbol from file. Returns Unicode-symbol.


Returns the Unicode-symbol encoded as UTF-16.

PawnUTF_PutFileCharUTF16_BE(File:handle, unicode_code)

Writes single Unicode-symbol encoded to big-endian UTF-16 into the file.

PawnUTF_PutFileCharUTF16_LE(File:handle, unicode_code)

Writes single Unicode-symbol encoded to little-endian UTF-16 into the file.
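The big-endian and little-endian variants differ only in byte order. A small C sketch (not Pawn) for BMP symbols, with hypothetical helper names and a byte buffer standing in for the file:

```c
#include <assert.h>
#include <stdint.h>

/* Store a BMP code point as two UTF-16 bytes, high byte first. */
void utf16_put_be(uint16_t cp, uint8_t *out)
{
    out[0] = (uint8_t)(cp >> 8);
    out[1] = (uint8_t)(cp & 0xFF);
}

/* Store a BMP code point as two UTF-16 bytes, low byte first. */
void utf16_put_le(uint16_t cp, uint8_t *out)
{
    out[0] = (uint8_t)(cp & 0xFF);
    out[1] = (uint8_t)(cp >> 8);
}

/* Read the two bytes back in either order. */
uint16_t utf16_get_be(const uint8_t *in)
{
    return (uint16_t)((in[0] << 8) | in[1]);
}

uint16_t utf16_get_le(const uint8_t *in)
{
    return (uint16_t)((in[1] << 8) | in[0]);
}

/* Usage: U+0410 is 04 10 in big-endian and 10 04 in little-endian. */
void example_utf16_bytes(void)
{
    uint8_t b[2];
    utf16_put_be(0x0410, b);
    assert(b[0] == 0x04 && b[1] == 0x10);
    utf16_put_le(0x0410, b);
    assert(b[0] == 0x10 && b[1] == 0x04);
}
```

Surrogate pairs are left out because only the BMP is supported, so every symbol fits in one 16-bit unit.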

PawnUTF_FRead_UTF16(File:handle, unicode_string[], size, is_bigendian)

Reads a Unicode-string decoded from a UTF-16 file. Returns the number of symbols read.

PawnUTF_FWrite_UTF16(File:handle, string[], is_bigendian)

Writes a Unicode string, encoded as UTF-16, into the file. Returns the number of symbols written.


PawnUTF_FWriteUTF32BOM(File:handle, is_bigendian)

Writes UTF-32 BOM-symbol into the file.


Reads single UTF-32(big endian) symbol from file. Returns Unicode-symbol.


Reads single UTF-32(little endian) symbol from file. Returns Unicode-symbol.

PawnUTF_PutFileCharUTF32_BE(File:handle, unicode_code)

Writes single Unicode-symbol encoded to big-endian UTF-32 into the file.

PawnUTF_PutFileCharUTF32_LE(File:handle, unicode_code)

Writes single Unicode-symbol encoded to little-endian UTF-32 into the file.

PawnUTF_FRead_UTF32(File:handle, unicode_string[], size, is_bigendian)

Reads a Unicode-string decoded from a UTF-32 file. Returns the number of symbols read.

PawnUTF_FWrite_UTF32(File:handle, string[], is_bigendian)

Writes a Unicode string, encoded as UTF-32, into the file. Returns the number of symbols written.

Unicode string procedures

This section lists string procedures written specifically for Unicode strings. They exist because some of the original native procedures have complex logic built around packed strings. Every analogue takes the same arguments as the native procedure it replaces.

The following need no special analogue: strdel(), memcpy(), strval(), valstr(), strcmp(), strmid()

The following are incompatible with Unicode strings: strpack(), strunpack(), ispacked()


Loads the lower/upper case collation of Unicode-symbols.


Returns upper version of Unicode symbol.


Returns lower version of Unicode symbol.


Returns length of Unicode-string.

Unicode_strcat(unicode_string_destination[], unicode_string_source[], maxlength)

Concatenates Unicode source string to destination.

Unicode_strins(unicode_string_destination[], unicode_string_source[], pos, maxlength)

Inserts substring into a string.

Unicode_strfind(unicode_string[], unicode_search_string[], ignorecase, pos)

Find substring inside string. Returns position of substring in string if found, -1 otherwise.
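A substring search over one-symbol-per-cell strings can be sketched in C (not Pawn) as shown here. The names `ustr_find` and `fold_upper` are hypothetical, and the case folding covers only ASCII and the basic Cyrillic range as a stand-in for a loaded lower/upper collation; the library's own matching rules may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal case folding: ASCII a..z and Cyrillic U+0430..U+044F.
 * A real table would be loaded from a collation file. */
uint32_t fold_upper(uint32_t c)
{
    if (c >= 'a' && c <= 'z')
        return c - 0x20;
    if (c >= 0x0430 && c <= 0x044F)
        return c - 0x20;
    return c;
}

/* Find needle in hay (both zero-terminated, one symbol per cell),
 * starting at index pos. Returns the match index or -1. */
int ustr_find(const uint32_t *hay, const uint32_t *needle,
              bool ignorecase, int pos)
{
    if (pos < 0)
        return -1;
    for (int i = 0; i < pos; i++)       /* pos must lie inside the string */
        if (hay[i] == 0)
            return -1;

    for (int i = pos; ; i++) {
        int j = 0;
        while (needle[j] != 0) {
            uint32_t a = hay[i + j];
            uint32_t b = needle[j];
            if (a == 0)
                return -1;              /* too little text left for any match */
            if (ignorecase) {
                a = fold_upper(a);
                b = fold_upper(b);
            }
            if (a != b)
                break;
            j++;
        }
        if (needle[j] == 0)
            return i;                   /* whole needle matched at i */
    }
}

/* Usage: case-insensitive search for two upper-case Cyrillic symbols
 * in a six-symbol mixed-case string. */
void example_ustr_find(void)
{
    const uint32_t hay[] = { 0x041F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442, 0 };
    const uint32_t needle[] = { 0x0412, 0x0415, 0 };
    assert(ustr_find(hay, needle, true, 0) == 3);
    assert(ustr_find(hay, needle, false, 0) == -1);
}
```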

