Initial commit

commit 2033d81cec6fdcd8b336c89ade1ccff6081389e0 (0 parents), SheetJSDev committed Dec 6, 2013
Showing with 3,397 additions and 0 deletions.
  1. +8 −0 .travis.yml
  2. +13 −0 LICENSE
  3. +201 −0 README.md
  4. +2 −0 bits/10000.js
  5. +2 −0 bits/10006.js
  6. +2 −0 bits/10007.js
  7. +2 −0 bits/10029.js
  8. +2 −0 bits/10079.js
  9. +2 −0 bits/10081.js
  10. +2 −0 bits/1026.js
  11. +2 −0 bits/1250.js
  12. +2 −0 bits/1251.js
  13. +2 −0 bits/1252.js
  14. +2 −0 bits/1253.js
  15. +2 −0 bits/1254.js
  16. +2 −0 bits/1255.js
  17. +2 −0 bits/1256.js
  18. +2 −0 bits/1257.js
  19. +2 −0 bits/1258.js
  20. +2 −0 bits/28591.js
  21. +2 −0 bits/28592.js
  22. +2 −0 bits/28593.js
  23. +2 −0 bits/28594.js
  24. +2 −0 bits/28595.js
  25. +2 −0 bits/28596.js
  26. +2 −0 bits/28597.js
  27. +2 −0 bits/28598.js
  28. +2 −0 bits/28599.js
  29. +2 −0 bits/28600.js
  30. +2 −0 bits/28601.js
  31. +2 −0 bits/28603.js
  32. +2 −0 bits/28604.js
  33. +2 −0 bits/28605.js
  34. +2 −0 bits/28606.js
  35. +2 −0 bits/37.js
  36. +2 −0 bits/437.js
  37. +2 −0 bits/500.js
  38. +2 −0 bits/708.js
  39. +2 −0 bits/720.js
  40. +2 −0 bits/737.js
  41. +2 −0 bits/775.js
  42. +2 −0 bits/850.js
  43. +2 −0 bits/852.js
  44. +2 −0 bits/855.js
  45. +2 −0 bits/857.js
  46. +2 −0 bits/858.js
  47. +2 −0 bits/860.js
  48. +2 −0 bits/861.js
  49. +2 −0 bits/862.js
  50. +2 −0 bits/863.js
  51. +2 −0 bits/864.js
  52. +2 −0 bits/865.js
  53. +2 −0 bits/866.js
  54. +2 −0 bits/869.js
  55. +2 −0 bits/874.js
  56. +2 −0 bits/875.js
  57. +95 −0 bits/932.js
  58. +257 −0 bits/936.js
  59. +253 −0 bits/949.js
  60. +179 −0 bits/950.js
  61. +496 −0 codepage.md
  62. +243 −0 codepages/708.TBL
  63. +248 −0 codepages/720.TBL
  64. +256 −0 codepages/858.TBL
  65. +836 −0 cptable.js
  66. +77 −0 cputils.js
  67. +15 −0 package.json
  68. +56 −0 sbcs.js
  69. +58 −0 test.js
8 .travis.yml
@@ -0,0 +1,8 @@
+language: node_js
+node_js:
+ - "0.10"
+ - "0.8"
+before_install:
+ - "npm install -g mocha voc"
+before_script:
+ - "voc codepage.md"
13 LICENSE
@@ -0,0 +1,13 @@
+Copyright (C) 2013 SheetJS
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
201 README.md
@@ -0,0 +1,201 @@
+# Codepages for JS
+
+[Codepages](https://en.wikipedia.org/wiki/Codepage) are character encodings. In
+many contexts, single-byte character sets are used in lieu of standard multibyte
+Unicode encodings. Each maps at most 256 byte values to characters via a table.
+
+[unicode.org](http://www.unicode.org/Public/MAPPINGS/) hosts lists of mappings.
+The build script automatically downloads and parses the mappings in order to
+generate the full script. The `pages.csv` description in `codepage.md` controls
+which codepages are used.
+
+## Setup
+
+In the browser:
+
+ <script src="cptable.js"></script>
+ <script src="cputils.js"></script>
+
+The complete set of codepages is large because of some Double Byte Character Set
+encodings. A much smaller file that just includes SBCS codepages is provided in
+this repo (`sbcs.js`).
+
+If you know which codepages you need, you can include individual scripts for
+each codepage. The individual files are provided in the `bits/` directory.
+For example, to include only the Mac codepages:
+
+ <script src="bits/10000.js"></script>
+ <script src="bits/10006.js"></script>
+ <script src="bits/10007.js"></script>
+ <script src="bits/10029.js"></script>
+ <script src="bits/10079.js"></script>
+ <script src="bits/10081.js"></script>
+
+All of the browser scripts define and append to the `cptable` object. To rename
+the object, edit the `JSVAR` shell variable in `make.sh` and run the script.
+
+The utility functions are contained in `cputils.js`, which assumes that the
+appropriate codepage scripts were loaded.
+
+In node:
+
+ var cptable = require('codepage');
+
+## Usage
+
+The codepages are indexed by number. To get the Unicode character for a given
+codepoint, use the `dec` property:
+
+ var unicode_cp10000_255 = cptable[10000].dec[255]; // ˇ
+
+To get the codepoint for a given character, use the `enc` property:
+
+ var cp10000_711 = cptable[10000].enc[String.fromCharCode(711)]; // 255
+
+There are a few utilities that deal with strings and buffers:
+
+ var 汇总 = cptable.utils.decode(936, [0xbb,0xe3,0xd7,0xdc]);
+ var buf = cptable.utils.encode(936, 汇总);
+
+## Building the script
+
+This script uses [voc](http://npm.im/voc). The script to build the codepage tables and
+the JS source is `codepage.md`, so building is as simple as `voc codepage.md`.
+
+## Supported Codepages
+
+The standard Windows codepages are supported:
+
+- 1250 Windows Central Europe
+- 1251 Windows Cyrillic
+- 1252 Windows Latin I
+- 1253 Windows Greek
+- 1254 Windows Turkish
+- 1255 Windows Hebrew
+- 1256 Windows Arabic
+- 1257 Windows Baltic
+- 1258 Windows Vietnam
+- 874 Windows Thai
+
+The full collection of `ISO-8859` codepages is also supported. The East-Asian
+Double Byte Character Sets are also supported:
+
+- 932 Japanese Shift-JIS
+- 936 Simplified Chinese GBK
+- 949 Korean
+- 950 Traditional Chinese Big5
+
+The complete list of supported codepages can be found in the file `pages.csv`.
+
+## Missing Codepages
+
+The following codepages are not implemented. Normative references may not be
+available in all cases. Furthermore, other software packages are known to hack
+certain codepages (for example, Mozilla treats ASMO-708 as an alias of Arabic
+ISO-8859-6 when in fact there are many differences), so all implementations
+*should* be cleanroom when possible.
+
+- 709 Arabic (ASMO-449+, BCON V4)
+- 710 Arabic - Transparent Arabic
+- 870 IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
+- 1047 IBM EBCDIC Latin 1/Open System
+- 1140 IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)
+- 1141 IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
+- 1142 IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)
+- 1143 IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)
+- 1144 IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)
+- 1145 IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)
+- 1146 IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)
+- 1147 IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)
+- 1148 IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)
+- 1149 IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)
+- 1200 Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
+- 1201 Unicode UTF-16, big endian byte order; available only to managed applications
+- 1361 Korean (Johab)
+- 10001 Japanese (Mac)
+- 10002 MAC Traditional Chinese (Big5); Chinese Traditional (Mac)
+- 10003 Korean (Mac)
+- 10004 Arabic (Mac)
+- 10005 Hebrew (Mac)
+- 10008 MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)
+- 10010 Romanian (Mac)
+- 10017 Ukrainian (Mac)
+- 10021 Thai (Mac)
+- 10082 Croatian (Mac)
+- 12000 Unicode UTF-32, little endian byte order; available only to managed applications
+- 12001 Unicode UTF-32, big endian byte order; available only to managed applications
+- 20000 CNS Taiwan; Chinese Traditional (CNS)
+- 20001 TCA Taiwan
+- 20002 Eten Taiwan; Chinese Traditional (Eten)
+- 20003 IBM5550 Taiwan
+- 20004 TeleText Taiwan
+- 20005 Wang Taiwan
+- 20105 IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)
+- 20106 IA5 German (7-bit)
+- 20107 IA5 Swedish (7-bit)
+- 20108 IA5 Norwegian (7-bit)
+- 20127 US-ASCII (7-bit)
+- 20261 T.61
+- 20269 ISO 6937 Non-Spacing Accent
+- 20273 IBM EBCDIC Germany
+- 20277 IBM EBCDIC Denmark-Norway
+- 20278 IBM EBCDIC Finland-Sweden
+- 20280 IBM EBCDIC Italy
+- 20284 IBM EBCDIC Latin America-Spain
+- 20285 IBM EBCDIC United Kingdom
+- 20290 IBM EBCDIC Japanese Katakana Extended
+- 20297 IBM EBCDIC France
+- 20420 IBM EBCDIC Arabic
+- 20423 IBM EBCDIC Greek
+- 20424 IBM EBCDIC Hebrew
+- 20833 IBM EBCDIC Korean Extended
+- 20838 IBM EBCDIC Thai
+- 20866 Russian (KOI8-R); Cyrillic (KOI8-R)
+- 20871 IBM EBCDIC Icelandic
+- 20880 IBM EBCDIC Cyrillic Russian
+- 20905 IBM EBCDIC Turkish
+- 20924 IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)
+- 20932 Japanese (JIS 0208-1990 and 0212-1990)
+- 20936 Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)
+- 20949 Korean Wansung
+- 21025 IBM EBCDIC Cyrillic Serbian-Bulgarian
+- 21027 (deprecated) <-- is this necessary?
+- 21866 Ukrainian (KOI8-U); Cyrillic (KOI8-U)
+- 29001 Europa 3
+- 38598 ISO 8859-8 Hebrew; Hebrew (ISO-Logical)
+- 50220 ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
+- 50221 ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)
+- 50222 ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)
+- 50225 ISO 2022 Korean
+- 50227 ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)
+- 50229 ISO 2022 Traditional Chinese
+- 50930 EBCDIC Japanese (Katakana) Extended
+- 50931 EBCDIC US-Canada and Japanese
+- 50933 EBCDIC Korean Extended and Korean
+- 50935 EBCDIC Simplified Chinese Extended and Simplified Chinese
+- 50936 EBCDIC Simplified Chinese
+- 50937 EBCDIC US-Canada and Traditional Chinese
+- 50939 EBCDIC Japanese (Latin) Extended and Japanese
+- 51932 EUC Japanese
+- 51936 EUC Simplified Chinese; Chinese Simplified (EUC)
+- 51949 EUC Korean
+- 51950 EUC Traditional Chinese
+- 52936 HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)
+- 54936 Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)
+- 57002 ISCII Devanagari
+- 57003 ISCII Bengali
+- 57004 ISCII Tamil
+- 57005 ISCII Telugu
+- 57006 ISCII Assamese
+- 57007 ISCII Oriya
+- 57008 ISCII Kannada
+- 57009 ISCII Malayalam
+- 57010 ISCII Gujarati
+- 57011 ISCII Punjabi
+- 65000 Unicode (UTF-7)
+
+## Sources
+
+- [Unicode Consortium Public Mappings](http://www.unicode.org/Public/MAPPINGS/)
+- [Code Page Enumeration](http://msdn.microsoft.com/en-us/library/cc195051.aspx)
+- [Code Page Identifiers](http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx)
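
The `dec`/`enc` lookups described in the README's Usage section can be illustrated without loading the generated tables. The sketch below is not part of the commit: `toyTable`, `decodeSbcs`, and `encodeSbcs` are hypothetical names, and the table is a hand-built stand-in that only copies the shape the README describes (`dec` indexed by byte value, `enc` keyed by character). The single non-ASCII pair in it (byte 255 and U+02C7) is the one the README quotes for codepage 10000. Treating bytes 0x00 to 0x7F as plain ASCII is an assumption made for this toy table.

```javascript
// Toy stand-in for one entry of the `cptable` object; NOT the real generated table.
// Only the shape follows the README: `dec` is indexed by byte, `enc` is keyed by character.
var toyTable = { dec: [], enc: {} };

// Assumption for this sketch: bytes 0x00-0x7F map to the same ASCII code points.
for (var i = 0; i < 128; ++i) {
  var ch = String.fromCharCode(i);
  toyTable.dec[i] = ch;
  toyTable.enc[ch] = i;
}

// The pair quoted in the README for codepage 10000: byte 255 <-> U+02C7 (decimal 711).
toyTable.dec[255] = String.fromCharCode(711);
toyTable.enc[String.fromCharCode(711)] = 255;

// Hypothetical helper: decode an array of byte values, one table lookup per byte.
function decodeSbcs(table, bytes) {
  return bytes.map(function (b) {
    var c = table.dec[b];
    if (c === undefined) throw new Error("unmapped byte " + b);
    return c;
  }).join("");
}

// Hypothetical helper: encode a string, one table lookup per character.
function encodeSbcs(table, str) {
  return str.split("").map(function (c) {
    var b = table.enc[c];
    if (b === undefined) throw new Error("unmapped character " + c);
    return b;
  });
}

var bytes = [0x48, 0x69, 0xff];
var text = decodeSbcs(toyTable, bytes);
var roundTrip = encodeSbcs(toyTable, text);
console.log(text, roundTrip);
```

This one-lookup-per-byte approach only covers single-byte codepages. The double-byte sets (932, 936, 949, 950) need the `cptable.utils` functions shown in the README, whose internals are not visible in this view of the commit.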

Some generated files are not rendered by default. Learn more.

