Japanese Normalizer fixes #67

Merged
merged 4 commits into from

2 participants

@gmarty

Quick fixes:

  • Fix quotation marks normalization.
  • Escape the forward slash in the regexp (for security; this shouldn't affect current code).
  • Misc. stuff
@chrisumbel chrisumbel merged commit ed4dd45 into from
4 lib/natural/normalizers/normalizer_ja.js
@@ -138,8 +138,8 @@ var conversionTables = {
'': ']',
'': '{',
'': '}',
- '': '',
- '': '',
+ '': '',
+ '': '',
'': '@',
'': '*',
'': '\\',
3  lib/natural/util/utils.js
@@ -25,6 +25,7 @@
* Generate a replacing function given a table of patterns. Inspired by:
* http://code.google.com/p/jslibs/wiki/JavascriptTips#String_converter
* The order of elements is significant. Longer elements should be listed first.
+ * @see Speed test http://jsperf.com/build-a-regexp-table
*
* @param {Object.<string, string>} translationTable The translation table of key value.
* @return {function(string): string} A translating function.
@@ -50,7 +51,7 @@ function replacer(translationTable) {
for (key in translationTable) {
// Escaping regexp special chars.
- key = key.replace(/(\^|\$|\*|\+|\?|\.|\(|\)|\[|\]|\{|\}|\||\\)/g, '\\\$1');
+ key = key.replace(/(\^|\$|\*|\+|\?|\.|\(|\)|\[|\]|\{|\}|\||\\|\/)/g, '\\\$1');
pattern.push(key);
}
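For readers following the hunk above, here is a minimal, self-contained sketch of the technique it touches: escaping every key of a translation table before joining the keys into one alternation regexp. It is not the code from `utils.js`; the names `escapeRegExp` and `buildReplacer` and the sample table are hypothetical, and a character class is used in place of the alternation group shown in the diff.

```javascript
// Hypothetical helper: escape regexp special characters in a table key,
// including the forward slash that this pull request adds to the list.
function escapeRegExp(key) {
  return key.replace(/[\^$*+?.()\[\]{}|\\\/]/g, '\\$&');
}

// Hypothetical sketch of a table-driven replacer.
// Keys are tried in insertion order, so longer keys should be listed first.
function buildReplacer(translationTable) {
  var patterns = [];
  for (var key in translationTable) {
    if (Object.prototype.hasOwnProperty.call(translationTable, key)) {
      patterns.push(escapeRegExp(key));
    }
  }

  // An empty table would yield a regexp that matches the empty string,
  // so return an identity function in that case.
  if (patterns.length === 0) {
    return function (str) { return str; };
  }

  var regExp = new RegExp(patterns.join('|'), 'g');
  return function (str) {
    return str.replace(regExp, function (match) {
      return translationTable[match];
    });
  };
}

// Usage with a made-up table whose keys contain special characters.
var convert = buildReplacer({ 'a/b': 'X', '\\': '/', '.': '!' });
console.log(convert('a/b \\ a.b')); // prints "X / a!b"
```

Because the pattern is built with `new RegExp(string)` rather than a regexp literal, an unescaped `/` in a key is harmless here, which is consistent with the description's note that the change shouldn't affect current code.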
2  spec/normalizer_ja_spec.js
@@ -55,7 +55,7 @@ describe('normalize_ja', function() {
it('should transform halfwidth punctuation signs to fullwidth', function() {
// Taken from http://unicode.org/cldr/trac/browser/trunk/common/main/ja.xml
expect(normalize_ja('‾ __ -- ‐ — ― 〜 ・ ・ ,, 、、 ;; :: !! ?? .. ‥ … 。。 '\ ‘ ’ "" “ ” (( )) [[ ]] {{ }} 〈 〉 《 》 「「 」」 『 』 【 】 〔 〕 ‖ § ¶ @@ ++ ^^ $$ ** // \\\ && ## %% ‰ † ‡ ′ ″ 〃 ※'))
- .toEqual('‾ __ ─- ‐ — ― 〜 ・ ・ ,, 、、 ;; :: !! ?? .. ‥ … 。。 '\ ‘ ’ "" “ ” (( )) [[ ]] {{ }} 〈 〉 《 》 「「 」」 『 』 【 】 〔 〕 ‖ § ¶ @@ ++ ^^ $$ ** // \\ && ## %% ‰ † ‡ ′ ″ 〃 ※');
+ .toEqual('‾ __ ─- ‐ — ― 〜 ・ ・ ,, 、、 ;; :: !! ?? .. ‥ … 。。 '\ ‘ ’ "" “ ” (( )) [[ ]] {{ }} 〈 〉 《 》 「「 」」 『 』 【 】 〔 〕 ‖ § ¶ @@ ++ ^^ $$ ** // \\ && ## %% ‰ † ‡ ′ ″ 〃 ※');
});
it('should replace repeat characters', function() {