
Don't attach things to kakarijoshis

@Kimtaro committed Jul 17, 2013
1 parent 19d75df, commit b294970b39d3e52fcb82652062e1a3f01d109f64
Showing with 19 additions and 2 deletions.
  1. +6 −1 lib/providers/mecab_ipadic.rb
  2. +13 −1 tests/mecab_ipadic_parse_test.rb

lib/providers/mecab_ipadic.rb
@@ -163,6 +163,7 @@ def initialize(text, output)
FUHENKAGATA = '不変化型'
JINMEI = '人名'
MEIREI_I = '命令i'
+ KAKARIJOSHI = '係助詞'
# Etc
NA = ''
@@ -177,6 +178,7 @@ def words
words = []
tokens = @tokens.find_all { |t| t[:type] == :parsed }
tokens = tokens.to_enum
+ previous = nil
# This is becoming very big
begin
@@ -272,7 +274,8 @@ def words
when JODOUSHI
pos = Ve::PartOfSpeech::Postposition
- if [TOKUSHU_TA, TOKUSHU_NAI, TOKUSHU_TAI, TOKUSHU_MASU, TOKUSHU_NU].include?(token[:inflection_type])
+ if (previous.nil? || (!previous.nil? && previous[:pos2] != KAKARIJOSHI)) &&
+ [TOKUSHU_TA, TOKUSHU_NAI, TOKUSHU_TAI, TOKUSHU_MASU, TOKUSHU_NU].include?(token[:inflection_type])
attach_to_previous = true
elsif token[:inflection_type] == FUHENKAGATA && token[:lemma] == NN
attach_to_previous = true
@@ -338,6 +341,8 @@ def words
words << word
end
+
+ previous = token
end
rescue StopIteration
end

tests/mecab_ipadic_parse_test.rb
@@ -739,11 +739,23 @@ def test_words
:pos => [Ve::PartOfSpeech::Verb, Ve::PartOfSpeech::Verb],
:extra => [{:reading=>"オシエテ", :transcription=>"オシエテ", :grammar=>nil}, {:reading=>"クダサイ", :transcription=>"クダサイ", :grammar=>nil}],
:tokens => [0..1, 2..2]},
-'教えてください', <<-EOR.split("\n"))
+ '教えてください', <<-EOR.split("\n"))
教え 動詞,自立,*,*,一段,連用形,教える,オシエ,オシエ,おしえ/教え,
て 助詞,接続助詞,*,*,*,*,て,テ,テ,,
ください 動詞,非自立,*,*,五段・ラ行特殊,命令i,くださる,クダサイ,クダサイ,,
EOS
+EOR
+
+ # はない
+ assert_parses_into_words(Ve::Parse::MecabIpadic, {:words => ["", "ない"],
+ :lemmas => ["", "ない"],
+ :pos => [Ve::PartOfSpeech::Postposition, Ve::PartOfSpeech::Postposition],
+ :extra => [{:reading=>"", :transcription=>"", :grammar=>nil}, {:reading=>"ナイ", :transcription=>"ナイ", :grammar=>nil}],
+ :tokens => [0..0, 1..1]},
+ 'はない', <<-EOR.split("\n"))
+は 助詞,係助詞,*,*,*,*,は,ハ,ワ,,
+ない 助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ,,
+EOS
EOR
# TODO: xした should parse as adjective?
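
The guard added in the `JODOUSHI` branch can be read in isolation as a small predicate. The following is a minimal sketch, not the code from `mecab_ipadic.rb`: the helper name `attach_jodoushi_to_previous?`, the `ATTACHING_INFLECTIONS` list, and the bare token hashes are hypothetical, and the value `'特殊・タ'` is an assumed IPADIC label (only `'特殊・ナイ'` and `'係助詞'` appear in this diff). It covers only the first `if` of the branch, not the `FUHENKAGATA` case.

```ruby
# Hypothetical standalone sketch of the rule this commit introduces:
# an auxiliary verb (助動詞) is not attached to the preceding token
# when that token is a binding particle (係助詞), as in "はない".

KAKARIJOSHI = '係助詞'      # value taken from the diff
TOKUSHU_NAI = '特殊・ナイ'  # value taken from the MeCab output in the test
TOKUSHU_TA  = '特殊・タ'    # assumed IPADIC label, not shown in the diff

# Simplified stand-in for the five TOKUSHU_* types checked in the diff.
ATTACHING_INFLECTIONS = [TOKUSHU_TA, TOKUSHU_NAI].freeze

# previous: the token hash seen just before `token`, or nil at the start.
# token:    the current auxiliary-verb token hash.
def attach_jodoushi_to_previous?(previous, token)
  return false unless ATTACHING_INFLECTIONS.include?(token[:inflection_type])

  # Same truth table as the condition in the diff.
  previous.nil? || previous[:pos2] != KAKARIJOSHI
end

nai = { inflection_type: TOKUSHU_NAI }
wa  = { pos2: KAKARIJOSHI }
te  = { pos2: '接続助詞' }

attach_jodoushi_to_previous?(wa, nai)  # => false
attach_jodoushi_to_previous?(te, nai)  # => true
```

Since `previous.nil? || (!previous.nil? && x)` is logically equivalent to `previous.nil? || x`, the `!previous.nil? &&` part of the committed condition could be dropped without changing behavior.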
