Rule only works when commenting out unrelated rules? #80

Open
unhammer opened this issue Nov 10, 2021 · 12 comments

Comments

@unhammer
Member

unhammer commented Nov 10, 2021

gender = m f nt ut un fn mf GD ;
gender_adj_sg_ind = nt ut ;
number = sg pl sp ND ;
defnes = def ind ;
a_adj = sint ord pp pprs ;
a_cmp = cmp ;
a_det = dem qnt pos emph ;
a_comp = pst comp sup ;

adj:   _.a_adj.a_comp.gender.number.defnes.a_cmp;
n:     _.gender.number.defnes.a_cmp;
det:   _.a_det.gender.number;

N:     _.gender.number.defnes.a_cmp;
A:     _.a_adj.a_comp.gender.number.defnes.a_cmp;
NP:    _.gender.number.defnes;
DP:    _.gender.number.defnes;


N -> %n         { %1 } ;

NP ->      %N { %1 }
    |  adj %N { 1 _ %2 } !!!
    ;

DP ->
      "vennene mine ~> mina vänner"
      %NP det.pos
      { 2[gender=(if (1.number = pl) un else 1.gender), number=1.number]
        _
        1[defnes=ind]
      }
    | "en venn ~> en vänn" det %NP { 1[gender=(if (2.number = pl) un else 2.gender), number=2.number] _ 2 } !!!

      ;

got:

$ echo ' ^venn<n><m><pl><def>/vän<n><ut><pl><def>$ ^min<det><pos><un><pl>/min<det><pos><un><pl>$ ^virtuell<adj><pst><nt><sg><ind>/virtuell<adj><sint><pst><nt><sg><ind>$' |rtx-proc nor-swe.rtx.bin
 ^vän<n><ut><pl><def>$ ^min<det><pos><un><pl>$ ^virtuell<adj><sint><pst><nt><sg><ind>$

expected:

 ^min<det><pos><un><pl>$ ^vän<n><ut><pl><ind>$ ^virtuell<adj><sint><pst><nt><sg><ind>$
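
For readers unfamiliar with the rule syntax, here is a minimal Python sketch of the tag changes the first DP alternative is meant to produce. It is only an illustration of the intended output, not of how rtx-proc works internally; the dict representation and the name `reorder_possessive` are assumptions made for this example.

```python
def reorder_possessive(np, det):
    """Mimic the intended output of the "vennene mine ~> mina vänner" rule.

    np and det are dicts of tag slots (a hypothetical representation).
    """
    new_det = dict(det)
    # The determiner agrees with the NP: gender is "un" in the plural,
    # otherwise it is copied from the NP; number is always copied.
    new_det["gender"] = "un" if np["number"] == "pl" else np["gender"]
    new_det["number"] = np["number"]
    # The noun phrase is forced to indefinite.
    new_np = dict(np, defnes="ind")
    # Output order: determiner first, then the noun phrase.
    return [new_det, new_np]


np = {"lemma": "vän", "gender": "ut", "number": "pl", "defnes": "def"}
det = {"lemma": "min", "a_det": "pos", "gender": "un", "number": "pl"}
result = reorder_possessive(np, det)
```

With the input from the example above, `result` puts `min` first and marks `vän` as `ind`, which is the order and tagging shown under "expected".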

HOWEVER: If I comment out either line 23 or line 33 (the ones marked !!!) then it strangely works.

But trace shows that those lines are not used (this is without commenting them out, where I get the bad result):

 echo ' ^venn<n><m><pl><def>/vän<n><ut><pl><def>$ ^min<det><pos><un><pl>/min<det><pos><un><pl>$ ^virtuell<adj><pst><nt><sg><ind>/virtuell<adj><sint><pst><nt><sg><ind>$' |rtx-proc -r nor-swe.rtx.bin

Applying rule 1 (line 20): ^venn<n><m><pl><def>/vän<n><ut><pl><def>$

Applying rule 2 (line 22): ^vän<N><ut><pl><def>{^venn<n><m><pl><def>/vän<n><ut><pl><def>$}$

Applying rule 4 (vennene mine ~> mina vänner - line 27): ^vän<NP><ut><pl><def>{^vän<N><ut><pl><def>{^venn<n><m><pl><def>/vän<n><ut><pl><def>$}$}$ ^min<det><pos><un><pl>/min<det><pos><un><pl>$

Applying output rule 1 (line 22): vän<NP><ut><pl><def> -> ^vän<N><ut><pl><def>{^venn<n><m><pl><def>/vän<n><ut><pl><def>$}$

Applying output rule 0 (line 20): vän<N><ut><pl><def> -> ^venn<n><m><pl><def>/vän<n><ut><pl><def>$

No rule specified: ^vän<n><ut><pl><def>$
^vän<n><ut><pl><def>$
No rule specified: ^min<det><pos><un><pl>/min<det><pos><un><pl>$
^min<det><pos><un><pl>$
No rule specified: ^virtuell<adj><pst><nt><sg><ind>/virtuell<adj><sint><pst><nt><sg><ind>$
^virtuell<adj><sint><pst><nt><sg><ind>$

I'm probably missing something obvious but I can't see it?

@unhammer
Member Author

The trace for when line 33 is commented out shows not only "Applying rule 3 (line 27)" but also "Applying output rule 3 (line 27)".

@unhammer
Member Author

Note also that if I leave out the last word, the rule applies fine.

unhammer added a commit to apertium/apertium-swe-nor that referenced this issue Nov 16, 2021
@mr-martian
Collaborator

So the lookahead is trying to figure out whether to keep branches alive in case more rules might apply. You have n det adj, which it thinks could be n DP{ det NP{ adj [n] } }, not realizing that this is actually det.pos, which it looks like you want treated differently.

So the solution is probably for the lookahead to get smarter and for the last rule to change from det to det.[notpos], for a suitable definition of notpos.

The tricky part of this is whether I can fully do that without implementing FST subtraction in lttoolbox (or maybe I should just go ahead and do that...).

@unhammer
Member Author

So if I understand correctly, it's starting an analysis of n DP{ det NP{ adj [n] } } because there might be an n to the right. But the trace shows that at one point it did find the right match; wouldn't it be more robust to backtrack to that?

Also, I can't change the last rule to det.[notpos] because I do want it to match det.pos (in nob, both "mine venner" and "vennene mine" are possible, while in swe we want only the former).

My current workaround is to have a higher-level rewrite rule DP2 → DP Anyword, but it doesn't really make linguistic sense.

@mr-martian
Collaborator

IRC:

[10:13:28] <popcorndude> the answer is that this actually is an annoyingly deep issue
[10:14:13] <popcorndude> at least in the reduced case, it reads in the adj
[10:14:50] <popcorndude> and then says DP{NP{N{n}} det} can't do anything with this, but NP{N{n}} det maybe can
[10:14:54] <popcorndude> so discard the first one
[10:14:58] <popcorndude> oh, oops, EOF
[10:17:44] <popcorndude> so I can write hacky rules to fix this in particular cases, but I have no idea how to solve this in general
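
A deliberately simplified Python toy of the behaviour described in the log above. The real parser keeps full parse stacks; here branches are just labels, and `drop_dead_branches` and the `could_continue` callback are hypothetical names introduced for illustration.

```python
def drop_dead_branches(branches, could_continue):
    """Keep only the branches that might still combine with later input.

    Falls back to all branches if none could continue (an assumption).
    """
    alive = [b for b in branches if could_continue(b)]
    return alive if alive else branches


# Two competing analyses after reading "n det adj" (labels only).
finished = "DP{NP{N{n}} det} adj"   # DP rule already applied, adj left over
pending = "NP{N{n}} det adj"        # still hoping for a noun: det NP{adj n}

# Hypothetical lookahead: only the pending branch could absorb another word.
survivors = drop_dead_branches([finished, pending], lambda b: b == pending)

# At end of input no noun arrives, so the only surviving branch is the one
# in which the DP rule was never applied.
```

In this toy, `survivors` contains only `pending`, so the analysis that had already applied the DP rule is gone by the time EOF shows that nothing more will arrive.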

@unhammer
Member Author

Is there a way to give some info in the trace when this applies? It's quite hard to debug when it happens. E.g. I have rules that do

DP{NP{N{n.cmp n}} det}  →*   DP{det NP{N{n.cmp n}}}   ! vennene mine → mina vänner

and they work fine and then I add vcmp into the N rule so I can do

DP{NP{N{vblex.inf.cmp n}} det}  →*   DP{det NP{N{vblex.inf.cmp n}}} ! bakemesteren vår → vår bakmästare

and it works fine, but then I notice that the first rule stops working in certain contexts :(

Turns out, if there's any verb in the rest of the sentence (doesn't have to be tagged cmp), the rule doesn't apply any more. Again, the fix is just to ensure the wider context has a parse (a rule like S→DP VP), but I only learnt that by accident, and I had almost forgotten the fix when the problem showed up again.

@mr-martian
Collaborator

Information about what parses are getting discarded and why can be gotten from the -e debug option, though it prints out rather a lot of stuff and I don't guarantee it makes all that much sense.
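
Since the debug output is long, a small filter can help. The following Python sketch pulls out only the per-branch summary lines; it assumes those lines have the form `Branch 3: 3 nodes, weight = 0`, which is not a guaranteed format and may differ between versions. The name `branch_summaries` is made up for this example.

```python
import re

# Assumed header format: "Branch <id>: <n> nodes, weight = <w>"
BRANCH_HEADER = re.compile(
    r"^Branch (\d+): (\d+) nodes, weight = (-?\d+(?:\.\d+)?)$"
)


def branch_summaries(lines):
    """Yield (branch id, node count, weight) for each header line found."""
    for line in lines:
        match = BRANCH_HEADER.match(line.strip())
        if match:
            yield int(match.group(1)), int(match.group(2)), float(match.group(3))


sample = [
    "Branch 3: 3 nodes, weight = 0",
    "[Chunk]:",
    "Branch 4: 3 nodes, weight = 0",
    "Branch 3  has no active branch to compare to.",
]
summaries = list(branch_summaries(sample))
```

Feeding it the captured debug lines gives one tuple per branch, which makes it easier to see which branches were tied on node count and weight.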

@unhammer
Member Author

unhammer commented Aug 24, 2023

We're seeing this issue again in sme-smj, e.g. we have rules for
N→n
NP→NP N | N
PP→N p | p
and on seeing a sequence n n p, it gives a parse for the final two words, but doesn't then apply anything for the first word (I think. I'm not 100% sure about the details here). But the first noun does get a parse if I send it in alone.

Would it be possible to do a final pass after everything is done and just treat all the unmatched lexical units in isolation, so they're at least matched by some single-word rule?
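
A rough Python sketch of what such a final pass could look like. This is only an illustration of the proposal, not code from rtx-proc; `reparse_unmatched`, `is_unmatched` and `parse_single` are hypothetical names, and units are plain strings here.

```python
def reparse_unmatched(output_queue, is_unmatched, parse_single):
    """Re-run single-word parsing on units that no rule touched.

    parse_single returns a list, because an output rule may emit
    zero, one, or several units for a single input unit.
    """
    result = []
    for unit in output_queue:
        if is_unmatched(unit):
            # Splice (extend) rather than replace one-for-one,
            # since the number of units may change.
            result.extend(parse_single(unit))
        else:
            result.append(unit)
    return result


queue = ["n:raw", "PP{N p}"]
fixed = reparse_unmatched(
    queue,
    is_unmatched=lambda u: u.endswith(":raw"),
    parse_single=lambda u: ["N{" + u.split(":")[0] + "}"],
)
```

Here the leftover first noun gets wrapped by the toy single-word rule, while the already parsed PP is passed through unchanged.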

@unhammer
Member Author

unhammer commented Aug 25, 2023

With sme-smj.rtx.zip:

$ echo '^Jämtlánda<np><top><sg><gen><@→N>/Jämtlánnda<np><top><sg><gen><@→N>$ ^regiovdna<n><sem_plc><sg><gen><@→P>/regiåvnnå<n><sem_plc><sg><gen><@→P>$ ^dáfus<post><@ADVL>/gáktuj<post><@ADVL>$^.<sent>/.<sent>$' | rtx-proc -e sme-smj.rtx.bin
[…]
Branch 3: 3 nodes, weight = 0
[Chunk]:
^Jämtlánnda<Name><sg><gen><@→N>{
        ^Jämtlánda<np><top><sg><gen><@→N>/Jämtlánnda<np><top><sg><gen><@→N>$
}$
[Blank]:

[Chunk]:
^gáktuj<PP>{
        ^regiåvnnå<N><sg><gen><@→P>{
                ^regiovdna<n><sem_plc><sg><gen><@→P>/regiåvnnå<n><sem_plc><sg><gen><@→P>$
        }$
        ^dáfus<post><@ADVL>/gáktuj<post><@ADVL>$
}$
Branch 4: 3 nodes, weight = 0
[Chunk]:
^Jämtlánda<np><top><sg><gen><@→N>/Jämtlánnda<np><top><sg><gen><@→N>$
[Blank]:

[Chunk]:
^gáktuj<PP>{
        ^regiåvnnå<N><sg><gen><@→P>{
                ^regiovdna<n><sem_plc><sg><gen><@→P>/regiåvnnå<n><sem_plc><sg><gen><@→P>$
        }$
        ^dáfus<post><@ADVL>/gáktuj<post><@ADVL>$
}$

Filtering Branches:
No branch can accept further input.
Branch 3  has no active branch to compare to.
Branch 4  has fewer partial parses or a higher weight than branch 3.
[…]

Isn't this plain wrong? Or am I misunderstanding what "partial parses" means? In branch 3, all words have at least one parent, while in branch 4 (which is chosen), the first word has no parent node.

EDIT: It seems the test is (cur->length < minNode->length || (cur->length == minNode->length && cur->weight >= minNode->weight))
and the values are

cur->length:3
minNode->length:3
cur->weight:0
minNode->weight:0

so they're just equal.
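
To see why a tie goes to the later branch, here is the quoted test transcribed to Python. The `Branch` class and the shape of the selection loop in `pick` are assumptions for illustration; only the comparison itself comes from the test above.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    length: int
    weight: float


def replaces(cur, min_node):
    """The comparison quoted above, transcribed to Python."""
    return cur.length < min_node.length or (
        cur.length == min_node.length and cur.weight >= min_node.weight
    )


def pick(branches):
    """Scan in order, letting each branch replace the current pick
    whenever the test is true (assumed loop shape)."""
    best = branches[0]
    for cur in branches[1:]:
        if replaces(cur, best):
            best = cur
    return best


chosen = pick([Branch("branch 3", 3, 0), Branch("branch 4", 3, 0)])
```

With both branches at length 3 and weight 0, the `>=` makes `chosen` branch 4, simply because it comes later in the list.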

@mr-martian
Collaborator

Yeah, I think it's >= since the branches later in the list have usually had more rules applied to them.

unhammer added a commit to apertium/apertium-sme-smj that referenced this issue Aug 25, 2023
@unhammer
Member Author

So I noticed that simply adding a weight to each rule in the file made it choose the branch in which more words are parsed, and doing that across a real rule file for sme-smj removes some untranslated words from corpus runs.

Is there a good reason not to have some "initial" weight for every rule, so it can favour parses that cover more words? (Will it then favour deeper trees as well?)

@mr-martian
Collaborator

Yes, it will slightly favor deeper trees, but given how reduce-reduce conflicts are handled, those are favored already.

Perhaps we could add another file-level directive to change the default weight to something positive, since that will indeed improve the situation in many cases.
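
A small Python sketch of why a positive default weight would change the outcome in the sme-smj example. It reuses the comparison quoted earlier in this thread and assumes, without having checked the implementation, that a branch's weight is the sum of the weights of the rules applied in it. Branch 3 applies three rules (Name, N, PP) and branch 4 applies two (N, PP); both have 3 nodes.

```python
def later_branch_wins(cur, best):
    """cur and best are (length, weight) pairs; same test as quoted in this thread."""
    cur_len, cur_w = cur
    best_len, best_w = best
    return cur_len < best_len or (cur_len == best_len and cur_w >= best_w)


def branch_weight(rules_applied, default_weight):
    # Assumption: the weights of the applied rules simply add up.
    return rules_applied * default_weight


outcome = {}
for default_weight in (0.0, 1.0):
    branch3 = (3, branch_weight(3, default_weight))  # Name + N + PP
    branch4 = (3, branch_weight(2, default_weight))  # N + PP only
    outcome[default_weight] = later_branch_wins(branch4, branch3)
```

Under these assumptions, a default of 0 leaves the two branches tied so branch 4 wins, while any positive default gives branch 3 the higher weight and it is kept.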

unhammer added a commit that referenced this issue Aug 28, 2023
unhammer added a commit that referenced this issue Sep 5, 2023
unhammer added a commit that referenced this issue Sep 5, 2023
mitigates #80

We splice in the outputQueueReparsed instead of just replacing in case
the output rule changes the number of LU's output.
mr-martian pushed a commit that referenced this issue Sep 5, 2023
* had to std:: here to make it compile

* Reparse individual non-parsed words after full sentence

mitigates #80

We splice in the outputQueueReparsed instead of just replacing in case
the output rule changes the number of LU's output.

* Tests for reparse #80

* Note to self (don't edit run_tests.py)

2 participants