Maybe a problem during finalization #8

amirouche · 2020-08-16T12:16:59Z

In the search_lss method:

Lines 164 to 170 in 4f31555

    
    suffix = state.longest_strict_suffix
    if suffix.longest_strict_suffix is None:
        self.search_lss(suffix)
    for symbol, next_state in suffix.transitions.items():
        if (symbol not in state.transitions and
                suffix != self._zero_state):
            state.transitions[symbol] = next_state

The second-to-last line looks strange: the test suffix != self._zero_state does not depend on the loop variables, so it can be done once before entering the loop.

So there might be some performance to gain during finalization.
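
A minimal sketch of what hoisting the test could look like. The `State` class and the two helper functions are hypothetical stand-ins for illustration, not the library's actual code; only the loop body mirrors the snippet above.

```python
class State:
    """Hypothetical stand-in for the library's state object."""

    def __init__(self):
        self.transitions = {}
        self.longest_strict_suffix = None


def copy_transitions_original(state, suffix, zero_state):
    # Shape of the quoted snippet: the zero-state test is
    # re-evaluated for every symbol of the suffix.
    for symbol, next_state in suffix.transitions.items():
        if symbol not in state.transitions and suffix != zero_state:
            state.transitions[symbol] = next_state


def copy_transitions_hoisted(state, suffix, zero_state):
    # Suggested shape: test once and skip the whole loop
    # when the suffix is the zero state.
    if suffix != zero_state:
        for symbol, next_state in suffix.transitions.items():
            if symbol not in state.transitions:
                state.transitions[symbol] = next_state
```

Both helpers should leave `state.transitions` in the same final state; the hoisted version simply avoids iterating over the zero state's transitions, which is likely the largest transition table when many keywords start with different symbols.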

Let me know what you think 🙂


FrederikP · 2020-08-19T15:24:44Z

The test can certainly be done before the loop. I will take a closer look when I get a chance.
Thanks for pointing that out.


FrederikP · 2020-08-19T19:32:45Z

@amirouche Thanks for finding this. I must've overlooked it during some refactoring or so. As you can see in #10 it really speeds up setup considerably.
I'll do some more performance tests and release a new version soon.

amirouche · 2020-08-19T20:17:46Z

By the way, this is not a performance improvement but a remark about the following:

ahocorapy/src/ahocorapy/keywordtree.py

Lines 150 to 166 in 7098bbb

    
    if state.longest_strict_suffix is None:
        parent = state.parent
        traversed = parent.longest_strict_suffix
        while True:
            if state.symbol in traversed.transitions and\
                    traversed.transitions[state.symbol] != state:
                state.longest_strict_suffix =\
                    traversed.transitions[state.symbol]
                break
            elif traversed == self._zero_state:
                state.longest_strict_suffix = self._zero_state
                break
            else:
                traversed = traversed.longest_strict_suffix
        suffix = state.longest_strict_suffix
        if suffix.longest_strict_suffix is None:
            self.search_lss(suffix)

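The test for longest_strict_suffix is None is done twice: once at the beginning of the snippet (on state) and again at the end, on suffix, just before the recursive call. The second test does save a Python function call when the suffix is already resolved, so maybe keeping it is a benefit.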
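
To illustrate the trade-off, here is a small sketch that counts function calls with and without the guard at the call site. `Node`, `resolve_guard_inside`, and `resolve_guard_at_call_site` are hypothetical names invented for this example; they only mimic the recursive shape of the snippet above.

```python
class Node:
    """Hypothetical node with a suffix link and a 'resolved' flag."""

    def __init__(self, suffix=None, resolved=False):
        self.suffix = suffix
        self.resolved = resolved


def resolve_guard_inside(node, counter):
    # Only the callee checks whether work is needed, so an
    # already-resolved suffix still costs one function call.
    counter[0] += 1
    if node.resolved:
        return
    node.resolved = True
    if node.suffix is not None:
        resolve_guard_inside(node.suffix, counter)


def resolve_guard_at_call_site(node, counter):
    # The caller also checks before recursing, so no call is
    # made for a suffix that is already resolved.
    counter[0] += 1
    if node.resolved:
        return
    node.resolved = True
    if node.suffix is not None and not node.suffix.resolved:
        resolve_guard_at_call_site(node.suffix, counter)
```

With both guards present the code is redundant but makes fewer calls; dropping the call-site guard gives simpler code at the cost of some extra calls that return immediately.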

Thanks a lot for sharing this library 👍

amirouche · 2020-08-19T20:23:12Z

One last thing: in my implementation I replaced the list to_process with a set, and instead of append I use the equivalent of set.add. What I noticed is that there are a few fewer calls to search_lss (though my implementation is still slower). Does it matter whether a set or a list is used in that place?

FrederikP · 2020-08-19T20:51:53Z

Concerning the first comment: True, I will remove one of those occurrences. It doesn't change performance much, but the code is a little less ugly. Thanks again 👍
If I remember correctly I had a set there as well, but for performance reasons switched to using a list. It's not really the correct datatype from a logical standpoint, but it's just an optimization decision.
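
A rough sketch of the list-versus-set trade-off, for anyone who wants to measure it. `fill_list` and `fill_set` are hypothetical helpers, not code from the library. One difference that does not depend on timing is that a set drops duplicates, which might explain the slightly lower number of search_lss calls; that explanation is an assumption, not something verified against the library.

```python
import timeit


def fill_list(items):
    # A list keeps every entry, including duplicates.
    to_process = []
    for item in items:
        to_process.append(item)
    return to_process


def fill_set(items):
    # A set hashes every entry and silently drops duplicates.
    to_process = set()
    for item in items:
        to_process.add(item)
    return to_process


if __name__ == "__main__":
    items = list(range(10_000))
    # Timings depend on the machine and Python version.
    print("list:", timeit.timeit(lambda: fill_list(items), number=20))
    print("set: ", timeit.timeit(lambda: fill_set(items), number=20))
```

Which container wins in practice depends on how many duplicates occur and on how expensive hashing a state is, so measuring on real keyword sets is the only reliable answer.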

FrederikP · 2020-08-19T21:09:40Z

Released performance fix with 1.6.0

Thanks for contributing with such detailed suggestions @amirouche

FrederikP self-assigned this Aug 19, 2020

FrederikP added a commit that referenced this issue Aug 19, 2020

Check if suffix is zero state earlier. Improves setup performance significantly. Fixes #8

06570c9

FrederikP linked a pull request Aug 19, 2020 that will close this issue

Check if suffix is zero state earlier #10

Merged

FrederikP added a commit that referenced this issue Aug 19, 2020

Removed redundant check mentioned in #8

acbe82f

FrederikP closed this as completed in #10 Aug 19, 2020

FrederikP mentioned this issue Aug 21, 2020

kwtree.finalize() cost too long,is there any way to speed up the stage? #6

Closed
