Performance question #12

TeemuAhola · 2018-05-05T11:51:10Z

I have implemented following test application for measuring performance of pyknow.

import time
from pyknow import *

class Engine(KnowledgeEngine):
    
    def __init__(self, fact_count, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.fact_count = fact_count
        self.matched = 0
        self.not_matched = 0

    @DefFacts()
    def _init(self):
        for i in range(self.fact_count):
            yield Fact(i,i%10)
 
    @Rule(Fact(MATCH.a, MATCH.a))
    def rule1(self):
        self.matched += 1
        
    @Rule(Fact(MATCH.a, ~MATCH.a))
    def rule2(self):
        self.not_matched += 1

def run(fact_count):
    start = time.time()
    e = Engine(fact_count)
    e.reset()
    e.run()
    end = time.time()
    print(end - start)

When I run it with following fact counts:

run(10)
run(100)
run(500)
run(1000)
run(10000)

I am getting these results for execution time in seconds:

0.00998544692993164
0.08772945404052734
1.528554916381836
5.943406105041504
742.3850135803223

The problem is that the execution time becomes a bottleneck very quickly in my case when there are around 10000 facts. This was simplified example (and in principle this could be easily implemented without pyknow) but I would think that 10000 facts should not be a big issue when the rules are so simple.

If I remove @Rule(Fact(MATCH.a, ~MATCH.a)) which is fired more often than the other one, then execution time stays quite short (max 11 seconds).

So is this known issue or do I just misunderstanding something?

The text was updated successfully, but these errors were encountered:

nilp0inter · 2018-05-05T23:02:54Z

Hi,

thank you very much for reporting this. You are right, there is a performance issue. But let me first clarify something.

On Complexity

You will never get any benefits from pyknow for this kind of one-shot, trivial programs. You can easily implement that example in pure python like this:

matched = not_matched = 0

for i in range(fact_count):
   # rule1
    if i == i % 10:
        matched += 1
    # rule2
    if i != i % 10:
        not_matched += 1

There are two mayor differences with the code you provided:

First: Performance. Of course you can get much better performance with a raw code like this. There is not a complex machinery (that pyknow has), there are no function calls, etc.
Second (and more important): Big-O Complexity. The raw python algorithm has complexity O(n), while, a Clips implementation of the same algorithm, is something between O(n log n) and O(n^2).

Maybe that was clear to you, but I think it is not for everybody.

A little pyknow history

In the past we put all our efforts on making the implementation correct, postponing optimizations until they were absolutely needed. At the beginning pyknow wasn't even based on the RETE algorithm!

Now that we reached a point of stability on the implementation we can start thinking on making things faster.

About the issue

After doing some profiling on pyknow and an equivalent test on Clips I think we can reach the same level of complexity that Clips (but not the same performance, of course) without making too many changes.

I will start working on the performance issues I've seen after testing and profiling with your sample code.

So, thank you again for opening this issue and please stay tuned to see some performance improvements over the following days/weeks.

TeemuAhola · 2018-05-06T05:37:05Z

Hi,

Thanks for your reply. Yes, I was expecting that there must be some complexity issue which plays part here and true this is quite trivial example and does not really need pyknow to solve the problem.

What I am doing is to evaluate possible solutions for experts systems which our team could use when doing fault analysis. We potentially may have quite lot of facts so that's why I have done some performance measurements. However as said, cleaning up of facts can be done without pyknow before actually feeding the facts to pyknow.

I have been testing some time ago PyKE as well but what I really like about pyknow compared to PyKE is that you can easily integrate it to python code and you can feed facts easily from all sorts of sources and it nicely work with it interactively top of Jupyter. Since not all in my team are fluent with python I think Jupyter based interactive python interpreter and system like pyknow would perhaps lower the bar for them and still allow very flexible way of working with problem analysis.

nilp0inter · 2018-05-06T18:19:01Z

Hi,

I finished implementing the first batch of optimizations.

The time complexity drop dramatically after fixing issues #17 and #18 (light-blue line) which affected a really old part of the project, the conflict resolution strategy system. I changed the way it internally works and now the numbers are much better.

This is the result of executing your code now (# facts / seconds):

10 0.008934974670410156
100 0.03293299674987793
1000 0.21196937561035156
10000 2.2099316120147705
100000 24.78445839881897
1000000 405.15399289131165

Now 10k facts are processed in ~2 seconds. You can check the progress in the feature-optimization branch.

I will release a new version with this changes in the following days and keep working on further optimizations.

TeemuAhola · 2018-05-07T05:22:13Z

Hi,

Tested the branch and it is lightning fast :)

Thank you!

nilp0inter · 2018-05-08T21:41:32Z

Pyknow v1.6.0 is out. Enjoy :)

TeemuAhola · 2018-05-09T18:44:12Z

Thank you!

shahintaj196 · 2019-12-13T12:52:00Z

how i can obtain policy proccessing time?

nilp0inter added the enhancement label May 6, 2018

nilp0inter self-assigned this May 6, 2018

nilp0inter added this to the v1.6.0 milestone May 7, 2018

nilp0inter closed this as completed in 791f0ab May 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance question #12

Performance question #12

TeemuAhola commented May 5, 2018 •

edited

nilp0inter commented May 5, 2018

TeemuAhola commented May 6, 2018

nilp0inter commented May 6, 2018

TeemuAhola commented May 7, 2018

nilp0inter commented May 8, 2018

TeemuAhola commented May 9, 2018

shahintaj196 commented Dec 13, 2019

Performance question #12

Performance question #12

Comments

TeemuAhola commented May 5, 2018 • edited

nilp0inter commented May 5, 2018

On Complexity

A little pyknow history

About the issue

TeemuAhola commented May 6, 2018

nilp0inter commented May 6, 2018

TeemuAhola commented May 7, 2018

nilp0inter commented May 8, 2018

TeemuAhola commented May 9, 2018

shahintaj196 commented Dec 13, 2019

TeemuAhola commented May 5, 2018 •

edited