Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance question #12

Closed
TeemuAhola opened this issue May 5, 2018 · 7 comments
Closed

Performance question #12

TeemuAhola opened this issue May 5, 2018 · 7 comments
Assignees
Milestone

Comments

@TeemuAhola
Copy link

TeemuAhola commented May 5, 2018

I have implemented following test application for measuring performance of pyknow.

import time
from pyknow import *

class Engine(KnowledgeEngine):
    
    def __init__(self, fact_count, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.fact_count = fact_count
        self.matched = 0
        self.not_matched = 0

    @DefFacts()
    def _init(self):
        for i in range(self.fact_count):
            yield Fact(i,i%10)
 
    @Rule(Fact(MATCH.a, MATCH.a))
    def rule1(self):
        self.matched += 1
        
    @Rule(Fact(MATCH.a, ~MATCH.a))
    def rule2(self):
        self.not_matched += 1

def run(fact_count):
    start = time.time()
    e = Engine(fact_count)
    e.reset()
    e.run()
    end = time.time()
    print(end - start)

When I run it with following fact counts:

run(10)
run(100)
run(500)
run(1000)
run(10000)

I am getting these results for execution time in seconds:

0.00998544692993164
0.08772945404052734
1.528554916381836
5.943406105041504
742.3850135803223

The problem is that the execution time becomes a bottleneck very quickly in my case when there are around 10000 facts. This was simplified example (and in principle this could be easily implemented without pyknow) but I would think that 10000 facts should not be a big issue when the rules are so simple.

If I remove @Rule(Fact(MATCH.a, ~MATCH.a)) which is fired more often than the other one, then execution time stays quite short (max 11 seconds).

So is this known issue or do I just misunderstanding something?

@nilp0inter
Copy link
Contributor

Hi,

thank you very much for reporting this. You are right, there is a performance issue. But let me first clarify something.

On Complexity

You will never get any benefits from pyknow for this kind of one-shot, trivial programs. You can easily implement that example in pure python like this:

matched = not_matched = 0

for i in range(fact_count):
   # rule1
    if i == i % 10:
        matched += 1
    # rule2
    if i != i % 10:
        not_matched += 1

There are two mayor differences with the code you provided:

  • First: Performance. Of course you can get much better performance with a raw code like this. There is not a complex machinery (that pyknow has), there are no function calls, etc.
  • Second (and more important): Big-O Complexity. The raw python algorithm has complexity O(n), while, a Clips implementation of the same algorithm, is something between O(n log n) and O(n^2).

Time of execution of a Clips equivalent program

Maybe that was clear to you, but I think it is not for everybody.

A little pyknow history

In the past we put all our efforts on making the implementation correct, postponing optimizations until they were absolutely needed. At the beginning pyknow wasn't even based on the RETE algorithm!

Now that we reached a point of stability on the implementation we can start thinking on making things faster.

About the issue

After doing some profiling on pyknow and an equivalent test on Clips I think we can reach the same level of complexity that Clips (but not the same performance, of course) without making too many changes.

I will start working on the performance issues I've seen after testing and profiling with your sample code.

So, thank you again for opening this issue and please stay tuned to see some performance improvements over the following days/weeks.

@TeemuAhola
Copy link
Author

Hi,

Thanks for your reply. Yes, I was expecting that there must be some complexity issue which plays part here and true this is quite trivial example and does not really need pyknow to solve the problem.

What I am doing is to evaluate possible solutions for experts systems which our team could use when doing fault analysis. We potentially may have quite lot of facts so that's why I have done some performance measurements. However as said, cleaning up of facts can be done without pyknow before actually feeding the facts to pyknow.

I have been testing some time ago PyKE as well but what I really like about pyknow compared to PyKE is that you can easily integrate it to python code and you can feed facts easily from all sorts of sources and it nicely work with it interactively top of Jupyter. Since not all in my team are fluent with python I think Jupyter based interactive python interpreter and system like pyknow would perhaps lower the bar for them and still allow very flexible way of working with problem analysis.

@nilp0inter
Copy link
Contributor

Hi,

I finished implementing the first batch of optimizations.

pyknow_o

The time complexity drop dramatically after fixing issues #17 and #18 (light-blue line) which affected a really old part of the project, the conflict resolution strategy system. I changed the way it internally works and now the numbers are much better.

This is the result of executing your code now (# facts / seconds):

10 0.008934974670410156
100 0.03293299674987793
1000 0.21196937561035156
10000 2.2099316120147705
100000 24.78445839881897
1000000 405.15399289131165

Now 10k facts are processed in ~2 seconds. You can check the progress in the feature-optimization branch.

I will release a new version with this changes in the following days and keep working on further optimizations.

@TeemuAhola
Copy link
Author

Hi,

Tested the branch and it is lightning fast :)

Thank you!

@nilp0inter nilp0inter added this to the v1.6.0 milestone May 7, 2018
@nilp0inter
Copy link
Contributor

Pyknow v1.6.0 is out. Enjoy :)

@TeemuAhola
Copy link
Author

Thank you!

@shahintaj196
Copy link

how i can obtain policy proccessing time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants