GSoC 2019 Report Shubham Kumar Jha: Improving Assumptions

GSoC 2019 Report

This Wiki summarises the work I have done during GSoC 2019 for SymPy.

About Me

My name is Shubham Kumar Jha. I am currently a final year undergraduate student at National Institute of Technology Agartala, India, in the department of Computer Science & Engineering.

Project Outline

SymPy currently has two systems for assumptions handling. The former (Old Assumptions) is time-tested (but quite limited) while the new system (New Assumptions) is powerful but has been very slow and not much used in the codebase.

The aim of this project was to improve the speed of New Assumptions and make it more powerful. Also, this project attempts to synchronize both the systems, so that developments in the new system can be used in the codebase.

A more detailed idea of the project can be found in my proposal.

Work Done

This project has brought great improvements in the performance of New Assumptions. Now querying has become very fast ( ~ 20X ).

Since the problem has been in SymPy's pipeline for a long time, there have been many attempts at it. In the initial phase, I tried to review all such works and benchmark them. The code related to the new system was also profiled. The results of the profiling suggested the following:

[1] The new system was built on top of SymPy core. Hence a great amount of time was spent in creating SymPy objects. Some of the constructors also sorted arguments, which was a huge performance bottleneck.
[2] The evaluation of CNF of expressions ( required for solving ) was itself very time-consuming.
[3] A great amount of time was also spent in rcall() during the evaluation of sathandlers.

During the project, I tried to work on them and improve the overall performance of querying. The following are the PRs made for these:

[Merged] #17144 : Improvement in satask

This PR tries to address [1] and [2]. A separate submodule was created, mostly utilizing the Python built-ins and removing unnecessary creation of SymPy objects, to process the evaluation of queries. The whole subroutine to convert an expression to CNF was rewritten (with Python built-ins ) for performance gain. The improvements in performance can be seen from tests:

Tests This PR Before this PR

test_satask 1.06 s 36.26 s

assumptions\tests 7.61 s 127.21
[Merged] #17379: Reduce the need for costly rcall() on SymPy expression

This PR addresses [3]. Again the in-between construction of SymPy objects was reduced. sathandlers were rewritten to use the constructs from #17144. This adds subtle performance gain to querying.

One significant example is,
```
from sympy import *
p = random_poly(x, 50, -50, 50)
print(ask(Q.positive(p), Q.positive(x)))
```
Before this PR, it took 4.292 s to execute, out of this 2.483 s was spent in rcall() while now the time spent is 1.929 s and 0.539 s respectively.
[Merged] #17440: Add pycosat as an optional dependency

This PR adds a more powerful ( but optional ) SAT solving engine pycosat to SymPy. Though SAT solver doesn't influence performance to that great extent, it can have weightage when expressions are large.

Using a large expression,
```
r = random_poly(x, 100, -100, 100)
ans = ask(Q.positive(r), Q.positive(x))
```
The time taken by SAT solver is like : 0.631(before) and 0.122(after)

Other than this, attempt made to improve the reach of New Assumptions.

[Unmerged] #17392: Adds support for querying Relational expressions

Currently, the new system doesn't support querying Relationals ( expression of the form (a>b)) . This PR attempts to allow simple queries with Relationals to be asked. E.g. the following should work with this:
```
from sympy import *
from sympy.abc import x,y
ask(x>y, Q.positive(x) & Q.negative(y))  # True
```

Some other PRs during this project:

[Unmerged] #16978: Make complex->finite in the codebase

This PR is for Old Assumptions to make the complex assumption imply finite. So, is_complex being True would mean is_finite is also True. This breaks zoo.is_complex throughout the codebase. I have tried to do this without including any extra facts (i.e. extended_complex for entities like zoo).

Currently, it is under discussion.
[Unmerged] #17609: First Order Logic with Equality

This PR attempts to add First Order Logic, which would benefit while extending New Assumptions.
[Merged] #16956: Improved code in Expr.lt and similar functions in Expr

This PR is also for Old Assumptions and improves the querying of Relational expressions.

Future Work

The unmerged/unfinished PRs should be finished and merged.

The New Assumptions is now fast enough to be used in codebase. More and more attempts should be made to integrate it with other parts of codebase and test it there.

The current approach for satask creates too many unnecessary clauses. Attempts should be made to prune facts which are not required for querying ( e.g. Q.positive(x) doesn't depend on facts like Q.diagonal(x) ,Q.invertible(x) , etc ).

More facts should be added to New Assumptions.

I plan to work on them post-GSoC and also encourage others who wish to contribute to this part of codebase.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly