Investigate human-readable serialization for SymPy expressions #20

Closed
redeboer opened this issue May 19, 2022 · 10 comments · Fixed by #319

@redeboer
Member

redeboer commented May 19, 2022

Goals:

  • Dump expressions to some string format that is human-readable (as opposed to pickle)
  • Parse the dumped expressions in such a way that the imported object is equal to (has the same hash as) the object that was dumped
  • (?) Parse into a different version of SymPy, Python and other dependencies (this does not work with pickle)
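The first two goals can be checked with a small round trip. This is only a sketch: it assumes `sp.srepr` as the dump format and `eval` inside a namespace filled from `sympy` as the parser, and the expression is a made-up example. `eval` should only be used on strings you trust.

```python
import sympy as sp

# Made-up example expression with assumptions on the symbols.
x, y = sp.symbols("x y", real=True)
expr = sp.sin(x) ** 2 + sp.exp(-y) / 3

# Dump to a human-readable string.
dumped = sp.srepr(expr)

# Parse in an isolated namespace that only contains SymPy names.
namespace = {}
exec("from sympy import *", namespace)
loaded = eval(dumped, namespace)

# The loaded object compares equal and has the same hash in this process.
print(loaded == expr, hash(loaded) == hash(expr))
```

The hash comparison only says something within one Python process. Whether the string loads correctly in a different SymPy version is not tested here.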
@redeboer redeboer self-assigned this May 19, 2022
@redeboer redeboer changed the title Look into human-readable serialization for SymPy expressions Investigate human-readable serialization for SymPy expressions May 19, 2022
@mmikhasenko
Contributor

It would be super cool if it were also possible to serialize the not-yet-substituted ("undone") expressions.
Something like:

the full expression is

I = A + B + C

where

A = e+d+f
B = ...
C = ...

where

e = ...

The fully substituted expression is this

I = .... a lot ....

@redeboer redeboer added the ❔ Question Discuss this matter in the team label Jul 26, 2022
@mmikhasenko
Contributor

mmikhasenko commented Dec 1, 2022

Let's chat about it at the next opportunity.

I am curious to learn some SymPy tree manipulation and serialization/deserialization from you.

It would be cool to have the cached expressions in JSON (I expect relatively nice diffs).

@mmikhasenko
Contributor

mmikhasenko commented Nov 14, 2023

String serialization with srepr actually works OK. It is different from print_to_string, which I tried before.

The serialization time is also OK. In an example I just tried, it started exceeding 0.5 s at a string length of ~100,000 characters.
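A measurement like this can be reproduced roughly as follows. This is a sketch with an artificial expression, not the model from the thread, so the resulting timings and string length will differ from the numbers quoted above.

```python
import time

import sympy as sp

x = sp.Symbol("x", real=True)
# Artificial expression with a few hundred terms.
expr = sum(sp.sin(i * x) ** 2 / (i + 1) for i in range(200))

# Time the dump to a string.
start = time.perf_counter()
dumped = sp.srepr(expr)
dump_time = time.perf_counter() - start

# Time the load from that string.
namespace = {}
exec("from sympy import *", namespace)
start = time.perf_counter()
loaded = eval(dumped, namespace)
load_time = time.perf_counter() - start

print(f"{len(dumped)} characters, dump {dump_time:.3f} s, load {load_time:.3f} s")
```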


@redeboer
Member Author

Thanks for reporting! Nice that it also contains the symbol definitions, and indeed assumptions are stored as well:

>>> import sympy as sp
>>> x, y = sp.symbols('x y', real=True)
>>> n = sp.symbols('n', integer=True)
>>> expr = x**2 + y*n
>>> sp.srepr(expr)
"Add(Mul(Symbol('n', integer=True), Symbol('y', real=True)), Pow(Symbol('x', real=True), Integer(2)))"

As a side note to #20 (comment), srepr is an instance of some kind of Printer class:

>>> type(sp.srepr).__mro__
(<class 'sympy.printing.printer.srepr_PrintFunction'>, <class 'sympy.printing.printer._PrintFunction'>, <class 'object'>)

So it may be possible to inject that class into other printer mechanisms that support cse (see e.g. the arguments to lambdify).
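Independently of any printer injection, common sub-expressions can also be extracted first and then dumped one by one. The sketch below calls `sp.cse` directly and applies `srepr` to each piece. The expression is a made-up example, and this is not necessarily how the printer classes would be combined.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
expr = sp.sqrt(x + y) + sp.exp(sp.sqrt(x + y))

# cse returns a list of (symbol, sub-expression) pairs and the reduced expressions.
replacements, reduced = sp.cse(expr)

# Dump every piece separately as a human-readable string.
dumped_replacements = [(sp.srepr(s), sp.srepr(sub)) for s, sub in replacements]
dumped_reduced = sp.srepr(reduced[0])

# Load the pieces and substitute back, last replacement first.
namespace = {}
exec("from sympy import *", namespace)
rebuilt = eval(dumped_reduced, namespace)
for sym_src, sub_src in reversed(dumped_replacements):
    rebuilt = rebuilt.xreplace({eval(sym_src, namespace): eval(sub_src, namespace)})
print(rebuilt == expr)
```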

@mmikhasenko
Contributor

Great note.
I was wondering if there is a way to serialize not-fully-done expressions, e.g. keeping lineshape, momenta folded.
Might be possible it seems

@redeboer
Member Author

keeping lineshape, momenta folded.
Might be possible it seems

That would be more complicated with the existing mechanisms that SymPy offers, it seems. Dumping is easy, but for parsing, you need to know where to load the class definition from. For #20 (comment) this works fine, because the expression is built up only of fundamental mathematical operations that are defined within the sympy module:

>>> src = sp.srepr(expr)
>>> from sympy import *
>>> eval(src)
n*y + x**2

Theoretically, you could dump the entire class definition of these 'folded' expressions, but at that stage you might as well just run the entire program 😅
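The need to know where a class definition lives can be illustrated with a custom function class. In this sketch, `BreitWigner` is a hypothetical stand-in for a 'folded' expression class and is not taken from any library. Loading only works once that class has been added to the namespace handed to `eval`.

```python
import sympy as sp


class BreitWigner(sp.Function):
    """Hypothetical stand-in for a 'folded' lineshape class."""


s, m = sp.symbols("s m", real=True)
folded = BreitWigner(s, m) + 1
dumped = sp.srepr(folded)

# A namespace that only knows the names defined in the sympy module.
namespace = {}
exec("from sympy import *", namespace)

try:
    eval(dumped, namespace)
    load_failed = False
except NameError:
    # The custom class cannot be found within sympy itself.
    load_failed = True

# Loading succeeds once the class definition is made available.
namespace["BreitWigner"] = BreitWigner
loaded = eval(dumped, namespace)
print(load_failed, loaded == folded)
```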

@mmikhasenko
Contributor

Nonetheless, srepr might solve the issue.

Of course, it is not ideal; JSON would be better, but surprisingly I could not find a tree-like serialization format.
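A tree-like JSON format could in principle be hand-rolled from `expr.args`. The functions `to_tree` and `from_tree` below are hypothetical helpers, not part of SymPy, and the sketch only handles classes that can be looked up by name in the `sympy` module.

```python
import json

import sympy as sp

# Namespace used to rebuild atoms from their srepr strings.
_NAMESPACE = {}
exec("from sympy import *", _NAMESPACE)


def to_tree(expr):
    """Convert a SymPy expression into nested dicts and lists."""
    if not expr.args:  # atoms: symbols, numbers, constants
        return {"atom": sp.srepr(expr)}
    return {
        "func": type(expr).__name__,
        "args": [to_tree(arg) for arg in expr.args],
    }


def from_tree(node):
    """Rebuild an expression; only classes found in the sympy module work."""
    if "atom" in node:
        return eval(node["atom"], _NAMESPACE)
    func = getattr(sp, node["func"])
    return func(*[from_tree(arg) for arg in node["args"]])


x, y = sp.symbols("x y", real=True)
n = sp.symbols("n", integer=True)
expr = x**2 + y * n

text = json.dumps(to_tree(expr), indent=2)
loaded = from_tree(json.loads(text))
print(loaded == expr)
```

Atoms are still stored as `srepr` strings here so that symbol assumptions survive. A stricter format would have to spell those out as JSON fields.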

Could you please check if the entire Lc2pKpi model can be saved as a string?
How long is the string?
What is the time to write and read?

@redeboer
Member Author

See #319. I also tried stringify_expr() and parse_expr() from #20 (comment), but could not get it to work. Tried this:

from sympy.parsing.sympy_parser import T, stringify_expr

local_dict = {}
global_dict = {}
src = stringify_expr(
    unfolded_intensity_expr,
    local_dict,
    global_dict,
    transformations=T,
)

but got

AttributeError: 'Add' object has no attribute 'strip'

so I'm misunderstanding something here. Feeding the result of srepr to parse_expr somehow resulted in an expression that had a different number of nodes.
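The AttributeError is consistent with `stringify_expr` expecting source text rather than an expression object: it tokenizes a string into Python code. The sketch below therefore passes `str(expr)`, using a small made-up expression and `standard_transformations` instead of `T`. It also shows that `str()` drops symbol assumptions, which is one way a parsed expression can differ from the original. Whether that explains the node-count difference reported above is not established here.

```python
import sympy as sp
from sympy.parsing.sympy_parser import (
    parse_expr,
    standard_transformations,
    stringify_expr,
)

x, y = sp.symbols("x y", real=True)
n = sp.symbols("n", integer=True)
expr = x**2 + y * n

# stringify_expr takes source text, so convert the expression with str() first.
code = stringify_expr(str(expr), {}, {}, standard_transformations)

# str() drops assumptions, so a plain parse yields different symbols.
plain = parse_expr(str(expr))

# Supplying the original symbols through local_dict restores equality.
restored = parse_expr(str(expr), local_dict={"x": x, "y": y, "n": n})
print(plain == expr, restored == expr)
```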

@redeboer
Member Author

redeboer commented Nov 14, 2023

Write/load time is on the order of seconds, by the way.

