[Feature] LaTeX table generator #156

Merged
merged 32 commits into from
Aug 10, 2022

Conversation

MilesCranmer
Owner

@MilesCranmer MilesCranmer commented Jul 2, 2022

This generates a booktabs-style LaTeX table for a subset of equations. Here is an example:

import numpy as np
from pysr import PySRRegressor

X = 2 * np.random.randn(100, 5)
y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

model = PySRRegressor(
    niterations=80,
    binary_operators=["+", "*"],
    unary_operators=["cos"],
    model_selection="best",
    loss="loss(x, y) = (x - y)^2",  # Custom loss function (julia syntax)
    maxsize=11,
)

model.fit(X, y)

print(model.latex_table(precision=3, include_score=True))

The output of this is:

\begin{table}[h]
\begin{center}
\begin{tabular}{@{}lccc@{}}
\toprule
Equation & Complexity & Loss & Score \\
\midrule
$3.9$ & 1 & 38.9 & 0 \\
$x_{0}^{2}$ & 3 & 3.16 & 1.26 \\
$x_{0}^{2} - 0.257$ & 5 & 3.09 & 0.0105 \\
$x_{0}^{2} + \cos{\left(x_{3} \right)}$ & 6 & 1.26 & 0.898 \\
$x_{0}^{2} + 2.44 \cos{\left(x_{3} \right)}$ & 8 & 0.245 & 0.818 \\
$x_{0}^{2} + 2.54 \cos{\left(x_{3} \right)} - 0.5$ & 10 & 2.28e-13 & 13.9 \\
\bottomrule
\end{tabular}
\end{center}
\end{table}

which renders as:
image

Leaving include_score set to False will leave out the Score column. Precision can be adjusted to have more or less precise constants.

One can render only a subset of equations by using latex_table([1, 4]) which only includes the 1st and 4th equation in model.equations_.
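To illustrate how the options described above fit together, here is a minimal, self-contained sketch of a booktabs table builder. It is not the code in this PR: `booktabs_table`, its row tuples, and the 0-based meaning of `indices` are assumptions made for the example.

```python
def booktabs_table(rows, indices=None, include_score=False):
    """Build a booktabs-style table from (equation, complexity, loss, score) rows.

    Hypothetical helper for illustration; not PySR's implementation.
    """
    if indices is not None:
        # Assumption of this sketch: indices are 0-based positions in `rows`.
        rows = [rows[i] for i in indices]

    headers = ["Equation", "Complexity", "Loss"]
    if include_score:
        headers.append("Score")

    lines = [
        r"\begin{table}[h]",
        r"\begin{center}",
        # Left-align the equation column, center the numeric columns.
        r"\begin{tabular}{@{}l" + "c" * (len(headers) - 1) + "@{}}",
        r"\toprule",
        " & ".join(headers) + r" \\",
        r"\midrule",
    ]
    for equation, complexity, loss, score in rows:
        cells = [f"${equation}$", str(complexity), f"{loss:.3g}"]
        if include_score:
            cells.append(f"{score:.3g}")
        lines.append(" & ".join(cells) + r" \\")
    lines += [r"\bottomrule", r"\end{tabular}", r"\end{center}", r"\end{table}"]
    return "\n".join(lines)


rows = [
    ("3.9", 1, 38.9, 0.0),
    ("x_{0}^{2}", 3, 3.16, 1.26),
    ("x_{0}^{2} - 0.257", 5, 3.09, 0.0105),
]
table = booktabs_table(rows, indices=[0, 2], include_score=True)
print(table)
```

Leaving `include_score` at its default drops the Score column and shrinks the column specification accordingly.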


Edit: it now renders the e-13 as \cdot 10^{-13}
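A conversion like the one mentioned in this edit could be sketched as follows. This is only an assumption about one way to do it with Python's format specification; `latex_number` is a hypothetical name, and the PR itself may well handle this differently.

```python
def latex_number(value, precision=3):
    # Format with `precision` significant figures; Python switches to
    # exponent notation (e.g. "2.28e-13") for very small or large values.
    text = f"{value:.{precision}g}"
    if "e" not in text:
        return text
    mantissa, exponent = text.split("e")
    # int() normalizes exponents such as "+05" to 5 and "-13" to -13.
    return rf"{mantissa} \cdot 10^{{{int(exponent)}}}"


print(latex_number(2.28e-13))
print(latex_number(38.9))
```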

@MilesCranmer
Owner Author

MilesCranmer commented Jul 2, 2022

@tttc3 or @kazewong do you think you could take a look at this? I am curious about your opinion on the API? Would any other options be useful here?

@MilesCranmer MilesCranmer changed the title LaTeX table generator: [Feature] LaTeX table generator Jul 2, 2022
@kazewong

kazewong commented Jul 2, 2022

@MilesCranmer Do you want me to review this PR? If you assign it to me then I can have a look at this.

@MilesCranmer
Owner Author

Thanks! Added

@MilesCranmer
Owner Author

MilesCranmer commented Jul 3, 2022

The weird inconsistency in the number of significant figures here is a SymPy issue which I raised here: sympy/sympy#23719. We can just leave it as-is until it gets fixed on their end.

Fixed.

@MilesCranmer MilesCranmer linked an issue Jul 4, 2022 that may be closed by this pull request
@tttc3
Contributor

tttc3 commented Jul 5, 2022

@tttc3 or @kazewong do you think you could take a look at this? I am curious about your opinion on the API? Would any other options be useful here?

This will be a really nice feature to have!

Could this functionality be included in the PySRRegressor.latex() method? A keyword argument such as format_as_table=True could be used to specify whether to return a table or just the equation.

Being able to specify the table row order by loss, complexity, or score would also be a nice addition. Column order would be useful too. Perhaps this could be handled by having latex() take a columns=["Equations","Complexity",...] parameter that determines the column order of the table. By default I think it should print all the available columns (including score); the user can then specify a restricted set if they wish.

The resulting method signature would look something like this:

def latex(
    self,
    indices=None,
    format_as_table=False,
    precision=3,
    columns=["Equations", "Complexity", "Loss", "Score"],
):
    ...

Where indices could be either an int or a list of ints.

As they aren't used anywhere else, is there an advantage to separating generate_top_of_latex_table and generate_bottom_of_latex_table out of the main latex_table function?

@MilesCranmer
Owner Author

Thanks @tttc3, that's a great idea! I will implement it and re-upload.

@MilesCranmer
Owner Author

MilesCranmer commented Jul 9, 2022

Updates:

  1. I tested out the idea of putting it all in latex. However, with the change it seemed like too much functionality for one function; a separate function seemed cleaner. More importantly, whereas the current functions (sympy, latex, jax, torch) accept only a single integer index (assuming nout_=1), latex_table accepts a list of integers to put in the table. When nout_>1, the current functions accept a list of integers, and latex_table will accept a list of lists. Letting latex work with both single-output and multi-output seems too tricky to work into the API. What do you think?
  2. latex_table() now works for multi-output equations! They are simply in separate tables right now, though perhaps I could look at combining them in one somehow.
  3. I implemented the change to include columns as a parameter so the user can specify the order and which columns are given.
  4. Regarding sorting: this and other fine-tuning might be better left to the user? What do you think? (The default order is by increasing complexity, which also means decreasing loss.) They could also pass the indices in the sort order they want.

Cheers,
Miles
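Points 3 and 4 can be pictured with a small sketch in which the caller chooses both the columns and the row order. Everything here is hypothetical: `select_table_cells`, the dictionary keys, and the sample rows are made up for the example and do not reflect how latex_table is actually implemented.

```python
def select_table_cells(
    equations, indices=None, columns=("equation", "complexity", "loss", "score")
):
    """Return a header row plus data rows, keeping only `columns`, in the order given."""
    if indices is None:
        indices = range(len(equations))
    header = [name.capitalize() for name in columns]
    body = [[equations[i][name] for name in columns] for i in indices]
    return [header] + body


equations = [
    {"equation": "3.9", "complexity": 1, "loss": 38.9, "score": 0.0},
    {"equation": "x_{0}^{2}", "complexity": 3, "loss": 3.16, "score": 1.26},
    {"equation": "x_{0}^{2} - 0.257", "complexity": 5, "loss": 3.09, "score": 0.0105},
]

# Sorting is left to the caller: compute the order, then pass it as indices.
by_loss = sorted(range(len(equations)), key=lambda i: equations[i]["loss"])
cells = select_table_cells(equations, indices=by_loss, columns=["loss", "equation"])
print(cells)
```

Because the row order comes entirely from `indices`, no dedicated sort parameter is needed.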

@tttc3
Contributor

tttc3 commented Jul 12, 2022

Re 1: I think this is a fair point, it would probably end up confusing if the methods differed too much. Also, depending on the desire to add other output formats in the future (e.g. txt, CSV, etc...), your current implementation would allow easier generalisation to other table output formats.

Re 4: I agree, sorting rows is not particularly arduous in comparison to rearranging columns.

@MilesCranmer
Owner Author

Great points, thanks!

By the way, do you (or @kazewong?) know if there's a way to automatically break equations inside a table? I found this: https://tex.stackexchange.com/a/3785/140440 but it seems to not work inside tables. Since many discovered equations are quite long, it would be nice if latex_table automatically broke equations into multiple lines.

@kazewong

kazewong commented Jul 14, 2022

I think with some engineering you should be able to get away with the methods shown in these two pages:
https://www.overleaf.com/learn/latex/Tables (Scroll down to the section Combining rows and columns)
https://texblog.org/2012/12/21/multi-column-and-multi-row-cells-in-latex-tables/
You probably want to do the automatic part in Python: check the length of the equation in LaTeX form, and if it is longer than some fixed size, break it with a multi-row cell.

@MilesCranmer
Owner Author

Thanks!

Okay it looks like this can be done with:

\vbox{
\begin{flushleft}
$\displaystyle <equation>$
\end{flushleft}
}

This will automatically break it into pieces inside the table. So I guess this can be added for equations that are too long.
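A rough sketch of applying that wrapper only to long equations might look like this. The name `wrap_equation` and the `max_length` threshold are assumptions for illustration; the length of the LaTeX source is only a crude proxy for the rendered width.

```python
def wrap_equation(latex_equation, max_length=50):
    # max_length is a hypothetical cutoff on the number of source characters.
    if len(latex_equation) <= max_length:
        return f"${latex_equation}$"
    # Long equations get the \vbox + flushleft wrapper shown above.
    return "\n".join(
        [
            r"\vbox{",
            r"\begin{flushleft}",
            rf"$\displaystyle {latex_equation}$",
            r"\end{flushleft}",
            "}",
        ]
    )


print(wrap_equation("x_{0}^{2}"))
print(wrap_equation("x_{0} + " * 10 + "1"))
```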

@MilesCranmer
Owner Author

Did some updates to the generated output. This is what it looks like for long equations:

\begin{table}[h]
\begin{center}
\begin{tabular}{@{}cccc@{}}
\toprule
Equation & Complexity & Loss & Score \\
\midrule
$y = 5.9560$ & $1$ & $24.388$ & $0.0$ \\
$y = 4.4687 x_{0}$ & $3$ & $4.1362$ & $0.88715$ \\
$y = 2.0248 x_{0}^{2}$ & $5$ & $0.61547$ & $0.95257$ \\
$y = 1.8967 x_{0}^{2} + 0.66315$ & $7$ & $0.41267$ & $0.19987$ \\
\begin{minipage}{0.8\linewidth} \vspace{-1em} \begin{dmath*} y = x_{0} \left(x_{0} \left(0.84477 x_{0} - 1.1845\right) + 3.2887 + \frac{0.19070 - x_{0}}{x_{0}}\right) + 0.59160 \end{dmath*} \end{minipage} & $29$ & $0.0061356$ & $1.7765 \cdot 10^{-12}$ \\
\bottomrule
\end{tabular}
\end{center}
\end{table}

which renders as:
image

What do you think?

@tttc3
Contributor

tttc3 commented Jul 20, 2022

Looks good to me. I have heard anecdotal incompatibility issues with breqn but have never used the package myself, perhaps @kazewong is more informed on this?

@MilesCranmer
Owner Author

I wonder if I should try to align the equations with equation breaks, like shown here: https://tex.stackexchange.com/a/347011/140440? I am not sure how to make a table with this, though.

@kazewong

Looks good to me. I have heard anecdotal incompatibility issues with breqn but have never used the package myself, perhaps @kazewong is more informed on this?

I have not used breqn either, though I don't see where it is used in the current output. In general I think having fewer dependencies is better, and I think that is possible for this PR.

@kazewong

I wonder if I should try to align the equations with equation breaks, like shown here: https://tex.stackexchange.com/a/347011/140440? I am not sure how to make a table with this, though.

Maybe try aligning left instead of center? Assuming the left-hand side is always just y, this should work fine, right? More specifically, change
\begin{tabular}{@{}cccc@{}}
to
\begin{tabular}{@{}lccc@{}}

image

BTW, toprule, midrule, bottomrule, and dmath don't work out of the box on Overleaf. This is probably a package dependency problem and should probably be fixed.

@MilesCranmer
Owner Author

  • toprule, midrule, bottomrule = \usepackage{booktabs} (nice-looking tables)
  • dmath = \usepackage{breqn} (automatic equation breaks)

I think a warning could be printed that one should include them? I'm not sure. The other option for equation breaking is amsmath's \displaymath, but it's another dependency people will have to manually include. It also doesn't look as good.
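One way the warning idea could be sketched is to scan the generated LaTeX for the commands listed above and report which packages they need. This is only an illustration: `REQUIRED_PACKAGES`, `required_packages`, and `preamble_hint` are hypothetical names, and the mapping covers just the commands mentioned in this thread.

```python
# Commands from the generated table mapped to the package that defines them.
REQUIRED_PACKAGES = {
    r"\toprule": "booktabs",
    r"\midrule": "booktabs",
    r"\bottomrule": "booktabs",
    r"\begin{dmath": "breqn",  # matches both dmath and dmath*
}


def required_packages(latex_source):
    """Return the sorted package names that the given LaTeX snippet relies on."""
    return sorted(
        {pkg for command, pkg in REQUIRED_PACKAGES.items() if command in latex_source}
    )


def preamble_hint(latex_source):
    """Return the usepackage lines a user would need to add to their preamble."""
    return "\n".join(
        rf"\usepackage{{{pkg}}}" for pkg in required_packages(latex_source)
    )


snippet = r"\toprule a & b \\ \begin{dmath*} y = x \end{dmath*} \bottomrule"
print(preamble_hint(snippet))
```

The hint could be printed as a warning or prepended as a comment to the generated table; which is preferable is a design choice left open here.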

In your example:
image
this weird alignment is what breqn helps avoid - see how the equation is aligned correctly even though it is broken:
image

@kazewong

kazewong commented Jul 26, 2022

I think amsmath is much more common than breqn. Also, I don't think you need to use displaymath to solve the problem. My previous example was mainly meant to show left alignment, without addressing the dmath problem. If you use aligned, I think multi-line equations can be handled much more elegantly. See the example below.

image

\begin{table}[h]
\begin{center}
\begin{tabular}{@{}lccc@{}}
\toprule
Equation & Complexity & Loss & Score \\
\midrule
$y = 5.9560$ & $1$ & $24.388$ & $0.0$ \\
$y = 4.4687 x_{0}$ & $3$ & $4.1362$ & $0.88715$ \\
$y = 2.0248 x_{0}^{2}$ & $5$ & $0.61547$ & $0.95257$ \\
$y = 1.8967 x_{0}^{2} + 0.66315$ & $7$ & $0.41267$ & $0.19987$ \\
$\begin{aligned}
y &= x_{0} \left(x_{0} \left(0.84477 x_{0} - 1.1845\right) + 3.2887 + \frac{0.19070 - x_{0}}{x_{0}}\right) \\&+ 0.59160
\end{aligned}$
& $29$ & $0.0061356$ & $1.7765 \cdot 10^{-12}$ \\
\bottomrule
\end{tabular}
\end{center}
\end{table}
\end{document}

@MilesCranmer
Owner Author

But, in that code, you are manually typing \\&. breqn will automatically do that for you. Is there a way to automatically do that with amsmath?
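In the spirit of the earlier suggestion to do the automatic part in Python, the break points for an aligned block could be chosen before the LaTeX is emitted. The sketch below is an assumption-laden illustration, not part of this PR: `break_equation` and `max_length` are hypothetical, it only breaks at ` + ` or ` - ` written with surrounding spaces, and it never breaks inside braces or a \left...\right pair, since those cannot span a line break.

```python
import re

# Tokens that change nesting depth, plus binary +/- with surrounding spaces.
TOKEN = re.compile(r"\\left(?![A-Za-z])|\\right(?![A-Za-z])|[{}]| [+-] ")


def break_equation(equation, max_length=60):
    depth = 0       # nesting level of {...} and \left...\right
    line_start = 0  # index where the current output line begins
    lines = []
    for match in TOKEN.finditer(equation):
        token = match.group()
        if token in ("{", r"\left"):
            depth += 1
        elif token in ("}", r"\right"):
            depth -= 1
        elif depth == 0 and match.start() - line_start >= max_length:
            # Top-level operator on a line that is already long: break before it.
            lines.append(equation[line_start:match.start()])
            line_start = match.start() + 1  # skip the space before the operator
    lines.append(equation[line_start:])
    # Align on the first "=" and put "&" at the start of each continuation line.
    body = " \\\\\n&".join(lines).replace(" = ", " &= ", 1)
    return "\\begin{aligned}\n" + body + "\n\\end{aligned}"


equation = (
    r"y = x_{0} \left(x_{0} \left(0.84477 x_{0} - 1.1845\right) + 3.2887 "
    r"+ \frac{0.19070 - x_{0}}{x_{0}}\right) + 0.59160"
)
print(break_equation(equation))
```

This greedy rule breaks at the first top-level operator after a line reaches `max_length`, so a single long parenthesized group still ends up on one line.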

@MilesCranmer MilesCranmer merged commit 0c1c3db into master Aug 10, 2022
@MilesCranmer MilesCranmer deleted the latex-table branch August 10, 2022 14:54