# Finding math that breaks **sympy.parsing.latex.parse_latex**

Based on some [feedback on this PR](https://github.com/sympy/sympy/pull/13706#issuecomment-359944477), 
let's look at some semi-automated ways to test $\LaTeX$ parsing in `sympy`. 

Roughly, we'll:
- Generate some basic [`hypothesis`](http://hypothesis.works/) strategies for expressions from the `sympy` code base
- Generate some more complex strategies with some custom code
- Check that each expression can at least be printed with `sympy.printing.latex.latex`
- Test against sources of truth
- Find some examples that break current parsing behavior!
- Bonus: some other strategies

In [6]:
import operator as op
from tempfile import mkdtemp
import subprocess
import os
import shutil
import re

import attr

from hypothesis import given, assume, settings, strategies as st
from IPython import display

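import sympy as S
from sympy import (
    Symbol, Expr,
)
# `parse_latex` is the function under test; it needs the ANTLR runtime installed.
from sympy.parsing.latex import parse_latex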

## Some Basic Strategies
The simple strategies `sampled_from` and `from_regex` provide a lot
of value when we already know a fair amount about the structure of the values we want.

In [7]:
variable_names = st.from_regex(re.compile(r"\A[a-z]\Z", re.IGNORECASE))

_numeric_unary_ops = st.sampled_from([
    S.sin, S.tan, S.cos, S.asin, S.sec, S.acos, S.atan, S.asec, S.Abs
])

_numeric_binary_ops = st.sampled_from([
    op.add, op.sub, op.mul, op.pow, op.truediv
])
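
To see what `variable_names` can produce, here is a minimal standard-library check of the same pattern. The sample inputs are made up for illustration. `\A` and `\Z` anchor the whole string, so only single characters pass. Python's `re` documentation also notes that `[a-z]` combined with `IGNORECASE` on a `str` pattern matches a few non-ASCII letters such as the Kelvin sign, so adding `re.ASCII` is one option if strictly ASCII names are wanted.

```python
import re

pattern = re.compile(r"\A[a-z]\Z", re.IGNORECASE)
ascii_only = re.compile(r"\A[a-z]\Z", re.IGNORECASE | re.ASCII)

# Hypothetical sample inputs, chosen for illustration only.
samples = ["a", "Z", "ab", "1", "", "x\n"]
accepted = [s for s in samples if pattern.search(s)]

kelvin = "\u212a"  # KELVIN SIGN, which case-folds to "k"
kelvin_matches = bool(pattern.search(kelvin))
kelvin_matches_ascii = bool(ascii_only.search(kelvin))

print(accepted, kelvin_matches, kelvin_matches_ascii)
```

If a non-ASCII symbol name ever shows up in a falsifying example, this is the likely source.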

In [8]:
@st.composite
def symbols(draw, name=variable_names):
    return Symbol(draw(name))

In [9]:
@st.composite
def floats(draw):
    # Draw the precision from the valid range directly instead of filtering with assume().
    precision = draw(st.integers(min_value=0, max_value=34))
    return S.Float(draw(st.floats()), precision)

## Simple functions

In [10]:
@st.composite
def numeric_unary_expressions(draw):
    return draw(_numeric_unary_ops)(draw(numeric_expressions))

## Common two-argument functions

In [11]:
@st.composite
def numeric_binary_expressions(draw):
    return draw(_numeric_binary_ops)(draw(numeric_expressions), draw(numeric_expressions))

## The `numeric_expressions`
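This is a high-level representation of a number-y thing, and is used frequently above. The composite functions only look up `numeric_expressions` when a value is drawn, so defining it after them is fine.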

In [12]:
numeric_expressions = (
    floats() | 
    symbols() | 
    numeric_binary_expressions() |
    numeric_unary_expressions()
)
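
Because the mutually recursive strategies above have no explicit size limit, generated expressions can grow large. One alternative, offered only as a sketch and not as this notebook's approach, is Hypothesis's `st.recursive`, which caps the tree via `max_leaves`. The names `leaves`, `extend`, and `bounded_expressions`, along with the reduced operator lists, are assumptions for illustration.

```python
import operator as op

import sympy as S
from hypothesis import given, settings, strategies as st

# Leaves: a few symbols and small positive integers (illustrative choices).
leaves = (
    st.sampled_from(["x", "y", "z"]).map(S.Symbol)
    | st.integers(min_value=1, max_value=9).map(S.Integer)
)


def extend(children):
    """Wrap already-generated expressions in one more unary or binary operation."""
    unary = st.builds(
        lambda f, a: f(a), st.sampled_from([S.sin, S.cos]), children
    )
    binary = st.builds(
        lambda f, a, b: f(a, b),
        st.sampled_from([op.add, op.mul]),
        children,
        children,
    )
    return unary | binary


# max_leaves bounds how many leaf values one generated expression may use.
bounded_expressions = st.recursive(leaves, extend, max_leaves=5)


@settings(max_examples=25, deadline=None, database=None)
@given(bounded_expressions)
def check_is_sympy_object(expr):
    assert isinstance(expr, S.Basic)


check_is_sympy_object()
```

The trade-off is that the base case and the recursive step must be stated separately, which is less compact than the `|` union used here.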

## Relational expressions

In [13]:
comparators = st.sampled_from([
    op.gt, op.ge, op.lt, op.le, op.eq, op.ne
])

In [21]:
@st.composite
def relational_expressions(draw):
    expr = None
    try:
        expr = draw(comparators)(draw(numeric_expressions), draw(numeric_expressions))
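    except Exception:
        # Some comparisons are undefined in SymPy (e.g. ordering non-real values); discard those draws.
        pass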
    assume(expr is not None)
    
    return expr
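
One caveat worth checking for the `op.eq` and `op.ne` entries: on SymPy objects, `==` and `!=` test structural equality and return plain Python booleans, while the ordering operators build symbolic relations. The sketch below contrasts them with `sympy.Eq`. Whether to swap in `Eq`/`Ne` for this notebook is a design choice, not something the original code does.

```python
import operator as op

import sympy as S

x, y = S.symbols("x y")

structural = op.eq(x, y)  # plain Python bool: are the two trees identical?
symbolic = S.Eq(x, y)     # symbolic equation that can be printed as LaTeX
ordered = op.gt(x, y)     # symbolic strict inequality

print(type(structural).__name__, type(symbolic).__name__, type(ordered).__name__)
```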

## Expressions that don't fail **sympy.printing.latex.latex**

`sympy.printing.latex.latex` is mature enough to be treated as an (opinionated)
production-grade way to typeset expressions.

If it can't work with what we've built, we probably don't care to handle it yet.

We also don't care about empty strings, for the time being.

Note that this returns the `latex_str` so we don't have to recalculate it later.

In [37]:
@st.composite
def latex_printable_expressions(draw):
    expr = draw(numeric_expressions | relational_expressions())
    
    latex_str = None
    try:
        latex_str = S.latex(expr)
    except Exception:
        # Printing failed; latex_str stays None and the draw is rejected below.
        pass
    assume(latex_str)
    return (expr, latex_str)

## Expressions that don't fail "real" `latex`
The ultimate source of truth for $\LaTeX$ is a real TeX engine such as `pdflatex` or `xelatex`. 
Because Jupyter can `display` a PDF, we can use the typeset output alongside the `MathJax` rendering 
as two independent checks.

In [39]:
class PDF(object):
    def __init__(self, pdf):
        self._pdf = pdf
    def _repr_pdf_(self):
        return self._pdf

We need a very basic $\LaTeX$ document in order to work with the command line tools.

In [40]:
LATEX_DOC = r"""
\documentclass[a4paper]{article}
 
\begin{document}
$$
%s
$$
\end{document}
"""

In [41]:
@st.composite
def typesettable_expressions(draw):
    expr, latex_str = draw(latex_printable_expressions())

    tmpdir = mkdtemp()
    tmp_tex_path = os.path.join(tmpdir, "expr.tex")
    tmp_pdf_path = os.path.join(tmpdir, "expr.pdf")

    success = None
    pdf = None
    try:
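        with open(tmp_tex_path, "w") as fp:
            fp.write(LATEX_DOC % latex_str)
        # Run non-interactively so a TeX error fails fast instead of waiting for input.
        subprocess.check_call([
            "pdflatex",
            "-interaction=nonstopmode",
            "-halt-on-error",
            tmp_tex_path
        ], cwd=tmpdir)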
        with open(tmp_pdf_path, "rb") as fp:
            pdf = PDF(fp.read())
        success = True
    except Exception as err:
        pass
    finally:
        shutil.rmtree(tmpdir)
    
    assume(success)

    return expr, latex_str, pdf
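
The typesetting step can also be pulled out of the strategy so it is testable on its own. The sketch below is a hypothetical helper, `render_pdf`, that assumes `pdflatex` may or may not be installed: it returns `None` when the engine is missing, fails, or times out. `tempfile.TemporaryDirectory` handles the cleanup that `mkdtemp` plus `shutil.rmtree` does above, and the 30-second timeout is an arbitrary choice.

```python
import os
import shutil
import subprocess
import tempfile


def render_pdf(tex_source, engine="pdflatex", timeout=30):
    """Return the PDF bytes for a complete LaTeX document, or None on failure."""
    if shutil.which(engine) is None:
        return None  # engine not installed on this machine
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, "expr.tex"), "w") as fp:
            fp.write(tex_source)
        try:
            subprocess.run(
                [engine, "-interaction=nonstopmode", "-halt-on-error", "expr.tex"],
                cwd=tmpdir,
                check=True,
                timeout=timeout,
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
            return None
        pdf_path = os.path.join(tmpdir, "expr.pdf")
        if not os.path.exists(pdf_path):
            return None
        with open(pdf_path, "rb") as fp:
            return fp.read()


doc = "\n".join([
    r"\documentclass{article}",
    r"\begin{document}",
    r"$$ \sin{x} $$",
    r"\end{document}",
])
pdf_bytes = render_pdf(doc)
```

Inside a strategy, `assume(pdf_bytes is not None)` would then play the role of `assume(success)`.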

In [42]:
@given(typesettable_expressions())
@settings(deadline=None, perform_health_check=False)
def test_latex_roundtrip(expr_latex_pdf):
    expr, latex_str, pdf = expr_latex_pdf
    expr_parsed = None
    try:
        expr_parsed = parse_latex(latex_str)
    except Exception:
        # A parser crash counts as a failed round trip; expr_parsed stays None.
        pass
    if expr == expr_parsed:
        return
    raise ValueError([expr, expr_parsed, latex_str, pdf])

In [43]:
try:
    test_latex_roundtrip()
except ValueError as err:
    expr, expr_parsed, latex_str, pdf = err.args[0]
    for k, ex in {"expr": expr, "parsed": expr_parsed}.items():
        display.display(display.Markdown(f"### {k}"))
        print(ex.__class__.__mro__)
        print(ex)
    display.display(display.Markdown("### LaTeX Source\n```latex\n%s\n```" % latex_str))
    display.display(display.Latex("$$ %s $$" % latex_str))
    display.display(pdf)

Falsifying example: test_latex_roundtrip(expr_latex_pdf=(sin(A), '\\sin{\\left (A \\right )}', <__main__.PDF at 0x7f8a576a49b0>))

You can reproduce this example by temporarily adding @reproduce_failure('3.44.4', b'AAMAAQA=') as a decorator on your test case


### expr

(sin, TrigonometricFunction, Function, Application, <class 'sympy.core.expr.Expr'>, <class 'sympy.core.basic.Basic'>, <class 'sympy.core.evalf.EvalfMixin'>, <class 'object'>)
sin(A)


### parsed

(sin, TrigonometricFunction, Function, Application, <class 'sympy.core.expr.Expr'>, <class 'sympy.core.basic.Basic'>, <class 'sympy.core.evalf.EvalfMixin'>, <class 'object'>)
sin(left(A*right))


### LaTeX Source
```latex
\sin{\left (A \right )}
```

<IPython.core.display.Latex object>

<__main__.PDF at 0x7f8a576a49b0>
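
The parsed result suggests that `\left` and `\right` were read as names rather than as delimiter sizing commands. A quick way to test that hypothesis is to strip them before parsing. The helper `strip_delimiter_sizing` below is hypothetical, a preprocessing sketch rather than anything `sympy` provides, and it would leave stray characters behind for forms like `\left.`.

```python
import re

# Match \left or \right only when not followed by a letter, so that
# commands such as \leftarrow or \rightarrow are left untouched.
_SIZING = re.compile(r"\\(?:left|right)(?![A-Za-z])\s*")


def strip_delimiter_sizing(latex_str):
    """Remove \\left and \\right sizing commands from a LaTeX string."""
    return _SIZING.sub("", latex_str)


cleaned = strip_delimiter_sizing(r"\sin{\left (A \right )}")
untouched = strip_delimiter_sizing(r"x \leftarrow y")
print(cleaned)
print(untouched)
```

Feeding `cleaned` to `parse_latex` and comparing against `expr` would show whether the sizing commands are the only obstacle for this example.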

### Having a look at `expr` & `parsed`

In [None]:
types = list(map(type, [expr, expr_parsed]))
print(types)
assert len(set(types)) == 1, "they are not the same type"

In [None]:
assert expr == expr_parsed, "they're not equal"

In [None]:
expr_parsed.atoms()

In [None]:
expr.atoms()
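
Assuming the parsed tree is equivalent to an undefined function `left` applied to `A*right` (a reconstruction from the printed output above, not the parser's actual object), the difference shows up directly in the free symbols even though both expressions have the same outer type:

```python
import sympy as S

A, right = S.symbols("A right")
left = S.Function("left")

expected = S.sin(A)
# Hand-built stand-in for the parser's output, based on its printed form.
reconstructed = S.sin(left(A * right))

extra_symbols = reconstructed.free_symbols - expected.free_symbols
same_head = type(expected) is type(reconstructed)
print(extra_symbols, same_head)
```

This is why the type check alone passes while the equality check fails.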

# Sir Not-Appearing-In-This-Tool

Not using this for anything yet, but this will generate expressions that don't fail `.simplify`.

In [None]:
@st.composite
def simplifiable_expressions(draw):
    expr = draw(numeric_expressions)
    
    success = None
    try:
        expr.simplify()
        success = True
    except Exception:
        pass
    # Reject the draw after the try block rather than raising from `finally`.
    assume(success)
    return expr