Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: mathtext accents #4887

Merged
merged 5 commits into from Sep 1, 2015
Merged

Conversation

tacaswell
Copy link
Member

This is a follow up to #4588

The change in ae91e9f fixed
the symbols that started with accent names, but broke all of
the other accents.

This fixes both by special casing the 8 named symbols that start with
an accent as a prefix.

ex '\doteq' should be parsed as a single symbol, not as as two symbols
(e, q) with a dot over the e ('\dot{e}q')

attn @zblz

@tacaswell tacaswell added Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: text labels Aug 8, 2015
@tacaswell tacaswell added this to the next point release milestone Aug 8, 2015
@tacaswell
Copy link
Member Author

In adding tests for this I confirmed that out text tests are not strict enough to catch major issues. The second commit on this PR changes what symbols are in the test and it still passed locally.

@zblz
Copy link
Member

zblz commented Aug 8, 2015

@tacaswell: Look at all those nice accents and symbols:
accent_test_mpl_arevsans_tacaswell

Thanks for fixing this. Is there a way to set a per-test tolerance in the mathtext tests? I imagine we don't want to jack up the strictness for all tests, but we really need an accent test.

@zblz
Copy link
Member

zblz commented Aug 8, 2015

I gave it a shot and made a new category in test_mathtext called accent_test with a few accent tests and lowered their tolerance to 16. Using baseline images from this PR, the tests now fail, even though it seems that most of the discrepancy comes from the modified spacing when accents are shifted rather that the accents themselves.

The branch is here: https://github.com/zblz/matplotlib/tree/fix-accent-tests

@tacaswell : do you want me to do a PR on that or will you cherry-pick them here?

@tacaswell
Copy link
Member Author

I think we do want to push the threshold up on all of them.

I started to do some work last night to sort out what fails when the threshold is push down and some of them should fail for example \doteq is used in test 01 which has been happily passing.

@tacaswell
Copy link
Member Author

fun thing to do, apply the following threshold

diff --git a/lib/matplotlib/tests/test_mathtext.py b/lib/matplotlib/tests/test_mathtext.py       
index 745588a..71d9d0a 100644                                                                    
--- a/lib/matplotlib/tests/test_mathtext.py                                                      
+++ b/lib/matplotlib/tests/test_mathtext.py                                                      
@@ -151,7 +151,7 @@ for fonts, chars in font_test_specs:                                         
 def make_set(basename, fontset, tests, extensions=None):                                        
     def make_test(filename, test):                                                              
         @image_comparison(baseline_images=[filename], extensions=extensions,                    
-                          tol=32)                                                               
+                          tol=10)                                                               
         def single_test():                                                                      
             matplotlib.rcParams['mathtext.fontset'] = fontset                                   
             fig = plt.figure(figsize=(5.25, 0.75))    
python tests.py -s --processes=4 --process-timeout=300 matplotlib.tests.test_mathtext 2> math-fail.txt
with open('/home/tcaswell/other_source/matplotlib/math-fail.txt') as f:
    lns = [ln.strip()[51:-5].split('_') for ln in f.readlines() if ln.startswith('FAIL:')]

df = pd.DataFrame({k: v for k, v in zip(['font', 'number'], zip(*lns))})

Gives

In [104]: df.groupby('number').count()
Out[104]: 
        font
number      
00         9
01         9
05         9
06         5
07         4
09         9
15         1
16         2
17         2
24         2
26         2
27         9
30         4
39         2
42         2
44         2
45         3
50         9
51         1
54         9
55         3
56         5
58         1
60         6
61         3
62         2
64         9
72         9

Some of the baseline images (like 01) are clearly wrong.

@tacaswell
Copy link
Member Author

Brain dump of my notes from the tests that have 9 of 9 fail with a tol=0, 63 (?!) of them have at least 1 failure.

*** 00 (fixed)
 - \dots is really a symbol, should change

*** 01 (fixed)
 - baseline image is wrong, built against bad rendering of \doteq

*** 05
 - spacing issues

*** 06
 - spacing issues

*** 07
 - spacing issues

*** 09
 - spacing issues, looks like baseline is correct

*** 26 (fixed)

 - accents in the wrong place

*** 27
 - spacing issues, looks like baseline is wrong

*** 34 (fixed)
 - length of frac line was a bit too long, shortened
 - probably due to issues with right pad on super

*** 39
 - wide accents slightly shifted, not clear to me which one is
   correct

*** 42
 - spacing issues

*** 44

 - frac length issues
 - binomial does not look right
 - superscript looks too close to base letter
 - probably due to issues with right pad on super

*** 50
 - spacing issue, looks like baseline is correct

*** 54
 - spacing issues

*** 56
 not clear either version is correct

    r'${\int }_{1}^{x}\frac{\mathrm{dt}}{t}$',

*** 60

 spacing issues from right pad on superscript

*** 64
 - spacing issues

*** 72 (fixed)
 - clearly wrong

@zblz I can pull the last 3 commits if they are conflicting with what you have been working on which I have not been following as closely as I should have been.

@zblz
Copy link
Member

zblz commented Aug 9, 2015

@tacaswell: They will definitely conflict with PRs #4872 and #4873, for they change spacing of symbols and position of sub/superscripts respectively. With tol=10 the tests will likely fail with those changes (good!). I removed the baseline images from #4873 because I am still finetuning some corner cases.

If you merge this PR I can rebase on these without baseline images and then regenerate them.

@tacaswell
Copy link
Member Author

I pulled off the last three commits, I would rather fix them up all at once then have multiple generations of the test images in the repo.

The changes to 01 do not conflict with your work. There will be a merge conflict on 72, but that should not be too hard to clean up.

@tacaswell
Copy link
Member Author

@matplotlib/developers @mdboom is on vacation, can anyone else review the changes to the parser logic?

@@ -12,7 +12,7 @@
from matplotlib import mathtext

math_tests = [
r'$a+b+\dots+\dot{s}+\ldots$',
r'$a+b+\dot s+\dot{s}+\ldots$',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this case is intended to test that \dots produces three dots and \dot{s} produces an s with a dot over it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test to match the image, I will update image to test all three.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the TeX definitions, \dots should not produce three dots: only \ldots should. \dots would give an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zblz Can you point me to a reference on that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I just checked and it appears I was wrong. Sorry for that. \dots should indeed produce three dots and is a text and math command, whereas \ldots (and other \cdots, etc) are math-only commands. I checked it in the LaTeX comprehensive symbol list table 3 for math and text and table 189 for math only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \dots that I know is a macro in amsmath, it includes some magic to choose a suitable height for the dots: http://tex.stackexchange.com/questions/649/how-do-magic-dots-work-in-amsmath

@mdboom
Copy link
Member

mdboom commented Aug 20, 2015

The changes to the parser seem like a bit of a hack, but I don't know if that should hold up this PR. I think longer term, the mathtext parser is due for a simplification or overhaul -- it's grown rather crufty with special cases over the years. But perfect is probably the enemy of the good here. At least snowflake is autogenerated here so it should work even as new symbols are added.

@tacaswell
Copy link
Member Author

I picked the name snow flake because it is a hack 😉

I spent a fair amount of time trying to write parsing rules that would not match depending on the next character, but I don't have enough experience with parse/lexxing to be sure if I did not sort it out because I don't know what I am doing or because it can't be done.

In either case, this either needs to be replaced by a better solution or merged for 1.5 as my last attempt to touch the parser really really broke it (and it turns out that comment was right).

@mdboom
Copy link
Member

mdboom commented Aug 28, 2015

I'm happy to merge this once it's rebased and tested again...

This is a follow up to matplotlib#4588

The change in ae91e9f fixed
the symbols that started with accent names, but broke all of
the other accents.

This fixes both by special casing the 8 named symbols that start with
an accent as a prefix.

ex  '\doteq' should be parsed as a single symbol, not as as two symbols
(e, q) with a dot over the e ('\dot{e}q')
*Note this does not update the test image and still passes locally*
zblz and others added 3 commits August 28, 2015 22:50
Updated
 - 01 (snowflake)
 - 05 (spacing)
 - 06 (spacing)
 - 20 (spacing)
 - 21 (spacing)
 - 37 (spacing)
 - 53 (spacing)
 - 54 (spacing)

Added:

 - 77 (snowflake testing)
@tacaswell
Copy link
Member Author

@zblz @mdboom Both are rebased on current master and all of the test updating shoved into one commit.

@tacaswell
Copy link
Member Author

The second round of updated test images is cases where all 9 tests failed for that test at a tolerance of 15

Changed my mind about including tests at a higher threshold that is currently required, I don't have the bandwith to check that the new images are any more correct that the old.

@tacaswell
Copy link
Member Author

ping @mdboom

mdboom added a commit that referenced this pull request Sep 1, 2015
@mdboom mdboom merged commit b7ef642 into matplotlib:master Sep 1, 2015
@tacaswell tacaswell deleted the fix_mathtext_accents branch September 1, 2015 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: text
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants