New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise exception when fast_reader called with incompatible formats, guess all available fast formats #5578

Merged
merged 25 commits into from May 12, 2018

Conversation

Projects
None yet
7 participants
@dhomeier
Contributor

dhomeier commented Dec 9, 2016

Test the updated exception handling in incompatible reader/fast_reader settings (see #5574) to check if they pass CI, and improve guess strategy to work with all supported formats of the fast reader.

  • with fast_reader explicitly requested, options and formats not supported by same now raise a ParameterError rather than silently falling back to a slow reader.

  • for tables with a mismatches in the number of data columns the fast reader now raises an InconsistentTableError just like its Python counterpart, instead of a CParserError. A different return remains in case of fewer columns specified in the header, where the Python parser raises a ValueError.

  • where fast_reader is enabled, guess=True should now try all formats available in fast versions instead of only the first one (FastBasic) before falling back on the slow readers (if fast_reader has been flagged as optional, i.e. fast_reader=True, but not 'force' or a specific fast_reader dict).

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Dec 9, 2016

Well, appears MSVC users had some of that functionality all along...

strtod expects nptr to point to a string of the following form:
[whitespace] [sign] [digits] [.digits] [ {d | D | e | E}[sign]digits]
@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Dec 14, 2016

I've revisited the speed tests in the ascii_read_bench notebook to investigate the performance penalties; as expected for optimum performance with all table sizes you should still turn guessing off.
ascii.read(format='basic', guess=False):
ascii_basic-noguess
ascii.read(guess=True):
ascii-guess

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Dec 14, 2016

What's the "Fast converter"?

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Dec 14, 2016

In the tests above, it's with the fast_reader=dict(use_fast_converter=True) options (and actually also with the new exponent_style='Fortran' option, but the exponent handling does not seem to have a noticeable performance impact).

@MSeifert04

This comment has been minimized.

Contributor

MSeifert04 commented Dec 14, 2016

okay, then what is "io.ascii Fast-C"? Is this the same but with not-fortran-style exponents?

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Dec 14, 2016

No, with use_fast_converter=False, i.e. using the stdlib strtod() parser instead of the optimised xstrtod(). exponent_style is an additional option available for the latter (see #5552 for details).

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Dec 14, 2016

The notebook needed a couple of updates to get it to work with the current master:
ascii_read_bench

@pllim

This comment has been minimized.

Member

pllim commented Jan 15, 2017

So, this PR has evolved beyond the original intend of just raising errors to fix test failures in parallel mode, is my interpretation correct?

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Jan 15, 2017

The original issues in #5574 were not specifically related to parallel mode, but more generally to requests of fast_reader-specific options when actually a slow reader was called. The latter turned out to happen much more often than expected, so yes, a large part of the PR now is avoiding these fallbacks to the slow reader. This includes that _guess will now in most cases find the correct fast reader.

@pllim

This comment has been minimized.

Member

pllim commented Jan 15, 2017

Not being familiar with the functionality myself, is it correct to assert this as an API change that effects released version? Just need to know for tagging and milestoning. Thanks!

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Jan 15, 2017

Not sure if it qualifies as an API change; ascii.read() will now stick with the fast reader in a number of cases where it would have fallen back to the slow reader previously. The only cases where this should change results are for the reading of Fortran-formatted data just introduced in 1.3. E.g. in the released version only space-separated ('basic') formats would have been correctly detected:

>>> ascii.read("a b\n0.12d0 2.d3", fast_reader=dict(exponent_style='d'))
<Table length=1>
   a       b   
float64 float64
------- -------
   0.12  2000.0
>>> ascii.read("a,b\n0.12d0,2.d3", fast_reader=dict(exponent_style='d'))
<Table length=1>
  a     b  
 str6  str4
------ ----
0.12d0 2.d3

whereas now

>>> ascii.read("a,b\n0.12d0,2.d3", fast_reader=dict(exponent_style='d'))
<Table length=1>
   a       b   
float64 float64
------- -------
   0.12  2000.0
@pllim

This comment has been minimized.

Member

pllim commented Jan 17, 2017

The only cases where this should change results are for the reading of Fortran-formatted data just introduced in 1.3. E.g. in the released version only space-separated ('basic') formats would have been correctly detected

Thanks for the clarifications! This sounds like a bug fix, so I'll tag is as so for now. Subpackage maintainer can re-tag as appropriate.

@pllim pllim added this to the v1.3.1 milestone Jan 17, 2017

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Jan 18, 2017

Yes, I'd certainly label it as a bug fix (even where the results were not changed, in some instances it would silently fall back on the slower reader). Would still be interesting to know it it in fact solves #5693 as well, or if there is more work to do.

@pllim

This comment has been minimized.

Member

pllim commented Jan 18, 2017

Pinging @olebole in regards to testing this for #5693

@olebole

This comment has been minimized.

Member

olebole commented Jan 19, 2017

@plim That is difficult since I have no direct access to the framework that produced the error -- it is run only on our unstable and testing areas, which are the official and proposed packages. Because of our upcoming freeze, I don't want to put that into unstable just for a test.

@bsipocz

This comment has been minimized.

Member

bsipocz commented Feb 15, 2017

Given the previous comment, I'm re-milestoning this to v1.3.2.

@bsipocz bsipocz modified the milestones: v1.3.2, v1.3.1 Feb 15, 2017

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented Mar 9, 2017

@bsipocz, if I understood @olebole right, it is now merged into Debian unstable and passes the tests.
So all we do not know for sure at this point is, if this was actually this PR that fixed #5693, or if the error simply did not show up because it only appears erratically.
Not sure what else is left to do here, except perhaps construct a much larger and more complex table to the test that might force the test failure to show up.

@bsipocz

This comment has been minimized.

Member

bsipocz commented May 10, 2018

@taldcroft - It seems that I've managed to rebase, pushed the result to my fork to be sure that travis passes. (The diff be viewed here: https://github.com/astropy/astropy/compare/master...bsipocz:dhomeier_fortranexp2?expand=1, travis is running here: https://travis-ci.org/bsipocz/astropy/builds/377092449)

If you're happy with it, I can force push it to this branch.

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented May 10, 2018

Thanks @bsipocz; I assume the download failure in the coverage tests is unrelated.
I have two more commits adding a test for the fast_reader dict staying intact (failing on master, passing here) and an update to the docs. They are simple additions to test_read.py and read.rst, so should be trivial to merge either before or after you've pushed your fork - just let me know when to push mine.

@bsipocz

This comment has been minimized.

Member

bsipocz commented May 10, 2018

@dhomeier - Please add those commits to here. I can always cherry pick those commits, but would rather not force push the rebase without @taldcroft's approval.

@taldcroft

This comment has been minimized.

Member

taldcroft commented May 12, 2018

@bsipocz - thanks!! It looks like there are a few diffs that I don't entirely understand between your rebase and the last commit when you did the rebase (before the last two), i.e.

git diff dee6318 dhomeier_fortranexp2

However, these are all just trivial formatting / whitespace diffs so I am 👍 doing the force push, then cherry-picking the remaining commits. (And possibly doing a final clean-up to re-apply the format/whitespace diffs).

@bsipocz bsipocz force-pushed the dhomeier:fortranexp2 branch from e87142f to d009884 May 12, 2018

@bsipocz

This comment has been minimized.

Member

bsipocz commented May 12, 2018

@taldcroft - Indeed, those whitespace stuff were rebase remnants from an earlier commit. I've cleaned them up in the last cleanup commit.
To play it safe a copy of the original wast kept in the dhomeier_fortranexp2_backup_original branch, the only remaining diff is kept to be more consistent with the locally preferred line length.

@taldcroft

This comment has been minimized.

Member

taldcroft commented May 12, 2018

I don't know of any outstanding issues left.

@dhomeier and @bsipocz: good for merge?

@bsipocz

This comment has been minimized.

Member

bsipocz commented May 12, 2018

@taldcroft - yes, I think it's good to go.

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented May 12, 2018

@taldcroft, seems I cannot view the current changeset after the forced push, but from my last commit and your cleanup everything looks good to me, too.

@taldcroft taldcroft merged commit ad818a9 into astropy:master May 12, 2018

8 checks passed

astropy-bot All checks passed
Details
ci/circleci: 32bit Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl153 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl202 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl212 Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 84.786%
Details
@taldcroft

This comment has been minimized.

Member

taldcroft commented May 12, 2018

🎉
Thanks @dhomeier and @bsipocz !

@bsipocz

This comment has been minimized.

Member

bsipocz commented May 13, 2018

🎉

Thanks @dhomeier for the contribution and your patience!

@hamogu

This comment has been minimized.

Member

hamogu commented May 13, 2018

Adding to the list, this really was a concentrated effort by all three of you, @dhomeier, @taldcroft, and @bsipocz. The PR would not have been finished in this form if any of you had not put in the large effort that you did to finish this.

It's a testament to the complexity of our ascii reader classes that it needed that much effort, but thanks to you be can continue to improve the performance and consistency to make them even better.

@dhomeier

This comment has been minimized.

Contributor

dhomeier commented May 14, 2018

Thanks @taldcroft and @bsipocz for helping to get this PR into a mergeable state, this was indeed a task that had grown beyond my means!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment