Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed io.acii read IPAC tables for long column types on Windows #15992

Merged
merged 2 commits into from Feb 8, 2024

Conversation

orionlee
Copy link
Contributor

@orionlee orionlee commented Feb 5, 2024

Description

This pull request is to address reading IPAC table files with long columns on Windows platforms.
It forces long columns to use np.int64 as the default data type (rather than 32-bit integers on Windows).

Fixes #15989 for v6.0.x only. The fix for this issue in main (v6.1 and later) is more comprehensive at #16005 but it cannot be backported.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

Copy link

github-actions bot commented Feb 5, 2024

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@orionlee orionlee force-pushed the io_ipac_long_columns_windows_compat branch from a10438c to 594b4b7 Compare February 5, 2024 00:19
@orionlee orionlee changed the title Fixed io.acii ipac read for long column types on Windows Fixed io.acii read IPAC tables for long column types on Windows Feb 5, 2024
@orionlee orionlee changed the title Fixed io.acii read IPAC tables for long column types on Windows Fixed io.acii read IPAC tables for long column types on Windows Feb 5, 2024
@orionlee orionlee force-pushed the io_ipac_long_columns_windows_compat branch from 594b4b7 to dcaa223 Compare February 5, 2024 00:45
@@ -214,6 +216,10 @@ def get_cols(self, lines):
null = header_vals[3][i].strip()
fillval = "" if issubclass(col.type, core.StrType) else "0"
self.data.fill_values.append((null, fillval, col.name))
if "long".startswith(col.raw_type.lower()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused by this line. Is it not doing more than it should ? Is, e.g. "Lo" a correct way to specify long ints ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure long can be abbreviated as Lo per IPAC spec, but current astropy implementation does allow all kind of abbreviation.

The above line mimics the existing ones in get_col_type() in

if col_type_key.startswith(col.raw_type.lower()):
:

The data type abbreviation has been explicitly tested in:

def test_ipac_abbrev():
lines = [
"| c1 | c2 | c3 | c4 | c5| c6 | c7 | c8 | c9|c10|c11|c12|",
"| r | rE | rea | real | D | do | dou | f | i | l | da| c |",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thank you very much for clarifying this !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving just so that others can see this too!

@@ -214,6 +216,10 @@ def get_cols(self, lines):
null = header_vals[3][i].strip()
fillval = "" if issubclass(col.type, core.StrType) else "0"
self.data.fill_values.append((null, fillval, col.name))
if "long".startswith(col.raw_type.lower()):
# ensure long columns are 64-bit int, to address (Windows-specific):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment specifically targets Windows but I don't see a sys.platform check in the if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue surfaces on Windows. Conceivably, the same problem could arise in some other 32-bit platform, or possibly some non-standard build of numpy [*].

I think mapping long to np.int64 should be safe and correct in all cases.

Thoughts?


[*] When typing is not specified, ascii guesses and assigns type for a column in _convert_vals() .

For integer-like columns, it uses the converter from convert_numpy(int), which boils down to the behavior of np.array([], int). For Windows (even on 64-bit), numpy returns a np.int32 array, while for (most if not all) other platforms, it returns a np.int64 array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your actual patch is not really platform specific, then perhaps update the comment and change log to match? Currently, from the change log, it seems like your fix is only for Windows though in reality it is not confined to Windows. Hope that makes sense. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For changelog (and similarly in the comment), how about:

Fixed reading IPAC tables for ``long`` column type on some platforms, e.g., Windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine by me. Thanks for your understanding!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@orionlee
Copy link
Contributor Author

orionlee commented Feb 6, 2024

@pllim I might have a fix for the underlying #5744 . If it's fixed, this PR is no longer necessary.

@pllim pllim marked this pull request as draft February 6, 2024 18:19
@pllim
Copy link
Member

pllim commented Feb 6, 2024

Wow, thanks! Since this is already approved but there is a chance of it being superseded, I am turning this into draft to prevent pre-mature merging.

@mhvk
Copy link
Contributor

mhvk commented Feb 6, 2024

Just in case it matters, numpy 2.0 will make int64 the default even on 32-bit systems...

@orionlee orionlee force-pushed the io_ipac_long_columns_windows_compat branch from 11a6ab6 to 2ed04cc Compare February 8, 2024 17:31
@orionlee
Copy link
Contributor Author

orionlee commented Feb 8, 2024

I made the branch based on v6.0.x per @Pilim, but it looks like I can't change the PR's target to astropy:v6.0.x, and need a new PR instead.

Update: Nevermind, I figured out how to change the base to v6.0.x .

@orionlee orionlee changed the base branch from main to v6.0.x February 8, 2024 17:35
@pllim pllim added skip-basebranch-check Skip base branch check for direct backports and removed backport-v6.0.x on-merge: backport to v6.0.x labels Feb 8, 2024
@pllim
Copy link
Member

pllim commented Feb 8, 2024

Going to close/reopen to re-trigger checks. Thanks!

@pllim pllim closed this Feb 8, 2024
@pllim pllim reopened this Feb 8, 2024
@pllim pllim marked this pull request as ready for review February 8, 2024 17:45
@pllim

This comment was marked as resolved.

@pllim pllim merged commit 996d700 into astropy:v6.0.x Feb 8, 2024
27 of 31 checks passed
pllim pushed a commit that referenced this pull request Feb 12, 2024
…16005)

* ensure 64 bit int columns remain 64 bit int type on Windows (Python path)

* ensure 64 bit in columns read correctly, C code path

* Fixed tests to reflect changes from reading 64bit integers as ints

* C path int parsing: use int64_t instead of long long

* update the test to reflect the policy of defaulting to 64bit ints in all cases.

* update changelog to stress  the 64-bit by default behavior.

* forward port ipac test from #15992

* small optimization: inline int64 string parsing C function

* Tweak changelog wording and add what's new entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants