Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coordinates: ICRSCoordinates should accept coordinate strings with no space #1115

Closed
adrn opened this issue May 19, 2013 · 29 comments
Closed

Comments

@adrn
Copy link
Member

adrn commented May 19, 2013

The ICRSCoordinates initializer should probably accept:

ICRSCoordinates("J061800.25+203142.5")
ICRSCoordinates("061800.25+203142.5", unit=(u.hour,u.degree))

Right now these fail for a number of reasons:

  • the preceding J
  • no space in the coordinate string
  • the angle parser can't handle angles specified without a delimiter, e.g.: 061800.25

I started working on fixing this, but got lost in the "new" pyparsing code...

@mdboom
Copy link
Contributor

mdboom commented May 20, 2013

I think the best approach will be to create a new parser that imports productions from the existing angle parser. (I don't know if this is even possible with PLY). I'll have a look at it when I have a chance (getting out from under the matplotlib release candidate cycle).

@ghost ghost assigned mdboom May 20, 2013
@eteq
Copy link
Member

eteq commented May 20, 2013

There was a long discussion about this at the time of the last coordinates revision - the agreement was to not put this in the ICRSCoordinates initializer, but rather in a "parser" function (which has yet to be made). I think that will probably change to the "high-level" class we have lined up for the 0.3 revision.

So I agree with @mdboom that we will want to use a proper parser, but I think it probably should not go in ICRSCoordinates.

@adrn
Copy link
Member Author

adrn commented May 21, 2013

Ah, I don't remember that discussion. What was the reasoning? Seems to me that, e.g. J161508.34+612351.51 is a pretty standard format?

@eteq
Copy link
Member

eteq commented May 21, 2013

I'm actually rather neutral on the topic, but my understanding of the decision is that the class constructors were to have relatively straightforward syntax, without too much "magical" parsing - basically anything that can be just split off and delegated to the Angle constructor. Any more complicated parsing (admittedly, a somewhat arbitrary distinction) would be left to the "parsing" function.

The idea, I think, was to avoid "parsing creep" - that is, there are certainly other slightly more obscure ways of writing down coordinates, and we don't want to just slowly add more and more formats until there's no distinction between the constructor and parser.

Another thing that occurs to me now, though: What do we do if a user does FK4('J161508.34+612351.51')? The "J" means J2000 equinox , but FK4 is normally un the B1950 equinox. So does that mean "transform form J200 to B1950" or "use these numbers as the coordinates, and the J just indicates the format"? Whereas a parser function of the "high-level" class we're talking about now would be able to assume an appropriate system without those ambiguities.

@mdboom
Copy link
Contributor

mdboom commented May 21, 2013

My view on this as a non-astronomer is this: if there are file formats that include coordinates in this way (without the space and the J prefix etc), then we should try to support that. Otherwise, supporting Python/Numpy numbers is preferable to supporting all kinds of magical strings.

Once we make a determination on that, I'm happy to look into building a coordinate parser if we decide to go that way.

@eteq
Copy link
Member

eteq commented Jun 11, 2013

My similar but slightly different take on @mdboom's version is this: if there are papers that give coordinates in this way, then we should support it (in addition to file formats, as @mdboom says).

By that metric this should be in there, because you do sometimes see "J061800.25+203142.5" or similar in a paper without the coordinates given in some format (usually as a passing reference to some particular object, but still).

But I would still say this should all go in the "higher-level" class we have on the drawing board for v0.3 (which would sumsume the parsing function we discussed earlier), rather than the at the level of Angle or the current coordinate classes.

@taldcroft
Copy link
Member

Don't forget that format ambiguities are common in well-known catalogs, e.g. IGR J01234-2338 (is it 0h 12m 34s or 01h 23.4m? answer is the latter), or PKS 0123+082 (that's 8.2 degrees, not 82 degrees). I think the original case from @mdboom is unambiguous, but if you actually go through all papers and source catalogs in astronomy you will find that astronomers have been creative with source naming when it comes to including the coordinates in the name. This mostly applies to older formats with fewer digits as shown above.

@eteq
Copy link
Member

eteq commented Jun 18, 2013

That's a good point, and is part of why I'm thinking this should go in the higher-level class/parser function. Then we have one place where you add code to do things like checking if "PKS" is at the start of the string, and if so, interpret the coordinates appropriately. Then the (likely rather confusing) parsing code can be kept separate of the actual coordinate-handling code.

@astrofrog astrofrog modified the milestones: v1.0.0, v0.4.0 May 8, 2014
@eteq
Copy link
Member

eteq commented May 14, 2014

@adrn - this issue isn't relevant as-is, because ICRSCoordinates no longer exists. For 1.0, this sort of capability will probably be subsumed into SkyCoord either as class methods or with more intelligent parsers.

But the issue still technically exists in that it is on SkyCoord now. So I'll live it to you whether you want to re-name this issue and leave it open, or just close it - either way is fine with me.

@mhvk
Copy link
Contributor

mhvk commented Jul 11, 2014

@adrn, @eteq - came across this while looking for scipy spring ideas. My 2¢ on this is that we should support decoding all IAU-allowed names. That includes all examples given above. It may be best to first write the parser and then see how to use it in SkyCoord.

@taldcroft
Copy link
Member

I think we need a more general and pluggable infrastructure to handle parsing string coordinates. We already have the case of SunPy wanting to parse strings like N02W22. I'm not convinced that such parsing (e.g. of all IAU-allowed names) should be an automatic default within SkyCoord or a separate function, but we can discuss.

@eteq
Copy link
Member

eteq commented Jul 11, 2014

I agree with @taldcroft here - this was one of the major motivations for SkyCoord: to allow it to implement this sort of parsing, while the lower-level coordinate tasks are done by the frame classes.

That said, @mhvk may be right that the easiest way to proceed is to start with the parser and then add it to SkyCoord after it's working. It might make sense to look at how Angle implements its parser (using ply from astropy.extern.ply)

@astrofrog
Copy link
Member

Closed by #2920

@astrofrog astrofrog reopened this Jan 20, 2015
@astrofrog
Copy link
Member

Sorry, this issue is not completely addressed by #2920 (there was a comment there that said it closed this issue). The strings here are still not parsed, and I'm not sure if we decided we should or not. There was a comment above by @taldcroft that we should support IAU names at least. Removing the 1.0 milestone.

@astrofrog astrofrog removed this from the v1.0.0 milestone Jan 20, 2015
@bsipocz
Copy link
Member

bsipocz commented Jan 20, 2015

I've looked at these formats while doing #2920. The first one is easy, one must really need to use the unit as both u.deg or u.hour can be valid unit of RA for this format.

Regarding the second, I wasn't sure whether to include them with hard wired fix lenght as HHMMSS.ss+DDMMSS.s and (D)DDMMSS.ss+DDMMSS.s.

The current example is OK, but it can be quite ambiguous when one says "000010.25+000055.3" as it can be parsed as a decimal u.hour and u.deg. However no one would write that many zeros in front of a decimal value...

btw is there an official IAU format/name list somewhere? I only found examples of supported and avoidable formats, but nothing comprehensive.

@adrn
Copy link
Member Author

adrn commented Jul 3, 2016

@bsipocz @astrofrog This IAU spec document states that RA is always specified in HMS, so now I don't think there is a unit ambiguity. In this document, it doesn't look like there is much flexibility and this could just be implemented as a regex... The relevant text from the linked document:

  • Equatorial Coordinates shall always be preceded by J if they are for the standard equinox of J2000.0 (i.e., IRCS position or FK5-based, Julian equinox 2000.0 system). They should be preceded by B if they are for the old standard equinox of B1950.0 (i.e., Bessel-Newcomb FK4-based, Besselian equinox 1950.0 system). Galactic coordinates shall be preceded by a G
  • Coordinates shall be specified as LLL.ll+BB.bb or LLL.ll-BB.bb for galactic coordinates, and as HHMMSS.ss+DDMMSS.s or HHMMSS.ss-DDMMSS.s for equatorial coordinates (without spaces)
  • Coordinates shall be truncated (not rounded), thus defining a unique (small) field on the sky in which the source is located. The truncation should also operate when the right-most digit represents a decimal part. The right-most digit of the field HHMMm should be computed as m=int(SS/6)

I think this regex captures everything outlined in the document:

(\s|^)([JB]{0,1}(\d{6}|\d{4}|\d{2})(\.\d+){0,1}[\+\-](\d{6}|\d{4}|\d{2})(\.\d+){0,1})|(G\d{3}\.*\d*[\+\-]\d{2}\.*\d*)

e.g., try it on these examples:

J061800.25+203142.5
SDSS J061800.25+203142.5
RX J1426.8+6950
PSR J1302-6350
PN G001.2-00.3
QSO B004848-4242.8
QSO B004848-4242
SDSS 061800.25+203142.5
B004848-4242.8
061800.25+203142.5

and these should fail:

QSO 00484-4242 
GRO J317-85

@mhvk
Copy link
Contributor

mhvk commented Jul 4, 2016

Yes, we should still have a parser! But be sure to also capture odd numbers of digits to get coordinates like QSO^0048-427

If the number of digits is odd and fewer than six, the right-most digit represents a decimal part of hours, degrees or minutes (as, e.g., in the PKS-style HHMM+DDd or in IRAS source designation HHMMm+DDMM) and not tens of minutes or seconds (e.g.. the formats HHMMS or +DDM should be avoided).

@adrn
Copy link
Member Author

adrn commented Jul 5, 2016

I'll have to look in to implementing this in a parser -- I have no experience with that and don't understand @mdboom 's original comment!

I think the best approach will be to create a new parser that imports productions from the existing angle parser. (I don't know if this is even possible with PLY).

@bsipocz
Copy link
Member

bsipocz commented Jul 5, 2016

@adrn - If you prefer I can look into this and do PR in the weekend, as I recall in #2920 my major problem was that there wasn't a complete list what needed to be accepted and what shouldn't.

@adrn
Copy link
Member Author

adrn commented Jul 6, 2016

@bsipocz aha! I think we just need to modify the code from that PR (#2920) slightly to handle all cases that are described in the document above. If you feel up to it, go for it! Otherwise I can take a crack at it in a few days.

@bsipocz
Copy link
Member

bsipocz commented Jul 6, 2016

@adrn - Not before the weekend, but then I probably can do it.

@adrn
Copy link
Member Author

adrn commented Aug 10, 2016

Pinging on this! @bsipocz

I can give it a try if you are swamped.

@taldcroft
Copy link
Member

I'm good with the general idea of this now. I just have to laugh/cry when I see the contortions that ensue from this insistence on sexagesimal. Is it the intent here to fully implement this IAU spec?

Coordinates using an even number of digits (in either RA or Dec), fewer than seven, are expressed in the sexagesimal system. The sequences HHMM.mm or DD.dd where mm and dd are decimal parts of a minute or degree, respectively, should be avoided. If the number of digits is odd and fewer than six, the right-most digit represents a decimal part of hours, degrees or minutes (as, e.g., in the PKS-style HHMM+DDd or in IRAS source designation HHMMm+DDMM) and not tens of minutes or seconds (e.g.. the formats HHMMS or +DDM should be avoided). If the number of digits is more than six, the digits in excess of six are decimal parts of seconds of time for RA or of angle for Dec ; explicit use of the decimal points is encouraged (e.g., HHMMSS.ss or DDMMSS.s).

@mhvk
Copy link
Contributor

mhvk commented Aug 18, 2016

The contortions mostly come about because both radio and X-ray astronomers had so many coordinates of poor quality... Anyway, I do think if we implement the specification, we might as well implement it fully. (Though we cannot avoid that, surely, people regularly will continue to assume coordinates are rounded rather than truncated...)

@adrn
Copy link
Member Author

adrn commented Jan 31, 2017

Reviving this thread! @bsipocz are you still interested in trying this?

@bsipocz
Copy link
Member

bsipocz commented Jan 31, 2017

@adrn - Indeed, this slipped through multiple times. I try to get back to it this week.

(and I wish github would finally introduce a personal todo to help to avoid cases like this...)

@astrofrog astrofrog assigned bsipocz and unassigned mdboom Jan 31, 2017
@bsipocz bsipocz added this to the v2.0.0 milestone May 7, 2017
@bsipocz bsipocz removed this from the v2.0.0 milestone Jun 26, 2017
@adrn
Copy link
Member Author

adrn commented Sep 18, 2017

👋 O hai @bsipocz

@bsipocz
Copy link
Member

bsipocz commented Sep 18, 2017

ouch

@adrn
Copy link
Member Author

adrn commented Dec 3, 2018

This now works through SkyCoord.from_name, e.g.,
coord.SkyCoord.from_name("J194428-420209", parse=True)

@adrn adrn closed this as completed Dec 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants