METAR parsing #70

Closed
dopplershift opened this issue Jul 2, 2015 · 19 comments
Labels
Area: IO Pertains to reading data Type: Feature New functionality

@dopplershift
Member

Add METAR parsing to facilitate replacing the old netcdf-perl decoders for NOAAPORT METAR messages.

@dopplershift dopplershift added the Type: Feature New functionality label Jul 2, 2015
@dopplershift dopplershift self-assigned this Jul 2, 2015
@akrherz
Contributor

akrherz commented Jul 24, 2015

My current source of this functionality is https://github.com/phobson/python-metar

My only tweak to the library is an attempt to be more aggressive with the parse failures it encounters, so as to get more data: I take the fragment it complains about, cull it from the original METAR, and then try again.

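from io import StringIO
import traceback

from metar.Metar import Metar, ParserError as MetarParserError  # assumed alias
from twisted.internet import reactor

try:
    mtr = Metar(clean_metar, allexceptions=True)
except MetarParserError as inst:
    tb = StringIO()
    traceback.print_exc(file=tb)  # capture the traceback text for logging
    errormsg = str(inst)
    if errormsg.startswith("Unparsed groups: "):
        # Cull the unparsed fragment from the METAR and try again
        tokens = errormsg.split(": ")
        newmetar = clean_metar.replace(tokens[1].replace("'", ""), "")
        if newmetar != clean_metar:
            # Re-run the (user-defined) process_site on the next reactor tick
            reactor.callLater(0, process_site, orig_metar, newmetar)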
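A minimal, self-contained sketch of the same cull-and-retry idea, without Twisted: a bounded loop stands in for rescheduling through `reactor.callLater`. `toy_parse` and `ParseFailure` are hypothetical stand-ins for python-metar that only mimic the "Unparsed groups: " message format used above. One deliberate difference: the fragment is removed only where it stands as whole whitespace-delimited group(s), so culling `RA` would not also mangle `TSRA`.

```python
import re

# Hypothetical stand-in for a real METAR parser: it accepts only a few group
# shapes and reports the first unrecognized group using the same
# "Unparsed groups: '...'" message format as in the snippet above.
KNOWN = re.compile(r"^(?:[A-Z]{4}|\d{6}Z|\d{5}KT|\d+SM|\d{2}/\d{2}|A\d{4})$")


class ParseFailure(Exception):
    pass


def toy_parse(metar):
    for group in metar.split():
        if not KNOWN.match(group):
            raise ParseFailure(f"Unparsed groups: '{group}'")
    return metar.split()


def parse_with_culling(metar, parse=toy_parse, max_tries=5):
    """Parse a METAR, culling the reported fragment and retrying on failure.

    Returns (parsed_result, metar_actually_parsed).
    """
    for _ in range(max_tries):
        try:
            return parse(metar), metar
        except ParseFailure as err:
            msg = str(err)
            if not msg.startswith("Unparsed groups: "):
                raise
            fragment = msg.split(": ", 1)[1].replace("'", "")
            # Remove the fragment only where it stands as whole group(s),
            # then normalize the whitespace left behind.
            pattern = r"(?<!\S)" + re.escape(fragment) + r"(?!\S)"
            culled = " ".join(re.sub(pattern, "", metar, count=1).split())
            if culled == metar:  # nothing removed: give up
                raise
            metar = culled
    raise ParseFailure(f"gave up after {max_tries} attempts")
```

Returning the METAR that actually parsed makes it easy to log what was thrown away.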

@dopplershift
Member Author

Need to make sure we handle this case from THREDDS bug report (TDS-685):

"""
I confirmed this error recently with an observation from KPSX during Tropical Storm Bill. The METAR correctly shows the peak wind speed is 50 knots, but the IDD encoded field header indicates that it is 50 meters per second, which is incorrect.

KPSX 162341Z AUTO 16031G43KT 2SM +RA BR SCT012 BKN018 OVC033 26/25 A2967 RMK AO2 PK WND 15050/2320 VIS 1V4 LTG DSNT NW RAB20 P0006 T02610250 $
"""

@akrherz
Contributor

akrherz commented Sep 17, 2015

@dopplershift Is the thought to write your own parser, or...?

>>> from metar import Metar
>>> m = Metar.Metar('KPSX 162341Z AUTO 16031G43KT 2SM +RA BR SCT012 BKN018 OVC033 26/25 A2967 RMK AO2 PK WND 15050/2320 VIS 1V4 LTG DSNT NW RAB20 P0006 T02610250 $')
>>> m.peak_wind()
'SSE at 50 knots at 23:20'

@dopplershift
Member Author

Maybe? I'll evaluate python-metar first, but usual tradeoffs of adding a dependency apply. Certainly, I think MetPy should provide functionality of reading a METAR out of the box--the question becomes how much re-invention takes place.

@dopplershift dopplershift added the Area: IO Pertains to reading data label Jan 26, 2016
@dopplershift dopplershift modified the milestone: SEA Conference Jan 26, 2016
@dopplershift dopplershift added the Status: Need Info More information is required to act on the issue label Jan 30, 2016
@dopplershift dopplershift removed this from the SEA Conference milestone Apr 4, 2016
@dopplershift dopplershift added this to the Fall 2016 Release (0.4) milestone Sep 14, 2016
@dopplershift dopplershift removed the Status: Need Info More information is required to act on the issue label Sep 14, 2016
@dopplershift dopplershift modified the milestones: Fall 2016 Release (0.4), Winter 2017 Release (0.5) Oct 14, 2016
@dopplershift
Member Author

Should probably look at a smarter text parsing solution than regular expressions. Options:

  • state machines, maybe with pynini
  • Actual parsing engine, like ply

@jrleeman jrleeman removed this from the Spring 2017 milestone Mar 10, 2017
@dopplershift dopplershift modified the milestone: Summer 2017 Mar 10, 2017
@dopplershift dopplershift modified the milestones: Summer 2017, Fall 2017 Jul 19, 2017
@jrleeman jrleeman modified the milestones: Fall 2017, Winter 2017 Oct 26, 2017
@jrleeman jrleeman removed this from the 0.7 milestone Nov 15, 2017
@dopplershift dopplershift added this to the 0.9 milestone May 18, 2018
@akrherz
Contributor

akrherz commented Jul 20, 2018

@dopplershift Is this item still intended to be done for release 0.9? I would like to help out, but don't think I am quite clever enough to implement what you have in mind. If you could put some scaffolding in place, perhaps I could take up the task from there and see if I can grok what you are doing :)

@dopplershift
Member Author

So I'm not sure if it's going to make 0.9, but it is priority 1a once I get to work on my own code. What I'm doing is looking at using a proper parser, Canopy, which can generate the Python code from a language spec. Here's what I have for METAR so far:

grammar METAR
ob       <- metar? sep siteid sep datetime sep wind sep vis sep skycover sep temp sep remarks end
metar    <- "METAR"
sep      <- " "
siteid   <- [0-9A-Z] [0-9A-Z] [0-9A-Z] [0-9A-Z]
datetime <- [\d]+ "Z"
wind     <- [\d]+ "KT"
vis      <- [\d] [\d] [\d] [\d]
skycover <- cover (sep cover)*
cover    <- ("SCT" / "BKN" / "OVC") [\d]+
temp     <- [\d] [\d] "/" [\d] [\d]
remarks  <- [^=\n]*
end      <- "="

I'd be happy to have this expanded to parse the rest of the METAR syntax--though my plans were NOT to be exhaustive about the remarks section, but only pull out the important information.

From there it's "just" a matter of using the parser to parse files and put the results into a Pandas data frame. (And for me then put into a CF-compliant netCDF file.)

@dopplershift dopplershift modified the milestones: 0.9, 0.10 Jul 26, 2018
@David-Gil

David-Gil commented Jul 31, 2018

Hi! Is this canopy parser somewhere I can track/contribute to its progress?

At a first glance I can see some issues with it. For example, the cover can also be "FEW", among other options I can't recall right now.

Besides, metar can also be "SPECI" or "METAR COR".

Edit: from the code of the WIP PR I've found I can see that you're completely aware of the issues I've mentioned 😅. Anyway, I am extensively working with metars right now, and trying to find a good way to parse them, so if I can be of any help, just ask.

Looking forward to your response! 😊

@dopplershift
Member Author

I'll see about putting up a PR/branch with the progress I have so far. At that point, contributions can be done via additional PRs to that branch.

And oh yeah, what I posted above is little more than a proof-of-concept that parsed the single METAR I was working with on the plane. 😁

@David-Gil

I'm sorry! Next time I'll check who I'm talking to and where they work 😅🙈🙈🙈

Please let me know when the branch is available and I'll check it out. I have to compute monthly statistics from the METARs of two airports, so I have plenty of METARs to parse.

I work as a programmer at the Spanish weather service, AEMET.

Cheers!

@dopplershift
Member Author

No worries! Thanks for the interest!

@jrleeman jrleeman removed this from the 0.10 milestone Dec 12, 2018
@eliteuser26
Contributor

eliteuser26 commented Jan 11, 2019

You can do the same type of decoding with regular expressions, using the same approach as the code used for weather station plots on Matplotlib maps in MetPy. I don't know if that makes sense; I think this is what you were mentioning above.

@eliteuser26
Contributor

eliteuser26 commented Apr 4, 2019

@dopplershift The decoder I use for METAR observations is written in PHP. However, the code can be adapted to Python, which would be easier to write than PHP. The result of decoding METAR observations in PHP can be seen on this website:

http://cdicaire.synology.me/weather/wxobs.php

If you have trouble viewing this web page, I can send a screenshot of it. It took a lot of thought and some time to come up with the code. In Python, I would develop a Metar class in much the same fashion as the classes in MetPy, and separate the METAR code into two parts: the current weather observation and the remark section.

The way to decode the weather observation is to treat it as weather groups, just like WMO code, by splitting it into groups with the string split function. Weather groups would include, for example, the station ID, date and time, wind direction and speed, visibility, precipitation/obstruction, and temperature/dewpoint. Sea level pressure would be extracted from the remark section. Each group would then be passed to its corresponding method in the Python class, which decodes the information and stores it in the appropriate class attributes. Then I can attach the Pint units to be used in the MetPy code and station plot.

I haven't written the Python code yet, but I will use the PHP version as a template; to my surprise, that approach works quite well. I have always wanted an easy way to decode METAR observations. The difficult part will be decoding the precipitation, which, as you know, is a puzzle. I can develop test code to see if my theory works.
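A minimal sketch of the split-into-groups approach described above: split off the remarks at `RMK`, split the body on whitespace, and route each group to a category by the first regex it matches. The patterns cover only a small, illustrative subset of METAR syntax (for example, a visibility written as `1 1/2SM` spans two groups and is not handled), and the category names are hypothetical.

```python
import re

# Illustrative subset of group shapes; the first matching pattern wins.
GROUP_PATTERNS = [
    ("datetime", re.compile(r"^\d{6}Z$")),
    ("modifier", re.compile(r"^(?:AUTO|COR)$")),
    ("wind", re.compile(r"^(?:\d{3}|VRB)\d{2,3}(?:G\d{2,3})?(?:KT|MPS)$")),
    ("visibility", re.compile(r"^(?:\d{4}|\d+SM|\d/\d+SM|CAVOK)$")),
    ("sky", re.compile(r"^(?:(?:FEW|SCT|BKN|OVC)\d{3}(?:CB|TCU)?|CLR|SKC)$")),
    ("temp_dewpoint", re.compile(r"^M?\d{2}/M?\d{2}$")),
    ("altimeter", re.compile(r"^[AQ]\d{4}$")),
    ("weather", re.compile(
        r"^(?:[-+]|VC)?(?:MI|BC|DR|BL|SH|TS|FZ)?"
        r"(?:DZ|RA|SN|SG|PL|GR|GS|UP|BR|FG|FU|HZ|DU|SA)+$")),
]


def split_groups(metar):
    """Split a METAR into classified body groups and the raw remark section."""
    body, _, remarks = metar.strip().rstrip("=").partition(" RMK ")
    groups = body.split()
    if groups and groups[0] in ("METAR", "SPECI"):
        groups = groups[1:]  # drop the report type
    decoded = {"station": groups[0], "remarks": remarks.strip() or None}
    for group in groups[1:]:
        for name, pattern in GROUP_PATTERNS:
            if pattern.match(group):
                decoded.setdefault(name, []).append(group)
                break
        else:  # no pattern matched
            decoded.setdefault("unknown", []).append(group)
    return decoded
```

Collecting unmatched groups under `unknown` keeps unexpected input visible instead of silently dropping it.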

I would need to use regular expressions or Python's startswith/endswith string methods to decode the weather elements.

@dopplershift
Member Author

So I'm really interested in using an actual parsing framework to handle the METAR. However, this is what I have thus far:

grammar METAR
ob       <- metar siteid datetime auto wind vis curwx skycover temp altim remarks end
metar    <- (("METAR" / "SPECI"))?
sep      <- " "+
siteid   <- sep? [0-9A-Z] [0-9A-Z] [0-9A-Z] [0-9A-Z]
datetime <- sep [\d]+ "Z"
auto     <- (sep "AUTO")?
wind     <- sep ([\d]+ / ("VRB" [\d] [\d])) ("G" [\d]+)? ("KT" / "MPS") varwind?
varwind  <- sep [\d] [\d] [\d] "V" [\d] [\d] [\d]
vis      <- sep (([\d] [\d] [\d] [\d]) / ([\d] ([\d] / ((" " [\d])? "/" [\d]))? "SM") / "CAVOK")
curwx    <- (sep wx)*
wx       <- ([-+] / "VC")? ("FZ" / "TS" / "SH")? ("FG" / "DZ" / "RA" / "BR" / "TS" / "SN" / "HZ")
skycover <- (sep cover)*
cover    <- (("FEW" / "SCT" / "BKN" / "OVC") [\d]+ ("TCU" / "CB")?) / "CLR"
temp     <- sep [M]? [\d] [\d] "/" [M]? [\d] [\d]
altim    <- sep [QA] [\d] [\d] [\d] [\d]
remarks  <- (sep "RMK" .*)?
end      <- sep? "="

I'm not sure if that's actually any better than what you're proposing. So I guess it's a matter of how messy the code is and how slow it is (or not). In the end we need code we can maintain that can convert a set of METAR observations into a Pandas DataFrame (or maybe an Xarray Dataset).

@eliteuser26
Contributor

eliteuser26 commented Apr 9, 2019

I was thinking along the same lines: I would decode each group of the weather observation in its own method of a class. The idea was to create separate parameters that can be passed back into MetPy, with Pint units attached accordingly. This is what I had in mind for each parameter:

Class metar
metar.stationid
metar.date
metar.time
metar.temperature
metar.dewpoint
metar.visibility
metar.winddirection
metar.windspeed
metar.pressure
metar.remark
metar.wxobs
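The attribute list above maps naturally onto a Python dataclass. This sketch keeps the proposed names, adds a hypothetical `pressure_units` field, and shows decoders for two groups: temperature/dewpoint, where an `M` prefix means below zero (and the dewpoint may be missing), and the altimeter, where `A` gives hundredths of inches of mercury and `Q` gives hectopascals. Attaching Pint units is left out to keep it dependency-free.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metar:
    # Attribute names follow the list proposed above; types/units are assumptions.
    stationid: str
    date: Optional[int] = None            # day of month (UTC)
    time: Optional[str] = None            # "hhmm" (UTC)
    temperature: Optional[int] = None     # degrees Celsius
    dewpoint: Optional[int] = None        # degrees Celsius
    visibility: Optional[str] = None      # kept as the raw group here
    winddirection: Optional[int] = None   # degrees true
    windspeed: Optional[int] = None       # in the units of the wind group
    pressure: Optional[float] = None      # altimeter setting
    pressure_units: Optional[str] = None  # hypothetical extra field: "inHg" or "hPa"
    remark: Optional[str] = None
    wxobs: List[str] = field(default_factory=list)


def decode_temp_group(group):
    """Decode 'TT/DD', where an 'M' prefix means below zero; DD may be missing."""
    def signed(value):
        if not value:
            return None
        return -int(value[1:]) if value.startswith("M") else int(value)

    temp, _, dew = group.partition("/")
    return signed(temp), signed(dew)


def decode_altimeter(group):
    """'A2967' -> (29.67, 'inHg'); 'Q1013' -> (1013.0, 'hPa')."""
    if group.startswith("A"):
        return int(group[1:]) / 100, "inHg"
    if group.startswith("Q"):
        return float(group[1:]), "hPa"
    raise ValueError(f"not an altimeter group: {group!r}")


# Usage with groups from the KPSX observation quoted earlier
temp, dew = decode_temp_group("26/25")
pressure, units = decode_altimeter("A2967")
ob = Metar(stationid="KPSX", temperature=temp, dewpoint=dew,
           pressure=pressure, pressure_units=units)
```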

As for the names, it doesn't matter what I use; I will adapt them to whatever is proposed. I know the code was written in PHP, but it decodes the information for 8 stations in 3.5 seconds (about 0.4 seconds per station). I always work on making my Python code run as fast as possible, and if a piece of code can be replaced by something faster, I will use that instead. I have found different ways to parse the METAR information in Python. I am also thinking of possibly decoding more than one METAR at a time by doing a count. I don't have an example to show yet, but I have thought about how to decode the weather observation and possibly convert it to a Pandas DataFrame. I will work on it, since I will use it in my own program to test it. I can also add any additional weather elements you want in the list.

@eliteuser26
Contributor

eliteuser26 commented Apr 9, 2019

@dopplershift I have another question. Do you want the METAR weather observations kept undecoded (raw METAR codes), decoded, or both? This will make a difference in how the Python code decodes the METAR. As in my previous note, the METAR weather observation would be passed to the Metar class and decoded by its different methods. First, the weather groups in the METAR would be separated with Python's split function, using the space delimiter, and each group would be parsed into the proper element. This is very similar to the parsing you suggested, but done in a different way. Each element of the METAR observation could then be saved directly into a Pandas DataFrame.
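A sketch of decoding several METARs at once into column-oriented data: a dict of equal-length lists, which is a form `pandas.DataFrame` accepts directly. It assumes the date/time group immediately follows the station ID, decodes only a few fields, and keeps the raw string alongside the decoded values so both options raised above remain available; the function name is hypothetical.

```python
import re
from collections import defaultdict

TEMP_GROUP = re.compile(r"^(M?\d{2})/(M?\d{2})$")


def _signed(value):
    # METAR writes negative temperatures with an "M" prefix
    return -int(value[1:]) if value.startswith("M") else int(value)


def metars_to_columns(metars):
    """Decode a batch of METAR strings into a dict of equal-length lists."""
    columns = defaultdict(list)
    for raw in metars:
        body = raw.split(" RMK ")[0].rstrip("= ").split()
        if body and body[0] in ("METAR", "SPECI"):
            body = body[1:]
        temp = dew = None
        for group in body[2:]:
            match = TEMP_GROUP.match(group)
            if match:
                temp, dew = _signed(match.group(1)), _signed(match.group(2))
                break
        columns["station"].append(body[0])
        columns["day"].append(int(body[1][:2]))  # assumes "ddhhmmZ" follows the station
        columns["time"].append(body[1][2:6])
        columns["temperature"].append(temp)
        columns["dewpoint"].append(dew)
        columns["raw"].append(raw)               # keep the undecoded ob too
    return dict(columns)
```

With pandas installed, `pandas.DataFrame(metars_to_columns(obs))` would give one row per observation.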

As for the aviation weather code, I found a useful table of present weather symbols, showing the weather phenomena matrix, at this address:

http://www.moratech.com/aviation/metar-class/metar-pg9-ww.html#DECODE

This will answer questions about whether a METAR observation is good or bad. From my own experience, I haven't seen more than 3 precipitation types in a METAR.

@dopplershift
Member Author

I would just go with 3 columns for the current weather, one for each of the maximum number of allowed types. I wonder if it's worth having both the decoded form (into WMO codes for easy symbol plotting) and the raw form (for easy introspection).
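A sketch of the three-column idea, storing both the raw group and a WMO present-weather code per column. The lookup table is a small illustrative subset of WMO code table 4677 (assuming continuous, non-freezing precipitation); groups missing from it map to `None`, and the column names are hypothetical.

```python
# Illustrative subset of WMO code table 4677 (present weather, ww),
# assuming continuous, non-freezing precipitation.
WMO_WW = {
    "HZ": 5, "BR": 10, "FG": 45,
    "-DZ": 51, "DZ": 53, "+DZ": 55,
    "-RA": 61, "RA": 63, "+RA": 65,
    "-SN": 71, "SN": 73, "+SN": 75,
}


def current_weather_columns(wx_groups, max_types=3):
    """Spread up to max_types weather groups over raw and WMO-code columns."""
    groups = list(wx_groups[:max_types])
    groups += [None] * (max_types - len(groups))  # pad missing slots
    row = {}
    for i, group in enumerate(groups, start=1):
        row[f"wx{i}"] = group                   # raw group, for introspection
        row[f"wx{i}_code"] = WMO_WW.get(group)  # None if unknown or empty
    return row
```

Fixed-width columns keep every row the same shape, which is what a DataFrame or netCDF variable needs.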

@eliteuser26
Contributor

eliteuser26 commented Apr 10, 2019

@dopplershift I think you have answered my question. If it is only for easy symbol plotting, then there is no need to keep the raw data; I kept the raw data for verification purposes. I was going to create Python dictionaries to decode the weather elements into the appropriate WMO codes, as you do in some of the MetPy classes. The first thing I need to do is decode the METAR weather observations into a Pandas DataFrame or an Xarray Dataset, and then use this data to produce a Matplotlib plot in MetPy. I think this is the idea you are suggesting. If I didn't get it right, I can adjust the Python code accordingly.

@dopplershift dopplershift added this to the 0.12 milestone Dec 24, 2019
@dopplershift
Member Author

Done with #1081.

5 participants