This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Add 9am & 3pm state bulletins #21

Closed
mpadge opened this issue Jun 22, 2017 · 11 comments

@mpadge
Collaborator

mpadge commented Jun 22, 2017

Adam, Hugh, and Keith, I'm excited about reviewing this package, and will take this opportunity to confess that I worked as a BoM forecaster for a couple of years, so consider myself well-suited for the job. The first thing I wondered on checking the package out was why it didn't extract the daily (09:00, 15:00) weather bulletins? These data represent the key archived observations for the BoM on which more things depend than any other set of data. They'd be easy to extract, so ... could you add these please?

Example here

Things to consider:

  1. This appears to me to be a fairly striking omission that you could readily address prior to me submitting my review, but if you prefer to keep the review process free of complications, this issue could simply remain until post-review. (There may also be more to come.)
  2. I've often wondered, and don't really know, what the status would be of somebody archiving these daily obs themselves? The BoM ultimately charges for dissemination of the full historical data set, so there may be some kind of infringement issue that might need to be considered?

@adamhsparks
Collaborator

@mpadge,
Thanks for your enthusiasm. Originally this started as a script for another project @Keith-Pembleton and I are working on, https://github.com/ToowoombaTrio/WINS, which only needed the forecast as we were drawing the observed data from SILO. I added the ag bulletin since it was also in an XML format.

@HughParsonage kindly added the current weather after finding the package.

I'll confess, I'd not really considered adding the 9am & 3pm state bulletins, but we can certainly look into it. Are they only a webpage as you've linked to? No json or xml interface?

About the infringement issues, I'm not keen to archive any data with the package, only the metadata about the stations themselves, and provide facilities to retrieve and parse the weather data. But we can check with BoM to be safe. Right now I figured we were fine, since we're retrieving what they are putting out as json and xml...

@HughParsonage
Collaborator

Extracting the 9am and 3pm state bulletins directly would be difficult and I think should be out-of-scope for this package. The data is already captured in get_current_weather, and I foresee that maintaining the function to extract the unstructured page in perpetuity would be a task. Although obviously important and central, that page is intended for internet browsers -- i.e. human, not machine, readers. We could just read the data and print it to the console line-by-line, but that seems not much better than just printing the URL.

On the archiving question, your point is interesting and the BoM seems silent on its copyright page. My guess would be it would be very similar to Google's API Terms of Service: you can access the API we provide, you can even charge others to access it, but you can't store the data (or, obviously, compete with the Bureau by selling the data).

@mpadge
Collaborator Author

mpadge commented Jun 22, 2017

@HughParsonage The last thing I want to do is start the review process off on a bad note, but ... the bulletins represent the single major twice-daily data output of the entire operation that is the BoM, and a package called bomrang should surely incorporate this? Current weather is not the same, because different stations have different reporting schedules, yet the only two things they must all do are the 9 & 15 obs. These are also the most detailed obs of the day, and the bulletins are the only way that the BoM disseminates a national weather snapshot.

As for extracting it: it's a simple html table, very easy to scrape, and it looks like it would be very cleanly structured. The format hasn't really ever changed since ... the invention of the internet. Loads of packages do data scraping of far messier sites. My own bikedata package has to pull in some pretty messy data, and includes checks to ensure appropriate fails when and if html or other data structures change. It really would be quite simple, and ought to be able to be done with really very little code.
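To make the "scrape, but fail loudly if the structure changes" idea concrete, here is a minimal sketch. It is not code from bomrang or bikedata, and it is written in Python with only the standard library purely to keep it dependency-free. The column names, station names, and sample markup are all hypothetical; the real bulletin page's layout is not assumed here.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every table row found on a page."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical column names, for illustration only.
EXPECTED_HEADER = ["Station", "Temp", "Rain"]


def parse_bulletin(html, expected_header=EXPECTED_HEADER):
    """Return one dict per data row, or raise if the layout looks different."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        raise ValueError("no table rows found; page layout may have changed")
    header, *body = parser.rows
    if header != list(expected_header):
        raise ValueError(f"unexpected header {header!r}; page layout may have changed")
    records = []
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not match header {header!r}")
        records.append(dict(zip(header, row)))
    return records


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE = """
<table>
  <tr><th>Station</th><th>Temp</th><th>Rain</th></tr>
  <tr><td>Alpha</td><td>12.3</td><td>0.0</td></tr>
  <tr><td>Beta</td><td>9.8</td><td>1.2</td></tr>
</table>
"""

records = parse_bulletin(SAMPLE)
```

The point of the header and row-length checks is that a silent change in the page produces an explicit error rather than quietly misaligned data.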

An alternative: I'd be perfectly happy to help post-review, if you'd prefer just to leave this issue until then. If you were to simply give confirmation that you/we would do it down the track, then I'd be happy to simply state that it's a non-issue coz you're working on it.

@adamhsparks
Collaborator

@mpadge, I think your proposition that you help after the review would be the most ideal. I understand @HughParsonage's objection to scraping a webpage, I have the same reservations as well, but based on your experience I'm willing to give it a go. My first thought was that xml2 could likely parse the page since it is XHTML and it's what I used for the précis and ag bulletin.

I also know that BoM is working on updating the APIs to access the data, who knows when that will come to fruition though. But I've requested to be an early tester before they are released to the general public. So I guess there is no guarantee that the current observations or the forecasts and bulletins will remain in the same format either. This isn't any different.

@adamhsparks
Collaborator

I had a chat with @Keith-Pembleton today about this. As I understand it, BoM data should be available through SILO already under Creative Commons, but isn't yet. The plan is in place, but it's just not yet available. That's going to be a well structured data set that we can use for this in the future when that's actually in place.

@adamhsparks
Collaborator

adamhsparks commented Aug 11, 2017

@mpadge, I think we're close to getting all of the points addressed that were raised by you and @geanders during the review process.

Do you want to get started on adding the 9 & 3 bulletins so that we can include this functionality in the JOSS paper submission?

@mpadge
Collaborator Author

mpadge commented Aug 11, 2017

Oh, I didn't realise you wanted that functionality in prior to the JOSS submission. Yeah, I guess I could do it; will just have to find time. Maybe the week after next (21-25 Aug)?

@adamhsparks
Collaborator

adamhsparks commented Aug 11, 2017

We don't have to, I was just offering thinking it would be good to have you on the author list as well (if you'd like).

@mpadge
Collaborator Author

mpadge commented Aug 11, 2017

thanks adam! yeah, that'd be great, but only if 1-2 weeks isn't too long to wait

@adamhsparks
Collaborator

Yeah, no worries, I think it's worth it. :)

@adamhsparks
Collaborator

Closing now as this has been added by @mpadge.

Thank you Mark!
