Add 9am & 3pm state bulletins #21
Comments
@mpadge, @HughParsonage kindly added the current weather after finding the package. I'll confess, I'd not really considered adding the 9am & 3pm state bulletins, but we can certainly look into it. Are they only a webpage as you've linked to? No json or xml interface? About the infringement issues, I'm not keen to archive any data with the package, only the metadata about the stations themselves, and provide facilities to retrieve and parse the weather data. But we can check with BoM to be safe. Right now I figured we were since we're retrieving what they are putting out as json and xml...
Extracting the 9am and 3pm state bulletins directly would be difficult and I think should be out-of-scope for this package. The data is already captured in On the archiving question, your point is interesting and the BoM seems silent on its copyright page. My guess would be it would be very similar to Google's API Terms of Service: you can access the API we provide, you can even charge others to access it, but you can't store the data (or, obviously, compete with the Bureau by selling the data).
@HughParsonage The last thing I want to do is start the review process off on a bad note, but ... the bulletins represent the single major twice-daily data output of the entire operation that is the BoM, and a package called As for extracting it: it's a simple html table, very easy to scrape, and it looks like it would be very cleanly structured. The format hasn't really ever changed since ... the invention of the internet. Loads of packages do data scraping of far messier sites. My own An alternative: I'd be perfectly happy to help post-review, if you'd prefer just to leave this issue until then. If you were to simply give confirmation that you/we would do it down the track, then I'd be happy to simply state that it's a non-issue coz you're working on it.
@mpadge, I think your proposal to help after the review would be ideal. I understand @HughParsonage's objection to scraping a webpage, and I have the same reservations, but based on your experience I'm willing to give it a go. My first thought was that xml2 could likely parse the page since it is XHTML and it's what I used for the précis and ag bulletin. I also know that BoM is working on updating the APIs to access the data, though who knows when that will come to fruition. I've requested to be an early tester before they are released to the general public. So I guess there is no guarantee that the current observations or the forecasts and bulletins will remain in the same format either. This isn't any different.
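For illustration only, here is a minimal sketch of the "parse the XHTML table" idea mentioned above. It uses Python's standard-library `xml.etree.ElementTree` as a stand-in for xml2, and everything in it is an assumption: the `SAMPLE` markup, the station names, the column headers, and the `parse_table` helper are all made up, and the real bulletin pages may be structured differently (real XHTML would also normally carry a namespace that the tag lookups would need to account for).

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: the markup, column names and values are
# invented for this sketch and are not taken from a real BoM bulletin.
SAMPLE = """<table>
  <thead><tr><th>Station</th><th>Temp 9am</th><th>Rain mm</th></tr></thead>
  <tbody>
    <tr><td>Station A</td><td>12.3</td><td>0.0</td></tr>
    <tr><td>Station B</td><td>15.1</td><td>2.4</td></tr>
  </tbody>
</table>"""


def parse_table(xhtml):
    """Return one dict per data row, keyed by the table's header cells."""
    root = ET.fromstring(xhtml)
    # Header cells give the column names.
    header = [(th.text or "").strip() for th in root.iter("th")]
    rows = []
    for tr in root.iter("tr"):
        # Only direct <td> children; the header row has none and is skipped.
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if cells:
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(SAMPLE)
```

Values are kept as strings here; converting them to numbers, and coping with any format change on the page, would be up to the caller.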
I had a chat with @Keith-Pembleton today about this. As I understand it, BoM data should be available through SILO already under Creative Commons, but isn't yet. The plan is in place, but it's just not yet available. That's going to be a well structured data set that we can use for this in the future when that's actually in place.
Oh, I didn't realise you wanted that functionality in prior to the JOSS submission. Yeah, I guess I could do it; will just have to find time. Maybe the week after next (21-25 Aug)?
We don't have to, I was just offering thinking it would be good to have you on the author list as well (if you'd like).
thanks adam! yeah, that'd be great, but only if 1-2 weeks isn't too long to wait
Yeah, no worries, I think it's worth it. :)
Closing now as this has been added by @mpadge. Thank you Mark!
Adam, Hugh, and Keith, I'm excited about reviewing this package, and will take this opportunity to confess that I worked as a BoM forecaster for a couple of years, so I consider myself well-suited for the job. The first thing I wondered on checking the package out was why it didn't extract the daily (09:00, 15:00) weather bulletins. These data represent the key archived observations for the BoM on which more things depend than any other set of data. They'd be easy to extract, so ... could you add these please?
Example here
Things to consider: