First draft: injected buffer implementation #21
Conversation
Hi Dave. I agree with the need. (It appears as a TODO in the code.) I can't really test right now, as my setup is not running. If I can't get it running as soon as I'd like, I'll just run a simple temperature-only setup for the sake of the tests. My objective was to keep the data in memory and optionally synchronize it from time to time into a file, or why not a DB? I expected to do that in the only buffer implementation, using a variable to make it optional. A new class is fine, especially if we intend to have several possibilities (nothing, file, DB). Your approach seems exhaustive. I mean the abstract class is populated with a lot of functions. We don't really need to provide so much right now, since anyone subclassing it will be in a position to modify the code anyway (it's not like distributing a compiled lib for external usage), but I like to do things this way myself as well. I won't integrate it as is, as I have a few comments. I'd like to provide you line-by-line feedback, sort of a peer review. I'll try to come back to you asap. Thank you for your interest and participation. Please ring the bell if I'm taking too long.
Thanks. My end goal is to have a DB implementation with soft deletes / configurable purging. I find that emoncms at the moment is too brittle: we change this engine to that engine, and the various timestore implementations are not portable (unless you have a perfect running copy of emoncms and all relevant dependencies). Since I only have 5000 data points per day, I don't see any reason not to buffer several weeks of RF data on a rolling basis; then I'd have that long to handle any issues, and could potentially just replay all of the inputs by removing the soft-delete flag. It may seem a little redundant, but I'm happy to think of emoncms as a post-processor being fed by a data source holding the raw inputs (the data isn't exactly going to challenge my HDD, and people running on SD cards can still use in-memory buffering).
OK, this [edit: the in-memory buffer!] is now working nicely on my live Pi. It would be nice if the buffer implementation could be moved out to the configuration file, so I'm taking a look at Python factory options now.
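The factory approach being looked at here could be sketched roughly like this (illustrative only — the function and class names are invented for the sketch, not the actual oemgateway code):

```python
# Sketch of a config-driven factory: the config file names a buffer class,
# and we resolve it by name at runtime instead of hard-coding a class.
# All names here are illustrative stand-ins, not the real oemgateway API.

class InMemoryBuffer:
    """Trivial stand-in buffer used to demonstrate the factory."""
    def __init__(self):
        self.items = []


def create_buffer(class_name):
    """Instantiate the buffer implementation named in the config file."""
    # Look the class up in this module's namespace; raise a clear error
    # if the config names something that doesn't exist.
    try:
        cls = globals()[class_name]
    except KeyError:
        raise ValueError("Unknown buffer implementation: %s" % class_name)
    return cls()


buf = create_buffer("InMemoryBuffer")
```

The point of the indirection is that dropping a new buffer class into the module makes it selectable from the config file with no other code changes.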
This is a really cool idea. The current in-memory buffering works really well, and extended persistent data buffering would be a great addition.
Thanks. That's it working with the buffer implementation specified in the config file. It defaults to the InMemoryBuffer for backwards compatibility with existing config files (although installing this requires a restart, it can be done without reconfiguration).
@pb66 the in-memory version just provides temporal decoupling of the RX transmissions and onward processing. (I say "just" as though this is not crucial functionality!) With a persistent buffer it would be possible to upgrade the OEM gateway without losing any buffered transmissions, and to expand this to other things like "replay all of last week's submissions" if you had some kind of issue with emoncms that wasn't picked up on for a while. Personally, I like the MySQL implementation because I'd like to have a node traffic-light board telling me exactly when transmissions were last received. This isn't possible with emoncms (or at least, doesn't seem to be accurate, perhaps because I'm having redis issues). There are other interesting queries I could do too, like finding nodes with bad RF signal by looking at the proportion of missed/irregular transmissions.
Hmm, I've got MySQL working now, but I'm not sure whether to push it or not, as I don't want to make the job of reviewing this any harder than necessary. I'll settle for pasting in some query / log output. Detecting whether any nodes have reception issues (nodes 7 + 25) or are configured incorrectly (node 24 is sleeping for half as long):
Knowing exactly when each node last called in:
Catching up on processing after a reboot:
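A rough sketch of how the first of those checks could work, in plain Python rather than the actual MySQL queries (the function, node intervals, and timestamps below are all made up for illustration; the real version would read received timestamps from the buffer table):

```python
# Sketch: estimate the proportion of missed transmissions for a node from
# its received timestamps, given the expected send interval in seconds.
# Illustrative only -- a real version would pull timestamps from the DB.
def missed_ratio(timestamps, expected_interval):
    """Return the fraction of expected transmissions that never arrived."""
    if len(timestamps) < 2:
        return 0.0
    missed = 0
    expected = 0
    for prev, curr in zip(timestamps, timestamps[1:]):
        gap = curr - prev
        # Each gap should contain roughly one interval; extra intervals
        # beyond the first count as missed transmissions.
        slots = max(1, int(round(gap / float(expected_interval))))
        expected += slots
        missed += slots - 1
    return missed / float(expected)


# Node sending every 10 s, with one transmission dropped in the 20 s gap:
ratio = missed_ratio([0, 10, 20, 40, 50], 10)
```

A node whose ratio creeps up over time is a candidate for bad RF reception; a node whose gaps are consistently half the configured interval is probably misconfigured, as with node 24 above.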
OK, I reviewed your code and I like it.

oemgateway.conf.dist

You forgot an API key here. I don't mind the active False -> True change. Users should review it anyway before using it. Or maybe we should set both to false. Anyway, it probably needs better comments, a complete explanation. We'll deal with this later.

oemgatewaybuffer.py

Great. self.buffer = getattr(ogbi, bufferImplType)() means the user must write the real class name in the config file. I'd like to allow him to write just 'memory', 'file', 'mysql', or anything. Maybe this could be done with a function in the oemgatewaybufferimpl.py module. We could also do that for the other listeners and buffers, but I don't feel such a need. Anyway, no rush, let's keep this as it is.

oemgatewaybufferimpl.py

Fine. As I said, you went pretty far into the decomposition of the actions, to make it as generic as possible. You probably didn't have to go that far, but anyway, I like it this way as well. Perhaps size and even isFull could be decorated with @property. discardOldestItem is in fact discardOldestItems (plural), in case several items were added in the meantime. You probably didn't test that function, as size won't work; you need self.size(), or self.size if decorated with @property. discardOldestItemIfFull -> discardOldestItemsIfFull (plural as well). Anyway, did we actually need to split those into two functions? I don't mind, just wondering. storeItem: if we discard before appending, we might as well discard one more item, otherwise the buffer will overflow by one item. In fact, it will also overflow if we do it the other way around, but just for an instant. Maybe discardOldestItems could discard one more item? (We don't mind here, but for the sake of it.) removeLastRetrievedItem sounds a bit weird.
Assuming the buffer is always used FIFO (strongly recommended if feeding emoncms, since some processing operations rely on sample order), we could rename the functions from retrieveItem and removeLastRetrievedItem to retrieveOldestItem and removeOldestItems. Do you see a use case where we wouldn't want to retrieve the oldest item? No newline at end of file.

README.md

Needs updating.

General comments

I never liked OemGatewayBuffer as a name, since it was much more than a buffer, and actually not so much of a buffer. Now that we're moving the buffering out of it, it's even worse. Unfortunately, I couldn't come up with a better name. You probably feel the same. What do you say if we rename OemGatewayBuffer to a better name, and use this one for the actual OemGatewayAbstractBuffer and OemGatewayInMemoryBuffer? (Or we keep AbstractBuffer and InMemoryBuffer if you prefer.) Do you have an idea for a better name? Of course this would break the config files, but if we're to do this at some point, the sooner the better. I'm thinking renaming, plus creating a new class with pretty much the same name in a file with the same name, could be confusing, for instance in old forum threads. Just asking for your opinion. This makes me wonder about a better release process. Do you think I should tag current as v1 (or v0.1) and create a dev branch for what we're doing here, until we release a v2 (or 0.2)?
Oh, and maybe add some logging in there, in def discardOldestItemIfFull(self): (also needs a logger declaration at init).
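Something along these lines, perhaps — a minimal sketch only, using the plural naming suggested above and a simplified in-memory buffer (the real class has more methods and possibly different names):

```python
import logging


class InMemoryBuffer(object):
    """Minimal sketch of a bounded buffer that logs when it drops data."""

    def __init__(self, max_size=1000):
        # Logger declared at init, as suggested in the review.
        self._log = logging.getLogger(__name__)
        self._max_size = max_size
        self._data = []

    def size(self):
        return len(self._data)

    def discardOldestItemsIfFull(self):
        # Drop items from the front until there is room for one new item,
        # logging how much was discarded so data loss is visible.
        overflow = self.size() - self._max_size + 1
        if overflow > 0:
            self._log.warning("Buffer full, discarding %d oldest item(s)",
                              overflow)
            del self._data[:overflow]

    def storeItem(self, item):
        self.discardOldestItemsIfFull()
        self._data.append(item)
```

Note the discard makes room for one extra item before appending, which also addresses the off-by-one overflow discussed above.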
OK, in the commit above I have:
Readme updated. WRT "discardLastRetrievedItem", my thinking was that it does not foreshadow implementation detail. It could be FIFO or FILO, but the important thing from the perspective of the system is that the successfully submitted item is not kept in the buffer and submitted again.
Here is my suggestion for rebranding the architecture: the old buffers become 'dispatchers', which use the new buffers. I've tested this and it works for me (including the full-buffer emptying), but I won't commit any more in case you don't like this change, as we can then easily revert it. It does invalidate all deployed config files, so it might be worth branding as 'v2' to make that clear. Since this presumably only gets deployed on Raspberry Pis, I'd also like to Debian-package this repo and revise the emoncms install instructions to use this over the original module?
Again I reflect to myself that GitHub is all the win compared with passing around code comments and git diff links by email or static document 👍
Excellent work! emonGateway (closest to the existing name). Just a thought :-)
Names are very important. But in theory it doesn't just do emoncms, right? It's a marshaller for (serial) data packets, but it could easily be extended to read them from e.g. a DB table (or web request), and dispatch them over RFM12b or to other HTTP endpoints than just emoncms. I have to admit, I do find 'oem' slightly confusing (original equipment manufacturer?), but I'm not sure what exactly would be a better name...
Since most of the OEM stuff is prefixed "emon", my perception of "emonGateway" was gateway to the "emon"-branded OEM project rather than to emonCMS in particular. I think the "emon" part is quite essential to qualify it as a core part of the project rather than an optional compatible alternative. Will this version still work in a RO (read-only) environment?
Hi. Thanks a lot for the rework! Dispatcher is a great name.

oemgateway.conf.dist

Maybe add bufferSize = 1000 to both buffers.

oemgatewaybuffer.py

May I suggest making bufferMethodMap an attribute of class OemGatewayBuffer? I think we could get access to it through the class, before instantiating a buffer. This way, adding a new buffer is just about modifying oemgatewaybuffer.py; the abstraction is complete. Maybe I could do the same kind of map for the listeners and dispatchers (gotta get used to saying dispatcher, now...), but I feel the class names can't be simplified, so I don't see what I'd use for names instead.

Great job, overall. Logging, doc update, everything. I understand about discardLastRetrievedItem. I'm just thinking, who would do other than FIFO? I used to... and it was a bug. But on the principle, you're right. I agree. Yes, git is nice, and having something such as GitHub, Gitorious, or equivalent really simplifies the workflow. I'll create a dev branch for this new stuff (along with @TrystanLea's #20) and probably tag current HEAD as v1. We'll have time to think of another name. I think my idea was that OEM is OpenEnergyMonitor and emon is emonCMS, but this is stupid: emonTX, emonTH, etc. I should have thought twice... I agree with pb66.
In theory, it doesn't do just emonCMS. In practice, it has no processing, and many other web platforms expect processed data. I already thought we could add some processing to the gateway, since the RPi has processing power for that. But this is not the design choice of OEM.
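The bufferMethodMap suggestion above could look roughly like this — a hedged sketch with stand-in buffer classes, not the actual oemgateway code:

```python
# Sketch of Jerome's suggestion: a class attribute mapping short
# config-file names to buffer implementations. Class names here are
# illustrative stand-ins for the real ones.

class InMemoryBuffer(object):
    def __init__(self, **kwargs):
        self.items = []


class MySQLBuffer(object):
    def __init__(self, **kwargs):
        self.conn_args = kwargs


class OemGatewayBuffer(object):
    # Adding a new buffer only means editing this one module,
    # so the abstraction stays complete.
    bufferMethodMap = {
        'memory': InMemoryBuffer,
        'mysql': MySQLBuffer,
    }

    @classmethod
    def create(cls, method, **kwargs):
        """Build the buffer named by a short string from the config file."""
        try:
            impl = cls.bufferMethodMap[method]
        except KeyError:
            raise ValueError("Unknown buffer method: %s" % method)
        return impl(**kwargs)


buf = OemGatewayBuffer.create('memory')
```

Because the map hangs off the class, callers can inspect the available methods before instantiating anything, which is the point Jerome raises.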
@pb66, legacy config files result in a 1000x in-memory buffer for full back-compatibility with existing installs (RO or otherwise). Obviously the MySQL option expands the requirements quite a bit - file persistence would still require an HDD. I haven't tried a flat-file adapter since I was wondering if we could just use a flat-file DB engine with the same queries.
Let's have new name discussion here: #23 |
This might be a little bit less tidy, but I want to make an initial commit while it's all working 🚀 To use this, you'll need to update your config file as appropriate to reach the following DB table:
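The table definition from the commit isn't reproduced here; as a rough idea only (column names below are guesses for illustration, not the actual schema), it would be something along these lines:

```sql
-- Hypothetical sketch only: the actual schema shipped with the commit
-- differs; column names here are illustrative.
CREATE TABLE buffered_input (
    id INT NOT NULL AUTO_INCREMENT,
    received_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    node_id INT NOT NULL,
    data TEXT NOT NULL,
    deleted TINYINT(1) NOT NULL DEFAULT 0,  -- soft-delete flag
    PRIMARY KEY (id)
);
```

A soft-delete flag like the one sketched here is what allows "replaying" past submissions by clearing it, as discussed earlier in the thread.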
You can now run queries like
Hi Dave. No problem inserting new stuff now that I created the dev branch. I probably won't test the MySQL DB part. I mean, unless it is needed, I won't bother creating the DB. Besides, I run on SD card (32 GB) only, so I tend to limit write accesses. I'll try to understand soft/hard delete another day. More generally, I don't have any high-level comment. I didn't try to understand that yet. I'd like to merge because I don't want you stuck, and it won't hurt anyone anyway.

Here are a few comments on your latest commits: I don't really understand this line, but this is debug anyway. I suppose you had a reason. self._log.debug("MySQL buffer (%s): is full? %d > %d = %s" % (self._bufferName, self.size(), self._maximumEntriesInBuffer, self.size() >= self._maximumEntriesInBuffer)) (Edit: I understand what it does, I don't understand why you'd do that.)

BTW, I try to keep line length under 80 chars. This may seem anachronistic; everybody uses wide screens. I code in an xterm through SSH, and even this simple setup can display longer lines. "Fortunately", GitHub is here to remind us of the pain of lines too long to be displayed. But don't bother reworking this specifically; we can do it anytime.

In retrieveItem (and in general), try to keep the exception catching restricted to the exception source. I mean, except MySQLdb... should come just after conn.close(). Then use an else: statement for the general case where it went fine. I suppose return [-1,""] is some kind of error reporting mechanism. Why not. I guess the idea is to return an error code and a message. In fact, the Python way of doing this may be to create our own exception class and catch that in the caller. We could think of doing so.

Rework what you can and rebase on dev. I can always do a little rework afterwards if I'm unhappy with the coding style, the README, or anything. It's the dev branch, so no rush. You did quite a lot of rework already; thank you for that.
(Or is there an easy way for me to pick your commits into my dev branch while your pull request is on master? I mean, easier than you just changing the target.)
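The exception-based approach Jerome suggests could look roughly like this — a sketch with invented names, not code from the repo — replacing the [-1,""] sentinel with a typed exception the caller catches:

```python
# Sketch: instead of 'return [-1, ""]' on failure, define a project
# exception class and let the caller catch it. Names are illustrative.
class OemGatewayBufferError(Exception):
    """Raised when a buffer backend cannot complete an operation."""


def retrieveItem(buffer_contents):
    """Return the oldest buffered item, or raise on failure."""
    if not buffer_contents:
        raise OemGatewayBufferError("buffer is empty")
    return buffer_contents[0]


try:
    retrieveItem([])
except OemGatewayBufferError as e:
    # The caller decides how to recover, instead of inspecting a
    # magic return value like [-1, ""].
    print("caught: %s" % e)
```

This keeps error handling out of the happy path and makes failures impossible to silently ignore, which a sentinel return value does not.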
Hi, If you clone your repo and add mine as an extra remote you should be able to merge this into the dev branch no problem (I'm not sure how you use git, I use SSH mode but you can easily figure out the alternative paths if you don't):
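Roughly, the workflow would be as follows (the remote name and repo URLs here are placeholders, not the commands Dave pasted):

```shell
# Illustrative only -- remote names and URLs are placeholders.
# The idea: add the other repo as a remote and merge its branch locally.
git clone git@github.com:<your-user>/oem_gateway.git
cd oem_gateway
git remote add dave git@github.com:<daves-user>/oem_gateway.git
git fetch dave
git checkout dev
git merge dave/master
```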
I will have a look at your other comments. It was tricky to get something working in Python at all, so there are a few rough edges around the error handling etc., as you note.
@dave - I tried this install and it's all cloned OK, created/edited the .conf, and I've installed the original dependencies plus python-mySQLdb, but it stalls at pi@b2d-hdd2 ~/oem_gateway $ ./oemgateway. What should dbHost be set to? I tried IP & hostname with no joy.
Dave, I think you did quite well with Python. I'm using SSH as well. For pull request management, I expected to use the GitHub interface, but I can also add your repo to my local clone and do as you say; it's logical. Don't bother too much with the rework. We can do that in several steps. Or I can send a pull request to your dev branch... pb66, the problem seems to be that when creating the second dispatcher, the wrong class is called, and the init method doesn't know what to do with the dbHost argument. Did you actually check out Dave's last commit (what is the first commit appearing in git log)? Could it be the dispatcher or buffer class name that is not entered correctly in your config file?
It should only look for that if you are creating a MySQL buffer, have you
Regarding the error handling, let's keep the custom exception design for later. I was just trying to understand the [-1,""] return value.
Oh yeah, sorry, you're right: I had changed database to memory prior to installing python-mySQLdb and forgotten to change back. I have tried a couple of packets and it seems to work now. I did notice that the dispatchers are still created despite "active =" set to false, and therefore if/when creation fails the error stops further progress; should an unused/unconfigured dispatcher be able to cause a failure? Later today I will configure a setup where I can "switch off" the emoncms server for an hour or two to test the catch-up.
Yes, inactive dispatchers are created. They just don't do anything. If a user wants them not to be created, he can remove them from the config file (or comment them out). At the time, I wondered whether, when setting active to false, the data waiting to be sent should be trashed or kept for later. I don't remember which option I chose. It is not that important. So, to answer your question, this is not a bug, rather a design choice. It is all about what this option is for. It was meant to allow me to neutralize a dispatcher with just an easy change in the config file. It is useful for debugging. In real life, I don't see any use case.
That's cool if it's meant to be that way. Do you think it may be advantageous for it to report the problem dispatcher, but then continue with the others, for a more robust installation? Although I caused the error during install on this occasion, would a real-world fault stop any data being collated? E.g. a power failure causes a corrupt DB or similar, and then on reboot the program halts at this point, so other servers do not get their data, nor does anything get "buffered". It would be handy to know (although not important right now) if the data is still retained when set to false, as if it is, this effectively is a "pause" setting, which may be a useful feature: if taking a server down for planned maintenance/upgrade etc., it can be paused and reinstated under control after the server is back online, rather than just relying on a successful or failed delivery.
@pb66 - because the config is reloaded dynamically, it should really continue to operate as long as there is at least one valid buffer. I don't think that's the case, though; I think it will kill the daemon (and empty the in-memory cache in the process) if there are any errors. I haven't quite got my head around the way Python does error handling. It was harder than I thought to implement the MySQL buffer, because there's no compile-time type checking etc., so you actually have to run through each possibility to rule out sudden death of the daemon. That's something we could add as a separate branch / pull to the dev branch, though. I think for the time being we've given @jerome-github plenty of work to do! (#20)...
Yes, don't worry about the error handling right now. We'll think of it in a further step.
I merged into dev, then did a bit of rework, mostly coding-style related. Not sure my style is better, but I'd like to keep coherence. And I like 4-space tabs (which is also the choice on emoncms). I hope I didn't break anything... I didn't test the MySQL buffer. The MySQL buffer is included, but will probably be removed for the first release, then put back. I haven't been thinking about it, but the error handling would probably be improved by using exceptions. I'm closing the pull request, but you can still comment here on the subject. Or open a new issue. Again, it is not fully tested (at least by myself), but I didn't want to stall you. It's the dev branch, anyway.
@Dave-McCraw: I'm trying to reintroduce bulk send mode. The objective is to send several data sets at a time:
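For reference, the emoncms bulk input endpoint takes a JSON array of data sets, each led by a relative time offset and a node id. A sketch of building such a request in Python — the URL, API key, and values below are placeholders, and the payload shape is my reading of the bulk API, so verify it against your emoncms version:

```python
import json
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Each entry: [time_offset_seconds, node_id, value1, value2, ...]
# (shape per my understanding of the emoncms bulk API -- treat as an
# assumption and check against your emoncms version)
datasets = [
    [0, 16, 1137],
    [2, 17, 1437, 3164],
]


def build_bulk_url(base_url, apikey, datasets):
    """Build a bulk-submission URL carrying several data sets at once."""
    payload = json.dumps(datasets, separators=(',', ':'))
    return "%s/input/bulk.json?apikey=%s&data=%s" % (
        base_url, apikey, quote(payload))


url = build_bulk_url("http://emoncms.example/emoncms", "YOURAPIKEY", datasets)
```

Sending many buffered samples in one request like this is what makes the catch-up after an outage fast, compared with one HTTP round-trip per sample.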
This requires emoncms >= 8.0 as it depends on a recent change made by Trystan. I just modified the code (08c500a) to use this syntax, but I still send one data set (one sample) at a time, which is not really interesting... I'd need to reproduce what I had before: a6c75ef, 2cddea5. To do so we'll probably want to modify the buffer API.
(Sorry, pwned by Ctrl+Enter again.) We'd need to be able to retrieve several elements at a time, and delete them after sending. Again, I feel awkward about this DiscardLastRetrieved. Perhaps the buffer class could provide the following
and inside the buffer, there'd be an attribute memorizing which data was about to be sent, or which data to keep if correctly sent. Kinda like what I did here: d184d15. I can think of a way to do that with the InMemory buffer, perhaps even with the SQL one, but I'd rather ask for your opinion (and anyone else's).
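One hypothetical shape for that API — the method names below are invented for illustration, not Jerome's actual proposal — with the "memorizing attribute" he describes:

```python
class InMemoryBuffer(object):
    """Sketch of a buffer supporting bulk retrieve-then-discard.
    Method names are hypothetical, not a final API."""

    def __init__(self):
        self._data = []
        self._retrieved_count = 0  # remembers what was last handed out

    def storeItem(self, item):
        self._data.append(item)

    def retrieveItems(self, n):
        """Hand out up to n oldest items without removing them yet."""
        items = self._data[:n]
        self._retrieved_count = len(items)
        return items

    def discardLastRetrievedItems(self):
        """Drop the items handed out by the last retrieveItems() call.
        To be invoked only after a successful bulk send."""
        del self._data[:self._retrieved_count]
        self._retrieved_count = 0
```

If the send fails, the caller simply never calls discardLastRetrievedItems(), and the same items are retrieved again on the next attempt.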
Will the buffer always be asked to supply N items, with either all N succeeding or all N failing? If so, I don't think it's too hard a problem, but it would be a hassle to individually update the buffer (if, say, you ask for 100 items then send in two batches of 50, and one of them fails). Another thought: what happens if one data item is corrupt and rejected by emoncms? I'm guessing it will endlessly attempt to resend that item, rather than giving up after a certain number of attempts (not specific to bulk submissions). Feel free to rename the prototype methods and I can alter the MySQL buffer to match, since that's the one I am testing with my home installation.
Hi,
Not really a pull request, I just wanted to see if you had any thoughts on this. First time I've ever tried to do anything with Python.
I'd like to switch away from the raspberrypi module to using the OEM gateway. However since my main issue with the raspberrypi solution is instability / data loss, I'd like to back the OEM gateway with a persistent buffer (either file based or MySQL) rather than in-memory.
As a first draft, I've just tried to refactor the existing behaviour exactly into a new class, and then inject that dependency at runtime. This should mean that it's relatively easy to drop new concrete buffer implementations into place.
cheers,
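The refactor described in this PR — extract the buffering into its own class and inject it at construction time — could be sketched as follows (class names here are illustrative, not the repo's actual ones):

```python
class InMemoryBuffer(object):
    """Existing in-memory behaviour, factored into its own class."""

    def __init__(self):
        self._data = []

    def storeItem(self, item):
        self._data.append(item)

    def retrieveItem(self):
        return self._data[0] if self._data else None


class EmoncmsDispatcher(object):
    """Sketch: the gateway component takes any buffer implementation as a
    constructor argument, so new backends (file, MySQL) drop in without
    changing this class. Names are illustrative."""

    def __init__(self, buffer_impl):
        self._buffer = buffer_impl  # injected dependency

    def add(self, data):
        # Delegate storage to whichever buffer was injected.
        self._buffer.storeItem(data)


gateway = EmoncmsDispatcher(InMemoryBuffer())
gateway.add({'node': 10, 'values': [230.1]})
```

Because the dispatcher only calls the shared buffer interface, swapping in a persistent implementation is a one-line change at construction time, which is exactly the "drop new concrete buffer implementations into place" goal stated above.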