Add command for checking if job is still in queue #81
Hmm... I think that in order to support this (and to do so efficiently) it would require storing metadata for every job twice. The problem is that the jid is part of the key mapping for all non-strict-time-based storage, i.e. jobs in the "working" or "failed" state, etc. The only thing that I can think of that would allow you to do this would be creating a secondary column family in Rocks to store job "states" and then doing some sort of synchronized update between that column family, the queue, and all the other areas where we store job info.

That being said, I'm not quite sure what the point of storing state like this is in a work queue. I would imagine you would have much better control doing some sort of completion callback in your client itself--which is a pretty common pattern that I've used working with both Sidekiq and systems like Kafka. Otherwise, if you're looking for a mechanism to debug stuff or monitor the system, couldn't you just use the web UI?
I'm not sure what you mean by "storing state"? I'm not proposing that the system should store any more state than it already does. Rather, there should be a way to find out whether a given job is in a queue (i.e., has not yet been completed by a worker), or not. Callbacks are a pain, especially for simpler applications, because they require another messaging fabric to maintain information that the work server already has.
What I mean is that, as it is, checking whether a job with a given jid is still enqueued would require scanning the existing structures. In other words--in order to be efficient about the process and not slow down the rest of the system, you'd likely need a secondary data store that allows you to quickly retrieve jobs that are currently enqueued by jid.

I guess I still don't quite get your callback point. Why not just enqueue another job as a callback to process asynchronously? It seems a lot easier than scheduling another thread in your application that scans a stored set of jids.
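The trade-off being described here can be shown with a toy in-memory model (this is not Faktory's actual storage, and the names are invented for illustration): without a secondary index, answering "is this jid still enqueued?" means scanning every queue; with one, it's a constant-time lookup, but the index is duplicate metadata that every queue mutation must also maintain.

```go
package main

import "fmt"

type job struct {
	JID   string
	Queue string
}

// enqueuedByScan walks every queue looking for the jid: O(total jobs),
// and it touches data structures the rest of the system is busy using.
func enqueuedByScan(queues map[string][]job, jid string) bool {
	for _, js := range queues {
		for _, j := range js {
			if j.JID == jid {
				return true
			}
		}
	}
	return false
}

// enqueuedByIndex consults a secondary set of enqueued jids: O(1), but the
// set must be updated in lockstep with every push/fetch/ack.
func enqueuedByIndex(index map[string]bool, jid string) bool {
	return index[jid]
}

func main() {
	queues := map[string][]job{"default": {{JID: "a1", Queue: "default"}}}
	index := map[string]bool{"a1": true}
	fmt.Println(enqueuedByScan(queues, "a1")) // true
	fmt.Println(enqueuedByIndex(index, "b2")) // false
}
```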
Callbacks would likely be implemented as with Sidekiq Pro's Batches: execute this other job when a group of jobs are done. Monitoring a single job can be done as a batch of 1, but may seem odd.
@andrewstucki Consider the case where I have a UI for uploading and transcoding videos. The user uploads a video, and the UI should show when the video has finished transcoding. The transcoding job is sent to Faktory. A trivial way to implement this is to have the client poll for when the job is no longer in the queues. Doing this with callbacks is significantly more complicated: you could do it by introducing another storage system where I record that the transcoding is finished, and then have the client poll that instead, but that introduces real complexity into both the worker and the client that simply isn't necessary.
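The polling the example describes might look like the sketch below on the client side. The `stillQueued` check stands in for the proposed "is this job still in a queue?" server call, which does not exist in Faktory today; the names here are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// waitForCompletion polls a hypothetical stillQueued check until the job
// leaves the queue (done or failed) or maxTries is exhausted.
func waitForCompletion(stillQueued func(jid string) bool, jid string, every time.Duration, maxTries int) bool {
	for i := 0; i < maxTries; i++ {
		if !stillQueued(jid) {
			return true // job left the queue: transcoding finished (or failed)
		}
		time.Sleep(every)
	}
	return false // gave up; job still pending
}

func main() {
	// Fake check for demonstration: the job leaves the queue on the third poll.
	polls := 0
	stillQueued := func(jid string) bool {
		polls++
		return polls < 3
	}
	fmt.Println(waitForCompletion(stillQueued, "video-123", time.Millisecond, 10)) // true
}
```

The UI would call this after submitting the transcoding job, with no extra storage system involved.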
@jonhoo Definitely a good example for me to see your point. Well taken. That said, I do think that the only way of doing this without running into performance issues/complications is to store information that tracks the current state of each job. The thing that would make this difficult is keeping that extra state synchronized with the queue and the other places job data lives.
Along with this comes the approximate doubling of your storage space, because of the need for duplicate entries for every job in the system. I say this not to rule out the possibility of implementing this, but just to outline the inherent complexities that I see in doing it in a scalable way. I'm sure there are alternatives that I haven't thought of, but I figured I'd give my $0.02. Ultimately I'm not the maintainer of this project though, so it's really not up to me 😃 .
So, if I understand correctly, the reason this is tricky stems from the fact that we can't do cross-data-structure transactions (like you can in a traditional RDBMS). Well, then we'll need a protocol. Here's a proposal for a protocol that I think would work. I don't know much about the exact data structures provided by RocksDB, but I believe it should be general enough to fit. This relies on the fact that Faktory does not (and cannot) provide exactly-once execution of jobs, and I believe the overhead should be minimal. Keep two data structures: a heap of jobs and a secondary structure tracking enqueued jids.
On
On
On
On
On recovery after failure:
Let's see why this works:
An aside about performance: the numbers from #80 suggest that Faktory can currently do ~10k ops/second. There is just no way we're limited by the client in that benchmark.
Quickly to your first point, and then moving on to your second point:

Point 1

It's not about cross-data-structure transactions; you have transactions that can write across multiple column families in Rocks. It's about keeping operations on the underlying Rocks storage fast--the more operations you introduce in a single high-level command, the slower your throughput is going to be.

Additionally, as I pointed out, storage space would double. You could alleviate a lot of this by doing all of these state look-ups in memory, but I'm not sure how that would work with restarts. Also, I'm still unsure as to what purging old state would look like.

Point 2

So, I'm not quite sure how you're getting your estimates, but they seem fairly unrealistic IMO. Network overhead aside, there's also:
I wrote up a couple of benchmarks to show some closer-to-raw performance without network overhead here. This is what I get.
Also, from RocksDB's own benchmarking: a bulk sequential load of 1 billion keys with the WAL enabled happening in 36 minutes means that the operation is closer to ~460k writes a second. Looking at the Faktory benchmark, the write performance is about half that. I'm not sure if that's due to fsync operations, bloom filter configuration, the size of each key/value, or even just the difference in specs between my laptop and the official Rocks benchmark machine. It's probably worth looking into though.

Next, looking at the fetch side: given that Go's internal JSON encoding library takes about 3µs just to serialize/deserialize simple structures, I'm not surprised that it takes that much longer to do a fetch. What all of this tells me is that, even serialization aside, raw fetches are much slower than raw writes. Of course there are ways to speed fetches up, including:
But right now, from what I can see, network roundtrips are taking ~2/3 of the operation time. I say all of this to point out that expecting Faktory to do millions of operations a second is not realistic. If you look at Redis benchmarks, even doing parallel network requests against pure in-memory storage gives around 72k operations a second for a simple command.
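The round-trip argument can be put in back-of-the-envelope numbers: if each command pays a ~60µs round trip, a single synchronous client tops out near 1s / 60µs, no matter how fast the storage is, and pipelining or parallel clients divide that per-op cost. The figures below are illustrative, not measurements.

```go
package main

import "fmt"

// maxOpsPerSec is a ceiling estimate for a client bound by round-trip time:
// ops/sec = 1e6 / rtt(µs), multiplied by how many requests are in flight.
func maxOpsPerSec(rttMicros, pipelineDepth float64) float64 {
	return 1e6 / rttMicros * pipelineDepth
}

func main() {
	fmt.Printf("%.0f\n", maxOpsPerSec(60, 1)) // sequential ceiling at 60µs RTT
	fmt.Printf("%.0f\n", maxOpsPerSec(60, 8)) // with 8 requests in flight
}
```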
Sidekiq.cr, an impl of Sidekiq in Crystal (which is a very fast language), can process about 15k jobs/sec with Redis on my laptop. The Ruby version can do about 5k/sec. While I'd love to beat that 15k number, I think it's unrealistic with Faktory's age, right now I'm focused on features and usability, not performance. If you want to process 100,000s of things per second, you should be looking at a Kafka cluster. Background jobs are naturally more "coarse" in my mind than a stream of data: think business transactions rather than processing individual logfile lines. |
Point 1
I think this is where I'm not understanding you correctly. Doing two operations in a single transaction should not be twice as slow as doing a single operation in one transaction, because the WAL/durability overhead should be per transaction, not per operation. If Rocks writes and flushes each operation individually to disk, that means it's doing something silly.
I'm not sure I understand this point? Any sensible database will either keep an in-memory copy of its state, or have data structures on disk that can be manipulated without deserializing them entirely. If the database had to deserialize the entire dataset off disk for every operation, that would make it cripplingly slow!
Where do you get "double" from? The proposed scheme only stores the jid a second time, not the entire job.
The database should already perform operations on an in-memory structure first, and then flush changes as a batch to disk when a transaction commits. Doing more operations should, in theory, not lead to incurring significantly more disk activity.
Not sure what you mean? The proposed scheme has no old state beyond dead jobs (just like in the current scheme). Point 2I think this discussion should likely be moved to #80, or to a new issue altogether, so I'll just respond briefly here. I completely agree that 1M/s should not be the goal for Faktory. My point was only that 10k seems low, and that I don't think writing to an additional data structure is going to have a significant impact. As you yourself observe, the overheads lie elsewhere.
You showed in the benchmarks that serialization is ~3µs, and that RocksDB can do 460k writes/s. Neither of these suggests that we should be limited to 10k over a network.
60µs network roundtrip time on localhost? That seems like a lot...

@mperham I completely agree that performance should not be the focus of Faktory at the moment, nor in the foreseeable future. I was only making a performance argument to say that the cost of adding a secondary data structure to support this check should be marginal.
Also, taking a look at how RabbitMQ compares with full confirmation and persistence (which is what Faktory does right now due to using the WAL and RESP-based responses), Faktory already has it beat. In general I think it's kind of unfair to compare in-memory stores with systems that actually persist to disk for every operation. Assuming there were an option to make the store in-memory only and bump up the parallelism of the workers, I imagine that Faktory's throughput would go through the roof.
+1 I understand the UI gives a great way to check up on the status of tasks, but there may be cases where we want to let a client know the status of the current job. Without adding much complexity, isn't it possible to just return whether a certain jid is still in the queue?
Any progress on this issue? If Faktory won't support this natively, it would be nice to have a documentation page that describes how a developer could implement it at the application level.
The MUTATE command is being added right now which somewhat handles this need. I think the wiki page is “Data Admin API”?
Job Tracking is now in Faktory Enterprise and allows a client to poll for the state of a JID.
Polling is a relatively common way of checking for job completion, but Faktory does not currently provide a way for applications to do that. To support this, it'd be useful to have an API call that checks whether a given job is still enqueued (i.e., has not been processed yet): something like a command that takes a jid and a queue name and returns the job's current state. `queue` should default to `default` if not given. We need the queue to be explicitly named to avoid searching all the queues and sets.

For further motivation and discussion, see the Gitter thread starting from https://gitter.im/contribsys/faktory?at=5a03413f7081b66876c7a6ae.