[Bug?] 6.0.0: Reading concurrently from multiple connections #84
I think this might be a symptom of how the `select!` macro is used here. If I'm reading this correctly, you're selecting over the list of futures, waiting for one of them to complete, and then dropping all the others when the loop continues. So in each loop iteration only one of the futures will ever complete. I suspect something is missing there as well. I'll have some time this week to explore other ways of structuring this, but I suspect it can be fixed with some changes in the app code. There's no shared state between clients, even when they're cloned.
@aembke thanks for your comment. I'm still digging into this, and it looks like the problem might be in the "dropping all the others" part. In the real app, the list of streams is created dynamically, as clients add and remove stream names from the list (through commands sent over a channel).
It appears that something is missing around the cancellation of promises and canceling the blocking XREADs.
After writing the above comment I suddenly realized that I have the blocking policy set in the config. Now on two consecutive runs I sometimes see one outcome and sometimes the other.
Another spammy comment, but might be helpful. I switched back to RESP3 and now I don't see any attempts to unblock in the trace logs. But here's the interesting part: a log from fred interspersed with my own logging from the app. I see myself sending 3 XREADs to the connection and fred processing only 2, and only 1 flushed down the pipe. Which is basically what I see in the MONITOR stream. Why?
I won't have time to really dive into this until later in the week, but from a quick glance it looks like what you're trying to do may be inherently prone to race conditions, which could explain why you're seeing inconsistent results. That's not necessarily a bad thing, but it's worth considering. The Redis protocol is not multiplexed, so when you send a blocking XREAD command it will block the entire client until that call finishes. That may explain why you're seeing fewer XREAD logs than you expect, but it's hard to parse given how you're distributing commands among the clients. If I were doing this I'd put each client in its own tokio task. Take all that with a grain of salt though, I haven't had a chance to really dive into this yet.
@aembke thanks, are you saying that even having exactly one XREAD per connection within the same library instance is prone to race conditions? Because that's what I'm trying to achieve here. Basically, what I'm trying to do is this: let's say streams A and B are destined to use one connection, and thus I issue a single XREAD for both on that connection.
Here's the minimum repro, much simpler than the original (should I replace it?):

```rust
use log::*;
use fred::prelude::*;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() -> Result<(), RedisError> {
    pretty_env_logger::init();
    let mut config = RedisConfig::default();
    config.blocking = Blocking::Interrupt;
    let client = RedisClient::new(config.clone(), None, None);
    let mut stream_names = vec![];
    let mut ids = vec![];
    let (tx, mut rx) = mpsc::unbounded_channel();

    client.connect();
    client.wait_for_connect().await?;

    let jh = tokio::spawn(async move {
        loop {
            tokio::select! {
                Some(new_name) = rx.recv() => {
                    debug!("Adding new stream: {}", new_name);
                    stream_names.push(new_name);
                    ids.push("$");
                }
                Ok(r) = client.xread_map::<String, String, String, String, Vec<&str>, Vec<&str>>(None, Some(0), stream_names.clone(), ids.clone()) => {
                    for s in r.iter() {
                        println!("Result: {:?}", s);
                    }
                }
            }
        }
    });

    let _ = tx.send("test0");
    let _ = tx.send("test1");
    let _ = tx.send("test2");
    let _ = jh.await;
    Ok(())
}
```

So my problems were twofold. One shows up if you introduce a delay after the spawn and before the first send. As you can see, there's nothing illegal happening: one connection, just multiple commands (only one at a time) interrupting each other. My impression is that if the interrupt happens while an XREAD is not fully settled, some internal mechanism in the library locks up. If you send data to the channel it managed to listen to (usually "test0"), that data goes missing for 1 or 2 attempts, but then the mechanism wakes up, syncs up with itself, and works.
I rewrote the module from my app into the threaded model (one client, one task; I expected it to be much more complex, and it ended up being the same line count, but arguably harder to read and comprehend) and it has the same problem. I need to cancel the outstanding XREAD to update the list of streams, and if I do it too fast it locks up.
This helped a lot, thanks. Apologies, I misunderstood this originally. Try this:

```rust
use fred::prelude::*;
use futures::future::pending;
use std::{collections::HashMap, time::Duration};
use tokio::{sync::mpsc, time::sleep};

#[tokio::main]
async fn main() -> Result<(), RedisError> {
    pretty_env_logger::init();
    let mut config = RedisConfig::default();
    config.blocking = Blocking::Interrupt;
    let client = RedisClient::new(config.clone(), None, None);
    let mut stream_names: Vec<String> = vec![];
    let mut ids = vec![];
    let (tx, mut rx) = mpsc::unbounded_channel();

    client.connect();
    client.wait_for_connect().await?;

    let jh = tokio::spawn(async move {
        loop {
            tokio::select! {
                Some(new_name) = rx.recv() => {
                    println!("Adding new stream: {}", new_name);
                    stream_names.push(new_name);
                    ids.push("$");
                }
                Ok(r) = async {
                    if !stream_names.is_empty() {
                        client.xread_map::<String, String, String, String, Vec<String>, Vec<&str>>(None, Some(0), stream_names.clone(), ids.clone()).await
                    } else {
                        println!("Skip XREAD.");
                        // return a future that never resolves. the select! macro will
                        // cancel the future when `rx` gets a message
                        let _: () = pending().await;
                        Ok(HashMap::new())
                    }
                } => {
                    for s in r.iter() {
                        println!("Result: {:?}", s);
                    }
                }
            }
        }
    });

    let _ = tx.send("test0".into());
    let _ = tx.send("test1".into());
    let _ = tx.send("test2".into());
    for i in 0 .. 50 {
        let _ = tx.send(format!("test{}", i + 3));
        sleep(Duration::from_secs(5)).await;
    }
    let _ = jh.await;
    Ok(())
}
```

It looks like there were two race conditions here: one in the client and one in the app code. PR #87 fixes the one in the client, and the code above adds a check for the other one. XREAD returns a syntax error if called like that without any arguments, which can result in the app failing to pick up any new records until something writes to the channel.
Right, sorry, I missed that part; the actual code does have a guard against an empty XREAD. It should be written like this:

```rust
use log::*;
use fred::prelude::*;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() -> Result<(), RedisError> {
    pretty_env_logger::init();
    let mut config = RedisConfig::default();
    config.blocking = Blocking::Interrupt;
    let client = RedisClient::new(config.clone(), None, None);
    let mut stream_names = vec![];
    let mut ids = vec![];
    let (tx, mut rx) = mpsc::unbounded_channel();

    client.connect();
    client.wait_for_connect().await?;

    let jh = tokio::spawn(async move {
        loop {
            tokio::select! {
                Some(new_name) = rx.recv() => {
                    debug!("Adding new stream: {}", new_name);
                    stream_names.push(new_name);
                    ids.push("$");
                }
                Ok(r) = client.xread_map::<String, String, String, String, Vec<&str>, Vec<&str>>(None, Some(0), stream_names.clone(), ids.clone()),
                    if !stream_names.is_empty() => {
                    for s in r.iter() {
                        println!("Result: {:?}", s);
                    }
                }
            }
        }
    });

    let _ = tx.send("test0");
    let _ = tx.send("test1");
    let _ = tx.send("test2");
    let _ = jh.await;
    Ok(())
}
```
Some late-night testing with the app test case shows that #87 did the trick, thanks! I'll test more tomorrow, but hopes are high. Thank you very much!
Can't reproduce anymore.
It's either a bug in my DNA (© old programming joke) or in fred.
Redis version - 7.0.5
Platform - Mac
Using Docker and/or Kubernetes - no
Deployment type - centralized
Describe the bug
A brief explanation of why I'm doing things the way the repro is written: I need to read a lot of STREAMs from a Redis Cluster. XREAD is able to read many streams at once, just not in the cluster environment. In a cluster environment it can only read from streams that hash into the same slot (not the same node!). So to minimize the number of XREADs I group the channel names by the result of `redis_keyslot()`, and for every slot that has streams I open a connection to the cluster node (not a clustered connection, but a result of `clone_new()` from one of the `split_cluster()` clients) and run XREAD on this connection.
And that doesn't work the way I intended. After I run a test case with just a few streams, only one stream works as supposed. The others wake up the connection but do not trigger XREADs. If you keep trying to XADD to these "ignored" streams, they suddenly start working (after 1-2 attempts, sometimes more). The original stream that was working stops working. But the data sent to the "ignored" channels to wake them up is lost, which probably means that these other XREADs are not even started, because otherwise they'd have captured the data due to their "$" wildcard ID. Phew.
To Reproduce
This repro simulates this behavior without the cluster, and that's why it looks the way it does. The easiest way to run the repro is to put it into the `examples` folder and `RUST_LOG=trace cargo run --example bug84`. Then from another terminal run `redis-cli` and issue a few commands. I had this problem with 5.2.0 and moved to 6.0.0 hoping it's going to be better.
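The slot-grouping step described under "Describe the bug" can be sketched as follows. The `keyslot` function here is a hypothetical stand-in for the real keyslot computation (Redis Cluster uses CRC16 of the key, mod 16384); only the grouping logic is the point.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a real keyslot function; Redis Cluster actually
// uses CRC16(key) % 16384, but a trivial hash suffices for illustration.
fn keyslot(name: &str) -> u16 {
    (name.bytes().map(|b| b as u32).sum::<u32>() % 16384) as u16
}

// Group stream names by slot so each group can be read with one XREAD
// on a dedicated connection to the node owning that slot.
fn group_by_slot(streams: &[&str]) -> HashMap<u16, Vec<String>> {
    let mut groups: HashMap<u16, Vec<String>> = HashMap::new();
    for s in streams {
        groups.entry(keyslot(s)).or_default().push(s.to_string());
    }
    groups
}

fn main() {
    let groups = group_by_slot(&["test0", "test1", "test2"]);
    assert_eq!(groups.values().map(|v| v.len()).sum::<usize>(), 3);
    assert!(groups[&keyslot("test0")].contains(&"test0".to_string()));
    for (slot, names) in &groups {
        println!("slot {}: {:?}", slot, names);
    }
}
```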
Quite possibly it's my insufficient understanding of how async Rust works...
Repro itself: