"failed to fill whole buffer" with multiple nodes #245

Closed
psachs opened this issue Mar 1, 2019 · 10 comments

Comments

psachs commented Mar 1, 2019

When configuring multiple Scylla nodes, inserts fail with the error "failed to fill whole buffer":

thread '<unnamed>' panicked at 'Failed to insert data: Io(Custom { kind: UnexpectedEof, error: StringError("failed to fill whole buffer") })', libcore/result.rs:1009:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:211
   3: std::panicking::default_hook
             at libstd/panicking.rs:227
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:476
   5: std::panicking::continue_panic_fmt
             at libstd/panicking.rs:390
   6: rust_begin_unwind
             at libstd/panicking.rs:325
   7: core::panicking::panic_fmt
             at libcore/panicking.rs:77
   8: core::result::unwrap_failed
             at libcore/macros.rs:26
   9: <core::result::Result<T, E>>::expect
             at libcore/result.rs:835
  10: <unknown>
             at src/main.rs:62

The issue can be reproduced with the following setup:

  1. Set up a Scylla cluster with 3 nodes:
docker run --name scylla-0 -p 9042:9042 -d scylladb/scylla
# wait until scylla-0 is up: docker exec -it scylla-0 nodetool status
docker run --name scylla-1 -p 9043:9042 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla-0)"
docker run --name scylla-2 -p 9044:9042 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla-0)"
  2. Create the column family within cqlsh: docker exec -it scylla-0 cqlsh
CREATE KEYSPACE test
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 2};

use test;

CREATE TABLE data (
    id int,
    epoch_utc timestamp,
    value double,
    PRIMARY KEY (id, epoch_utc))
WITH CLUSTERING ORDER BY (epoch_utc DESC)
AND COMPACTION = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
};
  3. Cargo.toml
[package]
name = "rust_test_scylla"
version = "0.1.0"
authors = ["Pascal Sachs <pascal.sachs@sensirion.com>"]
edition = "2018"

[dependencies]
cdrs = "^2.0.0-beta.6"
cdrs_helpers_derive = "0.1.0"
rand = "^0.6.5"

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
  4. main.rs
#[macro_use]
extern crate cdrs;
#[macro_use]
extern crate cdrs_helpers_derive;

use std::sync::Arc;
use std::thread;
use std::time::SystemTime;

use rand::prelude::*;

use cdrs::authenticators::NoneAuthenticator;
use cdrs::cluster::session::{new_lz4 as new_session, Session};
use cdrs::cluster::{ClusterTcpConfig, NodeTcpConfigBuilder, TcpConnectionPool};
use cdrs::frame::IntoBytes;
use cdrs::load_balancing::RoundRobinSync;
use cdrs::query::*;
use cdrs::types::prelude::*;

type ScyllaSession = Session<RoundRobinSync<TcpConnectionPool<NoneAuthenticator>>>;

const SCYLLA_NODES: &'static [&'static str] =
    &["localhost:9042", "localhost:9043", "localhost:9044"];

const INSERT_DATA: &'static str = "\
                                   INSERT INTO test.data \
                                   (id, epoch_utc, value) \
                                   VALUES (?, ?, ?)";

fn main() {
    let nodes = SCYLLA_NODES
        .into_iter()
        .map(|addr| NodeTcpConfigBuilder::new(addr, NoneAuthenticator {}).build())
        .collect();
    let cluster_config = ClusterTcpConfig(nodes);
    let session: Arc<ScyllaSession> = Arc::new(
        new_session(&cluster_config, RoundRobinSync::new())
            .expect("Could not connect to scylla cluster"),
    );

    for i in 0..20 {
        let thread_session = session.clone();
        thread::spawn(move || {
            let mut rng = rand::thread_rng();
            let query_insert_data = thread_session
                .prepare(INSERT_DATA)
                .expect("Failed to prepare insert data query");

            let id = i;
            let epoch_utc = (1_000
                * SystemTime::now()
                    .duration_since(SystemTime::UNIX_EPOCH)
                    .unwrap()
                    .as_secs()) as i64;
            let value = rng.gen();
            let values = DataStruct {
                id,
                epoch_utc,
                value,
            }
            .into_query_values();
            thread_session
                .exec_with_values(&query_insert_data, values)
                .expect("Failed to insert data");
        })
        .join()
        .expect("thread error");
    }
}

#[derive(Clone, Debug, IntoCDRSValue, PartialEq)]
pub struct DataStruct {
    id: i32,
    epoch_utc: i64,
    value: f64,
}

impl DataStruct {
    fn into_query_values(self) -> QueryValues {
        query_values!(self.id, self.epoch_utc, self.value)
    }
}
AlexPikalov commented Mar 1, 2019

Hello @psachs,

Thank you for creating this issue and providing the reproduction steps. I'll have a look at it today and keep you updated.

AlexPikalov commented Mar 1, 2019

@psachs
When you have some spare time, could you please try the v3 feature of CDRS? By default CDRS uses version 4 of the CQL binary protocol, but some previous versions of Scylla (I haven't checked the most recent one) used version 3.

[dependencies]
cdrs = { version = "^2.0.0-beta.6", default-features = false, features = ["v3"] }

psachs commented Mar 1, 2019

@AlexPikalov Thank you for the fast response. I updated the dependency as you requested, but it did not help with the issue.
The current Scylla seems to use v4:

[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]

AlexPikalov commented

@psachs Thanks for the feedback. Then, as I said, I hope to come up with something later today.

psachs commented Mar 1, 2019

@AlexPikalov Thank you very much for your support. Let me know if I can help you with anything.

AlexPikalov commented Mar 1, 2019

So far I was able to figure out what Io(Custom { kind: UnexpectedEof, error: StringError("failed to fill whole buffer") }) really means. The error frame parser had a bug, so it wasn't able to properly parse another error, which is Unprepared(UnpreparedError { id: CBytesShort { bytes: Some([241, 127, 142, 119, 197, 190, 59, 121, 10, 103, 147, 178, 235, 162, 184, 230]) } }) (No prepared statement with ID f17f8e77c5be3b790a6793b2eba2b8e6 found.). This means that a client tried to execute an unprepared query. To figure out what is wrong with the app you provided, I'll need to continue analyzing this problem.

As for the error frame parser bug, I've prepared a fix for it: #246

AlexPikalov commented Mar 2, 2019

@psachs

The problem with the unprepared statement is caused by the fact that each new request obtains a cluster node to which the request will be sent. In your code each thread makes two requests to the server: thread_session.prepare(INSERT_DATA) and thread_session.exec_with_values(&query_insert_data, values). Most probably there was a false assumption that both of these calls would be made to the same node. However, that's not true: in the case of a round-robin load balancer, if prepare is sent to node N, then the next request, exec_with_values, will be sent to node N + 1. But node N + 1 doesn't yet have a prepared query with the provided ID.
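
In terms of the reproduction code above, the routing looks roughly like this (a sketch; the node indices only illustrate the round-robin order):

// Inside each spawned thread, two independent requests go through the
// round-robin balancer:
let query_insert_data = thread_session
    .prepare(INSERT_DATA) // request 1: PREPARE is sent to node N
    .expect("Failed to prepare insert data query");
thread_session
    .exec_with_values(&query_insert_data, values) // request 2: EXECUTE goes to node N + 1
    .expect("Failed to insert data");
// Node N + 1 never received the PREPARE, so it has no statement with the
// given ID and answers with an Unprepared error.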

To make your flow work, the CDRS load balancer would need to be able to find the node that has the given query_insert_data query prepared and send the exec_with_values request exactly to that node. But currently CDRS doesn't have such a feature.

An option that would make your code work without waiting for an implementation of the new feature is to use query_with_values instead of the combination of thread_session.prepare(INSERT_DATA) and thread_session.exec_with_values(&query_insert_data, values). The main con of this approach is that the query string is sent with each request.
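
For reference, the change inside the spawned thread would then look roughly like this (a sketch based on the reproduction code above):

// Workaround sketch: a single request per insert. The query string is sent
// together with the values, so any node the balancer picks can execute it.
let values = DataStruct {
    id,
    epoch_utc,
    value,
}
.into_query_values();
thread_session
    .query_with_values(INSERT_DATA, values)
    .expect("Failed to insert data");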

psachs commented Mar 2, 2019

@AlexPikalov Thank you very much for the clarification. Yes, I can get rid of the prepared statements for now.
And of course I really appreciate your fast response and the quick fix. I'll test again on Monday and let you know if everything works out.

psachs commented Mar 4, 2019

@AlexPikalov
I just tested the master branch for around 2 hours with some realistic load and everything works like a charm without the prepared statements.

I therefore consider this bug fixed and am looking forward to the beta7 release ;-)

psachs closed this as completed Mar 4, 2019
AlexPikalov commented

@psachs

Thanks again for reporting this issue! Beta 7 with the fix for the unprepared error has been published.

As for the load balancer modification, I'll create an issue and try to come up with an approach to this problem.
