Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bad CString conversion corrupt data #86

Open
gwik opened this issue Sep 24, 2020 · 13 comments
Open

bad CString conversion corrupt data #86

gwik opened this issue Sep 24, 2020 · 13 comments

Comments

@gwik
Copy link
Contributor

gwik commented Sep 24, 2020

Hi,

It looks like inserting data using bind_by_name was broken by the series of patch labeled as CString conversion optimizations.
When inserting data, it looks the length of the string is incorrect and it writes subsequent memory to cassandra:

Value should be "Paul" but the following is inserted:

Paulxxxaasdf0asdfas8d9fyyyb123921381923921939assertion failed: `(left == right)`\n  left: ``,\n right: ``examples/repro.rsassertion failed: found_one127.0.0.1/home/antoninamand/.rustup/toolchains/nightly-2020-09-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs | aasdf0asdfas8d9fyyyb123921381923921939assertion failed: `(left == right)`\n  left: ``,\n right: ``examples/repro.rsassertion failed: found_one127.0.0.1/home/antoninamand/.rustup/toolchains/nightly-2020-09-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs | b123921381923921939assertion failed: `(left == right)`\n  left: ``,\n right: ``examples/repro.rsassertion failed: found_one127.0.0.1/home/antoninamand/.rustup/toolchains/nightly-2020-09-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs

I bisect'ed with git with the following example (add to
examples/repro.rs) : https://gist.github.com/gwik/2d793da01a88cd6a5280f4a152f61a44

git bisect run cargo run --example repro

The first failing commit is cd5f02c. But it's already in version 0.15.1 and I the first occurence of this bug I found with 0.15.2-pre were fixed with 0.15.1 so I suppose that similar commits 8ca2f15 and f8793b6 have the same issue (introduced after 0.15.1).

@gwik
Copy link
Contributor Author

gwik commented Sep 25, 2020

I seems the bug is in the cpp driver itself:

https://github.com/datastax/cpp-driver/blob/dbea8c599c2726b77b96d0399c86ea3e836223ed/src/statement.cpp#L226-L230

They ignore the value_length parameter and try to deduct the length looking for a NULL byte str terminator.

Separately, the usage of the pointer for the name is not correct. The name reference should outlive the lifetime of the statement as the cpp driver stores the pointer.
It's safe in most usages since people will use static strings, but things can go wrong when using dynamic column names in their code.

@kw217
Copy link
Collaborator

kw217 commented Oct 6, 2020

Oh no! Have you reported the CPP driver bug at https://datastax-oss.atlassian.net/jira/software/c/projects/CPP/issues/?filter=allissues&= ? I've found them moderately responsive, especially if you offer a patch.

Thanks for the workaround.

Please can you raise a separate issue for the usage problem?

@gwik
Copy link
Contributor Author

gwik commented Oct 6, 2020

Oh no! Have you reported the CPP driver bug at https://datastax-oss.atlassian.net/jira/software/c/projects/CPP/issues/?filter=allissues&= ? I've found them moderately responsive, especially if you offer a patch.

I made a PR and they merged it, but they haven't released it.

Please can you raise a separate issue for the usage problem?

I will.

@kw217
Copy link
Collaborator

kw217 commented Nov 4, 2020

Great! Leaving this issue open to track the Cassandra driver fix. Can you link the Cassandra issue?

@kw217 kw217 added the blocked label Nov 4, 2020
@jhgg
Copy link
Collaborator

jhgg commented Nov 5, 2020

Issue is here: datastax/cpp-driver#484

@jensenn
Copy link
Contributor

jensenn commented Dec 9, 2020

Good catch on the bug. Should the code be rolled back to using a null terminated string until the cpp driver is fixed? Being correct is better than fast and broken.

Separately, the usage of the pointer for the name is not correct. The name reference should outlive the lifetime of the statement as the cpp driver stores the pointer.
It's safe in most usages since people will use static strings, but things can go wrong when using dynamic column names in their code.

I think this is ok. The name is used to calculate an index and the index is what is stored. So the name only needs to live for the statement->set() call inside of cass_statement_bind_string_by_name_n(), which it does.

@gwik
Copy link
Contributor Author

gwik commented Dec 10, 2020

The name is used to calculate an index and the index is what is stored

You're absolutely right. Good, we're safe then.

@kw217
Copy link
Collaborator

kw217 commented Jan 22, 2021

I see there has still been no point release of the CPP driver :-(.

Our master code has the workaround

let value_cstr = std::ffi::CString::new(value)?;

so cassandra-rs is safe - strings with NULs in will be truncated (unavoidably), but non-NUL-terminated strings are handled correctly.

This issue still remains open, to track DataStax releasing the fixed driver, and us (possibly) reverting the workaround.

@gwik
Copy link
Contributor Author

gwik commented Jan 22, 2021

I looked for a version in the cpp driver library but couldn't find one. Since we don't know the library version we are executing on I guess we're stuck with the workaround for a very long time. Or maybe someone has a clever idea ?

@kw217
Copy link
Collaborator

kw217 commented Jan 25, 2021

It's undocumented, but the C++ driver does expose a function const char* datastax::internal::driver_version() - see https://github.com/datastax/cpp-driver/blob/master/src/driver_info.hpp#L23 . (Prior to 2.13.0 this was just cass::driver_version, but then it was moved into datastax::internal.) I haven't looked into what support the Rust FFI has for optionally binding a symbol, but this might give us a mechanism.

@kw217
Copy link
Collaborator

kw217 commented May 25, 2021

This is fixed in 2.16.0 of the DataStax C/C++ driver - see datastax/cpp-driver@2.15.3...2.16.0 . @gwik do you want to investigate version detection?

@gwik
Copy link
Contributor Author

gwik commented May 25, 2021

If I believe this: https://doc.rust-lang.org/nomicon/ffi.html it doesn't look good.

Rust is currently unable to call directly into a C++ library

Am I missing something ?

@kw217
Copy link
Collaborator

kw217 commented May 25, 2021

That's a bit of an overstatement. There's nothing magic about a C++ function. bindgen handles C++ to some degree: https://rust-lang.github.io/rust-bindgen/cpp.html and it should certainly cope with this - the only C++ thing about it is the name-mangling. The bigger problem is avoiding a link error if the symbol is absent; I'm not sure how to do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants