Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of C strings in API [was: this project vs couchbase-lite-core C API?] #22

Closed
Dushistov opened this issue Sep 19, 2019 · 17 comments
Closed

Comments

@Dushistov
Copy link

Dushistov commented Sep 19, 2019

I am really confused by this project.
There is https://github.com/couchbase/couchbase-lite-core with C API.
What for this project?

Recently I make my Rust binding for couchbase-lite-core public
https://github.com/Dushistov/couchbase-lite-rust/ it is "very alpha" though I use it in my project.

Should I switch to couchbase-lite-C?

From the first look couchbase-lite-C API is worse in compare with couchbase-lite-core for Rust.
Rust native work with slices, so to pass null terminated string you have to allocate memory.
While couchbase-lite-core C API works with slices that maps one to one for all fundamental Rust types.

cc @snej

@Dushistov Dushistov changed the title this project vs couchbase-lite-core? this project vs couchbase-lite-core C API? Sep 19, 2019
@jeromebenois
Copy link

Interesting issue! I works on Rust bindings for this project https://github.com/jeromebenois/couchbase-lite-c-rs

It seems the C API in couchbase-lite-core in an internal API and maybe change. It explains here: https://blog.couchbase.com/couchbase-lite-in-c/

I think that we must talk together. It’s a shame to have 2 rust bindings.

@Dushistov
Copy link
Author

@jeromebenois

It seems the C API in couchbase-lite-core in an internal API and maybe change

But it has little sense for me. You need any way build couchbase-lite-core for Rust binding, so you stick to specific version of couchbase-lite-core any way. So any API changes doesn't matter until you going upgrade to new couchbase-lite-core and during upgrade you need any way test many things so adding new arguments to call and so on doesn't matter.
May be for some case where you used pre-build libLiteCore.so, but I don't see such cases.
While usage of "c null terminated strings" is huge thing, that gives a ton of unnecessary memory allocations.

think that we must talk together. It’s a shame to have 2 rust bindings.

Definitely. I make my binding open-source because of I need collaborators.
It would be nice to use rust websockets instead of c4_civet_web_socket.
Extend query struct to build query via "builder patter" instead of raw json.
Write fleece based serde serialization, de-serialization to skip extra step json<->fleece
for recv/send data to database.

@snej
Copy link
Contributor

snej commented Sep 23, 2019

The short answer is that Couchbase has customers that want to use Couchbase Lite directly from C/C++, or in environments where neither Java/.NET/Cocoa are an option. So we're building a version for them to use, with an API that's pretty close to our other platform APIs.

Couchbase Lite Core is an internal library with an internal API. We reserve the right to change it around drastically at any moment. It's possible it will go away entirely and be merged into CBL-C. The API is pretty complicated to figure out in some areas. And there are annoying problems building it, that we don't have an urgent reason to fix, but that some new developers run into.

We used to be more open about other devs using LiteCore, because we had no better alternative to offer. Now we do.

@snej
Copy link
Contributor

snej commented Sep 23, 2019

Rust native work with slices, so to pass null terminated string you have to allocate memory.

Performance-sensitive APIs in CBL-C, like at the document property level, work with slices. The lowest level thing I can think of that uses C strings is getting documents, but the overhead of allocating a string isn't going to be much compared to reading the data out of SQLite.

@Dushistov
Copy link
Author

@snej

Performance-sensitive APIs in CBL-C, like at the document property level, work with slices

But why mix C null terminated strings and slices?

It is understandable if all your API will works with null terminated strings,
then it may be simplify things for some language bindings that support C strings by default.
But if you have API that can be called only with slices, then why not use it everyway,
this API will be simple in compare with mix slice/C strings?

@snej
Copy link
Contributor

snej commented Sep 24, 2019

Good question. I started out wanting to make the API more idiomatic in C. But when it got down to the document property level, it's really just exposing the Fleece API. Wrapping that whole API in something that exposed C strings instead would be both awkward and introduce a lot of performance overhead due to copying.

So at this point you may be right that it's best to go for consistency and make the higher level APIs slice-based too. I'm not in charge of CBL API decisions; @pasin, what do you think?

@pasin
Copy link
Contributor

pasin commented Sep 24, 2019

I think we should do what makes sense to the platform. Can we have some examples about the proposed change?

@snej
Copy link
Contributor

snej commented Sep 24, 2019

The proposal is to change strings in the public API from char* to FLSlice. There are quite a few of these, for example:

CBLDatabase* CBLDatabase_Open(const char *name _cbl_nonnull,
                              const CBLDatabaseConfiguration* config,
                              CBLError* error) CBLAPI;

Here const char *name would be changed to FLSlice name.

The argument for char* is that it's, obviously, idiomatic for C client code.

However, as @Dushistov points out, a second category of clients is bindings to other languages (Rust, Python, JS, Go, etc.) In many cases these languages store strings as UTF-8 but don't null-terminate them, which means that passing a string into one of these CBL-C APIs requires first copying the string into a temp buffer and appending a zero byte. (And handling a returned C string involves calling strlen to find its length.) For these language bindings, FLSlice is a much more optimal representation.

Another argument against char* is that it already, in some cases, involves string copying inside CBL-C to convert a C4Slice return from LiteCore into a C string. But there's not too much of that IIRC.

I never did push char* all the way down into the CBL-C API. Inside the document body the API changes to Fleece, which is all slice-based. As I said above, I didn't think it would be worth it to create a shadow copy of the entire Fleece API that exposed C strings (with all the copying that entails.) So that's a further argument against char*, that it's not consistent through the whole API.

The main effect on C clients is learning what FLSlice is, and using FLStr() or FLSTR() to wrap C string parameters. Not a lot of overhead IMHO. (For C++ clients I can whip up an implicit std::string to FLSlice conversion operator, so they should be able to pass C++ strings directly.)

@snej snej changed the title this project vs couchbase-lite-core C API? Use of C strings in API [was: this project vs couchbase-lite-core C API?] Sep 24, 2019
@borrrden
Copy link
Member

It doesn't make that much difference to C# since both of them will need some marshalling work (C# strings are internally stored as UTF-16) but I agree we should be consistent which looks like it means using slices everywhere.

@snej
Copy link
Contributor

snej commented May 16, 2020

I recently added alternative versions of the API functions that take const char*. These take FLSlice instead, and have _s appended to their names.

@snej snej closed this as completed May 16, 2020
@snej
Copy link
Contributor

snej commented May 20, 2020

@Dushistov FYI, I've added a preliminary 'official' Rust binding in the bindings/rust directory. (I know you've got your own, but we need one whose API is compatible with our cross-platform API spec.)

@Dushistov
Copy link
Author

Dushistov commented May 20, 2020

@snej

but we need one whose API is compatible with our cross-platform API spec

  1. You mean similar to binding to other languages?
    What about language peculiarities, do you change API per language to make it more elegant for this language or prefer worse but uniform API?
    Like one language has RAII (Resource Acquisition Is Initialization),
    others has no? One language can make sure that you open only one transaction against db simultaneity during compilation time others can not? And so on and so on.

  2. Code looks like generated by script, is this general idea, generate binding via something similar to swig and exclude human from the chain or this just for starting?

  3. The way you building the whole thing is quite strange for Rust, for Python it is common idea: build C/C++ library plus build glue for python, but for Rust it should be reverse, the whole thing should be build from Cargo.toml + build.rs, it is from hard to impossible to point Rust build system: "here prebuild Rust library", you can do it for C/C++ library, but not for Rust library/crate.

@Dushistov Dushistov mentioned this issue May 25, 2020
@snej
Copy link
Contributor

snej commented May 28, 2020

Code looks like generated by script

Only bindings.rs is auto-generated (at build time) from the C headers. That ensures we have accurate C bindings. Those are private; the public classes are wrappers around them.

but for Rust it should be reverse, the whole thing should be build from Cargo.toml + build.rs, it is from hard to impossible to point Rust build system

I don't understand that paragraph. Are you saying that the Rust build script should invoke CMake to build the native library? That sounds like a good idea, but it wasn't necessary for bringup so I haven't done it yet.

@Dushistov
Copy link
Author

Dushistov commented May 29, 2020

@snej

Are you saying that the Rust build script should invoke CMake to build the native library?

Rust ABI is intentionally not stable, you can not give pre-build crate to user for linking, user have to build your crate from source code. You obviously can document that user have to compile C++ code and only after that it can build Rust code, but much more convenient if C++ will build as part of Rust build.

@snej
Copy link
Contributor

snej commented May 29, 2020

You obviously can document that user have to compile C++ code and only after that it can build Rust code, but much more convenient if C++ will build as part of Rust build.

As I said, it's a good idea :) it just wasn't on the critical path to getting something running.

We don't want to require it, though. Couchbase distributes Couchbase Lite to customers in binary form (so that our support team knows exactly what bits the customer has), and the commercial Enterprise Edition contains some closed-source bits so it's only available in binary form. So we need to preserve the ability to build the Rust library from existing CBL headers and dylib.

@snej
Copy link
Contributor

snej commented May 29, 2020

Oh, also

Rust ABI is intentionally not stable

I don't think it's intentional anymore. There is talk of stabilizing the ABI (I think there have been posts about it on the Rust blog.) Being able to distribute libraries in binary form is useful — before Swift finalized its ABI we kept having to provide new builds of CBL-Swift every time a new version of Xcode came out.

@Dushistov
Copy link
Author

I don't think it's intentional anymore. There is talk of stabilizing the ABI

There are several topics in https://internals.rust-lang.org/ , but there are no RFC as I know.
RFC may take year or more, and after that RFC should be implemented and then stabialized.
Years will pass before this happens.

And you should note that there are no option in cargo built system to use external prebuild Rust library, no one expect yet that user need such feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants