Feature/599 provide and use c service in rust #627

Open
wants to merge 23 commits into
base: master

pnoltes
Contributor

@pnoltes pnoltes commented Aug 29, 2023

Intro

This PR completes the Rust feasibility issue (#599) by providing a proof of concept (PoC) implementation for Apache Celix with Rust.

I invested more time in this than I had initially planned. However, as a result, this PoC IMO demonstrates nicely the feasibility of integrating Rust support into Apache Celix.

It's good to highlight that this implementation serves as a proof of concept and is not ready for production. The code lacks consistent error checks, documentation, unit tests, and is not feature-complete.

Changes

In this PR, the following are introduced:

  • A celix crate, which builds upon the celix_bindings crate, offering a Rust-native API for:
    • Errno enum module
    • BundleActivator module
    • LogHelper module
    • BundleContext module
  • The BundleContext module provides the following features:
    • Logging functions
    • Service registration builder
    • Service use builder
    • Service tracker builder
  • LogHelper uses a service tracker to fetch an on-demand C log_service.
  • An updated Rust "hello world" bundle that utilizes the BundleActivator
  • A Rust Shell API crate that:
    • Introduces a RustShellCommand type.
  • A Rust Shell command bundle that:
    • Utilizes the BundleActivator
    • Supplies both a C shell command and a Rust shell command (Note: The Rust shell command serves merely as a demonstration and is not actively utilized)
    • Uses the BundleContext::use_service to access both the Rust and C shell commands

The rust_shell_cnt Apache Celix container/executable can be used to try out the Rust integration. Its query -v command should give a nice overview of the services provided and used in Rust.

Interesting Observations

Rust's templating system (generics), especially its type inference capabilities, is quite powerful. However, working with templates and traits can be complex, based on my experience. This is mainly because traits aren't complete types. As a result, template methods cannot seamlessly move a provided trait type from a Box (pointer) to a different container type, like Arc (atomic reference counting, comparable to shared_ptr). My assumption is that this constraint exists because of potential slicing and memory allocation/deallocation challenges. But I am still learning the ins and outs of Rust, so maybe more is possible. Consequently, for this PoC, I decided to make type containers (e.g., Box, Arc, Rc) part of the service type. As a result, from a Rust perspective, one does not register a 'RustShellCommand' service type but an Arc<RustShellCommand> service type.
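A minimal sketch of what "the container is part of the service type" could look like. The `Registry`, `ShellCommand` and `Echo` names are hypothetical and not taken from the PoC; the point is only that a bare trait object is not a complete type, while `Arc<dyn ShellCommand>` is, so the lookup has to name the same container that was registered.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical service trait, standing in for the PoC's RustShellCommand.
pub trait ShellCommand: Send + Sync {
    fn execute(&self, command_line: &str) -> String;
}

struct Echo;

impl ShellCommand for Echo {
    fn execute(&self, command_line: &str) -> String {
        format!("echo: {command_line}")
    }
}

// Hypothetical registry keyed by the concrete, sized service type.
#[derive(Default)]
pub struct Registry {
    services: HashMap<TypeId, Box<dyn Any>>,
}

impl Registry {
    // `T` must be a complete (Sized) type, so a bare `dyn ShellCommand`
    // cannot be registered, but `Arc<dyn ShellCommand>` can.
    pub fn register<T: Any>(&mut self, service: T) {
        self.services.insert(TypeId::of::<T>(), Box::new(service));
    }

    // The lookup has to name the same container type that was registered.
    pub fn get<T: Any>(&self) -> Option<&T> {
        self.services.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }
}

// Registers an Arc-wrapped service and uses it through the same type.
pub fn run_echo(line: &str) -> Option<String> {
    let mut registry = Registry::default();
    let service: Arc<dyn ShellCommand> = Arc::new(Echo);
    registry.register(service);
    registry
        .get::<Arc<dyn ShellCommand>>()
        .map(|cmd| cmd.execute(line))
}
```

In this sketch, asking the registry for `Box<dyn ShellCommand>` after registering an `Arc<dyn ShellCommand>` finds nothing, which mirrors the constraint described above.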

While integrating Rust support into Apache Celix is feasible, I'm uncertain whether it's appropriate to include it directly within the Apache Celix main git repository, especially when leveraging CMake Corrosion. Though this method functions, in my experience it does not really integrate well with an IDE.

Lastly, Rust would arguably be a better fit if Apache Celix could support both Static Bundles and Shared Object Bundles, in addition to Zip Bundles (#94).

@codecov-commenter

codecov-commenter commented Aug 29, 2023

Codecov Report

Merging #627 (23bcf27) into master (2811897) will increase coverage by 0.02%.
Report is 14 commits behind head on master.
The diff coverage is n/a.

❗ Current head 23bcf27 differs from pull request most recent head 4e91216. Consider uploading reports for the commit 4e91216 to get more accurate results

@@            Coverage Diff             @@
##           master     #627      +/-   ##
==========================================
+ Coverage   81.61%   81.63%   +0.02%     
==========================================
  Files         260      260              
  Lines       34680    34679       -1     
==========================================
+ Hits        28303    28311       +8     
+ Misses       6377     6368       -9     
Files Changed Coverage Δ
...s/pubsub/pubsub_admin_tcp/src/pubsub_tcp_handler.c 78.42% <ø> (+0.10%) ⬆️

... and 5 files with indirect coverage changes

@PengZheng PengZheng self-requested a review August 31, 2023 05:46
@PengZheng
Contributor

PengZheng commented Sep 10, 2023

Just finished the book, hopefully that was not too late for our Rust party.
I'll finish reading through the code base this weekend.

Please wait until Friday.

I'll finish the review before Wednesday.

[lib]
name = "celix"
path = "src/lib.rs"
crate-type = ["rlib"]
Contributor

@PengZheng PengZheng Sep 16, 2023

Now it's OK to have static libcelix.

Considering its size, 200KB for an x64 release build, it might be preferable to build it as a shared object.
But unfortunately I've not found any way to make BUILD_RPATH work for Corrosion, without which it is quite inconvenient to run the resulting binary in the build tree.

Even if we can add BUILD_RPATH via build.rs, I don't know how to make RPATH rewrite work.


#[doc = "A trait to implement a Celix Shell Command"]
pub trait RustShellCommand {
    fn execute_command(&self, command_line: &str) -> Result<(), Error>;
Contributor

@PengZheng PengZheng Sep 16, 2023

&self or &mut self? What if a command execution changes internal state?
As it stands, we would have to use a dummy RustShellCommandImpl that holds a mutable reference to the object whose state the command execution should actually change.

Contributor Author

This needs IMO some more experimenting and/or discussion.

But a &self argument (so not mutable) in a (service) trait can still be used to change a service's state; it requires that the implementing object uses a Mutex to change the state.

Another option could be to register a service as Arc<Mutex> or maybe require that a service template argument is Sync.

But I think this is something to clarify later.
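A small sketch of the first option, assuming a trait with the same `&self` signature as the one under review. `CommandError` and `CountingCommand` are hypothetical names introduced for illustration; the PoC has its own `Error` type.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical error type; the PoC uses its own `Error` enum.
#[derive(Debug, PartialEq)]
pub struct CommandError(pub String);

// Same shape as the trait under review: `execute_command` takes `&self`.
pub trait RustShellCommand: Send + Sync {
    fn execute_command(&self, command_line: &str) -> Result<(), CommandError>;
}

// The state lives behind a Mutex, so it can change through a shared reference.
#[derive(Default)]
pub struct CountingCommand {
    calls: Mutex<u32>,
}

impl CountingCommand {
    pub fn calls(&self) -> u32 {
        *self.calls.lock().unwrap()
    }
}

impl RustShellCommand for CountingCommand {
    fn execute_command(&self, command_line: &str) -> Result<(), CommandError> {
        if command_line.trim().is_empty() {
            return Err(CommandError("empty command line".into()));
        }
        // Interior mutability: no `&mut self` needed to update the counter.
        *self.calls.lock().unwrap() += 1;
        Ok(())
    }
}

// Executes the command twice through a shared trait object.
pub fn run_twice() -> u32 {
    let command = Arc::new(CountingCommand::default());
    let service: Arc<dyn RustShellCommand> = command.clone();
    service.execute_command("count").unwrap();
    service.execute_command("count again").unwrap();
    command.calls()
}
```

The cost of this design is a lock on every call; whether that is acceptable for services is part of the open discussion.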

    let include_path_file = build_dir.join("include_paths.txt");
    file = File::open(&include_path_file)?;
} else {
    println!("include_paths.txt not found in CORROSION_BUILD_DIR. Failing back to CELIX_RUST_INCLUDE_PATHS_FILE env value");
Contributor

@PengZheng PengZheng Sep 17, 2023

If the message expresses the real intention, then the current implementation, which falls back to CELIX_RUST_INCLUDE_PATHS_FILE if CORROSION_BUILD_DIR is not set, is wrong.

Contributor Author

This is intentional, but it does not really work. So I will remove the else branch.

#Note for now this includes framework, utils and shell_api maybe this should be separated in the future.
file(GENERATE OUTPUT
"${CMAKE_CURRENT_BINARY_DIR}/include_paths.txt" CONTENT
"$<TARGET_PROPERTY:framework,INTERFACE_INCLUDE_DIRECTORIES>;$<TARGET_PROPERTY:utils,INTERFACE_INCLUDE_DIRECTORIES>;$<TARGET_PROPERTY:shell_api,INTERFACE_INCLUDE_DIRECTORIES>;$<TARGET_PROPERTY:Celix::log_service_api,INTERFACE_INCLUDE_DIRECTORIES>")
Contributor

INCLUDE_DIRECTORIES alone is not enough; we should also have COMPILE_DEFINITIONS and COMPILE_OPTIONS.

@@ -44,19 +58,15 @@ fn main() {
     println!("cargo:info=Start build.rs for celix_bindings");
     let include_paths = print_include_paths().unwrap();

-    let mut builder = bindgen::Builder::default()
-        .header("src/celix_bindings.h");
+    let mut builder = bindgen::Builder::default().header("src/celix_bindings.h");

     // Add framework and utils include paths
     for path in &include_paths {
         builder = builder.clang_arg(format!("-I{}", path));
Contributor

As mentioned earlier, -I is not enough; we may need -D and other compile options (like -pthread).
It is hard to tell whether it matters for now without checking each target's INCLUDE_DIRECTORIES, COMPILE_DEFINITIONS and COMPILE_OPTIONS.
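As a rough sketch of how the build script could assemble these arguments: the helper below turns three `;`-separated lists into clang flags. The function name `clang_args` and the idea that CMake writes definitions and options to files in the same `;`-separated form as include_paths.txt are assumptions for illustration, not part of the PR.

```rust
// Turn CMake-generated lists into clang arguments for bindgen.
// Assumption: each input is a `;`-separated list, similar to what
// `file(GENERATE ...)` writes for the include directories.
pub fn clang_args(include_dirs: &str, definitions: &str, options: &str) -> Vec<String> {
    // Split a CMake list and drop empty entries (e.g. from a trailing `;`).
    fn items<'a>(list: &'a str) -> impl Iterator<Item = &'a str> {
        list.split(';').map(str::trim).filter(|item| !item.is_empty())
    }

    let mut args = Vec::new();
    args.extend(items(include_dirs).map(|dir| format!("-I{dir}")));
    args.extend(items(definitions).map(|def| format!("-D{def}")));
    // Options such as `-pthread` are already complete flags.
    args.extend(items(options).map(String::from));
    args
}
```

Each returned string could then be handed to `builder.clang_arg(...)` in the same loop that currently adds the `-I` flags.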

@@ -44,19 +58,15 @@ fn main() {
println!("cargo:info=Start build.rs for celix_bindings");
let include_paths = print_include_paths().unwrap();

Contributor

Missing cargo:rustc-link-lib= leads to no DT_NEEDED entries for the framework and utils shared objects.
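A hedged sketch of the directives a build script could print to ask rustc to link against the C libraries. `cargo:rustc-link-search` and `cargo:rustc-link-lib` are standard Cargo build-script directives; the helper names, the directory and the library names in the usage note are assumptions that would have to come from the CMake side.

```rust
// Build the cargo directives a build.rs would print so that rustc links
// against the given shared libraries.
// Assumption: `lib_dir` and the library names are supplied by CMake.
pub fn link_directives(lib_dir: &str, libs: &[&str]) -> Vec<String> {
    let mut lines = vec![format!("cargo:rustc-link-search=native={lib_dir}")];
    lines.extend(
        libs.iter()
            .map(|lib| format!("cargo:rustc-link-lib=dylib={lib}")),
    );
    lines
}

// In build.rs these lines are simply printed to stdout for cargo to pick up.
pub fn emit(lib_dir: &str, libs: &[&str]) {
    for line in link_directives(lib_dir, libs) {
        println!("{line}");
    }
}
```

For example, `emit("/path/to/celix/lib", &["celix_framework", "celix_utils"])` would print one search-path line and two link-lib lines; the path and library names here are placeholders.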

@@ -44,19 +58,15 @@ fn main() {
println!("cargo:info=Start build.rs for celix_bindings");
let include_paths = print_include_paths().unwrap();
Contributor

parse_callbacks(Box::new(bindgen::CargoCallbacks)) is missing, so changes to the Celix C headers will not trigger a rebuild of this crate.

@@ -30,3 +30,6 @@
#include "celix_framework.h"
#include "celix_framework_factory.h"
#include "celix_framework_utils.h"

#include "celix_shell_command.h"
#include "celix_log_service.h"
Contributor

Now that all components (including framework and utils) are optional, these headers may be missing.

For Conan, intra-package and inter-package dependency deduction works like a charm. But we need to think about how to make Cargo and Conan work together.


impl Debug for Error {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
Contributor

Would celix_strerror help to produce more useful Debug information?


pub struct LogHelper {
    name: String,
    tracker: Mutex<Option<ServiceTracker<celix_log_service_t>>>,
Contributor

With LogHelper as the sole owner, the use of a Mutex is odd. Is its only purpose to silence the compiler's complaint about helper.tracker.lock().unwrap().replace(tracker)?

});
let filter = format!("(name={})", name);
let weak_helper = Arc::downgrade(&helper);
let tracker = ctx.track_services::<celix_log_service_t>()
Contributor

@PengZheng PengZheng Sep 17, 2023

I think the weirdness of the tracker mutex comes from this: a half-constructed object (the log helper) is handed to another object (the tracker).
If we permit a new log helper to hold a None tracker and open the tracker after new, then LogHelper::new can return Self rather than Arc<Self>.
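A minimal sketch of that two-phase construction, assuming a hypothetical `Tracker` type in place of the PoC's `ServiceTracker<celix_log_service_t>`. It deliberately leaves out how tracker callbacks would reach back into the helper, which is the harder part of the real design.

```rust
// Hypothetical stand-in for the PoC's ServiceTracker<celix_log_service_t>.
pub struct Tracker {
    filter: String,
}

pub struct LogHelper {
    name: String,
    // None until `open` is called; no Mutex is needed because `open`
    // takes `&mut self`.
    tracker: Option<Tracker>,
}

impl LogHelper {
    // Returns a fully constructed value, not an Arc<Self>.
    pub fn new(name: &str) -> Self {
        LogHelper {
            name: name.to_string(),
            tracker: None,
        }
    }

    // Second phase: create the tracker once the helper itself exists.
    pub fn open(&mut self) {
        let filter = format!("(name={})", self.name);
        self.tracker = Some(Tracker { filter });
    }

    pub fn is_open(&self) -> bool {
        self.tracker.is_some()
    }

    // The filter of the opened tracker, if any.
    pub fn filter(&self) -> Option<&str> {
        self.tracker.as_ref().map(|t| t.filter.as_str())
    }
}
```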

Contributor

@PengZheng PengZheng left a comment

LGTM

Thanks for contributing this eye-opening PR.

Reviewing this PR is not an easy task: familiarizing oneself with Rust's ownership model is not easy, and fitting it with Celix's service layer, which manages dynamic service lifetimes, is challenging. I was struggling with aligning these two properly. Maybe I need more time to absorb Rust. Never mind. I'll keep looking at the Rust part afterwards.

Let's get this cool addition merged in its current form and get 2.4.0 released ASAP.

    handle: *mut ::std::ffi::c_void,
    _ctx: *mut $crate::details::CBundleContext,
) -> $crate::details::CStatus {
    let reclaimed_activator = Box::from_raw(handle as *mut $activator);
Contributor

@PengZheng PengZheng Sep 18, 2023

This does not feel right. In the C version, the activator is dropped after celix_bundleContext_waitForEvents(ctx).

IMHO, the best fix is to change the corresponding logic in celix_framework_stopBundleEntryInternal:

From

        status = CELIX_DO_IF(status, bundle_getContext(bndEntry->bnd, &context));
        if (status == CELIX_SUCCESS) {
            if (activator->stop != NULL) {
                status = CELIX_DO_IF(status, activator->stop(activator->userData, context));
                if (status == CELIX_SUCCESS) {
                    celix_dependency_manager_t *mng = celix_bundleContext_getDependencyManager(context);
                    celix_dependencyManager_removeAllComponents(mng);
                }
            }
        }
        if (status == CELIX_SUCCESS) {
            if (activator->destroy != NULL) {
                status = CELIX_DO_IF(status, activator->destroy(activator->userData, context));
            }
        }

        if (bndEntry->bndId >= CELIX_FRAMEWORK_BUNDLE_ID) {
            //framework and "normal" bundle
            celix_framework_waitUntilNoEventsForBnd(framework, bndEntry->bndId);
            celix_bundleContext_cleanup(bndEntry->bnd->context);
        }

To

        status = CELIX_DO_IF(status, bundle_getContext(bndEntry->bnd, &context));
        if (status == CELIX_SUCCESS) {
            if (activator->stop != NULL) {
                status = CELIX_DO_IF(status, activator->stop(activator->userData, context));
                if (status == CELIX_SUCCESS) {
                    celix_dependency_manager_t *mng = celix_bundleContext_getDependencyManager(context);
                    celix_dependencyManager_removeAllComponents(mng);
                }
            }
        }

        if (bndEntry->bndId >= CELIX_FRAMEWORK_BUNDLE_ID) {
            //framework and "normal" bundle
            celix_framework_waitUntilNoEventsForBnd(framework, bndEntry->bndId);
            celix_bundleContext_cleanup(bndEntry->bnd->context);
        }
        if (status == CELIX_SUCCESS) {
            if (activator->destroy != NULL) {
                status = CELIX_DO_IF(status, activator->destroy(activator->userData, context));
            }
        }

That way we can remove celix_bundleContext_waitForEvents(ctx) from the generated celix_bundleActivator_destroy. It also lets us promote the direct use of create/start/stop/destroy again, rather than CELIX_GEN_BUNDLE_ACTIVATOR.

}

pub struct ServiceTracker<T> {
    ctx: Arc<BundleContext>,
Contributor

This does not feel right, since BundleContext has a longer lifetime than any tracker associated with it.
This weirdness leads to the unnatural BundleContext::get_self.

Contributor

Lifetime order: BundleContext > Activator > LogHelper > ServiceTracker ...
If we can enforce such an order using the Rust type system, then a lot of the weirdness will disappear.
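One way such an order could be expressed, sketched with simplified stand-in types (the fields and method bodies are hypothetical, not the PoC's): the tracker borrows the context, so the compiler rejects any program in which the context is dropped or moved while a tracker still exists.

```rust
// Simplified stand-in for the PoC's BundleContext.
pub struct BundleContext {
    name: String,
}

// The tracker borrows the context instead of holding an Arc to it,
// so it cannot outlive the context.
pub struct ServiceTracker<'ctx> {
    ctx: &'ctx BundleContext,
    service_name: String,
}

impl BundleContext {
    pub fn new(name: &str) -> Self {
        BundleContext { name: name.to_string() }
    }

    // The returned tracker is tied to the lifetime of `self`.
    pub fn track_services<'ctx>(&'ctx self, service_name: &str) -> ServiceTracker<'ctx> {
        ServiceTracker {
            ctx: self,
            service_name: service_name.to_string(),
        }
    }
}

impl ServiceTracker<'_> {
    pub fn describe(&self) -> String {
        format!("{} tracks {}", self.ctx.name, self.service_name)
    }
}

pub fn describe_tracker() -> String {
    let ctx = BundleContext::new("rust_bundle");
    let tracker = ctx.track_services("celix_log_service");
    // Dropping or moving `ctx` at this point would be a compile error,
    // because `tracker` still borrows it.
    tracker.describe()
}
```

The open question is how to obtain a suitable scope for that borrow when the framework calls create/start/stop/destroy separately.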

);
}
Err(e) => {
println!("Error creating CString: {}", e);
Contributor

Use eprintln! rather than println!?

handle: &mut self.shell_command_provider as *mut CShellCommandImpl as *mut c_void,
executeCommand: Some(CShellCommandImpl::call_execute_command),
})
.with_service_name("celix_shell_command")
Contributor

@PengZheng PengZheng Sep 18, 2023

From the current implementation, I can tell that if with_service_name comes before with_service, a dynamic memory allocation is avoided.

@pnoltes
Contributor Author

pnoltes commented Sep 18, 2023

> LGTM
>
> Thanks for contributing this eye-opening PR.
>
> Reviewing this PR is not an easy task: familiarizing oneself with Rust's ownership model is not easy, and fitting it with Celix's service layer, which manages dynamic service lifetimes, is challenging. I was struggling with aligning these two properly. Maybe I need more time to absorb Rust. Never mind. I'll keep looking at the Rust part afterwards.
>
> Let's get this cool addition merged in its current form and get 2.4.0 released ASAP.

Thanks. I agree with a 2.4.0 ASAP. I think it is maybe better to merge this PR after 2.4.0. Then some additional work can be done to prevent/refactor some strange constructions (e.g. the log_helper <-> service tracker).

@PengZheng
Contributor

> I think it is maybe better to merge this PR after 2.4.0. Then some additional work can be done to prevent/refactor some strange constructions (e.g. the log_helper <-> service tracker).

I agree. Improving the C implementation and the Rust design at the same time may produce the best results.
For a stable 2.4.0, we'd better leave the current C implementation untouched.

@kulst

kulst commented Apr 3, 2024

Hi, awesome to see Celix is making progress towards Rust 👍

Are you open to contributions on this topic? Although I am not completely familiar with the internals of Celix yet, I would love to wrap my head around your concepts and see if we can create a nice and secure Rust API for Celix.

@pnoltes
Contributor Author

pnoltes commented Apr 4, 2024

> Hi, awesome to see Celix is making progress towards Rust 👍
>
> Are you open to contributions on this topic? Although I am not completely familiar with the internals of Celix yet, I would love to wrap my head around your concepts and see if we can create a nice and secure Rust API for Celix.

Hi,
First of all, thanks for the interest! Additional help is always welcome 😄.

Although this pull request has remained dormant for some time, I still believe that introducing Rust bindings, alongside the existing C++ bindings, would greatly benefit Apache Celix.

Please note that Rust support, as well as this pull request, is still experimental. Rust is new for me (and, I believe, for the other committers as well), and we are all in the process of understanding how Rust concepts can be integrated into Apache Celix, or more broadly, how they fit within a dynamic, in-process service framework.

Another point to consider is that, for now, we have decided to develop Rust support on top of the C framework, thereby leveraging the existing and stable C framework. Theoretically, this approach could be reversed, but in my opinion, that is not our current objective.

A good way to start would be to try out Apache Celix and, specifically, this branch. If you have any questions, feel free to ask them here or on the Apache Celix dev mailing list (see https://celix.apache.org/support/mailing-list.html). And, of course, it is possible to create a pull request on top of this pull request to introduce some changes.

Apache Celix is part of the Apache Software Foundation (ASF), and in practice, this means that we can accept some small (code) donations, but for more or larger donations, an ASF Contributor Agreement is needed.

@kulst

kulst commented Apr 4, 2024

Hi, thank you for the detailed explanations!

I also think that Rust would really benefit Celix.
The exclusive use of safe Rust guarantees that memory and type conversion errors, as well as undefined behavior, are excluded through compile-time and runtime checks.
This should also be the goal for the Rust API of Celix. If any of these errors occur to the user of the API when using safe Rust only, it should be considered a bug in the Rust API or Celix. However, whether it is possible to offer a Celix API that consists exclusively of safe Rust is still unclear to me.

I think the existing C API is a great starting point for now, as Rust usually interoperates well with C. As you already showed :)

For now I would start with making myself more familiar with Celix. I was already able to develop a basic understanding of most of the concepts of Celix. But I definitely need to further enhance this.
I will start by installing Celix and trying to get your Rust examples to work; let's see if I can also make it work with the latest version 2.4.

The next step would be to get a better understanding of the invariants that need to be upheld when using the C API. I saw that a lot of these are already documented, but I assume there are more, and maybe also implicit ones.
When this is done, the Rust API can be designed. Of course I want to leverage the Rust type system to make sure that the invariants of the C API are upheld.
Your POC is also a great starting point.

For now I think this is already a lot to do. Let's see where this leads me. :) In any case, an ASF Contributor Agreement will not be an obstacle.

Topics I stumbled upon but which I have put in my backlog for now:

Allocation: by default, allocating and deallocating on different sides of the FFI might lead to undefined behaviour. There are several possible solutions to this:

  1. implementing a (global) allocator in Rust that delegates to the Celix allocator
  2. using the Rust type system to make sure that both happen on the same side of the FFI
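A small sketch of the second option using only the standard library. `LentCString` is a hypothetical wrapper: Rust allocates the string, lends a pointer across the FFI, and the `Drop` implementation guarantees that the same (Rust) allocator frees it.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical handle for a string that Rust allocated and lends to C code.
pub struct LentCString {
    ptr: *mut c_char,
}

impl LentCString {
    pub fn new(text: &str) -> Option<Self> {
        // Fails if `text` contains an interior NUL byte.
        let owned = CString::new(text).ok()?;
        Some(LentCString { ptr: owned.into_raw() })
    }

    // Pointer to pass across the FFI; C may read it but must not free it.
    pub fn as_ptr(&self) -> *const c_char {
        self.ptr
    }
}

impl Drop for LentCString {
    fn drop(&mut self) {
        // Safety: `ptr` came from `CString::into_raw` and is reclaimed
        // exactly once, so Rust's allocator frees what it allocated.
        unsafe { drop(CString::from_raw(self.ptr)) };
    }
}

pub fn round_trip(text: &str) -> Option<String> {
    let lent = LentCString::new(text)?;
    // Stand-in for a C function reading the string through the pointer.
    let seen = unsafe { CStr::from_ptr(lent.as_ptr()) };
    Some(seen.to_string_lossy().into_owned())
}
```

The mirror image, a wrapper around memory allocated by the C side that calls the matching C free function in `Drop`, would follow the same pattern.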

Type conversion: There are various types in the C API that are similar to Rust types but cannot be used as smoothly without further adjustments on the Rust side. For example, the array_list type is similar to a vector (with some additions).

  1. One solution to use such a type smoothly in Rust is to reimplement the interface of the std::vec::Vec type for it.
  2. Another solution is to just copy the elements to the corresponding type before/after crossing the FFI boundary. This might be inefficient, however.
  3. A third solution might only be possible when allocation and deallocation happen on the same side of the FFI boundary. I think it could then be possible to create a std::vec::Vec from the internals of an array_list without another allocation, just by using the pointer, the size and the capacity an array_list already provides. Together with the additions of an array_list, this could be packaged in a simple wrapper type.
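A sketch of a middle ground between these options: borrowing the C array as a Rust slice without copying or taking ownership. `CIntArray` is a hypothetical pointer-plus-length struct, not the real array_list layout. Note that `Vec::from_raw_parts` (option 3) additionally requires the memory to come from Rust's global allocator, which a borrowed view avoids.

```rust
// Hypothetical C-style array: pointer plus element count, owned by the C side.
#[repr(C)]
pub struct CIntArray {
    pub data: *const i32,
    pub len: usize,
}

// Borrowed view; its lifetime ties it to the C array it was created from.
pub struct ArrayView<'a> {
    items: &'a [i32],
}

impl<'a> ArrayView<'a> {
    /// Safety: `array.data` must point to `array.len` initialized `i32`
    /// values that stay valid and unmodified for `'a`.
    pub unsafe fn new(array: &'a CIntArray) -> Self {
        let items: &'a [i32] = if array.data.is_null() || array.len == 0 {
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(array.data, array.len) }
        };
        ArrayView { items }
    }

    // Zero-copy access with the usual slice API.
    pub fn as_slice(&self) -> &'a [i32] {
        self.items
    }

    // Option 2 from the list: copy into an owned Vec when needed.
    pub fn to_vec(&self) -> Vec<i32> {
        self.items.to_vec()
    }
}

pub fn sum_of(values: &[i32]) -> i32 {
    // Stand-in for an array received from C.
    let array = CIntArray { data: values.as_ptr(), len: values.len() };
    let view = unsafe { ArrayView::new(&array) };
    view.as_slice().iter().sum()
}
```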

@pnoltes
Contributor Author

pnoltes commented Apr 7, 2024

Sounds like a plan 👍.

> For now I would start with making myself more familiar with Celix. I was already able to develop a basic understanding of most of the concepts of Celix. But I definitely need to further enhance this. I will start by installing Celix and trying to get your Rust examples to work; let's see if I can also make it work with the latest version 2.4.

Trying Apache Celix using version 2.4 is fine, but generally speaking the master branch is also quite stable for exploring Apache Celix.
One note: we are currently working towards Apache Celix 3.0.0, so a major update. I expect this has no real impact on learning Apache Celix, because a major part of the backwards-incompatible updates focuses on dropping older APIs, which are not documented or not well documented anyway. For a complete overview of the backwards-incompatible changes, see the top-level CHANGES.md file.

> The next step would be to get a better understanding of the invariants that need to be upheld when using the C API. I saw that a lot of these are already documented, but I assume there are more, and maybe also implicit ones. When this is done, the Rust API can be designed. Of course I want to leverage the Rust type system to make sure that the invariants of the C API are upheld. Your POC is also a great starting point.
>
> For now I think this is already a lot to do. Let's see where this leads me. :) In any case, an ASF Contributor Agreement will not be an obstacle.

Great to hear. When this is needed, I will let you know.

> Topics I stumbled upon but which I have put in my backlog for now:
>
> Allocation: by default, allocating and deallocating on different sides of the FFI might lead to undefined behaviour. There are several possible solutions to this:
>
> 1. implementing a (global) allocator in Rust that delegates to the Celix allocator
>
> 2. using the Rust type system to make sure that both happen on the same side of the FFI
>
> Type conversion: There are various types in the C API that are similar to Rust types but cannot be used as smoothly without further adjustments on the Rust side. For example, the array_list type is similar to a vector (with some additions).
>
> 1. One solution to use such a type smoothly in Rust is to reimplement the interface of the `std::vec::Vec` type for it.
>
> 2. Another solution is to just copy the elements to the corresponding type before/after crossing the FFI boundary. This might be inefficient, however.
>
> 3. A third solution might only be possible when allocation and deallocation happen on the same side of the FFI boundary. I think it could then be possible to create a `std::vec::Vec` from the internals of an `array_list` without another allocation, just by using the pointer, the size and the capacity an `array_list` already provides. Together with the additions of an `array_list`, this could be packaged in a simple wrapper type.

Good to hear that you are already seeing some topics; I think they are a good starting point. I was not aware that a separate way of memory allocation was needed/desired. But as stated, I am not a Rust expert, so help is welcome :)

I will also try to eventually get this PR merged. Not as a stable or even completely usable addition, but then we at least have a Rust starting point in the master branch. But I am currently focusing on something else, so I am not sure yet when I will have time to pick up this PR.

@kulst

kulst commented Apr 25, 2024

I wanted to give a little update on my progress :)

I was able to get your POC running with the latest master branch. Only some small changes were necessary. However, there were some error messages that I could track down to filter compilation: in celix_version_parse, "Invalid version component" gets logged. I do not know if this is intentional.

I also made some progress towards the API and wanted to share my thoughts.

I thought a lot about what I consider the most important invariant: celix_bundle_context_t (and all other framework objects) must not be used once the bundle is stopped. From my point of view, this means that the activator must invalidate all framework objects that are floating around.

POC approach using Weak and Arc
I think the POC tries to achieve this by using std::sync::Weak pointers. However, I am not sure this is entirely sound. For example, when building the ServiceTracker we upgrade the Weak to an Arc. What happens if this is done in another thread and, immediately after upgrading the Weak and before actually tracking the service, the bundle gets stopped and destroyed? I think we would have a use-after-free situation here.

Approach using Arc<Mutex<Option>>
So I thought about a solution having two types both storing Arc<Mutex<Option<*mut celix_bundle_context_t>>>. One type is private to the library and invalidates each celix_bundle_context_t simply by taking it out of the Option when the bundle gets stopped. The other type is public with a private implementation and uses the stored celix_bundle_context_t as long as it is available. With this solution the whole API of the PublicCelixBundleContext would always return a Result<T,E>. Also for each access there is additional locking necessary because of the Mutex.
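A rough sketch of this two-type design. All names are hypothetical, and a plain `RawContext` id stands in for `*mut celix_bundle_context_t` so the example stays free of raw pointers.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for `*mut celix_bundle_context_t`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct RawContext(pub u64);

#[derive(Debug, PartialEq)]
pub enum ContextError {
    BundleStopped,
}

type Shared = Arc<Mutex<Option<RawContext>>>;

// Library-private side: kept by the activator, invalidates on stop.
pub struct ContextOwner {
    slot: Shared,
}

// Public side: handed to user code, checks the slot on every call.
#[derive(Clone)]
pub struct PublicContext {
    slot: Shared,
}

impl ContextOwner {
    pub fn new(raw: RawContext) -> (ContextOwner, PublicContext) {
        let slot = Arc::new(Mutex::new(Some(raw)));
        (ContextOwner { slot: slot.clone() }, PublicContext { slot })
    }

    // Called when the bundle stops: take the context out of the Option.
    pub fn invalidate(&self) {
        self.slot.lock().unwrap().take();
    }
}

impl PublicContext {
    // Every operation locks first and fails cleanly after a stop.
    pub fn bundle_id(&self) -> Result<u64, ContextError> {
        let guard = self.slot.lock().unwrap();
        match *guard {
            Some(raw) => Ok(raw.0),
            None => Err(ContextError::BundleStopped),
        }
    }
}
```

Because the lock is held for the duration of each call, `invalidate` also waits for in-flight calls to finish, at the price of serializing all access to the context.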

Then I thought about using lifetimes. The problem with lifetimes is that they need a scope in which the owner lives. But we have two (four) functions that are called one after another by the framework, and the compiler does not know that they are called in this order. Also, we convert the activator to a raw pointer, and raw pointers carry no lifetimes. So what can we do about it?

First lifetime approach: Static services
The framework provides the celix_bundle_context_t in each of the four functions. We could create a BundleContext from it in the start function and store it in the activator. The library user could then get a &BundleContext. During the start function the user could register services, track services and so on with it. The results can also be stored in the activator. However, the user cannot create a new thread with it that outlives the start function. This is because the &BundleContext has a non-static lifetime.

In the stop function the user simply takes their user data from the activator and undoes what they did in the start function.
Overall, this can only be used for static handling of services.

Second lifetime approach: Spawn a thread
My second approach is to just spawn a thread in the start function and join it in the stop function. This gives us a scope that lives from start to stop. We send the created BundleContext into the thread, but here too we only provide the user with a &BundleContext. Again, the user cannot spawn threads that outlive the scope. But as the scope runs from start to stop, it is possible to create threads that live just as long.
In the stop function we signal the thread that it must stop and join it. The user is responsible for making sure the thread stops.
With this approach we do not need to store user data in the activator. We simply put a JoinHandle there to join the thread and a Sender to signal the thread that it must stop.
Overall, this method is suited for static services and for dynamic services as well.
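A minimal sketch of this second approach with standard-library threads and channels. The `Activator` and `BundleContext` types here are simplified, hypothetical stand-ins and not the code from the linked repository.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

// Simplified stand-in for the Rust-side bundle context.
pub struct BundleContext {
    pub name: String,
}

// What the activator keeps between start and stop: no user data,
// only the means to signal and join the bundle thread.
pub struct Activator {
    stop_tx: Sender<()>,
    worker: JoinHandle<u32>,
}

impl Activator {
    // start: move the context into a thread; user code only ever sees
    // a `&BundleContext` that cannot escape the thread's scope.
    pub fn start<F>(ctx: BundleContext, user_main: F) -> Activator
    where
        F: FnOnce(&BundleContext, Receiver<()>) -> u32 + Send + 'static,
    {
        let (stop_tx, stop_rx) = mpsc::channel();
        let worker = thread::spawn(move || user_main(&ctx, stop_rx));
        Activator { stop_tx, worker }
    }

    // stop: signal the thread, then join it. The context is dropped
    // when the thread ends, before `join` returns.
    pub fn stop(self) -> u32 {
        let _ = self.stop_tx.send(());
        self.worker.join().expect("bundle thread panicked")
    }
}

pub fn demo() -> u32 {
    let ctx = BundleContext { name: "demo".into() };
    let activator = Activator::start(ctx, |ctx, stop_rx| {
        let name_len = ctx.name.len() as u32;
        // Block until stop is signalled; the user must honour this signal.
        let _ = stop_rx.recv();
        name_len
    });
    activator.stop()
}
```

If the user code ignores the stop signal, `stop` blocks forever, which is the responsibility mentioned above.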

In the following repository I show what I think this could look like:
https://github.com/kulst/celix_rust_test

I hope I was able to make everything clear to you.
I would be glad to get some feedback, to know if I am on the right way.

@pnoltes
Contributor Author

pnoltes commented Apr 28, 2024

Thanks for the update.

> POC approach using Weak and Arc
> I think the POC tries to achieve this by using std::sync::Weak pointers. However, I am not sure this is entirely sound. For example, when building the ServiceTracker we upgrade the Weak to an Arc. What happens if this is done in another thread and, immediately after upgrading the Weak and before actually tracking the service, the bundle gets stopped and destroyed? I think we would have a use-after-free situation here.

You are correct, the usage of Weak mimics what is done in C++. Note that in C++ the bundle deactivation waits until all Weak (std::weak_ptr) pointers that were upgraded to Arc (std::shared_ptr) are out of scope, and prints an error when this takes too long.

But I agree, ideally this is improved using Rust lifetimes.
And it is great, and impressive, to see that you have already made some progress with Rust lifetimes and Apache Celix.

The coming week I will not have time to look at the code and give feedback, but when I have time I will pick this up.

@kulst

kulst commented May 14, 2024

Two weeks have passed and I wanted to share again my progress :)

BundleContext lifetime
Another way I came up with is to hand out a BundleContext to the bundle and to keep an Arc<(Condvar, Mutex)> flag in the BundleContext (not accessible through the BundleContext's API). When the BundleContext is dropped, the flag is set and the Condvar is notified.
A copy of that Arc is stored in the internal part of the activator. When the stop/delete function of the bundle is invoked, the flag is checked periodically to see if it is set. By using the Condvar we can block between these checks (and also log an error, similar to the C++ API).
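A sketch of that flag, assuming hypothetical `BundleContext` and `ActivatorInternals` types. `Condvar::wait_timeout_while` from the standard library blocks until the flag is set or the timeout expires, so the caller can log an error and retry on timeout.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

// Shared "context was dropped" flag, in the order named in the text.
type DropFlag = Arc<(Condvar, Mutex<bool>)>;

// Handed out to the bundle; sets the flag when it is dropped.
pub struct BundleContext {
    dropped: DropFlag,
}

impl Drop for BundleContext {
    fn drop(&mut self) {
        let (cond, flag) = &*self.dropped;
        *flag.lock().unwrap() = true;
        cond.notify_all();
    }
}

// Kept in the internal part of the activator.
pub struct ActivatorInternals {
    dropped: DropFlag,
}

pub fn new_context() -> (BundleContext, ActivatorInternals) {
    let flag: DropFlag = Arc::new((Condvar::new(), Mutex::new(false)));
    (
        BundleContext { dropped: flag.clone() },
        ActivatorInternals { dropped: flag },
    )
}

impl ActivatorInternals {
    // Blocks until the context is dropped or `timeout` expires.
    // Returns true if the context is gone, false on timeout.
    pub fn wait_for_context_drop(&self, timeout: Duration) -> bool {
        let (cond, flag) = &*self.dropped;
        let guard = flag.lock().unwrap();
        let (guard, _timeout_result) = cond
            .wait_timeout_while(guard, timeout, |dropped| !*dropped)
            .unwrap();
        *guard
    }
}
```

In the stop function, the activator could call `wait_for_context_drop` in a loop and log an error each time it returns false.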

However I postponed this topic a little bit as I think there are a few feasible solutions. We can later review these solutions and settle on one of them.

Services
The next topic I wanted to approach is services. When using a service, a bundle calls a function of another bundle. Both bundles need to agree on the API of the service and also on its ABI.
In my eyes the Celix Rust API should be safe to use. This means it should not be possible to cause undefined behaviour without using unsafe code in Rust. But we cannot enforce at compile time that the API and ABI of the service we want to use are the same as we think they are.

So I think we have two options here:

  1. Using a service is inherently unsafe and invariants must be upheld by the user. For C and C++ services this will probably be the only solution, so using these services from Rust will always be unsafe. However, for Rust services I think there is another way!
  2. Checking ABI and API compatibility at runtime. If we can achieve this, using a service is safe: an incompatible ABI/API would not cause undefined behaviour, and returning an Error would be enough.

How can we achieve ABI and API checks at runtime?
There is one Rust crate that already tries to deal with this issue: abi_stable. Only a very small part of the Rust ABI is stable. Most of the ABI is allowed to change between compiler versions, and even between different compiler runs.
abi_stable provides ABI stability by transforming Rust types with an unstable ABI into types with a stable ABI (#[repr(C)] types). It utilizes the Rust procedural macro system for this, which can transform abstract syntax trees into other abstract syntax trees.
To get API stability the crate provides type information for these types, which can be checked at runtime.
To make sure that the API and ABI of the abi_stable crate itself are compatible, a static variable is provided which can also be checked at runtime.

I think we can utilize this crate to provide safe services.

What could this look like?
A service would be an object that implements a trait (the service interface). This is similar to C++ services. The trait is defined in a Service API crate.
The bundle that provides the service statically links against this API crate. When registering the service, a pointer to the bundle-static type information and abi_stable library information is stored in the framework (for that it is necessary to implement one or two functions that are only usable internally by the Celix Rust API).

When another bundle wants to use the service, it must statically link against the API crate as well. When the service is used, it is checked that the type information and abi_stable library information are compatible.
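
To make the register-then-check flow concrete, here is a minimal sketch. It does not use the abi_stable API: `InterfaceInfo`, `Registry`, `UseError` and the `layout_hash` field are hypothetical, simplified stand-ins for the generated type information, and compatibility is reduced to plain equality.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for generated type information.
/// This is NOT the abi_stable API.
#[derive(Debug, PartialEq, Eq)]
pub struct InterfaceInfo {
    pub name: &'static str,
    pub layout_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UseError {
    NotFound,
    Incompatible,
}

/// Hypothetical framework-side store of registered service descriptors.
#[derive(Default)]
pub struct Registry {
    services: HashMap<&'static str, &'static InterfaceInfo>,
}

impl Registry {
    /// The provider hands over a pointer to its bundle-static descriptor.
    pub fn register(&mut self, info: &'static InterfaceInfo) {
        self.services.insert(info.name, info);
    }

    /// The consumer passes the descriptor it was compiled against.
    /// A mismatch yields an error instead of undefined behaviour.
    pub fn check_use(&self, expected: &InterfaceInfo) -> Result<(), UseError> {
        match self.services.get(expected.name) {
            None => Err(UseError::NotFound),
            Some(found) if **found == *expected => Ok(()),
            Some(_) => Err(UseError::Incompatible),
        }
    }
}

// Descriptors as each bundle would see them from its copy of the API crate.
pub static PROVIDER_V1: InterfaceInfo = InterfaceInfo { name: "calc", layout_hash: 0x1111 };
pub static CONSUMER_V1: InterfaceInfo = InterfaceInfo { name: "calc", layout_hash: 0x1111 };
pub static CONSUMER_V2: InterfaceInfo = InterfaceInfo { name: "calc", layout_hash: 0x2222 };
```

A consumer built against a different version of the API crate (`CONSUMER_V2` here) gets `UseError::Incompatible` and never touches the service pointer.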

Additional opportunities
It would be great if we could use Rust services from C and C++. For this the Service API crate must export some C/C++ header files. Luckily there is a crate that already does this: cglue. On top of that, it is compatible with abi_stable.
This makes it possible to easily create services that are usable from C, C++ and Rust and provide additional guarantees.

Final hurdles
I would like to rewrite the POC in the next step to make sure all of this works. However there is one problem I could not solve yet:
What if a service of a bundle provides a 'static reference in its interface? The reference could point to 'static data of the bundle. After unloading the providing bundle this reference would be invalid.
There should be a way to constrain the lifetimes to the lifetime of the service use callback.
However I do not know how yet.
Maybe there is a way by checking the service API with an additional proc macro.
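
One part of this can already be expressed today: the service reference itself can be tied to the callback with a higher-ranked closure bound. The sketch below uses hypothetical names (`Greeter`, `ServiceSlot`, `use_service`) and does not solve the open problem above, since a trait method that returns `&'static` data would still compile.

```rust
// Hypothetical service interface, as it would be defined in a Service API crate.
pub trait Greeter {
    fn greet(&self, name: &str) -> String;
}

// Hypothetical holder of a possibly registered service.
pub struct ServiceSlot {
    service: Option<Box<dyn Greeter>>,
}

impl ServiceSlot {
    pub fn new(service: Option<Box<dyn Greeter>>) -> Self {
        ServiceSlot { service }
    }

    /// The closure must accept a reference of *any* lifetime `'a`,
    /// so it cannot store that reference anywhere that outlives the call.
    pub fn use_service<R, F>(&self, f: F) -> Option<R>
    where
        F: for<'a> FnOnce(&'a dyn Greeter) -> R,
    {
        let svc: &dyn Greeter = self.service.as_deref()?;
        Some(f(svc))
    }
}

pub struct English;

impl Greeter for English {
    fn greet(&self, name: &str) -> String {
        format!("Hello, {name}!")
    }
}

pub fn demo() -> Option<String> {
    let slot = ServiceSlot::new(Some(Box::new(English)));
    // Rejected by the borrow checker, because the reference would escape:
    //   let mut leaked: Option<&dyn Greeter> = None;
    //   slot.use_service(|svc| leaked = Some(svc));
    slot.use_service(|svc| svc.greet("Celix"))
}
```

Owned results such as the `String` here can leave the callback freely; only borrows of the service are confined to it.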

Summary
Again a lot of text. I hope you are fine with me documenting my thoughts and progress here. There are a lot of topics that need to be considered. At the moment Rust is not perfectly well suited for dynamic linking. Its mostly unstable ABI makes it necessary to heavily use the macro system to provide a safe interface. If you want to have all of the Rust safety guarantees as well, a lot of possibilities need to be considered.
But I still think it is possible and I will continue working on it.

@pnoltes
Contributor Author

pnoltes commented May 14, 2024

Two weeks have passed and I wanted to share again my progress :)

Very good, keep up the pace :)

I did have a look at the implementation, but was struggling to find some spare time.

@pnoltes
Contributor Author

pnoltes commented May 14, 2024

Second lifetime approach: Spawn a thread My second approach is to just spawn a thread in the start function and join it in the stop function. By this we get a scope that lives from start to stop. We send the created BundleContext into the thread but also here we only provide the user a &BundleContext. Also here the user cannot spawn threads that outlive the scope. But as the scope is from start to stop it is possible to create threads that also live as long. In the stop function we signal the thread that it must stop and join it. The user is responsible to make sure the thread stops. With this approach we do not need to store user data in the activator. We simply put a JoinHandle there to join it and the Sender to signal the thread it must stop. Overall, this method is suited for static services as well as dynamic services.

In the following repository I show how I think this could look like: https://github.com/kulst/celix_rust_test

In my opinion it would be great if the BundleContext and everything created from the bundle context could be controlled by Rust lifetimes. This could ensure that we have static checks for correctly handling service registrations and service usage within the lifecycle of a bundle.

For now I also think it is fine to do this by spawning threads, but ideally a thread is not needed. I have experience with complex applications using about 100 bundles, and this would also mean 100 threads (including 100 extra stacks) purely for lifecycle management.

If I understand it correctly, a thread is used to ensure an object lifetime from BundleActivator::start until BundleActivator::stop. Is it not possible to somehow create a Rust object that outlives the Celix C framework, maybe even using a custom Rust Celix launcher, and use that Rust object to store "bundle lifecycle control" storage objects?
But I am guessing here. I am still learning Rust.

@pnoltes
Contributor Author

pnoltes commented May 14, 2024

Two weeks have passed and I wanted to share again my progress :)

BundleContext lifetime Another way I came up with is to hand out a BundleContext to the bundle and to keep an Arc<(Condvar, Mutex<bool>)> flag inside the BundleContext (not accessible through the BundleContext's API). When the BundleContext is dropped, the flag is set and the Condvar is notified. A copy of that Arc is stored in the internal part of the activator. When the stop/delete function of the bundle is invoked, the flag is checked periodically to see whether it is set. By using the Condvar we can block between these checks (and also log an error, similar to the C++ API).

This is indeed close to what C++ is doing. In that case a weak_ptr is used, and during bundle stop a runtime check and wait are performed to ensure there are no users of the BundleContext anymore.

@kulst

kulst commented May 14, 2024

This is indeed close to what C++ is doing. In that case a weak_ptr is used, and during bundle stop a runtime check and wait are performed to ensure there are no users of the BundleContext anymore.

Exactly, the C++ solution is where I took inspiration from. But by doing it with an Arc<(Condvar, Mutex<bool>)> inside of the BundleContext we can utilize lifetimes and could also create the Rust API with more flexibility for the user.
For example, instead of explicitly storing an Arc<BundleContext> or a &BundleContext in each data structure that depends on the BundleContext, we could store a T: AsRef<BundleContext> in the data structures and let the user decide.
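
A minimal sketch of what I mean with the generic parameter. `BundleContext` and `ServiceTracker` are simplified, hypothetical stand-ins here, not the types from the POC.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the Celix bundle context.
pub struct BundleContext {
    pub bundle_id: i64,
}

// The standard library has no blanket `impl AsRef<T> for T`, so this impl is
// needed for `&BundleContext` to satisfy the bound (through the std impl of
// `AsRef<U> for &T where T: AsRef<U>`). `Arc<BundleContext>` already
// implements `AsRef<BundleContext>`.
impl AsRef<BundleContext> for BundleContext {
    fn as_ref(&self) -> &BundleContext {
        self
    }
}

// A data structure that depends on the context but leaves the choice of
// ownership (shared or borrowed) to the user.
pub struct ServiceTracker<C: AsRef<BundleContext>> {
    ctx: C,
    service_name: String,
}

impl<C: AsRef<BundleContext>> ServiceTracker<C> {
    pub fn new(ctx: C, service_name: &str) -> Self {
        ServiceTracker { ctx, service_name: service_name.to_string() }
    }

    pub fn describe(&self) -> String {
        let ctx: &BundleContext = self.ctx.as_ref();
        format!("bundle {} tracks {}", ctx.bundle_id, self.service_name)
    }
}

pub fn demo() -> (String, String) {
    // Shared ownership: the tracker keeps the context alive.
    let owned = Arc::new(BundleContext { bundle_id: 1 });
    let shared = ServiceTracker::new(Arc::clone(&owned), "log_service");

    // Borrowing: the compiler ensures the tracker does not outlive `local`.
    let local = BundleContext { bundle_id: 2 };
    let borrowed = ServiceTracker::new(&local, "shell_command");

    (shared.describe(), borrowed.describe())
}
```

With the borrowed variant the lifetime is checked at compile time; with the Arc variant the drop flag would still be needed at runtime.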

@pnoltes
Contributor Author

pnoltes commented May 14, 2024

Services The next topic I wanted to approach is services. When using a service, a bundle calls a function of another bundle. Both bundles need to agree on the API of the service and also on its ABI. In my eyes the Celix Rust API should be safe to use. This means it should not be possible to cause undefined behaviour without using unsafe code in Rust. But we cannot enforce at compile time that the API and ABI of the service we want to use are the same as we think they are.

So I think we have two options here:

1. Using a service is inherently unsafe and invariants must be upheld by the user. For C and C++ services this will probably be the only solution, so using these services from Rust will always be unsafe. However, for Rust services I think there is another way!

First off, my compliments on taking API and ABI compatibility into account. For C and C++ we currently only support source compatibility, so everything should be built with the same compiler using the same sources (including service interface headers).

But the idea to do some API compatibility checking is not new. Apache Celix also has a libffi library (recently updated for stability and documentation).

An idea back when libffi was first created was to use libclang to parse C service interface headers and create descriptors (small and easily parsable files describing C structures and interfaces). Note: this means using libclang as a library, not necessarily also using the clang compiler. These generated dfi descriptors would then be added to bundle zips so that they can be referred to during service registration and service usage. The Celix framework could then use these descriptors and libffi to check compatibility.

I still think this would be a very welcome addition to Apache Celix, but it would still require significant effort to develop, test and make user friendly.

2. Checking ABI and API compatibility at runtime. If we can achieve this, using a service is safe: an incompatible ABI/API would not cause undefined behaviour, and returning an Error would be enough.

How can we achieve ABI and API checks at runtime? There is one Rust crate that already tries to deal with this issue: abi_stable. Only a very small part of the Rust ABI is stable. Most of the ABI is allowed to change between compiler versions, and even between different compiler runs. abi_stable provides ABI stability by transforming Rust types with an unstable ABI into types with a stable ABI (#[repr(C)] types). It utilizes the Rust procedural macro system for this, which can transform abstract syntax trees into other abstract syntax trees. To get API stability the crate provides type information for these types, which can be checked at runtime. To make sure that the API and ABI of the abi_stable crate itself are compatible, a static variable is provided which can also be checked at runtime.

I think we can utilize this crate to provide safe services.

I had a short look at abi_stable and mainly looked for the license. Maybe I am missing something, but I could not find under which license the code is provided.

But this does indeed sound like a good addition for runtime compatibility checking.

What could this look like? A service would be an object that implements a trait (the service interface). This is similar to C++ services. The trait is defined in a Service API crate. The bundle that provides the service statically links against this API crate. When registering the service, a pointer to the bundle-static type information and abi_stable library information is stored in the framework (for that it is necessary to implement one or two functions that are only usable internally by the Celix Rust API).

When another bundle wants to use the service, it must statically link against the API crate as well. When the service is used, it is checked that the type information and abi_stable library information are compatible.

Additional opportunities It would be great if we could use Rust services from C and C++. For this the Service API crate must export some C/C++ header files. Luckily there is a crate that already does this: cglue. On top of that, it is compatible with abi_stable. This makes it possible to easily create services that are usable from C, C++ and Rust and provide additional guarantees.

Nice, I will have a deeper look into this. Currently, when combining C and C++, it is possible to use C services in C++ but not the other way around. If Rust services can be more flexible, that would be great.

@pnoltes
Contributor Author

pnoltes commented May 14, 2024

Final hurdles I would like to rewrite the POC in the next step to make sure all of this works. However there is one problem I could not solve yet: What if a service of a bundle provides a 'static reference in its interface? The reference could point to 'static data of the bundle. After unloading the providing bundle this reference would be invalid. There should be a way to constrain the lifetimes to the lifetime of the service use callback. However I do not know how yet. Maybe there is a way by checking the service API with an additional proc macro.

Indeed. The inherently dynamic nature of services in Apache Celix is that they can come and go, and users should be protected against misuse as much as possible. For C / C++ this is arranged by providing a "use service" API based on callbacks, so that the framework can ensure services are kept "alive" during callbacks. But it can also be done by service trackers combined with set/add/remove callbacks, and in that case the user is responsible for correctly using the callbacks and protecting service usage.

I agree that ideally in Rust this should be done in a safer way, but I lack the Rust experience to know how, or even if, this is feasible.

Summary Again a lot of text. I hope you are fine with me documenting my thoughts and progress here. There are a lot of topics that need to be considered. At the moment Rust is not perfectly well suited for dynamic linking. Its mostly unstable ABI makes it necessary to heavily use the macro system to provide a safe interface. If you want to have all of the Rust safety guarantees as well, a lot of possibilities need to be considered. But I still think it is possible and I will continue working on it.

No worries, I find it very interesting to read your thoughts, ideas and POCs. IMO communication through PR comments is OK :)

4 participants