Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expansion of serialization feature #97

Closed
brianbruggeman opened this issue Feb 1, 2021 · 20 comments
Closed

Expansion of serialization feature #97

brianbruggeman opened this issue Feb 1, 2021 · 20 comments

Comments

@brianbruggeman
Copy link

I have structured data and I'd like a convenient mechansim for keeping that structure while pushing and pulling to and from aerospike. It looks like there may have been some initial work done here, but I'd like to push that bar further.

I'm proposing a trait, AerospikeRecord, that can be applied to an arbitrary Rust structure provided it meets certain criteria. The API would look something like this:

let client = build_aerospike_client(...);  // not included in API...
let record = SomeRecord {...};  // not included in API...

// Check if the record is in aerospike
if exists_record(&record, &client)? != true {

  // Create the record in aerospike
  set_record(&value, &client)?;

  // Retrieve the record from aerospike
  let aerospike_copy: SomeRecord = get_record(&record, &client)?;

  // Update the record in aerospike
  let updated_record = SomeRecord { ... };
  set_record(&updated_record, &client)?;
  
   // Delete the record from aerospike
   remove_record(&updated_record, &client)?;
 
}

The trait would have a variety of different methods to support the API above

pub trait AerospikeRecord: ... {
    /// Builds a stringified version of the record's key
    fn aerospike_record_key(&self) -> String;
   
    /// Builds aerospike bin name for this record
    ///   default: "payload"
    fn aerospike_bin_name(&self) -> String;

    /// Returns the timeout in seconds
    ///   A timeout of 0 indicates that the value will never be culled
    ///   Default:  whatever AEROSPIKE_TIMEOUT has (or 0)
    fn aerospike_timeout(&self) -> u32;

    /// Returns the namespace
    ///   Default:  whatever AEROSPIKE_NAMESPACE has (or "undefined")
    fn aerospike_namespace() -> String;

    /// Returns the set name
    ///   Default:  whatever AEROSPIKE_SET_NAME has (or "undefined")
    fn aerospike_set_name() -> String;

    /// Builds record data for aerospike entry
    ///   Default:  uses data from above
    fn aerospike_as_bins(&self) -> Vec<Bin>;

    /// Builds an aerospike key from record
    ///   Default:  uses data from above
    fn aerospike_key(&self) -> Key;

    /// Builds an aerospike key using parameters
    fn aerospike_build_key(&self, namespace: &str, set_name: &str) -> Key;

    /// Build dyn AerospikeRecord from aerospike::Record
    fn aerospike_from_record(&self, aerospike_record: Record) -> Self;
}

For a default behavior, the only thing that would need to be implemented for a given structure would be the aerospike_record_key. I think this makes sense because the key could be any kind of value within a given record. It might even need to be a composite key made of several values or a calculated set of values. But this allows the maintainer to decide how to build that key explicitly.

struct Foo {
    bar: String
    }
    
impl AerospikeRecord for Foo {
    fn aerospike_record_key(&self) -> String {
        self.bar.clone()
    }
    

Any of the other traits can easily be overwritten:

impl AerospikeRecord for Foo {
    ...
    
    fn aerospike_namespace() -> String {
        let default = String::from("real-time");
        let fall_through = std::env::var("AEROSPIKE_NAMESPACE").unwrap_or(default);
        std::env::var("FOO_AEROSPIKE_NAMESPACE").unwrap_or(fall_through)
    }
    
}

One caveat is that I'd need to make changes to the Bin lifetime. Right now, the way that it's implemented, it's impossible to do this:

    impl Into<Vec<Bin>> for TestRecord {
        fn into(self) -> Vec<Bin> {
            let json_string = serde_json::to_string(&self).unwrap_or_else(|_| String::default());
            vec![as_bin!("key", self.aerospike_record_key()), as_bin!("payload", json_string)]
        }
    }

Bin<'a> requires that the control of the lifetime be outside of its use. As a result, you'd need to carry that lifetime all the way up to the top of the call stack or set it to Bin<'static> and remain for the lifetime of the program. It's hard to know why the lifetimes are there in the first place as they only seem to make the API more complicated than is necessary. As a more general statement, I suspect some of the other lifetime requirements could also be removed but I don't have a use case for them right now. And, I wonder if this may be why the API doesn't get as much usage.

@jhecking
Copy link
Contributor

jhecking commented Feb 2, 2021

Is there any reason why this new higher-level API would have to be part of the aerospike crate itself, and cannot be shipped as a separate crate?

@brianbruggeman
Copy link
Author

brianbruggeman commented Feb 2, 2021

There's no reason why it couldn't be separate, except I needed to make modifications to Bin which cannot be made outside of the current crate. I had initially intended to make it separate, but couldn't due to Bin's lifetime requirements (as noted above).

@jonas32
Copy link
Contributor

jonas32 commented Feb 2, 2021

The only problem i see is that we tried to keep the rust API as close as possible to the other clients (Go/Java/C).
That would be an addition that is not included in any other client.
I like the Idea of being able to use it like that.
The point you made about the over complicated Lifetimes also annoyed me while doing work on the client.
Maybe we should instead look at a way to fix this (without breaking the API).

And, I wonder if this may be why the API doesn't get as much usage.

I guess the main reason for that is because the client was not really actively maintained for a long time. That might also be the reason for the over complicated usage on some parts. It was build with a pretty old rust version and i guess nobody had the time to keep it up to the changes of the rust language.

@brianbruggeman
Copy link
Author

I appreciate the idea of keeping all of the clients in-line. But that said, this is a Rust crate and it should also work Rust's idioms. One of the major things that people use Rust for is (de)serialization performance especially when moving data around. This change makes it much easier for the library user not to have to rebuild that code. I don't think that every language would have something like this because it makes much more sense for Rust itself.

However, I actually kept the changes for the Bin separate (and locally it's in a separate branch), so it's very possible we could just use that instead of the above. I'd be fine with that. I didn't think that it would make sense to present the changes to Bin in isolation without showing why.

I do think that the changes to lifetimes are very much a Rust implementation problem and not something that would be found across all of the implementations of the library. Also, and I lost the reference to this, but the standard library within Rust had a similar problem with lifetimes in at least a couple of APIs and they were dropped because they just don't work when you start to really use them as a library.

@khaf
Copy link
Collaborator

khaf commented Feb 2, 2021

From my perspective, if you can remove the lifetimes without producing a lot of memory allocation/copying (explicit or implicit) in the process, go for it. The logic behind the current design is that years ago when I was prototyping it, I wanted to avoid the String allocations (wanted the &str), and as it turns out, Rust does not make it easy at all. If it breaks the API, so be it. We'll release it as v2. We want to move towards the async API in the Rust client, and IMO this work should happen before that.
I'm against implementing a higher level API in this crate. It is important to keep the client model and idioms compatible with the other clients, due to the complexity of the code and the maintenance cost, especially since the server constantly changes and we need to adapt the clients.
That said, I believe it is reasonable to implement a trait for Serialization of complex arbitrary structs to/from wire protocol to avoid the transport allocations. If you have a good idiomatic Rust design for that, it has a high chance of getting into the library.

@jonas32
Copy link
Contributor

jonas32 commented Feb 2, 2021

We want to move towards the async API in the Rust client, and IMO this work should happen before that.

That would be great. But i dont really see that in this client. The client is sync from its base. So switching to async would again be a big change leaving many traces of the old parts. The client has bigger problems right now as far as i see ( #8 #9 #10 ).
Before introducing new features, it might make more sense to first of all get this up to date with the other clients.
I already thought a lot about doing that changes. Mainly because i need some of them. As you said, you designed it years ago. Rust changed a lot since then. Probably the cleanest and easiest way to reach that named goals would be re-implementing from low level on.
When i started doing big changes, i already played around a little with the low level stuff like network and peers implementation.
The only reason i left that idea was because reversing all the required information from the Go/Java/C Client was a big time factor. With that information available, i guess we could try to rebuild it to the latest Aerospike and Rust standards after it has been abandoned for that long.

@khaf
Copy link
Collaborator

khaf commented Feb 2, 2021

Nothing I said negates what you say. I also didn't say we should convert the client to async, just add async API in addition to the sync API. I don't see how the issues you noted above are design problems. They are just missing features and require work, not a redesign.

@jonas32
Copy link
Contributor

jonas32 commented Feb 2, 2021

Ok then i misunderstood that. I thought you want to switch to async completely.
In that case this makes sense. But a complete switch to async on this codebase probably wont be a good idea.
But even in this case, i think this 3 mentioned issues should be a priority over the async API. I didnt want to point them out as design issues. There is no relation to any client design here. The design of keeping the client low memory is totally fine.
The Idea of a re-implementation was just to get rid of the many outdated or non standard codebase parts that are still left but hard to replace.

@brianbruggeman
Copy link
Author

This is all sort of devolving from my original thought, but ...

Do we have a roadmap of features/reworks for this client? Even having a checkbox of a list of things that need to be done in some sort of order would be helpful for organization. I'm not all that familiar with aerospike in general and the differences between implementations.

Async definitely felt like a hole when I was evaluating (this) Aerospike client. I ended up heavily using Rayon for my needs (which is primarily loading large amounts of batch data).

@khaf
Copy link
Collaborator

khaf commented Feb 2, 2021

There are missing features we can list (corporate wikis for the win) but I don't think that's what this ticket is about. If there are idiomatic Rust features missing in the client, or the client needs minor redesign/clean up work to make it more suitable for users in Rust ecosystem, we'd like to have them as tickets and work them out before adding more features to avoid reworks. While the actual idea expressed in this ticket is beyond the scope of this library, I think the general idea of supporting direct de/serialization in the client is very appealing.

@brianbruggeman
Copy link
Author

brianbruggeman commented Feb 2, 2021

I think the general idea of supporting direct de/serialization in the client is very appealing.

I'm just unclear what this means. I've got a trait above, but you've mentioned that's beyond the scope - so what do you mean by supporting serde more directly?

Also where is the correct place to have more discussion based things?

@khaf
Copy link
Collaborator

khaf commented Feb 2, 2021

Your trait actually performs operations. My idea of serialization would be a trait more or less like this (off the top of my head, don't take it literally. Been a while since I wrote actual Rust code):

pub trait AerospikeSerializer: ... {
    // this would directly write the FieldHeader and data bytes to the command buffer
    fn aerospike_serialize(&self, bins: Vec<String>, buf: &mut Option<&mut Buffer>) -> Result<usize>;
}

pub struct RecordHeader {
    /// Record modification count.
    pub generation: u32,

    /// Date record will expire, in seconds from Jan 01 2010, 00:00:00 UTC.
    pub expiration: u32,
}

pub trait AerospikeDeserializer: ... {
    // this would directly assign values to the struct fields, skipping Record, HashMap and the rest
    fn aerospike_deserialize(&self, bin_name: String, value: Value);

    // this would allow a callback for metadata values; Conversely, RecordHeader could become 
    // a struct and just passed once.
    fn aerospike_deserialize_header(&self, header: &RecordHeader);
}

Now, we can have API like this:

    pub fn get_obj(&self, policy: &ReadPolicy, key: &Key, obj: mut AerospikeDeserializer) -> Result<()>;
    pub fn put_obj(&self, policy: &WritePolicy, key: &Key, bins: Option<Vec<String>>, obj: AerospikeSerializer) -> Result<()>;

No need for Into<Bins> and its allocations anymore. And no need for Record and its internal Hashmap and Values. You can reuse the same allocated struct as many times as you want and save on those allocations too. In some circumstances, this will have massive impact on Scans/Queries and other streaming API as well.

@brianbruggeman
Copy link
Author

brianbruggeman commented Feb 2, 2021

I get that you don't want to overallocate, but as a user of the library, this just forces me (the plural me...users of the aerospike client) to do allocate. And that means that the plural me now has many different implementations of this kind of get/set and likely many of these are non-optimal. It doesn't make sense to me to micro-optimize the client at the expense of the user base.

Rust has Generics and they are fantastic in Traits. Your version still puts all of the onus on actually building Aerospike structures in the user level code (which as noted above are not great for users due to the lifetimes). You can argue that as a user, I should understand the allocations and should want to control them... but ultimately, I want an API that does this:

my_custom_structures.iter().for_each(|my_custom_data| set(&my_custom_data, ...).expect("Struct was not pushed"));

Your proposal is still so low level that I'd need to build the above set.

Likewise, I want something that pulls data out.

let records: Vec<MyStruct> = get(...).expect(...);

Note that MyStruct here is really a custom structure with an undefined Generics Trait that makes it work with get and set.

@khaf
Copy link
Collaborator

khaf commented Feb 3, 2021

I'd like to point out that there is no single me. There are very many different use cases of the Aerospike database. Some use it as the highest performance distributed Key-Value store, some use it as a document store, etc. The optimal API needed for each use case would be different. The only way we could meet the demand is for us to provide a performant client, so it would be possible to build other APIs on top of it. In that case, if a user would want to trade some CPU cycles or memory space for convenience, they could easily do that.
The Rust client was conceived and implemented in a way that would allow very high performance applications with Rust's type system and guarantees. It would be folly for us to waste all that potential for limited and undefined convenience, especially since you could easily build your optimal API on top of it in a different crate.
Note that I'm not hand waving your request away. I'm just simply stating how we perceive our core competence in the market, which in turn drives our engineering decisions. If we can combine your requests and our own goals, we'll definitely do that.
At the moment, I think your main request in this ticket is to remove the Bin lifetime. Is that correct?
@jhecking @jonas32 Do you see any downsides to this change in the library? Are there other examples of the same troublesome lifetime in the library that would benefit from the same change?

@jonas32
Copy link
Contributor

jonas32 commented Feb 3, 2021

I never really had problems related to the Bin lifetime before. Thats mainly because i never really had to work with that directly.
Im not sure how many parts of the client would be affected by changing that.
There should probably not be any problems with that in general. We would have to do it to see how it affects memory/performance. It might be a memory factor, but i guess it would be good for the client to trade that memory for usability at some points.

@brianbruggeman
Copy link
Author

brianbruggeman commented Feb 3, 2021

I understand the motivation to be performant, and to reduce memory allocations. I worked on embedded systems for over a decade and a half, and a full decade of that was flight controls for military aircraft. So I really do understand constraints especially related to memory and performance.

I equally think that engineering decisions should also be data driven. Before I posted my original ask, I did my homework on performance. There's no notable change in actual performance on my local machine by removing the bin lifetime requirement (which uses a String instead of a &'a str). Memory allocation may be a different story; the numbers will be different on platform and based on the memory allocator used. It's harder to test memory profiling on a Mac, and Rust doesn't make it as easy as adding another benchmark. But if your concern was wall clock time, then I don't think there's a problem.

@jhecking
Copy link
Contributor

jhecking commented Feb 4, 2021

Are there other examples of the same troublesome lifetime in the library that would benefit from the same change?

That's one question I had as well. @brianbruggeman you mentioned this earlier as well:

As a more general statement, I suspect some of the other lifetime requirements could also be removed but I don't have a use case for them right now.

If we do remove the lifetimes on Bin we should probably review the rest of the API for onerous lifetimes as well. Since this would be a breaking API change, would be best to do it all in one go.

I'm generally not opposed to making these changes, esp. if benchmarking shows no significant performance impact. @brianbruggeman how did you test the performance impact of your changes? Using the bundled client benchmark?

@brianbruggeman
Copy link
Author

brianbruggeman commented Feb 8, 2021

@jhecking I did.

With the Makefile change I made, it's also included as a make option.

I didn't look very closely at the benchmarking, but it's pretty slim. I suspect the benchmarking may also need to be expanded for a better comparison.

@brianbruggeman
Copy link
Author

I realize that there's a small team, but it feels like this project is pretty defunct in terms of being able to move it forward from a community perspective. If there's a lack of adoption and requests on the Rust side, I suspect this is at least partially why. At this point, I need to get things done on my side, and I can't really wait for navigating through some sort of arbitrary political minefield. So, I'm just going to state - hey I tried to play ball, but I'm forking and moving on.

@khaf
Copy link
Collaborator

khaf commented Feb 23, 2021

I think we were very clear as to what we would consider changing, and were waiting for you to come back with the life times you'd like to see removed from the library. The trait is something that you'd have to implement by yourself after we accommodated your requests. Since this was a breaking change, we wanted to have a healthy debate before the change.
What part of this process do you find unreasonable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants