Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update entity generation to preserve customisations #1931

Open
dave42w opened this issue Oct 24, 2023 · 12 comments
Open

Update entity generation to preserve customisations #1931

dave42w opened this issue Oct 24, 2023 · 12 comments

Comments

@dave42w
Copy link

dave42w commented Oct 24, 2023

Following a helpful discussion on Discord I would like to code an upgrade to sea-orm-cli generate entity, if the project is willing to merge a pull request.

My idea is that when an entity is regenerated the custom sections of code that we have written should not be lost.

My experience, so far, is only with modifying impl ActiveModelBehavior for ActiveModel {} but I imagine there will be other areas, such as Relations, where we will want modifications to be kept when the entity is regenerated.

I am not proposing adding great intelligence, so it will not remove the need for manual changes if the database schema changes. However, it would save manual work at present without enforcing fixed conventions around how the code base is organised.

I would modify Sea-orm-cli generate to first read the pre-existing entities looking for specific comments used to mark sections to be preserved. These would be kept and after the new entity has been written these preserved sections would be inserted to replace the marked sections.

As a fail safe preserved code would be appended if the correct place can't be found (and written to a new file if the original entity file has disappeared (renamed table for example).

So for example standard entity currently ends with:

impl ActiveModelBehavior for ActiveModel {}

To preserve changes a customised version of this would be marked by specific comments like this

// ** modified from generated code start ** impl ActiveModelBehavior for ActiveModel {}

use chrono::Utc;
use sea_orm::{prelude::async_trait::async_trait, ActiveValue};

#[async_trait]
impl ActiveModelBehavior for ActiveModel {
    async fn before_save<C: ConnectionTrait>(
        mut self,
        _db: &C,
        insert: bool,
    ) -> Result<Self, DbErr> {
        if insert {
            self.id = ActiveValue::NotSet;
            self.created_timestamp = ActiveValue::Set(Utc::now().to_string());
        };
        self.updated_timestamp = ActiveValue::Set(Utc::now().to_string());
        Ok(self)
    }
}

#[derive(DeriveIntoActiveModel, Serialize, Deserialize)]
pub struct BasicSong {
    pub id: i32,
    pub collection_id: i32,
    pub number: i32,
    pub title: String,
}

// ** modified from generated code end ** impl ActiveModelBehavior for ActiveModel {}

I am thinking that if the code after the start and end comment markers is the same then we are replacing a single line. If they differ then we start inserting our preserved code at the first and the look for the first occurrence of the end code.

eg to replace a single line with a section of code to be preserved

// ** modified from generated code start ** impl ActiveModelBehavior for ActiveModel {}
...
...
...
// ** modified from generated code end ** impl ActiveModelBehavior for ActiveModel {}

To replace multiple lines then this:

// ** modified from generated code start ** pub enum Relation {
my version of pub enum Relation {
...
...
my end }
// ** modified from generated code start ** }

Given my current rust experience I am thinking of a brute force approach.

  1. Scan all entity source files, collect all the sections to be preserved.
  2. Let generation happen as now
  3. Go back through the entities and add back the preserved text. If the right place can't be found (maybe the entity generation has been upgraded) then add the preserved text after a comment at the end of the entity file.
  4. If any entity sources with preserved text have disappeared (maybe the table has been renamed) then write the preserved text to the original source filename with a comment

If this isn't wanted in the sea-orm codebase then I can always write it as stand alone utilities (entity-store-preserved-code and entity-restore-preserved-code).

@tyt2y3
Copy link
Member

tyt2y3 commented Oct 25, 2023

Thank you for your suggestion, a collaboration is of course welcome.

I always have such an idea in my head, but I know it would take a lot of effort to properly implement.

I think the general idea might be:

  1. reconstruct the old, clean state. This could be reading the existing entity file, extract the table schema, and re-generate a clean entity file
  2. generate a new entity file, and compare it with the output of 1, to generate a diff
  3. apply the diff to the existing entity file

I think your idea is the other way, i.e. identify the diff that the users themselves made, and re-apply the diff after entity generation.

I think it all comes down to how can we get back to the 'clean state' with the least caveat.

@dave42w
Copy link
Author

dave42w commented Oct 25, 2023

Hi,
Thanks.

I can see good things about the diff approach. I think diesel does something like that.

With step 1 I think the safest is each time the entity is written, a backup entity is immediately created to be kept as the clean version. Otherwise there is the risk that the schema has been updated without the entity being regenerated so it is no longer possible to recreate the clean entity.

A challenge will be to ensure the clean entity backup is properly kept by the developer as it is unusual and they might not put it in their version control.

I don't think I have a very full picture of all the sorts of customisations people will need, so far mine are all very simple at the bottom of the entity source.

Dave

@tyt2y3
Copy link
Member

tyt2y3 commented Oct 25, 2023

Otherwise there is the risk that the schema has been updated without the entity being regenerated so it is no longer possible to recreate the clean entity.

I think it would not cause a problem. Because the clean state is for knowing "how's the schema been changed since last entity generation? Oh added two columns. Now we add those.".

Or alternatively, to know "what changes did the user made?"

I can name a few off my head:

  1. custom derives. e.g. in addition to serde::Serialize
  2. custom column attributes, e.g. #[sea_orm(column_type = "..", select_as = "..")]
  3. custom impl blocks (which are easy to extract e.g. impl Entity)
  4. doc comments (///) on struct, or attributes
  5. impl ActiveModelBehaviour

But afterall, we can start from whatever works for you! We can have the above discussions in mind and baked into the design so that we can eventually get there.

I imagine the flow can be:

  1. use syn to parse the entity file and extract the custom impl blocks
  2. store those custom impl blocks somewhere in the codegen context
  3. output those impl blocks during codegen

What I want to avoid, though, are custom marker comments. I'd hope it could be smart enough to classify a block as "custom"?

Actually, we may not need syn for token level parsing at all. May be "split by paragraph" and some regex would do.

Would be happy to see an initial implementation.

@dave42w
Copy link
Author

dave42w commented Oct 25, 2023

Hi,

I think a good next step would be to collect a reference set of entities with as many different customisations as possible. We can use these as part of a test suite.

Dave

@tyt2y3
Copy link
Member

tyt2y3 commented Oct 25, 2023

May be we can find some from real world projects in https://github.com/SeaQL/sea-orm/blob/master/COMMUNITY.md ?

@dave42w
Copy link
Author

dave42w commented Oct 26, 2023

May be we can find some from real world projects in https://github.com/SeaQL/sea-orm/blob/master/COMMUNITY.md ?

Unfortunately we will only see the customized entities, we don't want to have to build them all to generate pristine entities. Plus of course we only see the open source projects.

So instead, I suggest putting out a call to developers to submit examples of customizations showing the before and after. The motivation is that if they submit, then their style of customization gets supported.

What I want to avoid, though, are custom marker comments. I'd hope it could be smart enough to classify a block as "custom"?

H'mm, this does seem challenging. For example I currently customize from

impl ActiveModelBehavior for ActiveModel {}

to

use chrono::Utc;
use sea_orm::{prelude::async_trait::async_trait, ActiveValue};

#[async_trait]
impl ActiveModelBehavior for ActiveModel {
...
    pub title: String,
}

So I have 4 lines before the line that I am actually customizing and the generated section that precedes it varies from entity to entity.

It seems to me that to get started comment markers are a helpful starting point that gets us started and gets useful functionality at low cost. I have no problem having no comment markers as a long term goal, however, I suspect it fulls very much until the last 20% of features taking 80% of the effort. Probably not something that I'm interested in coding myself.

Currently I'm thinking of adding two cli

  1. ExtractCustomEntity (source directory only option) for each entity source file creates source.rs.pristine and source.rs.custom

  2. ApplyCustomEntity (source directory only option) for each entity source file creates a copy source.rs.precustom and then merges source.rs.custom into source.rs there remains the possibilities that if

a) the match code in source.rs.custom does not exist in source.rs then it will be due to either a seaorm change or a schema change, having source.rs.pristine available will make it easier to track difficult edge cases.

b) in the situation where a table has been renamed in the schema then source.rs.pristine will give the full context making it easier to find the new source file for the customization to be applied manually.

testing can include checking for a perfect round trip (ExtractCustomEntity on source.rs should create source.rs.pristine and source.rs.custom then ApplyCustomEntity should exactly recreate the original source.rs from source.rs.pristine and source.rs.custom with zero differrences. We can also run generate and compare the source.rs with source.rs.precustom expecting zero changes (except between schema changes or seaorm versions).

Later these can be called by generate before and after it's current work so the use becomes transparent.

@waynr
Copy link

waynr commented Oct 29, 2023

Just gonna drop a link to my proposed alternative to this from the discussion: #1932 (comment)

Basically, impl blocks for a given type don't need to be in the same module as the type itself, just in the same crate. I haven't spent enough time with generated entities yet but plan to start doing so soon -- i imagine the only potential blocker for my suggestion would be the lack of pub(crate) or pub visibility modifiers on generated entity fields and methods.

@dave42w
Copy link
Author

dave42w commented Oct 30, 2023

@waynr we had a chat about this on Discord. My understanding is that this is great for some parts but if replacing

impl ActiveModelBehavior for ActiveModel {}

Then this line still needs to be commented out in the generated entity.

So I'd prefer a solution that covers all bases.

@waynr
Copy link

waynr commented Oct 30, 2023

Yeah, it just seems like what's being proposed here is super complicated for what you're trying to achieve. Wouldn't it be simpler to add an option to the sea-orm-cli generate entity subcommand that avoids generating an empty impl block and go with the alternative I proposed? I guess as long as the maintainers are willing to accept that a CLI option like that my own needs would be satisfied.

@waynr
Copy link

waynr commented Oct 30, 2023

I went ahead an opened a PR with my suggestion since that provides a simple path forward for my own use of generated entities.

@dave42w
Copy link
Author

dave42w commented Oct 30, 2023

I think this is a much better solution. We just need the documentation updating. I'm happy to look at that (I'm on holiday this week with poor access).

@jryio
Copy link

jryio commented Dec 25, 2023

Related to this issue is #1069 which I would very much like to see implemented.

In that issue the proposed solution is to use Git to diff and restore the newly generated entity which mangles any customizations to existing entities.

I think either defining custom derivations in a migration or having SeaOrm do the intelligent diffing would be preferred.

Because the recommended workflow form the SeaOrm docs is migration first, it would make sense to encode the custom #[derive(...)] statements in those migrations, generally single source of truth is easiest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants