Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop/create SQLite columns when changing column type to something not convertible #22598

Open
Tracked by #22950 ...
bairog opened this issue Sep 18, 2020 · 19 comments
Open
Tracked by #22950 ...

Comments

@bairog
Copy link

bairog commented Sep 18, 2020

Hello.
I have simple class library project with a EF Core 5.0-rc1 DbContext that targets .NET 5.0-preview7 (I'm using .NET 5-preview7 because according to info on download page it is the latest one that supports Visual Studio 2019 v16.7 Release. I use only stable IDE builds in my project and cannot use preview IDE versions).
I've have following entites

public class User
    {
        public Guid Id { get; set; }
        public String Name { get; set; }
        public Int32 Age { get; set; }

        public virtual IList<Post> Posts { get; set; }
    }

    public class Post
    {
        public Guid Id { get; set; }
        public String Text { get; set; }
        public DateTime PostDate { get; set; }
        public Guid UserId { get; set; }
        public virtual User User { get; set; }
    }

Then I've added first migration via dotnet ef migrations add InitialCreate command, created test database and populated some data.
After that I've changed Post.Id data type from Guid to Int32 and created second migration.

According to this SQLite migrations table rebuilds are now available since EF Core 5.0.0-preview8.
Also according to docs AlterColumn command should work for SQLite via table rebuild.

But running context.Database.Migrate(); throws an exception

SQLite Error 20: 'datatype mismatch'.'

Steps to reproduce

  1. Download test project - https://yadi.sk/d/iZiJnlidwyNMlg (test database test.db with some data is already placed in program output folder).
  2. Compile and run Test console application - context.Database.Migrate(); throws an exception

Exception StackTrace:

   at Microsoft.Data.Sqlite.SqliteException.ThrowExceptionForRC(Int32 rc, sqlite3 db)
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQuery(RelationalCommandParameterObject parameterObject)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationCommand.ExecuteNonQuery(IRelationalConnection connection, IReadOnlyDictionary`2 parameterValues)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.MigrationCommandExecutor.ExecuteNonQuery(IEnumerable`1 migrationCommands, IRelationalConnection connection)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.Migrate(String targetMigration)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.Migrate(DatabaseFacade databaseFacade)
   at Test.Program.Main(String[] args) in D:\Test\Test\Program.cs:line 14

Further technical details

EF Core version: 5.0-rc1
Database provider: Microsoft.EntityFrameworkCore.Sqlite 5.0-rc1
Target framework: .NET 5.0-preview7
Operating system: Windows 10 2004 x64
IDE: Visual Studio 2019 16.7.3 Professional

@bairog bairog changed the title "SQLite Error 20: 'datatype mismatch'.'" error while peforming migration with SQLite table rebuild in .NET Core 5.0-preview7 project "SQLite Error 20: 'datatype mismatch'.'" error while peforming migration with SQLite table rebuild in .NET 5.0-preview7 project Sep 18, 2020
@ajcvickers
Copy link
Member

/cc @bricelam

See also #22593

@bricelam
Copy link
Contributor

bricelam commented Sep 18, 2020

From Result and Error Codes in the SQLite docs:

The rowid of a table must be an integer. Attempt to set the rowid to anything other than an integer results in an SQLITE_MISMATCH error.

You'll need to use a DropColumn with an AddColumn operation instead.

This might require something like #15586 to improve. Note, you can easily hit this issue on SQL Server and other providers when altering to an incompatible type.

@bairog
Copy link
Author

bairog commented Sep 19, 2020

@bricelam My migration 20200918104754_PostIdToInt32.cs was generated by dotnet ef migrations add PostIdToInt32 command:

public partial class PostIdToInt32 : Migration
    {
        protected override void Up(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.AlterColumn<long>(
                name: "Id",
                table: "Posts",
                type: "INTEGER",
                nullable: false,
                oldClrType: typeof(Guid),
                oldType: "TEXT")
                .Annotation("Sqlite:Autoincrement", true);
        }

        protected override void Down(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.AlterColumn<Guid>(
                name: "Id",
                table: "Posts",
                type: "TEXT",
                nullable: false,
                oldClrType: typeof(long),
                oldType: "INTEGER")
                .OldAnnotation("Sqlite:Autoincrement", true);
        }
    }

And it is not working out-of-the-box. I was thinking that SQLite table rebuilds is an automatic way to update table schema and data in that table. I was expecting that simply adding migration will do all the work.
That is not true? I should still need to write prorer migrations for schema (and also convert data by myself)?
But how does it differ from what it was before introducing SQLite table rebuilds?

@roji
Copy link
Member

roji commented Sep 19, 2020

@bairog table rebuilds works automatically with most changes, such as adding or removing a column. However, changing a column's type is quite different, since data conversion has to occur, and how exactly that should be done can depend on the user. For example, if you want to convert a text column to a binary one, which encoding should be used? Also, such conversions can simply fail, depending on the data you have. For example, the down migration above converts a TEXT column to an INTEGER, but what happens if there is non-numeric text data in the column?

For these reasons, data type conversions may require user intervention, and can't always be done automatically for you. As @bricelam mentioned above, this has nothing to do with table rebuilds or SQLite - other databases will error in the same way when you try to alter a column's type, and require you to specify a conversion method (see npgsql/efcore.pg#144 for a similar discussion on PostgreSQL).

@bairog
Copy link
Author

bairog commented Sep 20, 2020

@roji @bricelam Yes, I understand that automatic data type convertion is not always possible for "ordiary" columns. But Primary Key/Foreign Key columns are different: they stand only to connect tables and data in such columns has no physical meaning. PK Guid 'aaaaaaaa-0000-aaaa-0000-000000000000' itself means nothing - it only connects to FK(s) with same value. So regarding my case (changing PK type from Guid to Int32) conversion can be the following: first row Guid (for example 'aaaaaaaa-0000-aaaa-0000-000000000000') -> 0 (and all FK(s) shul be also 0); second row Guid (for example 'bbbbbbbb-0000-bbbb-0000-000000000123') - > 1 (nd all FK(s) shul be also 1), etc.. Actually Int32 can have any value and just keep connections between tables. But from my point of view it's better simply to start from 0 and perform auto-increment.
I think that auto-convertion is reasonable to implement. If it is needed I can create a feature suggestion somewhere.

P.S. From my point of view even some data convertions for "ordinary" columns can be done automatically: at least Boolean -> Int, Int -> Double, Int -> String and Double -> String, maybe even more. For me that auto-convertion is also reasonable to implement (but PK convertions has highest proirity).

Thank you in advance.

@roji
Copy link
Member

roji commented Sep 20, 2020

But Primary Key/Foreign Key columns are different: they stand only to connect tables and data in such columns has no physical meaning. PK Guid 'aaaaaaaa-0000-aaaa-0000-000000000000' itself means nothing - it only connects to FK(s) with same value.

That may be true for your specific case, but definitely not in many others. PK (and therefore FK) values aren't always meaningless (or generated), in many cases they're natural keys, and therefore have just as much meaning as any other field. Even a GUID key could be identifying the row in some other database or system, and in that sense it could be "meaningful". Transparently changing all GUIDs to arbitrary numbers could work in your specific case, but could be extremely destructive to another database. In that sense, it makes sense for users to edit their migrations and specify whatever conversion strategy fits their specific case.

P.S. From my point of view even some data convertions for "ordinary" columns can be done automatically: at least Boolean -> Int, Int -> Double, Int -> String and Double -> String, maybe even more. For me that auto-convertion is also reasonable to implement (but PK convertions has highest proirity).

For Boolean->Int, I'm not sure what value true should map to: 1? -1? I don't think it's right for an EF provider to make an implicit decision on an important data migration question.

Regardless, remember that EF is also responsible for generating Down migrations for each Up migration. While int->string indeed works, the opposite will fail if there are non-numeric values. This isn't an absolute argument - it may be OK to generate an Up migration that always works and preserves values for going back (e.g. int->string), while allowing for the Down migration to not be failsafe. But it does seem better to make the user aware of the potentially problematic migration by requiring their attention.

@bairog
Copy link
Author

bairog commented Sep 21, 2020

Transparently changing all GUIDs to arbitrary numbers could work in your specific case, but could be extremely destructive to another database. In that sense, it makes sense for users to edit their migrations and specify whatever conversion strategy fits their specific case.

Ok, but why not to create automatic migrations and show warning that they are potentially not failsafe (user will decide whether he should write his own migrations or not)? In sutiations like mine (when PK has no meaning) it will be just working out-of-the-box.

For Boolean->Int, I'm not sure what value true should map to: 1? -1? I don't think it's right for an EF provider to make an implicit decision on an important data migration question.

From my point of view it's reasonable for EF provider to work like System.Convert class is working: Convert.ToInt32(true) = 1; Convert.ToInt32(16.0) = 16, etc. Again, if it is not suitable for particaular user - he will write his own convertions.

This isn't an absolute argument - it may be OK to generate an Up migration that always works and preserves values for going back (e.g. int->string), while allowing for the Down migration to not be failsafe. But it does seem better to make the user aware of the potentially problematic migration by requiring their attention.

You are right, it's better to make the user aware of the potentially problematic migration. EF provider should create automatic migrations and show warning in case they are potentially not failsafe. User will make it's own decision - is it correct (he will use automatically created migrations) or not (and he will write his own migrations) in his particular case. That will serve rapid application development a lot.

P. S. I will investigate how other SQLite EF providers (Devart.Data.SQLite, System.Data.SQLite, etc.) are working in that situatioon and will report you. But I've already told you my point of view: it is better to have automatic migrations (with a warning in case they are potentially not failsafe) than not to have them at all.

@bairog
Copy link
Author

bairog commented Sep 21, 2020

P. P. S. In attached sample project I've tried to comment database migration

//if (context.Database.GetPendingMigrations().Any())
         //context.Database.Migrate();

and SQLite EF provider succesfully returned posts (var posts = context.Posts.Include(p => p.User).AsNoTracking().ToList();) despite of that facts that Post.Id is Int32 in model now. Guid 24CE8864-69BD-4246-B8C4-72EB15811603 (that is stored by SQLite as TEXT) converts to 24, 33585D22-FD60-48B7-86F9-9CAE0ADF1140 - to 33585, C9B75A44-BE8D-4FFE-9104-4A84C8EB5558 - to 0, etc. Not sure is that by design or is a bug (no warning is shown that EF model doesn't corresponds to database schema).

@roji
Copy link
Member

roji commented Sep 21, 2020

Ok, but why not to create automatic migrations and show warning that they are potentially not failsafe (user will decide whether he should write his own migrations or not)? In sutiations like mine (when PK has no meaning) it will be just working out-of-the-box.

Because it seems like a very bad idea to perform such a potentially destructive operation automatically, with just a warning that many will disregard. We can't just implicitly turn GUIDs into random numbers even if it works in your specific case.

I will investigate how other SQLite EF providers (Devart.Data.SQLite, System.Data.SQLite, etc.) are working in that situatioon and will report you

Unless I'm mistaken, these are ADO.NET providers rather than EF providers, so they're not really relevant to this question. Devart may have an EF Core provider, I'm not sure.

it is better to have automatic migrations (with a warning in case they are potentially not failsafe) than not to have them at all.

I disagree. It's true that in some cases EF Core does scaffold potentially destructive migrations and emits a warning - as you suggest; this happens, for example, for dropping a column. The difference is that in that case the user has removed a property from the model, and so it's reasonable to expect that they're aware and conscious of what they're doing; the model change itself already expresses the destructive action. This is not the case when changing a GUID to a number.

To turn the question around, what is the disadvantage of requiring the user to manually edit the migrations? Yes, it is a bit more manual work for the user - for the relatively rare cases of incompatible column type changes - but that's the only downside. Doing it automatically, on the other hand, is extremely dangerous in that it can cause unintentional data loss - isn't the choice clear here?

Guid 24CE8864-69BD-4246-B8C4-72EB15811603 (that is stored by SQLite as TEXT) converts to 24

Even if Sqlite implicitly converts TEXT to INT when reading, that's very different from (implicitly) changing the data on-disk, and irrevocably losing all your GUIDs.

@bairog
Copy link
Author

bairog commented Sep 21, 2020

Unless I'm mistaken, these are ADO.NET providers rather than EF providers, so they're not really relevant to this question. Devart may have an EF Core provider, I'm not sure.

My mistake - packages with EF providers are Devart.Data.SQLite.EFCore and System.Data.SQLite.EF. I will investigate soon.

To turn the question around, what is the disadvantage of requiring the user to manually edit the migrations? Yes, it is a bit more manual work for the user - for the relatively rare cases of incompatible column type changes - but that's the only downside.

For me that "a bit more manual work" is a big disadvantage. I've made a mistake in my SQLite database with 25+ tables - PKs are GUIDs instead if Int32. SQLite doesn't have native datatype for GUID and therefore performs badly in that case, especially in my 250Gb+ database. So writing all Up/Down migrations manually is a real pain, just typing donet ef migrations add GuidToInt32 command is of cource more convenient and fast way.

Doing it automatically, on the other hand, is extremely dangerous in that it can cause unintentional data loss - isn't the choice clear here?

The choice is not clear for me. A perform automatic database backups before every migration operations (potentially destructive or not). Moreover data in database may become corrupted not only after migration but for any reason (i. e. filesystem or network failure), isn't it what backups intended for?

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 21, 2020

System.Data.SQLite is an EF6 provider, not EF Core, and it does not support migrations.

@bairog
Copy link
Author

bairog commented Sep 21, 2020

System.Data.SQLite is an EF6 provider, not EF Core, and it does not support migrations.

Yes, I see it now. System.Data.SQLite for now only have EF6 provider and doesn't support EF Core of any version. Devart.Data.SQLite.EFCore doesn't support EF Core 5.0 for a moment. I will wait for some time..

@roji
Copy link
Member

roji commented Sep 21, 2020

For me that "a bit more manual work" is a big disadvantage.

Not as much as losing all your GUID values, I'd say. On one side is an inconvenience only when altering column types, on the other is unexpected data loss.

Moreover data in database may become corrupted not only after migration but for any reason (i. e. filesystem or network failure), isn't it what backups intended for?

That's weird logic... The fact that filesystem corruption can happen isn't a reason for EF Core to intentionally and implicitly perform destructive data operations that aren't clearly the user's intent. I'd also perform backups before migrations - and carefully inspect all my migration SQL scripts - but many users don't.

@ajcvickers ajcvickers changed the title "SQLite Error 20: 'datatype mismatch'.'" error while peforming migration with SQLite table rebuild in .NET 5.0-preview7 project Drop/create SQLite columns when changing column type to something not convertible Sep 21, 2020
@ajcvickers
Copy link
Member

Notes from triage:

  • We're not going to attempt anything too magical here; in particular no auto-conversion between non-convertible types or processing of FKs, etc.
  • With the support of Query: TypeCompatibility chart for inference #15586, we may be able to do drop/remove columns, with the normal warning for data loss.

Putting on the backlog for this.

@bairog
Copy link
Author

bairog commented Sep 22, 2020

@ajcvickers Just to make it clear:

  1. In my described situation (changing PK data type from Guid to Int32) Post table will be dropped and recreated with new schema? All table data will be lost and FKs will be DELETE CASCADE/SET NULL regarding what was set for PK in database?
  2. Looks like I'm completely confused about SQLite table rebuilds, they work differently than I expected. Could you provide a couple of examples (or documentation) how something was working before introducing SQLite table rebuilds and how it is working now? Thank you.

@roji
Copy link
Member

roji commented Sep 22, 2020

@bairog in a nutshell, table rebuilds work by creating a new table with the new schema, copying all the data across from the old table to the new, and then swapping the tables (while temporarily suspending foreign key enforcing). No data gets lost.

One thing you may be missing, is that altering column types is only a very specific change covered by rebuilds - but there are many others where type conversion problems aren't relevant (adding a column, removing a column...).

Hope that clarifies things. Try using version 5.0.0-rc1 of the provider, playing around and seeing what happens.

@ajcvickers
Copy link
Member

@bairog

  1. Correct. It's like any other data loss scenario with Migrations.
  2. https://docs.microsoft.com/en-us/ef/core/what-is-new/ef-core-5.0/whatsnew#migrations-rebuild-sqlite-tables

@bairog
Copy link
Author

bairog commented Sep 23, 2020

@ajcvickers @roji Let's make my task a little more complicated.
So I have the following entites:

public class User
    {
        public Guid Id { get; set; }
        public String Name { get; set; }
        public Int32 Age { get; set; }

        public virtual IList<Post> Posts { get; set; }
    }

    public class Post
    {
        public Guid Id { get; set; }
        public String Text { get; set; }
        public DateTime PostDate { get; set; }
        public Guid UserId { get; set; }
        public virtual User User { get; set; }
    }

And I need to change User.Id, Post.Id and Post.UserId from Guid to Int32. According to this the correct way is to
Create new table->Copy data->Drop old table->Rename new into old.

I've tried to write the following migration (a big code block inside that spoiler):
protected override void Up(MigrationBuilder migrationBuilder)
        {
            //Creating new table for Users with both Id (Int32) and Id_Guid (Guid)
            migrationBuilder.CreateTable(
                name: "Users_NEW",
                columns: table => new
                {
                    Id = table.Column<int>(type: "INTEGER", nullable: false)
                        .Annotation("Sqlite:Autoincrement", true),
                    Id_Guid = table.Column<Guid>(type: "TEXT", nullable: false),
                    Name = table.Column<string>(type: "TEXT", nullable: true),
                    Age = table.Column<int>(type: "INTEGER", nullable: false)
                },
                constraints: table =>
                {
                    table.PrimaryKey("PK_Users", x => x.Id);
                });

            //Filling data into Users_NEW from old Users table (Id is auto-incremented, Id_Guid is taken from old table - Users.Id)
            migrationBuilder.Sql(@"INSERT INTO [Users_NEW]([Id_Guid], [Name], [Age]) SELECT [Id], [Name], [Age] FROM Users");


            //Creating new table for Posts with both UserId (Int32) and UserId_Guid (Guid)
            migrationBuilder.CreateTable(
                name: "Posts_NEW",
                columns: table => new
                {
                    Id = table.Column<int>(type: "INTEGER", nullable: false)
                        .Annotation("Sqlite:Autoincrement", true),
                    Text = table.Column<string>(type: "TEXT", nullable: true),
                    PostDate = table.Column<DateTime>(type: "TEXT", nullable: false),
                    UserId_Guid = table.Column<Guid>(type: "TEXT", nullable: false),
                    UserId = table.Column<int>(type: "INTEGER", nullable: false, defaultValue: 0)
                },
                constraints: table =>
                {
                    table.PrimaryKey("PK_Posts", x => x.Id);                    
                });

            //Filling data into Posts_NEW from old Posts table (UserId is set to 0 for ALL rows, UserId_Guid is taken from old table - Posts.UserId)
            migrationBuilder.Sql(@"INSERT INTO [Posts_NEW]([Text], [PostDate], [UserId_Guid]) SELECT [Text], [PostDate], [UserId] FROM Posts");


            //Updating Posts_NEW.UserId (Int32) with correct values from Users.Id
            migrationBuilder.Sql(@"UPDATE [Posts_NEW] SET [UserId] = (SELECT [Id] FROM [Users_NEW] WHERE [Users_NEW].[Id_Guid]=[Posts_NEW].[UserId_Guid])");


            //Drop old Users table, remove unnecessary Users_NEW.Id_Guid column and renaming Users_NEW to Users
            migrationBuilder.DropTable(name: "Users");
            migrationBuilder.DropColumn(table: "Users_NEW", name: "Id_Guid");
            migrationBuilder.RenameTable(name: "Users_NEW", newName: "Users");

            //Drop old Posts table, remove unnecessary Posts_NEW.UserId_Guid column and renaming Posts_NEW to Posts
            migrationBuilder.DropTable(name: "Posts");
            migrationBuilder.DropColumn(table: "Posts_NEW", name: "UserId_Guid");
            migrationBuilder.RenameTable(name: "Posts_NEW", newName: "Posts");

            //Now Posts_NEW.UserId (Int32) column is filled - and we can add a Foreign Key constraint
            migrationBuilder.AddForeignKey(table: "Posts",
                        name: "FK_Posts_Users_UserId",
                        column: "UserId",
                        principalTable: "Users",
                        principalColumn: "Id",
                        onDelete: ReferentialAction.Cascade);

            //creating index for FK Posts.UserId - IS THAT NECESSSARY OR NOT FOR A UserId(Int32) COLUMN???
            migrationBuilder.CreateIndex(
                name: "IX_Posts_UserId",
                table: "Posts",
                column: "UserId");
        }

Looks like migration is working now and data is obtained correctly from updated database.
Am I going the right way? As you can see in my migrations I convert Guid PKs simply by auto-incrementing Int32 PKs.
But I need to preserve original Guid values in extra column for some time to fill FK with correct values of Int32 PKs (and delete that excess column aftewards). Is there an easier way to do that?
I've uploaded my project here. Thank you.

@bairog
Copy link
Author

bairog commented Sep 25, 2020

@ajcvickers @roji Am I going the right way? As you can see in my migrations I convert Guid PKs simply by auto-incrementing Int32 PKs.
But I need to preserve original Guid values in extra column for some time to fill FK with correct values of Int32 PKs (and delete that excess column aftewards). Is there an easier way to do that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants