
Sync to new device is slow #523

Closed
VagueGit opened this issue Mar 30, 2021 · 28 comments

@VagueGit
Contributor

Testing with a ~500 MB SQLite database: legacy data is upgraded into SQLite, and the SQLite db is then synced to SQL Server using DMS over HTTPS.

At the end of the initial sync a snapshot is created on the server.

We find the initial sync of server data back down to another device quite slow: DMS takes around 1 hour 15 minutes to sync that server data to a new SQLite database.

The app we have in production, which does not use DMS, takes 15-20 minutes to sync to a new device with the same test data.

We test on a very fast internet connection. Most of our customers have slower, less reliable internet.

#470 (Download snapshot is not resilient to connection interruption) introduced retry logic, which is most appreciated. But if we do go live with DMS, our customers will notice how much slower it is to add our app to a new device.

The issue does not seem to be related to indexes as the snapshot is already created. The server we use for testing now appears to have sufficient resources. We use batch size=2000 as I assume a smaller batch size is more resilient.

Is there anything else we can do, or DMS can do to reduce the time DMS takes to sync to a new device?

@Mimetis
Owner

Mimetis commented Mar 30, 2021

You're right
We have a regression on the SQLite Upsert method.

We resolved this issue a while ago, after some deep tests concluding that using WITH in a SQLite statement is really slow when triggers are set.
See the full conversation here: #314

Back then, we replaced the WITH () with a simple INSERT OR REPLACE statement, which resolved the issue and made the SQLite provider pretty fast.

Then we hit a new issue with INSERT OR REPLACE, where data could be lost, which we discussed here: #460

Well. It's tricky...

If you have any suggestions, @gb0o, @gentledepp, @VagueGit, please share ! :)

I'm investigating, but for sure @VagueGit, the performance bottleneck you are experiencing is an issue we need to fix !

@gentledepp
Contributor

gentledepp commented Mar 30, 2021

@Mimetis Is it really the upsert method, or is it the trigger?
If it is only the upsert method, then just splitting the upsert into

  • an update
  • and a conditional insert statement (in case the update affected no rows) may be the solution already?

This will not be as fast as a single insert or replace, but it will at least scale linearly.
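For illustration, here is a minimal sketch of that split, using the hypothetical Customer / CustomerID / @FirstName names that appear later in this thread (not the actual DMS-generated command):

-- Sketch only: try the UPDATE first...
UPDATE [Customer]
SET [FirstName] = @FirstName
WHERE [CustomerID] = @CustomerID;

-- ...then insert only if the UPDATE above affected no row.
-- changes() returns the number of rows affected by the previous statement.
INSERT INTO [Customer] ([CustomerID], [FirstName])
SELECT @CustomerID, @FirstName
WHERE changes() = 0;

Two statements instead of one, but each is a plain primary-key lookup.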

In case it is the trigger, however - why again do you need all the IFNULL(NULLIF(... statements in the INSERT OR IGNORE part?

DROP TRIGGER "main"."ProductCategory_update_trigger";
CREATE TRIGGER [ProductCategory_update_trigger] AFTER UPDATE ON [ProductCategory] 

Begin 
	UPDATE [ProductCategory_tracking] 
	SET [update_scope_id] = NULL -- scope id is always NULL when update is made locally
		,[timestamp] = replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
		,[last_change_datetime] = datetime('now')
	Where [ProductCategory_tracking].[ProductCategoryID] = new.[ProductCategoryID]
	 AND (
	    IFNULL(NULLIF([old].[ParentProductCategoryID], [new].[ParentProductCategoryID]), NULLIF([new].[ParentProductCategoryID], [old].[ParentProductCategoryID])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[Name], [new].[Name]), NULLIF([new].[Name], [old].[Name])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[rowguid], [new].[rowguid]), NULLIF([new].[rowguid], [old].[rowguid])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[ModifiedDate], [new].[ModifiedDate]), NULLIF([new].[ModifiedDate], [old].[ModifiedDate])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[Attribute With Space], [new].[Attribute With Space]), NULLIF([new].[Attribute With Space], [old].[Attribute With Space])) IS NOT NULL
	 ) 
; 
	INSERT OR IGNORE INTO [ProductCategory_tracking] (
		[ProductCategoryID]
		,[update_scope_id]
		,[timestamp]
		,[sync_row_is_tombstone]
		,[last_change_datetime]
	) 
	SELECT 
		new.[ProductCategoryID]
		,NULL
		,replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
		,0
		,datetime('now')
	WHERE (SELECT COUNT(*) FROM [ProductCategory_tracking] WHERE [ProductCategoryID]=new.[ProductCategoryID])=0
	 AND (
	    IFNULL(NULLIF([old].[ParentProductCategoryID], [new].[ParentProductCategoryID]), NULLIF([new].[ParentProductCategoryID], [old].[ParentProductCategoryID])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[Name], [new].[Name]), NULLIF([new].[Name], [old].[Name])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[rowguid], [new].[rowguid]), NULLIF([new].[rowguid], [old].[rowguid])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[ModifiedDate], [new].[ModifiedDate]), NULLIF([new].[ModifiedDate], [old].[ModifiedDate])) IS NOT NULL
	 OR IFNULL(NULLIF([old].[Attribute With Space], [new].[Attribute With Space]), NULLIF([new].[Attribute With Space], [old].[Attribute With Space])) IS NOT NULL
	 ) 
; 
End


@Mimetis
Owner

Mimetis commented Mar 30, 2021

Why again do you need all the IFNULL(NULLIF(... statements in the INSERT OR IGNORE part?

Well, it's a tricky one, but it's mandatory:

  • Define a table with some columns that should be synced and some columns that should not.
  • Make an insert in your SQLite db.
  • Sync 1. Sync 2.
  • The CleanupMetadata on SQLite will run at some point (usually after the second sync, if you did not update the row) and will delete the tracking row.
  • Make an update on this row, on a column that is not one of the synced columns.

In that scenario, the trigger will execute the INSERT OR IGNORE statement, but we don't want a tracking row here, because we've just updated a column that is not part of the synced columns.
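To be clear, the IFNULL(NULLIF(old.X, new.X), NULLIF(new.X, old.X)) IS NOT NULL pattern is just a null-safe "this column has changed" test. A tiny illustration (not the DMS code itself):

-- 1 when the two values differ, 0 when they are equal; NULL is handled on both sides
SELECT IFNULL(NULLIF('a', 'b'), NULLIF('b', 'a')) IS NOT NULL;      -- 1: different values
SELECT IFNULL(NULLIF('a', 'a'), NULLIF('a', 'a')) IS NOT NULL;      -- 0: same value
SELECT IFNULL(NULLIF(NULL, 'a'), NULLIF('a', NULL)) IS NOT NULL;    -- 1: NULL vs value
SELECT IFNULL(NULLIF(NULL, NULL), NULLIF(NULL, NULL)) IS NOT NULL;  -- 0: both NULL

So the tracking row is only touched when at least one synced column has actually changed.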

Splitting the Upsert into 1 Update and 1 Conditional Insert

I guess it's worth a test.

I have a second option:

We can make a simple INSERT INTO tbl () VALUES () when we are making a first sync, and then rely on the WITH statement for the next syncs.
I mean, the bottleneck shows up when we are initializing the database, not during a regular sync, because (usually) a regular sync does not have too many rows (compared to a first initialize sync).

So we may eventually have a distinction between:

  • Hey, First Sync: go for a simple Insert Into
  • Hey, Other Sync: need to compare and bla bla bla, go for With ()

Obviously, the first Insert Into will be quite fast compared to any other statement.
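Roughly, something like this (a sketch with the hypothetical Customer / CustomerID / FirstName names used later in this thread, not the exact DMS commands):

-- First sync / reinitialize: the local table is empty, so a plain insert is enough.
INSERT INTO [Customer] ([CustomerID], [FirstName])
VALUES (@CustomerID, @FirstName);

-- Subsequent syncs: local rows may already exist, so an upsert-style statement
-- (the WITH () based command in DMS) is still needed to compare and merge.
INSERT INTO [Customer] ([CustomerID], [FirstName])
VALUES (@CustomerID, @FirstName)
ON CONFLICT ([CustomerID]) DO UPDATE SET [FirstName] = excluded.[FirstName];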

Your thoughts ?

@VagueGit
Contributor Author

So we may eventually have a distinction between:

* Hey First Sync : Go for a simple `Insert Into`

* Hey Other Sync : Need to compare and bla bla bla, Go for `With ()`

Obviously, the first Insert Into will be quite fast compared to any other statement.

That would be great!

@gentledepp
Contributor

Why go down the rabbit hole and clean up the tracking table every sync?
What do you gain except much higher complexity?

Would it not be absolutely sufficient to say

  • as long as there is a row in the base table, there will always be a row in the tracking table
  • only if the row in the base table is deleted (i.e. the tracking table has istombstone=1) will the row in the tracking table be deleted after the sync was successful.

I do not understand why you are doing it differently :-|

@Mimetis
Owner

Mimetis commented Mar 30, 2021

Because the metadata cleanup is something that is needed in both places, server and client.
You can't live with a change-tracking table that keeps growing over time.

Even if we run the cleanup routine manually (and not automatically), we may eventually reach a point where we have rows in the client db with no associated tracking row.
That being said, I agree with the argument "delete only tracking rows with IsTombstone=1 after a successful sync."

But, once again, another problem comes up when you are dealing with a new db (especially server side):
if a tracking row is required for each row, the first sync would need to insert tracking rows for all existing rows.

Actually, that was the case in a previous version, but it was a big performance bottleneck, especially with large tables, where the initialization process took a really long time to complete.

I know there is no magic solution, and the automatic cleanup (automatic on the client, manual on the server) is the most efficient way today to handle all the scenarios.

I'm still investigating the 2 solutions we are discussing.
I guess if your solution works, it will be better than mine, because it will rely on only one UpsertCommand (and not one InitializeInsertCommand plus one UpsertCommand).

But for now... even with a single Insert Into, which is the most efficient statement, I'm not able to reach the performance we had here: #314 (comment).

I'm going crazy, since I'm not able to get under 18 sec (and the previous version was able to get under 12 sec).

(By the way, happy to see you again here @gentledepp, I was wondering if you were still using DMS or if you finally gave up!)

@Mimetis
Owner

Mimetis commented Mar 30, 2021

I'm trying both solutions at the same time

First of all, I think I have found a really good improvement when using the syntax With().
Well ... not exactly...
I found something interesting...

This query is really slow and not linear at all:

WITH CHANGESET as (SELECT [c].[CustomerID], [c].[FirstName] FROM 
    (SELECT @CustomerID as [CustomerID], @FirstName as [FirstName]) as [c]
LEFT JOIN [Customer_tracking] AS [side] ON [side].[CustomerID] = @CustomerID
LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = [c].[CustomerID])

INSERT INTO [Customer] ([CustomerID], [FirstName])
SELECT * from CHANGESET WHERE TRUE
ON CONFLICT ([CustomerID]) DO UPDATE SET [FirstName]=excluded.[FirstName];

The interesting line is LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = [c].[CustomerID]

And this one is really fast and completely linear:

WITH CHANGESET as (SELECT [c].[CustomerID], [c].[FirstName] FROM 
    (SELECT @CustomerID as [CustomerID], @FirstName as [FirstName]) as [c]
LEFT JOIN [Customer_tracking] AS [side] ON [side].[CustomerID] = @CustomerID
LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = @CustomerID)

INSERT INTO [Customer] ([CustomerID], [FirstName])
SELECT * from CHANGESET WHERE TRUE
ON CONFLICT ([CustomerID]) DO UPDATE SET [FirstName]=excluded.[FirstName];

The interesting line is LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = @CustomerID

I ... just ... don't know why...

Anyway, I'm continuing to investigate.
But one sure thing is that the second SQL command is way faster.

I'm testing a really simple INSERT OR IGNORE INTO when we are in a new database (or Reinitialize mode), and another statement for day-to-day sync (where we need to compare things).
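i.e. for the initialization path, something as simple as this sketch (hypothetical Customer names again):

INSERT OR IGNORE INTO [Customer] ([CustomerID], [FirstName])
VALUES (@CustomerID, @FirstName);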

@Mimetis Mimetis reopened this Mar 30, 2021
@Mimetis
Owner

Mimetis commented Mar 30, 2021

Hey @VagueGit, can you make a test with the latest beta version, v0.7.4, available here:

(And of course, update all the other packages as well.)

And let me know whether it improves your first sync.

@VagueGit
Contributor Author

Thank you for this @Mimetis and @gentledepp for your input. I installed DMS 0.7.4-beta-0616 on the web server and client. Unfortunately this seems to have made little difference.

45 mins for upgrade and initial sync of legacy data from SQLite to SQL Server over HTTP. 1 hr 12 mins to sync the same data from SQL Server to SQLite on a new device. Consistently, sync down takes ~40% longer than upgrade + sync up.

About 1 hr of that time is DMS downloading snapshots: 123 files, 181 MB.

FileZilla downloads the same snapshots from the same server to the same device in 5 mins.

Can anything be done to speed up the time DMS takes to download snapshots? That seems to be where the biggest gains are to be made.

@Mimetis
Owner

Mimetis commented Mar 31, 2021

Damn, this is weird.
So, if I understand correctly, the sync process (except the download) is still pretty fast (12 mins if we remove the 1 hr download), correct?

I need to test this scenario, because I really don't see why it's so slow.
First of all, when downloading the files, DMS reads each file to get some information. That's why it's slower than FileZilla, for instance. BUT that's not a reason for it to be SO slow...

Can you share the schema of your database?
Or, if only a few tables are huge, just the schema of those tables.
Or even better, if there is nothing sensitive, can you share a backup of one customer database?

I will continue to make more tests to find the bottleneck, but without any help, it can be tricky.

@VagueGit
Contributor Author

I'm sorry, but I am not permitted to share the schema, only snippets. Also, we use copies of customer data for testing, and we are not allowed to share that outside the company.

The core of the schema is quite conventional
CustomerCategorys->Customers->Orders->OrderDetails
Orders->Payments
OrderDetails->Jobs->Components->SubComponents

38 tables. All full sync. Most tables have <10 columns. The widest table has 32 columns but only 1 row. It's a 'settings' table.

I can't think of anything unusual about our design that might cause an issue. It doesn't cause an issue for the current version of this app in production.

I'm sorry I can't be more helpful.

If our app were to download the snapshots directory, would DMS recognise it was there and work with it, or would DMS want to download again?

@Mimetis
Owner

Mimetis commented Mar 31, 2021

I've just made a test with a big table, over http, and downloading seems normal.

If our app were to download the snapshots directory, would DMS recognise it was there and work with it, or would DMS want to download again?

DMS will download again. As I said, it's not just downloading all the files. It's more:

  • Download the file
  • Open it
  • Read it
  • Make some assumptions, see if we need to download another file, etc.
  • Save it to local storage

If you run a sync from your computer (using a console application, for example), is it as slow as on a device?

@gentledepp
Contributor

gentledepp commented Mar 31, 2021

@Mimetis yes I am still here, but still on DMS 0.3.2 🙄
I will not touch the latest version until I have automatic migrations implemented, because I do not want to touch any sync stuff ever again. This has cost me so much time of my life already - it should just work out of the box and leave me alone. Currently, every schema migration gives us a headache.

The interesting line is LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = @CustomerID

I ... just ... don't know why...

Well, you could use SQLite Studio to explain the execution plan to you.
What I would guess is that, because you use = @CustomerID, SQLite does not have to match two indexes, but only uses one value (@CustomerID) as a lookup in a single index. This is usually a very fast index seek.
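For instance, a quick check like this (a sketch against the SELECT part of the slow variant quoted above) would show whether the [Customer] join is planned as a SEARCH (index / primary-key seek) or a SCAN:

EXPLAIN QUERY PLAN
SELECT [c].[CustomerID], [c].[FirstName]
FROM (SELECT @CustomerID as [CustomerID], @FirstName as [FirstName]) as [c]
LEFT JOIN [Customer_tracking] AS [side] ON [side].[CustomerID] = @CustomerID
LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = [c].[CustomerID];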

But, once again, another problem is coming, when you are dealing with a new db (especially server side)
If tracking rows are required for each associated row, the first sync will need to insert all the tracking rows for all existing rows (especially on the server side).

It would be really, really awesome if you could document all those itsy bitsy design decisions and internals, so the community can better understand how DMS works.

@Mimetis
Owner

Mimetis commented Mar 31, 2021

@Mimetis yes I am still here, but still on DMS 0.3.2 🙄

Wow !! good luck :)

It should just work out of the box

Wow !! you're dreaming ;)

What I would guess is that, because you use = @CustomerID, SQLite does not have to match two indexes, but only uses one value (@CustomerID) as a lookup in a single index. This is usually a very fast index seek.

LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = [c].[CustomerID] should be an index seek as well.
It should be fast, but it seems we get an index scan here for some reason..

Anyway, it's much faster today with the fix I've published in the last pre-release.

@VagueGit
Contributor Author

If you are making a sync from your computer (using a console application for example) is it as slow as on a device ?

Those numbers are on a high-spec dev laptop

@Mimetis
Owner

Mimetis commented Mar 31, 2021

Ok,
is it faster when you run it locally? (To be sure the problem is not coming from the HTTP connection.)

@VagueGit
Contributor Author

I will set up a local SQL Server, run the API locally, and post back tomorrow (it's getting late here).

@Mimetis
Owner

Mimetis commented Mar 31, 2021

Yes, I know, the timezone difference is really complicated for both of us ;) good luck!

@gentledepp
Contributor

I would propose using a profiling tool like JetBrains dotTrace... otherwise you'll get gray and old before you find the culprit.

@VagueGit
Contributor Author

gray and old

too late :-)

@VagueGit
Contributor Author

VagueGit commented Apr 1, 2021

To test whether the web server was slow at serving the snapshot files, I added a DownloadController to the sync API:

public IActionResult Index([FromHeader] string filename)
{
    // Build the path to the requested snapshot file under the app's content root
    string contentRootPath = (string)AppDomain.CurrentDomain.GetData("ContentRootPath");
    string path = Path.Combine(contentRootPath, "Snapshots", databaseName, "DefaultScope", "ALL", filename);

    // Copy the file into memory and return it as a file download
    MemoryStream memory = new MemoryStream();
    using (FileStream stream = new FileStream(path, FileMode.Open))
    {
        stream.CopyTo(memory);
    }
    memory.Position = 0;
    FileStreamResult thing = File(memory, contentType: "text/plain", Path.GetFileName(path));
    return thing;
}

I call it when the client app starts. It gets the file names to download by iterating through the Snapshots folder. It then downloads those files from the web server to a temp folder.

foreach (string fileEntry in fileEntries)
{
    // Request each snapshot file by name and write it to the local temp folder
    string fileName = Path.GetFileName(fileEntry);
    string downloadPath = Path.Combine(snapshotsTargetPath, fileName);
    RestRequest restRequest = new RestRequest();
    restRequest.AddHeader("fileName", fileName);
    byte[] response = restClient.DownloadData(restRequest);
    File.WriteAllBytes(downloadPath, response);
}

It downloads the snapshot files from the web server to disk in 27 mins. That's about twice as fast as DMS.

@workgroupengineering
Contributor

workgroupengineering commented Apr 1, 2021

You could try using a custom serializer like BSON, ProtoBuf, or MessagePack.

You can also add compression.

@VagueGit
Contributor Author

VagueGit commented Apr 1, 2021

@workgroupengineering Thanks for your suggestion. The issue here is not serialisation. The database has already been serialised on the web server. The question is how to improve the performance of DMS in downloading the serialised data.

Your point is taken regarding BSON. Our app in production uses BSON. BSON is faster to parse but results in bigger files, so takes longer to download. BSON would exacerbate our issue.

This issue impacts us because we have customers with 20+ years of business data in our app. We can upgrade a customer to DMS in about 45 mins. Downloading the snapshots it creates to a new machine takes over an hour (on very fast internet).

One option we have considered is to upload the first SQLite database to our CDN. Subsequent installations in that company could be initialised with a copy of that database. We already do that with new installations; we could extend it to existing users adding new machines.

That introduces additional costs. There are also technical issues in that approach, such as when to expire that database.

Perhaps DMS could zip, download, unzip? It would be preferable if DMS could make this issue go away; have DMS just work out of the box as suggested by @gentledepp

@Mimetis
Owner

Mimetis commented Apr 2, 2021

have DMS just work out of the box

This is an OSS project.
You have access to the source code.
You can debug.
You can modify it.
You can make a PR to improve it.
This IS AN OSS PROJECT.

Do you pay anything to have a full product "working out of the box"?
Do you know how much time I spend on this project?
Do you know what it costs me to do this during my "spare time"?
Do you help me find out what the problem is here?
Did you do one single fucking Pull Request to help this project move forward?
How many times did I ask you to give me a sample repro to figure out what the problem is?

have DMS just work out of the box

Just... No

have DMS just work out of the box

Well, you can choose another framework where you will have some kind of guarantees "out of the box" (maybe) like:

Now I will just stop trying to understand your problem until you give me a sample reproduction.

Have a good day

@omarbekov

@VagueGit

One option we have considered is to upload the first SQLite database to our CDN. Subsequent installations in that company could be initialised with a copy of that database. We already do that with new installations; we could extend it to existing users adding new machines.
That introduces additional costs. There are also technical issues in that approach, such as when to expire that database.

We do kind of the same thing: we generate a new SQLite .db file every two hours on the server side.
If your DB is big, you could generate a copy once a day, or every two days, at night.
New clients then just download the .db file from the server and start a sync process on that file.
An important thing to do, though, is to set the new scope_ids:
on the server side, in the history scope info table,
and in the file downloaded by the client, in the scope info table.

@Mimetis Hey, man, thank you for your time and work and for the awesome framework the DMS is!

@Mimetis
Owner

Mimetis commented Apr 5, 2021

Working on a new performance version.
I'm trying to remove the Deserialize / Serialize mechanisms that occur when downloading a snapshot (or batches when BatchSize > 0).
I have a gain of approximately 10% compared to the current version.
A little bit more when testing from a device (~15%).

Then I'm trying to fill all the available sockets to download the batches in parallel.
Depending on the client application, I get a performance increase of ... 80%.
The downside is that the batches are not downloaded in order.

[screenshot: side-by-side comparison of download times]

(On the right, the current DMS version; on the left, the in-progress version.)

The snapshot downloaded here is about 400 MB, containing approximately 400k rows.
I've interrupted the sync so the timing covers only the file downloads:

  • Before: 01 min 10 sec
  • After: 20 sec

Still a lot to do and test, but it's heading in the right direction.

I'm working hard to make sure this new version will be backward compatible (an upgraded server will continue to work with previously deployed clients).

To be continued ..

@Mimetis
Owner

Mimetis commented Apr 7, 2021

Can you make a test with the last prerelease? #527

@VagueGit
Contributor Author

VagueGit commented Apr 7, 2021

Syncing 500 MB of data from the server to a new device was ~1 hr 15 mins. Using DMS 0.8.0-beta-0624 with maxDownladingDegreeOfParallelism: 4, syncing the same 500 MB from the server to a new device takes 14 mins. That's a spectacular improvement. Well done @Mimetis!

@lordofffarm Thanks for your input. There are about 500 companies using the product I was considering moving to DMS. Each of those companies has its own server database, and each user in a company has a local database. Taking regular copies of those databases would add significant cost for us to pass on to our customers. I'm not sure I could justify the expense.

However, DMS 0.8.0-beta-0624 is such a spectacular improvement in syncing to new clients that I think this problem has gone away.
