Sync to new device is slow #523
Comments
You're right. We resolved this issue a while ago after some deep tests, concluding that using […]. So far, we changed the […]. Then we had a new issue, using […]. Well, it's tricky... If you have any suggestions, @gb0o, @gentledepp, @VagueGit, please share! :)

I'm investigating, but for sure @VagueGit, the performance bottleneck you are experiencing is an issue we need to fix!
@Mimetis Is it really the upsert method, or is it the trigger?
This will not be as fast as a single […]. In case it is the trigger, however: why again do you need all the […]?

DROP TRIGGER "main"."ProductCategory_update_trigger";
CREATE TRIGGER [ProductCategory_update_trigger] AFTER UPDATE ON [ProductCategory]
Begin
UPDATE [ProductCategory_tracking]
SET [update_scope_id] = NULL -- scope id is always NULL when update is made locally
,[timestamp] = replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
,[last_change_datetime] = datetime('now')
Where [ProductCategory_tracking].[ProductCategoryID] = new.[ProductCategoryID]
AND (
IFNULL(NULLIF([old].[ParentProductCategoryID], [new].[ParentProductCategoryID]), NULLIF([new].[ParentProductCategoryID], [old].[ParentProductCategoryID])) IS NOT NULL
OR IFNULL(NULLIF([old].[Name], [new].[Name]), NULLIF([new].[Name], [old].[Name])) IS NOT NULL
OR IFNULL(NULLIF([old].[rowguid], [new].[rowguid]), NULLIF([new].[rowguid], [old].[rowguid])) IS NOT NULL
OR IFNULL(NULLIF([old].[ModifiedDate], [new].[ModifiedDate]), NULLIF([new].[ModifiedDate], [old].[ModifiedDate])) IS NOT NULL
OR IFNULL(NULLIF([old].[Attribute With Space], [new].[Attribute With Space]), NULLIF([new].[Attribute With Space], [old].[Attribute With Space])) IS NOT NULL
)
;
INSERT OR IGNORE INTO [ProductCategory_tracking] (
[ProductCategoryID]
,[update_scope_id]
,[timestamp]
,[sync_row_is_tombstone]
,[last_change_datetime]
)
SELECT
new.[ProductCategoryID]
,NULL
,replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
,0
,datetime('now')
WHERE (SELECT COUNT(*) FROM [ProductCategory_tracking] WHERE [ProductCategoryID]=new.[ProductCategoryID])=0
AND (
IFNULL(NULLIF([old].[ParentProductCategoryID], [new].[ParentProductCategoryID]), NULLIF([new].[ParentProductCategoryID], [old].[ParentProductCategoryID])) IS NOT NULL
OR IFNULL(NULLIF([old].[Name], [new].[Name]), NULLIF([new].[Name], [old].[Name])) IS NOT NULL
OR IFNULL(NULLIF([old].[rowguid], [new].[rowguid]), NULLIF([new].[rowguid], [old].[rowguid])) IS NOT NULL
OR IFNULL(NULLIF([old].[ModifiedDate], [new].[ModifiedDate]), NULLIF([new].[ModifiedDate], [old].[ModifiedDate])) IS NOT NULL
OR IFNULL(NULLIF([old].[Attribute With Space], [new].[Attribute With Space]), NULLIF([new].[Attribute With Space], [old].[Attribute With Space])) IS NOT NULL
)
;
End
Well, it's a tricky one, but it's mandatory: […]

In that scenario, the trigger will execute the […].

I guess it's worth a test. I have a second option: we can make a simple […]. So, we may eventually have a distinction between: […]

Obviously, the first […]. Your thoughts?
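A minimal sketch of what that simpler variant could look like, assuming the "simple" option means a trigger that always stamps the tracking row and skips the per-column change detection (table and column names reused from the trigger above; this is not the actual DMS-generated code):

DROP TRIGGER IF EXISTS [ProductCategory_update_trigger];
CREATE TRIGGER [ProductCategory_update_trigger] AFTER UPDATE ON [ProductCategory]
Begin
-- Always stamp the tracking row, without comparing old/new values column by column
UPDATE [ProductCategory_tracking]
SET [update_scope_id] = NULL
,[timestamp] = replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
,[last_change_datetime] = datetime('now')
Where [ProductCategoryID] = new.[ProductCategoryID];
-- Create the tracking row if it does not exist yet
INSERT OR IGNORE INTO [ProductCategory_tracking] (
[ProductCategoryID]
,[update_scope_id]
,[timestamp]
,[sync_row_is_tombstone]
,[last_change_datetime]
)
VALUES (
new.[ProductCategoryID]
,NULL
,replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
,0
,datetime('now')
);
End

The trade-off: any UPDATE, even one that changes nothing, bumps the row's timestamp and makes it a candidate for the next sync.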
That would be great!
Why go down the rabbit hole and clean up the tracking table every sync? Would it not be absolutely sufficient to say […]?

I do not understand why you are doing it differently :-|
Because the metadata clean-up is something that is needed in both places, server and client. Even if we are running the clean-up routine manually (and not automatically), we may eventually get to a point in time where we have rows in the client db with no tracking row associated. But, once again, another problem comes up when you are dealing with a new db (especially server side).

I know, there is no magic solution, and the automatic clean-up (automatic on the client, manual on the server) is the most efficient way today to handle all the scenarios. I'm still investigating the 2 solutions we are discussing. But for now... even with a single […], I'm going crazy, since I'm not able to go under 18 sec (and the previous version was able to reach under 12 sec).

(By the way, happy to see you again here @gentledepp, I was wondering if you were still using […])
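For the "rows with no tracking row associated" situation mentioned above, a repair pass could look roughly like this (an illustrative sketch only, reusing the ProductCategory names from the trigger earlier; it is not necessarily how DMS handles it internally):

-- Re-create a tracking row for every base row that lost its metadata
INSERT INTO [ProductCategory_tracking] (
[ProductCategoryID]
,[update_scope_id]
,[timestamp]
,[sync_row_is_tombstone]
,[last_change_datetime]
)
SELECT
[base].[ProductCategoryID]
,NULL
,replace(strftime('%Y%m%d%H%M%f', 'now'), '.', '')
,0
,datetime('now')
FROM [ProductCategory] AS [base]
LEFT JOIN [ProductCategory_tracking] AS [side] ON [side].[ProductCategoryID] = [base].[ProductCategoryID]
WHERE [side].[ProductCategoryID] IS NULL;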
I'm trying both solutions at the same time. First of all, I think I have found a really good improvement when using the […] syntax.

This query is really slow and not linear at all:

WITH CHANGESET as (SELECT [c].[CustomerID], [c].[FirstName] FROM
(SELECT @CustomerID as [CustomerID], @FirstName as [FirstName]) as [c]
LEFT JOIN [Customer_tracking] AS [side] ON [side].[CustomerID] = @CustomerID
LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = [c].[CustomerID])
INSERT INTO [Customer] ([CustomerID], [FirstName])
SELECT * from CHANGESET WHERE TRUE
ON CONFLICT ([CustomerID]) DO UPDATE SET [FirstName]=excluded.[FirstName];

The interesting line is the last LEFT JOIN, on [Customer]: here it joins on [c].[CustomerID].

And this one is really fast and completely linear:

WITH CHANGESET as (SELECT [c].[CustomerID], [c].[FirstName] FROM
(SELECT @CustomerID as [CustomerID], @FirstName as [FirstName]) as [c]
LEFT JOIN [Customer_tracking] AS [side] ON [side].[CustomerID] = @CustomerID
LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = @CustomerID)
INSERT INTO [Customer] ([CustomerID], [FirstName])
SELECT * from CHANGESET WHERE TRUE
ON CONFLICT ([CustomerID]) DO UPDATE SET [FirstName]=excluded.[FirstName];

The interesting line is the same LEFT JOIN on [Customer], which here uses @CustomerID directly.

I ... just ... don't know why...

Anyway, I'm continuing to investigate. I'm testing having a really simple […]
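One way to see where the time goes in the two variants above is to compare their query plans, for example like this (a sketch; literal values stand in for the @CustomerID/@FirstName parameters, and the only line that changes between the two variants is the [base] join):

EXPLAIN QUERY PLAN
WITH CHANGESET as (SELECT [c].[CustomerID], [c].[FirstName] FROM
(SELECT 'ID-1' as [CustomerID], 'John' as [FirstName]) as [c]
LEFT JOIN [Customer_tracking] AS [side] ON [side].[CustomerID] = 'ID-1'
LEFT JOIN [Customer] AS [base] ON [base].[CustomerID] = [c].[CustomerID]) -- slow variant; the fast one uses the literal here
INSERT INTO [Customer] ([CustomerID], [FirstName])
SELECT * from CHANGESET WHERE TRUE
ON CONFLICT ([CustomerID]) DO UPDATE SET [FirstName]=excluded.[FirstName];

If the slow variant's plan shows a full SCAN of [Customer] instead of a SEARCH on its primary key, that would explain the non-linear behaviour.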
Hey @VagueGit, can you make a test with the last beta version […]? (And of course, update all the other packages as well.) And let me know if it improves your first sync.
Thank you for this @Mimetis, and @gentledepp for your input. I installed […].

45 mins for upgrade and initial sync of legacy data from SQLite to SQL Server over http. 1 hr 12 mins to sync the same data from SQL Server to SQLite on a new device. Consistently, sync down takes ~40% longer than upgrade + sync up. About 1 hr of that time is DMS downloading snapshots: 123 files, 181 MB. FileZilla downloads the same snapshots from the same server to the same device in 5 mins.

Can anything be done to speed up the time DMS takes to download snapshots? That seems to be where the biggest gains are to be made.
Damn, this is weird. I need to test this scenario, because I really don't see why it's so slow. Can you share the schema of your database? I will continue to make more tests to find the bottleneck, but without any help, it can be tricky.
I'm sorry, but it is not permitted for me to share the schema, only snippets. Also, we use copies of customer data for testing and we are not allowed to share that outside the company.

The core of the schema is quite conventional: 38 tables, all full sync. Most tables have <10 columns. The widest table has 32 columns but only 1 row; it's a 'settings' table. I can't think of anything unusual about our design that might cause an issue. It doesn't cause an issue for the current version of this app in production. I'm sorry I can't be more helpful.

If our app were to download the snapshots directory, would DMS recognise it was there and work with it, or would DMS want to download again?
I've just made a test with a big table, over http, and downloading seems normal.
DMS will download again. As I said, it's not only downloading all the files. It's more: […]

If you are making a sync from your computer (using a console application, for example), is it as slow as on a device?
@Mimetis yes I am still here, but still on DMS 0.3.2 🙄
Well, you could use SQLite Studio to explain the execution plan to you.
It would be really, really awesome if you could document all those itsy-bitsy design decisions and internals for the community to better understand how DMS works.
Wow !! good luck :)
Wow !! you're dreaming ;)
Anyway, it's much faster today with the fix I've published in the last pre-release.
Those numbers are on a high-spec dev laptop.
Ok, […]
I will set up a local SQL Server, run the API locally, and post back tmw (it's getting late here).
Yes, I know, the timezone is really complicated for both of us ;) Good luck!
I would propose using some perf test tool like JetBrains dotTrace... otherwise you'll get gray and old before you find the culprit.
too late :-)
To test if the webserver was slow in serving the snapshot files, I added a […].
I call it when the client app starts. It gets the file names to download by iterating through the Snapshots folder. It then downloads those files from the web server to a temp folder.
It downloads the snapshot files from the webserver to disk in 27 mins. That's about twice as fast as DMS.
You could try using a custom serializer like BSON, ProtoBuf, or MessagePack. You can also add compression.
@workgroupengineering Thanks for your suggestion. The issue here is not serialisation. The database has already been serialised on the web server. The question is how to improve the performance of DMS in downloading the serialised data.

Your point is taken regarding BSON. Our app in production uses BSON. BSON is faster to parse but results in bigger files, so takes longer to download. BSON would exacerbate our issue.

This issue impacts us as we have customers with 20+ years of business data in our app. We can upgrade a customer to DMS in about 45 mins. Downloading the snapshots created to a new machine takes over an hour (on v fast internet).

One option we have considered is to upload the first SQLite database to our CDN. Subsequent installations in that company can be initialised with a copy of that database. We do that with new installations. We could extend that to existing users adding new machines. That introduces additional costs. There are also technical issues in that approach, such as when to expire that database.

Perhaps DMS could zip, download, unzip? It would be preferable if DMS could make this issue go away; have DMS just work out of the box, as suggested by @gentledepp.
This is an OSS project. Do you pay something to have a full product "working out of the box"?
Just... No
Well, you can choose another framework where you will have some kind of guarantee "out of the box" (maybe), like: […]

Now I will just stop trying to understand what your problem is until you give me a sample reproduction. Have a good day.
We do kind of the same thing: generate a new Sqlite […].

@Mimetis Hey man, thank you for your time and work, and for the awesome framework DMS is!
Working on a new performance version. Then I'm trying to fill all the sockets available to download the batches in parallel. (On the right, the actual […].) The snapshot downloaded here is about 400 MB, containing approximately 400k rows.

Still a lot to do and test, but it's going in a good direction. I'm working hard to be sure that this new version will be backward compatible (an upgraded server will continue to work with a previously deployed client). To be continued...
Can you make a test with the last prerelease? #527
Sync of 500 MB of data from the server to a new device was ~1 hr 15 mins. Using […].

@lordofffarm Thanks for your input. There are about 500 companies using the product that I was considering for DMS. Each of those companies has its own server database, and each user in a company has a local database. Taking regular copies of those databases would add significant cost for us to pass on to our customers. I'm not sure I could justify the expense. However […]
Testing with a ~500 MB SQLite database, legacy data is upgraded to SQLite. The SQLite db is synced to SQL Server using DMS over https.
At the end of the initial sync a snapshot is created on the server.
We find the initial sync of server data back to another device is quite slow. DMS takes around 1 hour 15mins to sync that server data to a new SQLite database.
The app we have in production, not using DMS, takes 15-20mins to sync to a new device with the same test data.
We test on a very fast internet connection. Most of our customers have slower, less reliable internet.
#470 (Download snapshot is not resilient to connection interruption) introduced retry logic. That is most appreciated. But if we do go live with DMS, our customers will notice how much slower it is to add our app to a new device.
The issue does not seem to be related to indexes as the snapshot is already created. The server we use for testing now appears to have sufficient resources. We use batch size=2000 as I assume a smaller batch size is more resilient.
Is there anything else we can do, or DMS can do to reduce the time DMS takes to sync to a new device?