Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve high CPU usage when performing DB reads #419

Merged
merged 10 commits into from
Mar 24, 2019

Conversation

abraunegg
Copy link
Owner

@abraunegg abraunegg commented Mar 19, 2019

  • Disable automatic indexing (enabled by default in sqlite as of version 3.7.17) as we specifically create the required indexes
  • Tell SQLite to store temporary tables in memory. This will speed up many read operations that rely on temporary tables, indices, and views.

Resolves issues:
#21, #347, #394, #404, #432 (in part, as full file system scanning is still occuring which is being looked at in #433)

* Disable automatic indexing as we specifically create the required indexes
* Tell SQLite to store temporary tables in memory. This will speed up many read operations that rely on temporary tables, indices, and views.
* Add links & reasoning behind other PRAGMA settings used
* Add new index specifically for driveId & parentId paring
@abraunegg abraunegg added this to the 2.3.0 milestone Mar 21, 2019
@abraunegg
Copy link
Owner Author

@norbusan
Based on user feedback, it would appear that the latest commit does fix the issue.

Question is - how to 'fix' this properly?

The easiest way potentially would be to 'revision' the database internal counter, as, when I was coding all the CentOS / Fedora fixes, to overcome the DB issues, if the DB version was < X it would re-create the tables & database.

Worth testing this increment to resolve this situation as well? Potentially it 'should' but I do not know if the index's would still be hanging around / would get auto created correctly - would need complete testing.

@norbusan
Copy link
Collaborator

Yes, that is probably the best idea. In shotwell (photo editor) where I also contribute(d), there is a database table version saved in the DB, and the program compares and does some update routines depending on the version.

So asssume we are at version 1 now, and bump it in the program to version 2. On next run the program sees that the saved DB is actually version 1, and does add the index and updates the version saved in the database.

That works nicely.

In this case, I am not sure if calling the create index on every database open sounds like calling for problems? I don't know what sqlite is doing in case there is already an index.

* To force DB schema & index creation, bump DB schema version
@abraunegg
Copy link
Owner Author

@norbusan
Updated the DB version:

Loading config ...
Using Config Dir: /home/alex/.config/onedrive
No config file found, using application defaults
Initializing the OneDrive API ...
Opening the item database ...
The item database is incompatible, re-creating database table structures
All operations will be performed in: /home/alex/OneDrive
Initializing the Synchronization Engine ...
Account Type: personal
Default Drive ID: 66d53be8a5056eca
Default Root ID: 66D53BE8A5056ECA!101
Remaining Free Space: 4821211072
Fetching details for OneDrive Root
OneDrive Root does not exist in the database. We need to add it.
Added OneDrive Root to the local database
Initializing monitor ...
OneDrive monitor interval (seconds): 45

When this gets pushed into Master, everyone's DB will get updated to support the new index

norbusan
norbusan previously approved these changes Mar 21, 2019
@abraunegg abraunegg changed the title WIP: Resolve high CPU usage when performing DB reads Resolve high CPU usage when performing DB reads Mar 22, 2019
* Update handling of skip_dir and skip_file parsing - should only check if the file is excluded if the parent directory is not
* Add another index for selectByPath database queries
asnowfix and others added 3 commits March 24, 2019 06:40
* New build option to get more DEBUG symbolic information
* Update ldc2 debug handling
* Use boolean values rather than on / off values
* Enable auto_vacuum for entry deletes / database cleanup
@norbusan
Copy link
Collaborator

@abraunegg I am for merging this, it has shown its use already and fixed several participants problems.

@abraunegg
Copy link
Owner Author

Agreed - merging

@abraunegg abraunegg merged commit 79cc599 into master Mar 24, 2019
@abraunegg abraunegg deleted the high-cpu-usage-db-query branch March 24, 2019 02:39
@rednag
Copy link

rednag commented Mar 24, 2019

I don't see that this/my issue is fixed, because in 45s interval my CPU is 30s on 100% with onedrive v2.2.6-21-ga9795dd, or I do something wrong? :/

@abraunegg
Copy link
Owner Author

@rednag
Please can you have a read of #433 and confirm your issue of 100% CPU load is only occuring when the full file system scan is occuring?

If not - please 'delete' the items database file and test / try again.

Also, as this code is merged into master, with v2.3.0 pending - please ensure you rebuild your client from 'master'

@rednag
Copy link

rednag commented Mar 24, 2019

So every 45s is a full file system scan? In CPU history it looks like that. I do not upload or change anything and have every 45s around 30s 100% CPU usage.

I've deleted the files and test it again, but I always deleted the database items.

edit1:
Same behavior and always the same curve in CPU history.

1cy

2cy

You see, the cycle of 45s and it is pretty much the same curve.

edit2:
Restarted again with rebuild and deletion. While

Mär 24 21:29:20 CX onedrive[5399]: Processing 203 changes
Mär 24 21:29:22 CX onedrive[5399]: Processing 218 changes
Mär 24 21:29:23 CX onedrive[5399]: Processing 218 changes
Mär 24 21:29:25 CX onedrive[5399]: Processing 246 changes
Mär 24 21:29:27 CX onedrive[5399]: Processing 212 changes
Mär 24 21:29:29 CX onedrive[5399]: Processing 234 changes
Mär 24 21:29:31 CX onedrive[5399]: Processing 231 changes
Mär 24 21:29:33 CX onedrive[5399]: Processing 211 changes

There is no high CPU usage!

cy1

So while processing is less CPU usage than while monitoring?! So processing is finished an we have the 45s cycle with 100% CPU usage!
cy_high
cyclehigh1
cyclehigh2

Mär 24 21:35:38 CX onedrive[5399]: Processing JHPSAEM7E2QSZJ7ORDIPHTRXH5T5SINF
Mär 24 21:35:38 CX onedrive[5399]: The file has not changed
Mär 24 21:35:38 CX onedrive[5399]: Processing L3TKSR7EEL3XII6GNRX4EXD2I2NHRCXS
Mär 24 21:35:38 CX onedrive[5399]: The file has not changed
Mär 24 21:35:38 CX onedrive[5399]: Processing BAOS7CWJ2KVRWABOMTT7TZN3JMVDOSLG
Mär 24 21:35:38 CX onedrive[5399]: The file has not changed
Mär 24 21:35:38 CX onedrive[5399]: Processing 4SIW5KD6YVZWISRAQCJVKQ5JYSMSWMRC
Mär 24 21:35:38 CX onedrive[5399]: The file has not changed
Mär 24 21:35:38 CX onedrive[5399]: Processing NP6KKUPTEVFIWG7RU3GUMZPCYCJODDST
Mär 24 21:35:38 CX onedrive[5399]: The file has not changed

@abraunegg
Copy link
Owner Author

abraunegg commented Mar 24, 2019

@rednag

Mär 24 21:29:33 CX onedrive[5399]: Processing 211 changes

This is 'database' re-creation - taking OneDrive JSON data and processing it.

Mär 24 21:35:38 CX onedrive[5399]: Processing JHPSAEM7E2QSZJ7ORDIPHTRXH5T5SINF

This is scanning the database / validating the contents

At the end of this sequence, you will see:

Uploading new items of .

This is where the client is now performing a 'walk' of your sync_dir to ensure that all files / folders are actually uploaded.

This is where your 100% load is most likely coming from now. This is currently normal application behaviour.

So you have a couple of options:

  1. Change the 45 second sync window via your config file
  2. Wait for the changes in Reduce scanning of local filesystem needlessly every sync in monitor mode #433 to drop in v2.3.1 - however periodically the application will still scan the entire path which will cause CPU load.

If you are still having issues, please:

  1. Open a new issue ticket
  2. Provide debug data - perf data output & strace

@rednag
Copy link

rednag commented Mar 25, 2019

Thank you for your support :), how long does this sequence take

Mär 25 06:38:34 CX onedrive[5399]: Processing JHPSAEM7E2QSZJ7ORDIPHTRXH5T5SINF
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing L3TKSR7EEL3XII6GNRX4EXD2I2NHRCXS
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing BAOS7CWJ2KVRWABOMTT7TZN3JMVDOSLG
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing 4SIW5KD6YVZWISRAQCJVKQ5JYSMSWMRC
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing NP6KKUPTEVFIWG7RU3GUMZPCYCJODDST
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed

Then it is still validating?!

@abraunegg
Copy link
Owner Author

Thank you for your support :), how long does this sequence take

Mär 25 06:38:34 CX onedrive[5399]: Processing JHPSAEM7E2QSZJ7ORDIPHTRXH5T5SINF
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing L3TKSR7EEL3XII6GNRX4EXD2I2NHRCXS
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing BAOS7CWJ2KVRWABOMTT7TZN3JMVDOSLG
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing 4SIW5KD6YVZWISRAQCJVKQ5JYSMSWMRC
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed
Mär 25 06:38:34 CX onedrive[5399]: Processing NP6KKUPTEVFIWG7RU3GUMZPCYCJODDST
Mär 25 06:38:34 CX onedrive[5399]: The file has not changed

Then it is still validating?!

That is the DB validation sequence. There should be little to moderate CPU load depending on the number of files within the local database.

Prior to this PR , a DB index was missing, thus causing excessive CPU.

How long should that process take? It depends on many factors - CPU speed, memory speed, disk I/O ...

@rednag
Copy link

rednag commented Mar 25, 2019

After 9h it is still validating and there is high CPU usage as well with the 2.3.

Maybe there is a better way to recognize changes, instead to monitor all files?! Isn't it possible to observe opened files in a path and just sync or monitor if files opened in the OneDrive directory?!

@abraunegg
Copy link
Owner Author

@rednag
I think you have something else going on where the disk I/O is hindering you - again, looking at the SD card.

@abraunegg
Copy link
Owner Author

@rednag
I am currently out of options as to what else it can be. Unless you can provide a real deep dive analysis of all the processes / stack traces etc of everything, it is very hard to say what your issue is.

I would even go so far to test your system / setup without using Ubuntu - use a pure Debian or CentOS or Arch Linux.

@abraunegg
Copy link
Owner Author

@norbusan
Open to also any suggestions you may have here ...

@norbusan
Copy link
Collaborator

sorry, I run out of ideas. Something is really strange on that computer. Either file reads are slow, or other processes are hogging IO, or whatever. But without actually sitting in front of the system I don't see a good way to debug this.

@rednag
Copy link

rednag commented Mar 25, 2019

edit:
I've switched the interval from 45s to 300s and now there are all problems gone?!

@lock
Copy link

lock bot commented Apr 26, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants