Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform DB scan earlier and reuse result #2525

Closed
wants to merge 3 commits into from

Conversation

JC-comp
Copy link
Contributor

@JC-comp JC-comp commented Oct 23, 2023

Changes

  • Skip already visisted items when checking database item consistency: fixed O(n^2) tree traversal
  • Skip filtering when scanning new local files if it already exists in database: fixed duplicate filtering as it must be filtered before being saved into database

Copy link
Owner

@abraunegg abraunegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JC-comp
Specifically for this commit (621637e), I understand what you are trying to achieve here - that is, check the database earlier as to if the new local item is in the database, and only if it is not in the database, then proceed with the validation checks.

The whole point of a 'new local item' is that it should never be in the database to begin with - if it is, then, there is some major referential integrity issue going on.

The whole point then is passing through the applicable gates to determine the new item's validity, and if those items pass, then do the DB scan.

By doing the DB scan earlier than the applicability checks and then excluded by those checks, it is needless additional processing overhead.

Please remove this commit as it does not make sense.

Copy link
Owner

@abraunegg abraunegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JC-comp
Specifically for this commit (1b3125a), do you have an actual example of how and why this code is needed?

I have added this commit to my current working branch of 'alpha-4' and cannot see that this code makes any impact:

image

Runtime Output:

Reading configuration file: /home/alex/.config/onedrive/config
Configuration file successfully loaded
Using 'user' configuration path for application state data: /home/alex/.config/onedrive

ERROR: Unable to create /var/log/onedrive
ERROR: Please manually create '/var/log/onedrive' and set appropriate permissions to allow write access for your user to this location.
ERROR: The requested client activity log will instead be located in your users home directory

Using the following path to store the runtime application log: /home/alex
Checking Application Version ...
Attempting to initialise the OneDrive API ...
Configuring Global Azure AD Endpoints
The OneDrive API was initialised successfully
Opening the item database ...
Application Version:  v2.5.0-alpha-4 GitHub version: v2.4.25-63-gbce3e7f
Account Type:         personal
Default Drive ID:     66d53be8a5056eca
Default Root ID:      66D53BE8A5056ECA!101
Remaining Free Space: 4.60 GB (4939212390 bytes)
Sync Engine Initialised with new Onedrive API instance
All application operations will be performed in the configured local 'sync_dir' directory: /home/alex/OneDrive
Fetching /delta response from the OneDrive API for Drive ID: 66d53be8a5056eca
Processing API Response Bundle: 1 - Quantity of 'changes|items' in this bundle to process: 0
Finished processing /delta JSON response from the OneDrive API
No additional changes or items that can be applied were discovered while processing the data received from Microsoft OneDrive
Syncing this OneDrive Personal Shared Folder: OneDrive - Constanze
Fetching /delta response from the OneDrive API for Drive ID: bc7d88ec1f539dcf
Processing API Response Bundle: 1 - Quantity of 'changes|items' in this bundle to process: 0
Finished processing /delta JSON response from the OneDrive API
No additional changes or items that can be applied were discovered while processing the data received from Microsoft OneDrive
Performing a database consistency and integrity check on locally stored data ... 
Processing DB entries for this Drive ID: 66d53be8a5056eca
Processing ~/OneDrive
The directory has not changed
Processing Bad Folder
The directory has not changed
Processing OneDrive - Constanze
Processing newdir
The directory has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf
The directory has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file10.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file0.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file5.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file2.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file4.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file8.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file6.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file7.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file9.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file3.data
The file has not changed
Processing newdir/jkqwerhqkjfkasdfkjdsafaskjdf/file1.data
The file has not changed
Processing newdir/random_large_files
The directory has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq
The directory has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file6.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file4.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file7.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file9.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file5.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file0.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file2.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file8.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file3.data
The file has not changed
Processing newdir/random_large_files/5ARDdyLGtkOXKeRHFPWVZhxd0E67P4Uq/file1.data
The file has not changed
Processing random_large_files
The directory has not changed
Processing DB entries for this Drive ID: bc7d88ec1f539dcf
Processing OneDrive - Constanze
The directory has not changed
Processing OneDrive - Constanze/debug_output.log
The file has not changed
Processing OneDrive - Constanze/keepass-minipc-windows.kdbx
The file has not changed
Processing OneDrive - Constanze/newfile.txt
The file has not changed
Scanning the local file system '~/OneDrive' for new data to upload ...
Skipping item - excluded by sync_list config: ./random_25k_files
Skipping item - invalid name (Contains ASCII Control Codes): ./Bad Folder/ƒ†�¯~‰
Skipping item - invalid name (Contains an invalid whitespace item): ./Bad Folder/State-of-the-art, challenges, and open issues in the integration of Internet of
Things and Cloud Computing.pdf
Skipping item - invalid name (Microsoft Naming Convention): ./forms
Skipping item - .file or .folder: ./.folder
Performing a last examination of the most recent online data within Microsoft OneDrive to complete the reconciliation process
Fetching /delta response from the OneDrive API for Drive ID: 66d53be8a5056eca
Processing API Response Bundle: 1 - Quantity of 'changes|items' in this bundle to process: 0
Finished processing /delta JSON response from the OneDrive API
No additional changes or items that can be applied were discovered while processing the data received from Microsoft OneDrive
Syncing this OneDrive Personal Shared Folder: OneDrive - Constanze
Fetching /delta response from the OneDrive API for Drive ID: bc7d88ec1f539dcf
Processing API Response Bundle: 1 - Quantity of 'changes|items' in this bundle to process: 0
Finished processing /delta JSON response from the OneDrive API
No additional changes or items that can be applied were discovered while processing the data received from Microsoft OneDrive

Sync with Microsoft OneDrive is complete

Please can you provide evidence that this code is required and has benefit.

@JC-comp
Copy link
Contributor Author

JC-comp commented Jan 9, 2024

By doing the DB scan earlier than the applicability checks and then excluded by those checks, it is needless additional processing overhead.

During the syncing process, we will scan the configured 'sync_dir' for new data to upload to OneDrive, that is, all possible paths (including those already in the DB) will be recursively traversed to search for new items.

For paths already present in the database, there is no need to reevaluate their applicability. Therefore, by moving the database scan earlier in commit (621637e), we can theoretically reduce the searching time. This optimization is based on the understanding that, in most cases, files meeting the applicability criteria would eventually undergo a database scan and the majority of the files should already be stored in the database.

Specifically for this commit (1b3125a), do you have an actual example of how and why this code is needed?

I can't recall why I added this, but it seems that it is indeed a redundant operation. I will remove it.

abraunegg added a commit that referenced this pull request Jan 13, 2024
* Remove potentially redundant applicability check of a path if this is already in the database. This is a manual merge of #2525
@abraunegg
Copy link
Owner

Closing this PR as this has been manually merged into 'alpha-5' via 635102e

@abraunegg abraunegg closed this Jan 13, 2024
@abraunegg abraunegg added this to the v2.5.0 milestone Jan 13, 2024
@abraunegg abraunegg changed the title Add caching for client side walk Perform DB scan earlier and reuse result Jan 13, 2024
@JC-comp JC-comp deleted the caching branch January 20, 2024 14:06
@abraunegg
Copy link
Owner

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Repository owner locked and limited conversation to collaborators Jan 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants