Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate filename length to conform with Linux FS limits before download #142

Closed
abraunegg opened this issue Aug 26, 2018 · 14 comments
Closed
Assignees
Labels
Bug Something isn't working Fixed
Milestone

Comments

@abraunegg
Copy link
Owner

abraunegg commented Aug 26, 2018

If the file that resides on OneDrive is longer than the filename length allowed by the Linux FS (255 bytes), then the sync fails to continue until the file on OneDrive is renamed or removed.

A check needs to be added when downloading a file that it conforms to the Linux FS limits & prints an applicable message if the filename length is longer than the FS limit and is skipped.

Further details here:

#134 (comment)

One more issue with filename/path length, totally unrelated to the one discussed above, but I just don't want to populate multiple issues (I will create a separate issue if it's necessary).

If filename (not full path) about to be downloaded from OneDrive exceeds filesystem limitations (most of Linux FS have 255 bytes (this time it's bytes, not Unicode chars!) - can be seen by issuing " getconf NAME_MAX [path]" command) - an exception is raised and appears in log:
Cannot open file [very long name here] in mode 'wb' (File name too long)
The program doesn't crash, but syncing doesn't proceed until the offending file is removed/renamed.

I think filesystem limits should be checked before attempting to create file and if they are not met - the program should fail gracefully, allowing the sync to proceed.

The issue is important as both OneDrive Personal and Windows allow much longer filenames.

@abraunegg abraunegg added Bug Something isn't working Investigating labels Aug 26, 2018
@abraunegg
Copy link
Owner Author

@borouhin
Can you please help validate this PR:

git clone https://github.com/abraunegg/onedrive.git
cd onedrive
git fetch origin pull/149/head:pr149
git checkout pr149
make
make install

@abraunegg abraunegg added In Progress Currently being worked on Fixed Waiting on Response and removed Investigating In Progress Currently being worked on labels Aug 27, 2018
@borouhin
Copy link

borouhin commented Aug 29, 2018

@abraunegg I'll test this version ASAP. Currently I'm forced to resync my OneDrive because of the other issues (I've submitted #151 and I'm still investigating another one) Every resync takes almost 1 day (700.000 files, 600 Gb...)

@borouhin
Copy link

borouhin commented Aug 29, 2018

@abraunegg This version just crashes when I create a very long filename:

Syncing changes from OneDrive ...
Applying changes of Path ID: AB629B1F05D0A26E!137
Downloading file temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи ... Cannot open file `temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи' in mode `wb' (File name too long)
Syncing changes from OneDrive ...
Applying changes of Path ID: AB629B1F05D0A26E!137
Downloading file temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи ... Cannot open file `temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи' in mode `wb' (File name too long)
Syncing changes from OneDrive ...
Applying changes of Path ID: AB629B1F05D0A26E!137
Downloading file temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи ... std.exception.ErrnoException@std/stdio.d(430): Cannot open file `temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи' in mode `wb' (File name too long)
----------------
/home/alb/dlang/dmd-2.081.2/linux/bin64/../../src/phobos/std/exception.d:515 @safe void std.exception.bailOut!(std.exception.ErrnoException).bailOut(immutable(char)[], ulong, scope const(char)[]) [0xbaa9e307]
??:? @safe shared(core.stdc.stdio._IO_FILE)* std.exception.enforce!(std.exception.ErrnoException).enforce!(shared(core.stdc.stdio._IO_FILE)*).enforce(shared(core.stdc.stdio._IO_FILE)*, lazy const(char)[], immutable(char)[], ulong) [0xbab8b452]
??:? ref @safe std.stdio.File std.stdio.File.__ctor(immutable(char)[], scope const(char)[]) [0xbab4545c]
src/onedrive.d:353 void onedrive.OneDriveApi.download(const(char)[], immutable(char)[], long) [0xbab05fb7]
src/onedrive.d:131 void onedrive.OneDriveApi.downloadById(const(char)[], const(char)[], immutable(char)[], long) [0xbab04b0a]
src/sync.d:662 void sync.SyncEngine.downloadFileItem(itemdb.Item, immutable(char)[]) [0xbab11f6b]
src/sync.d:600 void sync.SyncEngine.applyNewItem(itemdb.Item, immutable(char)[]) [0xbab11a9f]
src/sync.d:574 void sync.SyncEngine.applyDifference(std.json.JSONValue, immutable(char)[], bool) [0xbab1194b]
src/sync.d:437 void sync.SyncEngine.applyDifferences(immutable(char)[], const(char)[]) [0xbab10e67]
src/sync.d:201 void sync.SyncEngine.applyDifferences() [0xbab0ff55]
src/main.d:421 void main.performSync(sync.SyncEngine, immutable(char)[], bool, bool, bool) [0xbaaf57ea]
src/main.d:289 _Dmain [0xbaaf5073]

@abraunegg
Copy link
Owner Author

Thanks for the feedback

@abraunegg abraunegg changed the title Validate filename length before download to conform with Linux FS limits Validate filename length to conform with Linux FS limits before download Sep 13, 2018
@abraunegg abraunegg self-assigned this Sep 15, 2018
abraunegg added a commit that referenced this issue Sep 16, 2018
* Rework solution for issue #142 to be similar to skipping downloads when malware is detected
@abraunegg
Copy link
Owner Author

abraunegg commented Sep 16, 2018

@borouhin
Can you validate this updated PR for this issue please:

git clone https://github.com/abraunegg/onedrive.git
cd onedrive
git fetch origin pull/180/head:pr180
git checkout pr180
make
make install

Local validation:

Loading config ...
Using Config Dir: <redacted>
No config file found, using defaults
Initializing the OneDrive API ...
Opening the item database ...
All operations will be performed in: <redacted>
Initializing the Synchronization Engine ...
Account Type: personal
Default Drive ID: <redacted>
Default Root ID: <redacted>
Remaining Free Space: 112742891498
Fetching details for OneDrive Root
OneDrive Root exists in the database
Syncing changes from OneDrive ...
Applying changes of Path ID: <redacted>
Downloading file temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи.txt ... ERROR: Cannot open file `temp/абвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзиабвгдеёжзи.txt' in mode `wb' (File name too long)
Uploading differences of .
Processing root
The directory has not changed
Processing temp
The directory has not changed
Uploading new items of .
Applying changes of Path ID: <redacted>

@abraunegg
Copy link
Owner Author

@borouhin
Any update on testing this PR?

@borouhin
Copy link

@abraunegg, I'm sorry, but you shouldn't count on me for testing anytime soon.

I just want to keep my files in sync. It's 10:50 PM now here. My last sync session started at 06:30 AM and it's still (!) rescanning every single folder and file in my OneDrive directory. Yes, it's a lot of files. But OneDrive Windows client keeps them in sync without noticeable delays (after initial sync), Dropbox for Linux client (I moved to OneDrive from Dropbox after their announcement to drop other FS support except ext4) did the same job with the same files quickly enough not to notice it.
It looks like a fundamental design flaw in this program, which cannot be fixed by individual patches.

Every test leads to restart of sync job, and my home server is constantly loaded with onedrive process, which impacts performance of other tasks. Sure, I could run onedrive on another machine dedicated to testing with a single small folder, but not right now, not enough time, sorry.

@abraunegg
Copy link
Owner Author

@borouhin
Totally understand.

Can you give me some filesystem stats of your OneDrive folder - number of files / folders? I will see if I can create a similar nested tree with files to test against, rather than the ~300Mb test set if files that I use.

@borouhin
Copy link

borouhin commented Oct 1, 2018

@abraunegg, currently (not all of my file archive is yet transferred to OneDrive):

$ du -sh ~/OneDrive/
560G    /home/alb/OneDrive/
$ find ~/OneDrive/ -type d -print | wc -l
47773
$ find ~/OneDrive/ -type f -print | wc -l
505230

P.S. Sorry for long delay in reply, I was on vacation abroad.

@abraunegg
Copy link
Owner Author

@borouhin
Thanks - working on #157 at the moment as it requires a full re-write of many aspects. In the test code I have written I am getting some good scanning performance differences for file / directory change evaluation.

@abraunegg abraunegg added this to the 2.1.3 milestone Oct 3, 2018
@borouhin
Copy link

borouhin commented Oct 3, 2018

A bit more statistics.

After vacation, I started onedrive in monitor mode again. There were some changes on both ends (in cloud and on the machine where I run the program). The number of files/directory/Gb - see above. Running with verbose flag, so I can track down errors (it slows down the process by itself, but otherwise I couldn't see what's exactly happening).

Timeline:
10:27 - started program, multiple "Monitor directory: ..." log entries for each dir
10:32 - finished enumerating monitored dirs, started downloading new files
12:30 - finished downloads (it wasn't extremely slow taking into account my connection speed), started deleting files removed from cloud
12:36 - finished deleting (not so much files, could be definitely faster), logged "Uploading differences of ." and started enumerating every single local dir and file saying that it "had not changed"
16:53 (yes, in 4,5 hours with 100% CPU usage!) - finished checking for changes
16:58 - uploaded 1 single file that indeed changed locally
17:03 - downloaded 2 files changed in cloud while running previous tasks
17:03 - logged "Monitor directory: ..." for each newly downloaded dir
17:04 - again!!! "Uploading differences of ." and enumerating every single local dir and file saying that it "had not changed"... I've stopped the program.

No errors, no crashes or restarts - but the program constantly wants to rescan the whole local folder again and again instead of actually monitoring directories and reacting to single changes.

@abraunegg
Copy link
Owner Author

@borouhin
This discussion is probably better under #157 btw

No errors, no crashes or restarts - but the program constantly wants to rescan the whole local folder again and again instead of actually monitoring directories and reacting to single changes.

So this is how the client operates. In 'monitor' mode, it uses inotify to watch for file changes locally, however at every 'interval' performs a full scan - locally & remote. This is currently consistent with how the original client operates as well.

What we could do here is 'change' how monitor mode actually operates - monitor via inotify local changes, but also then only check OneDrive for remote changes to download at the specified interval. This would negate having to enumerate through all of the local files ever X period, of which - if there 'was' a file changed locally, it would have already been picked up through inotify.

Thoughts?

abraunegg added a commit that referenced this issue Oct 3, 2018
…its #142 (#180)

* Validate filename length before download to conform with Linux FS limits
@abraunegg
Copy link
Owner Author

PR #180 merged

@lock
Copy link

lock bot commented Jan 5, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Something isn't working Fixed
Projects
None yet
Development

No branches or pull requests

2 participants