Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect maximum path length calculation for personal OneDrive #134

Closed
3 tasks done
borouhin opened this issue Aug 20, 2018 · 20 comments
Closed
3 tasks done

Incorrect maximum path length calculation for personal OneDrive #134

borouhin opened this issue Aug 20, 2018 · 20 comments
Labels
Bug Something isn't working Fixed

Comments

@borouhin
Copy link

borouhin commented Aug 20, 2018

Bug Report Details

Describe the bug

The maximum length of full path for a file/folder is determined after encoding URI for both OneDrive Business (for which this is correct) and Personal (for which this is not necessary). As the result, if there are files/folders with Unicode names long enough to exceed 430 chars when encoded as URI, but less than 430 chars themselves, incorrect entries about skipping these items regularly appear in the log file.
Interestingly, these files are, however, indeed synced correctly.

Application and Operating System Details:

To Reproduce

Steps to reproduce the behavior if not causing an application crash:

  1. Create file with long Unicode path/filename in OneDrive folder. For example, use Cyrillic names 150 characters long in total.
  2. Wait for sync.
  3. Look in onedrive.log file.
  4. Find the entry "Skipping item [path to our file here] due to the full path exceeding 430 characters (Microsoft OneDrive limitation)".
  5. Make sure the mentioned items are indeed synced without errors, contrary to what log file says,

Complete Verbose Log Output
Already described above, no specific behavior if run not in monitor mode.

Screenshots
Not applicable.

Additional context

  1. The problem is in the very beginning of uploadNewItems() function in src/sync.d. encodeComponent() should be called only for business and not personal accounts,

  2. I haven't found any definite source of information about OneDrive Personal path limits (the hyperlink mentioned in comments in uploadNewItems() function applies to OneDrive for Business only). But this can be easily checked. Create a 400 characters long Unicode path on Windows with the official client - OneDrive Personal syncs it without problems. Only increasing path length more than 430 chars, regardless of whether they are Latin or Unicode, leads to error.

  3. I haven't dug deeper in source code to clarify why files are nevertheless synced despite errors in log. My guess is because uploadNewFile() is called not only from uploadNewItems(), and the erroneous check is only present in the latter function, not in the former.
    Maybe there are cases when the files are not synced and thus the issue is more serious.

Bug Report Checklist

  • Detailed description
  • Reproduction steps (if applicable)
  • Verbose Log Output
@borouhin
Copy link
Author

Just created pull request #135 to fix this.

@abraunegg
Copy link
Owner

Thanks for the pull request.

I haven't found any definite source of information about OneDrive Personal path limits (the hyperlink mentioned in comments in uploadNewItems() function applies to OneDrive for Business only). But this can be easily checked. Create a 400 characters long Unicode path on Windows with the official client - OneDrive Personal syncs it without problems. Only increasing path length more than 430 chars, regardless of whether they are Latin or Unicode, leads to error.

This is the best reference I have for OneDrive Personal accounts being limited to 430 characters:

OneDrive/onedrive-api-docs#443 (comment)

This happens to be OneDrive Business with its 256 character limit - see this KB article. If you only care about supporting OneDrive consumer then you can target a maximum length of around 430 - see this user voice response.

The Business character limit was uplifted to 400 recently, however I do not have any new update around the Personal character limit.

The issue you have here however is where items that breach the character limit (or at least being logged as such) are still being uploaded.

@borouhin
Copy link
Author

borouhin commented Aug 21, 2018

As far as I can see, there are 2 issues indeed:

  1. Some Unicode paths are incorrectly logged as too long while they are not. This is the issue I described and proposed the pull request to fix.

  2. Perhaps, some really long paths are not treated the right way, because the check for maximum path length is made in the wrong place in code. I guess this may be the issue because my files logged as skipped were nevertheless synced. This requires further investigation.

@abraunegg
Copy link
Owner

So what needs to happen here is some long path name testing to determine what is going on under Business and Personal accounts.

The proposed fix your suggestion however 'breaks' the 430 character length when the file path is encoded for Personal accounts.

What should happen in both Business and Personal accounts:

  • If the encoded path is greater than allowed characters, then do not upload - flag as skipped due to file path length.

@borouhin
Copy link
Author

borouhin commented Aug 21, 2018

The path is not encoded for Personal accounts at all!
I don't know the internals of OneDrive, but I guess URI-encoding is only neccesary for Business accounts as they are SharePoint-based. OneDrive Personal (web-version, official client for Windows) allows uploading/downloading files with path up to 430 characters regardless of how long the path becomes when encoded.

As a test, I created nested folders with long Latin names and then with long Cyrillic names. OneDrive Personal for Windows in both cases started complaining only after 430 characters in path, regardless of whether they were Latin (not becoming longer after encoding) or Cyrillic (becoming significantly longer).

@abraunegg
Copy link
Owner

When I say 'encoded' - it is 'urlencoded' to switch ' ' to '%20'

All path's that are used are urlencoded before use against the MS Graph service, regardless of account type

@borouhin
Copy link
Author

borouhin commented Aug 21, 2018

They may be encoded for the purpose of interacting with API, but they don't seem to be encoded for the purpose of determining maximum path length.

I've just created a monstrous path in my OneDrive folder:
Testing long path in OneDrive Personal - 1\Testing long path in OneDrive Personal - 2\Testing long path in OneDrive Personal - 3\Testing long path in OneDrive Personal - 4\Testing long path in OneDrive Personal - 5\Testing long path in OneDrive Personal - 6\Testing long path in OneDrive Personal - 7\Testing long path in OneDrive Personal - 8\Testing long path in OneDrive Personal - 9\Testing long path in OneDrive Personal - 0

It is 429 characters long unencoded and it is beyond any doubt longer encoded (because of multiple spaces). OneDrive Personal for Windows syncs this path flawlessly! But if I add even one more character - it raises an error.
If I change all spaces in that path to a letter - nothing changes, it still syncs fine and it still raises an error if I try to create one more folder inside it. So the limit doesn't depend on whether there are encoded characters in the path or not.

@abraunegg
Copy link
Owner

As I said earlier - will have to diagnose what is going on with this as it 'should' be working OK

@borouhin
Copy link
Author

borouhin commented Aug 21, 2018

Sure, it needs proper investigation.

One more final note. Even with my proposed fix the limits in official client are 1 character longer. I guess it's because path in uploadNewItems() begins with dot, which actually isn't taken into consideration as well.

@borouhin
Copy link
Author

In course of transferring my personal archive to OneDrive (which has almost a million files, many of them with very long paths) I realized that my proposed patch is also not correct, as I was not aware of how exactly Unicode strings are treated in D language.
Instead of check_string.length (which returns UTF-8 bytes count), check_string.byGrapheme.walkLength (which returns the actual number of Unicode characters) should be used. With this correction, I get the same behavior with official client - errors for paths 430+ Unicode characters long, no errors otherwise.
There is one more weird problem with filename (not full path) lengths, but I need to investigate it further to submit issue properly.

@abraunegg
Copy link
Owner

The other aspect here is that the path length check needs to be against the output of this encodeComponent(path) regardless of whether this is a Business or Personal Account.

When uploading files, if a Business Account, simpleUpload is not used, as all uploads are session based and uses createUploadSession.

If a Personal Account, if the files are <4Mb, simpleUpload is used. If greater than 4Mb a session is created for those files to be uploaded.

In both cases (simpleUpload, createUploadSession) encodeComponent(filename) is used to create a valid HTML string for submitting via the POST request. The check has to be kept in place to ensure that for personal accounts the path length is validated.

@borouhin
Copy link
Author

The other aspect here is that the path length check needs to be against the output of this encodeComponent(path) regardless of whether this is a Business or Personal Account.

This assumption doesn't reflect the actual behavior of OneDrive Personal.
If I create a path of 200 Unicode non-latin characters - encodeComponent returns a string 600 characters long, which is more than maximum of 430. However, such path is indeed successfully accepted by OneDrive.

@abraunegg
Copy link
Owner

Right .. I agree .. but the advice provided by the OneDrive API team was that Personal Accounts was limited to 430. Unless that limit has changed for personal accounts, with zero documentation reference anywhere (possible) it needs to be checked manually via testing to see what the actual (new) limit is.

@borouhin
Copy link
Author

I did some testing, maybe not comprehensive, but my tests show that the limit is indeed 430 characters. But regardless of encoding.

@abraunegg
Copy link
Owner

So - there has to be some sort of "internal to OneDrive via MS Graph" handling of "string interpretation" of non-latin characters when encoded, so that a string that has 600+ such encoded characters, is interpreted as a path of 200 (which is below 430)

So essentially what the "check" needs to be is like what you have, but without the encode part for business accounts.

@borouhin
Copy link
Author

Exactly.
Maybe some more testing is necessary with complex Unicode characters (I've just tested with Cyrillic, no diacritics, combining characters etc.) I'll try to do it later.

@borouhin
Copy link
Author

borouhin commented Aug 26, 2018

One more issue with filename/path length, totally unrelated to the one discussed above, but I just don't want to populate multiple issues (I will create a separate issue if it's necessary).

If filename (not full path) about to be downloaded from OneDrive exceeds filesystem limitations (most of Linux FS have 255 bytes (this time it's bytes, not Unicode chars!) - can be seen by issuing " getconf NAME_MAX [path]" command) - an exception is raised and appears in log:
Cannot open file [very long name here] in mode 'wb' (File name too long)
The program doesn't crash, but syncing doesn't proceed until the offending file is removed/renamed.

I think filesystem limits should be checked before attempting to create file and if they are not met - the program should fail gracefully, allowing the sync to proceed.

The issue is important as both OneDrive Personal and Windows allow much longer filenames.

@abraunegg
Copy link
Owner

If filename (not full path) about to be downloaded from OneDrive exceeds filesystem limitations (most of Linux FS have 255 bytes

Yes this should be checked as part of the download sequence & tracked that way. I will open another case for this so it can be tracked.

@abraunegg abraunegg added In Progress Currently being worked on and removed Investigating labels Aug 26, 2018
abraunegg pushed a commit that referenced this issue Aug 27, 2018
* Update the handling of maximum path length calculation
@abraunegg
Copy link
Owner

PR merged

@lock
Copy link

lock bot commented Jan 5, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Something isn't working Fixed
Projects
None yet
Development

No branches or pull requests

2 participants