Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved fix for Unicode mistakes in atomic move #4209

Merged
merged 6 commits into from
Jan 11, 2022
Merged

Conversation

sampsyo
Copy link
Member

@sampsyo sampsyo commented Dec 27, 2021

The original problem was that this implementation of atomic moves, in #4060, used bytestrings incorrectly to invoke mktemp. This caused a regression that crashed beets when it tried to move files across filesystems (which, of course, actually involves a copy).

However, mktemp is deprecated, so this PR just avoids it altogether. Fortunately, the non-deprecated APIs in tempfile support all-bytes arguments.

Fixes #4168. Also closes #4192, which it supersedes.

Fixes #4168. Also closes #4192, which it supersedes.

The original problem is that this implementation used bytestrings
incorrectly to invoke `mktemp`. However, `mktemp` is deprecated, so this
PR just avoids it altogether. Fortunately, the non-deprecated APIs in
`tempfile` support all-bytes arguments.
@sampsyo
Copy link
Member Author

sampsyo commented Dec 27, 2021

Paging @catap, @LeoSebal, and @dertuxmalwieder to please try out this improved fix, which supersedes #4192.

@dertuxmalwieder
Copy link

I am currently using the patch from #4192 to move files from my local hard disk to a NAS and it has not crashed for me. But I will try your patch and see if I notice any difference. :-)

@catap
Copy link
Contributor

catap commented Dec 27, 2021

@sampsyo the strange things that I’m used original code and it works fine :)

anyway, I run it on macOS and probably it is OS specific things

@dertuxmalwieder
Copy link

This patch reintroduces the old bug:

  File "C:\Python311\Lib\site-packages\beets\util\__init__.py", line 500, in move
    prefix=b'.' + os.path.basename(dest),
           ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
TypeError: can't concat str to bytes

@sampsyo
Copy link
Member Author

sampsyo commented Dec 27, 2021

Thanks for trying it out, everybody! @dertuxmalwieder, do you have the complete traceback I could look at?

@dertuxmalwieder
Copy link

I already undid the patch, I am currently running beets over my whole collection, that will take a while...
I'll come back to it these days. :-)

(If nobody else does...)

This is mostly "defensive programming": clients *should* only call this
on bytestring paths, but just in case this gets called on a Unicode
string path, we should now not crash.
@sampsyo
Copy link
Member Author

sampsyo commented Dec 27, 2021

Awesome; thanks! It's partially a stab in the dark, but I have pushed an additional tweak that should almost certainly avoid that kind of problem in the odd cases where it might arise.

@dertuxmalwieder
Copy link

It seems to be hard to reproduce the crashes as there does not seem to be a pattern.

@wisp3rwind
Copy link
Member

Awesome; thanks! It's partially a stab in the dark, but I have pushed an additional tweak that should almost certainly avoid that kind of problem in the odd cases where it might arise.

I've only had a very brief look at this (and I haven't read the issues and PRs leading up to this one); but I'm not sure this fix is fully consistent yet: You're now using basename(bytestring_path(syspath(dest))) (syspath might cause the problem if on Windows, since it yields strings), whereas it might be more natural to do syspath(basename(dest)), i.e. convert using syspath only at the very edge to foreign API. Moreover, all arguments (dir, suffix, prefix) to NamedTemporaryFile should have the same type, which I think is not guaranteed right now. They probably all need to be passed through syspath (cf. mkstemp, which NamedTemporaryFile apparently delegates to).

@sampsyo
Copy link
Member Author

sampsyo commented Jan 3, 2022

Ah, you're very right about that, @wisp3rwind. Thanks for the sanity check. Let me try simplifying things with syspath in mind.

@sampsyo
Copy link
Member Author

sampsyo commented Jan 3, 2022

OK, let's give the above commit a try. The result may look a little ugly, but I think it's the right thing. The strategy is:

  • I eliminated the somewhat messy up-front syspath conversion. Instead, we use syspath directly before handing off strings to stdlib APIs, as usual.
  • To call NamedTemporaryFile, we first make sure we have bytes everywhere for the three arguments using bytestring_path.
  • Then, for Windows, we call syspath on all three arguments. This way, the arguments will all be str objects on Windows and all be bytes objects on Unix, as god intended. We need to use prefix=False for the two fragments (incomplete paths) that are passed as arguments. But using prefix=True (the default) for the dir argument means that we can still handle long paths on Windows.

@wisp3rwind
Copy link
Member

This seems to be a stubborn problem (IIRC, I'm at least partially responsible for reviewing the original PR...), thanks for dealing with it!

The result may look a little ugly, but I think it's the right thing.

I see what you mean, but I also can't think of a way to improve things.

  • To call NamedTemporaryFile, we first make sure we have bytes everywhere for the three arguments using bytestring_path.

I would claim that this is superfluous, if anything is calling move() with non-bytestring paths, that should be treated as a bug, right? In any case, the additional bytestring_paths shouldn't hurt.

  • Then, for Windows, we call syspath on all three arguments. This way, the arguments will all be str objects on Windows and all be bytes objects on Unix, as god intended. We need to use prefix=False for the two fragments (incomplete paths) that are passed as arguments. But using prefix=True (the default) for the dir argument means that we can still handle long paths on Windows.

Sounds correct to me 👍

@wisp3rwind
Copy link
Member

While we're at it, the two os.path.isdir() checks at the function's entrance should also be garnished with syspath(), shouldn't they?

@sampsyo
Copy link
Member Author

sampsyo commented Jan 6, 2022

I would claim that this is superfluous, if anything is calling move() with non-bytestring paths, that should be treated as a bug, right? In any case, the additional bytestring_paths shouldn't hurt.

Yeah, you're absolutely right that this should be considered a bug. I was just trying to "defensively program" here: to tolerate those bugs instead of crashing, just because I'm not terribly confident that every client does the right thing (given the other tracebacks I've seen). It would be good, in some future iteration, to back away from this and actually enforce this requirement (with assert isinstance(path, bytes) or similar) to see what breaks…

While we're at it, the two os.path.isdir() checks at the function's entrance should also be garnished with syspath(), shouldn't they?

Yep, good call. I'll add these.

@sampsyo
Copy link
Member Author

sampsyo commented Jan 6, 2022

Alright, with that little bit of evolution out of the way, I would be ever so grateful if the interested parties could give this a shot! I'm reasonably confident in it, however, so I'd like to merge this soonish if there are no obvious red flags.

@nopeinomicon
Copy link

Alright, with that little bit of evolution out of the way, I would be ever so grateful if the interested parties could give this a shot! I'm reasonably confident in it, however, so I'd like to merge this soonish if there are no obvious red flags.

It seems to be working on my end! I was having this issue before and your fix works. Thank you much!

@sampsyo
Copy link
Member Author

sampsyo commented Jan 11, 2022

OK, awesome, thanks, @nopeinomicon! I'll call this working for now—if anyone discovers any trouble down the line, we can amend.

@sampsyo sampsyo merged commit a0587e0 into master Jan 11, 2022
@sampsyo sampsyo deleted the atomic-move-fix branch January 11, 2022 16:18
@Gorasa
Copy link

Gorasa commented Mar 5, 2023

Will there be a bugfix version with this fix any time soon?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode error in new atomic move
6 participants