Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize move.move_file for moving files between different OSFS instances. #523

Merged
merged 42 commits into from
May 2, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
db837c5
add optimization to move
tfeldmann Feb 8, 2022
d6db460
update contributors
tfeldmann Feb 8, 2022
43dc890
add test
tfeldmann Feb 8, 2022
cb49b6e
optimization not dependent of OSFS anymore
tfeldmann Mar 24, 2022
1f7b0ff
Merge branch 'master' into move_file-optimization
tfeldmann Mar 24, 2022
448f15f
remove unneeded import
tfeldmann Mar 24, 2022
68846ce
lock src and dst for move operation
tfeldmann Mar 24, 2022
003e1b7
handle ValueError
tfeldmann Mar 24, 2022
fa1803b
fix unbound var
tfeldmann Mar 24, 2022
86adaa8
set "read_only": True in `WrapReadOnly.getmeta()`
tfeldmann Mar 25, 2022
014123b
handle ro FSs in move_file optimization
tfeldmann Mar 25, 2022
14d4fa2
add test for two different MemFS instances
tfeldmann Mar 25, 2022
a97c0c3
support FS URLs
tfeldmann Mar 25, 2022
bbe4389
clear use of fs url instead of str
tfeldmann Mar 25, 2022
69022aa
add test for read-only sources
tfeldmann Mar 25, 2022
545e016
more tests for read only sources
tfeldmann Mar 25, 2022
fd577df
clarify fallthrough and add writeable=True
tfeldmann Mar 25, 2022
4edb1e0
cleanup destination if move fails
tfeldmann Mar 25, 2022
20203d9
update changelog
tfeldmann Mar 25, 2022
ad7a970
raise exc in mange_fs if fs is not writeable
tfeldmann Mar 25, 2022
1267194
don't run tests twice in parameterized_class
tfeldmann Mar 25, 2022
b0963bf
remove unneeded import
tfeldmann Mar 25, 2022
4a102ce
update changelog
tfeldmann Mar 25, 2022
763efc2
rename test case
tfeldmann Mar 25, 2022
e490452
formatting
tfeldmann Mar 25, 2022
0c02f53
add python2.7 commonpath backport
tfeldmann Mar 25, 2022
9be0381
fix path in ResourceReadOnly exception
tfeldmann Mar 25, 2022
baf7019
test_move cleanup
tfeldmann Mar 25, 2022
69d1906
check time on if self.preserve_time is set
tfeldmann Mar 25, 2022
7993af0
update changelog wording
tfeldmann Mar 25, 2022
9cfc5dd
revert import order
tfeldmann Mar 25, 2022
54d3374
ignore flake8 C401 in _pathcompat
tfeldmann Mar 25, 2022
c051f23
fix codestyle and typecheck errors
tfeldmann Mar 25, 2022
a549250
removed duplicated test code
tfeldmann Mar 25, 2022
44cbe31
add `cleanup_dest_on_error` in `move_file`
tfeldmann Mar 25, 2022
696c4ca
use non-overlapping osfs urls
tfeldmann Mar 25, 2022
891d1fc
use OSFS instead of fs_open, rename cleanup_dst_on_error param
tfeldmann Mar 28, 2022
4d80b22
fix `supports_rename` in `WrapReadOnly`
tfeldmann Mar 28, 2022
78f6696
cleanup test_move filenames
tfeldmann Mar 28, 2022
46fbc71
Fix circular import between `fs.base`, `fs.osfs` and `fs.move`
althonos Mar 30, 2022
5675f84
cleanup and docs
tfeldmann Apr 1, 2022
da33922
test `src` first, then `dst`
tfeldmann Apr 1, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
([#527](https://github.com/PyFilesystem/pyfilesystem2/pull/527)).
- Mark `fs.zipfs.ReadZipFS` as a case-sensitive filesystem
([#527](https://github.com/PyFilesystem/pyfilesystem2/pull/527)).
- Optimized moving files between filesystems with syspaths.
([#523](https://github.com/PyFilesystem/pyfilesystem2/pull/523)).
- Fixed `fs.move.move_file` to clean up the copy on the destination in case of errors.
- `fs.opener.manage_fs` with `writeable=True` will now raise a `ResourceReadOnly`
exception if the managed filesystem is not writeable.
- Marked filesystems wrapped with `fs.wrap.WrapReadOnly` as read-only.


## [2.4.15] - 2022-02-07
Expand Down
1 change: 1 addition & 0 deletions CONTRIBUTORS.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ Many thanks to the following developers for contributing to this project:
- [Silvan Spross](https://github.com/sspross)
- [@sqwishy](https://github.com/sqwishy)
- [Sven Schliesing](https://github.com/muffl0n)
- [Thomas Feldmann](https://github.com/tfeldmann)
- [Tim Gates](https://github.com/timgates42/)
- [@tkossak](https://github.com/tkossak)
- [Todd Levi](https://github.com/televi)
Expand Down
41 changes: 41 additions & 0 deletions fs/_pathcompat.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# mypy: ignore-errors
try:
from os.path import commonpath
except ImportError:
# Return the longest common sub-path of the sequence of paths given as input.
# The paths are not normalized before comparing them (this is the
# responsibility of the caller). Any trailing separator is stripped from the
# returned path.

def commonpath(paths):
"""Given a sequence of path names, returns the longest common sub-path."""

if not paths:
raise ValueError("commonpath() arg is an empty sequence")

paths = tuple(paths)
if isinstance(paths[0], bytes):
sep = b"/"
curdir = b"."
else:
sep = "/"
curdir = "."

split_paths = [path.split(sep) for path in paths]

try:
(isabs,) = set(p[:1] == sep for p in paths)
except ValueError:
raise ValueError("Can't mix absolute and relative paths")

split_paths = [[c for c in s if c and c != curdir] for s in split_paths]
s1 = min(split_paths)
s2 = max(split_paths)
common = s1
for i, c in enumerate(s1):
if c != s2[i]:
common = s1[:i]
break

prefix = sep if isabs else sep[:0]
return prefix + sep.join(common)
6 changes: 4 additions & 2 deletions fs/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@

import six

from . import copy, errors, fsencode, iotools, move, tools, walk, wildcard
from . import copy, errors, fsencode, iotools, tools, walk, wildcard
from .copy import copy_modified_time
from .glob import BoundGlobber
from .mode import validate_open_mode
Expand Down Expand Up @@ -1083,10 +1083,12 @@ def movedir(self, src_path, dst_path, create=False, preserve_time=False):
ancestors is not a directory.

"""
from .move import move_dir

with self._lock:
if not create and not self.exists(dst_path):
raise errors.ResourceNotFound(dst_path)
move.move_dir(self, src_path, self, dst_path, preserve_time=preserve_time)
move_dir(self, src_path, self, dst_path, preserve_time=preserve_time)

def makedirs(
self,
Expand Down
90 changes: 59 additions & 31 deletions fs/move.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,11 @@

from .copy import copy_dir
from .copy import copy_file
from .errors import FSError
from .opener import manage_fs
from .osfs import OSFS
from .path import frombase
from ._pathcompat import commonpath

if typing.TYPE_CHECKING:
from .base import FS
Expand Down Expand Up @@ -42,6 +46,7 @@ def move_file(
dst_fs, # type: Union[Text, FS]
dst_path, # type: Text
preserve_time=False, # type: bool
cleanup_dst_on_error=True, # type: bool
):
# type: (...) -> None
"""Move a file from one filesystem to another.
Expand All @@ -53,26 +58,55 @@ def move_file(
dst_path (str): Path to a file on ``dst_fs``.
preserve_time (bool): If `True`, try to preserve mtime of the
resources (defaults to `False`).
cleanup_dst_on_error (bool): If `True`, tries to delete the file copied to
``dst_fs`` if deleting the file from ``src_fs`` fails (defaults to `True`).

"""
with manage_fs(src_fs) as _src_fs:
with manage_fs(dst_fs, create=True) as _dst_fs:
with manage_fs(src_fs, writeable=True) as _src_fs:
with manage_fs(dst_fs, writeable=True, create=True) as _dst_fs:
if _src_fs is _dst_fs:
# Same filesystem, may be optimized
_src_fs.move(
src_path, dst_path, overwrite=True, preserve_time=preserve_time
)
else:
# Standard copy and delete
with _src_fs.lock(), _dst_fs.lock():
copy_file(
_src_fs,
src_path,
_dst_fs,
dst_path,
preserve_time=preserve_time,
)
return

if _src_fs.hassyspath(src_path) and _dst_fs.hassyspath(dst_path):
lurch marked this conversation as resolved.
Show resolved Hide resolved
# if both filesystems have a syspath we create a new OSFS from a
# common parent folder and use it to move the file.
try:
src_syspath = _src_fs.getsyspath(src_path)
dst_syspath = _dst_fs.getsyspath(dst_path)
common = commonpath([src_syspath, dst_syspath])
althonos marked this conversation as resolved.
Show resolved Hide resolved
if common:
rel_src = frombase(common, src_syspath)
rel_dst = frombase(common, dst_syspath)
with _src_fs.lock(), _dst_fs.lock():
with OSFS(common) as base:
base.move(rel_src, rel_dst, preserve_time=preserve_time)
return # optimization worked, exit early
except ValueError:
# This is raised if we cannot find a common base folder.
Comment on lines +88 to +89
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this exception-catching still needed, now that you check if common: ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because frombase might raise one

# In this case just fall through to the standard method.
pass
tfeldmann marked this conversation as resolved.
Show resolved Hide resolved

# Standard copy and delete
with _src_fs.lock(), _dst_fs.lock():
copy_file(
_src_fs,
src_path,
_dst_fs,
dst_path,
preserve_time=preserve_time,
)
try:
_src_fs.remove(src_path)
except FSError as e:
# if the source cannot be removed we delete the copy on the
# destination
if cleanup_dst_on_error:
_dst_fs.remove(dst_path)
raise e


def move_dir(
Expand All @@ -97,22 +131,16 @@ def move_dir(
resources (defaults to `False`).

"""

def src():
return manage_fs(src_fs, writeable=False)

def dst():
return manage_fs(dst_fs, create=True)

with src() as _src_fs, dst() as _dst_fs:
with _src_fs.lock(), _dst_fs.lock():
_dst_fs.makedir(dst_path, recreate=True)
copy_dir(
src_fs,
src_path,
dst_fs,
dst_path,
workers=workers,
preserve_time=preserve_time,
)
_src_fs.removetree(src_path)
with manage_fs(src_fs, writeable=True) as _src_fs:
with manage_fs(dst_fs, writeable=True, create=True) as _dst_fs:
with _src_fs.lock(), _dst_fs.lock():
_dst_fs.makedir(dst_path, recreate=True)
copy_dir(
src_fs,
src_path,
dst_fs,
dst_path,
workers=workers,
preserve_time=preserve_time,
)
_src_fs.removetree(src_path)
9 changes: 9 additions & 0 deletions fs/opener/registry.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

from .base import Opener
from .errors import UnsupportedProtocol, EntryPointError
from ..errors import ResourceReadOnly
from .parse import parse_fs_url

if typing.TYPE_CHECKING:
Expand Down Expand Up @@ -282,10 +283,18 @@ def manage_fs(
"""
from ..base import FS

def assert_writeable(fs):
if fs.getmeta().get("read_only", True):
raise ResourceReadOnly(path="/")

if isinstance(fs_url, FS):
if writeable:
assert_writeable(fs_url)
yield fs_url
else:
_fs = self.open_fs(fs_url, create=create, writeable=writeable, cwd=cwd)
if writeable:
assert_writeable(_fs)
try:
yield _fs
finally:
Expand Down
8 changes: 8 additions & 0 deletions fs/wrap.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
Dict,
Iterator,
IO,
Mapping,
Optional,
Text,
Tuple,
Expand Down Expand Up @@ -320,3 +321,10 @@ def touch(self, path):
# type: (Text) -> None
self.check()
raise ResourceReadOnly(path)

def getmeta(self, namespace="standard"):
# type: (Text) -> Mapping[Text, object]
self.check()
meta = dict(self.delegate_fs().getmeta(namespace=namespace))
meta.update(read_only=True, supports_rename=False)
return meta
1 change: 1 addition & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ per-file-ignores =
tests/*:E501
fs/opener/*:F811
fs/_fscompat.py:F401
fs/_pathcompat.py:C401

[isort]
default_section = THIRD_PARTY
Expand Down
Loading