GH-38704: [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API #39904

Merged on Feb 10, 2024 (25 commits)
Changes from 22 commits
c16a9c1
azurefs_test.cc: Use pcg for random number generation
felipecrv Jan 18, 2024
f001b40
azurefs_test.cc: Use different APIs to create containers/filesystems …
felipecrv Jan 18, 2024
2b3067c
azurefs_test.cc: Provide a shorter method name for path creation
felipecrv Jan 29, 2024
9af370c
Add exception.ErrorCode to the Status message
felipecrv Jan 30, 2024
a0b4b6f
Add GetBlobClient() helper function
felipecrv Jan 18, 2024
016bb39
Extract GetFileInfoOfPathWithinContainer()
felipecrv Jan 31, 2024
94ee77f
Add AcquireBlobLease() and AcquireContainerLease()
felipecrv Jan 31, 2024
8ea6b85
Add lease_id param to GetFileInfo(adlfs_client, ...)
felipecrv Jan 20, 2024
86d88c2
Add optional lease_id parameter to DeleteDirOnFileSystem
felipecrv Feb 1, 2024
22eded5
Introduce the LeaseGuard class
felipecrv Jan 31, 2024
80414f1
Add implementation skeleton of FileSystem::Move() + Impl::RenameConta…
felipecrv Jan 13, 2024
723d6af
Work around Azurite limitations and SDK/service bugs
felipecrv Jan 13, 2024
03cd298
Add checks for single and multi-container move operations
felipecrv Jan 14, 2024
cd238ec
Implement CreateContainerFromPath (one of the Move() scenarios)
felipecrv Jan 31, 2024
a66e85a
Improve utilities for creating and asserting POSIX errors
felipecrv Jan 29, 2024
35cbd4c
Implement MovePaths and a huge set of tests for it
felipecrv Feb 2, 2024
b41356e
Add lease acquisition/release logs and MovePathsWithDataLakeAPI logs
felipecrv Feb 2, 2024
c19c574
Revert "Add lease acquisition/release logs and MovePathsWithDataLakeA…
felipecrv Feb 2, 2024
005c5c4
Incorporate early feedback by kou
felipecrv Feb 2, 2024
1955b31
Add comments after Matt Topol's feedback
felipecrv Feb 5, 2024
0fb549b
io_util.h: Add missing include
felipecrv Feb 5, 2024
86c551a
Remove calls to a method that doesn't exist
felipecrv Feb 6, 2024
280a61c
Changes based on latest comments by kou
felipecrv Feb 9, 2024
419d7df
Fix ASAN violation in NormalizerKeyValueMetadata (unrelated)
felipecrv Feb 9, 2024
89a1ce2
Merge branch 'main' into azure_move
felipecrv Feb 9, 2024
723 changes: 693 additions & 30 deletions cpp/src/arrow/filesystem/azurefs.cc

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions cpp/src/arrow/filesystem/azurefs.h
@@ -210,6 +210,25 @@ class ARROW_EXPORT AzureFileSystem : public FileSystem {

  Status DeleteFile(const std::string& path) override;

  /// \brief Move / rename a file or directory.
  ///
  /// There are no files immediately at the root directory, so paths like
  /// "/segment" always refer to a container of the storage account and are
  /// treated as directories.
  ///
  /// If `dest` exists but the operation fails for some reason, `Move`
  /// guarantees `dest` is not lost.
  ///
  /// Conditions for a successful move:
  /// 1. `src` must exist.
  /// 2. `dest` can't contain a strict path prefix of `src`. More generally,
  ///    a directory can't be made a subdirectory of itself.
  /// 3. If `dest` already exists and it's a file, `src` must also be a file.
  ///    `dest` is then replaced by `src`.
  /// 4. All components of `dest` must exist, except for the last.
  /// 5. If `dest` already exists and it's a directory, `src` must also be a
  ///    directory and `dest` must be empty. `dest` is then replaced by `src`
  ///    and its contents.
  Status Move(const std::string& src, const std::string& dest) override;

  Status CopyFile(const std::string& src, const std::string& dest) override;
460 changes: 448 additions & 12 deletions cpp/src/arrow/filesystem/azurefs_test.cc

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions cpp/src/arrow/filesystem/util_internal.cc
@@ -63,11 +63,21 @@ Status PathNotFound(std::string_view path) {
      .WithDetail(StatusDetailFromErrno(ENOENT));
}

Status IsADir(std::string_view path) {
  return Status::IOError("Is a directory: '", path, "'")
      .WithDetail(StatusDetailFromErrno(EISDIR));
}

Status NotADir(std::string_view path) {
  return Status::IOError("Not a directory: '", path, "'")
      .WithDetail(StatusDetailFromErrno(ENOTDIR));
}

Status NotEmpty(std::string_view path) {
  return Status::IOError("Directory not empty: '", path, "'")
      .WithDetail(StatusDetailFromErrno(ENOTEMPTY));
}

Status NotAFile(std::string_view path) {
  return Status::IOError("Not a regular file: '", path, "'");
}
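The helpers in this hunk share one pattern: a human-readable message that quotes the path, plus an errno value attached as a detail. A minimal self-contained sketch of that pattern follows. `SketchStatus`, `MakeError` and `FailedWith` are hypothetical stand-ins rather than Arrow types, and the real `Status` class carries more than a message and an errno.

```cpp
#include <cassert>
#include <cerrno>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical miniature of a status object: a message plus an optional errno.
struct SketchStatus {
  std::string message;
  std::optional<int> errnum;
};

// Builds "<what>: '<path>'" and attaches the errno value, if there is one.
SketchStatus MakeError(std::string_view what, std::string_view path,
                       std::optional<int> errnum) {
  std::string msg(what);
  msg += ": '";
  msg += path;
  msg += "'";
  return SketchStatus{std::move(msg), errnum};
}

SketchStatus IsADir(std::string_view path) {
  return MakeError("Is a directory", path, EISDIR);
}
SketchStatus NotADir(std::string_view path) {
  return MakeError("Not a directory", path, ENOTDIR);
}
SketchStatus NotEmpty(std::string_view path) {
  return MakeError("Directory not empty", path, ENOTEMPTY);
}
// Mirrors the hunk above: NotAFile carries a message but no errno detail.
SketchStatus NotAFile(std::string_view path) {
  return MakeError("Not a regular file", path, std::nullopt);
}

// Callers can branch on the errno instead of parsing the message text.
bool FailedWith(const SketchStatus& st, int expected) {
  return st.errnum.has_value() && *st.errnum == expected;
}

void Example() {
  const SketchStatus st = NotEmpty("container/dir");
  assert(st.message == "Directory not empty: 'container/dir'");
  assert(FailedWith(st, ENOTEMPTY));
  assert(!FailedWith(NotAFile("container/dir"), ENOTEMPTY));
}
```

Because `NotAFile` has no errno attached, code that inspects the detail has to handle the missing case rather than assume every I/O error carries one.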
6 changes: 6 additions & 0 deletions cpp/src/arrow/filesystem/util_internal.h
@@ -43,9 +43,15 @@ Status CopyStream(const std::shared_ptr<io::InputStream>& src,
ARROW_EXPORT
Status PathNotFound(std::string_view path);

ARROW_EXPORT
Status IsADir(std::string_view path);

ARROW_EXPORT
Status NotADir(std::string_view path);

ARROW_EXPORT
Status NotEmpty(std::string_view path);

ARROW_EXPORT
Status NotAFile(std::string_view path);

7 changes: 7 additions & 0 deletions cpp/src/arrow/util/io_util.cc
@@ -449,6 +449,13 @@ std::shared_ptr<StatusDetail> StatusDetailFromErrno(int errnum) {
  return std::make_shared<ErrnoDetail>(errnum);
}

std::optional<int> ErrnoFromStatusDetail(const StatusDetail& detail) {
  if (detail.type_id() == kErrnoDetailTypeId) {
    return checked_cast<const ErrnoDetail&>(detail).errnum();
  }
  return std::nullopt;
}

#if _WIN32
std::shared_ptr<StatusDetail> StatusDetailFromWinError(int errnum) {
  if (!errnum) {
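`ErrnoFromStatusDetail` is the inverse of `StatusDetailFromErrno`: it checks the detail's type tag and downcasts only when the tag matches. The sketch below reproduces that tag-then-cast pattern with hypothetical stand-in classes (`Detail`, `ErrnoDetail`, `OtherDetail`, `ErrnoFromDetail`), so it compiles without Arrow. The tag strings are made up, and `static_cast` takes the place of Arrow's `checked_cast`.

```cpp
#include <cassert>
#include <cerrno>
#include <memory>
#include <optional>

// Hypothetical stand-in for a polymorphic status detail with a type tag.
class Detail {
 public:
  virtual ~Detail() = default;
  virtual const char* type_id() const = 0;
};

// Tags are compared by pointer identity, so each tag is one shared constant.
const char* const kErrnoTypeId = "sketch::ErrnoDetail";
const char* const kOtherTypeId = "sketch::OtherDetail";

class ErrnoDetail : public Detail {
 public:
  explicit ErrnoDetail(int errnum) : errnum_(errnum) {}
  const char* type_id() const override { return kErrnoTypeId; }
  int errnum() const { return errnum_; }

 private:
  int errnum_;
};

// A detail of a different kind, carrying no errno.
class OtherDetail : public Detail {
 public:
  const char* type_id() const override { return kOtherTypeId; }
};

// Returns the errno only if `detail` really is an ErrnoDetail.
std::optional<int> ErrnoFromDetail(const Detail& detail) {
  if (detail.type_id() == kErrnoTypeId) {
    // The tag check above is what makes this downcast safe.
    return static_cast<const ErrnoDetail&>(detail).errnum();
  }
  return std::nullopt;
}

void Example() {
  const std::shared_ptr<Detail> detail = std::make_shared<ErrnoDetail>(ENOTEMPTY);
  const std::optional<int> errnum = ErrnoFromDetail(*detail);
  assert(errnum.has_value() && *errnum == ENOTEMPTY);
  assert(!ErrnoFromDetail(OtherDetail{}).has_value());
}
```

Returning `std::optional<int>` instead of a sentinel such as 0 or -1 lets a caller tell "no errno attached" apart from any real errno value.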
3 changes: 3 additions & 0 deletions cpp/src/arrow/util/io_util.h
@@ -23,6 +23,7 @@

#include <atomic>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>
@@ -264,6 +265,8 @@ std::string WinErrorMessage(int errnum);

ARROW_EXPORT
std::shared_ptr<StatusDetail> StatusDetailFromErrno(int errnum);
ARROW_EXPORT
std::optional<int> ErrnoFromStatusDetail(const StatusDetail& detail);
#if _WIN32
ARROW_EXPORT
std::shared_ptr<StatusDetail> StatusDetailFromWinError(int errnum);