Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-767: [C++] Filesystem abstraction #4225

Closed
wants to merge 1 commit into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Apr 29, 2019

Add a FileSystem interface plus a MockFileSystem implementation (only keeping data in memory).

@pitrou
Copy link
Member Author

pitrou commented Apr 29, 2019

Submitted for discussion. There are some open questions here:

  1. should paths use std::string or a dedicated abstraction?
  2. should filesystems be anchored on a specific folder or just at the root of the corresponding resource?

My answers:

  1. std::string is more readily usable and a path abstraction is overkill for the kind of simple use cases for the filesystem API
  2. filesystems should just start at the root directory of the resource

@pitrou
Copy link
Member Author

pitrou commented Apr 29, 2019

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
virtual Status Move(const std::string& src, const std::string& dest) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another small nit: If we go with strings as an abstraction would it make more sense to use string_view as the api?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in my opinion, string_view is useful for performance-critical APIs, which is not the case here. Implicit conversion between string_view and std::string does not seem to work on all compilers, which would make the API more cumbersome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think std::string is OK

std::vector<FileStats>* out) = 0;

/// Create a folder and subfolders.
virtual Status CreateFolder(const std::string& path, bool recursive = true) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you throughts on the implementation for this, when the underlying store doesn't support folders explicitly (e.g. S3)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should perhaps emulate them using / as a standard delimiter (perhaps configurable when instantiating the S3 filesystem object).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversation has been had many times over!

Specifically for S3, there are two ideas of what a folder might mean, in the absence of a real posix-like hierarchy: the simplest, that create-folder is a no-op, and you only ever infer folders if they contain things; and that a empty key which ends with '/' is to be considered an empty folder (but it could later morph to a file if data is written). The latter is the convention used by the S3 console.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I've never used S3, so I can't make a choice myself. The question would be: how do people usually use buckets?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing files with the S3 API allows for a delimiter/prefix mechanism, you say how you want to define the directory name separators (always "/") and what prefix you want to list, and you get back a list of keys with that prefix and no more delimiters and a list of "common prefixes" of keys with that prefix and more delimiters. That acts a lot like a posix list directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so basically using / as a delimiter to simulate directories in S3 is an appropriate approach?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using / as a delimiter to simulate directories in S3

Definitely, but it needs to be done with care, mindful that it isn't truly like that (and this is the case for other key-value stores too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but that's more of a quality of implementation issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good testing, of course; but an outstanding issue on s3fs has been whether mkdir(path) followed by exists(path) should necessarily always return True. It is debatable!

/// Delete a file.
virtual Status DeleteFile(const std::string& path) = 0;
/// Delete many files.
virtual Status DeleteFiles(const std::vector<string>& path) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a concrete use-case for batch delete? I would imagine most file systems don't support transactional type deletes, so it might pay to expose this at a higher level instead of the core filesystem

Copy link
Member Author

@pitrou pitrou Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect something like deleting a Parquet dataset with a large number of files/partitions?
The intention is not to allow transactional deletes but rather to avoid a roundtrip per each individual delete (which may hurt a lot if accessing a remote filesystem with 100+ms latencies).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, certainly, some file-systems will provide shortcuts for bulk operations or various types, and they will be very important in many use cases. In general, it would be exceptionally useful if file-systems provide glob and directory-tree-based operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need globbing at the filesystem level or can we give the client a complete list of directory contents and do the filtering on the client side?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to make the default implementation of this call DeleteFile on each path. Are there any filesystems that do support multiple-deletes in a single function call / RPC?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the server can do glob, rather than having to issue many list-dir commands and filter ourselves, this would apply to that too. I'm not sure that any of the services you are thinking about here do - the only one I can think of is SSH.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that recursively listing, in the current proposal, is possible with GetTargetStats(const Selector& select, ...).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that Arrow should be concerned with deleting upstream files. At least in this iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pragmatically, deleting files can be useful for testing :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsaintjacques we need to be able to delete files to roll back incomplete writes

cpp/src/arrow/filesystem/filesystem.h Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/filesystem.h Outdated Show resolved Hide resolved
std::vector<FileStats>* out) = 0;

/// Create a folder and subfolders.
virtual Status CreateFolder(const std::string& path, bool recursive = true) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Delete a file.
virtual Status DeleteFile(const std::string& path) = 0;
/// Delete many files.
virtual Status DeleteFiles(const std::vector<string>& path) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to make the default implementation of this call DeleteFile on each path. Are there any filesystems that do support multiple-deletes in a single function call / RPC?

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
virtual Status Move(const std::string& src, const std::string& dest) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could call this RenameFile (as TF does) instead of Move. I note that Move operations on directories may not be natively supported and may have to be emulated for e.g. S3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. Move implies that the destination could be clobbered. Rename also implies you're staying in the same directory.

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
virtual Status Move(const std::string& src, const std::string& dest) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think std::string is OK

cpp/src/arrow/filesystem/filesystem.h Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/filesystem.h Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/filesystem.h Show resolved Hide resolved
@pitrou
Copy link
Member Author

pitrou commented Apr 29, 2019

Design question: should we aim for this to be wrappable in a Python fsspec-compliant object? @wesm

@martindurant
Copy link

should we aim for this to be wrappable in a Python fsspec-compliant object

I expect you want to surpass and replace that, which would be fine with me (except I have have probably too many biases/ideas on how it should function...)

Note that all fsspec implementations are all currently co-derived from pyarrow file-system, so they do work with the current pyarrow stack just fine. I notice that you are going exclusively with tensorF naming convention here, which is not the case for the old pyarrow file-system class, so there is room for confusion.

@pitrou
Copy link
Member Author

pitrou commented Apr 29, 2019

I'm actually not really looking at the TensorFlow naming, just using whatever feels reasonable to me :-)

(the fsspec API models a bit too much on Unix naming IMHO)

@martindurant
Copy link

the fsspec API models a bit too much on Unix naming

From Wes's comment at some point, many of the methods are also aliased:
https://github.com/martindurant/filesystem_spec/blob/master/fsspec/spec.py#L7 , e.g., mv and rename are the identical.

@wesm
Copy link
Member

wesm commented Apr 29, 2019

@pitrou I think we should take a minimalist approach for the time being to address the use cases we have in front of us (reading and writing datasets in a variety of formats)

Copy link
Contributor

@aregm aregm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou I am not quite sure I understand the problem statement - what is the problem that you want to solve, that STL is not solving? (https://en.cppreference.com/w/cpp/filesystem)

@pitrou
Copy link
Member Author

pitrou commented Apr 29, 2019

@aregm Two things:

  1. The filesystem API is an abstract API that should allow to access arbitrary filesystems (e.g. S3, Hadoop...), while the C++ std::filesystem is a concrete library to access only the local filesystem.
  2. std::filesystem is C++17, but Arrow remains C++11-compliant.

@wesm
Copy link
Member

wesm commented Apr 29, 2019

@aregm this is discussed in the Filesystems section in https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=sharing which has been discussed on the mailing list

@aregm
Copy link
Contributor

aregm commented Apr 29, 2019

@aregm this is discussed in the Filesystems section in https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=sharing which has been discussed on the mailing list

Then why not to expand the standard filsystem library instead of creating another abstraction layer? TensorFlow was designed before C++17, and with a different goal in mind.

@aregm
Copy link
Contributor

aregm commented Apr 29, 2019

@aregm Two things:

  1. The filesystem API is an abstract API that should allow accessing arbitrary filesystems (e.g. S3, Hadoop...), while the C++ std::filesystem is a concrete library to access only the local filesystem.

It is not only concrete library, but it is also a standard API, and we are talking about the API, not implementation. The implementation in Linux is a concrete library. Basing on the standard one allows to easily derive from it and reuse whatever is needed on AWS C++ SDK, HDFS API for remote, and fallback to the STL when local. This will preserve clean and familiar API and keep LSP working, which makes things simpler, and we will not add another API.

  1. std::filesystem is C++17, but Arrow remains C++11-compliant.

You can't avoid going 17. It's there, and there is no decent compiler, that does not support it. So this is not a case.

@aregm
Copy link
Contributor

aregm commented Apr 29, 2019

@pitrou

@aregm Two things:

  1. The filesystem API is an abstract API that should allow accessing arbitrary filesystems (e.g. S3, Hadoop...), while the C++ std::filesystem is a concrete library to access only the local filesystem.

It is not only concrete library, but it is also a standard API, and we are talking about the API, not implementation. The implementation in Linux is a concrete library. Basing on the standard one allows to easily derive from it and reuse whatever is needed on AWS C++ SDK, HDFS API for remote, and fallback to the STL when local. This will preserve clean and familiar API and keep LSP working, which makes things simpler, and we will not add another API.

  1. std::filesystem is C++17, but Arrow remains C++11-compliant.

You can't avoid going 17. It's there, and there is no decent compiler, that does not support it. So this is not a case.

@wesm
Copy link
Member

wesm commented Apr 29, 2019

@aregm there's a number of details in our platform design that you may be glossing over a bit. We've defined all of our own abstractions for memory management (e.g. arrow::Buffer), memory allocation, different kinds of files (see https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/interfaces.h -- these are similar to TensorFlow's notion of file interfaces), etc.

At this point redesigning around std::filesystem and other STL interfaces, even if C++17 were on the table (I don't think it is), might not meet our requirements. The approach here is consistent with the general development strategy of the project so far

@martindurant
Copy link

it is also a standard API

indeed, but there are many of these to draw on. From my (biased) point of view, the Filesystem libraryAPI is incomplete (my reference, as always) and contains a lot that is unnecessary (blocks and links and such)

@aregm
Copy link
Contributor

aregm commented Apr 30, 2019

@wesm

@aregm there's a number of details in our platform design that you may be glossing over a bit. We've defined all of our own abstractions for memory management (e.g. arrow::Buffer), memory allocation, different kinds of files (see https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/interfaces.h -- these are similar to TensorFlow's notion of file interfaces), etc.

At this point redesigning around std::filesystem and other STL interfaces, even if C++17 were on the table (I don't think it is), might not meet our requirements. The approach here is consistent with the general development strategy of the project so far

Wes, can you, please, point to the design and strategy docs, because there are lots of silent assumptions that you have in mind with other guys, that I am not aware of, as they implicitly assumed. One of which is the requirements for filesystem - I completely understand why memory management should be custom, Arrow is a framework for building various in-memory data analytical and database systems, which implies optimized and efficient memory management. A thing you cannot get from stdlibc++. What are the requirements or the design philosophy for the FS - I do not know, so assume that do not add more entities without need is the right strategy. So any design doc is appreciated. I'll just stack them and add links to the maillist and Confluence so everyone will be aware.

/// Any symlink is automatically dereferenced, recursively
virtual Status GetTargetStats(const std::string& path, FileStats* out);
/// Same, for many targets at once.
virtual Status GetTargetStats(const std::vector<std::string>& paths,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the behavior on partial failures? Return nothing with status? How does the consumer know which file failed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on what failures you have in mind. A non-existent file returns successfully with the NonExistent file type. An IO error (e.g. network error to the S3 server) will probably translate into an error Status.

Copy link
Contributor

@fsaintjacques fsaintjacques Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is more regarding partial failures, say you want stats for [a, b, c, d] and returns [stat_a, stat_b, FAIL, stat_d]. What should be the returned array, and how do you know which of the input failed.

Usually, the return type would be Array<Result<FileStat, FileError>>, but your current signature is Result<Array<FileStat>, FileError>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, can you give an example of a partial failure that's not an exceptional condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, for the purposes of this API, most errors can be modelled as NonExistent.
For example, in the POSIX stat() docs, the errors EACCESS, ELOOP, ENAMETOOLONG, ENOENT, ENOTDIR can be mapped to NonExistent.
We're left with the errors EIO and EOVERFLOW (plus ENOMEM on Linux), which should be truly exceptional, and EBADF and EINVAL, which point to bugs in the code.

/// Delete a file.
virtual Status DeleteFile(const std::string& path) = 0;
/// Delete many files.
virtual Status DeleteFiles(const std::vector<string>& path) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that Arrow should be concerned with deleting upstream files. At least in this iteration.

cpp/src/arrow/filesystem/filesystem.h Show resolved Hide resolved
@wesm
Copy link
Member

wesm commented Apr 30, 2019

@aregm I don't have detailed design documents about our IO platform -- there are 3+ years worth of JIRA issues and pull requests (and associated discussions), you'll have to review the changelog / history of the project to get a more nuanced idea about how we've arrived at this point.

Broadly speaking, this is our approach in this project:

  • Reference's counted memory management with encapsulated zero copy semantics, i.e. arrow::Buffer and its implementations

  • File interfaces (stream and random access) that are aware of our approach to memory management and zero copy. Arrow's "serialization" interfaces are able then to interact with files with details about zero copy (or not) not leaking into their implementation

  • File system encapsulation: place local filesystem, HDFS, S3, Google Cloud, Azure, and other remote file systems behind a common API. This API must use our file interfaces (which are aware of our memory model) and interoperate with other abstractions we have developed.

The STL is much more general purpose and not aware of the abstractions we've developed and specialized requirements around zero-copy semantics that we have. I do not think it is viable for our purposes.

If you disagree with the project's overall approach to memory management, IO, and interacting with remote data please start a mailing list discussion about it.

@pitrou
Copy link
Member Author

pitrou commented Apr 30, 2019

I've added a RandomAccessFile-returning API. I think demanding a memory-mapped file is premature optimization.

@pitrou
Copy link
Member Author

pitrou commented Apr 30, 2019

@wesm

Are there any filesystems that do support multiple-deletes in a single function call / RPC?

At least S3 does: https://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html

@pitrou
Copy link
Member Author

pitrou commented Apr 30, 2019

Side note: I would be inclined to create a InputFile interface that only has ReadAt, GetSize and Close (but neither Read, Seek nor Tell). RandomAccessFile would inherit from both InputStream and InputFile for compatibility.

Context: https://issues.apache.org/jira/browse/ARROW-2835

@wesm
Copy link
Member

wesm commented Apr 30, 2019

@pitrou let us leave the memory-mapped file question for a future PR where we can discuss it in more detail

@pitrou
Copy link
Member Author

pitrou commented May 2, 2019

Other question still: is it a good idea to have compression as part of this API? Do we benefit by allowing filesystem-specific implementations of compression, or does it just create more work for filesystem implementors?

@pitrou pitrou force-pushed the ARROW-767-fs-abstraction branch from 5d258e1 to 200e09f Compare May 2, 2019 17:32
@pitrou
Copy link
Member Author

pitrou commented May 2, 2019

I've started a mock filesystem implementation (in-memory) to help validate the API.

@pitrou pitrou force-pushed the ARROW-767-fs-abstraction branch 10 times, most recently from bbdc679 to 9810082 Compare May 6, 2019 17:59
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks reasonable as a first pass on the API. I left a few minor comments, but I think this can be merged relatively soon and we can move on to creating some implementations

Status FileSystem::DeleteFiles(const std::vector<std::string>& paths) {
Status st = Status::OK();
for (const auto& path : paths) {
st &= DeleteFile(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this short-circuit on the first failure? If multiple paths fail then some of the error state will get clobbered

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aim here is that the user does not have to retry deleting selectively the other files.

// A system clock time point expressed as a 64-bit (or more) number of
// nanoseconds since the epoch.
using TimePoint =
std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any filesystems that provide nanosecond-level information (curious)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently ext4 does.


// The file type.
FileType type() const { return type_; }
void type(FileType type) { type_ = type; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it should be SetType? TBH, I think the Google style guide is not very helpful for such accessor methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_$PROP is the naming style used by Protocol Buffers at least. I agree the guidance is hazy


// The full file path in the filesystem.
std::string path() const { return path_; }
void path(const std::string& path) { path_ = path; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_path? And same per other attrs below

struct ARROW_EXPORT Selector {
// The directory in which to select files.
// If the path exists but doesn't point to a directory, this should be an error.
std::string base_dir;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking about some kind of wildcard API at some point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the idea, though it depends whether it's useful to have filesystem-specific implementations.

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
// XXX separate MoveFile / MoveDir ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's OK to be both for now. In HDFS I wrote Rename for both directories and files

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.h#L139

Some filesystems may have to stat the path to determine what it is, where some don't have to, so I think having a single API makes sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.
Note that "stat first then choose the API" is vulnerable to race conditions. But for our purposes I'm not sure we care about such issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we can leave it to the particular filesystem implementation to decide what is most appropriate to do


// XXX It's not very practical to have to explicitly declare inheritance
// of default overrides.
using FileSystem::OpenOutputStream;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'm not sure without thinking about it what a better way would be though (to have a with-flags and flagless version of these methods)

@@ -161,6 +161,8 @@ class ARROW_EXPORT BufferReader : public RandomAccessFile {
std::shared_ptr<Buffer> buffer() const { return buffer_; }

protected:
inline Status CheckClosed() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does inline have any effect here (curious)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping that it incites the compiler to inline the implementation into its callers... @fsaintjacques am I right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if inline will have the desired effect, but if it is marked inline then this function should be private, not protected: if BufferReader is subclassed in another translation unit and the subclass calls CheckClosed then we'll get a linker error when the inline function isn't defined

@pitrou
Copy link
Member Author

pitrou commented May 7, 2019

@wesm Do you think it's useful to keep the auto-(de)compression option and have each filesystem implementation handle it?
(of course we can provide some helpers)

@wesm
Copy link
Member

wesm commented May 7, 2019

I'm concerned the auto-decompression could be a bit too magical. It might be better to instantiate the stream decompressor explicitly

@pitrou
Copy link
Member Author

pitrou commented May 7, 2019

Ok. So the OpenFlags will become empty...

@wesm
Copy link
Member

wesm commented May 7, 2019

Hm. Perhaps it's better to omit the Open* methods with flags for now, then

@pitrou pitrou force-pushed the ARROW-767-fs-abstraction branch from 9810082 to 16dfd34 Compare May 7, 2019 14:22
@pitrou
Copy link
Member Author

pitrou commented May 7, 2019

Ok, I addressed the review comments and a few XXXs and TODOs as well.

@pitrou pitrou force-pushed the ARROW-767-fs-abstraction branch from 16dfd34 to 5ee4481 Compare May 7, 2019 14:41
@pitrou pitrou changed the title [DRAFT] ARROW-767: [C++] Filesystem abstraction ARROW-767: [C++] Filesystem abstraction May 7, 2019
@pitrou pitrou force-pushed the ARROW-767-fs-abstraction branch from 5ee4481 to a40ba52 Compare May 8, 2019 11:30
@pitrou
Copy link
Member Author

pitrou commented May 8, 2019

Will merge once CI passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants