ARROW-767: [C++] Filesystem abstraction #4225

Closed
wants to merge 1 commit into from

Contributor

pitrou commented Apr 29, 2019

Add a FileSystem interface plus a MockFileSystem implementation (only keeping data in memory).
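
To make the proposal concrete, here is a minimal sketch of an in-memory filesystem in the spirit of MockFileSystem. It assumes a much-simplified, hypothetical interface (`FileType`, `SimpleFileSystem`, `InMemoryFileSystem`, `WriteFile` and `GetType` are illustrative names, not the PR's API): files live in a `std::map` keyed by their '/'-separated path from the filesystem root, and a directory exists whenever some file lives under it.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified file types.
enum class FileType { NonExistent, File, Directory };

// Hypothetical interface: paths are plain '/'-separated std::strings,
// relative to the root of the underlying resource.
class SimpleFileSystem {
 public:
  virtual ~SimpleFileSystem() = default;
  virtual void WriteFile(const std::string& path, const std::string& data) = 0;
  virtual FileType GetType(const std::string& path) const = 0;
  virtual bool DeleteFile(const std::string& path) = 0;
};

// Keeps all data in memory; directories are implied by the files under them.
class InMemoryFileSystem : public SimpleFileSystem {
 public:
  void WriteFile(const std::string& path, const std::string& data) override {
    files_[path] = data;  // overwrite if present
  }

  FileType GetType(const std::string& path) const override {
    if (files_.count(path) != 0) return FileType::File;
    // Keys sharing the prefix "path/" are contiguous in the sorted map, so the
    // first key not less than the prefix tells us whether any exist.
    const std::string prefix = path + "/";
    auto it = files_.lower_bound(prefix);
    if (it != files_.end() &&
        it->first.compare(0, prefix.size(), prefix) == 0) {
      return FileType::Directory;
    }
    return FileType::NonExistent;
  }

  bool DeleteFile(const std::string& path) override {
    return files_.erase(path) == 1;  // false if there was no such file
  }

 private:
  std::map<std::string, std::string> files_;
};
```

With this design a directory disappears once its last file is deleted; whether empty directories should be representable at all is a question that comes up again for S3.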

Contributor Author

pitrou commented Apr 29, 2019

Submitted for discussion. There are some open questions here:

  1. should paths use std::string or a dedicated abstraction?
  2. should filesystems be anchored on a specific folder or just at the root of the corresponding resource?

My answers:

  1. std::string is more readily usable and a path abstraction is overkill for the kind of simple use cases for the filesystem API
  2. filesystems should just start at the root directory of the resource
Contributor Author

pitrou commented Apr 29, 2019

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
virtual Status Move(const std::string& src, const std::string& dest) = 0;


emkornfield Apr 29, 2019

Contributor

Another small nit: if we go with strings as the abstraction, would it make more sense to use string_view in the API?


pitrou Apr 29, 2019

Author Contributor

Not in my opinion, string_view is useful for performance-critical APIs, which is not the case here. Implicit conversion between string_view and std::string does not seem to work on all compilers, which would make the API more cumbersome.


wesm Apr 29, 2019

Member

I think std::string is OK

std::vector<FileStats>* out) = 0;

/// Create a folder and subfolders.
virtual Status CreateFolder(const std::string& path, bool recursive = true) = 0;


emkornfield Apr 29, 2019

Contributor

What are your thoughts on the implementation for this when the underlying store doesn't support folders explicitly (e.g. S3)?


pitrou Apr 29, 2019

Author Contributor

I think we should perhaps emulate them using / as a standard delimiter (perhaps configurable when instantiating the S3 filesystem object).


martindurant Apr 29, 2019

This conversation has been had many times over!

Specifically for S3, in the absence of a real POSIX-like hierarchy, there are two ideas of what a folder might mean. The simplest is that create-folder is a no-op, and folders are only ever inferred when they contain things. The other is that an empty key ending with '/' is considered an empty folder (though it could later morph into a file if data is written). The latter is the convention used by the S3 console.
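
A minimal sketch contrasting the two conventions over a flat, sorted key set standing in for a bucket. `FlatStore`, `CreateDir` and `DirExists` are hypothetical names, not an S3 or Arrow API.

```cpp
#include <set>
#include <string>

// Hypothetical flat key-value store standing in for an S3 bucket.
class FlatStore {
 public:
  explicit FlatStore(bool use_dir_markers) : use_dir_markers_(use_dir_markers) {}

  void PutObject(const std::string& key) { keys_.insert(key); }

  // Convention 1: creating a directory is a no-op.
  // Convention 2 (S3 console style): write an empty "dir/" marker key.
  void CreateDir(const std::string& dir) {
    if (use_dir_markers_) keys_.insert(dir + "/");
  }

  // A directory "exists" if some key starts with "dir/". With markers, the
  // marker key itself satisfies this check even when nothing else is inside.
  bool DirExists(const std::string& dir) const {
    const std::string prefix = dir + "/";
    auto it = keys_.lower_bound(prefix);
    return it != keys_.end() && it->compare(0, prefix.size(), prefix) == 0;
  }

 private:
  bool use_dir_markers_;
  std::set<std::string> keys_;
};
```

With the no-op convention, `CreateDir("empty")` followed by `DirExists("empty")` returns false, which is precisely the kind of mkdir-then-exists ambiguity discussed for s3fs in this thread.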


pitrou Apr 29, 2019

Author Contributor

Well, I've never used S3, so I can't make a choice myself. The question would be: how do people usually use buckets?


martindurant Apr 29, 2019

Listing files with the S3 API allows for a delimiter/prefix mechanism: you say how you want to define the directory-name separator (always "/") and what prefix you want to list, and you get back a list of keys that have that prefix and no further delimiters, plus a list of "common prefixes" of keys that have that prefix and further delimiters. That acts a lot like a POSIX directory listing.
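
The listing behaviour described above can be mimicked over an in-memory key set, which may help when testing an S3-backed implementation. `Listing` and `ListWithDelimiter` are hypothetical names; a real client would get these two lists back from the service.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result of a delimiter-based listing.
struct Listing {
  std::vector<std::string> keys;          // keys directly "inside" the prefix
  std::set<std::string> common_prefixes;  // "subdirectories", deduplicated
};

Listing ListWithDelimiter(const std::set<std::string>& bucket,
                          const std::string& prefix, char delimiter = '/') {
  Listing out;
  // Keys with a given prefix are contiguous in sorted order.
  for (auto it = bucket.lower_bound(prefix); it != bucket.end(); ++it) {
    const std::string& key = *it;
    if (key.compare(0, prefix.size(), prefix) != 0) break;  // past the range
    const auto pos = key.find(delimiter, prefix.size());
    if (pos == std::string::npos) {
      out.keys.push_back(key);  // no further delimiter: a direct child
    } else {
      // Roll everything up to and including the next delimiter into one entry.
      out.common_prefixes.insert(key.substr(0, pos + 1));
    }
  }
  return out;
}
```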


pitrou Apr 29, 2019

Author Contributor

Ok, so basically using / as a delimiter to simulate directories in S3 is an appropriate approach?


martindurant Apr 29, 2019

using / as a delimiter to simulate directories in S3

Definitely, but it needs to be done with care, mindful that S3 isn't truly hierarchical (and the same is true of other key-value stores).


pitrou Apr 29, 2019

Author Contributor

Right, but that's more of a quality of implementation issue.


martindurant Apr 29, 2019

Good testing, of course; but an outstanding issue on s3fs has been whether mkdir(path) followed by exists(path) should necessarily always return True. It is debatable!

/// Delete a file.
virtual Status DeleteFile(const std::string& path) = 0;
/// Delete many files.
virtual Status DeleteFiles(const std::vector<std::string>& paths) = 0;


emkornfield Apr 29, 2019

Contributor

Is there a concrete use case for batch delete? I would imagine most filesystems don't support transactional deletes, so it might pay to expose this at a higher level instead of in the core filesystem.


pitrou Apr 29, 2019

Author Contributor

I suspect something like deleting a Parquet dataset with a large number of files/partitions?
The intention is not to allow transactional deletes but rather to avoid a round trip for each individual delete (which may hurt a lot when accessing a remote filesystem with 100+ ms latencies).


martindurant Apr 29, 2019

Yes, certainly, some filesystems will provide shortcuts for bulk operations of various types, and they will be very important in many use cases. In general, it would be exceptionally useful if filesystems provided glob and directory-tree-based operations.


pitrou Apr 29, 2019

Author Contributor

Do we need globbing at the filesystem level or can we give the client a complete list of directory contents and do the filtering on the client side?


wesm Apr 29, 2019

Member

Might want to make the default implementation of this call DeleteFile on each path. Are there any filesystems that do support multiple-deletes in a single function call / RPC?


martindurant Apr 29, 2019

If the server can glob, rather than us having to issue many list-dir commands and filter the results ourselves, the same would apply there too. I'm not sure any of the services you are thinking about here can; the only one I can think of is SSH.
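
For reference, client-side filtering of a full listing needs only a small glob matcher. This sketch assumes a minimal syntax (`*` and `?`, neither of which crosses a `/`; no character classes), and `GlobMatch`/`FilterListing` are hypothetical names.

```cpp
#include <string>
#include <vector>

// '*' matches any run of characters except '/', '?' matches exactly one
// character except '/'. Everything else must match literally.
bool GlobMatch(const char* pattern, const char* path) {
  if (*pattern == '\0') return *path == '\0';
  if (*pattern == '*') {
    // Try every possible length for the '*', without crossing a '/'.
    for (const char* p = path;; ++p) {
      if (GlobMatch(pattern + 1, p)) return true;
      if (*p == '\0' || *p == '/') return false;
    }
  }
  if (*path == '\0') return false;
  if (*pattern == '?') {
    return *path != '/' && GlobMatch(pattern + 1, path + 1);
  }
  return *pattern == *path && GlobMatch(pattern + 1, path + 1);
}

// Filter a complete directory listing on the client side.
std::vector<std::string> FilterListing(const std::vector<std::string>& listing,
                                       const std::string& pattern) {
  std::vector<std::string> matches;
  for (const auto& path : listing) {
    if (GlobMatch(pattern.c_str(), path.c_str())) matches.push_back(path);
  }
  return matches;
}
```

The trade-off is the one raised above: this costs a full listing of the relevant directories, whereas server-side globbing would only return the matches.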


pitrou Apr 29, 2019

Author Contributor

Note that recursively listing, in the current proposal, is possible with GetTargetStats(const Selector& select, ...).
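
As an illustration of what such a selector-based listing might return, here is a sketch over a flat set of file paths. `SelectorSketch` and `SelectPaths` are hypothetical, and the `recursive` field is an assumption about the selector's shape rather than its exact definition in the PR.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical selector: which directory to list, and how deep.
struct SelectorSketch {
  std::string base_dir;
  bool recursive;
};

// Returns the paths of all files and implied directories below base_dir.
std::set<std::string> SelectPaths(const std::set<std::string>& files,
                                  const SelectorSketch& select) {
  std::set<std::string> out;
  const std::string prefix = select.base_dir + "/";
  for (auto it = files.lower_bound(prefix); it != files.end(); ++it) {
    const std::string& file = *it;
    if (file.compare(0, prefix.size(), prefix) != 0) break;
    // Every '/' after the prefix marks an implied intermediate directory.
    std::size_t pos = prefix.size();
    while (true) {
      const std::size_t slash = file.find('/', pos);
      if (slash == std::string::npos) {
        out.insert(file);  // the file itself
        break;
      }
      out.insert(file.substr(0, slash));  // an intermediate directory
      if (!select.recursive) break;       // only direct children wanted
      pos = slash + 1;
    }
  }
  return out;
}
```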


fsaintjacques Apr 30, 2019

Contributor

I do not think that Arrow should be concerned with deleting upstream files. At least in this iteration.


pitrou Apr 30, 2019

Author Contributor

Pragmatically, deleting files can be useful for testing :-)


wesm Apr 30, 2019

Member

@fsaintjacques we need to be able to delete files to roll back incomplete writes

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
virtual Status Move(const std::string& src, const std::string& dest) = 0;


wesm Apr 29, 2019

Member

Could call this RenameFile (as TensorFlow does) instead of Move. I note that Move operations on directories may not be natively supported and may have to be emulated for, e.g., S3.


pitrou Apr 29, 2019

Author Contributor

I don't know. Move implies that the destination could be clobbered. Rename also implies you're staying in the same directory.
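
Whatever it is called, a directory Move on a store without native renames (S3 has no rename operation, so each object must be copied and then deleted) could be emulated roughly as follows. This is a sketch over a `std::map` standing in for a bucket; `MoveDirEmulated` is a hypothetical name, and unlike a POSIX rename it is neither atomic nor cheap.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rewrites every key under "src/" to the same relative path under "dest/",
// clobbering any existing destination key, then removes the source key.
void MoveDirEmulated(std::map<std::string, std::string>& store,
                     const std::string& src, const std::string& dest) {
  if (src == dest) return;  // nothing to do (and avoids deleting the data)
  const std::string src_prefix = src + "/";
  const std::string dest_prefix = dest + "/";
  // Collect the keys first so that inserting new keys cannot disturb the scan.
  std::vector<std::string> to_move;
  for (auto it = store.lower_bound(src_prefix); it != store.end(); ++it) {
    if (it->first.compare(0, src_prefix.size(), src_prefix) != 0) break;
    to_move.push_back(it->first);
  }
  for (const auto& key : to_move) {
    std::string value = std::move(store.at(key));
    store.erase(key);  // "delete" the source object
    store[dest_prefix + key.substr(src_prefix.size())] = std::move(value);
  }
}
```

A failure halfway through would leave a partial move, which is one reason the semantics of Move on such stores deserve care.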

Contributor Author

pitrou commented Apr 29, 2019

Design question: should we aim for this to be wrappable in a Python fsspec-compliant object? @wesm


martindurant commented Apr 29, 2019

should we aim for this to be wrappable in a Python fsspec-compliant object

I expect you want to surpass and replace that, which would be fine with me (except that I probably have too many biases/ideas about how it should function...)

Note that all fsspec implementations are currently co-derived from the pyarrow filesystem, so they work with the current pyarrow stack just fine. I notice that you are going exclusively with TensorFlow's naming conventions here, which is not the case for the old pyarrow filesystem class, so there is room for confusion.

Contributor Author

pitrou commented Apr 29, 2019

I'm actually not really looking at the TensorFlow naming, just using whatever feels reasonable to me :-)

(the fsspec API models a bit too much on Unix naming IMHO)


martindurant commented Apr 29, 2019

the fsspec API models a bit too much on Unix naming

Following a comment from Wes at some point, many of the methods are also aliased:
https://github.com/martindurant/filesystem_spec/blob/master/fsspec/spec.py#L7 (e.g., mv and rename are identical).

Member

wesm commented Apr 29, 2019

@pitrou I think we should take a minimalist approach for the time being to address the use cases we have in front of us (reading and writing datasets in a variety of formats)

Contributor

aregm left a comment

@pitrou I am not quite sure I understand the problem statement: what problem do you want to solve that the STL is not already solving? (https://en.cppreference.com/w/cpp/filesystem)

Contributor Author

pitrou commented Apr 29, 2019

@aregm Two things:

  1. The filesystem API is an abstract API that should allow accessing arbitrary filesystems (e.g. S3, Hadoop...), while the C++ std::filesystem is a concrete library to access only the local filesystem.
  2. std::filesystem is C++17, but Arrow remains C++11-compliant.
Member

wesm commented Apr 29, 2019

@aregm this is discussed in the Filesystems section in https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=sharing which has been discussed on the mailing list

Contributor

aregm commented Apr 29, 2019

@aregm this is discussed in the Filesystems section in https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=sharing which has been discussed on the mailing list

Then why not extend the standard filesystem library instead of creating another abstraction layer? TensorFlow was designed before C++17, and with a different goal in mind.

Contributor

aregm commented Apr 29, 2019

@aregm Two things:

  1. The filesystem API is an abstract API that should allow accessing arbitrary filesystems (e.g. S3, Hadoop...), while the C++ std::filesystem is a concrete library to access only the local filesystem.

It is not only a concrete library; it is also a standard API, and we are talking about the API, not the implementation. The implementation on Linux is a concrete library. Basing the design on the standard API makes it easy to derive from it and reuse whatever is needed from the AWS C++ SDK or the HDFS API for remote storage, falling back to the STL for local files. This would preserve a clean and familiar API and keep LSP working, which makes things simpler, and we would not be adding another API.

  2. std::filesystem is C++17, but Arrow remains C++11-compliant.

You can't avoid moving to C++17. It's there, and there is no decent compiler that does not support it. So this is not an issue.

Member

wesm commented Apr 29, 2019

@aregm there are a number of details in our platform design that you may be glossing over a bit. We've defined all of our own abstractions for memory management (e.g. arrow::Buffer), memory allocation, different kinds of files (see https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/interfaces.h -- these are similar to TensorFlow's notion of file interfaces), etc.

At this point, redesigning around std::filesystem and other STL interfaces, even if C++17 were on the table (I don't think it is), might not meet our requirements. The approach here is consistent with the general development strategy of the project so far.


martindurant commented Apr 29, 2019

it is also a standard API

Indeed, but there are many of these to draw on. From my (biased) point of view, the Filesystem library API is incomplete (my reference, as always) and contains a lot that is unnecessary (blocks, links and such).

Contributor

aregm commented Apr 30, 2019

@wesm

@aregm there's a number of details in our platform design that you may be glossing over a bit. We've defined all of our own abstractions for memory management (e.g. arrow::Buffer), memory allocation, different kinds of files (see https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/interfaces.h -- these are similar to TensorFlow's notion of file interfaces), etc.

At this point redesigning around std::filesystem and other STL interfaces, even if C++17 were on the table (I don't think it is), might not meet our requirements. The approach here is consistent with the general development strategy of the project so far

Wes, can you please point to the design and strategy docs? There are lots of silent assumptions that you share with the other developers and that I am not aware of, because they are implicit. One of them is the requirements for the filesystem. I completely understand why memory management should be custom: Arrow is a framework for building various in-memory analytical and database systems, which implies optimized and efficient memory management, something you cannot get from libstdc++. I do not know what the requirements or the design philosophy for the FS are, so I assume that not adding more entities without need is the right strategy. Any design doc is appreciated; I'll collect them and add links to the mailing list and Confluence so everyone will be aware.

/// Any symlink is automatically dereferenced, recursively
virtual Status GetTargetStats(const std::string& path, FileStats* out);
/// Same, for many targets at once.
virtual Status GetTargetStats(const std::vector<std::string>& paths,


fsaintjacques Apr 30, 2019

Contributor

What's the behavior on partial failures? Return nothing with status? How does the consumer know which file failed?


pitrou Apr 30, 2019

Author Contributor

Depends on what failures you have in mind. A non-existent file returns successfully with the NonExistent file type. An IO error (e.g. network error to the S3 server) will probably translate into an error Status.


fsaintjacques Apr 30, 2019

Contributor

My comment is more about partial failures: say you want stats for [a, b, c, d] and the call returns [stat_a, stat_b, FAIL, stat_d]. What should the returned array be, and how do you know which of the inputs failed?

Usually, the return type would be Array<Result<FileStat, FileError>>, but your current signature is Result<Array<FileStat>, FileError>.
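
For comparison, a minimal sketch of the "Array<Result<...>>" shape in C++, using hypothetical names (`PerPathResult`, `StatEach`) and an injected `stat_one` callback in place of a real filesystem call. Each input keeps its own outcome, so the caller can see exactly which path failed.

```cpp
#include <functional>
#include <string>
#include <vector>

enum class FileType { NonExistent, File, Directory };

// Hypothetical per-path outcome: either a file type or an error message.
struct PerPathResult {
  std::string path;
  bool ok;
  FileType type;      // meaningful only if ok
  std::string error;  // meaningful only if !ok
};

// Every input path gets its own outcome, so a failure on one path neither
// hides the results for the others nor loses which path failed.
std::vector<PerPathResult> StatEach(
    const std::vector<std::string>& paths,
    const std::function<bool(const std::string&, FileType*, std::string*)>&
        stat_one) {
  std::vector<PerPathResult> results;
  results.reserve(paths.size());
  for (const auto& path : paths) {
    PerPathResult r{path, false, FileType::NonExistent, ""};
    r.ok = stat_one(path, &r.type, &r.error);
    results.push_back(r);
  }
  return results;
}
```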


pitrou Apr 30, 2019

Author Contributor

Well, can you give an example of a partial failure that's not an exceptional condition?


pitrou Apr 30, 2019

Author Contributor

Note that, for the purposes of this API, most errors can be modelled as NonExistent.
For example, in the POSIX stat() docs, the errors EACCES, ELOOP, ENAMETOOLONG, ENOENT and ENOTDIR can be mapped to NonExistent.
We're left with the errors EIO and EOVERFLOW (plus ENOMEM on Linux), which should be truly exceptional, and EBADF and EINVAL, which point to bugs in the code.
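
The classification above can be sketched as a small helper, assuming a POSIX platform. The function name `StatErrorMeansNonExistent` is hypothetical; the errno constants are the standard ones from `<cerrno>`.

```cpp
#include <cerrno>

// stat() errors that just mean "there is nothing usable at this path" are
// reported as NonExistent; everything else stays a genuine error.
bool StatErrorMeansNonExistent(int err) {
  switch (err) {
    case EACCES:
    case ELOOP:
    case ENAMETOOLONG:
    case ENOENT:
    case ENOTDIR:
      return true;
    default:
      // EIO, EOVERFLOW, ENOMEM: exceptional conditions.
      // EBADF, EINVAL: most likely bugs in the calling code.
      return false;
  }
}
```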

Member

wesm commented Apr 30, 2019

@aregm I don't have detailed design documents about our IO platform -- there are 3+ years worth of JIRA issues and pull requests (and associated discussions), you'll have to review the changelog / history of the project to get a more nuanced idea about how we've arrived at this point.

Broadly speaking, this is our approach in this project:

  • Reference-counted memory management with encapsulated zero-copy semantics, i.e. arrow::Buffer and its implementations

  • File interfaces (stream and random access) that are aware of our approach to memory management and zero copy. Arrow's "serialization" interfaces are able then to interact with files with details about zero copy (or not) not leaking into their implementation

  • File system encapsulation: place local filesystem, HDFS, S3, Google Cloud, Azure, and other remote file systems behind a common API. This API must use our file interfaces (which are aware of our memory model) and interoperate with other abstractions we have developed.

The STL is much more general purpose and not aware of the abstractions we've developed and specialized requirements around zero-copy semantics that we have. I do not think it is viable for our purposes.

If you disagree with the project's overall approach to memory management, IO, and interacting with remote data please start a mailing list discussion about it.

@pitrou pitrou force-pushed the pitrou:ARROW-767-fs-abstraction branch from b285cfe to 4723fd6 Apr 30, 2019
Contributor Author

pitrou commented Apr 30, 2019

I've added a RandomAccessFile-returning API. I think demanding a memory-mapped file is premature optimization.

Contributor Author

pitrou commented Apr 30, 2019

@wesm

Are there any filesystems that do support multiple-deletes in a single function call / RPC?

At least S3 does: https://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
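
A sketch of how a bulk delete might be split into requests, assuming a backend with a per-request key limit. S3's multi-object delete accepts at most 1000 keys per request; the function name `MakeDeleteBatches` is hypothetical and the code that sends each batch is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split paths into consecutive batches of at most max_per_request keys,
// one batch per bulk-delete request. max_per_request must be positive.
std::vector<std::vector<std::string>> MakeDeleteBatches(
    const std::vector<std::string>& paths, std::size_t max_per_request = 1000) {
  std::vector<std::vector<std::string>> batches;
  for (std::size_t start = 0; start < paths.size(); start += max_per_request) {
    const std::size_t end = std::min(paths.size(), start + max_per_request);
    batches.emplace_back(paths.begin() + static_cast<std::ptrdiff_t>(start),
                         paths.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

Deleting 2500 files this way costs 3 round trips instead of 2500, which is the latency argument made earlier in the thread.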

Contributor Author

pitrou commented Apr 30, 2019

Side note: I would be inclined to create an InputFile interface that only has ReadAt, GetSize and Close (but neither Read, Seek nor Tell). RandomAccessFile would inherit from both InputStream and InputFile for compatibility.
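
A rough sketch of that hierarchy with simplified, hypothetical signatures (plain `char*` buffers and integer return values instead of Arrow's `Status` and `Buffer` types), plus a string-backed `StringFile` to show that `ReadAt` does not move the stream position.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Positional reads only: no shared file position for callers to race on.
class InputFile {
 public:
  virtual ~InputFile() = default;
  virtual std::int64_t ReadAt(std::int64_t position, std::int64_t nbytes,
                              char* out) = 0;
  virtual std::int64_t GetSize() = 0;
  virtual void Close() = 0;
};

// Streaming reads with an implicit current position.
class InputStream {
 public:
  virtual ~InputStream() = default;
  virtual std::int64_t Read(std::int64_t nbytes, char* out) = 0;
  virtual std::int64_t Tell() = 0;
};

// Keeps both capabilities for compatibility.
class RandomAccessFile : public InputStream, public InputFile {
 public:
  virtual void Seek(std::int64_t position) = 0;
};

class StringFile : public RandomAccessFile {
 public:
  explicit StringFile(std::string data) : data_(std::move(data)) {}

  // Assumes a non-negative position; returns the number of bytes copied.
  std::int64_t ReadAt(std::int64_t position, std::int64_t nbytes,
                      char* out) override {
    const std::int64_t size = GetSize();
    if (position >= size) return 0;
    const std::int64_t n = std::min(nbytes, size - position);
    std::memcpy(out, data_.data() + position, static_cast<std::size_t>(n));
    return n;
  }
  std::int64_t GetSize() override {
    return static_cast<std::int64_t>(data_.size());
  }
  void Close() override {}

  // Stream reads are implemented on top of ReadAt plus a cursor.
  std::int64_t Read(std::int64_t nbytes, char* out) override {
    const std::int64_t n = ReadAt(pos_, nbytes, out);
    pos_ += n;
    return n;
  }
  std::int64_t Tell() override { return pos_; }
  void Seek(std::int64_t position) override { pos_ = position; }

 private:
  std::string data_;
  std::int64_t pos_ = 0;
};
```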

Context: https://issues.apache.org/jira/browse/ARROW-2835

Member

wesm commented Apr 30, 2019

@pitrou let us leave the memory-mapped file question for a future PR where we can discuss it in more detail

Contributor

aregm commented Apr 30, 2019

If you disagree with the project's overall approach to memory management, IO, and interacting with remote data please start a mailing list discussion about it.

@wesm That is the point, I do not disagree, I just want to understand.

Contributor Author

pitrou commented May 2, 2019

Another question: should we have separate methods for moving files and directories (so MoveFile and MoveDir instead of a single Move)?

Contributor Author

pitrou commented May 2, 2019

Yet another question: is it a good idea to have compression as part of this API? Do we benefit from allowing filesystem-specific implementations of compression, or does it just create more work for filesystem implementors?

@pitrou pitrou force-pushed the pitrou:ARROW-767-fs-abstraction branch from 5d258e1 to 200e09f May 2, 2019
Contributor Author

pitrou commented May 2, 2019

I've started a mock filesystem implementation (in-memory) to help validate the API.

@pitrou pitrou force-pushed the pitrou:ARROW-767-fs-abstraction branch 10 times, most recently from bbdc679 to 9810082 May 6, 2019
Member

wesm left a comment

Overall this looks reasonable as a first pass on the API. I left a few minor comments, but I think this can be merged relatively soon and we can move on to creating some implementations

Status FileSystem::DeleteFiles(const std::vector<std::string>& paths) {
Status st = Status::OK();
for (const auto& path : paths) {
st &= DeleteFile(path);


wesm May 6, 2019

Member

Should this short-circuit on the first failure? If multiple paths fail then some of the error state will get clobbered


pitrou May 7, 2019

Author Contributor

The aim here is that the user does not have to selectively retry deleting the remaining files.
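
A sketch of the "keep going after a failure" behaviour. Note that it swaps the PR's `st &= DeleteFile(path)` status accumulation for a different technique: it returns the list of paths that failed, so no error is lost. `DeleteAll` and the `delete_one` callback are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Attempt every deletion even after a failure, and report exactly which
// paths could not be deleted (an empty result means full success).
std::vector<std::string> DeleteAll(
    const std::vector<std::string>& paths,
    const std::function<bool(const std::string&)>& delete_one) {
  std::vector<std::string> failed;
  for (const auto& path : paths) {
    if (!delete_one(path)) failed.push_back(path);
  }
  return failed;
}
```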

// A system clock time point expressed as a 64-bit (or more) number of
// nanoseconds since the epoch.
using TimePoint =
std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>;


wesm May 6, 2019

Member

are there any filesystems that provide nanosecond-level information (curious)?


pitrou May 7, 2019

Author Contributor

Apparently ext4 does.
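
Assuming the platform reports sub-second timestamps (as ext4 does, exposed on Linux through the POSIX.1-2008 `st_mtim` field of `struct stat`), converting them into the nanosecond `TimePoint` alias from the snippet above is straightforward. `TimePointFromParts` is a hypothetical helper.

```cpp
#include <chrono>
#include <cstdint>

// Same alias as in the PR snippet.
using TimePoint =
    std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>;

// Build a TimePoint from a (seconds, nanoseconds) pair such as
// st_mtim.tv_sec / st_mtim.tv_nsec.
TimePoint TimePointFromParts(std::int64_t seconds, std::int64_t nanoseconds) {
  return TimePoint(std::chrono::seconds(seconds) +
                   std::chrono::nanoseconds(nanoseconds));
}
```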


// The file type.
FileType type() const { return type_; }
void type(FileType type) { type_ = type; }


wesm May 6, 2019

Member

set_type?


pitrou May 7, 2019

Author Contributor

Then it should be SetType? TBH, I think the Google style guide is not very helpful for such accessor methods.


wesm May 7, 2019

Member

set_$PROP is the naming style used by Protocol Buffers at least. I agree the guidance is hazy


// The full file path in the filesystem.
std::string path() const { return path_; }
void path(const std::string& path) { path_ = path; }


wesm May 6, 2019

Member

set_path? And same per other attrs below

struct ARROW_EXPORT Selector {
// The directory in which to select files.
// If the path exists but doesn't point to a directory, this should be an error.
std::string base_dir;


wesm May 6, 2019

Member

Are you thinking about some kind of wildcard API at some point?


pitrou May 7, 2019

Author Contributor

Yes, that's the idea, though it depends on whether it's useful to have filesystem-specific implementations.

/// Move / rename a file or directory.
///
/// The destination will be replaced if it exists.
// XXX separate MoveFile / MoveDir ?


wesm May 6, 2019

Member

I guess it's OK to handle both for now. In the HDFS bindings I wrote Rename for both directories and files:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.h#L139

Some filesystems may have to stat the path to determine what it is, while others don't, so I think having a single API makes sense.


pitrou May 7, 2019

Author Contributor

Fair enough.
Note that "stat first then choose the API" is vulnerable to race conditions. But for our purposes I'm not sure we care about such issues.


wesm May 7, 2019

Member

Yeah, I think we can leave it to the particular filesystem implementation to decide what is most appropriate to do


// XXX It's not very practical to have to explicitly declare inheritance
// of default overrides.
using FileSystem::OpenOutputStream;


wesm May 6, 2019

Member

Good point. Without thinking about it more, I'm not sure what a better way would be, though (to have both a with-flags and a flagless version of these methods).
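
The underlying C++ rule is name hiding: declaring any overload of a name in a derived class hides every base-class overload of that name, which is why the `using` declaration is required. A self-contained illustration with hypothetical class names (`BaseFs`, `MockFs`) and a hypothetical `Open` method:

```cpp
#include <string>

class BaseFs {
 public:
  virtual ~BaseFs() = default;
  // Flagless convenience overload, implemented via the with-flags version.
  std::string Open(const std::string& path) { return Open(path, 0); }
  virtual std::string Open(const std::string& path, int flags) = 0;
};

class MockFs : public BaseFs {
 public:
  // Without this line, MockFs::Open(path, flags) would hide
  // BaseFs::Open(path), and mock.Open("a.txt") would not compile.
  using BaseFs::Open;

  std::string Open(const std::string& path, int flags) override {
    return path + ":" + std::to_string(flags);
  }
};
```

Calling the flagless overload through a `BaseFs&` would work without the `using` declaration; it is only lookup on the derived type that needs it.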

@@ -161,6 +161,8 @@ class ARROW_EXPORT BufferReader : public RandomAccessFile {
std::shared_ptr<Buffer> buffer() const { return buffer_; }

protected:
inline Status CheckClosed() const;


wesm May 6, 2019

Member

does inline have any effect here (curious)?


pitrou May 7, 2019

Author Contributor

I'm hoping that it incites the compiler to inline the implementation into its callers... @fsaintjacques am I right?


bkietz May 7, 2019

Contributor

Not sure if inline will have the desired effect, but if it is marked inline then this function should be private, not protected: if BufferReader is subclassed in another translation unit and the subclass calls CheckClosed, we'll get a linker error because the inline function isn't defined in that translation unit.

Contributor Author

pitrou commented May 7, 2019

@wesm Do you think it's useful to keep the auto-(de)compression option and have each filesystem implementation handle it?
(of course we can provide some helpers)

Member

wesm commented May 7, 2019

I'm concerned the auto-decompression could be a bit too magical. It might be better to instantiate the stream decompressor explicitly

Contributor Author

pitrou commented May 7, 2019

Ok. So the OpenFlags will become empty...

Member

wesm commented May 7, 2019

Hm. Perhaps it's better to omit the Open* methods with flags for now, then

@pitrou pitrou force-pushed the pitrou:ARROW-767-fs-abstraction branch from 9810082 to 16dfd34 May 7, 2019
Contributor Author

pitrou commented May 7, 2019

Ok, I addressed the review comments and a few XXXs and TODOs as well.

@pitrou pitrou force-pushed the pitrou:ARROW-767-fs-abstraction branch from 16dfd34 to 5ee4481 May 7, 2019
@pitrou pitrou changed the title [DRAFT] ARROW-767: [C++] Filesystem abstraction ARROW-767: [C++] Filesystem abstraction May 7, 2019
@pitrou pitrou force-pushed the pitrou:ARROW-767-fs-abstraction branch from 5ee4481 to a40ba52 May 8, 2019
Contributor Author

pitrou commented May 8, 2019

Will merge once CI passes.

@pitrou pitrou closed this in 9fadcd2 May 8, 2019
@pitrou pitrou deleted the pitrou:ARROW-767-fs-abstraction branch May 8, 2019
8 participants