Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ObjectStore] Add append API impl for LocalFileSystem #3824

Merged
merged 2 commits into from
Mar 11, 2023

Conversation

metesynnada
Copy link
Contributor

@metesynnada metesynnada commented Mar 9, 2023

Which issue does this PR close?

Closes #3740 and closes #3742 since we go down a different path.

Rationale for this change

This PR adds the LocalFileSystem implementation for the "append" API.

What changes are included in this PR?

  • Append API implementation for LocalFileSystem.

However, to simplify things, we used the tokio::fs feature which is not supported on wasm32. There is another implementation on synnada-ai#5 that alters the LocalUpload struct to support both staged and unstaged file writings.

Are there any user-facing changes?

No.

@github-actions github-actions bot added arrow Changes to the arrow crate object-store Object Store Interface labels Mar 9, 2023
@alamb alamb changed the title Append API imp for LocalFileSystem [ObjectStore] Add append API impl for LocalFileSystem Mar 9, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @metesynnada

I also verified that the semantics match what is described on

/// Returns an [`AsyncWrite`] that can be used to append to the object at `location`
///
/// A new object will be created if it doesn't already exist, otherwise it will be
/// opened, with subsequent writes appended to the end.
///
/// This operation cannot be supported by all stores, most use-cases should prefer
/// [`ObjectStore::put`] and [`ObjectStore::put_multipart`] for better portability
/// and stronger guarantees
///
/// This API is not guaranteed to be atomic, in particular
///
/// * On error, `location` may contain partial data
/// * Concurrent calls to [`ObjectStore::list`] may return partially written objects
/// * Concurrent calls to [`ObjectStore::get`] may return partially written data
/// * Concurrent calls to [`ObjectStore::put`] may result in data loss / corruption
/// * Concurrent calls to [`ObjectStore::append`] may result in data loss / corruption
///
/// Additionally some stores, such as Azure, may only support appending to objects created
/// with [`ObjectStore::append`], and not with [`ObjectStore::put`], [`ObjectStore::copy`], or
/// [`ObjectStore::put_multipart`]

cc @tustvold

@@ -194,7 +194,7 @@ impl<W: Write> Writer<W> {
}

/// A CSV writer builder
#[derive(Debug)]
#[derive(Clone, Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experience of target-specific feature selection has been that they cause very strange and hard to debug problems, the ahash getrandom nonsense is a frequent source of pain.

I wonder if we could instead add a default-disabled top-level feature, maybe append_local that gates this functionality? This would also help with flagging the functionality as not part of the standard object store contract.

@ozankabak
Copy link

IMO the target-specific selection in this case is straightforward/surgical and I don't immediately see any potential headaches related to this in the future. Given that the only target this doesn't support this is WASM, keeping the selection seems OK to me.

BTW, I am not categorically against the append_local idea as it doesn't really matter for Datafusion -- we will simply enable it there. However, it seems like an avoidable complexity in general. Our initial thinking was to remove the selection when support arrives, and we already discovered workarounds in case we want to support it sooner rather than later.

@alamb, @tustvold: Let us know what you think and we can finalize accordingly. Thanks for the reviews!

@alamb
Copy link
Contributor

alamb commented Mar 11, 2023

My experience of target-specific feature selection has been that they cause very strange and hard to debug problems, the ahash getrandom nonsense is a frequent source of pain.

I wonder if we could instead add a default-disabled top-level feature, maybe append_local that gates this functionality? This would also help with flagging the functionality as not part of the standard object store contract.

I am not sure about the WASM usecase in general for object_store (as in I don't really understand why it is an an important target to support for object_store at all). If we are looking for simplifications maybe we can remove that feature support instead of making a new feature flag 🤔

I agree with @ozankabak that making a new feature flag for append seems overly complicated. I think this would be fine to merge as is.

What do you think @tustvold ?

@tustvold
Copy link
Contributor

Fine with me, was just a suggestion.

@alamb alamb merged commit 9ce0ebb into apache:master Mar 11, 2023
@alamb
Copy link
Contributor

alamb commented Mar 11, 2023

Thanks everyone!

@ursabot
Copy link

ursabot commented Mar 11, 2023

Benchmark runs are scheduled for baseline = c96274a and contender = 9ce0ebb. 9ce0ebb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

metesynnada added a commit to synnada-ai/arrow-rs that referenced this pull request Mar 16, 2023
* Append Push API

* wasm is not enabled.
metesynnada added a commit to synnada-ai/arrow-rs that referenced this pull request Mar 16, 2023
* Append Push API

* wasm is not enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Async JSON Writer Support for Async CSV Writer
5 participants