Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Dataset] arrow::dataset::Partitioning::Default() can't be used for writing dataset #15256

Closed
kou opened this issue Jan 8, 2023 · 3 comments · Fixed by #33674
Closed
Assignees
Milestone

Comments

@kou
Copy link
Member

kou commented Jan 8, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Because arrow::dataset::DefaultPartitioning::Format() isn't implemented:

Result<PartitionPathFormat> Format(const compute::Expression& expr) const override {
return Status::NotImplemented("formatting paths from ", type_name(),
" Partitioning");
}

It's required in WriteBatch():

ARROW_ASSIGN_OR_RAISE(destination,
write_options.partitioning->Format(partition_expression));

Is it expected that we can't use arrow::dataset::Partitioning::Default() for writing dataset?
If it's expected, how about removing arrow::dataset::Partitioning::Default() because it's useless?
If it's not expected, how about implementing arrow::dataset::DefaultPartitioning::Format() like the following?

diff --git a/cpp/src/arrow/dataset/partition.cc b/cpp/src/arrow/dataset/partition.cc
index 46cdf9023c..13add35fb8 100644
--- a/cpp/src/arrow/dataset/partition.cc
+++ b/cpp/src/arrow/dataset/partition.cc
@@ -90,8 +90,7 @@ std::shared_ptr<Partitioning> Partitioning::Default() {
     }
 
     Result<PartitionPathFormat> Format(const compute::Expression& expr) const override {
-      return Status::NotImplemented("formatting paths from ", type_name(),
-                                    " Partitioning");
+      return PartitionPathFormat{"", ""};
     }
 
     Result<PartitionedBatches> Partition(

Component(s)

C++

@kou kou added the Type: bug label Jan 8, 2023
@westonpace
Copy link
Member

https://issues.apache.org/jira/browse/ARROW-15409 is probably related.

@kou
Copy link
Member Author

kou commented Jan 11, 2023

Thanks! I didn't notice it.

It seems that we want to use arrow::dataset::Partitioning::Default() for no partition write. (All data are written to the base directory.)

@westonpace
Copy link
Member

westonpace commented Jan 11, 2023

It seems that we want to use arrow::dataset::Partitioning::Default() for no partition write. (All data are written to the base directory.)

I agree. Either Partitioning::Default means "no partitioning" or we get rid of Partitioning::Default and make partitioning optional anywhere it is specified. I think your suggestion (Partitioning::Default means "all in one directory") is the easiest.

kou added a commit to kou/arrow that referenced this issue Jan 15, 2023
…ing::Default()

It writes all data into one directory.
kou added a commit to kou/arrow that referenced this issue Mar 31, 2023
…ing::Default()

It writes all data into one directory.
kou added a commit to kou/arrow that referenced this issue Apr 4, 2023
…ing::Default()

It writes all data into one directory.
westonpace pushed a commit that referenced this issue Apr 5, 2023
…efault() (#33674)

### What changes are included in this PR?

It writes all data into one directory.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #15256

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace added this to the 12.0.0 milestone Apr 5, 2023
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…ing::Default() (apache#33674)

### What changes are included in this PR?

It writes all data into one directory.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: apache#15256

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…ing::Default() (apache#33674)

### What changes are included in this PR?

It writes all data into one directory.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: apache#15256

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants