Skip to content

Conversation

@Lordworms
Copy link
Contributor

Which issue does this PR close?

Closes #9564

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 25, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @Lordworms 🙏

I left a suggestion on how to clean up the code but I don't think it is required to merge this PR.

async fn test_disable_parallel_for_json_gz() -> Result<()> {
let config = SessionConfig::new()
.with_repartition_file_scans(true)
.with_repartition_file_min_size(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

let repartition_file_min_size = config.optimizer.repartition_file_min_size;
let repartition_file_min_size =
if self.file_compression_type == FileCompressionType::GZIP {
OptimizerOptions::default().repartition_file_min_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder it it would be simpler code to return None here

Suggested change
OptimizerOptions::default().repartition_file_min_size
None

That way the rest of the code could be left alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@alamb alamb changed the title Disable paralle reading for gziped ndjson file Disable parallel reading for gziped ndjson file Mar 27, 2024
target_partitions: usize,
config: &datafusion_common::config::ConfigOptions,
) -> Result<Option<Arc<dyn ExecutionPlan>>> {
if self.file_compression_type == FileCompressionType::GZIP {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Lordworms 🙏

Looks like there is a small CI failure otherwise this PR is ready to go

@alamb alamb merged commit ce3d446 into apache:main Mar 28, 2024
@alamb
Copy link
Contributor

alamb commented Mar 28, 2024

Thanks again @Lordworms and @Dandandan 🙏

Lordworms added a commit to Lordworms/arrow-datafusion that referenced this pull request Apr 1, 2024
* for debug

* disable paralle reading for gziped ndjson file

* directly return None

* delete .gz

* fix clippy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parallel read for json compressed files when it should not

3 participants