[SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation by vladimirg-db · Pull Request #47424 · apache/spark

vladimirg-db · 2024-07-19T14:40:13Z

What changes were proposed in this pull request?

Use HashSet/HashMap instead of linear searches over a Seq. With thousands of partitions, this significantly improves performance.
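For illustration only, here is a minimal sketch of the general pattern (not the actual diff from this PR). The `Field` case class and both helper names are hypothetical stand-ins, and names are assumed to match exactly, ignoring any case-sensitivity handling Spark may apply.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a schema field; this is not Spark's StructField.
final case class Field(name: String)

object MembershipSketch {
  // Linear version: every contains() call scans partitionColumns, so the
  // filter costs O(fields * partitionColumns) comparisons overall.
  def nonPartitionFieldsLinear(fields: Seq[Field], partitionColumns: Seq[String]): Seq[Field] =
    fields.filterNot(f => partitionColumns.contains(f.name))

  // Hashed version: build the set once, then each membership check is
  // expected O(1), for O(fields + partitionColumns) overall.
  def nonPartitionFieldsHashed(fields: Seq[Field], partitionColumns: Seq[String]): Seq[Field] = {
    val partitionSet = mutable.HashSet.empty[String]
    partitionSet ++= partitionColumns
    fields.filterNot(f => partitionSet.contains(f.name))
  }
}
```

Both helpers return the same fields in the same order; only the cost of each membership check differs.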

Why are the changes needed?

To avoid the O(n*m) passes in PreprocessTableCreation.
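As a rough sketch of where an O(n*m) cost can come from, assume n schema columns and m partition column names, each resolved with `find`. The `Column` type and both helpers below are hypothetical and are not taken from PreprocessTableCreation.

```scala
import scala.collection.mutable

// Hypothetical column description used only for this sketch.
final case class Column(name: String, dataType: String)

object LookupSketch {
  // O(n*m): for each of the m partition names, find() scans up to n columns.
  def resolveLinear(schema: Seq[Column], partitionNames: Seq[String]): Seq[Column] =
    partitionNames.flatMap(name => schema.find(_.name == name))

  // Expected O(n + m): index the schema once, then do one hash lookup per name.
  def resolveHashed(schema: Seq[Column], partitionNames: Seq[String]): Seq[Column] = {
    val byName = mutable.HashMap.empty[String, Column]
    // Keep the first column seen for each name so the result matches find(),
    // which returns the first match.
    schema.foreach(c => if (!byName.contains(c.name)) byName.update(c.name, c))
    partitionNames.flatMap(name => byName.get(name))
  }
}
```

The hashed variant spends O(n) extra memory on the index, which is usually a reasonable trade when both n and m are in the thousands.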

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

HyukjinKwon · 2024-07-20T00:51:45Z

I think this may be related to the read/write permissions for GitHub Actions in your fork.

vladimirg-db · 2024-07-25T08:19:25Z

Recreated my fork again. Also deleted apache-spark-ci-image.
#47484

github-actions bot added the SQL label Jul 19, 2024

vladimirg-db mentioned this pull request Jul 19, 2024

[SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation #47384

Closed

github-actions bot added INFRA and removed INFRA labels Jul 23, 2024

vladimirg-db and others added 2 commits July 23, 2024 12:02

Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation

4c6c325

Update build_main.yml

cae1f04

github-actions bot added the INFRA label Jul 24, 2024

Update build_main.yml

942ff5a

github-actions bot removed the INFRA label Jul 24, 2024

vladimirg-db closed this by deleting the head repository Jul 25, 2024

vladimirg-db wants to merge 3 commits into apache:master from vladimirg-db:vladimirg-db/get-rid-of-linear-searches-preprocess-table-creation
