
[SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation #47424

Closed
vladimirg-db wants to merge 3 commits into apache:master from vladimirg-db:vladimirg-db/get-rid-of-linear-searches-preprocess-table-creation

@vladimirg-db
Contributor

What changes were proposed in this pull request?

Use HashSet/HashMap lookups instead of linear searches over the Seq. With thousands of partitions, this significantly improves performance.

Why are the changes needed?

To avoid the O(n*m) passes in PreprocessTableCreation.
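
To illustrate the general idea, here is a minimal sketch and not the actual code from this PR. The object and method names (`PartitionLookupSketch`, `missingLinear`, `missingHashed`, `positionsHashed`) are hypothetical, and it assumes column names are unique and compared as exact, case-sensitive strings, which may differ from how Spark resolves names.

```scala
import scala.collection.immutable.{HashMap, HashSet}

// Hypothetical illustration only; none of these names come from Spark.
object PartitionLookupSketch {

  // Linear version: each of the m partition columns triggers a scan of
  // the n schema fields, so the total cost is O(n*m).
  def missingLinear(schemaFields: Seq[String], partitionCols: Seq[String]): Seq[String] =
    partitionCols.filterNot(p => schemaFields.contains(p))

  // Hashed version: build the set once in O(n), then each of the m
  // membership checks is expected O(1), for O(n + m) overall.
  def missingHashed(schemaFields: Seq[String], partitionCols: Seq[String]): Seq[String] = {
    val fieldSet = HashSet(schemaFields: _*)
    partitionCols.filterNot(p => fieldSet.contains(p))
  }

  // HashMap variant: when the position of a field is needed rather than
  // mere membership, index the names once and look each one up.
  def positionsHashed(schemaFields: Seq[String], partitionCols: Seq[String]): Seq[Option[Int]] = {
    val indexByName = HashMap(schemaFields.zipWithIndex: _*)
    partitionCols.map(p => indexByName.get(p))
  }
}
```

Both `missingLinear` and `missingHashed` return the same result for the same inputs; only the lookup cost differs, which is where the gain for tables with thousands of partitions would come from.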

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Member

I think this may be related to the read/write permissions for GitHub Actions in your fork.

@github-actions github-actions bot added INFRA and removed INFRA labels Jul 23, 2024
@github-actions github-actions bot added the INFRA label Jul 24, 2024
@github-actions github-actions bot removed the INFRA label Jul 24, 2024
@vladimirg-db vladimirg-db closed this by deleting the head repository Jul 25, 2024
@vladimirg-db
Contributor Author

Recreated my fork again... Also deleted apache-spark-ci-image
#47484
