spark: use hashset in column level lineage instead of iterating through linkedlist #2584

mobuchowski · 2024-04-05T16:17:31Z

The original implementation used a LinkedList, whose iteration performance is terrible. There's no real need for a list in that place; a HashSet is the more performant collection.
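As a rough illustration of the idea (the PR's actual diff is not shown here), the sketch below contrasts membership checks against a `LinkedList` with the same checks against a `HashSet`. The class `DependencyLookup`, its method names, and the use of `Long` ids are hypothetical stand-ins, not the integration's real code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

// Hypothetical example class; names are illustrative only.
public class DependencyLookup {

    // Counts how many candidates are present in a list.
    // List.contains walks the list element by element, so each lookup is O(n).
    static int countPresentInList(List<Long> dependencies, List<Long> candidates) {
        int present = 0;
        for (Long candidate : candidates) {
            if (dependencies.contains(candidate)) {
                present++;
            }
        }
        return present;
    }

    // Same count against a HashSet: contains is a hash lookup, O(1) on average.
    static int countPresentInSet(Set<Long> dependencies, List<Long> candidates) {
        int present = 0;
        for (Long candidate : candidates) {
            if (dependencies.contains(candidate)) {
                present++;
            }
        }
        return present;
    }

    public static void main(String[] args) {
        // Assumed sample data: expression ids with one duplicate.
        List<Long> asList = new LinkedList<>(Arrays.asList(1L, 2L, 3L, 2L));
        Set<Long> asSet = new HashSet<>(asList); // the duplicate collapses here
        List<Long> candidates = Arrays.asList(2L, 3L, 4L);

        System.out.println(countPresentInList(asList, candidates));
        System.out.println(countPresentInSet(asSet, candidates));
    }
}
```

Both methods return the same count; only the cost per lookup differs. Note that a `HashSet` also drops duplicates and does not preserve insertion order, so the swap is only safe where neither matters.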

harels

lgtm

spark: use hashset in column level lineage instead of iterating throu…

50afacd

…gh linkedlist Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>

mobuchowski requested review from harels, pawel-big-lebowski and tnazarew April 5, 2024 16:17

boring-cyborg bot added the area:integration/spark label Apr 5, 2024

harels approved these changes Apr 5, 2024

mobuchowski merged commit ccf2286 into main Apr 5, 2024
32 checks passed

mobuchowski deleted the spark-cll-speedup branch April 5, 2024 17:55
