Title
Implement Native Vectorized overlay Function for Replacing Substrings in Strings
Abstract
Introduce a native, vectorized overlay function for replacing substrings within strings, aligned with Apache Spark SQL semantics. overlay(input, replace, pos[, len]) replaces the portion of input that starts at the 1-based position pos and spans len characters with replace. When len is omitted, it defaults to the length of replace. The function should support the common string types and is intended to improve performance on large datasets.
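To make the intended semantics concrete, here is a minimal sketch in plain Rust. It assumes Spark's documented behavior: the result is the prefix before pos, then replace, then the rest of input after pos + len. Positions count Unicode characters, and a negative len falls back to the length of replace, which is also the value Spark uses when len is omitted. The names `substring_sql`, `overlay`, and `overlay_batch` are hypothetical. Plain slices of `Option` stand in for Arrow arrays, with a null in any argument producing a null result. This is a row-at-a-time columnar loop, not the final DataFusion kernel.

```rust
/// Hypothetical helper mirroring Spark's SQL-style substring on characters:
/// pos > 0 is 1-based, pos < 0 counts from the end, pos == 0 means the first char.
fn substring_sql(chars: &[char], pos: i64, length: i64) -> String {
    let n = chars.len() as i64;
    let start = if pos > 0 { pos - 1 } else if pos < 0 { n + pos } else { 0 };
    // Saturate so that "take everything" (i64::MAX) cannot overflow.
    let end = start.saturating_add(length).min(n);
    let start = start.max(0);
    if start >= end {
        return String::new();
    }
    chars[start as usize..end as usize].iter().collect()
}

/// Hypothetical scalar overlay: prefix + replace + suffix.
/// A negative `len` means "use the character length of `replace`".
fn overlay(input: &str, replace: &str, pos: i64, len: i64) -> String {
    let chars: Vec<char> = input.chars().collect();
    let length = if len >= 0 { len } else { replace.chars().count() as i64 };
    let mut out = substring_sql(&chars, 1, pos.saturating_sub(1));
    out.push_str(replace);
    out.push_str(&substring_sql(&chars, pos.saturating_add(length), i64::MAX));
    out
}

/// Hypothetical columnar driver: evaluates overlay row by row and
/// returns None for any row where an argument is null.
fn overlay_batch(
    input: &[Option<&str>],
    replace: &[Option<&str>],
    pos: &[Option<i64>],
    len: &[Option<i64>],
) -> Vec<Option<String>> {
    input
        .iter()
        .zip(replace)
        .zip(pos)
        .zip(len)
        .map(|(((i, r), p), l)| match (i, r, p, l) {
            (Some(i), Some(r), Some(p), Some(l)) => Some(overlay(i, r, *p, *l)),
            _ => None,
        })
        .collect()
}

fn main() {
    println!("{}", overlay("Spark SQL", "_", 6, -1)); // Spark_SQL
    println!("{}", overlay("Spark SQL", "CORE", 7, -1)); // Spark CORE
    println!("{}", overlay("Spark SQL", "ANSI ", 7, 0)); // Spark ANSI SQL
    println!("{}", overlay("Spark SQL", "tructured", 2, 4)); // Structured SQL
    println!("{:?}", overlay_batch(&[Some("Spark SQL"), None], &[Some("_"), Some("x")],
                                   &[Some(6), Some(1)], &[Some(-1), Some(-1)]));
}
```

The four scalar calls correspond to the Spark SQL forms overlay('Spark SQL' PLACING '_' FROM 6), PLACING 'CORE' FROM 7, PLACING 'ANSI ' FROM 7 FOR 0, and PLACING 'tructured' FROM 2 FOR 4. Indexing by `char` keeps multi-byte UTF-8 input correct at the cost of collecting each row into a `Vec<char>`. A production kernel would more likely walk byte offsets directly over the Arrow buffers.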
Background and Motivation
overlay is a commonly used string manipulation function in Spark SQL that replaces part of a string starting at a specified position. It is widely used in text processing and data cleaning. By building on DataFusion's native execution, we aim to implement a vectorized overlay function for Spark, improving performance and reducing resource consumption.