Return num_affected_rows from sql INSERT statement #15583

@hudi-bot

Description

Currently, when running Spark SQL DML, users who want to check how many rows were affected have to dig into the commit stats using the Hudi CLI or a stored procedure.

We can improve the user experience by returning num_affected_rows from the INSERT INTO command, so that Spark SQL users can easily see how many rows were inserted without needing to inspect the commits themselves.

num_affected_rows can be extracted in the writer itself from the commitMetadata.

Example:
{code:java}
spark.sql("""
create table test_mor (id int, name string)
using hudi
tblproperties (primaryKey = 'id', type='mor');
""")

spark.sql(
"""
INSERT INTO test_mor
VALUES 
(1, "a"),
(2, "b"),
(3, "c"),
(4, "d"),
(5, "e"),
(6, "f"),
(7, "g")
""").show()

returns:
+-----------------+
|num_affected_rows|
+-----------------+
|                7|
+-----------------+
{code}
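The aggregation described above can be sketched roughly as follows. This is a minimal, hedged illustration in Python that operates on a plain-dict rendering of the commit metadata; the `partitionToWriteStats` and `numInserts` field names mirror Hudi's HoodieCommitMetadata / HoodieWriteStat shape but are assumptions here, not the actual writer API:

```python
def num_affected_rows(commit_metadata):
    """Sum inserted-record counts across all partitions' write stats.

    commit_metadata is assumed to be a dict with a "partitionToWriteStats"
    mapping of partition path -> list of per-file write stats, each carrying
    a "numInserts" count (hypothetical field names modeled on Hudi's
    commit metadata; the real extraction would read HoodieCommitMetadata).
    """
    return sum(
        stat.get("numInserts", 0)
        for stats in commit_metadata.get("partitionToWriteStats", {}).values()
        for stat in stats
    )


# Example metadata matching the 7-row INSERT above, split across partitions.
metadata = {
    "partitionToWriteStats": {
        "part=0": [{"numInserts": 4}],
        "part=1": [{"numInserts": 2}, {"numInserts": 1}],
    }
}
print(num_affected_rows(metadata))  # 7
```

The writer would perform this sum once per commit and surface it as the single-row `num_affected_rows` result shown above, similar to what other Spark SQL data sources return for DML.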
