Currently, when running Spark SQL DML, users who want to check how many rows were affected have to look up the commit stats via the Hudi CLI or a stored procedure.
We can improve the user experience by returning num_affected_rows after an INSERT INTO command, so that Spark SQL users can easily see how many rows were inserted without having to inspect the commits themselves.
num_affected_rows can be extracted in the writer itself from the commitMetadata.
Example:
{code:java}
spark.sql("""
create table test_mor (id int, name string)
using hudi
tblproperties (primaryKey = 'id', type='mor');
""")
spark.sql(
"""
INSERT INTO test_mor
VALUES
(1, "a"),
(2, "b"),
(3, "c"),
(4, "d"),
(5, "e"),
(6, "f"),
(7, "g")
""").show()
returns:
+-----------------+
|num_affected_rows|
+-----------------+
| 7|
+-----------------+
{code}
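On the write path, the count can be derived by aggregating the per-file write stats recorded in the commit metadata. The sketch below is a simplified illustration with stand-in types (the real Hudi classes are HoodieCommitMetadata and HoodieWriteStat; field and method names here are hypothetical), showing the kind of aggregation the writer would do before surfacing num_affected_rows:
{code:java}
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Hudi's HoodieCommitMetadata / HoodieWriteStat,
// illustrating where num_affected_rows could come from in the writer.
public class NumAffectedRows {
    // Per-file write statistics (hypothetical, minimal subset).
    record WriteStat(long numWrites) {}

    // Commit metadata mapping partition path -> write stats for that partition.
    record CommitMetadata(Map<String, List<WriteStat>> partitionToWriteStats) {
        // Total rows written across all partitions/files in this commit.
        long totalRecordsWritten() {
            return partitionToWriteStats.values().stream()
                    .flatMap(List::stream)
                    .mapToLong(WriteStat::numWrites)
                    .sum();
        }
    }

    public static void main(String[] args) {
        CommitMetadata metadata = new CommitMetadata(Map.of(
                "2024/01/01", List.of(new WriteStat(4)),
                "2024/01/02", List.of(new WriteStat(3))
        ));
        // The writer would surface this value as the num_affected_rows column.
        System.out.println(metadata.totalRecordsWritten()); // prints 7
    }
}
{code}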