[feature](log) Add label and txn_id columns to audit_log for transactional SQL statements #63865

Open
heguanhui wants to merge 1 commit into apache:master from heguanhui:feature/audit-log-add-label-txn-id

@heguanhui
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: The audit_log internal table did not record transaction information (label and txn_id) for transactional SQL statements (INSERT/UPDATE/DELETE/MERGE_INTO). Users and DBAs could not tell which transaction a particular DML operation belonged to, which made it difficult to correlate audit log entries with transaction metadata. In addition, the old-style DELETE path (via DeleteHandler) did not call setOrUpdateInsertResult(), so label and txn_id would still be missing for those statements even with the new columns in place.

What is changed and how it works?

  1. Add label and txn_id fields to AuditEvent: two new public fields annotated with @AuditField, with corresponding builder setters.
  2. Populate label/txn_id in AuditLogHelper.logAuditLogImpl(): for transactional statement types only (INSERT/UPDATE/DELETE/MERGE_INTO), the values are read from ConnectContext.insertResult. Non-transactional statements (e.g., SELECT) get an empty label and a txn_id of -1.
  3. Add label and txn_id columns to InternalSchema.AUDIT_SCHEMA: label as varchar(128), txn_id as bigint. Existing audit_log tables will be auto-altered by InternalSchemaInitializer.alterAuditSchemaIfNeeded().
  4. Serialize label/txn_id in AuditLoader.fillLogBuffer(): append the two new fields to the CSV buffer used for the stream load into audit_log.
  5. Remove the duplicate label field from LoadAuditEvent and StreamLoadAuditEvent: these subclasses now inherit label from the base AuditEvent, so reflection via Class.getFields() no longer returns duplicate fields.
  6. Fix the old-style DELETE path in DeleteHandler.process(): call ctx.setOrUpdateInsertResult() after a successful commit so that label/txn_id are available for audit logging.
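
To illustrate why item 5 matters, here is a minimal, self-contained sketch of how Class.getFields() behaves when a subclass redeclares a public field that its base class already has. The class names below (BaseEventSketch, DuplicatingEventSketch, InheritingEventSketch, FieldReflectionSketch) are hypothetical stand-ins, not the actual Doris classes, and the sketch assumes only that the audit code enumerates public fields by reflection, as the description states.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for the base AuditEvent after this PR.
class BaseEventSketch {
    public String label = "";
    public long txnId = -1L;
}

// Redeclares `label`, hiding the inherited field (the situation before the cleanup).
class DuplicatingEventSketch extends BaseEventSketch {
    public String label = "";
}

// Simply inherits `label` from the base class (the situation after the cleanup).
class InheritingEventSketch extends BaseEventSketch {
}

class FieldReflectionSketch {
    // Counts how many public fields with the given name Class.getFields() reports.
    static int countPublicFields(Class<?> type, String name) {
        int count = 0;
        for (Field field : type.getFields()) {
            if (field.getName().equals(name)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The hiding subclass exposes both its own field and the hidden base field.
        System.out.println(countPublicFields(DuplicatingEventSketch.class, "label"));
        // The inheriting subclass exposes only the base field.
        System.out.println(countPublicFields(InheritingEventSketch.class, "label"));
    }
}
```

Any code that builds one output column per reflected field would see label twice for the hiding subclass, which is presumably the duplication the PR avoids by dropping the subclass fields.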

Release note

Add label (varchar(128)) and txn_id (bigint) columns to the audit_log internal table. These columns record the transaction label and transaction ID for INSERT, UPDATE, DELETE, and MERGE INTO statements, enabling better transaction tracing and audit analysis.
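
The rule stated in the release note, combined with the defaults given in the change list (empty label and txn_id of -1 for non-transactional statements), could be sketched roughly as follows. This is not the code in AuditLogHelper.logAuditLogImpl(): StmtType, TxnInfo, and auditTxnInfo are hypothetical names introduced for illustration, and the null check on the insert result is an assumption about how a missing result would be handled.

```java
import java.util.EnumSet;
import java.util.Set;

class AuditTxnSketch {
    // Hypothetical statement categories; the real FE derives the type from the parsed statement.
    enum StmtType { INSERT, UPDATE, DELETE, MERGE_INTO, SELECT }

    // Hypothetical holder for the values the PR reads from ConnectContext.insertResult.
    static final class TxnInfo {
        final String label;
        final long txnId;

        TxnInfo(String label, long txnId) {
            this.label = label;
            this.txnId = txnId;
        }
    }

    // Defaults described in the PR for statements without transaction info.
    static final TxnInfo NO_TXN = new TxnInfo("", -1L);

    static final Set<StmtType> TRANSACTIONAL =
            EnumSet.of(StmtType.INSERT, StmtType.UPDATE, StmtType.DELETE, StmtType.MERGE_INTO);

    // Returns the values to record in the audit_log label and txn_id columns.
    static TxnInfo auditTxnInfo(StmtType type, TxnInfo insertResult) {
        // Assumption: a missing insert result falls back to the same defaults.
        if (!TRANSACTIONAL.contains(type) || insertResult == null) {
            return NO_TXN;
        }
        return insertResult;
    }

    public static void main(String[] args) {
        TxnInfo result = new TxnInfo("example_label", 1001L);
        TxnInfo forInsert = auditTxnInfo(StmtType.INSERT, result);
        TxnInfo forSelect = auditTxnInfo(StmtType.SELECT, result);
        System.out.println(forInsert.label + " " + forInsert.txnId);
        System.out.println("[" + forSelect.label + "] " + forSelect.txnId);
    }
}
```

Keeping the set of transactional statement types in one place makes it easy to see which statements carry values and which fall back to the defaults.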

Check List (For Author)

  • Test: Regression test / Unit Test
    • Regression test: test_audit_log_label_txn_id.groovy (INSERT with/without label, UPDATE, DELETE, SELECT)
    • Unit Test: AuditLogHelperTest (4 tests), AuditLogBuilderTest (8 tests), AuditEventProcessorTest, InternalSchemaInitializerTest, InternalSchemaAlterTest, AuditLogWorkloadGroupTest (20 tests total, all passed)
  • Behavior changed: Yes — audit_log table now has two additional columns; old-style DELETE now populates insertResult
  • Does this need documentation: Yes (will add separately)

Check List (For Reviewer)

  • I have added test cases for this bug fix or new feature
  • This PR will not cause performance regression
  • This PR will not break existing features

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@heguanhui
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31029 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 32708331fac5f1d28ec4781cc4e9a16a5f8e3014, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17820	3981	3920	3920
q2	
q3	10812	1380	803	803
q4	4682	473	351	351
q5	7586	2237	2071	2071
q6	377	172	135	135
q7	949	791	626	626
q8	9541	1756	1563	1563
q9	7130	4964	4941	4941
q10	6443	2226	1836	1836
q11	445	266	236	236
q12	691	430	295	295
q13	18238	3349	2786	2786
q14	254	259	237	237
q15	
q16	819	776	705	705
q17	996	912	851	851
q18	6917	5933	5578	5578
q19	1185	1195	1069	1069
q20	523	396	258	258
q21	5797	2563	2461	2461
q22	421	354	307	307
Total cold run time: 101626 ms
Total hot run time: 31029 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4331	4238	4259	4238
q2	
q3	4550	4940	4304	4304
q4	2055	2202	1360	1360
q5	4393	4292	4702	4292
q6	257	203	142	142
q7	2067	1798	1551	1551
q8	2464	2100	2152	2100
q9	8009	8003	7940	7940
q10	4892	4713	4437	4437
q11	575	425	370	370
q12	771	749	528	528
q13	3251	3613	2941	2941
q14	302	303	276	276
q15	
q16	710	727	632	632
q17	1343	1312	1334	1312
q18	7990	7465	6748	6748
q19	1129	1089	1082	1082
q20	2219	2223	1939	1939
q21	5200	4505	4395	4395
q22	516	446	439	439
Total cold run time: 57024 ms
Total hot run time: 51026 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171053 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 32708331fac5f1d28ec4781cc4e9a16a5f8e3014, data reload: false

query5	4326	645	512	512
query6	340	216	199	199
query7	4224	524	313	313
query8	367	243	227	227
query9	8802	4001	3977	3977
query10	448	334	300	300
query11	5751	2402	2189	2189
query12	177	127	120	120
query13	1302	573	430	430
query14	6096	5442	5133	5133
query14_1	4427	4473	4438	4438
query15	211	206	184	184
query16	1005	454	418	418
query17	1113	736	581	581
query18	2430	479	360	360
query19	217	208	169	169
query20	134	128	128	128
query21	215	144	115	115
query22	13615	13569	13458	13458
query23	17287	16625	16270	16270
query23_1	16309	16362	16490	16362
query24	7573	1760	1307	1307
query24_1	1315	1299	1317	1299
query25	592	513	443	443
query26	1314	320	175	175
query27	2731	588	349	349
query28	4470	2021	2005	2005
query29	990	673	520	520
query30	307	247	203	203
query31	1131	1071	976	976
query32	91	76	75	75
query33	559	355	306	306
query34	1190	1116	660	660
query35	775	794	699	699
query36	1416	1427	1225	1225
query37	157	107	92	92
query38	3230	3181	3075	3075
query39	919	911	902	902
query39_1	904	886	862	862
query40	236	147	127	127
query41	71	73	68	68
query42	110	110	122	110
query43	326	334	284	284
query44	
query45	219	207	199	199
query46	1074	1182	719	719
query47	2385	2384	2264	2264
query48	398	410	313	313
query49	656	510	430	430
query50	962	369	255	255
query51	4336	4307	4395	4307
query52	106	108	93	93
query53	257	276	207	207
query54	331	283	271	271
query55	95	93	87	87
query56	313	319	310	310
query57	1427	1410	1334	1334
query58	313	275	270	270
query59	1599	1696	1425	1425
query60	320	320	294	294
query61	159	155	153	153
query62	704	647	585	585
query63	238	192	206	192
query64	2388	809	649	649
query65	
query66	1707	489	350	350
query67	29986	29725	29519	29519
query68	
query69	440	341	299	299
query70	1018	989	1004	989
query71	294	271	263	263
query72	3062	2705	2385	2385
query73	822	734	409	409
query74	5146	4943	4770	4770
query75	2698	2607	2292	2292
query76	2302	1137	755	755
query77	410	417	330	330
query78	12471	12423	11922	11922
query79	1441	1014	744	744
query80	1378	543	451	451
query81	489	280	238	238
query82	1352	161	121	121
query83	337	274	246	246
query84	262	145	112	112
query85	928	537	451	451
query86	449	352	311	311
query87	3439	3359	3192	3192
query88	3589	2692	2694	2692
query89	467	382	346	346
query90	1849	178	180	178
query91	179	178	163	163
query92	75	78	75	75
query93	1511	1467	892	892
query94	690	365	274	274
query95	688	462	339	339
query96	1007	781	357	357
query97	2732	2755	2645	2645
query98	236	224	224	224
query99	1222	1169	1028	1028
Total cold run time: 255125 ms
Total hot run time: 171053 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 96.15% (25/26) 🎉
Increment coverage report
Complete coverage report
