Closed
Labels
area:core, kind:bug, needs-triage
Description
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.7.2
What happened?
I am encountering an unexpected issue with string concatenation in Airflow 2.7.2 using Python 3.11.5. The issue only occurs in Airflow, while the same string concatenation works correctly in local unit tests.
What you think should happen instead?
The concatenated string values_str_3 should be correctly formatted and match the output of values_str and values_str_2
How to reproduce
Use the following code snippet in an Airflow DAG or script:
import logging

# 1234 and 5678 are ints here: the logs below show them unquoted, like 42.
values = [1234, 5678, 'ABC_123', 'xyz-calc', '2024-01-01', 'NULL', '9876', 'NULL', 'example', 42, '2024-07-28T01:23:45.678', '2024-07-28T02:34:56.789', '2024-07-28T03:45:67.890', 'user_test', 'complete', '2024-07-28T04:56:78.901', '2024-07-28T05:67:89.012', 'NULL', 'spark-calc-1234-driver', 'NULL', 'NULL', 'XYZ']
values_str_list = []
for value in values:
    if isinstance(value, int):
        values_str_list.append(str(value))
    elif value == 'NULL':
        values_str_list.append('NULL')
    else:
        values_str_list.append(f"'{value}'")
values_str = ','.join(values_str_list) # This concatenation works correctly
print("Concatenated string values_str:")
print(values_str)
values_str_2 = ', '.join(values_str_list) # This concatenation works correctly
print("Concatenated string values_str_2:")
print(values_str_2)
values_str_3 = ',\n '.join(values_str_list) # This concatenation does NOT work correctly
print("Concatenated string values_str_3:")
print(values_str_3)
logging.info("Concatenated string values_str_3:")
logging.info(values_str_3)
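The concatenated string values_str_3 is rendered incorrectly in the Airflow logs: the output below shows 21 entries instead of 22, and one of the two consecutive NULL entries before 'XYZ' is missing.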
Logs:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Concatenated string values_str:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - 1234,5678,'ABC_123','xyz-calc','2024-01-01',NULL,'9876',NULL,'example',42,'2024-07-28T01:23:45.678','2024-07-28T02:34:56.789','2024-07-28T03:45:67.890','user_test','complete','2024-07-28T04:56:78.901','2024-07-28T05:67:89.012',NULL,'spark-calc-1234-driver',NULL,NULL,'XYZ'
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Concatenated string values_str_2:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - 1234, 5678, 'ABC_123', 'xyz-calc', '2024-01-01', NULL, '9876', NULL, 'example', 42, '2024-07-28T01:23:45.678', '2024-07-28T02:34:56.789', '2024-07-28T03:45:67.890', 'user_test', 'complete', '2024-07-28T04:56:78.901', '2024-07-28T05:67:89.012', NULL, 'spark-calc-1234-driver', NULL, NULL, 'XYZ'
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Concatenated string values_str_3:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - 1234,
5678,
'ABC_123',
'xyz-calc',
'2024-01-01',
NULL,
'9876',
NULL,
'example',
42,
'2024-07-28T01:23:45.678',
'2024-07-28T02:34:56.789',
'2024-07-28T03:45:67.890',
'user_test',
'complete',
'2024-07-28T04:56:78.901',
'2024-07-28T05:67:89.012',
NULL,
'spark-calc-1234-driver',
NULL,
'XYZ'
[2024-07-29, 22:12:39 UTC] {2044_16_subscription.py:471} INFO - Concatenated string values_str_3:
[2024-07-29, 22:12:39 UTC] {2044_16_subscription.py:472} INFO - 1234,
5678,
'ABC_123',
'xyz-calc',
'2024-01-01',
NULL,
'9876',
NULL,
'example',
42,
'2024-07-28T01:23:45.678',
'2024-07-28T02:34:56.789',
'2024-07-28T03:45:67.890',
'user_test',
'complete',
'2024-07-28T04:56:78.901',
'2024-07-28T05:67:89.012',
NULL,
'spark-calc-1234-driver',
NULL,
'XYZ'
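To tell whether the string itself is wrong or only its multi-line rendering, one option is to log its repr() and its item count, both of which fit on a single log line. This is a minimal sketch with a hypothetical short sample (sample_parts is illustrative, not the data from this report):

```python
import logging

# Hypothetical short sample; illustrative only, not the reporter's data.
sample_parts = ["1234", "NULL", "NULL", "'XYZ'"]
sample_str = ",\n ".join(sample_parts)

# repr() escapes the newlines, so the whole value is emitted as one log line.
one_line = repr(sample_str)
logging.info("values_str_3 repr: %s", one_line)

# Counting separators gives the number of items the string really holds.
item_count = sample_str.count(",\n ") + 1
logging.info("values_str_3 item count: %d", item_count)
```

If the single-line repr shows every entry while the multi-line output does not, the string is intact and the loss happens somewhere in how the log lines are stored or displayed.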
Operating System
Linux
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else?
Interestingly, when the concatenated string is split, the result is correct:
print("Original string values_str:", values_str.split(','))
print("Original string values_str_2:", values_str_2.split(', '))
print("Original string values_str_3:", values_str_3.split(',\n '))
Logs:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Original string values_str: ['1234', '5678', "'ABC_123'", "'xyz-calc'", "'2024-01-01'", 'NULL', "'9876'", 'NULL', "'example'", '42', "'2024-07-28T01:23:45.678'", "'2024-07-28T02:34:56.789'", "'2024-07-28T03:45:67.890'", "'user_test'", "'complete'", "'2024-07-28T04:56:78.901'", "'2024-07-28T05:67:89.012'", 'NULL', "'spark-calc-1234-driver'", 'NULL', 'NULL', "'XYZ'"]
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Original string values_str_2: ['1234', '5678', "'ABC_123'", "'xyz-calc'", "'2024-01-01'", 'NULL', "'9876'", 'NULL', "'example'", 42, "'2024-07-28T01:23:45.678'", "'2024-07-28T02:34:56.789'", "'2024-07-28T03:45:67.890'", "'user_test'", "'complete'", "'2024-07-28T04:56:78.901'", "'2024-07-28T05:67:89.012'", 'NULL', "'spark-calc-1234-driver'", 'NULL', 'NULL', "'XYZ'"]
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Original string values_str_3: ['1234', '5678', "'ABC_123'", "'xyz-calc'", "'2024-01-01'", 'NULL', "'9876'", 'NULL', "'example'", '42', "'2024-07-28T01:23:45.678'", "'2024-07-28T02:34:56.789'", "'2024-07-28T03:45:67.890'", "'user_test'", "'complete'", "'2024-07-28T04:56:78.901'", "'2024-07-28T05:67:89.012'", 'NULL', "'spark-calc-1234-driver'", 'NULL', 'NULL', "'XYZ'"]
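The split output holds all 22 items, while the multi-line log output shows 21, and the entry that disappears is the second of two adjacent NULL lines. One possible explanation, which nothing in this report confirms, is that some step in the log display path drops a line that is identical to the line before it. The sketch below only simulates that hypothesis on a hypothetical short sample; drop_consecutive_duplicates is an illustrative helper, not an Airflow function.

```python
def drop_consecutive_duplicates(lines):
    """Keep a line only if it differs from the line right before it."""
    kept = []
    last = None
    for line in lines:
        if line != last:
            kept.append(line)
        last = line
    return kept


# Hypothetical short sample with two adjacent NULL entries.
parts = ["1234", "NULL", "'example'", "NULL", "NULL", "'XYZ'"]
joined = ",\n ".join(parts)

# The in-memory string still splits back into every item.
recovered = joined.split(",\n ")

# Line-by-line view, before and after the simulated de-duplication.
original_lines = joined.split("\n")
displayed_lines = drop_consecutive_duplicates(original_lines)

print(len(recovered), len(original_lines), len(displayed_lines))
```

In this simulation the two NULL entries separated by 'example' both survive and only the adjacent repeat is lost, which is the same pattern as in the logs above. With ',' or ', ' as the separator everything sits on one line, so there are no repeated lines to drop.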
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct