Unexpected String Concatenation Issue in Airflow 2.7.2 #41224

@artemSSSS

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.7.2

What happened?

I am encountering an unexpected issue with string concatenation in Airflow 2.7.2 using Python 3.11.5. The issue only occurs in Airflow, while the same string concatenation works correctly in local unit tests.

What you think should happen instead?

The concatenated string values_str_3 should be correctly formatted and contain the same elements as values_str and values_str_2.

How to reproduce

Use the following code snippet in an Airflow DAG or script:

import logging

values = ['1234', '5678', 'ABC_123', 'xyz-calc', '2024-01-01', 'NULL', '9876', 'NULL', 'example', 42, '2024-07-28T01:23:45.678', '2024-07-28T02:34:56.789', '2024-07-28T03:45:67.890', 'user_test', 'complete', '2024-07-28T04:56:78.901', '2024-07-28T05:67:89.012', 'NULL', 'spark-calc-1234-driver', 'NULL', 'NULL', 'XYZ']

values_str_list = []
for value in values:
    if isinstance(value, int):
        values_str_list.append(str(value))
    elif value == 'NULL':
        values_str_list.append('NULL')
    else:
        values_str_list.append(f"'{value}'")

values_str = ','.join(values_str_list)  # This concatenation works correctly
print("Concatenated string values_str:")
print(values_str)

values_str_2 = ', '.join(values_str_list)  # This concatenation works correctly
print("Concatenated string values_str_2:")
print(values_str_2)

values_str_3 = ',\n    '.join(values_str_list)  # This concatenation does NOT work correctly
print("Concatenated string values_str_3:")
print(values_str_3)
logging.info("Concatenated string values_str_3:")
logging.info(values_str_3)

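The concatenated string values_str_3 is incorrectly formatted in the Airflow logs: unlike values_str and values_str_2, its output shows only one NULL between 'spark-calc-1234-driver' and 'XYZ', although the list contains two.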

Logs:

[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Concatenated string values_str:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - 1234,5678,'ABC_123','xyz-calc','2024-01-01',NULL,'9876',NULL,'example',42,'2024-07-28T01:23:45.678','2024-07-28T02:34:56.789','2024-07-28T03:45:67.890','user_test','complete','2024-07-28T04:56:78.901','2024-07-28T05:67:89.012',NULL,'spark-calc-1234-driver',NULL,NULL,'XYZ'
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Concatenated string values_str_2:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - 1234, 5678, 'ABC_123', 'xyz-calc', '2024-01-01', NULL, '9876', NULL, 'example', 42, '2024-07-28T01:23:45.678', '2024-07-28T02:34:56.789', '2024-07-28T03:45:67.890', 'user_test', 'complete', '2024-07-28T04:56:78.901', '2024-07-28T05:67:89.012', NULL, 'spark-calc-1234-driver', NULL, NULL, 'XYZ'
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Concatenated string values_str_3:
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - 1234,
    5678,
    'ABC_123',
    'xyz-calc',
    '2024-01-01',
    NULL,
    '9876',
    NULL,
    'example',
    42,
    '2024-07-28T01:23:45.678',
    '2024-07-28T02:34:56.789',
    '2024-07-28T03:45:67.890',
    'user_test',
    'complete',
    '2024-07-28T04:56:78.901',
    '2024-07-28T05:67:89.012',
    NULL,
    'spark-calc-1234-driver',
    NULL,
    'XYZ'
[2024-07-29, 22:12:39 UTC] {2044_16_subscription.py:471} INFO - Concatenated string values_str_3:
[2024-07-29, 22:12:39 UTC] {2044_16_subscription.py:472} INFO - 1234,
    5678,
    'ABC_123',
    'xyz-calc',
    '2024-01-01',
    NULL,
    '9876',
    NULL,
    'example',
    42,
    '2024-07-28T01:23:45.678',
    '2024-07-28T02:34:56.789',
    '2024-07-28T03:45:67.890',
    'user_test',
    'complete',
    '2024-07-28T04:56:78.901',
    '2024-07-28T05:67:89.012',
    NULL,
    'spark-calc-1234-driver',
    NULL,
    'XYZ'
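
To help tell apart a problem in the string itself from a problem in how a multi-line message is displayed, a minimal sketch like the following could log the `repr()` of the joined string together with its element and line counts, so that everything lands on a single log line. The helper names `quote_values` and `describe_joined` and the shortened `sample` list are hypothetical and only for illustration; the quoting rules mirror the snippet above.

```python
import logging


def quote_values(values):
    """Quote values as in the snippet above: ints and 'NULL' stay bare, the rest are single-quoted."""
    quoted = []
    for value in values:
        if isinstance(value, int):
            quoted.append(str(value))
        elif value == 'NULL':
            quoted.append('NULL')
        else:
            quoted.append(f"'{value}'")
    return quoted


def describe_joined(parts, separator):
    """Join parts and return facts about the result that do not depend on log rendering."""
    joined = separator.join(parts)
    return {
        "repr": repr(joined),  # newlines appear as the two characters \n
        "element_count": len(joined.split(separator)),
        "line_count": len(joined.splitlines()),
    }


# Shortened, hypothetical sample that still contains two consecutive NULL entries.
sample = ['1234', 'NULL', 'NULL', 42, 'XYZ']
info = describe_joined(quote_values(sample), ',\n    ')

logging.info("values_str_3 repr: %s", info["repr"])
logging.info("elements=%d lines=%d", info["element_count"], info["line_count"])
```

If the counts logged this way match the length of the input list, the string itself is intact and the discrepancy would have to come from how the multi-line message is shown.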

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else?

Interestingly, when the concatenated string is split, the result is correct:

print("Original string values_str:", values_str.split(','))
print("Original string values_str_2:", values_str_2.split(', '))
print("Original string values_str_3:", values_str_3.split(',\n    '))

Logs:


[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Original string values_str: ['1234', '5678', "'ABC_123'", "'xyz-calc'", "'2024-01-01'", 'NULL', "'9876'", 'NULL', "'example'", '42', "'2024-07-28T01:23:45.678'", "'2024-07-28T02:34:56.789'", "'2024-07-28T03:45:67.890'", "'user_test'", "'complete'", "'2024-07-28T04:56:78.901'", "'2024-07-28T05:67:89.012'", 'NULL', "'spark-calc-1234-driver'", 'NULL', 'NULL', "'XYZ'"]
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Original string values_str_2: ['1234', '5678', "'ABC_123'", "'xyz-calc'", "'2024-01-01'", 'NULL', "'9876'", 'NULL', "'example'", 42, "'2024-07-28T01:23:45.678'", "'2024-07-28T02:34:56.789'", "'2024-07-28T03:45:67.890'", "'user_test'", "'complete'", "'2024-07-28T04:56:78.901'", "'2024-07-28T05:67:89.012'", 'NULL', "'spark-calc-1234-driver'", 'NULL', 'NULL', "'XYZ'"]
[2024-07-29, 22:12:39 UTC] {logging_mixin.py:151} INFO - Original string values_str_3: ['1234', '5678', "'ABC_123'", "'xyz-calc'", "'2024-01-01'", 'NULL', "'9876'", 'NULL', "'example'", '42', "'2024-07-28T01:23:45.678'", "'2024-07-28T02:34:56.789'", "'2024-07-28T03:45:67.890'", "'user_test'", "'complete'", "'2024-07-28T04:56:78.901'", "'2024-07-28T05:67:89.012'", 'NULL', "'spark-calc-1234-driver'", 'NULL', 'NULL', "'XYZ'"]
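
The split check above can also be written as an assertion instead of comparing log output by eye. This is only a sketch under the assumption that no element contains the separator itself; `round_trips` and the shortened `parts` list are hypothetical names introduced for illustration.

```python
def round_trips(parts, separator):
    """Return True if joining and then splitting with the same separator restores the parts."""
    return separator.join(parts).split(separator) == list(parts)


# Shortened, hypothetical list of already-quoted elements with two consecutive NULL entries.
parts = ["'ABC_123'", 'NULL', 'NULL', '42', "'XYZ'"]

# Check the three separators used in the report.
results = {sep: round_trips(parts, sep) for sep in (',', ', ', ',\n    ')}

for sep, ok in results.items():
    print(f"separator {sep!r}: round trip ok = {ok}")
```

A check like this runs identically in a local unit test and inside a task, which makes it easier to compare the two environments.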

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:core, kind:bug, needs-triage