Airflow 3.2 unable to handle \0 in dag files #65379

@bernvaughn

Description

Under which category would you file this issue?

Airflow Core

Apache Airflow version

3.2.0

What happened and how to reproduce it?

Have a DAG file in Airflow 3.1.8 that contains a literal \0 (null) character, then try to upgrade to Airflow 3.2.0. The migrations fail because \0 cannot be serialized into JSONB:

field_del = '@@\0@@'
record_del = '^^\0^^'
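
A minimal sketch of why these two values are a problem, assuming the delimiters end up in a serialized field of the DAG (for example as an operator argument). Python's `json` module encodes the null character as the escape `\u0000`, and PostgreSQL's `jsonb` type rejects that escape, while the plain `json` type stores the text as-is. The dictionary below is only illustrative and is not Airflow's actual serialization format.

```python
import json

# Delimiters as defined in the DAG file from this report.
field_del = '@@\0@@'
record_del = '^^\0^^'

# Illustrative payload only; the real serialized DAG has a different shape.
encoded = json.dumps({"field_del": field_del, "record_del": record_del})
print(encoded)

# The null character is written out as the escape sequence \u0000.
contains_null_escape = "\\u0000" in encoded

# The escape round-trips cleanly in Python, so nothing fails until the
# value reaches a jsonb column.
decoded = json.loads(encoded)
```

This is why the DAG works on 3.1.8, where the column is JSON, and only fails once the column type changes.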

What you think should happen instead?

We have DAGs that process CSV files that use \0 as a delimiter, and the delimiter is defined in the DAG file. This should not prevent upgrading to 3.2.0.

Operating System

rhel 8.6

Deployment

Docker-Compose

Apache Airflow Provider(s)

No response

Versions of Apache Airflow Providers

apache-airflow==3.2.0
apache-airflow-client==3.1.3
apache-airflow-providers-amazon==9.22.0
apache-airflow-providers-celery==3.15.0
apache-airflow-providers-common-compat==1.14.3
apache-airflow-providers-common-io==1.7.2
apache-airflow-providers-common-sql==1.34.0
apache-airflow-providers-fab==3.6.1
apache-airflow-providers-ftp==3.14.0
apache-airflow-providers-http==6.0.0
apache-airflow-providers-imap==3.10.0
apache-airflow-providers-postgres==6.5.0
apache-airflow-providers-sftp==5.5.0
apache-airflow-providers-smtp==2.4.5
apache-airflow-providers-snowflake==6.7.0
apache-airflow-providers-sqlite==4.2.0
apache-airflow-providers-ssh==4.2.0
apache-airflow-providers-standard==1.12.3

Official Helm Chart version

Not Applicable

Kubernetes Version

Not Applicable

Helm Chart configuration

No response

Docker Image customizations

FROM docker.io/apache/airflow:3.2.0-python3.12

USER root

RUN apt-get update && \
    apt-get install -y \
    bash \
    git \
    build-essential \
    gcc \
    libffi-dev \
    musl-dev \
    libpq-dev \
    xmlsec1 \
    postgresql

# disable strict host key checking for ssh to support saspy
RUN echo "StrictHostKeyChecking no" >> /etc/ssh/ssh_config

USER airflow

# set dontwritebytecode to prevent __pycache__ directories
# we don't need them in the container, and it is making things bloat
ENV PYTHONDONTWRITEBYTECODE=1

COPY requirements/production.txt .
# use --no-compile to prevent .pyc files from being created
# use --no-cache-dir to prevent pip from caching the downloaded packages
# both of these are to save space / prevent bloat in the container
RUN pip install -r production.txt --no-compile --no-cache-dir

COPY plugins /opt/airflow/plugins
COPY dags /opt/airflow/dags

Anything else?

We are upgrading from 3.1.8 to 3.2.0 to resolve security vulnerabilities, but the airflow migration step is failing in our deployment pipeline.

Specifically, migration 0089 is where it fails. This migration changes the serialized_dag column from JSON to JSONB, and JSONB cannot store the null character (literal \0). We have DAGs that use this character to define a delimiter for some CSV files we consume.
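
To find which values would trip the conversion before upgrading, one option is to walk a JSON-like structure and report every string that contains the null character. The sketch below assumes the serialized DAG data has already been loaded into Python dicts and lists. `find_null_paths` and the `sample` structure are hypothetical names for illustration, not part of Airflow.

```python
from typing import Any, Iterator


def find_null_paths(obj: Any, path: str = "$") -> Iterator[str]:
    """Yield a path for every string key or value containing a null character."""
    if isinstance(obj, str):
        if "\x00" in obj:
            yield path
    elif isinstance(obj, dict):
        for key, value in obj.items():
            # Keys can carry the null character too, so check them separately.
            if isinstance(key, str) and "\x00" in key:
                yield f"{path}.<key {key!r}>"
            yield from find_null_paths(value, f"{path}.{key}")
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            yield from find_null_paths(item, f"{path}[{index}]")


# Hypothetical structure standing in for one serialized DAG.
sample = {"dag": {"tasks": [{"op_kwargs": {"field_del": "@@\0@@", "sep": ","}}]}}
null_paths = list(find_null_paths(sample))
print(null_paths)  # ['$.dag.tasks[0].op_kwargs.field_del']
```

Reporting paths rather than just a yes/no answer makes it easier to trace each hit back to the DAG file that defines it.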

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

    Labels

    area:core, area:db-migrations, area:serialization, kind:bug, priority:high
