Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading to airflow 2.4.0 from 2.3.4 causes NotNullViolation error #26497

Closed
1 of 2 tasks
ManikandanUV opened this issue Sep 19, 2022 · 29 comments · Fixed by #26885
Closed
1 of 2 tasks

Upgrading to airflow 2.4.0 from 2.3.4 causes NotNullViolation error #26497

ManikandanUV opened this issue Sep 19, 2022 · 29 comments · Fixed by #26885
Labels
affected_version:2.4 Issues Reported for 2.4 AIP-48 Data-aware Scheduling area:core kind:bug This is a clearly a bug
Milestone

Comments

@ManikandanUV
Copy link

ManikandanUV commented Sep 19, 2022

Apache Airflow version

2.4.0

What happened

Stopped existing processes, upgraded from airflow 2.3.4 to 2.4.0, and ran airflow db upgrade successfully. Upon restarting the services, I'm not seeing any dag runs from the past 10 days. I kick off a new job, and I don't see it show up in the grid view. Upon checking the systemd logs, I see that there are a lot of postgress errors with webserver. Below is a sample of such errors.

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id]
[parameters: {'name': 'Datasets'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
[2022-09-19 14:03:16,183] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 13, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id]
[parameters: {'permission_id': 13, 'view_menu_id': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
[2022-09-19 14:03:16,209] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, Datasets).

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id]
[parameters: {'name': 'Datasets'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
[2022-09-19 14:03:16,212] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 17, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id]
[parameters: {'permission_id': 17, 'view_menu_id': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
[2022-09-19 14:03:16,229] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, DAG Warnings).

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id]
[parameters: {'name': 'DAG Warnings'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
[2022-09-19 14:03:16,232] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 17, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id]
[parameters: {'permission_id': 17, 'view_menu_id': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
[2022-09-19 14:03:16,250] {manager.py:511} ERROR - Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 13, 23).

I tried running airflow db check, init, check-migration, upgrade without any errors, but the errors still remain.

Please let me know if I missed any steps during the upgrade, or if this is a known issue with a workaround.

What you think should happen instead

All dag runs should be visible

How to reproduce

upgrade airflow, upgrade db, restart the services

Operating System

Ubuntu 18.04.6 LTS

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ManikandanUV ManikandanUV added area:core kind:bug This is a clearly a bug labels Sep 19, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 19, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk potiuk added this to the Airflow 2.4.1 milestone Sep 20, 2022
@zachliu
Copy link
Contributor

zachliu commented Sep 20, 2022

ah, seems to be the "backend version" of the same issue #26505

----- update -----
maybe not... because i don't see those postgres INSERT errors from my webserver 🤔

@daniel-andresen-jimdo
Copy link

Hello, we also facing this problem. Existing dags are showing up in the Airflow UI but we can't create a new dag.
We are using dynamic dag creation to create new dags. After defining the new dag I ran the command "airflow dags reserialize" in the scheduler container. In the output I have had the same SQL like mentioned above.

We also use the officel Helm chart to deploy Airflow. We are using a postgres DB as backend. My feeling is that this is not related to #26505.

@uranusjr
Copy link
Member

I don’t think this is related to #26505, it looks like something in the permission stack not working well with AIP-48 stuff?

@uranusjr uranusjr added this to To do in Data-aware Scheduling (AIP-48) via automation Sep 26, 2022
@uranusjr uranusjr added the AIP-48 Data-aware Scheduling label Sep 26, 2022
@sterling-jackson
Copy link

sterling-jackson commented Sep 27, 2022

We saw the same problem this past week with in an attempted upgrade to 2.4 from 2.1.0. We are using Ubuntu and Postgres in AWS with the scheduler/webserver on an EC2 instance sending work to a Kubernetes cluster on EKS. We tried both airflow db upgrade and airflow db reset. Same problem as OP with both approaches and we couldn't view/schedule/trigger jobs.

DAGs show up in the CLI and in the UI, but can't be found when you try to view job details in the UI. I noticed that the scheduler constantly throws errors about jobs not being in the serialized DAG table.

Manually running airflow dags reserialize immediately throws a bunch of these null constraint errors, suggesting some link, which is what led me to try the following workaround/experiment.

I used airflow db shell to reach into the DB and manipulate the id column in the four permission tables. I made these columns IDENTITY columns using ALTER TABLE ab_view_menu ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 5000). This appears to have resolved the issues. The errors went away and jobs can be viewed/scheduled/ triggered.

This worked and I'm including it here in case it helps diagnose the root cause, but I have no idea if this introduced other issues and am hesitant to promote this to our upper environments without more research.

Scheduler Errors
image

View DAG in UI Error
image

@ashb
Copy link
Member

ashb commented Sep 27, 2022

Question: while this looks bad, does it actually cause any process to exit with an error?

@ashb ashb removed this from To do in Data-aware Scheduling (AIP-48) Sep 27, 2022
@ManikandanUV
Copy link
Author

I upgraded our server from Ubuntu 18.04 to 20.04.5 and tried installing 2.4.0 again. The same issue persisted, so we've rolled back to 2.3.4. I have another server on 18.04 that successfully managed to upgrade from 2.3.4 to 2.4, so if you need me to make any comparison between the two servers (although the failing one is now on 20.04) to get to the root cause, let me know.

@ashb
Copy link
Member

ashb commented Sep 27, 2022

This will be caused by a difference in the rows in the database, not in the OS version.

Is there any chance you could (privately if needed) share a DB dump of the breaking install?

@ashb
Copy link
Member

ashb commented Sep 27, 2022

@ManikandanUV What version of SQLAlchemy and Flask-AppBuilder do you have installed in each server?

@ManikandanUV
Copy link
Author

@ashb
Failing server:
Name: SQLAlchemy
Version: 1.4.41
Name: Flask-AppBuilder
Version: 4.1.3

Working server:
Name: SQLAlchemy
Version: 1.4.40
Name: Flask-AppBuilder
Version: 4.1.3

@ManikandanUV
Copy link
Author

ManikandanUV commented Sep 27, 2022

@ashb how do I get the db dump? The servers have different dags, so they're not comparable in that aspect. Also, the working server started with 2.x (not sure which one, may be 2.2)->2.3.4->2.4, but the failing one was 1.x->1.10.15->2.3.4

@sfc-gh-mkmak
Copy link

@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow.
SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag. Let me know if you need any additional info.

@ephraimbuddy
Copy link
Contributor

@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag. Let me know if you need any additional info.

From what version did you start using airflow? Asking so I can try to upgrade from that version to 2.3.4 to 2.4.0

@sfc-gh-mkmak
Copy link

sfc-gh-mkmak commented Sep 27, 2022

@ashb We have experienced that problem as well while testing to 2.4.0 upgrade (from 2.2.4). We use centos 7 for airflow. SQLAlchemy - 1.4.27, Flask-AppBuilder 4.1.3. When I pulled DAGs I cannot open any of them on WebUI - I got that error: DAG seems to be missing from DagBag. Let me know if you need any additional info.

From what version did you start using airflow? Asking so I can try to upgrade from that version to 2.3.4 to 2.4.0

I think we started from version 1.10.14. But I saw some tests on version 1.10.12 - so I am not sure from which version our current database works - it can be 1.10.14 or 1.10.12

I tested it on new 2.2.4 deployment and upgraded it to 2.4.0. I have experienced the same problem.

@sterling-jackson
Copy link

@ashb I attached examples of the two errors I saw in case it's helpful.

@troyharvey
Copy link
Contributor

We are seeing the same issue upgrading from 2.3.4 to 2.4.0

[SQL: INSERT INTO ab_view_menu (name) VALUES (%(name)s) RETURNING ab_view_menu.id]
[parameters: {'name': 'DAG:example_branch_datetime_operator_3'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
ERROR [airflow.www.fab_security.sqla.manager] Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 1, null).

[SQL: INSERT INTO ab_permission_view (permission_id, view_menu_id) VALUES (%(permission_id)s, %(view_menu_id)s) RETURNING ab_permission_view.id]
[parameters: {'permission_id': 1, 'view_menu_id': None}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
ERROR [airflow.www.fab_security.sqla.manager] Creation of Permission View Error: (psycopg2.errors.NotNullViolation) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 4, 139).

@ashb
Copy link
Member

ashb commented Sep 29, 2022

Can anyone give us reproduction steps? Cos trying this with a "minimal" 2.3.4 and the upgrading to 2.4.0 neither myself nor @ephraimbuddy have been able to reproduce this, so we're not sure what step we're missing.

(And until we can reproduce it, we can't fix it)

@ManikandanUV
Copy link
Author

@ashb try our upgrade path, may be that's the key. I have two servers, one started at 1.x and other started at 2.x. The 2.x server upgraded to 2.3.4 and 2.4 without issues, where as the 1.x server upgraded to 2.3.4 and is now failing 2.4 upgrade from 2.3.4

@troyharvey
Copy link
Contributor

Thanks for the tip @sterling-jackson. Your suggestion resolved this one for us. If you use this solution alter the START WITH based on the max id in your tables.

ALTER TABLE public.ab_view_menu ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 300);
ALTER TABLE public.ab_permission_view ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 4000);
ALTER TABLE public.ab_permission_view_role ALTER id ADD GENERATED ALWAYS AS IDENTITY (START WITH 3000);

@JCoder01
Copy link
Contributor

JCoder01 commented Sep 29, 2022

Piling on here. I'm seeing the same thing. Seems to maybe be related Datasets?

[2022-09-29T09:55:10.909-0400] {manager.py:420} ERROR - Add View Menu Error: (psycopg2.errors.NotNullViolation) null value in column "id" of relation "ab_view_menu" violates not-null constraint
DETAIL:  Failing row contains (null, Datasets).

We started on version 1.10.xx too and have been upgrading with each release.

Edit: I see now, there is a Datasets views that is added.

@joshowen
Copy link
Contributor

We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:

ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

Which resolved our issue.

@ashb
Copy link
Member

ashb commented Sep 29, 2022

Thanks @joshowen that's very helpful

@stepanof
Copy link

stepanof commented Oct 4, 2022

ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

We have the same problem after upgrading from 2.3.4 to 2.4.1

Thanks @joshowen, these commands resolved the problem.

@ashb
Copy link
Member

ashb commented Oct 5, 2022

Okay, I've found the source of the confusion, and the path needed to trigger this behaviour.

Run airflow webserver with < 1.10.13 in RBAC mode, where FAB creates it's tables.

In 1.10.13 we introduces a migration that creates the tables with the server_default but that migration only did anything if the tables didn't already exist. But the tables created by the FAB model have a default (but not a server_default).

Oh, and the final bit of the puzzle, in 2.4 we finally "took control" of the FAB security models in to airflow and those do not have the default set.

I'll work on a new migration to fix this up.

@ldacey
Copy link
Contributor

ldacey commented Oct 7, 2022

Ran into the same issue. We also started in 2017 which should be around version 1.8 something and have upgraded since then. I ran the commands @joshowen posted and now I can open the DAG again.

I am getting a airflow dags reserialize error though (DAGs already exist). Is it safe to delete from the serialize table and then reserialize again? It appears like everything is working fine as is, the only error is when I run that command in a container.

@zachliu
Copy link
Contributor

zachliu commented Oct 18, 2022

not until a new user couldn't register did i realized i was having this issue and a ton of such error messages in the logs 😂

@aabouelleil-seniorlink
Copy link

We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:

ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

Which resolved our issue.

We ran into this issue upgrading from 2.3.1 to 2.4.1 so it doesn't seem the issue is fixed yet. These table alterations resolved the problem though.

@joshowen
Copy link
Contributor

We started with 1.8.xx, went to 1.9.xx, 1.10.xx, and somehow all of our FAB tables ended up without sequences set for their IDs, but had the sequences created. We were seeing similar issues in 2.4.0, and manually ran:

ALTER TABLE "public"."ab_permission_view" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_id_seq'::regclass);
ALTER TABLE "public"."ab_permission" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_id_seq'::regclass);
ALTER TABLE "public"."ab_permission_view_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_permission_view_role_id_seq'::regclass);
ALTER TABLE "public"."ab_register_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_register_user_id_seq'::regclass);
ALTER TABLE "public"."ab_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_role_id_seq'::regclass);
ALTER TABLE "public"."ab_user" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_id_seq'::regclass);
ALTER TABLE "public"."ab_user_role" ALTER COLUMN "id" SET DEFAULT nextval('ab_user_role_id_seq'::regclass);
ALTER TABLE "public"."ab_view_menu" ALTER COLUMN "id" SET DEFAULT nextval('ab_view_menu_id_seq'::regclass);

Which resolved our issue.

We ran into this issue upgrading from 2.3.1 to 2.4.1 so it doesn't seem the issue is fixed yet. These table alterations resolved the problem though.

@ashb These must be run before running 0073_2_0_0_prefix_dag_permissions.py

@ashb
Copy link
Member

ashb commented Nov 28, 2022

Argh! Good catch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected_version:2.4 Issues Reported for 2.4 AIP-48 Data-aware Scheduling area:core kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.