Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log filename template records #20165

Merged
merged 4 commits into from Dec 20, 2021

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Dec 9, 2021

See #19058 (comment) and #19625 for context. This is the main prerequisite to make it possible for us to change the log_filename_template config’s default value.

This adds a new table LogFilename, and whenever an Airflow command is run (except those explicit set check_db=False), the user config value of log_filename_template is sync-ed into the table. Each DagRun gets a new log_filename_id foreign key (populated on creation) that can be used to look up what template they use to render task log filenames. All existing DagRun rows set this value to NULL (for performance reasons), and internally this makes them all use the first row in LogFilename, which should be the value in use when a user upgrades to 2.3.

The first commit is from #20163; that one needs to be merged first. Merged.

Edit: Oh, and this still needs tests, and an entry in UPDATING.md.

@uranusjr uranusjr added this to the Airflow 2.3.0 milestone Dec 9, 2021
@provide_session
def get_log_filename_template(self, *, session: Session = NEW_SESSION) -> Optional[str]:
if self.log_filename_id is None: # DagRun created before LogFilename introduction.
template = session.query(LogFilename.template).order_by(LogFilename.id).limit(1).scalar()
else:
template = session.query(LogFilename.template).filter_by(id=self.log_filename_id).one_or_none()
if template is not None:
return template
return airflow_conf.get("logging", "LOG_FILENAME_TEMPLATE")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure if setting up a relationship (for caching) is worthwile. Maybe?

airflow/models/dagrun.py Outdated Show resolved Hide resolved
sa.Column("timestamp", sa.UtcDateTime, nullable=False),
)
with op.batch_alter_table("task_instance") as batch_op:
batch_op.add_column(sa.Column("log_filename_id"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the server_default (or equivalent) needs to go here for it to have any effect: https://docs.sqlalchemy.org/en/13/core/defaults.html#server-defaults

A variant on the SQL expression default is the Column.server_default, which gets placed in the CREATE TABLE statement during a Table.create() operation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to use default instead. If I read the SQLAlchemy docs correctly, supplying an expression would make it use the expression as an subquery in INSERT INTO?

airflow/models/dagrun.py Outdated Show resolved Hide resolved
@uranusjr uranusjr force-pushed the log-filename-template-records branch 3 times, most recently from 3795d9a to 3082d5a Compare December 13, 2021 09:39
@uranusjr
Copy link
Member Author

Test failures are likely unrelated (one flaky scheduler test failure as usual, the MSSQL job exploded). This should be good to go.

@uranusjr uranusjr marked this pull request as ready for review December 13, 2021 11:11
@uranusjr uranusjr added this to In progress in AIP-42: Dynamic Task Mapping Dec 13, 2021
@kaxil
Copy link
Member

kaxil commented Dec 13, 2021

Can you rebase on main and fix conflicts too plz

@uranusjr
Copy link
Member Author

Oh because I just merged a PR with model changes…

@uranusjr uranusjr force-pushed the log-filename-template-records branch from 3082d5a to f5ce0e4 Compare December 13, 2021 15:22

id = Column(Integer, primary_key=True)
template = Column(Text, nullable=False)
timestamp = Column(UtcDateTime, nullable=False, default=timezone.utcnow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just informational, right?

Call it created_at maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah only informational. The name is borrowed from XCom; I like created_at more myself as well.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, one possible column to rename

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Dec 17, 2021
@uranusjr uranusjr force-pushed the log-filename-template-records branch from f5ce0e4 to 2c608d9 Compare December 20, 2021 10:37
@uranusjr
Copy link
Member Author

Eh MSSQL does not support ON DELETE RESTRICT 🤦

@potiuk
Copy link
Member

potiuk commented Dec 20, 2021

Eh MSSQL does not support ON DELETE RESTRICT facepalm

Oh yeah. Unfortunately SQLAlchemy "common" interface starts to break easily when we start using more and more sophisticated DB functionality.

@uranusjr uranusjr force-pushed the log-filename-template-records branch from 2c608d9 to dfb5c3e Compare December 20, 2021 11:23
@uranusjr uranusjr force-pushed the log-filename-template-records branch from dfb5c3e to d867d0e Compare December 20, 2021 11:23
@uranusjr
Copy link
Member Author

Changed to ON DELETE NO ACTION instead. This should work well enough.

@uranusjr uranusjr merged commit 921db4c into apache:main Dec 20, 2021
@uranusjr uranusjr deleted the log-filename-template-records branch December 20, 2021 16:02
@ashb ashb moved this from In progress to Done in AIP-42: Dynamic Task Mapping Jan 4, 2022
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging area:webserver Webserver related Issues full tests needed We need to run full set of tests for this PR to merge kind:documentation type:new-feature Changelog: New Features
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants