
Set up Airflow Variable defaults with descriptions automatically #4202

Closed
AetherUnbound opened this issue Apr 24, 2024 · 9 comments · Fixed by #4297
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs 🔧 tech: airflow Involves Apache Airflow 🐳 tech: docker Involves Docker

Comments

@AetherUnbound (Contributor)

Description

We have a number of Airflow Variables which act as levers for some of our DAGs, allowing us to tweak settings for certain DAGs without having to restart the stack. At present, the documentation for them is spread out and not easily accessible:

For this (and any other Variables it might seem useful for), we should set up a JSON file with a default value and an adequate description covering both the shape of the expected values and how they affect the related DAGs. This can effectively serve as documentation for the Variables above.

We can then use airflow variables import (with --action-on-existing-key set to skip) as part of our entrypoint.sh to ensure it runs both locally and in production on every startup. In production, if the keys already exist, we'll just need to add the descriptions to them once, since we won't be overwriting the values.
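As a rough sketch of that approach (the file path, keys, and default values below are invented for illustration, not the real Openverse defaults; --action-on-existing-key requires a recent Airflow version):

```shell
# Hypothetical defaults file; in practice this would live in the repo
# rather than /tmp.
cat > /tmp/variables.json <<'EOF'
{
  "SILENCED_SLACK_NOTIFICATIONS": "[]",
  "DATA_REFRESH_POKE_INTERVAL": "3600"
}
EOF

# On startup, import the defaults without clobbering values that operators
# have already changed. Guarded so the snippet is a no-op where the
# Airflow CLI is not installed.
if command -v airflow >/dev/null 2>&1; then
  airflow variables import --action-on-existing-key skip /tmp/variables.json
fi
```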

@AetherUnbound AetherUnbound added 🔧 tech: airflow Involves Apache Airflow 🟩 priority: low Low priority and doesn't need to be rushed 🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: catalog Related to the catalog and Airflow DAGs 🐳 tech: docker Involves Docker labels Apr 24, 2024
@madewithkode (Contributor)

> In production, if the keys already exist, we'll just need to add the descriptions to them once, since we won't be overwriting the values.

Hi @AetherUnbound, this looks like something I would like to take on.
Regarding the statement above, is this something that needs to be done in entrypoint.sh as well? If yes, does airflow variables import provide a mechanism to read or access the skipped existing keys?

Secondly, where exactly would the description be added: in the in-memory loaded JSON or in the source JSON file?

@AetherUnbound (Contributor, Author)

Hey @madewithkode! I think that step would necessarily need to be completed by the maintainers. Once we have the Variables documented, we can go in one by one for a one-off update of all the existing Variables to add descriptions. So please do feel free to take on the code work for this PR, and we'll handle the description adding in production once it's ready!

@madewithkode (Contributor)

Noted. I'll open a draft PR and let you know once I come up with something.

@madewithkode (Contributor)

Hi @AetherUnbound, does the following JSON file snippet mirror the goal of this task?
If yes, where can I find suitable defaults? If no, can you shed more light on what's required?

{
    "SILENCED_SLACK_NOTIFICATIONS": {
        // A link to a GitHub issue which describes why the notification
        // is silenced and tracks resolving the problem.
        "issue": "",
        // Slack notifications whose text or username contain the
        // predicate will be silenced. Matching is case-insensitive.
        "predicate": "",
        // A regex pattern that matches the task_id of the task
        // that triggered the notification. (Optional)
        "task_id_pattern": ""
    }
}

@AetherUnbound (Contributor, Author)

Alright, so I tried to do some digging on what the JSON file actually looked like, and wasn't able to find any in the documentation linked above. Instead, I added a variable locally then ran the following:
(screenshot: the Variable shown with its description in the Airflow UI)

$ airflow@ef2c68b81811:/opt/airflow$ airflow variables export -
{
    "SAMPLE_VARIABLE": "{\"sample_key\": true, \"other_key\": 1, \"final_key\": [\"a\"]}"
}
1 Variables successfully exported.

So...even though this Variable has a description, there's no way to export/import that description using JSON files and airflow variables import 😭

However, it does look like we can use airflow variables set to apply both:

$ airflow@ef2c68b81811:/opt/airflow$ airflow variables set --help
Usage: airflow variables set [-h] [--description DESCRIPTION] [-j] [-v] key VALUE

Set variable

Positional Arguments:
  key                   Variable key
  VALUE                 Variable value

Options:
  -h, --help            show this help message and exit
  --description DESCRIPTION
                        Variable description, optional when setting a variable
  -j, --json            Serialize JSON variable
  -v, --verbose         Make logging output more verbose
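As a concrete usage example (the key, value, and description here are invented, not real Openverse Variables), the command can be built as an array so the quoting of the description survives intact:

```shell
# Build the command as an array; single quotes keep any double quotes in
# the description from being mangled by the shell.
set_cmd=(
  airflow variables set
  --description 'Example variable demonstrating description support'
  -j SAMPLE_VARIABLE '{"sample_key": true}'
)

# Guarded so this is a no-op on machines without the Airflow CLI.
if command -v airflow >/dev/null 2>&1; then
  "${set_cmd[@]}"
fi
```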

The only trouble here is that there aren't any options for how to handle existing Variables, which leads me to believe it will automatically overwrite when this command is run. We can check for existing keys first with airflow variables list:

$ airflow@ef2c68b81811:/opt/airflow$ airflow variables list -o plain
key
SAMPLE_VARIABLE
OTHER

So here's what I think are the next steps:

  1. We create a variables.csv file, potentially right under catalog/, that has the following columns: key, value, description.
  2. We fill out the values for each of those Variables described above (I'll follow up with a second comment on what the default values and descriptions for each might actually look like).
  3. We add some logic to entrypoint.sh which first runs airflow variables list -o plain to get the list of existing Variables, then iterates through each row of variables.csv and only runs airflow variables set --description <description> <key> <value> if the key doesn't already exist in the database. That should prevent it from overwriting existing values.
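The steps above could be sketched in entrypoint.sh roughly like this (the function name, tab-delimited file format, and file path are assumptions for illustration, not the final implementation):

```shell
#!/bin/bash
# Sketch: set each variable from a delimited file only if its key is not
# already present in the metadata DB, so existing values are never overwritten.
set_missing_variables() {
  local vars_file="$1"
  local existing key value description
  # Keys that already exist; `|| true` keeps the sketch from aborting
  # if the Airflow CLI is unavailable.
  existing="$(airflow variables list -o plain 2>/dev/null || true)"

  while IFS=$'\t' read -r key value description; do
    [ -z "$key" ] && continue
    if printf '%s\n' "$existing" | grep -Fqx "$key"; then
      echo "skip: $key"
    else
      airflow variables set --description "$description" "$key" "$value"
      echo "set: $key"
    fi
  done < "$vars_file"
}

# In the entrypoint this would run on every startup, e.g.:
#   set_missing_variables /opt/airflow/variables.tsv
```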

This is all admittedly way more complicated than initially expected @madewithkode 😅 And before we go with that approach, I want to make sure it makes sense with @stacimc!

@stacimc (Contributor)

stacimc commented May 7, 2024

Wow, interesting find. Disappointing we can't import them so easily, but the next steps @AetherUnbound listed do make sense to me, and I think this feature is very much worth it!

@madewithkode (Contributor)

madewithkode commented May 8, 2024

Hi @AetherUnbound, nice find and recommendations! I think this is still well within my capacity 😅 and it would certainly be an interesting problem to tackle. I'll await your comment on what the default values and descriptions for each of the variables should look like.

@AetherUnbound (Contributor, Author)

@madewithkode I've created the TSV (it's saved as a CSV here because GitHub doesn't accept TSV uploads directly); it uses a tab to delimit the columns. Note that the description field contains double quotes (") - the description may need to be wrapped in single quotes when it's passed as a command line argument to airflow variables set. Let me know if you have any other questions!
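For reference, a row of such a tab-delimited file might look like this (the key, default value, and description are invented for illustration; the real entries are in the attached file):

```
SILENCED_SLACK_NOTIFICATIONS	[]	A list of "predicate" objects describing Slack notifications to silence
```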
(attached: variables.csv)

@AetherUnbound (Contributor, Author)

Just following up to note that I've added the descriptions to production as well 🚀
