This commit fixes [AIRFLOW-6033] UI crashes on "Landing Times" #6634

Closed

drexpp wants to merge 1452 commits into apache:master from drexpp:v1-10-stable

Conversation

@drexpp (Contributor) commented Nov 22, 2019

Adding our company to "Who uses Apache Airflow?"

Hello everyone, is it possible for our company to appear in the "Who uses Apache Airflow?" section? We are a small team working at Endesa, a large Spanish electricity distributor and part of Enel. I wrote some pipelines to automate the ETL processes we use with Hadoop / Spark, so I believe it would be great for our team.

Endesa [@drexpp]


Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira Issue AIRFLOW-6033 and references them in the PR title.

Description

I targeted v1-10-stable since I think that is what @ashb recommended to me in my last PR.

  • Here are some details about my PR:

The Airflow UI will crash in the browser, returning an "Oops" message and the traceback of the error.

This is caused by changing the capitalization of a task_id. Here are some examples of renames that will cause Airflow to crash:

task_id = "DUMMY_TASK" to task_id = "dUMMY_TASK"
task_id = "Dummy_Task" to task_id = "dummy_Task" or "Dummy_task",...
task_id = "Dummy_task" to task_id = "Dummy_tASk"
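The renames above can be reproduced outside Airflow with a minimal sketch (plain Python with hypothetical names, not Airflow code): the view keys its dictionaries by the task_ids currently on the DAG, while the database still holds instances under the old spelling.

```python
# Hypothetical illustration of the mismatch behind the crash: the DAG
# object now only defines "Run", but historical task instances in the
# database still carry task_id "run".
current_dag_task_ids = ["Run"]               # task_ids after the rename
stored_instance_task_ids = ["run", "run"]    # old rows still in the DB

# The view keys its dictionaries by the *current* task_ids...
x = {task_id: [] for task_id in current_dag_task_ids}

crashed_on = None
try:
    # ...then indexes them with the *stored* task_ids.
    for ti_task_id in stored_instance_task_ids:
        x[ti_task_id].append(1.0)
except KeyError as exc:
    # This KeyError is the crash behind the "Oops" page.
    crashed_on = str(exc)

print(crashed_on)  # -> 'run'
```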


File causing the problem: https://github.com/apache/airflow/blob/master/airflow/www/views.py (lines 1643 - 1654)

for task in dag.tasks:
    y[task.task_id] = []
    x[task.task_id] = []

    for ti in task.get_task_instances(start_date=min_date, end_date=base_date):

        ts = ti.execution_date
        if dag.schedule_interval and dag.following_schedule(ts):
            ts = dag.following_schedule(ts)
        if ti.end_date:
            dttm = wwwutils.epoch(ti.execution_date)
            secs = (ti.end_date - ts).total_seconds()
            x[ti.task_id].append(dttm)  # KeyError if ti.task_id was renamed
            y[ti.task_id].append(secs)  # after the instance ran

In the first two lines inside the outer for loop, we can see how the dictionaries x and y are keyed by the task_id attributes of the tasks on the current DAG object.

The problem surfaces in the inner for loop, when the task instances are fetched for the DAG. I am not sure about this next part and would appreciate it if someone could clarify it for me.

I think the task instances (ti) returned by get_task_instances() come from the information stored in the database. That would explain the crash when you open the "Landing Times" page: x and y were keyed by the task_ids currently on the DAG, while the stored task instances carry a different task_id, so indexing the dictionaries fails.

One of my main questions is: after renaming a task (for example from "run" to "Run"), why does get_task_instances() keep returning past task instances under the old name? That is, when asking for instances of "Run", it returns task instances (ti) with task_id "run".
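One defensive fix, sketched here with a simplified stand-in function (hypothetical, not the patch actually merged for this issue), is to skip stored task instances whose task_id no longer exists on the DAG before indexing into x and y:

```python
# Sketch of a guard for the loop in views.py (simplified stand-in):
# only append data for task instances whose task_id still exists on the DAG.
def collect_landing_times(dag_task_ids, task_instances):
    """dag_task_ids: task_ids currently defined on the DAG.
    task_instances: (task_id, seconds) pairs loaded from the database."""
    y = {task_id: [] for task_id in dag_task_ids}
    for ti_task_id, secs in task_instances:
        if ti_task_id not in y:     # stale row from before a rename
            continue                # skip instead of raising KeyError
        y[ti_task_id].append(secs)
    return y

# A renamed task leaves old rows behind; they are now ignored.
print(collect_landing_times("Run".split(), [("run", 3.0), ("Run", 2.5)]))
# -> {'Run': [2.5]}
```

Whether stale instances should be skipped or shown under their old name is a design choice; skipping keeps the chart consistent with the tasks currently on the DAG.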

Error screenshot

How to replicate:

  • Launch airflow webserver -p 8080
  • Go to the Airflow-UI
  • Create an example DAG with a task_id of your choice in lowercase letters (e.g. "run")
  • Launch the DAG and wait for its execution to finish
  • Change the first letter of the task_id inside the DAG to a capital letter (e.g. "Run")
  • Refresh the DAG
  • Go to "Landing Times" inside the DAG menu in the UI
  • You will get an "Oops" message with the traceback.



Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

    • I didn't know exactly how to unit test this; if you have any advice, I will write a test for it. Other than that, I tested manually that the behaviour was as expected:

      • Create a DAG and access Landing Times
      • Modify a task from the created DAG to a completely new name and access Landing Times
      • Modify a task with capital/lowercase letters and access Landing Times
      • Switch back to the original name and access Landing Times
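For reference, the lookup logic could be unit-tested in isolation along these lines (a hypothetical sketch with made-up helper names, not a test from this PR):

```python
# Hypothetical unit-test sketch: verify the landing-times series tolerate
# task instances whose task_id was renamed after they ran.
def build_series(dag_task_ids, stored_task_ids):
    """Build the per-task series, ignoring stale task_ids from the DB."""
    x = {task_id: [] for task_id in dag_task_ids}
    for ti_task_id in stored_task_ids:
        if ti_task_id in x:          # guard against renamed tasks
            x[ti_task_id].append(0)
    return x

def test_renamed_task_does_not_crash():
    # "run" was renamed to "Run"; old instances must not raise KeyError.
    series = build_series(["Run"], ["run", "Run", "run"])
    assert series == {"Run": [0]}

test_renamed_task_does_not_crash()
print("ok")
```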

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain docstrings that explain what they do
    • If you implement backwards-incompatible changes, please leave a note in Updating.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

kaxil and others added 30 commits August 12, 2019 15:02
…roblems (#5835)

* [AIRFLOW-5233] Fixed consistency in whitespace (tabs/eols) + common problems

(cherry picked from commit 5cfe9c2)
List the two separate pylint scripts for use inside the Docker
containers in CONTRIBUTING.md.

(cherry picked from commit a47292d)
… shell files (#5807)

* [AIRFLOW-5204] Shellcheck + common licence in shell files

(cherry picked from commit 6420712)
This was done in the top level README.md in abbb1ea in 2016.

(cherry picked from commit 9d10ac7)
…%I) in _clean_execution_date instead of %H and %M (#5864)

(cherry picked from commit 4777c8a)
…AG files (#5757)

The scheduler calls `list_py_file_paths` to find DAGs to schedule. It does so
without passing any parameters other than the directory. This means that
it *won't* discover DAGs that are missing the words "airflow" and "DAG" even
if DAG_DISCOVERY_SAFE_MODE is disabled.

Since `list_py_file_paths` will refer to the configuration if
`include_examples` is not provided, it makes sense to have the same behaviour
for `safe_mode`.

(cherry picked from commit c4a9d8b)
…Operators (#5567)

Add KMS to BigQuery

(cherry picked from commit 682aea2)
…e in GoogleCloudStorageToBigQueryOperator. (#5771)

Set autodetect default value from false to be true to avoid breaking downstream
services using GoogleCloudStorageToBigQueryOperator but not aware of the newly
added autodetect field.

This is to fix the current regression introduced by #3880

(cherry picked from commit 462ab88)
* [AIRFLOW-4856] change hard coded run_as_user

try to use worker_run_as_user

* [AIRFLOW-4856] change hard coded run_as_user

add unit test

* [AIRFLOW-4856] change hard coded run_as_user

create new param git_sync_run_as_user

* [AIRFLOW-4856] change hard coded run_as_user

add back remove option

* [AIRFLOW-4856] change hard coded run_as_user

fix Flake8

* [AIRFLOW-4856] change hard coded run_as_user

fix Flake8

* [AIRFLOW-4856] change hard coded run_as_user

fix unit test

* [AIRFLOW-4856] change hard coded run_as_user

change the default value to it's old 65533

(cherry picked from commit b0bb65d)
…(RBAC only) (#5866)

Add execution_date_arg
Use execution_date_arg in graph, gantt, and Back To {parent.dag} links.

Add check of execution date

(cherry picked from commit 835eadf)
Francisco Chiang and others added 27 commits October 16, 2019 14:42
* Update databricks operator

* Updated token auth to get from extra_dejson

* Update test DatabricksHookTokenTest to use get host from 'extra'

(cherry picked from commit db770cf)
… patch_dataset and get_dataset (#5546)

Implement BigQuery Hooks/Operators for update_dataset, patch_dataset and get_dataset

(cherry picked from commit 09b9610)
)

* discussion on original PR suggested removing private_key option as init param
* with this PR, can still provide through extras, but not as init param
* also add support for private_key in tunnel -- missing in original PR for this issue
* remove test related to private_key init param
* use context manager to auto-close socket listener so tests can be re-run

(cherry picked from commit 0790ede)
Co-authored-by: Jarek Potiuk <jarek.potiuk@polidea.com>
(cherry picked from commit adfcf67)
The detection of python version is complex because we need to handle
several cases - including determining the version from image name
on DockerHub, detecting python version from python in the environment,
finally forced from python version. This caused multiple problems
with Travis where we run tests with a different version (auto-detected
from the current python, especially when python3 became present in
Travis' python 2.7 images). Now all the jobs in Travis have
PYTHON_VERSION forced and the code responsible for detecting current
python version has been removed as it is not needed in this case.

(cherry picked from commit 351ae4e)
All files are mounted in CI now and checked using the RAT tool.
As opposed to only the runtime-needed files. This is enabled for CI
build only as mounting all local files to Docker (especially on Mac)
has big performance penalty when running the checks (slow osxfs
volume and thousands of small node_modules files generated make the
check runs for a number of minutes). The RAT checks will by default
use the selective volumes but on CI they will mount the whole
source directory.

Also latest version of RAT tool is used now and the output - list
of checked files - is additionally printed as output of the RAT
check so that we are sure the files we expect to be there, are
actually verified.

(cherry picked from commit 7e440da)
rebased on v1-10-stable due to complete k8s refactor on master
@drexpp drexpp changed the base branch from v1-10-stable to master November 22, 2019 11:54
@drexpp (Contributor, Author) commented Nov 22, 2019

I will open my PR from my fork's master branch against this repo's master branch rather than v1-10-stable.

@drexpp drexpp closed this Nov 22, 2019
