The DAG hash function tried (and failed) to hash the list of tasks, then fell back on repr-ing the list, which took forever. Instead, hash `tuple(task_dict.keys())`. This change also replaces two slow list comprehensions with much faster hash lookups (using the new task_dict).
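A minimal sketch of the idea, assuming a simplified Dag class (not the actual Airflow model):

```python
class Dag(object):
    def __init__(self, tasks):
        # Keep tasks in a dict keyed by task_id so membership checks are
        # O(1) hash lookups instead of list scans.
        self.task_dict = {t.task_id: t for t in tasks}

    def __hash__(self):
        # A list isn't hashable, so the old code fell back to hashing
        # repr(list_of_tasks), which was very slow for large DAGs.
        # Hashing the tuple of task ids is cheap and stable.
        return hash(tuple(self.task_dict.keys()))

    def has_task(self, task_id):
        # Formerly a list comprehension scan; now a dict lookup.
        return task_id in self.task_dict
```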
If a process places items in a queue and the process is joined before the queue is emptied, it can lead to a deadlock under some circumstances. Closes AIRFLOW-61. See, for example:

https://docs.python.org/3/library/multiprocessing.html#all-start-methods ("Joining processes that use queues")
http://stackoverflow.com/questions/31665328/python-3-multiprocessing-queue-deadlock-when-calling-join-before-the-queue-is-em
http://stackoverflow.com/questions/31708646/process-join-and-queue-dont-work-with-large-numbers
http://stackoverflow.com/questions/19071529/python-multiprocessing-125-list-never-finishes
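A self-contained illustration of the failure mode and the fix, using only the standard library (this is not the actual Airflow executor code): drain the queue before joining the producer process.

```python
import multiprocessing


def worker(queue):
    for i in range(100000):
        queue.put(i)


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(queue,))
    proc.start()

    # Drain the queue BEFORE joining. Calling proc.join() first can
    # deadlock: the child's feeder thread is still flushing buffered
    # items into the pipe, so the child never exits.
    results = [queue.get() for _ in range(100000)]

    proc.join()
    print(len(results))
```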
When the Scheduler is run with `--num-runs`, there can be multiple Schedulers and Executors all trying to run tasks. For queued tasks, the Scheduler was previously only trying to run tasks that it itself had queued, but that doesn't work if the Scheduler is restarting. This PR reverts that behavior and adds two types of "best effort" executions: before running a TI, executors check whether it is already running, and before ending, executors call sync() one last time.
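A schematic sketch of the two guards; the class and helper names below are illustrative, not the real executor API.

```python
class BestEffortExecutor(object):
    def __init__(self):
        self.running = set()

    def _already_running(self, key):
        # Stand-in for checking the task instance's state in the database.
        return key in self.running

    def execute_async(self, key, command):
        # Guard 1: with --num-runs, a restarted scheduler may try to run a
        # task another executor is already running; skip it instead.
        if self._already_running(key):
            return
        self.running.add(key)
        # ... launch `command` here ...

    def sync(self):
        # Stand-in for polling launched tasks and harvesting their state.
        pass

    def end(self):
        # Guard 2: one final sync before shutting down, so no completed
        # task's state is lost.
        self.sync()
```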
The scheduler can encounter a queued task twice before the task actually starts to run; this change locks the task and avoids that condition.
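A minimal sketch of the locking idea, assuming a SQLAlchemy session and an ORM TaskInstance model with dag_id/task_id/execution_date/state columns; `claim_queued_task` is a hypothetical helper, not the literal patch.

```python
def claim_queued_task(session, TaskInstance, dag_id, task_id, execution_date):
    # SELECT ... FOR UPDATE: if two scheduler passes see the same queued
    # task, only one can claim it; the other blocks here and then finds
    # the row is no longer in the 'queued' state.
    ti = (session.query(TaskInstance)
                 .filter(TaskInstance.dag_id == dag_id,
                         TaskInstance.task_id == task_id,
                         TaskInstance.execution_date == execution_date,
                         TaskInstance.state == 'queued')
                 .with_for_update()
                 .first())
    if ti is None:
        return None  # already claimed by another scheduler pass
    ti.state = 'running'
    session.commit()  # releases the row lock
    return ti
```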
Pulling this patch [1] in from upstream in order to resolve a dependency conflict in the ETL repo:

airflow 1.7.1.2 -> (funcsigs >= 0.4, funcsigs < 1)
mock 2.0.0 -> (funcsigs >= 1)

Without this change, `control run piptools.compile etl` fails to resolve the dependencies in `requirements.in` because of the conflict. Tested locally and confirmed that this solves the conflict. If this change looks good, I'll follow up with a change to pin ETL's dependency on Airflow to this git SHA.

[1] https://github.com/apache/incubator-airflow/pull/1826/files
Lyft's custom gunicorn is on version 19.1.1, which I believe is causing the following error in the scheduler process when I tested etl in staging:
Traceback (most recent call last):
  File "/usr/local/lib/service_venv/bin/airflow", line 4, in <module>
    __import__('pkg_resources').require('airflow==1.7.1.2')
  File "/usr/local/lib/service_venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3036, in <module>
    @_call_aside
  File "/usr/local/lib/service_venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3020, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/service_venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3049, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/service_venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 656, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/local/lib/service_venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 669, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/local/lib/service_venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 854, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'gunicorn<19.4.0,>=19.3.0' distribution was not found and is required by airflow
Related PR: https://github.com/lyft/etl/pull/6037
* Background

We're patching the socket library in order to use IP addresses instead of hostnames, because Lyft hostnames are special and don't actually resolve. (See https://github.com/lyft/etl/pull/5560 for context.)

`service_venv` sets the $PATH such that `/usr/local/lib/service_venv/bin/` takes precedence over `/srv/service/current/bin/`, so it's not enough to simply put a patched version of `airflow` in the ETL repository. We're ensuring that our patched version of `airflow` takes precedence by overwriting `/usr/local/lib/service_venv/bin/airflow` with a script that just calls our patched version at `/srv/service/current/bin/airflow` (from the ETL repository). See this salt state [1].

That doesn't work anymore, because with the rollout of frozen_venvs we can't count on `airflow` being at a hardcoded path. https://github.com/lyft/etl/pull/6073 was one attempt to solve this by finding the service_venv root dynamically. That approach doesn't work, because it depends on `service_venv` being available by the time the Jinja template is rendered, but `service_venv` won't be available until it's created in the lyft-python state (which requires that the template is rendered).

We can avoid this chicken-and-egg problem if we simply patch the Airflow code directly, rather than trying to overwrite it on deployment. We apply our patch iff the `prefer_ip_over_hostname` key under the `lyft` configuration section is set.

* Rollout

On the sharedairflowworker (which is only running test DAGs so far), we will stop overwriting the airflow command with our patched version and set the `prefer_ip_over_hostname` key. If this works, then we can take the same approach for the other ASGs. Until then, we will continue overwriting the `airflow` command under service_venv/ with our patched version from ETL.

[1] https://github.com/lyft/etl/blob/master/ops/config/states/etl/init.sls#L50-L59
Support configuration key to toggle using IPs or hostnames
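A minimal sketch of what the config gate described above could look like, assuming Airflow 1.7's module-level `configuration.getboolean()`; `_resolve_to_ip` is illustrative and is not the actual Lyft socket patch.

```python
import socket

from airflow import configuration


def _resolve_to_ip(name=''):
    # Illustrative replacement: return this host's IP address rather than
    # a hostname that won't resolve.
    return socket.gethostbyname(socket.gethostname())


try:
    _prefer_ip = configuration.getboolean('lyft', 'prefer_ip_over_hostname')
except Exception:
    # Section or key absent: leave the socket library untouched.
    _prefer_ip = False

if _prefer_ip:
    socket.getfqdn = _resolve_to_ip
```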
This symlink is left dangling when ETL builds airflow as a dependency,
which is causing deployment to fail. Excerpt from the logs:
/srv/salt/venv/bin/python /srv/pulldeploy/pulldeploy/deploy.py -p etl
2017-07-19 20:17:42,202 INFO frozen_venv ./venvs/service directory created
2017-07-19 20:17:57,857 ERROR Command failed: cp -RL /srv/venvs/service/* ./venvs/service
2017-07-19 20:17:57,857 ERROR cp: cannot stat '/srv/venvs/service/trusty/service_venv/src/airflow/airflow/www/static/docs': No such file or directory
Remove symlink to docs directory
I've examined the schema migrations, and I think that there's nothing backwards-incompatible. That is, we can decouple the database migrations from the deployment of the code. To prepare this commit, I took all of the Alembic changes in the directory `airflow/migrations` from the upstream master branch and applied them here [1]. I'll deploy these changes in staging by running the `airflow upgradedb` command and then verify that the scheduler is able to schedule tasks and that the workers can still execute them when using the code from Airflow 1.7 against the database schema defined in Airflow 1.8.

[1] git checkout upstream-master -- airflow/migrations
[DATA-4816] Take Alembic migrations from upstream
Upgrade flask to 0.12.1
Airflow 1.8 depends on `flask-wtf==0.14`, so in order to run airflow 1.8 and 1.7 side by side we need to force etl to upgrade from `flask-wtf==0.12` to `flask-wtf==0.14`.
[DATA-4839] Upgrade flask-wtf to 0.14
The airflow 1.7 scheduler deletes any queued TIs belonging to DAGs that are not in the DagBag. Since we're filtering out DAGs whitelisted for 1.8 (just the long_running_test DAG for now), the scheduler thinks those TIs belong to a DAG that has been removed and deletes them before they get picked up by an airflow 1.8 worker. With this change, we'll simply ignore any TIs from a DAG that we know is handled by the 1.8 scheduler.
[DATA-4839] Do not delete queued TIs from invisible DAGs
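A schematic sketch of the guard described above; the whitelist name and query shape below are illustrative, not the exact patch.

```python
DAGS_HANDLED_BY_1_8 = {'long_running_test'}  # DAGs whitelisted for 1.8


def prune_orphaned_queued_tis(session, TaskInstance, dagbag):
    queued = session.query(TaskInstance).filter(TaskInstance.state == 'queued')
    for ti in queued:
        if ti.dag_id in DAGS_HANDLED_BY_1_8:
            # Invisible to this (1.7) DagBag on purpose: the 1.8 scheduler
            # owns this DAG, so its queued TIs are not orphans.
            continue
        if ti.dag_id not in dagbag.dags:
            # Truly orphaned: the DAG was removed, so drop the TI.
            session.delete(ti)
    session.commit()
```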
This DAG will get scheduled by 1.8, so don't delete any of its "phantom" Task Instances, which were created by the 1.8 scheduler.
[DPTOOLS-50] Ignore presto_query_logs DAG during 1.8 migration
@youngyjd, thanks for your PR! By analyzing the history of the files in this pull request, we identified @Fokko, @edgarRd and @bolkedebruin to be potential reviewers.
Mistakenly created PR.