New feature: HTTP interface? #92
Comments
Airflow tasks are expected to be synchronous, or made so by writing some sort of sleep/check routine in your operator. A task is also expected to raise an exception as a way to communicate an error. It may be tricky to generalize an HttpOperator, since every system expects different endpoints and payloads and returns different results. Using a PythonOperator that uses the requests lib is the quick way to do this. Maybe an HttpSensor would be generic enough, receiving an endpoint, a payload and a regex to match in the response. I don't know much about

A side note about hooks: they use the Connection model to store connection information, as opposed to hard-coding it in a script. It may be nice to have a thin HttpHook that retrieves that info from the DB and acts as a thin wrapper around the requests lib. I'm not 100% on this though, it may not add a whole lot of value (vs confusion)...
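The sleep/check routine and the regex-matching sensor idea above could look roughly like the sketch below. It is not Airflow's API: `wait_for_match`, `SensorTimeout` and the injectable `fetch` callable are hypothetical names chosen for illustration, and `fetch` stands in for whatever performs the HTTP call (for example a small wrapper around the requests lib).

```python
import re
import time


class SensorTimeout(Exception):
    """Raised to signal failure when the expected response never shows up."""


def wait_for_match(fetch, endpoint, payload, pattern,
                   poke_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call fetch(endpoint, payload) until the response body matches pattern.

    fetch is any callable returning the response body as a string
    (hypothetical; it hides the actual HTTP library).
    """
    regex = re.compile(pattern)
    waited = 0.0
    while True:
        body = fetch(endpoint, payload)
        if regex.search(body):
            return body
        if waited >= timeout:
            # Errors are communicated by raising, as described above.
            raise SensorTimeout(
                "no match for %r at %s after %.0fs" % (pattern, endpoint, waited))
        sleep(poke_interval)
        waited += poke_interval
```

Passing `fetch` and `sleep` as arguments keeps the loop testable without a network; a real sensor would more likely get its HTTP access from a hook.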
So I put something preliminary together, a hook and a sensor, to get an idea of the complexity. You can see this development branch at: master...gtoonstra:http_protocol_sensor I agree, the variety of what's built on HTTP is too rich to create anything generic enough, and attempts at such generic approaches usually start to pollute other areas with logic, for example here a proliferation of logic into the DAGs. In the branch, the hook raises exceptions, but I think most of those should be moved to the operator instead. This is based on the assumption that the operator class decides on success or failure based on the responses from the hook, not the hook itself. In cases like a database hook where a db was expected and didn't exist, the hook is allowed to raise exceptions. Then all we probably need for now is a SimpleHttpOperator, which is limited to the following:
Anything beyond this simple use requires a specific operator:
So the remap operator probably falls into the latter category.
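To make the split of responsibilities concrete, here is a minimal sketch of a hook that only performs the call and an operator-level function that decides on success or failure. All names (`ThinHttpHook`, `execute_simple_http`, `HttpTaskFailed`, the `connections` dict and the `transport` callable) are hypothetical and are not taken from the branch; treating any non-2xx status as a failure is also just an assumption.

```python
from collections import namedtuple

# What the hook hands back: raw facts, no judgement about success.
HttpResult = namedtuple("HttpResult", ["status_code", "text"])


class HttpTaskFailed(Exception):
    """Raised by the operator when it decides the response means failure."""


class ThinHttpHook:
    """Resolves connection details by id and performs the call."""

    def __init__(self, conn_id, connections, transport):
        # A missing connection is a setup problem, comparable to a missing
        # db, so the hook itself raises here (assumption).
        if conn_id not in connections:
            raise KeyError("unknown connection id: %s" % conn_id)
        self.base_url = connections[conn_id]["base_url"]
        # transport(method, url, data) -> (status_code, text); hypothetical
        # stand-in for the real HTTP library.
        self.transport = transport

    def run(self, method, endpoint, data=None):
        status, text = self.transport(method, self.base_url + endpoint, data)
        return HttpResult(status, text)


def execute_simple_http(hook, method, endpoint, data=None):
    """Operator-side logic: interpret the hook's response and raise on failure."""
    result = hook.run(method, endpoint, data)
    if not 200 <= result.status_code < 300:
        raise HttpTaskFailed(
            "%s %s returned %d" % (method, endpoint, result.status_code))
    return result.text
```

With this shape, a more specific operator can reuse the same hook and apply its own success criteria to the response.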
So I went through the codebase and docs of 'airflow' today and I think it's a great fit for one of my projects. I'm the maintainer of "remap", which is a 100% Python implementation of MapReduce, only intended to run on a dozen nodes for now. Jobs are kicked off through a REST interface.
Here's where a potential contribution comes in.
I didn't find anything already done with http interfaces, so my idea is to write an HTTPHook, an HTTP operator and a sensor for this. The operator/hook calls a URL resource with an indicated method and potentially some post data. The sensor later on calls another URL to check on progress. This would allow work to be executed asynchronously until some later checkpoint where the sensor needs to check whether something is available or not.
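As a rough illustration of "a URL resource with an indicated method and potentially some post data", the sketch below only builds the request object and does not send it. It uses the standard library's `urllib.request` so that it runs without extra dependencies; `build_request`, the JSON encoding of the post data, and the example URL and payload in the usage note are all assumptions made for illustration.

```python
import json
import urllib.request


def build_request(base_url, resource, method="GET", post_data=None):
    """Build (but do not send) a request for base_url/resource."""
    url = base_url.rstrip("/") + "/" + resource.lstrip("/")
    body = None
    headers = {}
    if post_data is not None:
        # Assumption: the remote service accepts JSON-encoded post data.
        body = json.dumps(post_data).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

For example, `build_request("http://localhost:8080", "jobs", "POST", {"app": "wordcount"})` describes a job submission that could then be sent with `urllib.request.urlopen`; the sensor would build a separate GET request for the progress URL.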
The HTTP library I intend to use is "requests", which should come installed with "pip": http://docs.python-requests.org/en/latest/
Operators never seem to return values, probably by design. That means a worker process waits around for the job to complete, so an operator executes its action synchronously.
So, couple of questions:
As an extra thought on 2, it's possible that external systems contain key/values that are useful in workflows. Is there a recommended mechanism for loading small pieces of data into a DAG workflow so that they're available from a context, for example when another task is executed?