[AIRFLOW-4030] second attempt to add singularity to airflow #7191
Conversation
LGTM, just a few small comments. Also please adjust the commit message, otherwise the title-validator will not pass (use `git commit --amend`).
Just one more thing. Travis is sad because of static checks; please consider using the pre-commit hooks as described here - they will fix most of the problems automatically :)
The linting appears to be all set, but since Singularity isn't installed, the tests are obviously going to fail. Let me know how you would like to proceed - it's a non-trivial thing to get Singularity installed from within Travis. |
Also, could you give feedback on:
What is wrong with it? |
I think it's quite clearly described - it's about the "commit title". Commit title is the first line in commit message. As mentioned in the PR description:
The commit message should start with [AIRFLOW-NNNN], similarly to the PR description. You can run
I have a question @vsoch - it seems that for some reason it is quite difficult to follow (for you at least). We have not only good and comprehensive documentation, but we have also spent quite a lot of time on messages printed by the pre-commit system and the docs, so that they should not require additional explanation (they mostly repeat what is in the documentation already) - and careful reading of the error messages could (I think) prevent extra questions - that was at least the goal. The messages are designed to guide you, and in my answers I mostly repeat the messages already written there and link to already existing documentation. I am really curious why this happens. Maybe we can improve it somehow - is the documentation not written clearly enough? Is there any problem with surfacing the error messages? Or maybe there is a problem with finding relevant information? Or maybe you need more time to get familiar with the tools we are using? We tried (with a professional technical writer) to place the documentation exactly where it is supposed to be, plus we surface it exactly where problems happen (for example in pre-commit checks) and try to make it as clear as possible - but apparently you still have problems finding it. I am really curious - would it be possible for you to explain how you are approaching it, how you are reading (or maybe not?) the messages you get, and how you are understanding (or not) what we try to communicate? As a novice user you are the best "test" for it - and maybe you have some ideas, proposals, or even a Pull Request that can make the documentation/messages better? Looking forward to more insight. |
Ah understood, I thought it was for the PR title; I'll update the message with the next fix I do. I think the documentation is good, there are just too many moving parts for a new contributor. I have limited bandwidth to read into the minute details, so I tend to be reactive rather than proactive in terms of reading everything first - and I would say this tends to be the case for a large subset of contributors: we understand the basics and rely on the CI to tell us what to fix, and don't expect all the other bells and whistles with new tools and steps we have to learn for just this PR. For this particular PR, I found that following the docs was more trouble than just looking at the errors in the CI and fixing them one by one. When I largely ignored the docs, looked at the CI, and tried to fix things on my own based on that, I was finally able to make progress. Your links and verbose re-explanation of current docs I know are done with good intention, but a large chunk sort of just confused me more. If it makes you feel better, just chalk it up to me being an idiot and I promise I'll never bother this community again. |
I don't think we have enough singularity experience to add it. We have approached this in various ways for other integrations:
I do not know whether any of these options is possible for Singularity - please take a look and see if it is possible to use one of the options above. If we find the integration too difficult - in a number of cases we rely on "unit tests" where we mock the external systems and mark the integration tests as skipped - we have a custom marker |
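The mock-the-external-system fallback mentioned above can be sketched roughly like this. Note that `SingularityClient` and `pull_image` are hypothetical stand-ins for illustration, not the actual Airflow or Singularity Python API - the real test would import the operator module under test:

```python
import unittest
from unittest import mock


class SingularityClient:
    """Hypothetical wrapper around the Singularity CLI; calling it for
    real would require a working Singularity install."""
    def pull(self, image):
        raise RuntimeError("requires a real Singularity install")


def pull_image(client, image):
    # Thin helper standing in for the operator logic under test.
    return client.pull(image)


class TestPullImage(unittest.TestCase):
    def test_pull_called_with_uri(self):
        # Mock the external system so the unit test runs without Singularity.
        client = mock.create_autospec(SingularityClient, instance=True)
        client.pull.return_value = "busybox_latest.sif"
        self.assertEqual(pull_image(client, "docker://busybox"),
                         "busybox_latest.sif")
        client.pull.assert_called_once_with("docker://busybox")
```

The real integration tests would then carry a skip marker and only run when the external system is actually available.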
You definitely can’t get a fully working Singularity from within Docker (even with privileged it’s a bit janky) so I’ll need to look at the other options and decide what is most logical to try first. I’ll take a look tomorrow, thanks for the clearly outlined options! |
But one quick note - regardless of the testing troubles, the support on this PR has been absolutely excellent, night and day when compared to my previous experience. I'm really appreciative of that :) |
First of all, it's not a matter of making me feel better at all. I certainly don't think of you as an idiot, and I would never, ever call anyone an idiot just because that person has no time, bandwidth, or capacity to read and understand the TON of documentation we have. If anything, that's a sign that maybe we should do better. You seem to be competent, you know how to program and fix things on your own, and you do your best to do it - I certainly cannot blame you for not knowing our complex setup. And Airflow IS a complex project. It has a LOT of moving parts, and my goal is to make it approachable even to one-time contributors - my ultimate goal is to make it as easy and painless as possible, while keeping the high bar on quality, test automation, check automation, etc. I am just curious what the obstacles to that are - and whether we could make it somewhat easier for someone like you who just comes to the project from outside for a short time. That's why I am looking for suggestions/areas of improvement. I know the project too well myself to see the obstacles - there is even a term for that, "Expert Blindness" - http://mnav.com/expert-blindness/ - that's why I am looking for help and suggestions from people like you. Making the project easily approachable has been my goal from the very beginning, and - believe me - a year ago it took literally days to set it all up so that you could run tests. Now (if we do not have sudden breakage of master by 3rd party dependencies) you can have it up and running in 10 minutes (with a fast network). I think, looking at what you've done so far, I will make a few improvements:
or (and?)
I already added a few improvements to Contributing, and we even have this PR in progress to explain how communication is expected to look: #7204 - I will be happy if you add your comments there. WDYT? Anything else you think we can improve to make it more approachable? |
Thanks for the kind words :). The trouble is that we can't afford to spend that much time on every single new person coming :(. The project needs to be scalable. That's why my goal is to have it all as smooth as possible and as easy to self-study and self-discover as possible - only reaching out to mentors/committers/other experienced contributors when really needed. |
Codecov Report
```
@@            Coverage Diff            @@
##           master   #7191      +/-   ##
=========================================
+ Coverage    32.8%   86.81%     +54%
=========================================
  Files         892      893       +1
  Lines       42173    42186      +13
=========================================
+ Hits        13836    36622   +22786
+ Misses      28337     5564   -22773
```
Continue to review full report at Codecov.
|
Writing up this robust feedback and researching the CI solutions is going to take me some time, just to keep you updated I won't be able to work on this promptly, I need to prioritize work for my job. But I'll be back, stay tuned! :) |
That's OK! |
Here are my thoughts @potiuk, I hope that they are helpful! I think you have a challenging problem of wanting to scale an interaction that has many expectations around people, and human interactions that require kindness and patience. But there definitely could be some tweaks (e.g., sticking more to traditional expectations of a review, giving details when it's called for and not expecting pre-request expertise, and either minimizing tooling that is unexpected, defaulting to using the testing environment as the base for giving feedback to the contributor, or removing layers of abstraction so it's less confusing). Anyway, I outlined the details much better in that document. Please feel free to share! |
okay, I've taken a look at the docker-compose approach. Specifically, I've created a:

```yaml
version: "2.2"
services:
  singularity:
    image: quay.io/singularity/singularity:v3.5.1
    command: -F /etc/alpine-release
    entrypoint: /usr/bin/tail
    privileged: true
    volumes:
      - /dev/urandom:/dev/random # Required to get non-blocking entropy source
```

Note that I'm using privileged, which likely requires the sudo: true parameter for the testing setup. The traditional entrypoint is the Singularity executable, so I have a basic entrypoint/command to just keep the container running (and allow me to shell inside). Even outside of testing (which is fairly complicated and I'm hoping not to need to reproduce locally) we can test whether the basic interaction with Singularity works inside the container.

```shell
docker-compose --log-level INFO -f "scripts/ci/docker-compose/integration-singularity.yml" up -d
```

Check that everything went up ok...

```shell
$ docker-compose --log-level INFO -f "scripts/ci/docker-compose/integration-singularity.yml" logs
Attaching to docker-compose_singularity_1_7d898ecbdbb0
singularity_1_7d898ecbdbb0 | 3.10.3
```

And that we continued running...

```shell
$ docker-compose --log-level INFO -f "scripts/ci/docker-compose/integration-singularity.yml" ps
                  Name                                Command             State   Ports
--------------------------------------------------------------------------------------
docker-compose_singularity_1_7d898ecbdbb0   /usr/bin/tail -F /etc/alpi ...   Up
```

Great! And then we can shell inside. We confirm the version of singularity:

```shell
$ docker exec -it 0081d687eeb1 bash
bash-5.0# which singularity
/usr/local/bin/singularity
bash-5.0# singularity --version
singularity version 3.5.1
```

The basic thing we would need to do is start instances. We can actually build the SIF binary successfully, so let's do that first, pulling the container:

```shell
$ docker exec -it 3e73d2d5b41e bash
bash-5.0# singularity pull docker://busybox
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob bdbbaa22dec6 done
Copying config b075e856be done
Writing manifest to image destination
Storing signatures
2020/01/22 20:24:24  info unpack layer: sha256:bdbbaa22dec6b7fe23106d2c1b1f43d9598cd8fc33706cc27c1d938ecd5bffc7
INFO:    Creating SIF file...
INFO:    Build complete: busybox_latest.sif
```

Now let's start the instance:

```shell
bash-5.0# singularity instance start busybox_latest.sif busybox
ERROR:   container cleanup failed: no instance found with name busybox
FATAL:   container creation failed: mount /etc/localtime->/etc/localtime error: while mounting /etc/localtime: mount source /etc/localtime doesn't exist
FATAL:   failed to start instance: while running /usr/local/libexec/singularity/bin/starter: exit status 255
```

okay, let's trace back and add this to our docker-compose yml file; we will mount it as a volume from the host and redo the above:

```yaml
version: "2.2"
services:
  singularity:
    image: quay.io/singularity/singularity:v3.5.1
    command: -F /etc/alpine-release
    entrypoint: /usr/bin/tail
    privileged: true
    volumes:
      - /etc/localtime:/etc/localtime
```

Zooming ahead, let's recreate and then start the instance:

```shell
bash-5.0# singularity instance start busybox_latest.sif busybox
INFO:    instance started successfully
```

Success!

```shell
bash-5.0# singularity instance list
INSTANCE NAME    PID    IP    IMAGE
busybox          78           /go/busybox_latest.sif
```

And we can exec to it:

```shell
bash-5.0# singularity exec instance://busybox echo "HELLO"
HELLO
```

So, although I'm hesitant that something about Travis won't provide us with all the permissions we need, I'm going to give it a try and add a docker-compose file to reproduce this. My question for you is how I should go about providing this integration - the examples given previously expose services, and for this case we would want an environment with airflow and singularity in the same space. Do you have thoughts or examples for how to approach this? And then, given that we can add a custom container, it would need to use one of the Singularity bases (note they are alpine) and also include dependencies for airflow (or is the install already handled given the container, and happens before the testing?). Either way, if customization is needed on top of the Singularity base (that isn't handled by the testing suite), what would be the best way to do this - building the image elsewhere and using it here, or building on demand? Thanks! |
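For CI, the manual checks above could be scripted. This is a hedged sketch (the compose file path is the one used in this thread; the helper simply skips when docker-compose is not on the PATH, so nothing here is Airflow-specific):

```python
import shutil
import subprocess

# Path taken from this thread's docker-compose experiment.
COMPOSE_FILE = "scripts/ci/docker-compose/integration-singularity.yml"


def compose(*args):
    """Run a docker-compose subcommand against the integration file.
    Returns the CompletedProcess, or None when docker-compose is not
    installed (e.g. on a machine without Docker)."""
    if shutil.which("docker-compose") is None:
        return None
    cmd = ["docker-compose", "-f", COMPOSE_FILE, *args]
    return subprocess.run(cmd, capture_output=True, text=True)


# Bring the service up, then confirm it stayed running and check the logs.
for step in (["up", "-d"], ["ps"], ["logs"]):
    result = compose(*step)
    if result is None:
        print("docker-compose not available; skipping")
        break
    print(result.stdout)
```

This just automates the up/ps/logs sequence shown above; the exec-based checks would follow the same pattern with `docker exec`.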
Thanks @vsoch! Really sorry for not getting back sooner - I've been travelling and missed that last comment. Fantastic work! I really liked how you explained the steps you took to make it work! Kudos!
Travis and any other environment will provide all that we need. We already run in privileged mode (and pretty much all CI systems allow that, in order to enable all kinds of docker-in-docker approaches). In fact, we already mount the docker socket inside the airflow-testing container so that you can even run docker commands that reach out to the host docker engine. This is needed to run kind (Kubernetes-in-Docker).
Hmm, I'd assume that having a base Singularity should be enough and that we can reach it via the "singularity" host name via docker-compose. But I understand you need to build some Airflow dependencies in? Am I right? What kind of dependencies do you need to build into that container? Is this like local airflow sources or some airflow dependencies? Excuse my ignorance, as I do not know much about Singularity.
I hope this is helpful! |
I think the third approach is best:
because Singularity requires a fairly specific set of custom dependencies, and then takes a while to compile. I can generate this file locally and use the already existing bases on quay.io. Could you give me some pointers about where to write the Dockerfile, and which example from the current codebase (link) I should try to copy the logic from? |
I think the closest one is Kubernetes, and I am leaning towards creating another RUNTIME (singularity). There are a few scripts to update, but it's not that hard in the end. I think a custom It's best if you look for RUNTIME (capitalized) wherever it's used, but here is the guide Breeze (for local development and testing your solution):
Docker Compose:
In container:
That's it for the interactive "local" breeze session. Then if you run tests, the script goes further: it builds the airflow kubernetes image and deploys it to kubernetes. In an interactive session, the deployment should be run manually as described in https://github.com/apache/airflow/blob/master/TESTING.rst#running-tests-with-kubernetes, but if you run tests in travis (RUN_TESTS="true") those steps are executed automatically. I hope that's enough leads. |
okay so I'm tracing the kubernetes (runtime) as an example, and I have a quick question. In scripts/ci/in_container/entrypoint_ci.sh I see that, given the kubernetes runtime, you are setting some variables for a container name to be the airflow name with the suffix "-kubernetes":

```shell
if [[ ${RUNTIME:=""} == "kubernetes" ]]; then
    unset KRB5_CONFIG
    unset KRB5_KTNAME
    export AIRFLOW_KUBERNETES_IMAGE=${AIRFLOW_CI_IMAGE}-kubernetes
    AIRFLOW_KUBERNETES_IMAGE_NAME=$(echo "${AIRFLOW_KUBERNETES_IMAGE}" | cut -f 1 -d ":")
    export AIRFLOW_KUBERNETES_IMAGE_NAME
    AIRFLOW_KUBERNETES_IMAGE_TAG=$(echo "${AIRFLOW_KUBERNETES_IMAGE}" | cut -f 2 -d ":")
    export AIRFLOW_KUBERNETES_IMAGE_TAG
fi
```

What isn't clear is where this container is built, and then where it gets used again. At this point we are already sitting inside the airflow-testing container after a command like this:
and this would suggest the container is brought up inside of this container? Wouldn't it make more sense to have a docker-compose run that runs some equivalent of airflow-testing but with Singularity installed inside? |
Oh wait I see, the "deploy_airflow_to_kubernetes.sh" detects the runtime and triggers the build to happen via: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/kubernetes/docker/rebuild_airflow_image.sh. Let me see if I can mirror that with Singularity. |
okay just to clarify - you want a Singularity + Airflow container run via a similar kind cluster? You said something about adding specific commands |
ping... |
Somehow I missed it completely. Sorry. Responding now. |
Yes, it would make sense - similar to what we do with other integrations (say kerberos or mongo). What we are doing with kind is a bit different - we are forwarding the host's docker credentials inside the airflow-testing container and we are running the kind command line from inside it. This - in effect - reaches out to the host's docker and sets up the kind cluster using the host's docker engine. With singularity - if it can be run on the host and we can connect to it from inside the airflow-testing container, that would be best. |
I assume we can start singularity from the host and forward this connection inside the airflow-testing container so that we can connect to it. With kind - we are starting it from within the container (via the forwarded docker socket), but it could be started from the host as well (mongo, kerberos and others are started from the host using |
Ah yes this is what I wanted to ask about, specifically:
Singularity is different from docker - it doesn't have a service or daemon, it's akin to an executable. It would be installed inside a container, and it wouldn't work to "start on the host and forward."
You mean to say that we start a container with Singularity installed via docker-compose, and then run as a service? I think it would work to run things internally, I'm not sure the extent to which you could issue interactions from the outside. This I think is something that we could do:
Is this possible? |
I see now! In this case I think the easiest is to add the binary to the Docker image and run it from there. Then there is no need to run it as a separate image/runtime. That will be way simpler. We already do that for the minicluster - for hadoop tests: Docker installation here: https://github.com/apache/airflow/blob/master/Dockerfile#L184 |
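Since Singularity has no daemon and is just an executable, once the binary is baked into the CI image the operator can detect and invoke it directly. A stdlib-only sketch of that check (nothing Airflow-specific; it degrades gracefully when the binary is absent, e.g. on a developer machine without the CI image):

```python
import shutil
import subprocess


def singularity_available():
    """Return the path to the singularity binary, or None if not installed."""
    return shutil.which("singularity")


def singularity_version():
    """Return the installed version string, or None when the binary is
    absent, mirroring the `singularity --version` check done earlier
    in this thread."""
    path = singularity_available()
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True)
    return out.stdout.strip() or None


print(singularity_version())
```

On the CI image described above this would print something like `singularity version 3.5.1`; elsewhere it prints `None`.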
Perfecto! I'll give this a shot. |
looks like the error linked above was a one-off - a later build didn't trigger it. |
Yeah. We still have some flaky tests - much fewer than before, but they happen sometimes due to resource problems we have with Travis :(. We are close to migrating to GitHub Actions, but it's one of the things that gets delayed as we are busy with other higher-priority stuff. It's annoying but it's not a blocker. |
@potiuk for the recent tests, I'm seeing a redirect (302) for an address that (used to be?) returned 200. I don't see any singularity operator failures to debug, so I don't have any new commits to push to re-test. Is it possible for someone on your team to trigger a re-run, in case it was a one-time glitch, or to take a look into the redirects? Here is the output that I'm referencing:
|
It could be a transient error or some temporary problem with master (despite the protections we have in place, it breaks occasionally). We do not see it now in the latest master, so I suggest rebasing to the latest master (it's green: https://travis-ci.org/apache/airflow/builds/653784320?utm_medium=notification&utm_source=github_status) and seeing if the problem is still there. |
Alright, the rebase is done! I think it's likely something transient, because the rebase only had changes to UPDATING.md (note that right before the last set of pushes I rebased when there was a conflict with setup.py). So let's hope the transient error doesn't show up again! |
one transient error - but I restarted it. Looks good ! |
🤞 |
holy crap it passed! Woot!! |
Wohooo!
Awesome work, congrats on your first merged pull request! |
Thanks everyone :) It's just a little before 3:00am here but I still think I'm going to have a tiny dance party before I slip back to sleep. 🎉 Really looking forward to someone using this in the wild! |
Indeed, awesome work 🎉 🎉 🎉 🎉 🎉 🎉 . Likewise :). Thanks for the cooperation on that one! |
BTW @vsoch, I am going to give a talk together with @aijamalnk on diversity and building an even more welcoming community at the Airflow Summit (http://airflowsummit.org), and I will certainly use some of the things we've learned during this conversation. But I have a thought - since you are nearby (I guess :) ), maybe you would like to give a talk about your side of the experience (and a bit about Singularity)? I think that might be cool. We have the CFP open: https://pretalx.com/apache-airflow-summit-bay-area-2020/cfp :) |
Ah, I love the Computer History Museum! I'm actually a remote worker for Stanford based out of Colorado, so there is still an hour-or-so drive, a few-hour flight, and expensive Bay Area hotels in the way of me participating. Maybe in the future, if there is an option to join remotely or get reimbursed, I'd be able to come. In the meantime I can definitely look over content and give feedback, if that might help. |
I see - pity, let's see. It's a first-time event, so we do not know what to expect and for now do not foresee reimbursements, but if we are successful in finding sponsors, who knows, maybe we will introduce a diversity fund. I will keep you updated :) |
Hello @vsoch - I spoke to the co-organizer of Airflow Summit - @leahecole - and I think we can figure out something with regard to you being a speaker at the Summit. @leahecole - can you please reach out to @vsoch and take it from here? I would love to make this happen! |
Hi @vsoch! Jarek spoke super highly about this contribution and I'd love to talk about how we can get you to the summit - I just followed you on twitter (@leahecole there too) - I think one good way would be if you're willing to give a talk about it - I'd be happy to help you through submitting the talk proposal. Feel free to DM me there to talk more details and thanks for your Airflow contribution! :) |
* adding singularity operator and tests
* removing encoding pragmas and fixing up dockerfile to pass linting
* make workdir in /tmp because AIRFLOW_SOURCES not defined yet
* curl needs to follow redirects with -L
* moving files to where they are supposed to be, more changes to mock, no clue
* removing trailing whitespace, moving example_dag for singularity, adding licenses to empty init files
* ran isort on example dags file
* adding missing init in example_dags folder for singularity
* removing code from __init__.py files for singularity operator to fix documentation generation
* forgot to update link to singularity in operators and hooks ref
* command must have been provided on init of singularity operator instance
* I guess I'm required to have a task_id?
* try adding working_dir to singularity operator type definitions
* disable too many arguments for pylint of singularity operator init
* move pylint disable up to line 64 - doesnt catch at end of statement like other examples
* two spaces before inline comment
* I dont see task_id as a param for other providers, removing for singularity operator
* adding debug print
* allow for return of just image and/or lines
* dont understand how mock works, but the image should exist after its pulled....
* try removing shutil, the client should handle pull folder instead
* try changing pull-file to same uri that is expected to be pulled
* import of AirflowException moved to exceptions
* DAG module was moved to airflow.models
* ensure pull is called with pull_folder

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>
Signed-off-by: vsoch <vsochat@stanford.edu>
This is a second attempt to add Singularity Container support to Apache Airflow by way of Singularity Python. I am using the previously created JIRA ticket 4030 (created in March 2019) as it is still relevant. I am a new contributor and largely not familiar with the community here (and yes I've read the guidelines) so I would appreciate support and kindness from the individuals that act as maintainers here, and any additional support from other folks that are also interested in this integration. Thank you!
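To give a feel for what the PR adds, here is a hedged sketch of the operator's control flow as discussed in this thread (image is pulled, the command is run, and the task fails on error). All names here - `SingularityOperatorSketch`, `FakeClient`, the `client` methods - are illustrative stand-ins, not the real airflow or Singularity Python API:

```python
class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


class SingularityOperatorSketch:
    """Rough sketch of the operator's flow, not the real class."""

    def __init__(self, image, command, client):
        # As noted in the commit log, a command must be provided at init.
        if not command:
            raise AirflowException("You must define a command.")
        self.image = image
        self.command = command
        self.client = client  # Singularity Python's client in the real code

    def execute(self):
        # Pull the SIF image, run the command, fail the task on error.
        sif = self.client.pull(self.image)
        result = self.client.execute(sif, self.command)
        if result["return_code"] != 0:
            raise AirflowException(f"Command failed: {result['message']}")
        return result["message"]


class FakeClient:
    """Toy client so the sketch can be exercised without Singularity."""

    def pull(self, image):
        return "busybox_latest.sif"

    def execute(self, sif, command):
        return {"return_code": 0, "message": "HELLO"}


op = SingularityOperatorSketch("docker://busybox", ["echo", "HELLO"], FakeClient())
print(op.execute())  # prints: HELLO
```

In the merged code the same flow runs inside an Airflow task, with the container interaction delegated to Singularity Python.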
Signed-off-by: Vanessa Sochat vsochat@stanford.edu
Issue link: AIRFLOW-4030
Make sure to mark the boxes below before creating PR:
- [x] Commit message starts with [AIRFLOW-NNNN], where AIRFLOW-NNNN = JIRA ID. For document-only changes the commit message can start with [AIRFLOW-XXXX].
- In case of fundamental code change, an Airflow Improvement Proposal (AIP) is needed.
- In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
- In case of backwards incompatible changes, please leave a note in UPDATING.md.

Read the Pull Request Guidelines for more information.