
[AIRFLOW-4030] second attempt to add singularity to airflow #7191

Merged (30 commits) on Feb 23, 2020

Conversation

@vsoch (Contributor) commented Jan 16, 2020

This is a second attempt to add Singularity Container support to Apache Airflow by way of Singularity Python. I am using the previously created JIRA ticket 4030 (created in March 2019) as it is still relevant. I am a new contributor and largely not familiar with the community here (and yes I've read the guidelines) so I would appreciate support and kindness from the individuals that act as maintainers here, and any additional support from other folks that are also interested in this integration. Thank you!

Signed-off-by: Vanessa Sochat vsochat@stanford.edu


Issue link: AIRFLOW-4030

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes the commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@turbaszek (Member) left a comment

LGTM, just a few small comments. Also please adjust the commit message, otherwise the title-validator will not pass (git commit --amend).

Six review comments on airflow/contrib/operators/singularity_operator.py (outdated, resolved)
@turbaszek (Member) left a comment

Just one more thing: Travis is sad because of static checks. Please consider using the pre-commit hooks as described here; they will fix most of the problems automatically :)

Review comment on tests/contrib/operators/test_singularity_operator.py (outdated, resolved)
@vsoch (Contributor, Author) commented Jan 20, 2020

The linting appears to be all set, but since Singularity isn't installed, the tests are obviously going to fail. Let me know how you would like to proceed - it's a non-trivial thing to get Singularity installed from within Travis.

@vsoch (Contributor, Author) commented Jan 20, 2020

Also, could you give feedback on:

Title Validator — Wrong commit title: adding singularity operator and tests

What is wrong about it?

@potiuk (Member) commented Jan 21, 2020

Also, could you give feedback on:

Title Validator — Wrong commit title: adding singularity operator and tests

What is wrong about it?

I think it's quite clearly described - it's about the "commit title". The commit title is the first line of the commit message. As mentioned in the PR description:

Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*

The commit message should start with [AIRFLOW-NNNN], similarly to the PR description. You can run git commit --amend to change it and then git push --force to push it. It's also quite clearly described in the "Pull Request guidelines": https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines with an example:

Preface your commit's subject & PR title with [AIRFLOW-XXX] COMMIT_MSG where XXX is the JIRA number. For example: [AIRFLOW-5574] Fix Google Analytics script loading.

I have a question @vsoch - it seems that for some reason this is quite difficult to follow (for you at least). We not only have good and comprehensive documentation, but we have also spent quite a lot of time on the messages printed by the pre-commit system and the docs so that they don't require additional explanation (they mostly repeat what is already in the documentation) - and careful reading of the error messages could, I think, prevent extra questions - that was at least the goal. The messages are designed to guide you, and in my answers I mostly repeat the messages already written there and link to already existing documentation.

I am really curious why this happens. Maybe we can improve it somehow - is the documentation not written clearly enough? Is there a problem with surfacing the error messages? Or maybe there is a problem with finding relevant information? Or maybe you need more time to get familiar with the tools we are using?

We tried (with a professional technical writer) to place the documentation exactly where it is supposed to be, plus we surface it exactly at the place where problems happen (for example in pre-commit checks) and try to make it as clear as possible - but apparently you still have problems finding it. I am really curious - would it be possible for you to explain how you are approaching it, how you are reading (or maybe not?) the messages you get, and how you are understanding (or not) what we try to communicate?

As a novice user you are the best "test" for it - maybe you also have some ideas, proposals, or maybe even a Pull Request that could improve it and make the documentation/messages better?

Looking forward to more insight.

@vsoch (Contributor, Author) commented Jan 21, 2020

Ah, understood - I thought it was the PR title. I'll update the message with the next fix I do.

I think the documentation is good; there are just too many moving parts for a new contributor. I have limited bandwidth to read into the minute details, so I tend to be reactive rather than proactive in terms of reading everything first - and I would say this tends to be the case for a large subset of contributors: we understand the basics and rely on the CI to tell us what to fix, and don't expect all the other bells and whistles with new tools and steps we have to learn for just this PR. For this particular PR, I found that following the docs was more trouble than just looking at the errors in the CI and fixing them one by one. When I largely ignored the docs, looked at the CI, and tried to fix things on my own based on that, I was finally able to make progress. Your links and verbose re-explanation of the current docs are, I know, done with good intention, but a large chunk of it just confused me more.

If it makes you feel better, just chalk it up to me being an idiot, and I promise I'll never bother this community again.

@potiuk (Member) commented Jan 21, 2020

The linting appears to be all set, but since Singularity is installed, the tests are obviously going to fail. Let me know how you would like to proceed - it's a non trivial thing to get Singularity installed from within Travis.

I don't think we have enough Singularity experience to add it. We have approached this in various ways for other integrations:

I do not know whether any of those options is possible for Singularity - please take a look and see if either of them could work.

If we find the integration too difficult: in a number of cases we rely on "unit tests" where we mock the external systems and mark the integration tests as skipped - we have a custom pytest marker, pytest.mark.integration, that does it: https://github.com/apache/airflow/blob/master/TESTING.rst#running-integration-tests . If you have @pytest.mark.integration("singularity") as a decorator, the test will be skipped unless you have the INTEGRATION_SINGULARITY env variable defined. You can leave such tests in the code and manually test them when you have Singularity (but it's best if we can figure out a way to add it to the docker-compose we are already running).
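The skip-unless-environment-variable pattern described above can be sketched with stdlib unittest. Note this is an illustration, not Airflow's actual code: Airflow implements this as a custom pytest marker wired up in its test configuration, and the helper name here is made up.

```python
import os
import unittest


def integration(name):
    """Skip the decorated test unless INTEGRATION_<NAME> is set to "true".

    Illustrative stand-in for Airflow's pytest.mark.integration marker,
    built on stdlib unittest so the sketch is self-contained.
    """
    return unittest.skipUnless(
        os.environ.get(f"INTEGRATION_{name.upper()}") == "true",
        f"requires a running {name} integration",
    )


class TestSingularityOperator(unittest.TestCase):
    @integration("singularity")
    def test_pull_image(self):
        # Would exercise a real Singularity binary here; this runs only
        # when INTEGRATION_SINGULARITY=true is exported in the environment.
        self.assertTrue(True)
```

With INTEGRATION_SINGULARITY unset, `python -m unittest` reports the test as skipped; exporting the variable runs it for real.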

@vsoch (Contributor, Author) commented Jan 21, 2020

You definitely can't get a fully working Singularity from within Docker (even with privileged mode it's a bit janky), so I'll need to look at the other options and decide what is most logical to try first. I'll take a look tomorrow - thanks for the clearly outlined options!

@vsoch (Contributor, Author) commented Jan 21, 2020

But one quick note - regardless of the testing troubles, the support on this PR has been absolutely excellent, night and day when compared to my previous experience. I'm really appreciative of that :)

@potiuk (Member) commented Jan 21, 2020

If it makes you feel better, just chalk it up to me being an idiot and I promise I'll never bother this community again.

First of all, it's not a matter of making me feel better at all. I certainly don't think of you as an idiot, and I would never, ever call anyone an idiot just because that person has no time, bandwidth, or capacity to read and understand the TON of documentation we have. If anything, that's a sign that maybe we should do better. You seem to be competent - you know how to program, fix things on your own, and do your best to do it - and I certainly cannot blame you for not knowing our complex setup.

And Airflow IS a complex project. It has a LOT of moving parts, and my goal is to make it approachable even to one-time contributors - ultimately, to make it as easy and painless as possible while keeping the bar high on quality, test automation, check automation, etc.

I am just curious what the obstacles are - and whether we could make it somewhat easier for someone like you who comes to the project from outside for a short time. That's why I am looking for suggestions/areas of improvement. I know the project too well myself to see the obstacles - there is even a term for that, "Expert Blindness" (http://mnav.com/expert-blindness/) - which is why I am looking for help and suggestions from people like you.

Making the project much easier to approach has been my goal from the very beginning - and, believe me, a year ago it took literally days to set it all up so that you could run tests. Now (if we do not have a sudden breakage of master by 3rd party dependencies) you can have it up and running in 10 minutes (with a fast network).

I think, looking at what you've done so far, I will make a few improvements:

  • I am going to make "Breeze" THE default dev environment (it is much lighter as of a few weeks ago), and if you had it installed (by simply running the ./breeze command), then pre-commit would just work out of the box

or (and?)

  • we can direct new contributors to simply follow the CI messages - they are descriptive enough and should be a good start

I already added a few improvements to Contributing, and we even have this PR in progress to explain how communication is expected to look: #7204 - I will be happy if you add your comments there.

WDYT? Anything else you think we can improve to make it more approachable?

@potiuk (Member) commented Jan 21, 2020

But one quick note - regardless of the testing troubles, the support on this PR has been absolutely excellent, night and day when compared to my previous experience. I'm really appreciative of that :)

Thanks for the kind words :). The trouble is that we can't afford to spend that much time on every single new person who comes along :(. The project needs to be scalable. That's why my goal is to have it all as smooth as possible and as easy to self-study and self-discover as possible - only reaching out to mentors/committers/other experienced contributors when really needed.
Of course you can't do that from day 1 or even day 5, but my hope is that if we relentlessly work on it, improve it, and also get a lot of feedback from people like you, we can approach this "ideal" where everything is self-explanatory and easy to follow.

@codecov-io commented Jan 21, 2020

Codecov Report

Merging #7191 into master will increase coverage by 54%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #7191      +/-   ##
==========================================
+ Coverage    32.8%   86.81%     +54%     
==========================================
  Files         892      893       +1     
  Lines       42173    42186      +13     
==========================================
+ Hits        13836    36622   +22786     
+ Misses      28337     5564   -22773
Impacted Files Coverage Δ
airflow/bin/airflow 84.61% <0%> (ø)
airflow/plugins_manager.py 86.95% <0%> (+0.72%) ⬆️
airflow/utils/db.py 98.31% <0%> (+0.84%) ⬆️
airflow/bin/cli.py 94.73% <0%> (+2.1%) ⬆️
airflow/executors/celery_executor.py 88.51% <0%> (+3.37%) ⬆️
airflow/exceptions.py 100% <0%> (+3.57%) ⬆️
airflow/utils/sqlalchemy.py 84.93% <0%> (+4.1%) ⬆️
airflow/utils/decorators.py 90.47% <0%> (+4.76%) ⬆️
airflow/kubernetes/pod_launcher.py 92.25% <0%> (+5.63%) ⬆️
airflow/www/forms.py 100% <0%> (+7.27%) ⬆️
... and 760 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 40fbadc...c089430.

@vsoch (Contributor, Author) commented Jan 21, 2020

Writing up this robust feedback and researching the CI solutions is going to take me some time. Just to keep you updated: I won't be able to work on this promptly, as I need to prioritize work for my job. But I'll be back - stay tuned! :)

@potiuk (Member) commented Jan 21, 2020

That's OK!

@vsoch (Contributor, Author) commented Jan 22, 2020

Here are my thoughts @potiuk - I hope they are helpful! I think you have a challenging problem: wanting to scale an interaction that has many expectations around people, and human interactions require kindness and patience. But there definitely could be some tweaks: sticking more to traditional expectations of a review, giving details when they're called for rather than expecting pre-existing expertise, and either minimizing unexpected tooling, defaulting to the testing environment as the base for giving feedback to the contributor, or removing layers of abstraction so it's less confusing.

Anyway, I outlined the details much better in that document. Please feel free to share!

@vsoch (Contributor, Author) commented Jan 22, 2020

Okay, I've taken a look at the docker-compose approach. Specifically, I've created an integration-singularity.yml file that uses a container base that provides Singularity (v3.5.1).

version: "2.2"
services:
  singularity:
    image: quay.io/singularity/singularity:v3.5.1
    command: -F /etc/alpine-release
    entrypoint: /usr/bin/tail
    privileged: true
    volumes:
      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source

Note that I'm using privileged mode, which likely requires the sudo: true parameter in the Travis setup. The traditional entrypoint is the Singularity executable, so I have a basic entrypoint/command just to keep the container running (and allow me to shell inside). Even outside of testing (which is fairly complicated, and I'm hoping not to need to reproduce it locally), we can check whether basic interaction with Singularity works inside the container.

docker-compose --log-level INFO -f "scripts/ci/docker-compose/integration-singularity.yml" up -d

Check that everything went up ok...

docker-compose --log-level INFO -f "scripts/ci/docker-compose/integration-singularity.yml" logs
Attaching to docker-compose_singularity_1_7d898ecbdbb0
singularity_1_7d898ecbdbb0 | 3.10.3

And that we continued running...

$ docker-compose --log-level INFO -f "scripts/ci/docker-compose/integration-singularity.yml" ps
                  Name                                 Command               State   Ports
--------------------------------------------------------------------------------------
docker-compose_singularity_1_7d898ecbdbb0   /usr/bin/tail -F /etc/alpi ...   Up  

Great! And then we can shell inside. We confirm the version of singularity.

$ docker exec -it 0081d687eeb1 bash
bash-5.0# which singularity
/usr/local/bin/singularity
bash-5.0# singularity --version
singularity version 3.5.1

The basic thing we would need to do is start instances. We can actually build the SIF binary successfully, and let's do that first, pulling the container:

$ docker exec -it 3e73d2d5b41e bash
bash-5.0# which singularity
/usr/local/bin/singularity
bash-5.0# singularity pull docker://busybox
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob bdbbaa22dec6 done
Copying config b075e856be done
Writing manifest to image destination
Storing signatures
2020/01/22 20:24:24  info unpack layer: sha256:bdbbaa22dec6b7fe23106d2c1b1f43d9598cd8fc33706cc27c1d938ecd5bffc7
INFO:    Creating SIF file...
INFO:    Build complete: busybox_latest.sif
bash-5.0# 

Now let's start the instance.

bash-5.0# singularity instance start busybox_latest.sif busybox
ERROR:   container cleanup failed: no instance found with name busybox
FATAL:   container creation failed: mount /etc/localtime->/etc/localtime error: while mounting /etc/localtime: mount source /etc/localtime doesn't exist

FATAL:   failed to start instance: while running /usr/local/libexec/singularity/bin/starter: exit status 255

Okay, let's trace back and add this to our docker-compose yml file: we will mount /etc/localtime as a volume from the host and redo the above.

version: "2.2"
services:
  singularity:
    image: quay.io/singularity/singularity:v3.5.1
    command: -F /etc/alpine-release
    entrypoint: /usr/bin/tail
    privileged: true
    volumes:
      - /etc/localtime:/etc/localtime

Zooming ahead, let's recreate and then start the instance:

bash-5.0# singularity instance start busybox_latest.sif busybox
INFO:    instance started successfully

Success!

bash-5.0# singularity instance list
INSTANCE NAME    PID      IP              IMAGE
busybox          78                       /go/busybox_latest.sif

And we can exec to it:

bash-5.0# singularity exec instance://busybox echo "HELLO"
HELLO

So, although I'm worried that something about Travis won't give us all the permissions we need, I'm going to give adding a docker-compose file a try to reproduce this. My question for you is how I should go about providing this integration - the examples given previously expose services, and in this case we would want an environment with airflow and singularity in the same space. Do you have thoughts or examples for how to approach this?

And then, given that we can add a custom container, it would need to use one of the Singularity bases (note they are Alpine) and also include dependencies for airflow (or is the install already handled given the container, happening before the testing?). Either way, if customization is needed on top of the Singularity base (that isn't handled by the testing suite), what would be the best way to do it - building the image elsewhere and using it here, or building on demand?

Thanks!

@potiuk (Member) commented Jan 28, 2020

Thanks @vsoch! Really sorry for not getting back earlier - I've been travelling and missed that last comment.

Fantastic work! I really liked how you explained the steps you took to make it work! Kudos!

So, although I'm hesitant that something about travis won't provide us with all the permissions that we need, I'm going to give it a try to add a docker compose file to reproduce this. My question for you is how I should go about providing this integration - the examples given previously expose services, and for this case we would want an environment with airflow and singularity in the same space. Do you have thoughts or examples for how to approach this?

Travis and any other environment will provide all that we need. We already run in privileged mode (and pretty much all CI systems allow that, in order to enable all kinds of docker-in-docker approaches). In fact, we already mount the docker socket into the airflow-testing container so that you can even run docker commands that reach out to the host docker engine. This is needed to run kind (Kubernetes-in-Docker).

And then given that we can add a custom container, it would need to use one of the Singularity bases (note they are alpine), and then also include dependencies for airflow (or is the install already handled given the container, and happens before the testing?) Either way, if customization is needed on top of the Singularity base (that isn't handled by the testing suite) what would be the best way to do this - building the image elsewhere and using here, or building on demand?

Hmm, I'd assume that having a base Singularity should be enough, and that we can reach it via the "singularity" host name via docker compose. But I understand you need to build some Airflow dependencies in? Am I right? What kind of dependencies do you need to build into that container - local airflow sources, or some airflow dependencies? Excuse my ignorance, as I do not know much about Singularity.
We have a few approaches we can take:

  1. In the case of Kubernetes (kind) we use the RUNTIME="kubernetes" env variable and check it in the initial CI scripts - if RUNTIME="kubernetes", we install a kubernetes-in-docker cluster, provision it with the latest locally built Airflow CI image, and then run our tests with it. This is the most complex thing we do, and I hope we do not have to do the same here.

  2. If it's just a matter of some small packages to install in the running singularity container after it is started, we could simply do it in the CI startup scripts when we see INTEGRATION_SINGULARITY set to "true" (this will be properly set to true if you add it similarly to our other integrations and the --integration singularity switch is used).

  3. If you need something pre-baked into the Docker image itself, then I think we need a separate Dockerfile with FROM singularity and some steps to build it before docker-compose is run (and refer to the newly built image in the docker-compose file). My best bet, if that is the case, is that you build and test such images locally, and once it works I can help to integrate it into the CI scripts properly. (You can try yourself - this will likely be just an appropriate 'docker build' command executed only when --integration singularity is used.)
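Option 3 might look roughly like this in a CI script. This is a hedged sketch: the function name, image tag, and Dockerfile path are made up for illustration and are not Airflow's actual CI code.

```shell
#!/usr/bin/env bash
# Build a Singularity-enabled CI image only when the singularity
# integration was requested (e.g. via `--integration singularity`).
# All names below are illustrative.

requested_integration() {
    # usage: requested_integration NAME [INTEGRATION...]
    local wanted="$1"
    shift
    local candidate
    for candidate in "$@"; do
        if [[ "${candidate}" == "${wanted}" ]]; then
            return 0
        fi
    done
    return 1
}

if requested_integration singularity "${INTEGRATIONS[@]:-}"; then
    # Run `docker build` once, before docker-compose is started, and
    # reference the resulting tag from the compose file.
    docker build -t airflow-ci-singularity \
        -f scripts/ci/docker/Dockerfile.singularity .
fi
```

Keeping the check in a small function makes the "build only when requested" decision easy to reuse across the start/stop scripts.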

I hope this is helpful!

@vsoch (Contributor, Author) commented Jan 28, 2020

I think the third approach is best:

If you need something pre-baked into the Docker image itself, then I think we need a separate Dockerfile with FROM singularity and some steps to build it before docker-compose is run (and refer to the newly built image in the docker-compose file). My best bet, if that is the case, is that you build and test such images locally, and once it works I can help to integrate it into the CI scripts properly. (You can try yourself - this will likely be just an appropriate 'docker build' command executed only when --integration singularity is used.)

because Singularity requires a fairly specific set of custom dependencies and then takes a while to compile. I can generate this file locally and use the already existing bases on quay.io. Could you give me some pointers about where to write the Dockerfile, and which example from the current codebase (link) I should try to copy the logic from?

@potiuk (Member) commented Jan 28, 2020

Could you give me some pointers about where to write the Dockerfile, and which example from the current codebase (link) I should try to copy the logic from?

I think the closest one is Kubernetes, and I am leaning towards creating another RUNTIME (singularity). There are a few scripts to update, but it's not that hard in the end.

I think a custom --start-singularity --stop-singularity switches should be added similar to --start-kind-cluster --stop-kind-cluster.

It's best if you search for RUNTIME (in capitals) wherever it's used, but here is the guide:

Breeze (for local development and testing your solution):

Docker Compose:

In container:

That's it for an interactive "local" Breeze session.

Then if you run the tests, the script goes further: it builds the airflow kubernetes image and deploys it to kubernetes. In an interactive session, the deployment should be run manually as described in https://github.com/apache/airflow/blob/master/TESTING.rst#running-tests-with-kubernetes

But if you run tests in Travis (RUN_TESTS="true"), those steps are executed automatically.
At https://github.com/apache/airflow/blob/master/scripts/ci/in_container/entrypoint_ci.sh#L218 you have the deployment of the image to kubernetes: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/deploy_airflow_to_kubernetes.sh - and one of the scripts it runs, https://github.com/apache/airflow/blob/master/scripts/ci/in_container/kubernetes/docker/rebuild_airflow_image.sh, is where the Kubernetes image is built.

I hope that's enough leads.

@vsoch (Contributor, Author) commented Jan 29, 2020

Okay, so I'm tracing the kubernetes runtime as an example, and I have a quick question. In scripts/ci/in_container/entrypoint_ci.sh I see that, given the kubernetes runtime, you set some variables for a container name: the airflow name with the suffix "-kubernetes".

if [[ ${RUNTIME:=""} == "kubernetes" ]]; then
    unset KRB5_CONFIG
    unset KRB5_KTNAME
    export AIRFLOW_KUBERNETES_IMAGE=${AIRFLOW_CI_IMAGE}-kubernetes
    AIRFLOW_KUBERNETES_IMAGE_NAME=$(echo "${AIRFLOW_KUBERNETES_IMAGE}" | cut -f 1 -d ":")
    export AIRFLOW_KUBERNETES_IMAGE_NAME
    AIRFLOW_KUBERNETES_IMAGE_TAG=$(echo "${AIRFLOW_KUBERNETES_IMAGE}" | cut -f 2 -d ":")
    export AIRFLOW_KUBERNETES_IMAGE_TAG
fi

What isn't clear is where this container is built, and then where it gets used again. At this point we are already sitting inside the airflow-testing container after a command like this:

elif [[ ${RUNTIME:=} == "singularity" ]]; then
    set +u
    # shellcheck disable=SC2016
    docker-compose --log-level INFO \
      -f "${MY_DIR}/docker-compose/base.yml" \
      -f "${MY_DIR}/docker-compose/backend-${BACKEND}.yml" \
      -f "${MY_DIR}/docker-compose/runtime-singularity.yml" \
      "${INTEGRATIONS[@]}" \
      "${DOCKER_COMPOSE_LOCAL[@]}" \
         run airflow-testing \
           '/opt/airflow/scripts/ci/in_container/entrypoint_ci.sh "${@}"' \
           /opt/airflow/scripts/ci/in_container/entrypoint_ci.sh "${@}"
         # Note the command is there twice (!) because it is passed via bash -c
         # and bash -c starts passing parameters from $0. TODO: fixme
    set -u

and this would suggest the container is brought up inside of this container? Wouldn't it make more sense to have a docker-compose run that runs some equivalent of airflow-testing but with Singularity installed inside?

@vsoch (Contributor, Author) commented Jan 29, 2020

Oh wait, I see - deploy_airflow_to_kubernetes.sh detects the runtime and triggers the build via: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/kubernetes/docker/rebuild_airflow_image.sh. Let me see if I can mirror that with Singularity.

@vsoch (Contributor, Author) commented Jan 29, 2020

Okay, just to clarify - you want a Singularity + Airflow container run via a similar kind cluster? You said something about adding specific --start-singularity and --stop-singularity commands, and I'm not sure what/where that refers to, or how/why we would want to start or stop Singularity (it's not a service, it's just an installed binary).

@vsoch (Contributor, Author) commented Feb 13, 2020

ping...

@potiuk (Member) commented Feb 13, 2020

Somehow I missed it completely. Sorry. Responding now.

@potiuk (Member) commented Feb 13, 2020

okay so I'm tracing the kubernetes (runtime) as an example, and I have a quick question. In scripts/ci/in_container/entrypoint_ci.sh I see that given the kubernetes runtime, you are setting some variable for a container name to be the airflow name with suffix "-kubernetes."

and this would suggest the container is brought up inside of this container? Wouldn't it make more sense to have a docker-compose run that runs some equivalent of airflow-testing but with Singularity installed inside?

Yes, it would make sense - similar to what we do with other integrations (say kerberos or mongo). What we do with kind is a bit different - we forward the host's docker credentials into the airflow-testing container and run the kind command line from inside it. This, in effect, reaches out to the host's docker and sets up the kind cluster using the host's docker engine.

With Singularity - if it can be run on the host and we can connect to it from inside the airflow-testing container, that would be best.

@potiuk (Member) commented Feb 13, 2020

okay just to clarify - you want a Singularity + Airflow container run via a similar kind cluster? You said something about adding specific commands --start-singularity and --stop-singularity and I'm not sure what / where that is referring to, and how / why we would want to start or stop singularity (it's not a service, it's just a binary installed).

I assume we can start singularity on the host and forward the connection into the airflow-testing container so that we can connect to it. With kind, we start it from within the container (via the forwarded docker socket), but it could be started from the host as well (mongo, kerberos, and others are started from the host using the docker-compose configuration, and then we can connect to them from airflow-testing by specifying their service names (mongo/kerberos etc.)).

@vsoch (Contributor, Author) commented Feb 13, 2020

I assume we can start singularity on the host and forward the connection into the airflow-testing container so that we can connect to it. With kind, we start it from within the container (via the forwarded docker socket), but it could be started from the host as well (mongo, kerberos, and others are started from the host using the docker-compose configuration, and then we can connect to them from airflow-testing by specifying their service names (mongo/kerberos etc.)).

Ah yes this is what I wanted to ask about, specifically:

I assume we can start singularity on the host and forward the connection into the airflow-testing container so that we can connect to it.

Singularity is different from docker - it doesn't have a service or daemon; it's akin to an executable. It would be installed inside a container, and it wouldn't work to "start it on the host and forward."

started from the host using docker-compose configuration and then we can connect to them from the "airflow-testing" by specifying their service names (mongo/kerberos etc).

You mean to say that we start a container with Singularity installed via docker-compose, and then run it as a service? I think it would work to run things internally, but I'm not sure to what extent you could issue interactions from the outside. This, I think, is something that we could do:

  1. installing Singularity as a binary inside a container (these bases already exist)
  2. running the container via docker-compose
  3. But then running tests inside of that container

Is this possible?
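To make the three steps concrete, here is a rough docker-compose sketch of the idea (the image name, mount paths, and test command are hypothetical placeholders, not anything that exists in the repo):

```yaml
# Hypothetical sketch: a service whose image already has the
# singularity binary installed, used only to run tests inside it.
version: "3"
services:
  singularity-tests:
    image: singularity-base:latest    # placeholder image with singularity pre-installed
    volumes:
      - ./tests:/opt/tests            # mount the test suite into the container
    entrypoint: ["/bin/bash", "-c"]
    command: ["pytest /opt/tests/providers/singularity"]
```

The key point of the sketch is step 3: the tests run inside the service's container, where the binary lives, rather than talking to it from outside.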

@potiuk
Copy link
Member

potiuk commented Feb 13, 2020

  1. installing Singularity as a binary inside a container (these bases already exist)
  2. running the container via docker-compose
  3. But then running tests inside of that container

Is this possible?

I see now! In this case I think the easiest is to add the binary to the Docker image and run it from there. Then there is no need to run it as a separate image/runtime. That will be way simpler. We already do that for the minicluster - for hadoop tests:

Docker installation here: https://github.com/apache/airflow/blob/master/Dockerfile#L184
And starting the cluster here: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/entrypoint_ci.sh#L140
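For context, the pattern being suggested is roughly the following (a hedged sketch, not the actual Dockerfile lines from the links above - the version and download location are placeholders):

```dockerfile
# Sketch only: bake the singularity binary into the CI image,
# mirroring how the hadoop minicluster is installed in the Dockerfile.
ARG SINGULARITY_VERSION="<version>"                 # placeholder
RUN curl -L "<singularity-release-tarball-url>" \
    | tar -xz -C /opt \
    && ln -s /opt/singularity/bin/singularity /usr/local/bin/singularity
```

With the binary in the image, the CI entrypoint can simply invoke it, the same way the minicluster is started from entrypoint_ci.sh.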
J.

@vsoch
Copy link
Contributor Author

vsoch commented Feb 13, 2020

Perfecto! I'll give this a shot.

@vsoch
Copy link
Contributor Author

vsoch commented Feb 20, 2020

Looks like the error linked above was a one-off - a later build didn't trigger it.

@potiuk
Copy link
Member

potiuk commented Feb 21, 2020

Looks like the error linked above was a one-off - a later build didn't trigger it.

Yeah. We still have some flaky tests - many fewer than before, but they happen sometimes due to resource problems we have with Travis :(. We are close to migrating to GitHub Actions, but it's one of the things that gets delayed as we are busy with other higher-priority stuff. It's annoying but it's not a blocker.

vsoch and others added 8 commits February 21, 2020 11:53
…pulled....

Signed-off-by: vsoch <vsochat@stanford.edu>
Signed-off-by: vsoch <vsochat@stanford.edu>
Signed-off-by: vsoch <vsochat@stanford.edu>
…/airflow into add/attempt-2-singularity-operator
Signed-off-by: vsoch <vsochat@stanford.edu>
Signed-off-by: vsoch <vsochat@stanford.edu>
Signed-off-by: vsoch <vsochat@stanford.edu>
@vsoch
Copy link
Contributor Author

vsoch commented Feb 23, 2020

@potiuk for the recent tests, I'm seeing a redirect (302) for an address that (used to?) return 200. I don't see any singularity operator failures to debug, so I don't have any new commits to push to re-test. Is it possible for someone on your team to trigger a re-run, in case it was a one-time fluke, or to take a look into the redirects? Here is the output that I'm referencing:

tests/www/test_views.py:98: in check_content_in_response

    self.assertEqual(resp_code, resp.status_code)

E   AssertionError: 200 != 302

@potiuk
Copy link
Member

potiuk commented Feb 23, 2020

It could be a transient error or some temporary problem with master (despite the protections we have in place, it breaks occasionally). We do not see it now in the latest master, so I suggest rebasing onto the latest master (it's green: https://travis-ci.org/apache/airflow/builds/653784320?utm_medium=notification&utm_source=github_status) and seeing if the problem is still there.

@vsoch
Copy link
Contributor Author

vsoch commented Feb 23, 2020

Alright, the rebase is done! I think it's likely something transient, because the rebase only brought in changes to UPDATING.md (note that right before the last set of pushes I rebased when there was a conflict with setup.py). So let's hope the transient error doesn't show up again!

@potiuk
Copy link
Member

potiuk commented Feb 23, 2020

One transient error - but I restarted it. Looks good!

@potiuk
Copy link
Member

potiuk commented Feb 23, 2020

🤞

@vsoch
Copy link
Contributor Author

vsoch commented Feb 23, 2020

holy crap it passed! Woot!!

Copy link
Member

@potiuk potiuk left a comment


Wohooo!

@potiuk potiuk merged commit 0bb6879 into apache:master Feb 23, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 23, 2020

Awesome work, congrats on your first merged pull request!

@vsoch
Copy link
Contributor Author

vsoch commented Feb 23, 2020

Thanks everyone :) It's just a little before 3:00am here but I still think I'm going to have a tiny dance party before I slip back to sleep. 🎉

Really looking forward to someone using this in the wild!
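For anyone landing here later, usage looks roughly like the example DAG added in this PR. This is a hedged sketch reconstructed from the commit messages above (image, command, task_id, working_dir parameters) - treat it as an illustrative DAG-configuration fragment, not a verified copy of the shipped example:

```python
# Illustrative sketch of the operator added in this PR; the import
# path and parameter names are assumptions based on the provider
# layout and the commit messages, not verbatim from the merged code.
from datetime import datetime

from airflow import DAG
from airflow.providers.singularity.operators.singularity import SingularityOperator

with DAG(
    dag_id="singularity_example",
    start_date=datetime(2020, 2, 23),
    schedule_interval=None,
) as dag:
    hello = SingularityOperator(
        task_id="hello_singularity",
        image="docker://busybox:latest",  # Singularity can pull docker:// URIs
        command="echo hello-from-singularity",
    )
```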

@potiuk
Copy link
Member

potiuk commented Feb 23, 2020

Indeed, awesome work 🎉 🎉 🎉 🎉 🎉 🎉 . Likewise :). Thanks for the cooperation on that one!

@potiuk
Copy link
Member

potiuk commented Feb 23, 2020

BTW @vsoch, I am going to give a talk together with @aijamalnk on diversity and building an even more welcoming community at the Airflow Summit (http://airflowsummit.org). And I will certainly use some of the things we've learned during this conversation. But I had just a thought - since you are nearby (I guess :) ), maybe you would like to give a talk about your side of the experience (and a bit about Singularity). I think that might be cool.

We have the CFP open: https://pretalx.com/apache-airflow-summit-bay-area-2020/cfp :)

@vsoch
Copy link
Contributor Author

vsoch commented Feb 23, 2020

Ah, I love the Computer History Museum! I'm actually a remote worker for Stanford based out of Colorado, so there is still an hour-or-so drive, a few-hour flight, and expensive Bay Area hotels in the way of me participating. Maybe in the future, if there is an option to join remotely or get reimbursed, I'd be able to come. And in the meantime I can definitely look over content and give feedback, if that might help.

@potiuk
Copy link
Member

potiuk commented Feb 23, 2020

I see - a pity, but let's see. It's a first-time event, so we do not know what to expect and for now do not foresee reimbursements, but if we are successful in finding sponsors, who knows, maybe we will introduce a diversity fund. I will keep you updated :)

@potiuk
Copy link
Member

potiuk commented Mar 2, 2020

Hello @vsoch - I spoke to the co-organizer of the Airflow Summit - @leahecole - and I think we can figure out something with regard to you being a speaker at the Summit. @leahecole -> can you please reach out to @vsoch and take it from here? I would love to make this happen!

@leahecole
Copy link
Contributor

Hi @vsoch! Jarek spoke super highly about this contribution, and I'd love to talk about how we can get you to the summit - I just followed you on Twitter (@leahecole there too). I think one good way would be if you're willing to give a talk about it - I'd be happy to help you through submitting the talk proposal. Feel free to DM me there to talk more details, and thanks for your Airflow contribution! :)

galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
)

* adding singularity operator and tests

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* removing encoding pragmas and fixing up dockerfile to pass linting

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* make workdir in /tmp because AIRFLOW_SOURCES not defined yet

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* curl needs to follow redirects with -L

Signed-off-by: Vanessa Sochat <vsochat@stanford.edu>

* moving files to where they are supposed to be, more changes to mock, no clue

Signed-off-by: vsoch <vsochat@stanford.edu>

* removing trailing whitespace, moving example_dag for singularity, adding licenses to empty init files

Signed-off-by: vsoch <vsochat@stanford.edu>

* ran isort on example dags file

Signed-off-by: vsoch <vsochat@stanford.edu>

* adding missing init in example_dags folder for singularity

Signed-off-by: vsoch <vsochat@stanford.edu>

* removing code from __init__.py files for singularity operator to fix documentation generation

Signed-off-by: vsoch <vsochat@stanford.edu>

* forgot to update link to singularity in operators and hooks ref

Signed-off-by: vsoch <vsochat@stanford.edu>

* command must have been provided on init of singularity operator instance

Signed-off-by: vsoch <vsochat@stanford.edu>

* I guess I'm required to have a task_id?

Signed-off-by: vsoch <vsochat@stanford.edu>

* try adding working_dir to singularity operator type definitions

Signed-off-by: vsoch <vsochat@stanford.edu>

* disable too many arguments for pylint of singularity operator init

Signed-off-by: vsoch <vsochat@stanford.edu>

* move pylint disable up to line 64 - doesnt catch at end of statement like other examples

Signed-off-by: vsoch <vsochat@stanford.edu>

* two spaces before inline comment

Signed-off-by: vsoch <vsochat@stanford.edu>

* I dont see task_id as a param for other providers, removing for singularity operator

Signed-off-by: vsoch <vsochat@stanford.edu>

* adding debug print

Signed-off-by: vsoch <vsochat@stanford.edu>

* allow for return of just image and/or lines

Signed-off-by: vsoch <vsochat@stanford.edu>

* dont understand how mock works, but the image should exist after its pulled....

Signed-off-by: vsoch <vsochat@stanford.edu>

* try removing shutil, the client should handle pull folder instead

Signed-off-by: vsoch <vsochat@stanford.edu>

* try changing pull-file to same uri that is expected to be pulled

Signed-off-by: vsoch <vsochat@stanford.edu>

* import of AirflowException moved to exceptions

Signed-off-by: vsoch <vsochat@stanford.edu>

* DAG module was moved to airflow.models

Signed-off-by: vsoch <vsochat@stanford.edu>

* ensure pull is called with pull_folder

Signed-off-by: vsoch <vsochat@stanford.edu>