Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for ip-per-task apps in the adminrouter cache #1628

Closed
wants to merge 1 commit into from

Conversation

SlevinBE
Copy link
Contributor

@SlevinBE SlevinBE commented Jun 6, 2017

High Level Description

The current DC/OS AdminRouter, the component which handles all the admin UI traffic of DC/OS, is not able to handle services which run in a virtual network (in our case Calico). Containers running in a virtual network get assigned their own IP, but the AdminRouter always takes the host name and port instead of the container's IP and port.
Therefore it's currently not possible to expose an ip-per-task app via the DC/OS admin interface (using the DCOS_SERVICE_NAME, DCOS_SERVICE_SCHEME and DCOS_SERVICE_PORT_INDEX app labels).

This PR tries to fix this issue in the AdminRouter's cache by taking the task's IP and port when it's an ip-per-task app (determined by the presence of the 'ipAddress' field on an app's definition) or taking the host's IP and port in case of a regular task (the default behaviour which existed until now).
I've used a similar approach as how it's implemented in Marathon-lb: https://github.com/mesosphere/marathon-lb/blob/master/utils.py#L314-L347

Related Issues

N/A

Checklist for all PR's

Checklist for component/package updates:

If you are changing components or packages in DC/OS (e.g. you are bumping the sha or ref of anything underneath packages), then in addition to the above please also include:

  • Change log from the last version integrated (this should be a link to commits for easy verification and review): example
  • Test Results: [link to CI job test results for component]
  • Code Coverage (if available): [link to code coverage report]

@d2iq-mergebot
Copy link
Collaborator

This repo has @mesosphere-mergebot integration. You can interact with the following commands.

@mesosphere-mergebot label [Request For Comment |Ship It |Holding |Work In Progress |Ready For Review] 
@mesosphere-mergebot bump-ee  
  • PR creators can apply one of [Ready For Review |Work In Progress]. Owners can apply any label.

@SlevinBE
Copy link
Contributor Author

SlevinBE commented Jun 7, 2017

@vespian not sure who to reference for this, but since you had a similar PR related to the AdminRouter I guess maybe you can help me out with this :-)
Two builds of this PR haven't succeeded yet:

  • integration-test/azure failed: but I can't see why exactly. I'm getting the following warning in the TeamCity interface:

You do not have enough permissions to access build type with id: ClosedSource_Dcos_IntegrationTests_CloudIntegrationTests_LaunchAzureResourceMana

  • continuous-integration/jenkins/pr-head has been pending since yesterday. Also can't see the reason why since I can't login.

Copy link
Contributor

@vespian vespian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@vespian
Copy link
Contributor

vespian commented Jun 7, 2017

@SlevinBE Thanks for the contribution!

I am sorry for the trouble with the CI, I will be taking care of making continuous-integration/jenkins/pr-head task public soonish, @orsenthil could you please have a look at why azure task is inaccessible?

As for the continuous-integration/jenkins/pr-head task itself - please run make flake8 target, I see some pep8 issues.

@jgehrcke
Copy link
Contributor

jgehrcke commented Jun 7, 2017

Thank you, @SlevinBE. I would appreciate if @karlkfi can take a look, too.

@jgehrcke jgehrcke requested a review from karlkfi June 7, 2017 07:59
@jgehrcke
Copy link
Contributor

jgehrcke commented Jun 7, 2017

@karlkfi I'd be interested to hear if you agree that we can maintain this feature in Admin Router in the long term. @SlevinBE please let us know if you are interested in also updating the rudimentary documentation at https://github.com/dcos/dcos/tree/master/packages/adminrouter/extra/src#service-endpoints :-).

@SlevinBE
Copy link
Contributor Author

SlevinBE commented Jun 7, 2017

@vespian thanks! I've resolved the pep8 code style issues. Shouldn't the code-quality build have fail ed in this case (because it didn't)? Or is this build serving a different purpose?

@vespian
Copy link
Contributor

vespian commented Jun 7, 2017

@SlevinBE Code quality check verifies the DC/OS build scripts, AR code quality checks have not been integrated yet.

@SlevinBE
Copy link
Contributor Author

SlevinBE commented Jun 7, 2017

@jgehrcke I've updated the service endpoints documentation to describe the ip-per-task behaviour

@karlkfi
Copy link
Contributor

karlkfi commented Jun 7, 2017

does this work for VIPs or just overlays?

@karlkfi
Copy link
Contributor

karlkfi commented Jun 7, 2017

@karlkfi I'd be interested to hear if you agree that we can maintain this feature in Admin Router in the long term.

I personally dislike the /service endpoint as-is, but that has nothing to do with this recent change. My primary concerns are:

  1. combining framework and "service" proxying into a single endpoint seems overly complicated and unintuitive.
  2. "service" is a poor name here because it's a service proxy, not service management, which makes it harder to refactor marathon (or another component) to have a "service" primitive. This is really more of an "admin proxy" than a general solution for exposing and load balancing apps & services.
  3. This endpoint is sort of trying to satisfy a service discover problem, which seems like it would better be solved by a lower level component. For example, why not just expose Spartan's endpoints here that are generated for every service, and let Spartan do the heavy lifting?
  4. I don't think that Marathon and Metronome (component schedulers) should be proxied by this endpoint. I think those should be top level APIs to make them more discoverable and easier to document.

Besides that, I don't have a problem with proxying to overlay IPs and host IPs, but I would expect such a system to also work with VIPs.

@@ -43,7 +43,7 @@ Admin Router allows Marathon tasks to define custom service UI and HTTP endpoint
}
```

In this case `http://<dcos-cluster>/service/service-name` would be forwarded to the host running the task using the first port allocated to the task.
When your container/task has its own IP (typically when running in a virtual network), `http://<dcos-cluster>/service/service-name` would be forwarded to the container/task's IP using the first port mapping or port definition when USER networking is enabled or the first discovery port otherwise. When your task/container is mapped to the host, it would be forwarded to the host running the task using the first port allocated to the task.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the first port mapping or port definition

I would prefer this mapping was triggered by a specific port name, like "http" or "https", rather than using the first one in the list. Do ports have labels too? That might be even better. Then we could require labels specific to Admin Router and not overload the names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is an error on my part in the documentation. The chosen port is defined by the DCOS_SERVICE_PORT_INDEX label, which is a required label.

@SlevinBE
Copy link
Contributor Author

SlevinBE commented Jun 9, 2017

@karlkfi this PR only implements the ip-per-task (overlays) functionality and does not include VIP support.

@karlkfi
Copy link
Contributor

karlkfi commented Jun 9, 2017

Alright, I see the DCOS_SERVICE_PORT_INDEX code already existed. I wish it used the port name, not the port index, but that's not a new problem.

karlkfi
karlkfi previously approved these changes Jun 9, 2017
@adam-mesos
Copy link
Contributor

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@SlevinBE
Copy link
Contributor Author

I've rebased to master as per mesosphere-mergebot's request

@adam-mesos
Copy link
Contributor

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump PR: mesosphere/dcos-enterprise/pull/1114

@jgehrcke
Copy link
Contributor

@SlevinBE please excuse the slow turnaround. Can you please squash your commits into a single one and add the prefix Admin Router: to the commit message?

@SlevinBE SlevinBE force-pushed the ip-per-container-dcos-service branch from 4b2ccaa to 438f5a9 Compare June 26, 2017 12:19
@SlevinBE
Copy link
Contributor Author

@jgehrcke done. Squashed them into one commit with the suggested prefix.

@vespian
Copy link
Contributor

vespian commented Jun 26, 2017

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/1114 updated.

@jgehrcke
Copy link
Contributor

@SlevinBE We're super sorry for the inconvenience, but we need to ask you to rebase one more time onto dcos/dcos' master branch. Otherwise we will not be able to get green light from the CI 🍏.

@SlevinBE SlevinBE force-pushed the ip-per-container-dcos-service branch from 438f5a9 to 287e3a0 Compare June 27, 2017 17:05
@SlevinBE
Copy link
Contributor Author

@jgehrcke done

@vespian
Copy link
Contributor

vespian commented Jun 28, 2017

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Copy link
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

@vespian
Copy link
Contributor

vespian commented Jun 28, 2017

@SlevinBE Hi! I have not been able to get the Enterprise DC/OS PR pulling your changes to pass the CI, will keep investing it tomorrow. Thanks for your patience!

@SlevinBE
Copy link
Contributor Author

@vespian so a new rebase on master is not required as requested by the mergebot?

@vespian
Copy link
Contributor

vespian commented Jun 29, 2017

so a new rebase on master is not required as requested by the mergebot?

Not sure yet, I wanted to double check before asking you to do it again.

@vespian
Copy link
Contributor

vespian commented Jun 29, 2017

@SlevinBE After rebasing all went green. We have two options:

Please drop me a line which option suits you best. Thanks!

@SlevinBE
Copy link
Contributor Author

Let's use your test-PR, seems like the easiest solution.

@vespian
Copy link
Contributor

vespian commented Jun 30, 2017

@SlevinBE I got the Ship it! labels just a moment ago, your changes should be merged with the next train.

@jgehrcke jgehrcke mentioned this pull request Jun 30, 2017
@lingmann
Copy link
Contributor

lingmann commented Jul 1, 2017

Merged via #1699 Thanks for the PR @SlevinBE !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants