
helm chart: Add label to be allowed direct network access to the jupyterhub pod #352

Merged
merged 1 commit into dask:master on Dec 2, 2020

Conversation

consideRatio
Collaborator

@consideRatio commented Nov 19, 2020

This PR adds a label that grants the dask-gateway api server network access to the hub pod of the JupyterHub Helm chart, which is required when the JupyterHub Helm chart's network policies are enforced, as they are by default in its latest version. This is confirmed to resolve the issue identified in dask/helm-chart#142.

The coupling between the Helm charts comes from configuring dask-gateway to use JupyterHub auth.
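
For context, z2jh's hub network policy allows ingress from pods that carry its standard access label, so the change here presumably boils down to adding something like the following to the dask-gateway api pod's metadata (a sketch of the idea, assuming the standard z2jh label, not the exact template diff in this PR):

  # Extra label on the dask-gateway api pod (illustrative placement):
  # z2jh's hub network policy allows ingress from pods carrying this label,
  # so the api pod may reach the hub pod directly.
  metadata:
    labels:
      hub.jupyter.org/network-access-hub: "true"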

@consideRatio changed the title from "Add label to be allowed direct access to the jupyterhub pod" to "helm chart: Add label to be allowed direct network access to the jupyterhub pod" on Nov 19, 2020
@yuvipanda
Contributor

This LGTM!

@jcrist
Member

jcrist commented Dec 1, 2020

This fix seems fine to me, and I don't think there should be any downside to adding that label even if jupyterhub isn't used. But now I'm wondering - should we be doing the jupyterhub auth requests through the proxy -> hub instead of by directly contacting the hub pod?

@consideRatio
Collaborator Author

@jcrist it's a sound idea to go through the proxy, but sadly the proxy isn't reachable through a single k8s service independent of the z2jh chart configuration: it can be either proxy-http (if automatic TLS acquisition is enabled) or proxy-public (if automatic TLS acquisition isn't enabled, so there is no autohttps pod).

Also, the z2jh Helm chart may include the release name in its k8s service names, which means automatic detection can fail.

My suggestion is to expose a configurable URL and leave it to the user to declare where the hub is. That would also allow dask-gateway to integrate with a JupyterHub outside the namespace, for example.

@jcrist
Member

jcrist commented Dec 2, 2020

My suggestion is to expose a configurable url, and leave it to the user to declare where the hub is.

We already support that as gateway.auth.jupyterhub.apiUrl. If it's not configured, we try to infer it, assuming JupyterHub is running in the same namespace.

Given that, is the fix above the best path forward? It makes automatic support work with the recent jhub release, and there's already a manual config option for those that need it.
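
For anyone configuring this by hand, the manual override might look roughly like this in the dask-gateway Helm values (apiUrl is the key mentioned above; the hostname and namespace are placeholders):

  gateway:
    auth:
      type: jupyterhub
      jupyterhub:
        # Placeholder URL: point at JupyterHub's REST API, e.g. the hub
        # k8s service in whatever namespace the hub runs in (hub API port 8081).
        apiUrl: http://hub.my-jhub-namespace:8081/hub/api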

@consideRatio
Collaborator Author

consideRatio commented Dec 2, 2020

@jcrist ah nice, then I think this is fine!

In a way it would still be nice to go through the proxy, with apiUrl as the fallback for setting whatever URL will work. But if you do that, you currently need to look for the proxy-http service and fall back to the proxy-public service, because sadly there isn't always a path to the HTTP entrypoint, and you cannot use the HTTPS entrypoint: that requires sending traffic to the public domain name rather than to a k8s service name, since the HTTPS certs are issued for the public domain name and not for the service name.

DaskHub, which integrates z2jh with dask-gateway, has Python logic to check for proxy-http and fall back to proxy-public. I think the solution is to always have proxy-http around instead, but that's an issue to fix in z2jh rather than in dask-gateway.

@jcrist
Member

jcrist commented Dec 2, 2020

Sounds good, merging then. Thanks!

@jcrist merged commit 7d89524 into dask:master on Dec 2, 2020
@yuvipanda
Contributor

Do you think it's possible to either make a 0.9.1 release of the helm chart with this change, or publish helm charts for each commit with chartpress, as we do for z2jh? I'm currently blocked on this for setting up new JupyterHubs, since in many places you can't actually turn off network policies.

@jcrist
Member

jcrist commented Dec 8, 2020

Sure. I'm a bit swamped today, but I can hopefully kick off a release tonight or sometime tomorrow. It'd be good to get this automated (see #296), but first we need to transition off of Travis CI.

@yuvipanda
Contributor

Thanks, @jcrist!

@bolliger32

Thanks for taking a look at this, all! FWIW, we ran into this issue and were using the following adjustments to the jupyterhub chart to allow communication with Dask-gateway resources.

  # this allows dask-gateway pods to talk directly to the hub pod
  hub:
    networkPolicy:
      ingress:
        - ports:
          - port: http
          - port: https
          from:
            - podSelector:
                matchLabels:
                  gateway.dask.org/instance: "[clustername]-dask-gateway"
  # this allows dask-gateway pods to talk to the proxy and autohttps pods
  proxy:
    chp:
      networkPolicy:
        ingress:
          - ports:
            - port: http
            - port: https
            from:
              - podSelector:
                  matchLabels:
                    gateway.dask.org/instance: "[clustername]-dask-gateway"
    traefik:
      networkPolicy:
        ingress:
          - ports:
            - port: http
            - port: https
            from:
              - podSelector:
                  matchLabels:
                    gateway.dask.org/instance: "[clustername]-dask-gateway"

But this was just hacked together based on what I could understand from the z2jh docs, and I think it makes more sense to add labels to the Dask-gateway pods (like proposed here) rather than allow more labels to be selected on the jupyterhub end. I'm also not sure whether adding this label selector to all three jupyterhub pods was overkill, but it does seem to work as a temporary workaround for new jupyterhubs, if needed, before the next Dask-gateway release.

@yuvipanda
Contributor

Thanks for the detailed config, @bolliger32. Since I restrict outbound network access on my singleuser pods to only 80, 443, and 53, I needed the following extra config:

    singleuser:
      extraLabels:
        hub.jupyter.org/network-access-proxy-http: "true"

      networkPolicy:
        # Needed for talking to the proxy pod (CHP listens on port 8000)
        egress:
          - ports:
              - port: 8000
                protocol: TCP
          - ports:
              - port: 80
                protocol: TCP
          - ports:
              - port: 443
                protocol: TCP

Thanks to @consideRatio for helping me debug this :)

@jcrist do you think you'll be able to do a release anytime soon? :D

@bolliger32

bolliger32 commented Dec 11, 2020

ah, interesting. Thanks @yuvipanda! Is it easy enough to explain why the singleuser pods need to talk to the proxy? I'm asking because we are seeing some issues where blocked network traffic seems to kill gateway clusters. We get a bunch of errors in the traefik pod logs that look like: "Cannot create service: subset not found" providerName=kubernetescrd ingress=dask-67061cf424d54489b3e09e449557191f namespace=[clustername] serviceName=dask-67061cf424d54489b3e09e449557191f servicePort=8786. I'm wondering if not having something like this snippet may be part of the problem. It pops up in some cases (not all), and typically only when trying to pass a result back to the singleuser pod via .compute().

@yuvipanda
Contributor

In a hub setup like this, your dask-gateway client needs to talk to the dask-gateway server via the proxy, instead of directly. This is why the singleuser server needs to talk to the proxy. It fails when trying to first create a cluster, not later on, so maybe your issue is different?

@bolliger32

You're right! It was indeed a totally different issue: #363

@AndreaGiardini
Contributor

@jcrist Can we please have a new release of dask-gateway (software + helm chart)?

This change and #353 would be quite nice to have.
