Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]: Remote Cluster #1

Open
wants to merge 74 commits into
base: main
Choose a base branch
from
Open

[WIP]: Remote Cluster #1

wants to merge 74 commits into from

Conversation

Shrinjay
Copy link
Owner

No description provided.

Comment on lines 20 to 21
kubeconfig_path = os.environ.get('EG_REMOTE_CLUSTER_KUBECONFIG_PATH', '/etc/kube/config')
context = os.environ.get('EG_REMOTE_CLUSTER_CONTEXT', None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would be consistent with the method used for env fetching. I don't think there is a big difference, but I think it is wise to pick one or another.

Suggested change
kubeconfig_path = os.environ.get('EG_REMOTE_CLUSTER_KUBECONFIG_PATH', '/etc/kube/config')
context = os.environ.get('EG_REMOTE_CLUSTER_CONTEXT', None)
kubeconfig_path = os.getenv('EG_REMOTE_CLUSTER_KUBECONFIG_PATH', '/etc/kube/config')
context = os.getenv('EG_REMOTE_CLUSTER_CONTEXT', None)

if os.getenv("KUBERNETES_SERVICE_HOST"):
# Running inside cluster
if os.getenv('EG_USE_REMOTE_CLUSTER') and get_remote_if_available:
kubeconfig_path = os.environ.get('EG_REMOTE_CLUSTER_KUBECONFIG_PATH', '/etc/kube/config')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you pick the /etc/kube/config default, or did this come from something existing?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked that, honestly by random, I realize I forgot to set this as the default in the helm chart as well, but the idea was for the path to be the same in the helm chart and in code

super().__init__()

def get_kubernetes_client(self, get_remote_if_available=True) -> client.ApiClient:
"""Get kubernetes api client with appropriate configuration"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document what this does and also what the effect of get_remote_if_available is?

config.load_kube_config(client_configuration=kubernetes_config)

self.log.debug(
"Created kubernetes client for host {host}".format(host=kubernetes_config.host)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: f-strings for the win

Suggested change
"Created kubernetes client for host {host}".format(host=kubernetes_config.host)
f"Created kubernetes client for host {kubernetes_config.host}"

Comment on lines 324 to 326
service_account_list_in_namespace: client.V1ServiceAccountList = client\
.CoreV1Api(api_client=kubernetes_client)\
.list_namespaced_service_account(namespace=namespace)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(More of a meta point than a specific set of lines)

Slash continuations in Python are pretty evil. Was this something you put in manually, or something the formatter did for you?

You might want to put these on one (really long) line, and then use black to find a solution that doesn't involve slashes. I believe the project uses black anyway, so it is a good idea to run that over your code from the getgo.

Copy link
Owner Author

@Shrinjay Shrinjay Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyCharm did it when I tried to chain the invocations over multiple lines like I usually do in js/ts, honestly wasn't aware slash continuations were a problem until I just googled it, so thanks for pointing this one out

Comment on lines 329 to 331
service_account_names_in_namespace: List[str] = list(map(lambda svcaccount: svcaccount.metadata.name, service_accounts_in_namespace))

if service_account_name not in service_account_names_in_namespace:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Super super good on you for using functional programming here. This is a senior dev move.

  2. We can make this an even more big-headed by using a generator comprehension. The list/map method is on the order of n in terms of memory complexity and roughly 2n in terms of time complexity (we build a list of length n which takes n time, then we scan the list which takes n time). The lambda also requires constructing a function context, which can be expensive. The generator method is on the order of o(1) in terms of memory complexity and takes only n in terms of time complexity (we just perform a single scan of the other list without actually constructing anything).

Suggested change
service_account_names_in_namespace: List[str] = list(map(lambda svcaccount: svcaccount.metadata.name, service_accounts_in_namespace))
if service_account_name not in service_account_names_in_namespace:
service_account_names_in_namespace = (svcaccount.metadata.name for svcaccount in service_accounts_in_namespace)
if service_account_name not in service_account_names_in_namespace:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, you are probably thinking "why do we care about performance, this is not that performance intensive?"

This has less to do with performance and more to do with code quality; we want the Juypter reviewers to think "damn, thats some slick code, they must know what they are doing."

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I love finding out about optimizations like this, I honestly don't know enough about Python for how much I use it, so thanks for teaching me about this 😄

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it's list comprehension... darn I should've remembered about that, I keep defaulting to the list-map combo, probably because I use .map a lot in TS

Comment on lines 350 to 352
remote_cluster_role_names = list(map(lambda role: role.metadata.name, remote_cluster_roles.items))

if kernel_cluster_role not in remote_cluster_role_names:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above RE: generator comprehension


self.log.info(f"Created service account {service_account_name} in namespace {namespace}")

def _forward_role_to_remote(self) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts:

  1. I don't think we (Lacework) want to grant a ClusterRole to JEG. At most, I think we want to grant roles to a handful of namespaces.
  2. I don't think we should be copying a role from one place to another, this seems non-transparent.
  3. I would recommend adding a static rolebinding to a specific namespace.
  4. I would recommend adding an option to not create the service account, allowing the user to supply their own service account. This lets people customize as they choose.

kubernetes_config: client.Configuration = client.Configuration()
if os.getenv("KUBERNETES_SERVICE_HOST"):
# Running inside cluster
if os.getenv('EG_USE_REMOTE_CLUSTER') and get_remote_if_available:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this works as expected. This will evaluate to true if EG_USE_REMOTE_CLUSTER is set to anything (including EG_USE_REMOTE_CLUSTER=False).

We should confirm this has the same semantics as other env options defined in JEG. Do they look for True/False or just Set/Unset. If True/False, we should add a more rigorous check. If set/unset, we should use EG_USE_REMOTE_CLUSTER=1 in the helm chart to denote this.

@Shrinjay
Copy link
Owner Author

Shrinjay commented Jan 25, 2023 via email

@Shrinjay
Copy link
Owner Author

I realized I'm an idiot and configmaps exist... we can mount the local role resource definitions as a configmap and then create it in the remote rather than copying the role.

pre-commit-ci bot and others added 30 commits March 6, 2023 16:20
updates:
- [github.com/charliermarsh/ruff-pre-commit: v0.0.253 → v0.0.254](astral-sh/ruff-pre-commit@v0.0.253...v0.0.254)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Set restart strategy to never for spark operator
* Disable eviction for driver pod
* Disable eviction on Kubernetes python kernels as well
* Bump ruff from 0.0.253 to 0.0.254

Bumps [ruff](https://github.com/charliermarsh/ruff) from 0.0.253 to 0.0.254.
- [Release notes](https://github.com/charliermarsh/ruff/releases)
- [Changelog](https://github.com/charliermarsh/ruff/blob/main/BREAKING_CHANGES.md)
- [Commits](astral-sh/ruff@v0.0.253...v0.0.254)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update ruff version in pre-commit config

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
…ver#1271)

* Properly report failures on Spark Operator kernels
* Update helper methods to get initial and error states
* Try to use detect_launch_failure as an option to identify failed CRD submission
* Fix small typos on lower conversion
* Update logic to better handle kernel connection parameters
* Fix restart issues
* Remove detect_launch_failure unnecessary code
* Remove debug code
* Simplify get_container_status signature
* Remove lower after the signature change
* Move get_initial_states to CustomResourceProcessProxy
* Conditionally display get_container_status debug message
* Move get_container_status to CRD process proxy
* Update get_initial_states function doc to emphasize lowercase contract
* Reword to remove spark operator references
* Add log updates
* Remove obsolete code
* Add more fields on the log statement
* Update log logic
* More logging fix-ups

---------

Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Bump ruff from 0.0.254 to 0.0.255

Bumps [ruff](https://github.com/charliermarsh/ruff) from 0.0.254 to 0.0.255.
- [Release notes](https://github.com/charliermarsh/ruff/releases)
- [Changelog](https://github.com/charliermarsh/ruff/blob/main/BREAKING_CHANGES.md)
- [Commits](astral-sh/ruff@v0.0.254...v0.0.255)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Additional changes necessary for ruff-0.0.255

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
)

* GatewayClient changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Initialize variables before websocket.create_connection in KernelClient.__init__()

* Refactor fail on WS connection

* Avoid need for attribute checks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
* Bump ruff from 0.0.255 to 0.0.257

Bumps [ruff](https://github.com/charliermarsh/ruff) from 0.0.255 to 0.0.257.
- [Release notes](https://github.com/charliermarsh/ruff/releases)
- [Changelog](https://github.com/charliermarsh/ruff/blob/main/BREAKING_CHANGES.md)
- [Commits](astral-sh/ruff@v0.0.255...v0.0.257)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Address changes for ruff upgrade

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
* Bump ruff from 0.0.257 to 0.0.259

Bumps [ruff](https://github.com/charliermarsh/ruff) from 0.0.257 to 0.0.259.
- [Release notes](https://github.com/charliermarsh/ruff/releases)
- [Changelog](https://github.com/charliermarsh/ruff/blob/main/BREAKING_CHANGES.md)
- [Commits](astral-sh/ruff@v0.0.257...v0.0.259)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Ignore S311 - only used for port selection within range and suffix

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
* Bump ruff from 0.0.259 to 0.0.260

Bumps [ruff](https://github.com/charliermarsh/ruff) from 0.0.259 to 0.0.260.
- [Release notes](https://github.com/charliermarsh/ruff/releases)
- [Changelog](https://github.com/charliermarsh/ruff/blob/main/BREAKING_CHANGES.md)
- [Commits](astral-sh/ruff@v0.0.259...v0.0.260)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update ruff version in pre-commit config

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
* Bump black[jupyter] from 23.1.0 to 23.3.0

Bumps [black[jupyter]](https://github.com/psf/black) from 23.1.0 to 23.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](psf/black@23.1.0...23.3.0)

---
updated-dependencies:
- dependency-name: black[jupyter]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update black version in pre-commit config

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
updates:
- [github.com/python-jsonschema/check-jsonschema: 0.21.0 → 0.22.0](python-jsonschema/check-jsonschema@0.21.0...0.22.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Bump ruff from 0.0.260 to 0.0.261

Bumps [ruff](https://github.com/charliermarsh/ruff) from 0.0.260 to 0.0.261.
- [Release notes](https://github.com/charliermarsh/ruff/releases)
- [Changelog](https://github.com/charliermarsh/ruff/blob/main/BREAKING_CHANGES.md)
- [Commits](astral-sh/ruff@v0.0.260...v0.0.261)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update ruff version in pre-commit config

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Co-authored-by: Kevin Bates <kbates4@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants