
[COST-1913] Keda implementation #471

Merged (2 commits into RedHatInsights:master on Nov 8, 2021)

Conversation

@maskarb (Member) commented Oct 15, 2021

This PR is an idea for a KEDA implementation, and it takes a different approach than #105: the CRD here adds an optional autoScaler field to each deployment. This field matches the KEDA ScaledObjectSpec, except that MinReplicaCount is left out of the spec in favor of the deployment's minReplicas.

An example ClowdApp. This definition will create a standard HPA based on CPU utilization for the hello-app deployment:

apiVersion: cloud.redhat.com/v1alpha1
kind: ClowdApp
metadata:
  name: hello
  namespace: default
spec:
  # The name of the ClowdEnvironment providing the services
  envName: env-default

  # The bulk of your App. This is where your running apps will live
  deployments:
  - name: app
    # Give details about your running pod
    podSpec:
      image: quay.io/psav/clowder-hello
    autoScaler:
      maxReplicaCount: 10
      triggers:
      - type: cpu
        metadata:
          type: Utilization
          value: "50"
    # Creates a Service on port 8000
    webServices:
      public:
        enabled: true
      metrics:
        enabled: true

  # Request kafka topics for your application here
  kafkaTopics:
    - replicas: 3
      partitions: 64
      topicName: topicOne

  # Creates a database if local mode, or uses RDS in production
  database:
    # Must specify both a name and a major postgres version
    name: jumpstart-db
    version: 12
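
For reference, a rough sketch of the kind of KEDA ScaledObject the autoScaler block above could translate into. This is an illustration only, not the rendering produced by this PR; the scaleTargetRef layout and the derived minReplicaCount are assumptions.

# Hypothetical rendered resource (illustration only, not taken from this PR)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hello-app
  namespace: default
spec:
  scaleTargetRef:
    name: hello-app          # the Deployment created for the "app" deployment of the "hello" ClowdApp
  minReplicaCount: 1         # derived from the deployment minReplicas (assumed to be 1 here)
  maxReplicaCount: 10        # copied from the autoScaler block
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "50"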

TODO:

  • build out tests
  • add docs
  • demo

@maskarb marked this pull request as draft on October 15, 2021 at 19:17
@psav (Collaborator) left a comment


Early look, looks good - just one small change at this point

apiVersion: v1
kind: Namespace
metadata:
  name: test-basic-app

The name here should match the name of the test, so in this case test-autoscaler

@psav added the labels enhancement (New feature or request) and pr-architectural-change (Required architect sign-off) on Oct 26, 2021
@kylape (Member) commented Oct 27, 2021

With #481, you should be able to refer to env.status.prometheus.hostname to fill in the Prometheus hostname for KEDA scale triggers. Once that's done, I think we can get this merged.
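
For illustration, a rough sketch of what a Prometheus-based trigger in the autoScaler block might look like once that hostname is available. The serverAddress value, metric name, query, and threshold below are hypothetical; they only show where env.status.prometheus.hostname would end up.

autoScaler:
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      # Clowder would substitute the value from env.status.prometheus.hostname here
      serverAddress: http://prometheus.example.svc:9090
      metricName: hello_app_requests_total             # hypothetical metric
      query: sum(rate(hello_app_requests_total[2m]))
      threshold: "100"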

@psav (Collaborator) commented Oct 29, 2021

> With #481, you should be able to refer to env.status.prometheus.hostname to fill in the Prometheus hostname for KEDA scale triggers. Once that's done, I think we can get this merged.

Ideally we'd like to have some tests around this, just ensuring that it puts out the right resources. Secondly, we need to run make generate && make manifests && make build-template && make api-docs.

@maskarb force-pushed the keda-implementation branch 2 times, most recently from 2e30155 to b2bf73d on November 5, 2021 at 16:44
@maskarb (Member, Author) commented Nov 5, 2021

@kylape and @psav, I made a few updates:

  1. Pulled the Prometheus hostname from the env.
  2. Added various kuttl tests (see the sketch below).
  3. Ran the make generate/manifests/build-template/api-docs commands.
  4. Removed IdleReplicaCount, because Clowder will always try to maintain the minReplicaCount.

With these updates, I think it is ready to go.
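
As a rough sketch of the kuttl idea (the file path, namespace, and resource names here are assumptions, not the actual test files in this PR): a kuttl assert step lists a partial resource, and the test passes once a matching object exists in the cluster.

# Hypothetical assert file, e.g. tests/kuttl/test-autoscaler/01-assert.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hello-app
  namespace: test-autoscaler
spec:
  maxReplicaCount: 10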

@maskarb marked this pull request as ready for review on November 5, 2021 at 16:54
@maskarb changed the title from "wip: [COST-1913] Keda implementation" to "[COST-1913] Keda implementation" on Nov 5, 2021
@kylape (Member) commented Nov 5, 2021

Gah, looks like there are some conflicts with #486. Also, I'm not sure if you GPG-signed your last commit, but our PR check requires that.

@maskarb (Member, Author) commented Nov 5, 2021

Ok, I'll resolve the conflicts and sign the next commit.

@maskarb (Member, Author) commented Nov 6, 2021

I looked into the failing test (test-disabled). I think I understand what is happening. It may be considered a regression, but I'm not sure.

For a Deployment, the default value for replicas is 1. So, if you don't set replicas, you'll get 1 replica.

For the failing test, we create the ClowdApp, set the ClowdApp disabled to true, set the Deployment replicas to 0, then set the ClowdApp disabled to false. The test is expecting reconciliation to cause the replica count to return to 1.

This worked previously because we were always setting the deployment replica count to the minReplicas specified in the ClowdApp spec. For this test, though, that value is not set. So the reconcile was previously unsetting the 0 replicas the test puts on the deployment, and the deployment then falls back to the default replica count of 1.

This behavior is implicit and I think it "just happened to work". Logically, I think it makes sense to specify a minReplicas count for this test. However, if this test is demonstrating the expected behavior of Clowder, then I'll need to update the deployment logic to reset a replica count of 0 back to the default of 1.

@kylape (Member) commented Nov 8, 2021

I think we should add minReplicas: 1 to the failing kuttl tests. I'm not aware of an intentional design around this specific edge case.
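
A sketch of the suggested change, assuming the failing test's ClowdApp deployment looks roughly like the earlier example; only the minReplicas line would be the addition:

deployments:
- name: app
  minReplicas: 1   # explicit minimum, so reconciliation restores the replica count
  podSpec:
    image: quay.io/psav/clowder-hello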

@kylape merged commit a68dc76 into RedHatInsights:master on Nov 8, 2021