[RFE] Cluster maximums OCP workload #4

Closed
rsevilla87 opened this issue Jul 10, 2023 · 3 comments · May be fixed by #5
Labels: enhancement (New feature or request), stale

Comments

@rsevilla87
Member

Is your feature request related to a problem? Please describe.

A workload able to reproduce the documented cluster maximums would be very useful for detecting regressions in components that the current workloads do not exercise heavily.

For example:

  • Maximum number of CRDs: a high number of CRDs has been shown to degrade API performance, in both API responsiveness and resource usage. We are not tracking this scenario at the moment.
  • Maximum number of endpoints per service: our current workloads test a high number of services, but we do not add many endpoints to them. Upstream currently tracks this scenario for kube-proxy-implemented services, but we are not tracking it with OVNKubernetes.

There are more examples like these. This new workload shouldn't be used as a rule of thumb to demonstrate the limits of a cluster, but rather as a helper to detect and verify scenarios we are not currently tracking.

Describe the solution you'd like

The cluster-maximums workload should be self-contained and based on a multi-job benchmark. This approach will make it easier to maintain and update.

I have started coding this workload; an initial approach to how it would look is shown in the following snippet:

```yaml
jobs:
  # Would test 10k namespaces, 10k routes, 10k services, 20k pods and 30k network policies
  - name: max-namespaces
    namespace: max-namespaces
    jobIterations: {{.NAMESPACES}}
    qps: {{.QPS}}
    burst: {{.BURST}}
    namespacedIterations: true
    waitWhenFinished: true
    preLoadImages: false                   # We don't need to preload since this job reuses previously pulled images
    jobPause: 2m
    namespaceLabels:
      security.openshift.io/scc.podSecurityLabelSync: false
      pod-security.kubernetes.io/enforce: privileged
      pod-security.kubernetes.io/audit: privileged
      pod-security.kubernetes.io/warn: privileged
    objects:
      - objectTemplate: deployment-server.yml
        replicas: 1
        inputVars:
          podReplicas: 1
      - objectTemplate: deployment-client.yml
        replicas: 1
        inputVars:
          podReplicas: 1
          ingressDomain: {{.INGRESS_DOMAIN}}
      - objectTemplate: service.yml
        replicas: 1
      - objectTemplate: route.yml
        replicas: 1
      - objectTemplate: np-deny-all.yml
        replicas: 1
      - objectTemplate: np-allow-from-clients.yml
        replicas: 1
      - objectTemplate: np-allow-from-ingress.yml
        replicas: 1

  - name: remove-max-namespaces
    qps: 5
    burst: 5
    jobType: delete
    jobPause: 2m
    objects:
      - kind: Namespace
        labelSelector: {kube-burner-job: max-namespaces}

  # 5k backends per service, five times: each iteration creates 5k server pods + 1 client pod + 1 service + 1 route + 3 network policies
  - name: max-backends
    namespace: max-backends
    jobIterations: 5
    qps: {{.QPS}}
    burst: {{.BURST}}
    namespacedIterations: true
    waitWhenFinished: true
    preLoadImages: false             # We don't need to preload since this job reuses previously pulled images
    jobPause: 2m
    namespaceLabels:
      security.openshift.io/scc.podSecurityLabelSync: false
      pod-security.kubernetes.io/enforce: privileged
      pod-security.kubernetes.io/audit: privileged
      pod-security.kubernetes.io/warn: privileged
    objects:
      - objectTemplate: deployment-server.yml
        replicas: 1
        inputVars:
          podReplicas: {{.BACKENDS}}
      - objectTemplate: deployment-client.yml
        replicas: 1
        inputVars:
          podReplicas: 1
          ingressDomain: {{.INGRESS_DOMAIN}}
      - objectTemplate: service.yml
        replicas: 1
      - objectTemplate: route.yml
        replicas: 1
      - objectTemplate: np-deny-all.yml
        replicas: 1
      - objectTemplate: np-allow-from-clients.yml
        replicas: 1
      - objectTemplate: np-allow-from-ingress.yml
        replicas: 1

  - name: remove-max-backends
    jobType: delete
    objects:
      - kind: Namespace
        labelSelector: {kube-burner-job: max-backends}
```
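Both jobs reuse the same `deployment-server.yml` and differ only in the `podReplicas` input variable: kube-burner passes `inputVars` into the object template, so the deployment gets 1 server pod per namespace in `max-namespaces` and `BACKENDS` server pods per namespace in `max-backends`. The template itself is not shown in this issue, so the following is only a hypothetical sketch of what it could look like; the `app: server` label, the image, and the port are placeholders, and `{{.Replica}}` is assumed to be the per-object index kube-burner injects into templates.

```yaml
# deployment-server.yml: hypothetical sketch, the actual template is not shown in this issue
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server-{{.Replica}}
spec:
  # Rendered from inputVars.podReplicas: 1 in max-namespaces, the BACKENDS value in max-backends
  replicas: {{.podReplicas}}
  selector:
    matchLabels:
      app: server            # placeholder label that service.yml would select on
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
        - name: server
          image: quay.io/example/server:latest   # placeholder image
          ports:
            - containerPort: 8080                # placeholder port
```

As long as the Service selects every server pod through that label, raising `podReplicas` is what turns each service into a many-endpoint service in the `max-backends` job.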
@rsevilla87 rsevilla87 added the enhancement New feature or request label Jul 10, 2023
@rsevilla87 rsevilla87 self-assigned this Jul 10, 2023
@github-actions

github-actions bot commented Oct 9, 2023

This issue has become stale and will be closed automatically within 7 days.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 17, 2023
@rsevilla87 rsevilla87 reopened this Nov 21, 2023
@qiliRedHat

Bug https://issues.redhat.com/browse/MON-3394 was discovered in ROSA on a cluster with a large number of namespaces and a large number of secrets ("there are a lot of secrets on the cluster: 24464"). I suggest we add at least 3 secrets per namespace to cover this case (3 secrets x 10k namespaces = 30k secrets > 24464).
In the old max-namespaces workload, there are 10 secrets in each namespace: https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/workloads/max-namespaces/max-namespaces.yml#L95C12-L95C12
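One way to act on this suggestion, sketched under assumptions: add a secret template to the `objects` list of the `max-namespaces` job with `replicas: 3`. That job creates one namespace per iteration (`namespacedIterations: true`, `jobIterations: {{.NAMESPACES}}`), so 10k namespaces would yield 30k secrets. The template name `secret.yml` is hypothetical.

```yaml
# Hypothetical extra entry for the objects list of the max-namespaces job.
# Each iteration (one namespace) creates 3 secrets: 3 x 10k namespaces = 30k secrets.
objects:
  - objectTemplate: secret.yml   # hypothetical template holding a v1 Secret manifest
    replicas: 3
```

Because `replicas: 3` creates three objects from the same template in each namespace, `secret.yml` would need a per-replica name (for example one built from the injected `{{.Replica}}` value) so the three secrets don't collide.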

@rsevilla87 rsevilla87 transferred this issue from kube-burner/kube-burner Jan 23, 2024

This issue has become stale and will be closed automatically within 7 days.

@github-actions github-actions bot added the stale label Apr 22, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 30, 2024