11 changes: 5 additions & 6 deletions cmd/main.go
@@ -51,7 +51,6 @@ import (
 	"github.com/cobaltcore-dev/cortex/internal/scheduling/pods"
 	"github.com/cobaltcore-dev/cortex/internal/scheduling/reservations"
 	"github.com/cobaltcore-dev/cortex/internal/scheduling/reservations/commitments"
-	reservationscontroller "github.com/cobaltcore-dev/cortex/internal/scheduling/reservations/controller"
 	"github.com/cobaltcore-dev/cortex/internal/scheduling/reservations/failover"
 	"github.com/cobaltcore-dev/cortex/pkg/conf"
 	"github.com/cobaltcore-dev/cortex/pkg/monitoring"
@@ -489,16 +488,16 @@ func main() {
 	}
 	if slices.Contains(mainConfig.EnabledControllers, "reservations-controller") {
 		setupLog.Info("enabling controller", "controller", "reservations-controller")
-		monitor := reservationscontroller.NewControllerMonitor(multiclusterClient)
+		monitor := reservations.NewMonitor(multiclusterClient)
 		metrics.Registry.MustRegister(&monitor)
-		reservationsControllerConfig := conf.GetConfigOrDie[reservationscontroller.Config]()
+		commitmentsConfig := conf.GetConfigOrDie[commitments.Config]()

-		if err := (&reservationscontroller.ReservationReconciler{
+		if err := (&commitments.CommitmentReservationController{
 			Client: multiclusterClient,
 			Scheme: mgr.GetScheme(),
-			Conf: reservationsControllerConfig,
+			Conf: commitmentsConfig,
 		}).SetupWithManager(mgr, multiclusterClient); err != nil {
-			setupLog.Error(err, "unable to create controller", "controller", "Reservation")
+			setupLog.Error(err, "unable to create controller", "controller", "CommitmentReservation")
 			os.Exit(1)
 		}
 	}
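The conditional wiring in cmd/main.go (a controller is only registered when its name appears in the `EnabledControllers` list) can be sketched in isolation. The `mainConfig` struct and `controllerEnabled` helper below are illustrative stand-ins, not the real cortex types:

```go
package main

import (
	"fmt"
	"slices"
)

// mainConfig is an illustrative stand-in for the relevant part of the
// cortex main configuration; the real config type has many more fields.
type mainConfig struct {
	EnabledControllers []string
}

// controllerEnabled mirrors the slices.Contains gate used in cmd/main.go
// before a controller is constructed and registered with the manager.
func controllerEnabled(cfg mainConfig, name string) bool {
	return slices.Contains(cfg.EnabledControllers, name)
}

func main() {
	cfg := mainConfig{EnabledControllers: []string{"reservations-controller"}}
	for _, name := range []string{"reservations-controller", "pods-controller"} {
		if controllerEnabled(cfg, name) {
			fmt.Printf("enabling controller %q\n", name)
		} else {
			fmt.Printf("skipping controller %q\n", name)
		}
	}
}
```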
253 changes: 253 additions & 0 deletions helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
@@ -664,4 +664,257 @@ spec:
requested by the nova flavor extra specs, like `{"arch": "x86_64",
"maxphysaddr:bits": 46, ...}`.
weighers: []
---
apiVersion: cortex.cloud/v1alpha1
kind: Pipeline
metadata:
name: kvm-committed-resource-reservation-general-purpose
spec:
schedulingDomain: nova
description: |
This pipeline is used for placing committed resource (CR) reservations
for general purpose workloads. It uses the same filtering
as regular VM scheduling to ensure proper placement, considering all
existing VMs, capacity constraints, traits, and other requirements.

Key difference from regular VM scheduling: reserved capacity is kept
locked (lockReserved: true) to prevent CR reservations from overlapping
with each other, even for the same project.

This is the pipeline used for KVM hypervisors (qemu and cloud-hypervisor).
Specifically, this pipeline uses load balancing for general purpose workloads.
type: filter-weigher
createDecisions: true
# Fetch all placement candidates, ignoring nova's preselection.
ignorePreselection: true
filters:
- name: filter_host_instructions
description: |
This step will consider the `ignore_hosts` and `force_hosts` instructions
from the nova scheduler request spec to filter out or exclusively allow
certain hosts.
- name: filter_has_enough_capacity
description: |
This step will filter out hosts that do not have enough available capacity
to host the requested flavor. Reserved space is kept locked to avoid
CR reservations overlapping.
params:
- {key: lockReserved, boolValue: true}
- name: filter_has_requested_traits
description: |
This step filters hosts that do not have the requested traits given by the
nova flavor extra spec: "trait:<trait>": "forbidden" means the host must
not have the specified trait. "trait:<trait>": "required" means the host
must have the specified trait.
- name: filter_has_accelerators
description: |
This step will filter out hosts without the trait `COMPUTE_ACCELERATORS` if
the nova flavor extra specs request accelerators via "accel:device_profile".
- name: filter_correct_az
description: |
This step will filter out hosts whose aggregate information indicates they
are not placed in the requested availability zone.
- name: filter_status_conditions
description: |
This step will filter out hosts for which the hypervisor status conditions
do not meet the expected values, for example, that the hypervisor is ready
and not disabled.
- name: filter_external_customer
description: |
This step prefix-matches the domain name for external customer domains and
filters out hosts that are not intended for external customers. It considers
the `CUSTOM_EXTERNAL_CUSTOMER_SUPPORTED` trait on hosts as well as the
`domain_name` scheduler hint from the nova request spec.
params:
- {key: domainNamePrefixes, stringListValue: ["iaas-"]}
- name: filter_allowed_projects
description: |
This step filters hosts based on allowed projects defined in the
hypervisor resource. Note that hosts allowing all projects are still
accessible and will not be filtered out. This way, certain hypervisors
can be made accessible only to specific projects.
- name: filter_capabilities
description: |
This step will filter out hosts that do not meet the compute capabilities
requested by the nova flavor extra specs, like `{"arch": "x86_64",
"maxphysaddr:bits": 46, ...}`.

Note: currently, advanced boolean/numeric operators for the capabilities
like `>`, `!`, ... are not supported because they are not used by any of our
flavors in production.
- name: filter_instance_group_affinity
description: |
This step selects hosts in the instance group specified in the nova
scheduler request spec.
- name: filter_instance_group_anti_affinity
description: |
This step selects hosts not in the instance group specified in the nova
scheduler request spec, but only until the max_server_per_host limit is
reached (default = 1).
- name: filter_live_migratable
description: |
This step ensures that the target host of a live migration can accept
the migrating VM, by checking cpu architecture, cpu features, emulated
devices, and cpu modes.
- name: filter_requested_destination
params: {{ .Values.kvm.filterRequestedDestinationParams | toYaml | nindent 8 }}
description: |
This step filters hosts based on the `requested_destination` instruction
from the nova scheduler request spec. It supports filtering by host and
by aggregates.
weighers:
- name: kvm_prefer_smaller_hosts
params:
- {key: resourceWeights, floatMapValue: {"memory": 1.0}}
description: |
This step pulls virtual machines onto smaller hosts (by capacity). This
ensures that larger hosts are not overly fragmented with small VMs,
and can still accommodate larger VMs when they need to be scheduled.
- name: kvm_instance_group_soft_affinity
description: |
This weigher implements the "soft affinity" and "soft anti-affinity" policy
for instance groups in nova.

It assigns a weight to each host based on how many instances of the same
instance group are already running on that host. The more instances of the
same group on a host, the lower (for soft-anti-affinity) or higher
(for soft-affinity) the weight, which makes it less likely or more likely,
respectively, for the scheduler to choose that host for new instances of
the same group.
- name: kvm_binpack
multiplier: -1.0 # inverted = balancing
params:
- {key: resourceWeights, floatMapValue: {"memory": 1.0}}
description: |
This step implements a balancing weigher for workloads on KVM hypervisors,
the opposite of binpacking. Instead of pulling the requested VM into the
smallest gaps possible, it spreads the load so that workloads are balanced
across hosts. In this pipeline, the balancing focuses on general purpose
virtual machines.
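The `multiplier: -1.0` setting on `kvm_binpack` above can be illustrated with a small sketch. The `host` type and `binpackScore` function are hypothetical stand-ins for the real weigher code: a binpacking score favors fuller hosts, and negating every weight flips the ranking so that emptier hosts win, turning binpacking into balancing.

```go
package main

import "fmt"

// host is an illustrative stand-in for a placement candidate.
type host struct {
	name     string
	usedMem  float64 // memory already allocated
	totalMem float64 // total memory capacity
}

// binpackScore favors hosts that are already more utilized,
// pulling new VMs into the smallest remaining gaps.
func binpackScore(h host) float64 {
	return h.usedMem / h.totalMem
}

// weigh scales every host's score by the pipeline multiplier,
// as `multiplier: -1.0` does for the kvm_binpack weigher.
func weigh(hosts []host, multiplier float64) map[string]float64 {
	weights := make(map[string]float64, len(hosts))
	for _, h := range hosts {
		weights[h.name] = multiplier * binpackScore(h)
	}
	return weights
}

func main() {
	hosts := []host{{"full", 900, 1000}, {"empty", 100, 1000}}
	w := weigh(hosts, -1.0)
	fmt.Println(w["empty"] > w["full"]) // true: inverted, the emptier host wins
}
```

With a multiplier of `1.0` the same score would prefer the fuller host, which is the binpacking behaviour used in the HANA pipeline below.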
---
apiVersion: cortex.cloud/v1alpha1
kind: Pipeline
metadata:
name: kvm-committed-resource-reservation-hana
spec:
schedulingDomain: nova
description: |
This pipeline is used for placing committed resource (CR) reservations
for HANA workloads. It uses the same comprehensive filtering as regular
VM scheduling to ensure proper placement, considering all existing VMs,
capacity constraints, traits, and other requirements.

Key difference from regular VM scheduling: reserved capacity is kept
locked (lockReserved: true) to prevent CR reservations from overlapping
with each other, even for the same project.

This is the pipeline used for KVM hypervisors (qemu and cloud-hypervisor).
Specifically, this pipeline uses bin-packing for HANA workloads to consolidate
them on fewer hosts, leaving larger hosts available for future large VMs.
type: filter-weigher
createDecisions: true
# Fetch all placement candidates, ignoring nova's preselection.
ignorePreselection: true
filters:
- name: filter_host_instructions
description: |
This step will consider the `ignore_hosts` and `force_hosts` instructions
from the nova scheduler request spec to filter out or exclusively allow
certain hosts.
- name: filter_has_enough_capacity
description: |
This step will filter out hosts that do not have enough available capacity
to host the requested flavor. Reserved space is kept locked to avoid
CR reservations overlapping.
params:
- {key: lockReserved, boolValue: true}
- name: filter_has_requested_traits
description: |
This step filters hosts that do not have the requested traits given by the
nova flavor extra spec: "trait:<trait>": "forbidden" means the host must
not have the specified trait. "trait:<trait>": "required" means the host
must have the specified trait.
- name: filter_has_accelerators
description: |
This step will filter out hosts without the trait `COMPUTE_ACCELERATORS` if
the nova flavor extra specs request accelerators via "accel:device_profile".
- name: filter_correct_az
description: |
This step will filter out hosts whose aggregate information indicates they
are not placed in the requested availability zone.
- name: filter_status_conditions
description: |
This step will filter out hosts for which the hypervisor status conditions
do not meet the expected values, for example, that the hypervisor is ready
and not disabled.
- name: filter_external_customer
description: |
This step prefix-matches the domain name for external customer domains and
filters out hosts that are not intended for external customers. It considers
the `CUSTOM_EXTERNAL_CUSTOMER_SUPPORTED` trait on hosts as well as the
`domain_name` scheduler hint from the nova request spec.
params:
- {key: domainNamePrefixes, stringListValue: ["iaas-"]}
- name: filter_allowed_projects
description: |
This step filters hosts based on allowed projects defined in the
hypervisor resource. Note that hosts allowing all projects are still
accessible and will not be filtered out. This way, certain hypervisors
can be made accessible only to specific projects.
- name: filter_capabilities
description: |
This step will filter out hosts that do not meet the compute capabilities
requested by the nova flavor extra specs, like `{"arch": "x86_64",
"maxphysaddr:bits": 46, ...}`.

Note: currently, advanced boolean/numeric operators for the capabilities
like `>`, `!`, ... are not supported because they are not used by any of our
flavors in production.
- name: filter_instance_group_affinity
description: |
This step selects hosts in the instance group specified in the nova
scheduler request spec.
- name: filter_instance_group_anti_affinity
description: |
This step selects hosts not in the instance group specified in the nova
scheduler request spec, but only until the max_server_per_host limit is
reached (default = 1).
- name: filter_live_migratable
description: |
This step ensures that the target host of a live migration can accept
the migrating VM, by checking cpu architecture, cpu features, emulated
devices, and cpu modes.
- name: filter_requested_destination
params: {{ .Values.kvm.filterRequestedDestinationParams | toYaml | nindent 8 }}
description: |
This step filters hosts based on the `requested_destination` instruction
from the nova scheduler request spec. It supports filtering by host and
by aggregates.
weighers:
- name: kvm_prefer_smaller_hosts
params:
- {key: resourceWeights, floatMapValue: {"memory": 1.0}}
description: |
This step pulls virtual machines onto smaller hosts (by capacity). This
ensures that larger hosts are not overly fragmented with small VMs,
and can still accommodate larger VMs when they need to be scheduled.
- name: kvm_instance_group_soft_affinity
description: |
This weigher implements the "soft affinity" and "soft anti-affinity" policy
for instance groups in nova.

It assigns a weight to each host based on how many instances of the same
instance group are already running on that host. The more instances of the
same group on a host, the lower (for soft-anti-affinity) or higher
(for soft-affinity) the weight, which makes it less likely or more likely,
respectively, for the scheduler to choose that host for new instances of
the same group.
- name: kvm_binpack
params:
- {key: resourceWeights, floatMapValue: {"memory": 1.0}}
description: |
This step implements a binpacking weigher for workloads on KVM hypervisors.
It pulls the requested VM into the smallest gaps possible, so that hosts
with less allocation stay free for bigger VMs. In this pipeline, the
binpacking focuses on HANA virtual machines.
{{- end }}
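The `lockReserved: true` behaviour that both pipelines emphasize can be sketched as follows. The `capacity` type and `hasEnoughCapacity` function are hypothetical, not the actual `filter_has_enough_capacity` implementation: when reserved space is locked, it is subtracted from the free pool, so a new CR reservation can never land on capacity that an existing reservation already claims.

```go
package main

import "fmt"

// capacity is an illustrative view of a single host resource.
type capacity struct {
	total    float64
	used     float64
	reserved float64 // capacity claimed by existing reservations
}

// hasEnoughCapacity sketches what a capacity filter might compute.
// With lockReserved=true, reserved space stays locked and is not
// handed out again, preventing CR reservations from overlapping.
func hasEnoughCapacity(c capacity, requested float64, lockReserved bool) bool {
	free := c.total - c.used
	if lockReserved {
		free -= c.reserved
	}
	return free >= requested
}

func main() {
	c := capacity{total: 100, used: 40, reserved: 50}
	fmt.Println(hasEnoughCapacity(c, 20, false)) // true: reservations ignored, 60 free
	fmt.Println(hasEnoughCapacity(c, 20, true))  // false: only 10 left once reserved is locked
}
```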
11 changes: 8 additions & 3 deletions helm/bundles/cortex-nova/values.yaml
@@ -138,9 +138,14 @@ cortex-scheduling-controllers:
       - reservations-controller
     enabledTasks:
       - nova-decisions-cleanup-task
-    # Endpoints configuration for reservations controller
-    endpoints:
-      novaExternalScheduler: "http://localhost:8080/scheduler/nova/external"
+    # NovaExternalScheduler is the URL of the nova external scheduler API for CR reservations
+    novaExternalScheduler: "http://localhost:8080/scheduler/nova/external"
+    # FlavorGroupPipelines maps flavor group IDs to pipeline names for CR reservations
+    # This allows different scheduling strategies per flavor group (e.g., HANA vs GP)
+    flavorGroupPipelines:
+      "2152": "kvm-committed-resource-reservation-hana" # HANA flavor group
+      "2101": "kvm-committed-resource-reservation-general-purpose" # General Purpose flavor group
+      "*": "kvm-committed-resource-reservation-general-purpose" # Catch-all fallback
     # OvercommitMappings is a list of mappings that map hypervisor traits to
     # overcommit ratios. Note that this list is applied in order, so if there
     # are multiple mappings applying to the same hypervisors, the last mapping
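The `flavorGroupPipelines` map with its `"*"` catch-all suggests a simple lookup order: exact flavor-group match first, then the fallback. A minimal sketch of that resolution (the `pickPipeline` helper is hypothetical, not the controller's actual code):

```go
package main

import "fmt"

// pickPipeline resolves a flavor group ID to a pipeline name,
// preferring an exact match and falling back to the "*" entry.
func pickPipeline(pipelines map[string]string, flavorGroup string) (string, bool) {
	if p, ok := pipelines[flavorGroup]; ok {
		return p, true
	}
	p, ok := pipelines["*"]
	return p, ok
}

func main() {
	pipelines := map[string]string{
		"2152": "kvm-committed-resource-reservation-hana",
		"2101": "kvm-committed-resource-reservation-general-purpose",
		"*":    "kvm-committed-resource-reservation-general-purpose",
	}
	p, _ := pickPipeline(pipelines, "2152")
	fmt.Println(p) // kvm-committed-resource-reservation-hana
	q, _ := pickPipeline(pipelines, "9999")
	fmt.Println(q) // kvm-committed-resource-reservation-general-purpose (catch-all)
}
```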