
PerfKitBenchmarker Configurations


Overview

A configuration describes the resources necessary to run a benchmark as well as the options that should be used while running it. Benchmarks have default configurations which determine how they will run if users don't specify any options. Users can override the default configuration settings by providing a configuration file or by using command line flags. In addition, configurations provide a means to run with static (i.e. pre-provisioned) machines and to run multiple copies of the same benchmark in a single PKB invocation.

Structure of a Configuration File

Configuration files are written in YAML, and are for the most part just nested dictionaries. At each level of the configuration, a specific set of keys is allowed (a skeleton combining the main keys follows the list below):

  • Valid top level keys:
    • flags: A YAML dictionary with overrides for default flag values. These are applied to all configs. Config level flags take precedence over top level flags.
    • benchmarks: A YAML array of dictionaries mapping benchmark names to their configs. This also determines which benchmarks to run.
    • *any_benchmark_name*: If the benchmarks key is not specified, then mapping a benchmark name to a config will override that benchmark's default configuration whenever that benchmark runs.
    • Any keys not listed above are allowed, but will not affect PKB unless they are mapped to configs with the name key. This allows for named benchmark configs - see below for an example.
  • Valid config keys:
    • vm_groups: A YAML dictionary mapping the names of VM groups to the groups themselves. These names can be any string.
    • description: A description of the benchmark.
    • name: The name of the benchmark.
    • flags: A YAML dictionary with overrides for default flag values.
    • flag_matrix: The name of the flag matrix to run with.
    • flag_matrix_defs: A YAML dictionary mapping names to flag matrices. Each flag matrix is itself a dictionary mapping flag names to lists of values. See the flag matrix section below for more information.
    • flag_matrix_filters: A YAML dictionary mapping names to filters. Each filter is a string that, when evaluated in Python, returns true or false depending on the values of the flag settings. See the flag matrix section below for more information.
  • Valid VM group keys:
    • vm_spec: A YAML dictionary mapping names of clouds (e.g. AWS) to the actual VM spec.
    • disk_spec: A YAML dictionary mapping names of clouds to the actual disk spec.
    • vm_count: The number of VMs to create in this group. If this key isn't specified, it defaults to 1.
    • disk_count: The number of disks to attach to VMs of this group. If this key isn't specified, it defaults to 1.
    • cloud: The name of the cloud to create the group in. This is used for multi-cloud configurations.
    • os_type: The OS type of the VMs to create (see the flag of the same name for more information). This is used if you want to run a benchmark using VMs with different OS types (e.g. Debian and RHEL).
    • static_vms: A YAML array of Static VM specs. These VMs will be used before any Cloud VMs are created; the total number of VMs (static plus cloud) will still equal the number specified by the vm_count key.
  • For valid VM spec keys, see virtual_machine.BaseVmSpec and derived classes.
  • For valid disk spec keys, see disk.BaseDiskSpec and derived classes.
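
Putting these keys together, a complete config file has roughly the following shape. This is a minimal sketch: the benchmark, flag values, and machine type shown are placeholder choices, not defaults.

flags:  # Top-level flags, applied to every config.
  cloud: GCP
benchmarks:
  - cluster_boot:
      vm_groups:
        default:  # Group names can be any string.
          vm_count: 2
          vm_spec:
            GCP:
              machine_type: n1-standard-2
      flags:  # Config-level flags take precedence over top-level flags.
        zones: us-central1-a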

Specifying Configurations and Precedence Rules

The most basic way to specify a configuration is to use the --benchmark_config_file command line flag. Anything specified in the file will override the default configuration. Here is an example showing how to change the number of VMs created in the cluster_boot benchmark:

./pkb.py --benchmark_config_file=cluster_boot.yml --benchmarks=cluster_boot

[cluster_boot.yml]

cluster_boot: 
  vm_groups:
    default:
      vm_count: 10

A second flag, --config_override, directly overrides the config file and can be specified multiple times. Since it overrides the config file, settings supplied via this flag take precedence over those supplied via the --benchmark_config_file flag. Here is an example that makes the same change to the default cluster_boot configuration as above:

./pkb.py --config_override="cluster_boot.vm_groups.default.vm_count=10" --benchmarks=cluster_boot

Finally, any other flags which the user specifies on the command line have the highest priority. For example, specifying the --machine_type flag will cause all VM groups to use that machine type, regardless of any other settings.
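
For example (the machine type value here is illustrative, not a recommendation), the following command forces every VM group in cluster_boot onto n1-standard-4, overriding both the config file and any --config_override settings:

./pkb.py --benchmark_config_file=cluster_boot.yml --benchmarks=cluster_boot --machine_type=n1-standard-4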

Result Metadata

Result metadata has been slightly modified as part of the configuration change (unless the VM group's name is 'default', in which case there is no change). The metadata created by the DefaultMetadataProvider is now prefixed by the VM group name. For example, if a VM group's name is 'workers', then all samples will contain workers_machine_type, workers_cloud, and workers_zone metadata. This change was made to enable benchmarks with heterogeneous VM groups.
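
For instance, a sample from a VM group named 'workers' might carry metadata like the following (the values shown are hypothetical):

workers_machine_type: n1-standard-2
workers_cloud: GCP
workers_zone: us-central1-a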

Named Configs

Named configs allow multiple configs for the same benchmark to be present within the same config file. The name key identifies which benchmark each config applies to:

cluster_boot_5:
  name: cluster_boot
  vm_groups:
    default:
      vm_count: 5
cluster_boot_10:
  name: cluster_boot
  vm_groups:
    default:
      vm_count: 10
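
Assuming the snippet above is saved as cluster_boot_configs.yml (the file name is arbitrary), both named configs can then be run in a single invocation by passing their names to --benchmarks:

./pkb.py --benchmark_config_file=cluster_boot_configs.yml --benchmarks=cluster_boot_5,cluster_boot_10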

Flag Matrices

Flag matrices allow configs to express multiple settings for any number of flags. PKB will run the benchmark once for each combination of flag values (the cross product of the lists in the matrix). If a filter for a flag matrix is present, then PKB will only run a particular combination if the filter evaluates to true.

Here's a cross-zone netperf config:

netperf:
  flag_matrix: cross_zone
  flag_matrix_filters:
    cross_zone: "zones < extra_zones"
  flag_matrix_defs:
    cross_zone:
      zones: [us-central1-a, europe-west1-d, asia-east1-c]
      extra_zones: [us-central1-a, europe-west1-d, asia-east1-c]
      machine_type: [n1-standard-2, n1-standard-8]

This would run netperf once for each distinct pair of zones for each of the machine types specified: the filter's lexicographic string comparison discards self pairs and keeps each unordered pair exactly once.
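
Concretely, the zone pairs that satisfy the filter are:

zones=asia-east1-c   extra_zones=europe-west1-d
zones=asia-east1-c   extra_zones=us-central1-a
zones=europe-west1-d extra_zones=us-central1-a

Each pair runs once with machine_type=n1-standard-2 and once with n1-standard-8, for six runs in total.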

General Examples

Cross-Cloud netperf

netperf:
  vm_groups:
    vm_1:
      cloud: AWS
    vm_2:
      cloud: GCP

Multiple iperf runs

Run iperf under the default configuration, once with a single sending thread and once with 8:

benchmarks:
  - iperf:
      flags:
        iperf_sending_thread_count: 1
  - iperf:
      flags:
        iperf_sending_thread_count: 8

fio with Static VMs

This runs fio against a mounted filesystem (under /scratch) on vm1, and directly against the device (/dev/sdb) on vm2.

my_static_vms:  # Any key is accepted here.
  - &vm1
    user_name: perfkit
    ssh_private_key: /absolute/path/to/key
    ip_address: 1.1.1.1
    disk_specs:
      - mount_point: /scratch
  - &vm2
    user_name: perfkit
    ssh_private_key: /absolute/path/to/key
    ip_address: 2.2.2.2
    disk_specs:
      - device_path: /dev/sdb

benchmarks:
  - fio: {vm_groups: {default: {static_vms: [*vm1]}},
          flags: {against_device: False}}
  - fio: {vm_groups: {default: {static_vms: [*vm2]}},
          flags: {against_device: True}}

Cross-region iperf

iperf:
  vm_groups:
    vm_1:
      cloud: GCP
      vm_spec:
        GCP: 
          zone: us-central1-b
    vm_2:
      cloud: GCP
      vm_spec:
        GCP: 
          zone: europe-west1-d

fio using Local SSDs

fio:
  vm_groups:
    default:
      cloud: AWS
      vm_spec:
        AWS:
          machine_type: i2.2xlarge
      disk_spec:
        AWS:
          disk_type: local
          num_striped_disks: 2