# Aggregation

## Scenario 1

### (1) Description

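In Scenario 1, every experiment follows the same fixed pattern, built from an outer stress (interference) loop and an inner workload loop:
1. Initialize the client and the server (VM).
2. Start the background stress. Currently supported stress types: cpu/cache/mem/io/net.
3. Start a workload. Currently supported workloads: redis/nginx/mysql.
4. When the workload finishes, record a `workload_info` containing metadata such as timestamps, load, and stress intensity.
5. Go back to step 3 and run the next workload. Once all workloads have run, record an `info_per_epoch` containing every `workload_info` already executed, plus the stress intensity of this loop iteration.
6. Go back to step 2 until every stress intensity has been run.
7. End the experiment and record its start time, end time, total elapsed time, and the total number of stress-loop iterations.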

The information above is saved in `exp.json` or `date.json`. In addition, the metric data for the whole experiment period is fetched through a Prometheus client and saved in `merged.csv`.
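The nested loop described above can be sketched as follows. This is a minimal illustration, not the project's actual runner: `run_experiment`, the `run_workload` callback, the injectable `clock`, and the epoch-level `"stress"` key are all assumptions made for the example. Only the field names `start_time`, `end_time`, `total_time`, `n_epoch`, `info_per_epoch`, `workloads`, and `name` come from the data shown in this document.

```python
import time


def run_experiment(stress_levels, workloads, run_workload, clock=time.time):
    """Hypothetical sketch: outer stress loop, inner workload loop."""
    exp_start = int(clock())
    info_per_epoch = []
    for stress in stress_levels:            # steps 2 and 6: one epoch per stress level
        workload_infos = {}
        for name in workloads:              # steps 3 and 5: run every workload in turn
            start = int(clock())
            run_workload(name, stress)      # hypothetical callback that blocks until done
            end = int(clock())
            workload_infos[name] = {        # step 4: one workload_info per run
                "name": name,
                "start_time": start,
                "end_time": end,
                "stress": stress,
            }
        # "stress" at epoch level is an assumed key name
        info_per_epoch.append({"workloads": workload_infos, "stress": stress})
    exp_end = int(clock())
    return {                                # step 7: experiment-level summary
        "start_time": exp_start,
        "end_time": exp_end,
        "total_time": exp_end - exp_start,
        "n_epoch": len(info_per_epoch),
        "info_per_epoch": info_per_epoch,
    }


# Usage with a fake clock so the example is deterministic.
ticks = iter(range(1000))
exp = run_experiment(
    stress_levels=[{"net": {"sock": "1"}}, {"net": {"sock": "2"}}],
    workloads=["redis_0", "redis_1"],
    run_workload=lambda name, stress: None,
    clock=lambda: next(ticks),
)
```

Passing the clock in as a parameter keeps the sketch testable without actually waiting for workloads to run.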


### (2) Data Processing


#### 1. Quick Start


Use `read_from_dir(dir)` to read the metadata and CSV data and create an `ExpData` instance.

In [1]:
import pandas as pd
from aggregation import *
pd.set_option('display.max_rows', 10)

exp_root = "/home/ict/appProfile/data/redis_1/no_stress/redis_no_20231102130306"
exp_data = read_from_dir(exp_root)

In [9]:
# shift time
# 32 for opt, 10 for warmup, 1 for container start
shift = exp_data._time_shift(opt_interval=32, delay=11)
exp_data.shift_time(shift).exp["info_per_epoch"]

1698930260


[{'workloads': {'redis_0': {'start_time': 1698930261,
    'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
    'end_time': 1698930421,
    'name': 'redis_0'},
   'redis_1': {'start_time': 1698930529,
    'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 2',
    'end_time': 1698930689,
    'name': 'redis_1'},
   'redis_2': {'start_time': 1698930796,
    'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 4',
    'end_time': 1698930957,
    'name': 'redis_2'},
   'redis_3': {'start_time': 1698931064,
    'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 8',
    'end_time': 1698931224,
    'name': 'redis_3'},
   'redis_4': {'start_time': 1698931332,
    'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 16',
    'end_time': 1698931493,
    'name': 'redis_4'},
   'redis_5': {'start_time': 16

After `agg_epoch()` is called, `exp_data` aggregates the data per stress-loop iteration (epoch). This is the recommended way to use it.

In [87]:
df_epoch = exp_data.agg_epoch()
df_epoch

Unnamed: 0,stress_sock,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,...,vm_block_io_sector_offset_offset_vda,vm_mem_bandwidth_total_numa_1,vm_mem_bandwidth_local_numa_1,host_cpu_usage_nice,host_kernel_syscall_renameat,host_kernel_syscall_sendmmsg,vm_cache_llc_capacity_numa_0,vm_mem_bandwidth_local_numa_0,vm_mem_bandwidth_total_numa_0,app__qps_error
redis_0,1,0.968750,0.968661,0.999556,0.966974,96650.618802,87880.749711,0.390680,258.500211,775.046369,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0
redis_1,1,0.972222,0.957229,0.996291,0.976405,89440.784806,105685.335783,0.652826,260.010030,777.059089,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0
redis_2,1,0.958508,0.946995,0.997608,0.977123,95098.355061,101113.614857,0.736895,260.223669,774.039616,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0
redis_3,1,0.967370,0.950731,0.990260,0.975487,94990.462197,98215.516875,0.614300,261.660227,772.280974,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0
redis_4,1,0.956992,0.960859,0.983507,0.974905,150224.098101,116542.495032,0.652799,256.558592,771.385910,...,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
redis_4,64,0.962256,0.959984,1.012906,0.966234,855672.961662,90576.506334,0.442764,2607.953959,7829.275348,...,0.0,0.265669,0.044658,0.0,0.0,0.0,0.000081,0.005583,0.016749,0.0
redis_5,64,0.950627,0.947394,1.008229,0.967281,862132.686779,71878.474272,0.379302,2589.879960,7777.544476,...,0.0,0.045803,0.024249,0.0,0.0,0.0,0.000098,0.005385,0.005385,0.0
redis_6,64,0.944103,0.955774,1.013437,0.948403,862127.029974,66706.707830,0.229723,2576.809734,7767.030211,...,0.0,0.524732,0.068616,0.0,0.0,0.0,0.007755,0.120334,0.278665,0.0
redis_7,64,0.921964,0.946911,1.009144,0.948420,830159.487117,53436.277458,0.468649,2602.467227,7800.227174,...,0.0,0.284298,0.091514,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0


Because the result is still a DataFrame, you can apply whatever further processing you need. The following shows one way to extract the data of a single workload from `df_epoch`.

In [88]:
df_epoch_group = df_epoch.groupby(df_epoch.index)
keys = list(df_epoch_group.groups.keys())
keys

['redis_0',
 'redis_1',
 'redis_2',
 'redis_3',
 'redis_4',
 'redis_5',
 'redis_6',
 'redis_7',
 'redis_8']

Process the DataFrame with `groupby`, then pick one of the `groups.keys()` to get the data for that workload.

In [89]:
df_workload = df_epoch_group.get_group(keys[0])
df_workload

Unnamed: 0,stress_sock,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,...,vm_block_io_sector_offset_offset_vda,vm_mem_bandwidth_total_numa_1,vm_mem_bandwidth_local_numa_1,host_cpu_usage_nice,host_kernel_syscall_renameat,host_kernel_syscall_sendmmsg,vm_cache_llc_capacity_numa_0,vm_mem_bandwidth_local_numa_0,vm_mem_bandwidth_total_numa_0,app__qps_error
redis_0,1,0.96875,0.968661,0.999556,0.966974,96650.618802,87880.749711,0.39068,258.500211,775.046369,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
redis_0,2,0.96985,0.963251,0.998992,0.978831,117763.962638,98605.710222,0.548427,513.619013,1532.473223,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
redis_0,4,0.958268,0.968946,0.991771,0.981681,150676.735255,113340.90589,0.637914,1004.930397,3021.243554,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
redis_0,8,0.970216,0.965634,0.990011,0.971316,209067.153616,144805.509077,0.661161,1980.806103,5930.021388,...,0.0,0.00252,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
redis_0,16,0.964489,0.972538,1.012879,0.968939,361701.192915,212367.390377,0.6834,2579.799589,7766.100785,...,0.0,0.018229,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
redis_0,32,0.972507,0.9682,1.006232,0.963618,767478.384461,94138.875941,0.177476,2740.294718,8208.010752,...,0.0,0.006297,0.003778,0.0,0.0,0.0,0.000458,0.042844,0.050403,0.0
redis_0,64,0.965993,0.961731,1.00752,0.974515,890121.744501,92809.935008,0.411728,2559.70843,7689.542174,...,0.0,0.24703,0.017231,0.0,0.0,0.0,0.005682,0.072385,0.24357,0.0
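The same group-by-index pattern can be tried on a small stand-in frame without the experiment data. The toy `df_epoch` below and its `app_qps` column are made up for illustration; only the shape (workload name as index, one row per stress level) mirrors the output above.

```python
import pandas as pd

# Toy stand-in for df_epoch: index = workload name, one row per (workload, stress level).
df_epoch = pd.DataFrame(
    {
        "stress_sock": [1, 1, 2, 2],
        "app_qps": [100.0, 200.0, 90.0, 180.0],
    },
    index=["redis_0", "redis_1", "redis_0", "redis_1"],
)

# Group rows that share an index label, then pull out one workload.
groups = df_epoch.groupby(df_epoch.index)
keys = list(groups.groups.keys())
df_workload = groups.get_group("redis_0")

# Label-based selection gives the same rows without building the groups first.
same = df_epoch.loc[["redis_0"]]
```

`groupby` is convenient when you iterate over every workload; for a single workload, `.loc` with a list of labels is a shorter route to the same rows.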


#### 2. Custom Process

`exp_data` stores the loaded data in two fields, `exp` and `df`, which can be accessed directly as `exp_data.exp` and `exp_data.df`. For example, a single workload info can be obtained as follows:

In [90]:
workload_info = exp_data.exp["info_per_epoch"][0]["workloads"][keys[0]]
workload_info

{'start_time': 1699233148,
 'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
 'end_time': 1699233308,
 'name': 'redis_0',
 'stress': {'net': {'sock': '1'}}}

`workloads_of` returns all workloads with a given name.

In [91]:
workload_infos = exp_data.workloads_of(keys[0])
workload_infos

[{'start_time': 1699233148,
  'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
  'end_time': 1699233308,
  'name': 'redis_0',
  'stress': {'net': {'sock': '1'}}},
 {'start_time': 1699235735,
  'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
  'end_time': 1699235895,
  'name': 'redis_0',
  'stress': {'net': {'sock': '2'}}},
 {'start_time': 1699238333,
  'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
  'end_time': 1699238493,
  'name': 'redis_0',
  'stress': {'net': {'sock': '4'}}},
 {'start_time': 1699240932,
  'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
  'end_time': 1699241092,
  'name': 'redis_0',
  'stress': {'net': {'sock': '8'}}},
 {'start_time': 1699243529,
  'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
  'end_time': 1699243689,
  'name': 'redis_0
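Judging from the output above, the lookup walks every epoch and collects the entry with the requested name. A minimal sketch of that behaviour, assuming the `info_per_epoch` layout shown earlier, could look like this. The helper name `collect_workloads` is hypothetical and is not the library's implementation.

```python
def collect_workloads(info_per_epoch, name):
    """Hypothetical helper: gather the workload_info for `name` from every epoch, in order."""
    return [
        epoch["workloads"][name]
        for epoch in info_per_epoch
        if name in epoch["workloads"]
    ]


# Two epochs that differ only in stress intensity (values are illustrative).
info_per_epoch = [
    {"workloads": {"redis_0": {"name": "redis_0", "stress": {"net": {"sock": "1"}}}}},
    {"workloads": {"redis_0": {"name": "redis_0", "stress": {"net": {"sock": "2"}}}}},
]

socks = [w["stress"]["net"]["sock"] for w in collect_workloads(info_per_epoch, "redis_0")]
```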

Then use `workload_df` to get the DataFrame for this workload.

In [92]:
exp_data.workload_df(workload_info)

Unnamed: 0_level_0,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,host_kernel_syscall_access,...,vm_block_io_bytes_io_write,vm_block_io_requests_write,vm_block_io_time_write,vm_block_io_flush_request_and_time_request_vda,vm_block_io_flush_request_and_time_time_vda,app_redis_qos_qps_of_redis_get,app_redis_qos_qps_of_redis_set,app_redis_qos_qps_of_redis_total,app_redis_qos_p99_latency_set,app_redis_qos_p99_latency_get
Time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1699233156000,0.960227,0.982955,1.011364,0.960227,97053.000000,87574.000000,0.00000,270.000000,763.500000,329.500000,...,16384.0,3.5,0.002666,0.0,0.0,39174.500000,3919.000000,43093.500000,0.99,0.990000
1699233160000,0.974432,0.982955,0.997159,0.968750,95996.500000,86444.000000,0.00000,271.500000,762.500000,329.500000,...,0.0,0.0,0.000000,0.0,0.0,40742.500000,4071.000000,44813.500000,0.99,0.990000
1699233168000,0.977273,0.968750,1.011364,0.960227,96545.500000,87106.500000,0.00000,262.631316,770.385193,329.664832,...,0.0,0.0,0.000000,0.0,0.0,42005.502751,4200.600300,46206.103052,0.99,0.990000
1699233172000,0.963068,0.926136,0.991477,0.977273,100666.833417,90133.566783,0.00000,248.624312,782.391196,329.664832,...,0.0,0.0,0.000000,0.0,0.0,41116.000000,4110.500000,45226.500000,0.99,0.990024
1699233176000,0.980114,0.982955,0.997159,0.960227,97091.545773,87937.468734,0.50025,266.000000,770.000000,330.500000,...,0.0,0.0,0.000000,0.0,0.0,40439.500000,4046.000000,44485.500000,0.99,0.990000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1699233288000,0.954545,0.965909,1.008523,0.977273,89642.821411,82538.269135,0.00000,260.869565,771.614193,329.335332,...,0.0,0.0,0.000000,0.0,0.0,38047.000000,3806.000000,41853.000000,0.99,0.990000
1699233292000,0.957386,0.968750,0.997159,0.991477,91407.000000,85054.500000,0.00000,245.000000,789.000000,329.500000,...,0.0,0.0,0.000000,0.0,0.0,39143.500000,3912.000000,43055.500000,0.99,0.990000
1699233296000,0.991477,0.968750,1.002841,0.971591,92787.500000,85145.500000,1.00000,260.000000,779.500000,330.500000,...,0.0,0.0,0.000000,0.0,0.0,39173.000000,3916.500000,43089.500000,0.99,0.990025
1699233300000,0.965909,0.960227,1.000000,0.980114,95951.000000,86669.500000,0.00000,253.500000,782.000000,332.000000,...,0.0,0.0,0.000000,0.0,0.0,39764.382191,3976.488244,43740.870435,0.99,0.990000
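The `Time` index above is in 13-digit millisecond timestamps, while `workload_info` stores 10-digit second timestamps. How `workload_df` selects its rows is not shown here; one plausible reading is a unit conversion followed by a time-range filter. The sketch below illustrates that idea only. `to_ms`, `slice_workload`, the inclusive bounds, and the `app_qps` column are assumptions.

```python
import pandas as pd


def to_ms(ts):
    """Convert a 10-digit second timestamp to 13-digit milliseconds (assumed convention)."""
    return int(ts) * 1000


def slice_workload(df, workload_info):
    """Hypothetical helper: keep rows whose index lies within [start_time, end_time]."""
    start = to_ms(workload_info["start_time"])
    end = to_ms(workload_info["end_time"])
    return df[(df.index >= start) & (df.index <= end)]


df = pd.DataFrame(
    {"app_qps": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index(
        [1699233144000, 1699233148000, 1699233152000, 1699233312000], name="Time"
    ),
)
info = {"start_time": 1699233148, "end_time": 1699233308}
sliced = slice_workload(df, info)   # drops the first and last rows
```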


By default the "stress" data is not extracted; pass the `with_stress` parameter to enable it.

In [93]:
exp_data.workload_df(workload_info, with_stress=True)

Unnamed: 0_level_0,stress_sock,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,...,vm_block_io_bytes_io_write,vm_block_io_requests_write,vm_block_io_time_write,vm_block_io_flush_request_and_time_request_vda,vm_block_io_flush_request_and_time_time_vda,app_redis_qos_qps_of_redis_get,app_redis_qos_qps_of_redis_set,app_redis_qos_qps_of_redis_total,app_redis_qos_p99_latency_set,app_redis_qos_p99_latency_get
Time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1699233156000,1,0.960227,0.982955,1.011364,0.960227,97053.000000,87574.000000,0.00000,270.000000,763.500000,...,16384.0,3.5,0.002666,0.0,0.0,39174.500000,3919.000000,43093.500000,0.99,0.990000
1699233160000,1,0.974432,0.982955,0.997159,0.968750,95996.500000,86444.000000,0.00000,271.500000,762.500000,...,0.0,0.0,0.000000,0.0,0.0,40742.500000,4071.000000,44813.500000,0.99,0.990000
1699233168000,1,0.977273,0.968750,1.011364,0.960227,96545.500000,87106.500000,0.00000,262.631316,770.385193,...,0.0,0.0,0.000000,0.0,0.0,42005.502751,4200.600300,46206.103052,0.99,0.990000
1699233172000,1,0.963068,0.926136,0.991477,0.977273,100666.833417,90133.566783,0.00000,248.624312,782.391196,...,0.0,0.0,0.000000,0.0,0.0,41116.000000,4110.500000,45226.500000,0.99,0.990024
1699233176000,1,0.980114,0.982955,0.997159,0.960227,97091.545773,87937.468734,0.50025,266.000000,770.000000,...,0.0,0.0,0.000000,0.0,0.0,40439.500000,4046.000000,44485.500000,0.99,0.990000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1699233288000,1,0.954545,0.965909,1.008523,0.977273,89642.821411,82538.269135,0.00000,260.869565,771.614193,...,0.0,0.0,0.000000,0.0,0.0,38047.000000,3806.000000,41853.000000,0.99,0.990000
1699233292000,1,0.957386,0.968750,0.997159,0.991477,91407.000000,85054.500000,0.00000,245.000000,789.000000,...,0.0,0.0,0.000000,0.0,0.0,39143.500000,3912.000000,43055.500000,0.99,0.990000
1699233296000,1,0.991477,0.968750,1.002841,0.971591,92787.500000,85145.500000,1.00000,260.000000,779.500000,...,0.0,0.0,0.000000,0.0,0.0,39173.000000,3916.500000,43089.500000,0.99,0.990025
1699233300000,1,0.965909,0.960227,1.000000,0.980114,95951.000000,86669.500000,0.00000,253.500000,782.000000,...,0.0,0.0,0.000000,0.0,0.0,39764.382191,3976.488244,43740.870435,0.99,0.990000


Alternatively, use `agg_one_workload` to get the mean over this workload's time series.

In [94]:
exp_data.agg_one_workload(workload_info)

Unnamed: 0,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,host_kernel_syscall_access,...,vm_block_io_bytes_io_write,vm_block_io_requests_write,vm_block_io_time_write,vm_block_io_flush_request_and_time_request_vda,vm_block_io_flush_request_and_time_time_vda,app_redis_qos_qps_of_redis_get,app_redis_qos_qps_of_redis_set,app_redis_qos_qps_of_redis_total,app_redis_qos_p99_latency_set,app_redis_qos_p99_latency_get
0,0.96875,0.968661,0.999556,0.966974,96650.618802,87880.749711,0.39068,258.500211,775.046369,330.015599,...,1920.256128,0.39068,0.000155,0.0625,1.1e-05,40012.370412,4001.252622,44013.623033,0.99,0.990003


`exp_data` can also return the data of a single stress-loop iteration (epoch), selected by index.

In [95]:
exp_data.agg_epoch(0)

Unnamed: 0,stress_sock,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,...,app_redis_qos_p99_latency_get,vm_hypervisor_emulator_syscall_count_mmap,vm_hypervisor_emulator_syscall_count_mprotect,vm_hypervisor_emulator_syscall_duration_mmap,vm_hypervisor_emulator_syscall_duration_mprotect,vm_hypervisor_vcpu_syscall_count_futex_time64,vm_hypervisor_vcpu_syscall_duration_futex_time64,host_kernel_syscall_fsync,host_kernel_syscall_pselect6_time64,host_kernel_syscall_fallocate
redis_0,1,0.96875,0.968661,0.999556,0.966974,96650.618802,87880.749711,0.39068,258.500211,775.046369,...,0.990003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
redis_1,1,0.972222,0.957229,0.996291,0.976405,89440.784806,105685.335783,0.652826,260.01003,777.059089,...,0.992668,0.097201,0.097201,0.305556,0.340278,0.069458,0.662037,0.0,0.0,0.0
redis_2,1,0.958508,0.946995,0.997608,0.977123,95098.355061,101113.614857,0.736895,260.223669,774.039616,...,2.947731,0.052605,0.052605,4.157895,2.289474,0.0,0.0,0.0,0.0,0.0
redis_3,1,0.96737,0.950731,0.99026,0.975487,94990.462197,98215.516875,0.6143,261.660227,772.280974,...,3.959854,0.142857,0.142857,0.971429,1.019048,0.0,0.0,0.042857,0.0,0.0
redis_4,1,0.956992,0.960859,0.983507,0.974905,150224.098101,116542.495032,0.652799,256.558592,771.38591,...,4.893004,0.194417,0.194417,0.592593,0.819444,0.166639,2.208333,0.0,0.0,0.0
redis_5,1,0.954545,0.962485,0.987398,0.964452,115976.615824,73778.365587,0.66668,260.6059,776.266359,...,5.214967,0.089718,0.089718,0.587607,0.613248,0.0,0.0,0.0,0.0,0.0
redis_6,1,0.936497,0.934993,0.992814,0.951872,84457.490858,66429.50367,0.617662,258.463707,778.463818,...,7.075059,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.014706
redis_7,1,0.930697,0.926286,0.988113,0.957984,57458.135299,50930.626318,0.618428,258.408744,778.527469,...,12.753027,0.131553,0.131553,0.892544,1.155702,0.0,0.0,0.0,0.0,0.0
redis_8,1,0.932468,0.959659,0.990584,0.930763,55058.133685,47818.727024,0.785679,259.303906,775.208933,...,23.147469,0.100021,0.100021,0.695238,0.766667,0.0,0.0,0.0,0.014286,0.0


#### 3. Advanced Usage

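By default, `exp_data` applies the following preprocessing steps to each workload, in order:
1. `filter_column_startswith(col_prefix=("vm", "app"))`: keep only metrics with the `vm` or `app` prefix
2. `filter_column_useless(std_min=1e-10)`: drop metrics whose mean is 0 or whose variance is below `1e-10`
3. `filter_row_noise(col_prefix=("app"))`: drop rows that are outliers in the `app`-prefixed metrics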

Each workload is then aggregated with the following steps, in order:
1. `lambda x : x.mean().to_frame().T`: collapse the workload's time series into a single row of means

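`exp_data` lets you customize this processing. Note that once custom functions are set, they stay in effect from then on, including during `agg_epoch`.
- `set_workload_preprocess_funcs(df_funcs)`: customize the preprocessing steps
- `set_workload_agg_funcs(df_funcs)`: customize the aggregation steps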

Here `df_funcs` is a sequence of functions, each with the following signature:
- `df_func(df: DataFrame) -> DataFrame`
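To make the `df_func` contract concrete, here is a small self-contained sketch of how such functions can be written as factories and chained. The names `keep_prefix`, `drop_flat`, and `apply_funcs` are hypothetical stand-ins written for this example and are not the library's filters. The sample frame is made up, and using standard deviation for the flatness check is an assumption based on the `std_min` parameter name.

```python
import pandas as pd


def keep_prefix(prefixes):
    """Hypothetical factory: keep columns whose name starts with one of `prefixes`."""
    if isinstance(prefixes, str):          # ("app") is a plain string, not a tuple
        prefixes = (prefixes,)
    prefixes = tuple(prefixes)

    def df_func(df):
        return df.loc[:, [c for c in df.columns if c.startswith(prefixes)]]

    return df_func


def drop_flat(std_min=1e-10):
    """Hypothetical factory: drop columns with zero mean or (near-)constant values."""
    def df_func(df):
        keep = (df.mean() != 0) & (df.std() >= std_min)
        return df.loc[:, keep]

    return df_func


def apply_funcs(df, df_funcs):
    """Apply each DataFrame -> DataFrame function in order."""
    for func in df_funcs:
        df = func(df)
    return df


raw = pd.DataFrame(
    {
        "host_cpu": [1.0, 2.0, 3.0],
        "vm_mem": [0.0, 0.0, 0.0],
        "app_qps": [10.0, 20.0, 30.0],
    }
)
preprocess = [keep_prefix(("vm", "app")), drop_flat()]
aggregate = [lambda x: x.mean().to_frame().T]

# Preprocess first, then collapse the time series into one row.
result = apply_funcs(apply_funcs(raw, preprocess), aggregate)
```

Because every step takes and returns a DataFrame, steps can be reordered, removed, or replaced without touching the others.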

In [96]:
# default_workload_preprocess_funcs = [
#     filter_column_startswith(col_prefix=("stress", "host", "vm", "app")),
#     filter_column_useless(excol_prefix=("stress")),
#     filter_row_noise(col_prefix=("app")),
# ]

# default_workload_agg_funcs = [
#     lambda x : x.mean().to_frame().T,
# ]

exp_data.set_workload_preprocess_funcs([
    filter_column_startswith(col_prefix=("stress", "host", "vm", "app")),
    filter_column_useless(excol_prefix=("stress")),
])

exp_data.workload_df(workload_info, with_stress=True)

Unnamed: 0_level_0,stress_sock,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,...,vm_block_io_bytes_io_write,vm_block_io_requests_write,vm_block_io_time_write,vm_block_io_flush_request_and_time_request_vda,vm_block_io_flush_request_and_time_time_vda,app_redis_qos_qps_of_redis_get,app_redis_qos_qps_of_redis_set,app_redis_qos_qps_of_redis_total,app_redis_qos_p99_latency_set,app_redis_qos_p99_latency_get
Time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1699233148000,1,0.980114,0.943182,1.011364,0.946023,105757.878939,108640.820410,0.0000,257.000000,780.000000,...,0.0,0.0,0.000000,0.0,0.0,92724.000000,9274.500000,101998.500000,2.949383,2.949151
1699233152000,1,0.974432,0.980114,1.005682,0.957386,100340.170085,90785.892946,1.0005,271.000000,763.500000,...,0.0,0.0,0.000000,0.0,0.0,34340.000000,3451.000000,37791.000000,2.952766,2.952418
1699233156000,1,0.960227,0.982955,1.011364,0.960227,97053.000000,87574.000000,0.0000,270.000000,763.500000,...,16384.0,3.5,0.002666,0.0,0.0,39174.500000,3919.000000,43093.500000,0.990000,0.990000
1699233160000,1,0.974432,0.982955,0.997159,0.968750,95996.500000,86444.000000,0.0000,271.500000,762.500000,...,0.0,0.0,0.000000,0.0,0.0,40742.500000,4071.000000,44813.500000,0.990000,0.990000
1699233164000,1,0.985795,0.982955,1.014205,0.940341,102174.000000,91165.500000,1.0000,263.368316,771.614193,...,0.0,0.0,0.000000,0.0,0.0,41879.060470,4185.907046,46064.967516,0.990235,0.990047
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1699233292000,1,0.957386,0.968750,0.997159,0.991477,91407.000000,85054.500000,0.0000,245.000000,789.000000,...,0.0,0.0,0.000000,0.0,0.0,39143.500000,3912.000000,43055.500000,0.990000,0.990000
1699233296000,1,0.991477,0.968750,1.002841,0.971591,92787.500000,85145.500000,1.0000,260.000000,779.500000,...,0.0,0.0,0.000000,0.0,0.0,39173.000000,3916.500000,43089.500000,0.990000,0.990025
1699233300000,1,0.965909,0.960227,1.000000,0.980114,95951.000000,86669.500000,0.0000,253.500000,782.000000,...,0.0,0.0,0.000000,0.0,0.0,39764.382191,3976.488244,43740.870435,0.990000,0.990000
1699233304000,1,0.980114,0.931818,1.005682,0.997159,92742.000000,85475.500000,1.0000,260.500000,768.500000,...,0.0,0.0,0.000000,0.0,0.0,39815.500000,3983.000000,43798.500000,0.990000,0.990202


In [97]:
# Take the maximum instead of the mean
exp_data.set_workload_agg_funcs([
    lambda x : x.max().to_frame().T,
])

exp_data.agg_one_workload(workload_info)

Unnamed: 0,host_cache_llc_capacity_numa_0,host_cache_llc_capacity_numa_1,host_cache_llc_capacity_numa_2,host_cache_llc_capacity_numa_3,host_kernel_context_switch,host_kernel_interrupt,host_kernel_thread_fork_fork,host_kernel_syscall_accept,host_kernel_syscall_accept4,host_kernel_syscall_access,...,vm_block_io_bytes_io_write,vm_block_io_requests_write,vm_block_io_time_write,vm_block_io_flush_request_and_time_request_vda,vm_block_io_flush_request_and_time_time_vda,app_redis_qos_qps_of_redis_get,app_redis_qos_qps_of_redis_set,app_redis_qos_qps_of_redis_total,app_redis_qos_p99_latency_set,app_redis_qos_p99_latency_get
0,0.991477,0.997159,1.017045,0.997159,120235.5,109046.5,1.0005,275.0,791.5,332.333833,...,16392.196098,3.501751,0.002666,1.0,0.000193,92724.0,9274.5,101998.5,2.952766,2.952418


## Time Shift

### Operations on exp

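`exp.json` is organized at workload granularity:
- `info_per_epoch` organizes workloads by epoch; epochs differ in their global settings, such as the stress applied
- `info_per_workload` organizes the data by workload; each entry holds the data of the same workload across different epochs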

In [16]:
assert exp_data.exp["info_per_workload"] == ipe_to_ipw(exp_data.exp["info_per_epoch"])
assert exp_data.exp["info_per_epoch"] == ipw_to_ipe(exp_data.exp["info_per_workload"])
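The round trip asserted above can be illustrated with a minimal sketch of the two layouts. It assumes that each epoch entry carries only a `workloads` mapping and that `info_per_workload` maps each workload name to a list with one entry per epoch. The function names `epochs_to_workloads` and `workloads_to_epochs` are hypothetical, and the real `ipe_to_ipw` / `ipw_to_ipe` may differ in detail.

```python
def epochs_to_workloads(info_per_epoch):
    """Hypothetical sketch: regroup a list of epochs into {workload name: [info per epoch]}."""
    info_per_workload = {}
    for epoch in info_per_epoch:
        for name, info in epoch["workloads"].items():
            info_per_workload.setdefault(name, []).append(info)
    return info_per_workload


def workloads_to_epochs(info_per_workload):
    """Hypothetical sketch: inverse of epochs_to_workloads under the same assumptions."""
    n_epoch = max((len(infos) for infos in info_per_workload.values()), default=0)
    return [
        {
            "workloads": {
                name: infos[i]
                for name, infos in info_per_workload.items()
                if i < len(infos)
            }
        }
        for i in range(n_epoch)
    ]


# Illustrative data: two workloads, two epochs with different stress.
ipe = [
    {"workloads": {
        "redis_0": {"name": "redis_0", "stress": {"net": {"sock": "1"}}},
        "redis_1": {"name": "redis_1", "stress": {"net": {"sock": "1"}}},
    }},
    {"workloads": {
        "redis_0": {"name": "redis_0", "stress": {"net": {"sock": "2"}}},
        "redis_1": {"name": "redis_1", "stress": {"net": {"sock": "2"}}},
    }},
]
ipw = epochs_to_workloads(ipe)
```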

In [17]:
exp = exp_data.exp
keys = list(exp["info_per_workload"].keys())
sub_workload_1 = {k: exp["info_per_workload"][k] for k in keys[:len(keys) // 2]}
sub_workload_2 = {k: exp["info_per_workload"][k] for k in keys[len(keys) // 2:]}

In [18]:
def workload_to_exp(workload):
    info_per_workload = workload
    info_per_epoch = ipw_to_ipe(workload)
    
    workload_keys = list(info_per_workload.keys())
    start_time = info_per_epoch[0]["workloads"][workload_keys[0]]["start_time"]
    end_time = info_per_epoch[-1]["workloads"][workload_keys[-1]]["end_time"]
    total_time = end_time - start_time
    n_epoch = len(info_per_epoch)
    date_format = 'timestamp'
    
    
    return {
        "start_time": start_time,
        "end_time": end_time,
        "total_time": total_time,
        "n_epoch": n_epoch,
        "date_format": date_format,
        "info_per_workload": info_per_workload,
        "info_per_epoch": info_per_epoch,
    }
sub_exp_1 = workload_to_exp(sub_workload_1)
sub_exp_2 = workload_to_exp(sub_workload_2)

sub_data_1 = filter_row_timerange(
    format_to_13_timestamp(sub_exp_1["start_time"]),
    format_to_13_timestamp(sub_exp_1["end_time"]),
)(exp_data.df)

sub_data_2 = filter_row_timerange(
    format_to_13_timestamp(sub_exp_2["start_time"]),
    format_to_13_timestamp(sub_exp_2["end_time"]),
)(exp_data.df)

sub_exp_data_1 = ExpData(sub_data_1, sub_exp_1)
sub_exp_data_2 = ExpData(sub_data_2, sub_exp_2)

In [19]:
total_exp_data = concat([sub_exp_data_1, sub_exp_data_2])