# Aggregation

## Scenario 1

### (1) Description

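In Scenario 1, every experiment follows the same fixed pattern, made up of an interference loop and a workload loop:
1. Initialize the client and the server (VM).
2. Start the background interference. Currently supported interference types: cpu/cache/mem/io/net.
3. Start a workload. Currently supported workloads: redis/nginx/mysql.
4. When the workload finishes, record a `workload_info` entry containing metadata such as timestamps, the workload, and the interference intensity.
5. Return to step 3 and run the next workload. Once all workloads have finished, record an `info_per_epoch` entry containing the `workload_info` entries of the workloads that have run, plus the interference intensity of this loop.
6. Return to step 2 until interference at every intensity has been run.
7. End the experiment and record its start time, end time, total elapsed time, and the total number of interference loops.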
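
The loop structure above can be sketched in Python as follows. This is a minimal illustration, not the actual experiment harness. `start_interference`, `run_workload`, the intensity list and the result keys (`workloads`, `stress`, `duration`, `epoch_count`) are hypothetical placeholders. Only the shape of the `addition` field mirrors the `workload_info` example shown later in this document.

```python
import time

# Hypothetical stand-ins: the real harness drives stress tools and benchmarks on the VM.
INTENSITIES = [1, 2, 4, 8, 16, 32]
WORKLOADS = ["redis_0", "redis_1", "redis_2"]


def start_interference(kind: str, level: int) -> None:
    """Placeholder for launching a background stressor (cpu/cache/mem/io/net)."""


def run_workload(name: str) -> None:
    """Placeholder for running one benchmark, e.g. memtier_benchmark against redis."""


def run_experiment(kind: str = "cache") -> dict:
    exp_start = time.time()
    epochs = []
    for level in INTENSITIES:                  # interference loop (steps 2 and 6)
        start_interference(kind, level)
        workload_infos = []
        for name in WORKLOADS:                 # workload loop (steps 3 to 5)
            start = int(time.time())
            run_workload(name)
            workload_infos.append({
                "name": name,
                "start_time": start,
                "end_time": int(time.time()),
                # Same nesting as the recorded metadata, e.g. {'stress': {'cache': {'cache': '1'}}}
                "addition": {"stress": {kind: {kind: str(level)}}},
            })
        # Hypothetical per-epoch record: the workloads run plus this loop's intensity.
        epochs.append({"workloads": workload_infos, "stress": {kind: level}})
    exp_end = time.time()
    return {                                   # step 7: experiment-level summary
        "start_time": exp_start,
        "end_time": exp_end,
        "duration": exp_end - exp_start,
        "epoch_count": len(epochs),
        "info_per_epoch": epochs,
    }
```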

All of the above is saved to `exp.json` or `date.json`. The metric data for the whole experiment period is also fetched through the Prometheus client and saved to `merged.csv`.
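
For inspection outside of `aggregation`, both artifacts can also be loaded with the standard library and pandas. This is a sketch under two assumptions. First, the metadata file is named `exp.json`. Second, `merged.csv` has a `Time` column holding millisecond timestamps, as the `Time` index in the outputs below suggests. `load_raw` is a hypothetical helper and is not part of `aggregation`.

```python
import json
from pathlib import Path

import pandas as pd


def load_raw(exp_dir: str) -> tuple[dict, pd.DataFrame]:
    """Hypothetical loader: read experiment metadata and the merged Prometheus metrics."""
    root = Path(exp_dir)
    with open(root / "exp.json", encoding="utf-8") as f:
        exp = json.load(f)
    # Assumption: merged.csv has a "Time" column with millisecond timestamps.
    df = pd.read_csv(root / "merged.csv", index_col="Time")
    return exp, df
```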


### (2) Data Processing


#### 1. Quick Start


Use `read_from_dir(dir)` to read the metadata and CSV data and create an `ExpData` instance:

```python
import pandas as pd
from aggregation import *

pd.set_option('display.max_rows', 10)

exp_root = "/home/ict/appProfile/data/addtion_exp/standard_stress_cache_20231023031422/"
exp_data = read_from_dir(exp_root)
```

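After calling `agg_epoch()`, `exp_data` aggregates the data by interference loop (epoch). This is the recommended usage. Each row of the result is one workload in one loop, indexed by workload name, with the interference intensity in the `cache` column.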

```python
df_epoch = exp_data.agg_epoch()
df_epoch
```

|  | cache | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | ... | vm_mem_bandwidth_local_total_numa_1 | vm_mem_bandwidth_local_total_numa_3 | vm_hypervisor_emulator_syscall_count_newfstatat | vm_hypervisor_emulator_syscall_count_recvmsg | vm_hypervisor_emulator_syscall_count_sendmsg | vm_mem_bandwidth_local_local_numa_1 | vm_hypervisor_emulator_syscall_count_fallocate | vm_hypervisor_emulator_syscall_count_pread | vm_block_io_sector_offset_offset_vda | vm_hypervisor_emulator_syscall_duration_pread |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| redis_0 | 1 | 0.026316 | 0.026316 | 0.736842 | 0.052632 | 0.039474 | 315.381579 | 18.921053 | 0.013158 | 0.184211 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_1 | 1 | 0.205128 | 0.102564 | 1.038462 | 0.192308 | 0.076923 | 0.000000 | 18.641026 | 0.102564 | 0.128205 | ... | 0.003005 | 0.002003 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_2 | 1 | 0.200000 | 0.175000 | 1.325000 | 0.087500 | 0.037500 | 315.507895 | 18.575450 | 0.100000 | 0.212500 | ... | 0.000000 | 0.000000 | 3.500088 | 302.507570 | 4.000100 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_3 | 1 | 0.000000 | 0.052632 | 0.644737 | 0.000000 | 0.000000 | 0.000000 | 18.000000 | 0.000000 | 0.184211 | ... | 0.006168 | 0.016447 | 0.000000 | 0.000000 | 0.000000 | 0.002056 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_4 | 1 | 0.250000 | 0.050000 | 0.525000 | 0.050000 | 0.037500 | 0.000000 | 18.312500 | 0.125000 | 0.100000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| redis_4 | 32 | 0.195122 | 0.121951 | 1.439024 | 0.134146 | 0.073171 | 0.000000 | 19.439024 | 0.097561 | 0.182927 | ... | 0.066692 | 0.000953 | 0.000000 | 0.000000 | 0.000000 | 0.013338 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_5 | 32 | 0.121951 | 0.097561 | 0.865854 | 0.036585 | 0.000000 | 315.487805 | 19.219512 | 0.060976 | 0.207317 | ... | 0.000000 | 0.024771 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_6 | 32 | 0.000000 | 0.050000 | 0.587500 | 0.087500 | 0.037500 | 0.000000 | 19.087500 | 0.000000 | 0.100000 | ... | 0.000000 | 0.011719 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_7 | 32 | 0.200000 | 0.125000 | 0.725000 | 0.150000 | 0.075000 | 0.000000 | 20.437500 | 0.100000 | 0.062500 | ... | 0.000000 | 0.003906 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
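
Conceptually, epoch aggregation does three things. It cuts the metric time series into one window per `workload_info`, collapses each window into one row, and tags that row with the interference intensity. The sketch below illustrates this under stated assumptions. `agg_epoch_sketch` and the layout of its `epochs` input are hypothetical. It skips the preprocessing filters and is not the library's implementation.

```python
import pandas as pd


def agg_epoch_sketch(df: pd.DataFrame, epochs: list[dict], kind: str = "cache") -> pd.DataFrame:
    """Hypothetical illustration of per-epoch aggregation (not the library code).

    df     -- metrics indexed by millisecond timestamps
    epochs -- one dict per interference loop: {"level": int, "workloads": [workload_info, ...]}
    """
    rows = []
    for epoch in epochs:
        for info in epoch["workloads"]:
            # Assumption: start/end times are in seconds, the metric index in milliseconds.
            lo, hi = info["start_time"] * 1000, info["end_time"] * 1000
            window = df[(df.index >= lo) & (df.index <= hi)]
            row = window.mean().to_frame().T          # one row of per-metric means
            row.index = [info["name"]]                # index the row by workload name
            row.insert(0, kind, epoch["level"])       # interference intensity column
            rows.append(row)
    return pd.concat(rows)
```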


Since the result is still a DataFrame, you can process it further however your analysis requires. The following shows one way to extract the data of a single workload from `df_epoch`:

```python
df_epoch_group = df_epoch.groupby(df_epoch.index)
df_epoch_group.groups.keys()
```

```
dict_keys(['redis_0', 'redis_1', 'redis_2', 'redis_3', 'redis_4', 'redis_5', 'redis_6', 'redis_7', 'redis_8'])
```

Group the DataFrame with `groupby`, then pick one of the keys from `groups.keys()` to get that workload's data:

```python
df_workload = df_epoch_group.get_group('redis_0')
df_workload
```

|  | cache | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | ... | vm_mem_bandwidth_local_total_numa_1 | vm_mem_bandwidth_local_total_numa_3 | vm_hypervisor_emulator_syscall_count_newfstatat | vm_hypervisor_emulator_syscall_count_recvmsg | vm_hypervisor_emulator_syscall_count_sendmsg | vm_mem_bandwidth_local_local_numa_1 | vm_hypervisor_emulator_syscall_count_fallocate | vm_hypervisor_emulator_syscall_count_pread | vm_block_io_sector_offset_offset_vda | vm_hypervisor_emulator_syscall_duration_pread |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| redis_0 | 1 | 0.026316 | 0.026316 | 0.736842 | 0.052632 | 0.039474 | 315.381579 | 18.921053 | 0.013158 | 0.184211 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_0 | 2 | 0.225806 | 0.064516 | 0.580645 | 0.129032 | 0.048387 | 315.467742 | 18.193548 | 0.112903 | 0.032258 | ... | 0.0 | 0.039062 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_0 | 4 | 0.368421 | 0.052632 | 0.736842 | 0.184211 | 0.078947 | 0.0 | 18.276316 | 0.184211 | 0.184211 | ... | 0.023643 | 0.030839 | 0.0 | 0.0 | 0.0 | 0.009252 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_0 | 8 | 0.475 | 0.1 | 1.0125 | 0.2375 | 0.125 | 0.0 | 18.5875 | 0.2375 | 0.225 | ... | 0.020508 | 0.0 | 0.0 | 0.0 | 0.0 | 0.008789 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_0 | 16 | 0.128205 | 0.076923 | 0.807679 | 0.06409 | 0.0 | 315.491918 | 18.12777 | 0.064103 | 0.217949 | ... | 0.244391 | 0.0 | 3.49991 | 302.492251 | 3.999898 | 0.043069 | 0.0 | 0.012821 | 0.0 | 2.589744 |
| redis_0 | 32 | 0.266667 | 0.133333 | 1.083333 | 0.2 | 0.1 | 0.0 | 19.5 | 0.133333 | 0.083333 | ... | 0.0 | 0.009115 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
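
Because the index holds workload names, plain label selection with `.loc` returns the same rows as `groupby(...).get_group(...)`. The snippet below uses a small synthetic `df_demo` in place of `df_epoch`. The column `app_qps` is made up for illustration. The last line shows one metric of one workload as a function of interference intensity.

```python
import pandas as pd

# Synthetic stand-in for df_epoch: workload names as a non-unique index plus a "cache" intensity column.
df_demo = pd.DataFrame(
    {"cache": [1, 1, 2, 2], "app_qps": [100.0, 80.0, 90.0, 70.0]},
    index=["redis_0", "redis_1", "redis_0", "redis_1"],
)

# Selecting a label that appears several times returns all matching rows, like get_group.
via_loc = df_demo.loc["redis_0"]
via_groupby = df_demo.groupby(df_demo.index).get_group("redis_0")

# One metric of one workload, indexed by interference intensity.
qps_vs_cache = via_loc.set_index("cache")["app_qps"]
```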


#### 2. Custom Processing

`exp_data` stores the loaded data in two fields, `exp` and `df`, which you can access directly as `exp_data.exp` and `exp_data.df`. For example, to get the info of one workload:

```python
workload_info = exp_data.exp["info_per_workload"]["redis_0"]["info_per_epoch"][0]
workload_info
```

```
{'start_time': 1698030875,
 'run_cmd': 'docker exec -it redis-client-1 memtier_benchmark  -s envoy --test-time 160 -t 1',
 'end_time': 1698031035,
 'name': 'redis_0',
 'addition': {'stress': {'cache': {'cache': '1'}}}}
```
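
To visit every recorded `workload_info` rather than one at a time, you can walk the nested dict directly. This sketch assumes the layout used in the cell above, where `exp["info_per_workload"][<name>]["info_per_epoch"]` is a list of dicts. `iter_workload_infos` and `cache_level` are hypothetical helpers. Note two details of the record shown above. The intensity is stored as a string (`'1'`). The recorded window spans `end_time - start_time = 160` seconds, which matches `--test-time 160` in `run_cmd`.

```python
from typing import Iterator


def iter_workload_infos(exp: dict) -> Iterator[dict]:
    """Yield every workload_info record, assuming the layout
    exp["info_per_workload"][<name>]["info_per_epoch"] -> list of workload_info dicts."""
    for per_workload in exp["info_per_workload"].values():
        yield from per_workload["info_per_epoch"]


def cache_level(info: dict) -> int:
    # The intensity is stored as a string, e.g. {'stress': {'cache': {'cache': '1'}}}.
    return int(info["addition"]["stress"]["cache"]["cache"])
```

On the loaded experiment this would be called as `for info in iter_workload_infos(exp_data.exp): ...`.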

Then use `workload_df` to get the DataFrame for this workload:

```python
exp_data.workload_df(workload_info)
```

| Time | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | vm_hypervisor_emulator_syscall_count_pwritev | ... | vm_block_io_bytes_io_write | vm_block_io_requests_write | vm_block_io_time_write | vm_block_io_flush_request_and_time_request_vda | vm_block_io_flush_request_and_time_time_vda | app_redis_qos_qps_of_redis_get | app_redis_qos_qps_of_redis_set | app_redis_qos_qps_of_redis_total | app_redis_qos_p99_latency_set | app_redis_qos_p99_latency_get |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1698030882000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39967.0 | 3998.0 | 43965.0 | 0.990363 | 0.990399 |
| 1698030886000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 40349.5 | 4032.5 | 44382.0 | 0.990121 | 0.990194 |
| 1698030890000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.5 | 0.0 | 0.0 | 0.0 | ... | 8192.0 | 2.0 | 0.000377 | 0 | 0.0 | 40161.0 | 4012.0 | 44173.0 | 0.990879 | 0.990565 |
| 1698030894000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.0 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39884.5 | 3987.0 | 43871.5 | 0.991942 | 0.990764 |
| 1698030898000 | 0.0 | 0.0 | 4.5 | 2.0 | 1.5 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39500.5 | 3949.5 | 43450.0 | 0.990742 | 0.990829 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1698031014000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.0 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39371.5 | 3938.5 | 43310.0 | 0.990000 | 0.990000 |
| 1698031018000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39217.0 | 3923.0 | 43140.0 | 0.990000 | 0.990000 |
| 1698031022000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39373.0 | 3935.5 | 43308.5 | 0.990250 | 0.990626 |
| 1698031030000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 40052.0 | 4004.5 | 44056.5 | 0.990502 | 0.990891 |
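
The metadata timestamps are in seconds (`start_time` 1698030875), while the `Time` index appears to be in milliseconds (1698030882000). Selecting a workload's rows therefore requires a unit conversion. The sketch below shows only that slicing step. `slice_workload` is hypothetical and does not apply the preprocessing filters that `workload_df` uses.

```python
import pandas as pd


def slice_workload(df: pd.DataFrame, info: dict) -> pd.DataFrame:
    """Hypothetical time-window slice: keep rows whose millisecond timestamp
    falls inside the workload's [start_time, end_time] (given in seconds)."""
    lo, hi = info["start_time"] * 1000, info["end_time"] * 1000
    return df.loc[(df.index >= lo) & (df.index <= hi)]
```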


Alternatively, use `agg_one_workload` to get the mean of this workload's time series:

```python
exp_data.agg_one_workload(workload_info)
```

|  | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | vm_hypervisor_emulator_syscall_count_pwritev | ... | vm_block_io_bytes_io_write | vm_block_io_requests_write | vm_block_io_time_write | vm_block_io_flush_request_and_time_request_vda | vm_block_io_flush_request_and_time_time_vda | app_redis_qos_qps_of_redis_get | app_redis_qos_qps_of_redis_set | app_redis_qos_qps_of_redis_total | app_redis_qos_p99_latency_set | app_redis_qos_p99_latency_get |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.026316 | 0.026316 | 0.736842 | 0.052632 | 0.039474 | 315.381579 | 18.921053 | 0.013158 | 0.184211 | 0.039474 | ... | 8192.0 | 0.315789 | 0.000136 | 0.131579 | 3.4e-05 | 39671.368421 | 3966.75 | 43638.118421 | 0.990426 | 0.990417 |


`exp_data` can also return the data of a single interference loop, selected by index:

```python
exp_data.agg_epoch(0)
```

|  | cache | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | ... | vm_hypervisor_vcpu_syscall_count_write | vm_hypervisor_vcpu_syscall_duration_futex_time64 | vm_cache_last-level_cache_capacity_vm_occupancy_numa_1 | vm_mem_bandwidth_local_local_numa_3 | vm_mem_bandwidth_local_total_numa_1 | vm_mem_bandwidth_local_total_numa_3 | vm_hypervisor_emulator_syscall_count_newfstatat | vm_hypervisor_emulator_syscall_count_recvmsg | vm_hypervisor_emulator_syscall_count_sendmsg | vm_mem_bandwidth_local_local_numa_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| redis_0 | 1 | 0.026316 | 0.026316 | 0.736842 | 0.052632 | 0.039474 | 315.381579 | 18.921053 | 0.013158 | 0.184211 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_1 | 1 | 0.205128 | 0.102564 | 1.038462 | 0.192308 | 0.076923 | 0.0 | 18.641026 | 0.102564 | 0.128205 | ... | 0.166667 | 5.403846 | 7.3e-05 | 0.0 | 0.003005 | 0.002003 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_2 | 1 | 0.2 | 0.175 | 1.325 | 0.0875 | 0.0375 | 315.507895 | 18.57545 | 0.1 | 0.2125 | ... | 0.0 | 2.275 | 0.0 | 0.0 | 0.0 | 0.0 | 3.500088 | 302.50757 | 4.0001 | 0.0 |
| redis_3 | 1 | 0.0 | 0.052632 | 0.644737 | 0.0 | 0.0 | 0.0 | 18.0 | 0.0 | 0.184211 | ... | 0.0 | 3.197368 | 0.007102 | 0.012336 | 0.006168 | 0.016447 | 0.0 | 0.0 | 0.0 | 0.002056 |
| redis_4 | 1 | 0.25 | 0.05 | 0.525 | 0.05 | 0.0375 | 0.0 | 18.3125 | 0.125 | 0.1 | ... | 0.0 | 0.0 | 0.000284 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_5 | 1 | 0.21875 | 0.0625 | 0.578125 | 0.015625 | 0.0 | 0.0 | 18.53125 | 0.109375 | 0.09375 | ... | 0.0 | 0.484375 | 0.0 | 0.001221 | 0.0 | 0.004883 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_6 | 1 | 0.058824 | 0.0 | 0.632353 | 0.102941 | 0.0 | 0.0 | 17.897059 | 0.029412 | 0.161765 | ... | 0.0 | 0.566176 | 0.002757 | 0.002298 | 0.011489 | 0.02068 | 0.0 | 0.0 | 0.0 | 0.004596 |
| redis_7 | 1 | 0.184211 | 0.078947 | 0.907895 | 0.171053 | 0.039474 | 315.486842 | 18.407895 | 0.092105 | 0.171053 | ... | 0.0 | 5.046053 | 0.000374 | 0.01028 | 0.0 | 0.016447 | 0.0 | 0.0 | 0.0 | 0.0 |
| redis_8 | 1 | 0.244444 | 0.133333 | 0.922222 | 0.1 | 0.033333 | 0.0 | 18.955556 | 0.122222 | 0.133333 | ... | 0.0 | 4.672222 | 0.0 | 0.0 | 0.0 | 0.001736 | 0.0 | 0.0 | 0.0 | 0.0 |
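
If the full `df_epoch` has already been computed, you can also pick the rows of one interference loop with an ordinary pandas filter on the intensity column, without calling into `aggregation`. This assumes the loops run in increasing intensity order, as the 1, 2, 4, 8, 16, 32 sequence for `redis_0` suggests. The example uses a synthetic `df_demo`, and the `metric` column is made up.

```python
import pandas as pd

# Synthetic stand-in for df_epoch with a "cache" intensity column.
df_demo = pd.DataFrame(
    {"cache": [1, 1, 32, 32], "metric": [0.1, 0.2, 0.3, 0.4]},
    index=["redis_0", "redis_1", "redis_0", "redis_1"],
)

# Assumption: loop order equals ascending intensity.
levels = sorted(int(v) for v in df_demo["cache"].unique())
first_epoch = df_demo[df_demo["cache"] == levels[0]]   # rows of the first loop
```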


#### 3. Advanced Usage

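By default, `exp_data` applies the following preprocessing steps to each workload, in order:
1. `filter_column_startswith(col_prefix=("vm", "app"))`: keep only metrics prefixed with `vm` or `app`.
2. `filter_column_useless(std_min=1e-10)`: drop metrics whose mean is 0 or whose standard deviation is below `1e-10`.
3. `filter_row_noise(col_prefix=("app"))`: drop rows where the `app`-prefixed metrics contain outliers.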
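
You can picture the three default steps as `df_func` factories, each returning a `DataFrame -> DataFrame` function. The sketch below uses hypothetical names (`keep_prefixes`, `drop_useless`, `drop_noisy_rows`). The outlier rule used by `filter_row_noise` is not documented here, so the IQR (Tukey fence) rule below is a swapped-in assumption and not the library's method.

```python
import pandas as pd


def keep_prefixes(col_prefix=("vm", "app")):
    """Keep only columns whose name starts with one of the prefixes."""
    def f(df: pd.DataFrame) -> pd.DataFrame:
        return df[[c for c in df.columns if c.startswith(col_prefix)]]
    return f


def drop_useless(std_min=1e-10):
    """Drop columns whose mean is 0 or whose standard deviation is below std_min."""
    def f(df: pd.DataFrame) -> pd.DataFrame:
        keep = (df.mean() != 0) & (df.std() >= std_min)
        return df.loc[:, keep]
    return f


def drop_noisy_rows(col_prefix=("app",), k=1.5):
    """Drop rows where any prefixed column lies outside the Tukey fences (IQR rule, an assumption)."""
    def f(df: pd.DataFrame) -> pd.DataFrame:
        cols = [c for c in df.columns if c.startswith(col_prefix)]
        q1, q3 = df[cols].quantile(0.25), df[cols].quantile(0.75)
        iqr = q3 - q1
        inside = (df[cols] >= q1 - k * iqr) & (df[cols] <= q3 + k * iqr)
        return df[inside.all(axis=1)]
    return f


# Applied in order, like the default preprocessing list.
pipeline = [keep_prefixes(), drop_useless(), drop_noisy_rows()]
```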

Each workload is then aggregated with the following steps, in order:
1. `lambda x : x.mean().to_frame().T`: compress a workload's time series into a single row of column means.
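
The default aggregation reduces each column to its mean and transposes the resulting Series into a one-row DataFrame. This is why `agg_one_workload` returns a single row labelled `0`. A small demonstration on synthetic data, where `vm_cpu` and `app_qps` are made-up column names:

```python
import pandas as pd

ts = pd.DataFrame(
    {"vm_cpu": [1.0, 2.0, 3.0], "app_qps": [100.0, 110.0, 120.0]},
    index=pd.Index([1000, 5000, 9000], name="Time"),
)


def mean_agg(x: pd.DataFrame) -> pd.DataFrame:
    # Same expression as the default aggregation listed above.
    return x.mean().to_frame().T


row = mean_agg(ts)   # one row, index [0], one column per metric
```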

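`exp_data` lets you customize both stages. Note that custom functions, once set, stay in effect for all subsequent calls, including `agg_epoch`:
- `set_workload_preprocess_funcs(df_funcs)`: customize preprocessing
- `set_workload_agg_funcs(df_funcs)`: customize aggregation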

Here `df_funcs` is a list of functions, each with the signature:
- `df_func(df: DataFrame) -> DataFrame`

```python
exp_data.set_workload_preprocess_funcs([
    filter_column_startswith(col_prefix=("vm", "app")),
    filter_column_useless(std_min=1e-10),
])

exp_data.workload_df(workload_info)
```

| Time | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | vm_hypervisor_emulator_syscall_count_pwritev | ... | vm_block_io_bytes_io_write | vm_block_io_requests_write | vm_block_io_time_write | vm_block_io_flush_request_and_time_request_vda | vm_block_io_flush_request_and_time_time_vda | app_redis_qos_qps_of_redis_get | app_redis_qos_qps_of_redis_set | app_redis_qos_qps_of_redis_total | app_redis_qos_p99_latency_set | app_redis_qos_p99_latency_get |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1698030878000 | 0.0 | 0.0 | 8.5 | 0.0 | 0.0 | 315.5 | 20.5 | 0.0 | 3.0 | 0.5 | ... | 16384.0 | 3.5 | 0.000600 | 0 | 0.0 | 39840.0 | 3985.0 | 43825.0 | 2.452920 | 2.474957 |
| 1698030882000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39967.0 | 3998.0 | 43965.0 | 0.990363 | 0.990399 |
| 1698030886000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 40349.5 | 4032.5 | 44382.0 | 0.990121 | 0.990194 |
| 1698030890000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.5 | 0.0 | 0.0 | 0.0 | ... | 8192.0 | 2.0 | 0.000377 | 0 | 0.0 | 40161.0 | 4012.0 | 44173.0 | 0.990879 | 0.990565 |
| 1698030894000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.0 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39884.5 | 3987.0 | 43871.5 | 0.991942 | 0.990764 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1698031018000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39217.0 | 3923.0 | 43140.0 | 0.990000 | 0.990000 |
| 1698031022000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39373.0 | 3935.5 | 43308.5 | 0.990250 | 0.990626 |
| 1698031026000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 19.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 39322.5 | 3934.0 | 43256.5 | 0.991919 | 0.991920 |
| 1698031030000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 315.5 | 18.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 40052.0 | 4004.5 | 44056.5 | 0.990502 | 0.990891 |


```python
# Take the maximum instead of the mean
exp_data.set_workload_agg_funcs([
    lambda x : x.max().to_frame().T,
])

exp_data.agg_one_workload(workload_info)
```

|  | vm_hypervisor_emulator_syscall_count_clone3 | vm_hypervisor_emulator_syscall_count_fdatasync | vm_hypervisor_emulator_syscall_count_futex_time64 | vm_hypervisor_emulator_syscall_count_madvise | vm_hypervisor_emulator_syscall_count_munmap | vm_hypervisor_emulator_syscall_count_poll_time64 | vm_hypervisor_emulator_syscall_count_ppoll_time64 | vm_hypervisor_emulator_syscall_count_prctl | vm_hypervisor_emulator_syscall_count_pwrite | vm_hypervisor_emulator_syscall_count_pwritev | ... | vm_block_io_bytes_io_write | vm_block_io_requests_write | vm_block_io_time_write | vm_block_io_flush_request_and_time_request_vda | vm_block_io_flush_request_and_time_time_vda | app_redis_qos_qps_of_redis_get | app_redis_qos_qps_of_redis_set | app_redis_qos_qps_of_redis_total | app_redis_qos_p99_latency_set | app_redis_qos_p99_latency_get |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 12.0 | 2.0 | 1.5 | 315.5 | 22.0 | 0.5 | 3.0 | 0.5 | ... | 264192.0 | 3.5 | 0.002691 | 2.0 | 0.000653 | 40989.5 | 4101.5 | 45091.0 | 2.45292 | 2.474957 |
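
Any function matching `df_func(df: DataFrame) -> DataFrame` can be plugged in. The sketch below assumes the list is applied left to right, with each output feeding the next, as the default pipelines above are described. `apply_funcs`, `keep_vm` and `p99_agg` are hypothetical names. `p99_agg` resets the index because `quantile` names its result after the quantile (`0.99`). After the reset, the row is labelled `0` like the mean aggregation.

```python
from functools import reduce
from typing import Callable, List

import pandas as pd

DfFunc = Callable[[pd.DataFrame], pd.DataFrame]


def apply_funcs(df: pd.DataFrame, funcs: List[DfFunc]) -> pd.DataFrame:
    """Apply df_funcs left to right, feeding each output into the next function."""
    return reduce(lambda acc, f: f(acc), funcs, df)


def keep_vm(df: pd.DataFrame) -> pd.DataFrame:
    """Custom preprocessing: keep only vm-prefixed metrics."""
    return df[[c for c in df.columns if c.startswith("vm")]]


def p99_agg(df: pd.DataFrame) -> pd.DataFrame:
    """Custom aggregation: one row of 99th percentiles, relabelled to index 0."""
    return df.quantile(0.99).to_frame().T.reset_index(drop=True)
```

With `aggregation`, such functions would be registered through the setters described above, e.g. `exp_data.set_workload_agg_funcs([p99_agg])`.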
