Big Data Use Cases

Tânia Esteves edited this page Oct 13, 2021 · 4 revisions

Overview

CaT was evaluated with two real Big Data applications: TensorFlow (v2.3.0) and Apache Hadoop (v2.7.1).

The sections below describe the steps needed to install and run the experiments for each use case.

Experimental Environment

Experiments were performed on machines with the following characteristics:

For the TensorFlow use case (1 server):
- CPU: Intel Core i9-9900K (8 physical and 16 logical cores).
- RAM: 16 GiB of DDR4.
- DISK: one 1 TiB Micron 2200S NVMe.
- GPU: NVIDIA GeForce RTX 2070 with CUDA version 10.2.

For the Apache Hadoop use case (5 servers: 3 DataNodes, 1 NameNode and 1 Client):
- CPU: Intel Core i5-9500 (6 physical cores, 6 logical cores).
- RAM: 16 GiB of DDR4.
- DISK: one 500 GiB SATA III Seagate ST500DM009-2F110 HDD and one 250 GiB Samsung SSD 970 EVO Plus.
- NET: servers interconnected by a switched 10 Gigabit Ethernet network.

Experimental Setups

The experiments included three distinct deployments:

- Vanilla: the application running without any tracing tool.
- CatBpf: the eBPF-based tracer running alongside the target application and intercepting its events.
- CatStrace: the strace-based tracer running alongside the target application and intercepting its events.

Use Cases

CaT's evaluation consisted of two types of experiments:

- Content-aware Tracers Evaluation: measures the performance, resource usage, and storage overhead that CaT's two supported tracers add on the application's critical I/O path. These experiments also show how the two tracers differ in accuracy (number of captured events).
- CaT Framework in Action: evaluates which novel insights CaT's content-aware approach can provide.
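
The comparisons described above (overhead of each tracer relative to the Vanilla deployment, and accuracy as the number of captured events) can be sketched as follows. This is a minimal illustration, not part of CaT: the `RunResult` record, its field names, and both helper functions are hypothetical, and the actual experiment scripts may collect and compare these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Measurements from one run of the target application (hypothetical schema)."""
    setup: str              # "Vanilla", "CatBpf", or "CatStrace"
    elapsed_seconds: float  # wall-clock time of the application run
    captured_events: int    # events recorded by the tracer (0 for Vanilla)


def relative_overhead(traced: RunResult, vanilla: RunResult) -> float:
    """Extra execution time of a traced run, as a fraction of the Vanilla time."""
    if vanilla.elapsed_seconds <= 0:
        raise ValueError("vanilla run must have a positive elapsed time")
    return (traced.elapsed_seconds - vanilla.elapsed_seconds) / vanilla.elapsed_seconds


def capture_ratio(tracer: RunResult, reference: RunResult) -> float:
    """Events captured by one tracer divided by another tracer's event count."""
    if reference.captured_events <= 0:
        raise ValueError("reference tracer must have captured at least one event")
    return tracer.captured_events / reference.captured_events
```

Each function takes the results of two runs of the same workload, for example `relative_overhead(cat_bpf_run, vanilla_run)` or `capture_ratio(cat_bpf_run, cat_strace_run)`, where the run objects are filled in from your own measurements.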

TensorFlow

See TensorFlow: Installation and evaluation steps for the instructions to install and prepare the experiments.
See TensorFlow: Tracers Evaluation for the instructions to run the Content-aware Tracers Evaluation experiments for the TensorFlow Use Case.
See TensorFlow: Dataset shuffle for the instructions to run the CaT Framework in Action experiments for the TensorFlow Use Case.

Apache Hadoop

See Hadoop: Installation and evaluation steps for the instructions to install and prepare the experiments.
See Hadoop: Tracers Evaluation for the instructions to run the Content-aware Tracers Evaluation experiments for the Apache Hadoop Use Case.
See Hadoop: HDFS File Replication for the instructions to run the CaT Framework in Action experiments for the Apache Hadoop Use Case.