
Big Data Use Cases

Tânia Esteves edited this page Oct 14, 2021 · 4 revisions

Overview

CaT was evaluated using two real-world Big Data applications: TensorFlow (v2.3.0) and Apache Hadoop (v2.7.1).

Next, we describe the steps needed to install and run the experiments for each use case.

Experimental Environment

Experiments were performed on machines with the following characteristics:

  • For TensorFlow use case (1 server):

    • CPU: Intel Core i9-9900K CPU (8 physical and 16 logical cores).
    • RAM: 16 GiB of DDR4 RAM.
    • DISK: one 1 TiB Micron 2200S NVMe.
    • GPU: NVIDIA GeForce RTX 2070 GPU with CUDA version 10.2.
  • For Apache Hadoop use case (5 servers: 3 DataNodes, 1 NameNode and 1 Client):

    • CPU: Intel Core i5-9500 CPU (6 physical cores and 6 logical cores).
    • RAM: 16 GiB of DDR4 RAM.
    • DISK: one 500 GiB SATA III Seagate ST500DM009-2F110 HDD and one 250 GiB Samsung SSD 970 EVO Plus.
    • NET: Servers were interconnected by a switched 10 Gigabit Ethernet network.

Experimental Setups

The experiments included three distinct deployments:

  • Vanilla: The application running without any tracing tool.
  • CatBpf: The eBPF-based tracer running alongside the target application and intercepting its events.
  • CatStrace: The strace-based tracer running alongside the target application and intercepting its events.

Use cases

CaT's evaluation consisted of two types of experiments:

  • Content-aware Tracers Evaluation: where we measured the performance, resource usage, and storage overhead that CaT's two supported tracers impose on the application's critical I/O path. These experiments also showed how the two tracers differ in accuracy, measured as the number of captured events.
  • CaT Framework in Action: where we evaluated what novel insights CaT's content-aware approach can provide.
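
As a rough illustration of how the three deployments can be compared, the sketch below computes a tracer's overhead relative to the Vanilla run and the share of events one tracer captured relative to a reference count. The function names and the numbers in the usage example are hypothetical placeholders, not part of CaT and not measured results.

```python
def relative_overhead(vanilla: float, traced: float) -> float:
    """Overhead of a traced run relative to the Vanilla run, in percent.

    Both arguments are the same metric (e.g. execution time in seconds)
    measured without and with a tracer attached.
    """
    if vanilla <= 0:
        raise ValueError("vanilla measurement must be positive")
    return (traced - vanilla) / vanilla * 100.0


def capture_ratio(captured: int, reference: int) -> float:
    """Share of events captured by a tracer, in percent of a reference count."""
    if reference <= 0:
        raise ValueError("reference event count must be positive")
    return captured / reference * 100.0


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT experimental results).
    vanilla_time = 100.0   # hypothetical Vanilla execution time, seconds
    traced_time = 125.0    # hypothetical execution time with a tracer attached
    print(f"overhead: {relative_overhead(vanilla_time, traced_time):.1f}%")
    print(f"captured: {capture_ratio(50, 200):.1f}%")
```

The same two helpers can be applied to each deployment (CatBpf and CatStrace) and to any metric collected in both the Vanilla and traced runs.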

TensorFlow

Apache Hadoop
