
Big Data Use Cases

Tânia Esteves edited this page Oct 13, 2021 · 4 revisions

Overview

CaT was evaluated with two real-world Big Data applications: TensorFlow (v2.3.0) and Apache Hadoop (v2.7.1).

Next, we describe the steps needed to install and run the experiments for each use case.

Experimental Environment

Experiments were performed on machines with the following characteristics:

  • For the TensorFlow use case (1 server):

    • CPU: Intel Core i9-9900K (8 physical and 16 logical cores).
    • RAM: 16 GiB of DDR4 RAM.
    • DISK: one 1 TiB Micron 2200S NVMe.
    • GPU: NVIDIA GeForce RTX 2070 with CUDA version 10.2.
  • For the Apache Hadoop use case (5 servers: 3 DataNodes, 1 NameNode, and 1 Client):

    • CPU: Intel Core i5-9500 (6 physical and 6 logical cores).
    • RAM: 16 GiB of DDR4 RAM.
    • DISK: one 500 GiB SATA III Seagate ST500DM009-2F110 HDD and one 250 GiB Samsung SSD 970 EVO Plus.
    • NET: Servers were interconnected by a switched 10 Gigabit Ethernet network.

Experimental Setups

The experiments included three distinct deployments:

  • Vanilla: The application running without any tracing tool.
  • CatBpf: The eBPF-based tracer running simultaneously with the target application and intercepting its events.
  • CatStrace: The strace-based tracer intercepting the application’s events.
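
One way to compare the three deployments is to time the same workload in each and express the traced runs relative to Vanilla. The sketch below is illustrative only: the helper names `time_command` and `relative_overhead` are hypothetical, and the overhead metric is an assumption about how such a comparison could be made, not the procedure used in the CaT evaluation.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def relative_overhead(vanilla_s, traced_s):
    """Return the slowdown of a traced run over the Vanilla run, in percent."""
    if vanilla_s <= 0:
        raise ValueError("vanilla_s must be positive")
    return (traced_s - vanilla_s) / vanilla_s * 100.0


if __name__ == "__main__":
    # Placeholder workload: a no-op Python process stands in for the real
    # application (hypothetical; substitute the actual launch command).
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"workload took {elapsed:.3f} s")
    # Example with made-up timings, for illustration only.
    print(f"overhead: {relative_overhead(10.0, 12.5):.1f} %")
```

Each deployment would be timed with its own launch command; how the tracers are started alongside the application is described on the use-case pages and is deliberately not assumed here.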

Content-aware Tracers Evaluation

TensorFlow

BigDataBench

CaT Framework in Action

TensorFlow Dataset shuffle

HDFS File replication
