Jose Angel Herrero Velasco edited this page Dec 14, 2017 · 21 revisions

Gem5 NoSQL (CE Group - UC)

The gem5 Simulator, adapted to run NoSQL YCSB/Cassandra workloads

  • Version:
    • Initial commit from official gem5: a09d5f86ae9653dd787bb9f80acfcb167c6dcbab
    • Date: Sat Oct 15 15:11:07 2016 -0500

Gem5 main page

Gem5 official documentation

Gem5 official git repository

YCSB official site

Apache Cassandra official site


Initial steps:

Dependencies.

  • Things you'll need that aren't part of gem5 itself.
  1. Hardware

    gem5 is largely agnostic about the hardware it runs on. However, there are several considerations to keep in mind when running gem5:

    • A 64-bit platform is strongly preferred over a 32-bit platform.
    • gem5's ISA support involves some very large auto-generated C++ files, which can require up to 1 GB of memory for g++ to compile.
    • Ideally you should choose a host with the same endianness as the ISA you will be simulating.
  2. Operating System

    gem5 runs best on Linux and Unix. Most developers, and the current regression system, use Linux, so this platform has the best support. The CE Group recommends Debian 8 (jessie) or newer.
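The hardware and OS considerations above can be checked quickly on a new host. The sketch below is a hypothetical helper (host_warnings and physical_memory_bytes are our own names, not part of gem5). It assumes the simulated ISA is X86, which is little-endian, and uses the 1 GB figure from the list only as a rough memory floor.

```python
import os
import platform
import struct
import sys

ONE_GIB = 1024 ** 3


def host_warnings(pointer_bits, host_byteorder, system,
                  mem_bytes=None, target_byteorder="little"):
    """Return human-readable warnings about a gem5 build/simulation host.

    target_byteorder defaults to "little" because this guide simulates X86,
    a little-endian ISA (assumption: change it for other ISAs).
    """
    warnings = []
    if pointer_bits != 64:
        warnings.append("not a 64-bit platform (strongly preferred)")
    if host_byteorder != target_byteorder:
        warnings.append("host is %s-endian, simulated ISA is %s-endian"
                        % (host_byteorder, target_byteorder))
    if system != "Linux":
        warnings.append("host OS is %s; Linux has the best gem5 support" % system)
    if mem_bytes is not None and mem_bytes < ONE_GIB:
        warnings.append("less than 1 GB of RAM; g++ may fail on large ISA files")
    return warnings


def physical_memory_bytes():
    """Total RAM via sysconf (works on Linux); None if it cannot be determined."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    # Pointer size of the running interpreter: a 32-bit Python on a 64-bit
    # host will also be reported as 32-bit.
    bits = struct.calcsize("P") * 8
    for warning in host_warnings(bits, sys.byteorder, platform.system(),
                                 physical_memory_bytes()):
        print("WARNING: " + warning)
```

An empty output means none of the listed concerns apply to the host.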

System prerequisites

  • Things you need to have ready on your host.
  1. KVM support

    gem5 simulation speed can be greatly improved using hardware virtualization (KVM). KVM can be used to boot the simulated system or to fast-forward its state quickly. gem5 itself needs nothing special to enable it, as long as KVM support (/dev/kvm) is available on the host when gem5 is compiled. What you do need is KVM (the qemu-kvm package) installed on the host.

    (root)$ apt-get install qemu-kvm

    Administrators should also give users access to /dev/kvm by adding them to the kvm group (the user must log out and back in for the new group membership to take effect).

    (root)$ usermod -a -G kvm <username>

  2. External tools and required versions

    • g++ version 4.8 or newer, or clang version 3.1 or newer.
    • Python, version 2.6 or 2.7 (Python 3.x is not supported).
    • SCons, version 0.98.1 or newer.
    • zlib, any recent version. For Debian/Ubuntu, you will need the "zlib-dev" or "zlib1g-dev" package to get the zlib.h header file as well as the library itself.
    • m4, the macro processor.
  3. Additional software

    • sudo. Additionally, as root, you need to configure it so that it does not keep asking for your password:
      (root)$ visudo    # safely edits /etc/sudoers
                # ...
                <your username> ALL=(ALL:ALL) NOPASSWD: ALL
      
    • debootstrap, version 1.0.67 or newer.
    • git, version 2.1.4 or newer.
    • gzip, version 1.6-4 or newer.
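The prerequisites above can be verified with a small script before building. This is a minimal sketch, assuming the tools are on PATH under the names g++, scons, git, gzip, python2 and sudo; the helper names are hypothetical. Version parsing simply takes the first dotted number in each tool's output, which is a heuristic rather than a guaranteed format, and "sudo -n true" fails instead of prompting when a password would be required.

```python
import os
import re
import subprocess

# (name, command that prints the version, minimum version from the list above)
REQUIREMENTS = [
    ("g++", ["g++", "-dumpversion"], "4.8"),
    ("scons", ["scons", "--version"], "0.98.1"),
    ("git", ["git", "--version"], "2.1.4"),
    ("gzip", ["gzip", "--version"], "1.6"),
]


def parse_version(text):
    """Extract the first dotted number (e.g. '4.9.2') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Tuple comparison, so (4, 9, 2) >= (4, 8) and (7,) >= (4, 8)."""
    return found is not None and found >= parse_version(minimum)


def python2_supported(version_tuple):
    """gem5 at this commit needs Python 2.6 or 2.7, not 3.x."""
    return (2, 6) <= tuple(version_tuple[:2]) <= (2, 7)


def kvm_accessible(path="/dev/kvm"):
    """True if the KVM device exists and the current user can read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def tool_version(argv):
    """Run argv and parse its output (stderr too: Python 2 prints --version there)."""
    try:
        out = subprocess.check_output(argv, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(out.decode("utf-8", "replace"))


if __name__ == "__main__":
    for name, argv, minimum in REQUIREMENTS:
        found = tool_version(argv)
        found_str = ".".join(str(n) for n in found) if found else "-"
        status = "ok" if meets_minimum(found, minimum) else "MISSING/TOO OLD"
        print("{:<6s} {:>10s}  need >= {:<7s} {}".format(name, found_str, minimum, status))
    py = tool_version(["python2", "--version"])
    print("python2: " + ("ok" if py and python2_supported(py) else "MISSING (need 2.6-2.7)"))
    print("/dev/kvm: " + ("accessible" if kvm_accessible() else "NOT accessible"))
    try:
        sudo_ok = subprocess.call(["sudo", "-n", "true"]) == 0
    except OSError:
        sudo_ok = False
    print("sudo: " + ("no password needed" if sudo_ok else "asks for a password or is missing"))
```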

Get gem5-NoSQL.

  • Download our modified version of gem5 from GitHub.

    $ git clone https://github.com/abadp/gem5-NoSQL.git

Directories.

  • Description of the main directory hierarchy
    • atc_scripts: scripts that generate the disk images for running YCSB/Cassandra workloads
    • configs: example simulation configuration scripts
    • ext: less-common external packages needed to build gem5
    • src: source code of the gem5 simulator
    • system: source for some optional system software for simulated systems
    • tests: regression tests
    • util: useful utility programs and files
    • images: kernel file (.gz) and disk images
    • nosql: Cassandra and YCSB sources (adapted)
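Because atc_scripts/config.py later needs gem5_dir set to the absolute path of this checkout, it can help to confirm that a directory really is the repository root. A minimal sketch that looks for the top-level directories listed above; missing_dirs and looks_like_gem5_nosql_root are hypothetical helper names.

```python
import os
import sys

# Top-level directories of a gem5-NoSQL checkout, as listed above.
EXPECTED_DIRS = ("atc_scripts", "configs", "ext", "src", "system",
                 "tests", "util", "images", "nosql")


def missing_dirs(root):
    """Return the expected top-level directories that are absent under root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]


def looks_like_gem5_nosql_root(root):
    """True if root is an absolute path containing every expected directory."""
    return os.path.isabs(root) and not missing_dirs(root)


if __name__ == "__main__":
    root = os.path.abspath(sys.argv[1] if len(sys.argv) > 1 else ".")
    missing = missing_dirs(root)
    if missing:
        print("%s is missing: %s" % (root, ", ".join(missing)))
    else:
        print("%s looks like a gem5-NoSQL checkout (use it as gem5_dir)" % root)
```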

Building the system:

  • How to build gem5 (or rebuild it after modifying the sources).

    $ cd gem5-NoSQL
    $ scons build/X86/gem5.opt
    <ENTER>   (press Enter if SCons stops at a prompt, e.g. about installing the gem5 style hooks)
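SCons accepts -j N to compile in parallel, which shortens the long first build. A minimal Python wrapper, assuming it is run from the repository root; scons_command is a hypothetical helper, and defaulting to one job per host CPU is our choice, not a gem5 requirement. Keep the "up to 1 GB per g++ job" figure from the hardware notes in mind when choosing N.

```python
import multiprocessing
import subprocess


def scons_command(target="build/X86/gem5.opt", jobs=None):
    """Build the argv for compiling gem5 with SCons.

    jobs defaults to the number of host CPUs; lower it on hosts with little
    memory, since each g++ job on the large ISA files may need ~1 GB.
    """
    if jobs is None:
        jobs = multiprocessing.cpu_count()
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    return ["scons", "-j", str(jobs), target]


if __name__ == "__main__":
    # stdin is inherited, so you can still answer a SCons prompt with <ENTER>.
    subprocess.check_call(scons_command())
```

For example, scons_command(jobs=4) yields ["scons", "-j", "4", "build/X86/gem5.opt"], i.e. the same command as above with four parallel jobs.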
    

Building the disk image for the YCSB/Cassandra workload

  • Setting up the disk image and benchmark applications for full system simulation

    $ cd gem5-NoSQL
    $ cd atc_scripts
    $ vi config.py
       # change the line below to the absolute path of your gem5 checkout
       gem5_dir = "/absolute/path/to/your/GEM5"
    $ ./create_disk_img.py
    $ ./update_disk_img.py 
    

    These commands produce two files: a base Debian jessie disk image (x86_debian-jessie.img) and a second image containing everything needed to run the YCSB/Cassandra workload on gem5 (x86_debian_MULTIYCSB-cassandra.img).

    Note that the kernel file is shipped in the images/kernels directory. Because of its size it is stored compressed, so you need to decompress it before using it with gem5.

    $ cd </absolute/path/to/your/GEM5>/images/kernels
    $ gzip -d x86_64-vmlinux-3.18.34_ceconfig.smp.gz
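gzip -d replaces the .gz file with the decompressed kernel. If you would rather keep the compressed copy, the same decompression can be done from Python with the standard gzip and shutil modules. A minimal sketch; decompress_kernel is a hypothetical helper name.

```python
import gzip
import os
import shutil
import sys


def decompress_kernel(gz_path, remove_original=False):
    """Decompress e.g. x86_64-vmlinux-3.18.34_ceconfig.smp.gz next to itself.

    Mirrors `gzip -d`, except that the .gz file is kept unless
    remove_original is True. Returns the path of the decompressed kernel.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[:-len(".gz")]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Stream in chunks so the whole kernel is never held in memory.
        shutil.copyfileobj(src, dst)
    if remove_original:
        os.remove(gz_path)
    return out_path


if __name__ == "__main__":
    print(decompress_kernel(sys.argv[1]))
```

The returned path (without .gz) is what gem5's --kernel option should point to.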
    

Running gem5:

  • How to run the YCSB/Cassandra workload on the gem5 build
  1. Launch scripts

    The atc_scripts/launch_apps/NoSQL/cassandra directory contains two sample files (script and script-run); both are needed to run gem5 and YCSB/Cassandra properly. Review them and adjust them as needed, following the usage below:

    ./launch_app.sh $num_nodes $app $DB_size $num_threads 
    ./launch_app-run.sh $num_nodes $app $DB_size $num_threads $app_size
    

    Where:

    • $num_nodes is the number of nodes simulated by gem5.
    • $app is the identifier of a YCSB workload, from "a" to "f". See YCSB Core Workloads for details.
    • $DB_size is the number of records initially loaded into the Cassandra database (default: 0). We recommend $DB_size=950000, which creates a database of about 1 GB.
    • $num_threads is the number of YCSB client threads. By default the YCSB client uses a single worker thread, but more threads can be specified, usually to increase the load offered to the database. We recommend $num_threads=1 in all cases.
    • $app_size is the number of operations performed by the YCSB client (per thread). Typically you will use it to control the amount of offered load.
  2. Load the Cassandra database

    $ build/X86/gem5.opt \
          configs/ac/fs_ac.py \
          --kernel=</absolute/path/to/your/GEM5>/images/kernels/x86_64-vmlinux-3.18.34_ceconfig.smp \
          --disk-image=</absolute/path/to/your/GEM5>/images/disks/x86_debian_MULTIYCSB-cassandra.img \
          --cpu-type=kvm \
          --cluster=<number of nodes> \
          --num-cpus=<number of cores per node> \
          --mem-size=<main memory per simulated node>MB \
          --sim_quantum=50000000 \
          --ethernet=switch \
          --script=</absolute/path/to/your/GEM5>/atc_scripts/launch_apps/NoSQL/cassandra/script \
          --checkpoint-at-end
    

    You can use m5term (in util/term/; build it by running make in that directory) to connect to the console of each simulated node and watch the running processes.

  3. Run the simulation

    $ build/X86/gem5.opt \
          configs/ac/fs_ac.py \
          --kernel=</absolute/path/to/your/GEM5>/images/kernels/x86_64-vmlinux-3.18.34_ceconfig.smp \
          --disk-image=</absolute/path/to/your/GEM5>/images/disks/x86_debian_MULTIYCSB-cassandra.img \
          --cpu-type=atomic \
          --restore-with-cpu=atomic \
          --cluster=<number of nodes> \
          --num-cpus=<number of cores per node> \
          --mem-size=<main memory per simulated node>MB \
          --ethernet=switch \
          --script=</absolute/path/to/your/GEM5>/atc_scripts/launch_apps/NoSQL/cassandra/script-run \
          -r 1
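When running many configurations it is easy to mistype one of the long commands above. The sketch below (Python 3; launch_args and gem5_command are hypothetical helpers) validates the launch-script parameters and assembles the load and run command lines using only the options shown above. It assumes gem5_dir is the absolute path of your checkout and that the command is run from that directory, since build/ and configs/ are given as relative paths.

```python
import os
import shlex

# Paths relative to the gem5-NoSQL root, as used in the commands above.
KERNEL = "images/kernels/x86_64-vmlinux-3.18.34_ceconfig.smp"
DISK = "images/disks/x86_debian_MULTIYCSB-cassandra.img"
SCRIPT_DIR = "atc_scripts/launch_apps/NoSQL/cassandra"


def launch_args(num_nodes, app, db_size, num_threads, app_size=None):
    """Validate and format the launch_app*.sh arguments described above.

    Without app_size this returns the launch_app.sh form; with it, the
    launch_app-run.sh form.
    """
    if app not in ("a", "b", "c", "d", "e", "f"):
        raise ValueError("app must be a YCSB core workload 'a'..'f', got %r" % (app,))
    if num_nodes < 1 or num_threads < 1:
        raise ValueError("num_nodes and num_threads must be >= 1")
    if db_size < 0:
        raise ValueError("db_size must be >= 0")
    args = ["./launch_app.sh", str(num_nodes), app, str(db_size), str(num_threads)]
    if app_size is not None:
        if app_size < 1:
            raise ValueError("app_size must be >= 1")
        args = ["./launch_app-run.sh"] + args[1:] + [str(app_size)]
    return args


def gem5_command(gem5_dir, phase, nodes, cpus_per_node, mem_mb):
    """Return the gem5 argv for phase 'load' (KVM + checkpoint) or 'run' (restore)."""
    if not os.path.isabs(gem5_dir):
        raise ValueError("gem5_dir must be an absolute path")
    head = [
        "build/X86/gem5.opt",
        "configs/ac/fs_ac.py",
        "--kernel=" + os.path.join(gem5_dir, KERNEL),
        "--disk-image=" + os.path.join(gem5_dir, DISK),
    ]
    sizing = [
        "--cluster=%d" % nodes,
        "--num-cpus=%d" % cpus_per_node,
        "--mem-size=%dMB" % mem_mb,
    ]
    if phase == "load":
        # Fast KVM execution; a checkpoint is taken at the end of the simulation.
        return (head + ["--cpu-type=kvm"] + sizing +
                ["--sim_quantum=50000000", "--ethernet=switch",
                 "--script=" + os.path.join(gem5_dir, SCRIPT_DIR, "script"),
                 "--checkpoint-at-end"])
    if phase == "run":
        # Restore checkpoint 1 on atomic CPUs and run the measured workload.
        return (head + ["--cpu-type=atomic", "--restore-with-cpu=atomic"] + sizing +
                ["--ethernet=switch",
                 "--script=" + os.path.join(gem5_dir, SCRIPT_DIR, "script-run"),
                 "-r", "1"])
    raise ValueError("phase must be 'load' or 'run'")


if __name__ == "__main__":
    # Hypothetical example values: 2 nodes, 1 core each, 2048 MB per node.
    for phase in ("load", "run"):
        argv = gem5_command("/opt/gem5-NoSQL", phase, 2, 1, 2048)
        print(" \\\n      ".join(shlex.quote(a) for a in argv))
        print()
```

Running it prints both commands with shell line continuations; alternatively, pass an argv list to subprocess.call(argv, cwd=gem5_dir) to launch it directly.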
    

Enjoy!