
# Using libCacheSim to run cache simulation

This tutorial shows how to run cache simulations with `cachesim`.


`cachesim` is a tool provided by libCacheSim for quickly running cache simulations. It supports:
* a variety of eviction algorithms, such as FIFO, LRU, LFU, ARC, SLRU, LeCaR, CACHEUS, Hyperbolic, LHD, TinyLFU, Belady, LRB, and GLCache.
* a variety of admission algorithms, such as size, bloomFilter, and adaptSize.
* text, csv, and binary traces.
* automatic multi-threaded simulations.

cachesim also achieves high performance with low resource usage.

## Step 0. Install libCacheSim and download example dataset

Run the scripts below to install libCacheSim and download the example dataset.

```python
from IPython.display import clear_output
import subprocess

# Install libCacheSim and its dependencies, then download the example trace
subprocess.run(["bash", "install.sh"])
subprocess.run(["bash", "download.sh"])

# Install the required Python packages
%pip install -r "./libCacheSim/requirements.txt"
%pip install scipy

# Hide the long installation log
clear_output()
```

The scripts clone libCacheSim into `./libCacheSim`, install its dependencies on Ubuntu (including CMake, Zstd, and XGBoost), and download the example trace `w89.oracleGeneral.bin.zst` from https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/cacheDatasets/cloudphysics/w89.oracleGeneral.bin.zst. After the `%pip` installs, you may need to restart the kernel to use the updated packages.


## Step 1. Basic Usage
```
./cachesim trace_path trace_type eviction_algo cache_size [OPTION...]
```

Use `./cachesim --help` for more information.
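Since this notebook drives cachesim from Python, it can help to assemble the arguments in one place. The helper below is a hypothetical convenience (`cachesim_cmd` is not part of libCacheSim): it follows the positional order of the usage line above, joins several algorithms or sizes with commas (no spaces), and appends any extra options unchanged. The binary path is the build location used in this notebook.

```python
from typing import Sequence

# Build location used in this notebook (assumption: install.sh builds into _build/)
CACHESIM = "./libCacheSim/_build/bin/cachesim"


def cachesim_cmd(trace_path: str, trace_type: str, algos: Sequence[str],
                 sizes: Sequence[str], extra: Sequence[str] = ()) -> list:
    """Hypothetical helper: return the argument list for one cachesim invocation."""
    if not algos or not sizes:
        raise ValueError("need at least one eviction algorithm and one cache size")
    # Positional order: trace_path trace_type eviction_algo cache_size [OPTION...]
    return [CACHESIM, trace_path, trace_type, ",".join(algos), ",".join(sizes), *extra]


cmd = cachesim_cmd("./w89.oracleGeneral.bin.zst", "oracleGeneral",
                   ["fifo", "lru"], ["0.01", "0.1"], ["--ignore-obj-size", "1"])
print(cmd)
# To run it (needs the binary from Step 0 and `import subprocess`):
# subprocess.run(cmd, check=True)
```

Passing a list rather than one shell string to `subprocess.run` avoids shell quoting issues, for example with a quoted size list such as `"1mb, 16mb"`.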

```python
subprocess.run(["./libCacheSim/_build/bin/cachesim", "--help"])
```

```
Usage: cachesim [OPTION...] trace_path trace_type eviction_algo cache_size
example: ./cachesim /trace/path csv LRU 100MB

trace can be zstd compressed
cache_size is in byte, but also support KB/MB/GB
supported trace_type: txt/csv/twr/vscsi/oracleGeneralBin
supported eviction_algo: LRU/LFU/FIFO/ARC/LeCaR/Cacheus
print-head-req: Print the first few requests when simulating start

 trace reader related parameters

  -n, --num-req=-1           Num of requests to process, default -1 means all
                             requests in the trace
  -s, --sample-ratio=1       Sample ratio, 1 means no sampling, 0.01 means
                             sample 1% of objects
  -t, --trace-type-params="obj-id-col=1;delimiter=,"
                             Parameters used for csv trace, e.g.,
                             "obj-id-col=1;delimiter=,"

 cache related parameters:

      --admission-params="prob=0.8"
                             params for admission algorithm
  -a, --admission=bloom-filter 
```


### Run a single cache simulation

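Run the example vscsi trace with the LRU eviction algorithm and a 1GB cache.
vscsi is one trace format; cachesim also supports csv, txt, and binary traces. The notebook cell below runs the same kind of command on the downloaded oracleGeneral trace.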

```bash
# Note: there is no space between the cache size and the unit; the unit is case-insensitive
./cachesim ../data/trace.vscsi vscsi lru 1gb 
```
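The help output above says the cache size is in bytes but also accepts KB/MB/GB, and the run below reports `1gb` as `1GiB`, which suggests power-of-1024 units. The sketch below parses such size strings under that assumption. `parse_cache_size` is a hypothetical illustration, not cachesim's parser, and it does not handle fractional sizes such as `0.01` (which cachesim treats as a fraction of the working set).

```python
import re

# Assumption: units are powers of 1024, consistent with "1gb" being reported as 1GiB
_UNITS = {"": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_cache_size(text: str) -> int:
    """Parse '1gb', '16MB', or '4096' into bytes; no space allowed between number and unit."""
    m = re.fullmatch(r"(\d+)(kb|mb|gb)?", text.strip().lower())  # unit is case-insensitive
    if m is None:
        raise ValueError(f"unrecognized cache size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2) or ""]


# A comma-separated list, optionally with spaces after the commas
print([parse_cache_size(s) for s in "1mb, 16mb, 256mb, 8gb".split(",")])
```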


```python
subprocess.run(["./libCacheSim/_build/bin/cachesim", "./w89.oracleGeneral.bin.zst", "oracleGeneral", "lru", "1gb"])
```

```
[INFO]  07-06-2025 22:13:35 cli_parser.c:558  (tid=140692531667072): trace path: ./w89.oracleGeneral.bin.zst, trace_type ORACLE_GENERAL_TRACE, ofilepath result/w89.oracleGeneral.bin.zst.cachesim, 40 threads, warmup -1 sec, total 1 algo x 1 size = 1 caches, lru
[DEBUG] 07-06-2025 22:13:35 request.h:125  (tid=140692531667072): req clock_time 7736503, id 6084968, size 32768, op nop, valid 1
[DEBUG] 07-06-2025 22:13:35 request.h:125  (tid=140692531667072): req clock_time 7736503, id 6028808, size 4096, op nop, valid 1
[INFO]  07-06-2025 22:13:35    sim.c:61   (tid=140692531667072): w89.oracleGeneral.bin.zst LRU 24.00 hour: 607515 requests, miss ratio 0.4002, interval miss ratio 0.4002
[INFO]  07-06-2025 22:13:35    sim.c:61   (tid=140692531667072): w89.oracleGeneral.bin.zst LRU 48.00 hour: 967548 requests, miss ratio 0.3589, interval miss ratio 0.2893
[INFO]  07-06-2025 22:13:35    sim.c:61   (tid=140692531667072): w89.oracleGeneral.bin.zst

./w89.oracleGeneral.bin.zst LRU cache size     1GiB,          3625918 req, miss ratio 0.3859, throughput 4.35 MQPS

[INFO]  07-06-2025 22:13:35    sim.c:61   (tid=140692531667072): w89.oracleGeneral.bin.zst LRU 168.00 hour: 3619133 requests, miss ratio 0.3864, interval miss ratio 0.5216
```
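To compare runs programmatically, you can capture cachesim's output and pull out the per-cache summary lines. The regular expression below is written against the summary lines printed in this notebook, such as the `LRU cache size 1GiB, ... miss ratio 0.3859` line above; the format is not documented here and may differ between versions, so treat the pattern as an assumption. The tutorial also does not show whether summary lines go to stdout or stderr, so the usage comment searches both.

```python
import re

# Pattern modeled on the summary lines shown in this notebook (assumption)
SUMMARY = re.compile(
    r"^(?P<trace>\S+) (?P<algo>\S+) cache size\s+(?P<size>\S+),\s+"
    r"(?P<req>\d+) req, miss ratio (?P<miss>[0-9.]+)"
    r"(?:, byte miss ratio (?P<byte_miss>[0-9.]+))?"
)


def parse_summary(output: str) -> list:
    """Return one dict per summary line; log lines such as [INFO] ... are skipped."""
    rows = []
    for line in output.splitlines():
        m = SUMMARY.match(line.strip())
        if m is None:
            continue
        d = m.groupdict()
        rows.append({
            "algo": d["algo"],
            "size": d["size"],
            "req": int(d["req"]),
            "miss_ratio": float(d["miss"]),
            # byte miss ratio is only printed in some runs
            "byte_miss_ratio": float(d["byte_miss"]) if d["byte_miss"] else None,
        })
    return rows


# Usage (requires the binary from Step 0 and `import subprocess`):
# res = subprocess.run(["./libCacheSim/_build/bin/cachesim", "./w89.oracleGeneral.bin.zst",
#                       "oracleGeneral", "lru", "1gb"], capture_output=True, text=True)
# rows = parse_summary(res.stdout + res.stderr)
```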


### Run multiple cache simulations
```bash
# Note that there is no space between the cache sizes
./cachesim ../data/trace.vscsi vscsi lru 1mb,16mb,256mb,8gb

# Or quote the list, in which case spaces are allowed
./cachesim ../data/trace.vscsi vscsi lru "1mb, 16mb, 256mb, 8gb"

# Besides absolute cache sizes, you can also use fractions of the working set size
./cachesim ../data/trace.vscsi vscsi lru 0.001,0.01,0.1,0.2

# Instead of bytes, you can treat every object as having the same size, so the cache size is a number of objects
./cachesim ../data/trace.vscsi vscsi lru 1000,16000 --ignore-obj-size 1

# You can also run several algorithms in parallel by listing them separated by commas
./cachesim ../data/trace.vscsi vscsi fifo,lru,arc,qdlp 0.01 --ignore-obj-size 1

# Run 4 x 4 = 16 simulations in parallel (at most n_thread at the same time)
./cachesim ../data/trace.vscsi vscsi fifo,lru,arc,qdlp 0.01,0.05,0.1,0.2 --ignore-obj-size 1
```
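The last command above runs every (algorithm, size) pair as an independent simulation, with no more than `n_thread` running at once. The sketch below mimics that scheduling idea with Python's `concurrent.futures`, using toy FIFO and LRU caches that count objects (like `--ignore-obj-size 1`). It is not how cachesim is implemented (cachesim is written in C), and because of Python's GIL the threads here give concurrency rather than real parallelism for this CPU-bound loop.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def miss_ratio(trace, algo: str, capacity: int) -> float:
    """Toy FIFO/LRU simulator; every object has size 1 and capacity counts objects."""
    if algo not in ("fifo", "lru") or capacity < 1:
        raise ValueError("algo must be 'fifo' or 'lru' and capacity >= 1")
    cache = OrderedDict()  # order = insertion order (FIFO) or recency order (LRU)
    misses = 0
    for obj in trace:
        if obj in cache:
            if algo == "lru":
                cache.move_to_end(obj)  # LRU refreshes recency on a hit; FIFO does not
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the oldest (FIFO) or least recently used (LRU)
        cache[obj] = True
    return misses / len(trace)


def run_grid(trace, algos, capacities, n_thread: int = 4) -> dict:
    """Run every (algorithm, capacity) pair, with at most n_thread running at once."""
    configs = list(product(algos, capacities))
    with ThreadPoolExecutor(max_workers=n_thread) as pool:
        results = pool.map(lambda cfg: miss_ratio(trace, *cfg), configs)
        return dict(zip(configs, results))


trace = [1, 2, 3, 1, 4, 1, 2, 5, 1, 3]
for (algo, cap), mr in run_grid(trace, ["fifo", "lru"], [2, 3]).items():
    print(f"{algo} cache size {cap}: miss ratio {mr:.4f}")
```

On this small trace, the toy LRU beats FIFO at size 3 (0.7 vs 0.8) and ties it at size 2 (0.9).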


```python
subprocess.run(["./libCacheSim/_build/bin/cachesim", "./w89.oracleGeneral.bin.zst", "oracleGeneral", "fifo,lru,arc,qdlp", "0.01,0.05,0.1,0.2", "--ignore-obj-size", "1"])
```

```
[INFO]  07-06-2025 22:13:35 cli_reader_utils.c:259  (tid=140318773395584): calculating working set size...
[INFO]  07-06-2025 22:13:36 cli_reader_utils.c:288  (tid=140318773395584): working set size: 770712 object 770712 byte
[INFO]  07-06-2025 22:13:39 cli_parser.c:558  (tid=140318773395584): trace path: ./w89.oracleGeneral.bin.zst, trace_type ORACLE_GENERAL_TRACE, ofilepath result/w89.oracleGeneral.bin.zst.cachesim, 40 threads, warmup -1 sec, total 4 algo x 4 size = 16 caches, fifo, lru, arc, qdlp, ignore object size
[INFO]  07-06-2025 22:13:39 simulator.c:302  (tid=140318773395584): simulate_with_multi_caches starts computation, num_warmup_req 0, start cache FIFO size 8KiB, end cache QDLP-0.1000-0.9000-Clock2-1 size 151KiB, 16 caches, 40 threads, please wait

./w89.oracleGeneral.bin.zst FIFO cache size     7707, 3625918 req, miss ratio 0.4491, byte miss ratio 0.4491
./w89.oracleGeneral.bin.zst FIFO cache size    38535, 3625918 req, miss ratio 0.4141, byte miss ratio 0.4141
./w89.oracleGeneral.bin.zst FIFO cache size    77071, 3625918 req, miss ratio 0.3788, byte miss ratio 0.3788
./w89.oracleGeneral.bin.zst FIFO cache size   154142, 3625918 req, miss ratio 0.2864, byte miss ratio 0.2864
./w89.oracleGeneral.bin.zst LRU cache size     7707, 3625918 req, miss ratio 0.4479, byte miss ratio 0.4479
./w89.oracleGeneral.bin.zst LRU cache size    38535, 3625918 req, miss ratio 0.4076, byte miss ratio 0.4076
./w89.oracleGeneral.bin.zst LRU cache size    77071, 3625918 req, miss ratio 0.3683, byte miss ratio 0.3683
./w89.oracleGeneral.bin.zst LRU cache size   154142, 3625918 req, miss ratio 0.2669, byte miss ratio 0.2669
./w89.oracleGeneral.bin.zst ARC cache size     7707, 3625918 req, miss ratio 0.4454, byte miss ratio 0.4454
./w89.oracleGeneral.bin
```

### Auto detect cache sizes
cachesim can compute the working set size of the trace and automatically generate cache sizes as fractions of it. In the run below, the generated sizes are 0.001, 0.003, 0.01, 0.03, 0.1, 0.2, 0.4, and 0.8 of the working set size (23420813312 bytes).
Enable this feature by setting the cache size to `0` or `auto`.

```bash
./cachesim ../data/trace.vscsi vscsi lru auto
```


```python
subprocess.run(["./libCacheSim/_build/bin/cachesim", "./w89.oracleGeneral.bin.zst", "oracleGeneral", "lru", "auto"])
```

```
[INFO]  07-06-2025 22:13:43 cli_reader_utils.c:259  (tid=140366750011520): calculating working set size...
[INFO]  07-06-2025 22:13:44 cli_reader_utils.c:288  (tid=140366750011520): working set size: 770712 object 23420813312 byte
[INFO]  07-06-2025 22:13:44 cli_parser.c:558  (tid=140366750011520): trace path: ./w89.oracleGeneral.bin.zst, trace_type ORACLE_GENERAL_TRACE, ofilepath result/w89.oracleGeneral.bin.zst.cachesim, 40 threads, warmup -1 sec, total 1 algo x 8 size = 8 caches, lru
[INFO]  07-06-2025 22:13:44 simulator.c:302  (tid=140366750011520): simulate_with_multi_caches starts computation, num_warmup_req 0, start cache LRU size 22MiB, end cache LRU size 17GiB, 8 caches, 40 threads, please wait

./w89.oracleGeneral.bin.zst LRU cache size       22MiB, 3625918 req, miss ratio 0.4781, byte miss ratio 0.5982
./w89.oracleGeneral.bin.zst LRU cache size       67MiB, 3625918 req, miss ratio 0.4580, byte miss ratio 0.5858
./w89.oracleGeneral.bin.zst LRU cache size      223MiB, 3625918 req, miss ratio 0.4346, byte miss ratio 0.5636
./w89.oracleGeneral.bin.zst LRU cache size      670MiB, 3625918 req, miss ratio 0.4039, byte miss ratio 0.5404
./w89.oracleGeneral.bin.zst LRU cache size     2233MiB, 3625918 req, miss ratio 0.3356, byte miss ratio 0.4585
./w89.oracleGeneral.bin.zst LRU cache size     4467MiB, 3625918 req, miss ratio 0.2483, byte miss ratio 0.4020
./w89.oracleGeneral.bin.zst LRU cache size     8934MiB, 3625918 req, miss ratio 0.2225, byte miss ratio 0.3650
./w89.oracleGeneral.bin.zst LRU cache size    17868MiB, 3625918 req, miss ratio 0.2160, byte miss ratio 0.3536
```
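The eight sizes printed above can be reproduced from the reported working set size (23420813312 bytes): they equal 0.001, 0.003, 0.01, 0.03, 0.1, 0.2, 0.4, and 0.8 of it, truncated to whole MiB. The snippet below checks that arithmetic. The fraction list is inferred from this run's output rather than read from cachesim's source, so other versions may use different fractions.

```python
MiB = 1024 ** 2

# Fractions inferred from the sizes printed in the run above (assumption)
AUTO_FRACTIONS = [0.001, 0.003, 0.01, 0.03, 0.1, 0.2, 0.4, 0.8]


def auto_cache_sizes(working_set_bytes: int, fractions=AUTO_FRACTIONS) -> list:
    """Return cache sizes in bytes as fractions of the working set size (truncated)."""
    return [int(working_set_bytes * f) for f in fractions]


sizes = auto_cache_sizes(23420813312)
print([s // MiB for s in sizes])  # each size in whole MiB, as in the summary lines
```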

### Use different eviction algorithms
cachesim supports the following algorithms:
* [FIFO](./libCacheSim/libCacheSim/cache/eviction/FIFO.c)
* [LRU](./libCacheSim/libCacheSim/cache/eviction/LRU.c)
* [Clock](./libCacheSim/libCacheSim/cache/eviction/Clock.c)
* [LFU](./libCacheSim/libCacheSim/cache/eviction/LFU.c)
* [ARC](./libCacheSim/libCacheSim/cache/eviction/ARC.c)
* [SLRU](./libCacheSim/libCacheSim/cache/eviction/SLRU.c)
* [GDSF](./libCacheSim/libCacheSim/cache/eviction/GDSF.c)
* [WTinyLFU](./libCacheSim/libCacheSim/cache/eviction/WTinyLFU.c)
* [LeCaR](./libCacheSim/libCacheSim/cache/eviction/LeCaR.c)
* [Cacheus](./libCacheSim/libCacheSim/cache/eviction/Cacheus.c)
* [Hyperbolic](./libCacheSim/libCacheSim/cache/eviction/Hyperbolic.c)
* [LHD](./libCacheSim/libCacheSim/cache/eviction/LHD/LHDInterface.cpp)
* [GLCache](./libCacheSim/libCacheSim/cache/eviction/GLCache/GLCache.c)
* [Belady](./libCacheSim/libCacheSim/cache/eviction/Belady.c)
* [BeladySize](./libCacheSim/libCacheSim/cache/eviction/BeladySize.c)
* [QD-LP](./libCacheSim/libCacheSim/cache/eviction/QDLP.c)

Pass the algorithm name as the `eviction_algo` argument, for example:

```bash
./cachesim ../data/trace.vscsi vscsi lecar auto
./cachesim ../data/trace.vscsi vscsi hyperbolic auto
./cachesim ../data/trace.vscsi vscsi lhd auto
./cachesim ../data/trace.vscsi vscsi glcache auto

# Belady and BeladySize require an oracle trace, which records each request's next access time
./cachesim ../data/trace.oracleGeneral oracleGeneral beladySize auto
```
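Belady's algorithm evicts the object whose next access is furthest in the future, which is why it needs an oracle trace. The sketch below computes each request's next-access position (measured in requests) with one backward pass, then runs a toy Belady cache on unit-size objects. It illustrates the idea only and is separate from cachesim's C implementations of Belady and BeladySize.

```python
INF = float("inf")


def next_access_times(trace) -> list:
    """For each request i, the index of the next request to the same object (INF if none)."""
    nxt, last_seen = [INF] * len(trace), {}
    for i in range(len(trace) - 1, -1, -1):  # backward pass over the trace
        nxt[i] = last_seen.get(trace[i], INF)
        last_seen[trace[i]] = i
    return nxt


def belady_misses(trace, capacity: int) -> int:
    """Toy Belady over unit-size objects: evict the object reused furthest in the future."""
    if capacity < 1:
        raise ValueError("capacity must be >= 1")
    nxt = next_access_times(trace)
    cache = {}  # obj -> index of its next access
    misses = 0
    for i, obj in enumerate(trace):
        if obj not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # furthest next access (INF = never reused)
                del cache[victim]
        cache[obj] = nxt[i]  # refresh the next-access time on every request
    return misses


trace = [1, 2, 3, 1, 4, 1, 2, 5, 1, 3]
print(next_access_times(trace))
print(belady_misses(trace, 3))
```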


### Use different trace types 
The examples above use vscsi and oracleGeneral traces. cachesim also supports csv traces.
To use a csv trace, specify which columns hold the *time*, *obj-id*, and *obj-size*.
The time and size columns are optional, but many algorithms need time and size to work properly.
Columns are 1-indexed: the first column is 1, the second is 2, and so on.
Besides the columns, a csv reader also needs the delimiter and whether the file has a header.
cachesim has a built-in delimiter and header detector; if it detects them incorrectly, specify them explicitly with `delimiter=,` and `has-header=true`.


```bash
# Note: the parameters are comma-separated and quoted
./cachesim ../data/trace.csv csv lru 1gb -t "time-col=2, obj-id-col=5, obj-size-col=4"

# If object IDs are numeric, pass obj-id-is-num=true to speed up processing
./cachesim ../data/trace.csv csv lru 1gb -t "time-col=2, obj-id-col=5, obj-size-col=4, obj-id-is-num=true"

# Set the delimiter and header explicitly; note that csv traces must be ASCII-encoded (UTF-8 is not supported)
./cachesim ../data/trace.csv csv lru 1gb -t "time-col=2, obj-id-col=5, obj-size-col=4, delimiter=,, has-header=true"
```
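The sketch below shows what the column options above mean, using Python's `csv` module: it parses a parameter string such as `"time-col=2, obj-id-col=5, obj-size-col=4"` into a dict, then pulls those 1-indexed columns out of each row. The helper names are hypothetical and this is not cachesim's C reader. Its parameter parsing is simplified (it only splits on commas, so it cannot express `delimiter=,`); the delimiter and header flag are passed as function arguments instead.

```python
import csv
import io


def parse_trace_params(text: str) -> dict:
    """Parse 'k1=v1, k2=v2' into a dict (simplified: splits on commas only)."""
    params = {}
    for item in text.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            params[key] = value
    return params


def read_csv_trace(f, params: dict, delimiter: str = ",", has_header: bool = False):
    """Yield (time, obj_id, obj_size); time and size are None if their column is not given."""
    def col(name):
        return int(params[name]) - 1 if name in params else None  # 1-indexed -> 0-indexed

    t_col, id_col, size_col = col("time-col"), col("obj-id-col"), col("obj-size-col")
    if id_col is None:
        raise ValueError("obj-id-col is required")
    reader = csv.reader(f, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # skip blank lines
        time = int(row[t_col]) if t_col is not None else None
        size = int(row[size_col]) if size_col is not None else None
        yield time, row[id_col], size


# Hypothetical five-column trace: time in column 1, size in column 4, obj-id in column 5
sample = io.StringIO("ts,op,ns,size,key\n10,get,a,4096,obj1\n12,set,b,512,obj2\n")
params = parse_trace_params("time-col=1, obj-id-col=5, obj-size-col=4")
print(list(read_csv_trace(sample, params, has_header=True)))
```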

Besides csv traces, cachesim also supports txt and binary traces.
```bash
# txt trace: a simple format with one obj-id per line
./cachesim ../data/trace.txt txt lru 1gb

# Generic binary trace: the record layout is given by a format string similar to Python's struct module
./cachesim ../data/trace.vscsi binary lru 1gb -t "format=<IIIHHQQ,obj-id-col=6,obj-size-col=2"

# oracleGeneral is a binary format that stores time, obj-id, size, and next-access-time (in reference count)
./cachesim ../data/trace.oracleGeneral.bin oracleGeneral lru 1gb
```
**We recommend binary traces: they can be several times faster to process than csv traces and use less DRAM.**
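The `format=<IIIHHQQ` string in the binary example follows Python's `struct` notation: little-endian (`<`), three 4-byte unsigned ints, two 2-byte unsigned shorts, and two 8-byte unsigned long longs, 32 bytes per record. The sketch below writes two records in that layout and reads them back, taking column 6 as the object ID and column 2 as the size, as in the `-t` parameters above. The field values are made up and say nothing about what the real vscsi fields mean. Fixed-size records need no text parsing, which is one reason binary traces are cheaper to read than csv.

```python
import struct

REC = struct.Struct("<IIIHHQQ")  # layout from the binary example above: 32 bytes per record


def write_records(records) -> bytes:
    """Pack an iterable of 7-field tuples into one contiguous blob."""
    return b"".join(REC.pack(*r) for r in records)


def read_requests(blob: bytes, obj_id_col: int = 6, obj_size_col: int = 2):
    """Yield (obj_id, obj_size) per record; column numbers are 1-indexed."""
    for fields in REC.iter_unpack(blob):
        yield fields[obj_id_col - 1], fields[obj_size_col - 1]


# Made-up field values, only to exercise the layout
blob = write_records([(0, 4096, 1, 0, 0, 123456, 7),
                      (1, 512, 2, 0, 0, 654321, 8)])
print(REC.size, list(read_requests(blob)))
```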



## Advanced usage

cachesim supports many advanced features; run `./cachesim --help` for the full list.
Here are some examples.

### Setting parameters for eviction algorithms
Some eviction algorithms take parameters, which you can set with `-e "k1=v1,k2=v2"` or `--eviction-params "k1=v1,k2=v2"`.
```bash
# run SLRU with 4 segments
./cachesim ../data/trace.vscsi vscsi slru 1gb -e n-seg=4

# print the default parameters for SLRU
./cachesim ../data/trace.vscsi vscsi slru 1gb -e print
```
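`n-seg` sets how many LRU segments SLRU uses. The toy below is an object-count SLRU under common textbook assumptions, not necessarily cachesim's exact policy: new objects enter the lowest segment, a hit promotes an object one segment up, overflow from a segment demotes its least recently used object one segment down, and overflow from the lowest segment is evicted. Each segment gets an equal share of the capacity. `ToySLRU` is a hypothetical name.

```python
from collections import OrderedDict


class ToySLRU:
    """Toy segmented LRU over unit-size objects (assumed policy, see text)."""

    def __init__(self, capacity: int, n_seg: int = 4):
        if n_seg < 1 or capacity < n_seg:
            raise ValueError("need capacity >= n_seg >= 1")
        self.seg_cap = capacity // n_seg  # equal share per segment (remainder unused)
        self.segs = [OrderedDict() for _ in range(n_seg)]  # segs[0] is the lowest

    def _insert(self, level: int, obj) -> None:
        """Insert at the MRU end of segs[level]; cascade overflow down, evict from segs[0]."""
        self.segs[level][obj] = True
        while level >= 0 and len(self.segs[level]) > self.seg_cap:
            victim, _ = self.segs[level].popitem(last=False)  # LRU end of this segment
            if level > 0:
                self.segs[level - 1][victim] = True  # demote one segment down
            level -= 1

    def access(self, obj) -> bool:
        """Return True on a hit, False on a miss."""
        for level, seg in enumerate(self.segs):
            if obj in seg:
                del seg[obj]
                self._insert(min(level + 1, len(self.segs) - 1), obj)  # promote on hit
                return True
        self._insert(0, obj)  # new objects start in the lowest segment
        return False


trace = [1, 1, 2, 2, 3, 4, 5, 6, 1, 2]  # two hot objects, then a scan
for n_seg in (1, 2):
    cache = ToySLRU(capacity=4, n_seg=n_seg)
    print(n_seg, sum(not cache.access(obj) for obj in trace))
```

With one segment the toy behaves like LRU and the scan flushes both hot objects (8 misses); with two segments the hot objects survive in the upper segment (6 misses).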


### Admission algorithm
cachesim supports the following admission algorithms: size, probabilistic, bloomFilter, adaptSize.
You can use `-a` or `--admission` to set the admission algorithm. 
```bash
# Add a Bloom filter so that objects are not admitted on their first access
./cachesim ../data/trace.vscsi vscsi lru 1gb -a bloomFilter
```
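The Bloom-filter admission policy above keeps objects out of the cache on their first access. The sketch below shows the idea with a small hand-rolled Bloom filter: an object is admitted only if the filter says it has been seen before; otherwise it is recorded and not admitted. The filter size, number of hashes, and use of `hashlib.blake2b` are illustrative choices, not cachesim's. Bloom filters can return false positives, so an occasional first-time object may be admitted.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, 0

    def _positions(self, key):
        for i in range(self.n_hashes):  # derive n_hashes independent-ish hashes via a prefix
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "little") % self.n_bits

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


def admit(seen: BloomFilter, obj) -> bool:
    """Admit obj only if it was (probably) seen before; otherwise remember it."""
    if obj in seen:
        return True
    seen.add(obj)
    return False


seen = BloomFilter()
print([admit(seen, obj) for obj in ["a", "b", "a", "c", "b"]])
```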

### Prefetching algorithm
cachesim supports the following prefetching algorithms: OBL, Mithril, PG (and AMP is on the way).
You can use `-p` or `--prefetch` to set the prefetching algorithm. 
```bash
# Add Mithril, which records object associations and prefetches objects that are likely to be accessed in the future
./cachesim ../data/trace.vscsi vscsi lru 1gb -p Mithril
```
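OBL (one-block lookahead) is the simplest prefetcher in the list. The toy below implements one common variant: on each access to block `b`, it also brings block `b + 1` into an LRU cache of unit-size blocks, then counts demand misses. cachesim's OBL may differ in details, such as when it decides to prefetch, and the sketch assumes integer, consecutively numbered block IDs. `simulate_obl` is a hypothetical name.

```python
from collections import OrderedDict


def simulate_obl(trace, capacity: int, prefetch: bool = True) -> int:
    """Toy LRU over integer block IDs; with prefetch=True, each access also fetches b + 1.

    Returns the number of demand misses (prefetches themselves are not counted).
    """
    if capacity < 2:
        raise ValueError("capacity must be >= 2 so a prefetched block fits")
    cache = OrderedDict()

    def insert(block):
        cache[block] = True  # new keys go to the MRU end
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block

    misses = 0
    for b in trace:
        if b in cache:
            cache.move_to_end(b)  # demand hit refreshes recency
        else:
            misses += 1
            insert(b)
        if prefetch and b + 1 not in cache:
            insert(b + 1)  # one-block lookahead
    return misses


sequential = list(range(1, 11))
print(simulate_obl(sequential, 4, prefetch=False), simulate_obl(sequential, 4))
```

On a sequential scan the prefetcher hides all but the first miss, while on a non-sequential pattern such as `[10, 20, 10, 20]` with capacity 2 the useless prefetches evict useful blocks and double the misses (4 vs 2).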

### Advanced features 
```bash
# Change the number of threads
./cachesim ../data/trace.vscsi vscsi lru 1gb --num-thread=4

# Cap the number of requests read from the trace
./cachesim ../data/trace.vscsi vscsi lru 1gb --num-req=1000000

# Change the output file path
./cachesim ../data/trace.vscsi vscsi lru 1gb -o my-output

# Ignore object size: every object has size 1
./cachesim ../data/trace.vscsi vscsi lru 1gb --ignore-obj-size=true

# Ignore object metadata size: algorithms keep different amounts of per-object metadata, and this option ignores it
./cachesim ../data/trace.vscsi vscsi lru 1gb --consider-obj-metadata=false

# Use the first part of the trace (here 86400 s, i.e., one day) to warm up the cache
./cachesim ../data/trace.vscsi vscsi lru 1gb --warmup-sec=86400

# Use TTL
./cachesim ../data/trace.vscsi vscsi lru 1gb --use-ttl=true

# Disable printing of the first few requests
./cachesim ../data/trace.vscsi vscsi lru 1gb --print-head-req=false
```
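`--warmup-sec` uses the beginning of the trace to fill the cache before results are counted. The toy LRU below shows that bookkeeping on unit-size objects: requests within the first `warmup_sec` seconds still update the cache but are excluded from the miss ratio. Measuring the window from the first request's timestamp is an assumption about how the option is defined, and `lru_miss_ratio` is a hypothetical helper, not part of libCacheSim.

```python
from collections import OrderedDict


def lru_miss_ratio(requests, capacity: int, warmup_sec: float = 0) -> float:
    """requests: iterable of (timestamp_sec, obj_id); every object has size 1.

    Requests within warmup_sec of the first timestamp update the cache
    but are not counted in the miss ratio.
    """
    cache, start = OrderedDict(), None
    counted = misses = 0
    for ts, obj in requests:
        if start is None:
            start = ts  # the warmup window starts at the first request (assumption)
        hit = obj in cache
        if hit:
            cache.move_to_end(obj)
        else:
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used object
        if ts - start >= warmup_sec:  # past the warmup window: count this request
            counted += 1
            if not hit:
                misses += 1
    return misses / counted if counted else float("nan")


reqs = [(0, "a"), (1, "b"), (2, "a"), (3, "b"), (4, "c")]
print(lru_miss_ratio(reqs, 2), lru_miss_ratio(reqs, 2, warmup_sec=2))
```

Excluding the cold-start misses of `a` and `b` lowers the measured miss ratio from 0.6 to 1/3 on this tiny trace.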

