# 1 install perf

Installing `perf` requires root privileges.

First, I tried `sudo yum install -y perf`, but it failed with the following error:
```
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=stock error was
14: curl#6 - "Could not resolve host: mirrorlist.centos.org; Unknown error"


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=<repoid> ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>
        or
            subscription-manager repos --disable=<repoid>

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: base/7/x86_64
```
Since the default mirror could not be reached, I next tried to install `kernel-tools` and `kernel-debuginfo` manually from `rpm` files.
1. First, I ran `uname -a` to check the kernel version of server 115, which is `3.10.0-1127.el7.x86_64`.
2. Then I searched https://mirrors.aliyun.com/centos/7/os/x86_64/Packages/ for the two packages, but could only find a different version, `kernel-tools-3.10.0-1160.el7.x86_64.rpm`.
3. With more effort, I found that older `rpm` files are archived under https://vault.centos.org/7.7.1908/os/x86_64/Packages/.
4. There I found the following packages, but they are built for kernel `3.10.0-1062`, not the version I needed:
    - https://vault.centos.org/7.7.1908/os/x86_64/Packages/kernel-tools-3.10.0-1062.el7.x86_64.rpm
    - https://vault.centos.org/7.7.1908/os/x86_64/Packages/kernel-debug-3.10.0-1062.el7.x86_64.rpm
5. Finally, I tried the other CentOS releases on https://vault.centos.org/ and found the matching packages under https://vault.centos.org/7.8.2003/os/x86_64/Packages/:
    - https://vault.centos.org/7.8.2003/os/x86_64/Packages/kernel-tools-libs-3.10.0-1127.el7.x86_64.rpm
    - https://vault.centos.org/7.8.2003/os/x86_64/Packages/kernel-tools-3.10.0-1127.el7.x86_64.rpm
    - https://vault.centos.org/7.8.2003/os/x86_64/Packages/kernel-debug-3.10.0-1127.el7.x86_64.rpm
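
The manual search above can be sketched as a small helper that builds the vault URL from the kernel release string. This is only a sketch: the function names `vault_release` and `vault_rpm_url` are made up for illustration, and the lookup table covers only the three kernel builds mentioned in these notes (`3.10.0-1160` is assumed to belong to release 7.9.2009, the release used in the repository file later on).

```shell
# Map a CentOS 7 kernel release to the point release whose os/ tree ships it.
# Only the builds mentioned in these notes are listed.
vault_release() {
    case "$1" in
        3.10.0-1062.el7.*) echo 7.7.1908 ;;
        3.10.0-1127.el7.*) echo 7.8.2003 ;;
        3.10.0-1160.el7.*) echo 7.9.2009 ;;
        *) echo "unknown kernel release: $1" >&2; return 1 ;;
    esac
}

# Print the vault URL of package $1 for kernel release $2
# (default: the running kernel, as reported by `uname -r`).
vault_rpm_url() {
    pkg=$1
    krel=${2:-$(uname -r)}
    rel=$(vault_release "$krel") || return 1
    echo "https://vault.centos.org/$rel/os/x86_64/Packages/$pkg-$krel.rpm"
}

# The three packages found in step 5.
for pkg in kernel-tools-libs kernel-tools kernel-debug; do
    vault_rpm_url "$pkg" 3.10.0-1127.el7.x86_64
done
```

On the server itself, something like `curl -O "$(vault_rpm_url kernel-tools)"` would then download the package that matches the running kernel.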

Then I ran `perf`, but the command was not found. When I looked for `perf` in `kernel-tools`, I found that the `perf` tool had not been installed. I would have to build it by running `./configure`, `make`, and `make install`. While doing this, I realized that I would need to install more dependencies, so I gave up on installing `perf` from `rpm` files.

Finally, I pointed `yum` at the CentOS Vault repository with the following commands:

In [None]:
%%bash

# backup the original yum repository files
sudo mkdir -p /etc/yum.repos.d/backup
sudo mv /etc/yum.repos.d/CentOS-*.repo /etc/yum.repos.d/backup/

# create a new yum repository file for CentOS Vault
sudo tee /etc/yum.repos.d/CentOS-Vault.repo <<EOF
[vault]
name=CentOS-7 - Vault
baseurl=http://vault.centos.org/7.9.2009/os/\$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
EOF

# clear the yum cache so that the new repository is used
sudo yum clean all
# install build tools and libraries
sudo yum install -y make gcc elfutils-libelf-devel flex bison
Then I ran `sudo yum install -y perf`, and it worked!


# 2 perf setting for non-root user

1. Change the kernel parameters for perf:
```
echo "kernel.perf_event_paranoid = -1" | sudo tee /etc/sysctl.d/perf.conf
sudo sysctl -p /etc/sysctl.d/perf.conf
```
2. Change the permissions of the files and directories used by `perf`:
```
sudo chmod a+rw /sys/kernel/debug/tracing/
sudo chmod a+rw /dev/cpu/*/msr  # if MSR registers are used (e.g. `perf stat -e cycles`)
```
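
To check which level is currently active, the value can be read back from `/proc/sys/kernel/perf_event_paranoid`. The sketch below maps a level to a short description. The function name `describe_paranoid` is made up, and the descriptions are my paraphrase of the kernel's sysctl documentation, so treat them as a rough guide only.

```shell
# Describe roughly what a kernel.perf_event_paranoid level allows
# for users without extra capabilities.
describe_paranoid() {
    level=$1
    if [ "$level" -le -1 ]; then
        echo "$level: no restrictions for unprivileged users"
    elif [ "$level" -eq 0 ]; then
        echo "$level: system-wide CPU events allowed, raw tracepoint access restricted"
    elif [ "$level" -eq 1 ]; then
        echo "$level: per-process profiling incl. kernel, no system-wide CPU events"
    else
        echo "$level: user-space measurements only"
    fi
}

# Report the level that is active on this machine, if it can be read.
f=/proc/sys/kernel/perf_event_paranoid
if [ -r "$f" ]; then
    describe_paranoid "$(cat "$f")"
else
    echo "$f is not readable on this system"
fi
```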


# 3 perf stat

## 3.1 running time analysis

First, I collected CPU statistics for a Java program:
```
perf stat -e cycles,instructions,cache-misses,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools
```
1. `cycles`: the number of CPU cycles consumed, summed over all threads of the program
2. `instructions`: the total number of instructions executed
    - `insn per cycle` (IPC): a higher IPC means the CPU does more work per cycle; a value above 1.0 (typically 1 to 2) is usually considered good
3. `cache-misses`: the number of cache misses; a high count may degrade performance
4. `branch-misses`: the number of branch mispredictions; a high count may degrade performance

The output of the command looks like this:
```
 Performance counter stats for 'java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools':

 6,144,293,791,989      cycles
12,584,723,198,969      instructions              #    2.05  insn per cycle
    23,071,128,363      cache-misses
    11,975,545,242      branch-misses

     128.175826763 seconds time elapsed

    5549.520810000 seconds user
     635.074609000 seconds sys
```
Analysis:
1. Although the IPC of 2.05 is good, the absolute numbers of cache misses and branch misses are large, which may degrade performance.
    - Use `perf record -e cache-misses -g` to locate the hotspots in the code and improve cache locality.
2. Add the `branches` event to count all branches, so that the branch-miss rate can be computed.
3. The recorded times are:
    - `time elapsed`: the wall-clock time taken by the command
    - `user`: the CPU time spent in user space (summed over all threads)
    - `sys`: the CPU time spent in kernel space (system calls, I/O operations, etc.)
4. A high `sys` time may stem from frequent I/O operations or lock contention.
    - Use `perf record -e 'syscalls:sys_enter_*'` to analyze system calls.
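
The derived figures can be recomputed from the raw counters with a few lines of `awk`. The sketch below uses the numbers from the first run above; the function name `derive_metrics` and the two extra ratios (average busy cores and the `sys` share of CPU time) are my own additions, not something `perf stat` prints.

```shell
# Counter values copied from the first perf stat run (commas removed).
cycles=6144293791989
instructions=12584723198969
elapsed=128.175826763
user=5549.520810000
sys=635.074609000

derive_metrics() {
    awk -v c="$cycles" -v i="$instructions" -v e="$elapsed" -v u="$user" -v s="$sys" 'BEGIN {
        # instructions per cycle, the value perf prints as "insn per cycle"
        printf "IPC: %.2f\n", i / c
        # total CPU time divided by wall-clock time
        printf "average busy cores: %.1f\n", (u + s) / e
        # fraction of the CPU time that was spent in the kernel
        printf "sys share of CPU time: %.1f%%\n", 100 * s / (u + s)
    }'
}

derive_metrics
```

The first line should reproduce the `2.05  insn per cycle` that `perf stat` reports. The other two lines put the `user` and `sys` times in relation to the elapsed time.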

I then ran `perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools > perf.log` and got the following output:
```
 Performance counter stats for 'java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools':

 6,200,993,415,974      cycles
12,628,342,484,101      instructions              #    2.04  insn per cycle
    23,131,657,000      cache-misses
 2,466,785,321,848      branches
    11,998,386,966      branch-misses             #    0.49% of all branches

     129.766272954 seconds time elapsed

    5658.384766000 seconds user
     592.448297000 seconds sys
```
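
The percentage in the `branch-misses` line is simply `branch-misses / branches`. As a rough sketch, the counters can be pulled out of the `perf stat` text and turned into rates with `awk`. The function names `perf_counts` and `perf_rates` are made up, and "cache misses per 1000 instructions" is an extra ratio that I find handy, not something `perf stat` prints.

```shell
# Print "event<TAB>count" for every counter line of a perf stat log (stdin).
# Lines such as "129.766 seconds time elapsed" are skipped because their
# first field contains a dot.
perf_counts() {
    awk '$1 ~ /^[0-9][0-9,]*$/ && NF >= 2 { gsub(/,/, "", $1); print $2 "\t" $1 }'
}

# Turn the raw counts into rates.
perf_rates() {
    perf_counts | awk -F'\t' '
        { n[$1] = $2 }
        END {
            if (n["branches"] > 0)
                printf "branch-miss rate: %.2f%%\n", 100 * n["branch-misses"] / n["branches"]
            if (n["instructions"] > 0)
                printf "cache misses per 1000 instructions: %.2f\n", 1000 * n["cache-misses"] / n["instructions"]
        }'
}

# Counter lines copied from the second run above.
perf_rates <<'EOF'
 6,200,993,415,974      cycles
12,628,342,484,101      instructions              #    2.04  insn per cycle
    23,131,657,000      cache-misses
 2,466,785,321,848      branches
    11,998,386,966      branch-misses             #    0.49% of all branches

     129.766272954 seconds time elapsed
EOF
```

With a real log file, this becomes `perf_rates < 1.disc.log`.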

The `perf stat` commands for the two FastCall2 analyses are as follows:

In [None]:
%%bash

# 1 perf stat
# 1.1 IPC and branches

## disc
nohup perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o ~/miniconda3/envs/debug/bin/samtools > 1.disc.log &

## blib
nohup perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../disc/ -f ../blib/ > 2.blib.log &

## scan
nohup perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/taxaBamMap.txt -c ../blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../scan > 3.scan.log &

# 1.2 times for different sample sizes

## disc
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/1.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../1disc/ -o ~/miniconda3/envs/debug/bin/samtools > test.1.1.disc.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/2.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../2disc/ -o ~/miniconda3/envs/debug/bin/samtools > test.2.1.disc.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/3.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../3disc/ -o ~/miniconda3/envs/debug/bin/samtools > test.3.1.disc.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/4.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../4disc/ -o ~/miniconda3/envs/debug/bin/samtools > test.4.1.disc.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/5.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../5disc/ -o ~/miniconda3/envs/debug/bin/samtools > test.5.1.disc.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/6.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../6disc/ -o ~/miniconda3/envs/debug/bin/samtools > test.6.1.disc.log

## blib
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../1disc/ -f ../1blib/ > test.1.2.blib.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../2disc/ -f ../2blib/ > test.2.2.blib.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../3disc/ -f ../3blib/ > test.3.2.blib.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../4disc/ -f ../4blib/ > test.4.2.blib.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../5disc/ -f ../5blib/ > test.5.2.blib.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod blib -a ../00data/input/chr1_10M.fa -b 1 -c 2 -d 32 -e ../6disc/ -f ../6blib/ > test.6.2.blib.log

## scan
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/1.tbm.txt -c ../1blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../1scan > test.1.3.scan.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/2.tbm.txt -c ../2blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../2scan > test.2.3.scan.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/3.tbm.txt -c ../3blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../3scan > test.3.3.scan.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/4.tbm.txt -c ../4blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../4scan > test.4.3.scan.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/5.tbm.txt -c ../5blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../5scan > test.5.3.scan.log
perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod scan -a ../00data/input/chr1_10M.fa -b ../00data/input/6.tbm.txt -c ../6blib/1_1_10000001.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i ~/miniconda3/envs/debug/bin/samtools -j 32 -k ../6scan > test.6.3.scan.log
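
The 18 commands for the different sample sizes differ only in the sample index, so they can also be written as a loop. The sketch below is one way to do it, not the script I actually ran. It assumes the same file layout as above, and it uses `perf stat -o` to write the counters to the log file, because `perf stat` prints its summary to stderr by default and a plain `>` redirect may miss it depending on how the command is launched. The helper names `stat_run` and `run_all`, the `DRY_RUN` switch, and the `.out` files for the program's own output are made up for illustration. The loop runs all three steps for one sample before moving on to the next.

```shell
EVENTS=cycles,instructions,cache-misses,branches,branch-misses
JAR=TIGER_20250526.jar
IN=../00data/input
SAMTOOLS="$HOME/miniconda3/envs/debug/bin/samtools"

# Run one command under perf stat: the counters go to $1 (via -o),
# the program's own stdout goes to $1.out.
# With DRY_RUN=1 (the default here) the command line is only printed.
stat_run() {
    log=$1
    shift
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "perf stat -e $EVENTS -o $log -- $* > $log.out"
    else
        perf stat -e "$EVENTS" -o "$log" -- "$@" > "$log.out"
    fi
}

# disc, blib and scan for sample sizes 1 to 6.
run_all() {
    for i in 1 2 3 4 5 6; do
        stat_run "test.$i.1.disc.log" java -Xmx100g -jar "$JAR" -app FastCall2 -mod disc \
            -a "$IN/chr1_10M.fa" -b "$IN/$i.tbm.txt" -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 \
            -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n "../${i}disc/" -o "$SAMTOOLS"
        stat_run "test.$i.2.blib.log" java -Xmx100g -jar "$JAR" -app FastCall2 -mod blib \
            -a "$IN/chr1_10M.fa" -b 1 -c 2 -d 32 -e "../${i}disc/" -f "../${i}blib/"
        stat_run "test.$i.3.scan.log" java -Xmx100g -jar "$JAR" -app FastCall2 -mod scan \
            -a "$IN/chr1_10M.fa" -b "$IN/$i.tbm.txt" -c "../${i}blib/1_1_10000001.lib.gz" \
            -d 1 -e 0 -f 30 -g 20 -h 0.05 -i "$SAMTOOLS" -j 32 -k "../${i}scan"
    done
}

run_all
```

Running it as is only prints the 18 command lines for inspection. Setting `DRY_RUN=0` before calling `run_all` starts the actual jobs.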

## 3.2 IPC and branch miss analysis

First, use `grep` to extract the IPC and branch-miss information from the log files generated by `perf stat`:

In [None]:
%%bash

grep "insn per cycle" test.*.1.disc.log 1.disc.log | awk -F':' '{print $1, $2}' > ipc.1.txt
grep "of all branches" test.*.1.disc.log 1.disc.log | awk -F':' '{print $1, $2}' > brc.1.txt

grep "insn per cycle" test.*.2.blib.log 2.blib.log | awk -F':' '{print $1, $2}' > ipc.2.txt
grep "of all branches" test.*.2.blib.log 2.blib.log | awk -F':' '{print $1, $2}' > brc.2.txt

grep "insn per cycle" test.*.3.scan.log 3.scan.log | awk -F':' '{print $1, $2}' > ipc.3.txt
grep "of all branches" test.*.3.scan.log 3.scan.log | awk -F':' '{print $1, $2}' > brc.3.txt

Then, use the `ipc.R` and `brc.R` scripts to analyze the IPC and branch-miss information. These scripts are in the `script/01vmap4/debugFC2` directory.
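
If only the numbers are needed, the extraction can also be done in a single `awk` pass that prints one `file<TAB>value` line per log. This is an alternative sketch, not the format that `ipc.R` and `brc.R` expect. The function names `ipc_table` and `brc_table` are made up, and the demo log contains only two lines copied from the output above.

```shell
# Print "file<TAB>IPC" for each perf stat log given as argument.
ipc_table() {
    awk '/insn per cycle/ {
        # the value is the field right after the "#" marker
        for (k = 1; k < NF; k++)
            if ($k == "#") { print FILENAME "\t" $(k + 1); break }
    }' "$@"
}

# Print "file<TAB>branch-miss percentage" (without the % sign) for each log.
brc_table() {
    awk '/of all branches/ {
        for (k = 1; k < NF; k++)
            if ($k == "#") { v = $(k + 1); sub(/%$/, "", v); print FILENAME "\t" v; break }
    }' "$@"
}

# Small demo on a temporary log file.
tmp=$(mktemp -d)
cat > "$tmp/1.disc.log" <<'EOF'
12,628,342,484,101      instructions              #    2.04  insn per cycle
    11,998,386,966      branch-misses             #    0.49% of all branches
EOF
ipc_table "$tmp"/*.log
brc_table "$tmp"/*.log
rm -r "$tmp"
```

On the real logs this would be called as, for example, `ipc_table test.*.1.disc.log 1.disc.log > ipc.1.tsv`.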

# 4 perf record

Then I ran `perf record` to collect more detailed profiling information.

1. `perf record -e cache-misses -g -- java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools`

The output looks like this:
```
[ perf record: Woken up 518 times to write data ]
Warning:
15 out of order events recorded.
[ perf record: Captured and wrote 1458.668 MB perf.data (16413076 samples) ]
```
Then I ran `nohup perf script > out.perf &` to convert the `perf.data` file into the text file `out.perf`. Finally, I ran `nohup FlameGraph/stackcollapse-perf.pl out.perf | FlameGraph/flamegraph.pl > flame.svg &` to generate the flame graph, using the scripts from https://github.com/brendangregg/FlameGraph.
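
With more than 16 million samples the flame graph gets large, so a plain-text summary of the collapsed stacks can be a useful first look. `stackcollapse-perf.pl` writes one line per unique stack in the form `frame1;frame2;...;leaf count`. The sketch below sums the sample counts per leaf frame. The function name `top_leaves` is made up, and the frames in the demo input are hypothetical, not taken from my `out.perf`.

```shell
# Sum the sample counts per leaf frame of folded stacks (stdin) and
# print the $1 largest entries (default: 10) as "count<TAB>frame".
top_leaves() {
    awk '{
        count = $NF                      # the last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # drop the count, keep the stack
        n = split(stack, frames, ";")
        total[frames[n]] += count        # frames[n] is the leaf frame
    } END { for (f in total) print total[f] "\t" f }' | sort -rn | head -n "${1:-10}"
}

# Hypothetical folded stacks, for illustration only.
top_leaves 3 <<'EOF'
java;main;readBam;inflate 120
java;main;readBam;memcpy 30
java;main;callSNP;memcpy 50
EOF
```

On real data this would be `FlameGraph/stackcollapse-perf.pl out.perf | top_leaves 20`.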

In [None]:
%%bash

# 2 perf record
# 2.1 cpu

nohup perf record -o record.1.1.perf.data java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/input/chr1_10M.fa -b ../00data/input/1.tbm.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../1disc/ -o ~/miniconda3/envs/debug/bin/samtools > record.1.1.disc.log &



