# 1 File management
## 1.1 recursive file management
To process many files in a directory one after another, read their names into an array with the `mapfile` command and loop over it.

Remove files recursively:
```
mapfile -t rmlist < rmlist.txt
size=${#rmlist[@]}
for (( i = 0; i < size; i++ ));
do
    rm -r -- "${rmlist[i]}"
done
```
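The loop above passes blank lines and missing paths straight to `rm`. A minimal, more defensive sketch of the same idea follows; the helper name `remove_listed` and the throwaway demo directory are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: remove every path listed (one per line) in a list file.
# remove_listed is a hypothetical helper name.
remove_listed() {
    local list_file=$1 path
    local -a rmlist
    mapfile -t rmlist < "$list_file"
    for path in "${rmlist[@]}"; do
        [[ -z $path ]] && continue      # skip blank lines
        if [[ -e $path ]]; then
            rm -r -- "$path"            # quotes keep names with spaces intact
        else
            echo "skip (not found): $path" >&2
        fi
    done
}

# Demo in a temporary directory so nothing real is touched.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b c.txt" "$demo/keep.txt"
printf '%s\n' "$demo/a.txt" "$demo/b c.txt" "" "$demo/missing.txt" > "$demo/rmlist.txt"
remove_listed "$demo/rmlist.txt"
ls "$demo"
```

Iterating with `for path in "${rmlist[@]}"` avoids the index arithmetic, and the `--` stops `rm` from treating a name that starts with `-` as an option.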
Download files recursively:
```
mapfile -t dllist < dllist.txt
mapfile -t dirlist < dirlist.txt
size=${#dllist[@]}
for (( i = 0; i < size; i++ ));
do
    wget "ftp://download.big.ac.cn/gsa2/CRA012590/2/${dllist[i]}" -P "$1${dirlist[i]}"
done
```
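The download loop pairs line `i` of `dllist.txt` with line `i` of `dirlist.txt`, so it only works when both lists have the same length. The sketch below checks that and prints the `wget` commands instead of running them (a dry run). The helper name `plan_downloads`, the sample file names, and the `/tmp/out/` prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pair each remote file with its target directory and print the
# wget command instead of running it (dry run).
# plan_downloads and the sample names below are hypothetical.
base_url="ftp://download.big.ac.cn/gsa2/CRA012590/2"

plan_downloads() {
    local dl_file=$1 dir_file=$2 prefix=$3 i
    local -a dllist dirlist
    mapfile -t dllist < "$dl_file"
    mapfile -t dirlist < "$dir_file"
    if (( ${#dllist[@]} != ${#dirlist[@]} )); then
        echo "error: the two lists differ in length" >&2
        return 1
    fi
    for (( i = 0; i < ${#dllist[@]}; i++ )); do
        printf 'wget "%s" -P "%s"\n' "$base_url/${dllist[i]}" "$prefix${dirlist[i]}"
    done
}

demo=$(mktemp -d)
printf '%s\n' sample1.fq.gz sample2.fq.gz > "$demo/dllist.txt"
printf '%s\n' dir1 dir2 > "$demo/dirlist.txt"
plan_downloads "$demo/dllist.txt" "$demo/dirlist.txt" "/tmp/out/"
```

Once the printed commands look right, they can be piped to `bash` or the `printf` can be replaced by the real `wget` call.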
Here is a script I wrote to remove the primary `.bam` file of every sample listed in a tag file:

In [None]:
%%bash
#! /bin/bash

# File name: rm.bam.sh
# Remove each primary .bam file whose deduplicated .rmdup.bam counterpart exists.
## $1: file listing one sample tag (file name prefix) per line

mapfile -t tag < "$1"
size=${#tag[@]}
echo "Tags loaded: ${#tag[@]}"
for ((i=0; i<size; i++)); do
  echo "Processing tag: ${tag[i]}"
  # only delete the primary file when the deduplicated file is present
  if [ -f "${tag[i]}.rmdup.bam" ]; then
    echo "Removing file: ${tag[i]}.bam"
    rm -r "${tag[i]}.bam"
  fi
done


Also, there is a script I wrote to copy files in a directory recursively:

In [None]:
%%bash
#! /bin/bash

# File name: cp-for
# This script is used to copy the fastq files recursively.
## $1: /file/location
## $2: /path/to/destination
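## $3...: names of the files to copy (one argument per file)

# collect every argument from the third one onward into an array
files=("${@:3}")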
size=${#files[@]}
for (( i = 0; i < size; i++ )); do
  cp -ax "$1/${files[i]}" "$2"
done

## 1.2 file comparison
To check whether two directories contain the same files, first save the file names of each directory into a separate file:
- `ls 115/ > name.115`
- `ls 203/ > name.203`
Then, you can use `grep -F -x -f name.115 name.203` to print the names in `name.203` that also appear in `name.115`.
- the check works the same way in the other direction: swap the two file names.
To check whether two files differ, you can use the `diff` command:
- `diff -q file1 file2` reports only whether the files differ, without listing the differences.
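As an alternative to the two `grep` passes, `comm` can report in one step which names exist in only one of the two directories. This is a minimal sketch; the helper name `compare_dirs` and the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the names that exist in only one of two directories.
# compare_dirs and the demo directories are hypothetical.
compare_dirs() {
    local dir_a=$1 dir_b=$2
    # comm needs sorted input. Column 1 = only in dir_a, column 2 (tab-indented)
    # = only in dir_b; -3 hides the names common to both.
    LC_ALL=C comm -3 <(ls "$dir_a" | LC_ALL=C sort) <(ls "$dir_b" | LC_ALL=C sort)
}

demo=$(mktemp -d)
mkdir "$demo/115" "$demo/203"
touch "$demo/115/a.bam" "$demo/115/b.bam" "$demo/203/b.bam" "$demo/203/c.bam"
compare_dirs "$demo/115" "$demo/203"
```

In the demo, `a.bam` is printed flush left (only in `115`) and `c.bam` is printed after a tab (only in `203`). Empty output means both directories hold the same names.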

## 1.3 file packaging and compression
Use `tar` to pack files: `tar -cvf {file}.tar {dir}`
Use `tar` to unpack files: `tar -xvf {file}.tar`

Use `gzip` to compress files: `gzip {file}`
Use `pigz` to compress files in parallel: `pigz -3 {file}` (`-3` sets the compression level)

Use `tar` to unpack compressed files: `tar -xzvf {file}.tar.gz`
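The commands above can be combined into one round trip: pack and compress, list the archive, then unpack it. The sketch below runs in a temporary directory and falls back to `gzip` when `pigz` is not installed; the directory and file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: pack a directory into a compressed archive, list it, and unpack it.
# Uses pigz when it is installed and falls back to gzip otherwise.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/restore"
echo "hello" > "$demo/data/a.txt"

if command -v pigz > /dev/null 2>&1; then
    compressor=(pigz -3)     # parallel gzip, compression level 3
else
    compressor=(gzip -3)
fi

# Pack and compress in one pipeline; -C keeps the archive paths relative.
tar -cf - -C "$demo" data | "${compressor[@]}" > "$demo/data.tar.gz"

tar -tzf "$demo/data.tar.gz"                       # list the contents
tar -xzf "$demo/data.tar.gz" -C "$demo/restore"    # unpack into another directory
cat "$demo/restore/data/a.txt"
```

Streaming `tar` into the compressor avoids writing an intermediate `.tar` file, which matters for large data directories.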


# 2 Vim

Read binary files: `vim -b {file}`
- Convert the content to hexadecimal: `:%!xxd`
- Convert the content back to text format: `:%!xxd -r`

Delete a range of lines (here lines 1 to 10): `:1,10d`


# 3 conda

List the environments available to the user: `conda env list`

 


# 4 git
## 4.1 git setting
To set the git proxy, you can use the following command:
- `git config --global http.proxy http://10.207.199.218:7890`

To unset the git proxy, you can use the following command:
- `git config --global --unset http.proxy`
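To confirm which proxy is active, `git config --global --get http.proxy` prints the current value and exits with a non-zero status when none is set. The sketch below walks through set, read, and unset inside a throwaway `HOME`, so the real `~/.gitconfig` stays untouched; the function name `proxy_demo` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set, read back, and unset the proxy in a throwaway HOME so the
# real ~/.gitconfig is left alone. proxy_demo is a hypothetical name.
proxy_demo() (
    export HOME=$(mktemp -d)
    unset XDG_CONFIG_HOME
    git config --global http.proxy http://10.207.199.218:7890
    echo "proxy set to: $(git config --global --get http.proxy)"
    git config --global --unset http.proxy
    # --get exits with a non-zero status when the key is not set
    git config --global --get http.proxy || echo "proxy is unset"
)

proxy_demo
```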



# 5 Coding Environment

## 5.1 Java

You can install several versions of Java for one user and switch between them with simple commands. To do this, first get the JDK archives from one of the following sources:
1. Download the JDK from Oracle: https://www.oracle.com/java/technologies/downloads/
    - remember to choose the x64 version
2. Upload the archives from the local machine:
    - `scp D:\Zheng\Documents\2_NBS\pack\* *@*:~/lib`

After you upload the compressed JDK archives to the server, do the following steps:
1. `cd ~/lib`
2. `tar -xzvf jdk-8u451-linux-x64.tar.gz`
3. `tar -xzvf jdk-17.0.15_linux-x64_bin.tar.gz`
4. `tar -xzvf jdk-21_linux-x64_bin.tar.gz`
5. `mkdir jvm pack`
6. `mv jdk-17.0.15_linux-x64_bin.tar.gz jdk-21_linux-x64_bin.tar.gz jdk-8u451-linux-x64.tar.gz pack/`
7. `mv jdk1.8.0_451/ jvm/jdk-8`
8. `mv jdk-17.0.15/ jvm/jdk-17`
9. `mv jdk-21.0.7/ jvm/jdk-21`

Then, you can set `~/.bashrc` with the following lines:
```
# java
# different java version setting
## 1. fields
export JAVA_8_HOME=~/lib/jvm/jdk-8
export JAVA_17_HOME=~/lib/jvm/jdk-17
export JAVA_21_HOME=~/lib/jvm/jdk-21
## 2. default setting
export JAVA_HOME=$JAVA_8_HOME
export PATH=$JAVA_HOME/bin:$PATH
## 3. set version change commands
alias java8='export JAVA_HOME=$JAVA_8_HOME && export PATH=$JAVA_HOME/bin:$PATH'
alias java17='export JAVA_HOME=$JAVA_17_HOME && export PATH=$JAVA_HOME/bin:$PATH'
alias java21='export JAVA_HOME=$JAVA_21_HOME && export PATH=$JAVA_HOME/bin:$PATH'
```
Finally, run `source ~/.bashrc` to load the new settings.

You can use `java8`, `java17`, and `java21` to switch between Java versions, and check the active version with `java -version`, `echo $JAVA_HOME`, and `which java`.
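One side effect of the aliases is that every switch prepends another JDK `bin` directory, so `PATH` keeps growing within a session. A possible variant, sketched below, removes the previously selected JDK from `PATH` before adding the new one. The function name `use_java` is hypothetical; the `JAVA_8_HOME` and `JAVA_17_HOME` values mirror the `~/.bashrc` lines above.

```shell
#!/usr/bin/env bash
# Sketch: switch JAVA_HOME and keep exactly one JDK bin directory on PATH.
# use_java is a hypothetical helper; the JAVA_*_HOME variables are the ones
# exported in ~/.bashrc above.
use_java() {
    local new_home=$1 entry new_path=""
    local -a entries
    IFS=: read -r -a entries <<< "$PATH"
    for entry in "${entries[@]}"; do
        # drop the bin directory of the previously selected JDK
        [[ -n ${JAVA_HOME:-} && $entry == "${JAVA_HOME:-}/bin" ]] && continue
        new_path+="${new_path:+:}$entry"
    done
    export JAVA_HOME=$new_home
    export PATH="$JAVA_HOME/bin:$new_path"
}

# Demo: switch back and forth, then count the JDK entries left on PATH.
JAVA_8_HOME=$HOME/lib/jvm/jdk-8
JAVA_17_HOME=$HOME/lib/jvm/jdk-17
use_java "$JAVA_8_HOME"
use_java "$JAVA_17_HOME"
use_java "$JAVA_8_HOME"
echo "JAVA_HOME=$JAVA_HOME"
printf '%s' "$PATH" | tr ':' '\n' | grep -c -F "$HOME/lib/jvm/"
```

With this helper in `~/.bashrc`, the aliases could become `alias java8='use_java "$JAVA_8_HOME"'` and so on.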


# 6 perf

## 6.1 install perf

To install `perf`, you need root permission.

First, I tried `sudo yum install -y perf` to install, but it failed with the following error:
```
已加载插件：fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=stock error was
14: curl#6 - "Could not resolve host: mirrorlist.centos.org; 未知的错误"


 One of the configured repositories failed (未知),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=<repoid> ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>
        or
            subscription-manager repos --disable=<repoid>

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: base/7/x86_64
```
Then I tried to install `kernel-tools` and `kernel-debuginfo`. 
1. First, I ran `uname -a` to check the kernel version of the 115 server, which is `3.10.0-1127.el7.x86_64`.
2. Then I searched https://mirrors.aliyun.com/centos/7/os/x86_64/Packages/ for the two packages, but could only find the `kernel-tools-3.10.0-1160.el7.x86_64.rpm` version.
3. With more effort, I found that the older `rpm` files are archived at https://vault.centos.org/7.7.1908/os/x86_64/Packages/.
4. I found these on the internet, but they are not the version I wanted:
    - https://vault.centos.org/7.7.1908/os/x86_64/Packages/kernel-tools-3.10.0-1062.el7.x86_64.rpm
    - https://vault.centos.org/7.7.1908/os/x86_64/Packages/kernel-debug-3.10.0-1062.el7.x86_64.rpm
5. Finally, I tried different versions of CentOS on https://vault.centos.org/ and found what I wanted on https://vault.centos.org/7.8.2003/os/x86_64/Packages/
    - https://vault.centos.org/7.8.2003/os/x86_64/Packages/kernel-tools-libs-3.10.0-1127.el7.x86_64.rpm
    - https://vault.centos.org/7.8.2003/os/x86_64/Packages/kernel-tools-3.10.0-1127.el7.x86_64.rpm
    - https://vault.centos.org/7.8.2003/os/x86_64/Packages/kernel-debug-3.10.0-1127.el7.x86_64.rpm
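The package names follow the kernel release string, so the URLs can be assembled from it. The sketch below only prints the URLs; the helper name `vault_kernel_urls` is hypothetical, and the matching CentOS point release (here `7.8.2003`) still has to be found by hand as described in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: build the CentOS Vault download URLs for a given kernel release.
# vault_kernel_urls is a hypothetical helper.
vault_kernel_urls() {
    local centos_release=$1 kernel_release=$2 pkg
    local base="https://vault.centos.org/$centos_release/os/x86_64/Packages"
    for pkg in kernel-tools-libs kernel-tools kernel-debug; do
        echo "$base/$pkg-$kernel_release.rpm"
    done
}

# On the server itself, "$(uname -r)" would supply the kernel release.
vault_kernel_urls 7.8.2003 3.10.0-1127.el7.x86_64
```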

Then I ran `perf`, but the command could not be found. When I looked for `perf` in `kernel-tools`, I found that the package does not install the `perf` tool; I would need to build it by running `./configure`, `make`, and `make install`. While doing this, I realized that I would have to install more dependencies, so I gave up installing `perf` from `rpm` packages.

Finally, I pointed `yum` at the CentOS Vault repository with the following commands: 

In [None]:
%%bash

# backup the original yum repository files
sudo mkdir -p /etc/yum.repos.d/backup
sudo mv /etc/yum.repos.d/CentOS-*.repo /etc/yum.repos.d/backup/

# create a new yum repository file for CentOS Vault
sudo tee /etc/yum.repos.d/CentOS-Vault.repo <<EOF
[vault]
name=CentOS-7 - Vault
baseurl=http://vault.centos.org/7.9.2009/os/\$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
EOF

# clear the yum cache, then install the build dependencies
sudo yum clean all
sudo yum install -y make gcc elfutils-libelf-devel flex bison

Then I ran `sudo yum install -y perf`, and it worked!

## 6.2 perf setting for non-root user

1. Change the kernel parameters for perf:
```
echo "kernel.perf_event_paranoid = -1" | sudo tee /etc/sysctl.d/perf.conf
sudo sysctl -p /etc/sysctl.d/perf.conf
```
2. Change the permissions of files and directories used by perf:
```
sudo chmod a+rw /sys/kernel/debug/tracing/
sudo chmod a+rw /dev/cpu/*/msr  # if MSR registers are used (e.g. `perf stat -e cycles`)
```
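Setting `kernel.perf_event_paranoid` to `-1` is the most permissive choice. The sketch below reads the current level and prints a rough summary of what it permits for unprivileged users. The summaries paraphrase the upstream kernel documentation and are an approximation (some distributions add stricter levels); the function name `describe_paranoid` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report what a perf_event_paranoid level allows for unprivileged users.
# describe_paranoid is a hypothetical helper; the wording is a rough summary.
describe_paranoid() {
    local level=$1
    if   (( level <= -1 )); then echo "no restrictions"
    elif (( level == 0 ));  then echo "CPU-wide and kernel profiling allowed, raw tracepoints restricted"
    elif (( level == 1 ));  then echo "per-process user and kernel profiling allowed, CPU-wide events restricted"
    else                         echo "user-space profiling only"
    fi
}

setting=/proc/sys/kernel/perf_event_paranoid
if [[ -r $setting ]]; then
    level=$(< "$setting")
    echo "perf_event_paranoid = $level: $(describe_paranoid "$level")"
else
    echo "$setting is not readable on this system"
fi
```

If only user-space profiling of your own processes is needed, a less permissive level than `-1` may already be enough.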

## 6.3 perf used to analyze Java programs

First, I collected CPU statistics on a Java program:
```
perf stat -e cycles,instructions,cache-misses,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools
```
1. `cycles`: the number of CPU cycles consumed, a measure of how long the CPU was busy
2. `instructions`: the total number of instructions executed
    - `insn per cycle (IPC)`: a higher IPC means better CPU utilization; a value above 1.0 (typically 1 to 2) is usually good
3. `cache-misses`: the number of cache misses; many misses may degrade performance
4. `branch-misses`: the number of branch mispredictions; many mispredictions may degrade performance

The output of the command is like this:
```
 Performance counter stats for 'java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools':

 6,144,293,791,989      cycles
12,584,723,198,969      instructions              #    2.05  insn per cycle
    23,071,128,363      cache-misses
    11,975,545,242      branch-misses

     128.175826763 seconds time elapsed

    5549.520810000 seconds user
     635.074609000 seconds sys
```
Analysis:
1. The IPC of 2.05 is good, but the absolute numbers of cache misses and branch misses are large, which may degrade performance.
    - use `perf record -e cache-misses -g` to locate hotspot code and improve cache locality
2. Add the `branches` event to see what fraction of all branches is mispredicted
3. The recorded times are:
    - `time elapsed`: wall-clock time taken by the command
    - `user`: CPU time spent in user space (summed over all threads)
    - `sys`: CPU time spent in kernel space (system calls, I/O operations, etc.)
4. High sys time may stem from frequent I/O operations or lock contention
    - use `perf record -e 'syscalls:sys_enter_*'` to analyze system calls
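The raw counters are easier to judge as ratios. The sketch below recomputes a few of them with `awk` from the numbers in the output above; the function name `perf_ratios` and the choice of ratios are my own, not something `perf` prints in this form.

```shell
#!/usr/bin/env bash
# Sketch: derive ratios from the perf stat counters shown above.
# perf_ratios is a hypothetical helper name.
cycles=6144293791989
instructions=12584723198969
cache_misses=23071128363
elapsed=128.175826763
user=5549.520810000
sys=635.074609000

perf_ratios() {
    awk -v c="$cycles" -v i="$instructions" -v m="$cache_misses" \
        -v e="$elapsed" -v u="$user" -v s="$sys" 'BEGIN {
        printf "IPC: %.2f\n", i / c
        printf "cache misses per 1000 instructions: %.2f\n", m * 1000 / i
        printf "average busy CPUs: %.2f\n", (u + s) / e
        printf "sys share of CPU time: %.1f%%\n", 100 * s / (u + s)
    }'
}

perf_ratios
```

The ratio of CPU time to elapsed time shows how many cores were busy on average, and the sys share puts the kernel time from item 4 into proportion.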

Next, I ran `perf stat -e cycles,instructions,cache-misses,branches,branch-misses java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools > perf.log` and got the following output:
```
 Performance counter stats for 'java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools':

 6,200,993,415,974      cycles
12,628,342,484,101      instructions              #    2.04  insn per cycle
    23,131,657,000      cache-misses
 2,466,785,321,848      branches
    11,998,386,966      branch-misses             #    0.49% of all branches

     129.766272954 seconds time elapsed

    5658.384766000 seconds user
     592.448297000 seconds sys
```
Then I ran `perf record` to collect more detailed profiling information.

1. `perf record -e cache-misses -g -- java -Xmx100g -jar TIGER_20250526.jar -app FastCall2 -mod disc -a ../00data/chr1_10M.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n ../disc/ -o /data/home/dazheng/miniconda3/envs/debug/bin/samtools`

The output is like this:
```
[ perf record: Woken up 518 times to write data ]
Warning:
15 out of order events recorded.
[ perf record: Captured and wrote 1458.668 MB perf.data (16413076 samples) ]
```
Then I ran `nohup perf script > out.perf &` to convert the `perf.data` file into the text file `out.perf`. Finally, I ran `nohup FlameGraph/stackcollapse-perf.pl out.perf | FlameGraph/flamegraph.pl > flame.svg &` to generate the flame graph, using the scripts from https://github.com/brendangregg/FlameGraph. 





