feat: add fsst encode by SkyFan2002 · Pull Request #24234 · apache/doris

SkyFan2002 · 2023-09-12T08:07:16Z

Proposed changes

Issue Number: close #xxx

github-actions · 2023-09-12T08:08:21Z

`sh-checker report`

To get the full details, please check in the job output.

shellcheck errors


'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In be/src/fsst/paper/compare.sh line 4:
  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
  ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
           ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                         ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                 ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                                          ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  fgrep "${i}" "$1" | fgrep -v "${i}"2 | fgrep -v "${i}"pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
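
Note that the suggestion above only adds quoting and leaves `fgrep` in place. A minimal sketch that addresses SC2197 and the quoting findings together, assuming the intent is to keep rows for one dataset name while dropping the `<name>2` and `<name>pedia` variants. `filter_rows` is a hypothetical wrapper for illustration, not something in compare.sh:

```shell
#!/bin/bash
# filter_rows NAME FILE: hypothetical helper mirroring compare.sh line 4,
# with fgrep replaced by grep -F and every expansion quoted and braced.
filter_rows() {
    local name="$1" file="$2"
    grep -F "${name}" "${file}" | grep -F -v "${name}2" | grep -F -v "${name}pedia"
}
```

Putting the suffix inside the quotes (`"${name}2"`) is equivalent to shellcheck's `"${i}"2` form and arguably easier to read.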


In be/src/fsst/paper/evolution.sh line 7:
(for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
                                     ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                     ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-strncmp "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'


In be/src/fsst/paper/evolution.sh line 8:
(for i in dbtext/*; do (./cw $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'
                             ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                             ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'


In be/src/fsst/paper/evolution.sh line 9:
(for i in dbtext/*; do (./cw-greedy $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'
                                    ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-greedy "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 10:
(for i in dbtext/*; do (./vcw $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                         ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./vcw "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 11:
(for i in dbtext/*; do (./hcw $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                       ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'


In be/src/fsst/paper/evolution.sh line 13:
(for i in dbtext/*; do (./hcw-opt $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                           ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'


In be/src/fsst/paper/evolution.sh line 14:
(for i in dbtext/*; do (./hcw-opt $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'


In be/src/fsst/paper/kernels.sh line 1:
#/bin/bash
 ^-- SC1113 (error): Use #!, not just #, for the shebang.


In be/src/fsst/paper/kernels.sh line 4:
echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
     ^-----^ SC2086 (info): Double quote to prevent globbing and word splitting.
     ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
echo "${PARAMS}" | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'


In be/src/fsst/paper/kernels.sh line 5:
echo "\\\\"
     ^----^ SC2028 (info): echo may not expand escape sequences. Use printf.
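
shellcheck offers no rewrite for this finding. A possible fix, assuming the goal is to emit the two literal backslashes that end a LaTeX table row; `print_row_end` is a hypothetical name used only for this sketch:

```shell
#!/bin/bash
# print_row_end: hypothetical helper. With a '%s' format, printf writes its
# argument unchanged, so the output does not depend on whether the shell's
# echo interprets backslash escapes.
print_row_end() {
    printf '%s\n' '\\'
}
```

Single quotes keep both backslashes literal, which also removes the extra level of escaping needed inside double quotes.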


In be/src/fsst/paper/kernels.sh line 10:
   for m in $PARAMS
            ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   for m in ${PARAMS}


In be/src/fsst/paper/kernels.sh line 12:
     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
                       ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                               ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                               ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
     (./hcw-opt dbtext/"${i}" 511 -"${m}" 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'


In be/src/fsst/paper/kernels.sh line 14:
   echo $i
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   echo "${i}"


In be/src/fsst/paper/lz4-smallblocks.sh line 3:
dd if=$1 of=tmpsplit.out bs=$maxsize count=1 2> /dev/null
      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                            ^------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                            ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
dd if="$1" of=tmpsplit.out bs="${maxsize}" count=1 2> /dev/null


In be/src/fsst/paper/lz4-smallblocks.sh line 5:
    mkdir tmpsplit$blocksize
                  ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                  ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    mkdir tmpsplit"${blocksize}"


In be/src/fsst/paper/lz4-smallblocks.sh line 6:
    split -b $blocksize tmpsplit.out tmpsplit$blocksize/x
             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    split -b "${blocksize}" tmpsplit.out tmpsplit"${blocksize}"/x


In be/src/fsst/paper/lz4-smallblocks.sh line 7:
    echo -n $blocksize ""
            ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo -n "${blocksize}" ""


In be/src/fsst/paper/lz4-smallblocks.sh line 8:
    size=$((for f in tmpsplit$blocksize/x*; do lz4 -c $f | wc -c; done) | awk '{s+=$1} END {print s}')
         ^-- SC1102 (error): Shells disambiguate $(( differently or not at all. For $(command substitution), add space after $( . For $((arithmetics)), fix parsing errors.
                             ^--------^ SC2231 (info): Quote expansions in this for loop glob to prevent wordsplitting, e.g. "$dir"/*.txt .
                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                                      ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    size=$((for f in tmpsplit${blocksize}/x*; do lz4 -c "${f}" | wc -c; done) | awk '{s+=$1} END {print s}')
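
The suggestion above still starts with `$((`, so SC1102 would remain. A minimal sketch of the fix shellcheck describes (a space after `$(`), assuming the line is meant to be a command substitution around a subshell. `total_size` is a hypothetical helper, and `wc -c <file` stands in for the original `lz4 -c file | wc -c` so the example runs without lz4 installed:

```shell
#!/bin/bash
# total_size DIR: hypothetical helper summing the byte counts of DIR/x*.
# The space in "$( (" tells the shell this is a command substitution that
# contains a subshell, not the start of "$((" arithmetic expansion.
total_size() {
    local dir="$1"
    local size
    size=$( (for f in "${dir}"/x*; do wc -c <"${f}"; done) | awk '{s+=$1} END {print s}')
    echo "${size}"
}
```

Since a pipeline is allowed directly inside `$(...)`, dropping the inner parentheses altogether would also avoid the ambiguity.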


In be/src/fsst/paper/lz4-smallblocks.sh line 9:
    echo "$maxsize / $size" | bc -l
          ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                     ^---^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "${maxsize} / ${size}" | bc -l


In be/src/fsst/paper/lz4-smallblocks.sh line 10:
    rm -rf tmpsplit$blocksize/
                   ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                   ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    rm -rf tmpsplit"${blocksize}"/


In be/src/fsst/paper/sorted.sh line 8:
cd dbtext
^-------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd dbtext || exit


In be/src/fsst/paper/sorted.sh line 11:
  sort $i > ../.sorted/$i; 
       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  sort "${i}" > ../.sorted/"${i}"; 


In be/src/fsst/paper/sorted.sh line 14:
cd ..
^---^ SC2103 (info): Use a ( subshell ) to avoid having to cd back.
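
SC2164 and SC2103 can be handled in one change by running the `cd` inside a subshell. A sketch under the assumption that the loop only needs to sort each file of one directory into another; `sort_into` and its two directory arguments are hypothetical and not taken from sorted.sh:

```shell
#!/bin/bash
# sort_into SRC DEST: hypothetical helper that writes a sorted copy of every
# file in SRC to DEST.
sort_into() {
    local src="$1" dest
    mkdir -p "$2" || return
    # Resolve DEST to an absolute path before changing directory.
    dest=$(cd "$2" && pwd) || return
    (
        # The cd affects only this subshell, so no "cd .." is needed after it,
        # and "|| exit" stops the subshell if the directory is missing.
        cd "${src}" || exit
        for i in *; do
            sort "${i}" >"${dest}/${i}"
        done
    )
}
```

The caller's working directory is unchanged after the function returns, whether or not the `cd` succeeded.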


In be/src/fsst/paper/sorted.sh line 19:
  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
                                   ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                   ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 dbtext/"${i}" | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'


In be/src/fsst/paper/sorted.sh line 20:
  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
                                    ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 .sorted/"${i}" | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'

For more information:
  https://www.shellcheck.net/wiki/SC1102 -- Shells disambiguate $(( different...
  https://www.shellcheck.net/wiki/SC1113 -- Use #!, not just #, for the sheba...
  https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... |...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
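
As an illustration of option 2, here is a sketch that silences the style-level SC2250 finding on the `for m in $PARAMS` loop instead of rewriting it. `kernel_flags` is a hypothetical helper; only the `PARAMS` value is taken from kernels.sh:

```shell
#!/bin/bash
PARAMS='simd1 simd2 simd3 simd4 adaptive'

# kernel_flags: hypothetical helper printing one "-<kernel>" flag per line.
kernel_flags() {
    local m
    # PARAMS is a space-separated list, so splitting it here is intentional.
    # shellcheck disable=SC2250
    for m in $PARAMS; do
        printf -- '-%s\n' "${m}"
    done
}
```

A directive placed before a compound command such as `for` applies to the whole command, so it is best kept next to the smallest construct that needs it. The two findings reported as errors (SC1102 and SC1113) are better fixed than disabled.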

shfmt errors


'shfmt ' returned error 1 finding the following formatting issues:

----------
--- be/src/fsst/paper/compare.sh.orig
+++ be/src/fsst/paper/compare.sh
@@ -1,5 +1,4 @@
 #!/bin/bash
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment 
- do
-  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
- done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment; do
+    fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
+done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
--- be/src/fsst/paper/evolution.sh.orig
+++ be/src/fsst/paper/evolution.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # output format: STCB CCB CR
 # STCB: symbol table construction cost in cycles-per-compressed byte (constructing a new ST per 8MB text)
-# CCB:  compression speed cycles-per-compressed byte 
+# CCB:  compression speed cycles-per-compressed byte
 # CR:   compression (=size reduction) factor achieved
 
 (for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
@@ -16,10 +16,10 @@
 # on Intel SKX CPUs| the results look like:
 #
 # 75.117,160.11,1.97194 iterative|suffix-array|dynp-matching|strncmp|scalar
-#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU) 
+#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU)
 #
 # 73.6948,81.6404,1.97194 iterative|suffix-array|dynp-matching|str-as-long|scalar
-#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x 
+#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x
 #
 # 74.4996,37.457,1.94764 iterative|suffix-array|greedy-match|str-as-long|scalar
 #   \--> dynamic programming brought only 3% smaller size. So drop it and gain another 2x compression speed.
@@ -28,7 +28,7 @@
 #   \--> bottom-up is *really* better in terms of compression factor than iterative with suffix array.
 #
 # 1.74783,10.7009,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-branch
-#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions) 
+#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions)
 #
 # 1.74783,9.8142,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-adaptive
 #   \--> adaptive use of encoding kernels gives compression speed a small bump
@@ -39,4 +39,4 @@
 # optimized construction refers to the combination of three changes:
 # - reducing the amount of bottom-up passes from 10 to 5 (less learning time, but.. slighty worsens CR)
 # - looking at subsamples in early rounds (increasing the sample as the rounds go up). Less compression work.
-# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0 
+# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0
--- be/src/fsst/paper/kernels.sh.orig
+++ be/src/fsst/paper/kernels.sh
@@ -1,15 +1,15 @@
 #/bin/bash
 PARAMS='simd1 simd2 simd3 simd4 adaptive'
-(echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
-echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
-echo "\\\\"
-echo "\\hline"
-echo "\\hline"
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment 
- do 
-   for m in $PARAMS
-   do
-     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
-   done
-   echo $i
- done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//') | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/' 
+(
+    echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
+    echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
+    echo "\\\\"
+    echo "\\hline"
+    echo "\\hline"
+    (for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+        for m in $PARAMS; do
+            (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
+        done
+        echo $i
+    done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//'
+) | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/'
be/src/fsst/paper/lz4-smallblocks.sh:8:17: not a valid arithmetic operator: f
--- be/src/fsst/paper/sorted.sh.orig
+++ be/src/fsst/paper/sorted.sh
@@ -6,17 +6,15 @@
 rm -rf .sorted 2>/dev/null
 mkdir .sorted
 cd dbtext
-for i in * 
-do 
-  sort $i > ../.sorted/$i; 
+for i in *; do
+    sort $i >../.sorted/$i
 done
 cp chinese japanese faust hamlet ../.sorted/
 cd ..
 
 # note sizes, display stats
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment
- do 
-  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
-  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
- done) | 
-awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+    ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
+    ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
+done) |
+    awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename

github-actions · 2023-09-12T08:13:48Z

clang-tidy review says "All clean, LGTM! 👍"

SkyFan2002 · 2023-09-12T08:14:28Z

run buildall

github-actions · 2023-09-12T08:32:53Z

`sh-checker report`

To get the full details, please check in the job output.

shellcheck errors


'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In be/src/fsst/paper/compare.sh line 4:
  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
  ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
           ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                         ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                 ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                                          ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  fgrep "${i}" "$1" | fgrep -v "${i}"2 | fgrep -v "${i}"pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'


In be/src/fsst/paper/evolution.sh line 7:
(for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
                                     ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                     ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-strncmp "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'


In be/src/fsst/paper/evolution.sh line 8:
(for i in dbtext/*; do (./cw $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'
                             ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                             ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'


In be/src/fsst/paper/evolution.sh line 9:
(for i in dbtext/*; do (./cw-greedy $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'
                                    ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-greedy "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 10:
(for i in dbtext/*; do (./vcw $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                         ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./vcw "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 11:
(for i in dbtext/*; do (./hcw $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                       ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'


In be/src/fsst/paper/evolution.sh line 13:
(for i in dbtext/*; do (./hcw-opt $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                           ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'


In be/src/fsst/paper/evolution.sh line 14:
(for i in dbtext/*; do (./hcw-opt $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'


In be/src/fsst/paper/kernels.sh line 1:
#/bin/bash
 ^-- SC1113 (error): Use #!, not just #, for the shebang.


In be/src/fsst/paper/kernels.sh line 4:
echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
     ^-----^ SC2086 (info): Double quote to prevent globbing and word splitting.
     ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
echo "${PARAMS}" | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'


In be/src/fsst/paper/kernels.sh line 5:
echo "\\\\"
     ^----^ SC2028 (info): echo may not expand escape sequences. Use printf.
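
shellcheck gives no "Did you mean" for SC2028. One possible fix, sketched below, passes the text to `printf` through a `%s` format so the backslashes are copied verbatim. The helper names are hypothetical and only wrap the two lines that `kernels.sh` emits for the LaTeX table.

```shell
#!/bin/bash
# Sketch: print the LaTeX row terminator (two backslashes) and \hline without
# depending on how a given echo implementation treats backslash escapes.
latex_row_end() {
    # With the %s format, printf does not interpret escapes in the argument.
    printf '%s\n' '\\'
}

latex_hline() {
    printf '%s\n' '\hline'
}

latex_row_end
latex_hline
```

Single quotes keep the shell from consuming any backslashes, so what is written is what is printed.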


In be/src/fsst/paper/kernels.sh line 10:
   for m in $PARAMS
            ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   for m in ${PARAMS}


In be/src/fsst/paper/kernels.sh line 12:
     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
                       ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                               ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                               ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
     (./hcw-opt dbtext/"${i}" 511 -"${m}" 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'


In be/src/fsst/paper/kernels.sh line 14:
   echo $i
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   echo "${i}"


In be/src/fsst/paper/lz4-smallblocks.sh line 3:
dd if=$1 of=tmpsplit.out bs=$maxsize count=1 2> /dev/null
      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                            ^------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                            ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
dd if="$1" of=tmpsplit.out bs="${maxsize}" count=1 2> /dev/null
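
To illustrate why SC2086 asks for the quotes, the sketch below counts how many arguments a command receives with and without them. The file name is an invented value containing a space, and `count_args` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: an unquoted expansion is split on whitespace before the command runs.
count_args() {
    # Print the number of arguments this function received.
    echo "$#"
}

name='my file.txt' # invented value with a space in it

# The unquoted expansion is deliberate here, to show the splitting.
# shellcheck disable=SC2086
unquoted=$(count_args ${name})
quoted=$(count_args "${name}")

echo "unquoted=${unquoted} quoted=${quoted}"
```

With a path like this passed as `$1`, the unquoted `dd if=$1` would receive a stray extra operand, whereas the quoted form passes the path as one word.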


In be/src/fsst/paper/lz4-smallblocks.sh line 5:
    mkdir tmpsplit$blocksize
                  ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                  ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    mkdir tmpsplit"${blocksize}"


In be/src/fsst/paper/lz4-smallblocks.sh line 6:
    split -b $blocksize tmpsplit.out tmpsplit$blocksize/x
             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    split -b "${blocksize}" tmpsplit.out tmpsplit"${blocksize}"/x


In be/src/fsst/paper/lz4-smallblocks.sh line 7:
    echo -n $blocksize ""
            ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo -n "${blocksize}" ""


In be/src/fsst/paper/lz4-smallblocks.sh line 8:
    size=$((for f in tmpsplit$blocksize/x*; do lz4 -c $f | wc -c; done) | awk '{s+=$1} END {print s}')
         ^-- SC1102 (error): Shells disambiguate $(( differently or not at all. For $(command substitution), add space after $( . For $((arithmetics)), fix parsing errors.
                             ^--------^ SC2231 (info): Quote expansions in this for loop glob to prevent wordsplitting, e.g. "$dir"/*.txt .
                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                                      ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    size=$((for f in tmpsplit${blocksize}/x*; do lz4 -c "${f}" | wc -c; done) | awk '{s+=$1} END {print s}')
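
The suggestion above still begins with `$((`, so it does not address SC1102, and shfmt rejects the same line further down ("not a valid arithmetic operator"). A sketch of the fix the message describes, a space after `$(`, is shown below. It assumes `cat` as a stand-in for `lz4 -c` so it runs without lz4 installed, and it works in a temporary directory on made-up input.

```shell
#!/bin/bash
# Sketch: "$( (" is parsed as a command substitution around a subshell,
# whereas "$((" can be taken as the start of an arithmetic expansion.
blocksize=4
workdir=$(mktemp -d)
mkdir "${workdir}/tmpsplit${blocksize}"
printf 'abcdefghij' >"${workdir}/tmpsplit.out" # 10 bytes of invented input
split -b "${blocksize}" "${workdir}/tmpsplit.out" "${workdir}/tmpsplit${blocksize}/x"

# `cat` replaces `lz4 -c` here, so the total is simply the input size.
size=$( (for f in "${workdir}/tmpsplit${blocksize}"/x*; do cat "${f}" | wc -c; done) | awk '{s+=$1} END {print s}')
echo "${size}"

rm -rf "${workdir}"
```

The inner parentheses could also be dropped, since the `for` loop can feed the pipe directly inside `$( ... )`.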


In be/src/fsst/paper/lz4-smallblocks.sh line 9:
    echo "$maxsize / $size" | bc -l
          ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                     ^---^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "${maxsize} / ${size}" | bc -l


In be/src/fsst/paper/lz4-smallblocks.sh line 10:
    rm -rf tmpsplit$blocksize/
                   ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                   ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    rm -rf tmpsplit"${blocksize}"/


In be/src/fsst/paper/sorted.sh line 8:
cd dbtext
^-------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd dbtext || exit


In be/src/fsst/paper/sorted.sh line 11:
  sort $i > ../.sorted/$i; 
       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  sort "${i}" > ../.sorted/"${i}"; 


In be/src/fsst/paper/sorted.sh line 14:
cd ..
^---^ SC2103 (info): Use a ( subshell ) to avoid having to cd back.
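
SC2164 and SC2103 can be handled together by running the directory-local loop in a subshell. The sketch below assumes a temporary directory with two invented sample files rather than the real `dbtext` data.

```shell
#!/bin/bash
# Sketch: the subshell keeps the cd local, so no `cd ..` is needed afterwards,
# and `|| exit` stops the subshell if the directory is missing.
workdir=$(mktemp -d)
mkdir "${workdir}/dbtext" "${workdir}/.sorted"
printf '%s\n' b a c >"${workdir}/dbtext/city" # invented sample data
printf '%s\n' z y >"${workdir}/dbtext/hex"

(
    cd "${workdir}/dbtext" || exit
    for i in *; do
        sort "${i}" >"../.sorted/${i}"
    done
)

# The parent shell's working directory is unchanged at this point.
sorted_city=$(cat "${workdir}/.sorted/city")
sorted_hex=$(cat "${workdir}/.sorted/hex")
rm -rf "${workdir}"
```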


In be/src/fsst/paper/sorted.sh line 19:
  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
                                   ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                   ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 dbtext/"${i}" | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'


In be/src/fsst/paper/sorted.sh line 20:
  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
                                    ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 .sorted/"${i}" | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'

For more information:
  https://www.shellcheck.net/wiki/SC1102 -- Shells disambiguate $(( different...
  https://www.shellcheck.net/wiki/SC1113 -- Use #!, not just #, for the sheba...
  https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... |...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
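
As a sketch of option 2, the directive goes on its own comment line directly above the command it should cover. The example reuses the `PARAMS` value from `kernels.sh`; the `set --` line is an invented use chosen because it relies on word splitting on purpose.

```shell
#!/bin/bash
# Sketch: silence SC2086 for a single command where splitting is intended.
PARAMS='simd1 simd2 simd3 simd4 adaptive'

# Splitting PARAMS into separate words is the point of this line.
# shellcheck disable=SC2086
set -- ${PARAMS}

nparams=$#
echo "${nparams}"
```

The directive applies only to the next command, so later unquoted expansions in the same file would still be reported.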

shfmt errors


'shfmt ' returned error 1 finding the following formatting issues:

----------
--- be/src/fsst/paper/compare.sh.orig
+++ be/src/fsst/paper/compare.sh
@@ -1,5 +1,4 @@
 #!/bin/bash
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment 
- do
-  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
- done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment; do
+    fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
+done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
--- be/src/fsst/paper/evolution.sh.orig
+++ be/src/fsst/paper/evolution.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # output format: STCB CCB CR
 # STCB: symbol table construction cost in cycles-per-compressed byte (constructing a new ST per 8MB text)
-# CCB:  compression speed cycles-per-compressed byte 
+# CCB:  compression speed cycles-per-compressed byte
 # CR:   compression (=size reduction) factor achieved
 
 (for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
@@ -16,10 +16,10 @@
 # on Intel SKX CPUs| the results look like:
 #
 # 75.117,160.11,1.97194 iterative|suffix-array|dynp-matching|strncmp|scalar
-#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU) 
+#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU)
 #
 # 73.6948,81.6404,1.97194 iterative|suffix-array|dynp-matching|str-as-long|scalar
-#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x 
+#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x
 #
 # 74.4996,37.457,1.94764 iterative|suffix-array|greedy-match|str-as-long|scalar
 #   \--> dynamic programming brought only 3% smaller size. So drop it and gain another 2x compression speed.
@@ -28,7 +28,7 @@
 #   \--> bottom-up is *really* better in terms of compression factor than iterative with suffix array.
 #
 # 1.74783,10.7009,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-branch
-#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions) 
+#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions)
 #
 # 1.74783,9.8142,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-adaptive
 #   \--> adaptive use of encoding kernels gives compression speed a small bump
@@ -39,4 +39,4 @@
 # optimized construction refers to the combination of three changes:
 # - reducing the amount of bottom-up passes from 10 to 5 (less learning time, but.. slighty worsens CR)
 # - looking at subsamples in early rounds (increasing the sample as the rounds go up). Less compression work.
-# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0 
+# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0
--- be/src/fsst/paper/kernels.sh.orig
+++ be/src/fsst/paper/kernels.sh
@@ -1,15 +1,15 @@
 #/bin/bash
 PARAMS='simd1 simd2 simd3 simd4 adaptive'
-(echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
-echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
-echo "\\\\"
-echo "\\hline"
-echo "\\hline"
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment 
- do 
-   for m in $PARAMS
-   do
-     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
-   done
-   echo $i
- done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//') | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/' 
+(
+    echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
+    echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
+    echo "\\\\"
+    echo "\\hline"
+    echo "\\hline"
+    (for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+        for m in $PARAMS; do
+            (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
+        done
+        echo $i
+    done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//'
+) | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/'
be/src/fsst/paper/lz4-smallblocks.sh:8:17: not a valid arithmetic operator: f
--- be/src/fsst/paper/sorted.sh.orig
+++ be/src/fsst/paper/sorted.sh
@@ -6,17 +6,15 @@
 rm -rf .sorted 2>/dev/null
 mkdir .sorted
 cd dbtext
-for i in * 
-do 
-  sort $i > ../.sorted/$i; 
+for i in *; do
+    sort $i >../.sorted/$i
 done
 cp chinese japanese faust hamlet ../.sorted/
 cd ..
 
 # note sizes, display stats
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment
- do 
-  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
-  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
- done) | 
-awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+    ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
+    ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
+done) |
+    awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename

github-actions · 2023-09-12T08:40:03Z

clang-tidy review says "All clean, LGTM! 👍"

SkyFan2002 · 2023-09-12T09:18:05Z

run buildall

github-actions · 2023-09-13T02:59:33Z

`sh-checker report`

To get the full details, please check in the job output.

shellcheck errors


'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In be/src/fsst/paper/compare.sh line 4:
  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
  ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
           ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                         ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                 ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                                          ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  fgrep "${i}" "$1" | fgrep -v "${i}"2 | fgrep -v "${i}"pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'


In be/src/fsst/paper/evolution.sh line 7:
(for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
                                     ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                     ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-strncmp "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'


In be/src/fsst/paper/evolution.sh line 8:
(for i in dbtext/*; do (./cw $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'
                             ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                             ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'


In be/src/fsst/paper/evolution.sh line 9:
(for i in dbtext/*; do (./cw-greedy $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'
                                    ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-greedy "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 10:
(for i in dbtext/*; do (./vcw $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                         ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./vcw "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 11:
(for i in dbtext/*; do (./hcw $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                       ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'


In be/src/fsst/paper/evolution.sh line 13:
(for i in dbtext/*; do (./hcw-opt $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                           ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'


In be/src/fsst/paper/evolution.sh line 14:
(for i in dbtext/*; do (./hcw-opt $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'


In be/src/fsst/paper/kernels.sh line 1:
#/bin/bash
 ^-- SC1113 (error): Use #!, not just #, for the shebang.


In be/src/fsst/paper/kernels.sh line 4:
echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
     ^-----^ SC2086 (info): Double quote to prevent globbing and word splitting.
     ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
echo "${PARAMS}" | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'


In be/src/fsst/paper/kernels.sh line 5:
echo "\\\\"
     ^----^ SC2028 (info): echo may not expand escape sequences. Use printf.


In be/src/fsst/paper/kernels.sh line 10:
   for m in $PARAMS
            ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   for m in ${PARAMS}


In be/src/fsst/paper/kernels.sh line 12:
     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
                       ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                               ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                               ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
     (./hcw-opt dbtext/"${i}" 511 -"${m}" 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'


In be/src/fsst/paper/kernels.sh line 14:
   echo $i
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   echo "${i}"


In be/src/fsst/paper/lz4-smallblocks.sh line 3:
dd if=$1 of=tmpsplit.out bs=$maxsize count=1 2> /dev/null
      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                            ^------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                            ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
dd if="$1" of=tmpsplit.out bs="${maxsize}" count=1 2> /dev/null


In be/src/fsst/paper/lz4-smallblocks.sh line 5:
    mkdir tmpsplit$blocksize
                  ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                  ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    mkdir tmpsplit"${blocksize}"


In be/src/fsst/paper/lz4-smallblocks.sh line 6:
    split -b $blocksize tmpsplit.out tmpsplit$blocksize/x
             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    split -b "${blocksize}" tmpsplit.out tmpsplit"${blocksize}"/x


In be/src/fsst/paper/lz4-smallblocks.sh line 7:
    echo -n $blocksize ""
            ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo -n "${blocksize}" ""


In be/src/fsst/paper/lz4-smallblocks.sh line 8:
    size=$((for f in tmpsplit$blocksize/x*; do lz4 -c $f | wc -c; done) | awk '{s+=$1} END {print s}')
         ^-- SC1102 (error): Shells disambiguate $(( differently or not at all. For $(command substitution), add space after $( . For $((arithmetics)), fix parsing errors.
                             ^--------^ SC2231 (info): Quote expansions in this for loop glob to prevent wordsplitting, e.g. "$dir"/*.txt .
                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                                      ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    size=$((for f in tmpsplit${blocksize}/x*; do lz4 -c "${f}" | wc -c; done) | awk '{s+=$1} END {print s}')


In be/src/fsst/paper/lz4-smallblocks.sh line 9:
    echo "$maxsize / $size" | bc -l
          ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                     ^---^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "${maxsize} / ${size}" | bc -l


In be/src/fsst/paper/lz4-smallblocks.sh line 10:
    rm -rf tmpsplit$blocksize/
                   ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                   ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    rm -rf tmpsplit"${blocksize}"/


In be/src/fsst/paper/sorted.sh line 8:
cd dbtext
^-------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd dbtext || exit


In be/src/fsst/paper/sorted.sh line 11:
  sort $i > ../.sorted/$i; 
       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  sort "${i}" > ../.sorted/"${i}"; 


In be/src/fsst/paper/sorted.sh line 14:
cd ..
^---^ SC2103 (info): Use a ( subshell ) to avoid having to cd back.


In be/src/fsst/paper/sorted.sh line 19:
  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
                                   ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                   ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 dbtext/"${i}" | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'


In be/src/fsst/paper/sorted.sh line 20:
  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
                                    ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 .sorted/"${i}" | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'

For more information:
  https://www.shellcheck.net/wiki/SC1102 -- Shells disambiguate $(( different...
  https://www.shellcheck.net/wiki/SC1113 -- Use #!, not just #, for the sheba...
  https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... |...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.

shfmt errors


'shfmt ' returned error 1 finding the following formatting issues:

----------
--- be/src/fsst/paper/compare.sh.orig
+++ be/src/fsst/paper/compare.sh
@@ -1,5 +1,4 @@
 #!/bin/bash
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment 
- do
-  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
- done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment; do
+    fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
+done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
--- be/src/fsst/paper/evolution.sh.orig
+++ be/src/fsst/paper/evolution.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # output format: STCB CCB CR
 # STCB: symbol table construction cost in cycles-per-compressed byte (constructing a new ST per 8MB text)
-# CCB:  compression speed cycles-per-compressed byte 
+# CCB:  compression speed cycles-per-compressed byte
 # CR:   compression (=size reduction) factor achieved
 
 (for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
@@ -16,10 +16,10 @@
 # on Intel SKX CPUs| the results look like:
 #
 # 75.117,160.11,1.97194 iterative|suffix-array|dynp-matching|strncmp|scalar
-#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU) 
+#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU)
 #
 # 73.6948,81.6404,1.97194 iterative|suffix-array|dynp-matching|str-as-long|scalar
-#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x 
+#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x
 #
 # 74.4996,37.457,1.94764 iterative|suffix-array|greedy-match|str-as-long|scalar
 #   \--> dynamic programming brought only 3% smaller size. So drop it and gain another 2x compression speed.
@@ -28,7 +28,7 @@
 #   \--> bottom-up is *really* better in terms of compression factor than iterative with suffix array.
 #
 # 1.74783,10.7009,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-branch
-#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions) 
+#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions)
 #
 # 1.74783,9.8142,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-adaptive
 #   \--> adaptive use of encoding kernels gives compression speed a small bump
@@ -39,4 +39,4 @@
 # optimized construction refers to the combination of three changes:
 # - reducing the amount of bottom-up passes from 10 to 5 (less learning time, but.. slighty worsens CR)
 # - looking at subsamples in early rounds (increasing the sample as the rounds go up). Less compression work.
-# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0 
+# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0
--- be/src/fsst/paper/kernels.sh.orig
+++ be/src/fsst/paper/kernels.sh
@@ -1,15 +1,15 @@
 #/bin/bash
 PARAMS='simd1 simd2 simd3 simd4 adaptive'
-(echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
-echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
-echo "\\\\"
-echo "\\hline"
-echo "\\hline"
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment 
- do 
-   for m in $PARAMS
-   do
-     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
-   done
-   echo $i
- done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//') | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/' 
+(
+    echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
+    echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
+    echo "\\\\"
+    echo "\\hline"
+    echo "\\hline"
+    (for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+        for m in $PARAMS; do
+            (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
+        done
+        echo $i
+    done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//'
+) | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/'
be/src/fsst/paper/lz4-smallblocks.sh:8:17: not a valid arithmetic operator: f
--- be/src/fsst/paper/sorted.sh.orig
+++ be/src/fsst/paper/sorted.sh
@@ -6,17 +6,15 @@
 rm -rf .sorted 2>/dev/null
 mkdir .sorted
 cd dbtext
-for i in * 
-do 
-  sort $i > ../.sorted/$i; 
+for i in *; do
+    sort $i >../.sorted/$i
 done
 cp chinese japanese faust hamlet ../.sorted/
 cd ..
 
 # note sizes, display stats
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment
- do 
-  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
-  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
- done) | 
-awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+    ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
+    ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
+done) |
+    awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename
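
The one line in the shfmt output that is not a diff, `be/src/fsst/paper/lz4-smallblocks.sh:8:17: not a valid arithmetic operator: f`, is a parse failure rather than a formatting issue: `$((for ...` is read as the start of an arithmetic expansion, which is what shellcheck reports as SC1102. Running `shfmt -w` will not fix it. A minimal sketch of one way to remove the ambiguity is to drop the inner subshell so the line starts with a plain `$(`. The function name `total_bytes` is hypothetical, and `cat` stands in for `lz4 -c` so the sketch runs without lz4 installed.

```shell
# Hypothetical helper: sum the byte counts of every file x* in a directory.
# In lz4-smallblocks.sh the inner command would be `lz4 -c "${f}"`.
total_bytes() {
    dir="$1"
    # "$(" followed by a command (not a second "(") is unambiguously a
    # command substitution, so no arithmetic parsing is attempted.
    size=$(for f in "${dir}"/x*; do cat "${f}" | wc -c; done | awk '{s+=$1} END {print s}')
    echo "${size}"
}
```

Writing `$( (for ...; done) | awk ...)` with a space after `$(` would also parse, as the SC1102 message suggests; the version above simply avoids the extra subshell.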

github-actions · 2023-09-13T03:06:08Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-09-14T00:48:18Z

`sh-checker report`

To get the full details, please check in the job output.

shellcheck errors


'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In be/src/fsst/paper/compare.sh line 4:
  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
  ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
           ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                         ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                 ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.
                                          ^--^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  fgrep "${i}" "$1" | fgrep -v "${i}"2 | fgrep -v "${i}"pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'


In be/src/fsst/paper/evolution.sh line 7:
(for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
                                     ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                     ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-strncmp "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'


In be/src/fsst/paper/evolution.sh line 8:
(for i in dbtext/*; do (./cw $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'
                             ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                             ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|str-as-long|scalar"}'


In be/src/fsst/paper/evolution.sh line 9:
(for i in dbtext/*; do (./cw-greedy $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'
                                    ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
(for i in dbtext/*; do (./cw-greedy "${i}" 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 10:
(for i in dbtext/*; do (./vcw $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                         ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./vcw "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|binary-search|greedy-match|str-as-long|scalar" }'


In be/src/fsst/paper/evolution.sh line 11:
(for i in dbtext/*; do (./hcw $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'
                              ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                              ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                       ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|branch-scalar" }'


In be/src/fsst/paper/evolution.sh line 13:
(for i in dbtext/*; do (./hcw-opt $i 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                           ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 511 -adaptive 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|adaptive-scalar|optimized-construction" }'


In be/src/fsst/paper/evolution.sh line 14:
(for i in dbtext/*; do (./hcw-opt $i 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'
                                  ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                  ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^---^ SC2197 (info): fgrep is non-standard and deprecated. Use grep -F instead.

Did you mean: 
(for i in dbtext/*; do (./hcw-opt "${i}" 2>&1) | fgrep -v target | awk '{ l++; if (l==2) t=$2; if (l==4) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " bottom-up|lossy-hash|greedy-match|str-as-long|avx512|optimized-construction" }'


In be/src/fsst/paper/kernels.sh line 1:
#/bin/bash
 ^-- SC1113 (error): Use #!, not just #, for the shebang.


In be/src/fsst/paper/kernels.sh line 4:
echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
     ^-----^ SC2086 (info): Double quote to prevent globbing and word splitting.
     ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
echo "${PARAMS}" | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'


In be/src/fsst/paper/kernels.sh line 5:
echo "\\\\"
     ^----^ SC2028 (info): echo may not expand escape sequences. Use printf.


In be/src/fsst/paper/kernels.sh line 10:
   for m in $PARAMS
            ^-----^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   for m in ${PARAMS}


In be/src/fsst/paper/kernels.sh line 12:
     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
                       ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                               ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                               ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
     (./hcw-opt dbtext/"${i}" 511 -"${m}" 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'


In be/src/fsst/paper/kernels.sh line 14:
   echo $i
        ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
        ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
   echo "${i}"


In be/src/fsst/paper/lz4-smallblocks.sh line 3:
dd if=$1 of=tmpsplit.out bs=$maxsize count=1 2> /dev/null
      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                            ^------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                            ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
dd if="$1" of=tmpsplit.out bs="${maxsize}" count=1 2> /dev/null


In be/src/fsst/paper/lz4-smallblocks.sh line 5:
    mkdir tmpsplit$blocksize
                  ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                  ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    mkdir tmpsplit"${blocksize}"


In be/src/fsst/paper/lz4-smallblocks.sh line 6:
    split -b $blocksize tmpsplit.out tmpsplit$blocksize/x
             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                             ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    split -b "${blocksize}" tmpsplit.out tmpsplit"${blocksize}"/x


In be/src/fsst/paper/lz4-smallblocks.sh line 7:
    echo -n $blocksize ""
            ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo -n "${blocksize}" ""


In be/src/fsst/paper/lz4-smallblocks.sh line 8:
    size=$((for f in tmpsplit$blocksize/x*; do lz4 -c $f | wc -c; done) | awk '{s+=$1} END {print s}')
         ^-- SC1102 (error): Shells disambiguate $(( differently or not at all. For $(command substitution), add space after $( . For $((arithmetics)), fix parsing errors.
                             ^--------^ SC2231 (info): Quote expansions in this for loop glob to prevent wordsplitting, e.g. "$dir"/*.txt .
                             ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                      ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                                                      ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    size=$((for f in tmpsplit${blocksize}/x*; do lz4 -c "${f}" | wc -c; done) | awk '{s+=$1} END {print s}')


In be/src/fsst/paper/lz4-smallblocks.sh line 9:
    echo "$maxsize / $size" | bc -l
          ^------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                     ^---^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "${maxsize} / ${size}" | bc -l


In be/src/fsst/paper/lz4-smallblocks.sh line 10:
    rm -rf tmpsplit$blocksize/
                   ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                   ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    rm -rf tmpsplit"${blocksize}"/


In be/src/fsst/paper/sorted.sh line 8:
cd dbtext
^-------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd dbtext || exit


In be/src/fsst/paper/sorted.sh line 11:
  sort $i > ../.sorted/$i; 
       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
                       ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  sort "${i}" > ../.sorted/"${i}"; 


In be/src/fsst/paper/sorted.sh line 14:
cd ..
^---^ SC2103 (info): Use a ( subshell ) to avoid having to cd back.


In be/src/fsst/paper/sorted.sh line 19:
  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
                                   ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                   ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 dbtext/"${i}" | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'


In be/src/fsst/paper/sorted.sh line 20:
  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
                                    ^-- SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                    ^-- SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  ./filtertest compare 1000 .sorted/"${i}" | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'

For more information:
  https://www.shellcheck.net/wiki/SC1102 -- Shells disambiguate $(( different...
  https://www.shellcheck.net/wiki/SC1113 -- Use #!, not just #, for the sheba...
  https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... |...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
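
For the two `cd` findings in `sorted.sh` (SC2164 and SC2103), a subshell can handle both at once: the directory change is confined to the subshell, and `|| exit` stops it if the directory is missing. The sketch below is an assumption about how the loop could be restructured, not a fix taken from this PR; `sort_into` and its arguments are hypothetical names.

```shell
# Hypothetical helper: write a sorted copy of every file in src into dst.
sort_into() {
    src="$1"
    dst="$2"
    mkdir -p "${dst}"
    # Resolve dst to an absolute path before changing directory.
    dst_abs=$(cd "${dst}" && pwd) || return 1
    (
        # The cd only affects this subshell, so no "cd .." is needed afterwards.
        cd "${src}" || exit 1
        for i in *; do
            sort "${i}" >"${dst_abs}/${i}"
        done
    )
}
```

After `sort_into dbtext .sorted` returns, the caller is still in its original working directory whether or not the `cd` succeeded.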

shfmt errors


'shfmt ' returned error 1 finding the following formatting issues:

----------
--- be/src/fsst/paper/compare.sh.orig
+++ be/src/fsst/paper/compare.sh
@@ -1,5 +1,4 @@
 #!/bin/bash
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment 
- do
-  fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
- done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_commen ps_comment; do
+    fgrep $i $1 | fgrep -v ${i}2 | fgrep -v ${i}pedia | awk '{ printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", $1, $7, $2, $8, $3, $11, $6}'
+done) | awk '{print$0;k++;for(i=2;i<=NF;i++) r[i]+=$i;}END{printf "% 16s   %1.2f  %1.2f   % 8.2f   % 8.2f   % 8.2f   % 8.2f\n", "AVG",r[2]/k,r[3]/k,r[4]/k,r[5]/k,r[6]/k,r[7]/k,r[8]/k}'
--- be/src/fsst/paper/evolution.sh.orig
+++ be/src/fsst/paper/evolution.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # output format: STCB CCB CR
 # STCB: symbol table construction cost in cycles-per-compressed byte (constructing a new ST per 8MB text)
-# CCB:  compression speed cycles-per-compressed byte 
+# CCB:  compression speed cycles-per-compressed byte
 # CR:   compression (=size reduction) factor achieved
 
 (for i in dbtext/*; do (./cw-strncmp $i 2>&1) | awk '{ l++; if (l==3) t=$2; if (l==6) c=$2; d=$1}END{print t " " c " " d}'; done) | awk '{t+=$1;c+=$2;d+=$3;k++}END{ print (t/k) " " (c/k) " " d/k " iterative|suffix-array|dynp-matching|strncmp|scalar" }'
@@ -16,10 +16,10 @@
 # on Intel SKX CPUs| the results look like:
 #
 # 75.117,160.11,1.97194 iterative|suffix-array|dynp-matching|strncmp|scalar
-#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU) 
+#   \--> 160 cycles per byte produces a very slow compression speed (say ~20MB/s on a 3Ghz CPU)
 #
 # 73.6948,81.6404,1.97194 iterative|suffix-array|dynp-matching|str-as-long|scalar
-#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x 
+#   \--> str-as-long (i.e. FSST focusing on 8-byte word symbols) improves compression speed 2x
 #
 # 74.4996,37.457,1.94764 iterative|suffix-array|greedy-match|str-as-long|scalar
 #   \--> dynamic programming brought only 3% smaller size. So drop it and gain another 2x compression speed.
@@ -28,7 +28,7 @@
 #   \--> bottom-up is *really* better in terms of compression factor than iterative with suffix array.
 #
 # 1.74783,10.7009,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-branch
-#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions) 
+#   \--> hashing significantly improves compression speed at only 5% size cost (due to hash collisions)
 #
 # 1.74783,9.8142,2.28103 bottom-up|lossy-hash|greedy-match|str-as-long|scalar-adaptive
 #   \--> adaptive use of encoding kernels gives compression speed a small bump
@@ -39,4 +39,4 @@
 # optimized construction refers to the combination of three changes:
 # - reducing the amount of bottom-up passes from 10 to 5 (less learning time, but.. slighty worsens CR)
 # - looking at subsamples in early rounds (increasing the sample as the rounds go up). Less compression work.
-# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0 
+# - splitting the counters for less cache pressure and aiding fast skipping over counts-of-0
--- be/src/fsst/paper/kernels.sh.orig
+++ be/src/fsst/paper/kernels.sh
@@ -1,15 +1,15 @@
 #/bin/bash
 PARAMS='simd1 simd2 simd3 simd4 adaptive'
-(echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
-echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
-echo "\\\\"
-echo "\\hline"
-echo "\\hline"
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment 
- do 
-   for m in $PARAMS
-   do
-     (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
-   done
-   echo $i
- done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//') | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/' 
+(
+    echo | awk '{ print "{\\begin{tabular}{|rrrr|r|l|}\n\\hline"}'
+    echo $PARAMS | awk "{for(i=1;i<=NF;i++) printf \"{\\\\footnotesize{X%d\$%s\$}}&\",i,\$i}" | sed 's/simd/simd_/g'
+    echo "\\\\"
+    echo "\\hline"
+    echo "\\hline"
+    (for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+        for m in $PARAMS; do
+            (./hcw-opt dbtext/$i 511 -$m 2>&1) | tail -2 | head -1 | awk '{ printf "%f ", $2 }'
+        done
+        echo $i
+    done) | awk '{for(i=1;i<NF;i++){r[i]+=$i;printf "{\\footnotesize{X%d%5.2f}}& ",i,$i}k++;printf "{\\footnotesize %s}\\\\\n",$NF}END{print "\\hline"; for(j=1;j<i;j++)printf "{\\footnotesize{X%d%5.2f}}& ",j,r[j]/k;print "{\\footnotesize average}\\\\\n\\hline\n\\end{tabular}}"}' | sed 's/_/\\_/g' | sed 's/[0-9]*-//'
+) | sed 's/X[38]/\\bf /g' | sed 's/X[1-9]//g' | sed 's/adaptive/scalar/'
be/src/fsst/paper/lz4-smallblocks.sh:8:17: not a valid arithmetic operator: f
--- be/src/fsst/paper/sorted.sh.orig
+++ be/src/fsst/paper/sorted.sh
@@ -6,17 +6,15 @@
 rm -rf .sorted 2>/dev/null
 mkdir .sorted
 cd dbtext
-for i in * 
-do 
-  sort $i > ../.sorted/$i; 
+for i in *; do
+    sort $i >../.sorted/$i
 done
 cp chinese japanese faust hamlet ../.sorted/
 cd ..
 
 # note sizes, display stats
-(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment
- do 
-  ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
-  ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
- done) | 
-awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
+(for i in hex yago email wiki uuid urls2 urls firstname lastname city credentials street movies faust hamlet chinese japanese wikipedia genome location c_name l_comment ps_comment; do
+    ./filtertest compare 1000 dbtext/$i | tail -1 | awk '{ printf "% 16s %1.2f %1.2f ",$1,$2,$7}'
+    ./filtertest compare 1000 .sorted/$i | tail -1 | awk '{ printf "%1.2f %1.2f\n",$2,$7}'
+done) |
+    awk '{ s1+=$2; s2+=$3; s3+=$4; s4+=$5; k++; print $0} END {printf "% 16s %1.2f% 1.2f %1.2f %1.2f\n", "avg",s1/k, s2/k, s3/k, s4/k}'
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename
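
The reformatted `kernels.sh` still emits its LaTeX row terminator with `echo "\\\\"`, which shellcheck flags as SC2028 because `echo` implementations differ in how they treat backslashes. A minimal sketch of the `printf` alternative that shellcheck recommends follows; the function name `latex_row_end` is hypothetical.

```shell
# Hypothetical helper: print the "\\", "\hline", "\hline" lines of the table header.
latex_row_end() {
    # With %s the arguments are printed literally, so single-quoted
    # backslashes reach the output unchanged. printf reuses the format
    # for each argument, giving one line per argument.
    printf '%s\n' '\\' '\hline' '\hline'
}
```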

github-actions · 2023-09-14T00:55:01Z

clang-tidy review says "All clean, LGTM! 👍"

add fsst encode

de0a8c6

Merge branch 'master' into fsst

1d13869

Merge branch 'master' into fsst

6b74b53

Merge branch 'master' into fsst

58a7a99

SkyFan2002 closed this Sep 22, 2023

SkyFan2002 wants to merge 4 commits into apache:master from SkyFan2002:fsst
