counsel-rg fails with \b #2795

Closed
JJPandari opened this issue Jan 27, 2021 · 8 comments
Labels
moreinfo (Waiting for author to provide more information), triage

Comments

@JJPandari

I'm trying to search for \bsource\b with counsel-rg. Without the \b anchors it works fine, but with them it fails with "ERROR CODE 2". The same search string works with counsel-ag.

I saw in other issues that ivy-prescient may be interfering. I tried adding counsel-rg to and removing it from ivy-prescient-sort-commands, but no luck.

counsel--async-last-command for counsel-rg:

("rg" "--with-filename" "--no-heading" "--line-number" "--color" "never" "--max-filesize" "1M" "--max-columns" "233" "--max-columns-preview" "-i" "(?:[\\﹨\](?:b[̣̱̇]|[bᵇḃḅḇⓑb𝐛𝑏𝒃𝒷𝓫𝔟𝕓𝖇𝖻𝗯𝘣𝙗𝚋])(?:s[̧̣̦́̂̇̌]|[sśŝşšſșˢṡṣṥṧṩẛₛⓢſts𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜])(?:o[̀-̄̆-̨̛̣̉̋̌̏̑]|[oºò-öōŏőơǒǫǭȍȏȫȭȯȱᵒṍṏṑṓọỏốồổỗộớờởỡợₒℴⓞo𝐨𝑜𝒐𝓸𝔬𝕠𝖔𝗈𝗼𝘰𝙤𝚘])(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]|[uù-üũūŭůűųưǔǖǘǚǜȕȗᵘᵤṳṵṷṹṻụủứừửữựⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞])(?:r[̧̣̱́̇̌̏̑]|[rŕŗřȑȓʳᵣṙṛṝṟⓡr𝐫𝑟𝒓𝓇𝓻𝔯𝕣𝖗𝗋𝗿𝘳𝙧𝚛])(?:c[̧́̂̇̌]|[cçćĉċčᶜḉⅽⓒc𝐜𝑐𝒄𝒸𝓬𝔠𝕔𝖈𝖼𝗰𝘤𝙘𝚌])(?:e[̀-̄̆-̧̨̣̭̰̉̌̏̑]|[eè-ëēĕėęěȅȇȩᵉḕḗḙḛḝẹẻẽếềểễệₑℯⅇⓔe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎])[\\﹨\](?:b[̣̱̇]|[bᵇḃḅḇⓑb𝐛𝑏𝒃𝒷𝓫𝔟𝕓𝖇𝖻𝗯𝘣𝙗𝚋])|\\bsource\\b|\\b\\\\\\w*\\W*\\bb\\w*\\W*\\bs\\w*\\W*\\bo\\w*\\W*\\bu\\w*\\W*\\br\\w*\\W*\\bc\\w*\\W*\\be\\w*\\W*\\b\\\\\\w*\\W*\\bb\\w*)")

for counsel-ag:

ag --vimgrep   -i  \(\?\:\[\\\﹨\\\]\(\?\:b\[\̇\̣\̱\]\|\[b\ᵇ\ḃ\ḅ\ḇ\ⓑ\b\𝐛\𝑏\𝒃\𝒷\𝓫\𝔟\𝕓\𝖇\𝖻\𝗯\𝘣\𝙗\𝚋\]\)\(\?\:s\[\́\̂\̇\̌\̣\̦\̧\]\|\[s\ś\ŝ\ş\š\ſ\ș\ˢ\ṡ\ṣ\ṥ\ṧ\ṩ\ẛ\ₛ\ⓢ\ſt\s\𝐬\𝑠\𝒔\𝓈\𝓼\𝔰\𝕤\𝖘\𝗌\𝘀\𝘴\𝙨\𝚜\]\)\(\?\:o\[\̀-\̄\̆-\̉\̋\̌\̏\̑\̛\̣\̨\]\|\[o\º\ò-\ö\ō\ŏ\ő\ơ\ǒ\ǫ\ǭ\ȍ\ȏ\ȫ\ȭ\ȯ\ȱ\ᵒ\ṍ\ṏ\ṑ\ṓ\ọ\ỏ\ố\ồ\ổ\ỗ\ộ\ớ\ờ\ở\ỡ\ợ\ₒ\ℴ\ⓞ\o\𝐨\𝑜\𝒐\𝓸\𝔬\𝕠\𝖔\𝗈\𝗼\𝘰\𝙤\𝚘\]\)\(\?\:u\[\̀-\̄\̆\̈-\̌\̏\̑\̛\̣\̤\̨\̭\̰\]\|\[u\ù-\ü\ũ\ū\ŭ\ů\ű\ų\ư\ǔ\ǖ\ǘ\ǚ\ǜ\ȕ\ȗ\ᵘ\ᵤ\ṳ\ṵ\ṷ\ṹ\ṻ\ụ\ủ\ứ\ừ\ử\ữ\ự\ⓤ\u\𝐮\𝑢\𝒖\𝓊\𝓾\𝔲\𝕦\𝖚\𝗎\𝘂\𝘶\𝙪\𝚞\]\)\(\?\:r\[\́\̇\̌\̏\̑\̣\̧\̱\]\|\[r\ŕ\ŗ\ř\ȑ\ȓ\ʳ\ᵣ\ṙ\ṛ\ṝ\ṟ\ⓡ\r\𝐫\𝑟\𝒓\𝓇\𝓻\𝔯\𝕣\𝖗\𝗋\𝗿\𝘳\𝙧\𝚛\]\)\(\?\:c\[\́\̂\̇\̌\̧\]\|\[c\ç\ć\ĉ\ċ\č\ᶜ\ḉ\ⅽ\ⓒ\c\𝐜\𝑐\𝒄\𝒸\𝓬\𝔠\𝕔\𝖈\𝖼\𝗰\𝘤\𝙘\𝚌\]\)\(\?\:e\[\̀-\̄\̆-\̉\̌\̏\̑\̣\̧\̨\̭\̰\]\|\[e\è-\ë\ē\ĕ\ė\ę\ě\ȅ\ȇ\ȩ\ᵉ\ḕ\ḗ\ḙ\ḛ\ḝ\ẹ\ẻ\ẽ\ế\ề\ể\ễ\ệ\ₑ\ℯ\ⅇ\ⓔ\e\𝐞\𝑒\𝒆\𝓮\𝔢\𝕖\𝖊\𝖾\𝗲\𝘦\𝙚\𝚎\]\)\[\\\﹨\\\]\(\?\:b\[\̇\̣\̱\]\|\[b\ᵇ\ḃ\ḅ\ḇ\ⓑ\b\𝐛\𝑏\𝒃\𝒷\𝓫\𝔟\𝕓\𝖇\𝖻\𝗯\𝘣\𝙗\𝚋\]\)\|\\bsource\\b\|\\b\\\\\\w\*\\W\*\\bb\\w\*\\W\*\\bs\\w\*\\W\*\\bo\\w\*\\W\*\\bu\\w\*\\W\*\\br\\w\*\\W\*\\bc\\w\*\\W\*\\be\\w\*\\W\*\\b\\\\\\w\*\\W\*\\bb\\w\*\)

rg --version:

ripgrep 11.0.1
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

ag --version:

ag version 2.2.0

Features:
  +jit +lzma +zlib

OS: macOS Catalina 10.15.7

@basil-conto
Collaborator

WFM:

  1. make plain
  2. C-c k \bfoo\b
  3. C-g
  4. counsel--async-last-command C-j
("rg" "-M" "240" "--with-filename" "--no-heading" "--line-number" "--color" "never" "-i" "\\bfoo\\b")

This is with latest Ivy/Counsel with the following Emacs versions:

In GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo version 1.16.0, Xaw3d scroll bars)
 of 2021-01-27 built on tia
Repository revision: 08574a7f40f27ad29efb8f7d975012ecc9111717
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12010000
System Description: Debian GNU/Linux bullseye/sid

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O2 -march=native' --config-cache
 --prefix=/home/blc/.local --enable-checking=structs
 --with-x-toolkit=lucid --with-file-notification=yes --with-x'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11
XAW3D XDBE XIM XPM LUCID ZLIB

Important settings:
  value of $LANG: en_IE.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

In GNU Emacs 27.1.90 (build 7, x86_64-pc-linux-gnu, X toolkit, cairo version 1.16.0, Xaw3d scroll bars)
 of 2021-01-24 built on tia
Repository revision: 809503431d47afb9d20e8463853298a595e1fcb2
Repository branch: emacs-27
Windowing system distributor 'The X.Org Foundation', version 11.0.12010000
System Description: Debian GNU/Linux bullseye/sid

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4'
 --config-cache --prefix=/home/blc/.local --program-suffix=-27
 --enable-checking=yes,glyphs --enable-check-lisp-object-type
 --with-x-toolkit=lucid --with-file-notification=yes --with-x
 --with-cairo'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM DBUS GSETTINGS GLIB
NOTIFY INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT
LIBOTF ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS
LIBSYSTEMD JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: en_IE.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

> I'm trying to search for \bsource\b with counsel-rg. Without the \b anchors it works fine, but with them it fails with "ERROR CODE 2". The same search string works with counsel-ag.

Does doubling the backslash have any effect?

> counsel--async-last-command for counsel-rg:

Clearly something is applying char-fold-to-regexp or something like that to the search string; see C-h v search-default-mode RET and (info "(emacs) Lax Search").

Can you try reproducing the issue from emacs -Q or make plain and see if you can narrow down which part of your user-init-file gives rise to it e.g. by commenting parts out?
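
As a rough illustration of what char folding does to a search string, the sketch below expands a plain word with `char-fold-to-regexp`. It assumes Emacs 27 or later, where the `char-fold` library goes by that name; the variable names are only for illustration.

```emacs-lisp
;; Sketch: compare a plain pattern with its char-folded expansion.
;; Assumes Emacs 27+, where the library is named `char-fold'.
(require 'char-fold)

(let* ((plain "source")
       (folded (char-fold-to-regexp plain)))
  ;; The folded form is an Emacs regexp that also matches accented and
  ;; stylized variants of each letter, so it is much longer than the input.
  (message "plain:  %s (%d chars)" plain (length plain))
  (message "folded: %d chars" (length folded))
  ;; Inside Emacs the folded regexp still matches the original text.
  (string-match-p folded "source"))
```

A pattern of this shape in counsel--async-last-command is a hint that some package is folding the input before Counsel sees it.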

@basil-conto added the moreinfo and triage labels on Jan 27, 2021
@JJPandari
Author

It turns out prescient is applying char-fold-to-regexp to the search string, and rg rejects the resulting char-folded pattern as invalid, hence the bug.

More precisely, when I search for use\b with prescient/ivy-prescient enabled, I get this query (entered in a shell by replacing \\ with \ in counsel--async-last-command, am I right?):

rg --with-filename --no-heading --line-number --color never --max-filesize 1M --max-columns 233 --max-columns-preview   -i  '\(\?\:\(\?\:u\[\̀-\̄\̆\̈-\̌\̏\̑\̛\̣\̤\̨\̭\̰\]\|\[u\ù-\ü\ũ\ū\ŭ\ů\ű\ų\ư\ǔ\ǖ\ǘ\ǚ\ǜ\ȕ\ȗ\ᵘ\ᵤ\ṳ\ṵ\ṷ\ṹ\ṻ\ụ\ủ\ứ\ừ\ử\ữ\ự\ⓤ\u\𝐮\𝑢\𝒖\𝓊\𝓾\𝔲\𝕦\𝖚\𝗎\𝘂\𝘶\𝙪\𝚞\]\)\(\?\:s\[\́\̂\̇\̌\̣\̦\̧\]\|\[s\ś\ŝ\ş\š\ſ\ș\ˢ\ṡ\ṣ\ṥ\ṧ\ṩ\ẛ\ₛ\ⓢ\ſt\s\𝐬\𝑠\𝒔\𝓈\𝓼\𝔰\𝕤\𝖘\𝗌\𝘀\𝘴\𝙨\𝚜\]\)\(\?\:e\[\̀-\̄\̆-\̉\̌\̏\̑\̣\̧\̨\̭\̰\]\|\[e\è-\ë\ē\ĕ\ė\ę\ě\ȅ\ȇ\ȩ\ᵉ\ḕ\ḗ\ḙ\ḛ\ḝ\ẹ\ẻ\ẽ\ế\ề\ể\ễ\ệ\ₑ\ℯ\ⅇ\ⓔ\e\𝐞\𝑒\𝒆\𝓮\𝔢\𝕖\𝖊\𝖾\𝗲\𝘦\𝙚\𝚎\]\)\[\\\﹨\\\]\(\?\:b\[\̇\̣\̱\]\|\[b\ᵇ\ḃ\ḅ\ḇ\ⓑ\b\𝐛\𝑏\𝒃\𝒷\𝓫\𝔟\𝕓\𝖇\𝖻\𝗯\𝘣\𝙗\𝚋\]\)\|use\\b\|\\bu\\w\*\\W\*\\bs\\w\*\\W\*\\be\\w\*\\W\*\\b\\\\\\w\*\\W\*\\bb\\w\*\)'

which returns this error:

regex parse error:
    \(\?\:\(\?\:u\[\̀-\̄\̆\̈-\̌\̏\̑\̛\̣\̤\̨\̭\̰\]\|\[u\ù-\ü\ũ\ū\ŭ\ů\ű\ų\ư\ǔ\ǖ\ǘ\ǚ\ǜ\ȕ\ȗ\ᵘ\ᵤ\ṳ\ṵ\ṷ\ṹ\ṻ\ụ\ủ\ứ\ừ\ử\ữ\ự\ⓤ\u\𝐮\𝑢\𝒖\𝓊\𝓾\𝔲\𝕦\𝖚\𝗎\𝘂\𝘶\𝙪\𝚞\]\)\(\?\:s\[\́\̂\̇\̌\̣\̦\̧\]\|\[s\ś\ŝ\ş\š\ſ\ș\ˢ\ṡ\ṣ\ṥ\ṧ\ṩ\ẛ\ₛ\ⓢ\ſt\s\𝐬\𝑠\𝒔\𝓈\𝓼\𝔰\𝕤\𝖘\𝗌\𝘀\𝘴\𝙨\𝚜\]\)\(\?\:e\[\̀-\̄\̆-\̉\̌\̏\̑\̣\̧\̨\̭\̰\]\|\[e\è-\ë\ē\ĕ\ė\ę\ě\ȅ\ȇ\ȩ\ᵉ\ḕ\ḗ\ḙ\ḛ\ḝ\ẹ\ẻ\ẽ\ế\ề\ể\ễ\ệ\ₑ\ℯ\ⅇ\ⓔ\e\𝐞\𝑒\𝒆\𝓮\𝔢\𝕖\𝖊\𝖾\𝗲\𝘦\𝙚\𝚎\]\)\[\\\﹨\\\]\(\?\:b\[\̇\̣\̱\]\|\[b\ᵇ\ḃ\ḅ\ḇ\ⓑ\b\𝐛\𝑏\𝒃\𝒷\𝓫\𝔟\𝕓\𝖇\𝖻\𝗯\𝘣\𝙗\𝚋\]\)\|use\\b\|\\bu\\w\*\\W\*\\bs\\w\*\\W\*\\be\\w\*\\W\*\\b\\\\\\w\*\\W\*\\bb\\w\*\)
        ^^
error: unrecognized escape sequence

So ultimately this seems to be an incompatibility between char-fold-to-regexp and rg.


I found some similar issues, but they are either already fixed or have a different cause:

radian-software/prescient.el#19
doomemacs/doomemacs#3038

BTW, @basil-conto, you're amazing for responding to the issue within a day!

@basil-conto
Collaborator

> entered in a shell by replacing \\ with \ in counsel--async-last-command, am I right?

What do you mean? By default counsel-rg calls rg directly, not via the shell. This is controlled by the type of the user option counsel-rg-base-command. This means shell quoting/escaping is not needed.
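
To make the list-versus-string distinction concrete, here is a minimal configuration sketch. The argument list mirrors the command shown earlier in this thread plus a `%s` placeholder for the pattern; treat the exact flags as illustrative rather than as the shipped default.

```emacs-lisp
;; List form: each element is handed to rg as a separate argument,
;; with no shell in between, so no shell quoting is involved.
(setq counsel-rg-base-command
      '("rg" "-M" "240" "--with-filename" "--no-heading"
        "--line-number" "--color" "never" "%s"))

;; String form (shown commented out): the command line goes through
;; the shell, so the pattern must be shell-quoted before it replaces %s.
;; (setq counsel-rg-base-command
;;       "rg -M 240 --with-filename --no-heading --line-number --color never %s")
```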

> So ultimately this seems to be an incompatibility between char-fold-to-regexp and rg.

Maybe, but that's not what the error is saying. In your examples the whole regexp is quoted/escaped for use in the shell, i.e. \(?:...\) becomes \(\?\:...\). Whoever's responsible for that is wrong, but even if it weren't happening, there is indeed no guarantee that rg will understand Elisp regexps produced by char-fold-to-regexp.
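
For reference, this kind of escaping is what `shell-quote-argument` produces on a POSIX system. The snippet below only illustrates that behaviour on a small made-up pattern; whether that function is what produced the escaped command in this thread is an assumption.

```emacs-lisp
;; On POSIX systems `shell-quote-argument' puts a backslash before every
;; character outside a small safe set, which includes ( ? : | and ).
;; The sample pattern is hypothetical.
(let ((pcre "(?:foo|bar)"))
  ;; Prints: \(\?\:foo\|bar\)
  (message "%s" (shell-quote-argument pcre)))
```

Passed to rg without a shell to undo the quoting, such a string is parsed as a regexp full of escaped literals, and an escape like `\:` may be rejected outright.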

Could you try tracing whether counsel--elisp-to-pcre gets called? If not, I wonder whether the output of char-fold-to-regexp could be transformed with it to something that rg understands.
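
One way to do this tracing is with the built-in `trace` library, sketched below. It assumes Counsel is already loaded so that the function is defined.

```emacs-lisp
(require 'trace)

;; Log every call to the converter, with its arguments and return value,
;; in the *trace-output* buffer.
(trace-function 'counsel--elisp-to-pcre)

;; Now run counsel-rg, type the query, and inspect *trace-output*.
;; Afterwards, evaluate the following form to remove the tracing:
;; (untrace-function 'counsel--elisp-to-pcre)
```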

@JJPandari
Author

JJPandari commented Feb 1, 2021

counsel--elisp-to-pcre is called:

======================================================================
1 -> (counsel--elisp-to-pcre (("\\(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]\\|[uù-üũūŭůűųưǔǖǘǚǜȕȗᵘᵤṳṵṷṹṻụủứừửữựⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞]\\)\\(?:s[̧̣̦́̂̇̌]\\|[sśŝşšſșˢṡṣṥṧṩẛₛⓢſts𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜]\\)\\(?:e[̀-̄̆-̧̨̣̭̰̉̌̏̑]\\|[eè-ëēĕėęěȅȇȩᵉḕḗḙḛḝẹẻẽếềểễệₑℯⅇⓔe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎]\\)\\|use\\|\\bu\\w*\\W*\\bs\\w*\\W*\\be\\w*" . t)) "--pcre2")
| 2 -> (counsel--elisp-to-pcre "\\(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]\\|[uù-üũūŭůűųưǔǖǘǚǜȕȗᵘᵤṳṵṷṹṻụủứừửữựⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞]\\)\\(?:s[̧̣̦́̂̇̌]\\|[sśŝşšſșˢṡṣṥṧṩẛₛⓢſts𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜]\\)\\(?:e[̀-̄̆-̧̨̣̭̰̉̌̏̑]\\|[eè-ëēĕėęěȅȇȩᵉḕḗḙḛḝẹẻẽếềểễệₑℯⅇⓔe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎]\\)\\|use\\|\\bu\\w*\\W*\\bs\\w*\\W*\\be\\w*")
| 2 <- counsel--elisp-to-pcre: "(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]|[uù-üũūŭůűųưǔǖǘǚǜȕȗᵘᵤṳṵṷṹṻụủứừửữựⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞])(?:s[̧̣̦́̂̇̌]|[sśŝşšſșˢṡṣṥṧṩẛₛⓢſts𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜])(?:e[̀-̄̆-̧̨̣̭̰̉̌̏̑]|[eè-ëēĕėęěȅȇȩᵉḕḗḙḛḝẹẻẽếềểễệₑℯⅇⓔe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎])|use|\\bu\\w*\\W*\\bs\\w*\\W*\\be\\w*"
1 <- counsel--elisp-to-pcre: "(?:(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]|[uù-üũūŭůűųưǔǖǘǚǜȕȗᵘᵤṳṵṷṹṻụủứừửữựⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞])(?:s[̧̣̦́̂̇̌]|[sśŝşšſșˢṡṣṥṧṩẛₛⓢſts𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜])(?:e[̀-̄̆-̧̨̣̭̰̉̌̏̑]|[eè-ëēĕėęěȅȇȩᵉḕḗḙḛḝẹẻẽếềểễệₑℯⅇⓔe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎])|use|\\bu\\w*\\W*\\bs\\w*\\W*\\be\\w*)"

without loading prescient and ivy-prescient:

======================================================================
1 -> (counsel--elisp-to-pcre "use" "--pcre2")
1 <- counsel--elisp-to-pcre: "use"

@basil-conto
Collaborator

Thanks. So is there anything Counsel can do here, or can this issue be closed?

@JJPandari
Author

Closing for now.

It looks like counsel--elisp-to-pcre may be mishandling the char-folded regexp, but I'm not at all sure what happens with all these \\\\ and \\ sequences. Even if counsel--elisp-to-pcre gets a few corner cases wrong, the better way to solve or avoid my issue would be to make char-folding optional in prescient. I'll consider opening an issue there.

@JJPandari
Author

Found a workaround: ivy-prescient lets you enable candidate filtering and sorting separately, so I can disable filtering to avoid this char-fold issue while keeping sorting (which places recently and frequently used entries on top), my favorite feature of prescient.
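
A minimal sketch of that workaround, assuming the ivy-prescient user options `ivy-prescient-enable-filtering` and `ivy-prescient-enable-sorting` (check the option names against your installed version):

```emacs-lisp
;; Keep prescient's sorting but leave filtering to Ivy's own matcher,
;; so the query is not char-folded before it reaches counsel-rg.
;; Set these before enabling (or re-enabling) `ivy-prescient-mode'.
(setq ivy-prescient-enable-filtering nil
      ivy-prescient-enable-sorting t)
(ivy-prescient-mode 1)
```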

@basil-conto
Collaborator

If you can identify in which corner cases counsel--elisp-to-pcre does the wrong thing, that would be welcome.
