files are being skipped from search when the filename has a spanish accent #503

leitor79 · 2021-05-10T01:06:08Z

Hi,

When I search for content in a folder with "paths to match" set to "", "." or ".pdf", the results list misses files that should match if the filename contains a character with a Spanish accent (e.g. "ó").

I've tested this with two copies of the same file, removing the accented character from the name of one of them, and dnGrep finds only that one.

The file is a PDF, just in case.

Regards,

doug24 · 2021-05-10T03:22:23Z

I assume you are searching by asterisk pattern, because these aren't good regular expressions. But you aren't including any wildcards in the pattern either. * matches any number of characters, and ? matches a single character. So "*.pdf" would be used to find any file with a pdf extension, and "*.*" would be used to find any file name with any extension.

Can you try searching by one of these?
*.pdf
*.*
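
As a rough illustration of the wildcard rules described above, here is a minimal Python sketch using the standard library's fnmatch module. This is not dnGrep's own matcher, and the file names in the list are made up for the example; it only shows how * and ? style patterns behave in general, and why a pattern without any wildcard, such as ".pdf", matches nothing here.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen only for this illustration.
names = ["Eliseo Verón.pdf", "Eliseo Veron.pdf", "notes.txt"]

def matching(pattern, candidates=names):
    """Return the candidates that match a shell-style wildcard pattern."""
    return [n for n in candidates if fnmatchcase(n, pattern)]

print(matching("*.pdf"))             # both PDF files, accented or not
print(matching(".pdf"))              # no wildcard: only a file named exactly ".pdf" would match
print(matching("*.*"))               # every name that contains a dot
print(matching("Eliseo Ver?n.pdf"))  # ? stands for exactly one character
```

With these patterns the accented and unaccented names are treated the same way, which suggests the pattern itself is not what excludes the accented file.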

leitor79 · 2021-05-16T01:15:17Z

Hi,
Thank you for your answer.
I'm pretty sure I've tried including wildcards; I've tried again, just in case. I've copied the same file with the same content in the same folder and renamed one of them to not have an accent. The accented file was not a match.

https://i.imgur.com/jz1k140.png

Regards,

doug24 · 2021-05-16T02:55:43Z

I did reproduce this bug. It is specific to pdf files, which I had not tested before. As I commented on #504, dnGrep searches plain text, so it uses plug-ins to convert binary formatted files like Word, Excel and PDF to text before searching.

The bug isn't actually in dnGrep, but in the pdftotext.exe application that dnGrep calls to extract text from pdf files. When calling pdftotext.exe dnGrep makes the call like this:

pdftotext.exe -layout -enc UTF-8 -bom "C:\testFiles\test\issue503\Eliseo Verón.pdf" "C:\Users\user\AppData\Local\Temp\dnGrep-lkwsbbww72mP\dnGREP-PDF\Eliseo Verón.txt"

But instead of creating the file "Eliseo Verón.txt", pdftotext.exe creates a file named "Eliseo VerÃ³n.txt", and dnGrep can't find the correct file to search.
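
The garbled name has the shape you get when the UTF-8 bytes of "ó" are read back as a single-byte Windows code page. The Python sketch below reproduces that pattern; it is an assumption about the mechanism based only on how the name looks, not a confirmed description of what pdftotext.exe does internally, and cp1252 is chosen here purely as a typical Western European code page.

```python
# The output file name dnGrep asks for.
requested = "Eliseo Verón.txt"

# Assumption: the UTF-8 bytes of the name are interpreted as Windows-1252.
# "ó" is the two bytes C3 B3 in UTF-8, which cp1252 reads as "Ã" and "³".
garbled = requested.encode("utf-8").decode("cp1252")
print(garbled)  # Eliseo VerÃ³n.txt

# Reversing the same two steps recovers the original name.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == requested)  # True
```

Because the two names differ, a caller that looks for the file under the name it requested will not find it.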

This bug appears to be in pdftotext version 4, but not in version 3. The dnGrep installer installs version 4 with the application, but you can overwrite it with the older version.

I attached pdftotext.exe version 3 to this note (see below). To use it, open this directory in Windows Explorer:
C:\Program Files\dnGREP\Plugins\PdfSearch
Rename the existing pdftotext.exe to pdftotext4.exe, and copy version 3 from the zip file into that directory.
Next, start dnGrep and open the Options dialog. Scroll down to the PDF section and remove the command line options (the default options only work with pdftotext version 4).

pdftotext.zip

doug24 · 2021-06-27T20:42:19Z

Fixed in Release 2.9.345

doug24 closed this as completed May 15, 2021

doug24 reopened this May 16, 2021

doug24 closed this as completed Jul 4, 2021
