Archive and Plugin Options

Doug P edited this page Jan 3, 2024 · 5 revisions

The Archive and Plugin options modify the file associations and enable or disable plugins.

Archive files

The list of extensions specifies which file types are treated as archives. For these files, dnGrep searches the individual files inside the archive.

When a match is found in a file inside an archive and you open that file, dnGrep extracts it into a temporary folder and then shows it.

However, for some archive types, you may want to open the archive itself in the associated application - for example, epub files can be opened as a book in the associated e-book reader.
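
The search-then-extract behavior described above can be sketched in a few lines of Python. This is only an illustration using the standard zipfile module on zip archives, not dnGrep's actual implementation; the function names search_zip and extract_to_temp are hypothetical.

```python
import io
import re
import tempfile
import zipfile


def search_zip(archive, pattern):
    """Return (member name, line number, line) for each matching line in a zip archive."""
    regex = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Decode leniently so binary members do not abort the search.
            text = zf.read(info).decode("utf-8", errors="replace")
            for number, line in enumerate(text.split("\n"), start=1):
                if regex.search(line):
                    hits.append((info.filename, number, line))
    return hits


def extract_to_temp(archive, member):
    """Extract one member into a fresh temporary folder and return its path."""
    target = tempfile.mkdtemp(prefix="archive_preview_")
    with zipfile.ZipFile(archive) as zf:
        return zf.extract(member, path=target)


# Build a small in-memory archive to try the two functions.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("notes/a.txt", "alpha\nbeta needle\n")
    zf.writestr("b.txt", "nothing here\n")

matches = search_zip(sample, r"needle")
preview_path = extract_to_temp(sample, matches[0][0])
```

Only the matching member is extracted, which mirrors the idea of opening a single file from the archive rather than unpacking everything.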

Plug-ins

In dnGrep, plug-ins are used to search binary document files. These include Word, Excel, PowerPoint, and PDF files. dnGrep cannot search these files directly, so it first runs a tool to extract the text from the document, and then searches the extracted text. The extraction tools have some options to control how the text is extracted.

Word (docx) documents

  • Extract footnotes - this option will enable extracting footnotes and endnotes. Footnotes and endnotes are placed at the end of the document text (not by page since the text version of the document is not paginated). The footnote/endnote reference number or symbol may also be added inline in the document text in different formats. Note that including the footnote/endnote reference may affect searching across multiple words.

    • Not shown
    • Superscript - this option uses Unicode superscript characters which may not be supported in all fonts.
    • Full size character - this option will show the reference number/character as a full-size character, which may be hard to notice or confusing when in line with the document text.
    • Full size character with parentheses - adding parentheses around the reference makes it stand out more.

    Currently dnGrep supports the six numbering formats for footnotes and endnotes found in the US English version of Word: numbers, upper- and lower-case letters, upper- and lower-case Roman numerals and Chicago style, but more can be added.

  • Extract comments - this option will enable extracting comments. The comments are also placed at the end of the document text. Comment reference numbers may also be added inline in the document text, but they are always numbers, starting at zero. The numbers may be shown in these formats:

    • Not shown
    • Subscript - this option uses Unicode subscript characters which may not be supported in all fonts.
    • Full size character with parentheses

  • Extract headers - this option will extract the header text, once for each header format.

  • Extract footers - this option will extract the footer text, once for each footer format.

  • Extracted header and footer position - header and footer text may be extracted at the start of each document section where the header or footer changes, or all headers and footers at the end of the document.
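
The inline reference formats listed above for footnotes, endnotes, and comments can be illustrated with a short Python sketch. It is not dnGrep's code: the function name format_reference and the style names are hypothetical, and only digits are mapped to Unicode superscript or subscript characters. The last lines show why an inline reference may affect searching across multiple words.

```python
# Translation tables from ASCII digits to Unicode superscript and subscript digits.
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def format_reference(ref, style):
    """Render a footnote, endnote, or comment reference for inline display.

    The style names are hypothetical labels for the options described above.
    Characters other than digits (letters, symbols) are left unchanged.
    """
    if style == "not_shown":
        return ""
    if style == "superscript":
        return ref.translate(SUPERSCRIPT)
    if style == "subscript":
        return ref.translate(SUBSCRIPT)
    if style == "full_size":
        return ref
    if style == "full_size_parentheses":
        return f"({ref})"
    raise ValueError(f"unknown style: {style}")


# An inline reference placed between two words breaks a phrase search.
plain = "see the report for details"
with_reference = "see the" + format_reference("1", "superscript") + " report for details"

phrase_found_plain = "the report" in plain
phrase_found_with_reference = "the report" in with_reference
```

With the reference shown, the phrase "the report" no longer matches because the superscript character sits between the two words; choosing "Not shown" avoids this.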

PDF Files

PDF to Text

PdfToText is an application used to convert PDF files to plain text so dnGrep can search them (it can't search PDF files directly). This option allows users to fine-tune the command line arguments passed to the PdfToText application to meet their needs. To adjust the PdfToText command line options, first run PdfToText.exe in C:\Program Files\dnGREP\Plugins\PdfSearch with the -h or -? argument to see the list of available command line arguments. Then modify the command line arguments in the dnGrep Options dialog. The default options preserve the document layout and output the text using UTF-8 character encoding with a byte order mark.
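
As a rough sketch, the two steps above (listing the available arguments, then composing a command line) might look like this in Python. The option string "-layout -enc UTF-8 -bom" is an assumption based on the flags that Xpdf's pdftotext documents for layout preservation, UTF-8 output, and a byte order mark; confirm each flag against the -h output of the installed PdfToText.exe. The function names are hypothetical, and this is not how dnGrep itself launches the tool.

```python
import shlex
import subprocess
from pathlib import PureWindowsPath

# Install location given in the text above.
PDFTOTEXT = PureWindowsPath(r"C:\Program Files\dnGREP\Plugins\PdfSearch") / "PdfToText.exe"

# Assumed option string; verify every flag with the -h output before relying on it.
ASSUMED_OPTIONS = "-layout -enc UTF-8 -bom"


def build_command(options, pdf_path, text_path):
    """Compose the argument list: executable, options, input PDF, output text file."""
    # shlex.split is fine for simple flags; it is not meant for options
    # that contain Windows paths with backslashes.
    return [str(PDFTOTEXT), *shlex.split(options), pdf_path, text_path]


def show_help():
    """Run the tool with -h and return the completed process (Windows only)."""
    return subprocess.run([str(PDFTOTEXT), "-h"], capture_output=True, text=True)


command = build_command(ASSUMED_OPTIONS, "in.pdf", "out.txt")
```

Calling show_help() on a machine with dnGrep installed returns the usage text in its stdout or stderr attribute, which is the list to consult before changing the options in the dnGrep Options dialog.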

When searching PDF text, the results may be shown by line number (of the extracted text file) or by page number (corresponding to the page in the PDF document).
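
The relationship between the two numbering schemes can be sketched as follows, assuming the extracted text separates pages with form feed characters, which pdftotext does unless page breaks are suppressed. The function name page_for_line is hypothetical, and this is an illustration rather than dnGrep's own logic.

```python
def page_for_line(text, line_number):
    """Return the 1-based page number holding the given 1-based line of extracted text.

    Assumes each page break is marked by a form feed character.
    """
    # Split on newline only: str.splitlines() would also split on form feeds.
    lines = text.split("\n")
    line = lines[line_number - 1]
    # Form feeds on earlier lines have already started new pages.
    breaks_before = sum(earlier.count("\f") for earlier in lines[:line_number - 1])
    # Form feeds at the start of this line put its content on a new page.
    leading_breaks = len(line) - len(line.lstrip("\f"))
    return 1 + breaks_before + leading_breaks


# Three pages: two lines on page 1, one line each on pages 2 and 3.
extracted = "p1 a\np1 b\n\fp2 a\n\fp3 a\n"
pages = [page_for_line(extracted, n) for n in (1, 2, 3, 4)]
```

Here lines 1 and 2 of the text file fall on page 1, line 3 on page 2, and line 4 on page 3.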

Plug-in options

Enable "Show plain text in preview" to show the text as extracted from the document file. This will create a temporary file to display in the Preview Window. This temporary file is always created for PDF files but is only created if this option is enabled for Word, Excel and PowerPoint files.

You may modify the file extensions associated with each plugin. Note that the old-style Word .doc files need a different plugin from the new .docx file type.

The plugins may also be disabled from this section.