Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix expansion of bracketed expressions in RegExpBasedFileFinder #7338

Merged
merged 18 commits into from
Jan 24, 2021

Conversation

k3KAW8Pnf7mkmdSMPHz27
Copy link
Member

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 commented Jan 13, 2021

Fixes #4342

#4342 contains two issues.

  1. The bracketed pattern [title] does not refer to the StandardField title, but a "special" pattern intended to give nicer output (this is why dashes don't work, see Finding files with non-alphabetical characters in title #4342 (comment)). It is intended and can be avoided by using [TITLE].
  2. An expanded bracketed pattern is interpreted as a regexp (which it should not). This is why parenthesizes doesn't work. A pattern that expands into (abc) will match abc, but not (abc).

If the expanded bracketed pattern contains characters that have a special meaning in regexp (such as a parenthesis and brackets), it becomes a problem.
An entry with the title Regexp from [A-Z] and the bracketed expression [TITLE] would match a file titled Regexp from B, but not a file named Regexp from [A-Z].

The goal of this PR

Match files whose expanded bracketed pattern includes symbols that can be miss-interpreted.

  1. Regex: All expanded brackets will match literally, using Pattern.quote. I don't see a reasonable use-case for allowing a bracket to expand into an actual regex.
  2. Cleaned filenames In addition to matching the regularly expanded brackets, they will also match cleaned file names (it will find files containing both ACM/IEEE-CS and ACM_IEEE-CS).

Note: Regex can still be used outside of the bracketed patterns. (except for character classes, since they will be interpreted as a bracketed pattern)

Issues neither addressed in this PR nor the current master

  1. Files that get truncated due to length
  2. Character classes in the RegExpBasedFileFinder. In the following pattern, **/.*[title][a-z]+.*\\.[extension], [a-z] is treated as if it is a bracketed pattern
  3. BracketedExpressions are only expanded in the file name, not in the directory name.

Checklist

  • Check if parts can be replaced by a glob (getPathMatcher)
  • Improve finding and parsing bracketed expressions?
  • Fix this particular use case using regexp literals (\Q\E in the regexp)
    • Add type-safety to the expanded bracket (return a compiled Pattern when it is supposed to be a RegularExpression)
  • Check that the expanded bracketed expression result in a correct filename, and/or convert it into one. (avoid issues finding JabRef generated filenames 😱 )
    • Add test case for an expression that would result in a JabRef corrected filename ("ACM/IEEE-CS Information Technology Curriculum" -> "ACM_IEEE_CS Information Technology Curriculum") I don't think this can be done reliably within the scope of this PR
    • Make sure a warning appears in the debug log if a filename/dirname is expected to be too long
    • Refactor FileUtil.createFileNameFromPattern so the same code is used I don't think this can be done reliably within the scope of this PR
  • Add JavaDoc, improve variable names and comments

This avoid replicating the exact same behaviour in RegExpBasedFileFinder
For this particular use we must assume that the rest of the text is a
regexp, and only put the content of the expanded bracket between quotes.
 It must return a String as this is only a part of the final regexp that
  will be compiled later.
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [WIP] Reviewing RegExpBasedFileFinder [WIP] Fix expansion of bracketed expressions in RegExpBasedFileFinder that contain regexp reserved characters Jan 13, 2021
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [WIP] Fix expansion of bracketed expressions in RegExpBasedFileFinder that contain regexp reserved characters Fix expansion of bracketed expressions in RegExpBasedFileFinder that contain regexp reserved characters Jan 15, 2021
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 marked this pull request as ready for review January 15, 2021 15:36
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title Fix expansion of bracketed expressions in RegExpBasedFileFinder that contain regexp reserved characters Fix expansion of bracketed expressions in RegExpBasedFileFinder Jan 15, 2021
} catch (UncheckedIOException | IOException ioe) {
return Collections.emptyList();
resultFiles.addAll(pathStream.collect(Collectors.toList()));
} catch (UncheckedIOException uncheckedIOException) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this even an UncheckedIOException?
If needed, the UncheckedException is just a wrapper for the IOEception. So you could call throw uncheckedException.getCause()

Copy link
Member Author

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 Jan 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I am not sure why an UncheckedIOException is caught in this part of the code. I don't have much experience with nio.* but my interpretation of the API is that the UncheckedIOException must be caught in the parts of the code that make use of the Path reference.
I don't know if there are any other potential issues with a lazily loaded file system walk. Based on DirectoryStream and Files.walk I'd guess it could be thrown if depth > 1 and there is a cycle, hence, not in this part of the code unless it is changed.

@koppor koppor added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jan 18, 2021
@koppor
Copy link
Member

koppor commented Jan 18, 2021

This needs deeper thinking at review. At first gut feeling, [author] text should be expaned. This is not possible any more in this PR (see remove tests). Not sure whether we really want that. We need to really read the PR description and the linked issue for a proper feedback.

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

k3KAW8Pnf7mkmdSMPHz27 commented Jan 18, 2021

This needs deeper thinking at review. At first gut feeling, [author] text should be expaned. This is not possible any more in this PR (see remove tests).

This is possible in this PR. I'll review my PR description and see if I can clarify.

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

k3KAW8Pnf7mkmdSMPHz27 commented Jan 18, 2021

@koppor @Siedlerchr sorry about that. Sometimes it is a bit too easy to tunnel-vision and assume something is "obvious", while forgetting all the assumptions needed to make it obvious. The PR description is updated and, hopefully, more readable.

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I've only one suggestion for the tests, otherwise +1 for merge.

@koppor koppor merged commit 7672b2d into JabRef:master Jan 24, 2021
Siedlerchr added a commit to koppor/jabref that referenced this pull request Jan 26, 2021
* upstream/master: (217 commits)
  Fix handling of URL in file field (JabRef#7347)
  Fix expansion of bracketed expressions in RegExpBasedFileFinder (JabRef#7338)
  Refactor unlinked files (JabRef#7209)
  Add pressing enter when the search field is focused as a way to trigger search (JabRef#7377)
  Upgrade citeproc to 3.x snapshot without graal (JabRef#7370)
  Fix Exception if no AzureInstrumentationKey is available (JabRef#7373)
  Update snapcraft source url (JabRef#7372)
  Fix checkstyle and adjust language
  GitBook: [master] 3 pages and 32 assets modified
  Add migration to special field (JabRef#7368)
  GitBook: [master] 5 pages and 29 assets modified
  Modify message at the duplicates found dialog (JabRef#7231)
  Fixes miss-parsed names in `AutomaticPersonsGroup` (JabRef#7228)
  Fix an issue where the password for a shared SQL database was only remembered if it was the same as the username (JabRef#7364)
  Fix harvard exporter by changing AuthorsFormatter (JabRef#7355)
  Bump styfle/cancel-workflow-action from 0.6.0 to 0.7.0 (JabRef#7363)
  Bump mockito-core from 3.7.0 to 3.7.7 (JabRef#7360)
  Bump org.beryx.jlink from 2.23.1 to 2.23.2 (JabRef#7361)
  Bump libreoffice from 7.0.3 to 7.0.4 (JabRef#7362)
  Export full month name instead of number in ms office (JabRef#7358)
  ...

# Conflicts:
#	external-libraries.md
#	src/main/java/module-info.java
#	src/main/java/org/jabref/gui/shared/SharedDatabaseLoginDialogViewModel.java
Siedlerchr added a commit that referenced this pull request Feb 1, 2021
* upstream/master:
  Bump archunit-junit5-api from 0.15.0 to 0.16.0 (#7407)
  Bump classgraph from 4.8.98 to 4.8.102 (#7401)
  Bump archunit-junit5-engine from 0.15.0 to 0.16.0 (#7402)
  Bump mariadb-java-client from 2.7.1 to 2.7.2 (#7406)
  Bump org.beryx.jlink from 2.23.2 to 2.23.3 (#7400)
  Bump checkstyle from 8.39 to 8.40 (#7404)
  Ignore codecov status for automerge
  Fixes issue of Changing font size makes font size field too small (#7398)
  fix "Alt + keyboard shortcuts do not work" (#7379)
  Fixed invisible file path in the dark theme (#7396)
  Fix File Filter and some layout issues (#7385)
  Feature/implement complex queries (#7350)
  Change format for study definition to yaml (#7126)
  Fix handling of URL in file field (#7347)
  Fix expansion of bracketed expressions in RegExpBasedFileFinder (#7338)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finding files with non-alphabetical characters in title
4 participants