Skip to content

OAK-10590 - If includedPaths and excludedPaths are specified as a String instead of array of String, interpret them as a one-element array of Strings#1254

Merged
nfsantos merged 7 commits intoapache:trunkfrom
nfsantos:OAK-10590
Dec 21, 2023
Merged

Conversation

@nfsantos
Copy link
Copy Markdown
Contributor

@nfsantos nfsantos commented Dec 19, 2023

The includedPaths property of an index definition should be an array of strings. But a common mistake made by users is to define it as a String when it has a single element. That is, instead of:

        "includedPaths": [ "/a/b"] , 

it is defined as:

        "includedPaths": "/a/b", 

If includedPaths is defined as a String, the indexing job would ignore its value and instead default to use the root as the includedPaths, which results in downloading the full node store and creating an FFS containing everything (except hidden paths). This will slow down significantly the indexing job, as it will negate any benefits from using regex filtering. And even if regex filtering is not enabled or cannot be used, using / as includedPaths will also result in the FFS containing more nodes than it should, which will once again slow down the indexing job.

The same is true for excludedPaths, but in this case, the default value is an empty list of excludedPaths, so it will ignore the value of this property and will not exclude anything. This may result in parts of the node store being indexed that should not be indexed.

The handling of includedPaths and excludedPaths is also inconsistent with the handling of queryPaths, which is being interpreted as a one-element array if it is defined as a String.

This PR makes the logic that reads the includedPaths and excludedPaths properties more lenient, by treating Strings as one-element arrays and issuing a warning with a suggested fix.

Additionally, the PR makes some minor cleanups in the files that had to be modified (remove usages of Guava and fix some compilation warnings).

…a String instead of array of String, log a warning and treat them as a one-element array instead of assuming that it they are not defined.
return property.getValue(Type.STRINGS);
} else if (property != null && property.getType() == Type.STRING) {
String value = property.getValue(Type.STRING);
LOG.warn("Property \"{}\"=\"{}\" has type String but it should be array of String. Proceeding by treating it as a " +
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably not log a warning. Instead, I would document that this is also supported.

Reason: we already have many many entries that are defined as string (mostly "/dummy") instead of array.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it public and add a unit test case for it.

…as being done incorrectly, by setting the property as a string instead of array of strings.
…ept a string value as being a one-element array (NodeCounterMBeanEstimator).

Other refactoring to clean up the code.
…TRING instead of STRINGS.

Simplified the logic of extracting a list of strings from a property that can be STRING or STRINGS by taking advantage of the default value of PropertyState.getValue(STRINGS), which handles the case of the property being a STRING.
@nfsantos nfsantos changed the title OAK-10590 - Indexing job downloads and creates FFS with full node store if includedPaths is specified as a string instead of array of strings OAK-10590 - If includedPaths and excludedPaths are specified as a String instead of array of String, interpret them as a one-element array of Strings Dec 21, 2023
@nfsantos nfsantos merged commit e9bb593 into apache:trunk Dec 21, 2023
@nfsantos nfsantos deleted the OAK-10590 branch December 21, 2023 17:32
rishabhdaim pushed a commit that referenced this pull request Jan 25, 2024
…ing instead of array of String, interpret them as a one-element array of Strings (#1254)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants