Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CURR_DIR_PATH detection in FileUtils should be optimized #520

Closed
elkman opened this issue May 12, 2021 · 15 comments
Closed

CURR_DIR_PATH detection in FileUtils should be optimized #520

elkman opened this issue May 12, 2021 · 15 comments

Comments

@elkman
Copy link

elkman commented May 12, 2021

The initialization of nonapi.io.github.classgraph.utils.FileUtils.CURR_DIR_PATH has some problems and could be improved from my very limited point of view.

Here are some questions or suggestions that I am hoping to get feedback so I can submit a PR for this topic.

  1. If you run java with security manager enabled (i.e. a Tomcat instance), FileUtils may no have access to the current work directory and an AccessControlException (extends SecurityException) is thrown.
    I think in this case this should be handles like the IOException in the static initialize block.

  2. The comment for FileUtils.CURR_DIR_PATH says that it is initialized lazy, which is not true. It is initialized when the class is loaded. So the comment could be changed or the initialization could really be made the first time this field is accessed.

  3. I do not understand the sense for the multiple assignments to currDirPathStr ( FileUtils line 100, 102, and 104).
    In case of an IOException the class loading will fail due to the thrown ClassGraphException. In case of any other exception the class loading will fail too. So the FileUtils line 109 will only be reached if no error occurs and the assignment currDirPathStr = currDirPath.toString(); in line 104 defines the value assigned to CURR_DIR_PATH.

@lukehutch
Copy link
Member

lukehutch commented May 12, 2021

Hi @elkman,

Thanks for the feedback. You're right, some improvements could be made. Did you run into this because you were running with a security manager on Tomcat? What was the exact exception you saw? Did it prevent startup entirely?

Please take a look at the changes I just pushed.

On your point #3, this was left over from a time where I was trying to track down a current dir bug (on Windows, IIRC), and I had print statements between the lines. Turns out none of the method calls but the last one could throw an error or exception anyway, so I just combined them.

I also added a couple of other ways to determine the current directory.

@elkman
Copy link
Author

elkman commented May 13, 2021

Hi @lukehutch,

impressive response time =)

Did you run into this because you were running with a security manager on Tomcat? What was the exact exception you saw? Did it prevent startup entirely?

Yes, Tomcat was started with a working directory for which no read permissions were granted by the security manager policy. This is usually not critical for a web application as long as nothing tries to access this directory.
The integration of webjars-locator-core introduced classgraph into the application. Classgraph is used for webjars detection during Spring initialization, so the deployment of the webapp was canceled (not the Tomcat startup) by the ExceptionInInitializerError caused by constructor of ScanResult, if a SecurityException happend while loading FileUtils.

The trace looks like this:

Caused by: java.lang.ExceptionInInitializerError
    at io.github.classgraph.ScanResult.init(ScanResult.java:215) ~[classgraph-4.8.69.jar:4.8.69]
    at io.github.classgraph.ClassGraph.<init>(ClassGraph.java:91) ~[classgraph-4.8.69.jar:4.8.69]
    at org.webjars.WebJarAssetLocator.<init>(WebJarAssetLocator.java:150) ~[webjars-locator-core-0.46.jar:?]
...
Caused by: java.security.AccessControlException: access denied ("java.io.FilePermission" "/xxx/yyy" "read")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_271]
    at java.security.AccessController.checkPermission(AccessController.java:886) ~[?:1.8.0_271]
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_271]
    at java.lang.SecurityManager.checkRead(SecurityManager.java:888) ~[?:1.8.0_271]
    at sun.nio.fs.UnixPath.checkRead(UnixPath.java:795) ~[?:1.8.0_271]
    at sun.nio.fs.UnixPath.toRealPath(UnixPath.java:827) ~[?:1.8.0_271]
    at nonapi.io.github.classgraph.utils.FileUtils.<clinit>(FileUtils.java:93) ~[classgraph-4.8.69.jar:4.8.69]
    at io.github.classgraph.ScanResult.init(ScanResult.java:215) ~[classgraph-4.8.69.jar:4.8.69]
    at io.github.classgraph.ClassGraph.<init>(ClassGraph.java:91) ~[classgraph-4.8.69.jar:4.8.69]
    at org.webjars.WebJarAssetLocator.<init>(WebJarAssetLocator.java:150) ~[webjars-locator-core-0.46.jar:?]

Please take a look at the changes I just pushed.

d83ed1e is pretty much what I had in mind. It is lazy and quite fail-safe in case it is called.

But of course there are always a few suggestions when staring at other people's code:

path = FileSystems.getDefault().getPath(".");
could be replaced by
path = Paths.get(".");
which would be more consistent with Paths.get(""); used above.

Possibly it would be better to replace
currDirPathStr = path.toAbsolutePath().normalize().toRealPath(LinkOption.NOFOLLOW_LINKS)
by just
currDirPathStr = path.toRealPath(LinkOption.NOFOLLOW_LINKS)

I haven't tested it, but the JavaDoc says:

  1. toRealPath() calls toAbsolutePath() if needed.
  2. normalize() could lead to another file if symbolic links followed by .. are contained. toRealPath() also normalizes and seems to be a bit more reliable according to the JavaDoc: When not resolving symbolic links and the preceding name is a symbolic link then the names are only removed if it guaranteed that the resulting path will locate the same file as this path.

On your point #3, ...

Then I am relieved, I had wondered if there was something incredibly tricky going on and I just didn't understand it.

Unfortunately, I can't easily test the changes, as the problem only occurred on customer systems and it takes a bit more effort to reproduce the problem on our test systems. I hope I can give you some feedback next week.

Thanks for your great work!

@lukehutch
Copy link
Member

impressive response time =)

Yeah, I have gotten so tired of other projects taking forever to respond to bug reports (or never responding) that I try to always turnaround ClassGraph bugfixes within 24 hours if possible.

Thanks for the additional feedback. I checked in another version with your feedback incorporated (you're right on all counts), and some other tweaks.

I'm working on another bugfix in parallel with this one, but I'll try to get a release out soon.

@elkman
Copy link
Author

elkman commented May 17, 2021

Hi @lukehutch,

I agree that user.dir should be used in the first place. This should always be set correctly.

But I don't see why read access is required. In our Tomcat setup (managed Tomcat provided by the customers hosting provider) user.dir is set to the service user's home directory, but the default security policy only gives read access to catalina.home and catalina.base which are outside of the user.dir tree. I don't think this is a very common or standard setup, but it looks reasonable from a security point of view.

Is it really necessary for classgraph to have these permissions?

And just for interest: Are there any real filesystems that make a difference between "" and "."? If so, the three almost identical try-blocks could be extracted.

@lukehutch
Copy link
Member

lukehutch commented May 17, 2021

The current directory is needed for resolving the location in the filesystem of any jars found on the classpath, since paths on the classpath can be relative, not just absolute. If the classpath only contains absolute paths, then paths will not be resolved relative to the current directory. And any jars that cannot be read (checked using File.canRead()), because of a security policy, or because of directory permissions, will be skipped.

In general, the only things ClassGraph tries to read are jars named on the classpath. The only time ClassGraph tries to write (to a temporary file) is an obscure situation where nested jars are deflated rather than stored (so the inner jar has to be decompressed to be read), and the size of the nested jar is too large to fit within a defined RAM limit (this spillover to disk can be disabled if needed, but it's probably never triggered under most real-world situations, especially since most build systems that place jars inside jars, such as Spring, store rather than try to deflate the inner jars).

I didn't know if there are any real filesystems that make a difference between "" and "." -- I was just being conservative. But actually I checked the Javadoc for Path, and found: "A Path is considered to be an empty path if it consists solely of one name element that is empty. Accessing a file using an empty path is equivalent to accessing the default directory of the file system."

So I'll remove that unnecessary block. Thanks for the feedback!

lukehutch added a commit that referenced this issue May 17, 2021
@elkman
Copy link
Author

elkman commented May 17, 2021

The current directory is needed for resolving the location in the filesystem of any jars found on the classpath, since paths on the classpath can be relative, not just absolute. If the classpath only contains absolute paths, then paths will not be resolved relative to the current directory. And any jars that cannot be read (checked using File.canRead()), because of a security policy, or because of directory permissions, will be skipped.

But does this mean that read permissions are needed for the current directory if only the name is concatenated with a relative path from the classpath? As far as I understand, if classgraph needs to read lib/foo.jar (which is ${user.dir}/lib/foo.jar), no filesystem or security manager permissions are needed to read ${current.dir}.

PS: Are you sure you understand the concept of sleeping periods and weekends properly? ;)

@lukehutch
Copy link
Member

PS: Are you sure you understand the concept of sleeping periods and weekends properly? ;)

Haha, definitely not :-D

But does this mean that read permissions are needed for the current directory if only the name is concatenated with a relative path from the classpath? As far as I understand, if classgraph needs to read lib/foo.jar (which is ${user.dir}/lib/foo.jar), no filesystem or security manager permissions are needed to read ${current.dir}.

No, it never reads the home directory. It only uses it as a path prefix for relative paths. So if lib/foo.jar is the only classpath element that has a relative path, then ${user.dir}/lib/foo.jar is read, but ${user.dir} is never read.

@elkman
Copy link
Author

elkman commented May 18, 2021

But path.toFile().canRead() requires read permissions on ${user.dir} or "", which is more than classgraph requires.

lukehutch added a commit that referenced this issue May 18, 2021
@lukehutch
Copy link
Member

OK, I see what you are saying. I changed canRead() to exists(), but then I even removed that. I needed to also remove the toRealPath call, because that can throw SecurityException. Please let me know if you're happy with the current form.

@lukehutch
Copy link
Member

PS during scanning, when jar paths are tested for existence, after their path is resolved against the current directory, FileUtils.canRead is called rather than File.canRead:

    public static boolean canRead(final File file) {
        try {
            return file.canRead();
        } catch (final SecurityException e) {
            return false;
        }
    }

@elkman
Copy link
Author

elkman commented May 26, 2021

Sorry for the late response. Yes, this looks good for me and my quite special use case. However, I can't tell if this may cause problems for other people, since CURR_DIR_PATH is less validated during startup, which may cause something to fail later. But the current solution is simple and robust, so I would prefer it over a solution that might additionally deal with some rare setups and try to determine a most normalized path.

@lukehutch
Copy link
Member

@elkman OK, thanks for confirming. I don't think this will break anything for anyone else, as the classpath jars still have to pass the canRead test before they are opened, and the current dir is not used for anything else. I'll push out a new release soon, I'm just waiting for a response on another bug report first.

@lukehutch
Copy link
Member

Released in 4.8.106. Thanks @elkman for the bug report!

@elkman
Copy link
Author

elkman commented Jun 8, 2021

Sorry for the late response. Yes, this looks good for me and my quite special use case. However, I can't tell if this may cause problems for other people, since CURR_DIR_PATH is less validated during startup, which may cause something to fail later. But the current solution is simple and robust, so I would prefer it over a solution that might additionally deal with some rare setups and try to determine a most normalized path.

@lukehutch
Copy link
Member

That's OK, I think the reduced validation shouldn't cause a problem, because every jar on the class path also has to be validated with its complete path. Also you're about the third user of this library to report a bug based on the runtime lacking writable or readable directories. So I guess it's not that uncommon.

Thanks again for the report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants