Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Pruning generated files for relative operations #2994

Merged

Conversation

BobaFetters
Copy link
Member

Updating code generation to always use resolved file paths when writing files, and getting lists of generated files to account for errors with files being deleted by mistake when symlink directories are used.

closes #2969

Resolving symlinks in file paths when writing files and getting lists of generated files.
Adding a test to validate code generation when symlink directories are involved
@netlify
Copy link

netlify bot commented May 3, 2023

Deploy Preview for apollo-ios-docs canceled.

Name Link
🔨 Latest commit 05b9e89
🔍 Latest deploy log https://app.netlify.com/sites/apollo-ios-docs/deploys/6453c05df344b30008cca824

@@ -177,7 +177,7 @@ public struct Glob {
excludedSet.intersection(url.pathComponents).isEmpty
else { continue }

directories.append(url)
directories.append(url.resolvingSymlinksInPath())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code change in ApolloCodegen to resolve symlinks does not actually solve the issue of /private/var vs /var paths in test because neither is marked as a symlink. However, running .resolvingSymlinksInPath() does remove /private from paths when it exists, so to be able to run tests for this issue I also had to change this line.

Thoughts on this? Or other suggestions of maybe handling this?

return existingPaths
}

private static func resolveSymlinksInPaths(_ paths: inout Set<String>) throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks like it's just removing symlink paths, not actually resolving the symlinks. Is that correct?

If so, we should probably change the name of the function. But there is also something awkward about this, because there is implicit knowledge that the paths returned from the glob have both the path with and without the symlink in the Set. Is that correct?

I'm feeling like we should be expressing this better and removing the need for implicit knowledge here. Especially since you are now calling resolvingSymlinksInPath() inside of the Glob code.

If we are going to resolve the symlinks in the Glob, I'm not even clear why we also need this here? I know we had talked about this being the responsibility of the Codegen, not the Glob implementation, but it feels like we are violating that anyways now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point renaming the function would make since. And yes, the issue currently is that both a real path and symlink path to a file exist in the Set which causes the deletion.

Also the reason resolvingSymlinksInPath() is called in Glob is specifically due to needing to make sure paths in tests are resolved from /private/var/.... to just /var/...., however that change would not actually fix the non-test issue because the directoryURL used at that point in the code would not actually be the symlinked path. The code I showed you earlier where we could do the symlink resolution in Glob.expand by building out the paths before adding them to the Set in the return could potentially solve for both the test and non-test issue, will have to check to verify, and if so we may just need to do that since it appears we will need some change inside of Glob anyways which is what we were trying to avoid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if we do this in the Glob, but I feel like we need a clear definition of what we are actually doing.

The real issue here is one of separation of responsibilities. It's a current behavior that the output of the Glob has duplicated paths. That is – two paths that actually point to the same place because of symlinks. If we decide that is an acceptable behavior of Glob, because it is just used to pattern match the paths, and there are two paths there technically, then that's okay. Let's document that behavior and then deduplicate the paths in ApolloCodegen, as we are doing. In which case the fix for the unit tests /private/var needs a different fix probably.

I think it's also reasonable to say that the behavior of the Glob struct should do the resolving of symlinks and deduplication of paths, ensuring that its output only points to a single file one time, regardless of how many different paths are symlinked to the file's actual path. If we do that, we can fix the problem in one place, and we should be documenting (and probably testing) that as an intended and defined behavior of our Glob matching object.

I don't have a strong opinion on where we do this, only that we do it in a way that defines a responsibility of a single object that is sensible; is documented; is tested; and works in consistently in every case (of course bugs are bugs, so "every case" means "every known case").

I think I might be misunderstanding or missing some information here, but having the symlink deduplication in the codegen but also resolving symlinks in the Glob seems to be muddying the single responsibility principle here.

If there is a specific odd edge case where /private/var isn't being picked up as a symlink properly we can hard code in a fix for the edge case. But if we are going to resolve all symlinks lets have a clear definition of what that means, where it's happening, and why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I built Glob to be a pattern matching engine; it expands given wildcards and outputs matching paths. IMO those should be resolved symlinks and there should not be any duplicates. Glob is at the crux of path discovery here and it feels that should be doing the work. I don't think there is ever a need for us to know both the symlink and actual path?

expect(ApolloFileManager.default.doesFileExist(atPath: fileValidationPath)).to(beTrue())

try ApolloCodegen.build(with: config, rootURL: directoryURL)
expect(ApolloFileManager.default.doesFileExist(atPath: fileValidationPath)).to(beTrue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming here, without the code added in this PR, this test passes on the first expectation, fails the second, and passes the third, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

Moved checks for removing symlinks all to the `Glob` class in the `expand()` function
@@ -185,7 +186,15 @@ public struct Glob {
}

return OrderedSet<String>(directories.compactMap({ directory in
var path = directory.appendingPathComponent(lastPart).standardizedFileURL.path
// build directory URL up the the '*' in the pattern, and check if it is a symlink, if so skip it
let symCheckURL = directory.appendingPathComponent(lastDirComponent).standardizedFileURL
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnthonyMDev this block can run the same symlink check for production use cases we previously had in the ApolloCodegen class. But keeps all symlink related checks contained within this compactMap call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I may be able to build the url up to the * in the pattern like we are doing here, and then run the .resolvingSymlinksInPath on that URL, and then appending the final path component which could cover both production and test use cases without needing to access resource keys and result in less code. Going to look into that now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would look something like this although maybe cleaned up a little:

let last = parts.joined(separator: String.Globstar)
let lastParts = last.components(separatedBy: "*")
let lastDirComponent = lastParts.first ?? ""
let lastPart = lastParts.last ?? ""
....
....
....
let url = directory.appendingPathComponent(lastDirComponent).standardizedFileURL.resolvingSymlinksInPath()
var path = url.appendingPathComponent("*\(lastPart)").standardizedFileURL.path

return nil
}

var path = directory.resolvingSymlinksInPath().appendingPathComponent(lastPart).standardizedFileURL.path
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnthonyMDev Adding .resolvingSymlinksInPath() here is still necessary to resolve test use cases since the above code will never show the /private/var/.... path as a symlink, but this call still removes the /private part of the path based on the documentation for the function call

Copy link
Member

@calvincestari calvincestari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good Zach - thanks! Are you still iterating on the Glob code to simplify it?

Refactoring the symlink check so that the same code handles both production and test use cases instead of needing 2 different pieces of code.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@calvincestari I ended up refactoring the Glob code to this so that the same code in the compactMap can handle both production and test use cases instead of checking resource keys for production and then always still resolving symlinks. So just tweaking how the path is built allows for just resolving symlinks for every path that comes through.

@BobaFetters BobaFetters marked this pull request as draft May 4, 2023 14:06
@BobaFetters
Copy link
Member Author

Converting to draft because I need to address some test failures

Refactored symlink resolution to filter out of the final `includedPaths` list of files
@BobaFetters
Copy link
Member Author

Ok after re-reading the documentation for .resolvingSymlinksInPaths() I realized it only works only file: path schemes, so was able to move this check to filter the includedPaths list before returning it from the match function which ends up with a clean solution to this.

@BobaFetters BobaFetters marked this pull request as ready for review May 4, 2023 14:28
@BobaFetters BobaFetters merged commit 4971bf1 into main May 4, 2023
15 of 16 checks passed
@BobaFetters BobaFetters deleted the fix/pruning-generated-files-for-relative-operations branch May 4, 2023 17:33
This was referenced May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pruning generated files for .relative(subpath:) operations deletes only
3 participants