Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ContinuousIntegrationBuild=true produces nondeterministic source paths by default #55860

Open
jnm2 opened this issue Aug 19, 2021 · 23 comments
Open
Assignees
Milestone

Comments

@jnm2
Copy link
Contributor

jnm2 commented Aug 19, 2021

I think this is an SDK bug. The workaround is either to add a specific SourceLink provider package reference (which may be redundant and may even be problematic or impossible) or to add a <SourceRoot Include="$(SolutionDir)" /> item or similar. The actual directory does not matter so long as it contains all source files not already covered by a SourceRoot item.

Without the workaround, you can see absolute paths leaking from the CI image, and NuGet Package Explorer also flags the resulting assembly as failing the determinism check:
image

Using the workaround, determinism is achieved:
image

If no specific SourceLink provider package is referenced in a project, the SDK should add a SourceRoot directory by default containing all source files. This should work the same way whether or not the project is even contained in a source-control repository of any kind. Alternatively, source files outside the solution directory should not be mapped and should cause build warnings when ContinuousIntegrationBuild=true.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@marcpopMSFT marcpopMSFT self-assigned this Aug 24, 2021
@marcpopMSFT marcpopMSFT transferred this issue from dotnet/sdk Aug 24, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Aug 24, 2021
@marcpopMSFT marcpopMSFT removed their assignment Aug 24, 2021
@marcpopMSFT
Copy link
Member

I think that ContinuousIntegrationBuild is used in Microsoft.Managed.Core.targets which comes from Roslyn.

@jaredpar
Copy link
Member

@tmat is this the right place for the issue?

@tmat
Copy link
Member

tmat commented Aug 24, 2021

I'm not sure what the ask is. Setting ContinuousIntegrationBuild does not imply the resulting binaries will be deterministic. ContinuousIntegrationBuild only specifies that build is running on CI allowing other targets to adjust their behavior accordingly.

@tmat
Copy link
Member

tmat commented Aug 24, 2021

the SDK should add a SourceRoot directory by default containing all source files

The SDK doesn't know what that directory is. The directory containing the solution does not need to contain all the source files.
Consider, e.g. repository layout repo\src containing all sources and repo\solutions containing solution files.

@jaredpar jaredpar added Need More Info The issue needs more information to proceed. and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Aug 24, 2021
@clairernovotny
Copy link
Member

Where does that come from? The issue is that without using any of the Sourcelink.* NuGet packages, and with EmbedAllSources set to true, the paths are still non-deterministic without the SourceRoot's set. The ask is that the current directory be set as a source root by default so the paths will be deterministic. When they're embedded, things like submodules and paths to a git repo are irrelevant.

@tmat
Copy link
Member

tmat commented Aug 24, 2021

The project can set <SourceRoot Include="$(RepoRoot)" /> to make them deterministic in that case. The SDK does not have a concept of repository root without Source Link. Source Link determines that based on location of .git directory.

BTW, the paths are not irrelevant even when the source files are embedded in the PDB. The paths can still be used by the program via CallerFileNameAttribute.

@clairernovotny
Copy link
Member

@tmat should the SDK have that concept though? Seems like determinism as a concept does not require SourceLink. And if the paths are different between builds, then it's not deterministic since the paths are local machine paths. I do think this should be an SDK concept.

@tmat
Copy link
Member

tmat commented Aug 25, 2021

Seems like determinism as a concept does not require SourceLink.

Indeed, determinism doesn't require Source Link. Setting <SourceRoot Include="RepoRoot" /> manually in Directory.Build.props is the way to make paths deterministic in the repo. The SDK could add a property for "repository root". But the customer would need to set it anyways in some way. How else would the SDK determine it? Perhaps the directory containing global.json could be considered root? If that works then SourceRoot could be initialized based on that by default.

@clairernovotny
Copy link
Member

I guess what I'm looking for is a "secure by default" "it just works" mechanism.

@jmarolf
Copy link
Contributor

jmarolf commented Aug 25, 2021

It sounds like this should be moved to https://github.com/dotnet/sdk?

@jnm2
Copy link
Contributor Author

jnm2 commented Aug 25, 2021

The SDK doesn't know what that directory is. The directory containing the solution does not need to contain all the source files.
Consider, e.g. repository layout repo\src containing all sources and repo\solutions containing solution files.

That's why I said the SDK could either:

  • Pick the nearest common ancestor of all source files that aren't already accounted for by other SourceRoot items
  • Or, default to the solution directory if there are any source files that aren't already accounted for by other SourceRoot items, and then add build warnings if there were source files outside the solution directory which instruct you to add SourceLink or specify one or more SourceRoot items manually.

If I haven't referenced a provider-specific SourceLink package, the notion of a source control repository is irrelevant for the purposes of build determinism. All that matters is that paths are deterministic between builds.

@tmat
Copy link
Member

tmat commented Aug 25, 2021

Pick the nearest common ancestor of all source files that aren't already accounted for by other SourceRoot items

This is not sufficient. Source files may contain #line directives that specify paths that need to be accounted for as well.
Only the compiler has knowledge of these paths, the SDK does not.

@jnm2
Copy link
Contributor Author

jnm2 commented Aug 25, 2021

I see. So no matter what the default SourceRoot strategy is (including not having a default), files could end up outside it. You've also raised the point that original file paths get embedded via CallerFileNameAttribute which is a good one. (NuGet Package Explorer might be able to heuristically check this with decent accuracy by scanning for absolute paths in the metadata string heap and then scanning all method bodies for ldstr commands using those strings as method invocation arguments and checking/guessing whether the parameter has [CallerFileName]. Not sure if this can be fast.)

Can the SDK provide build-time warnings if absolute source paths don't leak into the outputs, and maybe sensible defaults that cover the 99% case of all source files being under the solution directory tree or already accounted for by SDK SourceRoot items?

@tmat
Copy link
Member

tmat commented Aug 25, 2021

You've also raised the point that original file paths get embedded via CallerFileNameAttribute

Just to be clear - the compiler applies PathMap on these paths. PathMap is set based on SourceRoots.

Can the SDK provide build-time warnings if absolute source paths don't leak into the outputs

The compiler could (the SDK does not know all the paths) but it wouldn't be by default since that would be a breaking change. Perhaps it could be by default for new TFMs.

What are we actually trying to address here? I don't think most customers care about the paths. I believe we have guidelines on authoring high quality packages. Those guidelines include requirement to have Source Link. Once you reference a Source Link package the SourceRoots are automatically set.

@clairernovotny
Copy link
Member

@tmat We don't need SourceLink, that's not what's required for a high quality package -- what is required is access to the full source. When EmbedAllSoruces is set to true, we get that. The issue is that the paths are still not deterministic unless the SourceRoot is set, which the SDK does not do. I don't think we should be requiring anything outside the SDK in this scenario.

@tmat
Copy link
Member

tmat commented Aug 26, 2021

I see, so it's just the EmbedAllSources case. One way of implementing this could be to allow specifying * or auto in /pathmap (along with other existing mappings). Presence of such token would tell the compiler to automatically find the longest common prefix of all paths that do not match any mapping specified in /pathmap and trim that prefix (and normalize the directory separators in the resulting paths to /). Alternatively, adding a new switch /pathmapfallback:(auto|error|none), which would also allow to opt-into making it an error if some path doesn't match any mapping.

The SDK could then include auto in /pathmap or specify /pathmapfallback as needed (e.g. if EmbedAllSources and Deterministic is specified).

@ghost ghost added the fabric-bot-test Testing the impact of changes to the fabric bot label Sep 8, 2021
@ghost ghost closed this as completed Sep 18, 2021
@ghost
Copy link

ghost commented Sep 18, 2021

Closing this issue as we've seen no reply to the request for more information. If you are able to get the requested information, please add it to the issue and we will retriage it.

@jnm2 jnm2 reopened this Sep 18, 2021
@jnm2
Copy link
Contributor Author

jnm2 commented Sep 18, 2021

This is going to be fun.

@ghost ghost added the untriaged Issues and PRs which have not yet been triaged by a lead label Sep 18, 2021
@ghost ghost removed the Need More Info The issue needs more information to proceed. label Sep 18, 2021
@jaredpar jaredpar removed fabric-bot-test Testing the impact of changes to the fabric bot untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 7, 2021
@jaredpar
Copy link
Member

jaredpar commented Oct 7, 2021

Wanted to ping on this issue. What are the next steps here? Unclear from the convo if we believe this is an SDK issue or something we should be mitigating in the compiler.

@jaredpar jaredpar added the Need More Info The issue needs more information to proceed. label Oct 7, 2021
@tmat
Copy link
Member

tmat commented Oct 7, 2021

As noted in my comment I think we may need to enhance /pathmap a bit.

@ghost ghost added untriaged Issues and PRs which have not yet been triaged by a lead and removed Need More Info The issue needs more information to proceed. labels Oct 7, 2021
@jaredpar
Copy link
Member

jaredpar commented Oct 8, 2021

@tmat

Presence of such token would tell the compiler to automatically find the longest common prefix of all paths that do not match any mapping specified in /pathmap and trim that prefix (and normalize the directory separators in the resulting paths to /)

Consider the case where the Compilation was consuming code from multiple drives. Say c:\src\hello.cs and d:\src\world.cs. In that case there is no longest common prefix. I think we could fix this though by using the longest common prefix per path root. That should work ... but possible it creates duplicate paths. Not sure if that is something to worry about

@jaredpar jaredpar added Bug and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 8, 2021
@jaredpar jaredpar added this to the 17.1 milestone Oct 8, 2021
@tmat
Copy link
Member

tmat commented Oct 8, 2021

In that case there is no longest common prefix. I think we could fix this though by using the longest common prefix per path root.

Yeah. After trimming each path we could prefix it with /N/, where N is smallest such that /N/ does not match any existing mapped path, or something like that. If we have just one drive then all automatically trimmed paths would be /0/-prefixed, for 2 drives would have /0/- and /1/- etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants