Question: nuget restore thread safety #7060

Open
ticktickMOF opened this issue Jun 26, 2018 · 28 comments
Labels
Functionality:Restore Priority:3 Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog. Resolution:Question This issues appears to be a question, not a product defect SeQuality Type:Docs

Comments

@ticktickMOF

Are nuget.exe restore, dotnet restore, and msbuild /restore thread safe across processes? For example, if I have two msbuild /restore instances running in parallel on different projects that consume the same NuGet package, do I need to worry about them trying to write the same files at the same time?

@lanfeng

lanfeng commented Jun 26, 2018

It might be related to issue #7020

@nkolev92 nkolev92 added Resolution:Question This issues appears to be a question, not a product defect Functionality:Restore labels Jun 27, 2018
@nkolev92
Member

@lanfeng
I don't think they are related.

@ticktickMOF

NuGet does have some cross process locking.

For example, two different processes could attempt to install to the same directory and it should work.
The same holds for writing assets files and g.props and g.targets.

The biggest takeaway is that small individual operations are thread safe. The whole operation might not be.

In PackageReference, consuming merely means enumerating the directory files for an already installed (in the global packages folder) package. In packages.config, this could mean copying.

Please note that running multiple restores at the same time creates a lot of HTTP connections, which could lead to #6742.

tl;dr;

Should work, but there are a lot of caveats.

What's the reason you would want to do that?

@ticktickMOF
Author

We have a parallelized build process that builds ~300 projects in a non-persistent VM (no global NuGet cache). It takes a long time to run nuget restore on each project serially before spawning the task executors, and I was hoping it might be faster to let msbuild do the restore when each project is built, by adding /restore to the command line.

@mishra14
Contributor

@ticktickMOF I think it should be safe to run msbuild /t:restore /m <SOLUTION.sln>

@nkolev92
Member

nkolev92 commented Jul 1, 2018

@ticktickMOF
You can do what @mishra14 suggested.

Additionally, which version of Visual Studio are you using?
Do you have restore perf problems locally? Or is it just the CI?

@cortex93

We have an issue with multiple TFS agents on the same machine. If two builds run at the same time, nuget restore may fail on locked files and leave the cache broken.
For example:
2018-09-12T13:05:53.9490262Z Installing Microsoft.EntityFrameworkCore.SqlServer 2.1.2.
2018-09-12T13:05:54.0271496Z C:\Program Files\dotnet\sdk\2.1.401\NuGet.targets(114,5): error : Could not find file 'C:\Users\srv_TFSBuild\.nuget\packages\microsoft.entityframeworkcore.design\2.1.2\1glqnhg4.izi'. [C:\agent2\_work\37\s\MyProject.sln]
2018-09-12T13:05:54.0896572Z Retrying 'FindPackagesByIdAsyncCore' for source 'http://nuget.org/api/nuget/nuget-buildagent/FindPackagesById()?id='Microsoft.EntityFrameworkCore.Relational'&semVerLevel=2.0.0'.
2018-09-12T13:05:54.0896572Z The process cannot access the file 'C:\Users\srv_TFSBuild\AppData\Local\NuGet\v3-cache\d61d78efe08f5f75cf81bb3d1393234a5ff30750$s.com_api_nuget_nuget-buildagent\list_microsoft.entityframeworkcore.relational_page1.dat-new' because it is being used by another process.

After that, all my builds failed with:
C:\Program Files\dotnet\sdk\2.1.401\NuGet.targets(114,5): error NU5000: Nuspec file does not exist in package.

I have to clear the local cache to make it build again.

@robsonj

robsonj commented Oct 29, 2018

We see this same error under the same circumstances with TeamCity

@genscape-agodfrey

Reporting in with the same issue

@genscape-agodfrey

genscape-agodfrey commented Feb 22, 2019

I am now adding -NoCache and -DirectDownload to my nuget restores on my build server. The "isolated mode" is described here:
NuGet/NuGet.Client#856

@imanushin

@genscape-agodfrey, yes, this can be a workaround, but it does not solve the issue (from my point of view).

We use the same script for the local build and for the TeamCity build (for consistency, of course). Sometimes TeamCity compiles the same project from different branches in parallel (our TeamCity virtual machine has multiple TeamCity agents), and this leads to a build failure with exactly the "v3-cache" error.

Gradle is able to work with caches in parallel because of the following logic: it waits several minutes for a shared file to be unlocked.

E.g. instead of the code line if (someResource.IsLocked), the right code line is if (!SpinWait.SpinUntil(() => !someResource.IsLocked, TimeSpan.FromMinutes(2))) (link to SpinWait.SpinUntil), so the failure branch is taken only if the resource is still locked after two minutes. This behavior would mitigate the shared resource lock issue.

So a question for the NuGet repository owners: is it reasonable to change the code to the logic above?
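
The wait-then-fail idea can be sketched independently of NuGet. The Python snippet below is only an illustration under stated assumptions, not NuGet or Gradle code: `wait_until` and `FakeResource` are hypothetical names, and the timeout is shortened from two minutes so the example finishes quickly.

```python
import threading
import time


def wait_until(condition, timeout_s, poll_s=0.01):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition became true in time, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


class FakeResource:
    """Hypothetical stand-in for a cache file held by another process."""

    def __init__(self):
        self.is_locked = True

    def release(self):
        self.is_locked = False


resource = FakeResource()
# Simulate the other process releasing the resource shortly after we start waiting.
threading.Timer(0.05, resource.release).start()

# Give up only if the resource is still locked when the timeout expires.
acquired = wait_until(lambda: not resource.is_locked, timeout_s=2.0)
print("unlocked in time" if acquired else "still locked, giving up")
```

A waiter like this only helps if the failure is a transient lock; it does nothing for a cache that has already been corrupted.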

@marcin-krystianc

We've had the same problem on our build servers so I've spent some time investigating this and here is what I've found out.

Theoretically, running multiple restore operations should be ok because there is a proper cross-process locking mechanism in place.
Unfortunately due to its implementation details it is easy to accidentally render it useless and here is why.

To acquire a lock, NuGet translates the file path that is going to be locked into a new path, which is the one actually used for the lock file (ConcurrencyUtilities.cs#L242-L268). Then an exclusive file descriptor for the lock file is created. If the operation succeeds, the lock is acquired; if it doesn't, the operation is repeated after some delay (ConcurrencyUtilities.cs#L134-L171).

The problem is that lock files are not created in the location of global packages cache, but they are created in the temp folder ⚠️
So if there are concurrent restore operations running, which use different temp paths but they share the same global packages cache, the cache is not really protected with the locking mechanism at all (that was causing issues for us).
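
To make the failure mode concrete, here is a minimal Python sketch of a lock-file scheme of this shape. It is not NuGet's implementation: the hashing scheme, the function names, and the use of exclusive file creation (`O_EXCL`) in place of an exclusive file handle are all assumptions made for illustration. The point it shows is that a lock keyed on the protected path only excludes other processes if they all look in the same lock directory.

```python
import hashlib
import os
import tempfile
import time


def lock_path_for(target_path, lock_dir):
    """Map the path being protected to a lock-file path inside lock_dir."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    return os.path.join(lock_dir, digest + ".lock")


def try_acquire(target_path, lock_dir):
    """Try once to take the lock; return a file descriptor, or None if it is held."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        return os.open(lock_path_for(target_path, lock_dir),
                       os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def acquire(target_path, lock_dir, timeout_s=1.0, delay_s=0.01):
    """Retry try_acquire with a short delay until it succeeds or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        fd = try_acquire(target_path, lock_dir)
        if fd is not None or time.monotonic() >= deadline:
            return fd
        time.sleep(delay_s)


def release(fd, target_path, lock_dir):
    """Close the descriptor and delete the lock file so others can acquire it."""
    os.close(fd)
    os.remove(lock_path_for(target_path, lock_dir))


# Hypothetical shared package folder; only its name is hashed, it is never created.
shared_package = os.path.join("packages", "some.package", "1.0.0")

with tempfile.TemporaryDirectory() as temp_a, tempfile.TemporaryDirectory() as temp_b:
    first = acquire(shared_package, temp_a)
    # Same lock directory: the second caller is refused while the lock is held.
    second = acquire(shared_package, temp_a, timeout_s=0.05)
    # Different lock directory: the "lock" is taken again for the same package.
    third = acquire(shared_package, temp_b)
    same_dir_result = (first is not None, second is not None)
    different_dir_result = third is not None
    release(first, shared_package, temp_a)
    release(third, shared_package, temp_b)

print(same_dir_result)       # (True, False)
print(different_dir_result)  # True
```

With two agents that each have their own temp directory, both end up in the `temp_b` situation: each believes it holds the lock for the same shared package folder.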

Another problem is that this PR (merged on 17 Aug 2020) introduced a backwards incompatibility into the locking mechanism. Due to this change, it is not possible to safely run concurrent restores when two incompatible NuGet clients are used at the same time (e.g. NuGet v5.7.x/dotnet v3.x and NuGet v5.8.x/dotnet v5.x).

@nkolev92
Member

nkolev92 commented Dec 1, 2020

The problem is that lock files are not created in the location of global packages cache, but they are created in the temp folder ⚠️

I'm not confident that's a problem.
NuGet writes in many different locations, so the locking is performed on different levels.
For example, some of those particular locations are the MSBuildProjectExtensionsPath, where the assets file and nuget.g.props/nuget.g.targets are written, and the global packages folder.

We use the temp to lock across all locations, so it's really a design decision. It's a NuGet contract with NuGet, no right or wrong imo.

Another problem is that this PR (merged on 17 Aug 2020) introduced a backwards incompatibility into the locking mechanism. Due to this change, it is not possible to safely run concurrent restores when two incompatible NuGet clients are used at the same time (e.g. NuGet v5.7.x/dotnet v3.x and NuGet v5.8.x/dotnet v5.x).

Yeah, that's unfortunate :(

@marcin-krystianc

We use the temp to lock across all locations, so it's really a design decision. It's a NuGet contract with NuGet, no right or wrong

Yes, I partially agree with you. I think the downside is that this is unknown to users, which leads to problems.
E.g. when I run dotnet nuget locals -l all I get:

http-cache: C:\Users\<MyUser>\AppData\Local\NuGet\v3-cache
global-packages: C:\Users\<MyUser>\.nuget\packages\
temp: C:\Users\<MyUser>\AppData\Local\Temp\NuGetScratch
plugins-cache: C:\Users\<MyUser>\AppData\Local\NuGet\plugins-cache

nothing really suggests that temp location is so important and used as a location of lock files. Therefore the contract that requires me to use the same temp location for all concurrent NuGet operations (when there are any other shared locations used) is not really visible in any way.

I think if the output was e.g.:

http-cache: C:\Users\<MyUser>\AppData\Local\NuGet\v3-cache
global-packages: C:\Users\<MyUser>\.nuget\packages\
temp: C:\Users\<MyUser>\AppData\Local\Temp\NuGetScratch
lock: C:\Users\<MyUser>\AppData\Local\Temp\NuGetScratch\lock
plugins-cache: C:\Users\<MyUser>\AppData\Local\NuGet\plugins-cache

it would be clear that there is a dedicated location for locking.

If you think it is a good idea to add this, I can make a PR.

@nkolev92
Member

nkolev92 commented Dec 2, 2020

I think that's exposing NuGet internals to customers that are probably not interested in that detail.
I think we can do a better job of providing guidance on running multiple nuget operations concurrently, but not sure we need a separate location for that.

cc @JonDouglas @zivkan

@dedale

dedale commented Feb 26, 2021

We solved this problem with a custom MSBuildWithMutex task.
https://gist.github.com/dedale/675ec80313f2a70266deb0ab78a0e2c6
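
The general shape of such a workaround, serializing every restore on the machine behind one lock in a fixed location, can be sketched as follows. This is not the code from the gist: it is a hedged Python illustration in which `machine_wide_lock`, `fake_restore`, and the lock-file name are hypothetical, threads stand in for build agents, and a lock file stands in for a named mutex.

```python
import contextlib
import os
import tempfile
import threading
import time


@contextlib.contextmanager
def machine_wide_lock(lock_file, poll_s=0.01):
    """Hold an exclusive lock file for the duration of the with-block."""
    while True:
        try:
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_s)  # another restore is running; wait and retry
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_file)


stats = {"active": 0, "max_active": 0}
stats_guard = threading.Lock()


def fake_restore(lock_file):
    """Hypothetical stand-in for one agent running a restore."""
    with machine_wide_lock(lock_file):
        with stats_guard:
            stats["active"] += 1
            stats["max_active"] = max(stats["max_active"], stats["active"])
        time.sleep(0.02)  # pretend to download and unzip packages
        with stats_guard:
            stats["active"] -= 1


with tempfile.TemporaryDirectory() as shared_dir:
    # The lock lives in one location that every agent agrees on,
    # independent of each agent's own TEMP folder.
    lock_file = os.path.join(shared_dir, "restore.lock")
    workers = [threading.Thread(target=fake_restore, args=(lock_file,))
               for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

print(stats["max_active"])  # 1
```

One caveat of the lock-file stand-in: if a holder crashes, the file stays behind and blocks everyone, whereas an operating-system mutex is released when its owner exits.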

@japj

japj commented Apr 16, 2021

@nkolev92 how will the locking in the TEMP folder interact with having multiple azure-devops build agents (that have their own isolated TEMP folders)?

We see some weird "concurrency" writing problems if 2 agents perform a restore related to the same package, which eventually causes a later build to fail with an "Error NU5037: The package is missing the required nuspec file."

@marcin-krystianc

@japj According to my findings, if you have agents using their own TEMP folders but the same global packages folder, then package extraction into the global packages folder is not thread safe.
Also, if you look at the implementation, it becomes clear that if two processes try to extract the same package at the same time, the second process is going to delete files that were already extracted by the first process. Thus the missing .nuspec file issue is actually very likely to happen.

@nkolev92
Member

@dedale

dedale commented Apr 30, 2021

@nkolev92 I'm not sure which files this code refers to, but before we definitively fixed the problem with a custom MSBuild task using a mutex, the problem was exactly what @marcin-krystianc describes. It was located in the user packages cache (%USERPROFILE%\.nuget) and occurred especially when packages were updated. The symptoms were missing files or access denied errors, because two NuGet restores were trying to unzip / clean in parallel.

@nkolev92
Member

nkolev92 commented Apr 30, 2021

@dedale
Was the temp folder the same for the invocations?

I also looked at your gist, and I noticed:

<MSBuild Projects="$(MSBuildProjectFullPath)" Targets="Restore" Properties="$(PackageRestoreProperties)" BuildInParallel="true" />

If you execute restore on a solution, it only runs 1 task per invocation, rather than 1 per project.
It's significantly more efficient when the restore task gets to parallelize the projects and not MSBuild.

@dedale

dedale commented Apr 30, 2021

I realize this gist is not up-to-date with the latest version of my hack.
Now we are using a class MSBuildWithMutex that inherits from default MSBuild task otherwise global properties are lost.

In our CI setup, every agent running on the same machine is using its own TEMP folder.

@dedale

dedale commented Apr 30, 2021

I have updated my gist using a child class of MSBuild class (previous approach with a UsingTask and C# TaskFactory did not work well because global properties are lost).

@nkolev92
Member

In our CI setup, every agent running on the same machine is using its own TEMP folder.

If you refer to #7060 (comment), NuGet depends on a shared temp folder as a locking mechanism.

@marcin-krystianc

@nkolev92 Not sure I follow your conclusion.

I was referring to the case where there are two processes which use different TEMP folders (so locking will not actually work) but the same global packages folder.
In such a scenario:

  • first process will start the extraction by creating the *.nuspec file - link
  • second process will notice that the installation folder is not empty so it will try to clean it up - link
  • the first process continues to extract the rest of the NuGet package - it is not aware that the *.nuspec file is already removed by the second process - link

So the consequence of two processes simultaneously installing the same package (when these processes use different TEMP folders, so the locking mechanism does not actually work) is that the installation is going to be corrupted and the *.nuspec file is going to be missing.
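
The interleaving above can be replayed deterministically in a few lines. The following Python sketch is an illustration only: the function names, the package name `demo`, and the file names are hypothetical, and the three steps are run in sequence in one process to mimic two unsynchronized processes.

```python
import os
import tempfile


def begin_extract(install_dir):
    """Process 1, first step: create the folder and write the .nuspec file."""
    os.makedirs(install_dir, exist_ok=True)
    with open(os.path.join(install_dir, "demo.nuspec"), "w") as f:
        f.write("<package />")


def clean_if_not_empty(install_dir):
    """Process 2: treat a non-empty folder as a leftover and clear it."""
    if os.path.isdir(install_dir):
        for name in os.listdir(install_dir):
            os.remove(os.path.join(install_dir, name))


def finish_extract(install_dir):
    """Process 1, second step: write the rest of the package contents."""
    with open(os.path.join(install_dir, "demo.dll"), "w") as f:
        f.write("binary contents")


with tempfile.TemporaryDirectory() as global_packages:
    install_dir = os.path.join(global_packages, "demo", "1.0.0")
    begin_extract(install_dir)       # process 1 starts extracting
    clean_if_not_empty(install_dir)  # process 2 runs, not blocked by any lock
    finish_extract(install_dir)      # process 1 continues, unaware of the cleanup
    installed = sorted(os.listdir(install_dir))

print(installed)  # ['demo.dll']
```

The folder ends up with the package contents but no `.nuspec` file, which matches the symptom reported in this thread.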

@nkolev92
Member

nkolev92 commented May 4, 2021

two processes which use different TEMP folder

Ahh, I totally misread your original statement, mb.

@marcin-krystianc

I've just realized that there was recently another change that breaks compatibility of the inter-process locking mechanism on Linux (the default location of the NuGetScratch folder has changed) -> NuGet/NuGet.Client@288a479
This has been broken so many times that maybe it is ok to break it again and fix it once and for all. For example, we could move the location of the lock files used for inter-process locking of global packages from NuGetScratch to the global packages folder.

@jeffkl jeffkl added Priority:3 Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog. and removed Pipeline:Icebox labels Apr 3, 2024
@darthkurak

@marcin-krystianc
Did you manage to make it work on two different CI agents that share the same cache?
I have CI agents on Kubernetes. Had volume mapping:

- name: dotnet-shared-cache-nuget-packages
  mountPath: /root/.nuget/packages/

and had from time to time similar errors as here. I changed that mapping to:

          - name: dotnet-shared-cache-nuget-packages
            mountPath: /root/.nuget/packages/
          - name: dotnet-shared-cache-nuget-scratch
            mountPath: /tmp/NuGetScratchroot/
          - name: dotnet-shared-cache-nuget-http
            mountPath: /root/.local/share/NuGet/http-cache/
          - name: dotnet-shared-cache-nuget-plugins
            mountPath: /root/.local/share/NuGet/plugin-cache/

and after this, from time to time, I have errors similar to this:
Unhandled exception: System.IO.IOException: The process cannot access the file '/root/.nuget/packages/husky/0.7.1/husky.0.7.1.nupkg' because it is being used by another process.
Is it actually possible to configure Nuget in such a way that the cache will work on two different agents?

@marcin-krystianc

Both agents need to share not only the NuGet folder but also the TEMP folder (because the lock files for inter-process synchronisation are created in the TEMP/NuGetScratch folder).
