Add named mutex for cross-process synchronization #5030

Merged
merged 1 commit into from May 20, 2016

Conversation

Projects
None yet
9 participants
@kouvel
Member

kouvel commented May 17, 2016

Fixes #3422

  • On systems that support pthread process-shared robust recursive mutexes, they will be used
  • On other systems, file locks are used. File locks unfortunately don't have a timeout in the blocking wait call, and I didn't find any other sync object with a timed wait with the necessary properties, so polling is done for timed waits.

Shared memory files:

  • Session-scoped mutexes (name not prefixed, or prefixed with Local) go in /tmp/coreclr/shm/session/
  • Globally-scoped mutexes (name prefixed with Global) go in /tmp/coreclr/shm/global/
  • Contains shared state, and is mmap'ped into the process, see SharedMemorySharedDataHeader and NamedMutexSharedData for data stored
  • Creation and deletion is synchronized using an exclusive file lock on the shm directory
  • Any process using the shared memory file holds a shared file lock on the shared memory file
  • Upon creation, if the shared memory file already exists, an exclusive file lock is attempted on it, to see if the file data is valid. If no other processes have the mutex open, the file is reinitialized.
  • Upon releasing the last reference to a mutex in a process, it will try to get an exclusive lock on the shared memory file to see if any other processes have the mutex opened. If not, the file is deleted, along with the session directory if it's empty. The coreclr and shm directories are not deleted.
  • This allows managing the lifetime of mutex state based on active processes that have the mutex open. Depending on how the process terminated, the file may still be left over in the tmp directory, I haven't found anything that can be done about that.

Lock files when using file locks:

  • In addition to the shared memory file, we need another file for the actual synchronization file lock, since a file lock on the shared memory file is used for lifetime purposes.
  • These files go in /tmp/coreclr/lockfiles/session|global/
  • The file is empty, and is only used for file locks

Process data

  • See SharedMemoryProcessDataHeader and NamedMutexProcessData for data stored
  • Per mutex name, there is only one instance of process data that is ref-counted. They are currently stored in a linked list in SharedMemoryManager. It should use a hash table, but of the many hash table implementations that are already there, none seem to be easily usable in the PAL. I'll look into that and will fix later.
  • Refers to the associated shared memory, and knows how to clean up both the process data and shared data
  • When using file locks for synchronization, a process-local mutex is also created for synchronizing threads, since file locks are owned at the file descriptor level and there is only one open file descriptor in the process per mutex name. The process-local mutex is locked around the file lock, so that only one thread per process is ever trying to flock on a given file descriptor.

Abandon detection

  • When a lock is acquired, the process data is added to a linked list on the owning thread
  • When a thread exits, the list is walked, each mutex is flagged as abandoned and released
  • For detecting process abruptly terminating, pthread robust mutexes give us that. When using file locks, the file lock is automatically released by the system. Upon acquiring a lock, the lock owner info in the shared memory is checked to see if the mutex was abandoned.

Miscellaneous

  • CreateMutex and OpenMutex both create new handles for each mutex opened. Each handle just refers to the process data header for the mutex name.
  • Some of the above features are already available in the PAL, but not quite in a way that I can use for this purpose. The existing shared memory, naming, and waiting infrastructure is not suitable for this purpose, and is not used.
@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 17, 2016

Member

@sergiy-k @jkotas @stephentoub

The change is undergoing further testing, flagged as [WIP] for now. It's done for the most part though, could you please take a look?

Member

kouvel commented May 17, 2016

@sergiy-k @jkotas @stephentoub

The change is undergoing further testing, flagged as [WIP] for now. It's done for the most part though, could you please take a look?

kouvel added a commit to kouvel/corefx that referenced this pull request May 17, 2016

@kouvel kouvel referenced this pull request in dotnet/corefx May 17, 2016

Merged

Fix some named mutex tests #8625

@jkotas

This comment has been minimized.

Show comment
Hide comment
Member

jkotas commented May 17, 2016

}
+#if !PLATFORM_UNIX
+ if (name != null && System.IO.Path.MaxPath < name.Length)

This comment has been minimized.

@stephentoub

stephentoub May 18, 2016

Member

Is there a length limitation on Unix, and if so, do we need to check for that here, too?

@stephentoub

stephentoub May 18, 2016

Member

Is there a length limitation on Unix, and if so, do we need to check for that here, too?

This comment has been minimized.

@stephentoub

stephentoub May 18, 2016

Member

Ah, nevermind, I see the checking for it below.

@stephentoub

stephentoub May 18, 2016

Member

Ah, nevermind, I see the checking for it below.

src/mscorlib/src/mscorlib.txt
@@ -2138,7 +2138,8 @@ event_Barrier_PhaseFinished=Barrier finishing phase {1}.
#if PLATFORM_UNIX
; Unix threading
-PlatformNotSupported_NamedSynchronizationPrimitives=Named synchronization primitives are not supported on this platform.
+PlatformNotSupported_NamedSynchronizationPrimitives=The named version of this synchronization primitive is not supported on this platform.
+PlatformNotSupported_NamedSyncObjectWaitAnyWaitAll=Wait operations on multiple wait handles including a named synchronization primitive, are not supported on this platform.

This comment has been minimized.

@stephentoub

stephentoub May 18, 2016

Member

Nit: the comma can be removed

@stephentoub

stephentoub May 18, 2016

Member

Nit: the comma can be removed

This comment has been minimized.

@kouvel

kouvel May 18, 2016

Member

Agreed, will do

@kouvel

kouvel May 18, 2016

Member

Agreed, will do

src/pal/src/include/pal/sharedmemory.h
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE (0x1000)
+#endif // !PAGE_SIZE

This comment has been minimized.

@stephentoub

stephentoub May 18, 2016

Member

If the actual page size is needed for alignment or something, should it be looked up with sysconf rather than hardcoding it?

@stephentoub

stephentoub May 18, 2016

Member

If the actual page size is needed for alignment or something, should it be looked up with sysconf rather than hardcoding it?

This comment has been minimized.

@kouvel

kouvel May 18, 2016

Member

It's hard-coded everywhere else. I had to redefine it here only because the header it's defined in is not includable in PAL (it's in common headers that are only allowed for use in VM as far as I understand). I think there are currently several assumptions on the size of a page, and it's probably ok.

@kouvel

kouvel May 18, 2016

Member

It's hard-coded everywhere else. I had to redefine it here only because the header it's defined in is not includable in PAL (it's in common headers that are only allowed for use in VM as far as I understand). I think there are currently several assumptions on the size of a page, and it's probably ok.

This comment has been minimized.

@sergiy-k

sergiy-k May 19, 2016

Contributor

PAL has VIRTUAL_PAGE_SIZE that is defined in pal/src/include/pal/virtual.h and is used in various places in PAL.

@sergiy-k

sergiy-k May 19, 2016

Contributor

PAL has VIRTUAL_PAGE_SIZE that is defined in pal/src/include/pal/virtual.h and is used in various places in PAL.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Fixed to use that instead, thanks

@kouvel

kouvel May 19, 2016

Member

Fixed to use that instead, thanks

src/pal/src/include/pal/sharedmemory.h
+#define SHARED_MEMORY_RUNTIME_TEMP_DIRECTORY_PATH "/tmp/coreclr"
+
+#define SHARED_MEMORY_SHARED_MEMORY_DIRECTORY_PATH "/tmp/coreclr/shm"
+#define SHARED_MEMORY_LOCK_FILES_DIRECTORY_PATH "/tmp/coreclr/lockfiles"

This comment has been minimized.

@stephentoub

stephentoub May 18, 2016

Member

This is probably a more general question than this PR, but does coreclr already generate temp files? We have some in corefx, and I'm wondering if it's important that we align on the locations for these. corefx uses "/tmp/dotnet/corefx/featurename/...", though the "tmp" isn't hardcoded and instead come from Path.GetTempPath(), which will use the environment variable TMPDIR for the temp path and then fall back to "/tmp/" if that env var wasn't set.

@stephentoub

stephentoub May 18, 2016

Member

This is probably a more general question than this PR, but does coreclr already generate temp files? We have some in corefx, and I'm wondering if it's important that we align on the locations for these. corefx uses "/tmp/dotnet/corefx/featurename/...", though the "tmp" isn't hardcoded and instead come from Path.GetTempPath(), which will use the environment variable TMPDIR for the temp path and then fall back to "/tmp/" if that env var wasn't set.

This comment has been minimized.

@kouvel

kouvel May 18, 2016

Member

I looked at TMPDIR, there's also GetTempPathA in PAL that uses it. I'm not comfortable using an environment variable though. For backward compatibility (even if it means to deterministically be incompatible with a previous version after a breaking change), it needs to be a non-configurable, system-wide location. It wouldn't be possible to change the location easily without breaking back-compat.

Regarding the path, I searched for uses of /tmp in coreclr and didn't find any specifics. It's a good point though. Considering the /tmp/dotnet base used by CoreFX, I'm thinking maybe I can use /tmp/dotnet/shm and /tmp/dotnet/shm/lockfiles such that it is not specific to coreclr (perhaps corert or other runtimes will also share the sync system). Or if that's not an issue I can follow the current pattern and use /tmp/dotnet/coreclr/*. What do you think?

@kouvel

kouvel May 18, 2016

Member

I looked at TMPDIR, there's also GetTempPathA in PAL that uses it. I'm not comfortable using an environment variable though. For backward compatibility (even if it means to deterministically be incompatible with a previous version after a breaking change), it needs to be a non-configurable, system-wide location. It wouldn't be possible to change the location easily without breaking back-compat.

Regarding the path, I searched for uses of /tmp in coreclr and didn't find any specifics. It's a good point though. Considering the /tmp/dotnet base used by CoreFX, I'm thinking maybe I can use /tmp/dotnet/shm and /tmp/dotnet/shm/lockfiles such that it is not specific to coreclr (perhaps corert or other runtimes will also share the sync system). Or if that's not an issue I can follow the current pattern and use /tmp/dotnet/coreclr/*. What do you think?

This comment has been minimized.

@kouvel

kouvel May 18, 2016

Member

Also to add, all of the Unix platforms we support seem to have a /tmp so it shouldn't be an issue.

@kouvel

kouvel May 18, 2016

Member

Also to add, all of the Unix platforms we support seem to have a /tmp so it shouldn't be an issue.

This comment has been minimized.

@kouvel

kouvel May 18, 2016

Member

On second thought, I'm putting specific permissions on /tmp/coreclr and all files/directories below so that they have read/write permissions for all users. I'm not sure if other pieces that use /tmp/dotnet do this. Although I make an attempt to update permissions on the directories if they already exist, that may fail if the user doesn't have the necessary permissions to do so. I'll leave the root as /tmp/coreclr for now.

@kouvel

kouvel May 18, 2016

Member

On second thought, I'm putting specific permissions on /tmp/coreclr and all files/directories below so that they have read/write permissions for all users. I'm not sure if other pieces that use /tmp/dotnet do this. Although I make an attempt to update permissions on the directories if they already exist, that may fail if the user doesn't have the necessary permissions to do so. I'll leave the root as /tmp/coreclr for now.

This comment has been minimized.

@ellismg

ellismg May 19, 2016

Contributor

On second thought, I'm putting specific permissions on /tmp/coreclr and all files/directories below so that they have read/write permissions for all users. I'm not sure if other pieces that use /tmp/dotnet do this. Although I make an attempt to update permissions on the directories if they already exist, that may fail if the user doesn't have the necessary permissions to do so. I'll leave the root as /tmp/coreclr for now.

We had a bug about this in CoreFX where it was not setting the correct permissions, but Steve fixed that, so we should be good there now.

@ellismg

ellismg May 19, 2016

Contributor

On second thought, I'm putting specific permissions on /tmp/coreclr and all files/directories below so that they have read/write permissions for all users. I'm not sure if other pieces that use /tmp/dotnet do this. Although I make an attempt to update permissions on the directories if they already exist, that may fail if the user doesn't have the necessary permissions to do so. I'll leave the root as /tmp/coreclr for now.

We had a bug about this in CoreFX where it was not setting the correct permissions, but Steve fixed that, so we should be good there now.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Ok great. It looks like the 'dotnet' folder is also a hidden folder (prefixed with dot). I'll use the following then:
shm files: /tmp/.dotnet/shm
lock files: /tmp/.dotnet/lockfiles

@kouvel

kouvel May 19, 2016

Member

Ok great. It looks like the 'dotnet' folder is also a hidden folder (prefixed with dot). I'll use the following then:
shm files: /tmp/.dotnet/shm
lock files: /tmp/.dotnet/lockfiles

@@ -52,17 +52,17 @@ public Mutex(bool initiallyOwned, String name, out bool createdNew)
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public unsafe Mutex(bool initiallyOwned, String name, out bool createdNew, MutexSecurity mutexSecurity)
{
- if (name != null)
+ if (name == string.Empty)

This comment has been minimized.

@jamesqo

jamesqo May 18, 2016

Contributor

Nit: should be name.Length == 0 (this is how string.IsNullOrEmpty does it)

@jamesqo

jamesqo May 18, 2016

Contributor

Nit: should be name.Length == 0 (this is how string.IsNullOrEmpty does it)

This comment has been minimized.

@kouvel

kouvel May 18, 2016

Member

I did it this way because this check is done before the null check

@kouvel

kouvel May 18, 2016

Member

I did it this way because this check is done before the null check

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 18, 2016

Member

@dotnet-bot test this please

Member

kouvel commented May 18, 2016

@dotnet-bot test this please

@kouvel kouvel changed the title from [WIP] Add named mutex for cross-process synchronization to Add named mutex for cross-process synchronization May 18, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 18, 2016

Remove named mutex PNSE tests that will fail soon
The tests removed would fail soon after dotnet/coreclr#5030 is merged.

Related to dotnet/coreclr#5030
Related to #8625
Related to dotnet/coreclr#3422

@kouvel kouvel referenced this pull request in dotnet/corefx May 18, 2016

Merged

Remove named mutex PNSE tests that will fail soon #8666

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 19, 2016

Member

@dotnet-bot test this please

Member

kouvel commented May 19, 2016

@dotnet-bot test this please

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 19, 2016

Member

@dotnet-bot test this please

Member

kouvel commented May 19, 2016

@dotnet-bot test this please

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 19, 2016

Member

@dotnet-bot test this please

Member

kouvel commented May 19, 2016

@dotnet-bot test this please

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 19, 2016

Member

@dotnet-bot test this please

Member

kouvel commented May 19, 2016

@dotnet-bot test this please

src/pal/src/synchobj/mutex.cpp
+ if (lpName != nullptr)
+ {
+ SharedMemoryProcessDataHeader *processDataHeader;
+ bool created;

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

Initialize this to false to cover the case when the function call below throws

@janvorli

janvorli May 19, 2016

Member

Initialize this to false to cover the case when the function call below throws

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Fixed

@kouvel

kouvel May 19, 2016

Member

Fixed

+ int closeResult;
+ do
+ {
+ closeResult = close(fileDescriptor);

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

'close' doesn't return the error code, just -1 if the operation has failed. You need to use errno here.

@janvorli

janvorli May 19, 2016

Member

'close' doesn't return the error code, just -1 if the operation has failed. You need to use errno here.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Thanks for that, fixed

@kouvel

kouvel May 19, 2016

Member

Thanks for that, fixed

+ return true;
+}
+
+int SharedMemoryHelpers::OpenDirectory(LPCSTR path)

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

The OpenDirectory and CreateOrOpenFile have almost the same block of code for handling errors repeated three times. I would recommend refactoring that into a helper "Open" function that would handle the EINTR internally, contain the throws and only return if the status was ok or ENOENT.

@janvorli

janvorli May 19, 2016

Member

The OpenDirectory and CreateOrOpenFile have almost the same block of code for handling errors repeated three times. I would recommend refactoring that into a helper "Open" function that would handle the EINTR internally, contain the throws and only return if the status was ok or ENOENT.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Done

+{
+ _ASSERTE(fileDescriptor != -1);
+
+ int flockResult = flock(fileDescriptor, LOCK_UN);

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

I think you need to handle EINTR here

@janvorli

janvorli May 19, 2016

Member

I think you need to handle EINTR here

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

I believe the system call may only be interrupted during a wait for a lock:

"EINTR While waiting to acquire a lock, the call was interrupted by delivery of a signal caught by a handler; see signal(7)."

Doesn't hurt to check for it though, added a check for EINTR anyway.

@kouvel

kouvel May 19, 2016

Member

I believe the system call may only be interrupted during a wait for a lock:

"EINTR While waiting to acquire a lock, the call was interrupted by delivery of a signal caught by a handler; see signal(7)."

Doesn't hurt to check for it though, added a check for EINTR anyway.

+
+ SharedMemoryId id(name);
+
+ struct AutoCleanup

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

In CoreCLR itself, we use "Holder" derived classes for cleanup purposes of pointers, handles, etc. It uses a templated implementation with easily pluggable cleanup functions and unified interface. It seems it would be nice to use the same concept in your code instead of this bulk cleanup. The Holder implementation from CoreCLR is not directly usable in PAL due to the fact that it contains contracts etc. But we have a version in CoreRT that should be directly usable.
But please take it just as a suggestion for a later cleanup.

@janvorli

janvorli May 19, 2016

Member

In CoreCLR itself, we use "Holder" derived classes for cleanup purposes of pointers, handles, etc. It uses a templated implementation with easily pluggable cleanup functions and unified interface. It seems it would be nice to use the same concept in your code instead of this bulk cleanup. The Holder implementation from CoreCLR is not directly usable in PAL due to the fact that it contains contracts etc. But we have a version in CoreRT that should be directly usable.
But please take it just as a suggestion for a later cleanup.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

In some cases, things need to be cleaned up in a specific order, some things need to be skipped and others not, depending on where we are, etc. With holders, that cleanup logic would be complicated to understand. With this method, it is made directly clear, and is flexible for doing any sort of cleanup. I'll take a look and will reconsider later.

@kouvel

kouvel May 19, 2016

Member

In some cases, things need to be cleaned up in a specific order, some things need to be skipped and others not, depending on where we are, etc. With holders, that cleanup logic would be complicated to understand. With this method, it is made directly clear, and is flexible for doing any sort of cleanup. I'll take a look and will reconsider later.

src/pal/src/synchobj/mutex.cpp
+{
+ m_lockOwnerProcessId = GetCurrentProcessId();
+ _ASSERTE(m_lockOwnerProcessId != SharedMemoryHelpers::InvalidPid);
+ m_lockOwnerThreadId = GetCurrentThreadId();

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

I think that using the GetCurrentThreadId is not the right thing here. It returns the real thread id trimmed to 32 bits. While it is ok on Linux, it can be a problem on OSX and other systems where the thread id is wider and so in a rare cases, we could theoretically get two threads haveing the same lower 32 bits of their id. The GetCurrentThreadId should be used just as an emulation of the Windows API for external uses. Internally, we should be using THREADSilentGetCurrentThreadId that returns size_t id.

@janvorli

janvorli May 19, 2016

Member

I think that using the GetCurrentThreadId is not the right thing here. It returns the real thread id trimmed to 32 bits. While it is ok on Linux, it can be a problem on OSX and other systems where the thread id is wider and so in a rare cases, we could theoretically get two threads haveing the same lower 32 bits of their id. The GetCurrentThreadId should be used just as an emulation of the Windows API for external uses. Internally, we should be using THREADSilentGetCurrentThreadId that returns size_t id.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

I see, and it looks like gettid() is the wrong thing as well. Fixed to use the function you suggested, and changed all thread IDs stored in shared-memory to 64-bit ints.

@kouvel

kouvel May 19, 2016

Member

I see, and it looks like gettid() is the wrong thing as well. Fixed to use the function you suggested, and changed all thread IDs stored in shared-memory to 64-bit ints.

src/pal/src/synchobj/mutex.cpp
+ #endif // !HAVE_FULLY_FEATURED_PTHREAD_MUTEXES
+ );
+ }
+ catch (...)

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

I would prefer catching the explicit exception instead.

@janvorli

janvorli May 19, 2016

Member

I would prefer catching the explicit exception instead.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

When handing an exception, I would agree it's important to catch a specific type of exception. But when doing resource cleanup and rethrowing, where the cleanup code is not specific to any particular type of exception and needs to happen regardless of the type of exception, the code should reflect that.

@kouvel

kouvel May 19, 2016

Member

When handing an exception, I would agree it's important to catch a specific type of exception. But when doing resource cleanup and rethrowing, where the cleanup code is not specific to any particular type of exception and needs to happen regardless of the type of exception, the code should reflect that.

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

Actually, now that I think about it, in this case, it looks like a holder kind of pattern would be nicer for debugging. With catch / rethrow, we loose the call stack upto this catch, with holders, we don't.

@janvorli

janvorli May 19, 2016

Member

Actually, now that I think about it, in this case, it looks like a holder kind of pattern would be nicer for debugging. With catch / rethrow, we loose the call stack upto this catch, with holders, we don't.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Oh didn't realize that, I will change it.

@kouvel

kouvel May 19, 2016

Member

Oh didn't realize that, I will change it.

+ {
+ return new(buffer) SharedMemoryProcessDataHeader(id, fileDescriptor, sharedDataHeader, sharedDataTotalByteCount);
+ }
+ catch (...)

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

I would prefer catching the explicit exception instead.

@janvorli

janvorli May 19, 2016

Member

I would prefer catching the explicit exception instead.

+ s_processDataHeaderListHead = processDataHeader;
+}
+
+void SharedMemoryManager::RemoveProcessDataHeader(SharedMemoryProcessDataHeader *processDataHeader)

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

Since we need to remove the processDataHeader from an arbitrary position in the list, it seems it would be better to use the existing double linked list abstraction (LIST_ENTRY and the related macros for the list manipulation) to make this operation O(1).

@janvorli

janvorli May 19, 2016

Member

Since we need to remove the processDataHeader from an arbitrary position in the list, it seems it would be better to use the existing double linked list abstraction (LIST_ENTRY and the related macros for the list manipulation) to make this operation O(1).

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

I actually need a hash table here, since I also need to find an item in O(1) time by name. I have a TODO in the Find function regarding this. I need to go through all the various hash tables we have and see if I can move one to be usable in PAL, or try to use the STL one. Figured I'd do that later since it's not a big deal for now.

@kouvel

kouvel May 19, 2016

Member

I actually need a hash table here, since I also need to find an item in O(1) time by name. I have a TODO in the Find function regarding this. I need to go through all the various hash tables we have and see if I can move one to be usable in PAL, or try to use the STL one. Figured I'd do that later since it's not a big deal for now.

This comment has been minimized.

@janvorli

janvorli May 19, 2016

Member

Ok, makes sense. I would recommend getting hash table implementation from corert (shash.h/inl). It seems pretty standalone (the only thing that would need a change is the usage of "new" to dynamically allocate memory at one place. It is basically the same as the one from coreclr, but without the CONTRACT stuff.

@janvorli

janvorli May 19, 2016

Member

Ok, makes sense. I would recommend getting hash table implementation from corert (shash.h/inl). It seems pretty standalone (the only thing that would need a change is the usage of "new" to dynamically allocate memory at one place. It is basically the same as the one from coreclr, but without the CONTRACT stuff.

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

That would be perfect, I'll look into that in a separate change.

@kouvel

kouvel May 19, 2016

Member

That would be perfect, I'll look into that in a separate change.

src/pal/src/synchobj/mutex.cpp
+
+// This value should only be incremented if a non-backward-compatible change to the sync system is made. A process would fail to
+// open a mutex created with a different sync system version.
+const UINT8 NamedMutexProcessData::SyncSystemVersion = 0;

This comment has been minimized.

@sergiy-k

sergiy-k May 19, 2016

Contributor

Should we start from a non-zero value?

@sergiy-k

sergiy-k May 19, 2016

Contributor

Should we start from a non-zero value?

This comment has been minimized.

@kouvel

kouvel May 19, 2016

Member

Sure, changed to 1

@kouvel

kouvel May 19, 2016

Member

Sure, changed to 1

Add named mutex for cross-process synchronization
Fixes #3422

- On systems that support pthread process-shared robust recursive mutexes, they will be used
- On other systems, file locks are used. File locks unfortunately don't have a timeout in the blocking wait call, and I didn't find any other sync object with a timed wait with the necessary properties, so polling is done for timed waits.

Shared memory files:
- Session-scoped mutexes (name not prefixed, or prefixed with Local\) go in /tmp/coreclr/shm/session<sessionId>/<mutexName>
- Globally-scoped mutexes (name prefixed with Global\) go in /tmp/coreclr/shm/global/<mutexName>
- Contains shared state, and is mmap'ped into the process, see SharedMemorySharedDataHeader and NamedMutexSharedData for data stored
- Creation and deletion is synchronized using an exclusive file lock on the shm directory
- Any process using the shared memory file holds a shared file lock on the shared memory file
- Upon creation, if the shared memory file already exists, an exclusive file lock is attempted on it, to see if the file data is valid. If no other processes have the mutex open, the file is reinitialized.
- Upon releasing the last reference to a mutex in a process, it will try to get an exclusive lock on the shared memory file to see if any other processes have the mutex opened. If not, the file is deleted, along with the session directory if it's empty. The coreclr and shm directories are not deleted.
- This allows managing the lifetime of mutex state based on active processes that have the mutex open. Depending on how the process terminated, the file may still be left over in the tmp directory, I haven't found anything that can be done about that.

Lock files when using file locks:
- In addition to the shared memory file, we need another file for the actual synchronization file lock, since a file lock on the shared memory file is used for lifetime purposes.
- These files go in /tmp/coreclr/lockfiles/session<sessionId>|global/<mutexName>
- The file is empty, and is only used for file locks

Process data
- See SharedMemoryProcessDataHeader and NamedMutexProcessData for data stored
- Per mutex name, there is only one instance of process data that is ref-counted. They are currently stored in a linked list in SharedMemoryManager. It should use a hash table, but of the many hash table implementations that are already there, none seem to be easily usable in the PAL. I'll look into that and will fix later.
- Refers to the associated shared memory, and knows how to clean up both the process data and shared data
- When using file locks for synchronization, a process-local mutex is also created for synchronizing threads, since file locks are owned at the file descriptor level and there is only one open file descriptor in the process per mutex name. The process-local mutex is locked around the file lock, so that only one thread per process is ever trying to flock on a given file descriptor.

Abandon detection
- When a lock is acquired, the process data is added to a linked list on the owning thread
- When a thread exits, the list is walked, each mutex is flagged as abandoned and released
- For detecting process abruptly terminating, pthread robust mutexes give us that. When using file locks, the file lock is automatically released by the system. Upon acquiring a lock, the lock owner info in the shared memory is checked to see if the mutex was abandoned.

Miscellaneous
- CreateMutex and OpenMutex both create new handles for each mutex opened. Each handle just refers to the process data header for the mutex name.
- Some of the above features are already available in the PAL, but not quite in a way that I can use for this purpose. The existing shared memory, naming, and waiting infrastructure is not suitable for this purpose, and is not used.
@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 20, 2016

Member

@dotnet-bot test this please

Member

kouvel commented May 20, 2016

@dotnet-bot test this please

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 20, 2016

Member

@dotnet-bot test Windows_NT corefx_baseline

Member

kouvel commented May 20, 2016

@dotnet-bot test Windows_NT corefx_baseline

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 20, 2016

Member

@dotnet-bot test Ubuntu corefx_baseline

Member

kouvel commented May 20, 2016

@dotnet-bot test Ubuntu corefx_baseline

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 20, 2016

Member

@dotnet-bot test OSX corefx_baseline

Member

kouvel commented May 20, 2016

@dotnet-bot test OSX corefx_baseline

@kouvel

This comment has been minimized.

Show comment
Hide comment
@kouvel

kouvel May 20, 2016

Member

@dotnet-bot test OSX corefx_baseline

Member

kouvel commented May 20, 2016

@dotnet-bot test OSX corefx_baseline

@kouvel kouvel merged commit d3b3410 into dotnet:master May 20, 2016

8 of 10 checks passed

Ubuntu x64 Checked Build and Test (Jit - CoreFx) Build started sha1 is merged.
Details
Windows_NT x64 Checked Build and Test (Jit - CoreFx ) Build started sha1 is merged.
Details
CentOS7.1 x64 Debug Build and Test Build finished.
Details
FreeBSD x64 Checked Build Build finished.
Details
OSX x64 Checked Build and Test Build finished.
Details
Ubuntu x64 Checked Build and Test Build finished.
Details
Windows_NT x64 Debug Build and Test Build finished.
Details
Windows_NT x64 Release Priority 1 Build and Test Build finished.
Details
Windows_NT x86 legacy_backend Checked Build and Test Build finished.
Details
Windows_NT x86 ryujit Checked Build and Test Build finished.
Details

@kouvel kouvel deleted the kouvel:NamedMutex branch May 20, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 20, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 20, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 20, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 20, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 20, 2016

kouvel added a commit to kouvel/corefx that referenced this pull request May 20, 2016

@nguerrera nguerrera referenced this pull request in dotnet/roslyn Apr 23, 2018

Closed

Compiler server isn't unique on *nix #26338

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment