Make the library fork safe and drop the mutex #50

casperisfine · 2023-08-02T19:20:32Z

When forking, file descriptors are inherited and their state shared.

In the context of MiniMime this means that the offset of the file opened by RandomAccessDb is shared across processes, so the `seek + read` combo is subject to inter-process race conditions.
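A minimal sketch of the shared offset, assuming a POSIX platform where `fork` is available. The helper name `offset_after_child_seek` and the temporary file are made up for illustration and are not MiniMime code.

```ruby
require "tempfile"

# Hypothetical helper, not MiniMime code: opens a file, forks, lets the
# child seek, and returns the offset the parent observes afterwards.
def offset_after_child_seek(path, child_offset)
  File.open(path, "rb") do |file|
    pid = fork do
      file.seek(child_offset) # moves the offset of the inherited descriptor
      exit!(0)                # leave the child without running at_exit handlers
    end
    Process.wait(pid)
    file.pos                  # the parent never called seek itself
  end
end

Tempfile.create("shared-offset") do |tmp|
  tmp.write("a" * 100)
  tmp.flush
  # Prints the offset chosen by the child, since both processes share it.
  puts offset_after_child_seek(tmp.path, 42)
end
```

If the parent were in the middle of its own seek + read at that moment, it would read from the child's offset instead of its own.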

Of course that file is lazily opened, so assuming most applications don't query MiniMime before fork, it's not a big problem.

However, when reforking post boot (e.g. https://github.com/Shopify/pitchfork), this becomes an issue.

Additionally, even if the file descriptor isn't shared across processes, the file position is still process-global, so concurrent threads need a Mutex around each seek + read.

By using `pread` instead of `seek + read` we can both make the library fork safe and get rid of the need to synchronize accesses.
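A small sketch of the positional read, assuming fixed-width rows. `DEMO_ROW_LENGTH` and `read_row` are hypothetical names chosen for this example; only `IO#pread(maxlen, offset)` itself comes from Ruby.

```ruby
require "tempfile"

DEMO_ROW_LENGTH = 16 # hypothetical fixed row width, including the trailing newline

# Hypothetical helper: reads one fixed-width row with a single positional read.
# IO#pread takes (maxlen, offset) and leaves the file's current position alone.
def read_row(file, row)
  file.pread(DEMO_ROW_LENGTH, row * DEMO_ROW_LENGTH).force_encoding(Encoding::UTF_8)
end

Tempfile.create("rows") do |tmp|
  %w[css html liquid].each { |ext| tmp.write(ext.ljust(DEMO_ROW_LENGTH - 1) + "\n") }
  tmp.flush

  File.open(tmp.path, "rb") do |file|
    puts read_row(file, 2).strip # the third row
    puts file.pos                # unchanged, pread did not move the offset
  end
end
```

Because no shared position is read or written, there is nothing left for threads or forked processes to race on.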

This also happens to fix an outstanding JRuby issue.

Fix: #37
Fix: #38

cc @SamSaffron

casperisfine · 2023-08-02T19:20:49Z

test/fixtures/custom_content_type_mime.db

@@ -1,2 +1,2 @@
-liquid      application/x-liquid                                                      8bit
+liquid      application/x-liquid                                                      8bit            


This padding was missing.
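For context, rows are addressed as `row * @row_length`, so every line has to have the same byte length. A rough sketch of a check for that, where `misaligned_lines` and the column widths are hypothetical and not part of MiniMime:

```ruby
# Hypothetical check, not part of MiniMime: returns the 1-based numbers of
# lines whose byte length differs from that of the first line.
def misaligned_lines(data)
  lines = data.lines
  return [] if lines.empty?

  expected = lines.first.bytesize
  lines.each_with_index
       .reject { |line, _index| line.bytesize == expected }
       .map { |_line, index| index + 1 }
end

# Column widths here are illustrative only.
padded   = "liquid  application/x-liquid  8bit  \n" * 2
unpadded = "liquid  application/x-liquid  8bit  \n" \
           "liquid  application/x-liquid  8bit\n"

p misaligned_lines(padded)
p misaligned_lines(unpadded)
```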

casperisfine · 2023-08-03T06:51:08Z

lib/mini_mime.rb

-        @db.lookup_by_extension(extension) ||
-          @db.lookup_by_extension(extension.downcase)
-      end
+      @db ||= new


There is a small race condition here that could lead to more than one DB being instantiated, but only one will be kept, so I'm not sure it's a big deal.

We could eagerly instantiate the DB and reset it when the paths are changed, but that would change the semantics a bit.

ikaronen-relex · 2023-08-27 (edited)

Looks OK to me, but I guess there could be a performance impact if a lot of threads try to init the DB at the same time (e.g. because they're all executing the same startup code in lockstep). A possible alternative to avoid multiple initialization could be the DCL-like (double-checked locking) idiom `@db || MUTEX.synchronize { @db ||= new }`. (I can submit a PR if you think that would be a good idea.)

casperisfine · 2023-08-30

Yeah, that's a good idea. Not sure why I didn't think about it.

ikaronen-relex · 2023-08-31

@casperisfine PR here: #56

casperisfine · 2023-08-31

Thanks, note that I'm not a maintainer though, it's up to @SamSaffron to merge this or not.
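The DCL-like idiom suggested in this thread can be sketched as follows. `LazyDb` and its `created` counter are hypothetical stand-ins for illustration; this is not the code from #56.

```ruby
# Hypothetical stand-in for the database class; only the locking pattern matters.
class LazyDb
  MUTEX = Mutex.new
  @created = 0

  class << self
    attr_accessor :created

    def instance
      # Fast path: no lock once @db is set.
      # Slow path: re-check under the lock so only one thread runs `new`.
      @db || MUTEX.synchronize { @db ||= new }
    end
  end

  def initialize
    sleep 0.01           # widen the race window for the demonstration
    LazyDb.created += 1  # only ever runs while MUTEX is held
  end
end

dbs = 8.times.map { Thread.new { LazyDb.instance } }.map(&:value)
puts dbs.uniq.size   # number of distinct instances handed out
puts LazyDb.created  # number of times initialize ran
```

The lock is only contended during the first call; once `@db` is assigned, readers return it without synchronizing.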

SamSaffron · 2023-08-03T23:53:09Z

I like this change! The concurrency issue is not ideal, I guess, but the complexity of resolving it is high and the impact extremely low.

Perf-wise, I assume pread will be faster anyway because there is one less syscall.

I am good to merge this!

casperisfine · 2023-08-04T06:02:29Z

lib/mini_mime.rb

@@ -146,8 +140,7 @@ def lookup_uncached(val)
      end

      def resolve(row)
-        @file.seek(row * @row_length)
-        Info.new(@file.readline)
+        Info.new(@file.pread(@row_length, row * @row_length).force_encoding(Encoding::UTF_8))


Hm, I'm really sorry, I just realized that some platforms (e.g. Windows) don't have pread.

I'll submit another PR to shim it for these.
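One possible shape for such a fallback, as a sketch only: `RowReader` and its `force_fallback:` option are hypothetical, and this is not the code from the follow-up PR. It assumes `respond_to?(:pread)` reflects whether the platform supports the call.

```ruby
require "tempfile"

# Hypothetical reader: uses IO#pread when the file object responds to it,
# otherwise falls back to seek + read guarded by a Mutex.
class RowReader
  def initialize(path, row_length, force_fallback: false)
    @file = File.open(path, "rb")
    @row_length = row_length
    @use_pread = !force_fallback && @file.respond_to?(:pread)
    @mutex = Mutex.new
  end

  def row(index)
    offset = index * @row_length
    data =
      if @use_pread
        @file.pread(@row_length, offset)
      else
        # The file position is shared by all threads, so seek + read must be atomic.
        @mutex.synchronize do
          @file.seek(offset)
          @file.read(@row_length)
        end
      end
    data.force_encoding(Encoding::UTF_8)
  end

  def close
    @file.close
  end
end

Tempfile.create("shim") do |tmp|
  %w[css html liquid].each { |ext| tmp.write(ext.ljust(15) + "\n") }
  tmp.flush

  reader = RowReader.new(tmp.path, 16, force_fallback: true)
  puts reader.row(1).strip
  reader.close
end
```

The fallback path only protects threads within one process; it does not restore fork safety, which is acceptable on platforms that lack `fork` as well.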

casperisfine force-pushed the fork-safety branch from 85db05a to 08004a1 Compare August 2, 2023 19:44

SamSaffron approved these changes Aug 3, 2023

SamSaffron merged commit f67eef6 into discourse:main Aug 3, 2023

casperisfine mentioned this pull request Aug 4, 2023

Shim IO#pread when not supported #52

Merged

casperisfine deleted the fork-safety branch August 8, 2023 09:57

This was referenced Aug 27, 2023

Seeking the DB file does not work in a bundled JRuby application, crashes randomly #37

Closed

Avoid possible redundant database initialization from multiple threads #56

Merged
