perldelta entry for the new key behaviour for large hashes

nwc10 · Nicholas Clark · commit fa92924b30d8 · 2022-03-19T07:30:44.000+01:00
Note that large hashes (that are neither objects nor symbol tables) no
longer use the shared string table, and describe what the performance
implications might be.

This commit and the related code commits incorporate several improvements
suggested by Hugo during review.
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
@@ -117,7 +117,58 @@ There may well be none in a stable release.
 
 =item *
 
-XXX
+Large hashes no longer allocate their keys from the shared string table.
+
+The same internal datatype (C<PVHV>) is used for all of
+
+=over 4
+
+=item *
+
+Symbol tables
+
+=item *
+
+Objects (by default)
+
+=item *
+
+Associative arrays
+
+=back
+
+The shared string table was originally added to improve performance for blessed
+hashes used as objects, because every object instance has the same keys, so it
+is an optimisation to share memory between them. It also makes sense for symbol
+tables, where derived classes will have the same keys (typically method names),
+and the OP trees built for method calls can also share memory. The shared
+string table behaves roughly like a cache for hash keys.
+
+But for hashes actually used as associative arrays - mapping keys to values -
+typically the keys are not re-used in other hashes. For example, "seen" hashes
+are keyed by object IDs (or addresses), and logically these keys won't repeat
+in other hashes.
+
+Storing these "used just once" keys in the shared string table increases CPU
+and RAM use for no gain. For such keys the shared string table behaves as a
+cache with a 0% hit rate. Storing all the keys there increases the total size
+of the shared string table, as well as increasing the number of times it is
+resized as it grows. B<Worse> - in any environment that has "copy on write"
+memory for child processes (such as a pre-forking server), the memory pages used
+for the shared string table rapidly need to be copied as the child process
+manipulates hashes. Hence if most of the shared string table consists of keys
+that are used in only one place, there is no benefit from re-use within the
+perl interpreter, but a high cost due to more pages for the OS to copy.
+
+The perl interpreter now disables shared hash keys for "large" hashes (that are
+neither objects nor symbol tables). "Large" is a heuristic: currently, sharing
+is disabled when adding a key to a hash triggers allocation of more storage
+and the hash has more than 42 keys.
+
+This B<might> cause slightly increased memory usage for programs that create
+(unblessed) data structures that contain multiple large hashes that share the
+same keys. But generally our testing suggests that for the specific cases
+described it is a win, and other code is unaffected.
 
 =back