Commit fa92924

nwc10 (Nicholas Clark) authored and committed
perldelta entry for the new key behaviour for large hashes
Note that large hashes (that are neither objects nor symbol tables) no longer use the shared string table, and what the performance implications might be. This commit and the related code commits incorporate several improvements suggested by Hugo during review.
1 parent f9c625b commit fa92924

1 file changed: +52 -1 lines changed

pod/perldelta.pod

Lines changed: 52 additions & 1 deletion
@@ -117,7 +117,58 @@ There may well be none in a stable release.
 
 =item *
 
-XXX
+Large hashes no longer allocate their keys from the shared string table.
+
+The same internal datatype (C<PVHV>) is used for all of
+
+=over 4
+
+=item *
+
+Symbol tables
+
+=item *
+
+Objects (by default)
+
+=item *
+
+Associative arrays
+
+=back
+
+The shared string table was originally added to improve performance for blessed
+hashes used as objects, because every object instance has the same keys, so it
+is an optimisation to share memory between them. It also makes sense for symbol
+tables, where derived classes will have the same keys (typically method names),
+and the OP trees built for method calls can also share memory. The shared
+string table behaves roughly like a cache for hash keys.
+
+But for hashes actually used as associative arrays - mapping keys to values -
+typically the keys are not re-used in other hashes. For example, "seen" hashes
+are keyed by object IDs (or addresses), and logically these keys won't repeat
+in other hashes.
+
+Storing these "used just once" keys in the shared string table increases CPU
+and RAM use for no gain. For such keys the shared string table behaves as a
+cache with a 0% hit rate. Storing all the keys there increases the total size
+of the shared string table, as well as increasing the number of times it is
+resized as it grows. B<Worse> - in any environment that has "copy on write"
+memory for child processes (such as a pre-forking server), the memory pages used
+for the shared string table rapidly need to be copied as the child process
+manipulates hashes. Hence if most of the shared string table is such keys that
+are used only in one place, there is no benefit from re-use within the perl
+interpreter, but a high cost due to more pages for the OS to copy.
+
+The perl interpreter now disables shared hash keys for "large" hashes (that are
+neither objects nor symbol tables). "Large" is a heuristic - currently the
+heuristic is that sharing is disabled when adding a key to a hash triggers
+allocation of more storage, and the hash has more than 42 keys.
+
+This B<might> cause slightly increased memory usage for programs that create
+(unblessed) data structures that contain multiple large hashes that share the
+same keys. But generally our testing suggests that for the specific cases
+described it is a win, and other code is unaffected.
 
 =back
 

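The "large hash" heuristic described in the perldelta entry above can be sketched as a toy model. This is not the interpreter's C code, and the class and function names (`ToyHash`, `keys_until_unshared`) are hypothetical. Only the 42-key threshold and the exemption for objects and symbol tables come from the entry; the starting size of 8 buckets and the "double when keys outnumber buckets" growth rule are assumptions made purely for illustration.

```python
# Toy model of the heuristic in the perldelta text. NOT perl's real C code.
INITIAL_BUCKETS = 8    # assumption: illustrative starting size
LARGE_THRESHOLD = 42   # from the perldelta text: "more than 42 keys"


class ToyHash:
    """Tracks only key count, bucket count, and whether keys are shared."""

    def __init__(self, is_object=False, is_symbol_table=False):
        self.keys = 0
        self.buckets = INITIAL_BUCKETS
        self.shared_keys = True
        # Objects and symbol tables keep using the shared string table.
        self.exempt = is_object or is_symbol_table

    def add_key(self):
        self.keys += 1
        # Assumption: storage doubles once keys outnumber buckets.
        if self.keys > self.buckets:
            self.buckets *= 2
            # Heuristic from the text: storage grew AND more than 42 keys.
            if not self.exempt and self.keys > LARGE_THRESHOLD:
                self.shared_keys = False


def keys_until_unshared(limit=1000):
    """Return the key count at which the toy hash stops sharing, or None."""
    h = ToyHash()
    for _ in range(limit):
        h.add_key()
        if not h.shared_keys:
            return h.keys
    return None
```

Under these assumed growth points, `keys_until_unshared()` returns 65 rather than 43, because the check only runs when storage grows. That number is a property of the toy model's growth rule, not a statement about any particular perl build.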