I'll start by oversimplifying and then move on to potential complications.
### The Concept
Let's see what the evaluation results depend on. We're comparing these two layouts:

```
Layout 1:      Layout 2:
abcde fghij    zyxwv utsrq
...            ...
...            ...
```
Let's compare the evaluation result for two bigrams whose letters sit on the same positions (same `Key`s and same layers):

- Layout 1: "ab"
- Layout 2: "zy"
We get our evaluation result by calling `individual_cost()` and supplying parameters to it. The main ones are:

- Weight
- `LayerKey`s
The `LayerKey`s are what is actually used in most of the calculation:
```rust
pub struct LayerKey {
    /// Layer of the layout which the symbol belongs to
    pub layer: u8,
    /// Key to press for the symbol
    pub key: Key,
    /// Symbol belonging to a layout
    pub symbol: char,
    /// Vec of modifiers required to activate the layer (in terms of a [`LayerKeyIndex`] for a layout)
    pub modifiers: Vec<LayerKeyIndex>,
    /// If the key shall not be permutated for optimization
    pub is_fixed: bool,
    /// If the symbol itself is a modifier
    pub is_modifier: bool,
}
```
How do the `LayerKey`s of "ab" and "zy" differ? `layer` is equal, `key` is equal, `modifiers` is equal, `is_fixed` is equal, `is_modifier` is equal. Only `symbol` is different; I'll address that later.

The `LayerKey`s are derived from a certain index of a layout. Thus, in the evaluation of "ab" and "zy", only (or mainly) two things change:
- The bigram's weight
- The indices of the bigram's letters on our current layout
To jump straight to the point: there is only a certain number of possible index arrangements. Thus, there's only a certain number of scores that need to be multiplied with `weight` before being returned from `individual_cost()`. Due to this limited variability, we could actually:

1. Pre-calculate the scores for every metric for every possible index arrangement.
2. Place those scores into a `Vec`.
3. Instead of a costly evaluation (`individual_cost(LayerKey1, LayerKey2, weight)`), reduce this to an index access (`precalculated_scores[idx1][idx2] * weight`), as sketched below.
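A minimal sketch of what that lookup could look like (the type and method names here are hypothetical, not the crate's actual API):

```rust
/// Hypothetical lookup table for one bigram metric:
/// `scores[idx1][idx2]` holds the metric's raw cost for the key at
/// position `idx1` followed by the key at position `idx2`.
struct PrecalculatedBigramMetric {
    scores: Vec<Vec<f64>>, // ~60 x 60 entries for ~60 key positions
}

impl PrecalculatedBigramMetric {
    /// Replaces the per-layout `individual_cost(...)` call with a
    /// plain index access plus one multiplication.
    fn cost(&self, idx1: usize, idx2: usize, weight: f64) -> f64 {
        self.scores[idx1][idx2] * weight
    }
}
```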
This would remove a lot of code:

- Most of the evaluation taking place for each layout
- `get_filtered_layerkeys()`
- The creation of `LayerKey`s for each layout
- Possibly some additional stuff I'm not aware of
### RAM
Shouldn't be a problem. The number of possible index combinations we need to store scores for is roughly the following. We assume there are 60 different keys/positions/indices (since we split all higher-layer ngrams into L1-ngrams):
- Unigrams: 60^1 different sequences, ~5 metrics → 5 vecs with len 60
- Bigrams: 60^2 different sequences, ~10 metrics → 10 vecs with len 3,600
- Trigrams: 60^3 different sequences, ~5 metrics → 5 vecs with len 216,000 (comparable to the number of trigrams we process)
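As a rough sanity check (assuming one `f64` per score, which is my assumption, not something taken from the code): (5 · 60 + 10 · 3,600 + 5 · 216,000) · 8 bytes ≈ 8.9 MB, so this should comfortably fit in RAM.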
### Caveats
Sometimes, the `layerkey.symbol` information is used as well. This seems to happen in two scenarios:

- To create informative messages ("Worst Bigrams: …")
- To filter out unwanted n-grams (the pause-indicator stuff)
I'm not sure whether this is doable (the rare need for symbols makes some things slightly more complicated) and worthwhile (there would be speed improvements, but we would still need to use `split_trigram_modifiers()`, which probably accounts for at least half of all computational work).
I believe that such a speedup can be achieved by introducing a cache within the individual metrics.
I guess for most metrics, hashing the keys is about as expensive as the actual computation of the metric. But for the secondary bigrams, for instance, this could speed things up.
The metrics know which data can be used as cache keys. Probably the matrix position will do... maybe for completeness take the layer, too, even though it will be 0 after resolving modifiers.
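A minimal sketch of such a cache, assuming a (row, column, layer) tuple works as the per-key cache key (all names here are hypothetical):

```rust
use std::collections::HashMap;

/// Per-key cache key: matrix position plus layer (the layer will usually
/// be 0 after modifiers have been resolved, but is kept for completeness).
type KeyId = (u8, u8, u8); // (row, column, layer)

/// Hypothetical cache wrapped around a single bigram metric.
struct CachedBigramMetric {
    cache: HashMap<(KeyId, KeyId), f64>,
}

impl CachedBigramMetric {
    /// Looks up the raw (unweighted) cost, computing and storing it on a miss.
    fn individual_cost(
        &mut self,
        k1: KeyId,
        k2: KeyId,
        weight: f64,
        compute: impl Fn(KeyId, KeyId) -> f64, // the actual metric
    ) -> f64 {
        let raw = *self
            .cache
            .entry((k1, k2))
            .or_insert_with(|| compute(k1, k2));
        raw * weight
    }
}
```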
> I believe that such a speedup can be achieved by introducing a cache within the individual metrics.
Mostly, yeah. It still forces us to manually create `LayerKey`s for every layout, but that's probably not that big of a performance hit. It's kind of redundant, but not tragic.
> I guess for most metrics, hashing the keys is about as expensive as the actual computation of the metric.
You're probably right.
> But for the secondary bigrams, for instance, this could speed things up.
If you want, you could try. It turns out that the biggest speed improvements could probably be gained by improving `split_trigram_modifiers()`, but I'm not sure whether this is even possible. There's a lot of cloning going on, but it might be necessary.
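For a sense of why there's so much cloning: as noted above, higher-layer ngrams get split into L1-ngrams, which roughly means expanding each symbol into its modifier keys plus its base key and re-windowing the result. A very simplified illustration of that idea (hypothetical types; the real function is considerably more involved):

```rust
/// Simplified stand-in for a LayerKey: a base key index plus the modifier
/// key indices needed to reach its layer.
struct KeyStub {
    base: usize,
    modifiers: Vec<usize>,
}

/// Illustration only: expand each symbol of a trigram into
/// (modifiers..., base key) and re-window into base-layer trigrams.
fn split_trigram(trigram: [&KeyStub; 3]) -> Vec<Vec<usize>> {
    let expanded: Vec<usize> = trigram
        .iter()
        .flat_map(|k| {
            k.modifiers
                .iter()
                .copied() // every split copies the modifier indices again
                .chain(std::iter::once(k.base))
        })
        .collect();
    expanded.windows(3).map(|w| w.to_vec()).collect()
}
```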
> The metrics know which data can be used as cache keys. Probably the matrix position will do...
I think so.
> ...maybe for completeness take the layer, too, even though it will be 0 after resolving modifiers.
I think this might be unnecessary, but it's your call whether you want to include it.