
Idea for reducing redundant calculations #41

Closed · Glitchy-Tozier opened this issue Apr 19, 2022 · 2 comments

Labels: enhancement (New feature or request) · question (Further information is requested)

Comments

@Glitchy-Tozier (Contributor)

I'll start by oversimplifying and then move on to potential complications.

The Concept

Let's see what evaluation results depend on. Say we're comparing these two layouts:

abcde fghij         zyxwv utsrq
    ...                 ...
    ...                 ...

Let's compare the evaluation results for two bigrams whose letters sit on the same positions (same keys and same layers):

  • Layout 1: "ab"
  • Layout 2: "zy"

We get our evaluation result by calling individual_cost() and supplying parameters to it. The main ones are:

  • Weight
  • LayerKeys

The LayerKeys are what is actually used in most of the calculation:

```rust
pub struct LayerKey {
    /// Layer of the layout which the symbol belongs to
    pub layer: u8,
    /// Key to press for the symbol
    pub key: Key,
    /// Symbol belonging to a layout
    pub symbol: char,
    /// Vec of modifiers required to activate the layer (in terms of a [`LayerKeyIndex`] for a layout)
    pub modifiers: Vec<LayerKeyIndex>,
    /// If the key shall not be permutated for optimization
    pub is_fixed: bool,
    /// If the symbol itself is a modifier
    pub is_modifier: bool,
}
```

How do the LayerKeys of "ab" and "zy" differ? layer is equal, key is equal, modifiers is equal, is_fixed is equal, is_modifier is equal. Only symbol differs; I'll address that later.
The LayerKeys are derived from a certain index of a layout. Thus, in the evaluation of "ab" and "zy", only (or mainly) two things change:

  • The bigram's weight
  • The indices of the bigram's letters on our current layout.

To jump straight to the point: there is only a limited number of possible index arrangements. Thus, there is only a limited number of scores that need to be multiplied with weight before being returned from individual_cost(). Due to this limited variability, we could actually

  1. Pre-calculate the scores for every metric for every possible index-arrangement.
  2. Place those scores into a Vec.
  3. Instead of the costly evaluation (individual_cost(LayerKey1, LayerKey2, weight)), we could reduce this to an index access (precalculated_scores[idx1][idx2] * weight), as sketched below.
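A minimal sketch of what such a table could look like for a single bigram metric; the names PrecalculatedBigramScores and metric_cost are hypothetical illustrations, not the crate's actual API:

```rust
/// Hypothetical precomputed score table for one bigram metric.
struct PrecalculatedBigramScores {
    n_keys: usize,
    /// Flat n_keys * n_keys table, indexed by the two key indices.
    scores: Vec<f64>,
}

impl PrecalculatedBigramScores {
    /// Step 1: evaluate the metric once for every possible index arrangement.
    fn new(n_keys: usize, metric_cost: impl Fn(usize, usize) -> f64) -> Self {
        let mut scores = Vec::with_capacity(n_keys * n_keys);
        for idx1 in 0..n_keys {
            for idx2 in 0..n_keys {
                scores.push(metric_cost(idx1, idx2));
            }
        }
        Self { n_keys, scores }
    }

    /// Step 3: what used to be a call to individual_cost() becomes a
    /// single lookup plus a multiplication by the bigram's weight.
    #[inline]
    fn cost(&self, idx1: usize, idx2: usize, weight: f64) -> f64 {
        self.scores[idx1 * self.n_keys + idx2] * weight
    }
}
```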

This would remove a lot of code:

  1. Most of the evaluation taking place for each layout
  2. get_filtered_layerkeys()
  3. The creation of LayerKeys for each layout
  4. Possibly some additional stuff I'm not aware of

RAM

This shouldn't be a problem. The number of possible index combinations we need to store scores for is roughly the following, assuming 60 different keys/positions/indices (since we split all higher-layer n-grams into layer-1 n-grams):

  • Unigrams: 60^1 = 60 different sequences, ~5 metrics – 5 vecs of length 60
  • Bigrams: 60^2 = 3,600 different sequences, ~10 metrics – 10 vecs of length 3,600
  • Trigrams: 60^3 = 216,000 different sequences, ~5 metrics – 5 vecs of length 216,000 (comparable to the number of trigrams we process)
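Back-of-the-envelope, assuming each score is stored as an f64:

```rust
fn main() {
    // Table sizes from the list above (60 positions, f64 scores).
    let entries = 5 * 60_usize.pow(1)   // unigram tables:        300
        + 10 * 60_usize.pow(2)          // bigram tables:      36,000
        + 5 * 60_usize.pow(3);          // trigram tables:  1,080,000
    let bytes = entries * std::mem::size_of::<f64>();
    println!("{} entries, {:.1} MiB", entries, bytes as f64 / 1024.0 / 1024.0);
    // Prints: 1116300 entries, 8.5 MiB
}
```

About 8.5 MiB in total, which confirms RAM shouldn't be a problem.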

Caveats

Sometimes, the layerkey.symbol information is used as well. This seems to happen in two scenarios:

  1. To create informative messages ("Worst Bigrams: …")
  2. To filter out unwanted n-grams (the pause-indicator-stuff)

I'm not sure whether this is doable (the rare need for symbols makes some things slightly more complicated) and worthwhile (there would be speed improvements, but we would still need to use split_trigram_modifiers(), which probably accounts for at least half of all computational work).

@Glitchy-Tozier added the enhancement (New feature or request) and question (Further information is requested) labels on Apr 19, 2022
@dariogoetz (Owner)

I believe that such a speedup can be achieved by introducing a cache within the individual metrics.

I guess that for most metrics, hashing the keys is about as expensive as the actual computation of the metric. But for the secondary bigrams, for instance, this could speed things up.

The metrics know which data can be used as cache keys. Probably the matrix position will do... maybe, for completeness, take the layer too, even though it will be 0 after resolving modifiers.
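A minimal sketch of what such a per-metric cache could look like, assuming matrix positions (plus layer) as the cache key; MetricCache and the (column, row, layer) tuple are assumptions for illustration, not the existing types:

```rust
use std::collections::HashMap;

/// Hypothetical cache key: (column, row, layer) for each key of a bigram.
type PosKey = ((u8, u8, u8), (u8, u8, u8));

/// A memoization cache that an individual metric could own.
struct MetricCache {
    cache: HashMap<PosKey, f64>,
}

impl MetricCache {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Look up the cost for this position pair, computing it on a miss.
    fn get_or_compute(&mut self, key: PosKey, compute: impl FnOnce() -> f64) -> f64 {
        *self.cache.entry(key).or_insert_with(compute)
    }
}
```

Whether this pays off hinges on the hashing cost mentioned above: it only helps for metrics that are clearly more expensive than hashing two positions.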

@Glitchy-Tozier (Contributor, Author)

> I believe that such a speedup can be achieved by introducing a cache within the individual metrics.

Mostly, yeah. It still forces us to manually create LayerKeys for every layout, but that's probably not that big of a performance hit. It's kind of redundant, but not tragic.

> I guess that for most metrics, hashing the keys is about as expensive as the actual computation of the metric.

You're probably right.

> But for the secondary bigrams, for instance, this could speed things up.

If you want to, you could try. It turns out that the biggest speed improvements could probably be gained by improving split_trigram_modifiers(), but I'm not sure whether that is even possible. There's a lot of cloning going on, but it might be necessary.

> The metrics know which data can be used as cache keys. Probably the matrix position will do...

I think so.

> ...maybe, for completeness, take the layer too, even though it will be 0 after resolving modifiers.

I think this might be unnecessary, but it's your call whether you want to include it.
