# Polish Keyboard Layout Optimizer

## Assumptions and goal
Our goal is to find the best keyboard layout for typing in Polish. We define "best" as the layout that minimizes the distance your fingers travel while typing, on the assumption that less travel means faster and more efficient typing.

### What does "best" mean?
For keyboard layouts, "best" means efficient: the most frequently typed keys should sit where the fingers already rest, so that reaching the right key takes as little movement as possible.

### Our Target User
Our target is the classic ten-finger touch typist whose fingers rest on the traditional home row (A, S, D, F and J, K, L, ; on QWERTY), with one addition: the layout must cover all the characters specific to Polish (ą, ć, ę, ł, ń, ó, ś, ź, ż).

### A Note on Variability
Different datasets lead to different layouts. Language use varies from formal documents to casual tweets, so the optimal layout can shift depending on the text it was computed from. Keyboards vary too: key spacing and placement differ between models, so a layout that is optimal for one set of distances is not necessarily optimal for every keyboard.

## Methodology
Everything here is driven by data. We use the CC100-Polish dataset to explore common words and typing patterns, including the characters specific to Polish. Other datasets can be mixed in for a broader view. The goal is the keyboard layout that requires the least finger travel on that data.
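
One simple way to explore typing patterns is to count single characters and adjacent character pairs (bigrams) in a text sample. The sketch below is only an illustration: the function name `count_patterns` and the sample string are assumptions, not part of the notebook.

```python
from collections import Counter

def count_patterns(text):
    """Return (character counts, bigram counts) for a string."""
    char_counts = Counter(text)
    # Pair each character with its successor to get adjacent-key transitions.
    bigram_counts = Counter(zip(text, text[1:]))
    return char_counts, bigram_counts

sample = "szczotki"  # hypothetical sample; the notebook reads from a corpus file
chars, bigrams = count_patterns(sample)
print(chars.most_common(3))
print(bigrams.most_common(3))
```

Bigram counts matter here because travel distance depends on which key follows which, not only on how often each key is pressed.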

## Imports

In [28]:
import random
import numpy as np

## Calculate distance between keystrokes

In [47]:
qwerty_layout = {
    'q': (0, 0), 'w': (0, 1), 'e': (0, 2), 'r': (0, 3),
    't': (0, 4), 'y': (0, 5), 'u': (0, 6), 'i': (0, 7), 'o': (0, 8), 'p': (0, 9),
    'a': (1, 0), 's': (1, 1), 'd': (1, 2), 'f': (1, 3),
    'g': (1, 4), 'h': (1, 5), 'j': (1, 6), 'k': (1, 7), 'l': (1, 8), ';': (1, 9),
    'z': (2, 0), 'x': (2, 1), 'c': (2, 2), 'v': (2, 3),
    'b': (2, 4), 'n': (2, 5), 'm': (2, 6), ',': (2, 7), '.': (2, 8), '/': (2, 9)
}
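
Since each key maps to a `(row, column)` pair, the same dictionary can also be generated from three row strings, which may be convenient when comparing candidate layouts. A minimal sketch; `build_layout` and `qwerty_rows` are hypothetical names introduced for illustration.

```python
def build_layout(rows):
    """Map each key to (row, column) from a list of row strings."""
    return {
        key: (row_index, col_index)
        for row_index, row in enumerate(rows)
        for col_index, key in enumerate(row)
    }

qwerty_rows = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
layout = build_layout(qwerty_rows)
print(layout["f"], layout["/"])
```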

In [62]:
def calculate_keystroke_distance(key1, key2, layout):
    # Define the distances
    standard_distances = {
        'same_col': {(1, 0): 1.032, (0, 1): 1.032, (1, 2): 1.118, (2, 1): 1.118, (2, 0): 2.138, (0, 2): 2.138},  # middle to upper, middle to lower, bottom to top
        'diff_col': {'middle_upper': 1.032, 'middle_lower': 1.118},
    }
    special_cases = {
        ((1, 3), (0, 4)): 1.247, ((1, 5), (0, 6)): 1.247, # F to T, H to U
        ((1, 4), (0, 3)): 1.605, ((1, 6), (0, 5)): 1.605, # G to R, J to Y
        ((2, 4), (1, 3)): 1.803, ((2, 6), (1, 5)): 1.803, # B to F, M to H
        ((2, 4), (0, 3)): 2.661, ((2, 6), (0, 5)): 2.661, # B to R, M to Y
        ((2, 3), (0, 4)): 2.015, ((2, 5), (0, 6)): 2.015, # V to T, N to U
        ((1, 3), (1, 4)): 1.000, ((1, 5), (1, 6)): 1.000  # F to G, H to J
    }
    
    # Get positions
    pos1, pos2 = layout[key1], layout[key2]

    # Pressing the same key twice in a row requires no travel
    if key1 == key2:
        return 0

    # Same column cases
    if pos1[1] == pos2[1]:
        return standard_distances['same_col'][(pos1[0], pos2[0])]
    
    # If the previous key is outside the target's index-finger columns,
    # measure from that index finger's home key instead
    if pos1[1] not in [3, 4] and pos2[1] in [3, 4]:
        pos1 = (1, 3)  # Left index finger starts at F (QWERTY position)
    elif pos1[1] not in [5, 6] and pos2[1] in [5, 6]:
        pos1 = (1, 6)  # Right index finger starts at J (QWERTY position)
        
    # Special cases
    if (pos1, pos2) in special_cases:
        return special_cases[(pos1, pos2)]
    if (pos2, pos1) in special_cases:
        return special_cases[(pos2, pos1)]

    # Any other move that ends on a home-row key is treated as zero travel
    if key2 in 'asdfghjkl; ':
        return 0
    
    # Different column cases
    # Determine middle to top or middle to bottom distance
    if pos2[0] == 0:  # Top row
        return standard_distances['diff_col']['middle_upper']
    elif pos2[0] == 2:  # Bottom row
        return standard_distances['diff_col']['middle_lower']

# Example of using the function
calculate_keystroke_distance('v', 'l', qwerty_layout)

0
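
To score a whole text rather than a single key pair, one option is to sum the per-keystroke distance over every consecutive pair of keys. The sketch below assumes a distance function with the same `(key1, key2, layout)` signature as `calculate_keystroke_distance`; `total_travel`, `toy_distance`, and `toy_layout` are hypothetical stand-ins so the example runs on its own.

```python
def total_travel(text, layout, distance_fn):
    """Sum distance_fn over each consecutive pair of keys in text."""
    return sum(
        distance_fn(prev_key, next_key, layout)
        for prev_key, next_key in zip(text, text[1:])
    )

# Stand-in distance for demonstration only: number of rows between two keys.
def toy_distance(key1, key2, layout):
    return abs(layout[key1][0] - layout[key2][0])

toy_layout = {"q": (0, 0), "a": (1, 0), "z": (2, 0)}
print(total_travel("qaz", toy_layout, toy_distance))  # 1 + 1 = 2
```

Passing the distance function as a parameter keeps the scoring loop independent of any particular distance model.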

## Load Data

In [64]:
def load_and_process_file(file_path):
    # Read at most the first 2,000,000 characters of the file
    with open(file_path, 'r', encoding='utf-8') as file:
        data = file.read(2000000)

    # Map shifted punctuation to the unshifted character on the same key
    replacements = {
        '<': ',', 
        '>': '.', 
        ':': ';'
    }
    for old, new in replacements.items():
        data = data.replace(old, new)

    # Convert to lowercase
    data = data.lower()

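    # Fold Polish letters onto base Latin keys; 'ź' goes to 'x',
    # presumably because that is its AltGr key on the Polish Programmers layout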
    polish_replacements = {
        'ą': 'a', 'ę': 'e', 'ó': 'o', 
        'ż': 'z', 'ź': 'x', 'ń': 'n', 
        'ć': 'c', 'ś': 's', 'ł': 'l', 
    }
    
    for old, new in polish_replacements.items():
        data = data.replace(old, new)

    # Keep only characters that exist on the layout; this also drops digits,
    # whitespace, and any other punctuation
    allowed_chars = ',.;abcdefghijklmnopqrstuvwxyz'
    data = ''.join(c for c in data if c in allowed_chars)

    return data

# Example usage
file_path = 'pl.txt'  # Adjust the file path if necessary
processed_data = load_and_process_file(file_path)
print(processed_data)


tezwlasniemyslalemokolialewolalemsieupewniczebyjakosnieuszkodzicszczotki;katalogszczotekelektrycznychfirmysimarumiescilemtenkatalogwinstrukcjealezostalusunietyprzezmoderatora.dodajeponownietutaj,poniewazkilkaosobprosiloopodeslanienapw.jeslizlydzialtoproszeoprzeniesienielubwskazaniemiejscagdziemogeudostepnictenkatalog.elektromaszynyiurzadzeniapiottrolip;odpowiedzi;wyswietlen;wczesniejpisalemrazemzatdarkdarkmanipisalemowiekszejszczotce,aklejpowiniendacspokojniewtwoimwypadkuradeelpoxezfixklejepoksydowyprzewodzacypradelektryczny,dwuskladnikowy.klejtenprzeznaczonyjestdowykonywaniapolaczenwszedzietam,gdzieniemozliwejeststosowanie...witam.welektrycznymwozkuwidlowympojawilsiebladnr.,niemamozliwoscizmianypredkosci.czasamipojawiasieostrzezenie.wdtrlistabledowzostalamocnookrojonadotychnajczesciejwystepujacych.bateriarozladowananiewlasciwabateriaaktywnyhamulecpostojowypo...witamostatniouelektrykawymienianowmojejskodzieszczotkiwalternatorze.pojakimstygodniuzorientowalemsie,zeniedzialaelektrycznareg
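
The same cleanup can be tried on a short in-memory string, which makes the effect of each step easier to see than in the corpus output above. This is a sketch only: `normalize`, `FOLD`, and `ALLOWED` are hypothetical names, and the reading of the `'ź'` to `'x'` mapping as an AltGr key position is an assumption.

```python
# Assumed mapping: each Polish letter is folded onto a base Latin key
# (with 'ź' going to 'x'), and shifted punctuation onto its unshifted key.
FOLD = str.maketrans({
    'ą': 'a', 'ć': 'c', 'ę': 'e', 'ł': 'l', 'ń': 'n',
    'ó': 'o', 'ś': 's', 'ź': 'x', 'ż': 'z',
    '<': ',', '>': '.', ':': ';',
})
ALLOWED = set(',.;abcdefghijklmnopqrstuvwxyz')

def normalize(text):
    """Lowercase, fold characters onto base keys, drop everything else."""
    folded = text.lower().translate(FOLD)
    return ''.join(c for c in folded if c in ALLOWED)

print(normalize("Żółw: 12 źrebiąt!"))  # zolw;xrebiat
```

Using `str.translate` applies all single-character substitutions in one pass instead of one `replace` call per character.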