In [7]:
import os
import re

We want to load all source files, so that we have as many unique words as possible to practice on.

The sources used are listed below, with links to where they were acquired.
- [dwyl_english_words.txt](https://github.com/dwyl/english-words/blob/master/words.txt)

In [18]:
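source_path = './source/'
source_files = []

output_path = './subsets/'
# Make sure the output directory exists before any subset is written.
os.makedirs(output_path, exist_ok=True)

# Read every file in the source directory as a list of lines.
for file_name in os.listdir(source_path):
    with open(os.path.join(source_path, file_name), "r") as f:
        source_files.append(f.readlines())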

Create a general word bank without any punctuation or digits, to be used for the subsequent generation of subsets.

In [19]:
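# Matches lines made up of ASCII letters only (no digits, punctuation, or spaces).
non_numeric = re.compile("^[A-Za-z]+$")

word_bank = set()

for source in source_files:
    for word in source:
        if non_numeric.match(word):
            # Keep exactly one trailing newline so writelines() emits one word per line,
            # even when the last line of a source file has no newline.
            word_bank.add(word.strip().lower() + "\n")

with open(output_path+'words_english.txt','w') as f:
    f.writelines(word_bank)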
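
As a minimal sketch of how the letters-only filter behaves on lines returned by `readlines()`, the snippet below runs the same pattern over a few sample lines. The sample lines and the names `letters_only`, `sample_lines`, and `sample_bank` are made up for illustration and are not part of the notebook.

```python
import re

# Same pattern as the word-bank filter: ASCII letters only, start to end.
letters_only = re.compile("^[A-Za-z]+$")

# Hypothetical sample lines, shaped like the output of f.readlines().
# The last entry has no trailing newline, as the last line of a file might.
sample_lines = ["Apple\n", "can't\n", "b2b\n", "apple\n", "zebra"]

# "$" also matches just before a trailing newline, so "Apple\n" passes the filter.
# Stripping and lowercasing collapses "Apple" and "apple" into one entry.
sample_bank = {line.strip().lower() for line in sample_lines if letters_only.match(line)}

print(sorted(sample_bank))  # ['apple', 'zebra']
```

Entries with an apostrophe or a digit are dropped, and the set removes case-only duplicates.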

With this code, we generate two word banks of words with at least five letters: one typed entirely with the left hand, and one typed entirely with the right hand.
While practicing with these word banks, I immediately noticed that my right hand is significantly less efficient at typing. My WPM on the left-hand word bank is roughly 70-80, whereas my WPM on the right-hand word bank lies between 30 and 50.

In [30]:
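# Words of five or more letters that use only keys on one side of a QWERTY keyboard.
left_hand_qwerty = re.compile("^[qwertasdfgzxcvb]{5,}$")
right_hand_qwerty = re.compile("^[yuiophjklnm]{5,}$")

def create_subset_from_regex(name: str, regex_patt: re.Pattern):
    # Write every word in the word bank that matches the pattern to <output_path><name>.txt.
    words = [word for word in word_bank if regex_patt.match(word)]
    with open(output_path+name+'.txt','w') as f:
        f.writelines(words)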

create_subset_from_regex('lh_qwerty_english',left_hand_qwerty)
create_subset_from_regex('rh_qwerty_english',right_hand_qwerty)
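
To compare the two subsets, one option is to count how many words each pattern accepts. The sketch below does this on a small hand-picked list; `sample_words` and the helper `count_matches` are hypothetical names used only here, and the counts say nothing about the real word bank.

```python
import re

# Same one-hand patterns as in the cell above.
left_hand = re.compile("^[qwertasdfgzxcvb]{5,}$")
right_hand = re.compile("^[yuiophjklnm]{5,}$")

# Hypothetical sample; the real notebook would pass word_bank instead.
sample_words = ["stewardesses", "minimum", "water", "hello", "union", "great", "lollipop"]

def count_matches(words, pattern):
    """Return how many words the pattern accepts."""
    return sum(1 for word in words if pattern.match(word))

print("left hand: ", count_matches(sample_words, left_hand))
print("right hand:", count_matches(sample_words, right_hand))
```

Words such as "hello" mix keys from both hands, so neither pattern accepts them.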

Our right-hand word bank has significantly fewer words than our left-hand word bank, due to the QWERTY layout. To add some variation and practice coordinating keypresses between the left and right hand, I add the rightmost column of the left hand: t, g, and b.

In [33]:
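# right hand + left index
# First pattern: only right-hand keys plus t, g, b. Second pattern: at least one of t, g, b.
rhplus_qwerty = [re.compile("^[yuiophjklnmtgb]{5,}$"),re.compile("[tgb]")]
with open(output_path+"rhplus_qwerty_english.txt","w") as f:
    # search() finds t, g, or b anywhere in the word; match() would only test the first letter.
    f.writelines([word for word in word_bank if rhplus_qwerty[0].match(word) and rhplus_qwerty[1].search(word)])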
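
The second pattern is meant to keep only words that actually use the left index finger. Whether that check looks at the whole word or only its first letter depends on the `re` method used. This is a minimal sketch of the difference, with made-up sample words and a hypothetical pattern name `needs_left_index`:

```python
import re

# Hypothetical name: a pattern for "contains t, g, or b".
needs_left_index = re.compile("[tgb]")

for word in ["thumb", "light", "onion"]:
    starts_with = bool(needs_left_index.match(word))   # only tests the start of the word
    contains = bool(needs_left_index.search(word))     # scans the whole word
    print(word, starts_with, contains)

# thumb True True
# light False True
# onion False False
```

With `match`, a word like "light" would be excluded even though it ends in g, h, t; `search` keeps it.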