# Consonant-Vowel Patterns of Short English Words

Have you ever wondered what the possible [consonant (C) and vowel (V) patterns](https://www.spellzone.com/unit08/page5.cfm) of short words in the English language are, and how frequently each pattern occurs? For four-letter words, for example, the pattern **VVVV** is clearly impossible, and intuitively we would expect patterns such as **CCVC** (e.g. *blab*, *drew*, *shut*, *chew*) and **CVCC** (e.g. *walk*) to be far more common than the pattern **VVVC** (for which *aeon* and *eaux* are the only possibilities).

The aim of this notebook is to find out the answers to the above questions for short words from three to six characters in length. We will use words from [All Scrabble Words](http://www.allscrabblewords.com) as our corpus.

In [None]:
from collections import Counter

## Mapping Words to Consonant-Vowel Equivalents

We need to read all the words from a file into a list, convert them to their consonant-vowel representations and then count the number of occurrences of each pattern. Only a, e, i, o and u are treated as vowels; every other letter, including y, counts as a consonant. The code below accomplishes this.

In [None]:
def patterns_count(filePath):
    """Return (pattern, count) pairs for the words in a file, most common first."""
    words = _word_list(filePath)
    return Counter(_patterns(words)).most_common()


def _patterns(words):
    """Map each word to its consonant-vowel pattern."""
    return [_pattern(word) for word in words]


def _pattern(word):
    """Convert a word to its consonant-vowel pattern, e.g. 'walk' -> 'CVCC'."""
    word_lower = word.lower()
    if not word_lower.isalpha():
        raise ValueError(f'Word "{word}" does not consist of only alphabetic characters.')

    # Only a, e, i, o and u count as vowels; every other letter (including y) is a consonant.
    vowels = 'aeiou'
    pattern = ''.join('V' if letter in vowels else 'C' for letter in word_lower)

    assert len(pattern) == len(word)
    return pattern


def _word_list(filePath):
    """Read one word per line from the file at filePath."""
    # Use the filePath argument (not a global variable) and close the file when done.
    with open(filePath) as f:
        return [line.rstrip('\n') for line in f]
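As a quick sanity check on the mapping, the self-contained sketch below applies the same rule (a, e, i, o and u are vowels, every other letter is a consonant) to a handful of words and counts the resulting patterns. The helper name `cv_pattern` and the sample words are illustrative assumptions, not part of the notebook's code or corpus.

```python
from collections import Counter

# Assumption: 'y' counts as a consonant, matching the rule used in this notebook.
VOWELS = set("aeiou")


def cv_pattern(word):
    """Return the consonant-vowel pattern of a word, e.g. 'walk' -> 'CVCC'."""
    return "".join("V" if letter in VOWELS else "C" for letter in word.lower())


# Hand-picked sample words, for illustration only.
sample = ["walk", "blab", "aeon", "shut", "eaux"]
sample_patterns = [cv_pattern(w) for w in sample]
counts = Counter(sample_patterns).most_common()

print(sample_patterns)
print(counts)
```

Here *blab* and *shut* both map to CCVC, *aeon* and *eaux* both map to VVVC, and *walk* maps to CVCC.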

## 3-Letter Words

Let's take a look at the 3-letter words first.

In [None]:
file = '../data/3-letter-words.txt'
patterns = patterns_count(file)
print(f'We have {len(patterns)} possible patterns from {2**3} theoretical possibilities:\n')
display(patterns)
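Raw counts are hard to compare across word lengths, because each length has a different number of words. One way to address the frequency question is to convert the counts into percentages. The sketch below assumes input in the same `(pattern, count)` form that `patterns_count` returns; the function name `relative_frequencies` and the example counts are hypothetical and are not results from the corpus.

```python
def relative_frequencies(pattern_counts):
    """Convert (pattern, count) pairs into (pattern, percentage) pairs."""
    total = sum(count for _, count in pattern_counts)
    if total == 0:
        return []
    return [(pattern, round(100 * count / total, 1)) for pattern, count in pattern_counts]


# Made-up counts, for illustration only.
example_counts = [("CVC", 6), ("CVV", 3), ("VCC", 1)]
example_frequencies = relative_frequencies(example_counts)
print(example_frequencies)
```

In the notebook, the same function could be applied to the `patterns` variable from the cell above.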

## 4-Letter Words

Now let's take a look at the 4-letter words.

In [None]:
file = '../data/4-letter-words.txt'
patterns = patterns_count(file)
print(f'We have {len(patterns)} possible patterns from {2**4} theoretical possibilities:\n')
display(patterns)

## 5-Letter Words

Now let's take a look at the 5-letter words.

In [None]:
file = '../data/5-letter-words.txt'
patterns = patterns_count(file)
print(f'We have {len(patterns)} possible patterns from {2**5} theoretical possibilities:\n')
display(patterns)

## 6-Letter Words

Finally, let's take a look at the 6-letter words.

In [None]:
file = '../data/6-letter-words.txt'
patterns = patterns_count(file)
print(f'We have {len(patterns)} possible patterns from {2**6} theoretical possibilities:\n')
display(patterns)
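Each cell above compares the number of observed patterns with the 2^n theoretical possibilities. To see which patterns never occur, one option is to enumerate every C/V string of a given length and subtract the observed ones. This is a minimal sketch: the names `all_patterns` and `missing_patterns` are hypothetical, and the observed list is made up for illustration rather than taken from the corpus.

```python
from itertools import product


def all_patterns(length):
    """Every C/V string of the given length (2**length of them)."""
    return {"".join(p) for p in product("CV", repeat=length)}


def missing_patterns(observed, length):
    """Patterns of the given length that do not appear in `observed`."""
    return sorted(all_patterns(length) - set(observed))


# Made-up observed patterns, for illustration only.
observed = ["CVC", "CVV", "VCC", "CCV"]
missing = missing_patterns(observed, 3)
print(missing)
```

With the notebook's results, the observed patterns could be taken as `[p for p, _ in patterns]`.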