Generator of bags of words: WordNet, Word2vec, seed list
===

This is a script to generate bags of words that are as comprehensive as possible while remaining as specific as possible.

To do so, we will use a number of strategies: 


> 1) expand the seed words by collecting all synonyms, hyponyms and semantically related words;

> 2) filter out words with unintended meanings.


We will use two phases of expansion and selection: 

> A. First we will use WordNet to obtain synonyms and hyponyms of the seed word, and then prune them according to meaning.

> B. Second, we will use Word2vec to: 
>>  B1. Create a semantic vector map of your custom corpus;

>>  B2. Use the semantic map to evaluate whether the semantic cloud of each word is consistent with the intended meaning. For example, is 'brawn' in our corpus closer to words related to grease or to strength?

>>  B3. Filter out clouds of words with irrelevant meanings, and add new words from the appropriate clouds if meaningful.

In [1]:
import jupyter_black

jupyter_black.load()

In [10]:
import os
import re
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import ipywidgets as widgets
from itertools import compress
import contractions
from gensim.models.word2vec import Word2Vec
from num2words import num2words
import itertools


## from NLTK import wordnet, stopwords, Lemmatizer

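# Download the NLTK data if it is not already present
# (os.path.exists returns False rather than raising, so test its result directly)
if not os.path.exists(os.path.expanduser("~/nltk_data/")):
    nltk.download("all")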

layout = widgets.Layout(width="auto")


language_wordnet = "eng"
stopword_language = "english"
number_language = "en"  # for converting numbers to words


lemma = WordNetLemmatizer()

The main free parameters are the list of seed words and the language of the final word list. Note that you can enter the seed words in English and obtain the final word list in your language of choice.

In [11]:
## to see available wordnet languages and their codes type  wn.langs()
print("wordnet languages")
print(wn.langs())

## to see available stopword languages and their codes type stopwords.fileids()
print("stopword languages")
print(stopwords.fileids())

wordnet languages
dict_keys(['eng', 'als', 'arb', 'bul', 'cmn', 'dan', 'ell', 'fin', 'fra', 'heb', 'hrv', 'isl', 'ita', 'ita_iwn', 'jpn', 'cat', 'eus', 'glg', 'spa', 'ind', 'zsm', 'nld', 'nno', 'nob', 'pol', 'por', 'ron', 'lit', 'slk', 'slv', 'swe', 'tha'])
stopword languages
['arabic', 'azerbaijani', 'basque', 'bengali', 'catalan', 'chinese', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hebrew', 'hinglish', 'hungarian', 'indonesian', 'italian', 'kazakh', 'nepali', 'norwegian', 'portuguese', 'romanian', 'russian', 'slovene', 'spanish', 'swedish', 'tajik', 'turkish']


## List of seed words 

The seed words are the main input to the script.

In [12]:
seed_words = ["finance", "money", "invest"]

### Part A. Use WordNet to obtain synonyms and hyponyms of the seed word ...

... and then prune them according to meaning.

In [13]:
def generate_word_list(seed_word, language):
    """
    input is a seed word and a language, output is a list of words, length of the list, and a list of synsets

    """
    ## we create an empty list to store the final word list
    list_of_lemmas = []
    list_of_meanings = []

    ## a function to add a word to a list
    add_to_list = lambda list1, item1: list1.append(item1)

    ## a function to return the hyponyms of a synset
    hypos = lambda s: s.hyponyms()

    ## wn.synsets obtains the synsets (meanings) for that word; the pos string "nva" restricts them to nouns, verbs and adjectives
    meanings = wn.synsets(seed_word, pos=wn.NOUN + wn.VERB + wn.ADJ)

    ## loop over set of meanings in synset
    for meaning in meanings:

        ## add synset, definition, and a list of all associated lemmas into the list_of_meanings
        list_of_meanings += [
            [
                meaning,
                meaning.definition(),
                [lemma.name() for lemma in meaning.lemmas(language)],
            ]
        ]

        ## append all synonyms (lemmas()) of that meaning to the list_of_lemmas
        [
            add_to_list(list_of_lemmas, lemma.name())
            for lemma in meaning.lemmas(language)
        ]

        ## loop over the list of all possible hyponyms
        for hyponym in meaning.closure(hypos):

            ## add synsets, definition, and a list of all associated lemmas into the list_of_meanings
            list_of_meanings += [
                [
                    hyponym,
                    hyponym.definition(),
                    [lemma.name() for lemma in hyponym.lemmas(language)],
                ]
            ]

            ## append all synonyms (lemmas()) of that hyponym to the list_of_lemmas
            [
                add_to_list(list_of_lemmas, lemma.name())
                for lemma in hyponym.lemmas(language)
            ]

    ##eliminate list duplications by applying the set transformation
    set_of_lemmas = [*set(list_of_lemmas)]

    ## sort alphabetically
    set_of_lemmas.sort()

    ##length
    length = len(set_of_lemmas)

    return (set_of_lemmas, length, list_of_meanings)
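
To make the traversal concrete without needing the WordNet data, here is a minimal sketch of what `meaning.closure(hypos)` does conceptually: it follows the hyponym relation transitively, so that hyponyms of hyponyms are also collected. The toy graph `TOY_HYPONYMS` and the helper `hyponym_closure` are hypothetical names used for illustration only; they are not part of NLTK.

```python
from collections import deque

# Hypothetical toy hyponym graph (NOT real WordNet data): word -> direct hyponyms
TOY_HYPONYMS = {
    "money": ["fund", "boodle"],
    "fund": ["mutual_fund", "pension_fund"],
    "mutual_fund": ["index_fund"],
}


def hyponym_closure(word, graph):
    """Return every direct and indirect hyponym of `word`, breadth-first, without repeats."""
    seen = []
    queue = deque(graph.get(word, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        # enqueue the hyponyms of the hyponym we just visited
        queue.extend(graph.get(current, []))
    return seen


print(hyponym_closure("money", TOY_HYPONYMS))
```

Because the closure reaches every level below the seed, a broad seed word can pull in a large number of lemmas, which is why the manual pruning step that follows matters.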

#### Loop to run the function for every seed in the list of seed words

In [14]:
list_meanings = []

##create list of meanings
for seed_word in seed_words:
    meanings = generate_word_list(seed_word, language_wordnet)[2]
    list_meanings += meanings

## sort the list (of lists) so that identical entries become adjacent
list_meanings.sort()

## groupby then keeps one entry per run of duplicates; assign the result so it is not discarded
list_meanings = [meaning for meaning, _ in itertools.groupby(list_meanings)]
print(list_meanings)

[[Synset('appropriation.n.01'), 'money set aside (as by a legislature) for a specific purpose', ['appropriation']], [Synset('arbitrage.n.01'), 'a kind of hedged investment meant to capture slight differences in price; when there is a difference in the price of something on two different markets the arbitrageur simultaneously buys at the lower price and sells at the higher price', ['arbitrage']], [Synset('back.v.05'), 'support financial backing for', ['back']], [Synset('banking.n.01'), 'engaging in the business of keeping money for savings and checking accounts or for exchange or for issuing loans and credit etc.', ['banking']], [Synset('banking.n.02'), 'transacting business with a bank; depositing or withdrawing funds or requesting a loan etc.', ['banking']], [Synset('boodle.n.01'), 'informal terms for money', ['boodle', 'bread', 'cabbage', 'clams', 'dinero', 'dough', 'gelt', 'kale', 'lettuce', 'lolly', 'lucre', 'loot', 'moolah', 'pelf', 'scratch', 'shekels', 'simoleons', 'sugar', 'wam
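
The entries cannot be deduplicated with `set()` because each one is a list (and contains a list), which is unhashable. A minimal sketch of the sort-then-`groupby` idiom on toy data follows; the entries are made-up stand-ins in which plain strings replace the Synset objects.

```python
import itertools

# Toy stand-ins for [synset, definition, lemmas]; plain strings replace Synset objects
toy_meanings = [
    ["sense_b", "a reserve of money", ["fund"]],
    ["sense_a", "a financial institution", ["bank"]],
    ["sense_b", "a reserve of money", ["fund"]],
]

# groupby only merges *adjacent* equal items, so sort first
toy_meanings.sort()

# Keep one representative per group; the result has to be assigned to be of any use
unique_meanings = [key for key, _ in itertools.groupby(toy_meanings)]

print(len(unique_meanings))
```

Note that `groupby` returns a new iterator and leaves the original list unchanged, so the deduplicated list only exists if it is stored in a variable.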

### Select the appropriate meanings

In [16]:
selection_widget = widgets.VBox(
    [
        widgets.Checkbox(
            value=True,
            description=str(item),
            disabled=False,
            indent=False,
            layout=layout,
        )
        for item in list_meanings
    ]
)

selection_widget

VBox(children=(Checkbox(value=True, description="[Synset('appropriation.n.01'), 'money set aside (as by a legi…

### Create filtered list of meanings from the above selection

In [20]:
filtered_meanings = list(
    compress(list_meanings, [widget.value for widget in selection_widget.children])
)

filtered_list = [word for lemmas in filtered_meanings for word in lemmas[2]]

# eliminate duplications and sort alphabetically
filtered_list = sorted([*set(filtered_list)])

print(filtered_list)

['Civil_List', 'ETF', 'adorn', 'appropriation', 'arbitrage', 'back', 'bank_deposit', 'banking', 'big_bucks', 'big_money', 'boodle', 'bread', 'budget', 'bull', 'bundle', 'buy_into', 'cabbage', 'clams', 'clothe', 'commit', 'consecrate', 'coronate', 'corporate_finance', 'cover', 'crown', 'demand_deposit', 'deposit', 'dinero', 'dough', 'empower', 'endow', 'endue', 'enthrone', 'exchange_traded_fund', 'finance', 'financing', 'floatation', 'flotation', 'foreign_direct_investment', 'fund', 'funding', 'gelt', 'gift', 'high_finance', 'home_banking', 'index_fund', 'induct', 'indue', 'invest', 'investing', 'investment', 'job', 'kale', 'lettuce', 'leverage', 'leveraging', 'lolly', 'loot', 'lucre', 'megabucks', 'monetary_fund', 'money', 'moolah', 'mutual_fund', 'nest_egg', 'operating_budget', 'ordain', 'order', 'ordinate', 'pelf', 'pension_fund', 'petty_cash', 'pile', 'place', 'pork', 'pork_barrel', 'put', 'refinance', 'revolving_fund', 'risk_arbitrage', 'roll_over', 'savings', 'scratch', 'seat', 's
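
For readers without a widget front end, the selection step can be sketched with a plain list of booleans standing in for the checkbox values. The entries and the `checkbox_values` list below are hypothetical; in the notebook the booleans come from `selection_widget.children`.

```python
from itertools import compress

# Hypothetical stand-ins: each entry mimics [synset, definition, lemma list]
toy_meanings = [
    ["sense_1", "commit money in order to earn a return", ["invest", "put", "commit"]],
    ["sense_2", "furnish with power or authority", ["invest", "clothe", "adorn"]],
    ["sense_3", "a reserve of money", ["fund", "monetary_fund"]],
]

# Mimics [widget.value for widget in selection_widget.children]
checkbox_values = [True, False, True]

# compress keeps the entries whose matching boolean is True
kept = list(compress(toy_meanings, checkbox_values))

# Flatten the lemma lists, drop duplicates, and sort alphabetically
words = sorted({word for entry in kept for word in entry[2]})
print(words)
```

Unticking the second entry removes 'clothe' and 'adorn' while 'invest' survives, because it also belongs to a meaning that was kept.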

### Part B

#### B1. Create a semantic vector map of your custom corpus

Word2Vec will generate a semantic vector map in which words are closer in a vector manifold if they are semantically more similar. For an introduction to Word2Vec, see the tutorial:

https://medium.com/@zafaralibagh6/a-simple-word2vec-tutorial-61e64e38a6a1

Basically, word2vec will evaluate the context in which words are used (which other words co-occur with them) and map each word in the multi-dimensional space in relation to other words. 
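
As a rough illustration of what "context" means here, the sketch below lists the (word, neighbour) pairs that fall inside a fixed window around each word. This is only the pair-extraction idea; the actual Word2Vec training (its objective, sampling, and window handling) is more involved. The helper `context_pairs` is a hypothetical name, not a gensim function.

```python
def context_pairs(sentence, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


sentence = ["bread", "cheese", "beef"]
print(context_pairs(sentence, window=1))
```

Words that keep appearing with the same neighbours end up with similar vectors, which is what the later similarity queries rely on.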

To create that vector space, we need to feed a list of sentences to word2vec. So first we need a dataset of texts from which to build the semantic vector map.

You can also download a pretrained model: see here https://radimrehurek.com/gensim/models/word2vec.html

**If you are using Colab, simply copy and paste a zip file with all the texts; if you are using Jupyter, place the zip folder in the same directory as the script.**

In [None]:
## set the zip_dir based on the name of the folder that you compressed
zip_dir = "/../data/text"

base_dir1 = os.getcwd()
root_folder = base_dir1 + zip_dir

Here is a set of processing functions to clean the input texts.

In [40]:
def clean_url(input):
    output = re.sub(r"http\S+", "", input)
    return output


def fix_contraction(input):
    output = contractions.fix(input)
    return output


def clean_non_alphanumeric(input):
    output = re.sub(r"[^a-zA-Z0-9]", " ", input)
    return output


def clean_lowercase(input):
    output = str(input).lower()
    return output


def clean_tokenization(input):
    output = nltk.word_tokenize(input)
    return output


def clean_stopwords(input):
    stop_words = set(stopwords.words(stopword_language))
    output = [item for item in input if item not in stop_words]
    return output


def numbers_to_words(input):
    output = []
    for item in input:
        if item.isnumeric():
            output += [num2words(item, lang=number_language)]
        else:
            output += [item]
    return output


def clean_lemmatization(input):
    output = [lemma.lemmatize(word=w, pos="v") for w in input]
    return output


def clean_length(input):
    output = [word for word in input if len(word) > 2]
    return output


def count_word_frequencies(input):
    word_count = nltk.FreqDist(input)
    word_count_most_common = word_count.most_common()
    output = [(count[0], count[1] / len(input)) for count in word_count_most_common]
    return output


def convert_to_string(input):
    output = " ".join(input)
    return output

## Here we select the functions to include in the preprocessing pipeline

In [55]:
def pre_processing_pipeline(input):
    w = clean_url(input)

    w = fix_contraction(w)
    w = clean_non_alphanumeric(w)
    w = clean_lowercase(w)
    w = clean_tokenization(w)
    w = numbers_to_words(w)
    w = clean_stopwords(w)
    w = clean_lemmatization(w)
    clean_list = clean_length(w)
    # word_frequencies = count_word_frequencies(clean_list)
    # filtered_string = convert_to_string(clean_list)

    # return(filtered_string,word_frequencies,clean_list)
    return clean_list
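
One possible way to make the choice of steps easier to change is to keep them in a list and apply them in order. The sketch below uses simplified, dependency-free stand-ins for the cleaning functions (`strip_urls`, `split_words`, and so on are hypothetical names), so it shows the composition pattern rather than the notebook's exact behaviour.

```python
import re
from functools import reduce


def strip_urls(text):
    return re.sub(r"http\S+", "", text)


def lowercase(text):
    return text.lower()


def keep_alphanumeric(text):
    return re.sub(r"[^a-zA-Z0-9]", " ", text)


def split_words(text):
    # naive whitespace tokenizer, standing in for nltk.word_tokenize
    return text.split()


def drop_short(words):
    return [w for w in words if len(w) > 2]


# Order matters: string-level steps first, then the tokenizer, then list-level steps
PIPELINE_STEPS = [strip_urls, lowercase, keep_alphanumeric, split_words, drop_short]


def run_pipeline(text, steps=PIPELINE_STEPS):
    """Feed the output of each step into the next one."""
    return reduce(lambda value, step: step(value), steps, text)


print(run_pipeline("See http://example.com for Money & Investing!"))
```

Removing or reordering a step then means editing one list, as long as string-level steps stay before the tokenizer and list-level steps after it.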

We now create a list of sentences, each sentence a list of words. This is the necessary input for word2vec.

In [42]:
def literary_words_list(root_folder):
    """

    Parameters
    ----------
    root_folder : a file path where .txt files are located
    e.g. 'c:\\Users\\Maria\\Dropbox\\Maria Brackin\\Finance and Text Analysis\\texts'

    Returns
    -------
    A list of sentences, each sentence a list of words


    """

    # Create list for clean sentences
    words_list = []

    # Iterate over the path, folders and subfolders looking for .txt files
    for path, subdirs, files in os.walk(root_folder):
        for file in files:
            # skip saved models, which this notebook also writes with a .txt extension
            if file.endswith(".txt") and "model" not in file:
                print(file)
                name = os.path.join(path, file)
                # the with-block closes the file once it has been read
                with open(name, encoding="utf-8") as handle:
                    file_text = handle.read()

                # Create a list of paragraphs (one per line)
                text_list_paragraphs = file_text.split("\n")

                # Iterate over paragraphs
                for paragraph in text_list_paragraphs:

                    ## remove \r at the end of some paragraphs
                    paragraph = paragraph.replace("\r", "")

                    ## for each paragraph create a list of sentences
                    list_of_sentences = nltk.sent_tokenize(paragraph)

                    ## apply the preprocessing pipeline (including word tokenization) to each sentence
                    ## and add the tokenized sentence to the final words_list
                    for sentence in list_of_sentences:
                        # print(sentence)
                        # Add the sentence to the word2vec input list
                        words_list += [pre_processing_pipeline(sentence)]

    return words_list
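
To try the folder-walking logic without the full corpus or the NLTK data, a dependency-free sketch could look like the following. It assumes a naive regular-expression sentence splitter and whitespace tokenization in place of `nltk.sent_tokenize` and the preprocessing pipeline, so its output is cruder than the notebook's; `read_sentences` is a hypothetical helper name.

```python
import re
import tempfile
from pathlib import Path


def read_sentences(root_folder):
    """Collect tokenized sentences from every .txt file under root_folder."""
    sentences = []
    for path in sorted(Path(root_folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for line in text.splitlines():
            # naive split: ., ! or ? followed by whitespace ends a sentence
            for sentence in re.split(r"(?<=[.!?])\s+", line.strip()):
                if sentence:
                    sentences.append(sentence.lower().split())
    return sentences


# Build a throwaway folder with one play and one file that should be ignored
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "play.txt").write_text("Hold thy tongue. Pray, madam!\nIndeed", encoding="utf-8")
    (Path(tmp) / "notes.md").write_text("ignored", encoding="utf-8")
    demo = read_sentences(tmp)

print(demo)
```

Punctuation is left attached to the tokens here, which is one reason the notebook runs the full cleaning pipeline on each sentence instead.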

We run the function to create a list of sentences, each with a list of words (we also apply the preprocessing pipeline to clean the texts)

In [43]:
# Create list of sentences from .txt files of literary plays

word2vec_input = literary_words_list(root_folder)

CareyGeorge Saville-1770-Analects in verse an22.txt
CentlivreSusanna-1707-The platonick lady 26.txt
KingThomas-1769-Wit's last stake A 19.txt
PhilipsWilliam-1700-St StephensGreen 213.txt
Middleton Thomas-1606-The Revenger's Tragedy.txt
ShadwellThomas-1669-The royal shepherdes284.txt
Shirley James-1632-The Ball.txt
Nabbes Thomas-1637-Hannibal and Scipio.txt
RavenscroftEdward-1673-The careless lovers 244.txt
RavenscroftEdward-1687-Titus Andronicus or234.txt
HullThomas-1767-The perplexities a 172.txt
Munday Anthony-1598-The Death of Robert Earl of Huntingdo.txt
CorneillePierre-1664-Pompey the Great a t45.txt
MacklinCharles-1783-The trueborn Irishm214.txt
Davenport Robert-1625-A New Trick to Cheat the Devil.txt
RavenscroftEdward-1697-The anatomist or T242.txt
Dekker Thomas-1599-Old Fortunatus.txt
OtwayThomas-1675-Alcibiades a tragedy210.txt
EtheregeGeorge-1668-She wou'd if she cou98.txt
WaldronF G Francis Godolphin-1797-The virgin queen a 295.txt
LacyJohn-1684-Sr Hercules Buffoon158.txt
Cr

The necessary input for word2vec is a list of sentences, each a list of words


In [44]:
word2vec_input[:100]

[['room', 'brainly', 'house'],
 ['yes',
  'truly',
  'sir',
  'month',
  'past',
  'four',
  'five',
  'clock',
  'morning',
  'could',
  'think',
  'till',
  'day',
  'green',
  'pasture',
  'saw',
  'celon',
  'bill',
  'coo',
  'like',
  'young',
  'pidgeons'],
 ['indeed'],
 ['odds',
  'life',
  'remember',
  'coax',
  'young',
  'pug',
  'get',
  'posies',
  'suppose'],
 ['sixteen'],
 ['come', 'hold', 'nose', 'morning'],
 ['dear', 'papa', 'jade'],
 ['think', 'best', 'tell', 'worship', 'least', 'harm', 'might', 'come'],
 ['mighty', 'industrous', 'soul', 'indeed'],
 ['thou',
  'better',
  'mind',
  'thy',
  'business',
  'think',
  'strive',
  'like',
  'ill',
  'natur',
  'fool',
  'set',
  'child',
  'father',
  'variance'],
 ['hold',
  'tongue',
  'pray',
  'madam',
  'let',
  'speak',
  'see',
  'follies',
  'though',
  'honest',
  'tell'],
 ['sure', 'brainly', 'may', 'speak', 'turn'],
 ['turn', 'yet', 'madam'],
 ['sorry',
  'dear',
  'see',
  'angry',
  'think',
  'approve',
  '

Now we build the semantic vector space by passing the sentence corpus to Word2Vec.

In [45]:
## Here we build the vector space with Word2Vec

SentenceCorpus = word2vec_input
word2vec_output = Word2Vec(SentenceCorpus, min_count=1)

In [46]:
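## Save the vector space and load it back
## (gensim writes its own binary format here, so the .txt extension is only a name)

word2vec_output.save("w2v_model.txt")
model = Word2Vec.load("w2v_model.txt")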

In [47]:
## Note, you can also import pretrained models, using the following syntax:

"""
import gensim.downloader
# Show all available models in gensim-data
print(list(gensim.downloader.info()['models'].keys()))
['fasttext-wiki-news-subwords-300',
 'conceptnet-numberbatch-17-06-300',
 'word2vec-ruscorpora-300',
 'word2vec-google-news-300',
 'glove-wiki-gigaword-50',
 'glove-wiki-gigaword-100',
 'glove-wiki-gigaword-200',
 'glove-wiki-gigaword-300',
 'glove-twitter-25',
 'glove-twitter-50',
 'glove-twitter-100',
 'glove-twitter-200',
 '__testing_word2vec-matrix-synopsis']

# Download the "glove-twitter-25" embeddings - this will take a while, and longer for larger corpora - e.g. google-news
# Note: the downloader returns KeyedVectors, so query it with model.most_similar(...) rather than model.wv.most_similar(...)
model = gensim.downloader.load('glove-twitter-25')
"""

#### B2. Use the semantic map to evaluate whether the semantic cloud of each word is consistent with the intended meaning.

For example, is 'brawn' in our corpus closer to words related to grease or to strength?

In [48]:
print(filtered_list)

['Civil_List', 'ETF', 'adorn', 'appropriation', 'arbitrage', 'back', 'bank_deposit', 'banking', 'big_bucks', 'big_money', 'boodle', 'bread', 'budget', 'bull', 'bundle', 'buy_into', 'cabbage', 'clams', 'clothe', 'commit', 'consecrate', 'coronate', 'corporate_finance', 'cover', 'crown', 'demand_deposit', 'deposit', 'dinero', 'dough', 'empower', 'endow', 'endue', 'enthrone', 'exchange_traded_fund', 'finance', 'financing', 'floatation', 'flotation', 'foreign_direct_investment', 'fund', 'funding', 'gelt', 'gift', 'high_finance', 'home_banking', 'index_fund', 'induct', 'indue', 'invest', 'investing', 'investment', 'job', 'kale', 'lettuce', 'leverage', 'leveraging', 'lolly', 'loot', 'lucre', 'megabucks', 'monetary_fund', 'money', 'moolah', 'mutual_fund', 'nest_egg', 'operating_budget', 'ordain', 'order', 'ordinate', 'pelf', 'pension_fund', 'petty_cash', 'pile', 'place', 'pork', 'pork_barrel', 'put', 'refinance', 'revolving_fund', 'risk_arbitrage', 'roll_over', 'savings', 'scratch', 'seat', 's

In [49]:
def get_word2vec_list(word_list, model):

    """
    Use word2vec to look up the 10 words most semantically similar to each word in word_list
    """

    list_of_word2vec_lists = []
    for word in word_list:
        try:

            ## here is the crucial line - we are using the model that we trained to get the most similar words within our corpus
            list_vects = model.wv.most_similar([word], topn=10)

            new_list = []
            new_list += [word]
            for item in list_vects:
                word1 = item[0]
                new_list += [word1]

            # print(new_list)
            # print('\n')
            list_of_word2vec_lists += [new_list]

        except KeyError:
            continue
    return list_of_word2vec_lists
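
Words that are missing from the model raise a `KeyError` and are skipped silently by the function above. Since the corpus tokens are lowercased single words, multiword or capitalized WordNet lemmas such as 'bank_deposit' or 'ETF' are unlikely to match. A small sketch for listing what gets dropped is shown below; `split_by_vocabulary` and the toy vocabulary are hypothetical. With a gensim 4.x model, `model.wv.key_to_index` could probably play the role of `vocabulary`, but that is an assumption about the installed version.

```python
def split_by_vocabulary(words, vocabulary):
    """Separate the words found in the vocabulary from those that are not."""
    found = [w for w in words if w in vocabulary]
    missing = [w for w in words if w not in vocabulary]
    return found, missing


# Toy vocabulary standing in for the trained model's word index
toy_vocabulary = {"bread", "budget", "money", "bull"}
candidates = ["Civil_List", "ETF", "bread", "bank_deposit", "budget"]

found, missing = split_by_vocabulary(candidates, toy_vocabulary)
print(found)
print(missing)
```

Reviewing the `missing` list makes it explicit which Part A words never reach the semantic check.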

In [50]:
# create lists of ecologically valid words
check_vectorSpace = get_word2vec_list(filtered_list, model)

## we print the cloud of the 10 words most similar to each of the lemmas in our filtered list
for item in check_vectorSpace:
    print(item)

['adorn', 'wreath', 'deck', 'wreaths', 'glitter', 'ornament', 'grac', 'wreathe', 'garland', 'laurel', 'display']
['appropriation', 'badie', 'lievtenants', 'alchammon', 'brittaine', 'transmuter', 'alcides', 'epocha', 'handkerchers', 'mussulmen', 'huju']
['back', 'along', 'home', 'away', 'thither', 'holburn', 'coach', 'stairs', 'panadarus', 'thence', 'boathire']
['boodle', 'cofe', 'rawness', 'supp', 'september', 'debeat', 'superb', 'repast', 'gowell', 'coverture', 'felleo']
['bread', 'cheese', 'beef', 'victual', 'butter', 'meat', 'scrap', 'porridge', 'mutton', 'meal', 'ale']
['budget', 'westphalia', 'thrash', 'pies', 'snip', 'kett', 'cole', 'jug', 'ladle', 'magog', 'snush']
['bull', 'whelp', 'hog', 'frog', 'goose', 'elephant', 'crow', 'cat', 'ram', 'hare', 'rabbit']
['bundle', 'thimble', 'bushel', 'chests', 'poke', 'tin', 'slop', 'budget', 'rix', 'leathern', 'buckram']
['cabbage', 'prune', 'whey', 'muskadine', 'barley', 'veal', 'bunch', 'puddings', 'chickens', 'potato', 'curd']
['clothe'
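
The ranking behind each cloud is based on cosine similarity between word vectors. The sketch below reproduces that idea on made-up two-dimensional vectors; the numbers are invented for illustration and the helper `most_similar` is a plain-Python stand-in, not the gensim method.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`, highest first."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Invented 2-d vectors: one "food" direction and one "animal" direction
toy_vectors = {
    "bread": [1.0, 0.1],
    "cheese": [0.9, 0.2],
    "bull": [0.1, 1.0],
    "hog": [0.2, 0.9],
}

neighbours = [w for w, _ in most_similar("bread", toy_vectors)]
print(neighbours)
```

In this toy space 'bread' sits next to 'cheese', just as in the corpus above 'bread' and 'cabbage' come out as foods rather than as slang for money.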

#### B3. Filter out clouds of words with irrelevant meanings, and add new words from the appropriate clouds if meaningful.

We can now use the same widget to select the relevant semantic clouds, i.e. the ones that reflect the meanings we are interested in. This selection is somewhat subjective: I would say that a cloud is appropriate if either the first 5 words, or more than 10 overall, have semantically appropriate meanings.
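
If a list of already approved words is available, the rule of thumb can be turned into a suggestion for which checkboxes to pre-tick. This is only a sketch under that assumption: `cloud_looks_relevant`, the `approved` set, and the first toy cloud are hypothetical, and the thresholds are parameters to adjust (with 10 neighbours per cloud, the `min_total` threshold would need to be lowered to ever apply).

```python
def cloud_looks_relevant(cloud, approved, head=5, min_total=10):
    """True if the first `head` neighbours are all approved, or more than `min_total` are."""
    neighbours = cloud[1:]  # cloud[0] is the seed word itself
    head_ok = len(neighbours) >= head and all(w in approved for w in neighbours[:head])
    total_ok = sum(w in approved for w in neighbours) > min_total
    return head_ok or total_ok


# Hypothetical set of words already judged to carry the intended meaning
approved = {"cash", "coin", "gold", "purse", "debt", "loan"}

clouds = [
    ["money", "cash", "coin", "gold", "purse", "debt", "ale"],  # invented cloud
    ["bull", "whelp", "hog", "frog", "goose", "elephant", "crow"],
]

suggested = [cloud_looks_relevant(cloud, approved) for cloud in clouds]
print(suggested)
```

The resulting booleans could be passed as the `value` of each checkbox, leaving the final decision to the manual review.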

In [51]:
import ipywidgets as widgets
from itertools import compress

layout = widgets.Layout(width="auto")

selection_widget2 = widgets.VBox(
    [
        widgets.Checkbox(
            value=True,
            description=str(item),
            disabled=False,
            indent=False,
            layout=layout,
        )
        for item in check_vectorSpace
    ]
)

selection_widget2

VBox(children=(Checkbox(value=True, description="['adorn', 'wreath', 'deck', 'wreaths', 'glitter', 'ornament',…

In [52]:
## Here is the filtered list of semantic clouds

filtered_word2vec = list(
    compress(check_vectorSpace, [widget.value for widget in selection_widget2.children])
)

## now we extract just the words
flat_list = [word for lists in filtered_word2vec for word in lists]

## eliminate duplications, and sort alphabetically
flat_list = [*set(flat_list)]
flat_list.sort()
print(flat_list)

['35dozen', '41he', 'absint', 'abstract', 'acceptance', 'accessary', 'accomplishments', 'accompts', 'accuse', 'ach', 'acquit', 'addition', 'adorn', 'adorne', 'adultery', 'alchammon', 'alcides', 'ale', 'allot', 'along', 'altars', 'ammonian', 'apartments', 'apparel', 'appeas', 'appertain', 'apples', 'appointment', 'appropriation', 'aqua', 'attribute', 'aubrey', 'away', 'bacchi', 'back', 'badie', 'bak', 'bargain', 'barley', 'barony', 'barren', 'bedeck', 'beef', 'bench', 'benevolence', 'bithynia', 'bloodpropped', 'boathire', 'boodle', 'boughs', 'bounteous', 'bounty', 'bouse', 'brandy', 'bread', 'brittaine', 'buckram', 'budget', 'bull', 'bunch', 'bundle', 'bushel', 'butter', 'cabbage', 'cake', 'calculation', 'calf', 'camply', 'canary', 'capons', 'carnations', 'carri', 'carrot', 'carry', 'cash', 'cat', 'chaplet', 'chaplets', 'chaque', 'chaseth', 'chaw', 'cheese', 'chests', 'chew', 'chickens', 'christning', 'circumscribe', 'clap', 'claw', 'cloaths', 'cloth', 'clothe', 'clout', 'coach', 'cofe'

#### Now we do a final check to eliminate words that are way off

In [53]:
selection_widget3 = widgets.VBox(
    [
        widgets.Checkbox(
            value=True,
            description=str(item),
            disabled=False,
            indent=False,
            layout=layout,
        )
        for item in flat_list
    ]
)

selection_widget3

VBox(children=(Checkbox(value=True, description='35dozen', indent=False, layout=Layout(width='auto')), Checkbo…

In [54]:
final_list = list(
    compress(flat_list, [widget.value for widget in selection_widget3.children])
)

print(final_list)

['35dozen', '41he', 'absint', 'abstract', 'acceptance', 'accessary', 'accomplishments', 'accompts', 'accuse', 'ach', 'acquit', 'addition', 'adorn', 'adorne', 'adultery', 'alchammon', 'alcides', 'ale', 'allot', 'along', 'altars', 'ammonian', 'apartments', 'apparel', 'appeas', 'appertain', 'apples', 'appointment', 'appropriation', 'aqua', 'attribute', 'aubrey', 'away', 'bacchi', 'back', 'badie', 'bak', 'bargain', 'barley', 'barony', 'barren', 'bedeck', 'beef', 'bench', 'benevolence', 'bithynia', 'bloodpropped', 'boathire', 'boodle', 'boughs', 'bounteous', 'bounty', 'bouse', 'brandy', 'bread', 'brittaine', 'buckram', 'budget', 'bull', 'bunch', 'bundle', 'bushel', 'butter', 'cabbage', 'cake', 'calculation', 'calf', 'camply', 'canary', 'capons', 'carnations', 'carri', 'carrot', 'carry', 'cash', 'cat', 'chaplet', 'chaplets', 'chaque', 'chaseth', 'chaw', 'cheese', 'chests', 'chew', 'chickens', 'christning', 'circumscribe', 'clap', 'claw', 'cloaths', 'cloth', 'clothe', 'clout', 'coach', 'cofe'