I made a thing! One that others might like as well: a library of word lists. Use them for word games. Use them to quickly prototype new ideas or generate random words / concepts for inspiration. Use them for a spelling contest. Use them however you like!
You can see it in action in a game like That's Amorphe. On the website, you can generate a PDF with word cards, which you can print and use for playing. Those words are, obviously, drawn directly from these word lists.
It also powers my Dictionary Tool. It's useful when you play one of my word games (like Keebble) to check if something is a valid word. You know, the thing that starts all the lively discussions in Scrabble-like games :p
Current word count: ~11,208
- English words
- Separated by word type (nouns, verbs, ...)
- Then by complexity (core, easy, medium, hard, hardcore)
- And then by category (animals, places, occupations, ...)
- Inside .txt files
This is not an exhaustive list, but it's big. The library will grow as I make more of these games and refine the lists.
I split geography and proper names into their own word types. Some players don't like having real-life names in games like these and many projects can't use them. Including them with the nouns has more cons than pros. (They are also most likely to be unknown or outdated.)
Similarly, pronouns and prepositions are their own tiny category. They technically belong to nouns and adverbs, but are pretty useless for most use cases, so I didn't want to include them there by default. (I might change my mind on this once the list is completely done.)
The adverbs category is rather small, with only the most common or irregular ones. Almost all nouns, adjectives and verbs can be turned into an adverb, so I wasn't sure how useful it was to add more.
The verbs category is considered done (~1000 of the most common ones), but it could see updates, because there are so many darn verbs and I'm unsure about the categorization I used.
You can use this any way you like. But it was mostly meant to be used in some digital system (such as a website), so I'll explain more about that below.
You can deliver the words as separate .txt files (loaded individually), or you can use the lib-pqWords.json file, which already has all the data in it.
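To give a feel for the .txt route, here's a minimal sketch that builds one path per (type, level, category) combination and parses a file's contents. It assumes a type/level/category.txt folder layout and one word per line; both are assumptions for illustration, and buildTxtPaths / parseWordFile are hypothetical names, not part of PQ_WORDS.

```typescript
// Hypothetical helpers. ASSUMPTIONS: folders are laid out as
// type/level/category.txt, and each file holds one word per line.

// Build one path per (type, level, category) combination.
function buildTxtPaths(
    basePath: string,
    types: string[],
    levels: string[],
    categories: string[]
): string[] {
    const paths: string[] = [];
    for (const type of types) {
        for (const level of levels) {
            for (const category of categories) {
                paths.push(`${basePath}/${type}/${level}/${category}.txt`);
            }
        }
    }
    return paths;
}

// Turn the raw contents of one .txt file into a clean list of words.
function parseWordFile(contents: string): string[] {
    return contents
        .split(/\r?\n/)                       // one word per line (assumed)
        .map((line) => line.trim())           // drop stray whitespace
        .filter((line) => line.length > 0);   // skip blank lines
}

// Two levels of one type and one category already means two requests.
const paths = buildTxtPaths("words", ["nouns"], ["easy", "medium"], ["animals"]);
console.log(paths.length);
```

Every extra type, level, or category multiplies the number of separate requests, which is exactly the cost the single lib-pqWords.json file avoids.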
I've included a small JavaScript file to collect and query them. This "library" is called PQ_WORDS ( = Pandaqi Words).
- Host the words on your server, plus the scripts. (They are written in TypeScript and meant to be included as ES6 modules.)
- First you load the words. (This takes time. It's an async request, wait for it.)
- Now you can query this list however you like.
Here's an example.
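```typescript
// Assumes PQ_WORDS has already been imported from the library's ES6 module
// (the import path depends on where you host the scripts).
async function getXRandomWords(num: number)
{
    // Load everything: all types, levels and categories
    const params = { useAll: true };
    await PQ_WORDS.loadWithParams(params);

    // Draw `num` random words; each entry is a WordData object
    const wordList = PQ_WORDS.getRandomMultiple(num);
    for(const wordData of wordList)
    {
        console.log(wordData.getWord());
    }
}

getXRandomWords(10);
```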
The most important functions are (all arguments optional) ...
- loadWithParams(params): loads (and caches/combines) text files based on parameters you set
  - method: either txt or json (default); determines where it loads the data from
  - path: a custom path to your words folder
  - useAll (boolean): if true, loads everything it has
  - types (string array; nouns): which types you want to load
  - levels (string array; easy): which complexity levels you want to load
  - categories (string array): which categories you want to load
  - useAllSubcat (boolean): if true, auto-includes the subcategories of a general category
  - useAllCategories (boolean): if true, uses all known categories
  - useAllLevelsBelow (boolean): if true, auto-includes lower difficulty levels up to the one you chose
  - typeExceptions (string array): which types to exclude
  - categoryExceptions (string array): which categories (or subcategories) to exclude
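To make the parameters above more concrete, here's a hedged sketch: a possible TypeScript typing of the options, plus one way useAllLevelsBelow could expand a requested level. The field names come from the list; the types, the ["easy"] fallback, and the level order (core, easy, medium, hard, hardcore, taken from the folder description at the top) are assumptions. LoadParams and resolveLevels are hypothetical names, not the library's code.

```typescript
// Hypothetical typing of the documented loadWithParams options.
// Field names come from the docs; the types are assumptions.
interface LoadParams {
    method?: "txt" | "json";
    path?: string;
    useAll?: boolean;
    types?: string[];
    levels?: string[];
    categories?: string[];
    useAllSubcat?: boolean;
    useAllCategories?: boolean;
    useAllLevelsBelow?: boolean;
    typeExceptions?: string[];
    categoryExceptions?: string[];
}

// ASSUMED ordering, from lowest to highest complexity.
const LEVEL_ORDER = ["core", "easy", "medium", "hard", "hardcore"];

// Sketch of useAllLevelsBelow: include every level up to (and including)
// the highest level that was requested.
function resolveLevels(params: LoadParams): string[] {
    const requested = params.levels ?? ["easy"]; // assumed fallback
    if (!params.useAllLevelsBelow) { return requested; }

    // Ignore level names we don't recognize.
    const indices = requested
        .map((level) => LEVEL_ORDER.indexOf(level))
        .filter((i) => i >= 0);
    if (indices.length === 0) { return requested; }

    const highest = Math.max(...indices);
    return LEVEL_ORDER.slice(0, highest + 1);
}

const params: LoadParams = {
    types: ["nouns"],
    levels: ["medium"],
    categories: ["animals"],
    useAllLevelsBelow: true,
};
console.log(resolveLevels(params)); // returns ["core", "easy", "medium"]
```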
- getRandom(): gets a random word
- getRandomMultiple(number): gets a list of that many random words
- getAll(): gets the entire list
- findWord(word, fuzziness, maxMatches): finds a word
  - Fuzziness means how many characters you may be off. If there's no exact match, the search "fails", but it still provides near-matches.
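For getRandomMultiple, a common way to draw several random items without picking the same one twice is a partial Fisher-Yates shuffle. The sketch below shows that technique on a plain array; whether PQ_WORDS actually avoids repeats is an assumption, and getRandomMultipleSketch / getRandomSketch are hypothetical names.

```typescript
// Hypothetical sketch: draw `count` distinct items using a partial
// Fisher-Yates shuffle. ASSUMPTION: no word should appear twice.
function getRandomMultipleSketch<T>(items: T[], count: number, rng: () => number = Math.random): T[] {
    const pool = items.slice();              // copy, so the input list is untouched
    const n = Math.min(count, pool.length);  // can't draw more than we have
    for (let i = 0; i < n; i++) {
        // Pick a random index in the not-yet-chosen part [i, pool.length).
        const j = i + Math.floor(rng() * (pool.length - i));
        [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    return pool.slice(0, n);
}

// A single random pick is just a draw of one.
function getRandomSketch<T>(items: T[], rng: () => number = Math.random): T | undefined {
    return getRandomMultipleSketch(items, 1, rng)[0];
}

const animals = ["cat", "dog", "horse", "owl", "pig"];
console.log(getRandomMultipleSketch(animals, 3));
```

Only the first n positions are shuffled, so drawing a few words from a large list stays cheap.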
Word searching builds an index first to make this fast. It uses simple Levenshtein Distance and is quite rudimentary. (If you want seriously fast lookups on huge word lists, this won't do. But in this case it's more than fast enough.)
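As an illustration of the kind of search described above, here's a minimal sketch combining the standard dynamic-programming Levenshtein distance with a simple index that buckets words by length. The length-bucket index is a technique I chose for this sketch (two words whose lengths differ by more than the fuzziness can never be within that distance), not necessarily the index PQ_WORDS builds; FuzzyIndex is a hypothetical name.

```typescript
// Standard DP Levenshtein distance: insert, delete and substitute cost 1.
function levenshtein(a: string, b: string): number {
    // prev[j] = distance between the first i-1 chars of a and the first j chars of b
    let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr: number[] = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(
                prev[j] + 1,        // deletion
                curr[j - 1] + 1,    // insertion
                prev[j - 1] + cost  // substitution (or match)
            );
        }
        prev = curr;
    }
    return prev[b.length];
}

// Hypothetical index: words bucketed by length, so only buckets within
// `fuzziness` of the query length need to be compared.
class FuzzyIndex {
    private byLength = new Map<number, string[]>();

    constructor(words: string[]) {
        for (const w of words) {
            const bucket = this.byLength.get(w.length) ?? [];
            bucket.push(w);
            this.byLength.set(w.length, bucket);
        }
    }

    // Exact match if there is one; otherwise up to maxMatches near-matches,
    // closest first.
    find(word: string, fuzziness = 1, maxMatches = 5): { exact: boolean; matches: string[] } {
        const candidates: { w: string; d: number }[] = [];
        for (let len = word.length - fuzziness; len <= word.length + fuzziness; len++) {
            for (const w of this.byLength.get(len) ?? []) {
                const d = levenshtein(word, w);
                if (d === 0) { return { exact: true, matches: [w] }; }
                if (d <= fuzziness) { candidates.push({ w, d }); }
            }
        }
        candidates.sort((x, y) => x.d - y.d);
        return { exact: false, matches: candidates.slice(0, maxMatches).map((c) => c.w) };
    }
}

const index = new FuzzyIndex(["cat", "cart", "car", "dog", "scat"]);
console.log(index.find("cax")); // no exact match, but near-matches "cat" and "car"
```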
Words are returned as WordData objects, with this interface ...
- getWord(): the actual word
- getMetadata(): returns an object with these metadata properties
  - type: word type
  - level: word complexity level
  - cat: word category
  - subcat: word subcategory (like animals > pets; "general" otherwise)
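Here's a hedged sketch of that interface, plus a small filter showing how the metadata can be used. The WordMetadata fields mirror the list above; WordDataSketch and filterByCategory are hypothetical stand-ins, and the real class may store its data differently.

```typescript
// Hypothetical shape of a WordData entry, based on the interface above.
interface WordMetadata {
    type: string;   // e.g. "nouns"
    level: string;  // e.g. "easy"
    cat: string;    // e.g. "animals"
    subcat: string; // e.g. "pets", or "general"
}

class WordDataSketch {
    private readonly word: string;
    private readonly metadata: WordMetadata;

    constructor(word: string, metadata: WordMetadata) {
        this.word = word;
        this.metadata = metadata;
    }

    getWord(): string { return this.word; }

    // Return a copy so callers can't mutate the stored metadata.
    getMetadata(): WordMetadata { return { ...this.metadata }; }
}

// Keep only words from one category (and, optionally, one subcategory).
function filterByCategory(list: WordDataSketch[], cat: string, subcat?: string): string[] {
    return list
        .filter((w) => {
            const meta = w.getMetadata();
            return meta.cat === cat && (subcat === undefined || meta.subcat === subcat);
        })
        .map((w) => w.getWord());
}

const sample = [
    new WordDataSketch("dog", { type: "nouns", level: "easy", cat: "animals", subcat: "pets" }),
    new WordDataSketch("lion", { type: "nouns", level: "easy", cat: "animals", subcat: "general" }),
    new WordDataSketch("baker", { type: "nouns", level: "easy", cat: "occupations", subcat: "general" }),
];
console.log(filterByCategory(sample, "animals", "pets")); // returns ["dog"]
```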
To create the single JSON file, I used three simple steps. Surely it can be automated further, but I saw no need.
- Call getAllAsJSON() on PQ_WORDS. This prints the full word library in your console.
- Right-click it and choose "copy object".
- Paste it inside a JSON file, optionally minify, and you're good to go!
Make sure you right-click the top-level object. Otherwise it'll store whatever sub-object you clicked.
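If you did want to automate that export, one possible sketch is to merge already-parsed word files into a single nested object and serialize it with JSON.stringify, pretty-printed or minified. The type > level > category nesting and the WordFile input shape are assumptions, not the actual layout of lib-pqWords.json; buildLibrary and toJSON are hypothetical names. Reading the files and writing the result to disk are left out.

```typescript
// Hypothetical input: one entry per .txt file, already parsed into words.
interface WordFile {
    type: string;
    level: string;
    cat: string;
    words: string[];
}

// ASSUMED nesting: type -> level -> category -> words.
type WordLibrary = Record<string, Record<string, Record<string, string[]>>>;

function buildLibrary(files: WordFile[]): WordLibrary {
    const lib: WordLibrary = {};
    for (const f of files) {
        lib[f.type] ??= {};
        lib[f.type][f.level] ??= {};
        // Merge, in case two files map to the same slot.
        const existing = lib[f.type][f.level][f.cat] ?? [];
        lib[f.type][f.level][f.cat] = existing.concat(f.words);
    }
    return lib;
}

// Pretty-printed for reading, or minified for serving.
function toJSON(lib: WordLibrary, minify: boolean): string {
    return minify ? JSON.stringify(lib) : JSON.stringify(lib, null, 2);
}

const lib = buildLibrary([
    { type: "nouns", level: "easy", cat: "animals", words: ["cat", "dog"] },
    { type: "nouns", level: "easy", cat: "animals", words: ["owl"] },
    { type: "verbs", level: "medium", cat: "general", words: ["run"] },
]);
console.log(toJSON(lib, true));
```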
Plain .txt files are very easy to open, edit, parse, search-and-replace, etcetera. On any system, with any software, without any delay. There's no overhead.
But requesting ten, twenty, fifty individual .txt files each time isn't great for (server) performance. That's why I also included an easy way to output/request everything as a single .json file.
The alternative was to create a database for this. But the pros didn't outweigh the cons. In general, I try to stay lean and only use a basic folder-file structure for systems. Others might scoff at my badly written JavaScript, but then again, I created this whole system in a week and it works wonders.
I needed word lists for a number of (board) games. These games required players to draw or play with (random) words on their turn. For most of them, the words needed to be very easy / common or within a specific category.
These games needed to be playable by children or people who didn't know that much English. These games are also "party" games, which certainly don't benefit from very complex words. (Nothing kills the fun like having to play with ten words nobody has ever seen.)
I scoured the internet but found nothing suitable. Other word lists were ...
- Not in any usable format
- Too long, filled with words nobody actually knows or uses
- Not separated by type or category
- Inconsistent. They often had duplicates or inconsistent spelling / input
- (Simply ... incorrect? They invented words?)
So I spent a week, a few hours every evening, creating this library.
I saved the word lists I liked the best. I copied all the words, then started classifying them myself.
- What type of word is it?
- Do I deem it an easy, medium or hard word?
- Do I recognize a common theme among these words that I can turn into a category?
Remember that I lean towards the easy side. A word is put into the "hard" folder very quickly, if I think there's any chance people won't know what it means. (Or its meaning is too vague / abstract to use in a practical context, like a word association game.)
Words, therefore, come from about 10 different online sources. I also asked others to provide more, but their lists were usually too "specific", so not much of that was used. (They'd give me slang, brand names, inside jokes, or English words that Dutch people use but that don't mean what they think they mean.)
Then I realized I was being stupid. I have several word (board) games at home! I can just look at their cards, translate to English, and use that. This was a huge revelation for me and greatly increased the quality of these word lists.
Once done, I went through all categories again and thought "Can I add something? Is there something I'm missing?"
After that, I had thousands of words and a brain that hurt from this project, so I called it quits.
Have a request or suggestion? You can always send me a message (email, issue, whatever) and I'll see what I can do. But this is not an "active project": it grows and changes "behind the scenes" as I use it for more projects. So responses might be slow.
For your own purposes, of course, do whatever you want with this.
I've included a simple command-line tool (written in Rust) to easily update or modify the files, once downloaded to your local system.
- Place the .exe in the root of the words project
- Open a command line and move to that root
Now you can type pqwordshelper, followed by a command.
- -c readfile => to read the content of a specific file
- -c addword => to add a new word
- -c createfile => to create a new file
- -c removeduplicates => to both sort and remove unnecessary whitespace/duplicates
- -c printvaluelist => prints all possible categories in a list. (Similarly, printvaluestring prints them as a single string.) I copy the result of this to my JavaScript file to ensure it knows about all possible categories.
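For readers curious what a "removeduplicates" pass involves, here's a hedged sketch of the same clean-up in TypeScript: trim whitespace, drop empty lines, remove duplicates, and sort. It mirrors the described behaviour of the command, but it is not the Rust tool's code; cleanWordFile is a hypothetical name, and one word per line is assumed.

```typescript
// Sketch of a word-file clean-up pass. ASSUMPTION: one word per line.
// This mirrors the described removeduplicates behaviour; it is NOT the
// Rust tool's implementation.
function cleanWordFile(contents: string): string {
    const words = contents
        .split(/\r?\n/)
        .map((line) => line.trim())          // remove unnecessary whitespace
        .filter((line) => line.length > 0);  // drop empty lines

    // A Set keeps only the first occurrence of each word (case-sensitive).
    const unique = Array.from(new Set(words));

    // Alphabetical order for plain English words.
    unique.sort((a, b) => a.localeCompare(b));
    return unique.length > 0 ? unique.join("\n") + "\n" : "";
}

console.log(cleanWordFile("  dog\ncat\n\ndog \nbear\n"));
```

Note that "Cat" and "cat" would both survive this pass; whether the real tool normalizes case isn't stated.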
After typing the line, it asks you for the necessary info (one at a time): which category, what filename, etcetera.
If you want to add many words with the same metadata, you can fix those at the start. For example,
pqwordshelper -c addword -type nouns -category animals
Now it will only ask you for the level (easy, medium, ...) and the word each time.
It always creates a backup beforehand in a _backup folder. On top of that, I've added checks against corruption or accidental deletion. But still, use with caution.
(Feedback? Tell me. It's my first command line tool, and in Rust no less.)