# Domesday
## An interactive notebook to explore the PASE Domesday Database

The [Prosopography of Anglo-Saxon England (PASE)](http://pase.ac.uk/) aims to provide information on every recorded inhabitant of Anglo-Saxon England. It offers a [curated database](http://domesday.pase.ac.uk/) representing landholders recorded in Domesday Book.

We can use data from the PASE Domesday database, Pandas, and Jupyter Notebook to explore the structure of pre- and post-Conquest England.

## Setup

Download a CSV copy of the Domesday database.

In [None]:
%%bash
wget --no-clobber --output-document=domesday.csv http://domesday.pase.ac.uk/Domesday?op=7

Import Pandas and friends.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Define helper functions.

In [None]:
def display_side_by_side(*objs, **kwargs):
    """
    Display any number of Pandas objects side-by-side.

    https://stackoverflow.com/a/47093289
    """
    from pandas.io.formats.printing import adjoin
    space = kwargs.get('space', 8)
    reprs = [repr(obj).split('\n') for obj in objs]
    print(adjoin(space, *reprs))

Clean the data and load it into an SQLite3 database.

In [None]:
import domesday

db = domesday.Database('domesday.db')
db.load_csv('domesday.csv')

## Exploration

Let's examine the structure of the database.

### Fields

`name`

    Person's given name

`gender`

    Person's gender

    Note: the PASE database uses "Male" and "Female" instead of more specific gender terms.

`pase_name`

    Person's unique PASE identifier

`holder_1066`

    Total value of estates held by this person in 1066

`lord_1066`

    Total value of estates held by people connected to this person in 1066

`demesne_1086`

    Total value of estates held in demesne by this person in 1086

`subtenanted_1086`

    Total value of estates let to subtenants in 1086

`subtenant_1086`

    Total value of estates held as a subtenant in 1086

`editor`

    PASE author responsible for the record

`editorial_status`

    Indicates how complete the person's profile is in the PASE Domesday database.

    Refer to the historical research methods for more information:

        <http://domesday.pase.ac.uk/pde/about.jsp#historical-research>

In [None]:
df = db.to_dataframe()
df.head()

The `holder_1066`, `lord_1066`, `demesne_1086`, `subtenanted_1086`, and `subtenant_1086` fields record the total taxable value of land in [hides][].
There is no clear relationship between a hide and the size of a property.
At the time of Domesday, one hide corresponded to &pound;1 of income per annum.

The economy of medieval Britain was vastly different from our own.
There is no direct conversion rate for 1086 pounds to modern pounds.
For our purposes, let's assume that &pound;1 (1086) = &pound;7200 (2018)<sup>1</sup>.

[hides]: https://en.wikipedia.org/wiki/Hide_(unit)
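As a rough sketch of that conversion, the helper below multiplies a 1086 value by the assumed rate of 7200. The name `to_2018_pounds` is our own, not part of the original notebook. Because it only multiplies, it should work on a single number or on a whole pandas column such as `df.total_1086`.

```python
# Assumed rate from the text above: 1 pound (1086) ~ 7200 pounds (2018).
POUNDS_2018_PER_POUND_1086 = 7200


def to_2018_pounds(value_1086):
    """
    Convert a 1086 value in pounds to an approximate 2018 value.

    Hypothetical helper: multiplication broadcasts, so this accepts a
    plain number, a NumPy array, or a pandas Series.
    """
    return value_1086 * POUNDS_2018_PER_POUND_1086


# Estates worth 2.5 pounds in 1086 come to roughly 18,000 pounds in 2018
# under this assumption.
estate_2018 = to_2018_pounds(2.5)
```

The single constant keeps the assumption in one place, so a different inflation estimate only needs one change.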

## Before and after the Conquest
### Wealth

We can easily calculate the total value of every landholder's estates in 1066 and 1086.

In [None]:
df['total_1066'] = df.holder_1066 + df.lord_1066
df['total_1086'] = df.demesne_1086 + df.subtenanted_1086 + df.subtenant_1086
df.describe(include=[np.number])

What fraction of 1066 landholders were completely dispossessed by the Conquest?

In [None]:
has_holdings_1066 = df.total_1066 != 0
has_holdings_1086 = df.total_1086 != 0
no_holdings_1066 = df.total_1066 == 0
no_holdings_1086 = df.total_1086 == 0

df[has_holdings_1066 & no_holdings_1086].shape[0] / df[has_holdings_1066].shape[0]

The vast majority of Anglo-Saxon landholders lost their holdings after the Conquest.
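The dispossession ratio is simply the share of 1066 landholders whose 1086 total is zero. On a small, made-up DataFrame (the values are hypothetical, not PASE data), the same number falls out of averaging a boolean mask, since `True` counts as 1:

```python
import pandas as pd

# Hypothetical toy values, not taken from the PASE database.
toy = pd.DataFrame({
    'total_1066': [10.0, 5.0, 0.0, 2.0],
    'total_1086': [0.0, 3.0, 8.0, 0.0],
})

held_1066 = toy.total_1066 != 0
held_none_1086 = toy.total_1086 == 0

# Row-count ratio, as in the notebook cell above.
ratio = toy[held_1066 & held_none_1086].shape[0] / toy[held_1066].shape[0]

# Equivalent: among 1066 landholders, the mean of the boolean mask is the
# fraction of True values, i.e. the fraction who held nothing in 1086.
fraction = held_none_1086[held_1066].mean()
```

Here three rows held land in 1066 and two of them held nothing in 1086, so both expressions give 2/3.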

Who were the richest people in England before the Conquest?

In [None]:
# columns to exclude from summary reports
detailed_columns = ['gender', 'pase_name', 'holder_1066', 'lord_1066', 'demesne_1086', 'subtenanted_1086', 'subtenant_1086', 'editor', 'editorial_status']

df.drop(detailed_columns, axis=1).sort_values('total_1066', ascending=False).head(10)

How about in 1086?

In [None]:
df.drop(detailed_columns, axis=1).sort_values('total_1086', ascending=False).head(10)

As expected, the nouveau riche belonged to the new Norman aristocracy. Odo and Robert were William's half-brothers.

Did the number of landholders grow after the Conquest, or did William consolidate power during his purge?

In [None]:
fig, ax = plt.subplots()
ax.set_title('Landholders pre- and post-Conquest')
ax.set_ylabel('Number of landholders')

ind = np.arange(2)
ax.set_xticks(ind)
ax.set_xticklabels([1066, 1086])
landholders_by_year = [df[has_holdings_1066].shape[0], df[has_holdings_1086].shape[0]]

ax.bar(ind, landholders_by_year)

Clearly William restructured the aristocracy in his favour. Who could blame him?

Who held land in 1066 and still came out on top in 1086?

In [None]:
df['absolute_D'] = df.total_1086 - df.total_1066  # total change in land value
df['relative_D'] = df.absolute_D / df.total_1066  # relative change in land value

survivors = (
    df[has_holdings_1066 & has_holdings_1086 & (df.relative_D > 1)]  # land value more than doubled
    .drop(detailed_columns, axis=1)
    .sort_values('total_1086', ascending=False)
    .head(10)
)
survivors
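Two details of that filter are easy to miss. First, `relative_D > 1` keeps only landholders whose holdings more than doubled, not merely those who kept their value. Second, pandas turns division by a zero `total_1066` into `inf` (or `NaN` for 0/0) rather than raising an error, and `inf > 1` is `True`, so the `has_holdings_1066` mask is what keeps newcomers out of the list. A toy illustration with hypothetical values:

```python
import pandas as pd

# Hypothetical values, not taken from the PASE database.
toy = pd.DataFrame({
    'total_1066': [10.0, 4.0, 0.0, 0.0],
    'total_1086': [25.0, 4.0, 6.0, 0.0],
})
toy['absolute_D'] = toy.total_1086 - toy.total_1066
# Float division by zero yields inf (6 / 0) or NaN (0 / 0), not an error.
toy['relative_D'] = toy.absolute_D / toy.total_1066

# Without a guard, the newcomer in row 2 (inf) passes the filter.
more_than_doubled = toy.relative_D > 1
# Requiring nonzero 1066 holdings removes it.
guarded = (toy.total_1066 != 0) & more_than_doubled
```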

Among these survivors, who profited the most relative to their 1066 holdings?

In [None]:
survivors.sort_values('relative_D', ascending=False)

## Names

How many unique names does the Domesday database contain? See `unique` below.

In [None]:
df.describe(include=['object'])
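For object (string) columns, `describe` reports `count`, `unique`, `top`, and `freq`. `unique` is the number of distinct values, while `freq` is how often the most common value (`top`) occurs. A quick check on a hypothetical handful of names:

```python
import pandas as pd

# Hypothetical sample, not drawn from the database.
names = pd.Series(['Ælfric', 'Godwine', 'Ælfric', 'Leofgifu'], dtype='object')
summary = names.describe()

# summary['unique'] is the number of distinct names (same as nunique()).
# summary['top'] is the most common name; summary['freq'] is its count.
distinct = names.nunique()
```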

What were the most common names before the Conquest?

In [None]:
is_anonymous = df.name.isin(['Anonymous', 'Anonymi'])

named_men = (~is_anonymous) & (df.gender == 'Male')
named_women = (~is_anonymous) & (df.gender == 'Female')

display_side_by_side(
    df[named_men & has_holdings_1066].name.value_counts().head(20),
    df[named_women & has_holdings_1066].name.value_counts().head(20),
)

&AElig;lfric and &AElig;lfgifu were the John and Mary of Anglo-Saxon England. After the Conquest, we see a large influx of Norman French names. Most &AElig;lfrics and &AElig;lfgifus lost their lands.

In [None]:
display_side_by_side(
    df[named_men & has_holdings_1086].name.value_counts().head(20),
    df[named_women & has_holdings_1086].name.value_counts().head(20),
)

Some names appear only among landholders in 1066. We can compare the names of people with nonzero `total_1066` and `total_1086` to isolate them.

In [None]:
names_men_1066 = df[named_men & has_holdings_1066].name
names_men_1086 = df[named_men & has_holdings_1086].name

names_women_1066 = df[named_women & has_holdings_1066].name
names_women_1086 = df[named_women & has_holdings_1086].name

display_side_by_side(
    names_men_1066[~names_men_1066.isin(names_men_1086)].value_counts().head(30),
    names_women_1066[~names_women_1066.isin(names_women_1086)].value_counts().head(30),
)
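Note that `isin` compares names, not people. A name drops out of the 1066-only list as soon as anyone with that name held land in 1086, even if it was a different person. A minimal sketch with hypothetical name lists:

```python
import pandas as pd

# Hypothetical name lists, not taken from the database.
names_1066 = pd.Series(['Ælfric', 'Ælfric', 'Godwine', 'Wulfstan'])
names_1086 = pd.Series(['Godwine', 'Roger', 'Ælfric'])

# Keep every 1066 entry whose name never appears among 1086 landholders.
# Both Ælfric entries vanish because one Ælfric appears in 1086.
only_1066 = names_1066[~names_1066.isin(names_1086)]
```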

Similarly, we can get a better picture of the Norman aristocracy by isolating names that do not appear among landholders in 1066.

In [None]:
display_side_by_side(
    names_men_1086[~names_men_1086.isin(names_men_1066)].value_counts().head(30),
    names_women_1086[~names_women_1086.isin(names_women_1066)].value_counts().head(30)
)

### Generating random names

You'll recognize by now that there is a pattern to the Anglo-Saxon names we've encountered.
Germanic names traditionally comprise a prefix and a suffix.
For example, Leofgifu is derived from *leof* (dear) and *gifu* (gift).
Some components appear as both prefixes and suffixes, such as *wulf* or *ulf* (wolf).

We can generate pseudo-Germanic names from the Domesday data.
Perfect for your *Dungeons & Dragons* campaign!
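The idea behind such a generator fits in a few lines: split known names into a prefix and a suffix, then recombine prefixes with suffixes. The split points below are chosen by hand purely for illustration (an assumption, not output from the notebook's hyphenation step).

```python
from itertools import product

# Hand-picked (prefix, suffix) splits; hypothetical, for illustration only.
splits = [('Leof', 'gifu'), ('Wulf', 'stan'), ('Ælf', 'ric')]

prefixes = [prefix for prefix, _ in splits]
suffixes = [suffix for _, suffix in splits]

# Every prefix paired with every suffix: 3 x 3 = 9 candidate names,
# including the originals and blends such as 'Wulfric' or 'Leofstan'.
combos = [prefix + suffix for prefix, suffix in product(prefixes, suffixes)]
```

Exhaustive pairing treats every component equally; sampling in proportion to how often each component occurs keeps common elements common.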

First, we need a way to detect syllables in a given string.
This is a very hard problem, not least because of English orthography.

Instead, we can use hyphenation rules to approximate syllable boundaries.
We will use [Pyphen](https://github.com/Kozea/Pyphen), a Python library that hyphenates text using the OpenOffice hyphenation dictionaries, to detect name components.
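The counting code in the next cell consumes hyphenation results as (prefix, suffix) pairs, one pair per permitted break point. As we understand it, Pyphen's `Pyphen.iterate` yields pairs of this shape, longest prefix first. The hypothetical `splits_at` function below mimics that behaviour with break positions supplied by hand, so it runs without Pyphen installed; the break points for *Godwine* are our own assumption, not Pyphen output.

```python
from typing import Iterator, List, Tuple


def splits_at(word: str, positions: List[int]) -> Iterator[Tuple[str, str]]:
    """
    Yield a (prefix, suffix) pair for each break position, longest prefix
    first. Hypothetical stand-in for a hyphenator; positions are given by hand.
    """
    for position in sorted(positions, reverse=True):
        yield word[:position], word[position:]


# Assumed break points: God|wi|ne -> positions 3 and 5.
pairs = list(splits_at('Godwine', [3, 5]))
# A name with two break points contributes two prefixes and two suffixes.
```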

In [None]:
from collections import Counter
import random
from typing import (
    Iterator,
    List,
    Tuple,
)

import pyphen


def count_components(names: List[str]) -> Tuple[Counter, Counter]:
    """
    Count prefixes and suffixes in a list of Germanic names using
    hyphenation rules.
    """
    hyphenator = pyphen.Pyphen(lang='en')
    prefixes, suffixes = Counter(), Counter()
    for name in names:
        # Pyphen.iterate yields every (prefix, suffix) split at a
        # permitted hyphenation point, longest prefix first.
        for prefix, suffix in hyphenator.iterate(name):
            prefixes.update([prefix])
            suffixes.update([suffix])
    return prefixes, suffixes


def generate_names(names: List[str], n: int = 10) -> Iterator[str]:
    """
    Generate n random Germanic names from a list of real names.
    """
    prefixes, suffixes = count_components(names)
    # random.choices generates weighted choices from a population:
    #
    #     >>> random.choices(population, weights, k=n)
    #
    # zip(*counter.most_common()) splits a Counter into a tuple of items
    # and a tuple of counts, so the counts act as relative weights.
    pairs = zip(
        random.choices(*zip(*prefixes.most_common()), k=n),
        random.choices(*zip(*suffixes.most_common()), k=n),
    )
    for pair in pairs:
        yield ''.join(pair)
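The `random.choices(*zip(*counter.most_common()), k=n)` idiom is compact but opaque. Unpacked on a hypothetical `Counter`, `zip(*...)` turns the (item, count) pairs into one tuple of items and one tuple of counts, which `random.choices` takes as its population and relative weights. Seeding a private `random.Random` instance (our addition, not in the notebook) makes the draw reproducible:

```python
import random
from collections import Counter

# Hypothetical prefix counts, not computed from the database.
prefixes = Counter({'Ælf': 5, 'Leof': 3, 'Wulf': 2})

# most_common() lists (item, count) pairs from most to least common;
# zip(*...) regroups them into a tuple of items and a tuple of counts.
population, weights = zip(*prefixes.most_common())

# Each prefix is drawn with probability proportional to its count.
rng = random.Random(1066)
sample = rng.choices(population, weights, k=1000)
```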

Let's try our generator with the men's names from the list of 1066 landholders.

In [None]:
for name in generate_names(names_men_1066.tolist()):
    print(name)

Those sound pretty good!
Unfortunately, the pool of women's names is much smaller, so the generated names are decidedly more predictable.

In [None]:
for name in generate_names(names_women_1066.tolist()):
    print(name)

So far we've generated pseudo-Germanic names that approximate historical names.

For fun, we can also generate pseudo-Norman and pseudo-Anglo-Norman names.
Of course, French names do not follow the same prefix–suffix convention, so these names are much more fantastic.

In [None]:
# Generate pseudo-Norman men's names from the list of landholders that only appear in 1086.
for name in generate_names(names_men_1086[~names_men_1086.isin(names_men_1066)].tolist()):
    print(name)

In [None]:
# Generate pseudo-Norman women's names from the list of landholders that only appear in 1086.
for name in generate_names(names_women_1086[~names_women_1086.isin(names_women_1066)].tolist()):
    print(name)

In [None]:
# Generate pseudo-Anglo-Norman men's names from the list of all landholders in 1086.
for name in generate_names(names_men_1086.tolist()):
    print(name)

In [None]:
# Generate pseudo-Anglo-Norman women's names from the list of all landholders in 1086.
for name in generate_names(names_women_1086.tolist()):
    print(name)

---

## Footnotes

1. Based on an estimate of &pound;1 (1086) = &pound;4800 (2003) (https://regia.org/research/misc/costs.htm), adjusted for inflation to 2018.