# Hermits Need to Wax Their Signs

Look, I don't know for sure that waxing written (vs. unwritten) signs reduces lag, but I still think it's fair to call it **good practice** when preparing an archival world download. And the sheer number of unwaxed signs I've found throughout the world is making me cry.

I could just go around with a bunch of honeycomb... or I could just write a script to fix that for everyone.

In [1]:
import json
import re
from collections import Counter
from collections.abc import Collection
from concurrent.futures import ProcessPoolExecutor, as_completed
from functools import partial
from os import environ
from pathlib import Path
from typing import Any

import mutf8
import pandas as pd
from IPython.display import Markdown, display
from nbt import nbt, region

In [2]:
def format_file_size(path: Path) -> str:
    """Return the size of the specified file in
    human-readable form (KB / MB / GB)

    Parameters
    ----------
    path : Path
        The path to the file

    Returns
    -------
    str
        A prettily formatted file size

    Notes
    -----
    I would be shocked if there isn't a utility already built
    into the standard library to do this, but all I could find
    via Googling was a bunch of recipes and examples
    """
    size: float = path.stat().st_size  # in bytes
    for unit in ("B", "KB", "MB", "GB"):
        # switch to the next unit at half its size, so 600 bytes reads "0.6 KB"
        if size < 1024 / 2:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"

In [3]:
def summarize_keystore(keystore: dict[str, Any]) -> None:
    """Display a summary of the contents of a key-value store

    Parameters
    ----------
    keystore : dict
        The keystore to summarize

    Returns
    -------
    None
    """

    def _summarize_keystore(keystore: dict[str, Any]) -> str:
        summary = ""
        for k, v in keystore.items():
            summary += f"\n - `{k}` : "
            if isinstance(v, (str, nbt.TAG_String)):
                summary += f'`"{v}"`'
            elif not isinstance(v, Collection):
                summary += f"`{str(v)}`"
            else:
                length = len(v)
                if 0 < length < 3:
                    summary += "\n"
                    if not isinstance(v, dict):
                        v = {i: item for i, item in enumerate(v)}
                    summary += "\n".join(
                        (f"\t{line}" for line in _summarize_keystore(v).split("\n"))
                    )
                else:
                    summary += f"({len(v)} items)"
        return summary

    display(Markdown(_summarize_keystore(keystore)))

In [4]:
save_folder = Path(environ["SAVE_PATH"])

# make sure this is set correctly
for path in sorted(save_folder.glob("*")):
    print(f"- {path.name} ({'folder' if path.is_dir() else format_file_size(path)})")

- DIM-1 (folder)
- DIM1 (folder)
- advancements (folder)
- audio_player_data (folder)
- carpet-fixes.conf (22.0 B)
- carpet.conf (57.0 B)
- data (folder)
- datapacks (folder)
- entities (folder)
- icon.png (9.0 KB)
- level.dat (3.3 KB)
- level.dat_old (3.3 KB)
- playerdata (folder)
- poi (folder)
- region (folder)
- resources.zip (34.0 MB)
- scripts (folder)
- session.lock (3.0 B)
- stats (folder)


In [5]:
all_overworld_regions = sorted(
    (save_folder / "region").glob("*"), key=lambda path: -path.stat().st_size
)
all_nether_regions = sorted(
    (save_folder / "DIM-1" / "region").glob("*"), key=lambda path: -path.stat().st_size
)
all_end_regions = sorted(
    (save_folder / "DIM1" / "region").glob("*"), key=lambda path: -path.stat().st_size
)
all_region_files = all_overworld_regions + all_nether_regions + all_end_regions
for path in all_region_files[:10]:
    print(f"- {path.name} ({'folder' if path.is_dir() else format_file_size(path)})")
print(f"... {len(all_region_files) - 10} more")

- r.2.-1.mca (13.3 MB)
- r.-2.-1.mca (13.3 MB)
- r.-1.-1.mca (13.3 MB)
- r.-2.0.mca (12.6 MB)
- r.-4.-1.mca (12.2 MB)
- r.0.0.mca (12.1 MB)
- r.0.-6.mca (12.0 MB)
- r.-3.-1.mca (11.7 MB)
- r.-1.0.mca (11.7 MB)
- r.-3.0.mca (11.5 MB)
... 306 more


## Find a Sign

We did this [yesterday](There%27s%20Your%20Sign.ipynb).

In [6]:
%%time
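for path in all_region_files:
    region_data = region.RegionFile(path)
    for chunk in region_data.iter_chunks():
        for entity in chunk["block_entities"]:
            if entity["id"].value == "minecraft:sign":
                break  # found one; `entity` now holds the sign
        else:
            continue  # no sign in this chunk, try the next chunk
        break  # propagate the break out of the chunk loop
    else:
        continue  # no sign in this region, try the next file
    break  # propagate the break out of the region loop
summarize_keystore(entity)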


 - `z` : `-504`
 - `x` : `1235`
 - `is_waxed` : `0`
 - `id` : `"minecraft:sign"`
 - `y` : `38`
 - `front_text` : (3 items)
 - `keepPacked` : `0`
 - `components` : (0 items)
 - `back_text` : (3 items)

CPU times: user 469 ms, sys: 4.87 ms, total: 474 ms
Wall time: 476 ms
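
The `for`/`else`/`break` ladder above is one way to stop at the first match inside nested loops. As a hedged alternative, here is a minimal sketch that flattens the nesting into a generator and lets `next()` do the early exit. It uses plain lists and dicts as stand-ins for region files and NBT compounds (an assumption to keep the example self-contained), and `iter_block_entities` is a hypothetical helper, not part of the `nbt` library.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def iter_block_entities(
    regions: Iterable[Iterable[dict[str, Any]]],
) -> Iterator[dict[str, Any]]:
    """Yield every block entity from every chunk of every region, in order."""
    for chunks in regions:
        for chunk in chunks:
            yield from chunk["block_entities"]


# Hypothetical stand-in data: two "regions", each a list of chunk dicts
regions = [
    [{"block_entities": [{"id": "minecraft:chest"}]}],
    [
        {
            "block_entities": [
                {"id": "minecraft:barrel"},
                {"id": "minecraft:sign", "is_waxed": 0},
            ]
        }
    ],
]

# next() stops at the first match, so nothing past it is scanned
first_sign = next(
    (e for e in iter_block_entities(regions) if e["id"] == "minecraft:sign"),
    None,  # returned if there is no sign anywhere
)
print(first_sign)
```

One difference worth noting: with the `None` default, a world with no signs yields `None` instead of leaving `entity` bound to whatever block entity was scanned last.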


In [7]:
entity["is_waxed"].value

0

And now let's see if we can find a _waxed_ sign.

In [8]:
%%time
for path in all_region_files:
    region_data = region.RegionFile(path)
    for chunk in region_data.iter_chunks():
        for entity in chunk["block_entities"]:
            if entity["id"].value == "minecraft:sign":
                if entity["is_waxed"].value != 0:
                    break
        else:
            continue
        break
    else:
        continue
    break
summarize_keystore(entity)
entity["is_waxed"].value


 - `z` : `-118`
 - `x` : `-904`
 - `is_waxed` : `1`
 - `id` : `"minecraft:sign"`
 - `y` : `119`
 - `front_text` : (3 items)
 - `keepPacked` : `0`
 - `components` : (0 items)
 - `back_text` : (3 items)

CPU times: user 1.68 s, sys: 11.8 ms, total: 1.69 s
Wall time: 1.7 s


1

Cool, so it's just a matter of setting it from 0 to 1.
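
To make that concrete before touching any real region files, here is a small sketch of the flag flip on throwaway data. Plain dicts stand in for the NBT compounds (so there is no `.value` attribute here), and `wax_signs` and `SIGN_IDS` are hypothetical names used only for illustration.

```python
from typing import Any

SIGN_IDS = ("minecraft:sign", "minecraft:hanging_sign")


def wax_signs(block_entities: list[dict[str, Any]]) -> int:
    """Set is_waxed to 1 on every unwaxed sign; return how many were changed."""
    changed = 0
    for entity in block_entities:
        if entity["id"] not in SIGN_IDS:
            continue  # not a sign, leave it alone
        if entity["is_waxed"] == 0:
            entity["is_waxed"] = 1
            changed += 1
    return changed


# Hypothetical chunk contents (plain dicts, not real NBT tags)
block_entities = [
    {"id": "minecraft:sign", "is_waxed": 0},
    {"id": "minecraft:hanging_sign", "is_waxed": 1},
    {"id": "minecraft:chest"},
]
first_pass = wax_signs(block_entities)  # one sign needed waxing
second_pass = wax_signs(block_entities)  # nothing left to change
print(first_pass, second_pass)
```

Returning the number of changes is a design choice: a caller can skip rewriting a chunk when the count is zero.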

## Sizing the Damage

I'm just curious what fraction of signs on HermitCraft were actually waxed.

In [9]:
def extract_text_from_sign(entity: nbt.TAG_Compound) -> list[str]:
    """Extract all text lines from a sign

    Parameters
    ----------
    entity
        The sign block entity

    Returns
    -------
    list of str
        The lines of text on that sign
    """
    lines = []
    for side in ("front_text", "back_text"):
        for line in entity[side]["messages"]:
            lines.append(line.value)
    return lines


def count_waxed_and_unwaxed_signs(path: Path) -> Counter[tuple[int, bool]]:
    """Count and return the number of waxed and unwaxed signs in a given region

    Parameters
    ----------
    path : Path
        The path of the region file to scan

    Returns
    -------
    Counter mapping (int, bool) tuples to int
        The counts of signs by their waxed state (0: unwaxed, 1: waxed) and whether they
        had writing on them (True: had writing, False: completely blank)
    """
    waxed_counts: Counter[tuple[int, bool]] = Counter()
    region_data = region.RegionFile(path)
    for chunk in region_data.iter_chunks():
        for entity in chunk["block_entities"]:
            if entity["id"].value not in ("minecraft:sign", "minecraft:hanging_sign"):
                continue
            not_blank = any(line != "" for line in extract_text_from_sign(entity))
            waxed_counts[entity["is_waxed"].value, not_blank] += 1
    return waxed_counts

In [10]:
%%time
combined_waxed_counts: Counter[tuple[int, bool]] = Counter()
with ProcessPoolExecutor(max_workers=24) as executor:
    futures = []
    for region_file_path in all_region_files:
        futures.append(executor.submit(count_waxed_and_unwaxed_signs, region_file_path))
    for result in as_completed(futures):
        combined_waxed_counts += result.result()
combined_waxed_counts

CPU times: user 72.9 ms, sys: 65.5 ms, total: 138 ms
Wall time: 11.2 s


Counter({(0, False): 7887, (0, True): 5005, (1, True): 3305, (1, False): 701})

... not even a **quarter** of the **almost 17000** signs on HermitCraft were waxed. 😭

And _it's even worse_ when we consider fully blank signs (which, if they're being used for decoration, should be waxed just to prevent other Hermits from graffiti-ing).
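
For anyone who wants the actual percentages, here is a quick sketch of the arithmetic. The counts are copied from the `Counter` output above; the variable names are just for illustration.

```python
from collections import Counter

# Counts copied from the scan output: (is_waxed, has_text) -> number of signs
counts = Counter({(0, False): 7887, (0, True): 5005, (1, True): 3305, (1, False): 701})

total = sum(counts.values())
waxed = sum(n for (is_waxed, _), n in counts.items() if is_waxed)
blank = sum(n for (_, has_text), n in counts.items() if not has_text)
blank_waxed = counts[1, False]

print(f"{waxed} of {total} signs waxed ({waxed / total:.1%})")
# 4006 of 16898 signs waxed (23.7%)
print(f"{blank_waxed} of {blank} blank signs waxed ({blank_waxed / blank:.1%})")
# 701 of 8588 blank signs waxed (8.2%)
```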

## Welp, let's fix that

In [11]:
def wax_all_signs(path: Path) -> None:
    """Wax all signs in a given region

    Parameters
    ----------
    path : Path
        The path of the region file to scan

    Returns
    -------
    None
    """
    region_data = region.RegionFile(path)
    for chunk in region_data.iter_chunks():
        needs_writing = False
        for entity in chunk["block_entities"]:
            if entity["id"].value not in ("minecraft:sign", "minecraft:hanging_sign"):
                continue
            if entity["is_waxed"].value == 0:
                entity["is_waxed"].value = 1
                needs_writing = True
        if needs_writing:
            region_data.write_chunk(chunk.loc.x, chunk.loc.z, chunk)

In [12]:
%%time
with ProcessPoolExecutor(max_workers=24) as executor:
    futures = []
    for region_file_path in all_region_files:
        futures.append(executor.submit(wax_all_signs, region_file_path))  # type: ignore[arg-type]
    for result in as_completed(futures):
        result.result()  # re-raise any exception from the worker process

CPU times: user 60.5 ms, sys: 74.8 ms, total: 135 ms
Wall time: 11.8 s


### Verify that it worked

In [13]:
%%time
combined_waxed_counts = Counter()
with ProcessPoolExecutor(max_workers=24) as executor:
    futures = []
    for region_file_path in all_region_files:
        futures.append(executor.submit(count_waxed_and_unwaxed_signs, region_file_path))
    for result in as_completed(futures):
        combined_waxed_counts += result.result()
combined_waxed_counts

CPU times: user 78.8 ms, sys: 69.6 ms, total: 148 ms
Wall time: 11.2 s


Counter({(1, False): 8588, (1, True): 8310})

## Who's to Blame?

Now we're getting to the important question: who do we have to blame for all these unwaxed signs?

That's something we can look at from the (conveniently JSON-formatted) player stats.

In [14]:
import pandas as pd

In [15]:
data: list[tuple[str, int, int, int]] = []
player_stats = list((save_folder / "stats").glob("*.json"))
for stats_file in player_stats:
    stats = json.loads(stats_file.read_text())["stats"]
    if "minecraft:used" not in stats:
        continue
    all_signs = 0
    for item, count in stats["minecraft:used"].items():
        if item.endswith("_sign"):
            all_signs += count
    wax_count = stats["minecraft:used"].get("minecraft:honeycomb", 0)
    glow_count = stats["minecraft:used"].get("minecraft:glow_ink_sac", 0)
    data.append((stats_file.stem, all_signs, wax_count, glow_count))
sign_dataframe = (
    pd.DataFrame(
        data, columns=["player_id", "signs_placed", "wax_count", "glow_inks_used"]
    )
    .sort_values("signs_placed", ascending=False)
    .reset_index(drop=True)
)
sign_dataframe

| | player_id | signs_placed | wax_count | glow_inks_used |
|---|---|---|---|---|
| 0 | 62fec5a3-1896-4beb-94e0-36e34898c787 | 2623 | 2466 | 313 |
| 1 | 88e2afec-6f2e-4a34-a96a-de61730bd3ca | 2545 | 1511 | 803 |
| 2 | 87d91548-6f18-491f-a267-7833caa5d7d8 | 2102 | 4194 | 373 |
| 3 | 53bae456-dbbb-4c2f-8c79-9e8ec26c8382 | 2058 | 7 | 638 |
| 4 | 7163fbce-39ac-4a02-b836-a991c45d2dd1 | 1367 | 517 | 13 |
| 5 | 75c863ae-bb92-486d-911c-53030c552be0 | 1361 | 3039 | 111 |
| 6 | cae9554c-31be-47e2-ba2b-4b8867adacc5 | 1223 | 593 | 78 |
| 7 | 3f28c559-0898-4be1-9f20-9fd37ca9cd22 | 1009 | 202 | 393 |
| 8 | 69b3107a-6d03-4122-b567-7652fcc3cdb2 | 1007 | 559 | 662 |
| 9 | 21ef397c-3a76-4eb7-aa17-a99d3fc658e2 | 999 | 1120 | 41 |


So all this is circumstantial. There's no actual stat for "signs placed but never waxed." We're looking at:
- total signs placed
- total objects (which may not have been signs!) waxed

People could have been waxing other people's signs, or placing signs that were later broken. That's why I did not translate any of these player UUIDs into player names. While row 3 definitely raises some eyebrows, especially when the server's top sign placer was also among the top waxers, I don't think we can say anything definitive, one way or the other. And certainly no one should look at this data and go harass any Hermits.
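
As a rough illustration of why these numbers stay circumstantial, the sketch below divides `wax_count` by `signs_placed` for the ten rows shown above. The values are copied from the table; the ratio is an illustrative metric, not something the stats files record.

```python
# (row, signs_placed, wax_count) copied from the table above
rows = [
    (0, 2623, 2466),
    (1, 2545, 1511),
    (2, 2102, 4194),
    (3, 2058, 7),
    (4, 1367, 517),
    (5, 1361, 3039),
    (6, 1223, 593),
    (7, 1009, 202),
    (8, 1007, 559),
    (9, 999, 1120),
]

ratios = {row: wax / signs for row, signs, wax in rows}
# A ratio above 1 means more honeycomb uses than signs placed
over_one = [row for row, ratio in ratios.items() if ratio > 1]

for row, ratio in ratios.items():
    print(f"row {row}: {ratio:.2f} honeycomb uses per sign placed")
print("rows above 1.0:", over_one)
```

Rows 2, 5, and 9 come out above 1.0, so at least some of that honeycomb went onto something other than signs those players placed themselves. That is a good reminder that a low or high ratio says little about who left which sign unwaxed.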

## Please don't be a jerk about this

There's also, to be clear, no evidence I'm aware of that waxing a sign reduces its lag. The best I can find is [this assertion](https://www.reddit.com/r/technicalminecraft/comments/1gf5v5f/comment/luf2h7j/), which is made without explanation and without saying whether the savings are server-side or client-side.

**Please** take this notebook for what it is—a fun little data dive and a script to resolve a _personal_ pet peeve.