emptyset

A stupid module that makes ∅ an empty set literal

For the 69105th time, the idea of having an empty set literal arose on python-ideas.

Terry Reedy suggested that the way to do this would be to write a preprocessor that converted ".pyu" files into ".py" files, but then random832 pointed out that this can't be done as a source conversion, because the whole problem is that there is no Python source code that evaluates to the empty-set literal bytecode. That is, [] compiles to BUILD_LIST 0, {1} compiles to LOAD_CONST 1 then BUILD_SET 1, but nothing compiles to BUILD_SET 0. If you follow the thread from there, you'll see increasingly complicated solutions attempting to get around that problem.
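To see the bytecode claim for yourself, here is a small sketch using the standard dis module. The helper name opnames is made up for this example, and the exact instruction that loads the 1 varies between CPython versions, so no expected output is shown.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for a single expression."""
    code = compile(source, "<demo>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


# A list display and a non-empty set display each get a BUILD_* opcode,
# while set() is an ordinary name lookup followed by a call.
for expr in ("[]", "{1}", "set()"):
    print(expr, opnames(expr))
```

The set() line is the interesting one: there is no BUILD_SET in it, only a call to whatever the name set happens to refer to.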

But, as I discovered, while there is no source that compiles to BUILD_SET 0, there is a dead-simple AST that compiles to it. And it's very easy to hook the import loader to compile everything up to the AST, then do something, then finish compiling.
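A minimal sketch of that AST, assuming nothing beyond the standard ast and dis modules: a Set node with no elements, wrapped in an Expression so it can be compiled in eval mode. The filename "<empty-set>" is arbitrary.

```python
import ast
import dis

# An ast.Set with an empty element list has no source-code spelling,
# but the compiler accepts it like any other node.
tree = ast.Expression(body=ast.Set(elts=[]))
ast.fix_missing_locations(tree)  # hand-built nodes need line/column info

code = compile(tree, "<empty-set>", "eval")
ops = [(ins.opname, ins.arg) for ins in dis.get_instructions(code)]
value = eval(code)

print(ops)
print(type(value), value)
```

The resulting code object never looks up the name set, which is the whole point of going through the AST.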

Just run emptymain.py, and it should print out this:

set() is the empty set ∅

(If it fails, or the first part isn't set(), then it hasn't transformed the ∅ in the source code into a valid empty set literal. If the last part isn't ∅, then it's tampered with the strings. If both are right, then everything is perfect with the world, so go take the rest of the day off.)

Implementation details

The character ∅ can legally appear in a string or bytes literal or a comment, but nowhere else in Python code. So, what we want to do is replace it everywhere but string and bytes literals (we don't care about comments) with something that's legal everywhere we want ∅ to be legal. An identifier seems like exactly what we want here, so we just need an identifier that doesn't exist in the source.

So, first we need to find an identifier that doesn't exist anywhere in the source. Then we want to replace every ∅ in the source with that identifier. That will affect string and bytes literals, but we can deal with that below.
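The two steps above can be sketched roughly like this. The names pick_placeholder and substitute and the "_emptyset_" stem are hypothetical, chosen for this example; they are not taken from emptify.py.

```python
import itertools

EMPTY = "\u2205"  # the character ∅


def pick_placeholder(source, stem="_emptyset_"):
    """Return an identifier that does not occur anywhere in source.

    A plain substring test is stricter than necessary, but it is simple
    and can never pick a name that is already in use.
    """
    for n in itertools.count():
        candidate = stem + str(n)
        if candidate not in source:
            return candidate


def substitute(source):
    """Replace every ∅ with a fresh identifier; return (new_source, identifier)."""
    placeholder = pick_placeholder(source)
    return source.replace(EMPTY, placeholder), placeholder
```

For example, substitute("s = ∅\n") gives back the source with an identifier where the ∅ was, plus the identifier itself so the AST pass knows what to look for.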

Next, we need an AST transformer for the import hook to run. This can be just a simple NodeTransformer. Every Name node is an identifier; if it's our magic identifier, replace it with the empty set AST. Every Str node is a string; if it contains our magic identifier, change it back. Every Bytes node is a bytes; do the same as with Str, except that you'll need to store ∅ encoded in the source-file encoding, which you stored earlier in the import process.
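Here is a sketch of such a transformer. One assumption to flag loudly: on Python 3.8 and later, string and bytes literals both show up as Constant nodes rather than Str and Bytes, so this version uses visit_Constant where the text above says Str and Bytes. The class name and constructor arguments are made up for the example.

```python
import ast

EMPTY = "\u2205"  # the character ∅


class EmptySetTransformer(ast.NodeTransformer):
    def __init__(self, placeholder, encoding="utf-8"):
        self.placeholder = placeholder  # identifier that stands in for ∅
        self.encoding = encoding        # source-file encoding, for bytes literals

    def visit_Name(self, node):
        # The magic identifier becomes an empty set display.
        if node.id == self.placeholder:
            return ast.copy_location(ast.Set(elts=[]), node)
        return node

    def visit_Constant(self, node):
        # Undo the substitution inside string and bytes literals.
        if isinstance(node.value, str):
            node.value = node.value.replace(self.placeholder, EMPTY)
        elif isinstance(node.value, bytes):
            node.value = node.value.replace(
                self.placeholder.encode("ascii"), EMPTY.encode(self.encoding))
        return node
```

Call ast.fix_missing_locations on the result of visit() before handing the tree to compile().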

Bugs, hackiness, other caveats

This requires Python 3.4+, because importlib didn't work the same in 3.3, and didn't exist in 2.7. Similar tricks can be done with older versions of the importer; MacroPy, Hylang, etc. have import hooks for older versions (I think 2.7 and 3.2, respectively, but don't quote me on that).

As with any import hook, the hook cannot affect your main script, only scripts that are imported after the hook is imported. That's why emptymain.py exists: to import the hook from emptymain.py, then import the actual script emptyset.py.

I'm lazy and didn't store the source-file encoding earlier in the import process; I just called importlib._bootstrap.decode_source (which is undocumented, in an implementation-details module, and almost certainly not portable to other Pythons, or even future CPythons). So, screw bytes literals. If you want them to work, decode_source is only a few lines of code, and you should be writing it yourself anyway.
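For anyone who does want the encoding, here is one way to get it using only documented calls: tokenize.detect_encoding reads the BOM or coding cookie, and importlib.util.decode_source is a public function that decodes source bytes. The helper name decode_with_encoding is hypothetical, and this is a sketch, not what emptify.py does.

```python
import importlib.util
import io
import tokenize


def decode_with_encoding(source_bytes):
    """Return (text, encoding) for the raw bytes of a Python source file."""
    # detect_encoding wants a readline callable; it looks at the first two
    # lines for a BOM or a coding cookie and defaults to UTF-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    text = importlib.util.decode_source(source_bytes)
    return text, encoding
```

The returned encoding is what you would pass along to the AST pass so it can re-encode ∅ inside bytes literals.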

Terry's idea was not an import hook that adds the empty set literal to all .py files after being installed, but some way to add the literal to .pyu files only. This is doable and easy (see the Hylang project linked above, which adds a hook for .hy files, while leaving .py files alone), but I haven't done it.
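A rough sketch of what a .pyu-only hook could look like, assuming Python 3.8+ (for Constant nodes) and using only documented importlib pieces: SourceFileLoader.source_to_code, importlib.util.decode_source, and FileFinder.path_hook. PyuLoader, install, and the fixed PLACEHOLDER name are all made up for this example, and bytes literals are ignored to keep it short.

```python
import ast
import importlib.machinery
import importlib.util
import sys

EMPTY = "\u2205"  # the character ∅
PLACEHOLDER = "_emptyset_placeholder_"  # assumed not to occur in any source


class _Emptify(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id == PLACEHOLDER:
            return ast.copy_location(ast.Set(elts=[]), node)
        return node

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            node.value = node.value.replace(PLACEHOLDER, EMPTY)
        return node


class PyuLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *, _optimize=-1):
        # Decode, swap ∅ for the placeholder, parse, fix up the AST, compile.
        source = importlib.util.decode_source(data)
        tree = ast.parse(source.replace(EMPTY, PLACEHOLDER), filename=path)
        tree = ast.fix_missing_locations(_Emptify().visit(tree))
        return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)


def install():
    """Handle .pyu files with PyuLoader; leave every other suffix alone."""
    m = importlib.machinery
    details = [
        (PyuLoader, [".pyu"]),
        # Repeat the stock loaders so ordinary modules in the same
        # directories stay importable through this finder.
        (m.ExtensionFileLoader, m.EXTENSION_SUFFIXES),
        (m.SourceFileLoader, m.SOURCE_SUFFIXES),
        (m.SourcelessFileLoader, m.BYTECODE_SUFFIXES),
    ]
    sys.path_hooks.insert(0, m.FileFinder.path_hook(*details))
    sys.path_importer_cache.clear()  # drop finders created before the hook
```

After install(), importing a module whose file ends in .pyu goes through PyuLoader, while .py files take the normal path.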

I've never used Python 3.4's importlib, and things keep changing from version to version. (I think it's finally stable, but that's not much help when I learned on a much earlier version, and I'm lazy.) And there aren't any good examples out there. Some of what I did is clearly hacky and the wrong way to do it; if someone actually wanted to use this, they'd want to read the docs and do it right. (The ast part of the code should be fine, it's the import hooking that isn't.)

Finally, the whole idea of adding a Unicode empty set literal seems like a bad idea to me, since set() is good enough and readable and familiar and already working. And, even if you needed an empty set literal, the idea that it must compile to an actual empty set literal rather than a call to set (whether for performance, or because you really need to redefine the name set but also need set literals) makes it even sillier. So, really, consider this whole thing a proof of concept, not something you should actually fix and use.

abarnert/emptyset
