GitHub - Zannick/demystify: A Magic: The Gathering parser

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 205 Commits
antlr3		antlr3
demystify		demystify
deps		deps
.gitignore		.gitignore
COPYING		COPYING
COPYING.LESSER		COPYING.LESSER
INSTALL		INSTALL
LICENSE		LICENSE
README		README

Repository files navigation

Demystify
A Magic: The Gathering Parser

1) ABOUT

Demystify is an attempt to make it possible for a computer to understand what
general Magic: The Gathering cards do. It is currently written in a combination
of ANTLR 3.5 and Python 3.2.

2) INSTALLATION

Demystify depends on the following libraries and/or programs, which you will
need to have installed in order to build the parser and run Demystify.
See INSTALL for more specific instructions.

- Java 1.6.0 (or later)
- ANTLR 3.5 (or later 3.5 release)
- ANTLR 3 Python3 runtime [available at github.com/antlr/antlr3]
- jpackage-utils 1.7.5 (or later)
- Python 3.2
- python-progressbar2

3) BUILDING

To build the parser generator, you need to have the macro.g and Words.g grammar
files up-to-date. If they don't exist, or demystify/keywords.py has changed,
run
    $ python3 demystify/keywords.py
to regenerate them.

Then, you need to run antlr3 on the grammar. If you've followed the instructions
in INSTALL and gotten yourself an antlr3 script, all you need to do is:
    $ cd demystify/grammar/
    $ antlr3 Demystify.g
If not, your commandline will look something like this (supposing your classpath
is properly set):
    $ cd demystify/grammar/
    $ java org.antlr.Tool Demystify.g
(Note that cd instruction; if you "antlr3 demystify/grammar/Demystify.g"
 instead, you'll get the .py output files where they belong, but the .tokens
 files are put in your working directory.)

This takes a little while.

If it worked (and it worked if all the output you received were of the form
"warning(138): ... no start rule ...", and not "error(12345)" etc.),
demystify/grammar/ should now contain a series of Demystify*.py
files. These will be used by the main demystify.py script.

If it doesn't work, and you're sure you installed antlr3 and java correctly,
check that you followed instructions in INSTALL again. If antlr3 is actually
giving grammar errors, please check the Issues tab in github for the issue
or file a new bug.

4) RUNNING

The main entry point is the demystify.py script in the demystify/ folder.
It currently offers two modes of running:

load

Loads the card data from the Scryfall data file in demystify/data/cache/
(may download it from Scryfall if necessary), and performs preprocessing steps
necessary before any lexing and parsing is done. With -i, opens an interactive
prompt afterward, at which functions in demystify.py and card.py can be called.
This is useful for actually invoking the parser, as well as doing special card
searches using the utility functions in card.py.

test

Runs the tests (or a specific test) in demystify/tests/.

These can be run from within demystify with:
    $ python3 demystify.py load -i

Add -h or --help for more information:
    $ python3 demystify.py -h
    $ python3 demystify.py test -h

5) LEXING AND PARSING

The intention of this project is to eventually generate full parse trees for
every line of text on every card. However, as this is a low priority personal
project, Demystify is far from this goal. At the moment, all that you can do is
test the lexer or parser.

First, load the environment.
    $ python3 demystify.py load -i
    Processing cards for card names...
    Innocent Blood   [##########################] 14715 of 14715 Time: 0:00:01
    >>> cards = card.get_cards()

get_cards() returns all the cards loaded. If you like, you can select a smaller
set of cards via card.get_card_set(), perform a search (note that searches
return matching text rather than matching cards), or get an individual card.
    >>> karn = card.get_card('Karn, Silver Golem')

The lexer is essentially done (modulo any new words or symbols in sets not yet
added to the project), but it can still be tested, with test_lex or lex_card.
lex_card will print the preprocessed text followed by a table of lex symbols.
(You can also get at the preprocessed text by simply printing the card object.)
    >>> lex_card(karn)
    whenever SELF blocks or becomes blocked, it gets -4/+4 until end of turn.
    {1}: target non-creature artifact becomes an artifact creature with...
     1    0   0 whenever  WHEN
     1    9   2 SELF      SELF
     1   14   4 blocks    BLOCK
     1   21   6 or        OR
     1   24   8 becomes   BECOME
     1   32  10 blocked   BLOCKED
     1   39  11 ,         COMMA
    ...
     1   72  30 .         PERIOD
     2    0  32 1         MANA_SYM
     2    3  33 :         COLON
    ...

Note that test_lex is very slow, but very effective at finding new vocabulary.

The parse_all function is meant to eventually parse rules text, but at the
moment it parses only mana cost and type lines. It takes a list of cards
to parse, and adds the parse tree to each card individually.
    >>> parse_all(cards)
    Innocent Blood   [##########################] 14715 of 14715 Time: 0:00:25
    0 total errors.
    >>> karn.parsed_cost
    <antlr3.tree.CommonTree instance (COST (MANA 5))>
    >>> karn.parsed_typeline
    <antlr3.tree.CommonTree instance (TYPELINE (SUPERTYPES legendary)
     (TYPES artifact creature) (SUBTYPES golem))>

The other parse functions use a regex to narrow down the text they attempt
to parse. You may still pass all the cards.
    >>> parse_triggers(cards)
    Whippoorwill     [############################] 3702 of 3702 Time: 0:00:02
    181 total errors.
    68 unique cases missing.
    >>> parse_keyword_lines(cards)
    Wasp Lancer      [############################] 5085 of 5085 Time: 0:00:02
    1 total errors.
    1 unique cases missing.
    >>> parse_ability_costs(cards)
    Helm of Obedienc [############################] 4369 of 4369 Time: 0:00:02
    2 total errors.
    2 unique cases missing.

Each parse function reports how many times it couldn't parse a line of text,
and (attempts to) group them together into cases. These are all logged to the
LOG file, which is useful for development.
The parse results themselves are again attached to the cards, but as lists
of strings rather than antlr tree objects.
    >>> karn.parsed_costs
    ['(COST (MANA 1))']
    >>> karn.parsed_triggers
    ['(TRIGGER (EVENT (SUBSET SELF) (OR (BECOME BLOCKING) (BECOME blocked))))']
    >>> akroma = card.get_card('Akroma, Angel of Wrath')
    >>> akroma.parsed_keywords
    ['(KEYWORDS flying FIRST_STRIKE vigilance trample haste
     (protection (PROPERTIES black) (PROPERTIES red)))']

You can test individual rules with arbitrary text by calling test_parse, or by
creating a parsing unittest in test/.

6) MISCELLANEOUS

Dependency visualization:

The deps/deps.py script reads the grammar files and outputs to stdout a
dot format graph file, which describes how parser rules in the grammar call
other parser rules, as well as how the grammar files reference other grammar
files. You'll want to use graphviz's dot program to visualize this, e.g. with:
    $ python3 deps/deps.py | dot -Tpng -o gdeps.png

graphviz can be installed via your package management tool, or downloaded from
http://www.graphviz.org.

7) CONTRIBUTIONS

Are welcome! Please use github forks and pull requests. Bugs without fixes can
be reported as well.

http://github.com/Zannick/demystify/

8) LICENSE AND DISCLAIMER

The files in this repository fall into four categories:
    A) antlr3/antlr3
    B) Python code in demystify/ and ANTLRv3 code in demystify/grammar/
    C) deps/deps.py
    D) Card data in demystify/data/ and demystify/tests/

(A) is based on the antlr3 script that was installed on one of my machines
when I installed ANTLRv3 via yum. It is therefore (to the best of my
knowledge) covered until ANTLRv3's license (BSD),
which is included as antlr3/LICENSE.

(B) is the Demystify program itself, and the core of this repository. It
is licensed under the Lesser GNU Public License version 3. See COPYING
for the GPL v3 and COPYING.LESSER for the LGPL v3.

(C) is also licensed under the LGPL v3 but it is not a part of Demystify
(though it runs by reading parts of Demystify) which is why I've set it apart.

(D) consists of Magic: The Gathering card data, in the form of full card
records, slightly modified to fix errors (data/), or in the form of test cases,
which may include actual card rules or possible card rules. All card
information is copyrighted by Wizards of the Coast.

Demystify is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of 
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

Demystify is not produced, endorsed, supported,
or affiliated with Wizards of the Coast.

9) THANKS

Thanks go to Terence Parr and the ANTLR project (without which this project
would be much more difficult), and to Wizards of the Coast (for making
Magic: The Gathering, without which this project would not exist).