Scripts for XML -> Python object -> YAML conversion (re #129) #133

Merged: 2 commits, Feb 12, 2019

mbollmann
Member

This PR adds a script that converts the authoritative XML to YAML format, in preparation for using these files as a data source in the static rewrite of the site.

  • anthology.py contains classes that read in the XML and represent it as simple Python objects. They may also be useful for further checks or conversions.

  • xml_to_yaml.py outputs one YAML file per XML file with the paper information, and additionally compiles an author index in a separate YAML file. This is done to facilitate the generation of the static pages, and might be extended with other auxiliary files in the future. Conflation of name variants could, in principle, also be done at this point, depending on how exactly we decide to handle them (see #86, "Authors being stored under multiple spellings").
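To illustrate the XML -> Python object -> YAML-ready data flow described above, here is a minimal sketch. It is not the code in this PR: the XML layout, the `Paper` class, and the helper names `read_volume` and `build_author_index` are all assumptions made up for the example, and the real Anthology schema and scripts may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical, simplified XML; the real Anthology schema may differ.
SAMPLE_XML = """
<volume id="X00">
  <paper id="1001">
    <title>A First Paper</title>
    <author><first>Ada</first><last>Lovelace</last></author>
    <author><first>Alan</first><last>Turing</last></author>
  </paper>
  <paper id="1002">
    <title>A Second Paper</title>
    <author><first>Ada</first><last>Lovelace</last></author>
  </paper>
</volume>
"""


@dataclass
class Paper:
    """Simple Python object standing in for one <paper> element."""
    paper_id: str
    title: str
    authors: list = field(default_factory=list)

    def as_dict(self):
        # Plain dicts and lists are what a YAML serializer expects.
        return {"title": self.title, "authors": self.authors}


def read_volume(xml_text):
    """Parse one volume and return a list of Paper objects."""
    root = ET.fromstring(xml_text)
    volume_id = root.get("id")
    papers = []
    for node in root.iter("paper"):
        authors = [
            f"{a.findtext('first', '')} {a.findtext('last', '')}".strip()
            for a in node.iter("author")
        ]
        paper_id = f"{volume_id}-{node.get('id')}"
        papers.append(Paper(paper_id, node.findtext("title", ""), authors))
    return papers


def build_author_index(papers):
    """Map each author name (exact spelling) to the IDs of their papers."""
    index = defaultdict(list)
    for paper in papers:
        for name in paper.authors:
            index[name].append(paper.paper_id)
    return dict(index)


papers = read_volume(SAMPLE_XML)
paper_data = {p.paper_id: p.as_dict() for p in papers}   # one file per XML
author_index = build_author_index(papers)                # separate index file
```

Both `paper_data` and `author_index` are plain dictionaries, so writing them out would be a single call to a YAML library (for instance `yaml.safe_dump` from PyYAML, if installed). Because the index is keyed on the exact spelling, name variants end up as separate entries, which is where the conflation step mentioned above would fit.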

I will continue to add to and/or modify these scripts during the static rewrite, according to the needs of the page generation.

@mjpost
Member

mjpost commented Feb 8, 2019

Wow, this is really cool!

Are you familiar with the argparse module? It's built into Python and is quite nice. Just wondering whether you have a preference or reason for docopt.

I think you can just merge it unless you wanted comments.

@mbollmann
Member Author

I've used argparse for a long time, but have recently come to prefer docopt for all but very complex CLIs. The main reasons are that it's easier to write, self-documenting, and an implementation exists for most programming languages, so you can apply it everywhere. It's not a strong preference, though; argparse is still pretty good. :)
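For comparison, here is a rough sketch of what a small command-line interface looks like in argparse. The option names (`--outdir`, `--debug`) and the positional `xmlfiles` argument are hypothetical and are not taken from the scripts in this PR. With docopt, the usage text shown in the comment would itself be the interface definition, parsed via `docopt(__doc__)`.

```python
import argparse

# docopt-style equivalent (hypothetical usage text, shown for comparison only):
#
#   Usage: xml_to_yaml.py [--outdir=DIR] [--debug] <xmlfile>...
#
# With argparse, the same interface is declared imperatively instead.


def build_parser():
    """Build a parser for a hypothetical XML-to-YAML converter."""
    parser = argparse.ArgumentParser(description="Convert XML files to YAML.")
    parser.add_argument("xmlfiles", nargs="+", help="input XML files")
    parser.add_argument("-o", "--outdir", default="out", help="output directory")
    parser.add_argument("--debug", action="store_true", help="verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# runnable as-is.
args = build_parser().parse_args(["--outdir", "data", "a.xml", "b.xml"])
```

The trade-off under discussion is visible here: argparse needs one call per option, while the docopt version is the help text plus a single parse call.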

@mbollmann mbollmann merged commit 0a89ebb into acl-org:master Feb 12, 2019
@mbollmann mbollmann deleted the xml-to-yaml branch February 12, 2019 15:04
najtin pushed a commit to ir-anthology/ir-anthology that referenced this pull request Jun 9, 2021
Scripts for XML -> Python object -> YAML conversion (re acl-org#129)