This project is deprecated: it has been merged into pytex.
Please download pytex instead.
LaTeXParser
is a small LaTeX parser written in Python. It lets you find out, from Python, where a given macro is used and replace each occurrence of a macro with a user-defined string.
The aim is to help write pre-compilation (pre-LaTeX) scripts in Python for complex documents. Examples are:
- There is in fact no code duplication in the sources of the preprint "BTZ black hole from the structure of the algebra so(2,n)"
- Extracting and managing the source code of Le Frido from that of mazhe
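To make "finding where a macro is used and replacing its occurrences" concrete, here is a minimal sketch written with the standard re module only. It is not the LaTeXParser API: the function names find_macro and replace_macro are hypothetical, and the pattern only handles a single brace-free argument, which a real parser would not be limited to.

```python
import re

def _macro_pattern(name):
    # Matches \name{argument} where the argument contains no braces.
    return re.compile(r"\\" + re.escape(name) + r"\{([^{}]*)\}")

def find_macro(source, name):
    """Return (line_number, argument) pairs for each use of the macro."""
    found = []
    for match in _macro_pattern(name).finditer(source):
        line_number = source.count("\n", 0, match.start()) + 1
        found.append((line_number, match.group(1)))
    return found

def replace_macro(source, name, make_replacement):
    """Replace each use of the macro by make_replacement(argument)."""
    # A function is used as replacement so that backslashes in the
    # result are never interpreted as regex escapes.
    return _macro_pattern(name).sub(
        lambda match: make_replacement(match.group(1)), source)

# Made-up LaTeX fragment, for illustration only.
sample = "See \\ref{eq:one}.\nThen \\label{a} and \\ref{fig:two}."
occurrences = find_macro(sample, "ref")
rewritten = replace_macro(sample, "ref", lambda arg: "[" + arg + "]")
```

Here occurrences records the line and argument of each \ref, and rewritten is the same text with every \ref{...} replaced by the user-defined string.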
The XML file that records the sha1sum of each followed file has the form
<?xml version="1.0" ?>
<Followed_files>
<fichier name="ess.py" sha1sum="a329313819092a183ca8b08bb7c178807a1a68b7"/>
<fichier name="ess.aux" sha1sum="be730c54ff1d1a75398a496283efe45c675dc54f"/>
</Followed_files>
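A file of this form could be produced along the following lines. This is only a sketch under assumptions: the helper names sha1_of_bytes and build_followed_files are hypothetical, and the file contents are made-up in-memory bytes, whereas a real script would read them from disk.

```python
import hashlib
from xml.dom import minidom

def sha1_of_bytes(data):
    """Return the hexadecimal sha1sum of a bytes object."""
    return hashlib.sha1(data).hexdigest()

def build_followed_files(contents):
    """Build the XML document from a dict mapping file name to bytes."""
    doc = minidom.Document()
    root = doc.createElement("Followed_files")
    doc.appendChild(root)
    for name in sorted(contents):
        node = doc.createElement("fichier")
        node.setAttribute("name", name)
        node.setAttribute("sha1sum", sha1_of_bytes(contents[name]))
        root.appendChild(node)
    return doc

# Made-up contents standing in for the real files.
doc = build_followed_files({
    "ess.py": b"print('hello')\n",
    "ess.aux": b"\\relax\n",
})
xml_text = doc.toprettyxml(indent="")
```

Comparing a freshly computed sha1sum with the recorded one is then enough to tell whether a followed file has changed.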
The top-level XML object is obtained with root = minidom.parse(filename), where filename is the path of the XML file.
Then the «list of lists» of "Followed_files" elements is obtained with fileNodes = root.getElementsByTagName("Followed_files")
In the example above, there is only one. At this point fileNodes is a list whose element 0, fileNode = fileNodes[0], represents the lines between <Followed_files> and </Followed_files>.
Each element in these lines has the tag "fichier", so the list of files is given by fileNode.getElementsByTagName("fichier")
The first element of that list represents the line <fichier name="ess.py" sha1sum="a329313819092a183ca8b08bb7c178807a1a68b7"/>
If F = fileNode.getElementsByTagName("fichier")[0], then we get the name with F.getAttribute("name") and the checksum with F.getAttribute("sha1sum")
See the "DOM example" in "Python Library Reference Release 2.3.5".
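The steps above can be put together as follows. To keep the sketch self-contained it parses the example XML from a string with minidom.parseString instead of reading a file; the names XML_TEXT and checksums are introduced here for illustration only.

```python
from xml.dom import minidom

XML_TEXT = """<?xml version="1.0" ?>
<Followed_files>
<fichier name="ess.py" sha1sum="a329313819092a183ca8b08bb7c178807a1a68b7"/>
<fichier name="ess.aux" sha1sum="be730c54ff1d1a75398a496283efe45c675dc54f"/>
</Followed_files>
"""

root = minidom.parseString(XML_TEXT)
fileNodes = root.getElementsByTagName("Followed_files")
fileNode = fileNodes[0]  # only one "Followed_files" element here

# Map each followed file name to its recorded sha1sum.
checksums = {}
for F in fileNode.getElementsByTagName("fichier"):
    checksums[F.getAttribute("name")] = F.getAttribute("sha1sum")
```

After this loop, checksums["ess.py"] holds the checksum recorded for ess.py.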
The file containing the pieces of LaTeX code has the structure
+++++++++++++++++++++++++++++++++++++++++++
Bonjour
Au revoir
+++++++++++++++++++++++++++++++++++++++++++
We extract the relevant information in the following way:
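from xml.dom import minidom

dom = minidom.parse("ess.xml")
for box in dom.getElementsByTagName("CodeBox"):
    print(box.getAttribute("label"))
    # getText is the helper from the "DOM example" of the Python Library Reference.
    text = getText(box.childNodes)
    # Drop the first and last lines, because minidom adds an empty line at first and last position.
    print("\n".join(text.split("\n")[1:-1]))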
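For reference, here is a self-contained version of this extraction, including a getText helper in the spirit of the "DOM example" from the Python documentation. Only the "CodeBox" tag and the "label" attribute come from the code above; the root element name Boxes and the label values are assumptions made for this sketch.

```python
from xml.dom import minidom

def getText(nodelist):
    """Concatenate the data of all text nodes found in nodelist."""
    return "".join(
        node.data for node in nodelist if node.nodeType == node.TEXT_NODE)

# Hypothetical file content: root element name and labels are invented.
BOXES = """<?xml version="1.0" ?>
<Boxes>
<CodeBox label="first">
Bonjour
</CodeBox>
<CodeBox label="second">
Au revoir
</CodeBox>
</Boxes>
"""

dom = minidom.parseString(BOXES)
boxes = {}
for box in dom.getElementsByTagName("CodeBox"):
    text = getText(box.childNodes)
    # The text starts and ends with a newline, so the first and last
    # items of the split are empty and are dropped.
    boxes[box.getAttribute("label")] = "\n".join(text.split("\n")[1:-1])
```

Storing the pieces in a dictionary keyed by label, rather than printing them, makes it easy to look up a given box later in a pre-compilation script.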
See also tests.py and magical_box.tex