A LaTeX pre-compilation and maintenance script




A wrapper around pdflatex which gives some flexibility for compiling large files.

What problem do we solve?

Let's say you have a (very) large LaTeX document divided into dozens or hundreds of .tex files (like this one, for example: 3500 pages). The usual mechanism for compiling only part of the document is \includeonly.

The problems are:

  • \includeonly requires that you input your .tex files with \include, which always starts a new page and therefore essentially restricts you to one file per chapter.
  • Each time you want to compile another part of the document, you have to change the \includeonly line.

How do we solve it?

pytex is a Python script that generates an intermediate .tex file on the fly and then launches pdflatex on it.

Suppose you have the LaTeX document main.tex:



\chapter{First chapter}

\chapter{Second chapter}

\chapter{Third chapter}

If you want to compile only the files preamble.tex, foo1 and foo2, create the following file named lst_example.py:

#! /usr/bin/python
# -*- coding: utf8 -*-

from __future__ import unicode_literals

import LaTeXparser
import LaTeXparser.PytexTools

myRequest = LaTeXparser.PytexTools.Request()




and you compile with

pytex lst_example.py

This will generate MySmallPart.pdf, containing the result of compiling only preamble, foo1 and foo2 (along with some intermediate files).
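The mechanism can be sketched in a few lines of Python. This is not pytex's actual code: the names build_intermediate_tex, compile_until_stable and RERUN_PATTERN are hypothetical, and the sketch assumes that preamble.tex holds everything that comes before \begin{document}. Since pytex compiles as many times as needed for the cross-references, the sketch reruns pdflatex while the log still contains LaTeX's "Rerun to get ..." message.

```python
import re
import subprocess
from pathlib import Path

# LaTeX prints e.g. "Rerun to get cross-references right." when another pass is needed.
RERUN_PATTERN = re.compile(r"Rerun to get")


def build_intermediate_tex(preamble, parts, body_filter=None):
    """Return the text of a .tex file that inputs the preamble and only the requested parts.

    Assumption: the preamble file holds \\documentclass and the package setup.
    """
    lines = [r"\input{%s}" % preamble, r"\begin{document}"]
    lines += [r"\input{%s}" % part for part in parts]
    lines.append(r"\end{document}")
    text = "\n".join(lines) + "\n"
    # Hook for arbitrary string manipulations before compiling.
    if body_filter is not None:
        text = body_filter(text)
    return text


def compile_until_stable(jobname, tex_text, max_runs=5):
    """Write jobname.tex and rerun pdflatex until the log stops asking for it.

    Returns the number of pdflatex runs performed.
    """
    Path(jobname + ".tex").write_text(tex_text, encoding="utf-8")
    for run in range(1, max_runs + 1):
        subprocess.run(["pdflatex", "-interaction=nonstopmode", jobname + ".tex"], check=False)
        log = Path(jobname + ".log").read_text(encoding="latin-1")
        if not RERUN_PATTERN.search(log):
            return run
    return max_runs
```

With these helpers, compile_until_stable("MySmallPart", build_intermediate_tex("preamble", ["foo1", "foo2"])) would write MySmallPart.tex and produce MySmallPart.pdf. The body_filter argument is the place where string manipulations of the intermediate file would go.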

Installation and dependencies

  • Download LaTeXparser and save it in a directory on Python's module search path (for example, one listed in PYTHONPATH).

  • Download pytex, save it in a directory listed in your PATH, and make it executable.

Other functionalities

  • pytex generates, on the fly, an intermediate .tex file containing the requested \input lines. You can apply arbitrary string manipulations to that file in Python before compilation; some are predefined.

  • pytex compiles as many times as necessary to resolve all cross-references.

  • pytex reads the .aux file and reports missing and multiply-defined labels in a readable way.

  • The option --verif checks whether the document contains a \ref or \eqref whose corresponding \label appears later in the document (in a mathematical text, one should not refer to theorems that are only proven later). You can define exceptions: sentences that are allowed to refer to a "future" label.

  • pytex inherits the functionalities of LaTeXparser.
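The two label checks can be sketched as follows. This is an illustration under stated assumptions, not pytex's implementation: label_report and future_references are hypothetical names; missing labels are detected here by comparing the keys of \ref/\eqref in the .tex text with the \newlabel entries of the .aux file (pytex may proceed differently); the document text is assumed to have every \input already expanded; and the "allowed sentences" of --verif are modelled as strings that exempt a reference when they occur on the same line.

```python
import re
from collections import Counter

NEWLABEL = re.compile(r"\\newlabel\{([^{}]*)\}")  # \newlabel{key}{...} entries in the .aux
LABEL = re.compile(r"\\label\{([^{}]*)\}")
REF = re.compile(r"\\(?:eq)?ref\{([^{}]*)\}")      # matches \ref{...} and \eqref{...}


def label_report(aux_text, tex_text):
    """Return (missing, multiple), two sorted lists of label keys.

    missing:  keys used in \\ref/\\eqref that have no \\newlabel in the .aux
    multiple: keys with more than one \\newlabel in the .aux
    """
    counts = Counter(NEWLABEL.findall(aux_text))
    multiple = sorted(key for key, n in counts.items() if n > 1)
    missing = sorted({key for key in REF.findall(tex_text) if key not in counts})
    return missing, multiple


def future_references(tex_text, exceptions=()):
    """Return (key, line) for each \\ref/\\eqref whose \\label comes later in tex_text.

    Assumes tex_text is the whole document with every \\input already expanded.
    A reference is skipped when its line contains one of the `exceptions` strings.
    """
    first_label = {}
    for m in LABEL.finditer(tex_text):
        first_label.setdefault(m.group(1), m.start())
    problems = []
    for m in REF.finditer(tex_text):
        key = m.group(1)
        label_pos = first_label.get(key)
        if label_pos is None or label_pos < m.start():
            continue  # undefined (label_report handles it) or defined earlier: fine
        line_start = tex_text.rfind("\n", 0, m.start()) + 1
        line_end = tex_text.find("\n", m.end())
        if line_end == -1:
            line_end = len(tex_text)
        line = tex_text[line_start:line_end]
        if any(sentence in line for sentence in exceptions):
            continue
        problems.append((key, line.strip()))
    return problems
```

Checking the position of the first \label rather than the page number keeps the "future reference" test independent of compilation: only the order of the source text matters.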


  • The documents le frido and mazhe are created from the same main LaTeX file. pytex performs quite a lot of "pre-compilation" work on the fly. Notice, for example, that the first is not divided into parts while the second is.

  • The paper "BTZ black hole from the structure of so(2,n)" is divided into two parts: a "short version" and a "long version" that share a lot of text. Believe it or not, there is no code duplication on my computer: I wrote each statement only once, and pytex did the work of recomposing the .tex file. Of course, the .tex file I uploaded, which was generated automatically, contains a lot of code duplication.




LaTeXparser is a small LaTeX parser written in Python. It lets you find, from Python, where a given macro is used, and replace occurrences of a macro with a user-defined string.
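To give an idea of what such a parser has to do, here is a minimal sketch (not LaTeXparser's API) that locates a macro with one braced argument and replaces each occurrence with a user-defined string. The names find_macro and replace_macro are hypothetical. The sketch counts braces to handle nested groups and skips escaped characters such as \{, but it ignores optional [...] arguments and % comments.

```python
import re


def find_macro(text, name):
    """Yield (start, end, argument) for each \\name{argument} in text.

    Braces inside the argument must be balanced; \\namefoo is not matched.
    """
    pattern = re.compile(r"\\" + re.escape(name) + r"(?![A-Za-z])\s*\{")
    pos = 0
    while True:
        m = pattern.search(text, pos)
        if m is None:
            return
        depth, i = 1, m.end()
        while i < len(text) and depth > 0:
            if text[i] == "\\":
                i += 2  # skip an escaped character such as \{ or \}
                continue
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            return  # unbalanced braces: stop scanning
        yield m.start(), i, text[m.end():i - 1]
        pos = i


def replace_macro(text, name, replacement):
    """Replace each \\name{arg} with replacement(arg)."""
    pieces, last = [], 0
    for start, end, arg in find_macro(text, name):
        pieces.append(text[last:start])
        pieces.append(replacement(arg))
        last = end
    pieces.append(text[last:])
    return "".join(pieces)
```

For instance, replace_macro(r"\emph{word}", "emph", lambda arg: r"\textit{" + arg + "}") returns \textit{word}.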

The aim is to help write pre-compilation scripts in Python for complex LaTeX documents. Examples are:


The XML file that records the sha1sum of the followed files has the form

<?xml version="1.0" ?>
<Followed_files>
	<fichier name="ess.py" sha1sum="a329313819092a183ca8b08bb7c178807a1a68b7"/>
	<fichier name="ess.aux" sha1sum="be730c54ff1d1a75398a496283efe45c675dc54f"/>
</Followed_files>

The principal XML object is obtained with root = minidom.parse(), passing the path of the XML file as argument.

Then the «list of lists» of "Followed_files" elements is obtained with fileNodes = root.getElementsByTagName("Followed_files")

In the example above, there is only one. At this point, fileNodes is a list whose element 0 represents the Followed_files element, that is, the two fichier lines.

Each of these lines is an element with the tag "fichier". Their list is given by fileNode.getElementsByTagName("fichier"), where fileNode = fileNodes[0].

The first element of that list represents the line <fichier name="ess.py" sha1sum="a329313819092a183ca8b08bb7c178807a1a68b7"/>

If F = fileNode.getElementsByTagName("fichier")[0], then we get the name with F.getAttribute("name") and the checksum with F.getAttribute("sha1sum").


See the "DOM example" in "Python Library Reference Release 2.3.5".
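Putting these calls together, here is a sketch of how the recorded checksums could be compared with the current state of the followed files, for example to decide whether a recompilation is needed. The function names stored_checksums, sha1sum and changed_files are hypothetical, and the file names in the XML are assumed to be relative to the directory containing the XML file.

```python
import hashlib
import os
from xml.dom import minidom


def stored_checksums(xml_path):
    """Map each followed file name to the sha1sum recorded in the XML file."""
    root = minidom.parse(xml_path)
    checksums = {}
    for file_node in root.getElementsByTagName("Followed_files"):
        for fichier in file_node.getElementsByTagName("fichier"):
            checksums[fichier.getAttribute("name")] = fichier.getAttribute("sha1sum")
    return checksums


def sha1sum(path):
    """Hex SHA-1 digest of a file's bytes (same format as the sha1sum command)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def changed_files(xml_path):
    """Return the followed files whose content no longer matches the recorded sha1sum.

    A file that no longer exists also counts as changed.
    """
    base = os.path.dirname(os.path.abspath(xml_path))
    changed = []
    for name, recorded in stored_checksums(xml_path).items():
        path = os.path.join(base, name)
        current = sha1sum(path) if os.path.exists(path) else None
        if current != recorded:
            changed.append(name)
    return changed
```

After a recompilation, the new digests would be written back into the same XML structure so that the next run compares against them.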

The file containing the pieces of LaTeX code has the structure
+++++++++++++++++++++++++++++++++++++++++++

Bonjour Au revoir
+++++++++++++++++++++++++++++++++++++++++++

We extract the relevant information as follows:

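from xml.dom import minidom

def getText(nodelist):
    # Concatenate the text nodes, as in the "DOM example" of the Python documentation
    return "".join(node.data for node in nodelist if node.nodeType == node.TEXT_NODE)

dom = minidom.parse("ess.xml")
for box in dom.getElementsByTagName("CodeBox"):
    print(box.getAttribute("label"))
    text = getText(box.childNodes)
    # When the tags sit on their own lines, minidom keeps the line break after the
    # opening tag and the one before the closing tag: drop the empty first and last lines.
    print("\n".join(text.split("\n")[1:-1]))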

See also tests.py and magical_box.tex