TexSoup

Parses valid LaTeX and provides a variety of BeautifulSoup-esque methods and Pythonic idioms for iterating and searching the parse tree. Unlike BeautifulSoup, however, TexSoup is modeled after an interpreter, providing a set of Pythonic structures for processing environments, commands, and arguments.

Note: TexSoup currently supports only Python 3.

created by Alvin Wan

Installation

Just install via pip.

pip install texsoup

Soup

There is one main utility, TexSoup, which translates any LaTeX string or iterator into a soupified object.

Basic Usage

TexSoup accepts either (1) a file buffer, such as open('file.tex'), or (2) a string.

from TexSoup import TexSoup

# A raw string (r"""...""") keeps LaTeX backslashes such as \b and \t intact.
soup = TexSoup(r"""
\begin{document}

\section{Hello \textit{world}.}

\subsection{Watermelon}

(n.) A sacred fruit. Also known as:

\begin{itemize}
\item red lemon
\item life
\end{itemize}

Here is the prevalence of each synonym.

\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}

\end{document}
""")

With the soupified LaTeX, you can now search and traverse the document tree. The following demonstrates the basic functions TexSoup provides.

>>> soup.section  # grabs the first `section`
\section{Hello \textit{world}.}
>>> soup.section.name
'section'
>>> soup.section.string
'Hello \\textit{world}.'
>>> soup.section.parent.name
'document'
>>> soup.tabular
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
>>> soup.tabular.args[0]
'c c'
>>> soup.item
\item red lemon
>>> list(soup.find_all('item'))
[\item red lemon, \item life]
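The navigation above rests on three ideas: every node has a name, every node knows its parent, and searches walk the tree in document order. The class below is a hypothetical miniature of that model (Node, add, find and find_all are illustrative names, not TexSoup's internals), sketched only to show how a parent back-pointer and a depth-first generator fit together.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; a model for illustration, not TexSoup's class."""

    name: str
    children: list[Node] = field(default_factory=list)
    # Back-pointer to the enclosing node; excluded from repr/eq to avoid cycles.
    parent: Optional[Node] = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        """Attach child under this node and record its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def find_all(self, name: str) -> Iterator[Node]:
        """Yield every descendant called name, depth-first in document order."""
        for child in self.children:
            if child.name == name:
                yield child
            yield from child.find_all(name)

    def find(self, name: str) -> Optional[Node]:
        """Return the first descendant called name, or None."""
        return next(self.find_all(name), None)


doc = Node("document")
doc.add(Node("section"))
itemize = doc.add(Node("itemize"))
itemize.add(Node("item"))
itemize.add(Node("item"))

print(doc.find("section").parent.name)  # document
print(len(list(doc.find_all("item"))))  # 2
```

Making find_all a generator means a search can stop at the first match (as find does) without walking the rest of the document.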

Search

For (slightly) more advanced searches, include arguments. For example, to find all references to a particular label, search for \ref{<label>}; this lets you count how many times that label is referenced.

>>> soup = TexSoup("""
... \section{Heading}\label{Section:Heading}
...
... Some text about the \ref{Section:Heading} heading goes here. Yet another
... sentence about the \ref{Section:Heading} heading.
... """)
>>> soup.count(r'\ref{Section:Heading}')
2
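For a quick cross-check across every label at once, a regular expression can tally how often each \label key is cited. This is a swapped-in, text-matching technique rather than TexSoup's parser: it does not skip comments or verbatim blocks, and it only recognizes \label and \ref (not \eqref, \cref, and so on). The function name label_ref_counts is hypothetical.

```python
import re
from collections import Counter

# Capture the key inside \label{...} and \ref{...}; no nested braces.
LABEL = re.compile(r"\\label\{([^}]*)\}")
REF = re.compile(r"\\ref\{([^}]*)\}")


def label_ref_counts(tex: str) -> dict[str, int]:
    """Map each \\label key, in document order, to how many \\ref commands cite it."""
    refs = Counter(REF.findall(tex))
    # Counter returns 0 for missing keys, so unreferenced labels show up as 0.
    return {label: refs[label] for label in LABEL.findall(tex)}


tex = r"""
\section{Heading}\label{Section:Heading}
\section{Other}\label{Section:Other}
Some text about the \ref{Section:Heading} heading goes here. Yet another
sentence about the \ref{Section:Heading} heading.
"""
print(label_ref_counts(tex))  # {'Section:Heading': 2, 'Section:Other': 0}
```

A count of 0 flags a label that nothing refers to, which is often worth a second look before submitting a document.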

Modification

You can also modify the TeX parse tree in place to generate new LaTeX.

>>> soup = TexSoup("""\textbf{'Hello'}\textit{'Y'}O\textit{'U'}""")
>>> soup.textbf.delete()
>>> 'Hello' not in repr(soup)
True
>>> soup.textit.replace('S')
>>> soup.textit.replace('U', 'P')
>>> soup
SOUP
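To see why the result is SOUP, it helps to replay the edits on a flat list of sibling nodes: deleting a node closes the gap, and replacing a node splices in zero or more new nodes in its place. The helpers below (delete, replace, first) are hypothetical and work on plain strings, not TexSoup nodes; they only model the effect of the calls shown above.

```python
def delete(nodes: list[str], index: int) -> None:
    """Remove one node; its neighbours close the gap."""
    del nodes[index]


def replace(nodes: list[str], index: int, *new: str) -> None:
    """Swap one node for any number of replacements, kept in order."""
    nodes[index:index + 1] = list(new)


def first(nodes: list[str], prefix: str) -> int:
    """Index of the first node starting with prefix, like soup.textit."""
    return next(i for i, n in enumerate(nodes) if n.startswith(prefix))


nodes = [r"\textbf{'Hello'}", r"\textit{'Y'}", "O", r"\textit{'U'}"]
delete(nodes, first(nodes, r"\textbf"))             # drop \textbf{'Hello'}
replace(nodes, first(nodes, r"\textit"), "S")       # \textit{'Y'} -> S
replace(nodes, first(nodes, r"\textit"), "U", "P")  # \textit{'U'} -> U, P
print("".join(nodes))  # SOUP
```

Because each lookup finds the first remaining match, the second replace targets what was originally the second \textit.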

Parser

There is one main utility, read, which translates any LaTeX string or iterator into a Python abstraction.

Basic Usage

>>> from TexSoup import read
>>> expr = read(r'\section{textbf}')
>>> expr
TexCmd('section', [RArg('textbf')])
>>> print(expr)
\section{textbf}
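The read example shows two views of one object: repr exposes the structure, while str regenerates the LaTeX. The dataclasses below are a hypothetical miniature of that round trip. Cmd, RequiredArg and parse_cmd are illustrative names, not TexSoup's TexCmd and RArg, and parse_cmd uses a regular expression that only handles flat commands such as \name{a}{b}.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RequiredArg:
    """Hypothetical required argument: the text between { and }."""
    value: str

    def __str__(self) -> str:
        return "{" + self.value + "}"


@dataclass
class Cmd:
    """Hypothetical command node: a name plus its arguments, in order."""
    name: str
    args: list[RequiredArg] = field(default_factory=list)

    def __str__(self) -> str:
        # Rebuild the LaTeX source: backslash, name, then each argument.
        return "\\" + self.name + "".join(str(a) for a in self.args)


_CMD = re.compile(r"\\([A-Za-z]+)((?:\{[^{}]*\})*)")
_ARG = re.compile(r"\{([^{}]*)\}")


def parse_cmd(src: str) -> Cmd:
    """Parse one flat command; no nesting, optional arguments, or surrounding text."""
    m = _CMD.fullmatch(src)
    if m is None:
        raise ValueError(f"not a simple command: {src!r}")
    return Cmd(m.group(1), [RequiredArg(v) for v in _ARG.findall(m.group(2))])


expr = parse_cmd(r"\section{textbf}")
print(repr(expr))  # Cmd(name='section', args=[RequiredArg(value='textbf')])
print(expr)        # \section{textbf}
```

Keeping str as the exact inverse of parsing is what makes in-place modification useful: after editing the objects, printing them yields valid LaTeX again.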

TexSoup in the Wild

TexSoup has a variety of practical applications, from minor conveniences to more powerful LaTeX extensions. The following sections show a few of these use cases, from simple reference counts to integration with computer algebra systems (coming soon).

Examples

See the examples/ folder for example scripts showing how to use TexSoup.

Uses

The projects below show slightly more complex uses of TexSoup.

  • LaTex2Python converts LaTeX into a document tree, organizing content by either a default or custom hierarchy.
  • Tex2Ipy, by Prabhu Ramachandran, converts LaTeX Beamer files to Jupyter notebooks.