Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

LaTeX2Python (tex2py)

Build Status Coverage Status

Tex2py converts LaTeX into a Python parse tree, using TexSoup. This allows you to navigate latex files as trees, using either the default or a custom hierarchy. See md2py for a markdown parse tree.

Note tex2py currently only supports Python3.

created by Alvin Wan

Installation

Install via pip.

pip install tex2py

Usage

LaTeX2Python offers only one function tex2py, which generates a Python parse tree from Latex. This object is a navigable, "Tree of Contents" abstraction for the latex file.

Take, for example, the following latex file. (See pdf)

chikin.tex

\documentclass[a4paper]{article}
\begin{document}

\section{Chikin Tales}

\subsection{Chikin Fly}

Chickens don't fly. They do only the following:

\begin{itemize}
\item waddle
\item plop
\end{itemize}

\section{Chikin Scream}

\subsection{Plopping}

Plopping involves three steps:

\begin{enumerate}
\item squawk
\item plop
\item repeat, unless ordered to squat
\end{enumerate}

\subsection{I Scream}

\end{document}

Akin to a navigation bar, the TreeOfContents object allows you to expand a latex file one level at a time. Running tex2py on the above latex file will generate a tree, abstracting the below structure.

          <Document>
          /        \
  Chikin Tales   Chikin Scream
      /            /     \
 Chikin Fly  Plopping   I Scream

At the global level, we can access the title.

>>> from tex2py import tex2py
>>> with open('chikin.tex') as f: data = f.read()
>>> toc = tex2py(data)
>>> toc.section
Chikin Tales
>>> str(toc.section)
'Chikin Tales'

Notice that at this level, there are no subsections.

>>> list(toc.subsections)
[]

The main section has two subsections beneath it. We can access both.

>>> list(toc.section.subsections)
[Chikin Fly, Chikin Scream]
>>> toc.section.subsection
Chikin Fly

The TreeOfContents class also has a few more conveniences defined. Among them is support for indexing. To access the ith child of an <element> - instead of <element>.branches[i] - use <element>[i].

See below for example usage.

>>> toc.section.branches[0] == toc.section[0] == toc.section.subsection
True
>>> list(toc.section.subsections)[1] == toc.section[1]
True
>>> toc.section[1]
Chikin Scream

You can now print the document tree. (There is some weirdness with branches beyond titles, so for only titles, we have the following:

           ┌Chikin Tales┐
           │            └Chikin Fly
 [document]┤
           │             ┌Plopping
           └Chikin Scream┤
                         │        
                         │        
                         └I Scream

Additional Notes

  • Behind the scenes, tex2py uses TexSoup. All tex2py objects have a source attribute containing a TexSoup object.

About

converts LaTeX into a Python parse tree, allowing navigation using the default or a custom hierarchy

Topics

Resources

License

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •