# Generate LaTeX code from a DataFrame

Manually copying each entry from the DataFrame to Overleaf was not very appealing as an afternoon activity. So I decided to waste a few days trying to generate the table code from the DataFrame instead.

At the end of the notebook is an example section with mock data.


In [3]:
import pandas as pd
from pathlib import Path
import re

from utils import print_pretty_df

# Quick ANSI color code shortcuts
r = "\033[31m"
y = "\033[33m"
g = "\033[32m"
b = "\033[34m"
e = "\033[0m"

pickleName = "all_datapoints.pkl"
datapointsDfPath = Path("..") / ".." / "data" / "Review_ML-RS-FPGA" / "Dataframes" / pickleName
datapointsDf = pd.read_pickle(datapointsDfPath)

## Creation of the main Figure: a big table

Here is how I plan to proceed. I will rebuild the dataset with a `MultiIndex` on both axes: two header rows for the columns (e.g., Model: Name & Backbone) and three index columns for the rows (Task, Modality and Dataset).
Then I use `pandas.io.formats.style.Styler.to_latex` to get the LaTeX code in the output cell, copy-paste it into Overleaf and make the last changes manually, i.e., use LaTeX `\multicolumn{<no of columns>}{<column alignment>}{<content>}` and `\multirow{<no of rows>}{<width>}{<content>}` for the look.

In general, go have a deep look at https://www.overleaf.com/learn/latex/Tables for a hands-on @TODO.

**Manual modifications:**

- Replace all '%' by '\%' (using search and replace in VSCode)
- Change columns specification in `\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}`
- Rotate the ML Task multirow with `\parbox[t]{2mm}{\multirow{<NUMBER_OF_CELLS>}{*}{\rotatebox[origin=c]{90}{<YOUR_TEXT>}}}`
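
The first and third manual steps can also be scripted. A minimal sketch, assuming the LaTeX code is held in a Python string; the helper names `escape_percent` and `rotated_multirow` are made up for illustration and are not part of pandas:

```python
import re


def escape_percent(latex: str) -> str:
    """Escape every '%' that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\)%", r"\\%", latex)


def rotated_multirow(n_cells: int, text: str) -> str:
    """Build the rotated multirow cell from the list above."""
    return (
        r"\parbox[t]{2mm}{\multirow{" + str(n_cells) + r"}{*}"
        r"{\rotatebox[origin=c]{90}{" + text + r"}}}"
    )


print(escape_percent("Accuracy 95%"))
print(rotated_multirow(3, "Classification"))
```

The negative lookbehind leaves an already escaped `\%` untouched, so the function can be run more than once on the same string.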


### ML/RS Table generation

@TODO:

**In the code**

- Remove the units in the cells (of the columns that have a fixed unit)
- Try some rough sort already
- Add a third column for the model which combines both name and backbone, hide the 2 others
- Rename the generated .tex file as ML/RS table

**Manually**

- Fine-tune the sort manually
- Fine-tune the model naming
- Add the `\cmidrule{}`
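
Two of the code TODOs (removing fixed units and merging the model name with its backbone) could look roughly like this. It is a sketch only: the helper names `strip_unit` and `combine_model`, the assumed cell format (a value followed by its unit, such as `"3.2 W"`), and the `Name (Backbone)` layout are assumptions, not taken from the actual data.

```python
def strip_unit(cell: str, unit: str) -> str:
    """Drop a trailing unit from a cell; cells without it are returned as-is."""
    text = str(cell).strip()
    if unit and text.endswith(unit):
        return text[: -len(unit)].strip()
    return text


def combine_model(name: str, backbone: str) -> str:
    """Merge model name and backbone into one label (hypothetical layout)."""
    name, backbone = str(name).strip(), str(backbone).strip()
    if not backbone or backbone == name:
        return name
    return f"{name} ({backbone})"


# Made-up cell values, only to show the behaviour
print(strip_unit("3.2 W", "W"))
print(combine_model("Model1", "Backbone1"))
```

On the real table this would be applied column by column, e.g. `reorderedDF["Power"].apply(lambda c: strip_unit(c, "W"))`, once the actual cell format has been checked.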


In [31]:
# @TODO: rename the columns and get the output close to a nice format, i.e., one that also fits on the Overleaf page and has colors and lines
# I "hide" the columns by not specifying them in the new order

# ----- Select, order and rename the columns -----
reorderedDF = datapointsDf.loc[:, [
    "Task",
    "Modality",
    "Dataset",
    "BBT Citation Key",
    "Publication year",
    "Model",
    "Backbone",
    "Board",
    "Task score",
    "Footprint",
    "Latency",
    "Throughput",
    "Power consumption",
]]
reorderedDF.rename(
    columns={
        "BBT Citation Key": "Article",
        "Publication year": "Year",
        "Task score": "Score",
        "Power consumption": "Power",
    },
    inplace=True,
)

# ----- Content modifications -----
# Add `\cite{}` around the citation keys
reorderedDF["Article"] = reorderedDF["Article"].apply(lambda x: f"\\cite{{{x}}}")
# Rename the board content
def determine_board(boardTag: str) -> str:
    family = boardTag.split("(")[0].strip()
    model = boardTag.split("(")[1].split(")")[0].strip()
    if family == "Zynq US+" or family == "Zynq 7000":
        return model
    elif family == "Kintex US":
        return model[2:]  # Kintex model names start with "XC" ("Xilinx Commercial")
    elif family == "Alveo":
        return f"{family} {model}"
    else:
        return family
reorderedDF["Board"] = reorderedDF["Board"].apply(determine_board)
# Remove the task from the dataset tag
def remove_task_from_dataset_tag(datasetTag: str) -> str:
    return datasetTag.split("{")[0].strip()
reorderedDF["Dataset"] = reorderedDF["Dataset"].apply(remove_task_from_dataset_tag)

# ----- Sort -----
sortedDF = reorderedDF.sort_values(by=["Task", "Modality", "Dataset"])
# ----- Multi-index -----
rowIndexedDF = sortedDF.set_index(["Task", "Modality", "Dataset"])
multiIndexColumns = pd.MultiIndex.from_tuples(
    [
        ("Article", "Article"),
        ("Year", "Year"),
        ("Model", "Name"),
        ("Model", "Backbone"),
        ("Board", "Board"),
        ("Metrics", "Score"),
        ("Metrics", "Footprint"),
        ("Metrics", "Latency"),
        ("Metrics", "Throughput"),
        ("Metrics", "Power"),
    ]
)
rowIndexedDF.columns = multiIndexColumns
print_pretty_df(rowIndexedDF, max_rows=5)

# ----- Generate LaTeX -----
latexCode = rowIndexedDF.style.to_latex(
    column_format="c c c c c c H H c M{1.7cm} M{1.05cm} M{1.3cm} M{1.3cm} M{0.8cm}",
    hrules=True,
    environment="longtable",
    label="table:main_table",
    caption="Taxonomy of the survey's records."
)
# ----- Post-processing -----
# Escape the '%' character (plain string replacement, no regex needed)
latexCode = latexCode.replace("%", r"\%")
# Replace the weirdly generated header
newTableHeader = r"""
\scriptsize
\newcolumntype{M}[1]{>{\centering\arraybackslash}m{#1}}
\newcolumntype{H}{>{\setbox0=\hbox\bgroup}c<{\egroup}@{}}

\begin{longtable}{ c c c c c c c c M{1.7cm} M{1.05cm} M{1.3cm} M{1.3cm} M{0.8cm} }
 \caption[An optional table caption ...]{Taxonomy of the survey's records. \label{table:main_table}} \\
 \toprule
 \multirow[c]{2}{*}{Task} & \multirow[c]{2}{*}{Mode\footnote{Modality}} & \multirow[c]{2}{*}{Dataset (\# classes)} & \multicolumn{2}{c}{Article} & \multicolumn{2}{c}{Model} & \multirow[c]{2}{*}{Board} & \multicolumn{5}{c}{Metrics} \\
 \cmidrule(rl){4-5}\cmidrule(rl){6-7}\cmidrule(rl){9-13}
 &  &  & Ref & Year & Model & Backbone & & ~~~~Score\newline[\textbf{\%}] & Footprint [\textbf{MB}] & Processing Speed & Throughput [\textbf{GOP/s}] & Power [\textbf{W}] \\
 \midrule
 \endhead
 """
parts = latexCode.split(r"\endlastfoot", 1)
if len(parts) == 2:
    latexCode = newTableHeader + parts[1]

# Rotate the first column multirows
latexCode = re.sub(
    r'\n\\multirow\[c\]\{(\d+)\}\{\*\}\{([^}]+)\}',
    r'\n\\parbox[t]{2mm}{\\multirow{\1}{*}{\\rotatebox[origin=c]{90}{\2}}}',
    latexCode
)
# # I gave up trying to insert the \cmidrule automatically
# # 2) Insert \cmidrule{2-13} before multirows in the second column
# latexCode = re.sub(
#     r'(?m)^( *& *\\multirow\[c\]\{\d+\}\{\*\}\{[^}]+\})',
#     r'\\cmidrule{2-13}\n\1',
#     latexCode
# )

# ----- Save to file -----
latexPath = Path("latex/latex_table.tex")
latexPath.parent.mkdir(parents=True, exist_ok=True)
with open(latexPath, "w") as f:
    f.write(latexCode)
print(f"{g}LaTeX table saved to {latexPath}{e}")

+-------------------------------------------------+------------------------------------------------------------+------------------+-------------------+-----------------------+--------------------+----------------------+--------------------------+------------------------+---------------------------+----------------------+
|                                                 |                   ('Article', 'Article')                   | ('Year', 'Year') | ('Model', 'Name') | ('Model', 'Backbone') | ('Board', 'Board') | ('Metrics', 'Score') | ('Metrics', 'Footprint') | ('Metrics', 'Latency') | ('Metrics', 'Throughput') | ('Metrics', 'Power') |
+-------------------------------------------------+------------------------------------------------------------+------------------+-------------------+-----------------------+--------------------+----------------------+--------------------------+------------------------+---------------------------+----------------------+
|  ('Classification', '1D', '1D

### Using Regex to fine-tune the table


In [None]:
# Read the whole LaTeX table from ../results/test_table.tex into a string.
# Then, for each article in the dictionary, find the article and add " & <year>" after it,
# essentially adding a new column to the table.
# Finally, write the new table to ../results/test_table_with_year.tex.
# NOTE: articleYearDict is not defined in this cell; it must map each article key to its year.
with open("../results/test_table.tex", "r") as file:
    latexTable = file.read()

for article, year in articleYearDict.items():
    key = article[1:]  # drop the first character of the article key
    # re.escape makes the key match literally; the lambda keeps backslashes in the
    # replacement from being interpreted as regex escapes
    latexTable = re.sub(re.escape(key), lambda m: f"{m.group(0)} & {year}", latexTable)

with open("../results/test_table_with_year.tex", "w") as newFile:
    newFile.write(latexTable)
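
The `\cmidrule` insertion that was abandoned in the table generation cell might be easier line by line than with a single multi-line regex. A minimal sketch, assuming that rows opening a new second-column group start with `& \multirow` (after optional spaces), which is the pattern the commented-out regex targeted; `insert_cmidrules` is a hypothetical helper name:

```python
def insert_cmidrules(latex: str, rule: str = r"\cmidrule{2-13}") -> str:
    """Put `rule` on its own line before each row that starts with '& \\multirow'."""
    out = []
    for line in latex.splitlines(keepends=True):
        if line.lstrip().startswith("& \\multirow"):
            out.append(rule + "\n")
        out.append(line)
    return "".join(out)


# Made-up table body, only to show where the rule lands
sample = (
    "\\multirow[c]{3}{*}{Task1} & Modality1 & Dataset1 \\\\\n"
    " & \\multirow[c]{2}{*}{Modality2} & Dataset2 \\\\\n"
    " &  & Dataset3 \\\\\n"
)
print(insert_cmidrules(sample))
```

Calling `insert_cmidrules(latexCode)` before saving would add the rules. Rows where the first column also starts a new `\multirow` are not matched and would still need a manual rule.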


## Example: Mock small data


In [8]:
# ----- Setup example -----
exampleData = {
    "Task": ["Task1", "Task1", "Task1"],
    "Modality": ["Modality1", "Modality2", "Modality2"],
    "Dataset": ["Dataset1", "Dataset2", "Dataset3"],
    "Model Name": ["Model1", "Model2", "Model3"],
    "Backbone": ["Backbone1", "Backbone2", "Backbone3"],
    "Metric1": [0.9, 0.8, 0.85],
    "Metric2": [0.7, 0.75, 0.8],
}
exampleDf = pd.DataFrame(exampleData)
print_pretty_df(exampleDf)
print()

# ----- Multi-index -----
multiIndexDF = exampleDf.set_index(["Task", "Modality", "Dataset"])
columns = pd.MultiIndex.from_tuples(
    [
        ("Model", "Name"),
        ("Model", "Backbone"),
        ("Metrics", "Metric1"),
        ("Metrics", "Metric2"),
    ]
)
multiIndexDF.columns = columns
print(multiIndexDF) # Can't use print_pretty_df() here, because tabulate hides the multi-index cells
print()

# ----- Generate LaTeX -----
latexCodeDF = multiIndexDF.style.to_latex()
print(f"{b}Generated LaTeX code for the dataframe{e}")
print(latexCodeDF)

+---+-------+-----------+----------+------------+-----------+---------+---------+
|   | Task  | Modality  | Dataset  | Model Name | Backbone  | Metric1 | Metric2 |
+---+-------+-----------+----------+------------+-----------+---------+---------+
| 0 | Task1 | Modality1 | Dataset1 |   Model1   | Backbone1 |   0.9   |   0.7   |
| 1 | Task1 | Modality2 | Dataset2 |   Model2   | Backbone2 |   0.8   |  0.75   |
| 2 | Task1 | Modality2 | Dataset3 |   Model3   | Backbone3 |  0.85   |   0.8   |
+---+-------+-----------+----------+------------+-----------+---------+---------+

                           Model            Metrics        
                            Name   Backbone Metric1 Metric2
Task  Modality  Dataset                                    
Task1 Modality1 Dataset1  Model1  Backbone1    0.90    0.70
      Modality2 Dataset2  Model2  Backbone2    0.80    0.75
                Dataset3  Model3  Backbone3    0.85    0.80

Generated LaTeX code for the dataframe
\begin{tabular}{