In [2]:
import subprocess

with open("report.tex", "w") as f:
    f.write(r"""

\documentclass{article}


% if you need to pass options to natbib, use, e.g.:
%     \PassOptionsToPackage{numbers, compress}{natbib}
% before loading neurips_2023

% ready for submission
\usepackage[final]{neurips_2023}

% to avoid loading the natbib package, add option nonatbib:
%    \usepackage[nonatbib]{neurips_2023}


\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors


\title{Data Exploration Report 2025}


% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to LaTeX to determine where to break the
% lines. Using \AND forces a line break at that point. So, if LaTeX puts 3 of 4
% authors names on the first line, and the last on the second line, try using
% \AND instead of \And before the third author name.


\author{% 
  Team TOOTHPASTE \AND
  Aral Cimcim (k11720457)
  \And
  Author Name 2 
  \And 
  Author Name 3 
  \And 
  Author Name 4
}


\begin{document}


\maketitle


\begin{contributions}
  Aral Cimcim: Case Study \& Annotation Quality\\ 
  
  % \textcolor{gray}{E.g.: Tara Jadidi did the entire experimental setup, including the data-split and selection of features and pre-processing. Florian Schmid trained four different classifiers for the task, and Paul Primus was responsible for evaluating the classifiers, generating figures and writing the report.}
\end{contributions}


\section{Case Study}

We have selected the following files:

\textbf{102744.mp3:}

This recording contains several spoken sentences with military content. The first annotator (ID starting with 114) described it as ``Military person speaking clearly and distinctly'', while the second annotator (ID starting with 946) split it into two annotations: ``Calm mature male voice telling coordinates and military news'' and ``Rough, a bit aggressive male voice repeating the same phrase 3 times''.

Annotator 114 did not divide the audio file into parts and therefore did not fully capture its temporal characteristics, while annotator 946 divided the annotation into parts to emphasize the repetitions in the recording. In terms of the textual annotations, annotator 946 provided more detail than annotator 114, who seems to have generalized the audio content. Both descriptions match the metadata keywords (946 in slightly more detail): character, coordinates, latitude, longitude, mature, military, rough, screaming, yelling.

Overall, both annotators have followed the task description with varying granularity.

\vspace{3mm}

\textbf{110921.mp3:}

This is a recording of farmers collecting sheep from a field in Argentina by whistling and shouting. The first annotator (ID starting with 435) described it as ``dog barking'', ``farm animal sounds in the background'', and ``a man whistles enthusiastically'', while the second annotator (ID starting with 611) described it as ``Persons shouting in a far outdoors'' and ``Noise of sheep on a field outdoors''. Both annotators provided detailed information as required by the task description. The texts match the metadata keywords: argentina, del, forest, fuego, patagonia, rodeo, sheep, shout, tierra, whistle, whistling, blume, field-recording, felix. Both annotators identified the main occurrences in the recording, such as ``shouting'', ``field'', and ``whistling'', in a similar manner.

The annotations do not deviate from the metadata content-wise.

\section{Annotation Quality}
\label{sec:headings}

For both files, the temporal annotations are relatively precise: gaps are taken into account and separate events are marked in accordance with the task description. Some ambiguity nevertheless remains in both files, which suggests subtle differences in how human annotators perceive sound.

Text annotations that correspond to the same region are mostly similar, with some exceptions: for example, the annotator with ID 114 did not state the gender of the person speaking in the recording and did not note the change in the speaker's voice from calm to aggressive.

The number of annotations per file varies across the dataset. \autoref{tab:annot} lists the ten files with the most annotations.

\begin{table}
  \caption{Number of annotations per audio file (top 10)}
  \label{tab:annot}
  \centering
  \begin{tabular}{lr}
    \toprule
    filename & num. of annotations \\
    \midrule
    623187.mp3 & 96 \\
    94017.mp3  & 73 \\
    591203.mp3 & 65 \\
    518570.mp3 & 63 \\
    620967.mp3 & 42 \\
    406538.mp3 & 40 \\
    777608.mp3 & 40 \\
    352225.mp3 & 39 \\
    406166.mp3 & 38 \\
    272516.mp3 & 38 \\
    \bottomrule
  \end{tabular}
\end{table}

The shortest annotation consists of 2 characters, while the longest contains 507 characters. The average annotation length is 45 characters, or 7 words. There are 268243 words in total and 10476 unique words, resulting in a vocabulary diversity ratio (unique words divided by total words) of about 0.039.
There are 2684 misspelled words, which gives a typo frequency rate of about 0.01 relative to the total word count. (We have not used additional tables for the annotation quality task, as they would extend the number of pages.)

There are a few low-quality annotations in the dataset, such as ``man laughs'', ``dog barking'', ``A foot step'', and ``click''. One way to address this would be to set a minimum and maximum annotation length and filter out annotations that fall outside that range.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\end{document}

""")

subprocess.run(["pdflatex", "report.tex"])

This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./report.tex
LaTeX2e <2021-11-15> patch level 1
L3 programming layer <2022-01-21>
(/usr/share/texlive/texmf-dist/tex/latex/base/article.cls
Document Class: article 2021/10/04 v1.4n Standard LaTeX document class
(/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo)) (./neurips_2023.sty
(/usr/share/texlive/texmf-dist/tex/latex/environ/environ.sty
(/usr/share/texlive/texmf-dist/tex/latex/trimspaces/trimspaces.sty))
(/usr/share/texlive/texmf-dist/tex/latex/natbib/natbib.sty)
(/usr/share/texlive/texmf-dist/tex/latex/geometry/geometry.sty
(/usr/share/texlive/texmf-dist/tex/latex/graphics/keyval.sty)
(/usr/share/texlive/texmf-dist/tex/generic/iftex/ifvtex.sty
(/usr/share/texlive/texmf-dist/tex/generic/iftex/iftex.sty))))
(/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty)
(/usr/share/texlive/texmf-dist/tex/latex/base/fontenc.s

CompletedProcess(args=['pdflatex', 'report.tex'], returncode=0)
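
The cell above calls `pdflatex` once and does not inspect the result. Because the report uses `\autoref`, a second pass is normally needed before cross-references resolve. The following is a minimal sketch of a more defensive compile step, not the method used above: the helper names `build_command` and `compile_latex` are hypothetical, and the sketch assumes a `pdflatex` that accepts the standard `-interaction=nonstopmode` and `-halt-on-error` options.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(tex_name, engine="pdflatex"):
    """Return the argument list for one non-interactive compiler pass."""
    return [engine, "-interaction=nonstopmode", "-halt-on-error", tex_name]


def compile_latex(tex_path, engine="pdflatex", passes=2):
    """Hypothetical helper: compile tex_path, return the PDF path or None on failure."""
    tex_path = Path(tex_path)
    if shutil.which(engine) is None:
        # The compiler is not installed or not on PATH.
        return None
    for _ in range(passes):
        result = subprocess.run(
            build_command(tex_path.name, engine),
            cwd=tex_path.parent,      # run next to the .tex file so local .sty files are found
            capture_output=True,      # keep the long log out of the notebook output
            text=True,
            errors="replace",         # the log may contain bytes that are not valid UTF-8
        )
        if result.returncode != 0:
            # Show only the end of the log, where the error message usually appears.
            print(result.stdout[-2000:])
            return None
    pdf_path = tex_path.with_suffix(".pdf")
    return pdf_path if pdf_path.exists() else None
```

Calling `compile_latex("report.tex")` would then return `Path("report.pdf")` on success and `None` otherwise, so the notebook can react to a failed build instead of only displaying the `CompletedProcess` object.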