# When Harry Met Sally Sam Sammy Samuel Sandie Sandra Sandy Sara Sarah Sascha...

By [Allison Parrish](https://www.decontextualize.com/)

Quick project to get the juices flowing for [NaNoGenMo 2019](https://nanogenmo.github.io/). Using [this notebook](https://github.com/aparrish/corpus-driven-narrative-generation/blob/master/creating-a-wikiplots-subcorpus.ipynb), I created a text file with many thousands of sentences from plot summaries of romantic comedies on Wikipedia. (Thank you [WikiPlots](https://github.com/markriedl/WikiPlots) for making it fast and easy to construct this corpus!) In an effort to better understand the structure of the romantic comedy, I decided to introduce my own structure, i.e., sorting the sentences alphabetically, just to see what would happen. This is the result!

First, open the plain text romantic comedy sentence corpus and make a list from the lines:

In [18]:
with open("./romcom_export.txt", encoding='utf8') as fh:
    lines = fh.read().split("\n")

Couple of things we need from the standard library:

In [101]:
import re, textwrap, random

To try to ensure we're only getting whole sentences and not weird fragments left over from errors in the sentence parsing process, only use lines that begin with a capital letter (A through Z). This cell also sorts the lines alphabetically.

In [48]:
caps_filtered = sorted([line for line in lines if re.search(r"^[A-Z]", line)])
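
To make the filter-and-sort step concrete, here is a minimal sketch on a handful of made-up lines. The `sample_lines` list is hypothetical and only stands in for the real corpus; the filter is the same idea as the cell above, written with `re.match`, which anchors at the start of the string.

```python
import re

# Hypothetical sample lines standing in for the real corpus.
sample_lines = [
    "Tripp angrily confronts his parents.",
    "and they reconcile.",          # fragment: starts with a lowercase letter
    "Trouble develops.",
    "",                             # empty string left by a trailing newline
    "(Bradley Cooper) arrives.",    # fragment: starts with punctuation
    "Harry meets Sally.",
]

# Keep only lines whose first character is an ASCII capital, then sort.
kept = sorted(line for line in sample_lines if re.match(r"[A-Z]", line))

for line in kept:
    print(line)
```

Python compares strings by code point, so a space sorts before any letter. That is why, in the real output, "Trip emphasizes..." lands before every "Tripp..." sentence. One side effect worth knowing about: a sentence that opens with a digit, a quotation mark, or an accented capital such as "É" does not match `[A-Z]` and is dropped.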

Get a random snippet to see if it works:

In [106]:
snippet_len = 24
start = random.randrange(len(caps_filtered) - snippet_len)
print(' '.join(caps_filtered[start:start+snippet_len]))

Trip emphasizes how he dumped his girlfriend in the same fashion that Mike did to score with women and was unsuccessful. Tripp (Matthew McConaughey), a 35-year-old man, is still living with his parents Al (Terry Bradshaw) and Sue (Kathy Bates), in Baltimore. Tripp angrily confronts his parents, and breaks up with Paula. Tripp's best friends Demo (Bradley Cooper) and Ace (Justin Bartha) are also still living in their parents' homes and seem proud of it. Tripp's parents and friends devise a plan to reconcile the two lovers. Trish rushes to him in concern, and he finally confesses to her that he is a virgin. Trish's jealous TV-star husband crashes the wedding and gets into a fight with Ulysses. Trouble develops. Trouble is, Tod's been romantically involved with Betty Gilbert, a nightclub singer, while Gert's gotten engaged to Tod's football rival, Andy Mason. Troy Bolton is still dating Gabriella Montez, who decides to stay in Albuquerque with her mother. Troy agrees to sing with his frie

I produce two outputs. First, the plain text output, wrapped at 65 characters:

In [107]:
with open("output.txt", "w", encoding='utf8') as fh:
    fh.write(textwrap.fill(' '.join(caps_filtered), 65))
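
As a small illustration of what `textwrap.fill` does here, the sketch below wraps two stand-in sentences at the same width. The `sentences` list is hypothetical; the real cell joins all of `caps_filtered`.

```python
import textwrap

# Hypothetical stand-in sentences; the notebook joins caps_filtered instead.
sentences = [
    "Trouble develops.",
    "Troy Bolton is still dating Gabriella Montez, who decides to stay "
    "in Albuquerque with her mother.",
]

# Join everything into one long string, then break it into lines of
# at most 65 characters.
wrapped = textwrap.fill(" ".join(sentences), 65)
print(wrapped)

# No output line is wider than the requested width.
print(max(len(line) for line in wrapped.split("\n")) <= 65)
```

Because the sentences are joined with single spaces before wrapping, sentence boundaries no longer line up with line breaks, and the whole book reads as one continuous paragraph.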

And then a more nicely typeset version in LaTeX. The function below does some simple LaTeX special character escaping:

In [108]:
# from https://stackoverflow.com/a/25875504
def tex_escape(text):
    conv = {
        '&': r'\&',
        '%': r'\%',
        '$': r'\$',
        '#': r'\#',
        '_': r'\_',
        '{': r'\{',
        '}': r'\}',
        '~': r'\textasciitilde{}',
        '^': r'\^{}',
        '\\': r'\textbackslash{}',
        '<': r'\textless{}',
        '>': r'\textgreater{}',
        "'": r'\textquotesingle{}',
        '"': r'\textquotedbl{}',
    }
    regex = re.compile('|'.join(re.escape(key) for key in sorted(conv.keys(), key=lambda item: -len(item))))
    return regex.sub(lambda match: conv[match.group()], text)
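
The function above does all of its replacements in a single regex pass. The sketch below, which uses a reduced and purely illustrative three-character mapping, shows why that matters: if the replacements are chained one after another, the braces introduced by `\textbackslash{}` get escaped a second time. The names `escape_single_pass` and `escape_chained` are hypothetical helpers, not part of the notebook.

```python
import re

# Reduced, hypothetical mapping: only three of the characters handled above.
conv = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
}

# One alternation that matches any key in the mapping.
pattern = re.compile("|".join(re.escape(key) for key in conv))

def escape_single_pass(text):
    # Each input character is matched at most once, so the replacement
    # text is never scanned again.
    return pattern.sub(lambda match: conv[match.group()], text)

def escape_chained(text):
    # Naive alternative: each replace() sees the output of the previous one.
    for key, value in conv.items():
        text = text.replace(key, value)
    return text

print(escape_single_pass("a\\b"))
print(escape_chained("a\\b"))
```

The single-pass version leaves the braces of `\textbackslash{}` alone, while the chained version turns them into `\{\}`, which would typeset as literal braces after the backslash.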

In [124]:
book_latex = r"""
\documentclass[10pt,twoside,openright]{memoir}
\usepackage[paperwidth=6in, paperheight=9in, bindingoffset=1in]{geometry}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{tgpagella}
\usepackage{textcomp}

\usepackage[protrusion=true,expansion=true]{microtype}

\makeatletter
\def\maketitle{%
  \null
  \thispagestyle{empty}%
  \vfill
  \begin{center}\leavevmode
    \normalfont
    {\LARGE\raggedleft \@author\par}%
    \vskip 1cm
    {\huge\raggedleft \@title\par}%
    \vskip 1cm
  \end{center}%
  \vfill
  \null
  \cleardoublepage
  }
\makeatother
\author{Allison Parrish}
\title{When Harry Met Sally Sam Sammy Samuel Sandie Sandra Sandy Sara Sarah Sascha...}
\date{}


\begin{document}

\let\cleardoublepage\clearpage

\maketitle

\frontmatter

\null\vfill

\begin{flushleft}
\textit{When Harry Met Sally Sam Sammy Samuel Sandie Sandra Sandy Sara Sarah Sascha...}

\abnormalparskip{10pt}
Text in this book was taken from plot summaries of romantic comedies on Wikipedia.

This work is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

A human-readable summary follows, which is not a substitute for the license itself.

You are free to copy and redistribute the material in any medium or format; and to remix,
transform, and build upon the material for any purpose, even commercially. The licensor cannot
revoke these freedoms as long as you follow the license terms.

The following terms apply: You must give appropriate credit, provide a link to the license,
and indicate if changes were made. You may do so in any reasonable manner, but not in any
way that suggests the licensor endorses you or your use. If you remix, transform, or build
upon the material, you must distribute your contributions under the same license as the
original. You may not apply legal terms or technological measures that legally restrict others
from doing anything the license permits.

https://creativecommons.org/licenses/by-sa/3.0/
\traditionalparskip
\end{flushleft}

\let\cleardoublepage\clearpage

\mainmatter
\sloppy

##replaceme##

\end{document}
""".replace("##replaceme##", tex_escape(textwrap.fill(' '.join(caps_filtered), 80)))

In [125]:
with open("output.tex", "w", encoding='utf8') as fh:
    fh.write(book_latex)

You can then use a LaTeX processor to create a PDF from the LaTeX source. (I had to use `lualatex` because the file was too big for `pdflatex`—over 1500 pages!)
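
If you would rather run the LaTeX step from Python than from a terminal, something like the following sketch could work. It assumes `lualatex` is installed and on your `PATH`, and that `output.tex` is in the current directory; `build_pdf_command` is a hypothetical helper, and the only engine flag used is `--interaction=nonstopmode`, which keeps the engine from stopping to ask for input on errors.

```python
import shutil
import subprocess
from pathlib import Path

def build_pdf_command(tex_path, engine="lualatex"):
    # Hypothetical helper: assemble the command line for the LaTeX engine.
    return [engine, "--interaction=nonstopmode", tex_path]

cmd = build_pdf_command("output.tex")

if shutil.which(cmd[0]) is None:
    # The engine is not installed (or not on PATH), so do nothing.
    print(f"{cmd[0]} not found on PATH; skipping")
elif not Path(cmd[-1]).exists():
    print(f"{cmd[-1]} does not exist yet; skipping")
else:
    # Run the engine and report its exit status.
    result = subprocess.run(cmd)
    print("exit status:", result.returncode)
```

For a document this long, expect the run to take a while.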