# Single-Line Drawing SketchRNN

> Training a sequence model on a dataset of my single-line drawings.



<!--

Introduction to the Intersection of Machine Learning and Art

Personal fascination with machine learning and art.
Belief in the educational potential of art in understanding human cognition.
Early Artistic Endeavors and Introduction to Machine Learning

Beginnings in painting during a stressful startup period.
Self-taught approach with a focus on ambition and novelty in art.
Initial challenges in achieving artistic 'flow'.
Discovery of machine learning concepts like deepdreams, style transfer, and GANs.
Experimentation with art projects incorporating machine learning outputs.
Transition to Single-Line Drawing and Discovering SketchRNN

Shift to single-line drawing as a creative constraint.
Intrigue with SketchRNN's capabilities in enhancing drawings.
Concept of Training SketchRNN with Personal Art

The idea to train SketchRNN with own single-line drawings.
Contrast between SketchRNN and other models like GANs in representing single-line art.
Initial challenges due to lack of sufficient personal artwork.
Sketchbook Management During COVID

Digitization of sketchbooks for design merchandise.
Development of a meticulous scanning and sorting system.
Reflection on the digital transformation of a traditionally organic process.
Personal Rituals and Systematic Organization of Artwork

The ritualistic aspect of sketchbook management.
Detailed process of organizing and digitizing sketchbooks.
Personal reasons for not automating the process.
Revisiting the SketchRNN Goal and Dataset Preparation

Accidental accumulation of a large number of digitized drawings.
The realization of potential for training SketchRNN.
Separating Artwork Using Machine Learning

Use of computer vision models to differentiate between watercolors and single-line drawings.
Fine-tuning and utilizing machine learning for efficient sorting.
Challenges in Vectorizing Scans for SketchRNN

Difficulties in converting JPEG scans to vector graphics.
Discovering and utilizing tools for appropriate vectorization.
Addressing issues with autotrace results and path joining.
Final Preparations for Training the Model

Developing algorithms for bounding box separation and filtering drawings.
Decisions on dataset composition and final preparations for training SketchRNN.

-->

## Introduction to the Intersection of Machine Learning and Art

<!-- 

- Personal fascination with machine learning and art.
- Belief in the educational potential of art in understanding human cognition.
- Early Artistic Endeavors and Introduction to Machine Learning

-->

I've always been fascinated by machine learning and art. There's a magic to me in how machines represent information. I believe that making art to explore this can teach us something about how our own minds work, or at least make us think.

I started painting when I was stressed out during my first startup. Teaching myself, I wanted to constantly make more ambitious and difficult paintings. My goal was to never repeat myself, always do something bigger and better. I'd rarely get in flow since it felt like I was always pushing uphill. Soon afterwards, I was working with machine learning and discovered DeepDream, style transfer, and GANs. I found myself drawn to art projects using these technologies, making paintings from their outputs.

## Shift to Single-Line Drawing as a Creative Constraint

<!-- 

Intrigue with SketchRNN's capabilities in enhancing drawings.

-->

Eventually my paintings of ML outputs became cumbersome, and I needed a constraint to keep my art practice fun and interesting. I found single-line drawing to be challenging, engaging, and expressive.

## Concept of Training SketchRNN with Personal Art

<!-- 
The idea to train SketchRNN with own single-line drawings.
Contrast between SketchRNN and other models like GANs in representing single-line art.
Initial challenges due to lack of sufficient personal artwork.
-->

When Google released SketchRNN, I was struck by a demo that lets you start a drawing and then has a model, trained on the "collective consciousness" of thousands of people, complete it in diverse ways.

But the model was trained on 20-second sketches from non-artists, so the model mainly draws stick figures (or drawings of similar skill level). I thought it'd be awesome to train this on a dataset of my own drawings.

Particularly for single-line drawings, the trajectory of the drawing is so essential to the medium that SketchRNN seemed like the perfect representation. By comparison, GANs (and later, diffusion models) generate all pixels simultaneously - this misses the aspect of single-line drawings flowing and taking into account the previous path of the line in deciding where to go next.

When I looked further into SketchRNN and spoke with its creator, David Ha, he told me I'd "only" need a few thousand drawings. I had maybe 100 scans on my computer at the time. I forgot about the SketchRNN goal, but continued filling my sketchbooks.

## Sketchbook Management During COVID

<!-- 
Digitization of sketchbooks for design merchandise.
Development of a meticulous scanning and sorting system.
Reflection on the digital transformation of a traditionally organic process.

-->

My sketchbook scanning had become a little ritual that I'd do. It was funny to take one of the most offline, internal, organic things I do - sitting quietly, drawing in my sketchbook - and systematically digitize it. In the back of my mind, I imagined that being able to visually search them or cluster them would help me see patterns in my drawings over time. But it'd be a lot of engineering work, so I never quite got around to it.

## Personal Rituals and Systematic Organization of Artwork

<!-- 
The ritualistic aspect of sketchbook management.
Detailed process of organizing and digitizing sketchbooks.
Personal reasons for not automating the process.
-->

But at the same time, it's this personal ritual. I usually make a collage on the cover of my sketchbook, with plane tickets or scraps of paper capturing what was going on in my life while that sketchbook was in progress. I put a post-it note with the start and end dates of the sketchbook, and give it a number (like "SB48" for sketchbook number 48). I number each of the pages in the bottom corner closest to the binder rings (this is a practical step, so that after I scan all the pages I can easily see which page it is within the sketchbook).

I bought a book scanner, since using a flatbed scanner took too long. I just turn it on, flip through the pages, and I automatically get separate images per page. I scan the pages, and throw them all in a folder.

I name the folder "sb48". Then I flip through the files, naming the pages to correspond to the page numbers - for example, "005.jpg" for page 5, since I want the file names to sort nicely. *If I want to refer to a drawing, I can just type "sb48p005.jpg" and find what I'm looking for.*

![](assets/singleline-dataset/01-folder-system/sb-flipping-art1.gif)

Then I go through and use "tags". Yellow for "notes" (diagrams/mindmaps), Red for anything "private", Orange for "extras" (bad scans, happy accidents, alternate scans), Green for favorite drawings, and anything untagged is assumed to be art. I separate the covers too.

Then, I create 5 sub-folders inside "sb48":
- cover
- art
- notes
- xtra
- priv

![](assets/singleline-dataset/01-folder-system/folder-list-view.png)


And I sort by tags so it's easy to move them into the right sub-folders. Then I move the folder for sketchbook 48 into a big folder alongside all of my other sketchbooks.

![](assets/singleline-dataset/01-folder-system/folder-icons.png)

Engineer friends have asked why I don't automate this. I've streamlined it somewhat, so that I have keyboard shortcuts and can move fairly quickly.

I don't mind the manual effort though - I find that spending a little extra time seeing my old drawings helps me fall back in love with my art practice, especially when it's started to feel stagnant or burdensome. I flag my favorites, get new ideas. Or I remember what problems I was working through when I see my notes and engineering diagrams.

## Revisiting the SketchRNN Goal and Dataset Preparation

<!-- 
Accidental accumulation of a large number of digitized drawings.
The realization of potential for training SketchRNN.
-->

Three or four years went by. I kept up my ritual of scanning my sketchbooks whenever I finished one, every few weeks or months (sometimes in a matter of days, if I was in a drawing frenzy).

At some point I remembered SketchRNN, and did a file count. I saw that I had at least a couple thousand drawings scanned already - it might actually be possible to train this model I'd been dreaming about!

The first step (and often the most time-consuming in machine learning projects) was to prepare the dataset.

## Separating Artwork Using Machine Learning

<!-- 
Use of computer vision models to differentiate between watercolors and single-line drawings.
Fine-tuning and utilizing machine learning for efficient sorting.
-->

My sketchbook filing system let me separate out notes and diagrams from artwork - which was great. However, I still had a lot of watercolor paintings mixed in with my single-line drawings. Reluctant to go through and hand-sort the watercolors one by one, I thought I could use machine learning to make this process faster.

I took a computer vision model (ResNet-34) trained to identify photographs of everyday objects and animals (think cat, dog, truck, toaster). I used this to compute "embeddings" for each of the images in the "art" folders of my sketchbooks. Embeddings are like a compressed version of the essential information needed for the model to identify what it's trained to look for, and images with similar embeddings tend to contain similar content.

My goal was to get embeddings of all my images, and then group them by similarity - the hope being that my watercolors would separate cleanly from the thousands of single-line drawings.

Since the off-the-shelf computer vision model was trained on photographs, it didn't do as well with black-and-white scans. So I fine-tuned the model on my own data, in order to teach the model to work within the context of scanned pages. Since I had lots of pages already bucketed into folders labeled `art`, `notes`, and `covers`, I made a dataset of images along with their category. I trained a model on this data.

Then I computed embeddings for my pages again, this time using the fine-tuned model.

When I visualized the embedding space, the notes were all grouped together and the art also appeared in a cleanly separated group. This was a good start, and meant that the model had learned to represent the difference between these scanned pages in a meaningful way.

Next, I ran K-Means clustering on these embeddings. For a requested number of groups, K-Means finds group centers that minimize the distance from each point to its nearest center. I tried 16 groups to start.
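To make the clustering step concrete, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm). It is an illustration rather than the code used for this project: the function names, the fixed iteration count, and the toy 2-D points standing in for embeddings are all assumptions. In practice a library implementation would be the usual choice.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Cluster `points` (a list of tuples) into `k` groups.

    Returns (labels, centers). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Toy stand-ins for embeddings: two well-separated blobs.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (100.0, 100.0), (100.0, 101.0), (101.0, 100.0)]
labels, centers = kmeans(points, k=2)
```

With real embeddings each point would have hundreds of dimensions instead of two, but the assignment and update steps are the same.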

When I looked through each of the groups, I was thrilled - several of the groups contained only watercolors, many contained only drawings.

I gave each of the clusters a label (`0_drawings`, `1_watercolors`, etc) and then assigned each page to its cluster and attached a label based on the cluster. Then I moved each page into a folder corresponding to its label. I went through every single page manually to see if it had been mis-categorized. A surprisingly small number were out of place, but I moved them into a new folder that I called "hand-labeled" - these would be my ground-truth.

Now I had well over a thousand scans of single-line drawings in my folder `0_drawings`, with all of the watercolors separated out.

<!-- *Mention embedding projector?* -->

## Challenges in Vectorizing Scans for SketchRNN

<!-- 
Difficulties in converting JPEG scans to vector graphics.
Discovering and utilizing tools for appropriate vectorization.
Addressing issues with autotrace results and path joining.
-->

Vector graphics are notorious in the machine learning world for being a pain to work with. Instead of a simple grid of pixel colors, as you get in a PNG or JPEG file, SVGs have paths composed of various key points.

SketchRNN requires input in a format called stroke-3. The idea is to represent pen strokes as a series of triplets, where each triplet in the sequence contains:
- `delta_x`: how much did the pen move left or right?
- `delta_y`: how much did the pen move upwards or downwards?
- `lift_pen`: does this move continue the active stroke, or "lift the pen off the page" and move to the beginning of a new stroke?
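As a rough illustration of the format, the sketch below converts strokes given as lists of absolute `(x, y)` points into stroke-3 rows. The function name is hypothetical, and two conventions are assumed here: the pen starts at the origin, and `lift_pen` is set to 1 on the last point of each stroke (meaning the pen lifts after reaching that point).

```python
def strokes_to_stroke3(strokes):
    """Convert strokes (lists of absolute (x, y) points) into stroke-3 rows.

    Each row is [delta_x, delta_y, lift_pen].
    """
    rows = []
    prev_x, prev_y = 0, 0  # assumption: the pen starts at the origin
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            # Assumed convention: flag the final point of each stroke,
            # so the next row is a pen-up move to the next stroke's start.
            lift_pen = 1 if i == len(stroke) - 1 else 0
            rows.append([x - prev_x, y - prev_y, lift_pen])
            prev_x, prev_y = x, y
    return rows

# Two short strokes: a diagonal line, then a horizontal line elsewhere.
example = strokes_to_stroke3([[(0, 0), (3, 4)], [(10, 10), (12, 10)]])
```

Because every row stores an offset from the previous point, the same drawing placed anywhere on the page differs only in its first row.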

But before I could convert my JPEG scans into stroke-3, I had to vectorize them (convert them into SVG paths, which could then be transformed into stroke-3).

My first obstacle was that all of the automated ways of vectorizing scans represented each of my lines as a closed shape - the outline of the line, traced as zero-thickness paths along its two edges and filled in with black. I discovered that what I wanted was a "centerline trace" - a single path down the middle of the line, with a line thickness.

After trying a variety of approaches, I found an open-source library called `autotrace`. It's quite painful to work with. I found an Inkscape plugin that used `autotrace` in a way that worked for my drawings. Adobe Illustrator has something called "Live Trace" that could work, but I ruled out the idea of Live Tracing several thousand drawings.

I found that I could automate the code from that Inkscape plugin (it's ugly, but I wrote a wrapper script that can apply it to my drawings and get passable results).

When I saw drawings come out that more or less mirrored the scanned pages, I was excited and decided to start trying to train a model.

## Final Preparations for Training the Model

<!-- 
Developing algorithms for bounding box separation and filtering drawings.
Decisions on dataset composition and final preparations for training SketchRNN.
-->

What came out was pure gibberish. A flurry of small separate lines.

I realized that I had a problem: autotrace hadn't realized that the different segments of my lines were actually connected. It was producing a series of small, centerline-traced segments.

I'd have to go back and somehow connect them. So, I tried a simple little algorithm that I called "path joining."

The idea: each of my drawings was a collection of line segments, each with a start and end point. I'd start with the longest one. Then, for each of the other line segments, I'd find the minimum distance between its start and end points and those of the segment I was building. Put simply, what's the shortest connection I can draw to connect two segments? I found that I needed an upper bound on that distance. I'd resized all of my drawings to 200x200 pixels, and a maximum of around 15-30 pixels worked well; beyond that, unrelated lines started getting connected to each other.
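A minimal sketch of this kind of greedy endpoint joining might look like the following. It is a reconstruction from the description above, not the original code: the function names, the `max_gap` default of 20 pixels, and the choice to keep extending one path until nothing is within range are all assumptions.

```python
import math

def path_length(path):
    """Total geometric length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def join_paths(paths, max_gap=20.0):
    """Greedily join paths whose endpoints are within `max_gap` of each other."""
    remaining = sorted((list(p) for p in paths), key=path_length, reverse=True)
    joined = []
    while remaining:
        current = remaining.pop(0)  # start from the longest unused path
        while True:
            best = None  # (gap, index in remaining, merged path)
            for i, other in enumerate(remaining):
                # Four ways to connect: either end of `current` to either
                # end of `other`, reversing `other` where needed.
                candidates = [
                    (math.dist(current[-1], other[0]), current + other),
                    (math.dist(current[-1], other[-1]), current + other[::-1]),
                    (math.dist(current[0], other[-1]), other + current),
                    (math.dist(current[0], other[0]), other[::-1] + current),
                ]
                gap, merged = min(candidates, key=lambda c: c[0])
                if gap <= max_gap and (best is None or gap < best[0]):
                    best = (gap, i, merged)
            if best is None:
                break  # nothing close enough; this path is finished
            _, i, current = best
            remaining.pop(i)
        joined.append(current)
    return joined

segments = [[(0, 0), (10, 0)], [(20, 0), (12, 0)], [(100, 100), (110, 100)]]
result = join_paths(segments, max_gap=5)
```

In the example, the second segment is stored in the opposite direction, so it is reversed before being appended, while the distant third segment stays separate.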

This worked remarkably well. Still, I noticed some drawings had large contiguous strokes that weren't getting connected. I realized that while the strokes were close together and in some cases touching, their start and end points were far apart from each other.

So I wrote a second algorithm, which I called "path splicing."

After I'd applied path joining and had a smaller number of long strokes left over, I wanted to see which strokes should be joined together. So, I'd take the longest path, then for each of the remaining shorter paths, I'd step through each point on the longest path and compare its distance from the start and end points of the shorter paths. When I found the smallest gap, I would "insert" the shorter line into the longer path at the point with the smallest distance.
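The splicing step could be sketched along these lines. Again this is an illustrative reconstruction with hypothetical names: it treats "insert" literally (the shorter path's points are placed right after the closest point on the long path) and assumes a `max_gap` cutoff similar to the one used for path joining.

```python
import math

def path_length(path):
    """Total geometric length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def splice_paths(paths, max_gap=20.0):
    """Insert shorter paths into the longest path at their closest point."""
    if not paths:
        return []
    ordered = sorted((list(p) for p in paths), key=path_length, reverse=True)
    main, leftover = ordered[0], []
    for short in ordered[1:]:
        best = None  # (gap, index on main, short path in chosen direction)
        for i, point in enumerate(main):
            # Try entering the short path from either of its ends.
            for oriented in (short, short[::-1]):
                gap = math.dist(point, oriented[0])
                if best is None or gap < best[0]:
                    best = (gap, i, oriented)
        gap, i, oriented = best
        if gap <= max_gap:
            # Literal insert: main continues from the short path's far end.
            main = main[:i + 1] + oriented + main[i + 1:]
        else:
            leftover.append(short)  # too far away; keep as its own path
    return [main] + leftover

spliced = splice_paths(
    [[(0, 0), (10, 0), (20, 0)], [(10, 2), (10, 8)], [(50, 50), (60, 50)]],
    max_gap=5,
)
```

One design trade-off: a literal insert leaves a jump from the end of the short path back to the main path, so a variant that retraces the short path before continuing may produce cleaner strokes.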

I reviewed the results of this - it came out better than I'd expected.

My last remaining problem: there were some pages where I'd make 4 or 5 separate drawings that had nothing to do with each other, and had a lot of space between them. I wanted to separate those into separate examples before making a training dataset, for 2 reasons:
1) To get more training examples from the scans I had
2) To avoid confusing the model I was training - if some examples make a complete drawing and then start a second unrelated drawing, how does the model know when to finish a drawing vs. when to start a second drawing alongside the first one?

So I wrote a third algorithm, which I called "bounding box separation".

My intuition was that separate drawings had some space between them, and didn't overlap much. So I'd take each of the strokes within a drawing, and determine its top/bottom/left/right extremes, so I could draw a box around each stroke. Then for each pair of bounding boxes, I'd compute the ratio of the area of their overlap to the area of the non-overlapping parts. If the ratio exceeds some threshold, I consider them to be part of the same drawing, and I merge the bounding boxes. Then I take that larger bounding box along with all the remaining bounding boxes, and repeat the process until no overlaps remain that exceed the threshold. Also, if any bounding boxes have a really small area, I just drop them. It turns out this helps exclude little page numbers / bits of text from ending up in my training data.
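Here is one way the merging loop could be written. The names, the `threshold` of 0.05, and the `min_area` of 25 square pixels are placeholder assumptions, not values from the project.

```python
def bbox(stroke):
    """Bounding box of a stroke as (left, top, right, bottom)."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return (min(xs), min(ys), max(xs), max(ys))

def area(box):
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def overlap_ratio(a, b):
    """Overlap area divided by the area covered by only one of the boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0, w) * max(0, h)
    non_overlap = area(a) + area(b) - 2 * inter
    if non_overlap == 0:
        return float("inf") if inter > 0 else 0.0
    return inter / non_overlap

def separate_drawings(strokes, threshold=0.05, min_area=25):
    """Group strokes into drawings by repeatedly merging overlapping boxes."""
    groups = [(bbox(s), [s]) for s in strokes]
    merged = True
    while merged:  # repeat until no pair exceeds the threshold
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                a, b = groups[i][0], groups[j][0]
                if overlap_ratio(a, b) > threshold:
                    union = (min(a[0], b[0]), min(a[1], b[1]),
                             max(a[2], b[2]), max(a[3], b[3]))
                    groups[i] = (union, groups[i][1] + groups[j][1])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    # Drop tiny groups such as page numbers or stray marks.
    return [members for box, members in groups if area(box) >= min_area]

s1, s2 = [(0, 0), (10, 10)], [(5, 5), (15, 15)]
s3, s4 = [(100, 100), (120, 120)], [(50, 50), (51, 51)]
drawings = separate_drawings([s1, s2, s3, s4])
```

A caveat of this sketch: a perfectly horizontal or vertical stroke has a zero-area box and never overlaps anything, so padding each box by a pixel or two may be needed on real data.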

Once I have all the separated drawings, I save them out as separate files.

I noticed that some drawings were of one figure, and some drawings were patterned or chained with many faces. I wanted to exclude those patterns/chains from my training data, so I could give my model the best chance of learning to draw one person at a time.

So, I computed embeddings for these bounding-box separated drawings, clustered them, and got reasonably coherent groups.

Finally, I excluded the ones that didn't fit the composition I wanted, and saved a filtered-down dataset.

To train the model, I had to pick a maximum number of points in a given drawing. 250 was the recommended default.

I looked at the distribution of number of points in all the drawings. At the very low end, some drawings had snuck in that were just little squiggles, and at the upper end, some really convoluted messes of lines were in there. I cut out the top and bottom 5% of drawings by number of points.
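Trimming both tails by point count can be sketched in a few lines. The helper name is hypothetical, and it assumes each drawing is a sequence whose length is its number of points. Note that it returns the drawings sorted by length rather than in their original order.

```python
def trim_by_point_count(drawings, fraction=0.05):
    """Drop the shortest and longest `fraction` of drawings by point count."""
    ordered = sorted(drawings, key=len)
    cut = int(len(ordered) * fraction)  # number to drop from each end
    return ordered[cut:len(ordered) - cut]

# Twenty toy drawings with 1 to 20 points each.
toy = [[(0, 0)] * n for n in range(1, 21)]
kept = trim_by_point_count(toy)
```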

For the remaining drawings, I ran the Ramer-Douglas-Peucker (RDP) line simplification algorithm with increasing values for its `epsilon` parameter, until the number of points dipped under 250. Then I saved the result as a zipped numpy file.
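For reference, a compact version of RDP with an epsilon search might look like this. It is a sketch under assumptions: it handles a single polyline (a multi-stroke drawing would need it applied per stroke), the starting `epsilon` and `step` values are arbitrary, and `max_points` is assumed to be at least 2 since RDP always keeps both endpoints.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Keep that point and simplify each half recursively.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the shared point
    return [points[0], points[-1]]

def simplify_to_limit(points, max_points=250, epsilon=0.5, step=0.5):
    """Raise epsilon until the simplified path has at most max_points."""
    simplified = list(points)
    while len(simplified) > max_points:
        simplified = rdp(points, epsilon)
        epsilon += step
    return simplified

zigzag = [(float(i), float(i % 2)) for i in range(50)]
short = simplify_to_limit(zigzag, max_points=10)
```

Larger `epsilon` values discard more detail, so searching upward from a small value keeps as much of the line's shape as the point budget allows.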
