```{=latex}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{listings}
\usepackage{textcomp}
\usepackage{fancyvrb}

\newcommand{\passthrough}[1]{\lstset{mathescape=false}#1\lstset{mathescape=true}}
```

```{=latex}
\title{Building Containers for Python Applications}
\author{Moshe Zadka -- https://cobordism.com}
\date{2021}

\begin{document}
\begin{titlepage}
\maketitle
\end{titlepage}
```

Python is a popular language for many applications.
Those that run in backend services,
now in the 2020s,
are usually run inside containers.
Building containers for Python applications is common.

Often,
with microservice architectures,
it makes sense to build a
"root"
base image which all of the services will build off of.
Most of the following will focus on the
base
image,
since this is where it is easiest to make mistakes.

However,
the applications themselves will also be covered:
what is a good
base
if not something to 
build on top of?

```{=latex}
\frame{\titlepage}
```

Before continuing,
I want to make an acknowledgement of country.
I come from the city of Belmont,
in the San Francisco Bay Area peninsula.

It was built on the ancestral homeland
of the Ramaytush Ohlone people.
You can learn more about them on their
[website](https://www.ramaytush.org/).

```{=latex}
\begin{frame}
\frametitle{Acknowledgement of Country}

Belmont (in San Francisco Bay Area Peninsula)

Ancestral homeland of the Ramaytush Ohlone people

\end{frame}
```

## Good and Bad

Before talking about
*how*
to build good containers,
it helps to understand what good containers
*are*.
What distinguishes good containers
from bad ones?

```{=latex}
\begin{frame}
\frametitle{What is good}

\pause


\begin{itemize}
\item To crush your enemies \pause
\item To see them driven before you \pause
\item Um, wrong slides
\end{itemize}


\end{frame}
```

Ah, whoops,
these slides are from a different
talk,
about what is good
*in life*,
not
what is good
*in containers*.
It's time to focus on the topic at hand.
What kind of criteria distinguish
good containers from bad?

```{=latex}
\begin{frame}
\frametitle{What is good}

\begin{itemize}
\item Fast \pause
\item Small \pause
\item Secure \pause
\item Usable
\end{itemize}

\end{frame}
```

This is pretty high-level.
What does
"fast"
mean?
Fast at what?
How small is "small"?
What does it mean to be
"secure"?

These guidelines are rough.
Time to focus on concrete,
measurable
criteria.

```{=latex}
\begin{frame}
\frametitle{Specifying the requirements}

Let's be more concrete

\begin{itemize}
\item Keep up to date \pause
\item Reproducible builds \pause
\item No compilers in prod \pause
\item Keep size (reasonably) small
\end{itemize}

\end{frame}
```

OK, a bit better.
But still not specific enough.
How exactly is
"keep up to date"
a criterion?
What size is
"reasonably"
small?

Start with
"up to date".
The most important part is that
security updates from the upstream
distribution will be installed
on a regular cadence.
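
One way to get that cadence,
sketched under assumptions,
is a dedicated stage that applies upgrades
and is rebuilt on a schedule.
The base tag and the
`SECURITY_UPDATES_AS_OF`
build argument are illustrative names,
not something prescribed here.

```dockerfile
# Base tag is an assumption; use the release your organization standardizes on.
FROM debian:bullseye-slim AS security-updates

# Hypothetical build argument: a scheduled job passes a new value
# (for example, the current date) so this layer is rebuilt
# instead of being served from the build cache.
ARG SECURITY_UPDATES_AS_OF=unset

# Refresh the package index and apply pending upgrades in one layer,
# then drop the index files, which are not needed at runtime.
RUN echo "security updates as of ${SECURITY_UPDATES_AS_OF}" \
    && apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*
```

Building with a fresh
`--build-arg SECURITY_UPDATES_AS_OF=...`
value makes "when" an explicit decision
rather than an accident of caching.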

```{=latex}
\begin{frame}
\frametitle{Up to date}

\begin{itemize}
\item Install security updates \pause
\item But when?
\end{itemize}


\end{frame}
```

This directly conflicts with the next goal:
"reproducible builds".
The abstract theory of reproducible builds
says that building the same source
must produce bit-for-bit identical results.
This has many advantages,
but is non-trivial to achieve.

Lowering the bar a bit,
the same source must lead to equivalent results.
While this removes some advantages,
it maintains the most important one.
*Changing*
the source by some amount only results in
*commensurate* changes.

This is the main benefit of reproducible builds.
It allows pushing small fixes
with confidence that there
are no unrelated changes.
This allows less testing for small fixes,
and faster delivery of hot patches.
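
A minimal sketch of the weaker,
"equivalent results"
form of reproducibility follows.
It assumes a
`requirements.txt`
that pins exact versions with hashes,
and uses the official
`python`
image only as a convenient stand-in for a base.

```dockerfile
# Pin the base to a specific release tag rather than "latest".
# The tag is an assumption for illustration.
FROM python:3.9-slim-bullseye

WORKDIR /app

# requirements.txt is assumed to list exact versions and hashes,
# for example as generated by pip-compile --generate-hashes.
COPY requirements.txt .

# --require-hashes makes pip refuse anything that is unpinned
# or does not match its recorded hash.
RUN python -m venv /opt/myorg/venv \
    && /opt/myorg/venv/bin/pip install --require-hashes -r requirements.txt
```

With this in place,
a one-line change to the application
should not silently pull in a new version of a dependency.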

```{=latex}
\begin{frame}
\frametitle{Reproducible builds}

Same code gives same results \pause

...mostly

\end{frame}
```

The next criterion sounds almost trivial:
"no compilers in prod".
Compile ahead of time,
and store results in the image.

This criterion is here because without
careful thinking and implementation,
it is surprisingly easy to get wrong.
Many containers have been shipped with
`gcc`
included
because someone did not write their
`Dockerfile`
carefully enough.
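
One way to keep the compiler out,
as a sketch,
is to install it only in a build stage
and copy the finished virtual environment across.
The base image and the
`requirements.txt`
file are assumptions.

```dockerfile
FROM python:3.9-slim-bullseye AS builder

# The compiler toolchain lives only in this stage.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Anything that needs compiling is compiled here.
COPY requirements.txt /tmp/requirements.txt
RUN python -m venv /opt/myorg/venv \
    && /opt/myorg/venv/bin/pip install -r /tmp/requirements.txt

FROM python:3.9-slim-bullseye AS runtime

# Only the finished environment crosses over;
# gcc is never installed in this image.
COPY --from=builder /opt/myorg/venv /opt/myorg/venv
ENV PATH="/opt/myorg/venv/bin:${PATH}"
```

Copying the environment works here because both stages
use the same base image,
so the interpreter path inside the environment stays valid.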

```{=latex}
\begin{frame}
\frametitle{No compilers in prod}

A common anti-pattern \pause

...surprisingly easy to get wrong!


\end{frame}
```

On size,
however,
it is possible to spend an infinite amount of time.
Whether each byte is worth keeping can be debated.

In practice,
after getting into the low hundreds of megabytes,
this quickly becomes a game of diminishing returns.
Hours of work can go into carefully trimming
a few hundred extra kilobytes.

The point at which to stop depends on the cost structure.
Do you pay per GB? How much?
How many different images use the base image?
Is there something more valuable to do?

As a rule of thumb,
getting images down to the low hundreds of megabytes
(200 or 300)
is fairly easy.
Getting them below 200 megabytes is possible with a little more work.
This is usually a good stopping point.
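
Two of the cheap wins can be sketched as follows.
The base tag and the package being installed
are placeholders.

```dockerfile
# A "slim" variant of the base is an easy first saving.
FROM debian:bullseye-slim

# --no-install-recommends skips optional packages.
# Removing the apt lists in the same RUN keeps them out of the layer;
# deleting them in a later RUN would not shrink the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

Running
`docker history`
on the resulting image shows the size of each layer,
which helps decide whether further trimming is worth the time.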

```{=latex}
\begin{frame}
\frametitle{Size}

\begin{itemize}
\item Diminishing returns \pause
\item Cost savings
\end{itemize}

\end{frame}
```

One way to make the process of building a container image
faster and more reliable is to use
*binary wheels*
for packages with native code.
Whether that means getting the wheels from PyPI,
building wheels into an internal package index,
or even building the wheels as part of a multistage
container build,
binary wheels are a useful tool.
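
The multistage variant might look like this sketch.
The
`/wheels`
directory,
the base image,
and
`requirements.txt`
are assumptions for illustration.

```dockerfile
FROM python:3.9-slim-bullseye AS builder

RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Build (or download) a wheel for every requirement into one directory.
COPY requirements.txt /tmp/requirements.txt
RUN pip wheel --wheel-dir /wheels -r /tmp/requirements.txt

FROM python:3.9-slim-bullseye AS runtime

COPY --from=builder /wheels /wheels
COPY requirements.txt /tmp/requirements.txt

# --no-index restricts pip to the local wheels,
# so nothing is downloaded or compiled in this stage.
RUN python -m venv /opt/myorg/venv \
    && /opt/myorg/venv/bin/pip install \
        --no-index --find-links /wheels -r /tmp/requirements.txt
```

One trade-off:
the copied wheels remain in a layer of the final image,
so this costs some size compared with copying a finished virtual environment.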

```{=latex}
\begin{frame}
\frametitle{Support binary wheels}

Installing and building \pause

Faster \pause

Simplifies images


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Not run as root}

General hygiene

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Minimal privileges}

Especially avoid permissions to \lstinline|pip install|

\end{frame}
```
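
Both points can be sketched together:
root installs the environment,
and the application runs as a user who cannot modify it.
The
`appuser`
account name and the base image are hypothetical.

```dockerfile
FROM python:3.9-slim-bullseye

# The environment is created and populated as root,
# so root owns every file in it.
COPY requirements.txt /tmp/requirements.txt
RUN python -m venv /opt/myorg/venv \
    && /opt/myorg/venv/bin/pip install -r /tmp/requirements.txt

# Hypothetical unprivileged account for running the application.
RUN useradd --system --no-create-home appuser
USER appuser

# appuser can execute code from the environment but cannot write to it,
# so an attempt to pip install into it at runtime fails.
CMD ["/opt/myorg/venv/bin/python", "-c", "print('running as non-root')"]
```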

```{=latex}
\begin{frame}
\frametitle{Fast rebuilds}

Responsiveness!

\end{frame}
```

## Bases

```{=latex}
\begin{frame}
\frametitle{Base OS}

The distro wars are back?

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Base - size}

Most modern distros have a decent minimal server \pause

...but Debian is easiest to get smallest.


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Base - LTS/support}

Usually around 5 years \pause

Gives you time to upgrade!


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Base - Volatility}

How much change?

Security? Backports? Fixes?

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Debian}

LTS: 5 years

Conservative


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Ubuntu}

LTS: 5 years

(Universe, Multiverse, etc...)

Fairly conservative

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Alpine (probably not)}

Uses musl, not manylinux compatible

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Rolling releases (probably not)}

Up to date, but... \pause

updates can change major versions!


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{CentOS}

Rolling release!

\end{frame}
```

## Installing Python

```{=latex}
\begin{frame}
\frametitle{How to get Python?}

So many options...

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Not system Python}

Distros aim Python at distro packages\pause

not user programs.

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Appropriate repositories}

Famous examples: deadsnakes PPA for Ubuntu

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{pyenv}

Builds and installs Python


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{python-build}

Builds and installs Python

\end{frame}
```

```{=latex}
\begin{frame}[fragile]
\frametitle{Source}

\begin{lstlisting}
RUN ./configure [...]
RUN make
RUN make install
\end{lstlisting}

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Trade-offs}

Control vs. Work vs. Problems

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Versions}

Support multiple for upgrade path\pause

2-3


\end{frame}
```

## Thinking in Stages

```{=latex}
\begin{frame}
\frametitle{Docker multistage (quick recap)}

Only one stage output \pause

other stages help

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{FROM}

Use previous stage as starting image

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{COPY --from}

Copy files from previous stage

\end{frame}
```

```{=latex}
\begin{frame}[fragile]
\frametitle{Stages as modules}

\begin{lstlisting}
FROM ubuntu as security-updates
RUN apt-get update && apt-get install -y \
      software-properties-common
RUN add-apt-repository -y ppa:deadsnakes/ppa
RUN apt-get update && apt-get upgrade -y

FROM security-updates as with-38
RUN apt-get install -y python3.8

FROM security-updates as with-39
RUN apt-get install -y python3.9
\end{lstlisting}

\end{frame}
```

```{=latex}
\begin{frame}[fragile]
\frametitle{Separate build and runtime}

Especially when building from source! \pause

\begin{lstlisting}
FROM ubuntu as builder
# install build dependencies
# build Python into /opt/myorg/python

FROM ubuntu as runtime
COPY --from=builder \
      /opt/myorg/python \
      /opt/myorg/python
\end{lstlisting}

\end{frame}
```
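
A fuller sketch of the same split,
building from source,
might look like the following.
The Python version,
the Ubuntu tag,
and the list of development packages are assumptions;
adjust them to the modules you actually need.

```dockerfile
FROM ubuntu:20.04 AS builder

# Version is an assumption for illustration.
ARG PYTHON_VERSION=3.9.7
ENV DEBIAN_FRONTEND=noninteractive

# Build dependencies: needed only in this stage.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential ca-certificates wget \
        libssl-dev zlib1g-dev libffi-dev libbz2-dev \
        libreadline-dev libsqlite3-dev liblzma-dev \
    && rm -rf /var/lib/apt/lists/*

# Fetch, configure, build, and install under a single prefix.
WORKDIR /tmp/python-src
RUN wget -q https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz \
    && tar -xzf Python-${PYTHON_VERSION}.tgz --strip-components=1 \
    && ./configure --prefix=/opt/myorg/python \
    && make -j"$(nproc)" \
    && make install

FROM ubuntu:20.04 AS runtime

# One COPY brings over the whole installation and nothing else.
# The runtime image may still need the matching shared libraries
# (for example, for ssl or sqlite3) installed from the distribution.
COPY --from=builder /opt/myorg/python /opt/myorg/python
ENV PATH="/opt/myorg/python/bin:${PATH}"
```

A quick check that modules such as
`ssl`
import correctly in the runtime image
catches missing shared libraries early.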

```{=latex}
\begin{frame}[fragile]
\frametitle{Optimizing layers}

Put everything under \lstinline|/opt/myorg|

Use one \lstinline|COPY --from=...|


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Optimizing size}

After building Python, remove:

\begin{itemize}
\item Tests
\item Builder dependencies (in runtime)
\item ....and more
\end{itemize}

\end{frame}
```

## Use in Applications

```{=latex}
\begin{frame}
\frametitle{Binary wheels}

\begin{itemize}
\item Build with builder
\item Copy to runtime
\item Install in virtual environment
\end{itemize}


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Binary wheels (alt)}

\begin{itemize}
\item Build with builder
\item Install in virtual environment
\item Copy virtual environment to runtime
\end{itemize}


\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Patchelf}

Used to make wheels self-contained

Newest version needed

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Auditwheel}

Use pip to install

\end{frame}
```

```{=latex}
\begin{frame}[fragile]
\frametitle{Self-contained binary wheels}

Run 

\begin{lstlisting}
auditwheel repair --plat linux_x86_64
\end{lstlisting}

\pause

No need for binary dependencies!
\end{frame}
```

```{=latex}
\begin{frame}[fragile]
\frametitle{Portable binary wheels}

\begin{itemize}
\item Oldest supported?
\end{itemize} 

\pause

Example:

\begin{lstlisting}
auditwheel repair --plat manylinux_2_27_x86_64
\end{lstlisting}


\end{frame}
```
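
Put together,
the repair step might be sketched like this.
The choice of
`psycopg2`
as the example package,
the directory names,
and the base image are assumptions;
the platform tag is the one from the self-contained example,
and a
`manylinux`
tag will only work if the build stage is old enough for it.

```dockerfile
FROM python:3.9-slim-bullseye AS builder

# Compiler and PostgreSQL client headers: needed only to build.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# auditwheel relies on patchelf; both are installed with pip here.
RUN pip install auditwheel patchelf

# Force a source build so the wheel links against the system libpq.
RUN pip wheel --no-binary psycopg2 --wheel-dir /tmp/wheels psycopg2

# Copy the linked shared libraries into a repaired wheel.
# The glob is assumed to match only platform (non-pure-Python) wheels.
RUN auditwheel repair --plat linux_x86_64 \
        --wheel-dir /wheels /tmp/wheels/*.whl

FROM python:3.9-slim-bullseye AS runtime

# The repaired wheel carries its own copy of the library,
# so libpq is not installed in this stage.
COPY --from=builder /wheels /wheels
RUN python -m venv /opt/myorg/venv \
    && /opt/myorg/venv/bin/pip install --no-index --find-links /wheels psycopg2
```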

```{=latex}
\begin{frame}
\frametitle{Generating binary wheels}

Build instructions in docs
\pause

Build dependencies

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Optimizing layers}

Reduce copies
\pause

Prep

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Optimizing caching}


Where to build wheel?

\pause

What invalidates caching?


\end{frame}
```
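
The usual answer to the caching question,
as a sketch:
order instructions from least to most frequently changed.
The
`src/`
directory and
`requirements.txt`
are assumed names.

```dockerfile
FROM python:3.9-slim-bullseye
WORKDIR /app

# Dependencies change rarely: copy only the requirements file first,
# so this layer and the install below stay cached across source edits.
COPY requirements.txt .
RUN python -m venv /opt/myorg/venv \
    && /opt/myorg/venv/bin/pip install -r requirements.txt

# Application source changes often: copy it last,
# so an edit invalidates only the layers from here on.
COPY src/ ./src/
```

Copying the whole source tree before installing dependencies
would invalidate the install layer on every code change.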

## Final Thoughts

```{=latex}
\begin{frame}
\frametitle{Conclusion}

\begin{itemize}
\item Wrong easier than right \pause
\item But right is amazing \pause
\item Think before you docker
\end{itemize}

\end{frame}
```

```{=latex}
\begin{frame}
\frametitle{Further Resources}

Itamar's series -- https://pythonspeed.com/docker/

\end{frame}
```

```{=latex}
\end{document}
```