# Using Make to Build Software and Manage Workflows

Chris Paciorek

Statistical Computing Facility, Department of Statistics, UC Berkeley

## Overview and references

Make is software that allows you to build software, create files, or carry out other tasks in a way that respects the dependencies amongst a series of files/tasks.

It was developed for building software: for example, compiling .cpp files into .o files, linking those into an executable, and moving the executable into place on a computer.

As stated in the Make manual "You can use make with any programming language whose compiler can be run with a shell command. Indeed, make is not limited to programs. You can use it to describe any task where some files must be updated automatically from others whenever the others change."

Make is a program that processes a Makefile, which contains the information about the rules to create files or carry out tasks and the dependencies between files/tasks.

Here are some useful links:
  - [Make manual](http://www.gnu.org/software/make/manual/make.html)

  - [Tips on using Make for scientific workflows (data analysis, document preparation, etc.)](http://kbroman.org/minimal_make/)

## Basic aspects of a Makefile

A Makefile is made up of *rules* that create *targets* (or carry out other tasks).

For example, here is a basic rule:

```make
algo.o: algo.cpp defaults.h
	g++ -c -o algo.o algo.cpp
```

- Target: `algo.o`
- Prerequisites: `algo.cpp defaults.h`
- Recipe: `g++ -c -o algo.o algo.cpp`

Careful: each recipe line must be indented with a TAB character, not with spaces.
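The TAB requirement is easy to check for yourself. The following is a minimal sketch, assuming GNU make and bash are installed. It writes the rule above twice, once indented with spaces and once with a TAB, and swaps `g++` for a hypothetical `echo` stand-in so that no compiler is needed.

```bash
#!/usr/bin/env bash
# Sketch: a recipe indented with spaces is rejected; the same recipe indented
# with a TAB works. Assumes GNU make; `echo` stands in for the real g++ call.
set -uo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch algo.cpp defaults.h

# printf turns \t into a real TAB character, which is what make requires.
printf 'algo.o: algo.cpp defaults.h\n    echo compiling algo.cpp\n' > Makefile.spaces
printf 'algo.o: algo.cpp defaults.h\n\techo compiling algo.cpp\n' > Makefile.tab

spaces_out=$(make -f Makefile.spaces 2>&1)
spaces_status=$?
tab_out=$(make -s -f Makefile.tab 2>&1)
tab_status=$?

echo "spaces (exit $spaces_status): $spaces_out"
echo "tab    (exit $tab_status): $tab_out"
```

GNU make stops with a `missing separator` error for the space-indented file, while the TAB-indented one runs its recipe.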

## Example Makefile for building software

```bash
cat code_example/Makefile
```

```make
# default rule is the first one
algo: vec1.h base.h vec1.o vecMain.o
	g++ -o algo vec1.o vecMain.o

vec1.o: vec1.h base.h vec1.cpp
	g++ -c -o vec1.o vec1.cpp

vecMain.o: vecMain.cpp
	g++ -c -o vecMain.o vecMain.cpp
```


To create *algo*, we can name the target explicitly:
```bash 
make algo
``` 

or just by typing `make`, because the first target is the default.

If we just want to create `vecMain.o`, we can invoke
```bash
make vecMain.o
```


## Dependencies

Make runs a recipe only when its target does not exist or when at least one of the target's prerequisites is newer than the target. So in the above example, if `vecMain.cpp` changes and we invoke `make`, `vecMain.o` is recompiled and `algo` is relinked, but `vec1.o` is **not** recompiled.

The more recipes and the more complicated the dependencies in your Makefile, the more you benefit from make.

If you ask make to build a target that is already up to date, it does nothing and prints a message such as "make: 'algo' is up to date." (or "make: Nothing to be done for 'foo'." for a target that has no recipe of its own).
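The rebuild logic can be watched directly. This is a sketch assuming GNU make and bash; the targets and prerequisites mirror `code_example/Makefile`, but the `cat` recipes are hypothetical stand-ins for `g++` so that the example runs without a compiler.

```bash
#!/usr/bin/env bash
# Sketch: make reruns only the recipes whose targets are out of date.
# Assumes GNU make; `cat` stands in for compiling and linking.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch vec1.h base.h vec1.cpp vecMain.cpp

# Same targets and prerequisites as code_example/Makefile.
# printf '\t%s\n' writes each recipe line with the required leading TAB.
{
  echo 'algo: vec1.h base.h vec1.o vecMain.o'
  printf '\t%s\n' 'cat vec1.o vecMain.o > algo'
  echo 'vec1.o: vec1.h base.h vec1.cpp'
  printf '\t%s\n' 'cat vec1.h base.h vec1.cpp > vec1.o'
  echo 'vecMain.o: vecMain.cpp'
  printf '\t%s\n' 'cat vecMain.cpp > vecMain.o'
} > Makefile

first=$(make)        # nothing built yet, so all three recipes run
sleep 1              # make compares modification times; make the edit clearly newer
touch vecMain.cpp    # simulate editing vecMain.cpp
second=$(make)       # only vecMain.o and algo are rebuilt
third=$(make 2>&1)   # everything is up to date

printf 'first run:\n%s\n\nsecond run:\n%s\n\nthird run:\n%s\n' "$first" "$second" "$third"
```

The second run prints only the `vecMain.o` and `algo` recipes, and the third prints the "is up to date" message.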



## Using variables in Makefiles

Much of the power of make also comes from using variables to automate recipes and avoid duplicated syntax.

In the above example, we might avoid rewriting multiple header file names by setting a variable:

`HEADERS = vec1.h base.h`

and using `$(HEADERS)` in place of the file names wherever they appear.
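To see the variable being expanded, here is a small sketch assuming GNU make and bash. It uses `make -n`, which prints the commands make would run without running them; the `cat` recipe is a hypothetical stand-in for a compile command that happens to mention `$(HEADERS)`.

```bash
#!/usr/bin/env bash
# Sketch: one HEADERS variable replaces the repeated header list.
# Assumes GNU make; `make -n` prints the expanded recipe without running it,
# and `cat` is a stand-in for a real compile command.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch vec1.h base.h vec1.cpp

{
  echo 'HEADERS = vec1.h base.h'
  echo 'vec1.o: $(HEADERS) vec1.cpp'
  printf '\t%s\n' 'cat $(HEADERS) vec1.cpp > vec1.o'
} > Makefile

out=$(make -n vec1.o)
echo "$out"
```

Changing the header list now means editing one line rather than every rule that mentions the headers.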

## Extending the example

Here's a more involved example:

```bash
cat code_example/Makefile_better
```

```make
DEBUG = True
CXX = g++
CXXFLAGS = -O
LFLAGS = -L /opt/acml/lib
INCLUDES = -I/opt/acml/include
LIBS = -lm -lacml

HEADERS = vec1.h
OBJS = vec1.o vecMain.o

ifeq ($(DEBUG), True)
     CXXFLAGS += -g
endif

# algo comes first so that it is the default target
algo: $(OBJS)
	$(CXX) $(LFLAGS) $(OBJS) $(LIBS) -o algo

$(HEADERS): base.h

# linker flags (LFLAGS) belong on the link line above, not on the compile lines
vec1.o: $(HEADERS) vec1.cpp
	$(CXX) -c $(CXXFLAGS) $(INCLUDES) -o vec1.o vec1.cpp

vecMain.o: $(HEADERS) vecMain.cpp
	$(CXX) -c $(CXXFLAGS) $(INCLUDES) -o vecMain.o vecMain.cpp

# prevent confusion if there is a file 'clean'
.PHONY: clean

clean:
	rm -f $(OBJS) algo

# more general:
# %.o: %.cpp
#     $(CXX) -c $(CXXFLAGS) $(INCLUDES) -o $@ $<
# or this, which uses substitution
# %.o: %.cpp
#     $(CXX) -c $(CXXFLAGS) $(INCLUDES) -o $@ $(@:.o=.cpp)
# OBJS = $(patsubst %.cpp,%.o,$(wildcard *.cpp))
# use of wildcard forces expansion rather than using the literal "*.cpp";
# deriving OBJS from the sources works even before any .o files exist
# or OBJS could be any set of .o files
# $(OBJS): %.o: %.cpp
#        $(CXX) -c $(CXXFLAGS) $(INCLUDES) $< -o $@
```
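The `DEBUG` switch can be flipped without editing the Makefile, because a variable assigned on the make command line overrides the assignment in the Makefile. A minimal sketch, assuming GNU make and bash; the `show` target is a hypothetical helper that just prints the resulting flags.

```bash
#!/usr/bin/env bash
# Sketch: the DEBUG conditional from Makefile_better, and overriding DEBUG
# from the command line. Assumes GNU make; `show` is a hypothetical target.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

{
  echo 'DEBUG = True'
  echo 'CXXFLAGS = -O'
  echo 'ifeq ($(DEBUG),True)'
  echo 'CXXFLAGS += -g'
  echo 'endif'
  echo 'show:'
  printf '\t%s\n' '@echo CXXFLAGS is $(CXXFLAGS)'
} > Makefile

debug=$(make -s show)                # uses DEBUG = True from the Makefile
release=$(make -s show DEBUG=False)  # command-line value overrides the Makefile
echo "$debug"
echo "$release"
```

The first call prints `-O -g` and the second prints only `-O`.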


## What make does

Make processes the Makefile, figuring out the dependency tree, evaluating variables, and running the recipes for targets whose prerequisites have changed. More specifically, it

 - parses the Makefile
 - records variable definitions, including variables defined in terms of other variables
 - builds up a database of rules
 - looks at the target specified (by default the first target)
 - creates a chain of rules leading from files that exist to the target
     - expands the recursively expanded (`=`) variables that it needs
 - uses file modification times to determine which recipes must run to bring the target up to date
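The role of modification times can be checked with `make -q` ("question mode"), which runs nothing and only reports through its exit status whether the target is up to date. A sketch assuming GNU make and bash; `in.txt` and `out.txt` are hypothetical files.

```bash
#!/usr/bin/env bash
# Sketch: make decides what to do from file modification times.
# `make -q` runs nothing; it exits 0 if the target is up to date and 1 if it
# would need rebuilding. Assumes GNU make.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

{
  echo 'out.txt: in.txt'
  printf '\t%s\n' 'cp in.txt out.txt'
} > Makefile
echo hello > in.txt

make -s                                        # build out.txt
if make -q; then before=0; else before=$?; fi
sleep 1
touch in.txt                                   # prerequisite is now newer than the target
if make -q; then after=0; else after=$?; fi
echo "status before touch: $before, after touch: $after"
```

Nothing about the file contents changed; touching `in.txt` alone is enough to make `out.txt` out of date.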

## Types of rules

So far we've seen explicit rules, each of which directly specifies the recipe for a given target. We can streamline the Makefile above by using pattern rules, which describe how to make a whole class of targets (for example, any `.o` file from the `.cpp` file of the same name). Here's a modified version of the Makefile:

```bash
cat code_example/Makefile_pattern_rule
```

```make
DEBUG = True
CXX = g++
CXXFLAGS = -O
LFLAGS = -L /opt/acml/lib
INCLUDES = -I/opt/acml/include
LIBS = -lm -lacml

HEADERS = vec1.h
# use of wildcard forces expansion rather than using the literal "*.cpp";
# OBJS is derived from the sources because no .o files exist before the first build
SRCS = $(wildcard *.cpp)
OBJS = $(SRCS:.cpp=.o)

ifeq ($(DEBUG), True)
     CXXFLAGS += -g
endif

algo: $(OBJS)
	$(CXX) $(LFLAGS) $(OBJS) $(LIBS) -o algo

$(HEADERS): base.h

# pattern rule; $< is the .cpp file because it is the first prerequisite
%.o: %.cpp $(HEADERS)
	$(CXX) -c $(CXXFLAGS) $(INCLUDES) -o $@ $<

# or this, which uses substitution
# %.o: %.cpp
#     $(CXX) -c $(CXXFLAGS) $(INCLUDES) -o $@ $(@:.o=.cpp)
#
# or OBJS could be any set of specific .o files
# $(OBJS): %.o: %.cpp
#        $(CXX) -c $(CXXFLAGS) $(INCLUDES) $< -o $@

# prevent confusion if there is a file 'clean'
.PHONY: clean

clean:
	rm -f $(OBJS) algo
```
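Here is a runnable miniature of the pattern-rule approach. It is a sketch assuming GNU make and bash, with `cp` and `cat` as hypothetical stand-ins for `$(CXX)` so that no compiler is needed. It uses the automatic variables `$@` (the target), `$<` (the first prerequisite), and `$^` (all prerequisites), and it derives the object list from the `.cpp` files, since `$(wildcard *.o)` would be empty in a clean directory.

```bash
#!/usr/bin/env bash
# Sketch: one pattern rule builds every .o from the matching .cpp.
# Assumes GNU make; `cp` and `cat` stand in for compiling and linking.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch vec1.cpp vecMain.cpp

{
  echo '# Object names come from the sources, which exist before the build.'
  echo 'SRCS = $(sort $(wildcard *.cpp))'
  echo 'OBJS = $(SRCS:.cpp=.o)'
  echo 'algo: $(OBJS)'
  printf '\t%s\n' 'cat $^ > $@'
  echo '%.o: %.cpp'
  printf '\t%s\n' 'cp $< $@'
} > Makefile

out=$(make)
echo "$out"
```

`$(sort ...)` is used only to make the build order predictable in this sketch.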




## More details on defining variables

- `:=` (simple expansion) expands the right-hand side once, at the time the variable is defined
- `=` (recursive expansion) stores the right-hand side unexpanded and expands it each time the variable is used, so the variables it refers to can be defined later in the Makefile
- `+=` appends to the variable, e.g., `CFLAGS += -O`

Make also provides automatic variables for use within a recipe:

- `$@` is the target
- `$<` is the first prerequisite
- `$^` is the list of all prerequisites

Given the above, what's the problem with this?

```make
CFLAGS = $(CFLAGS) -O
```
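One way to check your answer: the sketch below, assuming GNU make and bash, builds that definition and a `:=` variant side by side. The `show` target and the starting value `-Wall` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: a recursively expanded variable that refers to itself is an error
# as soon as it is used; := (or +=) avoids the loop. Assumes GNU make.
set -uo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

{
  echo 'CFLAGS = -Wall'
  echo 'CFLAGS = $(CFLAGS) -O'
  echo 'show:'
  printf '\t%s\n' '@echo CFLAGS is $(CFLAGS)'
} > Makefile.bad

{
  echo 'CFLAGS = -Wall'
  echo 'CFLAGS := $(CFLAGS) -O'
  echo 'show:'
  printf '\t%s\n' '@echo CFLAGS is $(CFLAGS)'
} > Makefile.good

bad_out=$(make -f Makefile.bad show 2>&1)
bad_status=$?
good_out=$(make -f Makefile.good show 2>&1)
echo "bad (exit $bad_status): $bad_out"
echo "good: $good_out"
```

With `=`, expanding `CFLAGS` requires expanding `CFLAGS` again, so GNU make stops with a "references itself" error. With `:=`, the old value is expanded once at definition time.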

```bash
cat variable_example/Makefile
echo "####################################################"
make --silent -f variable_example/Makefile
```

```make
# Test 1
foo1 = $(bar1)
bar1 = $(dum1)
dum1 = Huh?

# Test 2
foo2 := ${bar2}
bar2 := Huh?

# Test 3
foo3 = hi there
bar3 := ${foo3}
foo3 = see ya

# Test 4
foo4 = hi there
bar4 = ${foo4}
foo4 = see ya

all:
	echo "Test 1: foo1 is $(foo1)"
	echo "Test 2: foo2 is $(foo2)"
	echo "Test 3: bar3 is $(bar3)"
	echo "Test 3: foo3 is: $(foo3)"
	echo "Test 4: bar4 is $(bar4)"
	echo "Test 4: foo4 is $(foo4)"
```

```
Test 1: foo1 is Huh?
Test 2: foo2 is
Test 3: bar3 is hi there
Test 3: foo3 is: see ya
Test 4: bar4 is see ya
Test 4: foo4 is see ya
```


## Another basic example: preparing documents

Here's an example that illustrates how you might use make to prepare presentation materials for a workshop.

Note the use of pattern rules and of some auxiliary convenience targets to save typing. Also note how you can have a top-level Makefile that calls one or more Makefiles elsewhere (often in subdirectories).

```bash
cat workshop_example/Makefile
echo "  "
echo "################ nested Makefile ##############################"
echo "  "
cat workshop_example/modules/Makefile
```

Top-level Makefile (`workshop_example/Makefile`):

```make
BASE_DIR = .
MODULES_DIR = $(BASE_DIR)/modules

# 'modules' is also the name of a directory, so mark both targets as phony;
# otherwise make would consider 'modules' up to date and never run its recipe
.PHONY: modules clean

modules:
	cd $(MODULES_DIR) && $(MAKE) all

clean:
	cd $(MODULES_DIR) && $(MAKE) clean
```

Nested Makefile (`workshop_example/modules/Makefile`):

```make
# phony targets always run, even if a file with the same name is present
.PHONY: all clean 0 1 2 3 4 5 6 7 8 9 10 11

all: clean 0 1 2 3 4 5 6 7 8 9 10 11

clean:
	rm -rf *.md *.html *.pdf \
	cache/ figure/
# 	-rm -rf *.md *.html *.pdf cache/ figure/
# a leading '-' tells make to ignore errors from that recipe line

%.html: %.Rmd
	echo $(@)
	./make_slides $(basename $(@))
	rm $(basename $(@)).md   # remove temporary
	echo $(basename $(@))

0: module0_induction.html
1: module1_basics.html
2: module2_managingR.html
3: module3_data.html
4: module4_calc.html
5: module5_analysis.html
6: module6_programming.html
7: module7_coreTools.html
8: module8_graphics.html
9: module9_workflows.html
10: module10_advanced.html
11: module11_next.html
```
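Why `.PHONY` matters can be shown in a few lines. This sketch assumes GNU make and bash and uses a hypothetical stray file called `clean`; the same thing happens when a target shares its name with a directory, which is why a target such as `modules`, named like the `modules` subdirectory, should also be declared phony.

```bash
#!/usr/bin/env bash
# Sketch: without .PHONY, a file or directory named like a target can stop
# its recipe from running. Assumes GNU make.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

{
  echo 'clean:'
  printf '\t%s\n' '@echo removing build products'
} > Makefile.plain
{
  echo '.PHONY: clean'
  echo 'clean:'
  printf '\t%s\n' '@echo removing build products'
} > Makefile.phony

touch clean   # a stray file that happens to be called "clean"
plain=$(make -f Makefile.plain clean 2>&1)
phony=$(make -f Makefile.phony clean 2>&1)
echo "without .PHONY: $plain"
echo "with .PHONY:    $phony"
```

Without `.PHONY`, make sees an existing `clean` file with no prerequisites, treats it as up to date, and skips the recipe.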


```bash
cd workshop_example/modules
make clean
echo " "
echo "####################################"
echo " "
make --silent 0
```

```
/accounts/gen/vis/paciorek/staff/workshops/make-thw-2015/workshop_example/modules
rm -rf *.md *.html *.pdf \
	cache/ figure/

####################################

module0_induction.html
[1] "module0_induction"
  |.................................................................| 100%
  ordinary text without R code

  |.................................................................| 100%
  ordinary text without R code

module0_induction

processing file: module0_induction.Rmd
output file: module0_induction.md

processing file: module0_induction.Rmd
output file: module0_induction.md
```



## Functions applied to variables

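One can apply functions to variables using the following syntax, where multiple arguments are separated by commas:
```make
$(function_name arguments)
```

For example, to substitute 'html' for 'md' in the variable FILE:
```make
$(subst md,html,$(FILE))
```

Whitespace after a comma is kept as part of the next argument, so don't put spaces after the commas unless you want them in the argument.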
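One caveat: `subst` replaces every occurrence of its first argument, not just the extension. The sketch below, assuming GNU make and bash, contrasts it with a substitution reference, `$(VAR:.md=.html)`, which only changes a trailing `.md`; the file name `cmd_notes.md` is hypothetical and chosen to trigger the pitfall.

```bash
#!/usr/bin/env bash
# Sketch: subst replaces every match; a substitution reference only changes
# a trailing suffix. Assumes GNU make; NOTES is a hypothetical file name.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

{
  echo 'FILE = workshop.md'
  echo 'NOTES = cmd_notes.md'
  echo 'show:'
  printf '\t%s\n' '@echo $(subst md,html,$(FILE))'
  printf '\t%s\n' '@echo $(subst md,html,$(NOTES))'
  printf '\t%s\n' '@echo $(NOTES:.md=.html)'
} > Makefile

out=$(make -s show)
echo "$out"
```

The second line comes out as `chtml_notes.html` because the `md` inside `cmd` is replaced too.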

Other functions handle string substitution and manipulation, as well as file names, extensions, and paths.

Finally, you can use conditionals within functions: `$(if condition,then-part[,else-part])`, where the condition counts as true if it expands to a non-empty string. There is also a `foreach` function that iterates over the whitespace-separated words in a variable.

Here are some examples:

```bash
cat functions_example/Makefile
echo "  "
echo "#############################"
echo "  "
make --silent -f functions_example/Makefile
```

```make
.PHONY: test

OPTIMIZE = 1

FILE = workshop.md
HTMLFILE = $(subst md,html, $(FILE))

MDFILES = mod1.md mod2.md default.md example.md
FILES = $(patsubst mod%.md,mod%.html, $(MDFILES))

INPUT = foo bar foo duh
SORTED = $(sort $(INPUT))

INPUTFILES = /tmp/foo.c /var/tmp/bar.o
INPUTDIRS = $(dir $(INPUTFILES))
SUFFIXES = $(suffix $(INPUTFILES))
BASES = $(basename $(INPUTFILES))
TRANSFORMED = $(addsuffix .cpp, $(BASES))

DIRS := code_example workshop_example
ALLFILES := $(foreach dir, $(DIRS), $(wildcard $(dir)/*))

test:
	echo HTMLFILE is $(HTMLFILE)
	echo FILES is $(FILES)
	echo SORTED is $(SORTED)
	echo INPUTDIRS is $(INPUTDIRS)
	echo SUFFIXES is $(SUFFIXES)
	echo TRANSFORMED is $(TRANSFORMED)
	echo ALLFILES is $(ALLFILES)
```
```
HTMLFILE is workshop.html
FILES is mod1.html mod2.html default.md example.md
SORTED is bar duh foo
INPUTDIRS is /tmp/ /var/tmp/
SUFFIXES is .c .o
TRANSFORMED is /tmp/foo.cpp /var/tmp/bar.cpp
ALLFILES is code_example/demo.sh code_examp
```
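The example above does not exercise `$(if ...)`, so here is a small sketch of `if` and `foreach` together, assuming GNU make and bash. `NAMES`, `MODE`, and the `show` target are hypothetical; `DEBUG` is set to empty in the Makefile so that a value in your shell environment does not leak in.

```bash
#!/usr/bin/env bash
# Sketch: $(foreach ...) and $(if ...) in action. Assumes GNU make.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

{
  echo 'DEBUG ='
  echo 'NAMES = vec1 vecMain'
  echo '# foreach: expand the text once for each whitespace-separated word'
  echo 'OBJS = $(foreach name,$(NAMES),$(name).o)'
  echo '# if: an empty condition is false, anything else is true'
  echo 'MODE = $(if $(DEBUG),debug,release)'
  echo 'show:'
  printf '\t%s\n' '@echo OBJS is $(OBJS)'
  printf '\t%s\n' '@echo MODE is $(MODE)'
} > Makefile

default=$(make -s show)
debug=$(make -s show DEBUG=1)
echo "$default"
echo "$debug"
```

Setting `DEBUG` to any non-empty value on the command line flips `MODE` to `debug`.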

## Analysis workflows using make

One nice use of make is to automate workflows. As with building software, a workflow consists of a series of steps. We may want to run only one or a few of those steps, or we may want to run the full analysis without rerunning steps whose dependencies have not changed. Writing a Makefile also forces a structured, programmatic description of exactly what needs to be done for each piece of the analysis.

What are the steps you might have in your Makefile?
- getting data
- preprocessing/cleaning
- analysis/modeling
- postprocessing output
- figures/tables
- presentations/papers

Here's an example workflow in the form of a Makefile:

```bash
cat analysis_example/Makefile
```

```make
INTERIMS = *.aux *.bbl *.blg *.log *.bak *~ code/*.Rout
DATA = *.Rda *.csv
USER = paciorek
SERVER = foo.berkeley.edu
WEBDIR = /web/share/$(USER)/files

R_OPTS = --no-save
R = R CMD BATCH $(R_OPTS)

mypaper.pdf: mypaper.bib mypaper.tex tables.tex Figs/fig1.pdf Figs/fig2.pdf
	pdflatex mypaper
	bibtex mypaper
	pdflatex mypaper
	pdflatex mypaper

# the scripts live in code/, so list them by that path and run them from there
data.csv: code/make_data.py
	cd code && python make_data.py

results.Rda: data.csv code/model.R
	cd code && $(R) model.R model.Rout

tables.tex: results.Rda code/make_tables.R
	cd code && $(R) make_tables.R make_tables.Rout

Figs/fig1.pdf: code/fig1.R results.Rda
	cd code && $(R) fig1.R fig1.Rout

Figs/fig2.pdf: code/fig2.R results.Rda
	cd code && $(R) fig2.R fig2.Rout

# make runs recipes with /bin/sh, which may not support {R,py} brace expansion
web: *.pdf *.tex code/*.R code/*.py
	zip analysis.zip code/*.R code/*.py *.pdf *.tex
	scp analysis.zip $(USER)@$(SERVER):$(WEBDIR)/.

clean:
	rm -f $(INTERIMS)

cleanall:
	rm -f $(INTERIMS) $(DATA)

.PHONY: web clean cleanall
```
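Before rerunning a long analysis, `make -n` previews which steps would run. This is a sketch assuming GNU make and bash; the file names loosely follow the workflow above but are hypothetical, and `cat` stands in for running the R scripts so that no R installation is needed.

```bash
#!/usr/bin/env bash
# Sketch: preview which steps of a workflow would rerun, using `make -n`.
# Assumes GNU make. File names are hypothetical, and `cat` stands in for
# running the R scripts.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch data.csv model.R fig1.R fig2.R

{
  echo 'paper.pdf: fig1.pdf fig2.pdf'
  printf '\t%s\n' 'cat fig1.pdf fig2.pdf > paper.pdf'
  echo 'results.Rda: data.csv model.R'
  printf '\t%s\n' 'cat data.csv model.R > results.Rda'
  echo 'fig1.pdf: fig1.R results.Rda'
  printf '\t%s\n' 'cat fig1.R results.Rda > fig1.pdf'
  echo 'fig2.pdf: fig2.R results.Rda'
  printf '\t%s\n' 'cat fig2.R results.Rda > fig2.pdf'
} > Makefile

make -s          # run the whole workflow once
sleep 1
touch fig1.R     # edit only the script for the first figure
preview=$(make -n)
echo "$preview"
```

Only the first figure and the final document would be rebuilt; the expensive model step (`results.Rda`) is left alone.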


## make for building software

The standard process for building software from source (particularly on UNIX-based systems) uses make:
- `./configure`
- `make`
- `make install`

In more detail:

- The configure script figures out what tools and software are on your system, sets up the build and installation to take account of them, and then creates a Makefile.
- `make` builds the software (e.g., compiling code to binaries) using the Makefile.
- `make install` runs the install recipe in the Makefile to put files in the right places (binaries, header files, and library files such as shared libraries, i.e., .so files or DLLs) so that they are readily accessible (e.g., on a user's PATH or in a directory where the system looks for header or library files).

The configure script is often generated automatically by a tool called autoconf from a file typically named configure.ac.

It's common to have a 'test' target that builds and runs executables that test the software.

### Nested Makefiles

In the context of building software, you'll often see nested Makefiles. The top-level Makefile may simply serve to call Makefiles in the various subdirectories. 
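A minimal sketch of that pattern, assuming GNU make and bash; the `src` and `docs` subdirectories and their one-line Makefiles are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: a top-level Makefile that just calls the Makefiles in two
# subdirectories. Assumes GNU make; the directory names are hypothetical.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir src docs

for d in src docs; do
  {
    echo 'all:'
    printf '\t%s\n' "@echo building in $d"
  } > "$d/Makefile"
done

# Each recipe line runs in its own shell, so the `cd` affects only that line.
{
  echo '.PHONY: all'
  echo 'all:'
  printf '\t%s\n' 'cd src && $(MAKE)'
  printf '\t%s\n' 'cd docs && $(MAKE)'
} > Makefile

out=$(make -s)
echo "$out"
```

Using `$(MAKE)` rather than a literal `make` runs the same make program that is already running, and recipe lines containing `$(MAKE)` are still executed under `make -n`, so dry runs reach the sub-makes.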

We'll look at this in a specific context (the Magma package for linear algebra on CPUs/GPUs). Here's an example Makefile from Magma.



```bash
cat install_example/Makefile
```

```make
# set prefix only if not already set (i.e., this provides a default value)
prefix ?= /usr/local/magma

bin:
	cd bin && $(MAKE)

lib:
	( cd magmablas      && $(MAKE) )
	( cd src            && $(MAKE) )
	( cd control        && $(MAKE) )
	( cd interface_cuda && $(MAKE) )

dir:
	mkdir -p $(prefix)
	mkdir -p $(prefix)/bin
	mkdir -p $(prefix)/include
	mkdir -p $(prefix)/lib
	mkdir -p $(prefix)/lib/pkgconfig

install: bin lib dir
	cp $(MAGMA_DIR)/include/*.h  $(prefix)/include
	cp $(LIBMAGMA)               $(prefix)/lib
	-cp $(LIBMAGMA_SO)           $(prefix)/lib
	cp $(BINARY)                 $(prefix)/bin
```
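The `?=` default for `prefix` can be overridden either from the environment or on the command line. A sketch assuming GNU make and bash; `/tmp/magma` and `/opt/magma` are hypothetical install locations (nothing is created there), and `show` is a hypothetical target.

```bash
#!/usr/bin/env bash
# Sketch: `?=` only assigns a value if the variable is not already set,
# e.g., from the environment or the command line. Assumes GNU make.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
unset prefix   # make sure no value leaks in from the calling shell

{
  echo 'prefix ?= /usr/local/magma'
  echo 'show:'
  printf '\t%s\n' '@echo prefix is $(prefix)'
} > Makefile

default=$(make -s show)                          # falls back to the default
from_env=$(prefix=/tmp/magma make -s show)       # environment variable wins
from_cmdline=$(make -s show prefix=/opt/magma)   # command-line value wins
printf '%s\n' "$default" "$from_env" "$from_cmdline"
```

This is what lets a user run something like `make install prefix=$HOME/magma` without editing the Makefile.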


You may see syntax like this:
```make
include make.inc
-include Makefile.internal
```

This tells make to read the 'included' files as though their contents appeared at that point. For example, make.inc may contain a set of variables that are to be shared across Makefiles.

The '-include' form tells make to skip the inclusion, rather than fail, if the file doesn't exist.
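The difference between the two forms is easy to demonstrate. A sketch assuming GNU make and bash; `make.inc`, the variable `BLAS`, and `missing.inc` are hypothetical stand-ins for a real shared settings file.

```bash
#!/usr/bin/env bash
# Sketch: `include` reads another makefile in place; `-include` skips a
# missing file instead of failing. Assumes GNU make; make.inc and BLAS are
# hypothetical stand-ins for a real shared settings file.
set -uo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo 'BLAS = openblas' > make.inc   # shared settings
{
  echo 'include make.inc'
  printf '%s\n' '-include Makefile.internal'   # does not exist: silently skipped
  echo 'show:'
  printf '\t%s\n' '@echo BLAS is $(BLAS)'
} > Makefile

{
  echo 'include missing.inc'   # plain include of a file that does not exist
  echo 'show:'
  printf '\t%s\n' '@echo never printed'
} > Makefile.strict

ok_out=$(make -s show 2>&1)
ok_status=$?
strict_out=$(make -s -f Makefile.strict show 2>&1)
strict_status=$?
echo "Makefile (exit $ok_status): $ok_out"
echo "Makefile.strict (exit $strict_status): $strict_out"
```

The first Makefile picks up `BLAS` from `make.inc` and ignores the missing `Makefile.internal`; the second stops with an error naming `missing.inc`.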

```bash
cd /tmp
wget http://icl.cs.utk.edu/projectsfiles/magma/downloads/magma-1.6.1.tar.gz
tar -xvzf magma-1.6.1.tar.gz
```

```
magma-1.6.1/
magma-1.6.1/blas_fix/
magma-1.6.1/CMakeLists.txt
magma-1.6.1/control/
magma-1.6.1/COPYRIGHT
magma-1.6.1/docs/
magma-1.6.1/example/
magma-1.6.1/exp/
magma-1.6.1/exp_magma_quark/
magma-1.6.1/include/
magma-1.6.1/interface_cuda/
magma-1.6.1/lib/
magma-1.6.1/magmablas/
magma-1.6.1/make.check-acml
magma-1.6.1/make.check-atlas
magma-1.6.1/make.check-cuda
magma-1.6.1/make.check-mkl
magma-1.6.1/make.inc.acml
magma-1.6.1/make.inc.atlas
magma-1.6.1/make.inc.goto
magma-1.6.1/make.inc.macos
magma-1.6.1/make.inc.mkl-gcc
magma-1.6.1/make.inc.mkl-gcc-ilp64
magma-1.6.1/make.inc.mkl-icc
magma-1.6.1/make.inc.mkl-icc-ilp64
magma-1.6.1/make.inc.openblas
magma-1.6.1/Makefile
magma-1.6.1/Makefile.internal
magma-1.6.1/README
magma-1.6.1/README-Windows
magma-1.6.1/ReleaseNotes
magma-1.6.1/sparse-iter/
magma-1.6.1/src/
magma-1.6.1/testing/
magma-1.6.1/testing/checkdiag/
magma-1.6.1/testing/CMake
magma-1.6.1/testing/flops.h
magma-1.6.1/testing/lin/
magma-1.6.1/testing/magma_cutil.cpp
magma-1.6.1/te
```

```
--2015-04-17 16:44:27--  http://icl.cs.utk.edu/projectsfiles/magma/downloads/magma-1.6.1.tar.gz
Resolving icl.cs.utk.edu (icl.cs.utk.edu)... 160.36.131.221
Connecting to icl.cs.utk.edu (icl.cs.utk.edu)|160.36.131.221|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5106097 (4.9M) [application/x-gzip]
Saving to: ‘magma-1.6.1.tar.gz’
```

Now we can take a look at the various Makefiles. Poking around a bit, it looks like the top-level Makefile includes Makefile.internal, which has a bunch of variable definitions and pattern rules. Makefile.internal in turn includes make.inc, which varies depending on the BLAS library used; e.g., make.inc.openblas has compilation-related variables specific to linking in OpenBLAS. The top-level Makefile invokes Makefiles in subdirectories to compile various components of the codebase.