|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: Why OCaml? |
| 4 | +permalink: why_ocaml |
| 5 | +--- |
| 6 | + |
| 7 | +BAP is written in OCaml, and I often get asked why. Often the person |
| 8 | +asking has good reasons to prefer another language. For example, |
| 9 | +Python has more libraries and is quicker to prototype, C is faster, |
| 10 | +and so on. |
| 11 | + |
| 12 | +We do not take this question lightly. BAP has undergone several |
| 13 | +significant rewrites, and for the last three iterations we have chosen |
| 14 | +OCaml. To understand why, first a little history. |
| 15 | + |
| 16 | +## Compilers at CMU |
| 17 | + |
| 18 | +In the mid-2000s I (David Brumley) was a TA for the CMU undergraduate |
| 19 | +compiler course (course number: 15-411). Peter Lee was the instructor. |
| 20 | +(Peter is a former CMU Computer Science Department Head, now a Vice |
| 21 | +President at Microsoft, and one of the co-inventors of Proof Carrying |
| 22 | +Code.) We used the excellent Appel book for compilers, which comes in |
| 23 | +three forms: one for C, one for Java, and one for SML. |
| 24 | + |
| 25 | +At the beginning of the semester Peter told all the students that in his |
| 26 | +experience, final grades historically roughly corresponded to the |
| 27 | +language chosen. He also emphasized it didn't matter what language you |
| 28 | +were familiar with now: he had seen many people who had never used ML |
| 29 | +before write the course compiler in SML, and do quite well. |
| 30 | + |
| 31 | +At the end of the semester, Peter's observations seemed quite |
| 32 | +true. Students who programmed in C received a C or lower (often |
| 33 | +fighting with hard-to-debug memory management issues that led to |
| 34 | +crashing compilers), students in Java received a B (lots of template |
| 35 | +code, unexpected NULL exceptions), and students who wrote in ML got |
| 36 | +As. I do not remember whether this was true in every case, but it was |
| 37 | +certainly a strong enough sign that I remember it. |
| 38 | + |
| 39 | +## OCaml for Binary Analysis |
| 40 | + |
| 41 | +As a graduate student, I worked with Dawn Song and created Vine, the |
| 42 | +static analysis part of [BitBlaze](http://bitblaze.eecs.berkeley.edu). |
| 43 | +That architecture underwent three main revisions. The first was in C, then |
| 44 | +C++, and then OCaml. Just like in compilers, I found C memory safety |
| 45 | +was hard to get right, my C++ relied on a huge amount of template code |
| 46 | +(we tried using the visitor design pattern as specified in the Appel |
| 47 | +compiler book), while OCaml seemed to be a sweet spot. |
| 48 | + |
| 49 | +### Win: Static Type Safety |
| 50 | + |
| 51 | +Others have spoken at length about the benefits of type safety. |
| 52 | + |
| 53 | + |
| 54 | +Type safety is a big deal, not just for correctness but also because |
| 55 | +it makes programs easier to debug. |
| 56 | + |
| 57 | +### Win: Pattern Matching |
| 58 | +Beyond the well-discussed benefits of type safety covered elsewhere |
| 59 | +(Hello [Jane Street](http://janestreet.com)), the ability to |
| 60 | +**pattern match** ended up being a huge advantage. |
| 61 | + |
| 62 | +Compilers, and most binary analysis platforms, desugar assembly |
| 63 | +statements into an intermediate representation (IR). The majority of |
| 64 | +code analysis is a pattern match on IR statements. For example, in |
| 65 | +def-use analysis you first match on assignment statements, and then |
| 66 | +treat the variable on the left-hand side as a definition and every |
| 67 | +variable on the right-hand side as a use. Pattern matching rules. |
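
To make the def-use example concrete, here is a minimal sketch in OCaml. The `exp` and `stmt` types are a toy IR invented for illustration; they are assumptions, not BAP's or Vine's actual IR, and the names `uses_of_exp` and `def_use` are likewise hypothetical.

```ocaml
(* A toy IR, hypothetical and far simpler than a real binary analysis IR. *)
type exp =
  | Var of string                 (* a register or temporary *)
  | Int of int                    (* an integer constant *)
  | BinOp of string * exp * exp   (* operator name and two operands *)
  | Load of exp                   (* memory read at an address *)

type stmt =
  | Assign of string * exp        (* x := e *)
  | Store of exp * exp            (* mem[addr] := value *)
  | Jmp of exp                    (* jump to a computed target *)

(* Collect every variable read by an expression, in left-to-right order. *)
let rec uses_of_exp = function
  | Var v -> [v]
  | Int _ -> []
  | BinOp (_, a, b) -> uses_of_exp a @ uses_of_exp b
  | Load addr -> uses_of_exp addr

(* Return (definitions, uses) for a single statement. Only an assignment
   defines a variable; every statement may use some. *)
let def_use = function
  | Assign (lhs, rhs) -> ([lhs], uses_of_exp rhs)
  | Store (addr, value) -> ([], uses_of_exp addr @ uses_of_exp value)
  | Jmp target -> ([], uses_of_exp target)

let () =
  (* x := y + mem[z] *)
  let s = Assign ("x", BinOp ("+", Var "y", Load (Var "z"))) in
  let defs, uses = def_use s in
  Printf.printf "defs: %s; uses: %s\n"
    (String.concat ", " defs) (String.concat ", " uses)
```

Each analysis case is a single line of the `match`, and the compiler warns if a new statement form is added to `stmt` without a corresponding case in `def_use`.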