<!DOCTYPE html>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="initial-scale=1, maximum-scale=1">
<link href=",700,900%7CRoboto+Mono" rel="stylesheet">
<link rel="stylesheet" href="//">
<script src=""></script>
<script src=""></script>
<link rel="stylesheet" href="">
<script src=""></script>
<script src=""></script>
<script src=""></script>
<link rel="stylesheet" href="">
<script src=""></script>
<style type="text/css">
body {
font-family: 'Roboto', sans-serif;
code {
font-family: 'Roboto Mono', monospace;
header pre {
border:2px solid;
border-color: firebrick;
text-align: center;
header code {
ul {
list-style-type: square;
a {
a:hover {
.token.operator {
nav button {
box-shadow: 0 0 0 1px rgba(0,0,0,.15) inset,0 0 6px rgba(0,0,0,.2) inset;
font-family: inherit;
font-size: 100%;
padding: .5em 1em;
color: #444;
border: 1px solid #999;
border: 0 rgba(0,0,0,0);
background-color: #E6E6E6;
text-decoration: none;
border-radius: 2px;
nav.hide ul {
nav ul {
list-style-type: none;
nav li:before {
content: "- ";
figure {
text-align: center;
font-style: italic;
font-size: smaller;
text-indent: 0;
border: thin silver solid;
box-shadow: 1px 1px 3px black;
margin: 0.5em;
padding: 0.5em;
.clear { clear:both;}
<title>Capturing Packets with Scala Native and libpcap</title>
<meta name="twitter:title" content="Capturing Packets with Scala Native and libpcap">
<meta property="og:title" content="Capturing Packets with Scala Native and libpcap">
<meta property="og:url" content="/scala-native-libpcap/">
<link rel="canonical" href="/scala-native-libpcap/">
<!-- -->
<meta property="og:type" content="article">
<meta property="article:published_time" content="2017-03-26">
<meta property="article:modified_time" content="2017-03-26">
<meta property="og:description" content="William Narmontas takes you through how to use Scala Native and libpcap to capture and analyse network packets. Scala Native compiles Scala to native code.">
<meta itemprop="description" content="William Narmontas takes you through how to use Scala Native and libpcap to capture and analyse network packets. Scala Native compiles Scala to native code.">
<meta name="description" content="William Narmontas takes you through how to use Scala Native and libpcap to capture and analyse network packets. Scala Native compiles Scala to native code.">
<meta name="twitter:description" content="William Narmontas takes you through how to use Scala Native and libpcap to capture and analyse network packets. Scala Native compiles Scala to native code.">
<meta property="og:site_name" content="Scala William">
<link rel="author" href="">
<link rel="publisher" href="">
<meta itemprop="name" content="Capturing Packets with Scala Native and libpcap">
<meta property="og:image" content="">
<meta itemprop="image" content="">
<meta name="twitter:image" content="">
<meta name="author" content="William Narmontas">
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@ScalaWilliam">
<body class="h-entry">
<h1 class="p-name">Capturing Packets with Scala Native and libpcap</h1>
<h2>By <a href="/" class="u-author">William Narmontas</a>, <a href="/scala-native-libpcap/" class="u-url"><time class="dt-published" datetime="2017-03-26">March 2017</time></a></h2>
<p>For the source code, see the Scala application (updated July 2017, March 2018): <a href="">ScalaWilliam/scala-native-libpcap @ GitHub</a> (<a href="">PcapExample.scala</a> in particular).</p>
<nav id="contents">
<button onclick="toggle_contents();">Toggle Index / Table of Contents</button>
<li> Test 1</li>
<article class="e-content">
<section style="column-count:2">
<p><a href="">libpcap</a> (also: <a href="">pcap</a>) is a <a href="">network traffic packet capture library</a> that enables real-time and offline packet capture and analysis. Packet capture and analysis have <a href="">many use cases</a>.</p>
<p><a href="">Scala Native</a> is an ahead-of-time compiler for <a href="">Scala</a> targeting <a href="">LLVM</a> and so capable of producing native binaries. This brings the promise of high performance coding using existing Scala skills and high quality tooling such as <a href="">SBT (Scala Build Tool)</a> and <a href="">ScalaTest</a> as well as availability of patterns like <a href="">type classes</a>.</p>
<p>Scala, which runs on the JVM, can interact with native libraries in two ways: <a href="">JNA (Java Native Access)</a>
and <a href="">JNI (Java Native Interface)</a>. <a href="">JNA is slower but easier to use than JNI</a>.
For JNA you need nothing more <a href="">than a dependency</a>, but for JNI you need <a href="">to write native code</a>. When doing so in Scala, you can benefit from the <a href="">sbt-jni plugin</a>, which automates this compilation. Scala Native's interop is similar to JNA.</p>
<p><a href="">Pcap4j</a> is an actively maintained library that wraps libpcap using JNA. Then there is <a href="">jNetPcap</a>, which appears inactive as a project and <a href="">uses JNI</a> and <a href="">ByteBuffer</a>. A last way would be to use JNI with <a href="">Unsafe</a> <a href="">for the highest performance</a>. The performance difference is huge. There may be other, even higher performance ways, but they are beyond the scope of this article. If you have ideas, do let me know <a href="">on Twitter</a>.</p>
<p>Packets can be captured in live mode using <a href="">tcpdump</a>, replayed with <a href="">tcpreplay</a> and visually analysed with <a href="">Wireshark</a>. libpcap supports live capture and reading from files.</p>
<p><img src="pcap-flow.svg"></p>
<figcaption>libpcap flow involving data copy from kernel to user space</figcaption>
<p>In live capture mode, the kernel looks for the next packet at the <code>pcap_next</code> call, passes it through any defined filters, and then copies the data into user space.</p>
<p>There are <a href="">solutions</a> <a href="">for a pure zero-copy</a> approach, but they are beyond the scope of this article.</p>
<h2>Why this interests me</h2>
<li>A client needed a high performance online packet analyser for the binary-encoded <a href="">GPRS Tunnelling Protocol (GTP)</a> which runs over UDP (see the <a href="">GTPv2 specification</a> &#8212; large PDF!) and contains cell tower identifiers for 4G mobile subscribers. I implemented one solution with Scala &amp; libpcap.</li>
<li>Scala <a href="">projects</a> <a href="">of mine</a> needed a native (JNA) interface layer to <a href="">ENet, "Reliable UDP networking library"</a> and lots of binary parsing.</li>
<li>I was researching ways of achieving high performance data processing, including not only JVM Unsafe but also memory mapping and ring buffers.</li>
<li>I'm a big Scala fan, having worked with it professionally and non-commercially since 2013.</li>
<h2>Developing a program</h2>
<p>To showcase what's possible, we're going to develop a simple libpcap program using Scala Native.</p>
<p>The goal of the program is to output packet information from files and also from a live interface. The minimal packet information is:
timestamp, source IP, destination IP, packet length, and a few bytes of the packet in hexadecimal.</p>
<p>Note that functional and pure programming are not in scope of this article.</p>
<li>The <a href="">Scala Native documentation</a> (<a href="">PDF</a>)</li>
<li>libpcap manpages, <a href=""></a> and <a href=""></a></li>
<li><a href="">libpcap sources</a></li>
<h3>Preparing some reference data</h3>
<p>The reference data is simply a pcap file, which we can produce with tcpdump:</p>
<pre class="language-bash"><code>$ tcpdump -i [interface] -w sample.pcap</code></pre>
<p>Capture some packets, maybe do a <a href="">speed test</a>, then terminate tcpdump. You now have a pcap file which you can consume later.</p>
<p>Use <code>capinfos</code> to get basic information about your pcap file. You can also look at it visually with Wireshark.</p>
<p><img src="wireshark.png"></p>
<figcaption>How a packet from sample.pcap looks in Wireshark</figcaption>
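<p>If you are curious what <code>capinfos</code> and Wireshark read first, here is a minimal sketch, in plain JVM Scala rather than Scala Native, that parses the 24-byte global header at the start of a classic pcap file. The names <code>PcapFileHeader</code> and <code>GlobalHeader</code> are hypothetical and made up for this illustration. The sketch assumes the classic microsecond format, so pcapng files and the nanosecond magic number are not handled.</p>
<pre class="language-scala"><code>import java.nio.{ByteBuffer, ByteOrder}

object PcapFileHeader {

  /** A subset of the fields in the 24-byte header of a classic pcap file. */
  final case class GlobalHeader(versionMajor: Int,
                                versionMinor: Int,
                                snapLen: Long,
                                linkType: Long)

  /** Returns None when the input is too short or the magic number is unknown. */
  def parse(bytes: Array[Byte]): Option[GlobalHeader] =
    if (bytes.length &lt; 24) None
    else {
      val buffer = ByteBuffer.wrap(bytes)
      // The magic number tells us which byte order the writer used.
      val byteOrder = buffer.order(ByteOrder.BIG_ENDIAN).getInt(0) match {
        case 0xa1b2c3d4 =&gt; Some(ByteOrder.BIG_ENDIAN)
        case 0xd4c3b2a1 =&gt; Some(ByteOrder.LITTLE_ENDIAN)
        case _          =&gt; None
      }
      byteOrder.map { order =&gt;
        buffer.order(order)
        // Layout: magic(4) major(2) minor(2) thiszone(4) sigfigs(4) snaplen(4) network(4)
        GlobalHeader(
          versionMajor = buffer.getShort(4) &amp; 0xffff,
          versionMinor = buffer.getShort(6) &amp; 0xffff,
          snapLen = buffer.getInt(16) &amp; 0xffffffffL,
          linkType = buffer.getInt(20) &amp; 0xffffffffL
        )
      }
    }
}</code></pre>
<p>Feeding it the bytes of <code>sample.pcap</code> should report link type 1 for an Ethernet capture and 113 for a Linux cooked capture, which becomes relevant further down.</p>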
<p>I use SBT to <a href="">continuously run the program</a> using <a href="">Triggered Execution</a>.</p>
<p>I use Docker and <a href="">ScalaWilliam/scala-native-sbt Docker image</a> to get an isolated Linux execution environment:</p>
<pre class="language-bash"><code>$ docker run -v $PWD:/workspace -w /workspace -it <a href="">scalawilliam/scala-native-sbt</a>
root@0c957f870d61:/workspace# apt-get -y install libpcap-dev</code></pre>
<p>And of course IntelliJ IDEA for its <a href="">excellent Scala support</a>.</p>
<h3>Minimal required application flow</h3>
<h4>Packet-reading flows</h4>
<figure style="float:right">
<p><img src="pcap-offline-app-flow.svg"></p>
<figcaption>libpcap offline flow</figcaption>
<figure style="float:right">
<p><img src="pcap-live-app-flow.svg"></p>
<figcaption>libpcap live flow</figcaption>
<p>We shall combine two flows into one app: Live and Offline (File).</p>
<h4>Processing the packet</h4>
<p><code>tshark</code> can give you this output already, but we're not interested in replacing <code>tshark</code>.</p>
<li>Read timestamp</li>
<li>Read packet length</li>
<li>Determine whether packet is IPv4</li>
<li>Read source IP</li>
<li>Read destination IP</li>
<li>Read some bytes of data</li>
<li>... and at every step of the way, bound-check</li>
<div class="clear"></div>
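<p>The steps above can be sketched on the JVM with a plain byte array standing in for the native pointer. This is not the code from the repository: <code>PacketSummary</code>, <code>summarise</code> and the <code>ipOffset</code> parameter are hypothetical names, and the sketch assumes an IPv4 header without looking at its options. The timestamp and length come from the pcap packet header, so they are left out here.</p>
<pre class="language-scala"><code>object PacketSummary {

  final case class Summary(sourceIp: Int, destinationIp: Int, firstBytesHex: String)

  /** Reads a big-endian 32-bit value; callers must bound-check first. */
  private def readInt(data: Array[Byte], at: Int): Int =
    ((data(at) &amp; 0xff) &lt;&lt; 24) | ((data(at + 1) &amp; 0xff) &lt;&lt; 16) |
      ((data(at + 2) &amp; 0xff) &lt;&lt; 8) | (data(at + 3) &amp; 0xff)

  /**
    * @param ipOffset hypothetical parameter: where the IP header starts in the frame
    * @return None when the frame is too short or is not IPv4
    */
  def summarise(data: Array[Byte], dataLength: Int, ipOffset: Int): Option[Summary] = {
    // Never trust the reported length beyond what we actually hold.
    val length = math.min(dataLength, data.length)
    // A minimal IPv4 header is 20 bytes; the addresses occupy bytes 12 to 19 of it.
    val hasEnoughData = ipOffset &gt;= 0 &amp;&amp; length &gt;= ipOffset + 20
    // The IP version lives in the high nibble of the first header byte.
    def isIpv4 = ((data(ipOffset) &amp; 0xff) &gt;&gt; 4) == 4
    if (hasEnoughData &amp;&amp; isIpv4)
      Some(
        Summary(
          sourceIp = readInt(data, ipOffset + 12),
          destinationIp = readInt(data, ipOffset + 16),
          // Mask each byte so that values above 0x7F do not print as negative numbers.
          firstBytesHex = data.take(math.min(length, 12)).map(b =&gt; f"${b &amp; 0xff}%02X").mkString
        ))
    else None
  }
}</code></pre>
<p>Returning an <code>Option</code> keeps every bound check in one place: a short or non-IPv4 frame simply yields <code>None</code> instead of reading past the end of the buffer.</p>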
<h3>Minimal native mapping</h3>
<figure style="float:right">
<p><img src="pcap-header.svg"></p>
<figcaption>It's the same in memory and in storage</figcaption>
<p>In order to call native methods we need some sort of interface definition. It's similar to defining a C header file which is then <code>#include</code>'d.</p>
<p>This was not particularly difficult to achieve with the resources above, and will be obvious to anyone who has done some C.</p>
<div class="clear"></div>
<pre class="language-scala"><code data-source="src/main/scala/pcap.scala" data-from="9" data-to="36">import scala.scalanative.native
import scala.scalanative.native._

@native.link("pcap")
@native.extern
object pcap {

  /** This is just a pointer for us, we don't care what is inside **/
  type pcap_handle = native.Ptr[Unit]

  // Assumed field types, mirroring struct pcap_pkthdr on 64-bit Linux:
  // tv_sec, tv_usec, caplen, len
  type pcap_pkthdr = native.CStruct4[native.CUnsignedLong,
                                     native.CUnsignedLong,
                                     native.CUnsignedInt,
                                     native.CUnsignedInt]

  def pcap_open_live(deviceName: CString,
                     snapLen: CInt,
                     promisc: CInt,
                     to_ms: CInt,
                     errbuf: CString): pcap_handle =
    native.extern

  def pcap_open_offline(fname: CString, errbuf: CString): pcap_handle =
    native.extern

  def pcap_next(p: native.Ptr[Unit],
                h: native.Ptr[pcap_pkthdr]): native.CString = native.extern

  def pcap_close(p: native.Ptr[Unit]): Unit = native.extern
}</code></pre>
<h3>Minimal code for opening a pcap handle</h3>
<p>Here we'll capture from <code>any</code> interface by default.</p>
<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="82" data-to="93">val pcapHandle = if (live) {
  pcap.pcap_open_live(
    deviceName = c"any",
    snapLen = Short.MaxValue,
    promisc = 0,
    to_ms = 10,
    errbuf = errorBuffer
  )
} else {
  pcap.pcap_open_offline(fname = toCString(args.last),
                         errbuf = errorBuffer)
}</code></pre>
<h3>Minimal code for continuously reading the handle</h3>
<p>At this point, I was getting closer to pointers and the like, and if I did something wrong, I'd get a segfault with exit code 139 (128 plus signal 11, SIGSEGV). Still dislike Java exceptions and verbose stack traces?</p>
<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="98" data-to="116">val packetHeaderPointer: native.Ptr[pcap.pcap_pkthdr] =
  native.stackalloc[pcap.pcap_pkthdr] // assumption: a stack-allocated header
var packetReadData = pcap.pcap_next(pcapHandle, packetHeaderPointer)
var continue = true
while (continue) {
  if (packetReadData != null) {
    process_packet(
      epochSecond = (!packetHeaderPointer._1).toLong,
      dataLength = (!packetHeaderPointer._3).toInt,
      data = packetReadData,
      cooked = cooked
    )
  } else if (!live) {
    // In a file, null means no more packets; in live mode it may only mean a timeout.
    continue = false
  }
  if (continue) {
    packetReadData = pcap.pcap_next(pcapHandle, packetHeaderPointer)
  }
}</code></pre>
<h3>Processing an individual packet</h3>
<p>At this point we have extracted the key information and pass it, together with a C-style string (pointer) to the packet data, to the method.</p>
<p>Note that this C-string is NOT a <a href="">null-terminated string</a> because packets may contain the byte 0x00 anywhere.
So you have to rely on input length to manipulate the incoming data.</p>
<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="25" data-to="34">/**
  * We have a separate processing function to separate out the plumbing.
  * @param data remember this is a pointer! But note that it may contain byte 0x00
  *             which is typically a string termination character - so we must pass dataLength explicitly.
  */
def process_packet(epochSecond: Long,
                   dataLength: Int,
                   data: CString,
                   cooked: Boolean): Unit = {</code></pre>
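<p>To see why the explicit length matters, here is a small JVM-side sketch with a byte array standing in for the pointer. <code>PayloadLength</code> and its two methods are hypothetical names for illustration. The second byte of an IPv4 header is commonly 0x00, so a reader that stops at the first zero would see almost nothing.</p>
<pre class="language-scala"><code>object PayloadLength {

  /** What a C-string reader would see: everything before the first 0x00. */
  def asCString(data: Array[Byte]): Array[Byte] = data.takeWhile(_ != 0)

  /** What we want: exactly the number of bytes that libpcap reported. */
  def withLength(data: Array[Byte], dataLength: Int): Array[Byte] = data.take(dataLength)
}</code></pre>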
<h3>Linux Cooked Capture</h3>
<p>If we're capturing on Linux, consider <a href="">Linux cooked-mode capture (SLL)</a> which <a href="">can be</a> <a href="">confusing</a>.</p>
<p>In cooked mode, the link-layer header is 16 bytes long instead of Ethernet's 14, so there are 2 extra bytes at the front of the packet.</p>
<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="36" data-to="36">val offsetBytes = if (cooked) 2 else 0</code></pre>
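<p>The offset constants used in the following snippets are not shown in this article. As a sketch, and assuming Ethernet framing with a plain IPv4 header, they could be derived as below. The values are my reading of the header layouts, not copied from the repository, and <code>Offsets</code>, <code>EthernetHeaderLength</code> and <code>LinuxCookedHeaderLength</code> are hypothetical names.</p>
<pre class="language-scala"><code>object Offsets {
  // Assumed values, derived from the Ethernet, SLL and IPv4 header layouts.
  val EthernetHeaderLength = 14    // two 6-byte MAC addresses plus a 2-byte EtherType
  val LinuxCookedHeaderLength = 16 // the SLL pseudo-header

  // The IP header starts right after the link-layer header.
  val IpVersionByteOffset = EthernetHeaderLength
  // Within the IPv4 header, the source address is at byte 12 and the destination at byte 16.
  val PcapSourceIpv4AddressOffset = EthernetHeaderLength + 12
  val PcapDestinationIpv4AddressOffset = EthernetHeaderLength + 16

  /** Extra bytes to skip when the capture is in cooked mode. */
  def offsetBytes(cooked: Boolean): Int =
    if (cooked) LinuxCookedHeaderLength - EthernetHeaderLength else 0
}</code></pre>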
<h3>Check bounds and verify it's IPv4</h3>
<p>Dealing with IPv6 and others is a different matter. Here we start incrementing pointers, in a pure manner, mind you.</p>
<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="38" data-to="43">val hasEnoughData = dataLength &gt; (offsetBytes + PcapDestinationIpv4AddressOffset + 4)
if (!hasEnoughData) return
/** IP version is stored in the first nibble of the target byte **/
val isIpv4 = (!(data + IpVersionByteOffset + offsetBytes) &gt;&gt; 4) == 4
if (!isIpv4) return</code></pre>
<h3>Read source and destination IPs</h3>
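<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="51" data-to="52">// Read the four address bytes as one unsigned int (the cast is assumed here)
val ip = !(data + PcapSourceIpv4AddressOffset + offsetBytes).cast[native.Ptr[native.CUnsignedInt]]</code></pre>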
<h3>Make an IP human readable</h3>
<p>This was one thing that was easier in native land than in JVM land. As far as I'm aware there is no <a href=""><code>inet_ntoa</code></a> in the JVM to convert an IP address from Int into text form. This was easily achievable with a native binding.</p>
<pre class="language-scala"><code data-source="src/main/scala/inet.scala" data-from="6" data-to="15">/**
  * We use this to avoid our own byte manipulation.
  * Ironically I have to do this with bytes in Java, so scala-native is already proving itself!
  */
@native.extern
object inet {
  def inet_ntoa(input: CUnsignedInt): native.CString = native.extern
}</code></pre>
<p>The usage is super simple:</p>
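<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="50" data-to="54">val sourceIp = {
  val ip = !(data + PcapSourceIpv4AddressOffset + offsetBytes).cast[native.Ptr[native.CUnsignedInt]] // cast assumed
  native.fromCString(inet.inet_ntoa(ip))
}</code></pre>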
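<p>For comparison, this is roughly the byte manipulation that the JVM side needs. It is a minimal sketch, assuming the address has been packed into an <code>Int</code> most significant byte first, which is the order the bytes appear in the packet. <code>Ipv4Text</code> and <code>dottedQuad</code> are hypothetical names.</p>
<pre class="language-scala"><code>object Ipv4Text {

  /** Formats an IPv4 address packed most significant byte first, such as 0xC0A80102. */
  def dottedQuad(ip: Int): String =
    Seq(24, 16, 8, 0)
      .map(shift =&gt; (ip &gt;&gt;&gt; shift) &amp; 0xff) // unsigned shift, then keep one byte
      .mkString(".")
}</code></pre>
<p>If you hold the four raw bytes instead of an <code>Int</code>, <code>java.net.InetAddress.getByAddress(bytes).getHostAddress</code> produces the same text.</p>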
<h3>Printing the packet summary</h3>
<p>One line, one packet - with some data bytes in hex.</p>
<pre class="language-scala"><code data-source="src/main/scala/PcapExample.scala" data-from="60" data-to="68">print(s"Time: $epochSecond, $sourceIp --&gt; $destIp, $dataLength bytes: [")
(0 to Math.min(dataLength, 12))
  .map { n =&gt;
    !(data + offsetBytes + n)
  }
  .foreach { v =&gt;
    native.stdio.printf(c"%02X", v)
  }
println("...]") // closing text assumed from the sample output below</code></pre>
<h2>Running for yourself</h2>
<p>You can reproduce this yourself.</p>
<p>Let's assume you've already started the Docker container as described earlier in the article, and produced a sample pcap file.
After cloning the <a href=""><code>ScalaWilliam/scala-native-libpcap</code></a> repository, build and run it:
</p><pre class="language-bash"><code>root@0c957f870d61:/workspace/scala-native-libpcap# sbt clean 'show nativeLink'
[info] /workspace/scala-native-libpcap/target/scala-2.11/scala-native-libpcap-out
[success] Total time: 37 s, completed Mar 26, 2017 3:49:38 AM
root@0c957f870d61:/workspace/scala-native-libpcap# /workspace/scala-native-libpcap/target/scala-2.11/scala-native-libpcap-out /workspace/sample.pcap |head
Time: 1490492402, --&gt;, 790 bytes: [1C872...] &lt;-- Google
Time: 1490492402, --&gt;, 579 bytes: [38C9E...] &lt;-- Google
Time: 1490492403, --&gt;, 54 bytes: [1C872...] &lt;-- Cloudflare
Time: 1490492403, --&gt;, 64 bytes: [38C98...] &lt;-- Cloudflare</code></pre>
<h3>Running live</h3>
<p>Exercise for the reader... just read the source :-)</p>
<p>scala-native opens up a plethora of integration opportunities: you are no longer restricted to using
JVM-only libraries or waiting for wrappers to appear, and no longer restricted to using C++/C for lower level or high performance programming. You can now rapidly iterate and test your code in JVM mode and port it to native easily.</p>
<p>This opens up the possibility of native interoperation with <a href="">Python</a> and <a href="">Lua</a> - and from JVM via <a href="">Luaj</a>, <a href="">jep</a> and <a href="">Jython</a>. Good article: <a href="">Integrating Python into Scala Stack</a>.</p>
<p>You can write your application on the JVM first, knowing you can port it to native later should the JVM become the pain point. In my experience it rarely is, but the possibility is still worth having.</p>
<p>Of course there will be many other use cases, and I'd like to mention them here as well - so why not <a href="">Tweet me</a> about them?</p>
<p>Make sure to watch <a href="">Denys Shabalin</a>'s <a href="">Scala Days</a> talk <a href="">"Scala Goes Native"</a> (<a href="">slides</a>).</p>
<p>We managed to read offline and online packets using a native packet capture library and Scala.</p>
<p>There were no major difficulties in doing so, which suggests that scala-native is a viable platform for native applications, and that existing JVM development teams should strongly consider Scala.</p>
<p>I also came across some interesting (<a href=";rep=rep1&amp;type=pdf">PDF</a>) reading about packet capture. And learned that <a href="">ifconfig is missing from latest Debian</a>!</p>
<section id="copy">
<p>Big thanks for free graphing software <a href=""></a>.</p>
<p>&#169; William Narmontas.</p>
<h3>My other articles</h3>
<li><a href="">Most important streaming abstraction</a></li>
<li><a href="">Limit degrees of freedom in development</a></li>
<li><a href="">Essential SBT</a></li>
<li><a href="" target="_blank">Feature Switches, Inheritance and Agile with Scala &amp; JMX on the JVM</a></li>
<script type="text/javascript">
var ul = document.querySelector("#contents ul");
while ( ul.firstChild ) {
[].forEach.call(document.querySelectorAll("article h2, article h3"),function(item) {
var title = item.textContent;
var id = item.getAttribute("id");
if ( !id ) {
id = title.toLowerCase()
.replace(/[^a-z]/g, "-")
item.setAttribute("id", id);
var li = document.createElement("li");
var a = document.createElement("a");
a.setAttribute("href", "#" + id);
var a2 = document.createElement("a");
a2.setAttribute("href", "#" + id);
while(item.firstChild) {
<script type="text/javascript">
function toggle(cnt){
if ( cnt.classList.contains("hide") ) {
} else {
function toggle_contents(){
return toggle(document.querySelector("#contents"));