An example of how to use tcpdump. Let's install some useful tools first.

In [1]:
!apt-get install net-tools
!apt-get install tcpdump
!apt-get install iputils-ping
!apt-get install dnsutils
!apt-get install tshark
!apt-get install curl

!pip install --pre scapy[basic]
!pip install nest_asyncio
!pip install pyshark

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  net-tools
0 upgraded, 1 newly installed, 0 to remove and 19 not upgraded.
Need to get 196 kB of archives.
After this operation, 864 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/main amd64 net-tools amd64 1.60+git20180626.aebd88e-1ubuntu1 [196 kB]
Fetched 196 kB in 1s (219 kB/s)
Selecting previously unselected package net-tools.
(Reading database ... 128221 files and directories currently installed.)
Preparing to unpack .../net-tools_1.60+git20180626.aebd88e-1ubuntu1_amd64.deb ...
Unpacking net-tools (1.60+git20180626.aebd88e-1ubuntu1) ...
Setting up net-tools (1.60+git20180626.aebd88e-1ubuntu1) ...
Processing triggers for man-db (2.9.1-1) ...
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libpcap0.8
S

Let's see what network interfaces are available on the virtual machine

In [None]:
!ifconfig

It seems that only one Ethernet interface (eth0) is available; lo is the loopback interface. Let's start a tcpdump capture and see what happens.

In [None]:
!tcpdump -i eth0

Kind of a mess, right? And we are on a VM doing basically nothing (imagine what happens when you monitor a working connected system). This gives you an idea of the massive amount of data to be analyzed with passive monitoring approaches. 

Luckily, we can use packet filters to reduce the number of packets captured. As an example, let's see if there is any UDP traffic...

In [None]:
!tcpdump -i eth0 udp
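
To give an idea of what a filter such as udp checks, here is a minimal sketch in Python, assuming a raw IPv4 header with no link-layer framing in front of it. The helpers `ip_protocol` and `build_header` and the sample addresses are made up for illustration; tcpdump itself compiles its filter expressions into BPF programs and does not run anything like this code.

```python
import struct

# IANA protocol numbers carried in the IPv4 "Protocol" header field.
PROTOCOLS = {1: "icmp", 6: "tcp", 17: "udp"}

def ip_protocol(ipv4_header: bytes) -> str:
    """Return the transport protocol name found in a raw IPv4 header."""
    # The Protocol field is the single byte at offset 9 of the IPv4 header.
    (proto,) = struct.unpack_from("!B", ipv4_header, 9)
    return PROTOCOLS.get(proto, "other")

def build_header(proto: int) -> bytes:
    """Build a hypothetical 20-byte IPv4 header (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                      # version 4, header length 5 words
        0,                         # DSCP/ECN
        20,                        # total length (header only)
        0,                         # identification
        0x4000,                    # flags: don't fragment
        64,                        # TTL
        proto,                     # protocol number
        0,                         # checksum (not computed in this sketch)
        bytes([172, 28, 0, 12]),   # source address
        bytes([172, 28, 0, 1]),    # destination address
    )

# Keep only the headers that a "udp" filter would let through.
headers = [build_header(17), build_header(6), build_header(1)]
udp_only = [h for h in headers if ip_protocol(h) == "udp"]
print(len(udp_only))
```

The same idea extends to the other primitives used in this notebook: icmp and tcp simply match different values of the same header byte.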

It seems that there is no significant UDP traffic! Let's try to produce some: we'll ask a DNS server for the address of a website (e.g., www.gazzetta.it).

Some tricks used here:


*   I use dig <hostname> to issue a DNS query and retrieve the IP address. Note that DNS queries normally use UDP on port 53.
*   To launch tcpdump and dig at the same time, I use the & operator (run in background). To make sure the capture is already running, I delay the DNS query by 5 seconds with the sleep command.
*   I use the -nn option so that tcpdump does not resolve addresses and port numbers into names.
*   I redirect the output of dig to /dev/null for a cleaner output (we'll see only the tcpdump output).





In [None]:
!tcpdump -i eth0 -nn udp & (sleep 5; dig www.gazzetta.it > /dev/null)
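
To see what travels inside those UDP packets, the following sketch builds a bare DNS query for an A record using only the Python standard library. The function name `build_dns_query` and the query ID 0x1234 are assumptions chosen for illustration, and the result is a minimal query, not a byte-for-byte copy of what dig sends (dig may set additional flags and options).

```python
import struct

def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message for an A record."""
    # Header: ID, flags (only RD = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN, Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

query = build_dns_query("www.gazzetta.it")
print(len(query), query.hex())
```

Sending these bytes in a UDP datagram to port 53 of a resolver would produce the kind of traffic captured by the udp filter.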

OK! Now an exercise for you. Let's try to capture some ICMP traffic. 

In [None]:
!tcpdump -i eth0 -nn icmp & (sleep 5; ping www.gazzetta.it -c 10 > /dev/null)
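
For reference, the echo requests that ping sends are small ICMP messages protected by the Internet checksum. Below is a minimal sketch of how such a message could be assembled; the function names, the identifier, the sequence number, and the payload are hypothetical, and actually sending it would need a raw socket (and usually root privileges), which is left out here.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"          # pad to an even number of bytes
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:           # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    # The checksum is computed over the message with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
# Recomputing the checksum over a valid message yields zero.
print(internet_checksum(packet) == 0)
```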

Now let's try some web browsing (I'll use curl). I'll visit the FIRST website ever created.


In [None]:
!curl http://info.cern.ch/

And let's see what's behind it. First I'll retrieve the IP address of the CERN server. Then I'll run tcpdump with a filter that keeps only TCP traffic to or from that server.

In [None]:
!dig info.cern.ch

In [None]:
!tcpdump -i eth0 -nn 'tcp and host 188.184.21.108' & (sleep 5; curl -s 'http://info.cern.ch/' > /dev/null)

We can also ask tcpdump to print the contents of each packet in hex and ASCII with the -X option (we're actually doing deep packet inspection here!).


In [None]:
!tcpdump -i eth0 -nnX 'tcp and host 188.184.21.108' & (sleep 5; curl -s 'http://info.cern.ch/' > /dev/null)
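
The hex-plus-ASCII view is easy to reproduce in Python, which helps when reading the tcpdump output. This is only a sketch: the function `hex_dump` is made up for illustration, and its layout approximates rather than replicates the exact column format tcpdump prints.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex pairs, and printable ASCII, `width` bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        # Non-printable bytes are shown as dots, as hex dump tools usually do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"0x{offset:04x}:  {hex_part}  {ascii_part}")
    return "\n".join(rows)

# The start of a plain HTTP request like the one curl sends.
print(hex_dump(b"GET / HTTP/1.1\r\nHost: info.cern.ch\r\n\r\n"))
```

Because HTTP is a text protocol sent unencrypted here, the request line and headers are readable directly in the ASCII column.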

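To write the captured packets to a pcap file, we can use the -w option. With -w, tcpdump saves the raw packets instead of printing them, so -X has no visible effect here.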

In [None]:
!tcpdump -i eth0 -nnX 'tcp and host 188.184.21.108' -w 'http_capture.pcap' & (sleep 5; curl -s 'http://info.cern.ch/' > /dev/null)
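
A classic pcap file starts with a 24-byte global header, followed by one record per packet. The sketch below parses that header with the standard library only. It assumes the classic microsecond-resolution pcap format (not pcapng); the function name `parse_pcap_header` and the sample header are hypothetical, built in memory so the example does not depend on the capture file.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # magic number of microsecond-resolution pcap files

def parse_pcap_header(raw: bytes) -> dict:
    """Parse the 24-byte global header that starts a classic pcap file."""
    # Try both byte orders: the magic number reveals which one the writer used.
    for order in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
            order + "IHHiIII", raw[:24]
        )
        if magic == PCAP_MAGIC:
            return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}
    raise ValueError("not a classic microsecond-resolution pcap header")

# Hypothetical header: format version 2.4, snapshot length 262144 bytes,
# link type 1 (Ethernet).
sample = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 262144, 1)
print(parse_pcap_header(sample))
```

With a real capture, the first 24 bytes read from the file in binary mode could be passed to the same function.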

Good! Now we have the 'http_capture.pcap' file in our local folder. We can open it with Python and analyze its content automatically, e.g., via scapy!

In [None]:
import os

import pyshark
import nest_asyncio
import pandas as pd

from scapy.all import *

In [None]:
cap = rdpcap('http_capture.pcap')
for packet in cap:
  packet.show()

Similarly, we can use the Python pyshark library to do the same.

In [None]:
nest_asyncio.apply()
cap = pyshark.FileCapture('http_capture.pcap')
for packet in cap:
  print(packet)
cap.close()

Packet (Length: 74)
Layer ETH:
	Destination: 02:42:dc:86:e1:4f
	Address: 02:42:dc:86:e1:4f
	.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
	.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
	Source: 02:42:ac:1c:00:0c
	.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
	.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
	Type: IPv4 (0x0800)
	Address: 02:42:ac:1c:00:0c
Layer IP:
	0100 .... = Version: 4
	.... 0101 = Header Length: 20 bytes (5)
	Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
	0000 00.. = Differentiated Services Codepoint: Default (0)
	.... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
	Total Length: 60
	Identification: 0x561a (22042)
	Flags: 0x4000, Don't fragment
	0... .... .... .... = Reserve



Packet (Length: 66)
Layer ETH:
	Destination: 02:42:dc:86:e1:4f
	Address: 02:42:dc:86:e1:4f
	.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
	.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
	Source: 02:42:ac:1c:00:0c
	.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
	.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
	Type: IPv4 (0x0800)
	Address: 02:42:ac:1c:00:0c
Layer IP:
	0100 .... = Version: 4
	.... 0101 = Header Length: 20 bytes (5)
	Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
	0000 00.. = Differentiated Services Codepoint: Default (0)
	.... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
	Total Length: 52
	Identification: 0x561d (22045)
	Flags: 0x4000, Don't fragment
	0... .... .... .... = Reserved bit: Not set
	.1.. .... .... .... = Don't fragment: Set
	..0. .... .... .... = More fragments: No

Let's look at one of pyshark's more powerful features: direct access to the fields of each protocol layer.

In [None]:
nest_asyncio.apply()
cap = pyshark.FileCapture('http_capture.pcap')
for packet in cap:
  print(f'From: {packet.ip.src}:{packet.tcp.srcport} To: {packet.ip.dst}:{packet.tcp.dstport}, {packet.length} bytes')


From: 172.28.0.12:34966 To: 188.184.21.108:80, 74 bytes
From: 188.184.21.108:80 To: 172.28.0.12:34966, 74 bytes
From: 172.28.0.12:34966 To: 188.184.21.108:80, 66 bytes
From: 172.28.0.12:34966 To: 188.184.21.108:80, 142 bytes
From: 188.184.21.108:80 To: 172.28.0.12:34966, 66 bytes
From: 188.184.21.108:80 To: 172.28.0.12:34966, 944 bytes
From: 172.28.0.12:34966 To: 188.184.21.108:80, 66 bytes
From: 188.184.21.108:80 To: 172.28.0.12:34966, 66 bytes
From: 172.28.0.12:34966 To: 188.184.21.108:80, 66 bytes
From: 188.184.21.108:80 To: 172.28.0.12:34966, 66 bytes


Ok! Now let's try an exercise. Let's see what kind of traffic the VM exchanges when idle. We'll use tcpdump to monitor, say, 5 minutes of traffic and then process the resulting pcap file automatically with pyshark.

Let's say we want to reproduce the Wireshark 'Conversations' tab, which summarizes the following information for each flow (identified by the 5-tuple: address A, address B, port A, port B, protocol):

*   ADDRESS A
*   PORT A
*   ADDRESS B
*   PORT B
*   BYTES
*   PACKETS












In [None]:
!tcpdump -G 300 -W 1 -w 'tcp_capture.pcap' -i eth0 'tcp'

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
Maximum file limit reached: 1
700 packets captured
709 packets received by filter
0 packets dropped by kernel


In [None]:
nest_asyncio.apply()
cap = pyshark.FileCapture('tcp_capture.pcap')

# Maps each flow key to its row position in the CONVERSATIONS columns.
INDEXES = {}
CONVERSATIONS = {'address_a': [], 'address_b': [], 'port_a': [], 'port_b': [], 'bytes': [], 'packets': [], 'start': [], 'duration': []}

for packet in cap:

  address_a = packet.ip.src
  address_b = packet.ip.dst
  port_a = packet.tcp.srcport
  port_b = packet.tcp.dstport
  packet_bytes = int(packet.length)
  packet_timestamp = float(packet.sniff_timestamp)

  # Flows are directional here: A->B and B->A end up in separate rows.
  key = (address_a, address_b, port_a, port_b)

  if key in INDEXES:
    # Known flow: update its counters and extend its duration.
    loc = INDEXES[key]
    CONVERSATIONS['bytes'][loc] += packet_bytes
    CONVERSATIONS['packets'][loc] += 1
    CONVERSATIONS['duration'][loc] = packet_timestamp - CONVERSATIONS['start'][loc]

  else:
    # New flow: remember its row position and add a new row.
    INDEXES[key] = len(CONVERSATIONS['packets'])
    CONVERSATIONS['address_a'].append(address_a)
    CONVERSATIONS['address_b'].append(address_b)
    CONVERSATIONS['port_a'].append(port_a)
    CONVERSATIONS['port_b'].append(port_b)
    CONVERSATIONS['bytes'].append(packet_bytes)
    CONVERSATIONS['packets'].append(1)
    CONVERSATIONS['start'].append(packet_timestamp)
    CONVERSATIONS['duration'].append(0)

cap.close()

df = pd.DataFrame(CONVERSATIONS)
display(df)


     address_a    address_b    port_a  port_b  bytes  packets  start         duration
0    172.28.0.1   172.28.0.12  60550   8080    14190  31       1.677705e+09  58.396585
1    172.28.0.12  172.28.0.1   8080    60550   8275   24       1.677705e+09  58.396530
2    172.28.0.1   172.28.0.12  60560   8080    2426   7        1.677705e+09  6.574728
3    172.28.0.12  172.28.0.1   8080    60560   1173   4        1.677705e+09  6.573628
4    172.28.0.12  172.28.0.1   6000    50258   51026  36       1.677705e+09  286.857066
...  ...          ...          ...     ...     ...    ...      ...           ...
83   172.28.0.12  172.28.0.1   8080    58406   623    5        1.677705e+09  0.292146
84   172.28.0.1   172.28.0.12  45274   8080    515    6        1.677705e+09  0.007568
85   172.28.0.12  172.28.0.1   8080    45274   1070   4        1.677705e+09  0.007531
86   172.28.0.1   172.28.0.12  45278   8080    2767   8        1.677705e+09  1.113594
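
Note that the table above lists each direction of a connection as its own row (rows 0 and 1, for instance, are the two halves of the same TCP connection), whereas Wireshark's Conversations view merges both directions into one entry. One possible way to merge them is to build a direction-independent key by sorting the two endpoints. The sketch below uses a few hypothetical packets instead of the capture file, and the helper name `conversation_key` is an assumption, not part of pyshark.

```python
from collections import defaultdict

def conversation_key(src, sport, dst, dport, proto="tcp"):
    """Return the same key for both directions of a flow."""
    # Sorting the (address, port) endpoints makes A->B and B->A identical.
    a, b = sorted([(src, int(sport)), (dst, int(dport))])
    return (a, b, proto)

# Hypothetical packets: (source, source port, destination, destination port, bytes).
packets = [
    ("172.28.0.1", "60550", "172.28.0.12", "8080", 74),
    ("172.28.0.12", "8080", "172.28.0.1", "60550", 74),
    ("172.28.0.1", "60550", "172.28.0.12", "8080", 66),
]

totals = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, sport, dst, dport, size in packets:
    entry = totals[conversation_key(src, sport, dst, dport)]
    entry["packets"] += 1
    entry["bytes"] += size

for key, entry in totals.items():
    print(key, entry)
```

In the pyshark loop, the same key could replace the directional tuple, with the ports converted to integers as done here.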
