# Automation Workshop: Analysing network captures

## I - Initializing your environment

### A) Setting up a virtual environment
(Optional but recommended)

```bash
virtualenv -p python3 venv
source venv/bin/activate
```
(Use `deactivate` to leave the virtual environment once you are done)

Alternatively, you can prefix all your `python` and `pip` commands with `./venv/bin/` (e.g. `./venv/bin/pip3 install -U pip`)



### B) Setting up Jupyter

In order to follow along on your computer:

```bash
pip3 install notebook
jupyter-notebook
```

### C) Installation of PyMISP

#### 1. Make sure the submodules are up-to-date and cloned

```bash
git submodule update --init --recursive PyMISP/
```

#### 2. Install PyMISP in editable (development) mode

```bash
cd PyMISP
pip3 install -e .
```

#### 3. To be able to use the additional PyMISP helpers

```bash
# Make sure the package required for pydeep is installed
sudo apt-get install -y libfuzzy-dev

pip3 install python-magic lief git+https://github.com/kbandla/pydeep.git
```

## II - Automate the collection of data from network captures

Network captures provide invaluable insights into network activity, enabling analysts to detect intrusions, malware communications, and other security threats. However, manually analyzing PCAP files can be extremely time-consuming, requiring the inspection of thousands—or even millions—of packets to extract relevant indicators of compromise (IOCs).

Automation is key to streamlining this process. By leveraging the appropriate tools to parse network captures, extract meaningful threat intelligence and directly ingest it into MISP, analysts can significantly reduce the time spent on manual review. This approach not only accelerates incident response but also ensures that threat data is consistently structured and shared efficiently within the community. In this exercise, we will illustrate how automation can transform network capture analysis from a tedious task into an efficient, repeatable workflow that enhances security operations.

### A) Introduction - Using the right tools

#### 1. Analysis Tools

In this exercise, we focus on the analysis of network captures rather than on the capture process itself. If you want to learn more about packet capture, have a look at the documentation of tools such as `tcpdump` or `wireshark`/`tshark`.

For our analysis, we will be working with **PCAP files**.  
A wide range of command-line tools is available for analyzing network captures, including:
- capinfos (Wireshark) – Provides metadata about PCAP files (packet count, duration, etc.).
- mergecap (Wireshark) – Merges multiple PCAP files into one.
- editcap (Wireshark) – Edits and filters packets within a PCAP file.
- tcpdump – Displays and filters packet data from a capture file.
- ipsumdump – Summarizes network traffic for analysis.
- tshark – A powerful packet analyzer with extensive filtering and parsing capabilities.
- tcpflow – Reconstructs TCP flows from a capture (two versions exist with different capabilities).
- ngrep – A grep-like tool for searching packet data.
- yaf – Parses and processes network flows.

#### 2. PyMISP

Our ultimate goal is to **structure and share** the information we extract from network packets in MISP. But manual encoding of the extracted data into MISP would be tedious and error-prone, which is why automation is essential.

PyMISP, the official Python library for MISP, provides a powerful way to interact with the platform programmatically. It allows us to create, enrich, and query events, ensuring a seamless flow of extracted intelligence from our analysis tools into MISP. By leveraging PyMISP in a Python script, we can automate the entire encoding process, transforming raw network data into actionable threat intelligence with minimal manual effort.

In this exercise, we will explore how to use PyMISP to automate this workflow efficiently.

### B) Exercise description

We will use **`tshark`**, the command-line network traffic analyser, which has the same filtering capabilities as Wireshark, its graphical equivalent, and automate the packet parsing with some Python code.

#### 0. Preliminary step - Gather our dataset and declare some variables

Let's download PCAP files that are publicly available.

With your favourite browser, visit the latest *malware-traffic-analysis.net* blog posts from [2025](https://malware-traffic-analysis.net/2025/index.html) and download some of the latest example PCAP files, such as:
- [2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap.zip](https://malware-traffic-analysis.net/2025/01/31/2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap.zip)
- [2025-01-13-KongTuke-leads-to-infection-abusing-BOINC.pcap.zip](https://malware-traffic-analysis.net/2025/02/10/2025-01-13-KongTuke-leads-to-infection-abusing-BOINC.pcap.zip)

Those zip files are protected with a password of the form *infected_YYYYMMDD*, where the date is the one mentioned in the file name.

**Alternatively**, you can run the following Python script, which gathers some of the zip files from the website and extracts the PCAPs for you:

```bash
# bash
python download_samples.py
```
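If you downloaded the archives by hand, the password rule described above can be sketched in a few lines of Python. This is only an illustration: `password_for` and `extract_pcaps` are hypothetical helpers (they are not part of `download_samples.py`), and the sketch assumes that the archive name starts with a `YYYY-MM-DD-` date and that the archive uses the legacy zip encryption supported by the standard `zipfile` module.

```python
import re
import zipfile
from pathlib import Path


def password_for(archive_name: str) -> str:
    """Build 'infected_YYYYMMDD' from the date that prefixes the file name."""
    match = re.match(r'(\d{4})-(\d{2})-(\d{2})-', archive_name)
    if match is None:
        raise ValueError(f'No leading date in {archive_name!r}')
    return 'infected_' + ''.join(match.groups())


def extract_pcaps(archive: Path, destination: Path) -> list[str]:
    """Extract the .pcap members of a password-protected archive.

    Assumption: the archive uses legacy zip encryption; the standard
    zipfile module cannot decrypt AES-encrypted archives.
    """
    password = password_for(archive.name).encode()
    with zipfile.ZipFile(archive) as zip_file:
        members = [name for name in zip_file.namelist() if name.endswith('.pcap')]
        zip_file.extractall(destination, members=members, pwd=password)
    return members


print(password_for('2025-01-13-KongTuke-leads-to-infection-abusing-BOINC.pcap.zip'))
# → infected_20250113
```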

Now that we have our PCAP files, we can start our analysis and see which relevant information we can extract from the network packets and encode as MISP objects.

In order to store those objects, we start by creating a MISP Event, which will be their container.

In [41]:
import os
from pathlib import Path
from pymisp import MISPEvent, MISPObject

data_path = Path(os.getcwd()).parent / 'exercises' / 'data'
pcap_file = data_path / '2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap'

misp_event = MISPEvent()
misp_event.info = 'AgentTesla variant with CVE-2017-0199'

#### 1. Extract information on the PCAP file

As a first step, we will describe the PCAP file to keep a reference to our source of information.

More specifically, we want to describe the file itself, using a `file` MISP object, as well as the PCAP metadata, with the `pcap-metadata` object. Both object templates are part of the [list of available object templates](https://www.github.com/MISP/misp-objects) on GitHub, where you can find their description.

Starting with the file object, we could create a MISPObject and the related Attributes ourselves, but PyMISP has a convenient helper for this: `FileObject`. Let's see how to use it to add a file object describing our PCAP file to our MISP Event.

In [None]:
from pymisp.tools import FileObject

file_object = FileObject(filepath=pcap_file, standalone=False)
for attribute in file_object.attributes:
    print(attribute.object_relation, attribute.value)
misp_event.add_object(file_object)

filename 2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap
size-in-bytes 3074017
entropy 7.869856339761419
md5 652387a0ae7fb87fa2f9122a6c937514
sha1 9a59ed3e1828e8a171c505903ad34b25d1233ba7
sha256 f25c1c7719b39dc253d4c3df3f29724f9422244762c3c3fbe146fd0dd55e2362
sha512 77d688107578ee7a92a903d5a6b9be0bc870ed03a60d944a93b9b489a31c0fc24eb807bdcb97beae92fc521f98f699722c4d334ee1246bfddcc07902ad3f88e0
malware-sample 2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap
mimetype application/vnd.tcpdump.pcap
ssdeep 49152:OAiL/4uyXOIvilMM6vMQ/wgD77IWXXCRO9dch92E15n:iL/4Bv/wgDiRjF


Now let's see the information given by `capinfos`, the command-line tool to describe PCAP metadata:

In [37]:
import subprocess

proc = subprocess.Popen(f'capinfos {pcap_file}', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for line in proc.stdout.readlines():
    print(line)

b'File name:           /Users/chrisr3d/git/MISP/automation4MISP/exercises/data/2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap\n'
b'File type:           Wireshark/tcpdump/... - pcap\n'
b'File encapsulation:  Ethernet\n'
b'File timestamp precision:  microseconds (6)\n'
b'Packet size limit:   file hdr: 65535 bytes\n'
b'Number of packets:   2,532\n'
b'File size:           3,074 kB\n'
b'Data size:           3,033 kB\n'
b'Capture duration:    877.902506 seconds\n'
b'Earliest packet time: 2025-01-09 19:41:51.295616\n'
b'Latest packet time:   2025-01-09 19:56:29.198122\n'
b'Data byte rate:      3,455 bytes/s\n'
b'Data bit rate:       27 kbps\n'
b'Average packet size: 1198.06 bytes\n'
b'Average packet rate: 2 packets/s\n'
b'SHA256:              f25c1c7719b39dc253d4c3df3f29724f9422244762c3c3fbe146fd0dd55e2362\n'
b'SHA1:                9a59ed3e1828e8a171c505903ad34b25d1233ba7\n'
b'Strict time order:   True\n'
b'Number of interfaces in file: 1\n'
b'Interface #0 

Now based on the `pcap-metadata` object template, we can extract some information and generate our MISP object to add it to our Event.

In [42]:
import re

# The capinfos output shown above prints 'Earliest/Latest packet time';
# the 'First/Last packet time' labels are kept for versions that use them
PCAP_METADATA_OBJECT_MAPPING = {
    'Capture length': 'capture-length',
    'File encapsulation': 'protocol',
    'First packet time': 'first-packet-seen',
    'Earliest packet time': 'first-packet-seen',
    'Last packet time': 'last-packet-seen',
    'Latest packet time': 'last-packet-seen'
}

def parse_pcap_info_line(line: str) -> list:
    # Interface details use 'key = value', the other lines 'key:   value'
    if ' = ' in line:
        return line.split(' = ')
    return re.split(r': +', line)

pcap_object = misp_event.add_object(name='pcap-metadata')
proc = subprocess.Popen(
    f'capinfos {pcap_file}', shell=True,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
for line in proc.stdout.readlines():
    decoded = line.decode().strip()
    try:
        key, value = parse_pcap_info_line(decoded)
    except ValueError:
        # Lines that do not split into exactly one key and one value are skipped
        continue
    if key not in PCAP_METADATA_OBJECT_MAPPING:
        continue
    relation = PCAP_METADATA_OBJECT_MAPPING[key]
    pcap_object.add_attribute(
        relation,
        value.upper() if relation == 'protocol' else value
    )
pcap_object.add_reference(file_object.uuid, 'describes')


<MISPObjectReference(object_uuid=7e54f8d4-b695-4b87-b731-85577bbb4faf, referenced_uuid=18026965-12bb-40d2-ad0d-d506cd556aa8, relationship_type=describes)
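The line-parsing step can be tried on its own, without `capinfos` or PyMISP installed. The sketch below feeds a few lines copied from the `capinfos` output shown earlier into a hypothetical `parse_capinfos` helper; unlike the cell above, it splits each line only once, so values that themselves contain `: ` are kept whole.

```python
import re

# A few lines copied from the capinfos output shown earlier
SAMPLE_LINES = [
    b'File encapsulation:  Ethernet\n',
    b'Packet size limit:   file hdr: 65535 bytes\n',
    b'Earliest packet time: 2025-01-09 19:41:51.295616\n',
    b'Latest packet time:   2025-01-09 19:56:29.198122\n',
]


def parse_capinfos(lines):
    """Turn raw capinfos output lines into a {label: value} dictionary."""
    info = {}
    for raw in lines:
        line = raw.decode().strip()
        # 'key = value' for interface details, 'key:   value' otherwise
        separator = r' = ' if ' = ' in line else r': +'
        parts = re.split(separator, line, maxsplit=1)
        if len(parts) == 2:
            info[parts[0]] = parts[1]
    return info


info = parse_capinfos(SAMPLE_LINES)
print(info['File encapsulation'])    # → Ethernet
print(info['Earliest packet time'])  # → 2025-01-09 19:41:51.295616
```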

#### 2. Parsing packets from a network capture

After these few easy preliminary steps, it is now time to recall the basics of network packet parsing and to declare a helper that builds the command we will use to extract different types of information from the packets in our capture file.

In [44]:
# Generic method used later to easily generate a tshark command
def define_command(input_file: Path, fields: tuple,
                   display_filter: str = '!(arp || dhcp)') -> str:
    param = '-o tcp.relative_sequence_numbers:FALSE -E separator="|"'
    fields_cmd = ' -e '.join(fields)
    tshark = f'tshark -T fields {param} -e {fields_cmd} -Y "{display_filter}"'
    return f'{tshark} -r {input_file}'


With `define_command`, we set a few parameters for our `tshark` command, including:
- `-o tcp.relative_sequence_numbers:FALSE` to display absolute sequence numbers rather than relative ones
- `-E separator="|"` so that extracted text containing `,` does not cause the Python code to split our parsing results incorrectly
- `-Y "!(arp || dhcp)"` to exclude ARP and DHCP packets from the results (this is the default display filter, which can be overridden)
- `-T fields` to print only selected fields, in association with `-e` to specify each of those fields
- `-r` followed by the PCAP file name
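As a variation on `define_command`, the same options can be passed to `subprocess` as a list of arguments instead of a single shell string. The sketch below is an assumption-laden alternative, not the method used in the rest of this exercise: `build_tshark_args` and `run_lines` are hypothetical helpers, and the demo at the end runs the Python interpreter rather than `tshark` so that it works without a capture file.

```python
import subprocess
import sys


def build_tshark_args(input_file, fields, display_filter='!(arp || dhcp)'):
    """Build the tshark command as a list of arguments (no shell involved)."""
    args = [
        'tshark', '-r', str(input_file), '-T', 'fields',
        '-o', 'tcp.relative_sequence_numbers:FALSE',
        '-E', 'separator=|', '-Y', display_filter,
    ]
    for field in fields:
        args += ['-e', field]
    return args


def run_lines(args):
    """Run a command and return its standard output as a list of str lines."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()


print(build_tshark_args('capture.pcap', ('dns', 'dns.qry.name'), display_filter='dns'))

# Smoke test with the Python interpreter instead of tshark
print(run_lines([sys.executable, '-c', 'print("a|b")']))  # → ['a|b']
```

Passing a list avoids `shell=True`, so a display filter containing quotes or a path containing spaces needs no extra escaping, and `text=True` returns regular strings instead of byte strings.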

#### 3. Extract DNS records

DNS records are an interesting kind of information to share in MISP from our network capture.

A Domain Name System (DNS) record is a set of instructions used to connect domain names with internet protocol (IP) addresses within DNS servers. DNS makes it possible for users to browse the internet with customizable domain names and URLs rather than complex numerical IP addresses.

MISP has a `dns-record` object template that can be used to describe the DNS information we extract from packets.

The following list gives the fields we want to look at in order to describe a DNS record:
- `dns`: Used to check whether the packet is a DNS request or response
- `dns.a`: A record - The record that holds the IPv4 address of a domain
- `dns.aaaa`: AAAA record - The record that contains the IPv6 address for a domain
- `dns.cname`: CNAME record - Forwards one domain or subdomain to another domain, does NOT provide an IP address
- `dns.mx.mail_exchange`: MX record - Directs mail to an email server
- `dns.ns`: NS record - Stores the name server for a DNS entry
- `dns.ptr.domain_name`: PTR record - Provides a domain name in reverse-lookups
- `dns.qry.name`: queried domain
- `dns.soa.rname`: SOA record - Stores admin information about a domain
- `dns.spf`: SPF record - Used to identify the mail servers that can send emails through your domain
- `dns.srv.name`: SRV record - Specifies a port for specific services

In [None]:
# Tshark filters
dns_fields = (
    'dns', 'dns.qry.name', 'dns.a', 'dns.aaaa', 'dns.cname',
    'dns.mx.mail_exchange', 'dns.ns', 'dns.ptr.domain_name',
    'dns.soa.rname', 'dns.spf', 'dns.srv.name'
)

dns_cmd = define_command(pcap_file, dns_fields, display_filter='dns')
print(dns_cmd)

tshark -T fields -o tcp.relative_sequence_numbers:FALSE -E separator="|" -e dns -e dns.qry.name -e dns.a -e dns.aaaa -e dns.cname -e dns.mx.mail_exchange -e dns.ns -e dns.ptr.domain_name -e dns.soa.rname -e dns.spf -e dns.srv.name -Y "dns" -r /Users/chrisr3d/git/MISP/automation4MISP/exercises/data/2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap


We can then run this command with `subprocess.Popen`, which starts it as a child process, as if we were executing it directly from our terminal.

We then read the standard output through `proc.stdout`, where `readlines` returns a list in which each element is one line of output:

In [56]:
proc = subprocess.Popen(dns_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# We read the results of the tshark command from the standard output
for line in proc.stdout.readlines():
    print(line)

b'Domain Name System (query)|s.deemos.com|||||||||\n'
b'Domain Name System (response)|s.deemos.com|14.103.79.10||||||||\n'
b'Domain Name System (query)|res.cloudinary.com|||||||||\n'
b'Domain Name System (response)|res.cloudinary.com|104.17.201.1,104.17.202.1||resc.cloudinary.com.cdn.cloudflare.net||||||\n'
b'Domain Name System (query)|ip-api.com|||||||||\n'
b'Domain Name System (response)|ip-api.com|208.95.112.1||||||||\n'
b'Domain Name System (query)|ftp.horeca-bucuresti.ro|||||||||\n'
b'Domain Name System (response)|ftp.horeca-bucuresti.ro|89.39.83.184||||||||\n'


As you can see, the resulting lines returned from the terminal process are byte strings.

In order to use the information we have here, we still need to extract the relevant information with:
- `decode`, to decode the byte string into a regular string
- `strip('\n')`, to remove the special character at the end of the line
- `split('|')`, to decompose our lines, based on the separator we chose previously (`|`)
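The three steps above can be tried on a single line copied from the output, without running `tshark`. This is a minimal sketch: `parse_dns_line` is a hypothetical helper, and only the first four relations are mapped to keep the example short.

```python
RAW_LINE = (
    b'Domain Name System (response)|res.cloudinary.com|104.17.201.1,104.17.202.1'
    b'||resc.cloudinary.com.cdn.cloudflare.net||||||\n'
)
RELATIONS = ('queried-domain', 'a-record', 'aaaa-record', 'cname-record')


def parse_dns_line(raw_line: bytes):
    """Decode one tshark output line and map non-empty fields to relations."""
    dns_type, *fields = raw_line.decode().strip('\n').split('|')
    record = {
        # tshark joins several occurrences of a field with ','
        relation: value.split(',')
        for relation, value in zip(RELATIONS, fields) if value
    }
    return dns_type, record


dns_type, record = parse_dns_line(RAW_LINE)
print(dns_type)            # → Domain Name System (response)
print(record['a-record'])  # → ['104.17.201.1', '104.17.202.1']
```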

In [57]:
DNS_RECORDS_OBJECT_RELATIONS = (
    'queried-domain', 'a-record', 'aaaa-record', 'cname-record', 'mx-record',
    'ns-record', 'ptr-record', 'soa-record', 'spf-record', 'srv-record'
)

proc = subprocess.Popen(dns_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for line in proc.stdout.readlines():
    # We decode each line and split it based on the separator
    dns_type, *fields = line.decode().strip().split('|')

    # We skip query packets to focus on responses
    if 'query' in dns_type:
        continue

    # creation of a new `dns-record` object
    dns_record = misp_event.add_object(name='dns-record')

    # We iterate over both the object relations and field values to add them as attributes
    for relation, values in zip(DNS_RECORDS_OBJECT_RELATIONS, fields):
        if values:
            if ',' in values:
                for value in values.split(','):
                    dns_record.add_attribute(relation, value)
                continue
            dns_record.add_attribute(relation, values)

    dns_record.add_reference(file_object.uuid, 'included-in')

    print(dns_record)
    for attribute in dns_record.attributes:
        print(f' - {attribute.object_relation}: {attribute.value}')

<MISPObject(name=dns-record)
 - queried-domain: s.deemos.com
 - a-record: 14.103.79.10
<MISPObject(name=dns-record)
 - queried-domain: res.cloudinary.com
 - a-record: 104.17.201.1
 - a-record: 104.17.202.1
 - cname-record: resc.cloudinary.com.cdn.cloudflare.net
<MISPObject(name=dns-record)
 - queried-domain: ip-api.com
 - a-record: 208.95.112.1
<MISPObject(name=dns-record)
 - queried-domain: ftp.horeca-bucuresti.ro
 - a-record: 89.39.83.184


#### 4. Extract HTTP requests

We can then try to fetch some HTTP requests information and generate some `http-request` MISP objects.

Based on the attributes defining the object template, we can have a look at the dedicated fields in tshark, to describe HTTP requests:
- `http.content_type`: MIME type of the body of the request
- `http.cookie`: HTTP cookie
- `http.host`: Domain name of the server
- `http.referer`: address of the previous web page from which a link to the currently requested page was followed
- `http.request.method`: HTTP method invoked (e.g. GET, POST, PUT, HEAD, DELETE, OPTIONS, CONNECT)
- `http.request.full_uri`: request URL
- `http.request.uri`: request URI
- `http.user_agent`: characteristic string that lets servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent

In [60]:
http_fields = (
    'http.request.method', 'http.host', 'http.content_type', 'http.cookie',
    'http.referer', 'http.request.full_uri', 'http.request.uri', 'http.user_agent'
)

http_cmd = define_command(pcap_file, ('ip.src', 'ip.dst') + http_fields, display_filter='http')
print(http_cmd)

proc = subprocess.Popen(http_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for line in proc.stdout.readlines():
    print(line)

tshark -T fields -o tcp.relative_sequence_numbers:FALSE -E separator="|" -e ip.src -e ip.dst -e http.request.method -e http.host -e http.content_type -e http.cookie -e http.referer -e http.request.full_uri -e http.request.uri -e http.user_agent -Y "http" -r /Users/chrisr3d/git/MISP/automation4MISP/exercises/data/2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap
b'10.1.9.101|192.3.27.144|GET|192.3.27.144||||http://192.3.27.144/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0\n'
b'192.3.27.144|10.1.9.101|||application/hta|||http://192.3.27.144/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|\n'
b'10.1.9.101|107.172.31.5|GET|107.172.31.5||||http://107.172.31.5/comonstraints.vbs|

As before, we map those values to the attributes of the `http-request` object template.

In [66]:
HTTP_REQUEST_OBJECT_RELATIONS = (
    'method', 'host', 'content-type', 'cookie', 'referer', 'url', 'uri',
    'user-agent'
)

def parse_http_request(ip_src: str, ip_dst: str, *fields: str) -> MISPObject:
    http_request = MISPObject('http-request')
    http_request.add_attribute('ip-src', ip_src)
    http_request.add_attribute('ip-dst', ip_dst)
    for relation, value in zip(HTTP_REQUEST_OBJECT_RELATIONS, fields):
        if value:
            http_request.add_attribute(relation, value)
    return http_request

proc = subprocess.Popen(http_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for line in proc.stdout.readlines():
    ip_src, ip_dst, *fields = line.decode().strip().split('|')
    http_request = parse_http_request(ip_src, ip_dst, *fields)
    print(http_request)
    for attribute in http_request.attributes:
        print(f' - {attribute.object_relation}: {attribute.value}')

<MISPObject(name=http-request)
 - ip-src: 10.1.9.101
 - ip-dst: 192.3.27.144
 - method: GET
 - host: 192.3.27.144
 - url: http://192.3.27.144/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta
 - uri: /xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta
 - user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0
<MISPObject(name=http-request)
 - ip-src: 192.3.27.144
 - ip-dst: 10.1.9.101
 - content-type: application/hta
 - url: http://192.3.27.144/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta
 - uri: /xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta
<MISPObject(name=http-request)
 - ip-src: 10.1.9.101
 - ip-dst: 107.172.31.5
 - method: GET
 - host: 107.172.31.5
 - url: http://107.172.31.5/comonstraints.vbs
 - uri: /comonstraints.vbs
 - user-agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50
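In the output above, the second object comes from a response packet: it has no `method` or `host`, yet it still becomes an `http-request` object. If you want to treat requests and responses differently, one possible approach is to check whether the method field is empty. The sketch below uses a hypothetical `split_requests_and_responses` helper and two sample lines modelled on the output above, with the user agent abridged.

```python
# Sample lines modelled on the tshark output above (user agent abridged)
HTTP_LINES = [
    b'10.1.9.101|192.3.27.144|GET|192.3.27.144||||'
    b'http://192.3.27.144/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|'
    b'/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|Mozilla/5.0\n',
    b'192.3.27.144|10.1.9.101|||application/hta|||'
    b'http://192.3.27.144/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|'
    b'/xampp/mpa/seemebestthingsevermeetgivenbestthingsfornewways.hta|\n',
]


def split_requests_and_responses(lines):
    """Separate parsed lines based on whether the HTTP method field is set."""
    requests, responses = [], []
    for raw in lines:
        ip_src, ip_dst, method, *rest = raw.decode().strip('\n').split('|')
        target = requests if method else responses
        target.append((ip_src, ip_dst, method, *rest))
    return requests, responses


requests, responses = split_requests_and_responses(HTTP_LINES)
print(len(requests), len(responses))  # → 1 1
```

The tshark display filter `http.request` keeps only request packets, but it would also drop the responses that carry the payloads extracted in the next section.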

#### 5. Extract payloads from the HTTP packets

To extend the HTTP request extraction we just saw, we can easily update the code to include payloads.

The field we are looking for here is `http.file_data`. Tshark gives a hex-encoded string value for this field, which means we have to use the `unhexlify` function from the `binascii` module of the Python standard library to get the raw content of the file.

PyMISP again comes with a helper that lets us skip the complete encoding procedure: `make_binary_objects`. It takes either a file path or the payload itself as a pseudofile, so we simply have to wrap the raw content of the file in a `BytesIO` object.
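The decoding step can be illustrated with the standard library alone. In this sketch the hex string is generated from a made-up payload rather than taken from the capture, purely to show the round trip from hex text to a file-like object.

```python
import binascii
from io import BytesIO

# Made-up payload, hex-encoded the way the field value is described above
hex_payload = b'<html></html>'.hex()

raw_content = binascii.unhexlify(hex_payload)  # back to the raw bytes
pseudofile = BytesIO(raw_content)              # file-like wrapper around them

print(hex_payload)             # → 3c68746d6c3e3c2f68746d6c3e
print(pseudofile.getvalue())   # → b'<html></html>'

# unhexlify raises binascii.Error when the input is not valid hex
try:
    binascii.unhexlify('not hex')
except binascii.Error as error:
    print(f'Could not decode: {error}')
```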

In [None]:
import binascii
from io import BytesIO
from pymisp.tools import make_binary_objects

http_fields = (
    'http.request.method', 'http.host', 'http.content_type', 'http.cookie',
    'http.referer', 'http.request.full_uri', 'http.request.uri',
    'http.user_agent', 'http.file_data', 'frame.number'
)
# We also request the `frame.number` field so we can name a payload after the URI or, failing that, the frame number
def set_payload_name(uri: str, frame_number: str) -> str:
    filename = uri.split('/')[-2 if uri.endswith('/') else -1]
    if filename:
        return filename
    return f'payload_from_packet_{frame_number}'

http_cmd = define_command(pcap_file, ('ip.src', 'ip.dst') + http_fields, display_filter='http')

proc = subprocess.Popen(http_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for line in proc.stdout.readlines():
    ip_src, ip_dst, *fields, file_data, frame_number = line.decode().strip().split('|')
    # As seen before, we create a new `http-request` object
    # We add this object to our event and we will be able to reuse the variable to add a reference to the file object
    http_request = misp_event.add_object(parse_http_request(ip_src, ip_dst, *fields))

    if file_data:
        response_uri = fields[-2]
        payload, executable, sections = make_binary_objects(
            pseudofile=BytesIO(binascii.unhexlify(file_data)),
            filename=set_payload_name(response_uri, frame_number),
            standalone=False
        )
        misp_event.add_object(payload)
        # We add a reference to the payload object
        http_request.add_reference(payload.uuid, 'drops')
        print(payload)
        for attribute in payload.attributes:
            print(f' - {attribute.object_relation}: {attribute.value}')

        # For an executable recognised by lief (e.g. a Windows Portable Executable file), a more detailed description is also created
        if executable is not None:
            misp_event.add_object(executable)
            if sections:
                for section in sections:
                    misp_event.add_object(section)


Unexpected type from lief: <class 'NoneType'>
Unexpected type from lief: <class 'NoneType'>
Unexpected type from lief: <class 'NoneType'>
Unexpected type from lief: <class 'NoneType'>


<FileObject(name=file)
 - filename: seemebestthingsevermeetgivenbestthingsfornewways.hta
 - size-in-bytes: 47975
 - entropy: 2.4415296098792627
 - md5: e90ae8ec16ea2056caaa64ac13a31373
 - sha1: 8041a1bda3769b97d8e8b980c6a77fcd2829d715
 - sha256: df215a01f6a83014a148c6e407cdc8422e9119a88b4220a1321b2986ea9aef63
 - sha512: 0e2387a7813adf066dab3ec72b4525cfb4965c3d124595165de42ea17e35055a2e5c7bbf9eae70568e2290cec9d627f742c23129730ec730a947175916c8fc7b
 - malware-sample: seemebestthingsevermeetgivenbestthingsfornewways.hta
 - mimetype: text/html
 - ssdeep: 384:gLezlvdbmgM8m956YSmzBB5CtbHA7lvRvw:gOlvBvm956YfwTARZ4
<FileObject(name=file)
 - filename: comonstraints.vbs
 - size-in-bytes: 223012
 - entropy: 5.250403159271223
 - md5: 3f691c4d5e1b53d16964d30e35863f77
 - sha1: 9ade8197b6f8828f384d5431a1d3a1b00e162782
 - sha256: a666a99f2056082802f459f7180f891582a527324a16d34b4755ed63e5467882
 - sha512: cd849788bf0d7ea4fdc8379ebfcb462a6b4a6d0b966bbd9a9a977122f8ffed9807e6f5ae8516dcf4a92e455f43cee55d92

#### 6. Extract network connection information

As a first step, we will store the information for every connection between a source and a destination. For this kind of parsing, the fields which could be interesting are for instance:
- `frame.time_epoch` - timestamp of the packet
- `ip.src` - source IP address
- `ip.dst` - destination IP address
- `tcp.srcport` / `udp.srcport` - source port
- `tcp.dstport` / `udp.dstport` - destination port
- `frame.protocols` - list of protocols through different layers related to the packet

In [70]:
# Tshark filters
standard_filters = (
    'frame.time_epoch', 'ip.src', 'ip.dst', 'tcp.srcport', 'tcp.dstport',
    'udp.srcport', 'udp.dstport', 'frame.protocols'
)

tshark_cmd = define_command(pcap_file, standard_filters)
print(tshark_cmd)
proc = subprocess.Popen(tshark_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# We read the results of the tshark command from the standard output
for line in proc.stdout.readlines():
    print(line)

tshark -T fields -o tcp.relative_sequence_numbers:FALSE -E separator="|" -e frame.time_epoch -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e udp.srcport -e udp.dstport -e frame.protocols -Y "!(arp || dhcp)" -r /Users/chrisr3d/git/MISP/automation4MISP/exercises/data/2025-01-09-CVE-2017-0199-XLS-to-DBatLoader-or-GuLoader-for-AgentTesla-variant.pcap
b'1736448111.295616000|10.1.9.101|10.1.9.1|||58178|53|eth:ethertype:ip:udp:dns\n'
b'1736448111.611485000|10.1.9.1|10.1.9.101|||53|58178|eth:ethertype:ip:udp:dns\n'
b'1736448111.668844000|10.1.9.101|14.103.79.10|49703|443|||eth:ethertype:ip:tcp\n'
b'1736448111.827716000|10.1.9.101|14.103.79.10|49704|443|||eth:ethertype:ip:tcp\n'
b'1736448111.883448000|14.103.79.10|10.1.9.101|443|49703|||eth:ethertype:ip:tcp\n'
b'1736448111.883761000|10.1.9.101|14.103.79.10|49703|443|||eth:ethertype:ip:tcp\n'
b'1736448111.884504000|10.1.9.101|14.103.79.10|49703|443|||eth:ethertype:ip:tcp\n'
b'1736448111.884514000|10.1.9.101|14.103.79.10|49703|443|||eth:ethe

We want to group connections in order to avoid duplicates.

So instead of generating a new MISP object for each packet, we store the connection information and, as we loop through the packets, increment a count of the packets transmitted through the same connection. We also keep track of the first seen and last seen timestamps.

In [71]:
import json

layer3_protocols = (
    'arp', 'icmp', 'icmpv6', 'ip', 'ipv6'
)
layer4_protocols = (
    'tcp', 'udp'
)
layer7_protocols = (
    'dhcp', 'dns', 'ftp', 'http', 'ntp', 'smtp', 'snmp', 'ssdp', 'tftp'
)

# We define a function that extracts at most one protocol per layer
def handle_protocols(frame_protocols: str) -> list:
    protocols = set(frame_protocols.split(':'))
    protocol_key = []
    for layer_protocols in (layer3_protocols, layer4_protocols, layer7_protocols):
        for protocol in layer_protocols:
            if protocol in protocols:
                protocol_key.append(protocol)
                break
    return protocol_key

def store_connections(cmd):
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    connections = {}
    for line in proc.stdout.readlines():
        # Decompose the line into variables
        timestamp, ip_src, ip_dst, ts_port, td_port, us_port, ud_port, protocols = line.decode().strip('\n').split('|')

        # We store the connection information in a tuple
        key = (
            ip_src, ip_dst,
            ts_port if ts_port else us_port,
            td_port if td_port else ud_port,
            *handle_protocols(protocols)
        )

        
        if key not in connections:
            connections[key] = {
                'first_seen': float('inf'),
                'last_seen': 0.0,
                'counter': 0
            }
        timestamp = float(timestamp)
        if timestamp < connections[key]['first_seen']:
            connections[key]['first_seen'] = timestamp
        if timestamp > connections[key]['last_seen']:
            connections[key]['last_seen'] = timestamp
        connections[key]['counter'] += 1

    return connections

connections = store_connections(tshark_cmd)
for connection, values in connections.items():
    print(f'{connection}: {json.dumps(values, indent=2)}')

('10.1.9.101', '10.1.9.1', '58178', '53', 'ip', 'udp', 'dns'): {
  "first_seen": 1736448111.295616,
  "last_seen": 1736448111.295616,
  "counter": 1
}
('10.1.9.1', '10.1.9.101', '53', '58178', 'ip', 'udp', 'dns'): {
  "first_seen": 1736448111.611485,
  "last_seen": 1736448111.611485,
  "counter": 1
}
('10.1.9.101', '14.103.79.10', '49703', '443', 'ip', 'tcp'): {
  "first_seen": 1736448111.668844,
  "last_seen": 1736448114.868602,
  "counter": 14
}
('10.1.9.101', '14.103.79.10', '49704', '443', 'ip', 'tcp'): {
  "first_seen": 1736448111.827716,
  "last_seen": 1736448114.870511,
  "counter": 9
}
('14.103.79.10', '10.1.9.101', '443', '49703', 'ip', 'tcp'): {
  "first_seen": 1736448111.883448,
  "last_seen": 1736448114.867818,
  "counter": 16
}
('14.103.79.10', '10.1.9.101', '443', '49704', 'ip', 'tcp'): {
  "first_seen": 1736448112.044317,
  "last_seen": 1736448114.869204,
  "counter": 10
}
('10.1.9.101', '192.3.27.144', '49706', '80', 'ip', 'tcp'): {
  "first_seen": 1736448113.069674,
  

In the previous code snippet, we extracted connection information such as the source and destination IP addresses, the source and destination ports, and the protocols, and combined them in a tuple used as a key to "fingerprint" a connection.

Every packet belonging to the same connection then updates the `first_seen` and `last_seen` values where needed and increments a counter that tracks the number of packets exchanged through each connection.

Now for each connection, we want to create a `network-connection` MISP object and add the values we stored as Attributes.

In [72]:
CONNECTION_OBJECT_RELATIONS = ('ip-src', 'ip-dst', 'src-port', 'dst-port')

for connection, values in connections.items():
    misp_object = MISPObject('network-connection')
    for value, relation in zip(connection[:4], CONNECTION_OBJECT_RELATIONS):
        if value:
            misp_object.add_attribute(relation, value)
    for protocol in connection[4:]:
        layer = 3 if protocol in layer3_protocols else 4 if protocol in layer4_protocols else 7
        misp_object.add_attribute(f'layer{layer}-protocol', protocol.upper())
    misp_object.add_attribute('first-packet-seen', values['first_seen'])
    misp_object.add_attribute('last-packet-seen', values['last_seen'])
    misp_object.add_attribute('count', values['counter'])

    print(misp_object)
    for attribute in misp_object.attributes:
        print(f' - {attribute.object_relation}: {attribute.value}')

<MISPObject(name=network-connection)
 - ip-src: 10.1.9.101
 - ip-dst: 10.1.9.1
 - src-port: 58178
 - dst-port: 53
 - layer3-protocol: IP
 - layer4-protocol: UDP
 - layer7-protocol: DNS
 - first-packet-seen: 1736448111.295616
 - last-packet-seen: 1736448111.295616
 - count: 1
<MISPObject(name=network-connection)
 - ip-src: 10.1.9.1
 - ip-dst: 10.1.9.101
 - src-port: 53
 - dst-port: 58178
 - layer3-protocol: IP
 - layer4-protocol: UDP
 - layer7-protocol: DNS
 - first-packet-seen: 1736448111.611485
 - last-packet-seen: 1736448111.611485
 - count: 1
<MISPObject(name=network-connection)
 - ip-src: 10.1.9.101
 - ip-dst: 14.103.79.10
 - src-port: 49703
 - dst-port: 443
 - layer3-protocol: IP
 - layer4-protocol: TCP
 - first-packet-seen: 1736448111.668844
 - last-packet-seen: 1736448114.868602
 - count: 14
<MISPObject(name=network-connection)
 - ip-src: 10.1.9.101
 - ip-dst: 14.103.79.10
 - src-port: 49704
 - dst-port: 443
 - layer3-protocol: IP
 - layer4-protocol: TCP
 - first-packet-seen: 17

However, this extraction method generates separate `network-connection` objects for the two directions of what is actually a single connection: the requests from the source and the responses from the destination could be stored in one connection object instead.

We could then improve our parsing function by using a better fingerprinting method.
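One possible fingerprint, sketched below, sorts the two endpoints so that both directions of a connection produce the same key. This is only an illustration under stated assumptions: `group_bidirectional` is a hypothetical helper, the protocols are left out for brevity, and the sender of the first packet seen is assumed to be the initiator, which only holds if the capture started before the connection did. The sample packets are taken from the tshark output shown earlier.

```python
# (timestamp, ip_src, ip_dst, src_port, dst_port) from the output shown earlier
PACKETS = [
    (1736448111.668844, '10.1.9.101', '14.103.79.10', '49703', '443'),
    (1736448111.827716, '10.1.9.101', '14.103.79.10', '49704', '443'),
    (1736448111.883448, '14.103.79.10', '10.1.9.101', '443', '49703'),
    (1736448111.883761, '10.1.9.101', '14.103.79.10', '49703', '443'),
]


def group_bidirectional(packets):
    """Group packets per connection, regardless of their direction."""
    connections = {}
    for timestamp, ip_src, ip_dst, src_port, dst_port in packets:
        # Sorting the endpoints makes the key identical in both directions
        key = tuple(sorted([(ip_src, src_port), (ip_dst, dst_port)]))
        entry = connections.setdefault(key, {
            # Assumption: the first packet seen comes from the initiator
            'initiator': (ip_src, src_port),
            'responder': (ip_dst, dst_port),
            'first_seen': timestamp,
            'last_seen': timestamp,
            'counter': 0,
        })
        entry['first_seen'] = min(entry['first_seen'], timestamp)
        entry['last_seen'] = max(entry['last_seen'], timestamp)
        entry['counter'] += 1
    return connections


connections = group_bidirectional(PACKETS)
for key, values in connections.items():
    print(key, values['counter'])
```

With these four packets, the request and response packets on port `49703` end up in a single entry with a counter of 3, while the packet on port `49704` forms a second entry.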