# **Politecnico di Milano**
## *Students: Caravano Andrea, Cantele Alberto*

*A.Y.: 2024/2025*

*Last modified: 05/04/2025*

### Description: Internet of Things: Challenge n. 2 - Wireshark application-layer packet capture analysis (through Python)

# Part 1

### Libraries and constants declaration

In [1]:
import pyshark  # A Python wrapper for tshark, a Wireshark packet capture parser
import nest_asyncio  # see below
import custom_functions  # custom-crafted helper functions, useful for solving one or more of the following questions

# needed for PyShark: patches asyncio to allow nested event loops, which packet parsing requires when a loop is already running (as in a notebook)
nest_asyncio.apply()

In [2]:
#### COAP Constants declaration (codes and types scalar values) ####
# Request methods or response codes
COAP_GET_CODE = 1
COAP_PUT_CODE = 3
COAP_CLIENT_BAD_REQ_RESPONSE_CODE = 128  # The first client-side error response code
COAP_SERVER_PROXY_NOT_SUPPORTED_CODE = 165  # The last server-side error response code

# Message types
COAP_CONFIRMABLE_TYPE = 0
COAP_ACK_TYPE = 2

#### MQTT Constants declaration (message types scalar values) ####
# Message types
MQTT_CONNECT = 1
MQTT_PUBLISH = 3
MQTT_SUBSCRIBE = 8

#### PCAP FILE URI ####
PCAP_URI = "challenge2.pcapng"
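
The CoAP scalar values above follow from the way a CoAP code byte is laid out: a 3-bit class followed by a 5-bit detail, usually written as `class.detail`. The sketch below is illustrative only (the helper names `coap_code` and `coap_code_str` are hypothetical, not part of the notebook) and shows how 4.00 and 5.05 map to 128 and 165.

```python
def coap_code(code_class: int, detail: int) -> int:
    """Pack a CoAP 'class.detail' code into its single-byte scalar value."""
    if not 0 <= code_class <= 7 or not 0 <= detail <= 31:
        raise ValueError("class must fit in 3 bits and detail in 5 bits")
    return (code_class << 5) | detail


def coap_code_str(value: int) -> str:
    """Render a scalar code in the dotted 'c.dd' notation."""
    return "%d.%02d" % (value >> 5, value & 0x1F)


print(coap_code(0, 1))     # GET request method
print(coap_code(0, 3))     # PUT request method
print(coap_code(4, 0))     # 4.00 Bad Request
print(coap_code(5, 5))     # 5.05 Proxying Not Supported
print(coap_code_str(165))  # back to dotted notation
```

Under this encoding, every 4.xx and 5.xx response falls in the closed range between the two constants, which is what the range filter in Question n. 1 relies on.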

## Question n. 1
How many different Confirmable PUT requests obtained an unsuccessful response from the local CoAP server?

In [3]:
# Filters CoAP CONfirmable PUT packets, directed to the local CoAP Server (localhost/loopback address)
cap = pyshark.FileCapture(PCAP_URI,
                          display_filter="coap and coap.type == {} and coap.code == {} and (ip.dst == 127.0.0.1 or ipv6.dst == ::1)"
                          .format(COAP_CONFIRMABLE_TYPE,
                                  COAP_PUT_CODE))

# Processing of packets
tokens = set()  # A hash structure, carrying unique objects (no duplicates)
malformed_packets = 0
for packet in cap:
    try:
        # CoAP layer fields
        coap_layer = packet.coap
        token = coap_layer.token
        # Unique identifiers (token) are stored for matching with corresponding responses
        tokens.add(token)
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

# Capture object is freed to allow easier usage of successive sub-computations
cap.close()
cap.clear()

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

# Then, responses are scanned: those whose token belongs to a CONfirmable PUT request and whose code signals an error are counted
# Both client-side (4.xx) and server-side (5.xx) error codes are considered unsuccessful responses
cap = pyshark.FileCapture(PCAP_URI,
                          display_filter="coap and coap.type == {} and coap.code >= {} and coap.code <= {} and (ip.src == 127.0.0.1 or ipv6.src == ::1)"
                          .format(COAP_ACK_TYPE,
                                  COAP_CLIENT_BAD_REQ_RESPONSE_CODE,
                                  COAP_SERVER_PROXY_NOT_SUPPORTED_CODE))

malformed_packets = 0
matches = 0  # Final matches
for packet in cap:
    try:
        # CoAP layer fields
        coap_layer = packet.coap
        token = coap_layer.token
        # If currently analyzed token is present among the ones stored before, then it received an unsuccessful response and matches!
        if token in tokens:
            matches += 1
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

# Capture object is freed to allow easier usage of successive sub-computations
cap.close()
cap.clear()

print("Second sub-computation ended, found %d malformed packets." % malformed_packets)

print("Computation ended! Question n. 1 has:\n\t\t\t%d matches" % matches)

First sub-computation ended, found 0 malformed packets.
Second sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 1 has:
			22 matches
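
The request/response matching used above can be sketched on synthetic data, without PyShark. This is a minimal illustration, not the notebook's code: the function name and the sample tokens are hypothetical, and, unlike the cell above, it collects matching tokens in a set, so a response that appears twice in a capture (for example a retransmission) is counted once.

```python
FIRST_ERROR_CODE = 128  # 4.00 Bad Request
LAST_ERROR_CODE = 165   # 5.05 Proxying Not Supported


def count_failed_requests(request_tokens, responses):
    """Count distinct request tokens that received at least one error response.

    request_tokens: set of tokens seen in the requests of interest.
    responses: iterable of (token, code) pairs, with code as a scalar value.
    """
    failed = set()
    for token, code in responses:
        if token in request_tokens and FIRST_ERROR_CODE <= code <= LAST_ERROR_CODE:
            failed.add(token)
    return len(failed)


# Hypothetical sample: "a1" fails (4.04, seen twice), "b2" succeeds (2.04),
# "zz" fails but does not belong to the requests of interest
requests = {"a1", "b2", "c3"}
responses = [("a1", 132), ("a1", 132), ("b2", 68), ("zz", 160)]
print(count_failed_requests(requests, responses))
```

Whether the two counting strategies differ on a given capture depends on whether any error response is duplicated in it.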


## Question n. 2
How many CoAP resources in the coap.me public server received the same number of unique Confirmable and Non Confirmable GET requests?

In [4]:
# coap.me is a symbolic name, that needs to be resolved using DNS
coapme_addresses = custom_functions.get_addresses("coap.me")

# Assumption: the packet capture is complete enough to reconstruct the traffic flow, including the DNS resolution of the public server name
# The addresses used to reach the coap.me server are therefore taken from the DNS answers contained in the capture itself
# Another reasonable approach would involve resolving the symbolic name through a group of geographically distributed DNS servers
# That approach would often be incomplete: DNS answers are not bound to a fixed location or time, and may change with traffic congestion or other external conditions
# Relying on the resolution provided by the capture is therefore the best approach
# To continue processing, at least one address must have been found
assert len(coapme_addresses) >= 1

# Filtering all GET requests, that will then be parsed selecting CONfirmable and NON-confirmable ones and counting them
captures = pyshark.FileCapture(
    PCAP_URI,
    display_filter="coap.code == {}"
    .format(
        COAP_GET_CODE
    )
)

# Dictionaries (hash maps) associate each key with an arbitrary value
# In our case, the URI is the key of every structure
tokens_confirmable = {}
tokens_nonconfirmable = {}
confirmables = {}
nonconfirmables = {}
malformed_packets = 0
for packet in captures:
    try:
        # Checks whether the packet under analysis is directed to one of the coap.me addresses obtained through DNS and collected at the beginning
        to_check = False
        if hasattr(packet, 'ip') and packet.ip.dst in coapme_addresses:
            to_check = True
        if hasattr(packet, 'ipv6') and packet.ipv6.dst in coapme_addresses:
            to_check = True
        if to_check:
            coap_layer = packet.coap
            token = coap_layer.token
            uri = coap_layer.opt_uri_path_recon
            # Token and URIs are collected
            # If the present request has the CON type, corresponding variables are used, otherwise the NON ones
            if int(coap_layer.type) == COAP_CONFIRMABLE_TYPE:
                # If the URI has not been seen before, we need to create the corresponding data structures
                if uri not in tokens_confirmable:
                    # This set will store all token values that requested the URI used as key
                    tokens_confirmable[uri] = set()
                    # This value will be incremented with the number of matching unique (CON) requests to the URI
                    confirmables[uri] = 0
                # A new token has been detected and is added among the requests records
                if token not in tokens_confirmable[uri]:
                    tokens_confirmable[uri].add(token)
                    confirmables[uri] += 1
            else:
                # If the URI has not been seen before, we need to create the corresponding data structures
                if uri not in tokens_nonconfirmable:
                    # This set will store all token values that requested the URI used as key
                    tokens_nonconfirmable[uri] = set()
                    # This value will be incremented with the number of matching unique (NON) requests to the URI
                    nonconfirmables[uri] = 0
                # A new token has been detected and is added among the requests records
                if token not in tokens_nonconfirmable[uri]:
                    tokens_nonconfirmable[uri].add(token)
                    nonconfirmables[uri] += 1
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

# Assertion: at least one CONfirmable and one NON-confirmable request have been found
# Otherwise, no comparison would make sense for this query
assert len(confirmables) >= 1
assert len(nonconfirmables) >= 1

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

matches = 0
# The two key sets are intersected: only URIs that received both CON and NON requests can be compared (for all the others, the two counts necessarily differ)
resources_endpoints = set(confirmables.keys()).intersection(set(nonconfirmables.keys()))
for resource in resources_endpoints:
    if confirmables[resource] == nonconfirmables[resource]:
        matches += 1

# Capture object is freed to allow easier usage of successive sub-computations
captures.close()
captures.clear()

print("Computation ended! Question n. 2 has:\n\t\t\t%d matches" % matches)

First sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 2 has:
			3 matches
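
The per-URI bookkeeping above can be expressed more compactly with `collections.defaultdict`. The following is a sketch on synthetic data under stated assumptions: the function name `balanced_resources` and the sample requests are hypothetical, and each request is reduced to a `(uri, is_confirmable, token)` tuple.

```python
from collections import defaultdict


def balanced_resources(requests):
    """Return the URIs that received the same number of unique CON and NON requests.

    requests: iterable of (uri, is_confirmable, token) tuples.
    """
    con_tokens = defaultdict(set)  # URI -> tokens of CONfirmable requests
    non_tokens = defaultdict(set)  # URI -> tokens of NON-confirmable requests
    for uri, is_confirmable, token in requests:
        target = con_tokens if is_confirmable else non_tokens
        target[uri].add(token)  # a set ignores repeated tokens
    # Only URIs present in both maps can have equal, non-zero counts
    common = con_tokens.keys() & non_tokens.keys()
    return sorted(uri for uri in common
                  if len(con_tokens[uri]) == len(non_tokens[uri]))


sample = [
    ("/hello", True, "01"),
    ("/hello", True, "01"),   # repeated token, counted once
    ("/hello", False, "02"),
    ("/large", True, "03"),
    ("/large", True, "04"),
    ("/large", False, "05"),
    ("/secret", True, "06"),  # never requested as NON
]
print(balanced_resources(sample))
```

Because the sets already hold unique tokens, separate counters are not needed: the size of each set is the number of unique requests.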


## Question n. 3
How many different MQTT clients subscribe to the public broker HiveMQ using multi-level wildcards?

In [5]:
# broker.hivemq.com is a symbolic name, that needs to be resolved using DNS
hivemq_addresses = custom_functions.get_addresses("broker.hivemq.com")

# Assumption: the packet capture is complete up to the level of detail needed to reconstruct the traffic flow, including the DNS resolution of the public server address
# The address therefore used to refer to the broker.hivemq.com broker has been symbolically resolved and the DNS answer is included in the packet capture
assert len(hivemq_addresses) >= 1
# See above for more in-depth discussion

# Filters MQTT Subscribe packets whose topic filter matches the regular expression .*# (multi-level wildcard)
# The expression is a Kleene star followed by the # wildcard, so it accepts any string ending in #
# The end of the topic filter is, in fact, the only valid position for this wildcard
# The case in which the wildcard is followed by one or more characters (#.+) is explicitly rejected
# Note that this is not equivalent to mqtt.topic contains "#", which would also match a wildcard in the middle of the string (a protocol violation)
# If we can assume that no packet violates the protocol, the two filters return the same results
captures = pyshark.FileCapture(
    PCAP_URI,
    display_filter="mqtt and mqtt.msgtype == {} and mqtt.topic matches \".*#\" and !mqtt.topic matches \"#.+\""
    .format(
        MQTT_SUBSCRIBE
    )
)

# Let's store unique client ids declaring interest in a topic having a wildcard
clientids = []
malformed_packets = 0
for packet in captures:
    try:
        # First, symmetrically to before, let's check whether the Subscribe message is in fact directed to HiveMQ
        to_check = False
        if hasattr(packet, 'ip') and packet.ip.dst in hivemq_addresses:
            to_check = True
        if hasattr(packet, 'ipv6') and packet.ipv6.dst in hivemq_addresses:
            to_check = True
        # If it is, we need to determine its client id (to, therefore, determine if the client currently under analysis is unique)
        if to_check:
            # A custom function is invoked (see the detailed explanation in the custom_functions file)
            clientid = custom_functions.search_clientid(packet)
            # On a single broker the client id is unique within a session, but it may be empty: it is therefore paired with the client socket details to obtain a unique identifier
            socket_details = custom_functions.get_socket_details(packet)
            coupled_id = [clientid, socket_details]
            # The pair is stored only if not seen before (a list with an explicit membership test is used, since the pair is itself a list and cannot be placed in a set)
            if coupled_id not in clientids:
                clientids.append(coupled_id)
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

# Capture object is freed to allow easier usage of successive sub-computations
captures.close()
captures.clear()

print("Computation ended! Question n. 3 has:\n\t\t\t%d matches" % len(clientids))

First sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 3 has:
			4 matches
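
For comparison with the display filter above, the MQTT rule for the multi-level wildcard can be checked directly in Python: `#` must be the last character of the topic filter and must either stand alone or follow a `/` separator. The function below is a hypothetical helper, not part of the notebook, and it is stricter than the display filter, which would also accept a filter such as `sport/tennis#`.

```python
def uses_multilevel_wildcard(topic_filter: str) -> bool:
    """Return True if topic_filter contains a validly placed multi-level wildcard."""
    # The wildcard may appear only once
    if topic_filter.count("#") != 1:
        return False
    # It must be the whole filter, or the last level after a separator
    return topic_filter == "#" or topic_filter.endswith("/#")


for candidate in ["#", "sport/#", "sport/tennis#", "sport/#/ranking", "sport/+/score"]:
    print(candidate, uses_multilevel_wildcard(candidate))
```

If every subscription in the capture complies with the protocol, both checks select the same packets.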


## Question n. 4
How many different MQTT clients specify a last Will Message to be directed to a topic having as first level "university"?

In [6]:
# Filters MQTT Connect packets having the connection will flag set to True (declaring interest in providing a Last Will message)
# Among those, we are interested in the ones having as a destination topic "university" at the first level
# Matching can be implemented, again, using a regular expression:
# ^ indicates the beginning of the string to match, then "university" must appear at the first level, so at the beginning
# Then, anything can appear (even no more levels, actually): that means a Kleene star and a closing sign ($) can be used
captures = pyshark.FileCapture(
    PCAP_URI,
    display_filter="mqtt and mqtt.msgtype == {} and mqtt.conflag.willflag == True and mqtt.willtopic matches \"^university.*$\""
    .format(
        MQTT_CONNECT
    )
)

# Let's store unique client ids specifying a last will message with a destination topic having "university" as a first level
clientids = []
malformed_packets = 0
for packet in captures:
    try:
        # A matching CONNECT ACK is searched for: if the connection was successful, we can proceed to uniquely identify each client
        if not custom_functions.check_connect_ack(packet):
            continue
        # As before, the client id is derived from the CONNECT packet
        clientid = packet.mqtt.clientid
        # Since clients of different brokers are considered together, the client id may not be unique: it is therefore paired with the client socket details to obtain a unique identifier
        socket_details = custom_functions.get_socket_details(packet)
        coupled_id = [clientid, socket_details]
        if coupled_id not in clientids:
            clientids.append(coupled_id)
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

# Capture object is freed to allow easier usage of successive sub-computations
captures.close()
captures.clear()

print("Computation ended! Question n. 4 has:\n\t\t\t%d matches" % len(clientids))

First sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 4 has:
			1 matches
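
A note on the matching rule: the expression `^university.*$` accepts any will topic that starts with the string "university", which would also include a first level such as `university2`. A stricter check compares the first topic level exactly. The sketch below uses a hypothetical helper name and made-up topics, and it is not the filter used in the cell above.

```python
def first_level_is(topic: str, level: str) -> bool:
    """Return True if the first level of an MQTT topic equals `level` exactly."""
    # Levels are separated by '/', so the first level is everything before the first separator
    return topic.split("/", 1)[0] == level


for topic in ["university", "university/department12/room1",
              "university2/room1", "/university"]:
    print(topic, first_level_is(topic, "university"))
```

The equivalent regular expression would be `^university(/.*)?$`. On a capture where no will topic has a first level that merely begins with "university", both rules give the same count.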


## Question n. 5
How many MQTT subscribers receive a last will message derived from a subscription without a wildcard?

In [7]:
# Filters MQTT Connect packets carrying a last will message (as done before, but without a specific topic bound)
cap = pyshark.FileCapture(
    PCAP_URI,
    display_filter="mqtt and mqtt.msgtype == {} and mqtt.conflag.willflag == True"
    .format(
        MQTT_CONNECT
    )
)

# All relevant details about the last will messages are stored:
# This includes: topic, message content and frame number (position in the packet capture, it will be useful later)
will_messages = []
malformed_packets = 0
for packet in cap:
    try:
        # A matching CONNECT ACK is searched for: if the connection was successful, the broker has accepted the last will message and we can store it
        if not custom_functions.check_connect_ack(packet):
            continue
        will_topic = packet.mqtt.willtopic
        will_message = packet.mqtt.willmsg
        frame_number = int(packet.frame_info.number)
        # The relevant details are stored as a triple in the list
        will_messages.append([will_topic, will_message, frame_number])
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

# Capture object is freed to allow easier usage of successive sub-computations
cap.close()
cap.clear()

# Now, we are interested in capturing all publish messages
# The last will message, in fact, will be sent to interested clients just like a normal publish message!
# The only meaningful difference is that this
cap = pyshark.FileCapture(
    PCAP_URI,
    display_filter="mqtt and mqtt.msgtype == {}"
    .format(MQTT_PUBLISH)
)

# The list of triples is split into three temporary parallel lists, to allow for easier matching
will_topic_list = [msg[0] for msg in will_messages]
will_message_list = [msg[1] for msg in will_messages]
will_frame_number_list = [msg[2] for msg in will_messages]
matches = 0
malformed_packets = 0
for packet in cap:
    try:
        topic = packet.mqtt.topic
        message = packet.mqtt.msg
        frame_number = int(packet.frame_info.number)
        # For each publish message found, the content is checked against the last will messages collected at the beginning
        try:
            index = will_message_list.index(message)
        except ValueError:
            # The payload does not correspond to any stored last will message
            continue
        # Then, the topic has to match and the frame number has to strictly follow that of the CONNECT carrying the last will
        # (the broker cannot publish a will message it has not received yet)
        if will_topic_list[index] == topic and frame_number > will_frame_number_list[index]:
            # Then the message derives from a last will message
            # Returns a set containing all topics to which the client declared interest in
            subscriptions = custom_functions.compute_subscriptions(packet, will_frame_number_list[index])
            # Then, for each subscription derived, it is checked for matching upon the original topic used in the last will message
            for sub in subscriptions:
                # The subscription counts only if it matches the will topic and contains neither a single-level wildcard (+) nor a trailing multi-level wildcard (#), assuming compliance with MQTT subscription rules
                if (custom_functions.mqtt_topic_matches(will_topic_list[index], sub)
                        and not sub.endswith('#') and '+' not in sub):
                    matches += 1
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

print("Second sub-computation ended, found %d malformed packets." % malformed_packets)

# Capture object is freed to allow easier usage of successive sub-computations
cap.close()
cap.clear()

print("Computation ended! Question n. 5 has:\n\t\t\t%d matches" % matches)

First sub-computation ended, found 0 malformed packets.
Second sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 5 has:
			3 matches
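
The helper `custom_functions.mqtt_topic_matches` is not shown in this notebook. As a stand-in, the sketch below gives one possible implementation of MQTT topic matching, level by level, taking the topic first and the subscription filter second as in the call above. It is an assumption about the behaviour, not the authors' code, and it ignores the special handling of topics beginning with `$`.

```python
def topic_matches(topic: str, topic_filter: str) -> bool:
    """Return True if a published topic matches a subscription filter."""
    topic_levels = topic.split("/")
    filter_levels = topic_filter.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level, matches everything that remains
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter is longer than the topic
            return False
        if level != "+" and level != topic_levels[i]:
            # Neither a single-level wildcard nor an exact match
            return False
    # Without '#', the number of levels must be identical
    return len(topic_levels) == len(filter_levels)


def is_plain_subscription(topic_filter: str) -> bool:
    """Return True if the filter contains no wildcard at all."""
    return "#" not in topic_filter and "+" not in topic_filter


print(topic_matches("university/room1/temp", "university/room1/temp"))
print(topic_matches("university/room1/temp", "university/+/temp"))
print(topic_matches("university/room1/temp", "university/#"))
print(topic_matches("university/room1", "university/room1/temp"))
print(is_plain_subscription("university/room1/temp"))
```

With these two helpers, a subscription without wildcards matches a will topic only when the two strings are identical level by level.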


## Question n. 6
How many MQTT publish messages directed to the public broker mosquitto are sent with the retain option and use QoS “At most once”?

In [8]:
# test.mosquitto.org is a symbolic name, that needs to be resolved using DNS
mosquitto_addresses = custom_functions.get_addresses("test.mosquitto.org")

# Assumption: the packet capture is complete up to the level of detail needed to reconstruct the traffic flow, including the DNS resolution of the public server address
# The address therefore used to refer to the test.mosquitto.org broker has been symbolically resolved and the DNS answer is included in the packet capture
assert len(mosquitto_addresses) >= 1
# See above for more in-depth discussion

# Filters MQTT Publish messages having the Retain option set to True and a QoS level set to 0
# That is, "best effort" case: at most once delivery guarantee
QOS_LEVEL = 0
captures = pyshark.FileCapture(
    PCAP_URI,
    display_filter="mqtt and mqtt.msgtype == {} and mqtt.retain == True and mqtt.qos == {}"
    .format(
        MQTT_PUBLISH,
        QOS_LEVEL
    )
)

# All packets having the requested characteristics have already been selected
# Now, we only need to match with the set of Mosquitto broker addresses received through DNS resolution
matches = 0
malformed_packets = 0
for packet in captures:
    try:
        to_check = False
        if hasattr(packet, 'ip') and packet.ip.dst in mosquitto_addresses:
            to_check = True
        if hasattr(packet, 'ipv6') and packet.ipv6.dst in mosquitto_addresses:
            to_check = True
        if to_check:
            matches += 1
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

# Capture object is freed to allow easier usage of successive sub-computations
captures.close()
captures.clear()

print("Computation ended! Question n. 6 has:\n\t\t\t%d matches" % matches)

First sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 6 has:
			208 matches
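
The `mqtt.retain` and `mqtt.qos` fields used in the filter come from the first byte of the MQTT fixed header, which packs the message type in the upper four bits and the DUP, QoS and RETAIN flags in the lower four. The sketch below decodes that byte by hand; the function name and the sample bytes are hypothetical and serve only as an illustration.

```python
def parse_fixed_header_byte(first_byte: int) -> dict:
    """Decode the first byte of an MQTT fixed header."""
    return {
        "msgtype": first_byte >> 4,          # upper 4 bits: message type (3 = PUBLISH)
        "dup": bool(first_byte & 0x08),      # bit 3: duplicate delivery flag
        "qos": (first_byte >> 1) & 0x03,     # bits 2-1: QoS level (0 = at most once)
        "retain": bool(first_byte & 0x01),   # bit 0: retain flag
    }


print(parse_fixed_header_byte(0x31))  # PUBLISH, QoS 0, retained
print(parse_fixed_header_byte(0x30))  # PUBLISH, QoS 0, not retained
print(parse_fixed_header_byte(0x3D))  # PUBLISH, DUP set, QoS 2, retained
```

A packet selected by the filter above therefore has a first header byte equal to `0x31`, as long as DUP is zero, which the protocol requires for QoS 0 messages.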


## Question n. 7
How many MQTT-SN messages on port 1885 are sent by the clients to a broker in the local machine?

In [9]:
# By default, traffic on a non-standard port is not decoded as MQTT-SN
# PyShark's decode_as parameter lets us explicitly request that UDP packets on port 1885 be decoded as MQTT-SN packets
# The display filter would also include ICMP packets, used in this context for signalling procedures, so they are explicitly excluded
cap = pyshark.FileCapture(
    PCAP_URI,
    display_filter="udp and udp.dstport == 1885 and (ip.dst == 127.0.0.1 or ipv6.dst == ::1) and !icmp",
    # Decode UDP traffic on port 1885 as MQTT-SN
    decode_as={
        'udp.port==1885': 'mqttsn'
    }
)

# Matching packets will now contain an MQTT-SN layer!
matches = 0
malformed_packets = 0
for packet in cap:
    try:
        if hasattr(packet, 'mqttsn'):
            matches += 1
    except Exception:
        # The packet could not be parsed (e.g. it is malformed): it is skipped and counted
        malformed_packets += 1

print("First sub-computation ended, found %d malformed packets." % malformed_packets)

# Capture object is freed to allow easier usage of successive sub-computations
cap.close()
cap.clear()

print("Computation ended! Question n. 7 has:\n\t\t\t%d matches" % matches)

First sub-computation ended, found 0 malformed packets.
Computation ended! Question n. 7 has:
			0 matches
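
The display filter above identifies the local machine by the two literal addresses `127.0.0.1` and `::1`. If a capture used other loopback addresses (the whole `127.0.0.0/8` block is loopback in IPv4), a check in Python could rely on the standard `ipaddress` module instead. This is a sketch with a hypothetical helper name, and it is not needed if the capture only uses the two addresses in the filter.

```python
import ipaddress


def is_loopback(address: str) -> bool:
    """Return True if the textual IPv4 or IPv6 address is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


for address in ["127.0.0.1", "127.0.0.53", "::1", "192.168.1.10"]:
    print(address, is_loopback(address))
```

Such a check would be applied to `packet.ip.dst` or `packet.ipv6.dst` inside the loop, in the same way the broker addresses are checked in the previous questions.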
