<a href="https://colab.research.google.com/github/alright21/colab_thesis/blob/main/CAN_Bus_IDS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CAN Bus IDS implementation and benchmarking

This project implements and compares some of the most interesting and promising CAN bus intrusion detection systems (IDS) found in the literature. Many projects and ideas have been published in this area in recent years, but no study compares the different solutions on common data to see which performs best and what their strengths and weaknesses are.

## Language, modules and tools

The language of choice is Python 3.9: it is flexible and fairly simple to write, yet powerful and well suited to manipulating large amounts of data. Moreover, some of the CAN bus IDS rely on machine learning, and Python is one of the most widely used languages in that field thanks to its large number of modules.

Speaking of modules, packet generation and manipulation is done with `python-can`, a complete library for reading, writing and analyzing CAN messages. It also makes it easy to communicate directly with a real or virtual CAN bus, but this feature was not needed since the logs had already been recorded. The first idea was to capture the log directly from a car and check the validity of the IDS in real time, but this required expensive hardware and an extensive study of the chosen vehicle to tune the bus reading accordingly. Moreover, it was more interesting to use a complete dataset, like ReCAN, and to check the validity on different types of vehicles, such as a heavy truck.


In [1]:
!pip install python-can

Collecting python-can
  Downloading python-can-3.3.4.tar.gz (179 kB)
Collecting aenum
  Downloading aenum-3.1.0-py3-none-any.whl (123 kB)
Building wheels for collected packages: python-can
  Building wheel for python-can (setup.py) ... done
  Created wheel for python-can: filename=python_can-3.3.4-py2.py3-none-any.whl size=154207 sha256=f7bbf9c90ac10b68094ce5f5caa1990c40aa8fd27808792f1934ade7c257e650
  Stored in directory: /root/.cache/pip/wheels/23/22/aa/ab09f2d1a99925c9eb5c38e0afbb717c48fe59b74c4480d4ce
Successfully built python-can
Installing collected packages: aenum, python-can
Successfully installed aenum-3.1.0 python-can-3.3.4


In [2]:
import threading
import time
import can
import logging
from base64 import b64encode, b64decode
import datetime
import sys
import numpy as np
import math

## Logging

In [3]:
# Reference: https://www.bogotobogo.com/python/Multithread/python_multithreading_Synchronization_Producer_Consumer_using_Queue.php
logging.basicConfig(level=logging.INFO, format='(%(threadName)-9s) %(message)s',)

## Data

The data comes from two main sources. The first is a university and European project on the security of heavy trucks, in which the University of Turku collaborated. The data consists of CAN bus logs recorded on these trucks in normal driving conditions, without any injected attack.

The second source is a public research dataset, ReCAN, a complete resource with raw and unified data taken from different car models in different conditions.

The first step was data normalization: modifying the structure of the datasets while preserving their content, so that they can be ingested by our algorithms. The expected result is a series of CSV files with the following structure:

| timestamp | arbitration_id | extended | remote | error | dlc | data | data | data | data | data | data | data | data |
|:---------:|----------------|----------|--------|-------|-----|------|------|------|------|------|------|------|------|

The structure is based on the heavy truck dataset; the other datasets were modified accordingly.
- `timestamp`: the timestamp of the frame
- `arbitration_id`: the ID of the frame
- `extended`: indicates whether the ID is extended
- `remote`: indicates whether the frame is a remote frame
- `error`: indicates whether the frame is an error frame
- `dlc`: data length code, the length of the data
- `data`: eight decimal values holding the data bytes of the frame (columns are left empty when the frame carries fewer bytes). The source datasets usually differ here: the bytes can be expressed as hexadecimal or binary values
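As an illustration of the normalization step, the sketch below converts one record whose payload is written as hexadecimal bytes into the decimal layout described above. The input layout (a timestamp string, a hexadecimal ID and a space-separated hexadecimal payload) and the helper name `normalize_row` are assumptions made for the example; they do not describe the format of any specific dataset.

```python
import csv
import io

HEADER = ["timestamp", "arbitration_id", "extended", "remote", "error", "dlc"] + ["data"] * 8

def normalize_row(timestamp, arbitration_id, hex_payload):
    """Build one normalized row from a hex payload such as 'fe 1f 3b 80'.

    Hypothetical helper: the input layout is an assumption, not a real dataset format.
    """
    data = [int(byte, 16) for byte in hex_payload.split()]
    dlc = len(data)
    # Pad with empty strings so that every row has exactly eight data columns.
    padded = [str(value) for value in data] + [""] * (8 - dlc)
    # Assumption borrowed from the reader in this notebook: DLC 0 marks a remote frame.
    remote = dlc == 0
    return [timestamp, arbitration_id, True, remote, False, dlc] + padded

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADER)
writer.writerow(normalize_row("2021-06-22 13:10:04.796196", "10ff80e6", "fe 1f 3b 80"))
print(buffer.getvalue())
```

The ID is kept as a hexadecimal string because the reader used in this notebook parses it with `int(arbitration_id, base=16)`.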

In [4]:
attackfree_datasets = [
             '/content/2020_12_04_15_49_09_806427_vehicle.csv',
             '/content/2020_12_07_07_54_05_363774_vehicle.csv',
             '/content/raw11.csv',
             '/content/raw22.csv',
             '/content/raw33.csv',
             '/content/test.csv'
             ]

training_fn = [
    '/content/2021_06_22_13_10_04_728057_vehicle_normalized.csv',
    '/content/2021_06_22_13_11_03_600554_vehicle_normalized.csv',
    '/content/2021_06_22_13_12_02_778615_vehicle_normalized.csv',
    '/content/2021_06_22_13_13_01_995553_vehicle_normalized.csv',
    '/content/2021_06_22_13_14_01_213477_vehicle_normalized.csv',
    '/content/2021_06_22_13_15_00_431179_vehicle_normalized.csv',
    '/content/2021_06_22_13_15_59_634608_vehicle_normalized.csv',
    '/content/2021_06_22_13_16_58_828128_vehicle_normalized.csv',
    '/content/2021_06_22_13_17_58_001905_vehicle_normalized.csv',
    '/content/2021_06_22_13_18_57_198424_vehicle_normalized.csv',
    '/content/2021_06_22_13_19_56_400136_vehicle_normalized.csv',
    '/content/2021_06_22_13_20_55_602416_vehicle_normalized.csv',
    '/content/2021_06_22_13_21_54_811286_vehicle_normalized.csv',
    '/content/2021_06_22_13_22_53_887541_vehicle_normalized.csv',
    '/content/2021_06_22_13_23_53_085124_vehicle_normalized.csv',
    '/content/2021_06_22_13_24_52_442638_vehicle_normalized.csv',
    '/content/2021_06_22_13_25_51_772838_vehicle_normalized.csv',
    '/content/2021_06_22_13_26_51_068837_vehicle_normalized.csv',
    '/content/2021_06_22_13_27_50_340133_vehicle_normalized.csv',
    '/content/2021_06_22_13_28_49_583673_vehicle_normalized.csv',
    '/content/2021_06_22_13_29_48_854743_vehicle_normalized.csv',
    '/content/2021_06_22_13_30_48_122172_vehicle_normalized.csv',
    '/content/2021_06_22_13_31_47_396052_vehicle_normalized.csv',
    '/content/2021_06_22_13_32_46_668090_vehicle_normalized.csv',
    '/content/2021_06_22_13_33_45_860518_vehicle_normalized.csv',
    '/content/2021_06_22_13_34_45_136631_vehicle_normalized.csv',
    '/content/2021_06_22_13_35_44_916969_vehicle_normalized.csv',
    '/content/2021_06_22_13_36_44_743770_vehicle_normalized.csv',
    '/content/2021_06_22_13_37_44_122987_vehicle_normalized.csv',
    '/content/2021_06_22_13_38_43_301819_vehicle_normalized.csv',
    '/content/2021_06_22_13_39_42_558894_vehicle_normalized.csv',
    '/content/2021_06_22_13_40_41_813371_vehicle_normalized.csv',
    '/content/2021_06_22_13_41_41_075228_vehicle_normalized.csv',
    '/content/2021_06_22_13_42_40_314591_vehicle_normalized.csv',
    '/content/2021_06_22_13_43_39_531373_vehicle_normalized.csv',
    '/content/2021_06_22_13_44_38_749345_vehicle_normalized.csv',
    '/content/2021_06_22_13_45_37_974657_vehicle_normalized.csv',
    '/content/2021_06_22_13_46_37_181616_vehicle_normalized.csv',
    '/content/2021_06_22_13_47_36_409704_vehicle_normalized.csv',
    '/content/2021_06_22_13_48_35_630889_vehicle_normalized.csv',
    '/content/2021_06_22_13_49_34_833695_vehicle_normalized.csv',
    '/content/2021_06_22_13_50_33_964979_vehicle_normalized.csv',
    '/content/2021_06_22_13_51_32_864148_vehicle_normalized.csv',
    '/content/2021_06_22_13_52_32_392879_vehicle_normalized.csv',
    '/content/2021_06_22_13_53_31_499819_vehicle_normalized.csv',
    '/content/2021_06_22_13_54_30_643057_vehicle_normalized.csv',
    '/content/2021_06_22_13_55_29_829240_vehicle_normalized.csv',
    '/content/2021_06_22_13_56_29_051575_vehicle_normalized.csv',
    '/content/2021_06_22_13_57_28_450536_vehicle_normalized.csv',
    '/content/2021_06_22_13_58_28_044336_vehicle_normalized.csv',
    '/content/2021_06_22_13_59_27_574745_vehicle_normalized.csv',
    '/content/2021_06_22_14_00_27_012099_vehicle_normalized.csv',
    '/content/2021_06_22_14_01_26_539580_vehicle_normalized.csv',
    '/content/2021_06_22_14_02_26_180494_vehicle_normalized.csv',
    '/content/2021_06_22_14_03_25_575464_vehicle_normalized.csv',
    '/content/2021_06_22_14_04_25_057171_vehicle_normalized.csv',
    '/content/2021_06_22_14_05_24_619667_vehicle_normalized.csv',
    '/content/2021_06_22_14_06_23_979033_vehicle_normalized.csv',
    '/content/2021_06_22_14_07_23_539428_vehicle_normalized.csv',
    '/content/2021_06_22_14_08_23_007943_vehicle_normalized.csv',
    '/content/2021_06_22_14_09_22_554866_vehicle_normalized.csv',
    '/content/2021_06_22_14_10_22_089648_vehicle_normalized.csv'
]

detection_fn = [
                '/content/2021_06_22_14_11_21_675720_vehicle_dos.csv'            
                
]

## Importing data

The method used to import the data is adapted from the original `CSVReader` of the `python-can` module. It takes a CSV file and yields its rows one by one as `can.Message` objects.

In [5]:
# Credits to: https://python-can.readthedocs.io/en/master/_modules/can/io/csv.html#CSVReader
class CSVReader(can.io.generic.BaseIOHandler):
    """Iterator over CAN messages from a .csv file that was
    generated by :class:`~can.CSVWriter` or that uses the same
    format as described there. Assumes that there is a header
    and thus skips the first line.

    Any line separator is accepted.
    """

    def __init__(self, file):
        """
        :param file: a path-like object or as file-like object to read from
                     If this is a file-like object, is has to opened in text
                     read mode, not binary read mode.
        """
        super(CSVReader, self).__init__(file, mode='r')

    def __iter__(self):
        # skip the header line
        try:
            next(self.file)
        except StopIteration:
            # don't crash on a file with only a header
            return

        for row,line in enumerate(self.file):

            # Line reading was modified for our format
            timestamp, arbitration_id, extended, remote, error, dlc, data0, data1, data2, data3, data4, data5, data6, data7 = line.split(',')

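            # named time_part so that the imported `time` module is not shadowed
            date, time_part = timestamp.split(' ')
            year, month, day = date.split('-')
            hour, minute, seconds = time_part.split(':')
            seconds = seconds.split('.')

            if len(seconds) == 1:
                seconds.append('000000')

            # pad the fraction to six digits so that '.5' means 500000 microseconds
            microseconds = int(seconds[1].ljust(6, '0')[:6])
            dt = datetime.datetime(int(year), int(month), int(day), int(hour), int(minute), int(seconds[0]), microseconds)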
            data_temp = [data0 , data1, data2, data3, data4, data5, data6, data7.rstrip('\n')]

            data = []
            for i in range(len(data_temp)):
                if data_temp[i] != '':
                    data.append(int(data_temp[i]))
            yield can.Message(
                timestamp=dt.timestamp(),
                is_remote_frame=(True if dlc=='0' else False),
                is_extended_id=(True),
                is_error_frame=(False),
                arbitration_id=int(arbitration_id, base=16),
                dlc=int(dlc),
                data=(data if dlc!='0' else None),
                check=True
            )

        self.stop()
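The manual splitting of the timestamp in `__iter__` could also be expressed with the standard library's `datetime.strptime`. The following is only a sketch of that alternative, assuming timestamps of the form `YYYY-MM-DD HH:MM:SS[.ffffff]` as handled by the reader above; `parse_timestamp` is a hypothetical helper and not part of `python-can`.

```python
from datetime import datetime

def parse_timestamp(text):
    """Convert 'YYYY-MM-DD HH:MM:SS[.ffffff]' to a POSIX timestamp in seconds."""
    text = text.strip()
    # %f accepts one to six fractional digits; the fraction is optional in the logs.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt).timestamp()

first = parse_timestamp("2021-06-22 13:10:04")
second = parse_timestamp("2021-06-22 13:10:04.250000")
print(second - first)  # interval in seconds between the two frames
```

As in the reader, the timestamp is interpreted in the local time zone, which does not matter for the IDS because only differences between timestamps are used.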

## Choice of IDS

Choosing the proper IDS from the literature was an interesting but challenging task. The IDS were split into different groups according to their detection method. ... (TODO)

## 1. IDS based on frequency of each message frame

The first IDS implemented is the one proposed by Gmiden et al. [source]. The main idea is to measure the time interval between consecutive messages with the same ID: for example, when the IDS receives a message with ID `0xA1B`, it looks up the timestamp of the last message with that ID. The IDS computes the interval between the two and checks whether it is less than half of the shortest interval observed so far for that ID. If so, an alarm is raised; otherwise, if the interval is merely shorter than the shortest one, it is saved as the new minimum. The IDS ignores remote frames and the responses to them, because we assume they are "out of regular frequency".

Frequency-based detection starts from the assumption that CAN messages are sent periodically on the bus, so if a message with a known ID is sent outside its normal period, an attack is possibly in progress. On the other hand, if an ECU is compromised and sends messages at its original frequency, the attack is not detected. Another disadvantage concerns the state of the car while the IDS is working: a vehicle sends some messages at different frequencies when it is running and when it is parked. An OEM that wants to implement this IDS should take this into account. A possible solution is a frequency-based IDS installed on the car that changes its mode according to the state of the vehicle and saves the intervals in separate datasets, to avoid unwanted collisions or false positives.

In this study a single vehicle state is considered, because the IDS is not physically connected to a CAN bus: the messages are read from a log file, and there is no way to detect the different states. The IDS has no validation phase and can run in real time. When an anomaly is detected, the alarm is raised immediately.
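The idea of keeping separate baselines per vehicle state can be sketched as follows. This is a minimal illustration and not part of the implementation in this notebook: the state labels (`'running'`, `'parked'`) are hypothetical, since the logs used here carry no state information, and remote frames are not handled. Unlike the class in this notebook, the sketch does not store a flagged interval as the new minimum.

```python
from collections import defaultdict

class StateAwareIntervalCheck:
    """Half-of-minimum interval rule with one baseline per vehicle state.

    The state labels are hypothetical; the study's logs do not provide them.
    """

    def __init__(self):
        # state -> {arbitration_id: value}
        self.last_seen = defaultdict(dict)
        self.min_interval = defaultdict(dict)

    def observe(self, state, arbitration_id, timestamp):
        """Return True if the message arrives suspiciously early for this state."""
        last_seen = self.last_seen[state]
        min_interval = self.min_interval[state]
        alarm = False
        if arbitration_id in last_seen:
            interval = timestamp - last_seen[arbitration_id]
            shortest = min_interval.get(arbitration_id)
            if shortest is None:
                min_interval[arbitration_id] = interval
            elif interval < shortest / 2:
                alarm = True  # design choice: do not learn from a suspicious interval
            elif interval < shortest:
                min_interval[arbitration_id] = interval
        last_seen[arbitration_id] = timestamp
        return alarm

check = StateAwareIntervalCheck()
# Same ID, 100 ms period while running and 1 s period while parked (made-up values).
events = [("running", 0xA1B, 0.0), ("running", 0xA1B, 0.1), ("running", 0xA1B, 0.2),
          ("parked", 0xA1B, 10.0), ("parked", 0xA1B, 11.0),
          ("parked", 0xA1B, 11.1)]  # too early for the parked baseline
alarms = [check.observe(*event) for event in events]
print(alarms)
```

Only the last event raises an alarm: a 100 ms gap is normal for the running baseline but less than half of the shortest gap learned while parked.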

In [19]:
class IDS_timeframe:
    def __init__(self, filenames=None, name=None, args=(), kwargs=None, verbose=None):
        self.name = name
        self.filenames = filenames
        return

    def run(self):
        
        logging.debug(self.name + " fired up")
        

        for filename in self.filenames:

            min_tolerance = {}
            last_timestamp = {}
            ignore_next_msg = {}
            i = 0
            for msg in CSVReader(filename):
                if msg is None:
                    logging.info('No message has been received')
                    sys.exit()
                else:
                    if msg.dlc != 0 and (msg.arbitration_id not in ignore_next_msg):
                        if msg.arbitration_id in last_timestamp:
                            time_frame = msg.timestamp - last_timestamp[msg.arbitration_id]
                            if msg.arbitration_id not in min_tolerance:
                                min_tolerance[msg.arbitration_id] = time_frame
                            else:
                                if time_frame < (min_tolerance[msg.arbitration_id]/2):
                                    logging.error("ATTACK detected: i=" + str(i) + " " + str(msg) + " " + str(time_frame) + " " + str(min_tolerance[msg.arbitration_id]/2))
                                    min_tolerance[msg.arbitration_id] = time_frame
                                elif time_frame < min_tolerance[msg.arbitration_id]:
                                    min_tolerance[msg.arbitration_id] = time_frame

                        last_timestamp[msg.arbitration_id] = msg.timestamp
                    # ignore the response of the remote frame, time frequency analysis would detect attack here
                    elif msg.dlc != 0 and (msg.arbitration_id in ignore_next_msg):
                        del ignore_next_msg[msg.arbitration_id]
                    # ignore the remote frame
                    else:
                        ignore_next_msg[msg.arbitration_id] = True
                i+=1

### Execution
This is the execution of the first IDS on attack-free data, to check its functionality.

In [20]:
ids_timeframe = IDS_timeframe(
        name='ids_timeframe',
        filenames=training_fn)

ids_timeframe.run()

(MainThread) ATTACK detected: i=303 Timestamp: 1624367404.796196    ID: 10ff80e6    X                DLC:  8    00 00 15 6e f0 90 ff ff 0.00015091896057128906 0.0003339052200317383
(MainThread) ATTACK detected: i=332 Timestamp: 1624367404.799038    ID: 15ff59e6    X                DLC:  8    00 00 00 00 00 00 00 00 0.005216836929321289 0.005728602409362793
(MainThread) ATTACK detected: i=334 Timestamp: 1624367404.799181    ID: 18fdc4e6    X                DLC:  8    ff 03 ff ff ff ff ff ff 0.0050868988037109375 0.005653500556945801
(MainThread) ATTACK detected: i=335 Timestamp: 1624367404.799233    ID: 1cff9053    X                DLC:  4    fe 1f 3b 80 0.004993915557861328 0.005661964416503906
(MainThread) ATTACK detected: i=7031 Timestamp: 1624367471.920291    ID: 18ecffe6    X                DLC:  8    20 15 00 03 ff aa ff 00 0.15487098693847656 0.18198144435882568
(MainThread) ATTACK detected: i=7724 Timestamp: 1624367531.923307    ID: 18ecffe6    X                DLC:  8    20 15 

## 2. IDS based on matrix of message id transitions

The second IDS analysed was developed by Marchetti and Stabili [source]. Its main assumption is that messages flow through the CAN bus following a consistent pattern, meaning that it is possible to find sequences of message IDs that repeat over time. In particular, under similar driving conditions, a transition between `id_1` and `id_2` (the IDS first receives `id_1`, and the following message has `id_2`) is a consistent pattern. If the transition is new (at least one of the IDs was never seen before, or the pattern `id_1`->`id_2` is not in the database), it is very probable that an attack is happening.

This IDS needs a training phase to build a transition matrix, with the origin ID on each row and the destination ID on each column, and a validation phase to verify that the matrix is complete and, if not, to add more transitions. After that, the IDS is ready to detect unknown transitions and/or unknown IDs in real time and raise alarms. One of the main strengths of this IDS is its ability to detect messages with unknown IDs, but this can also be a weakness when a genuine message carries an ID that was never seen during the training phase. To obtain a complete matrix, the training phase should process a large number of messages.
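A toy example may help to picture the transition check. The sketch below stores the known transitions as a set of `(previous_id, current_id)` pairs instead of a matrix, purely to keep the example short; the IDs and traces are made up.

```python
def learn_transitions(id_sequence):
    """Collect every (previous_id, current_id) pair seen in an attack-free trace."""
    return set(zip(id_sequence, id_sequence[1:]))

def find_unknown_transitions(id_sequence, known):
    """Return (position, previous_id, current_id) for each pair absent from `known`."""
    pairs = zip(id_sequence, id_sequence[1:])
    return [
        (position, previous, current)
        for position, (previous, current) in enumerate(pairs, start=1)
        if (previous, current) not in known
    ]

# Toy traces with made-up IDs.
training = [0x100, 0x200, 0x300, 0x100, 0x200, 0x300, 0x100]
known = learn_transitions(training)

# One injected 0x100 frame between 0x200 and 0x300.
observed = [0x100, 0x200, 0x100, 0x300, 0x100]
for position, previous, current in find_unknown_transitions(observed, known):
    print(f"message {position}: unseen transition {previous:#x} -> {current:#x}")
```

A single injected frame breaks two transitions, the one entering it and the one leaving it, so both are reported.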


In [29]:
class IDS_transitions:
    def __init__(self, tranining_filenames=None, detection_filenames=None, name=None, args=(), kwargs=None, verbose=None):
        self.name = name
        self.training_filenames = tranining_filenames
        self.detection_filenames = detection_filenames
        self.MAX_SIZE = 150
        return


    def train(self):
        self.matrix = np.zeros((self.MAX_SIZE, self.MAX_SIZE))
        
        matrix_index = 0
        self.unique_id = {}

        for filename in self.training_filenames:

            i = 0
            last_id = 0
            for msg in CSVReader(filename):

                if i != 0:
                    if last_id not in self.unique_id:
                        self.unique_id[last_id] = matrix_index
                        matrix_index+=1
                    if msg.arbitration_id not in self.unique_id:
                        self.unique_id[msg.arbitration_id] = matrix_index
                        matrix_index+=1
                    
                    self.matrix[self.unique_id[last_id]][self.unique_id[msg.arbitration_id]] = 1

                last_id = msg.arbitration_id
                i+=1

    def run(self):
        
        anomaly_counter = 0
        logging.debug("starting IDS detection")
       
        

        unknown_ids = {}
        for filename in self.detection_filenames:
            i = 0
            for msg in CSVReader(filename):
                
                if i != 0:
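                    if last_id not in self.unique_id:
                        # logging.info("ANOMALY detected in transition: " + str(last_id) + " -> " + str(msg.arbitration_id))
                        anomaly_counter += 1
                        # advance to the current id before skipping, otherwise an unknown
                        # first id would flag every following message of the file
                        last_id = msg.arbitration_id
                        continue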
                    elif msg.arbitration_id not in self.unique_id:
                        anomaly_counter +=1
                        if msg.arbitration_id not in unknown_ids:
                            unknown_ids[msg.arbitration_id] = 1
                        else:
                            unknown_ids[msg.arbitration_id] += 1
                        continue
                    else:
                        if not self.matrix[self.unique_id[last_id]][self.unique_id[msg.arbitration_id]]:
                            # logging.info("ANOMALY detected in transition: " + str(last_id) + " -> " + str(msg.arbitration_id))
                            anomaly_counter += 1
                
                i+=1
                last_id = msg.arbitration_id
            logging.info("number of anomalies detected: " + str(anomaly_counter))
            logging.info("unknown id: " + str(unknown_ids))

### Execution

This is the execution of the second IDS on an attack-free dataset.

In [30]:
ids_transitions = IDS_transitions(
        name='ids_transitions', 
        tranining_filenames=training_fn, 
        detection_filenames=detection_fn)
  
ids_transitions.train()

In [36]:
ids_transitions.detection_filenames = ['/content/2021_06_21_15_06_59_032664_vehicle_normalized.csv']
ids_transitions.run()

(MainThread) number of anomalies detected: 2
(MainThread) unknown id: {}


## 3. IDS based on the Hamming distance of frames with the same id

The following IDS follows the idea of Stabili et al. [source]. It is based on the assumption that the Hamming distance between consecutive messages with the same ID always falls inside a certain range. This range is calculated during the training and validation phase. During the live detection phase, if a message has a Hamming distance smaller or larger than this range, an alarm is raised, because it is very likely that an attack is happening. By its definition, this IDS seems well suited to detecting replay attacks and fuzzing attacks whose malicious messages use IDs already seen. On the other hand, a message with an unknown ID is ignored, so attacks such as a DoS with high-priority messages can go undetected.
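As a small worked example of the range check, the sketch below computes the Hamming distance between consecutive payloads of a single ID and tests a new payload against the learned range. The payloads are made up, and the helper raises an error on payloads of different length, which is a choice made for this example only.

```python
def hamming_distance(payload_a, payload_b):
    """Number of differing bits between two equal-length byte sequences."""
    if len(payload_a) != len(payload_b):
        raise ValueError("payloads must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(payload_a, payload_b))

def learn_range(payloads):
    """(min, max) Hamming distance between consecutive payloads of one ID."""
    distances = [hamming_distance(a, b) for a, b in zip(payloads, payloads[1:])]
    return min(distances), max(distances)

# Made-up payloads of one ID: a counter in the last byte.
training = [bytes([0x10, 0x00]), bytes([0x10, 0x01]), bytes([0x10, 0x02]), bytes([0x10, 0x03])]
low, high = learn_range(training)
print(low, high)

previous = bytes([0x10, 0x03])
candidate = bytes([0xEF, 0xFC])  # every bit differs from `previous`
distance = hamming_distance(previous, candidate)
print(distance, "anomaly" if not low <= distance <= high else "ok")
```

The counter only ever flips one or two bits between frames, so a payload that flips all sixteen bits falls far outside the learned range.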

In [6]:
class IDS_hamming:
    def __init__(self, tranining_filenames=None, detection_filenames=None, name=None, args=(), kwargs=None, verbose=None):
        self.name = name
        self.training_filenames = tranining_filenames
        self.detection_filenames = detection_filenames
        return

    def hamming(self, data1, data2):
        if len(data1) != len(data2):
            logging.error("messages with different length!")
            return 0
        else:
            length = len(data1)
            hamming_distance = 0
            for i in range(length):
                 byte_distance = bin(data1[i] ^ data2[i]).count('1')
                 hamming_distance += byte_distance
            
            return hamming_distance


    def train(self):
        self.min_hamming = {}
        self.max_hamming = {}

        for filename in self.training_filenames:
            last_msg = {}

            for msg in CSVReader(filename):

                if msg.arbitration_id in last_msg:

                    current_hamming = self.hamming(msg.data,last_msg[msg.arbitration_id].data)

                    if msg.arbitration_id not in self.min_hamming:
                        self.min_hamming[msg.arbitration_id] = current_hamming
                        self.max_hamming[msg.arbitration_id] = current_hamming
                    else:
                        if current_hamming > self.max_hamming[msg.arbitration_id]:
                            self.max_hamming[msg.arbitration_id] = current_hamming
                        elif current_hamming < self.min_hamming[msg.arbitration_id]:
                            self.min_hamming[msg.arbitration_id] = current_hamming

                last_msg[msg.arbitration_id] = msg



    def run(self):

        for filename in self.detection_filenames:

            last_msg = {}
            anomaly_counter = 0

            for msg in CSVReader(filename):

                if msg.arbitration_id in last_msg:

                    current_hamming = self.hamming(msg.data,last_msg[msg.arbitration_id].data)

                    if msg.arbitration_id not in self.min_hamming:
                        logging.info("new ID detected: " + str(msg.arbitration_id))
                    else:
                        if current_hamming > self.max_hamming[msg.arbitration_id] or current_hamming < self.min_hamming[msg.arbitration_id]:
                            anomaly_counter +=1

                last_msg[msg.arbitration_id] = msg

            logging.info("anomalies encountered: " + str(anomaly_counter))
        


### Execution

In [7]:
ids_hamming = IDS_hamming(
    training_fn,
    detection_fn,
    'ids_hamming')
ids_hamming.train()


In [13]:
ids_hamming.detection_filenames = ['/content/2021_06_22_14_11_21_675720_vehicle_dos_1rand.csv']
ids_hamming.run()
print(ids_hamming.max_hamming)

(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 419406207
(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 419412607
(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 418185087
(MainThread) new ID detected: 419319782
(MainThread) new ID detected: 201385599
(MainThread) new ID detected: 418119551
(MainThread) new ID detected: 285179775
(MainThread) new ID detected: 418185087
(MainThread) new ID detected: 285179775
(MainThread) new ID detected: 285179775
(MainThread) new ID detected: 419408614
(MainThread) new ID detected: 419406207
(MainThread) new ID detected: 285183871
(MainThread) new ID detected: 419412607
(MainThread) new ID detected: 419408614
(MainThread) new ID detected: 419412607
(MainThread) new ID detected: 419406207
(MainThread) new ID detected: 418185087
(MainThread) new ID detected: 201385599


{285180134: 10, 418384358: 29, 217055974: 20, 217056486: 26, 150892262: 43, 218000614: 18, 369055206: 5, 217056230: 12, 369055974: 8, 285110248: 15, 284166120: 1, 419322856: 3, 417726408: 0, 417726400: 0, 417726392: 0, 417726384: 0, 417726376: 0, 369056230: 0, 419284198: 1, 486510675: 2, 419349704: 0, 419373030: 15, 217996006: 0, 419349696: 0, 418383334: 16, 419349688: 0, 419322342: 17, 419349680: 0, 369056998: 4, 419349672: 8, 419315958: 3, 419323080: 0, 419361254: 15, 419323072: 0, 419361510: 10, 419239398: 1, 419323064: 0, 419323056: 0, 418382310: 7, 419323048: 0, 419370470: 0, 416350182: 0, 418119654: 43, 486506726: 0, 419396326: 7, 419360742: 7, 419358438: 11, 418185190: 5, 417398758: 1, 418383590: 0, 419364070: 5, 419360486: 10, 486458342: 0, 419389414: 21, 419363046: 9, 419362278: 9, 419321574: 2, 419405798: 20, 419402982: 18, 419403238: 11, 419344102: 10, 419265766: 0, 419236326: 13, 419348966: 26, 419422182: 0, 419358182: 5, 486535398: 0, 486535654: 0}


## 4. IDS based on the entropy of CAN message ID bits

The next IDS analysed was developed by Wang and colleagues [source]. Its detection technique is based on the entropy of each bit of a standard or extended message arbitration ID. Since the data considered in this study uses extended IDs, this is the type of ID analysed.
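To make the per-bit entropy concrete, the sketch below computes the binary entropy $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ of each of the 29 ID bits, where $p$ is the fraction of messages in which the bit is 1. The traffic is made up and the 0.1 cut-off is an arbitrary value chosen for the example; the sketch does not reproduce the template and threshold computation of the implementation in this notebook.

```python
import math

ID_BITS = 29  # extended CAN identifiers

def bit_entropies(arbitration_ids):
    """Binary entropy of each identifier bit (most significant bit first)."""
    total = len(arbitration_ids)
    entropies = []
    for shift in range(ID_BITS - 1, -1, -1):
        ones = sum((identifier >> shift) & 1 for identifier in arbitration_ids)
        p = ones / total
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a constant bit carries no information
        else:
            entropies.append(-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return entropies

# Made-up traffic: two IDs that differ only in the least significant bit.
normal = [0x18FF0000, 0x18FF0001] * 4
baseline = bit_entropies(normal)

# Flooding with ID 0 changes the probability of every bit that is set in the normal IDs.
flooded = normal + [0x00000000] * 8
current = bit_entropies(flooded)
changed = [i for i, (a, b) in enumerate(zip(baseline, current)) if abs(a - b) > 0.1]
print(changed)
```

Bits that are constant in normal traffic have zero entropy, so a flood of high-priority frames with a different bit pattern shows up as a jump in the entropy of exactly those bits.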

In [14]:
class IDS_Id_Entropy:
    def __init__(self, tranining_filenames=None, detection_filenames=None, name=None, args=(), kwargs=None, verbose=None):
        self.name = name
        self.training_filenames = tranining_filenames
        self.detection_filenames = detection_filenames
        return

    def train(self):

        entropy_vectors = np.zeros((len(self.training_filenames), 29))

        K = 5

        for filename_index, filename in enumerate(self.training_filenames):
            probability_vector = np.zeros(29)

            messages = CSVReader(filename)

            num_of_messages = 0

            for msg in messages:
                
                b_arbitration_id = format(msg.arbitration_id, "029b")

                for i in range(len(b_arbitration_id)):
                    if b_arbitration_id[i] == '1':
                        probability_vector[i] += 1
                
                num_of_messages += 1
            
            probability_vector = np.divide(probability_vector, float(num_of_messages))


            for i in range(len(probability_vector)):

                if probability_vector[i] == 1.0 or probability_vector[i] == 0.0:
                     entropy_vectors[filename_index][i] = 0.0
                elif probability_vector[i] > 0.0:
                    # H(X) = H_b(p) = -p * log_2(p)-(1-p)log_2(1-p) 
                    entropy_vectors[filename_index][i] = -(probability_vector[i]) * math.log2(probability_vector[i]) - (1.0 - probability_vector[i]) * math.log2(1.0 - probability_vector[i])
            


        max_entropy = np.amax(entropy_vectors, axis=0)
        min_entropy = np.amin(entropy_vectors, axis=0)


        entropy_range = max_entropy - min_entropy

        self.threshold = np.multiply(entropy_range, K)


        print(self.threshold)

        self.entropy_template = np.mean(entropy_vectors, axis=0)

        print(self.entropy_template)

    def run(self):
        
        for filename_index, filename in enumerate(self.detection_filenames):

            messages = CSVReader(filename)

            num_of_messages = 0

            probability_vector = np.zeros(29)

            entropy_vector_to_check = np.zeros(29)
            for msg in messages:
                
                b_arbitration_id = format(msg.arbitration_id, "029b")

                for i in range(len(b_arbitration_id)):
                    if b_arbitration_id[i] == '1':
                        probability_vector[i] += 1
                
                num_of_messages += 1
            
            probability_vector = np.divide(probability_vector, float(num_of_messages))


            for i in range(len(probability_vector)):

                if probability_vector[i] == 1.0 or probability_vector[i] == 0.0:
                     entropy_vector_to_check[i] = 0.0
                elif probability_vector[i] > 0.0:
                    # H(X) = H_b(p) = -p * log_2(p)-(1-p)log_2(1-p) 
                    entropy_vector_to_check[i] = -(probability_vector[i]) * math.log2(probability_vector[i]) - (1.0 - probability_vector[i]) * math.log2(1.0 - probability_vector[i])


            for i in range(len(entropy_vector_to_check)):

                if entropy_vector_to_check[i] > (self.entropy_template[i] + self.threshold[i]):

                    logging.info("attack detected on bit " + str(i))
                






### Execution

In [15]:
entropy_ids = IDS_Id_Entropy(training_fn, detection_fn, 'entropy_ids')

entropy_ids.train()

[0.02874571 0.05321616 0.02636567 0.         0.02037845 0.
 0.         0.00369453 0.02045797 0.0047829  0.00624241 0.00349854
 0.02591707 0.02651045 0.00410948 0.01414741 0.00675294 0.00336313
 0.0019587  0.03077084 0.02609876 0.006378   0.02512278 0.0225136
 0.0597328  0.14488687 0.08810933 0.09494428 0.006378  ]
[0.87701268 0.74848331 0.91271323 0.         0.37026783 0.
 0.         0.05240467 0.36762981 0.9835928  0.95287278 0.99370345
 0.91731525 0.91049468 0.99618123 0.74180377 0.91268995 0.9951982
 0.99834676 0.94818328 0.96145509 0.09290482 0.49004006 0.4127726
 0.45344408 0.59103395 0.7703397  0.74864727 0.09290482]


In [16]:
entropy_ids.detection_filenames = ['/content/2021_06_22_14_11_21_675720_vehicle_dos_1rand.csv']
entropy_ids.run()

(MainThread) attack detected on bit 5
(MainThread) attack detected on bit 6
