# Detection of Malicious DNS-over-HTTPS (DoH) Tunneling using a Stacked Classifier
DNS-over-HTTPS (DoH) uses the HTTPS protocol to send encrypted requests to the DNS server, rather than sending them in plaintext over the User Datagram Protocol (UDP) used by default (Böttger, et al., 2019). Because DoH uses port 443, the default HTTPS port, it is difficult for network administrators to differentiate between regular HTTPS requests and DNS requests. Attackers familiar with the inner workings of the DoH protocol have exploited this to hide malicious activity, such as transferring stolen data out of a compromised system. In 2020, the Iranian hacking group known as OilRig became the first known group to incorporate the DoH protocol into its operations, using a tool called DNSExfiltrator. This tool was used to exfiltrate data from compromised networks via DoH to several COVID-19-related domains (Cimpanu, 2020).
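To make concrete what a DoH client sends inside the encrypted HTTPS channel, the following is a minimal sketch of how an RFC 8484 GET request URL is formed: an ordinary DNS query in wire format, base64url-encoded without padding and passed in the `dns` query parameter. The resolver URL `https://dns.example/dns-query` and the helper names are hypothetical, and the sketch only builds the URL; it does not send a request or validate label lengths.

```python
import base64
import struct


def build_dns_query(qname: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query in RFC 1035 wire format."""
    # Header: ID=0 (RFC 8484 recommends 0 for cache friendliness), flags with RD=1,
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname_bytes = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    # QTYPE (1 = A record) and QCLASS (1 = IN).
    return header + qname_bytes + b"\x00" + struct.pack("!HH", qtype, 1)


def doh_get_url(resolver_url: str, qname: str) -> str:
    """Build a DoH GET URL: the wire-format query, base64url-encoded without padding."""
    encoded = base64.urlsafe_b64encode(build_dns_query(qname)).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


print(doh_get_url("https://dns.example/dns-query", "www.example.com"))
# https://dns.example/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

Because the whole request, including the `dns` parameter, travels inside TLS on port 443, an on-path observer sees only an encrypted flow, which is consistent with the flow-level features (durations, byte counts, packet timings) used later in this notebook.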

The goal of this research is to develop a hybrid (stacked) classifier that effectively detects and classifies DNS tunneling over the DoH protocol.

### Dataset Details
The dataset used for this research is the CIRA-CIC-DoHBrw-2020 dataset, developed by the Canadian Institute for Cybersecurity.
It is available at https://www.unb.ca/cic/datasets/dohbrw-2020.html

The CIRA-CIC-DoHBrw-2020 dataset provides 10 days of network traffic, from December 10 to December 20, 2019. It consists of 371,836 labelled network flows, each described by 34 features (MontazeriShatoori, et al., 2020).


In [1]:
# importing the dataset and outputting the first 5 samples in the dataset
import pandas as pd
df = pd.read_csv("MaliciousDoH.csv")
df.head(5)

Unnamed: 0,SourceIP,DestinationIP,SourcePort,DestinationPort,TimeStamp,Duration,FlowBytesSent,FlowSentRate,FlowBytesReceived,FlowReceivedRate,...,PacketTimeCoefficientofVariation,ResponseTimeTimeVariance,ResponseTimeTimeStandardDeviation,ResponseTimeTimeMean,ResponseTimeTimeMedian,ResponseTimeTimeMode,ResponseTimeTimeSkewFromMedian,ResponseTimeTimeSkewFromMode,ResponseTimeTimeCoefficientofVariation,DoH
0,192.168.20.111,8.8.8.8,49972,443,2019-12-10 13:14:43,0.017147,171,9972.589957,105,6123.520149,...,0.707107,0.0,0.0,0.01712,0.01712,0.01712,-10.0,-10.0,0.0,True
1,192.168.20.111,8.8.8.8,50028,443,2019-12-10 13:14:52,0.954806,2555,2675.936263,5675,5943.615771,...,1.353916,8.4e-05,0.009147,0.010709,0.008997,4e-06,0.561552,1.170332,0.854139,True
2,192.168.20.111,8.8.8.8,50092,443,2019-12-10 13:14:55,1.646289,8465,5141.867558,10845,6587.543256,...,0.549101,6.4e-05,0.00799,0.010945,0.015319,3e-06,-1.642429,1.369527,0.729979,True
3,192.168.20.111,54.72.229.126,60540,443,2019-12-10 13:14:56,1.132705,1751,1545.857041,4504,3976.322167,...,0.847468,0.001276,0.035717,0.07984,0.095209,1.2e-05,-1.290885,2.235006,0.447359,False
4,192.168.20.111,54.191.252.154,50928,443,2019-12-10 13:14:56,1.170098,2351,2009.233415,4633,3959.49741,...,0.806748,0.001192,0.034518,0.054705,0.082358,1.6e-05,-2.403374,1.584345,0.630991,False


In [2]:
# Analyzing the target class to see how balanced or unbalanced the dataset is.

df.groupby(df.DoH).size()

DoH
False    355207
True      16257
dtype: int64

### Data Preparation
In this phase, features with little or no predictive value were removed: `SourceIP`, `DestinationIP`, `PacketTimeMode` and `TimeStamp`. The source and destination IP addresses were dropped because, in practice, numerous applications randomly generate IP addresses, which makes them unreliable features for training an anomaly detection system. Instead of relying on addresses, the model was trained on flow features that capture the behavior of DNS tunneling over the DoH protocol. `TimeStamp` was removed because it was observed to be highly correlated with the target variable; keeping it would leak label information into the model (data leakage). Finally, `PacketTimeMode` carried no information because it contained only the value 0.
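The two removal criteria described above, a constant column and a column strongly correlated with the target, can be checked programmatically. The following is a minimal sketch, assuming pandas and a numeric 0/1 target; the helper names `constant_columns` and `abs_target_correlation` and the toy values are hypothetical, and the original analysis may have used a different correlation measure. Pearson correlation against a 0/1 target is the point-biserial correlation.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Columns with a single distinct value carry no information for a classifier."""
    return [c for c in df.columns if df[c].nunique(dropna=False) <= 1]


def abs_target_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """|Pearson r| between each non-constant numeric feature and a 0/1 target."""
    features = df.drop(columns=[target] + constant_columns(df)).select_dtypes(include="number")
    return features.corrwith(df[target].astype(float)).abs().sort_values(ascending=False)


# Toy flows (hypothetical values): label-1 flows happen to be recorded earlier than label-0 flows.
toy = pd.DataFrame({
    "TimeStamp": ["2019-12-10 13:14:43", "2019-12-10 13:14:52",
                  "2019-12-20 09:00:00", "2019-12-20 09:00:05"],
    "PacketTimeMode": [0, 0, 0, 0],
    "Duration": [0.017, 0.95, 1.13, 0.02],
    "DoH": [1, 1, 0, 0],
})
# Convert timestamps to seconds since the Unix epoch so they can be correlated.
toy["TimeStamp"] = (pd.to_datetime(toy["TimeStamp"]) - pd.Timestamp(0)).dt.total_seconds()

print(constant_columns(toy))               # ['PacketTimeMode']
print(abs_target_correlation(toy, "DoH"))  # TimeStamp ranks first, with |r| close to 1
```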

In [3]:
# Drop the IP addresses, the constant PacketTimeMode column and the leaky TimeStamp column
df = df.drop(columns=['SourceIP','DestinationIP','PacketTimeMode','TimeStamp'])
df

Unnamed: 0,SourcePort,DestinationPort,Duration,FlowBytesSent,FlowSentRate,FlowBytesReceived,FlowReceivedRate,PacketLengthVariance,PacketLengthStandardDeviation,PacketLengthMean,...,PacketTimeCoefficientofVariation,ResponseTimeTimeVariance,ResponseTimeTimeStandardDeviation,ResponseTimeTimeMean,ResponseTimeTimeMedian,ResponseTimeTimeMode,ResponseTimeTimeSkewFromMedian,ResponseTimeTimeSkewFromMode,ResponseTimeTimeCoefficientofVariation,DoH
0,49972,443,0.017147,171,9.972590e+03,105,6.123520e+03,3.380000e+02,18.384776,92.000000,...,0.707107,0.000000e+00,0.000000,0.017120,0.017120,0.017120,-10.000000,-10.000000,0.000000,True
1,50028,443,0.954806,2555,2.675936e+03,5675,5.943616e+03,3.106133e+05,557.326960,257.187500,...,1.353916,8.367029e-05,0.009147,0.010709,0.008997,0.000004,0.561552,1.170332,0.854139,True
2,50092,443,1.646289,8465,5.141868e+03,10845,6.587543e+03,1.360269e+04,116.630552,132.260274,...,0.549101,6.383296e-05,0.007990,0.010945,0.015319,0.000003,-1.642429,1.369527,0.729979,True
3,60540,443,1.132705,1751,1.545857e+03,4504,3.976322e+03,6.008720e+05,775.159356,347.500000,...,0.847468,1.275703e-03,0.035717,0.079840,0.095209,0.000012,-1.290885,2.235006,0.447359,False
4,50928,443,1.170098,2351,2.009233e+03,4633,3.959497e+03,2.082705e+05,456.366649,258.666667,...,0.806748,1.191502e-03,0.034518,0.054705,0.082358,0.000016,-2.403374,1.584345,0.630991,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
371459,443,49440,13.777943,3165,2.297150e+02,2286,1.659174e+02,9.243248e+03,96.141813,126.767442,...,0.317461,5.656498e-08,0.000238,0.000155,0.000050,0.000004,1.330187,0.636808,1.529926,True
371460,35308,443,40.117653,3209,7.998972e+01,47363,1.180602e+03,1.563846e+06,1250.538417,857.152542,...,2.888559,6.483573e+01,8.052064,1.743961,0.008872,0.000046,0.646451,0.216580,4.617112,False
371461,443,49440,177.001521,279,1.576258e+00,720,4.067762e+00,2.851875e+02,16.887495,83.250000,...,0.767390,3.583343e-04,0.018930,0.013419,0.000034,0.000034,2.121320,0.707107,1.410630,True
371462,443,49440,0.000026,93,3.576923e+06,54,2.076923e+06,3.802500e+02,19.500000,73.500000,...,1.000000,0.000000e+00,0.000000,0.000026,0.000026,0.000026,-10.000000,-10.000000,0.000000,True


In [4]:
# Remove samples (rows) that contain missing values (NaN) or that duplicate another row
df = df.dropna()
df = df.drop_duplicates()
df

Unnamed: 0,SourcePort,DestinationPort,Duration,FlowBytesSent,FlowSentRate,FlowBytesReceived,FlowReceivedRate,PacketLengthVariance,PacketLengthStandardDeviation,PacketLengthMean,...,PacketTimeCoefficientofVariation,ResponseTimeTimeVariance,ResponseTimeTimeStandardDeviation,ResponseTimeTimeMean,ResponseTimeTimeMedian,ResponseTimeTimeMode,ResponseTimeTimeSkewFromMedian,ResponseTimeTimeSkewFromMode,ResponseTimeTimeCoefficientofVariation,DoH
0,49972,443,0.017147,171,9.972590e+03,105,6.123520e+03,3.380000e+02,18.384776,92.000000,...,0.707107,0.000000e+00,0.000000,0.017120,0.017120,0.017120,-10.000000,-10.000000,0.000000,True
1,50028,443,0.954806,2555,2.675936e+03,5675,5.943616e+03,3.106133e+05,557.326960,257.187500,...,1.353916,8.367029e-05,0.009147,0.010709,0.008997,0.000004,0.561552,1.170332,0.854139,True
2,50092,443,1.646289,8465,5.141868e+03,10845,6.587543e+03,1.360269e+04,116.630552,132.260274,...,0.549101,6.383296e-05,0.007990,0.010945,0.015319,0.000003,-1.642429,1.369527,0.729979,True
3,60540,443,1.132705,1751,1.545857e+03,4504,3.976322e+03,6.008720e+05,775.159356,347.500000,...,0.847468,1.275703e-03,0.035717,0.079840,0.095209,0.000012,-1.290885,2.235006,0.447359,False
4,50928,443,1.170098,2351,2.009233e+03,4633,3.959497e+03,2.082705e+05,456.366649,258.666667,...,0.806748,1.191502e-03,0.034518,0.054705,0.082358,0.000016,-2.403374,1.584345,0.630991,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
371459,443,49440,13.777943,3165,2.297150e+02,2286,1.659174e+02,9.243248e+03,96.141813,126.767442,...,0.317461,5.656498e-08,0.000238,0.000155,0.000050,0.000004,1.330187,0.636808,1.529926,True
371460,35308,443,40.117653,3209,7.998972e+01,47363,1.180602e+03,1.563846e+06,1250.538417,857.152542,...,2.888559,6.483573e+01,8.052064,1.743961,0.008872,0.000046,0.646451,0.216580,4.617112,False
371461,443,49440,177.001521,279,1.576258e+00,720,4.067762e+00,2.851875e+02,16.887495,83.250000,...,0.767390,3.583343e-04,0.018930,0.013419,0.000034,0.000034,2.121320,0.707107,1.410630,True
371462,443,49440,0.000026,93,3.576923e+06,54,2.076923e+06,3.802500e+02,19.500000,73.500000,...,1.000000,0.000000e+00,0.000000,0.000026,0.000026,0.000026,-10.000000,-10.000000,0.000000,True


### Data Preparation: Label Encoding
To handle the categorical variable in the dataset, scikit-learn's `LabelEncoder` was used. The target column `DoH` was the only categorical variable in the dataset, and it contains only two classes, True and False. These classes were converted to binary labels, where 0 = False and 1 = True.
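The mapping described above can be checked on a few hypothetical labels. `LabelEncoder` sorts the distinct values it sees, and since `False < True`, False becomes 0 and True becomes 1. This is a small sketch with made-up labels, not the notebook's data; for a boolean column a plain integer cast gives the same result.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = pd.Series([True, False, True, False, False], name="DoH")  # hypothetical labels
encoder = LabelEncoder().fit(labels)
encoded = encoder.transform(labels)

print(encoder.classes_.tolist())  # [False, True]  -> False is encoded as 0, True as 1
print(encoded.tolist())           # [1, 0, 1, 0, 0]

# For a boolean column, a plain integer cast produces the same mapping.
assert (labels.astype(int).to_numpy() == encoded).all()
```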

In [5]:
from sklearn.preprocessing import LabelEncoder
# LabelEncoder sorts the classes, so False -> 0 and True -> 1
df.DoH = LabelEncoder().fit_transform(df.DoH)

### Data Preparation: Target Class Balancing
Another preprocessing step, taken to prevent a classification algorithm biased toward the majority class, was to balance the dataset's target classes. As observed in Table 2, the dataset contains far more normal network activity (355,207 flows) than DNS tunneling over DoH (16,257 flows). To give both classes equal priority so that the binary classifier can perform optimally, the majority class was randomly undersampled, without replacement, to the size of the minority class (16,023 flows after missing values and duplicates were removed).
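The same random undersampling can be expressed generically, without naming the majority and minority classes explicitly. The following is a minimal sketch using pandas' `GroupBy.sample` (available since pandas 1.1); the function name `undersample_to_minority` and the toy data are hypothetical, and for the same seed it may draw a different random subset than `sklearn.utils.resample`.

```python
import pandas as pd


def undersample_to_minority(df: pd.DataFrame, target: str, random_state: int = 42) -> pd.DataFrame:
    """Randomly undersample every class (without replacement) to the size of the smallest class."""
    n_min = df[target].value_counts().min()
    return df.groupby(target).sample(n=n_min, random_state=random_state)


# Hypothetical imbalanced data: 90 normal flows (0) and 10 tunneling flows (1).
toy = pd.DataFrame({"Duration": range(100), "DoH": [0] * 90 + [1] * 10})
balanced = undersample_to_minority(toy, "DoH")
print(balanced["DoH"].value_counts().sort_index().tolist())  # [10, 10]
```

Because the target size is derived from the data, the function keeps working if cleaning steps change the minority-class count.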

In [6]:
from sklearn.utils import resample
df_majority = df[df.DoH == 0]
df_minority = df[df.DoH == 1]
# Randomly undersample the majority class (without replacement) to the size of the minority class
df_majority_downsampled = resample(df_majority, replace=False, n_samples=len(df_minority), random_state=42)
df = pd.concat([df_majority_downsampled, df_minority])
df.groupby(df.DoH).size()

DoH
0    16023
1    16023
dtype: int64

In [7]:
# Separate the target class from the rest of the dataset
X = df.drop(columns='DoH')
y = df.DoH

### Stacked Classifier Design
#### Base Models
* Decision Tree
* Random Forest Classifier
#### Meta-learner
* Multilayer Perceptron

In [8]:
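from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
def stacking_classifier():
    # Base models (level 0): a random forest and a single decision tree
    base_model = []
    base_model.append(('rf', RandomForestClassifier(n_estimators=100,random_state=42,max_depth=15, n_jobs=-1)))
    base_model.append(('dt', DecisionTreeClassifier(random_state=42,max_depth=15)))

    # Meta-learner (level 1): an MLP with one hidden layer of 100 ReLU units.
    # learning_rate only takes effect when solver="sgd"; with the default "adam" solver it is ignored.
    meta_learner = MLPClassifier(hidden_layer_sizes=(100,), activation="relu", max_iter=500, learning_rate="invscaling")

    # cv=5: the meta-learner is trained on 5-fold out-of-fold predictions of the base models
    model = StackingClassifier(estimators=base_model, final_estimator=meta_learner, cv=5)
    return model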
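Roughly, `StackingClassifier(cv=5)` trains the meta-learner on out-of-fold predictions of the base models and then refits the base models on all training data; for a binary problem it keeps only the positive-class probability from each base model. The sketch below reproduces that idea by hand with `cross_val_predict`, using `make_classification` as a synthetic stand-in for the 19 selected DoH features. It is a simplified illustration, not scikit-learn's actual implementation, which also handles `stack_method` selection, `passthrough` and multiclass outputs; `stacked_predict` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 19 selected DoH flow features (assumption).
X, y = make_classification(n_samples=1000, n_features=19, random_state=42)

base_models = [
    RandomForestClassifier(n_estimators=100, max_depth=15, random_state=42, n_jobs=-1),
    DecisionTreeClassifier(max_depth=15, random_state=42),
]

# Level 0: out-of-fold positive-class probabilities (5-fold CV), one column per base model,
# so the meta-learner never sees a prediction made on a row the base model was trained on.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1] for m in base_models
])

# Level 1: the MLP meta-learner learns how to combine the base models' probabilities.
meta_learner = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42)
meta_learner.fit(meta_features, y)

# For prediction, each base model is refit on the full training data.
fitted_base_models = [m.fit(X, y) for m in base_models]


def stacked_predict(X_new):
    new_meta = np.column_stack([m.predict_proba(X_new)[:, 1] for m in fitted_base_models])
    return meta_learner.predict(new_meta)


print(meta_features.shape)  # (1000, 2)
```

Using out-of-fold predictions matters most for the unpruned-to-depth-15 decision tree, whose in-sample predictions would look almost perfect and mislead the meta-learner.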

In [9]:
# Standardizing the dataset (zero mean and unit variance per feature)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
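`StandardScaler` rescales each feature as $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are the column mean and population standard deviation learned during `fit`. In [9] fits the scaler on the whole dataset before the train/test split; a common alternative is to fit it on the training rows only and reuse those statistics for the test rows, so that no test-set information reaches preprocessing. The sketch below shows that pattern on hypothetical two-feature data and checks it against the formula.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical two-feature data: three training rows and one test row.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

# Fit on the training rows only; the test row is transformed with the training statistics.
scaler = StandardScaler().fit(X_train)
X_train_z = scaler.transform(X_train)
X_test_z = scaler.transform(X_test)

# Manual equivalent: z = (x - mean) / std, using the population std (ddof=0).
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
assert np.allclose(X_train_z, (X_train - mean) / std)
assert np.allclose(X_test_z, (X_test - mean) / std)
print(X_test_z.round(3))  # [[2.449 2.449]]
```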

In [10]:
# Split the dataset into training (70%) and test (30%) sets
from sklearn.model_selection import train_test_split, cross_val_score
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)

In [11]:
# Estimate the mutual information (information gain) between each feature and the target
# and keep the 19 highest-scoring features. The selector is fitted on the training set only;
# the test set is reduced to the same 19 columns.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
select_feature = SelectKBest(mutual_info_classif, k=19).fit(X_train,y_train)
X_train = select_feature.transform(X_train)
X_test = select_feature.transform(X_test)
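To illustrate what the selection step keeps, the sketch below scores one informative and one pure-noise feature on synthetic data. `mutual_info_classif` estimates mutual information with a nearest-neighbour method and adds small random noise to continuous features, so its scores (and, near the cut-off, the selected set) can vary between runs unless `random_state` is fixed; because `SelectKBest` expects a function of `(X, y)`, `functools.partial` is used here to pin the seed. The data and seed are assumptions for illustration only.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
informative = y + rng.normal(scale=0.3, size=500)  # depends on the label
noise = rng.normal(size=500)                       # independent of the label
X = np.column_stack([informative, noise])

# Keep the single highest-scoring feature, with a fixed seed for reproducible MI estimates.
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=1).fit(X, y)
print(selector.scores_)
print(selector.get_support(indices=True))  # [0]
```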

In [12]:
# Fit the stacked classifier on the training set and predict the test set
model = stacking_classifier()
model = model.fit(X_train,y_train)
y_pred = model.predict(X_test)

In [13]:
# Calculate the model's accuracy on the test set
accuracy = model.score(X_test, y_test)
accuracy

0.9826294986478052

### Confusion Matrix
When a binary classification model makes predictions, there are four possible outcomes. Here the positive class is DNS tunneling over DoH (label 1) and the negative class is normal traffic (label 0).
* True Positives (TP): the model correctly predicts the positive class.
* True Negatives (TN): the model correctly predicts the negative class.
* False Positives (FP): the model predicts the positive class for a sample that actually belongs to the negative class.
* False Negatives (FN): the model predicts the negative class for a sample that actually belongs to the positive class.

In scikit-learn's `confusion_matrix`, rows correspond to true labels and columns to predicted labels, so with labels ordered [0, 1] the output reads [[TN, FP], [FN, TP]].

In [14]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test,y_pred)

array([[4795,   71],
       [  96, 4652]], dtype=int64)
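The four counts above are enough to derive the usual detection metrics. The following sketch copies the matrix reported by In [14] and computes them with plain NumPy; the choice of metrics (precision, recall, F1, false positive rate) is an addition for illustration, not part of the original evaluation. The accuracy it recomputes matches the value returned by `model.score` in In [13].

```python
import numpy as np

# Confusion matrix reported by In [14]; scikit-learn orders it as [[TN, FP], [FN, TP]].
cm = np.array([[4795, 71],
               [96, 4652]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)            # share of flows flagged as tunneling that really are
recall = tp / (tp + fn)               # share of tunneling flows that were caught
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # share of normal flows raised as false alarms

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} fpr={false_positive_rate:.3f}")
# accuracy=0.983 precision=0.985 recall=0.980 f1=0.982 fpr=0.015
```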

### Cross-Validation Score
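Cross-validation was also carried out to see how well the model generalizes. The stacked classifier was evaluated with 5-fold cross-validation on the full balanced and standardized dataset (all features, without the `SelectKBest` step), scored by macro-averaged recall.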

In [15]:
cv_score = cross_val_score(model, X,y,cv=5,scoring="recall_macro")
cv_score.mean()

0.9618082213925261
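In [15] cross-validates on all features with the scaler already fitted on every row, so it does not evaluate exactly the same procedure as the train/test experiment above. One way to align them is to put scaling, feature selection and the stacked model into a single scikit-learn `Pipeline`, so that every step is refit on the training folds of each split. The sketch below does this on synthetic data; the dataset, the reduced number of trees and the fixed seeds are assumptions chosen to keep it fast, and it implies nothing about the score on the real dataset.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced DoH dataset (assumption: 30 numeric features).
X, y = make_classification(n_samples=600, n_features=30, n_informative=8, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, max_depth=15, random_state=42)),
        ("dt", DecisionTreeClassifier(max_depth=15, random_state=42)),
    ],
    final_estimator=MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    cv=5,
)

# Every step is refit on the training folds only, so the held-out fold never
# influences the scaler statistics or which 19 features are kept.
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(partial(mutual_info_classif, random_state=42), k=19),
    stack,
)
scores = cross_val_score(pipeline, X, y, cv=5, scoring="recall_macro")
print(scores.mean())
```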