# Predicting Operating System through Network Traffic
## Irina Lee & William Wang

This notebook represents our work for the final project in Machine Learning for Computer Systems. We took network data and labelled each flow with its corresponding operating system (according to CIC-IDS2017), then experimented with different models to see which one produced the best prediction results. The goal of this project was to replicate the best-known results produced by nPrint for the problem of OS Detection. We used Random Forest Classifiers, Extra Tree Classifiers, and K Nearest Neighbors. Overall, we found that Random Forest Classifier produced the best resuls with an AUC score of 0.94.


## Loading the Data

We used the Intrusion Detection Evaluation Dataset (CIC-IDS2017) from this website (https://www.unb.ca/cic/datasets/ids-2017.html) to get our data. We used flow data from the dataset labelled "Friday Morning Working Hours".

In [None]:
import pandas as pd

friday = pd.read_csv('TrafficLabelling/Friday-WorkingHours-Morning.pcap_ISCX.csv.gz', index_col=0)

print('Number of Packets: {0}, Features per packet'.format(friday.shape[0], friday.shape[1]))

We then print out the `friday` dataframe to observe the columns.

In [2]:
friday

Unnamed: 0_level_0,Source IP,Source Port,Destination IP,Destination Port,Protocol,Timestamp,Flow Duration,Total Fwd Packets,Total Backward Packets,Total Length of Fwd Packets,...,min_seg_size_forward,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min,Label
Flow ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
192.168.10.3-192.168.10.50-3268-56108-6,192.168.10.50,56108,192.168.10.3,3268,6,7/7/2017 8:59,112740690,32,16,6448,...,32,3.594286e+02,1.199802e+01,380.0,343.0,16100000.0,4.988048e+05,16400000.0,15400000.0,BENIGN
192.168.10.3-192.168.10.50-389-42144-6,192.168.10.50,42144,192.168.10.3,389,6,7/7/2017 8:59,112740560,32,16,6448,...,32,3.202857e+02,1.574499e+01,330.0,285.0,16100000.0,4.987937e+05,16400000.0,15400000.0,BENIGN
8.0.6.4-8.6.0.1-0-0-0,8.6.0.1,0,8.0.6.4,0,0,7/7/2017 9:00,113757377,545,0,0,...,0,9.361829e+06,7.324646e+06,18900000.0,19.0,12200000.0,6.935824e+06,20800000.0,5504997.0,BENIGN
192.168.10.9-224.0.0.252-63210-5355-17,192.168.10.9,63210,224.0.0.252,5355,17,7/7/2017 9:00,100126,22,0,616,...,32,0.000000e+00,0.000000e+00,0.0,0.0,0.0,0.000000e+00,0.0,0.0,BENIGN
192.168.10.9-224.0.0.22-0-0-0,192.168.10.9,0,224.0.0.22,0,0,7/7/2017 9:00,54760,4,0,0,...,0,0.000000e+00,0.000000e+00,0.0,0.0,0.0,0.000000e+00,0.0,0.0,BENIGN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
192.168.10.3-192.168.10.14-53-51018-17,192.168.10.14,51018,192.168.10.3,53,17,7/7/2017 12:59,61452,4,2,180,...,20,0.000000e+00,0.000000e+00,0.0,0.0,0.0,0.000000e+00,0.0,0.0,BENIGN
192.168.10.3-192.168.10.14-53-49984-17,192.168.10.14,49984,192.168.10.3,53,17,7/7/2017 12:59,171,2,2,80,...,32,0.000000e+00,0.000000e+00,0.0,0.0,0.0,0.000000e+00,0.0,0.0,BENIGN
192.168.10.3-192.168.10.14-53-64015-17,192.168.10.14,64015,192.168.10.3,53,17,7/7/2017 12:59,222,2,2,90,...,32,0.000000e+00,0.000000e+00,0.0,0.0,0.0,0.000000e+00,0.0,0.0,BENIGN
192.168.10.17-198.100.147.178-123-123-17,192.168.10.17,123,198.100.147.178,123,17,7/7/2017 12:59,16842,1,1,48,...,20,0.000000e+00,0.000000e+00,0.0,0.0,0.0,0.000000e+00,0.0,0.0,BENIGN


## Labelling Our Samples

We label each sample with its corresponding operating system by matching its source IP address to the OS found
on this website https://www.unb.ca/cic/datasets/ids-2017.html. There were IP addresses in the dataset that did not have any corresponding OS, so we just labelled it with a default value of "Other." Because OS was labelled using IP address, we dropped all columns that had a direct relationship with IP, including Source IP, Destination IP, and Destination Port. This also matched what the nPrint researchers did in their study as well. Furthermore, we also dropped timestamp because logicaly timestamp and OS shouldn't have much correlation, and we didn't want any spurious correlations between time and OS prediction.

We also ran into some issues with our samples being float64 while the models could only accept float32, so we converted all float64 values into float32 types.

In [3]:
import numpy as np
import math 
samples = []
labels = []

friday = friday.replace([np.inf, -np.inf], np.nan)
friday = friday.dropna()
# label the operating system based on IP address from this website: https://www.unb.ca/cic/datasets/ids-2017.html
ip_to_os = {'205.174.165.73': 'Kali', '205.174.165.69': 'Win', '205.174.165.70': 'Win', '205.174.165.71': 'Win', '192.168.10.50': 'Web server 16 Public', '192.168.10.205.174.165.68': 'Web server 16 Public', '192.168.10.51': 'Ubuntu server 12 Public', '192.168.10.205.174.165.66': 'Ubuntu server 12 Public', '192.168.10.19': 'Ubuntu 14.4, 32B', '192.168.10.17': 'Ubuntu 14.4, 64B', '192.168.10.16': 'Ubuntu 16.4, 32B', '192.168.10.12': 'Ubuntu 16.4, 64B', '192.168.10.9': 'Win 7 Pro, 64B', '192.168.10.5': 'Win 8.1, 64B', '192.168.10.8': 'Win Vista, 64B', '192.168.10.14': 'Win 10, pro 32B', '192.168.10.15': 'Win 10, 64B', '192.168.10.25': 'MAC'}
labels = friday[' Source IP'].apply(lambda x : "Other" if x not in ip_to_os else ip_to_os[x])
samples = friday.drop([' Source IP', ' Source Port', ' Destination IP', ' Destination Port', ' Timestamp'], axis=1)
samples = samples._get_numeric_data()
samples = samples.reset_index(drop=True)

samples = samples.astype(np.float32)
samples.dtypes.value_counts()

float32    78
dtype: int64

## Training Our Classifiers

We now train our data on a multitude of different classifiers to identify the best-performing classifier for OS detection. We try Random Forest Classifier, ExtraTrees Classifier and K Nearest Neighbors Classifier. We found that Random Forest Classifier performed the best with an AUC score of 0.94 and K Nearest Neighbors performed the worst with an AUC score of 0.81.

### Random Forest Classifier

In [17]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score

# Split data
X_train, X_test, y_train, y_test = train_test_split(samples, labels)

# Initialize Classifier
clf = RandomForestClassifier(n_estimators=1000, max_depth=None, min_samples_split=2, random_state=0)

# Train 
clf.fit(X_train, y_train) 

# Predict
y_pred = clf.predict(X_test)

# Statistics

# First, lets get a stat report about the precision and recall:
report = classification_report(y_test, y_pred)
print(report)

# Let's also get the ROC AUC score while we're here, which requires a probability instead of just the prediction
y_pred_proba = clf.predict_proba(X_test)
# predict_proba gives us a probability estimate of each class, while roc_auc just cares about the "positive" class
y_pred_proba_pos = [sublist[1] for sublist in y_pred_proba]

                         precision    recall  f1-score   support

                   Kali       0.92      0.92      0.92       161
                    MAC       0.87      0.80      0.84      1695
                  Other       0.98      0.99      0.98     13838
       Ubuntu 14.4, 32B       0.58      0.61      0.60      3809
       Ubuntu 14.4, 64B       0.47      0.39      0.42      2468
       Ubuntu 16.4, 32B       0.50      0.40      0.44      2486
       Ubuntu 16.4, 64B       0.53      0.62      0.57      4381
Ubuntu server 12 Public       0.97      0.91      0.94       247
   Web server 16 Public       0.83      0.88      0.86      1007
            Win 10, 64B       0.48      0.47      0.47      3305
        Win 10, pro 32B       0.44      0.35      0.39      2590
         Win 7 Pro, 64B       0.50      0.55      0.53      4407
           Win 8.1, 64B       0.55      0.59      0.57      4783
         Win Vista, 64B       0.51      0.49      0.50      2551

               accuracy

In [18]:
roc = roc_auc_score(y_test, y_pred_proba, multi_class="ovo")
print('ROC AUC Score: {0}'.format(roc))

ROC AUC Score: 0.9480181246701394


### ExtraTrees Classifier

In [19]:
from sklearn.ensemble import ExtraTreesClassifier

# Split data
X_train, X_test, y_train, y_test = train_test_split(samples, labels)

# Initialize Classifier
clf = ExtraTreesClassifier(n_estimators=1000, max_depth=None, min_samples_split=2, random_state=0)

# Train 
clf.fit(X_train, y_train) 

# Predict
y_pred = clf.predict(X_test)

# Statistics

# First, lets get a stat report about the precision and recall:
report = classification_report(y_test, y_pred)
print(report)

# Let's also get the ROC AUC score while we're here, which requires a  probability instead of just the prediction
y_pred_proba = clf.predict_proba(X_test)
# predict_proba gives us a probability estimate of each class, while roc_auc just cares about the "positive" class
y_pred_proba_pos = [sublist[1] for sublist in y_pred_proba]

                         precision    recall  f1-score   support

                   Kali       0.90      0.91      0.90       184
                    MAC       0.86      0.84      0.85      1723
                  Other       0.98      0.98      0.98     13974
       Ubuntu 14.4, 32B       0.58      0.60      0.59      3730
       Ubuntu 14.4, 64B       0.44      0.40      0.42      2477
       Ubuntu 16.4, 32B       0.47      0.41      0.44      2426
       Ubuntu 16.4, 64B       0.55      0.62      0.58      4355
Ubuntu server 12 Public       0.93      0.94      0.94       266
   Web server 16 Public       0.86      0.86      0.86       984
            Win 10, 64B       0.45      0.45      0.45      3165
        Win 10, pro 32B       0.38      0.33      0.35      2622
         Win 7 Pro, 64B       0.49      0.52      0.51      4403
           Win 8.1, 64B       0.54      0.57      0.55      4822
         Win Vista, 64B       0.52      0.47      0.49      2597

               accuracy

In [20]:
roc = roc_auc_score(y_test, y_pred_proba, multi_class="ovo")
print('ROC AUC Score: {0}'.format(roc))

ROC AUC Score: 0.9304106731453283


### KNeighbors Classifier

In [25]:
from sklearn.neighbors import KNeighborsClassifier

# Split data
X_train, X_test, y_train, y_test = train_test_split(samples, labels)

# Initialize Classifier
clf = KNeighborsClassifier()

# Train 
clf.fit(X_train, y_train) 

# Predict
y_pred = clf.predict(X_test)

# Statistics

# First, lets get a stat report about the precision and recall:
report = classification_report(y_test, y_pred)
print(report)

# Let's also get the ROC AUC score while we're here, which requires a probability instead of just the prediction
y_pred_proba = clf.predict_proba(X_test)
# predict_proba gives us a probability estimate of each class, while roc_auc just cares about the "positive" class
y_pred_proba_pos = [sublist[1] for sublist in y_pred_proba]

                         precision    recall  f1-score   support

                   Kali       0.66      0.75      0.70       179
                    MAC       0.51      0.69      0.58      1660
                  Other       0.93      0.97      0.95     13996
       Ubuntu 14.4, 32B       0.33      0.47      0.38      3797
       Ubuntu 14.4, 64B       0.25      0.26      0.25      2361
       Ubuntu 16.4, 32B       0.28      0.25      0.26      2455
       Ubuntu 16.4, 64B       0.36      0.34      0.35      4328
Ubuntu server 12 Public       0.76      0.68      0.72       255
   Web server 16 Public       0.75      0.73      0.74      1015
            Win 10, 64B       0.30      0.30      0.30      3273
        Win 10, pro 32B       0.28      0.22      0.24      2599
         Win 7 Pro, 64B       0.36      0.35      0.35      4326
           Win 8.1, 64B       0.45      0.38      0.42      4886
         Win Vista, 64B       0.38      0.24      0.29      2598

               accuracy

In [26]:
roc = roc_auc_score(y_test, y_pred_proba, multi_class="ovo")
print('ROC AUC Score: {0}'.format(roc))

ROC AUC Score: 0.814932109085949


## Understanding the model

We use the classifiers' built-in `feature_importances_` field to check which features are most important in OS detection. We see that Bakcward Average Bulk Rate stands out as the most important feature in predicting operating system, with Backward PSH Flags, and Backward Packet Length also contributing.

In [9]:
# Get Raw feature importances
feature_importances = clf.feature_importances_
# Match the feature names we know with the importances
named_importances = []
for column_name, importance in zip(friday.columns, feature_importances):
    named_importances.append((column_name, importance))
# Sort the named feature importances
sorted_feature_importances = sorted(named_importances, key=lambda tup: tup[1], reverse=True)
# Now lets print the top 20 important features (bits)
print(*sorted_feature_importances[0:20], sep='\n') 

('Bwd Avg Bulk Rate', 0.06587550618756476)
(' Bwd PSH Flags', 0.03456268321739852)
(' Bwd Packet Length Std', 0.03388788337733025)
('Bwd Packet Length Max', 0.03321528965970906)
(' Source Port', 0.032348214504896076)
(' Bwd Packet Length Min', 0.03103152576909429)
(' Fwd Packet Length Std', 0.027906934094365753)
(' Fwd URG Flags', 0.027673210505167998)
('Flow Bytes/s', 0.026854952635362212)
(' Bwd Packet Length Mean', 0.023765668338893303)
(' Destination Port', 0.02040653695907125)
(' Flow IAT Max', 0.020325223913026234)
(' Flow Packets/s', 0.01982427522386787)
(' Bwd Avg Bytes/Bulk', 0.019470949545454263)
(' Flow IAT Mean', 0.01926322793584967)
(' Subflow Bwd Packets', 0.01912892502631372)
(' Flow IAT Min', 0.01886823846283587)
('Fwd PSH Flags', 0.0180025710790003)
(' Packet Length Variance', 0.01773255467280275)
(' ACK Flag Count', 0.017037203867046335)


## Conclusion

To c