## Train a Model and Predict Device Activities

This notebook walks through training a model to detect the activities of a given device. An activity is any action a device allows its users to perform, and each activity should include at least three repeated experiments so that the model learns from representative data.

**Before you run this notebook, download the required pcap files.** Request the dataset at https://moniotrlab.ccis.neu.edu/imc19/. Once access has been granted, download the `iot-model.tgz` archive and decompress it into the current folder. The resulting file structure should be `traffic/us/yi-camera/{activity_name}/{datetime}.{length}.pcap`.
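Before running the pipeline, it can help to confirm that the archive was unpacked into the expected layout and that every activity has at least three experiments. The following is a minimal sketch, not part of the repository: the helper names `count_experiments` and `underpopulated` are hypothetical, and it assumes that each `.pcap` file sits directly inside its activity folder.

```python
from collections import Counter
from pathlib import Path


def count_experiments(device_dir):
    """Count .pcap files per activity folder under a device folder."""
    counts = Counter()
    # Expected layout: {device_dir}/{activity_name}/{datetime}.{length}.pcap
    for pcap in Path(device_dir).glob("*/*.pcap"):
        counts[pcap.parent.name] += 1
    return dict(counts)


def underpopulated(counts, minimum=3):
    """Return the activities that have fewer than `minimum` experiments."""
    return sorted(name for name, n in counts.items() if n < minimum)


if __name__ == "__main__":
    counts = count_experiments("traffic/us/yi-camera")
    for activity, n in sorted(counts.items()):
        print(f"{activity}: {n} experiment(s)")
    print("Fewer than three experiments:", underpopulated(counts))
```

Activity folders that contain no `.pcap` files at all do not appear in the counts, so an empty result usually means the archive was extracted to a different location.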

**IMPORTANT:** Use `python3` and install all the dependencies:
- `pip install -r requirements.txt`


#### Extract pcap files to per-flow level info
Output has been truncated because of length.

In [1]:
!./raw2intermediate.sh exp_list.txt tagged-intermediate/us

Running ./raw2intermediate.sh...
Input files located in: exp_list.txt
Output files placed in: tagged-intermediate/us
Decoding traffic/us/yi-camera/power/2019-04-25_19:28:58.154s.pcap into tagged-intermediate/us/yi-camera/power/2019-04-25_19:28:58.154s.txt
5	1556235010.189201000	6.621162000	eth:ethertype:ip:udp:bootp	322	b0:d5:9d:b9:f0:b4	ff:ff:ff:ff:ff:ff	0.0.0.0	255.255.255.255				68	67
6	1556235010.189291000	0.000090000	eth:ethertype:ip:icmp:data	62	22:ef:03:1a:97:b9	b0:d5:9d:b9:f0:b4	192.168.10.254	192.168.10.204					
7	1556235011.190407000	1.001116000	eth:ethertype:ip:udp:bootp	346	22:ef:03:1a:97:b9	b0:d5:9d:b9:f0:b4	192.168.10.254	192.168.10.204				67	68
Line count: 152 tagged-intermediate/us/yi-camera/power/2019-04-25_19:28:58.154s.txt

Decoding traffic/us/yi-camera/power/2019-04-25_19:25:30.155s.pcap into tagged-intermediate/us/yi-camera/power/2019-04-25_19:25:30.155s.txt
5	1556234778.510610000	1.074978000	eth:ethertype:ip:udp:bootp	322	b0:d5:9d:b9:f0:b4	ff:ff:ff:ff:ff:ff	0.0.0.

#### Parse per-flow info to features per-activity
Output has been truncated because of length.

In [2]:
!python extract_features.py tagged-intermediate/us/ features/us/

Running extract_features.py...
Input files located in: tagged-intermediate/us/
Output files placed in: features/us/
mkdir: created directory 'features'
mkdir: created directory 'features/us'
mkdir: created directory 'features/us//caches'
Feature files to be generated from following devices: yi-camera
Total packets: 160
    Saved to features/us//caches/yi-camera_power_2019-04-25_19:25:30.155s.csv
Total packets: 162
    Saved to features/us//caches/yi-camera_power_2019-04-25_19:21:40.166s.csv
Total packets: 152
    Saved to features/us//caches/yi-camera_power_2019-04-25_19:28:58.154s.csv
Total packets: 1557
    Saved to features/us//caches/yi-camera_android_wan_photo_2019-04-27_22:08:01.37s.csv
Total packets: 1573
    Saved to features/us//caches/yi-camera_android_wan_photo_2019-04-27_21:44:25.36s.csv
Total packets: 1715
    Saved to features/us//caches/yi-camera_android_wan_photo_2019-04-27_22:29:48.37s.csv


#### Train the model(s) using the features
Rerunning the command below skips model training if trained models already exist. To retrain, delete the `.model` and `.label.txt` files in `tagged-models/us/`.
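The cleanup before a retrain can be scripted. The sketch below is an illustration under assumptions rather than a tool shipped with the repository: the function names are hypothetical, and it searches recursively because the training log places the files in per-algorithm subfolders such as `tagged-models/us//kmeans/`. It defaults to a dry run so that nothing is deleted until the list has been reviewed.

```python
from pathlib import Path


def find_trained_artifacts(model_root):
    """List the .model and .label.txt files anywhere under model_root."""
    root = Path(model_root)
    found = list(root.rglob("*.model")) + list(root.rglob("*.label.txt"))
    return sorted(found)


def clear_trained_artifacts(model_root, dry_run=True):
    """Delete the trained artifacts; with dry_run=True only report them."""
    artifacts = find_trained_artifacts(model_root)
    for path in artifacts:
        if not dry_run:
            path.unlink()
    return artifacts


if __name__ == "__main__":
    # Review the list first, then call again with dry_run=False to delete.
    for path in clear_trained_artifacts("tagged-models/us", dry_run=True):
        print("would delete", path)
```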

In [3]:
!python eval_models.py -f features/us/ -m tagged-models/us/ -knr

Running eval_models.py...
Reading command line arguments...
Performing error checking on command line arguments...
Input files located in: features/us/
Output files placed in: tagged-models/us/
mkdir: created directory 'tagged-models'
mkdir: created directory 'tagged-models/us'
mkdir: created directory 'tagged-models/us//output'
root_feature: features/us/
root_model: tagged-models/us/
root_output: tagged-models/us//output
yi-camera.csv
Training yi-camera using algorithm(s): ['kmeans', 'knn', 'rf']
	#Total data points: 2490 
  return self.partial_fit(X, y)
  return self.fit(X, **fit_params).transform(X)
Train: 1743
Test: 747
  kmeans: n_clusters=8
	Time to perform tSNE: 104.03s
	Saved the tSNE plot to tagged-models/us//kmeans/kmeans-yi-camera.png
    model -> tagged-models/us//kmeans/yi-camerakmeans.model
    labels -> tagged-models/us//kmeans/yi-camera.label.txt
	android_lan_photo
	android_lan_recording
	android_lan_watch
	android_wan_photo
	android_wan_recording
	android_wan_watch
	lo

#### Predict activities given a pcap file

In [4]:
!python predict.py yi_camera_sample.pcap tagged-models/us/ yi-camera knn sample.csv

Running predict.py...
Input pcap: yi_camera_sample.pcap
Input model directory: tagged-models/us//knn
Device name: yi-camera
Model name: knn
Output CSV: sample.csv
mkdir: created directory 'user-intermediates/'
yi-cameraknn.model
Model: tagged-models/us//knn/yi-cameraknn.model
Total packets: 1621
Number of slices: 2
  unknown_data = ss.transform(unknown_data)
Results:
             ts        ts_end  ts_delta  num_pkt              state
0  1.556329e+09  1.556329e+09  0.000019     1620  android_lan_photo
Results saved to sample.csv


In [5]:
!cat sample.csv

ts,ts_end,ts_delta,num_pkt,state,device
1556329377.198794,1556329407.828307,1.9e-05,1620,android_lan_photo,yi-camera


Explanation: Between epoch times 1556329377.198794 and 1556329407.828307, the network traffic from yi-camera was predicted to be the activity android_lan_photo, which corresponds to using the Android companion app to take a photo with the camera while both devices are connected to the same Wi-Fi network.
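To read the result without converting epoch values by hand, the CSV row can be turned into timestamps and a duration. This is a minimal sketch that assumes `ts` and `ts_end` are Unix epoch seconds and displays them in UTC; `describe_predictions` is a hypothetical helper, and the CSV text is copied from the output above so that the example runs without the file.

```python
import csv
import io
from datetime import datetime, timezone

# Content of sample.csv as printed above. With the real file, pass
# open("sample.csv", newline="") instead of io.StringIO(SAMPLE_CSV).
SAMPLE_CSV = """ts,ts_end,ts_delta,num_pkt,state,device
1556329377.198794,1556329407.828307,1.9e-05,1620,android_lan_photo,yi-camera
"""


def describe_predictions(csv_file):
    """Turn each prediction row into a dict with UTC datetimes and a duration."""
    rows = []
    for row in csv.DictReader(csv_file):
        start = datetime.fromtimestamp(float(row["ts"]), tz=timezone.utc)
        end = datetime.fromtimestamp(float(row["ts_end"]), tz=timezone.utc)
        rows.append({
            "device": row["device"],
            "state": row["state"],
            "start": start,
            "end": end,
            "duration_s": (end - start).total_seconds(),
            "num_pkt": int(row["num_pkt"]),
        })
    return rows


if __name__ == "__main__":
    for p in describe_predictions(io.StringIO(SAMPLE_CSV)):
        print(f"{p['device']}: {p['state']} from {p['start'].isoformat()} "
              f"to {p['end'].isoformat()} "
              f"({p['duration_s']:.1f} s, {p['num_pkt']} packets)")
```

For the sample row, the predicted slice spans roughly 30.6 seconds.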