## Train a model and predict device activities

This file walks you through training a model to detect the activities of a given device. An activity is any action a device allows its users to perform, and each activity should include at least three repeated experiments so that the model learns from representative data.

**Before you go ahead, download the required pcap files** from [Google Drive > iot-model.tgz](https://drive.google.com/open?id=1lMqZ5qx6ATqIIiLOdTYcSm6RliK1F7vA) (size = ~127MB), and decompress it to the current folder. You should expect the file structure to be `traffic/us/yi-camera/{activity_name}/{datetime}.{length}.pcap`.
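As a quick sanity check on the layout above, the following is a minimal sketch that groups capture paths by device and activity and flags any activity with fewer than three experiments. The helper names (`parse_capture_path`, `activities_below_minimum`) and the sample file names are hypothetical; only the directory pattern and the `android_lan_watch` activity name come from this file.

```python
from collections import Counter
from pathlib import PurePosixPath


def parse_capture_path(path):
    """Split .../{device}/{activity_name}/{file}.pcap into its fields."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or not parts[-1].endswith(".pcap"):
        raise ValueError(f"unexpected capture path: {path}")
    # The last three components are device, activity and file name.
    return {"device": parts[-3], "activity": parts[-2], "filename": parts[-1]}


def activities_below_minimum(paths, minimum=3):
    """Return (device, activity) pairs with fewer than `minimum` captures."""
    counts = Counter(
        (info["device"], info["activity"])
        for info in map(parse_capture_path, paths)
    )
    return sorted(key for key, count in counts.items() if count < minimum)


# Hypothetical file names, used only to illustrate the directory pattern.
sample_paths = [
    "traffic/us/yi-camera/android_lan_watch/run-a.30.pcap",
    "traffic/us/yi-camera/android_lan_watch/run-b.30.pcap",
    "traffic/us/yi-camera/android_lan_watch/run-c.30.pcap",
    "traffic/us/yi-camera/example_activity/run-a.30.pcap",
]
short_activities = activities_below_minimum(sample_paths)
print(short_activities)
```

To run the same check on the decompressed archive, the path list could be built with `pathlib.Path("traffic").glob("*/*/*/*.pcap")`, assuming the archive unpacks exactly as described.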

**IMPORTANT** Make sure to use `python3` and install all the dependencies. 
- `pip install -r requirements.txt`


#### Extract pcap files to per-flow level info 
(output too long, hidden from this file)

In [None]:
!./raw2intermediate.sh list_exp.txt tagged-intermediate/us

#### Parse per-flow info to features per-activity
(output too long, hidden from this file)

In [None]:
!python extract_tbp_features.py tagged-intermediate/us/ features/us/

#### Train the model using the features
(Re-running the command below skips models that are already trained; delete the `.model` and `.label.txt` files to re-train.)

In [3]:
!python3 train_rf_models.py features/us/ tagged-models/us/


Training data and creating model...
Running train_rf_models.py...
mkdir: created directory 'tagged-models/us'
mkdir: created directory 'tagged-models/us//output'
Scanning features/us//yi-camera.csv
  Data points: 2490 
	Variable: spanOfGroup          Importance: 0.409
	Variable: meanTBP              Importance: 0.063
	Variable: q60                  Importance: 0.053
	Variable: q80                  Importance: 0.049
	Variable: q90                  Importance: 0.048
	Variable: kurtosisLength       Importance: 0.047
	Variable: meanBytes            Importance: 0.044
	Variable: q70                  Importance: 0.044
	Variable: medAbsDev            Importance: 0.04
	Variable: q40                  Importance: 0.036
	Variable: skewLength           Importance: 0.033
	Variable: medianTBP            Importance: 0.027
	Variable: varTBP               Importance: 0.025
	Variable: q50                  Importance: 0.024
	Variable: skewTBP              Importance: 0.02
	Variable: kurtosisTBP          
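
The feature names in this log (`meanTBP`, `medianTBP`, `varTBP`) look like summary statistics over packet timing. As a rough illustration only, and assuming that TBP stands for the time between consecutive packets (this file does not confirm that), such statistics could be computed as below. The function name `tbp_summary`, the use of population variance, and the sample timestamps are all assumptions; this is not the code in `extract_tbp_features.py`.

```python
import statistics


def tbp_summary(timestamps):
    """Summarize the gaps (in seconds) between consecutive packet timestamps."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two packets to compute gaps")
    return {
        "mean_gap": statistics.mean(gaps),
        "median_gap": statistics.median(gaps),
        # Population variance is an arbitrary choice made for this sketch.
        "var_gap": statistics.pvariance(gaps),
    }


# Hypothetical packet arrival times in seconds.
summary = tbp_summary([0.0, 0.1, 0.3, 0.6, 1.0])
print(summary)
```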

#### Predict activities given a pcap file

`Usage: ./predict.sh device_name path-to-pcap result-file modeldir
    Note that a temporary file /tmp/{md5}.txt will be created during the process
    Requires python3`

In [4]:
!python3 -W ignore predict.py yi-camera sample-yi-camera-recording.pcap sample-result.csv tagged-models/us/


Predicting amout of inferable device activity from pcap file...
Running predict.py...
mkdir: created directory 'user-intermediates/'
Model: tagged-models/us//yi-camera.model
Total packets: 1621
Number of slices: 2
Results:
             ts        ts_end  ts_delta  num_pkt              state
0  1.556329e+09  1.556329e+09  0.000019     1620  android_lan_watch
Results saved to sample-result.csv


In [5]:
!cat sample-result.csv

ts,ts_end,ts_delta,num_pkt,state,device
1556329377.198794,1556329407.828307,1.9e-05,1620,android_lan_watch,yi-camera


Explanation: between epoch times 1556329377.198794 and 1556329407.828307, the network traffic from yi-camera was predicted to be the activity `android_lan_watch`, that is, using the Android companion app to watch video from the camera while both devices are connected to the same Wi-Fi network.
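
To make the epoch timestamps easier to read, here is a minimal sketch that converts each row of the result CSV to UTC times and a duration in seconds. It embeds the sample row shown above so that it runs on its own; the function name `describe_predictions` is hypothetical, and reading `sample-result.csv` from disk instead would be a small change.

```python
import csv
import io
from datetime import datetime, timezone

# The header and row printed by `cat sample-result.csv` above.
SAMPLE_RESULT = """ts,ts_end,ts_delta,num_pkt,state,device
1556329377.198794,1556329407.828307,1.9e-05,1620,android_lan_watch,yi-camera
"""


def describe_predictions(csv_text):
    """Convert epoch columns to UTC ISO strings and add a duration in seconds."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.fromtimestamp(float(row["ts"]), tz=timezone.utc)
        end = datetime.fromtimestamp(float(row["ts_end"]), tz=timezone.utc)
        rows.append({
            "device": row["device"],
            "state": row["state"],
            "start_utc": start.isoformat(),
            "end_utc": end.isoformat(),
            "duration_s": (end - start).total_seconds(),
        })
    return rows


predictions = describe_predictions(SAMPLE_RESULT)
for entry in predictions:
    print(entry)
```

For the sample row, the predicted `android_lan_watch` window spans roughly 30.6 seconds.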