# Multi-Sensor Test
## Angle Threshold

### Experiment Aims：
Test the influence of the threshold of different steer angles on the classification accuracy, and choose a suitable threshold of the steer angle.

### Experiment Design：
For efficiency purposes, this experiment uses Sklearn's Random Forest (n_estimators=10) to determine whether the current angle_threshold is reasonable.  
If the accuracy rate of Random Forest is low, we believe that the current angle_threshold cannot separate left and right well. If the accuracy is high, we can regard the current angle_threshold is reasonable.  
This experiment uses the KFold method of 4 Folds to reduce the influence of randomness in the separation of the training set and the test set on the results.

This experiment tested the effect of angle_threshold of 10, 20, 30...90 on the rationality of splitting the data set.

### Experiment Content：

In [6]:
import pandas as pd
import numpy as np
import os

In [7]:
# Read the data, assign a label to each image data according to the threshold, including go, stop, left, right
# The default speed_threshold=5, angle_threshold=30
# Finally we generate bounding_box data: X, corresponding label: y
import random
def process_data(data, speed_threshold, angle_threshold, sign_threshold, data_size):
    stop, go, left, right = split(data, speed_threshold, angle_threshold, sign_threshold, data_size)

    print("go, stop, left, right")
    print(len(go), len(stop), len(left), len(right) )

    X = np.array(list(go)+list(stop)+list(left)+list(right))
    y = np.array(list(np.ones(len(go)))+list(np.ones(len(stop))*2)+list(np.ones(len(left))*3)+list(np.ones(len(right))*4))
    
    mask = [i for i in range(len(y))]
    random.shuffle(mask)

    X=X[mask]
    y=y[mask]

    X = np.reshape(X, (len(y),21*5))

    print("X:", X.shape)
    print("y:", y.shape)
    return X, y

# Separate all data files into four categories: go, stop, left, and right by threshold.
# The default speed_threshold=5, angle_threshold=30
def split(data, speed_threshold=5, angle_threshold=30, sign_threshold=0.5, data_size=200):
    stop_full  = data[data["vehicle_speed"]<=speed_threshold]

    go = data[data["vehicle_speed"]>speed_threshold]
    go_full =  go[go["steering_angle_calculated"]<=angle_threshold]

    steer = go[go["steering_angle_calculated"]>angle_threshold]
    left_full  = steer[steer["steering_angle_sign"]<=sign_threshold]
    right_full = steer[steer["steering_angle_sign"]>sign_threshold]
    
    go    = get_box(go_full[:data_size])
    stop  = get_box(stop_full[:data_size])
    left  = get_box(left_full[:data_size])
    right = get_box(right_full[:data_size])
    
    return stop, go, left, right

# Take out the bounding boxs, turning angle, speed and other data of all pictures from the data file.
def get_box(fulltsv, padding=0):
    maxBox = 21

    header = [col for col in fulltsv]
    header.remove('box')
    
    x_full = []
    
    label_dict = {'Car': 1,
                 'VanSUV': 2,
                 'Pedestrian': 3,
                 'Trailer': 4,
                 'Bus': 5,
                 'Truck': 6,
                 'Bicycle': 7,
                 'MotorBiker': 8,
                 'Motorcycle': 9,
                 'Animal': 10,
                 'UtilityVehicle': 11,
                 'CaravanTransporter': 12,
                 'EmergencyVehicle': 13,
                 'Cyclist': 14}
    
    for index, row in fulltsv.iterrows():
        x = []
            
        boxs = eval(row['box'])
        for box in boxs[:maxBox]: # 生成x, 添加已有的box，box上限数量是maxBox
            x.append(box['2d_bbox'] + [label_dict[box['class']]])

        for i in range(maxBox - len(boxs)): # 填补空的box
            x.append([padding,padding,padding,padding,padding])
        
        x_full.append(x)
    return np.array(x_full)

In [8]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold

def test(X, y, n_splits=4):
    cm_result = np.zeros((4,4))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=False)
    for train_index, test_index in kf.split(X):
    #     print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        rfc = RandomForestClassifier(n_estimators=10, max_features=4, max_depth=None, min_samples_split=4, bootstrap=True)
        rfc.fit(X_train, y_train)
        y_pred = rfc.predict(X_test)
        cm = confusion_matrix(y_test, y_pred)
        cm_rate = cm/cm.sum(axis=1)
        cm_result += cm_rate
    
    correct = sum(y_pred==y_test)/len(y_test)
    print(f"Test Accuracy: {(100*correct):>0.1f}%")
    return cm_result/n_splits

In [9]:
data = pd.read_csv("full_info.tsv", sep ="\t")

This experiment tried the influence of angle_threshold of 10, 20, 30...90 on the classification results.

The classification accuracy is expressed in the form of a confusion matrix.  
The diagonal line from top left to bottom right corresponds to the accuracy of the four categories. The four types are go, stop, left, and right.  
The data in each row represents the probability that the data which is actually belonged to the row is classified and classified into the corresponding column by the classifier.

In [10]:
for i in range(1,10):
    speed_threshold = 5
    angle_threshold = i*10
    sign_threshold = 0.5
    
    print("angle_threshold", angle_threshold)
    
    data_size = 200
    X, y = process_data(data, speed_threshold, angle_threshold, sign_threshold, data_size)

    n_splits = 4
    print(test(X, y, 4))
    print()


angle_threshold 10
go, stop, left, right
200 200 200 200
X: (800, 105)
y: (800,)
Test Accuracy: 46.0%
[[0.45026857 0.13198997 0.2632601  0.1723356 ]
 [0.06586866 0.68132107 0.14984024 0.11161848]
 [0.20665622 0.18757525 0.36251389 0.26860261]
 [0.20420764 0.17795987 0.23666448 0.39437358]]

angle_threshold 20
go, stop, left, right
200 200 200 200
X: (800, 105)
y: (800,)
Test Accuracy: 57.5%
[[0.52214129 0.11206935 0.18116105 0.19623258]
 [0.08973712 0.68096267 0.1258012  0.11471821]
 [0.19182261 0.14834386 0.44587178 0.22835738]
 [0.15940217 0.1695303  0.19417156 0.48422124]]

angle_threshold 30
go, stop, left, right
200 200 200 200
X: (800, 105)
y: (800,)
Test Accuracy: 46.5%
[[0.5067047  0.12044818 0.18664904 0.19515062]
 [0.07530572 0.6547619  0.15811727 0.12390433]
 [0.18642825 0.14595838 0.3778082  0.30533222]
 [0.16695148 0.18067227 0.28708379 0.38695021]]

angle_threshold 40
go, stop, left, right
200 200 200 200
X: (800, 105)
y: (800,)
Test Accuracy: 50.0%
[[0.49609226 0.1228872

### Experiment Analysis：
The data in the third row and the third column and the data in the fourth row and fourth column of the confusion matrix correspond to the classification accuracy of the left class and the right class, respectively.  
According to the results of different angle_threshold, we can believe that when angle_threshold ∈ [20,70], the classification accuracy of the two types does not change a lot.  
When angle_threshold<20, the accuracy of the classification may deteriorate due to insufficient data discrimination.
When angle_threshold>70, it may be that the total amount of data becomes smaller, and the final classification accuracy becomes worse.

For angle_threshold ∈ [20,70], we can find that the accuracy rate is highest when angle_threshold=50, which is more appropriate angle_threshold

### Expeiment Conclusion：
50 should be a more appropriate angle_threshold. However, it does not improve the final accuracy rate much, and the accureate rate is still lower than 60%.