# Goal 3: Investigate the overall correlation between quantitative and qualitative metrics with a classification-based approach.

**3.1** 
- Which metrics best classify an experiment correctly? Use the external labels [Good, Mid, Bad].
- Train a classifier to predict the assigned quality label of each experiment, using both QM and HM features.

**3.2**
- Use HM survey data as labels: an overall label [0, 1, 2] for each experiment can be derived from the human ratings by clustering or by a weighted average.
- Train a classifier with x = QM and y = HM survey metrics.
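
The clustering route mentioned in 3.2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the HM scores are synthetic, the helper name `cluster_labels` is hypothetical, and k-means is one possible clustering choice rather than the method used in this notebook. Clusters are re-indexed by their mean score so that 0, 1, 2 stay ordered from worst to best.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_labels(hm_scores, n_clusters=3, seed=0):
    """Cluster experiments on their HM scores and return ordinal labels.

    Hypothetical helper: label 0 is the cluster with the lowest mean score,
    label n_clusters - 1 the cluster with the highest.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    raw = km.fit_predict(hm_scores)
    # Rank clusters by the mean of their centroid across the HM dimensions.
    order = np.argsort(km.cluster_centers_.mean(axis=1))
    remap = np.empty(n_clusters, dtype=int)
    remap[order] = np.arange(n_clusters)
    return remap[raw]


# Synthetic stand-in for a (24, 4) array of HM scores, roughly on a 1-5 scale.
rng = np.random.default_rng(0)
levels = np.repeat([1.5, 3.0, 4.5], 8)  # 8 low, 8 medium, 8 high rows
hm_scores = levels[:, None] + rng.normal(0, 0.2, size=(24, 4))
labels = cluster_labels(hm_scores)
print(labels)
```

Unlike fixed thresholds on a weighted average, this adapts the class boundaries to the data, at the cost of labels that can shift when experiments are added.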

**Ablation (both cases)**: try all combinations of QM metrics to find the one that gives the best classification accuracy.
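
A minimal sketch of such an exhaustive ablation is given below, assuming a cross-validated random forest as the scoring model. The function name `best_feature_subset` and the synthetic data are illustrative assumptions, not part of the repository.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def best_feature_subset(X, y, max_size=None, cv_splits=3, seed=42):
    """Score every non-empty feature subset (up to max_size) by CV accuracy."""
    n_features = X.shape[1]
    max_size = n_features if max_size is None else max_size
    cv = StratifiedKFold(n_splits=cv_splits, shuffle=True, random_state=seed)
    best_score, best_subset = -np.inf, None
    for k in range(1, max_size + 1):
        for subset in combinations(range(n_features), k):
            clf = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=seed)
            score = cross_val_score(clf, X[:, list(subset)], y, cv=cv).mean()
            if score > best_score:  # ties keep the earlier, smaller subset
                best_score, best_subset = score, subset
    return best_subset, best_score


# Synthetic example: 24 runs, 4 features, only feature 2 carries the label.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 8)
X_demo = rng.normal(size=(24, 4))
X_demo[:, 2] = y_demo + rng.normal(0, 0.05, size=24)

best_subset, best_score = best_feature_subset(X_demo, y_demo)
print(best_subset, best_score)
```

With the 11 QM metrics this loop visits 2047 subsets. On 24 samples, the subset with the highest cross-validated accuracy is likely to be optimistic, so the winning score is better read as an upper bound than as an expected accuracy.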

**Experiment Scenarios**:
- "Passing"
- "Overtaking"
- "Crossing 1"
- "Crossing 2"
- "Advanced 1"
- "Advanced 2"
- "Advanced 3"
- "Advanced 4"

**Labels**:
- "Good"
- "Mid"
- "Bad"

**QM Metrics**
- [0] Time to Goal
- [1] Path length
- [2] Cumulative heading changes
- [3] Avg robot linear speed

- [4] Social Work 
- [5] Social Work (per second)
- [6] Average minimum distance to closest person
- [7] Proxemics: intimate space occupancy
- [8] Proxemics: personal space occupancy
- [9] Proxemics: social space occupancy
- [10] Proxemics: public space occupancy

**HM Metrics**
- [0] Unobtrusiveness
- [1] Friendliness
- [2] Smoothness
- [3] Avoidance Foresight

In [9]:
import yaml
import numpy as np
import os
from os.path import expanduser

In [10]:
home = expanduser("~")
# Load config params for experiments
config = yaml.safe_load(open('params.yaml'))['social_metrics_match']

lab_data_path = home + config['data']['repo_dir'] + config['data']['lab_data_path']
survey_data_path = home + config['data']['repo_dir'] + config['data']['survey_data_path']
results_dir = home + config['data']['results_path']
print("lab data path: ", lab_data_path)
print("survey data path: ", survey_data_path)
print("results dir path: ", results_dir)

lab data path:  /root/Social-Nav-Metrics-Matching/social_metrics_match/data_folder/validation_of_metrics_quantitative_and_lab_qualitative.ods
survey data path:  /root/Social-Nav-Metrics-Matching/social_metrics_match/data_folder/qualitative_metrics_survey.xlsx
results dir path:  /root/social_metrics_results


# Extract LAB data arrays

In [11]:
from utils.data_organization import organize_dict_lab_data, get_all_lab_data_arr, np_extract_exp_lab, np_single_lab_run
from utils.data_organization import organize_dict_survey, weighted_avg_survey_data, get_robotics_knowledge, datacube_qual_survey_data

In [12]:
dict_lab_data = organize_dict_lab_data(lab_data_path)

# Extract the np arrays of a single experiment run, identified by its keys (scenario and label)
passing_good_QM_array, passing_good_HM_array = np_single_lab_run(dict_lab_data, experiment='Passing', label='Good')
print(f"Passing single run QM shape:{passing_good_QM_array.shape}, passing single run HM shape: {passing_good_HM_array.shape}")
print(f"Passing good QM: {passing_good_QM_array},\nPassing good HM: {passing_good_HM_array}") 

# Extract the np arrays of a lab scenario (all the 3 runs with different labels), dividing QM and HM
passing_QM_array, passing_HM_array = np_extract_exp_lab(dict_lab_data, experiment='Passing', order=True, normalize=True, normalization="rescale")
print(f"passing QM shape:{passing_QM_array.shape}, passing HM shape: {passing_HM_array.shape}")
# print(f"passing QM: {passing_QM_array},\npassing HM: {passing_HM_array}")

# Starting from the complete lab data, extract the np arrays of all lab scenarios, dividing QM and HM
all_lab_QM_array, all_lab_HM_array = get_all_lab_data_arr(dict_lab_data, order=True, normalize=True, normalization="rescale")
print(f"All lab QM array: {all_lab_QM_array.shape}, All lab HM array: {all_lab_HM_array.shape}")
# print(f"All lab QM array: {all_lab_QM_array}, All lab HM array: {all_lab_HM_array}")

Passing single run QM shape:(11,), passing single run HM shape: (4,)
Passing good QM: [1.02326076e+01 4.55598068e+00 4.04913597e+00 1.99936767e-01
 1.99432749e+03 1.81716719e+02 2.37741413e+00 9.28961749e+00
 1.23341140e+01 7.28337237e+01 5.54254489e+00],
Passing good HM: [0.8 0.8 0.8 1. ]
passing QM shape:(11, 3), passing HM shape: (4, 3)
All lab QM array: (24, 11), All lab HM array: (24, 4)


**SURVEY DATA**

In [13]:
dict_survey_data = organize_dict_survey(survey_data_path)
robot_knowledge_array = get_robotics_knowledge(survey_data_path)

# To extract np arrays of all the survey data
survey_datacube = datacube_qual_survey_data(dict_survey_data, normalize=True)

# Directly extract the average and std; set w_avg=True for a weighted average (robotics background knowledge as weights)
weighted_survey_array_avg, weighted_survey_array_std = weighted_avg_survey_data(dict_survey_data, robot_knowledge_array, w_avg=True, normalize=False)
print(f"survey weighted avg shape: {weighted_survey_array_avg.shape},\nsurvey weighted std shape:  {weighted_survey_array_std.shape}") 

survey weighted avg shape: (24, 4),
survey weighted std shape:  (24, 4)
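
For readers without access to `utils.data_organization`, a knowledge-weighted mean and standard deviation can be sketched as follows. This is not the repository's `weighted_avg_survey_data`; the function name `weighted_mean_std`, the `(respondents, experiments, metrics)` array layout, and the NaN handling are assumptions made for illustration.

```python
import numpy as np


def weighted_mean_std(scores, weights):
    """Weighted mean and standard deviation over respondents, ignoring NaNs.

    scores:  (n_respondents, n_experiments, n_metrics), NaN for missing answers
    weights: (n_respondents,), e.g. self-reported robotics knowledge
    """
    w = np.asarray(weights, dtype=float)[:, None, None]
    valid = ~np.isnan(scores)
    w = np.where(valid, w, 0.0)            # missing answers get zero weight
    filled = np.where(valid, scores, 0.0)  # placeholder value, weighted by 0
    w_sum = w.sum(axis=0)
    mean = (w * filled).sum(axis=0) / w_sum
    var = (w * (filled - mean) ** 2).sum(axis=0) / w_sum
    return mean, np.sqrt(var)


# Three respondents, two experiments, one metric; one answer is missing.
demo_scores = np.array([[[1.0], [2.0]],
                        [[3.0], [np.nan]],
                        [[5.0], [4.0]]])
demo_weights = [1, 1, 2]
demo_mean, demo_std = weighted_mean_std(demo_scores, demo_weights)
print(demo_mean.ravel(), demo_std.ravel())
```

An experiment for which every answer is missing has a zero weight sum and yields NaN, which is consistent with the `np.nanmean` call used later to aggregate across metrics.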


In [14]:
# Encode the aggregated survey scores as class labels: rounded mean < 3 -> 0, == 3 -> 1, > 3 -> 2
survey_score = np.rint(np.nanmean(weighted_survey_array_avg, axis=1)).astype(int)
survey_score_coded = survey_score.copy()
survey_score_coded[survey_score_coded < 3] = 0
survey_score_coded[survey_score_coded == 3] = 1
survey_score_coded[survey_score_coded > 3] = 2
print(survey_score_coded.shape, survey_score_coded)

(24,) [0 1 2 1 1 2 2 0 2 2 1 0 1 2 2 1 1 0 2 0 2 0 0 0]
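
The same thresholding can be written as a small reusable function with `np.digitize`, which may be easier to adapt if the cut points change. The function name `encode_scores` is hypothetical, and the input values below are made up for illustration.

```python
import numpy as np


def encode_scores(mean_scores):
    """Map mean survey scores to classes: rounded < 3 -> 0, == 3 -> 1, > 3 -> 2."""
    rounded = np.rint(mean_scores).astype(int)
    # With bins [3, 4]: values below 3 fall in bin 0, exactly 3 in bin 1, 4 and above in bin 2.
    return np.digitize(rounded, bins=[3, 4])


demo_means = np.array([1.2, 2.5, 2.6, 3.4, 3.5, 4.8])
demo_codes = encode_scores(demo_means)
print(demo_codes)
```

Note that `np.rint` rounds halves to the nearest even integer, so a mean of 2.5 rounds to 2 (class 0) while 3.5 rounds to 4 (class 2).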


# Classifier study

In [15]:
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,classification_report
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [58]:
# Define the X, y dataset
# X = QM metrics combination
# y = HM survey aggregated scores

X = all_lab_QM_array
y = survey_score_coded

# Train on rows 0-14 and 21-23; hold out rows 15-20 for testing
X_train = np.vstack([X[:15, :], X[-3:, :]])
# Keep y_train 1-D: scikit-learn warns when it is given a column-vector y
y_train = np.concatenate([y[:15], y[-3:]])
print(X_train.shape, y_train.shape)

X_test = X[15:21, :]
y_test = y[15:21]
print(X_test.shape, y_test.shape)

# Decision Tree classifier: placeholder, no model defined yet

# Random Forest classifier
rf_clf = RandomForestClassifier(n_estimators=10, criterion='entropy', max_depth=2, random_state=42)
rf_clf.fit(X_train, y_train)

# Predict the test set results
y_pred = rf_clf.predict(X_test)
print("Y pred: ", y_pred)
print("Y true: ", y_test)

# SVM classifier: placeholder, no model defined yet

# Accuracy on the held-out rows (last expression, so the notebook displays it)
accuracy_score(y_test, y_pred)

(18, 11) (18,)
(6, 11) (6,)
Y pred:  [1 1 2 2 0 2]
Y true:  [1 1 0 2 0 2]

0.8333333333333334
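
With only six held-out runs, a single misclassification moves the accuracy by about 17 percentage points, so the figure above depends heavily on which rows end up in the test set. One possible way to get a less split-dependent estimate is stratified k-fold cross-validation, sketched here for a random forest and for an SVM of the kind the empty placeholder might hold. The data are synthetic stand-ins for `all_lab_QM_array` and `survey_score_coded`, and the SVM settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the lab data: 24 runs, 11 features.
rng = np.random.default_rng(0)
y_demo = np.tile([0, 1, 2], 8)
X_demo = rng.normal(size=(24, 11))
X_demo[:, 0] += 3.0 * y_demo  # one informative feature so the demo is not pure noise

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
models = {
    "random_forest": RandomForestClassifier(
        n_estimators=10, criterion="entropy", max_depth=2, random_state=42
    ),
    # Scaling sits inside the pipeline so it is refit on each training fold.
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=cv)
    for name, model in models.items()
}
for name, scores in cv_scores.items():
    print(f"{name}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})")
```

Replacing `X_demo` and `y_demo` with the real arrays would give per-fold accuracies for the lab data; the numbers printed here say nothing about the actual experiments.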

In [52]:
# Making the Confusion Matrix
print(confusion_matrix(y_test, y_pred))
print("\n")



0.8333333333333334
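
As a self-contained illustration of how the confusion matrix relates to the accuracy, the sketch below recomputes both from the `Y true` and `Y pred` values printed by the random forest cell. The arrays are copied by hand from that printout, so this is a reading aid rather than the notebook's own output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Values copied from the "Y true" / "Y pred" printout of the random forest cell.
y_true_demo = np.array([1, 1, 0, 2, 0, 2])
y_pred_demo = np.array([1, 1, 2, 2, 0, 2])

# Rows are true classes, columns are predicted classes, both ordered 0, 1, 2.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2])
acc = accuracy_score(y_true_demo, y_pred_demo)
print(cm)
print(classification_report(y_true_demo, y_pred_demo, labels=[0, 1, 2], zero_division=0))
print(acc)
```

The only off-diagonal entry is one class-0 run predicted as class 2, which accounts for the 5/6 accuracy.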

Grid search for hyperparameter tuning

In [53]:
n_estimators = [10, 25]
max_depth = [2, 10, 25]
min_samples_leaf = [2]
criterion = ['gini', 'entropy']
bootstrap = [True, False]

param_grid = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
    "min_samples_leaf": min_samples_leaf,
    "criterion": criterion,
    "bootstrap": bootstrap,
}

# The targets are class labels, so the grid search wraps a classifier
rf = RandomForestClassifier(random_state=42)

# X_train / y_train are the split defined in the classifier cell; np.ravel keeps y 1-D
rf_model = GridSearchCV(estimator=rf, param_grid=param_grid, cv=2, verbose=10, n_jobs=-1)
rf_model.fit(X_train, np.ravel(y_train))

print("Using hyperparameters --> \n", rf_model.best_params_)