Non-coding part:

A. The ROC curve is a popular plot that shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR) across all possible thresholds. The overall performance of a classifier, summarized over all possible thresholds, is given by the area under the curve (AUC).
The advantage of this metric is that it considers all possible thresholds, so we can compare two classifiers without having to choose a specific threshold.
While evaluating overall performance is important, in some settings a high AUC is hard to interpret. ROC AUC treats sensitivity and specificity as equally important when averaged across all thresholds, but what if we care more about sensitivity, for example correctly detecting a cancer so that it can be treated? In that situation we would prefer to look at a specific measure rather than the overall performance of the classifier.
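
As a minimal sketch of what the area under the curve summarizes, the snippet below computes AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score, which is equivalent to the area under the ROC curve. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not part of the original analysis.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed): two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75
```

No threshold appears anywhere in the computation, which is why AUC lets us compare classifiers without picking one.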

B. Accuracy, as the name suggests, measures how accurate our classifier is: the fraction of all predictions that are correct. It is a good metric when the dataset is balanced and we do not need to be more accurate on one class than on the other. On the other hand, when the dataset is unbalanced, for example 95 observations of class 0 and 5 observations of class 1, accuracy is misleading.
If we predicted every observation to be class 0, we would get 95% accuracy and might conclude that we have a good classifier, even though we did not correctly predict a single observation of the other class, which is probably the more interesting one to predict.
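
The 95/5 example can be checked with a few lines of plain Python. This is only a sketch; the variable names are chosen for illustration.

```python
# Toy data from the example: 95 observations of class 0 and 5 of class 1.
y_true = [0] * 95 + [1] * 5
# A trivial "classifier" that always predicts class 0.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall of class 1: share of actual class-1 observations that were found.
found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall_class_1 = found / sum(1 for t in y_true if t == 1)

print(accuracy)        # 0.95
print(recall_class_1)  # 0.0
```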

C. F1 considers both precision and recall (it is their harmonic mean), so unlike accuracy it remains informative on unbalanced data; for the example above, F1 is 0. Its disadvantage is that it gives equal importance to precision and recall. For example, if what we care about most is that our classifier's positive predictions are truly positive, we would want precision to carry more weight than recall, and F1 might indicate that our classifier performs poorly when it actually performs well for our purpose.
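
One common way to shift the weight toward precision is the F-beta score with beta below 1. The sketch below computes it from confusion-matrix counts; the function name `f_beta` and the second set of counts are hypothetical and serve only to illustrate the point.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta from confusion-matrix counts; beta < 1 favors precision."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: report 0 when TP, FP and FN are all zero.
    return (1 + b2) * tp / denom if denom else 0.0


# All-zeros classifier on the 95/5 example: TP = 0, FP = 0, FN = 5.
f1_all_zeros = f_beta(tp=0, fp=0, fn=5)

# Hypothetical cautious classifier: precision 40/42, recall 40/100.
f1 = f_beta(tp=40, fp=2, fn=60)
f_half = f_beta(tp=40, fp=2, fn=60, beta=0.5)

print(f1_all_zeros)      # 0.0
print(round(f1, 3))      # 0.563
print(round(f_half, 3))  # 0.746
```

With beta = 0.5 the same classifier scores noticeably higher, because its high precision counts for more than its low recall.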

D. Log loss measures the performance of a classification model whose output is a probability between 0 and 1. We aim to make it as small as possible; a perfect model has a log loss of 0. It is used mainly for comparing different models, but its value does not tell us how accurately the model predicts each class.
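
A minimal sketch of binary log loss follows, assuming labels in {0, 1} and predicted probabilities for class 1. The function name, the clipping constant `eps`, and the toy predictions are assumptions made for illustration.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probs are predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


y_true = [1, 0, 1, 0]
# Both models classify every observation correctly at a 0.5 threshold.
confident = log_loss(y_true, [0.9, 0.1, 0.9, 0.1])
hesitant = log_loss(y_true, [0.6, 0.4, 0.6, 0.4])

print(round(confident, 4))  # 0.1054
print(round(hesitant, 4))   # 0.5108
```

The two models have identical accuracy, yet their log loss differs, which illustrates why the value is useful for ranking models but says little about per-class accuracy.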

E. Splitting the data into train and test sets is very important when evaluating a classifier. The test set lets us estimate the error rate of our model on unseen, real-world data.
A large gap between train and test performance, meaning a low error rate on the train set but a high error rate on the test set, may indicate that our model is overfitting and that we should retune it or try a different resampling method.

F. MCC has an advantage over accuracy and F1 in that it takes all four cells of the confusion matrix into account and, unlike F1, does not depend on which class we label as positive. For example, say we have TP = 95, FP = 5, TN = 0, FN = 0. In this situation accuracy = 95% and F1 = 97%, while MCC is undefined because the classifier never predicts the negative class, which alerts us that our model is going in the wrong direction.

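
The numbers in this example can be reproduced with the sketch below. The helper `mcc` is illustrative; returning `None` for the undefined case is a choice made here, not a standard.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, or None when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return None  # a whole row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom


tp, fp, tn, fn = 95, 5, 0, 0
accuracy = (tp + tn) / (tp + fp + tn + fn)
f1 = 2 * tp / (2 * tp + fp + fn)
result = mcc(tp, fp, tn, fn)

print(accuracy)       # 0.95
print(round(f1, 3))   # 0.974
print(result)         # None
```
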
G. Cohen's kappa is a statistic that measures inter-rater agreement, here the agreement between predicted and actual labels corrected for chance. It ranges over [-1, 1], and it is not obvious what value counts as high agreement. Furthermore, it may behave differently from other metrics when the data is unbalanced.
For example, for a confusion matrix with TP = 0, TN = 14, FP = 1, FN = 1 we get an accuracy of 0.875 and a kappa of about -0.067. So it is important to look not only at the value of a metric but also at the counts in each predicted class.
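
The kappa value for this confusion matrix can be verified with a short sketch. The function name `cohen_kappa` is illustrative, and the code does not guard against the degenerate case where chance agreement equals 1.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


tp, fp, tn, fn = 0, 1, 14, 1
accuracy = (tp + tn) / (tp + fp + tn + fn)
kappa = cohen_kappa(tp, fp, tn, fn)

print(accuracy)         # 0.875
print(round(kappa, 3))  # -0.067
```

Observed agreement (0.875) is slightly below the agreement expected by chance (about 0.883), which is why kappa comes out negative despite the high accuracy.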




Read the file that contains the data subjects' information.

    Data Columns:
    0: code [1-24]
    1: weight [kg]
    2: height
    3: age [years]
    4: gender [0:Female, 1:Male]

Returns:
A pandas DataFrame that contains information about the data subjects' attributes

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

root = "C:/Users/elior/PycharmProjects/data_wrangling/"
def get_ds_infos():
    dss = pd.read_csv(root + "data_subjects_info.csv")
    print("[INFO] -- Data subjects' information is imported.")
    return dss

Select the sensors and the mode to shape the final dataset.

Args:
data_types: A list of sensor data types chosen from: [attitude, gravity, rotationRate, userAcceleration]

Returns:
It returns a list of columns to use for creating time-series from files.

In [None]:
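def set_data_types(data_types=("userAcceleration",)):
    # A tuple default avoids sharing one mutable list between calls.
    dt_list = []
    for t in data_types:
        if t != "attitude":
            # gravity, rotationRate and userAcceleration are stored as x/y/z
            dt_list.append([t + ".x", t + ".y", t + ".z"])
        else:
            # attitude is stored as roll/pitch/yaw
            dt_list.append([t + ".roll", t + ".pitch", t + ".yaw"])

    return dt_list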

Args:
    dt_list: A list of column groups that selects the sensor data we want.
    act_labels: list of activities
    trial_codes: list of trials
    mode: It can be "raw", which means you want raw data
    for every dimension of each data type,
    [attitude(roll, pitch, yaw); gravity(x, y, z); rotationRate(x, y, z); userAcceleration(x,y,z)],
    or it can be "mag", which means you only want the magnitude of each data type:
    (x^2+y^2+z^2)^(1/2)
    labeled: True if we want a labeled dataset; False if we only want sensor values.
Returns:
        It returns a time-series of sensor data.
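
To make the "mag" mode concrete, the sketch below collapses each three-axis reading into a single magnitude value using the formula above. The helper name `magnitude` and the sample readings are made up for illustration and are not taken from the dataset.

```python
import math


def magnitude(sample):
    """Euclidean norm (x^2 + y^2 + z^2)^(1/2) of one three-axis reading."""
    return math.sqrt(sum(v * v for v in sample))


# Hypothetical readings, one (x, y, z) tuple per time step.
readings = [(3.0, 4.0, 0.0), (1.0, 2.0, 2.0)]
mags = [magnitude(r) for r in readings]
print(mags)  # [5.0, 3.0]
```

In "mag" mode each sensor therefore contributes one column instead of three.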

In [None]:
def creat_time_series(dt_list, act_labels, trial_codes, mode="mag", labeled=True):

    num_data_cols = len(dt_list) if mode == "mag" else len(dt_list * 3)

    if labeled:
        dataset = np.zeros((0, num_data_cols + 7))  # "7" --> [act, code, weight, height, age, gender, trial]
    else:
        dataset = np.zeros((0, num_data_cols))

    ds_list = get_ds_infos()

    print("[INFO] -- Creating Time-Series")
    for sub_id in ds_list["code"]:
        for act_id, act in enumerate(act_labels):
            for trial in trial_codes[act_id]:
                fname = root + 'A_DeviceMotion_data/' + act + '_' + str(trial) + '/sub_' + str(int(sub_id)) + '.csv'
                raw_data = pd.read_csv(fname)
                raw_data = raw_data.drop(['Unnamed: 0'], axis=1)
                vals = np.zeros((len(raw_data), num_data_cols))
                for x_id, axes in enumerate(dt_list):
                    if mode == "mag":
                        # Euclidean norm of the three axes of this sensor
                        vals[:, x_id] = (raw_data[axes] ** 2).sum(axis=1) ** 0.5
                    else:
                        # Copy the three raw axes into their own columns
                        vals[:, x_id * 3:(x_id + 1) * 3] = raw_data[axes].values
                if labeled:
                    lbls = np.array([[act_id,
                                      sub_id - 1,
                                      ds_list["weight"][sub_id - 1],
                                      ds_list["height"][sub_id - 1],
                                      ds_list["age"][sub_id - 1],
                                      ds_list["gender"][sub_id - 1],

                                      trial
                                      ]] * len(raw_data))
                    vals = np.concatenate((vals, lbls), axis=1)
                dataset = np.append(dataset, vals, axis=0)
    cols = []
    for axes in dt_list:
        if mode == "raw":
            cols += axes
        else:
            cols += [str(axes[0][:-2])]

    if labeled:
        cols += ["act", "id", "weight", "height", "age", "gender", "trial"]

    dataset = pd.DataFrame(data=dataset, columns=cols)
    return dataset

Creating The Dataframe:

In [None]:
ACT_LABELS = ["dws", "ups", "wlk", "jog", "std", "sit"]
TRIAL_CODES = {
    ACT_LABELS[0]: [1, 2, 11],
    ACT_LABELS[1]: [3, 4, 12],
    ACT_LABELS[2]: [7, 8, 15],
    ACT_LABELS[3]: [9, 16],
    ACT_LABELS[4]: [6, 14],
    ACT_LABELS[5]: [5, 13]
}

# Here we set the parameters to build a labeled time-series from the "(A)DeviceMotion_data" dataset
# attitude(roll, pitch, yaw); gravity(x, y, z); rotationRate(x, y, z); userAcceleration(x,y,z)
sdt = ["attitude", "gravity", "rotationRate", "userAcceleration"]
print("[INFO] -- Selected sensor data types: " + str(sdt))
act_labels = ACT_LABELS
print("[INFO] -- Selected activities: " + str(act_labels))
trial_codes = [TRIAL_CODES[act] for act in act_labels]
dt_list = set_data_types(sdt)
dataset = creat_time_series(dt_list, act_labels, trial_codes, mode="mag", labeled=True)
print("[INFO] -- Shape of time-Series dataset:" + str(dataset.shape))

Splitting into train and test sets
Trials 1-9 are used for training and trials 11-16 for testing

In [None]:
# Trials 1-9 go to the training set and trials 11-16 to the test set.
is_test = dataset['trial'] > 10
# Every column except the activity label is used as a feature.
feature_cols = dataset.columns[dataset.columns != 'act']

x_train = dataset.loc[~is_test, feature_cols]
y_train = dataset.loc[~is_test, 'act']
x_test = dataset.loc[is_test, feature_cols]
y_test = dataset.loc[is_test, 'act']


y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)

Using a Long Short-Term Memory (LSTM) neural network to predict the next activity.

In [None]:
np.random.seed(123)  # seeds NumPy only; TensorFlow keeps its own RNG (tf.random.set_seed in TF 2.x)
batch_size = 50000


model = keras.models.Sequential()
model.add(keras.layers.Embedding(1081446, 10))
model.add(keras.layers.LSTM(128, dropout=0.7))
model.add(keras.layers.Dense(6))
model.add(keras.layers.Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train
np.random.seed(123) # for reproducibility
model.fit(x_train, y_train, batch_size=batch_size, epochs=5,verbose=0,
          validation_data=(x_test, y_test))

# Evaluate
train_score, train_acc = model.evaluate(x_train, y_train, batch_size=batch_size)
test_score, test_acc = model.evaluate(x_test, y_test, batch_size=batch_size)

print('Train score:', train_score)
print('Train accuracy:', train_acc)
print('Test score:', test_score)
print('Test accuracy:', test_acc)


Train accuracy: 0.8112712  
Test accuracy: 0.45008284  

It's obvious that my model is overfitting, as shown by the big difference between train accuracy and test accuracy.
I believe that if I had more computational power for retuning the model's parameters, I would be able to get better results. Moreover, if I had more time to research this type of classification problem and the algorithms and methods appropriate for it, I could get better results.