
computer_vision_research

Predicting trajectories of objects

Table of Contents

  1. Abstract
  2. Darknet
  3. Tracking of detected objects
  4. Predicting trajectories of moving objects
  5. Clustering
  6. Classification
  7. Documentation
  8. Classification Results
  9. Testing for decision tree depth
  10. Classification Results - Features V2
  11. Cross validation
  12. References

TODO: Abstract

Notice: linear regression is implemented; it is very primitive, but working.

YOLOv7 is the most recent version of YOLO. Darknet is no more; the source code of the neural net is in PyTorch (Original-Repository [2]). To make it work with my framework, I read the whole codebase of YOLOv7 and wrote yolov7api.py. The function load_model(device, weights, imgsz, classify) loads the desired YOLO model; if a GPU is used, half precision can be enabled (FP16 instead of FP32). detect(img) takes an OpenCV image as its argument. It can take a lot more arguments, but those are only for parameterization and have tested default values. The image has to be resized to the input size of the neural net. After the model is loaded, we can input the resized image to the neural net. The results are a matrix shaped (number of input images, number of detections, 6). A detection is a vector [x, y, x, y, confidence, class], where the first xy pair is the top-left and the second the bottom-right corner. The raw output of the neural net has to be rescaled to fit the original image. This output is still not suitable for my framework: it has to be converted to a matrix of shape (number of detections, 3) that looks like [label, confidence, (xywh)], where xywh is the center xy coordinates plus the width and height of the bbox.
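
A minimal usage sketch of yolov7api.py based on the description above; the exact signatures may differ in the actual module, and the loop assumes detect() uses the model loaded by load_model():

import cv2 as cv
import yolov7api

# load the model once; device, weights and image size follow the description above
model = yolov7api.load_model(device="cuda:0", weights="yolov7.pt", imgsz=640, classify=False)
frame = cv.imread("example_frame.png")  # illustrative input
# detect() takes an OpenCV image; all other arguments keep their tested defaults
for label, confidence, (x, y, w, h) in yolov7api.detect(frame):
    print(label, confidence, x, y, w, h)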

NOTICE: If PyTorch throws this error: RuntimeError: CUDA out of memory. Tried to allocate X.XX GiB (GPU 0; X.XX GiB total capacity; X.XX MiB already allocated; X.XX GiB free; X.XX GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. — then set the environment variable PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:256". If this does not solve the problem, experiment with other max_split_size_mb values.
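
The variable can also be set from Python, as long as it happens before torch initializes CUDA; a minimal sketch:

import os
# must be set before the first CUDA allocation
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256"
import torch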

Video dataset

https://github.com/City-of-Bellevue/TrafficVideoDataset
https://drive.google.com/drive/folders/1cR1VwoAvEjFLRaUzeYph-bxx4LoM6pOH
https://drive.google.com/drive/folders/1irB6XKu2iM3BSJ2AEYH4kJl9nfG9j-yy
https://drive.google.com/drive/folders/1IN6kwywddO3B3uHyC5S18vqf0KEWToJ_
https://drive.google.com/drive/folders/17bn7l7Qm5s-r5DYoFQPhviFZ0jWY9qk5
https://drive.google.com/drive/folders/16coOR8PlNzvmUm1vsaYJVF_bAOQGySa8

The program was implemented and tested in a Linux environment. Using Anaconda is recommended; it can be downloaded from the Anaconda website.

To be able to run YOLOv7, download the yolov7.pt weights file, then copy or move it to the yolov7 directory.

Conda environment setup

conda env create -n <environment-name-here> --file=computer_vision_environment.yaml

Deep-SORT

Deep-SORT: Simple Online and Realtime Tracking with a convolutional neural network, largely based on a Kalman filter. See the arXiv preprint for more information. [3]

I integrated Deep-SORT into my own code and modified the original Deep-SORT source code to fit my needs.

Determining whether an object is moving

The main problem is to filter out the non-moving objects when running detection in real time. This is needed because of YOLO's inaccuracy: bounding boxes float even on non-moving objects.

Solution: calculating the euclidean distance between the first and the last detection in the tracking history.

self.isMoving = ((self.history[0].X-self.history[-1].X)**2 + (self.history[0].Y-self.history[-1].Y)**2)**(1/2) > 7.0  

Throw away old detections and trackings

This can save read/write time and memory.

A threshold should be used to prevent OOM (Out Of Memory) errors, and also to speed up the runtime of the detection and visualizer programs.

HistoryDepth: a historyDepth variable was implemented that determines how far back in time an object's detection data is kept. With this, old trackings can be thrown away when they are no longer on screen.
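
A minimal sketch of the idea, assuming the history holds Detection objects as described later; a deque with maxlen drops the oldest detections automatically:

from collections import deque

historyDepth = 30  # example value
history = deque(maxlen=historyDepth)  # old detections fall off the front automatically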

Regression (Dead End)

Trajectory prediction with the linear and polynomial regression algorithms was a dead end. This technique did not give accurate predictions of the vehicles' trajectories, only short-term ones, i.e. for the next few frames.

Using Scikit Learn Linear Models

from sklearn import linear_model

model = linear_model.RANSACRegressor(base_estimator=linear_model.LinearRegression(), random_state=30, min_samples=X_train.reshape(-1,1).shape[1]+1)
reg = model.fit(X_train.reshape(-1,1), y_train.reshape(-1,1))
y_pred = reg.predict(X_test.reshape(-1,1))

The best-working linear model is RANSACRegressor() with base_estimator LinearRegression(). RANSACRegressor is good at ignoring outliers.

TODO: this has to be implemented: calculate weights based on the detections' positions.

Scikit-learn's PolynomialFeatures function is used to generate X and Y training points for the estimator.

The PolynomialFeatures transformer and the estimator have to be passed to the make_pipeline function.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

polyModel = make_pipeline(PolynomialFeatures(degree), linear_model.RANSACRegressor(base_estimator=linear_model.Ridge(alpha=0.5), random_state=30, min_samples=X_train.reshape(-1,1).shape[1]+1))
polyModel.fit(X_train.reshape(-1, 1), y_train.reshape(-1, 1))
y_pred = polyModel.predict(X_test.reshape(-1, 1))

TODO: Implement Spline, not working yet.

Regression with coordinate-dependent weights

The Kalman filter calculates velocities; these velocities can be used as weights in the regression.
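
A hedged sketch of the idea: scikit-learn regressors accept a sample_weight argument in fit(), so the Kalman velocities could weight the training points (the variable names and values below are illustrative):

import numpy as np
from sklearn import linear_model

# x_history, y_history, vx_history would come from a track's detection history
x_history = np.array([1.0, 2.1, 3.2, 4.0])
y_history = np.array([1.1, 1.9, 3.0, 4.2])
vx_history = np.array([0.9, 1.1, 1.0, 0.8])

model = linear_model.LinearRegression()
# faster (higher-velocity) detections get larger weights
model.fit(x_history.reshape(-1, 1), y_history, sample_weight=np.abs(vx_history))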

Classification Models

Training classification models, e.g. Support Vector Machine, KNN, and Decision Tree models, seems to give an accurate prediction of where the vehicles will exit the traffic intersection.

Feature extraction, clustering, classification (building a model)

Clustering algorithms: KMEANS, SPECTRAL, DBSCAN, OPTICS

Feature extraction -> Clustering
Clustering algorithm: Affinity Propagation. (NOTICE: this algorithm seems to give nonsense results; other ones have to be tested too.)
K-Means: seems to give better results than Affinity Propagation, but still not the results we want.
To make the predictions smarter, a learning algorithm has to be implemented that trains on the detection and prediction history.
NOTICE: new idea: gather detections whose velocity vectors point in the same direction.
Feature extraction -> Classification
TODO: OPTICS (partially done, still testing)

[x, y] the x and y coordinates of the detection

[x, y, vx, vy] the x, y coordinates and the x, y velocities of the detection

Not all feature vectors are good for us; there are many false positive detections that come from the inaccuracy of YOLO. These false positives can be filtered out based on their euclidean distance, although a threshold value has to be given. Enter and exit point pairs whose distance is below this threshold are not chosen as training data for the clustering algorithm.
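
A sketch of such a filter over TrackedObject instances (the 7.0 threshold mirrors the isMoving check above; the actual threshold is a parameter):

import numpy as np

def filterTracks(trackedObjects: list, threshold: float = 7.0) -> list:
    """Keep only tracks whose enter-exit euclidean distance exceeds the threshold."""
    filtered = []
    for obj in trackedObjects:
        dist = np.hypot(obj.history[0].X - obj.history[-1].X,
                        obj.history[0].Y - obj.history[-1].Y)
        if dist > threshold:
            filtered.append(obj)
    return filtered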

TODO: Parameters

There are several algorithms that can evaluate the results of our clustering. No ground truth is available to us, so only evaluation algorithms that require none are useful.
Scikit-learn has a few of these: Silhouette Coefficient, Calinski-Harabasz Index, Davies-Bouldin Index. See Clustering performance evaluation.

The results of the evaluation algorithms can be displayed with elbow diagrams. There is a Python module for this, implemented for scikit-learn's KMeans algorithm: https://www.scikit-yb.org/en/latest/api/cluster/elbow.html
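
All three scores are one call each in sklearn.metrics; a minimal sketch, assuming featureVectors is the feature matrix built later in this README and a fitted clustering provides the labels:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score

labels = KMeans(n_clusters=4, n_init=10).fit_predict(featureVectors)
print("Silhouette:", silhouette_score(featureVectors, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(featureVectors, labels))
print("Davies-Bouldin:", davies_bouldin_score(featureVectors, labels))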

Silhouette Coefficient

A higher Silhouette Coefficient score relates to a model with better defined clusters. The Silhouette Coefficient is defined for each sample and is composed of two scores:

  • a: The mean distance between a sample and all other points in the same class.
  • b: The mean distance between a sample and all other points in the next nearest cluster.

The Silhouette Coefficient $s$ for a single sample is then given as: $$s = \frac{b - a}{\max(a, b)}$$

  • The score is bounded between -1 for incorrect clustering and +1 for highly dense clustering. Scores around zero indicate overlapping clusters.
  • The score is higher when clusters are dense and well separated, which relates to a standard concept of a cluster.
Drawbacks
  • The Silhouette Coefficient is generally higher for convex clusters than other concepts of clusters.

Calinski-Harabasz Index

Also known as the Variance Ratio Criterion, it can be used to evaluate the model, where a higher Calinski-Harabasz score relates to a model with better defined clusters.
The index is the ratio of the sum of between-cluster dispersion and of within-cluster dispersion for all clusters (where dispersion is defined as the sum of squared distances).

  • The score is higher when clusters are dense and well separated, which relates to a standard concept of a cluster.
  • The score is fast to compute.
Drawbacks
  • The Calinski-Harabasz index is generally higher for convex clusters than other concepts of clusters.
The Math

For a set of data $E$ of size $n_E$ which has been clustered into $k$ clusters, the Calinski-Harabasz score $s$ is defined as the ratio of the between-clusters dispersion mean and the within-cluster dispersion: $$s = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \frac{n_E - k}{k - 1}$$ where $\mathrm{tr}(B_k)$ is the trace of the between-group dispersion matrix and $\mathrm{tr}(W_k)$ is the trace of the within-cluster dispersion matrix defined by: $$W_k = \sum_{q=1}^k \sum_{x \in C_q} (x - c_q) (x - c_q)^T$$ $$B_k = \sum_{q=1}^k n_q (c_q - c_E) (c_q - c_E)^T$$ with $C_q$ the set of points in cluster $q$, $c_q$ the center of cluster $q$, $c_E$ the center of $E$, and $n_q$ the number of points in cluster $q$.


Davies-Bouldin Index

The Davies-Bouldin index can be used to evaluate the model, where a lower Davies-Bouldin index relates to a model with better separation between the clusters.
This index signifies the average ‘similarity’ between clusters, where the similarity is a measure that compares the distance between clusters with the size of the clusters themselves.
Zero is the lowest possible score. Values closer to zero indicate a better partition.

  • The computation of Davies-Bouldin is simpler than that of Silhouette scores.
  • The index is solely based on quantities and features inherent to the dataset as its computation only uses point-wise distances.
Drawbacks
  • The Davies-Bouldin index is generally higher for convex clusters than other concepts of clusters.
  • The usage of centroid distance limits the distance metric to Euclidean space.
The Math

The index is defined as the average similarity between each cluster $C_i$ for $i=1,...,k$ and its most similar one $C_j$. In the context of this index, similarity is defined as a measure $R_{ij}$ that trades off:

  • $s_i$: the average distance between each point of cluster $i$ and the centroid of that cluster, also known as the cluster diameter.
  • $d_{ij}$: the distance between cluster centroids $i$ and $j$.

A simple choice to construct $R_{ij}$ so that it is nonnegative and symmetric is: $$R_{ij} = \frac{s_i + s_j}{d_{ij}}$$ Then the Davies-Bouldin index is defined as: $$DB = \frac{1}{k} \sum_{i=1}^k \max_{i \neq j} R_{ij}$$


Probability Calibration

Models to calibrate and compare: KNN (K-Nearest Neighbours), RNN (Radius Nearest Neighbours), SVM (Support Vector Machines), neural network models, Voting Classifier, Naive Bayes, Gaussian Process Classification (GPC), and Stochastic Gradient Descent (try out log_loss and modified_huber; these loss functions enable multiclass classification as a "one vs. all" classifier, implemented by combining binary classifiers).
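
A minimal calibration sketch with scikit-learn's CalibratedClassifierCV, wrapping one of the models listed above (X_train, y_train, X_test stand for the feature vectors and cluster labels; the hyperparameters are placeholders):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.neighbors import KNeighborsClassifier

base = KNeighborsClassifier(n_neighbors=15)
# sigmoid (Platt) calibration with 5-fold cross validation
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
probabilities = calibrated.predict_proba(X_test)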

Tuning the hyperparameters of an estimator

Create feature vectors for classification. A feature vector could be the start, middle, and end detections.

The KNN classifier only accepts N x 2 dimensional feature vectors, so a feature vector can be created from the euclidean distance of the enter and middle detections as the first feature, and the euclidean distance of the middle and end detections as the second feature.
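
A sketch of that 2D feature construction (an illustrative helper, not necessarily the repository's implementation):

import numpy as np

def makeDistanceFeatures(trackedObjects: list) -> np.ndarray:
    """N x 2 features: [dist(enter, middle), dist(middle, exit)] per track."""
    features = []
    for obj in trackedObjects:
        enter = obj.history[0]
        middle = obj.history[len(obj.history) // 2]
        exit_ = obj.history[-1]
        features.append([np.hypot(middle.X - enter.X, middle.Y - enter.Y),
                         np.hypot(exit_.X - middle.X, exit_.Y - middle.Y)])
    return np.array(features)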

Scikit-FeatureSelection

https://medium.com/analytics-vidhya/save-and-load-your-scikit-learn-models-in-a-minute-21c91a961e9b

For trajectory prediction tasks, several types of neural network architectures can be effective, depending on the specific characteristics of your data and the complexity of the problem. Here are a few neural network types commonly used for trajectory prediction:

- Recurrent Neural Networks (RNNs): RNNs are a natural choice for sequential data like trajectories. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are popular variants of RNNs that are designed to handle long-range dependencies and mitigate the vanishing gradient problem. They can capture temporal patterns and relationships in trajectory data.

- Convolutional Neural Networks (CNNs): While CNNs are commonly associated with image data, they can also be applied to trajectory prediction tasks. You can treat trajectory data as a sequence of spatial positions and use 1D or 2D convolutions to capture spatial patterns and features.

- Transformer-based Models: Transformers, originally designed for natural language processing, have been adapted for sequence-to-sequence tasks. They can model long-range dependencies and have shown promise in trajectory prediction tasks by attending to relevant context in the input sequence.

- Graph Neural Networks (GNNs): If your trajectory data can be represented as a graph (where each point is a node and edges represent relationships), GNNs can be used to capture interactions and dependencies between different trajectory points.

- Hybrid Models: Depending on the characteristics of your data, you might find that a combination of different neural network types works well. For instance, you could use an RNN to capture temporal patterns and a CNN to capture spatial features.

- Encoder-Decoder Architectures: These architectures are often used for sequence-to-sequence tasks. An encoder processes the input trajectory, and a decoder generates the predicted trajectory. Variants like Seq2Seq models, as well as attention mechanisms, can enhance the accuracy of trajectory prediction.

- Variational Autoencoders (VAEs): VAEs can learn a latent representation of trajectory data, which can be useful for generating diverse predictions. They can also handle uncertainty estimation.

- Ensemble Models: Combining predictions from multiple neural networks or models can lead to improved accuracy and robustness in trajectory prediction.

When deciding on a neural network type, consider factors such as the length of trajectories, the presence of noise or missing data, the availability of historical context, and the computational resources available. Experimenting with different architectures and tuning hyperparameters is often necessary to find the best approach for your specific trajectory prediction problem.

VIDEO:0001_1_37min Top picks

|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 0.741259 | 0.636364 | 0.517483 | 0.643357 | 0.622378 | 0.678322 | 0.65035 |
| 2 | 1 | 0.958042 | 0.923077 | 0.874126 | 0.769231 | 0.965035 | 0.832168 |
| 3 | 1 | 0.993007 | 0.972028 | 0.944056 | 0.79021 | 1 | 0.874126 |

Threshold

|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.795733 | 0.628006 | 0.692951 | 0.561528 | 0.561528 | 0.636021 | 0.811645 |
| 1 | 0.834374 | 0.698502 | 0.577715 | 0.606533 | 0.737932 | 0.711402 | 0.774865 |
| 2 | 0.871976 | 0.636233 | 0.933036 | 0.645161 | 0.9303 | 0.756336 | 0.830789 |
| 3 | 0.726378 | 0.496063 | 0.667323 | 0.5 | 0.5 | 0.5 | 0.781004 |

|   | 0 |
|:----|---:|
| KNN | 0.807115 |
| GP | 0.614701 |
| GNB | 0.717756 |
| MLP | 0.578306 |
| SGD | 0.68244 |
| SVM | 0.65094 |
| DT | 0.799576 |

VIDEO:0001_2_608min Top picks

|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 0.778547 | 0.622837 | 0.702422 | 0.558824 | 0.638408 | 0.743945 | 0.66955 |
| 2 | 0.910035 | 0.82526 | 0.818339 | 0.610727 | 0.756055 | 0.875433 | 0.737024 |
| 3 | 0.944637 | 0.901384 | 0.866782 | 0.709343 | 0.802768 | 0.934256 | 0.747405 |

Threshold

|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.5 | 0.5 | 0.890451 | 0.5 | 0.498227 | 0.641971 | 0.745567 |
| 1 | 0.628427 | 0.5 | 0.920154 | 0.5 | 0.499118 | 0.540164 | 0.631954 |
| 2 | 0.53125 | 0.53125 | 0.56528 | 0.5 | 0.53125 | 0.591971 | 0.578625 |
| 3 | 0.907442 | 0.830566 | 0.932503 | 0.5 | 0.816677 | 0.803711 | 0.880586 |
| 4 | 0.57484 | 0.5 | 0.595051 | 0.5 | 0.529386 | 0.573923 | 0.73097 |
| 5 | 0.958674 | 0.5 | 0.964476 | 0.5 | 0.5 | 0.861338 | 0.896448 |
| 6 | 0.887581 | 0.499106 | 0.879531 | 0.5 | 0.59453 | 0.704265 | 0.78679 |
| 7 | 0.929265 | 0.870969 | 0.842197 | 0.832051 | 0.857261 | 0.880325 | 0.883913 |
| 8 | 0.897288 | 0.5 | 0.691971 | 0.5 | 0.578192 | 0.871863 | 0.916383 |
| 9 | 0.95991 | 0.881049 | 0.948938 | 0.74675 | 0.962098 | 0.931049 | 0.910875 |
| 10 | 0.891087 | 0.5 | 0.71591 | 0.5 | 0.799416 | 0.80488 | 0.821211 |
| 11 | 0.558824 | 0.496435 | 0.923351 | 0.5 | 0.52852 | 0.586453 | 0.672906 |
| 12 | 0.607143 | 0.517857 | 0.661623 | 0.5 | 0.534805 | 0.535714 | 0.605 |
| 13 | 0.990958 | 0.5 | 0.790054 | 0.5 | 0.498192 | 0.614575 | 0.938192 |
| 14 | 0.7 | 0.5 | 0.786796 | 0.5 | 0.5 | 0.75 | 0.793838 |

|   | 0 |
|:----|---:|
| KNN | 0.768179 |
| GP | 0.575149 |
| GNB | 0.807219 |
| MLP | 0.538587 |
| SGD | 0.615178 |
| SVM | 0.712813 |
| DT | 0.786217 |

VIDEO:0002_2_308min Top picks

|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 0.708791 | 0.43956 | 0.403846 | 0.313187 | 0.222527 | 0.664835 | 0.607143 |
| 2 | 0.884615 | 0.651099 | 0.68956 | 0.409341 | 0.453297 | 0.865385 | 0.692308 |
| 3 | 0.936813 | 0.804945 | 0.763736 | 0.483516 | 0.554945 | 0.934066 | 0.706044 |

Threshold

|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.616132 | 0.5 | 0.597182 | 0.5 | 0.5 | 0.71137 | 0.684645 |
| 1 | 0.656463 | 0.5 | 0.620991 | 0.5 | 0.5 | 0.683188 | 0.754616 |
| 2 | 0.798499 | 0.528792 | 0.815481 | 0.5 | 0.648494 | 0.748512 | 0.842488 |
| 3 | 0.60666 | 0.5 | 0.586603 | 0.5 | 0.5 | 0.494065 | 0.752555 |
| 4 | 0.913961 | 0.633117 | 0.827922 | 0.5 | 0.712662 | 0.709416 | 0.879058 |
| 5 | 0.909314 | 0.5 | 0.889461 | 0.5 | 0.711765 | 0.599755 | 0.857353 |
| 6 | 0.935243 | 0.5 | 0.74564 | 0.5 | 0.952847 | 0.926316 | 0.911274 |
| 7 | 0.837985 | 0.5 | 0.772461 | 0.5 | 0.5 | 0.60925 | 0.763955 |
| 8 | 0.867692 | 0.498462 | 0.703077 | 0.5 | 0.496923 | 0.654359 | 0.766667 |
| 9 | 0.848024 | 0.5 | 0.723708 | 0.5 | 0.49848 | 0.695441 | 0.830699 |
| 10 | 0.879574 | 0.5 | 0.780314 | 0.5 | 0.498534 | 0.5 | 0.876642 |
| 11 | 0.666667 | 0.5 | 0.973011 | 0.5 | 0.497159 | 0.583333 | 0.870739 |
| 12 | 0.71 | 0.5 | 0.904286 | 0.5 | 0.782857 | 0.714286 | 0.885714 |

|   | 0 |
|:----|---:|
| KNN | 0.78817 |
| GP | 0.512336 |
| GNB | 0.764626 |
| MLP | 0.5 |
| SGD | 0.599979 |
| SVM | 0.663792 |
| DT | 0.821262 |

GPU-accelerated pandas and scikit-learn:

cuML, cuDF

  1. Building the main loop of the program to be able to input video sources, using OpenCV VideoCapture. Frames can be read from the VideoCapture object; cv.imshow("FRAME", frame) opens a GUI window showing the actual frame.
    cap = cv.VideoCapture(input)
    # exit if the video can't be opened
    if not cap.isOpened():
        print("Source cannot be opened.")
        exit(0)
    .
    .
    .
    while True:
        ret, frame = cap.read()
        if frame is None:
            break
        cv.imshow("FRAME", frame)
        # 'p' pauses playback, 'r' resumes, 'q' quits
        if cv.waitKey(1) == ord('p'):
            if cv.waitKey(0) == ord('r'):
                continue
        if cv.waitKey(10) == ord('q'):
            break
  2. Implement a YOLO API - hldnapi.py - that works with the C API of Darknet. In this function, the image has to be transformed so that Darknet can run inference on it. cv.cvtColor(image, cv.COLOR_BGR2RGB) converts the OpenCV color order (Blue, Green, Red) to Darknet's (Red, Green, Blue). cv.resize(image_rgb, (darknet_width, darknet_height), interpolation=cv.INTER_LINEAR) resizes the image to Darknet's neural net image size. darknet.detect_image(network, class_names, img_for_detect) runs detection on the preprocessed image. This function returns tuples (label, confidence, bbox[x,y,w,h]); the bounding box coordinates have to be rescaled to the original image.
    def cvimg2detections(image):
        """Fcuntion to make it easy to use darknet with opencv

        Args:
            image (Opencv image): input image to run darknet on

        Returns:
            detections(tuple): detected objects on input image (label, confidence, bbox(x,y,w,h))
        """
        # Convert frame color from BGR to RGB
        image_rgb = cv.cvtColor(image, cv.COLOR_BGR2RGB)
        # Resize image for darknet
        image_resized = cv.resize(image_rgb, (darknet_width, darknet_height), interpolation=cv.INTER_LINEAR)
        # Create darknet image
        img_for_detect = darknet.make_image(darknet_width, darknet_height, 3)
        # Convert cv2 image to darknet image format
        darknet.copy_image_from_bytes(img_for_detect, image_resized.tobytes())
        # Load image into nn and get detections
        detections = darknet.detect_image(network, class_names, img_for_detect)
        darknet.free_image(img_for_detect)
        # Resize bounding boxes for original frame
        detections_adjusted = []
        for label, confidence, bbox in detections:
            bbox_adjusted = convert2original(image, bbox)
            detections_adjusted.append((str(label), confidence, bbox_adjusted))
        return detections_adjusted
  3. Implement classes for storing the detections and object trackings. The classes don't have to be overly complex; they must be easy to read and understand. A Detection() class and a TrackedObject() class were created. The implementation can be found in the historyClass.py file. The Detection class has 7 attributes: label, confidence, X, Y, Width, Height, frameID. The TrackedObject class has: objID, label, futureX, futureY, history, isMoving, time_since_update, max_age, mean, X, Y, VX, VY.

  4. Implement an object tracking algorithm. The base idea was to calculate x and y coordinate distances between detection objects. This is a very primitive way of tracking; it was good for initial testing, but I had to find a more accurate tracking algorithm.

  5. First prediction algorithm with scikit-learn's LinearRegression. The predictLinear() function takes 3 arguments: a trackedObject object from historyClass.py, historyDepth to determine how big the learning set is, and futureDepth to know how far into the future to predict. To do the regression, at least 3 detections must have occurred. With the variable $k$ we can tell the LinearRegression algorithm how many points from the training set to train on. Before running the regression, the movementIsRight() function determines whether the object is moving right or left; this is crucial for generating the prediction points. After the regression is run, the futureX and futureY vectors of the trackedObject can be updated with the predicted values. For the regression I use the simple Ordinary Least Squares (OLS) method. Linear regression formula: $$\hat{y} (w, x) = w_0 + w_1 x_1 + ... + w_p x_p$$ Ordinary Least Squares formula: $$\min_{w} || X w - y||_2^2$$

def movementIsRight(obj: TrackedObject):
    """Returns true, if the object moving right, false otherwise. 

    Args:
        obj (TrackedObject): tracking data of an object 
    
    Return:
        bool: True if obj is moving right.

    """
    return obj.VX > 0 
    
def predictLinear(trackedObject: TrackedObject, k=3, historyDepth=3, futureDepth=30):
    """Fit linear function on detection history of an object, to predict future coordinates.

    Args:
        trackedObject (TrackedObject): The object whose future coordinates should be predicted.
        k (int, optional): Number of training points, ex.: if historyDepth is 30 and k is 3, then the 1st, 15th and 30th points will be training points. Defaults to 3.
        historyDepth (int, optional): Training history length. Defaults to 3.
        futureDepth (int, optional): Prediction vectors length. Defaults to 30.
    """
    x_history = [det.X for det in trackedObject.history]
    y_history = [det.Y for det in trackedObject.history]
    if len(x_history) >= 3 and len(y_history) >= 3:
        # k (int) : number of training points
        # k = len(trackedObject.history) 
        # calculating even slices to pick k points to fit linear model on
        slice = len(trackedObject.history) // k
        X_train = np.array([x for x in x_history[-historyDepth:-1:slice]])
        y_train = np.array([y for y in y_history[-historyDepth:-1:slice]])
        # check if the movement is right or left, because the generated x_test vector
        # if movement is right vector is ascending, otherwise descending
        if movementIsRight(trackedObject):
            X_test = np.linspace(X_train[-1], X_train[-1]+futureDepth)
        else:
            X_test = np.linspace(X_train[-1], X_train[-1]-futureDepth)
        # fit linear model on the x_train vectors points
        model = linear_model.LinearRegression(n_jobs=-1)
        reg = model.fit(X_train.reshape(-1,1), y_train.reshape(-1,1))
        y_pred = reg.predict(X_test.reshape(-1,1))
        trackedObject.futureX = X_test
        trackedObject.futureY = y_pred
  6. Integrating Deep-SORT tracking into the program: a Kalman filter plus a CNN that has been trained to discriminate pedestrians on a large-scale person re-identification dataset. [3] The Kalman filter implementation uses an 8-dimensional state space (x, y, a, h, vx, vy, va, vh) to track objects.

  7. Prediction with polynomial fitting using scikit-learn's PolynomialFeatures transformer. This is similar to the linear fitting, but it makes it possible to predict curves in an object's trajectory based on the object's position history. The only difference between predictLinear and this algorithm is that a PolynomialFeatures transformer transforms the history data.

def predictPoly(trackedObject: TrackedObject, degree=3, k=3, historyDepth=3, futureDepth=30):
    """Fit polynomial function on detection history of an object, to predict future coordinates.

    Args:
        trackedObject (TrackedObject): The object whose future coordinates should be predicted.
        degree (int, optional): The polynomial functions degree. Defaults to 3.
        k (int, optional): Number of training points, ex.: if historyDepth is 30 and k is 3, then the 1st, 15th and 30th points will be training points. Defaults to 3.
        historyDepth (int, optional): Training history length. Defaults to 3.
        futureDepth (int, optional): Prediction vectors length. Defaults to 30.
    """
    x_history = [det.X for det in trackedObject.history]
    y_history = [det.Y for det in trackedObject.history]
    if len(x_history) >= 3 and len(y_history) >= 3:
        # k (int) : number of training points
        # k = len(trackedObject.history) 
        # calculating even slices to pick k points to fit linear model on
        slice = len(trackedObject.history) // k
        X_train = np.array([x for x in x_history[-historyDepth:-1:slice]])
        y_train = np.array([y for y in y_history[-historyDepth:-1:slice]])
        # generating future points
        if movementIsRight(trackedObject):
            X_test = np.linspace(X_train[-1], X_train[-1]+futureDepth)
        else:
            X_test = np.linspace(X_train[-1], X_train[-1]-futureDepth)
        # poly features
        polyModel = make_pipeline(PolynomialFeatures(degree), linear_model.Ridge(alpha=1e-3))
        polyModel.fit(X_train.reshape(-1, 1), y_train.reshape(-1, 1))
        # print(X_train.shape, y_train.shape)
        y_pred = polyModel.predict(X_test.reshape(-1, 1))
        trackedObject.futureX = X_test
        trackedObject.futureY = y_pred
  8. Prediction with splines using scikit-learn's SplineTransformer. A spline can only be fitted on data we already have, so it can't predict on its own. Before fitting a spline on any data, polynomial fitting should be done first; a spline curve can then be fitted on the resulting data (see the sketch after the TODO marker below).
# TODO
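
Since this step is still marked TODO, the code below is only a hedged sketch of what it could look like, fitting a SplineTransformer pipeline on the points produced by the polynomial prediction above, not the repository's implementation:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn import linear_model

# smooth the polynomial prediction with a spline fit
splineModel = make_pipeline(SplineTransformer(n_knots=4, degree=3),
                            linear_model.Ridge(alpha=1e-3))
splineModel.fit(trackedObject.futureX.reshape(-1, 1), trackedObject.futureY.reshape(-1, 1))
y_smooth = splineModel.predict(trackedObject.futureX.reshape(-1, 1))
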
  9. Implement database logging to save results for later analysis. The init_db(video_name: str) function creates the database. The name of the video being played becomes the name of the database, with .db appended to the end. After the database file is created, the schema script is executed. This is the schema of the database:
CREATE TABLE IF NOT EXISTS objects (
    objID INTEGER PRIMARY KEY NOT NULL,
    label TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS detections (
                objID INTEGER NOT NULL,
                frameNum INTEGER NOT NULL,
                confidence REAL NOT NULL,
                x REAL NOT NULL,
                y REAL NOT NULL,
                width REAL NOT NULL,
                height REAL NOT NULL,
                vx REAL NOT NULL,
                vy REAL NOT NULL,
                ax REAL NOT NULL,
                ay REAL NOT NULL,
                FOREIGN KEY(objID) REFERENCES objects(objID)
            );
CREATE TABLE IF NOT EXISTS predictions (
                objID INTEGER NOT NULL,
                frameNum INTEGER NOT NULL,
                idx INTEGER NOT NULL,
                x REAL NOT NULL,
                y REAL NOT NULL
            );
CREATE TABLE IF NOT EXISTS metadata (
                historyDepth INTEGER NOT NULL,
                futureDepth INTEGER NOT NULL,
                yoloVersion TEXT NOT NULL,   
                device TEXT NOT NULL,
                imgsize INTEGER NOT NULL,
                stride INTEGER NOT NULL,
                confidence_threshold REAL NOT NULL,
                iou_threshold REAL NOT NULL
            );
CREATE TABLE IF NOT EXISTS regression (
                linearFunction TEXT NOT NULL,
                polynomFunction TEXT NOT NULL,
                polynomDegree INTEGER NOT NULL,
                trainingPoints INTEGER NOT NULL
);
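
A minimal sketch of init_db with the sqlite3 standard library; SCHEMA stands for the SQL script above, and the real function may differ:

import sqlite3

SCHEMA = "..."  # the CREATE TABLE script shown above

def init_db(video_name: str) -> sqlite3.Connection:
    conn = sqlite3.connect(video_name + ".db")
    conn.executescript(SCHEMA)  # run the CREATE TABLE statements
    conn.commit()
    return conn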

Every object is stored in the objects table; objID, the primary key, helps us identify detections. Detections are stored in the detections table, where objID is a foreign key that tells us which detection belongs to which object. Predictions have their own table, because a single frame and a single object can have multiple predictions. The program's inner environment is also logged as metadata: historyDepth is the length of the training set and futureDepth is the length of the prediction vector. The YOLO version is also logged because of the legacy version 4 (although YOLOv4 is not really used anymore; it is just an option that will probably be taken out). imgsize is the input image size of the neural network, and stride is how many pixels the convolutional filter slides over the image. The confidence threshold and IoU threshold determine which YOLO detections we accept. The regression table stores the regression function's configuration values.

  10. The logging makes it possible to analyze the data without running the videos each time. For this, data loading functions are needed that fetch the results from the database. These functions are implemented in the databaseLoader.py script. Each function returns a list of all entries logged in the database.
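
A sketch of one such loader (the actual databaseLoader.py functions may differ):

import sqlite3

def loadDetections(db_path: str) -> list:
    """Return every row logged in the detections table."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT * FROM detections").fetchall()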

  11. The next step after the data loading module is to create a heatmap of the traffic data logged from the videos. For better visuals, each object has its own coloring, so the heatmap also shows how well the DeepSort algorithm works.

  12. With scikit-learn's clustering module, clusters can be created from the gathered data. The point of this is that when a crossroad is being observed, the paths can be identified; with this knowledge, personalised training can be done for each scenario. First, the k-means algorithm was tested. The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified. It scales well to large numbers of samples and has been used across a large range of application areas in many different fields. The k-means algorithm divides a set of $N$ samples $X$ into $K$ disjoint clusters $C$, each described by the mean of the samples in the cluster. The means are commonly called the cluster "centroids"; note that they are not, in general, points from $X$, although they live in the same space. The K-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion: $$\sum_{i=0}^{n}\min_{\mu_j \in C}(||x_i - \mu_j||^2)$$ Although this algorithm does not require much computation, it can't identify lanes on a crossroad. The result plots can be found in the directory "research_data/sherbrooke_video/"; a minimal sketch follows.
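
A minimal k-means sketch over the logged coordinates (X stands for an (n, 2) array of detection x, y positions loaded from the database):

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print(kmeans.cluster_centers_)  # the centroids minimizing the inertia above
print(kmeans.labels_[:10])      # cluster assignments of the first few samples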

Result of k_means with n_clusters = 2 (figure: k_means algorithm with 2 initial clusters)

Result of k_means with n_clusters = 3 (figure: k_means algorithm with 3 initial clusters)

Result of k_means with n_clusters = 4 (figure: k_means algorithm with 4 initial clusters)
  13. Using clustering on all detection data seems to be pointless; the algorithms can't discriminate the different directions from each other. A better approach is to create feature vectors from trajectories, then run the clustering algorithms on the extracted features. There are 2 functions that extract feature vectors from detections and tracks. The first function, which makes feature vectors containing only one coordinate pair, hasn't been updated, because the other function, which extracts 4-dimensional feature vectors (containing 2 coordinate pairs), gave better results.
def makeFeatureVectors_Nx2(trackedObjects: list) -> np.ndarray:
    """Create 2D feature vectors from tracks.
    The enter and exit coordinates are put in different vectors. Only creating 2D vectors.

    Args:
        trackedObjects (list): list of tracked objects 

    Returns:
        np.ndarray: numpy array of feature vectors 
    """
    featureVectors = []
    for obj in trackedObjects:
        # the enter and exit coordinates become two separate 2D feature vectors
        featureVectors.append([obj.history[0].X, obj.history[0].Y])
        featureVectors.append([obj.history[-1].X, obj.history[-1].Y])
    return np.array(featureVectors)

def makeFeatureVectorsNx4(trackedObjects: list) -> np.ndarray:
    """Create 4D feature vectors from tracks.
    The enter and exit coordinates are put in one vector. Creating 4D vectors.
    v = [enterX, enterY, exitX, exitY]

    Args:
        trackedObjects (list): list of tracked objects 

    Returns:
        np.ndarray: numpy array of feature vectors 
    """
    featureVectors = np.array([np.array([obj.history[0].X, obj.history[0].Y, obj.history[-1].X, obj.history[-1].Y]) for obj in tqdm.tqdm(trackedObjects, desc="Feature vectors.")])
    return featureVectors
  14. Slow logging problem solved, big improvement in speed: creation of shell scripts that enable sqlite3's Write-Ahead Logging gave a 4x-6x speed improvement. The best solution to the slow runtime is to implement a buffer that stores all detections and predictions, then logs them at the end before exiting.
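
The same pragma the shell scripts issue can also be set from Python; a minimal sketch (the path is illustrative):

import sqlite3

conn = sqlite3.connect("research_data/example/example.db")
conn.execute("PRAGMA journal_mode=WAL;")  # enable Write-Ahead Logging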

  15. First try at feature extraction and clustering. First I chose the Affinity Propagation clustering algorithm. AffinityPropagation creates clusters by sending messages between pairs of samples until convergence. A dataset is then described using a small number of exemplars, which are identified as those most representative of other samples. The messages sent between pairs represent the suitability for one sample to be the exemplar of the other, which is updated in response to the values from other pairs. This updating happens iteratively until convergence, at which point the final exemplars are chosen, and hence the final clustering is given. Affinity Propagation can be interesting as it chooses the number of clusters based on the data provided. For this purpose, the two important parameters are the preference, which controls how many exemplars are used, and the damping factor, which damps the responsibility and availability messages to avoid numerical oscillations when updating these messages. Algorithm description: the messages sent between points belong to one of two categories. The first is the responsibility $r(i, k)$, which is the accumulated evidence that sample $k$ should be the exemplar for sample $i$. The second is the availability $a(i, k)$, which is the accumulated evidence that sample $i$ should choose sample $k$ to be its exemplar, considering the values for all other samples that $k$ should be an exemplar for. In this way, exemplars are chosen by samples if they are (1) similar enough to many samples and (2) chosen by many samples to be representative of themselves. More formally, the responsibility of a sample $k$ to be the exemplar of sample $i$ is given by: $$r(i, k) \leftarrow s(i, k) - \max_{k' \neq k} [ a(i, k') + s(i, k') ]$$
    Where $s(i, k)$ is the similarity between samples $i$ and $k$. The availability of sample $k$ to be the exemplar of sample $i$ is given by: $$a(i, k) \leftarrow min [0, r(k, k) + \sum_{i'~s.t.~i' \notin {i, k}}{r(i', k)}]$$
    To begin with, all values for $r$ and $a$ are set to zero, and the calculation of each iterates until convergence. As discussed above, in order to avoid numerical oscillations when updating the messages, the damping factor $\lambda$ is introduced into the iteration process: $$r_{t+1}(i, k) = \lambda\cdot r_{t}(i, k) + (1-\lambda)\cdot r_{t+1}(i, k)$$ $$a_{t+1}(i, k) = \lambda\cdot a_{t}(i, k) + (1-\lambda)\cdot a_{t+1}(i, k)$$ where $t$ indicates the iteration times.
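
A minimal scikit-learn sketch with the two parameters discussed above (featureVectors is the N x 4 enter/exit matrix built earlier; the parameter values are placeholders):

from sklearn.cluster import AffinityPropagation

ap = AffinityPropagation(damping=0.9, preference=None, random_state=0).fit(featureVectors)
print("clusters found:", len(ap.cluster_centers_indices_))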

  16. Although affinity propagation does not require an initial cluster number, it seems that the results are not usable, because it finds too many clusters. Other algorithms should be tested, e.g. K-Means, Spectral. For better results, detections should be filtered because of false positives: standing objects were detected, so those should be filtered out. The algorithm that filters out only the best data to run clustering on is based on the euclidean distance between enter and exit point pairs. $$d(p,q) = \sqrt{\sum_{i=1}^{n}{(p_i - q_i)^2}}$$

Result of affinity propagation on video 0005_2_36min.mp4 on 2D feature vectors
  17. K-means and spectral clustering give far better results with the filtered detections than affinity propagation. Here are the results on the 0005_2_36min.mp4 video.

Result of kmeans clustering on 0005_2_36min.mp4

Result of spectral clustering on 0005_2_36min.mp4
  18. Finding the optimal number of clusters is a very important step in building and training a model. There are many algorithms for evaluating clustering results. From the evaluation results we also have to tell which evaluation score is optimal; for this, the elbow diagram gives the guidance to find it.

  19. To be able to run evaluation algorithms on k-means clustering results, detections have to be assigned to object tracks. That is an easy task when there are not many objects and detections in the database, but with 27000 objects and 300000 detections things can go very bad, even if multiprocessing is involved; although I implemented multiprocessing in the algorithm, it wasn't worth it. The solution is to preprocess the data, i.e. do the assignment in the SQL queries; this can also be done with multiprocessing. The new solution performs very well: it takes only 6 minutes instead of 21 on the largest database.

  20. To be able to mass-produce elbow diagrams, new functions had to be implemented that can create plots in a flexible way. With these plots, the optimal number of clusters can be chosen.

  21. The results from the clustering are promising, but with a better filtering algorithm they can be better. To decrease the number of bad detections, we can use only the detections that are a certain distance from the edge detections.

  22. To gather more results on clustering, new algorithms were tested, like DBSCAN and OPTICS; minimal sketches follow the descriptions below.

  23. The DBSCAN algorithm views clusters as areas of high density separated by areas of low density. Due to this rather generic view, clusters found by DBSCAN can be any shape, as opposed to k-means which assumes that clusters are convex shaped. The central component to the DBSCAN is the concept of core samples, which are samples that are in areas of high density. A cluster is therefore a set of core samples, each close to each other (measured by some distance measure) and a set of non-core samples that are close to a core sample (but are not themselves core samples). There are two parameters to the algorithm, min_samples and eps, which define formally what we mean when we say dense. Higher min_samples or lower eps indicate higher density necessary to form a cluster. More formally, we define a core sample as being a sample in the dataset such that there exist min_samples other samples within a distance of eps, which are defined as neighbors of the core sample. This tells us that the core sample is in a dense area of the vector space. A cluster is a set of core samples that can be built by recursively taking a core sample, finding all of its neighbors that are core samples, finding all of their neighbors that are core samples, and so on. A cluster also has a set of non-core samples, which are samples that are neighbors of a core sample in the cluster but are not themselves core samples. Intuitively, these samples are on the fringes of a cluster. Any core sample is part of a cluster, by definition. Any sample that is not a core sample, and is at least eps in distance from any core sample, is considered an outlier by the algorithm. While the parameter min_samples primarily controls how tolerant the algorithm is towards noise (on noisy and large data sets it may be desirable to increase this parameter), the parameter eps is crucial to choose appropriately for the data set and distance function and usually cannot be left at the default value. It controls the local neighborhood of the points. When chosen too small, most data will not be clustered at all (and labeled as -1 for "noise"). When chosen too large, it causes close clusters to be merged into one cluster, and eventually the entire data set to be returned as a single cluster.
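
A minimal DBSCAN sketch with the two parameters described above (the eps and min_samples values are placeholders):

from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.1, min_samples=10).fit(featureVectors)
n_noise = int((db.labels_ == -1).sum())  # samples labeled -1 are noise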

  24. The OPTICS algorithm shares many similarities with the DBSCAN algorithm, and can be considered a generalization of DBSCAN that relaxes the eps requirement from a single value to a value range. The key difference between DBSCAN and OPTICS is that the OPTICS algorithm builds a reachability graph, which assigns each sample both a reachability_ distance, and a spot within the cluster ordering_ attribute; these two attributes are assigned when the model is fitted, and are used to determine cluster membership. If OPTICS is run with the default value of inf set for max_eps, then DBSCAN style cluster extraction can be performed repeatedly in linear time for any given eps value using the cluster_optics_dbscan method. Setting max_eps to a lower value will result in shorter run times, and can be thought of as the maximum neighborhood radius from each point to find other potential reachable points. The reachability distances generated by OPTICS allow for variable density extraction of clusters within a single data set. Combining reachability distances and data set ordering_ produces a reachability plot, where point density is represented on the Y-axis, and points are ordered such that nearby points are adjacent. 'Cutting' the reachability plot at a single value produces DBSCAN like results; all points above the 'cut' are classified as noise, and each time that there is a break when reading from left to right signifies a new cluster. The default cluster extraction with OPTICS looks at the steep slopes within the graph to find clusters, and the user can define what counts as a steep slope using the parameter xi.
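
A minimal OPTICS sketch, including the DBSCAN-style extraction described above (the parameter values mirror the example command further below):

from sklearn.cluster import OPTICS, cluster_optics_dbscan

opt = OPTICS(min_samples=10, max_eps=0.1, xi=0.15).fit(featureVectors)
# DBSCAN-like labels at a fixed eps, computed from the reachability attributes
labels_at_eps = cluster_optics_dbscan(reachability=opt.reachability_,
                                      core_distances=opt.core_distances_,
                                      ordering=opt.ordering_, eps=0.05)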

  25. As can be read above, DBSCAN can yield different results when the dataset is shuffled, so I wrote a simple dataset shuffling function. The results are saved in the shuffled dir.
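
Such a shuffling helper can be as simple as the sketch below (the repository's function may differ):

import numpy as np

def shuffleDataset(featureVectors: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return the feature vectors in a random order."""
    rng = np.random.default_rng(seed)
    return featureVectors[rng.permutation(len(featureVectors))]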

  26. OPTICS has been giving the best results so far. An example command that gave me a good result: python3 dataAnalyzer.py -db research_data/0001_2_308min/0001_2_308min.db --classification --min_samples 10 --max_eps 0.1 --xi 0.15 --n_neighbours 15

OPTICS clustering with parameters min_samples = 20, max_eps = 2.0, xi = 0.1, min_cluster_size = 0.05 and the filtering algorithm with threshold = 0.4 (figures: cluster number 0 through cluster number 6)

  27. The results of the clustering are the classes used in classification. There are many classification algorithms, e.g. KNN, GaussianNB, StochasticGradientDescent, etc. Neural network models can also be used for classification.

  28. Based on the results of the clustering algorithms, OPTICS clustering performs the best. Manual examination of the clustering results is still needed despite the implementation of elbow diagrams; these cannot tell whether the clustering is a success or not. With the tuning of the hyperparameters, the optimal clusters can be found, and they will serve as the ground truth for the classification models.

  29. Classification models...

0002_2_308min.mp4

python3 dataAnalyzer.py -db research_data/0002_2_308min/0002_2_308min.db --min_samples 10 --max_eps 0.1 --xi 0.15 --min_cluster_size 10 --ClassificationWorker

| Classification | Accuracy - non calibrated | Accuracy - calibrated | Accuracy - five fold method | Accuracy - FeatureVectorShape (enter, enter_vel, middle, exit, exit_vel) |
|:----|---:|---:|---:|---:|
| KNN | 70.6185 % | 46.0154 % | 62.6898 % | 70.3608 % |
| SGD | 39.1752 % | 25.1928 % | 38.8286 % | 42.7835 % |
| GP | 42.2680 % | 31.1053 % | 39.6963 % | 42.5257 % |
| GNB | 27.3195 % | 28.7917 % | 30.3687 % | 37.3711 % |
| MLP | 50.2577 % | 30.3341 % | 43.3839 % | 53.8659 % |
| Voting | 47.9381 % | 60.0515 % | | |
| SVM | 55.0976 % | 49.4845 % | | |
| DT | 69.9300 % | | | |

0001_2_308min.mp4

python3 dataAnalyzer.py -db research_data/0001_2_308min/0001_2_308min.db --min_samples 10 --max_eps 0.2 --xi 0.15 --min_cluster_size 10 --ClassificationWorker

| Classification | Accuracy - non calibrated | Accuracy - calibrated | Accuracy - five fold method | Accuracy - FeatureVectorShape (enter, enter_vel, middle, exit, exit_vel) |
|:----|---:|---:|---:|---:|
| KNN | 77.0967 % | 72.1934 % | 80.2768 % | |
| SGD | 50.3227 % | 62.6943 % | 62.4567 % | |
| GP | 58.3870 % | 64.0759 % | 62.2837 % | |
| GNB | 54.1935 % | 62.5215 % | 69.7231 % | |
| MLP | 67.7419 % | 66.1485 % | 73.7024 % | |
| Voting | 68.3871 % | 72.6643 % | | |
| SVM | 49.1758 % | 68.5121 % | | |
| DT | | | | |

Binary classifier results

The accuracy is the proportion of correct predictions (both true positives and true negatives) among the total number of cases examined.

$$ Accuracy = \frac{(TP + TN)}{(TP + TN + FP + FN)} $$
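
Both the plain accuracy above and the balanced accuracy used in the tables below are one call each in scikit-learn (y_true and y_pred stand for the test labels and model predictions):

from sklearn.metrics import accuracy_score, balanced_accuracy_score

print("accuracy:", accuracy_score(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))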

With feature vector shape (enter, middle, exit)
0001_2_308min.mp4
|   | KNN | GP | GNB | MLP | SGD | SVM |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.975779 | 0.975779 | 0.925606 | 0.975779 | 0.974048 | 0.980969 |
| 1 | 0.970588 | 0.980969 | 0.937716 | 0.980969 | 0.977509 | 0.967128 |
| 2 | 0.974048 | 0.974048 | 0.927336 | 0.972318 | 0.975779 | 0.972318 |
| 3 | 0.972318 | 0.974048 | 0.946367 | 0.937716 | 0.968858 | 0.974048 |
| 4 | 0.949827 | 0.942907 | 0.653979 | 0.942907 | 0.941176 | 0.944637 |
| 5 | 0.982699 | 0.974048 | 0.946367 | 0.974048 | 0.972318 | 0.982699 |
| 6 | 0.979239 | 0.965398 | 0.963668 | 0.967128 | 0.946367 | 0.974048 |
| 7 | 0.918685 | 0.852941 | 0.787197 | 0.823529 | 0.821799 | 0.901384 |
| 8 | 0.982699 | 0.956747 | 0.591696 | 0.956747 | 0.745675 | 0.974048 |
| 9 | 0.980969 | 0.955017 | 0.939446 | 0.941176 | 0.894464 | 0.968858 |
| 10 | 0.977509 | 0.949827 | 0.937716 | 0.949827 | 0.967128 | 0.970588 |
| 11 | 0.974048 | 0.963668 | 0.896194 | 0.970588 | 0.970588 | 0.974048 |
| 12 | 0.961938 | 0.953287 | 0.951557 | 0.951557 | 0.953287 | 0.955017 |
| 13 | 0.982699 | 0.956747 | 0.887543 | 0.956747 | 0.949827 | 0.958478 |
| 14 | 0.989619 | 0.982699 | 0.965398 | 0.982699 | 0.982699 | 0.991349 |
0002_2_308min.mp4
|   | KNN | GP | GNB | MLP | SGD | SVM |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.966495 | 0.945876 | 0.811856 | 0.945876 | 0.945876 | 0.945876 |
| 1 | 0.956186 | 0.917526 | 0.71134 | 0.914948 | 0.930412 | 0.945876 |
| 2 | 0.940722 | 0.943299 | 0.943299 | 0.943299 | 0.938144 | 0.935567 |
| 3 | 0.938144 | 0.899485 | 0.886598 | 0.899485 | 0.884021 | 0.920103 |
| 4 | 0.963918 | 0.969072 | 0.842784 | 0.969072 | 0.966495 | 0.974227 |
| 5 | 0.963918 | 0.914948 | 0.914948 | 0.914948 | 0.917526 | 0.958763 |
| 6 | 0.956186 | 0.938144 | 0.842784 | 0.938144 | 0.938144 | 0.958763 |
| 7 | 0.953608 | 0.889175 | 0.806701 | 0.89433 | 0.881443 | 0.925258 |
| 8 | 0.953608 | 0.886598 | 0.768041 | 0.886598 | 0.899485 | 0.935567 |
| 9 | 0.966495 | 0.958763 | 0.708763 | 0.958763 | 0.963918 | 0.966495 |
| 10 | 0.963918 | 0.963918 | 0.96134 | 0.963918 | 0.963918 | 0.966495 |
| 11 | 0.966495 | 0.951031 | 0.951031 | 0.951031 | 0.951031 | 0.971649 |
| 12 | 0.93299 | 0.917526 | 0.858247 | 0.917526 | 0.917526 | 0.940722 |
| 13 | 0.948454 | 0.930412 | 0.878866 | 0.930412 | 0.930412 | 0.943299 |
With feature vector shape (enter, enter_velocity, middle, exit, exit_velocity)
0001_2_308min.mp4
|   | KNN | GP | GNB | MLP | SGD | SVM |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.975779 | 0.975779 | 0.922145 | 0.975779 | 0.974048 | 0.980969 |
| 1 | 0.970588 | 0.980969 | 0.930796 | 0.980969 | 0.979239 | 0.974048 |
| 2 | 0.974048 | 0.974048 | 0.922145 | 0.972318 | 0.974048 | 0.974048 |
| 3 | 0.972318 | 0.974048 | 0.946367 | 0.937716 | 0.939446 | 0.972318 |
| 4 | 0.949827 | 0.942907 | 0.934256 | 0.942907 | 0.944637 | 0.948097 |
| 5 | 0.982699 | 0.974048 | 0.930796 | 0.974048 | 0.974048 | 0.982699 |
| 6 | 0.979239 | 0.965398 | 0.963668 | 0.967128 | 0.960208 | 0.970588 |
| 7 | 0.918685 | 0.851211 | 0.821799 | 0.820069 | 0.681661 | 0.870242 |
| 8 | 0.986159 | 0.956747 | 0.922145 | 0.956747 | 0.960208 | 0.974048 |
| 9 | 0.980969 | 0.958478 | 0.948097 | 0.946367 | 0.960208 | 0.967128 |
| 10 | 0.979239 | 0.949827 | 0.491349 | 0.949827 | 0.967128 | 0.970588 |
| 11 | 0.974048 | 0.963668 | 0.906574 | 0.970588 | 0.970588 | 0.974048 |
| 12 | 0.961938 | 0.953287 | 0.517301 | 0.951557 | 0.951557 | 0.955017 |
| 13 | 0.982699 | 0.956747 | 0.963668 | 0.956747 | 0.948097 | 0.956747 |
| 14 | 0.989619 | 0.982699 | 0.967128 | 0.982699 | 0.982699 | 0.991349 |

|   | AVG |
|:----|---:|
| KNN | 0.971857 |
| GP | 0.957324 |
| GNB | 0.872549 |
| MLP | 0.951442 |
| SGD | 0.913033 |
| SVM | 0.963668 |
Balanced Accuracy - 0.4
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.671479 | 0.5 | 0.890451 | 0.5 | 0.496454 | 0.641971 | 0.745567 |
| 1 | 0.672118 | 0.5 | 0.920154 | 0.5 | 0.499118 | 0.539282 | 0.584736 |
| 2 | 0.59286 | 0.53125 | 0.562611 | 0.5 | 0.62411 | 0.591971 | 0.581294 |
| 3 | 0.906519 | 0.83892 | 0.932503 | 0.680556 | 0.927891 | 0.830566 | 0.879664 |
| 4 | 0.679066 | 0.5 | 0.592299 | 0.5 | 0.544537 | 0.573923 | 0.715819 |
| 5 | 0.954233 | 0.499112 | 0.960924 | 0.5 | 0.5 | 0.858674 | 0.863114 |
| 6 | 0.913003 | 0.499106 | 0.879531 | 0.5 | 0.723425 | 0.834055 | 0.838527 |
| 7 | 0.919514 | 0.840237 | 0.848447 | 0.838153 | 0.842949 | 0.880991 | 0.869662 |
| 8 | 0.895479 | 0.54 | 0.607089 | 0.5 | 0.638192 | 0.86915 | 0.914575 |
| 9 | 0.956049 | 0.950257 | 0.947973 | 0.932625 | 0.956918 | 0.950257 | 0.920174 |
| 10 | 0.906507 | 0.5 | 0.71591 | 0.5 | 0.863639 | 0.837542 | 0.853872 |
| 11 | 0.615865 | 0.583779 | 0.918895 | 0.5 | 0.583779 | 0.702317 | 0.672906 |
| 12 | 0.625 | 0.534805 | 0.660714 | 0.5 | 0.534805 | 0.535714 | 0.623766 |
| 13 | 0.990054 | 0.615479 | 0.830054 | 0.5 | 0.498192 | 0.614575 | 0.918192 |
| 14 | 0.7 | 0.5 | 0.836796 | 0.5 | 0.69912 | 0.75 | 0.793838 |

|   | 0 |
|:----|---:|
| KNN | 0.79985 |
| GP | 0.59553 |
| GNB | 0.806957 |
| MLP | 0.563422 |
| SGD | 0.662209 |
| SVM | 0.734066 |
| DT | 0.785047 |
Balanced Accuracy - 0.5
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.5 | 0.5 | 0.890451 | 0.5 | 0.498227 | 0.641971 | 0.745567 |
| 1 | 0.628427 | 0.5 | 0.920154 | 0.5 | 0.499118 | 0.541045 | 0.584736 |
| 2 | 0.53125 | 0.53125 | 0.56528 | 0.5 | 0.56161 | 0.591971 | 0.581294 |
| 3 | 0.907442 | 0.830566 | 0.932503 | 0.5 | 0.851937 | 0.803711 | 0.879664 |
| 4 | 0.57484 | 0.5 | 0.595051 | 0.5 | 0.544537 | 0.558771 | 0.715819 |
| 5 | 0.958674 | 0.5 | 0.964476 | 0.5 | 0.5 | 0.86045 | 0.863114 |
| 6 | 0.887581 | 0.499106 | 0.879531 | 0.5 | 0.619951 | 0.704265 | 0.838527 |
| 7 | 0.929265 | 0.870969 | 0.842197 | 0.840385 | 0.806743 | 0.884492 | 0.869662 |
| 8 | 0.897288 | 0.5 | 0.691971 | 0.5 | 0.578192 | 0.871863 | 0.914575 |
| 9 | 0.95991 | 0.881049 | 0.948938 | 0.780084 | 0.957272 | 0.938417 | 0.920174 |
| 10 | 0.891087 | 0.5 | 0.71591 | 0.5 | 0.890051 | 0.80488 | 0.853872 |
| 11 | 0.558824 | 0.496435 | 0.923351 | 0.5 | 0.524955 | 0.644385 | 0.672906 |
| 12 | 0.607143 | 0.517857 | 0.661623 | 0.5 | 0.516948 | 0.535714 | 0.623766 |
| 13 | 0.990958 | 0.5 | 0.790054 | 0.5 | 0.498192 | 0.615479 | 0.918192 |
| 14 | 0.7 | 0.5 | 0.786796 | 0.5 | 0.6 | 0.75 | 0.793838 |

|   | 0 |
|:----|---:|
| KNN | 0.768179 |
| GP | 0.575149 |
| GNB | 0.807219 |
| MLP | 0.541365 |
| SGD | 0.629849 |
| SVM | 0.716494 |
| DT | 0.785047 |
Balanced Accuracy - 0.5 - FeatureVectors made from second half of track's history
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.717451 | 0.5 | 0.979184 | 0.5 | 0.5 | 0.696177 | 0.876602 |
| 1 | 0.914053 | 0.5 | 0.984707 | 0.5 | 0.5 | 0.756602 | 0.878726 |
| 2 | 0.70417 | 0.499572 | 0.819047 | 0.5 | 0.51385 | 0.658768 | 0.73887 |
| 3 | 0.95421 | 0.952441 | 0.957211 | 0.889945 | 0.946283 | 0.945841 | 0.970063 |
| 4 | 0.74912 | 0.583333 | 0.707813 | 0.5 | 0.582893 | 0.704545 | 0.791587 |
| 5 | 0.849205 | 0.5 | 0.970386 | 0.5 | 0.5 | 0.8876 | 0.956884 |
| 6 | 0.907247 | 0.5 | 0.857453 | 0.5 | 0.5 | 0.791862 | 0.933318 |
| 7 | 0.949784 | 0.877326 | 0.819823 | 0.888092 | 0.716641 | 0.888744 | 0.893161 |
| 8 | 0.888081 | 0.601695 | 0.789389 | 0.5 | 0.794586 | 0.834609 | 0.91263 |
| 9 | 0.940611 | 0.810352 | 0.90098 | 0.883699 | 0.716567 | 0.890815 | 0.96914 |
| 10 | 0.968889 | 0.617495 | 0.951392 | 0.5 | 0.875832 | 0.975614 | 0.900218 |
| 11 | 0.624145 | 0.5 | 0.891106 | 0.5 | 0.5 | 0.637206 | 0.825561 |
| 12 | 0.635923 | 0.49912 | 0.721684 | 0.5 | 0.596724 | 0.641739 | 0.767539 |
| 13 | 0.961769 | 0.786714 | 0.91907 | 0.617757 | 0.806217 | 0.786279 | 0.874403 |
| 14 | 0.865385 | 0.807692 | 0.873561 | 0.5 | 0.769231 | 0.884615 | 0.881214 |

|   | 0 |
|:----|---:|
| KNN | 0.842003 |
| GP | 0.635716 |
| GNB | 0.876187 |
| MLP | 0.5853 |
| SGD | 0.654588 |
| SVM | 0.798734 |
| DT | 0.877994 |
Balanced Accuracy - 0.6
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.5 | 0.5 | 0.893997 | 0.5 | 0.499113 | 0.570542 | 0.745567 |
| 1 | 0.629309 | 0.5 | 0.920154 | 0.5 | 0.499118 | 0.541927 | 0.584736 |
| 2 | 0.53125 | 0.53125 | 0.56706 | 0.5 | 0.53125 | 0.560721 | 0.581294 |
| 3 | 0.868542 | 0.802788 | 0.934348 | 0.5 | 0.822263 | 0.803711 | 0.879664 |
| 4 | 0.559689 | 0.5 | 0.595969 | 0.5 | 0.529386 | 0.558771 | 0.715819 |
| 5 | 0.82534 | 0.5 | 0.968028 | 0.5 | 0.5 | 0.794671 | 0.863114 |
| 6 | 0.863054 | 0.499106 | 0.880426 | 0.5 | 0.522738 | 0.706054 | 0.838527 |
| 7 | 0.930141 | 0.848681 | 0.844551 | 0.797325 | 0.733703 | 0.856743 | 0.869662 |
| 8 | 0.897288 | 0.5 | 0.652767 | 0.5 | 0.539096 | 0.871863 | 0.914575 |
| 9 | 0.952542 | 0.691313 | 0.948938 | 0.567278 | 0.958237 | 0.822716 | 0.920174 |
| 10 | 0.873846 | 0.5 | 0.718642 | 0.5 | 0.884649 | 0.78855 | 0.853872 |
| 11 | 0.529412 | 0.5 | 0.924242 | 0.5 | 0.527629 | 0.586453 | 0.672906 |
| 12 | 0.589286 | 0.5 | 0.66526 | 0.5 | 0.499091 | 0.535714 | 0.623766 |
| 13 | 0.972767 | 0.5 | 0.770054 | 0.5 | 0.498192 | 0.616383 | 0.918192 |
| 14 | 0.7 | 0.5 | 0.786796 | 0.5 | 0.5 | 0.75 | 0.793838 |

|   | 0 |
|:----|---:|
| KNN | 0.748164 |
| GP | 0.558209 |
| GNB | 0.804749 |
| MLP | 0.524307 |
| SGD | 0.602964 |
| SVM | 0.690988 |
| DT | 0.785047 |
0002_2_308min.mp4
|   | KNN | GP | GNB | MLP | SGD | SVM |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.966495 | 0.945876 | 0.780928 | 0.945876 | 0.945876 | 0.948454 |
| 1 | 0.958763 | 0.917526 | 0.752577 | 0.914948 | 0.935567 | 0.948454 |
| 2 | 0.940722 | 0.943299 | 0.603093 | 0.943299 | 0.917526 | 0.93299 |
| 3 | 0.938144 | 0.899485 | 0.474227 | 0.899485 | 0.891753 | 0.917526 |
| 4 | 0.963918 | 0.969072 | 0.96134 | 0.969072 | 0.966495 | 0.974227 |
| 5 | 0.963918 | 0.914948 | 0.487113 | 0.914948 | 0.868557 | 0.938144 |
| 6 | 0.956186 | 0.938144 | 0.698454 | 0.938144 | 0.938144 | 0.940722 |
| 7 | 0.956186 | 0.889175 | 0.801546 | 0.873711 | 0.896907 | 0.886598 |
| 8 | 0.953608 | 0.886598 | 0.618557 | 0.886598 | 0.876289 | 0.951031 |
| 9 | 0.979381 | 0.958763 | 0.67268 | 0.958763 | 0.966495 | 0.976804 |
| 10 | 0.963918 | 0.963918 | 0.716495 | 0.963918 | 0.963918 | 0.966495 |
| 11 | 0.966495 | 0.951031 | 0.914948 | 0.951031 | 0.951031 | 0.966495 |
| 12 | 0.93299 | 0.917526 | 0.907216 | 0.917526 | 0.917526 | 0.943299 |
| 13 | 0.951031 | 0.930412 | 0.878866 | 0.930412 | 0.930412 | 0.93299 |

|   | AVG |
|:----|---:|
| KNN | 0.956554 |
| GP | 0.930412 |
| GNB | 0.733432 |
| MLP | 0.929124 |
| SGD | 0.923417 |
| SVM | 0.945508 |
Balanced Accuracy - Threshold 0.4
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.888413 | 0.5 | 0.880109 | 0.5 | 0.498638 | 0.568704 | 0.842156 |
| 1 | 0.815023 | 0.574349 | 0.786214 | 0.5 | 0.764277 | 0.765685 | 0.795647 |
| 2 | 0.904496 | 0.5 | 0.781421 | 0.5 | 0.479508 | 0.577248 | 0.844511 |
| 3 | 0.885901 | 0.5 | 0.697708 | 0.5 | 0.59676 | 0.692234 | 0.803321 |
| 4 | 0.654699 | 0.5 | 0.978723 | 0.5 | 0.496011 | 0.74734 | 0.912677 |
| 5 | 0.904524 | 0.528895 | 0.683781 | 0.5 | 0.510926 | 0.676184 | 0.774861 |
| 6 | 0.92239 | 0.5 | 0.831044 | 0.5 | 0.478022 | 0.782051 | 0.804258 |
| 7 | 0.91701 | 0.658196 | 0.856571 | 0.651557 | 0.664148 | 0.650069 | 0.843864 |
| 8 | 0.891649 | 0.49564 | 0.679968 | 0.5 | 0.526823 | 0.897463 | 0.815011 |
| 9 | 0.841062 | 0.5 | 0.761425 | 0.5 | 0.59375 | 0.842406 | 0.89953 |
| 10 | 0.495989 | 0.5 | 0.756112 | 0.5 | 0.5 | 0.535714 | 0.597785 |
| 11 | 0.784054 | 0.5 | 0.528241 | 0.5 | 0.5 | 0.732777 | 0.727357 |
| 12 | 0.694698 | 0.5 | 0.606742 | 0.5 | 0.5 | 0.669066 | 0.726124 |
| 13 | 0.746794 | 0.5 | 0.625115 | 0.5 | 0.5 | 0.605571 | 0.766697 |

|   | 0 |
|:----|---:|
| KNN | 0.810479 |
| GP | 0.518363 |
| GNB | 0.746655 |
| MLP | 0.510825 |
| SGD | 0.54349 |
| SVM | 0.695894 |
| DT | 0.7967 |
Balanced Accuracy - Threshold 0.5
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.847606 | 0.5 | 0.884196 | 0.5 | 0.5 | 0.498638 | 0.864604 |
| 1 | 0.785062 | 0.515152 | 0.796073 | 0.5 | 0.724456 | 0.767093 | 0.779087 |
| 2 | 0.712245 | 0.5 | 0.768256 | 0.5 | 0.498634 | 0.57998 | 0.845877 |
| 3 | 0.817574 | 0.5 | 0.707736 | 0.5 | 0.495702 | 0.669459 | 0.814709 |
| 4 | 0.578014 | 0.5 | 0.980053 | 0.5 | 0.5 | 0.664007 | 0.912677 |
| 5 | 0.88408 | 0.5 | 0.692232 | 0.5 | 0.5 | 0.808323 | 0.795647 |
| 6 | 0.820971 | 0.5 | 0.839286 | 0.5 | 0.666896 | 0.661172 | 0.888965 |
| 7 | 0.901557 | 0.651557 | 0.861035 | 0.55174 | 0.82223 | 0.650069 | 0.851992 |
| 8 | 0.874736 | 0.5 | 0.685782 | 0.5 | 0.898388 | 0.863372 | 0.802193 |
| 9 | 0.75 | 0.5 | 0.769489 | 0.5 | 0.59375 | 0.779906 | 0.896841 |
| 10 | 0.5 | 0.5 | 0.784186 | 0.5 | 0.5 | 0.535714 | 0.630825 |
| 11 | 0.682856 | 0.5 | 0.530951 | 0.5 | 0.5 | 0.707816 | 0.753673 |
| 12 | 0.636412 | 0.5 | 0.593926 | 0.5 | 0.785112 | 0.670471 | 0.75316 |
| 13 | 0.665282 | 0.5 | 0.609367 | 0.5 | 0.582692 | 0.605571 | 0.781061 |

|   | 0 |
|:----|---:|
| KNN | 0.746885 |
| GP | 0.511908 |
| GNB | 0.750184 |
| MLP | 0.503696 |
| SGD | 0.61199 |
| SVM | 0.675828 |
| DT | 0.812236 |
Balanced Accuracy - Threshold 0.5 - FeatureVectors made from second half of track's history
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.847549 | 0.5 | 0.934582 | 0.5 | 0.496782 | 0.609132 | 0.92474 |
| 1 | 0.847881 | 0.747987 | 0.817821 | 0.519737 | 0.845329 | 0.79592 | 0.928709 |
| 2 | 0.899782 | 0.5 | 0.889971 | 0.5 | 0.499353 | 0.679091 | 0.911492 |
| 3 | 0.589489 | 0.5 | 0.852519 | 0.5 | 0.496713 | 0.527486 | 0.703395 |
| 4 | 0.817487 | 0.5 | 0.965553 | 0.5 | 0.5 | 0.758116 | 0.997487 |
| 5 | 0.915802 | 0.5 | 0.847744 | 0.5 | 0.587516 | 0.917565 | 0.879183 |
| 6 | 0.613145 | 0.5 | 0.915718 | 0.5 | 0.5 | 0.729521 | 0.715309 |
| 7 | 0.901088 | 0.782656 | 0.90724 | 0.747896 | 0.499287 | 0.787048 | 0.936561 |
| 8 | 0.66338 | 0.5 | 0.823465 | 0.5 | 0.586512 | 0.79648 | 0.714298 |
| 9 | 0.867012 | 0.5 | 0.796136 | 0.5 | 0.676471 | 0.764706 | 0.940541 |
| 10 | 0.617487 | 0.5 | 0.819196 | 0.5 | 0.5 | 0.519372 | 0.671834 |
| 11 | 0.921799 | 0.5 | 0.701617 | 0.5 | 0.5 | 0.860579 | 0.948079 |
| 12 | 0.812101 | 0.5 | 0.624472 | 0.5 | 0.626474 | 0.729436 | 0.799816 |
| 13 | 0.796823 | 0.499336 | 0.734972 | 0.5 | 0.498672 | 0.709251 | 0.878418 |

|   | 0 |
|:----|---:|
| KNN | 0.79363 |
| GP | 0.537856 |
| GNB | 0.830786 |
| MLP | 0.519117 |
| SGD | 0.558079 |
| SVM | 0.727407 |
| DT | 0.853562 |
Balanced Accuracy - Threshold 0.6
|   | KNN | GP | GNB | MLP | SGD | SVM | DT |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 0.781627 | 0.5 | 0.885559 | 0.5 | 0.498638 | 0.498638 | 0.840794 |
| 1 | 0.739607 | 0.5 | 0.804524 | 0.5 | 0.68041 | 0.768502 | 0.799872 |
| 2 | 0.672255 | 0.5 | 0.786016 | 0.5 | 0.491803 | 0.559985 | 0.821783 |
| 3 | 0.807619 | 0.5 | 0.723496 | 0.5 | 0.495702 | 0.64525 | 0.830395 |
| 4 | 0.539007 | 0.5 | 0.980053 | 0.5 | 0.49867 | 0.62234 | 0.912677 |
| 5 | 0.856594 | 0.5 | 0.704908 | 0.5 | 0.568715 | 0.689927 | 0.797055 |
| 6 | 0.680632 | 0.5 | 0.846154 | 0.5 | 0.5 | 0.640339 | 0.804258 |
| 7 | 0.864583 | 0.649382 | 0.869963 | 0.5 | 0.665636 | 0.650069 | 0.824634 |
| 8 | 0.874736 | 0.5 | 0.691596 | 0.5 | 0.604915 | 0.78528 | 0.80074 |
| 9 | 0.71875 | 0.5 | 0.778898 | 0.5 | 0.71875 | 0.748656 | 0.89953 |
| 10 | 0.5 | 0.5 | 0.808251 | 0.5 | 0.5 | 0.535714 | 0.595111 |
| 11 | 0.682856 | 0.5 | 0.530951 | 0.5 | 0.5 | 0.709171 | 0.699686 |
| 12 | 0.639221 | 0.5 | 0.596735 | 0.5 | 0.5 | 0.654846 | 0.744558 |
| 13 | 0.628245 | 0.5 | 0.610752 | 0.5 | 0.535652 | 0.587052 | 0.763927 |

|   | 0 |
|:----|---:|
| KNN | 0.713267 |
| GP | 0.51067 |
| GNB | 0.758418 |
| MLP | 0.5 |
| SGD | 0.554207 |
| SVM | 0.649698 |
| DT | 0.795359 |

Video 0001_2

Decision Tree depth 2 accuracy

| Class | Depth 2 | Depth 2 multiclass average | Depth 2 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.5 | 0.703933 | 0.731834 |
| 1 | 0.5 | nan | nan |
| 2 | 0.59286 | nan | nan |
| 3 | 0.790744 | nan | nan |
| 4 | 0.57484 | nan | nan |
| 5 | 0.956898 | nan | nan |
| 6 | 0.706054 | nan | nan |
| 7 | 0.872719 | nan | nan |
| 8 | 0.66 | nan | nan |
| 9 | 0.962452 | nan | nan |
| 10 | 0.712298 | nan | nan |
| 11 | 0.5 | nan | nan |
| 12 | 0.5 | nan | nan |
| 13 | 0.932767 | nan | nan |
| 14 | 0.797359 | nan | nan |

Decision Tree depth 3 accuracy

| Class | Depth 3 | Depth 3 multiclass average | Depth 3 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.499113 | 0.726864 | 0.764706 |
| 1 | 0.543691 | nan | nan |
| 2 | 0.591081 | nan | nan |
| 3 | 0.891708 | nan | nan |
| 4 | 0.589992 | nan | nan |
| 5 | 0.858674 | nan | nan |
| 6 | 0.726109 | nan | nan |
| 7 | 0.908222 | nan | nan |
| 8 | 0.66 | nan | nan |
| 9 | 0.952542 | nan | nan |
| 10 | 0.716852 | nan | nan |
| 11 | 0.674688 | nan | nan |
| 12 | 0.534805 | nan | nan |
| 13 | 0.955479 | nan | nan |
| 14 | 0.8 | nan | nan |

Decision Tree depth 4 accuracy

| Class | Depth 4 | Depth 4 multiclass average | Depth 4 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.820542 | 0.80093 | 0.778547 |
| 1 | 0.897627 | nan | nan |
| 2 | 0.591081 | nan | nan |
| 3 | 0.868542 | nan | nan |
| 4 | 0.589074 | nan | nan |
| 5 | 0.891119 | nan | nan |
| 6 | 0.811317 | nan | nan |
| 7 | 0.913597 | nan | nan |
| 8 | 0.919096 | nan | nan |
| 9 | 0.952542 | nan | nan |
| 10 | 0.768576 | nan | nan |
| 11 | 0.702317 | nan | nan |
| 12 | 0.534805 | nan | nan |
| 13 | 0.955479 | nan | nan |
| 14 | 0.798239 | nan | nan |

Decision Tree depth 5 accuracy

| Class | Depth 5 | Depth 5 multiclass average | Depth 5 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.820542 | 0.81307 | 0.794118 |
| 1 | 0.9422 | nan | nan |
| 2 | 0.59286 | nan | nan |
| 3 | 0.894475 | nan | nan |
| 4 | 0.588157 | nan | nan |
| 5 | 0.892007 | nan | nan |
| 6 | 0.863949 | nan | nan |
| 7 | 0.912389 | nan | nan |
| 8 | 0.917288 | nan | nan |
| 9 | 0.938771 | nan | nan |
| 10 | 0.857515 | nan | nan |
| 11 | 0.674688 | nan | nan |
| 12 | 0.586558 | nan | nan |
| 13 | 0.917288 | nan | nan |
| 14 | 0.797359 | nan | nan |

Decision Tree depth 6 accuracy

| Class | Depth 6 | Depth 6 multiclass average | Depth 6 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.784828 | 0.799032 | 0.782007 |
| 1 | 0.631073 | nan | nan |
| 2 | 0.591081 | nan | nan |
| 3 | 0.895398 | nan | nan |
| 4 | 0.691465 | nan | nan |
| 5 | 0.92534 | nan | nan |
| 6 | 0.891159 | nan | nan |
| 7 | 0.916827 | nan | nan |
| 8 | 0.917288 | nan | nan |
| 9 | 0.929472 | nan | nan |
| 10 | 0.853872 | nan | nan |
| 11 | 0.674688 | nan | nan |
| 12 | 0.586558 | nan | nan |
| 13 | 0.898192 | nan | nan |
| 14 | 0.798239 | nan | nan |

Decision Tree depth 7 accuracy

| Class | Depth 7 | Depth 7 multiclass average | Depth 7 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.783055 | 0.788302 | 0.759516 |
| 1 | 0.583854 | nan | nan |
| 2 | 0.558941 | nan | nan |
| 3 | 0.869465 | nan | nan |
| 4 | 0.662997 | nan | nan |
| 5 | 0.928893 | nan | nan |
| 6 | 0.865738 | nan | nan |
| 7 | 0.919786 | nan | nan |
| 8 | 0.918192 | nan | nan |
| 9 | 0.920174 | nan | nan |
| 10 | 0.837542 | nan | nan |
| 11 | 0.674688 | nan | nan |
| 12 | 0.586558 | nan | nan |
| 13 | 0.917288 | nan | nan |
| 14 | 0.797359 | nan | nan |

Decision Tree depth 8 accuracy

| Class | Depth 8 | Depth 8 multiclass average | Depth 8 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.74734 | 0.788764 | 0.743945 |
| 1 | 0.585618 | nan | nan |
| 2 | 0.554493 | nan | nan |
| 3 | 0.881509 | nan | nan |
| 4 | 0.731415 | nan | nan |
| 5 | 0.929781 | nan | nan |
| 6 | 0.78679 | nan | nan |
| 7 | 0.919786 | nan | nan |
| 8 | 0.916383 | nan | nan |
| 9 | 0.919208 | nan | nan |
| 10 | 0.853872 | nan | nan |
| 11 | 0.703209 | nan | nan |
| 12 | 0.585649 | nan | nan |
| 13 | 0.917288 | nan | nan |
| 14 | 0.79912 | nan | nan |

Decision Tree depth 9 accuracy

| Class | Depth 9 | Depth 9 multiclass average | Depth 9 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.746454 | 0.790632 | 0.735294 |
| 1 | 0.630191 | nan | nan |
| 2 | 0.584853 | nan | nan |
| 3 | 0.878741 | nan | nan |
| 4 | 0.714429 | nan | nan |
| 5 | 0.896448 | nan | nan |
| 6 | 0.839422 | nan | nan |
| 7 | 0.917098 | nan | nan |
| 8 | 0.915479 | nan | nan |
| 9 | 0.926577 | nan | nan |
| 10 | 0.837542 | nan | nan |
| 11 | 0.673797 | nan | nan |
| 12 | 0.582013 | nan | nan |
| 13 | 0.918192 | nan | nan |
| 14 | 0.798239 | nan | nan |

Decision Tree depth 10 accuracy

| Class | Depth 10 | Depth 10 multiclass average | Depth 10 one class prediction |
| ---: | ---: | ---: | ---: |
| 0 | 0.745567 | 0.784341 | 0.714533 |
| 1 | 0.584736 | nan | nan |
| 2 | 0.584853 | nan | nan |
| 3 | 0.853731 | nan | nan |
| 4 | 0.699277 | nan | nan |
| 5 | 0.896448 | nan | nan |
| 6 | 0.812212 | nan | nan |
| 7 | 0.913536 | nan | nan |
| 8 | 0.915479 | nan | nan |
| 9 | 0.919208 | nan | nan |
| 10 | 0.83572 | nan | nan |
| 11 | 0.702317 | nan | nan |
| 12 | 0.583831 | nan | nan |
| 13 | 0.918192 | nan | nan |
| 14 | 0.8 | nan | nan |
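The depth sweep above can be reproduced with a simple loop over scikit-learn's DecisionTreeClassifier. A sketch, with the train/test splits passed in (decision_tree_depth_sweep is an illustrative helper, not part of the repo):

```python
from sklearn.metrics import balanced_accuracy_score
from sklearn.tree import DecisionTreeClassifier

def decision_tree_depth_sweep(X_train, y_train, X_test, y_test,
                              depths=range(2, 11)):
    """Fit one tree per max_depth (2..10, as above) and score it."""
    scores = {}
    for depth in depths:
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X_train, y_train)
        scores[depth] = balanced_accuracy_score(y_test, tree.predict(X_test))
    return scores
```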

0001_2

Top picks

| Top k | KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.861592 | 0.794118 | 0.84083 | 0.648789 | 0.726644 | 0.858131 | 0.889273 |
| 2 | 0.953287 | 0.916955 | 0.935986 | 0.797578 | 0.769896 | 0.920415 | 0.918685 |
| 3 | 0.972318 | 0.960208 | 0.975779 | 0.861592 | 0.785467 | 0.960208 | 0.920415 |
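The Top picks rows are top-k accuracies: a prediction counts as correct if the true class is among the k most probable classes. A minimal sketch of the metric, assuming a predict_proba-style probability matrix (top_k_accuracy is an illustrative helper):

```python
import numpy as np

def top_k_accuracy(proba, y_true, classes, k=1):
    """Fraction of samples whose true label is among the k classes
    with the highest predicted probability (one proba row per sample,
    one column per entry of `classes`)."""
    top_k = np.argsort(proba, axis=1)[:, -k:]  # indices of the k best classes
    return np.mean([y in classes[idx] for y, idx in zip(y_true, top_k)])
```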
Threshold

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.892857 | 0.5 | 1 | 0.5 | 0.992908 | 0.85537 | 1 |
| 1 | 0.999118 | 0.5 | 1 | 0.5 | 0.5 | 0.9519 | 1 |
| 2 | 0.62411 | 0.59375 | 0.749889 | 0.5 | 0.625 | 0.62411 | 0.873221 |
| 3 | 0.957411 | 0.859266 | 0.986111 | 0.666667 | 0.75 | 0.972222 | 0.984266 |
| 4 | 0.620295 | 0.5 | 0.786239 | 0.5 | 0.499083 | 0.619377 | 0.809925 |
| 5 | 0.997336 | 0.828893 | 0.999112 | 0.5 | 0.5 | 0.994671 | 1 |
| 6 | 0.969212 | 0.943791 | 0.995528 | 0.5 | 0.994633 | 0.996422 | 1 |
| 7 | 0.925703 | 0.902367 | 0.88268 | 0.894638 | 0.681225 | 0.899618 | 0.933185 |
| 8 | 0.999096 | 0.639096 | 1 | 0.5 | 0.619096 | 0.978192 | 1 |
| 9 | 0.970174 | 0.967278 | 0.951223 | 0.936486 | 0.927542 | 0.968243 | 0.96184 |
| 10 | 0.911972 | 0.826675 | 0.971767 | 0.672414 | 0.859337 | 0.912882 | 0.926481 |
| 11 | 0.793226 | 0.647059 | 0.950089 | 0.5 | 0.870766 | 0.794118 | 0.999109 |
| 12 | 0.606234 | 0.535714 | 0.742143 | 0.5 | 0.550844 | 0.534805 | 0.976688 |
| 13 | 0.997288 | 0.68 | 1 | 0.5 | 0.796383 | 0.799096 | 0.999096 |
| 14 | 0.75 | 0.5 | 0.840317 | 0.5 | 0.64912 | 0.65 | 0.947359 |

| Classifier | Average |
| :--- | ---: |
| KNN | 0.867602 |
| GP | 0.694926 |
| GNB | 0.923673 |
| MLP | 0.578014 |
| SGD | 0.721062 |
| SVM | 0.836735 |
| DT | 0.960745 |

0001_2 features v2 half

Top picks

| Top k | KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.85 | 0.785714 | 0.885714 | 0.603571 | 0.725 | 0.871429 | 0.892857 |
| 2 | 0.946429 | 0.925 | 0.960714 | 0.717857 | 0.789286 | 0.932143 | 0.928571 |
| 3 | 0.964286 | 0.960714 | 0.985714 | 0.789286 | 0.792857 | 0.957143 | 0.928571 |

Threshold

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.6 | 0.5 | 1 | 0.5 | 0.996364 | 0.798182 | 1 |
| 1 | 0.996283 | 0.498141 | 0.954545 | 0.5 | 0.765292 | 0.903515 | 0.907232 |
| 2 | 0.623162 | 0.623162 | 0.790441 | 0.5 | 0.816176 | 0.685662 | 1 |
| 3 | 1 | 0.821429 | 0.964286 | 0.5 | 0.75 | 0.964286 | 1 |
| 4 | 0.655979 | 0.5 | 0.806513 | 0.5 | 0.5244 | 0.629663 | 0.907643 |
| 5 | 1 | 0.914801 | 1 | 0.5 | 0.5 | 0.914801 | 1 |
| 6 | 0.994526 | 0.74635 | 0.99635 | 0.5 | 0.99635 | 0.994526 | 1 |
| 7 | 0.925955 | 0.907773 | 0.906851 | 0.904743 | 0.903426 | 0.901186 | 0.929644 |
| 8 | 0.961538 | 0.5 | 0.961538 | 0.5 | 0.538462 | 0.959666 | 0.994382 |
| 9 | 0.981633 | 0.979592 | 0.985714 | 0.932653 | 0.989796 | 0.981633 | 0.969388 |
| 10 | 0.926692 | 0.714286 | 0.964286 | 0.5 | 0.951128 | 0.926692 | 0.960526 |
| 11 | 0.75 | 0.583333 | 0.985401 | 0.5 | 0.75 | 0.833333 | 1 |
| 12 | 0.625 | 0.5 | 0.717662 | 0.5 | 0.621269 | 0.621269 | 0.998134 |
| 13 | 0.931985 | 0.625 | 1 | 0.5 | 0.6875 | 0.805147 | 1 |
| 14 | 0.5 | 0.5 | 0.75 | 0.5 | 0.5 | 0.5 | 1 |

| Classifier | Average |
| :--- | ---: |
| KNN | 0.831517 |
| GP | 0.660924 |
| GNB | 0.918906 |
| MLP | 0.555826 |
| SGD | 0.752677 |
| SVM | 0.827971 |
| DT | 0.977797 |

0001_2

features v1

```
python3 classification.py --cross_val -db research_data/0001_2_308min/0001_2_308min_filtered.joblib --n_jobs 18 --outdir research_data/0001_2_308min/tables/2023-01-22_cross_validation_features_v1.xlsx
```

Time: 290 s

Classifier parameters

```python
{'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {}, 'MLP': {'max_iter': 1000, 'solver': 'sgd'}, 'SGD': {'loss': 'modified_huber'}, 'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}
```
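The parameter dict maps one-to-one onto scikit-learn estimators. A sketch of how the classifier bank could be built from it (the actual construction inside classification.py may differ):

```python
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

PARAMS = {'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {},
          'MLP': {'max_iter': 1000, 'solver': 'sgd'},
          'SGD': {'loss': 'modified_huber'},
          'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}

ESTIMATORS = {'KNN': KNeighborsClassifier, 'GP': GaussianProcessClassifier,
              'GNB': GaussianNB, 'MLP': MLPClassifier, 'SGD': SGDClassifier,
              'SVM': SVC, 'DT': DecisionTreeClassifier}

# one ready-to-fit estimator per table column
classifiers = {name: cls(**PARAMS[name]) for name, cls in ESTIMATORS.items()}
```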

Cross-val Basic accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.787529 | 0.787529 | 0.676674 | 0.706697 | 0.600462 | 0.7806 | 0.665127 |
| 2 | 0.769053 | 0.792148 | 0.669746 | 0.706697 | 0.540416 | 0.792148 | 0.674365 |
| 3 | 0.794457 | 0.812933 | 0.662818 | 0.727483 | 0.51963 | 0.794457 | 0.69515 |
| 4 | 0.796296 | 0.810185 | 0.68287 | 0.733796 | 0.476852 | 0.798611 | 0.664352 |
| 5 | 0.768519 | 0.793981 | 0.650463 | 0.712963 | 0.622685 | 0.780093 | 0.659722 |
| Max split | 0.796296 | 0.812933 | 0.68287 | 0.733796 | 0.622685 | 0.798611 | 0.69515 |
| Mean | 0.783171 | 0.799355 | 0.668514 | 0.717527 | 0.552009 | 0.789182 | 0.671743 |
| Standard deviation | 0.012105 | 0.0102209 | 0.0112473 | 0.0111283 | 0.0532432 | 0.00750723 | 0.0126307 |

Cross-val Balanced accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.587371 | 0.606337 | 0.461622 | 0.463485 | 0.315106 | 0.602184 | 0.575693 |
| 2 | 0.571679 | 0.633548 | 0.443841 | 0.474462 | 0.35318 | 0.633645 | 0.535623 |
| 3 | 0.607156 | 0.658273 | 0.457848 | 0.461005 | 0.291135 | 0.609464 | 0.54805 |
| 4 | 0.615799 | 0.661326 | 0.481622 | 0.495608 | 0.374723 | 0.645212 | 0.547454 |
| 5 | 0.62882 | 0.67252 | 0.484886 | 0.500403 | 0.390376 | 0.644971 | 0.565586 |
| Max split | 0.62882 | 0.67252 | 0.484886 | 0.500403 | 0.390376 | 0.645212 | 0.575693 |
| Mean | 0.602165 | 0.646401 | 0.465964 | 0.478993 | 0.344904 | 0.627095 | 0.554481 |
| Standard deviation | 0.0203445 | 0.0237329 | 0.0153454 | 0.0162422 | 0.0368902 | 0.0180117 | 0.0142805 |

Cross-val Top 1 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.787529 | 0.787529 | 0.676674 | 0.711316 | 0.528868 | 0.787529 | 0.667436 |
| 2 | 0.769053 | 0.792148 | 0.669746 | 0.704388 | 0.286374 | 0.787529 | 0.662818 |
| 3 | 0.794457 | 0.812933 | 0.662818 | 0.720554 | 0.616628 | 0.794457 | 0.667436 |
| 4 | 0.796296 | 0.810185 | 0.68287 | 0.736111 | 0.585648 | 0.796296 | 0.655093 |
| 5 | 0.768519 | 0.793981 | 0.650463 | 0.699074 | 0.631944 | 0.782407 | 0.636574 |
| Max split | 0.796296 | 0.812933 | 0.68287 | 0.736111 | 0.631944 | 0.796296 | 0.667436 |
| Mean | 0.783171 | 0.799355 | 0.668514 | 0.714289 | 0.529893 | 0.789644 | 0.657871 |
| Standard deviation | 0.012105 | 0.0102209 | 0.0112473 | 0.0130677 | 0.126766 | 0.00507418 | 0.011565 |

Cross-val Top 2 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.903002 | 0.903002 | 0.817552 | 0.872979 | 0.646651 | 0.886836 | 0.699769 |
| 2 | 0.907621 | 0.91455 | 0.796767 | 0.854503 | 0.355658 | 0.891455 | 0.713626 |
| 3 | 0.91224 | 0.91455 | 0.836028 | 0.879908 | 0.674365 | 0.91224 | 0.722864 |
| 4 | 0.93287 | 0.928241 | 0.826389 | 0.872685 | 0.712963 | 0.914352 | 0.696759 |
| 5 | 0.893519 | 0.902778 | 0.796296 | 0.856481 | 0.6875 | 0.872685 | 0.68287 |
| Max split | 0.93287 | 0.928241 | 0.836028 | 0.879908 | 0.712963 | 0.914352 | 0.722864 |
| Mean | 0.909851 | 0.912624 | 0.814606 | 0.867311 | 0.615427 | 0.895514 | 0.703178 |
| Standard deviation | 0.0130708 | 0.00938961 | 0.0158738 | 0.01001 | 0.131633 | 0.015796 | 0.0138702 |

Cross-val Top 3 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.903002 | 0.903002 | 0.817552 | 0.872979 | 0.646651 | 0.886836 | 0.699769 |
| 2 | 0.907621 | 0.91455 | 0.796767 | 0.854503 | 0.355658 | 0.891455 | 0.713626 |
| 3 | 0.91224 | 0.91455 | 0.836028 | 0.879908 | 0.674365 | 0.91224 | 0.722864 |
| 4 | 0.93287 | 0.928241 | 0.826389 | 0.872685 | 0.712963 | 0.914352 | 0.696759 |
| 5 | 0.893519 | 0.902778 | 0.796296 | 0.856481 | 0.6875 | 0.872685 | 0.68287 |
| Max split | 0.93287 | 0.928241 | 0.836028 | 0.879908 | 0.712963 | 0.914352 | 0.722864 |
| Mean | 0.909851 | 0.912624 | 0.814606 | 0.867311 | 0.615427 | 0.895514 | 0.703178 |
| Standard deviation | 0.0130708 | 0.00938961 | 0.0158738 | 0.01001 | 0.131633 | 0.015796 | 0.0138702 |

Test set basic

| KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.812155 | 0.803867 | 0.70442 | 0.73895 | 0.581492 | 0.796961 | 0.685083 |

Test set balanced

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.499298 | 0.5 | 0.975421 | 0.5 | 0.614466 | 0.582631 | 0.707631 |
| 1 | 0.494475 | 0.498619 | 0.472376 | 0.5 | 0.486188 | 0.496547 | 0.494475 |
| 2 | 0.545455 | 0.545455 | 0.657274 | 0.5 | 0.530728 | 0.545455 | 0.717455 |
| 3 | 0.88878 | 0.887316 | 0.989019 | 0.874388 | 0.84707 | 0.86439 | 0.884387 |
| 4 | 0.744382 | 0.706227 | 0.687968 | 0.706929 | 0.696395 | 0.706929 | 0.811564 |
| 5 | 0.946398 | 0.865226 | 0.946494 | 0.565946 | 0.508742 | 0.865226 | 0.847839 |
| 6 | 0.869382 | 0.82912 | 0.897706 | 0.497893 | 0.824906 | 0.786751 | 0.74368 |
| 7 | 0.919954 | 0.910673 | 0.851029 | 0.907193 | 0.676766 | 0.91074 | 0.860496 |
| 8 | 0.885255 | 0.888889 | 0.728844 | 0.66376 | 0.690811 | 0.915213 | 0.915213 |
| 9 | 0.940695 | 0.939082 | 0.945471 | 0.970316 | 0.961414 | 0.94389 | 0.895813 |
| 10 | 0.943862 | 0.835211 | 0.673919 | 0.701696 | 0.739316 | 0.877817 | 0.9133 |
| 11 | 0.638889 | 0.666667 | 0.956091 | 0.5 | 0.711599 | 0.721514 | 0.771404 |
| 12 | 0.628571 | 0.585714 | 0.708912 | 0.5 | 0.560725 | 0.577807 | 0.626802 |
| 13 | 0.997076 | 0.961769 | 0.825804 | 0.724269 | 0.502997 | 0.911769 | 0.925 |
| 14 | 0.630435 | 0.694939 | 0.69956 | 0.695652 | 0.693512 | 0.692799 | 0.691373 |
| Mean | 0.771527 | 0.754327 | 0.801059 | 0.65387 | 0.669709 | 0.759965 | 0.787095 |
| Standard deviation | 0.174872 | 0.158638 | 0.147398 | 0.158173 | 0.134322 | 0.148129 | 0.120028 |

Test set top k

| Top | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Top_1 | 0.812155 | 0.803867 | 0.70442 | 0.73895 | 0.581492 | 0.796961 | 0.685083 |
| Top_2 | 0.911602 | 0.88674 | 0.81768 | 0.857735 | 0.689227 | 0.857735 | 0.754144 |
| Top_3 | 0.948895 | 0.940608 | 0.91989 | 0.928177 | 0.726519 | 0.936464 | 0.774862 |

features v1 half

```
python3 classification.py --cross_val -db research_data/0001_2_308min/0001_2_308min_filtered.joblib --n_jobs 18 --from_half --outdir research_data/0001_2_308min/tables/2023-01-22_cross_validation_features_v1.xlsx
```

Time: 294 s

Classifier parameters

```python
{'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {}, 'MLP': {'max_iter': 1000, 'solver': 'sgd'}, 'SGD': {'loss': 'modified_huber'}, 'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}
```

Cross-val Basic accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.859091 | 0.859091 | 0.777273 | 0.772727 | 0.663636 | 0.861364 | 0.738636 |
| 2 | 0.813212 | 0.822323 | 0.728929 | 0.744875 | 0.519362 | 0.842825 | 0.744875 |
| 3 | 0.85877 | 0.85877 | 0.758542 | 0.765376 | 0.697039 | 0.84738 | 0.76082 |
| 4 | 0.854214 | 0.872437 | 0.751708 | 0.790433 | 0.612756 | 0.863326 | 0.776765 |
| 5 | 0.81549 | 0.820046 | 0.753986 | 0.751708 | 0.708428 | 0.826879 | 0.744875 |
| Max split | 0.859091 | 0.872437 | 0.777273 | 0.790433 | 0.708428 | 0.863326 | 0.776765 |
| Mean | 0.840155 | 0.846533 | 0.754088 | 0.765024 | 0.640244 | 0.848355 | 0.753194 |
| Standard deviation | 0.0211521 | 0.0212893 | 0.0154661 | 0.0160522 | 0.0689781 | 0.0133123 | 0.0138861 |

Cross-val Balanced accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.745847 | 0.738071 | 0.649039 | 0.579844 | 0.471325 | 0.729395 | 0.642175 |
| 2 | 0.670241 | 0.689663 | 0.571668 | 0.521461 | 0.331235 | 0.726802 | 0.637897 |
| 3 | 0.729015 | 0.740884 | 0.629378 | 0.568323 | 0.399997 | 0.737605 | 0.695905 |
| 4 | 0.741372 | 0.776786 | 0.640842 | 0.588259 | 0.3977 | 0.752417 | 0.711876 |
| 5 | 0.684284 | 0.700585 | 0.634478 | 0.577125 | 0.362404 | 0.713073 | 0.60099 |
| Max split | 0.745847 | 0.776786 | 0.649039 | 0.588259 | 0.471325 | 0.752417 | 0.711876 |
| Mean | 0.714152 | 0.729198 | 0.625081 | 0.567002 | 0.392532 | 0.731858 | 0.657769 |
| Standard deviation | 0.0309409 | 0.031182 | 0.0275033 | 0.0236429 | 0.0468098 | 0.012963 | 0.0406051 |

Cross-val Top 1 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.859091 | 0.859091 | 0.777273 | 0.756818 | 0.554545 | 0.861364 | 0.738636 |
| 2 | 0.813212 | 0.822323 | 0.728929 | 0.742597 | 0.628702 | 0.838269 | 0.756264 |
| 3 | 0.85877 | 0.85877 | 0.758542 | 0.781321 | 0.630979 | 0.851936 | 0.751708 |
| 4 | 0.854214 | 0.872437 | 0.751708 | 0.781321 | 0.630979 | 0.863326 | 0.779043 |
| 5 | 0.81549 | 0.820046 | 0.753986 | 0.758542 | 0.592255 | 0.826879 | 0.749431 |
| Max split | 0.859091 | 0.872437 | 0.777273 | 0.781321 | 0.630979 | 0.863326 | 0.779043 |
| Mean | 0.840155 | 0.846533 | 0.754088 | 0.76412 | 0.607492 | 0.848355 | 0.755017 |
| Standard deviation | 0.0211521 | 0.0212893 | 0.0154661 | 0.0150959 | 0.0302941 | 0.013922 | 0.0133364 |

Cross-val Top 2 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.936364 | 0.934091 | 0.875 | 0.879545 | 0.636364 | 0.938636 | 0.790909 |
| 2 | 0.933941 | 0.917995 | 0.845103 | 0.849658 | 0.685649 | 0.91344 | 0.794989 |
| 3 | 0.947608 | 0.940774 | 0.874715 | 0.881549 | 0.744875 | 0.924829 | 0.788155 |
| 4 | 0.949886 | 0.958998 | 0.888383 | 0.906606 | 0.738041 | 0.94533 | 0.820046 |
| 5 | 0.927107 | 0.943052 | 0.856492 | 0.867882 | 0.665148 | 0.936219 | 0.794989 |
| Max split | 0.949886 | 0.958998 | 0.888383 | 0.906606 | 0.744875 | 0.94533 | 0.820046 |
| Mean | 0.938981 | 0.938982 | 0.867938 | 0.877048 | 0.694015 | 0.931691 | 0.797817 |
| Standard deviation | 0.00856265 | 0.0133024 | 0.0152726 | 0.0186246 | 0.0418377 | 0.0112691 | 0.0114121 |

Cross-val Top 3 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.936364 | 0.934091 | 0.875 | 0.879545 | 0.636364 | 0.938636 | 0.790909 |
| 2 | 0.933941 | 0.917995 | 0.845103 | 0.849658 | 0.685649 | 0.91344 | 0.794989 |
| 3 | 0.947608 | 0.940774 | 0.874715 | 0.881549 | 0.744875 | 0.924829 | 0.788155 |
| 4 | 0.949886 | 0.958998 | 0.888383 | 0.906606 | 0.738041 | 0.94533 | 0.820046 |
| 5 | 0.927107 | 0.943052 | 0.856492 | 0.867882 | 0.665148 | 0.936219 | 0.794989 |
| Max split | 0.949886 | 0.958998 | 0.888383 | 0.906606 | 0.744875 | 0.94533 | 0.820046 |
| Mean | 0.938981 | 0.938982 | 0.867938 | 0.877048 | 0.694015 | 0.931691 | 0.797817 |
| Standard deviation | 0.00856265 | 0.0133024 | 0.0152726 | 0.0186246 | 0.0418377 | 0.0112691 | 0.0114121 |

Test set basic

| KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.838972 | 0.845737 | 0.748309 | 0.790257 | 0.668471 | 0.832206 | 0.763194 |

Test set balanced

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.583333 | 0.5 | 0.996561 | 0.5 | 0.496561 | 0.625 | 0.75 |
| 1 | 0.492558 | 0.494587 | 0.491204 | 0.5 | 0.460758 | 0.493911 | 0.495264 |
| 2 | 0.624312 | 0.541667 | 0.549862 | 0.5 | 0.605055 | 0.583333 | 0.660477 |
| 3 | 0.963568 | 0.963568 | 0.992826 | 0.975473 | 0.959264 | 0.963568 | 0.935455 |
| 4 | 0.829207 | 0.829207 | 0.808574 | 0.829207 | 0.784101 | 0.820266 | 0.819578 |
| 5 | 0.947884 | 0.897179 | 0.957499 | 0.515961 | 0.746215 | 0.864551 | 0.79859 |
| 6 | 0.871561 | 0.91254 | 0.848177 | 0.666667 | 0.74381 | 0.788916 | 0.872249 |
| 7 | 0.943567 | 0.937923 | 0.848282 | 0.93228 | 0.724258 | 0.936795 | 0.913161 |
| 8 | 0.881044 | 0.855403 | 0.812582 | 0.752839 | 0.7913 | 0.923077 | 0.829762 |
| 9 | 0.948113 | 0.948113 | 0.888171 | 0.971698 | 0.828609 | 0.948113 | 0.938657 |
| 10 | 0.987413 | 0.987413 | 0.983217 | 0.947844 | 0.498601 | 0.987413 | 0.924213 |
| 11 | 0.775004 | 0.777084 | 0.823663 | 0.5 | 0.858376 | 0.830559 | 0.800701 |
| 12 | 0.672861 | 0.652028 | 0.668405 | 0.5 | 0.613204 | 0.692196 | 0.716423 |
| 13 | 0.935443 | 0.949071 | 0.933311 | 0.774041 | 0.797016 | 0.938308 | 0.855825 |
| 14 | 0.73913 | 0.76087 | 0.772832 | 0.76087 | 0.78191 | 0.608696 | 0.76087 |
| Mean | 0.813 | 0.800444 | 0.825011 | 0.708459 | 0.712603 | 0.800313 | 0.804748 |
| Standard deviation | 0.1522 | 0.168336 | 0.148723 | 0.187558 | 0.141036 | 0.155331 | 0.115014 |

Test set top k

| Top | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Top_1 | 0.838972 | 0.845737 | 0.748309 | 0.790257 | 0.668471 | 0.832206 | 0.763194 |
| Top_2 | 0.918809 | 0.91475 | 0.828146 | 0.859269 | 0.778078 | 0.899865 | 0.817321 |
| Top_3 | 0.933694 | 0.952639 | 0.902571 | 0.903924 | 0.794317 | 0.94046 | 0.832206 |

features v2

```
python3 classification.py --cross_val -db research_data/0001_2_308min/0001_2_308min_filtered.joblib --n_jobs 18 --features_v2 --outdir research_data/0001_2_308min/tables/2023-01-22_cross_validation_features_v2.xlsx
```

Time: 283 s

Classifier parameters

```python
{'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {}, 'MLP': {'max_iter': 1000, 'solver': 'sgd'}, 'SGD': {'loss': 'modified_huber'}, 'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}
```

Cross-val Basic accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.812933 | 0.812933 | 0.792148 | 0.78291 | 0.658199 | 0.808314 | 0.628176 |
| 2 | 0.822171 | 0.836028 | 0.831409 | 0.759815 | 0.669746 | 0.840647 | 0.801386 |
| 3 | 0.849885 | 0.852194 | 0.829099 | 0.822171 | 0.665127 | 0.847575 | 0.727483 |
| 4 | 0.847222 | 0.856481 | 0.851852 | 0.805556 | 0.673611 | 0.872685 | 0.763889 |
| 5 | 0.858796 | 0.856481 | 0.821759 | 0.828704 | 0.643519 | 0.865741 | 0.701389 |
| Max split | 0.858796 | 0.856481 | 0.851852 | 0.828704 | 0.673611 | 0.872685 | 0.801386 |
| Mean | 0.838201 | 0.842824 | 0.825253 | 0.799831 | 0.66204 | 0.846992 | 0.724464 |
| Standard deviation | 0.0175356 | 0.0167362 | 0.0193305 | 0.0255071 | 0.0105887 | 0.0225754 | 0.0587951 |

Cross-val Balanced accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.662674 | 0.665915 | 0.634206 | 0.58738 | 0.515157 | 0.649763 | 0.588725 |
| 2 | 0.727668 | 0.756941 | 0.751901 | 0.629825 | 0.548301 | 0.773176 | 0.79248 |
| 3 | 0.728123 | 0.721442 | 0.758564 | 0.656637 | 0.523509 | 0.736825 | 0.732054 |
| 4 | 0.723303 | 0.741456 | 0.775098 | 0.607761 | 0.516335 | 0.787591 | 0.710997 |
| 5 | 0.802221 | 0.803183 | 0.802662 | 0.71759 | 0.457695 | 0.823725 | 0.725858 |
| Max split | 0.802221 | 0.803183 | 0.802662 | 0.71759 | 0.548301 | 0.823725 | 0.79248 |
| Mean | 0.728798 | 0.737787 | 0.744486 | 0.639839 | 0.512199 | 0.754216 | 0.710023 |
| Standard deviation | 0.044261 | 0.0449213 | 0.0578556 | 0.0451855 | 0.0297585 | 0.0591908 | 0.0667074 |

Cross-val Top 1 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.812933 | 0.812933 | 0.792148 | 0.775982 | 0.648961 | 0.808314 | 0.637413 |
| 2 | 0.822171 | 0.836028 | 0.831409 | 0.78291 | 0.678984 | 0.840647 | 0.792148 |
| 3 | 0.849885 | 0.852194 | 0.829099 | 0.817552 | 0.692841 | 0.847575 | 0.713626 |
| 4 | 0.847222 | 0.856481 | 0.851852 | 0.798611 | 0.710648 | 0.87037 | 0.777778 |
| 5 | 0.858796 | 0.856481 | 0.821759 | 0.826389 | 0.472222 | 0.863426 | 0.689815 |
| Max split | 0.858796 | 0.856481 | 0.851852 | 0.826389 | 0.710648 | 0.87037 | 0.792148 |
| Mean | 0.838201 | 0.842824 | 0.825253 | 0.800289 | 0.640731 | 0.846066 | 0.722156 |
| Standard deviation | 0.0175356 | 0.0167362 | 0.0193305 | 0.0193645 | 0.0866372 | 0.0216745 | 0.0570836 |

Cross-val Top 2 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.884527 | 0.896074 | 0.900693 | 0.877598 | 0.748268 | 0.875289 | 0.665127 |
| 2 | 0.921478 | 0.928406 | 0.923788 | 0.909931 | 0.727483 | 0.900693 | 0.819861 |
| 3 | 0.909931 | 0.935335 | 0.928406 | 0.91455 | 0.766744 | 0.91455 | 0.792148 |
| 4 | 0.94213 | 0.923611 | 0.921296 | 0.921296 | 0.793981 | 0.916667 | 0.837963 |
| 5 | 0.939815 | 0.94213 | 0.886574 | 0.93287 | 0.696759 | 0.914352 | 0.777778 |
| Max split | 0.94213 | 0.94213 | 0.928406 | 0.93287 | 0.793981 | 0.916667 | 0.837963 |
| Mean | 0.919576 | 0.925111 | 0.912151 | 0.911249 | 0.746647 | 0.90431 | 0.778575 |
| Standard deviation | 0.021182 | 0.0158141 | 0.0159296 | 0.0185119 | 0.0331756 | 0.0155801 | 0.0604761 |

Cross-val Top 3 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.884527 | 0.896074 | 0.900693 | 0.877598 | 0.748268 | 0.875289 | 0.665127 |
| 2 | 0.921478 | 0.928406 | 0.923788 | 0.909931 | 0.727483 | 0.900693 | 0.819861 |
| 3 | 0.909931 | 0.935335 | 0.928406 | 0.91455 | 0.766744 | 0.91455 | 0.792148 |
| 4 | 0.94213 | 0.923611 | 0.921296 | 0.921296 | 0.793981 | 0.916667 | 0.837963 |
| 5 | 0.939815 | 0.94213 | 0.886574 | 0.93287 | 0.696759 | 0.914352 | 0.777778 |
| Max split | 0.94213 | 0.94213 | 0.928406 | 0.93287 | 0.793981 | 0.916667 | 0.837963 |
| Mean | 0.919576 | 0.925111 | 0.912151 | 0.911249 | 0.746647 | 0.90431 | 0.778575 |
| Standard deviation | 0.021182 | 0.0158141 | 0.0159296 | 0.0185119 | 0.0331756 | 0.0155801 | 0.0604761 |

Test set basic

| KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.845304 | 0.84116 | 0.839779 | 0.805249 | 0.65884 | 0.84116 | 0.743094 |

Test set balanced

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.871489 | 0.826311 | 1 | 0.5 | 0.995084 | 0.955524 | 1 |
| 1 | 0.499309 | 0.499309 | 0.5 | 0.5 | 0.493785 | 0.495856 | 0.496547 |
| 2 | 0.545455 | 0.590909 | 0.71822 | 0.5 | 0.681691 | 0.545455 | 0.71465 |
| 3 | 1 | 0.999268 | 1 | 0.963415 | 0.93756 | 1 | 0.949023 |
| 4 | 0.702013 | 0.706227 | 0.766854 | 0.706929 | 0.803137 | 0.706227 | 0.928839 |
| 5 | 1 | 0.966667 | 1 | 0.698559 | 0.833429 | 0.966667 | 1 |
| 6 | 0.995084 | 0.996489 | 0.994382 | 0.994382 | 0.70412 | 0.995084 | 0.745787 |
| 7 | 0.918794 | 0.916473 | 0.907874 | 0.909513 | 0.709763 | 0.922274 | 0.871828 |
| 8 | 0.982477 | 0.940811 | 0.930556 | 0.760982 | 0.707122 | 0.972222 | 1 |
| 9 | 0.954311 | 0.953505 | 0.949473 | 0.947891 | 0.971929 | 0.948697 | 0.912655 |
| 10 | 0.928969 | 0.906242 | 0.952991 | 0.883515 | 0.893421 | 0.906242 | 0.922559 |
| 11 | 0.75 | 0.75 | 0.982295 | 0.694444 | 0.829792 | 0.777778 | 0.829084 |
| 12 | 0.592857 | 0.585714 | 0.784404 | 0.5 | 0.598733 | 0.592093 | 0.598493 |
| 13 | 0.948538 | 0.949269 | 0.9125 | 0.8625 | 0.929459 | 0.924269 | 0.85 |
| 14 | 0.695652 | 0.695652 | 0.702413 | 0.695652 | 0.645041 | 0.695652 | 0.68852 |
| Mean | 0.825663 | 0.818856 | 0.873464 | 0.741185 | 0.782271 | 0.826936 | 0.833866 |
| Standard deviation | 0.172107 | 0.162114 | 0.142525 | 0.175091 | 0.143196 | 0.169959 | 0.149033 |

Test set top k

| Top | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Top_1 | 0.845304 | 0.84116 | 0.839779 | 0.805249 | 0.65884 | 0.84116 | 0.743094 |
| Top_2 | 0.907459 | 0.899171 | 0.907459 | 0.883978 | 0.759669 | 0.864641 | 0.801105 |
| Top_3 | 0.941989 | 0.93232 | 0.939227 | 0.928177 | 0.799724 | 0.926796 | 0.84116 |

features v2 half

```
python3 classification.py --cross_val -db research_data/0001_2_308min/0001_2_308min_filtered.joblib --n_jobs 18 --features_v2_half --outdir research_data/0001_2_308min/tables/2023-01-22_cross_validation_features_v2.xlsx
```

Time: 186 s

Classifier parameters

```python
{'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {}, 'MLP': {'max_iter': 1000, 'solver': 'sgd'}, 'SGD': {'loss': 'modified_huber'}, 'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}
```

Cross-val Basic accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.858491 | 0.882075 | 0.882075 | 0.834906 | 0.75 | 0.872642 | 0.787736 |
| 2 | 0.890995 | 0.914692 | 0.881517 | 0.829384 | 0.767773 | 0.914692 | 0.824645 |
| 3 | 0.905213 | 0.914692 | 0.890995 | 0.872038 | 0.772512 | 0.900474 | 0.829384 |
| 4 | 0.909953 | 0.924171 | 0.933649 | 0.876777 | 0.758294 | 0.938389 | 0.843602 |
| 5 | 0.919431 | 0.919431 | 0.881517 | 0.867299 | 0.78673 | 0.905213 | 0.815166 |
| Max split | 0.919431 | 0.924171 | 0.933649 | 0.876777 | 0.78673 | 0.938389 | 0.843602 |
| Mean | 0.896817 | 0.911012 | 0.893951 | 0.856081 | 0.767062 | 0.906282 | 0.820106 |
| Standard deviation | 0.0212474 | 0.0148892 | 0.020174 | 0.019849 | 0.0125301 | 0.0212985 | 0.0186063 |

Cross-val Balanced accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.745075 | 0.771571 | 0.801526 | 0.696927 | 0.618203 | 0.755332 | 0.714732 |
| 2 | 0.836642 | 0.868912 | 0.835184 | 0.712032 | 0.613577 | 0.878424 | 0.800275 |
| 3 | 0.82326 | 0.850611 | 0.842382 | 0.787149 | 0.71024 | 0.854268 | 0.800521 |
| 4 | 0.828053 | 0.877625 | 0.905403 | 0.733608 | 0.807494 | 0.893864 | 0.75351 |
| 5 | 0.896551 | 0.89036 | 0.864262 | 0.743022 | 0.680471 | 0.880104 | 0.772083 |
| Max split | 0.896551 | 0.89036 | 0.905403 | 0.787149 | 0.807494 | 0.893864 | 0.800521 |
| Mean | 0.825916 | 0.851816 | 0.849751 | 0.734548 | 0.685997 | 0.852398 | 0.768224 |
| Standard deviation | 0.0482725 | 0.042154 | 0.0343507 | 0.0308605 | 0.0710216 | 0.0501793 | 0.0321346 |

Cross-val Top 1 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.858491 | 0.882075 | 0.882075 | 0.834906 | 0.537736 | 0.877358 | 0.79717 |
| 2 | 0.890995 | 0.914692 | 0.881517 | 0.867299 | 0.777251 | 0.914692 | 0.838863 |
| 3 | 0.905213 | 0.914692 | 0.890995 | 0.872038 | 0.748815 | 0.900474 | 0.838863 |
| 4 | 0.909953 | 0.924171 | 0.933649 | 0.867299 | 0.658768 | 0.933649 | 0.810427 |
| 5 | 0.919431 | 0.919431 | 0.881517 | 0.890995 | 0.819905 | 0.905213 | 0.800948 |
| Max split | 0.919431 | 0.924171 | 0.933649 | 0.890995 | 0.819905 | 0.933649 | 0.838863 |
| Mean | 0.896817 | 0.911012 | 0.893951 | 0.866507 | 0.708495 | 0.906277 | 0.817254 |
| Standard deviation | 0.0212474 | 0.0148892 | 0.020174 | 0.0180564 | 0.100396 | 0.018391 | 0.0181645 |

Cross-val Top 2 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.943396 | 0.957547 | 0.948113 | 0.910377 | 0.570755 | 0.933962 | 0.849057 |
| 2 | 0.971564 | 0.943128 | 0.924171 | 0.952607 | 0.815166 | 0.938389 | 0.876777 |
| 3 | 0.957346 | 0.962085 | 0.943128 | 0.962085 | 0.815166 | 0.966825 | 0.909953 |
| 4 | 0.966825 | 0.966825 | 0.962085 | 0.952607 | 0.672986 | 0.976303 | 0.824645 |
| 5 | 0.971564 | 0.966825 | 0.933649 | 0.966825 | 0.876777 | 0.957346 | 0.824645 |
| Max split | 0.971564 | 0.966825 | 0.962085 | 0.966825 | 0.876777 | 0.976303 | 0.909953 |
| Mean | 0.962139 | 0.959282 | 0.942229 | 0.9489 | 0.75017 | 0.954565 | 0.857015 |
| Standard deviation | 0.0107134 | 0.00878185 | 0.0128796 | 0.0200331 | 0.111916 | 0.016228 | 0.0327169 |

Cross-val Top 3 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.943396 | 0.957547 | 0.948113 | 0.910377 | 0.570755 | 0.933962 | 0.849057 |
| 2 | 0.971564 | 0.943128 | 0.924171 | 0.952607 | 0.815166 | 0.938389 | 0.876777 |
| 3 | 0.957346 | 0.962085 | 0.943128 | 0.962085 | 0.815166 | 0.966825 | 0.909953 |
| 4 | 0.966825 | 0.966825 | 0.962085 | 0.952607 | 0.672986 | 0.976303 | 0.824645 |
| 5 | 0.971564 | 0.966825 | 0.933649 | 0.966825 | 0.876777 | 0.957346 | 0.824645 |
| Max split | 0.971564 | 0.966825 | 0.962085 | 0.966825 | 0.876777 | 0.976303 | 0.909953 |
| Mean | 0.962139 | 0.959282 | 0.942229 | 0.9489 | 0.75017 | 0.954565 | 0.857015 |
| Standard deviation | 0.0107134 | 0.00878185 | 0.0128796 | 0.0200331 | 0.111916 | 0.016228 | 0.0327169 |

Test set basic

| KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.892045 | 0.889205 | 0.875 | 0.877841 | 0.792614 | 0.875 | 0.857955 |

Test set balanced

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.998555 | 0.99422 | 1 | 0.5 | 0.998555 | 1 | 1 |
| 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.485795 | 0.497159 | 0.5 |
| 2 | 0.6 | 0.6 | 0.535735 | 0.5 | 0.495677 | 0.6 | 0.898559 |
| 3 | 1 | 1 | 1 | 1 | 0.971988 | 1 | 0.973494 |
| 4 | 0.99422 | 0.912331 | 0.810694 | 0.828998 | 0.813102 | 0.823218 | 0.978324 |
| 5 | 1 | 0.966667 | 0.966667 | 0.833333 | 0.825915 | 0.966667 | 0.965183 |
| 6 | 0.99711 | 0.998555 | 0.99711 | 0.99711 | 0.915222 | 0.99711 | 0.998555 |
| 7 | 0.944976 | 0.942584 | 0.926021 | 0.925837 | 0.901178 | 0.949761 | 0.943136 |
| 8 | 0.969228 | 0.969228 | 0.944444 | 0.885895 | 0.885895 | 0.972222 | 0.913673 |
| 9 | 0.95 | 0.96 | 0.95 | 0.95 | 0.97 | 0.95 | 0.93 |
| 10 | 0.995614 | 0.997076 | 0.997076 | 0.997076 | 0.997076 | 0.997076 | 0.997076 |
| 11 | 0.888889 | 0.944444 | 0.997085 | 0.833333 | 0.942987 | 0.944444 | 0.944444 |
| 12 | 0.691176 | 0.676471 | 0.831761 | 0.617647 | 0.544118 | 0.676471 | 0.762116 |
| 13 | 0.973684 | 0.973684 | 0.921053 | 0.868421 | 0.919551 | 0.921053 | 0.91805 |
| 14 | 0.772727 | 0.772727 | 0.809384 | 0.772727 | 0.722874 | 0.636364 | 0.768328 |
| Mean | 0.885079 | 0.880532 | 0.879135 | 0.800692 | 0.825995 | 0.862103 | 0.899396 |
| Standard deviation | 0.158849 | 0.157205 | 0.15606 | 0.177977 | 0.174325 | 0.165733 | 0.128197 |

Test set top k

| Top | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Top_1 | 0.892045 | 0.889205 | 0.875 | 0.877841 | 0.792614 | 0.875 | 0.857955 |
| Top_2 | 0.9375 | 0.926136 | 0.931818 | 0.931818 | 0.84375 | 0.926136 | 0.886364 |
| Top_3 | 0.960227 | 0.954545 | 0.960227 | 0.954545 | 0.875 | 0.948864 | 0.903409 |

features v3

```
python3 classification.py --cross_val -db research_data/0001_2_308min/0001_2_308min_filtered.joblib --n_jobs 9 --features_v3 --outdir research_data/0001_2_308min/tables/2023-01-23_cross_validation_features_v3.xlsx
```

Time: 313 s

Classifier parameters

```python
{'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {}, 'MLP': {'max_iter': 1000, 'solver': 'sgd'}, 'SGD': {'loss': 'modified_huber'}, 'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}
```

Cross-val Basic accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.859122 | 0.863741 | 0.845266 | 0.803695 | 0.725173 | 0.863741 | 0.725173 |
| 2 | 0.840647 | 0.840647 | 0.838337 | 0.773672 | 0.415704 | 0.842956 | 0.775982 |
| 3 | 0.856813 | 0.856813 | 0.787529 | 0.792148 | 0.69746 | 0.861432 | 0.817552 |
| 4 | 0.875 | 0.868056 | 0.840278 | 0.782407 | 0.689815 | 0.881944 | 0.775463 |
| 5 | 0.849537 | 0.858796 | 0.835648 | 0.789352 | 0.726852 | 0.858796 | 0.715278 |
| Max split | 0.875 | 0.868056 | 0.845266 | 0.803695 | 0.726852 | 0.881944 | 0.817552 |
| Mean | 0.856224 | 0.857611 | 0.829412 | 0.788255 | 0.651001 | 0.861774 | 0.761889 |
| Standard deviation | 0.01139 | 0.00934346 | 0.0211761 | 0.0100161 | 0.118562 | 0.0124404 | 0.0374216 |

Cross-val Balanced accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.787529 | 0.796823 | 0.790671 | 0.687552 | 0.533471 | 0.789128 | 0.691773 |
| 2 | 0.734616 | 0.733339 | 0.745906 | 0.568053 | 0.544062 | 0.732151 | 0.784764 |
| 3 | 0.730626 | 0.737681 | 0.726472 | 0.592593 | 0.498917 | 0.733271 | 0.762009 |
| 4 | 0.778998 | 0.775986 | 0.760523 | 0.583323 | 0.567668 | 0.79253 | 0.71616 |
| 5 | 0.795842 | 0.810037 | 0.797101 | 0.591555 | 0.592354 | 0.819687 | 0.764668 |
| Max split | 0.795842 | 0.810037 | 0.797101 | 0.687552 | 0.592354 | 0.819687 | 0.784764 |
| Mean | 0.765522 | 0.770773 | 0.764135 | 0.604615 | 0.547294 | 0.773353 | 0.743875 |
| Standard deviation | 0.0274159 | 0.0308022 | 0.0266638 | 0.0423882 | 0.0315829 | 0.0348356 | 0.0343974 |

Cross-val Top 1 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.859122 | 0.863741 | 0.845266 | 0.787529 | 0.706697 | 0.863741 | 0.729792 |
| 2 | 0.840647 | 0.840647 | 0.838337 | 0.773672 | 0.709007 | 0.849885 | 0.78291 |
| 3 | 0.856813 | 0.856813 | 0.787529 | 0.799076 | 0.769053 | 0.861432 | 0.806005 |
| 4 | 0.875 | 0.868056 | 0.840278 | 0.768519 | 0.699074 | 0.881944 | 0.777778 |
| 5 | 0.849537 | 0.858796 | 0.835648 | 0.80787 | 0.747685 | 0.858796 | 0.708333 |
| Max split | 0.875 | 0.868056 | 0.845266 | 0.80787 | 0.769053 | 0.881944 | 0.806005 |
| Mean | 0.856224 | 0.857611 | 0.829412 | 0.787333 | 0.726303 | 0.86316 | 0.760964 |
| Standard deviation | 0.01139 | 0.00934346 | 0.0211761 | 0.0148346 | 0.0272386 | 0.010502 | 0.0361515 |

Cross-val Top 2 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.937644 | 0.942263 | 0.942263 | 0.91455 | 0.748268 | 0.935335 | 0.745958 |
| 2 | 0.91455 | 0.935335 | 0.939954 | 0.893764 | 0.759815 | 0.919169 | 0.817552 |
| 3 | 0.921478 | 0.939954 | 0.91224 | 0.919169 | 0.838337 | 0.921478 | 0.842956 |
| 4 | 0.958333 | 0.949074 | 0.944444 | 0.925926 | 0.736111 | 0.923611 | 0.8125 |
| 5 | 0.94213 | 0.939815 | 0.928241 | 0.918981 | 0.796296 | 0.914352 | 0.743056 |
| Max split | 0.958333 | 0.949074 | 0.944444 | 0.925926 | 0.838337 | 0.935335 | 0.842956 |
| Mean | 0.934827 | 0.941288 | 0.933428 | 0.914478 | 0.775766 | 0.922789 | 0.792404 |
| Standard deviation | 0.0155124 | 0.00449364 | 0.011982 | 0.0109763 | 0.0372014 | 0.00698693 | 0.040458 |

Cross-val Top 3 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.937644 | 0.942263 | 0.942263 | 0.91455 | 0.748268 | 0.935335 | 0.745958 |
| 2 | 0.91455 | 0.935335 | 0.939954 | 0.893764 | 0.759815 | 0.919169 | 0.817552 |
| 3 | 0.921478 | 0.939954 | 0.91224 | 0.919169 | 0.838337 | 0.921478 | 0.842956 |
| 4 | 0.958333 | 0.949074 | 0.944444 | 0.925926 | 0.736111 | 0.923611 | 0.8125 |
| 5 | 0.94213 | 0.939815 | 0.928241 | 0.918981 | 0.796296 | 0.914352 | 0.743056 |
| Max split | 0.958333 | 0.949074 | 0.944444 | 0.925926 | 0.838337 | 0.935335 | 0.842956 |
| Mean | 0.934827 | 0.941288 | 0.933428 | 0.914478 | 0.775766 | 0.922789 | 0.792404 |
| Standard deviation | 0.0155124 | 0.00449364 | 0.011982 | 0.0109763 | 0.0372014 | 0.00698693 | 0.040458 |

Test set basic

| KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.857735 | 0.850829 | 0.838398 | 0.801105 | 0.650552 | 0.85221 | 0.751381 |

Test set balanced

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 1 | 0.916667 | 1 | 0.5 | 0.872191 | 0.958333 | 1 |
| 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 2 | 0.636364 | 0.636364 | 0.700816 | 0.5 | 0.723766 | 0.681818 | 0.616728 |
| 3 | 0.999268 | 1 | 1 | 0.889512 | 0.969503 | 1 | 0.991215 |
| 4 | 0.831227 | 0.708333 | 0.782303 | 0.5 | 0.498596 | 0.706227 | 0.891386 |
| 5 | 1 | 0.933333 | 1 | 0.5 | 0.633333 | 0.933333 | 1 |
| 6 | 0.99368 | 0.99368 | 0.994382 | 0.992978 | 0.82912 | 0.995084 | 0.745787 |
| 7 | 0.915313 | 0.909513 | 0.889379 | 0.897912 | 0.576657 | 0.912447 | 0.835033 |
| 8 | 0.958333 | 0.958333 | 1 | 0.958333 | 0.539486 | 0.958333 | 1 |
| 9 | 0.973542 | 0.948697 | 0.955087 | 0.952699 | 0.976737 | 0.929467 | 0.916656 |
| 10 | 0.928257 | 0.902681 | 0.926638 | 0.8615 | 0.923854 | 0.902681 | 0.964452 |
| 11 | 0.75 | 0.75 | 0.981586 | 0.666667 | 0.883931 | 0.75 | 0.829084 |
| 12 | 0.596177 | 0.577807 | 0.755352 | 0.5 | 0.497706 | 0.596942 | 0.54135 |
| 13 | 1 | 1 | 0.975 | 0.825 | 0.861769 | 1 | 1 |
| 14 | 0.586957 | 0.673913 | 0.673913 | 0.5 | 0.562705 | 0.717391 | 0.584103 |
| Mean | 0.844608 | 0.827288 | 0.87563 | 0.702973 | 0.72329 | 0.836137 | 0.82772 |
| Standard deviation | 0.175022 | 0.16338 | 0.150931 | 0.202352 | 0.179599 | 0.156982 | 0.178167 |

Test set top k

| Top | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Top_1 | 0.857735 | 0.850829 | 0.838398 | 0.801105 | 0.650552 | 0.85221 | 0.751381 |
| Top_2 | 0.935083 | 0.912983 | 0.910221 | 0.868785 | 0.803867 | 0.901934 | 0.773481 |
| Top_3 | 0.964088 | 0.958564 | 0.941989 | 0.941989 | 0.846685 | 0.958564 | 0.799724 |

features v3 half

```
python3 classification.py --cross_val -db research_data/0001_2_308min/0001_2_308min_filtered.joblib --n_jobs 18 --features_v3_half --outdir research_data/0001_2_308min/tables/2023-01-23_cross_validation_features_v3_half.xlsx --seed 10
```

Time: 185 s

Classifier parameters

```python
{'KNN': {'n_neighbors': 15}, 'GP': {}, 'GNB': {}, 'MLP': {'max_iter': 1000, 'solver': 'sgd'}, 'SGD': {'loss': 'modified_huber'}, 'SVM': {'kernel': 'rbf', 'probability': True}, 'DT': {}}
```

Cross-val Basic accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.905213 | 0.933649 | 0.886256 | 0.829384 | 0.815166 | 0.919431 | 0.791469 |
| 2 | 0.914286 | 0.919048 | 0.919048 | 0.847619 | 0.871429 | 0.942857 | 0.871429 |
| 3 | 0.87619 | 0.880952 | 0.87619 | 0.847619 | 0.795238 | 0.904762 | 0.847619 |
| 4 | 0.890476 | 0.909524 | 0.87619 | 0.847619 | 0.828571 | 0.914286 | 0.82381 |
| 5 | 0.914286 | 0.919048 | 0.92381 | 0.847619 | 0.785714 | 0.928571 | 0.838095 |
| Max split | 0.914286 | 0.933649 | 0.92381 | 0.847619 | 0.871429 | 0.942857 | 0.871429 |
| Mean | 0.90009 | 0.912444 | 0.896299 | 0.843972 | 0.819224 | 0.921981 | 0.834484 |
| Standard deviation | 0.0147844 | 0.0175336 | 0.0208993 | 0.00729406 | 0.0300911 | 0.0129728 | 0.026513 |

Cross-val Balanced accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.829469 | 0.898358 | 0.845024 | 0.676338 | 0.680008 | 0.874671 | 0.716359 |
| 2 | 0.814007 | 0.867946 | 0.859663 | 0.685219 | 0.685219 | 0.90734 | 0.849408 |
| 3 | 0.772381 | 0.79746 | 0.79037 | 0.732808 | 0.644127 | 0.833968 | 0.802438 |
| 4 | 0.774747 | 0.839798 | 0.835354 | 0.71596 | 0.750388 | 0.826465 | 0.798052 |
| 5 | 0.845053 | 0.853807 | 0.891686 | 0.707925 | 0.680169 | 0.867275 | 0.804602 |
| Max split | 0.845053 | 0.898358 | 0.891686 | 0.732808 | 0.750388 | 0.90734 | 0.849408 |
| Mean | 0.807132 | 0.851474 | 0.844419 | 0.70365 | 0.687982 | 0.861944 | 0.794172 |
| Standard deviation | 0.0291226 | 0.0332527 | 0.0330768 | 0.0205215 | 0.0344965 | 0.0293006 | 0.0431231 |

Cross-val Top 1 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.905213 | 0.933649 | 0.886256 | 0.824645 | 0.834123 | 0.919431 | 0.78673 |
| 2 | 0.914286 | 0.919048 | 0.919048 | 0.852381 | 0.704762 | 0.942857 | 0.885714 |
| 3 | 0.87619 | 0.880952 | 0.87619 | 0.847619 | 0.828571 | 0.909524 | 0.842857 |
| 4 | 0.890476 | 0.909524 | 0.87619 | 0.838095 | 0.666667 | 0.909524 | 0.82381 |
| 5 | 0.914286 | 0.919048 | 0.92381 | 0.847619 | 0.819048 | 0.928571 | 0.847619 |
| Max split | 0.914286 | 0.933649 | 0.92381 | 0.852381 | 0.834123 | 0.942857 | 0.885714 |
| Mean | 0.90009 | 0.912444 | 0.896299 | 0.842072 | 0.770634 | 0.921981 | 0.837346 |
| Standard deviation | 0.0147844 | 0.0175336 | 0.0208993 | 0.00987264 | 0.0705405 | 0.0126183 | 0.0323129 |

Cross-val Top 2 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.966825 | 0.962085 | 0.957346 | 0.924171 | 0.876777 | 0.976303 | 0.805687 |
| 2 | 0.990476 | 0.97619 | 0.957143 | 0.947619 | 0.938095 | 0.985714 | 0.904762 |
| 3 | 0.942857 | 0.938095 | 0.938095 | 0.942857 | 0.857143 | 0.933333 | 0.871429 |
| 4 | 0.97619 | 0.971429 | 0.952381 | 0.928571 | 0.72381 | 0.966667 | 0.871429 |
| 5 | 0.952381 | 0.942857 | 0.957143 | 0.904762 | 0.880952 | 0.957143 | 0.871429 |
| Max split | 0.990476 | 0.97619 | 0.957346 | 0.947619 | 0.938095 | 0.985714 | 0.904762 |
| Mean | 0.965746 | 0.958131 | 0.952422 | 0.929596 | 0.855355 | 0.963832 | 0.864947 |
| Standard deviation | 0.0168781 | 0.0151877 | 0.00740373 | 0.0151519 | 0.071088 | 0.0179851 | 0.0323202 |

Cross-val Top 3 accuracy

| Split | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 0.966825 | 0.962085 | 0.957346 | 0.924171 | 0.876777 | 0.976303 | 0.805687 |
| 2 | 0.990476 | 0.97619 | 0.957143 | 0.947619 | 0.938095 | 0.985714 | 0.904762 |
| 3 | 0.942857 | 0.938095 | 0.938095 | 0.942857 | 0.857143 | 0.933333 | 0.871429 |
| 4 | 0.97619 | 0.971429 | 0.952381 | 0.928571 | 0.72381 | 0.966667 | 0.871429 |
| 5 | 0.952381 | 0.942857 | 0.957143 | 0.904762 | 0.880952 | 0.957143 | 0.871429 |
| Max split | 0.990476 | 0.97619 | 0.957346 | 0.947619 | 0.938095 | 0.985714 | 0.904762 |
| Mean | 0.965746 | 0.958131 | 0.952422 | 0.929596 | 0.855355 | 0.963832 | 0.864947 |
| Standard deviation | 0.0168781 | 0.0151877 | 0.00740373 | 0.0151519 | 0.071088 | 0.0179851 | 0.0323202 |

Test set basic

| KNN | GP | GNB | MLP | SGD | SVM | DT |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.89916 | 0.910364 | 0.890756 | 0.823529 | 0.840336 | 0.929972 | 0.840336 |

Test set balanced

| Class | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 1 | 1 | 1 | 0.5 | 1 | 1 | 0.998588 |
| 1 | 0.99858 | 1 | 1 | 0.5 | 0.497159 | 1 | 1 |
| 2 | 0.776341 | 0.83046 | 0.862069 | 0.5 | 0.888889 | 0.887452 | 0.868774 |
| 3 | 0.997076 | 1 | 1 | 0.998538 | 0.997076 | 1 | 0.99269 |
| 4 | 0.659091 | 0.590909 | 0.721845 | 0.5 | 0.647151 | 0.771235 | 0.733582 |
| 5 | 1 | 0.97619 | 0.97619 | 0.5 | 0.806548 | 0.97619 | 0.928571 |
| 6 | 0.998525 | 1 | 0.99705 | 0.805556 | 0.969272 | 1 | 0.998525 |
| 7 | 0.942797 | 0.957732 | 0.917373 | 0.927966 | 0.955508 | 0.964088 | 0.916725 |
| 8 | 0.97619 | 0.97619 | 1 | 0.904762 | 0.91369 | 1 | 0.973214 |
| 9 | 0.998366 | 0.998366 | 0.996732 | 0.996732 | 0.998366 | 0.998366 | 1 |
| 10 | 0.958333 | 0.958333 | 0.916667 | 0.916667 | 0.958333 | 0.958333 | 0.916667 |
| 11 | 0.966667 | 0.966667 | 0.980994 | 0.766667 | 1 | 1 | 0.962281 |
| 12 | 0.75744 | 0.738095 | 0.776786 | 0.571429 | 0.657738 | 0.736607 | 0.727679 |
| 13 | 1 | 1 | 0.958333 | 0.958333 | 0.912319 | 1 | 0.958333 |
| 14 | 0.772727 | 0.909091 | 0.909091 | 0.5 | 0.863636 | 0.909091 | 0.904756 |
| Mean | 0.920142 | 0.926802 | 0.934209 | 0.72311 | 0.871046 | 0.946758 | 0.925359 |
| Standard deviation | 0.111885 | 0.114996 | 0.0840682 | 0.207984 | 0.149048 | 0.0831524 | 0.0859361 |

Test set top k

| Top | KNN | GP | GNB | MLP | SGD | SVM | DT |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Top_1 | 0.89916 | 0.910364 | 0.890756 | 0.823529 | 0.840336 | 0.929972 | 0.840336 |
| Top_2 | 0.97479 | 0.971989 | 0.94958 | 0.918768 | 0.871148 | 0.980392 | 0.87395 |
| Top_3 | 0.983193 | 0.988796 | 0.969188 | 0.985994 | 0.910364 | 0.991597 | 0.887955 |

https://drive.google.com/drive/folders/1OVgMHfInUQ8JfWMALURHPvEBddDnDzDz?usp=sharing

References

[1]

```bibtex
@misc{bochkovskiy2020yolov4,
  title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
  author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
  year={2020},
  eprint={2004.10934},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@InProceedings{Wang_2021_CVPR,
  author = {Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
  title = {{Scaled-YOLOv4}: Scaling Cross Stage Partial Network},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2021},
  pages = {13029-13038}
}
```

[2]

```bibtex
@article{wang2022yolov7,
  title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
  author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
  journal={arXiv preprint arXiv:2207.02696},
  year={2022}
}
```

[3]

```bibtex
@inproceedings{Wojke2017simple,
  title={Simple Online and Realtime Tracking with a Deep Association Metric},
  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
  year={2017},
  pages={3645--3649},
  organization={IEEE},
  doi={10.1109/ICIP.2017.8296962}
}

@inproceedings{Wojke2018deep,
  title={Deep Cosine Metric Learning for Person Re-identification},
  author={Wojke, Nicolai and Bewley, Alex},
  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018},
  pages={748--756},
  organization={IEEE},
  doi={10.1109/WACV.2018.00087}
}
```
