Cross-validation: unless otherwise stated, all classification methods are trained and evaluated with 5-fold cross-validation, and the average results across folds are reported. Each fold therefore uses an 80% train / 20% test split of the dataset.
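This evaluation protocol can be sketched with sklearn; the classifier and the data below are placeholders, not the report's actual features:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: 100 samples, 10 features, 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 4, size=100)

# 5-fold CV: each fold trains on 80% of the data and tests on the remaining 20%.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("mean accuracy over 5 folds:", scores.mean())
```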
Most used Python libraries: sklearn, skimage, matplotlib, numpy, torch.
I am using a multi-class outdoor weather classification dataset (from the paper "Multi-Class Weather Classification from Still Image Using Said Ensemble Method", IEEE). It contains 1125 images across 4 classes: 'cloudy', 'sunrise', 'rain' and 'shine', distributed as follows: {'cloudy': 300, 'shine': 253, 'sunrise': 357, 'rain': 215}.
Images were converted to greyscale, and pixel values were discretized to 8-bit integers.
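This preprocessing step can be sketched with skimage (the input image here is random placeholder data standing in for a loaded weather photo):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.util import img_as_ubyte

# Placeholder RGB image; in the report this would be a loaded weather photo.
rgb = np.random.default_rng(0).random((32, 32, 3))

grey = rgb2gray(rgb)          # float greyscale image in [0, 1]
grey_u8 = img_as_ubyte(grey)  # discretized to 8-bit integers in [0, 255]
print(grey_u8.dtype, grey_u8.min(), grey_u8.max())
```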
Looking at the sample images, I thought it might be a good idea to see whether texture descriptors (e.g., Haralick features, LBP) can deal with this task, so I picked local binary patterns (LBP) as my feature extraction mechanism. With 8 sampling points, the 'uniform' LBP variant in the skimage library produces 10 pattern codes: 9 uniform patterns representing local structures such as edges, lines and corners, plus one code that collects all non-uniform patterns.
Let’s look at the mean class distribution of these patterns:
We can see that there are some differences between the classes; this is especially clear if we compare the pattern distributions of 'rain' and 'cloudy'. These differences will help the classifier do its job.
Experiments with this classifier will include picking different numbers of neighbours (k = 3, 5, 10, 20, 100) and fitting on either the full dataset or half of it. Measurement metrics are test accuracy and the confusion matrix.
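The neighbour sweep can be sketched like this; the features below are random placeholders for the 10-bin LBP histograms:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder LBP-histogram features (10-dim) and 4-class labels.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = rng.integers(0, 4, size=200)

accs = []
for k in (3, 5, 10, 20, 100):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    accs.append(acc)
    print(f"k = {k:3d}: mean test accuracy = {acc:.3f}")
```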
Row normalized confusion matrices:
(left) k = 5, full data; (right) k = 100, full data
I will be using a 3-layer MLP: input, hidden and output layers. Training will be done with the following parameters: cross-entropy loss, learning rate = 0.03, SGD optimizer with a batch size of 16 datapoints. Experiments with this classifier will include using either 512 or 10 nodes in the hidden layer, and training on either the full dataset or half of it. Measurement metrics are train loss, test and train accuracy, and the confusion matrix.
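A minimal sketch of this setup in torch, showing one training step. The ReLU activation and the 10-feature input size are assumptions (the report does not state the activation):

```python
import torch
from torch import nn

# 3-layer MLP matching the described setup: LBP features in (10 assumed),
# one hidden layer (512 or 10 nodes), 4 weather classes out.
hidden = 512  # set to 10 for the second experiment
model = nn.Sequential(nn.Linear(10, hidden), nn.ReLU(), nn.Linear(hidden, 4))

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.03)

# One training step on a placeholder batch of 16 datapoints.
xb = torch.randn(16, 10)
yb = torch.randint(0, 4, (16,))
optimizer.zero_grad()
loss = loss_fn(model(xb), yb)
loss.backward()
optimizer.step()
print("batch loss:", loss.item())
```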
Row normalized confusion matrices:
(left) full data, hidden layer nodes = 512; (right) full data, hidden layer nodes = 10
Regarding kNN: 1. It works best when the number of neighbours is 5. Picking more than 20 neighbours degrades performance significantly, as expected. 2. Halving the data seemed to have little impact on the accuracy of the algorithm.
Regarding MLP: 1. As with kNN, halving the data seemed to have almost no impact on model performance. 2. In my 3-layer architecture, the hidden layer had either 512 or 10 nodes. Even though the model complexity differed, test accuracy was similar, so generalization capability was the same. The difference shows in training accuracy, where the 512-node model fit the data better. 3. Given more time, I would try different learning rates and network configurations and train for more epochs.
In this experiment, kNN performed better than MLP. The kNN approach was also faster, easier to implement and had fewer tunable parameters. On the other hand, MLP has many tunable parameters and thus the potential to outperform kNN if enough time is allocated to explore the possibilities. In addition, local binary patterns are just one of many features I could have used for this dataset; in the original paper, the authors use an ensemble of features and classifiers to achieve high classification accuracy.
SMS Spam Collection v.1 (from the paper "Contributions to the Study of SMS Spam Filtering: New Collection and Results"). This dataset consists of SMS messages labeled "spam" and "ham" (non-spam). It contains 4,827 legitimate messages (86.6%) and 747 spam messages (13.4%).
All messages are lowercased and stripped of punctuation and stopwords (e.g. "the", "a", "an", "in").
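A minimal sketch of this preprocessing; the tiny stopword set below is illustrative only, the actual list used in the report would be larger:

```python
import string

# Small assumed stopword set for illustration; a full stopword list is larger.
STOPWORDS = {"the", "a", "an", "in"}

def preprocess(message: str) -> str:
    # Lowercase, drop punctuation characters, then drop stopwords.
    msg = message.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in msg.split() if w not in STOPWORDS)

print(preprocess("Win a FREE prize in the lottery!"))  # -> "win free prize lottery"
```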
The bag-of-words representation from the sklearn library is used: each message is encoded as a vector of word counts, so the features are word frequencies in spam and non-spam messages.
Experiments with this classifier will include picking different values of the smoothing parameter alpha (0.00001, 1.0, 10.0, 100.0). Measurement metrics are accuracy, precision, recall, F1-score and the confusion matrix.
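The alpha sweep can be sketched as below; `MultinomialNB` is assumed as the Naïve Bayes variant (it is sklearn's standard choice for count features), and the labeled toy corpus (1 = spam, 0 = ham) is illustrative only:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labeled toy corpus (1 = spam, 0 = ham) for illustration only.
msgs = ["win free prize", "free cash win", "lunch at noon", "see you soon"]
labels = [1, 1, 0, 0]

X = CountVectorizer().fit_transform(msgs)
scores = {}
for alpha in (0.00001, 1.0, 10.0, 100.0):
    clf = MultinomialNB(alpha=alpha).fit(X, labels)
    scores[alpha] = clf.score(X, labels)
    print(f"alpha = {alpha}: train accuracy = {scores[alpha]:.2f}")
```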
Row normalized confusion matrices:
(left) alpha = 1.0; (right) alpha = 100.0
Experiments with this classifier will include picking different kernels (polynomial or radial) and using either the full dataset or half of it. Measurement metrics are accuracy, precision, recall, F1-score and the confusion matrix.
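The kernel comparison can be sketched with sklearn's `SVC` (random placeholder features stand in for the bag-of-words vectors):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder bag-of-words-like features and binary spam/ham labels.
rng = np.random.default_rng(0)
X = rng.random((200, 20))
y = rng.integers(0, 2, size=200)

results = {}
for kernel in ("rbf", "poly"):  # radial and polynomial kernels
    results[kernel] = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel} kernel: mean accuracy = {results[kernel]:.3f}")
```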
Row normalized confusion matrices:
(left) radial kernel, full data; (right) radial kernel, half data
(left) polynomial kernel, full data; (right) polynomial kernel, half data
Regarding Naïve Bayes, the default value of the smoothing parameter alpha in the sklearn library is 1.0. As seen in my experiments, it proves to be the best choice for this SMS spam classification task.
Regarding SVM, we can see that the radial kernel significantly beats the polynomial kernel. We can also see that halving the data has no impact on precision but deals a significant blow to recall.
In this experiment, both methods, Naïve Bayes and SVM, performed on par. Furthermore, it seemed extremely easy for the classifiers to correctly predict that non-spam messages were non-spam; it was a lot harder to recognize spam as spam. This is not surprising, as humans commonly show similar behaviour when judging whether a message is spam.
This dataset is about classifying positions occurring during the physical exercise called the "push-up". It consists of two classes, "up" and "down": one is for images where the body is positioned at an angle and not close to the floor, the other for images where the body is horizontal and close to the floor. It consists of 32 images, 16 per class. Some were taken in good brightness conditions (during the day) and some in slightly worse ones (during the evening).
(left) down position; (right) up position
Some of the images taken in the evening required a two-fold brightness increase for feature extraction to work.
Google’s MediaPipe library for Python was used to extract key body points from the images. Each image yields 33 (x, y) coordinate pairs describing the relative positions of body landmarks. Data augmentation was also done by flipping the coordinates along the x axis, making the dataset twice as big.
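The flip augmentation can be sketched in numpy; the keypoints below are random placeholders standing in for one image's MediaPipe Pose output, assumed to be in normalized [0, 1] image coordinates:

```python
import numpy as np

# Placeholder keypoints: 33 (x, y) pairs in normalized [0, 1] image coordinates,
# standing in for MediaPipe Pose output for one image.
keypoints = np.random.default_rng(0).random((33, 2))

# Horizontal-flip augmentation: mirror x, keep y. Doubles the dataset.
flipped = keypoints.copy()
flipped[:, 0] = 1.0 - flipped[:, 0]

dataset = np.stack([keypoints, flipped])
print(dataset.shape)  # (2, 33, 2): original sample plus mirrored sample
```

One caveat: mirroring an image also swaps left-side and right-side landmarks, so a fully faithful augmentation would additionally swap the left/right landmark indices; the sketch above flips coordinates only.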
Experiments with this classifier will include picking different numbers of neighbours (k = 1, 3, 5, 10). Measurement metrics are accuracy, precision, recall, F1-score and the confusion matrix.
Row normalized confusion matrices:
(left) k = 1; (right) k = 3
(left) k = 5; (right) k = 10
Experiments with this classifier will include picking different kernels (polynomial with degree = 1, 2, 3, or radial). Measurement metrics are accuracy, precision, recall, F1-score and the confusion matrix.
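The degree sweep can be sketched as below; the placeholder features stand in for the 33 flattened (x, y) keypoint pairs (66 values per image):

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder keypoint features: 64 samples of 33*2 = 66 coordinates, 2 classes.
rng = np.random.default_rng(1)
X = rng.random((64, 66))
y = rng.integers(0, 2, size=64)

accs = {}
for degree in (1, 2, 3):
    clf = SVC(kernel="poly", degree=degree).fit(X, y)
    accs[degree] = clf.score(X, y)
    print(f"polynomial degree {degree}: train accuracy = {accs[degree]:.2f}")

clf_rbf = SVC(kernel="rbf").fit(X, y)
print(f"radial kernel: train accuracy = {clf_rbf.score(X, y):.2f}")
```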
(left) radial kernel; (right) polynomial kernel, degree = 1
(left) polynomial kernel, degree = 2; (right) polynomial kernel, degree = 3
Regarding kNN, using just one neighbour as a reference yielded the best result for this model.
Regarding SVM, among the kernels used in the experiment, polynomial with degree 2 performed best. It is worth noting that the default degree in sklearn is 3, so the search for a better degree was worthwhile.
In this experiment, I was pleasantly surprised by how well kNN and SVM performed on such a small dataset (32 images). I think I would have spent a lot more time trying to get an MLP or any deep learning approach to work as well as these models. Moreover, the feature extractor provided by Google helped immensely: it let me deal with just 33 coordinate pairs instead of thousands of pixel values, making this classification task a lot simpler.