Computer Vision has become ubiquitous in our society, with applications in image/video search and understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, segmentation, localization and detection. Recent developments in neural network (a.k.a. deep learning) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems.
Download the dataset you want to apply PCA to, and use the following commands:
cd "PCA&KNN"/
python3 ./pca.py
Remember that your dataset should be located in "./PCA&KNN/train" as well as "./PCA&KNN/test".
- Perform PCA on the training set. Plot the mean face and the first three eigenfaces.
- Reconstruct images using the first n = 3, 50, 100, 239 eigenfaces, and plot them together with the corresponding MSE values.
- Apply the k-nearest neighbors classifier to recognize test set images, using the hyperparameters k = {1, 3, 5} and n = {3, 50, 159}. Show the 3-fold cross-validation results.
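The eigenface steps above can be sketched as follows. This is a minimal sketch using NumPy's SVD; the helper names `pca_eigenfaces` and `reconstruct` are assumptions for illustration, not functions from the provided pca.py:

```python
import numpy as np

def pca_eigenfaces(X):
    """PCA on face vectors via SVD.

    X: (num_images, num_pixels) matrix, one flattened face per row.
    Returns the mean face and the eigenfaces (rows of Vt), sorted by
    explained variance, so eigenfaces[0] is the first eigenface."""
    mean_face = X.mean(axis=0)
    centered = X - mean_face
    # Rows of Vt are the principal axes of the centered data
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, Vt

def reconstruct(x, mean_face, eigenfaces, n):
    """Project one face onto the first n eigenfaces and reconstruct it."""
    coeffs = (x - mean_face) @ eigenfaces[:n].T
    return mean_face + coeffs @ eigenfaces[:n]
```

The per-image MSE for a given n is then just `np.mean((reconstruct(x, mean_face, eigenfaces, n) - x) ** 2)`, which should shrink as n grows.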
- filterBank.mat: The given .mat file contains a set of 38 filters (also known as a filter bank). This filter bank is stored as a 49 x 49 x 38 matrix (i.e., each filter is of size 49 x 49 pixels).
- Images: zebra.jpg and mountain.jpg
cd Segmentation/
python3 ./color_seg.py
python3 ./text_seg.py
- Original images
- Color segmentation: Convert both RGB images into the Lab color space and plot the segmentation results for both images based on the clustering results.
- Texture segmentation: Convert the color images to grayscale, extract image textural features via the provided filter bank, and plot the texture segmentation results for both images.
- Combine both color and texture features (3 + 38 = 41-dimensional features) for image segmentation
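The clustering step behind all three segmentation variants can be sketched as a plain k-means over per-pixel feature vectors. This is a hypothetical helper, not the provided color_seg.py/text_seg.py; the deterministic farthest-point initialization is an assumption made here for reproducibility:

```python
import numpy as np

def kmeans_segment(features, k, iters=20):
    """Minimal k-means: assign each pixel's feature vector to one of k clusters.

    features: (H*W, d) array of per-pixel features, e.g. the 3 Lab channels
    concatenated with the 38 filter-bank responses (d = 41).
    Returns the (H*W,) cluster label for every pixel."""
    features = np.asarray(features, dtype=float)
    # Farthest-point initialization: start at the first pixel, then
    # repeatedly add the pixel farthest from all chosen centers
    centers = [features[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centers],
                      axis=0)
        centers.append(features[dist.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign every pixel to its nearest center, then update the centers
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels
```

Reshaping the returned labels back to (H, W) gives the segmentation mask to plot.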
The implementation of a basic image-based bag-of-words (BoW) model for a scene image dataset with 5 categories.
cd Recognition/
python3 ./interst_point_detection.py
python3 ./kmeans.py
- Detect interest points, calculate their descriptors for this image using SURF, and plot the interest point detection results
- Use the k-means algorithm to divide these interest points into C clusters.
Extract the detected interest points from all 50 images in Train-10 and stack them into an N × d matrix, where N denotes the total number of interest points and d is the dimension of each descriptor. Run k-means with C = 50 and a maximum of 5000 iterations; the centroid of each cluster then indicates a visual word. Randomly select 6 clusters from the above results and plot the visual words and the associated interest points in this PCA subspace.
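Once the C = 50 centroids (visual words) are found, each image can be encoded by quantizing its descriptors against that vocabulary. A minimal sketch of this step; the helper name and the histogram normalization are assumptions, not part of the provided kmeans.py:

```python
import numpy as np

def bow_histogram(descriptors, centroids):
    """Bag-of-words encoding of one image.

    descriptors: (n_points, d) descriptors detected in the image.
    centroids:   (C, d) visual words from k-means.
    Returns the L1-normalized histogram of visual-word assignments."""
    # Nearest visual word for every descriptor
    d = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()
```

These per-image histograms are the BoW features a classifier would then consume for the 5-category scene dataset.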
Implement two semantic segmentation models, which predict a label for each pixel with CNN models. The input is an RGB image and the output is the semantic segmentation prediction. The models I used are the base model (VGG16-FCN32s) and the improved model (FCN8s). Before running the code, the training/testing data and the model file (vgg16_weights_tf_dim_ordering_tf_kernels.h5) are required.
cd Semantic-segmentation
bash ss.sh $1 $2
bash ss_best.sh $1 $2
$1: testing images directory (images are named 'xxxx_sat.jpg')
$2: output images directory
python3 mean_ios_evaluate.py -g ground_truth -p prediction
I also provide my pre-trained models for both. Check model_files.
- Show the predicted segmentation mask from the base model.
- Show the predicted segmentation mask from the improved model.
- Report the mean IoU accuracy.
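The IoU metric reported above can be sketched as a per-class mean intersection-over-union. This is a minimal sketch assuming integer label maps; the provided mean_ios_evaluate.py may differ in details such as which classes are counted:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union between two label maps.

    pred, gt: integer class-label arrays of the same shape.
    Classes absent from both maps are skipped so they do not
    drag the average down."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Running this over every prediction/ground-truth pair and averaging mirrors what the evaluation command computes.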