GitHub - akshaym96/Computer-Vision-Project: Yelp Photo Classification Task

Yelp Photo Classification Task

We need to predict the labels describing the business attributes of restaurants from the user submitted pictures. This is a Multi-Instance Multi-Label (MIML) classification problem. Each photo is assigned a business id and each business id is assigned multiple labels as given below. The problem was launched as Kaggle competition. [10]

The original dataset contains nearly 2.3 lakh images taken by the users. Each photo is assigned a id called photoID. Each photoID is mapped to a businessID and each of the businessID is assigned multiple labels as shown in the figure above.

The different labels are listed below:

0: good_for_lunch
1: good_for_dinner
2: takes_reservations
3: outdoor_seating
4: restaurant_is_expensive
5: has_alcohol
6: has_table_service
7: ambience_is_classy
8: good_for_kids

Method-1 SIFT, Random Forest

In this method, features for both train and test images were extracted using SIFT. Then these descriptors were used to build clusters using K-Means. The clusters from K-Means are used to build the histograms for the train and test images. The extracted histograms are fed to 9 Random Forest Classifier for training and testing one 9 different available labels.

Feature Extraction time:-

Train Images:- 15 hours 37 minutes 37 seconds
Test Images:- 1 hour 22 minutes 26 secs.
K-means clustering :- 3 hours 7 minutes 48 secs.
Training time:- 1 minute 34 secs.
Testing Time:- 30.5867 secs. Results:- Accuracy:- Strict Match:- 36.7031 % One Mismatch:- 40.8372% Two Mismatch:- 51.2016%

Method-2 VGG + Fisher vectors

In this method, we use image features extracted by VGG model, run fisher vectors over the features to get the global descriptors and train a Multi-Output SVC which is used as the classifier. Since the data set is of varied image sizes, we are taking only 5003753 image sizes. There are 2 types of preprocessing on images were, one is direct resize to 2242243 and other method is taking 7 cropped images averages which overlap at various positions of the image.

- VGG

The image features are extracted using a pre-trained VGG-16 model. In this model the last 3 layers have been removed mainly the fully connected layer, Relu and Dropout. So the outputs of the model are 4096. The architecture is as shown below.

- Fisher Vectors

Fisher Vectors was introduced in the paper, authored by F. Perronnin and C. Dance[1] and Florent Perronnin, Jorge Sánchez, and Thomas Mensink[2]. The FV is an image representation obtained by pooling local image features. It is frequently used as a global image descriptor in visual classification. While the FV can be derived as a special, approximate, and improved case of the general Fisher Kernel framework, it is easy to describe directly. Let I=(x1,…,xN) be a set of D dimensional feature vectors (e.g. SIFT descriptors) extracted from an image. Let Θ=(μk,Σk,πk:k=1,…,K) be the parameters of a Gaussian Mixture Model fitting the distribution of descriptors. For each mode k, consider the mean and covariance deviation vectors. The FV of image I is the stacking of the vectors uk and then of the vectors vk for each of the K modes in the Gaussian mixtures.

For implementation , GGMM uses Cuda via CudaMat was used to generate the Gaussian Mixture Model[3][4]. Regarding Fisher Vector python implementation, there is no library implementation. So we used a reference implementation[5] and modified it accordingly to fit the GGMM model.

Method 3 - Res-Net + Multi-Output SVM Classifier

Res-Net50 and Res-Net10 was experimented and the behavior and performance of both were studied. Res-Net101 ran for 4.2 hours and Res-net50 took comparatively less time of 3.5 hours. In Res-Net101 model, a dropout layer of 0.2 and the linear layer was added in the end to prevent overfitting and to bring down the count of layers from 4096 to 2048 before feeding it to the model. The performance of Res-Net101 was higher than the Res-Net50. A slight increase in accuracy of 1.9% was seen.

Features from the Res-Net was fed to the Multi-Output SVM model since they are less prone to overfitting. The Res-Net network is fine-tuned as above and can be fine-tuned some more in several different ways. It gives the user more freedom to access and modify the parts of the network which fairly affects the outcome.

Method 4 - SIFT + Ensemble model

This model uses Bag of visual words to extract the features using SIFT with a dense sampling of stride 50 and scale of 20. Then the image id is mapped with each of the business ids. Many pictures would have been taken in the same restaurant. So all these pictures will be mapped to the same business id and their labels. Then an ensemble model was built consisting of scikit-learn models (Decision tree Classifier, Random Forest Classifier, Nearest Neighbors, GradientBoostingClassifier, SGDClassifier). These sklearn models are fed to Multi-Output classifier since the labels in our problem is an array of values containing 0 and 1. Using an ensemble of models for predicting is proving to give better results than using a single model. The results of all the models are compared and the result predicted by most of the models is taken as the output. In case, all the models give a varied result, the prediction of the model with the highest accuracy is taken as the final prediction. But in these models, there is less power given to the user to work with the parameters which affect the outcome of each of the model. For this data and for the various Res-Net architectures and parameters I tried above, Res-Net performs better in this case.

Method 5 - AlexNet + OneVsRestClassifier

In this method, we have used features from the penultimate linear layer which outputs 4096 dim vector. These features are then fed to OneVsRestClassifier for training and testing.

References

[1] F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In Proc. CVPR, 2006.
[2] Florent Perronnin, Jorge Sánchez, and Thomas Mensink. Improving the fisher kernel for large-scale image classification. In Proc. ECCV, 2010.
[3] Documentation of Ggmm - http://ebattenberg.github.io/ggmm/cpu.html
[4] GMMs using CUDA (via CUDAMat) - https://github.com/ebattenberg/ggmm
[5] Fisher Code is used as reference code and modified - https://github.com/jacobgil/pyfishervector
[6] VLFeat - http://www.vlfeat.org/api/fisher.html
[7] Scikit Learn - https://scikit-learn.org/
[8] Pytorch tutorials
[9] Numpy - https://docs.scipy.org/doc/numpy/reference/
[10] Kaggle - https://www.kaggle.com/c/yelp-restaurant-photo-classification

To-Dos

Training a deep learning model from scratch.

Thanks for stopping by!! :D

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
BagofWords_RandomForest.ipynb		BagofWords_RandomForest.ipynb
DataSet_Creation.ipynb		DataSet_Creation.ipynb
Ensemble_Model.ipynb		Ensemble_Model.ipynb
Final - DEMO.ipynb		Final - DEMO.ipynb
README.md		README.md
Resnet_model.ipynb		Resnet_model.ipynb
VGG.png		VGG.png
VGG_feature.ipynb		VGG_feature.ipynb
_config.yml		_config.yml
alexNet_Feature.ipynb		alexNet_Feature.ipynb
alexNet_Features_SVM.ipynb		alexNet_Features_SVM.ipynb
alexnet.png		alexnet.png
example.png		example.png
fisher-vgg.png		fisher-vgg.png
fisher.ipynb		fisher.ipynb
fisher.png		fisher.png
overview.png		overview.png
py.py		py.py
random_forest.png		random_forest.png
sift_code.py		sift_code.py
train.csv		train.csv
train.csv.tar		train.csv.tar
train_photo_to_biz_ids.csv		train_photo_to_biz_ids.csv
vgg_feat_svm.ipynb		vgg_feat_svm.ipynb
vgg_fisher_multiSVC.ipynb		vgg_fisher_multiSVC.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Yelp Photo Classification Task

Method-1 SIFT, Random Forest

Method-2 VGG + Fisher vectors

- VGG

- Fisher Vectors

Method 3 - Res-Net + Multi-Output SVM Classifier

Method 4 - SIFT + Ensemble model

Method 5 - AlexNet + OneVsRestClassifier

References

To-Dos

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Yelp Photo Classification Task

Method-1 SIFT, Random Forest

Method-2 VGG + Fisher vectors

- VGG

- Fisher Vectors

Method 3 - Res-Net + Multi-Output SVM Classifier

Method 4 - SIFT + Ensemble model

Method 5 - AlexNet + OneVsRestClassifier

References

To-Dos

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages