The primary goal of this project is to determine whether images containing both objects and scenes are categorized as objects or as scenes. To address this question, I employed several neural networks, such as ResNet and AlexNet, pre-trained on object classification (ImageNet) or scene classification (Places365). I then fine-tuned these networks on a dataset of object and scene images to adapt their feature extraction to the task.
For evaluation, I used a separate test dataset containing images of objects, scenes, and both ('both' condition). I extracted features from each layer of the networks and, using representational similarity analysis (RSA), assessed which model (control, scene, or object) best explained the variance in feature space across layers. This involved computing representational dissimilarity matrices (RDMs) for each network.
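The core RDM/RSA computation described above can be sketched in a few lines. This is a generic implementation of the standard recipe (1 minus Pearson correlation for the RDM, Spearman correlation over the upper triangle for comparing RDMs), not the project's MATLAB code; the array shapes and function names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def compute_rdm(features):
    """RDM as 1 - Pearson correlation between the feature vectors
    of every pair of images (features: images x units)."""
    return 1.0 - np.corrcoef(features)

def rsa_score(rdm_a, rdm_b):
    """Compare two RDMs via Spearman correlation over the upper
    triangle (diagonal excluded), a standard RSA comparison."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Toy example: 10 "images" with 50-dimensional layer activations.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 50))
rdm = compute_rdm(feats)

# Compare against an RDM from a slightly perturbed copy of the features.
noisy = feats + 0.1 * rng.standard_normal((10, 50))
score = rsa_score(rdm, compute_rdm(noisy))
```

In the project, one RDM is computed per network layer and compared against the candidate model RDMs (control, scene, object) in this way.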
Overall, the results were not significant, indicating that the models were unable to explain the variance in feature space effectively. This suggests that a better set of images or a different approach may be needed to improve the prediction accuracy for the 'both' condition.
The project includes two main components: the 'Python_extractFeatures' folder for feature extraction from the neural networks, and the 'MATLAB_extractRDMs' folder for extracting RDMs for RSA, with the results available in the '/figures' folder.
- This project was part of my research internship in the Object Vision Group at CIMeC. I would like to thank and acknowledge Dr. Stefania Bracci for her supervision during my training.