# 03A-Pizza-Steak-Image-Classification

## Introduction to CNNs and Computer Vision with TensorFlow

**Note**: 

* CNNs can be used for image data as well as text data, because they exploit local relationships very well.
* Patterns in an image are strongly spatially correlated on a local scale: they are not per-pixel patterns, but span a neighbourhood of pixels with a definite structure.
* Likewise, if the structure of a sentence is driven mainly by relationships between nearby words (in the context of the NLP problem at hand), CNNs can work really well.
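
To make the "local pattern" idea concrete, here is a minimal pure-Python sketch of the sliding-window operation a convolutional layer applies (strictly a cross-correlation, which is what deep learning frameworks call convolution). The toy image and the hand-written edge kernel are illustrative assumptions; in a real CNN the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends only on a small kh x kw neighbourhood.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x6 "image": dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-written vertical edge detector (an assumption for illustration).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

response = conv2d_valid(image, kernel)
print(response)  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The response is non-zero only where the window straddles the dark-to-bright boundary, so the same small kernel detects the edge wherever it occurs in the image.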

**Examples**
* Simple image classification: deciding whether a picture of food contains pizza or steak
* Detecting whether or not an object appears in an image (e.g. did a specific car pass a security camera?)

**Steps**
* Getting the dataset. 
    * Where to keep/host large datasets when they exceed Google Drive storage
    * Dataset fetching scripts
    * Common dataset sources for different kinds of problems
* Preparing the dataset
    * Making sure it's the right size.
    * Dataset directory structure/Creating a CustomDataset Class
    * Augmentation techniques
    * Noise removal
    * Batch Data Loader, prefetching (Important for large datasets)
* Creating a baseline model 
* Experimenting with large models (on small training subset)
* Fitting and monitoring the training (TensorBoard, Weights & Biases)
* Visualizing the predictions
* Evaluating the model
* Improving the model
* Comparing the models
* Making a prediction with the trained model
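
Two of the steps above, experimenting on a small training subset and feeding data in batches, can be sketched with the standard library alone. The helper names `sample_subset` and `iter_batches` and the file names are hypothetical, chosen for illustration; a real pipeline would more likely use the framework's own data loader with prefetching.

```python
import random


def sample_subset(files_by_class, fraction, seed=42):
    """Pick the same fraction of files from every class, reproducibly."""
    rng = random.Random(seed)
    subset = {}
    for label, files in files_by_class.items():
        k = max(1, int(len(files) * fraction))
        subset[label] = rng.sample(sorted(files), k)
    return subset


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical file names standing in for the real image files.
files_by_class = {
    'pizza': [f'pizza_{i}.jpg' for i in range(750)],
    'steak': [f'steak_{i}.jpg' for i in range(750)],
}

subset = sample_subset(files_by_class, fraction=0.1)
examples = [(name, label) for label, names in subset.items() for name in names]
random.Random(0).shuffle(examples)  # mix the classes before batching

batches = list(iter_batches(examples, batch_size=32))
print(len(examples), len(batches))  # 150 5
```

Sampling per class keeps the small subset as balanced as the full training set, so quick experiments on it are less likely to mislead.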

## `ClassicImageDataDirectory`

* Two folders, `train` and `test`, hold the images
* Each of `train` and `test` contains one subfolder per class name
* Each class-name folder contains the images of that class
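
The point of this layout is that the label of every image is simply the name of its parent folder. The sketch below builds a tiny throwaway copy of the structure in a temporary directory and reads the labels back; the function name `labelled_files` and the empty placeholder files are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def labelled_files(data_dir, subset):
    """Return sorted (relative_path, class_name) pairs for one subset.

    The class name is taken from the parent folder of each file.
    """
    root = Path(data_dir) / subset
    return sorted(
        (p.relative_to(root).as_posix(), p.parent.name)
        for p in root.glob('*/*')
        if p.is_file() and not p.name.startswith('.')
    )


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the train/test -> class -> image layout with empty placeholder files.
    for subset_name in ('train', 'test'):
        for cls in ('pizza', 'steak'):
            folder = Path(tmp) / subset_name / cls
            folder.mkdir(parents=True)
            (folder / f'{cls}_0.jpg').touch()

    train_pairs = labelled_files(tmp, 'train')

print(train_pairs)
# [('pizza/pizza_0.jpg', 'pizza'), ('steak/steak_0.jpg', 'steak')]
```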

In [46]:
import os
import glob
import pandas as pd

In [39]:
data_dir = '../data/pizza_steak/'
subsets = ['train', 'test']
class_names = ['pizza', 'steak']


class ClassicImageDataDirectory:
    
    def __init__(self, data_dir):
        self._verify_train_test_structure(data_dir)
        
        self.data_dir = data_dir
        
        self.train = self.__get_subset_info('train')
        self.test = self.__get_subset_info('test')
        
    
    # TODO
    @staticmethod
    def _verify_train_test_structure(data_dir):
        pass
    
    @staticmethod
    def __list_files(folder):
        return [f for f in os.listdir(folder) if not f.startswith('.')]
    
    def list_data_files(self, subset, class_label):
        folder = os.path.join(self.data_dir, subset, class_label)
        return self.__list_files(folder)
    
    def __get_subset_info(self, subset):
        # Summarise one subset: its folder, its class folders, and the image count per class.
        d = {}
        d['dir'] = os.path.join(self.data_dir, subset)
        d['class_name'] = self.__list_files(d['dir'])
        d['count'] = [len(self.list_data_files(subset, c)) for c in d['class_name']]
        
        return d
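

The `_verify_train_test_structure` method above is still a TODO. One possible way to fill it in is sketched below as a standalone function; the checks it performs (both subset folders exist, each has class folders, and the class folders match) are an assumption about what "valid" should mean here, not the notebook's own definition.

```python
import os
import tempfile


def verify_train_test_structure(data_dir, subsets=('train', 'test')):
    """Return a list of problems found; an empty list means the layout looks valid."""
    problems = []
    class_sets = {}
    for subset in subsets:
        subset_dir = os.path.join(data_dir, subset)
        if not os.path.isdir(subset_dir):
            problems.append(f'missing subset folder: {subset}')
            continue
        # Class folders are the non-hidden directories directly under the subset.
        classes = sorted(
            name for name in os.listdir(subset_dir)
            if not name.startswith('.')
            and os.path.isdir(os.path.join(subset_dir, name))
        )
        if not classes:
            problems.append(f'no class folders in: {subset}')
        class_sets[subset] = classes
    # Every subset that exists should expose the same class folders.
    if len({tuple(c) for c in class_sets.values()}) > 1:
        problems.append(f'class folders differ between subsets: {class_sets}')
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Deliberately leave out test/steak to trigger a mismatch.
    for subset_name, classes in {'train': ['pizza', 'steak'], 'test': ['pizza']}.items():
        for cls in classes:
            os.makedirs(os.path.join(tmp, subset_name, cls))
    problems_before = verify_train_test_structure(tmp)

    os.makedirs(os.path.join(tmp, 'test', 'steak'))
    problems_after = verify_train_test_structure(tmp)

print(len(problems_before), problems_after)  # 1 []
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a freshly downloaded dataset in one pass; the class could raise a `ValueError` if the list is non-empty.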
    
    
    
        


In [40]:
imgdir = ClassicImageDataDirectory('../data/pizza_steak')

In [50]:
imgdir.train

{'dir': '../data/pizza_steak\\train',
 'class_name': ['pizza', 'steak'],
 'count': [750, 750]}

In [57]:
traindf = pd.DataFrame(imgdir.train).drop('dir', axis=1)

In [56]:
testdf = pd.DataFrame(imgdir.test).drop('dir', axis=1)

In [58]:
traindf.merge(testdf, on='class_name', how='outer', suffixes=('_train', '_test'))

| | class_name | count_train | count_test |
|---|---|---|---|
| 0 | pizza | 750 | 250 |
| 1 | steak | 750 | 250 |
