# Image Recognition - Logistic Regression
---

This is the first part of a two-part project on image recognition. This part classifies images with Logistic Regression; the second part uses Torchvision, so that the two approaches can be compared.

---

![title](header_image.jpg)

Data source for this project: https://www.cs.toronto.edu/~kriz/cifar.html

*It is recommended to download the dataset and save it in the same directory as the Jupyter notebook.*
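
The loader further down expects a `cifar-10-batches-py/` folder next to the notebook, so the downloaded archive has to be unpacked first. The following is a minimal sketch of that step, assuming the Python-version archive is named `cifar-10-python.tar.gz` (adjust the name if your download differs). `extract_if_missing` is a hypothetical helper and is not part of the original notebook.

```python
import tarfile
from pathlib import Path


def extract_if_missing(archive_path, target_dir="cifar-10-batches-py"):
    """Unpack the archive next to itself unless target_dir already exists.

    Returns the path of the data directory.
    """
    archive_path = Path(archive_path)
    data_dir = archive_path.parent / target_dir
    if not data_dir.is_dir():
        # The archive is assumed to be a gzip-compressed tar file
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(path=archive_path.parent)
    return data_dir


# Assumed archive name; only runs if the file is actually present
archive = Path("cifar-10-python.tar.gz")
if archive.exists():
    print("Data directory:", extract_if_missing(archive))
```

Only unpack archives from a source you trust, since both `tarfile` extraction and the later `pickle.load` call can be abused by malicious files.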

---

## Packages and Installations

In [8]:
# Imports
from platform import python_version
import math
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [15]:
# Python and package versions
# Install the watermark package if it is missing -> !pip install -q -U watermark

%reload_ext watermark
print('Python:', python_version())
%watermark --iversions

Python: 3.9.13
matplotlib: 3.5.2
numpy     : 1.21.5



## Class to Load, Process and Handle Image Data

In [41]:
class ImageDataHandler:
    """
    ImageDataHandler Class:

    This class provides functionalities for handling and processing image data 
    from the CIFAR-10 dataset.

    Attributes:
    - base_path: Specifies the directory where the CIFAR-10 batch files are located.

    Methods:
    1. load_data(filename):
       - Loads raw image data and their labels from the specified filename.
       - Performs basic preprocessing like reshaping and normalization.
       - Returns the processed images and their respective labels.

    2. format_data(X, Y, v0, v1):
       - Filters and formats the image data based on the provided class labels (v0 and v1).
       - Returns the filtered and formatted image data and their respective labels.

    3. prepare_data(batch_number, start_val, end_val):
       - Combines the functionalities of load_data and format_data.
       - Loads the specified data batch, then filters and formats it.
       - Returns the prepared image data and labels for training or testing purposes.

    Usage:
    To preprocess the CIFAR-10 image data, instantiate the class and use the 
    prepare_data method with the desired batch number and label values.

    """ 
    
    
    def __init__(self, base_path="cifar-10-batches-py/"):
        self.base_path = base_path
    
    def load_data(self, filename):
        # full path of the file
        full_path = f"{self.base_path}{filename}"
        
        # Attempt to load the data; handle potential errors
        try:
            with open(full_path, 'rb') as file:
                data = pickle.load(file, encoding='bytes')  # dict whose keys are byte strings
        except Exception as e:
            print(f"Error loading data: {e}")
            return None, None

        # Load raw image data
        # The dict keys are byte strings (b'data', not 'data') because of encoding='bytes'
        raw_images = data[b'data']
        
        # Convert labels to a numpy array
        y = np.array(data[b'labels'])

        # Normalize and convert raw images to floating point
        raw_float = np.array(raw_images, dtype=float) / 255.0
        
        # Reshape and reorder axes to get proper image format
        images = raw_float.reshape([-1, 3, 32, 32]).transpose([0, 2, 3, 1])
        
        # Flatten the images for potential use in algorithms that expect flat vectors
        X = images.reshape((images.shape[0], 3*32*32))
        
        return X, y

    def format_data(self, X, Y, v0, v1):
        # The larger of the two class labels; it is mapped to 1, the other to 0
        lg = max(v0, v1)
        
        # Boolean mask selecting the samples labelled v0 or v1
        mask = (Y == v0) | (Y == v1)
        
        # Keep the selected samples; boolean indexing preserves the 2-D shape of X
        # even when only one sample matches, and avoids dividing by lg when it is 0
        X = X[mask]
        Y = (Y[mask] == lg).astype(float)
        
        return X, Y

    def prepare_data(self, batch_number, start_val, end_val):
        # Load the data using the specified batch number
        x_train, y_train = self.load_data(f"data_batch_{batch_number}")
        
        # If data loading failed, return None values
        if x_train is None or y_train is None:
            return None, None
        
        # Keep only the two classes start_val and end_val and map their labels to 0 and 1
        x_train, y_train = self.format_data(x_train, y_train, start_val, end_val)
        
        return x_train, y_train
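
The reshape and transpose in `load_data` are easier to follow on a tiny synthetic input. The sketch below assumes the layout described on the CIFAR page: each row holds 3072 bytes, with 1024 red values first, then 1024 green, then 1024 blue, each plane in row-major order. The row itself is made up for illustration and is not real CIFAR-10 data.

```python
import numpy as np

# Synthetic stand-in for one CIFAR-10 row: 3072 bytes, channel-major
row = np.concatenate([
    np.full(1024, 255, dtype=np.uint8),  # red plane
    np.full(1024, 128, dtype=np.uint8),  # green plane
    np.zeros(1024, dtype=np.uint8),      # blue plane
])
raw = row[np.newaxis, :]  # shape (1, 3072), like data[b'data'] with one image

# Same steps as load_data: scale to [0, 1], split into channel planes,
# then move the channel axis last so each pixel holds its (R, G, B) triple
scaled = raw.astype(float) / 255.0
images = scaled.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Flatten again; the vector is now pixel-interleaved (R, G, B, R, G, B, ...)
flat = images.reshape(images.shape[0], -1)

print(images.shape)  # (1, 32, 32, 3)
print(flat.shape)    # (1, 3072)
```

After the transpose, `images[i]` can be passed directly to `plt.imshow`, while the flattened vector is what a logistic regression model would consume.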


## Instantiating an Image Handler and Creating the Training Data

In [42]:
handler = ImageDataHandler()
# Class labels kept for the binary task; in CIFAR-10, 0 is airplane and 3 is cat
bloco_start = 0
bloco_end = 3
TRAIN_BATCH = 1
x_train, y_train = handler.prepare_data(TRAIN_BATCH, bloco_start, bloco_end)
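
Before training, it can help to confirm that the filtered data has the expected shape and that both classes are present. The sketch below runs on a small synthetic batch so that it works without the dataset; `summarize_binary_split` is a hypothetical helper, and the selection step only mirrors the idea behind `format_data` for labels 0 and 3.

```python
import numpy as np


def summarize_binary_split(X, y):
    """Hypothetical helper: basic shape and class-balance summary."""
    classes, counts = np.unique(y, return_counts=True)
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "class_counts": dict(zip(classes.tolist(), counts.tolist())),
    }


# Synthetic stand-in for one batch: 6 flattened images with mixed labels
rng = np.random.default_rng(0)
X_all = rng.random((6, 3 * 32 * 32))
y_all = np.array([0, 3, 5, 3, 0, 9])

# Keep labels 0 and 3, then map them to 0 and 1 by dividing by the larger label
keep = (y_all == 0) | (y_all == 3)
X_bin = X_all[keep]
y_bin = np.floor(y_all[keep] / 3)

summary = summarize_binary_split(X_bin, y_bin)
print(summary)
```

With the real data, the same check would be `summarize_binary_split(x_train, y_train)`, after confirming that `prepare_data` did not return `None`.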