# Welcome to the Computer Vision subteam, we're excited to have you!

Computer vision is one of the most rapidly growing fields of engineering and computer science, so there's plenty to explore and learn.

First and foremost, however, what is computer vision?

Wikipedia says, "Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do." 

Personally, I'd define it as the process of designing *quantitative* algorithms for computers to match our *qualitative* understanding of the world. For example, I know that the image below is a coffee cup sitting on a table, and I know how far I'd need to move my hand to grab the cup. But how do we design an algorithm so that a computer can understand the same thing?


<img src="images/coffee_cup.png" alt="image description" width="300">

For a quick overview of the field, check out [this video from Nvidia](https://www.youtube.com/watch?v=OnTgbN3uXvw).

# Goal of this Notebook

Through this notebook, you'll get an introduction to some of the main software libraries we use, as well as some key concepts.

The first section of this notebook features a range of functions commonly used in image processing. The second section gives you space to play around with these functions.

As a first step, we'll import our necessary libraries:

In [40]:
import cv2
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image

* cv2: OpenCV, a comprehensive computer vision library that's the core of many Python computer vision pipelines.
* NumPy: An array library that provides essential matrix operations and data structures. Typically, images will be converted to NumPy arrays while being processed.
* torch: PyTorch, a machine learning library that provides functions for creating, training, loading, and deploying various ML models.
* PIL: Python Imaging Library (installed through the Pillow package), which provides some useful functions for loading and displaying images as we work with them.
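
To make the "images are NumPy arrays" idea concrete, here is a minimal sketch that builds a tiny image by hand instead of loading one. The names `tiny_img` and `rgb_img` are made up for illustration. It assumes OpenCV's usual layout of (height, width, channels) with channels in BGR order.

```python
import numpy as np

# A tiny 2x3 color image built by hand: shape is (height, width, channels).
# OpenCV stores channels in BGR order, so index 0 is blue and index 2 is red.
tiny_img = np.zeros((2, 3, 3), dtype=np.uint8)
tiny_img[0, 0] = (255, 0, 0)  # top-left pixel: pure blue in BGR
tiny_img[1, 2] = (0, 0, 255)  # bottom-right pixel: pure red in BGR

height, width, channels = tiny_img.shape
print(height, width, channels)
print(tiny_img.dtype)

# Reversing the last axis swaps BGR to RGB, the channel order of the
# PIL-loaded images used later in this notebook.
rgb_img = tiny_img[:, :, ::-1]
print(rgb_img[0, 0])
```

Keeping track of which channel order an array is in saves a lot of confusion when mixing OpenCV with PIL or PyTorch.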

First things first, let's look at some of the functionality available within OpenCV. Below are several functions that highlight some common tasks you might need OpenCV for. If you're not sure what a function is doing, it's good to get into the habit of looking up its documentation to double check.

For many of the functions below, certain options have been pre-selected. Try to look up the different options that these functions have, and when you'd use a different one. Later on in the notebook, feel free to modify these functions to get some hands-on practice!

In [41]:
def load_image(image_path):
    """
    Load an image from a specified path.
    """
    img = cv2.imread(image_path) # OpenCV function for reading an image as a numpy array
    if img is None: 
        print(f"Error: Image at {image_path} not found")
    return img

def display_image(window_name, image):
    """
    Display a specified image.
    """
    cv2.imshow(window_name, image) # Display image
    cv2.waitKey(0) # Image will be displayed until any key is pressed
    cv2.destroyAllWindows() # Stop displaying the image

def convert_to_grayscale(image):
    """
    Converts image to grayscale.
    """
    gray_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # OpenCV has many functions for converting between different colorspaces.
    # In this case, COLOR_BGR2GRAY is a conversion from a BGR image to a grayscale image.
    # In practice, we'll just call the OpenCV function directly rather than wrapping it in a dedicated function like this.
    return gray_img

def binary_segmentation(image, threshold_value=127):
    """
    Apply a binary segmentation to an image based on an input threshold.
    """
    max_pixel_val = 255

    _, thresh_img = cv2.threshold(image, threshold_value, max_pixel_val, cv2.THRESH_BINARY)
    # cv2.threshold() supports other thresholding algorithms besides binary. If you're curious, look up
    # the function's documentation and try out the other algorithms. When might we use the other types?
    return thresh_img

def edge_detection(image):
    """
    Detect edges in an image w/ Canny edge detection.
    """

    edges = cv2.Canny(image, 100, 200)
    # Question: What do the values 100 and 200 represent? To get some practice with documentation,
    # check out OpenCV's documentation for this function to find out!
    return edges


# For the next two functions, look at their documentation as well. Why might it be useful to extract the contours of an image?

def find_contours(image):
    """
    Find the contours in the image.
    """
    contours, _ = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    
    return contours

def draw_contours(image, contours):
    """
    Draw the detected contours on the image.
    """

    contour_img = image.copy()
    cv2.drawContours(contour_img, contours, -1, (0, 255, 0), 2)

    return contour_img
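
If you're curious what `convert_to_grayscale` and `binary_segmentation` are doing to the pixel values, here is a rough NumPy-only sketch. The helper names `grayscale_sketch` and `binary_threshold_sketch` are hypothetical. The grayscale weights are the standard luma weights that OpenCV documents for `COLOR_BGR2GRAY`, but OpenCV's internal rounding may differ slightly, so treat this as an illustration rather than a drop-in replacement.

```python
import numpy as np

def grayscale_sketch(bgr_image):
    """Weighted sum of the B, G, R channels (weights 0.114, 0.587, 0.299)."""
    b = bgr_image[:, :, 0].astype(np.float64)
    g = bgr_image[:, :, 1].astype(np.float64)
    r = bgr_image[:, :, 2].astype(np.float64)
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.round(gray).astype(np.uint8)

def binary_threshold_sketch(gray_image, threshold_value=127, max_pixel_val=255):
    """Pixels strictly above the threshold become max_pixel_val, all others 0."""
    return np.where(gray_image > threshold_value, max_pixel_val, 0).astype(np.uint8)

# Four BGR pixels: black, white, pure red, pure green.
demo = np.array([[[0, 0, 0], [255, 255, 255]],
                 [[0, 0, 255], [0, 255, 0]]], dtype=np.uint8)

gray = grayscale_sketch(demo)
mask = binary_threshold_sketch(gray)
print(gray)
print(mask)
```

Notice that pure red ends up below the default threshold of 127 while pure green ends up above it, because green carries the largest weight in the grayscale conversion.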

# PyTorch Functions

Below are some functions for common tasks we'd use PyTorch for, such as loading a pre-trained model and getting outputs from it. We won't cover how to actually create and train a model in this notebook since it's a bit out of scope, but feel free to ask questions! Later on, you'll get experience with the model creation process.

In [42]:
def load_pretrained_model(model_name):
    """
    Load a pre-trained PyTorch model from torchvision.
    Try out these three: 'resnet18', 'vgg16', 'densenet121'

    If you wish, you can look up each of these models online and see what they're all about.
    WARNING: may contain research papers
    """
    
    if model_name == "resnet18":
        model = models.resnet18(pretrained=True)

    elif model_name == "vgg16":
        model = models.vgg16(pretrained=True)

    elif model_name == "densenet121":
        model = models.densenet121(pretrained=True)

    else:
        raise ValueError(f"Model {model_name} not supported.")
    
    model.eval()  # Set model to evaluation mode instead of train mode

    return model



def preprocess_image(image_path):
    """
    When working with neural nets and other ML models, they'll typically have a certain
    format for data they can accept. You don't need to get too acquainted with the exact values
    we are choosing during this step for now, just know that preprocessing is important.
    """

    input_img = Image.open(image_path).convert("RGB") # Use PIL to read an image in the RGB colorspace

    # Next we'll package together all of the preprocessing steps we want to execute using the Compose function
    preprocess = transforms.Compose([
                transforms.Resize(256), # Resize the shorter side to 256 pixels and scale the other side to keep the aspect ratio
                transforms.CenterCrop(224), # Crop the image into a 224x224 square
                transforms.ToTensor(), # Convert image to a Tensor. Think of this as an array that models work with
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) # For each channel of the input image, normalize values
    ])

    input_tensor = preprocess(input_img) # Apply our preprocessing to the input image
    input_batch = input_tensor.unsqueeze(0) # Add a batch dimension: 3x224x224 becomes 1x3x224x224, since models expect a batch of images

    return input_batch



def load_imagenet_labels():
    """
    For this notebook, we'll be using models trained on ImageNet, a dataset that features 1000 different classes.

    To get a class name for our model outputs as opposed to just the number assigned to the class, we'll
    need to load the ImageNet labels from online.

    If you copy the URL below into a web browser, you can preview exactly what ImageNet models can classify.
    """

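    # No need to worry about the exact syntax here; this just downloads the list of labels
    labels_url = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
    import urllib.request # Import the submodule explicitly; a plain "import urllib" does not guarantee urllib.request is loaded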
    import json
    with urllib.request.urlopen(labels_url) as url:
        labels = json.loads(url.read().decode())
    return labels



def classify_image(model, input_batch, labels, top_pred=5):
    """
    Perform classification on an input image using a pre-trained model and print the top predictions.
    """

    with torch.no_grad(): # Disable gradient tracking, which saves memory and time since we aren't training
        output = model(input_batch) # Get classification from our model
    
    probs = torch.nn.functional.softmax(output[0], dim=0) # Convert model output into probabilities

    top_probs, top_indices = torch.topk(probs, top_pred)

    for i in range(top_pred): # Print out our top predictions
        label = labels[top_indices[i]]
        prob = top_probs[i].item()
        print(f"{label}: {prob * 100:.2f}%")
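
To demystify the `ToTensor`, `Normalize`, and `unsqueeze(0)` steps in `preprocess_image`, here is a rough NumPy-only sketch of the arithmetic. The helper name `to_normalized_batch` and the solid gray test image are made up for illustration. It skips the resize and crop steps and assumes the input is already a 224x224 RGB array.

```python
import numpy as np

# Same per-channel values as the Normalize call above (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_normalized_batch(rgb_image):
    """Mimic ToTensor + Normalize + unsqueeze(0) on an HxWx3 uint8 RGB array."""
    scaled = rgb_image.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    chw = scaled.transpose(2, 0, 1)                # reorder HxWxC to CxHxW
    # Subtract the mean and divide by the std, one value per channel.
    normalized = (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
    return normalized[np.newaxis, ...]             # add a leading batch dimension

fake_img = np.full((224, 224, 3), 128, dtype=np.uint8)  # solid mid-gray image
batch = to_normalized_batch(fake_img)
print(batch.shape)
print(batch[0, :, 0, 0])  # normalized value of one pixel in each channel
```

The leading dimension of size 1 is the batch: the model processes a stack of images at once, even when the stack holds only one.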

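The last few lines of `classify_image` turn raw model scores into ranked probabilities. Here is a small NumPy-only sketch of that idea using invented scores and a toy label list. The names `softmax_sketch`, `top_k_sketch`, and `toy_labels` are hypothetical, and this illustrates the math rather than how PyTorch implements it.

```python
import numpy as np

def softmax_sketch(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k_sketch(probs, k):
    """Return the k largest probabilities and their indices, largest first."""
    top_indices = np.argsort(probs)[::-1][:k]
    return probs[top_indices], top_indices

logits = np.array([2.0, 0.5, -1.0, 3.0])  # made-up raw model outputs
toy_labels = ["mug", "table", "spoon", "coffee cup"]

probs = softmax_sketch(logits)
top_probs, top_indices = top_k_sketch(probs, k=2)

for p, i in zip(top_probs, top_indices):
    print(f"{toy_labels[i]}: {p * 100:.2f}%")
```

The class with the highest raw score always ends up with the highest probability; softmax only rescales the scores so they can be read as percentages.
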
# Experimentation Time!!

Now that we've covered a few functions above, let's spend some time trying to process an image of our own. This part of the notebook is pretty open-ended, so feel free to download your own images to try out some of the processing above.

To get things started, I'd recommend trying to find an image with an object distinct from its background. Try to segment this object, find only the contours of the segmented portion, and then overlay these contours onto the original image!

Once you've done that, try classifying the image, and then overlaying the predicted class label on top of the image with the contours. To draw text over an image, check out the [cv2.putText() function](https://www.geeksforgeeks.org/python-opencv-cv2-puttext-method/).

In [43]:
# Computer vision time woop woop!

# Try out with the provided coffee_cup image, or find your own!