CNNs and Object Detection
The Litgo detector service uses a Mask R-CNN model with a ResNet-50-FPN backbone that is fine-tuned on either the TACO or UAVVaste training data. What does this all mean? Let's break it down.
Machine Learning
Artificial intelligence (AI) is the field of study concerned with building machines that are intelligent. Formally, an artificially intelligent machine is an agent that acts rationally in its environment. There are many ways to go about this (building complex probabilistic models of an environment, efficiently searching for optimal outcomes, etc.), but perhaps the most successful is a branch of AI called machine learning.
Machine learning (ML) addresses the fact that many real-world problems involve too many variables for a human to code a solution by hand. It does so by mimicking humans' ability to learn from past experience to complete tasks faster or better. More concretely, given examples of a problem (each an assignment of values to the input variables, called 'features'), the agent uses its learning algorithm to improve its ability to solve the problem in general.
There are three classes of learning algorithms: reinforcement, unsupervised, and supervised. We're interested in the last. Supervised learning algorithms are those whose training data is labeled, meaning each problem example also comes with its expected output (the 'label'), i.e. the solution to the problem.
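To make "labeled training data" concrete, here is a minimal sketch of a supervised learner. The data, the label names, and the `predict` helper are all made up for illustration, and the learner is a simple 1-nearest-neighbor rule, not anything Litgo uses.

```python
import math

# Hypothetical labeled training data: each example pairs input features
# with its label (the known solution to that example).
training_data = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((4.0, 4.2), "large"),
    ((3.8, 4.1), "large"),
]


def predict(features, examples=training_data):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda example: math.dist(example[0], features))
    return nearest[1]


print(predict((1.1, 0.9)))  # small
print(predict((4.1, 3.9)))  # large
```

The learner never sees a rule for what "small" or "large" means. It generalizes to new inputs purely from the labeled examples.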
Neural Networks
Neural networks are a family of machine learning models, loosely inspired by the way the human brain works, that are most often trained with supervised learning. They can be used for a wide range of tasks, such as image and speech recognition, natural language processing, and predictive modeling. They consist of interconnected nodes, or "neurons", arranged in discrete layers that process and transmit information. Each neuron receives a weighted sum of the values from the neurons in the previous layer (usually plus a bias term) and applies to it what is called an 'activation function', a non-linear function that determines how strongly the neuron fires. The weights on the connections between neurons are adjusted through a process called "training" to improve the accuracy of the network's output.
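The weighted sum, activation function, and training step described above can be sketched with a single neuron. This is an illustrative toy under stated assumptions: the sigmoid activation, the logical-AND data set, the learning rate, and the epoch count are choices made for this example, and real networks stack many such neurons and train them with backpropagation.

```python
import math


def sigmoid(z):
    """Non-linear activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Labeled examples for logical AND: (inputs, expected output).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def train(examples, epochs=5000, learning_rate=0.5):
    """Adjust the weights by gradient descent on the cross-entropy loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = neuron(inputs, weights, bias) - target
            # For a sigmoid neuron with cross-entropy loss, the gradient
            # with respect to each weight is error * input.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(examples)
predictions = [round(neuron(inputs, weights, bias)) for inputs, _ in examples]
print(predictions)  # [0, 0, 0, 1]
```

Training starts from weights of zero, so the neuron initially outputs 0.5 for every input. The repeated small corrections are what move the weights toward values that reproduce the labels.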
Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a type of neural network that is used to solve image and video processing problems. They consist of a series of interconnected layers that learn to identify patterns in an image by passing it through a sequence of convolutional and pooling layers.
The convolutional layers slide small filters over the input image, detecting specific features like edges and shapes. The pooling layers reduce the spatial dimensions of the output from the convolutional layers, simplifying the information and allowing for better generalization.
CNNs are able to learn and identify complex patterns in images and have been used for applications such as object recognition, image segmentation, and facial recognition.
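As a rough sketch of what those two layer types compute, the example below applies one filter to a tiny grayscale image and then max-pools the result. The image and the vertical-edge filter are hand-picked for illustration; in a real CNN the filter values are learned during training, and the function names here are not from any library.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped, so this
    is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + r][j + c] * kernel[r][c]
                for r in range(kh) for c in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + r][j * size + c]
                for r in range(size) for c in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked filter that responds to a dark-to-bright vertical edge.
edge_filter = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, edge_filter)
pooled = max_pool2d(feature_map)
print(feature_map)  # four rows of [0, 2, 0, 0]
print(pooled)       # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the edge sits, and pooling halves each spatial dimension while keeping that response.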
Mask R-CNN
To understand Mask R-CNN and its predecessors, check out this Medium article.