After a long day of work, when I go grocery shopping, my eyes feel bombarded with information. Sales, coupons, colors, toddlers, flashing lights, and crowded aisles are just a few of the signals my eyes pick up, whether or not I actively try to pay attention. The visual system absorbs an abundance of information.
Ever heard the phrase "a picture is worth a thousand words"? That might be true for you or me, but can a machine find meaning within images as well? The photoreceptor cells in our retinas pick up wavelengths of light, yet that raw information doesn't seem to propagate up to our consciousness; we perceive objects and scenes, not wavelengths. Similarly, a camera picks up pixels, but we want to squeeze some form of higher-level knowledge out of them instead.
To extract intelligent meaning from raw sensory input with machine learning, we'll design a neural network model. In the previous chapters, we've seen a few types of neural network models, such as fully connected ones and autoencoders. In this chapter, we'll meet another type of model called a convolutional neural network (CNN), which performs exceptionally well on images and other sensory data such as audio.
- Concept 1: Using the CIFAR-10 dataset
- Concept 2: Convolutions
- Concept 3: Convolutional neural network
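Before diving into the concepts above, it may help to see the core operation a CNN is built on. The sketch below is a minimal, from-scratch 2D convolution in NumPy (with "valid" padding and no flipping of the kernel, as is conventional in deep learning); the function name `conv2d` and the averaging kernel are illustrative choices, not part of any particular library's API:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image, taking a dot product at each
    position ("valid" padding: the output is smaller than the input)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise multiply the patch by the kernel and sum
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0  # a 2x2 averaging (blur) filter
result = conv2d(image, kernel)
print(result.shape)  # a 4x4 image convolved with a 2x2 kernel yields 3x3
```

A CNN learns the values of many such kernels from data instead of hand-designing them, which is what makes it effective on images.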