### Chapter 9

This chapter covers

* Important takeaways from this book
* The limitations of deep learning
* The future of deep learning, machine learning, and AI
* Resources for learning further and working in the field

**Machine learning** is a specific subfield of AI that aims at automatically developing
programs (called **models**) purely from exposure to training data. This process of turning data into a program is called **learning**. Although machine learning has been
around for a long time, it only started to take off in the 1990s.


**Deep learning** is one of many branches of machine learning, where the models are
long chains of geometric functions, applied one after the other. These operations are
structured into modules called **layers**: deep-learning models are typically stacks of layers or, more generally, graphs of layers. These layers are parameterized by **weights**,
which are the parameters learned during training. The knowledge of a model is stored
in its weights, and the process of **learning** consists of finding good values for these
weights.
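
As a minimal sketch of what finding good values for the weights can look like, the example below fits a single one-input layer `y = w * x + b` with plain gradient descent. The toy data, the learning rate, and the names `predict` and `train` are assumptions made for illustration; a real deep-learning model chains many such layers and relies on a framework to compute the gradients.

```python
def predict(x, w, b):
    """A single layer with one input and one output: y = w * x + b."""
    return w * x + b


def train(data, steps=500, learning_rate=0.05):
    """Find good values for the weights (w, b) by gradient descent.

    The loss is the mean squared error between predictions and targets.
    """
    w, b = 0.0, 0.0  # the model starts out knowing nothing
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in data) / n
        # Move each weight a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (an assumption for this sketch).
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train(data)
print(w, b)
```

After training, `w` and `b` end up close to 2 and 1, the values used to generate the toy data. Everything this small model has learned is held in those two numbers.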

Even though deep learning is just one among many approaches to machine learning, it isn’t on an equal footing with the others. Deep learning is a breakout success.
Here’s why. 

In the span of only a few years, deep learning has achieved tremendous breakthroughs across a wide range of tasks that have been historically perceived as
extremely difficult for computers, especially in the area of machine perception: extracting useful information from images, videos, sound, and more. Given sufficient
training data (in particular, training data appropriately labeled by humans), it’s possible to extract from perceptual data almost anything that a human could extract.
Hence, it’s sometimes said that deep learning has **solved perception**, although that’s true
only for a fairly narrow definition of **perception**.

###  Key network architectures

The three families of network architectures that we should be familiar with are densely connected networks, convolutional networks, and recurrent networks.

Each type of network is
meant for a specific input modality. A network architecture (dense, convolutional,
recurrent) encodes **assumptions** about the structure of the data: a **hypothesis space** within
which the search for a good model will proceed. Whether a given architecture will
work on a given problem depends entirely on the match between the structure of the
data and the assumptions of the network architecture.
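
To make the idea of built-in assumptions concrete, here is a small sketch in plain Python (no deep-learning library) that contrasts a 1D convolution with a dense layer. The helper names and the toy signal are assumptions for illustration. The convolution reuses one small kernel at every position, so it assumes that a pattern means the same thing wherever it occurs. A dense layer makes no such assumption and needs a separate weight for every input position.

```python
def conv1d(sequence, kernel):
    """Slide one shared kernel along the sequence (no padding, stride 1).

    As in most deep-learning libraries, the kernel is not flipped.
    """
    k = len(kernel)
    return [sum(kernel[j] * sequence[i + j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def dense(inputs, weights):
    """Each output unit has its own weight for every input position."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def conv_as_dense_weights(kernel, input_length):
    """Build the dense weight matrix that reproduces conv1d exactly."""
    k = len(kernel)
    rows = []
    for i in range(input_length - k + 1):
        row = [0.0] * input_length
        row[i:i + k] = kernel  # same kernel values, shifted one step per row
        rows.append(row)
    return rows


kernel = [1.0, -1.0]                       # hypothetical two-weight kernel
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # same pattern, one step later

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

# Shifting the input shifts the output: the response itself is unchanged.
print(out_shifted[1:] == out[:-1])

# The convolution is a dense layer restricted to shared, local weights.
dense_weights = conv_as_dense_weights(kernel, len(signal))
print(dense(signal, dense_weights) == out)

conv_params = len(kernel)                                     # 2 weights
dense_params = sum(len(row) for row in dense_weights)         # 30 weights
print(conv_params, dense_params)
```

The convolution searches a much smaller hypothesis space (2 weights instead of 30 here). That is an advantage when the data really has this repeated local structure, and a handicap when it does not.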
 
These different network types can easily be combined to achieve larger multimodal networks, much as we combine LEGO bricks. In a way, deep-learning layers are
LEGO bricks for information processing. Here’s a quick overview of the mapping
between input modalities and appropriate network architectures:

* `Vector data`—Densely connected network (Dense layers).
* `Image data`—2D convnets.
* `Sound data` (for example, waveform)—Either 1D convnets (preferred) or RNNs.
* `Text data`—Either 1D convnets (preferred) or RNNs.
* `Timeseries data`—Either RNNs (preferred) or 1D convnets.
* `Other types of sequence data`—Either RNNs or 1D convnets. Prefer RNNs if data
ordering is strongly meaningful (for example, for timeseries, but not for text).
* `Video data`—Either 3D convnets (if we need to capture motion effects) or a
combination of a frame-level 2D convnet for feature extraction followed by
either an RNN or a 1D convnet to process the resulting sequences.
* `Volumetric data`—3D convnets.
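
The mapping above can also be written down as a small lookup table. This is only an illustrative sketch: the dictionary, its keys, and the `suggest_architectures` helper are hypothetical names, not part of any library, and the entries simply restate the list.

```python
# Hypothetical lookup table restating the list above.
# Each entry is (candidate architectures, preferred one). The preferred slot
# is None when the list names several candidates without ranking them.
ARCHITECTURES_BY_MODALITY = {
    "vector": (["densely connected network"], "densely connected network"),
    "image": (["2D convnet"], "2D convnet"),
    "sound": (["1D convnet", "RNN"], "1D convnet"),
    "text": (["1D convnet", "RNN"], "1D convnet"),
    "timeseries": (["RNN", "1D convnet"], "RNN"),
    # For other sequences, prefer an RNN if ordering is strongly meaningful.
    "sequence": (["RNN", "1D convnet"], None),
    "video": (
        ["3D convnet", "frame-level 2D convnet + RNN or 1D convnet"],
        None,
    ),
    "volumetric": (["3D convnet"], "3D convnet"),
}


def suggest_architectures(modality):
    """Return the candidate architectures for an input modality."""
    key = modality.strip().lower()
    if key not in ARCHITECTURES_BY_MODALITY:
        known = ", ".join(sorted(ARCHITECTURES_BY_MODALITY))
        raise ValueError(f"Unknown modality {modality!r}; expected one of: {known}")
    options, preferred = ARCHITECTURES_BY_MODALITY[key]
    return {"options": options, "preferred": preferred}


print(suggest_architectures("timeseries"))
print(suggest_architectures("video"))
```

A table like this is a starting point only. Whether an architecture works still depends on how well its assumptions match the structure of the data at hand.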

###  The space of possibilities

* Mapping vector data to vector data
    * `Predictive healthcare`—Mapping patient medical records to predictions of
    patient outcomes
    * `Behavioral targeting`—Mapping a set of website attributes to data on how
    long a user will spend on the website
    * `Product quality control`—Mapping a set of attributes of an instance of a
    manufactured product to the probability that the product will fail by next
    year


* Mapping image data to vector data
    * `Doctor assistant`—Mapping slides of medical images to a prediction about
    the presence of a tumor
    * `Self-driving vehicle`—Mapping car dash-cam video frames to steering wheel
    angle commands
    * `Board game AI`—Mapping Go and chess boards to the next player move
    * `Diet helper`—Mapping pictures of a dish to its calorie count
    * `Age prediction`—Mapping selfies to the age of the person


* Mapping timeseries data to vector data
    * `Weather prediction`—Mapping timeseries of weather data in a grid of locations
    to weather data for the following week at a specific location
    * `Brain-computer interfaces`—Mapping timeseries of magnetoencephalogram
    (MEG) data to computer commands
    * `Behavioral targeting`—Mapping timeseries of user interactions on a website to
    the probability that a user will buy something

* Mapping text to text
    * `Smart reply`—Mapping emails to possible one-line replies
    * `Answering questions`—Mapping general-knowledge questions to answers
    * `Summarization`—Mapping a long article to a short summary of the article

* Mapping images to text
    * `Captioning`—Mapping images to short captions describing the contents of
    the images

* Mapping text to images
    * `Conditioned image generation`—Mapping a short text description to images
    matching the description
    * `Logo generation/selection`—Mapping the name and description of a company
    to the company’s logo

* Mapping images to images
    * `Super-resolution`—Mapping downsized images to higher-resolution versions of
    the same images
    * `Visual depth sensing`—Mapping images of indoor environments to maps of
    depth predictions

* Mapping images and text to text
    * `Visual QA`—Mapping images and natural-language questions about the contents of images to natural-language answers

* Mapping video and text to text
    * `Video QA`—Mapping short videos and natural-language questions about the
    contents of videos to natural-language answers