### Definition

<figure>
<figcaption><h4>Artificial intelligence,
machine learning, and deep learning</h4></figcaption>
<img src = "img/01_01.png">
</figure>

#### Artificial Intelligence

A concise definition of AI would be as follows: *the effort to automate intellectual tasks normally performed by humans.* For a fairly long time, many
experts believed that human-level artificial intelligence could be achieved by having
programmers handcraft a sufficiently large set of explicit rules for manipulating
knowledge. This approach is known as *symbolic AI*, and it was the dominant paradigm
in AI from the 1950s to the late 1980s. It reached its peak popularity during the expert
systems boom of the 1980s.<br>
Although *symbolic AI* proved suitable for solving well-defined, logical problems such as
playing chess, it turned out to be intractable to figure out explicit rules for more
complex, fuzzy problems such as image classification, speech recognition, and language translation. A new approach arose to take *symbolic AI*’s place: machine learning.

#### Machine Learning

Machine learning arises from this question: could a computer go beyond “what we
know how to order it to perform” and learn on its own how to perform a specified task? With
machine learning, humans input data as well as the answers expected from the data,
and out come the rules. These rules can then be applied to new data to produce original answers.
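
To make the contrast concrete, here is a minimal sketch. It assumes a toy task in which each input is a single number and the expected answers are the hypothetical labels `"positive"` and `"negative"`. Instead of a programmer hand-writing the threshold, the hypothetical helper `learn_threshold` derives it from input/answer pairs, and the learned rule is then applied to new data.

```python
from typing import List


def learn_threshold(inputs: List[float], answers: List[str]) -> float:
    """Derive a rule of the form 'positive if x > t' from examples.

    Every input value is tried as the threshold t; the one that
    reproduces the most expected answers is kept.
    """
    best_t, best_correct = inputs[0], -1
    for t in sorted(inputs):
        correct = sum(
            ("positive" if x > t else "negative") == y
            for x, y in zip(inputs, answers)
        )
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def apply_rule(t: float, x: float) -> str:
    """Use the learned rule on a new data point."""
    return "positive" if x > t else "negative"


# Data, plus the answers expected from the data...
inputs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
answers = ["negative", "negative", "negative", "positive", "positive", "positive"]

# ...and out comes the rule.
t = learn_threshold(inputs, answers)
print(t, apply_rule(t, 4.5), apply_rule(t, 0.5))
```

With these six examples the search settles on `t = 3.0`, so the new input `4.5` is labeled `"positive"` and `0.5` is labeled `"negative"`.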

<figure>
<figcaption><h4>Machine learning:
a new programming paradigm</h4></figcaption>
<img src = "img/01_02.png">
</figure>

>A machine-learning system is trained rather than explicitly programmed. It’s presented
with many examples relevant to a task, and it finds statistical structure in these examples that eventually allows the system to come up with rules for automating the task.

To do machine learning, we need three things:
* *Input data points*—For instance, if the task is speech recognition, these data
points could be sound files of people speaking. If the task is image tagging,
they could be pictures.
* *Examples of the expected output*—In a speech-recognition task, these could be
human-generated transcripts of sound files. In an image task, expected outputs
could be tags such as “dog,” “cat,” and so on.
* *A way to measure whether the algorithm is doing a good job*—This is necessary in
order to determine the distance between the algorithm’s current output and
its expected output. The measurement is used as a feedback signal to adjust
the way the algorithm works. This adjustment step is what we call learning.
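
The three ingredients can be lined up in a few lines of Python. This is only a sketch: the file names, the tags, and the deliberately naive `naive_tagger` are hypothetical stand-ins, and accuracy is just one possible way to measure whether the algorithm is doing a good job.

```python
from typing import List

# 1. Input data points (hypothetical file names standing in for pictures).
inputs = ["img_001.png", "img_002.png", "img_003.png", "img_004.png"]

# 2. Examples of the expected output: human-provided tags for each picture.
expected = ["dog", "cat", "dog", "cat"]


# 3. A way to measure whether the algorithm is doing a good job.
def accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of predictions that match the expected output."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


def naive_tagger(path: str) -> str:
    """A hypothetical, untrained algorithm that ignores its input."""
    return "dog"


predicted = [naive_tagger(p) for p in inputs]
score = accuracy(predicted, expected)
print(score)  # 0.5: half of the tags match
```

A learning procedure would treat `score` as its feedback signal and change the algorithm in whatever way raises it.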

Machine-learning models are all about finding appropriate representations for their input data—transformations of the data that make it more amenable to the task at hand, such as a classification task.

<figure>
<figcaption><h4>Coordinate Change</h4></figcaption>
<img src = "img/01_03.png">
</figure>

The figure above shows how a change of coordinates lets us classify the black and white points. With this representation, the
black/white classification problem can be expressed as a simple rule: “Black points
are such that $x > 0$,” or “White points are such that $x < 0$.” This new representation
basically solves the classification problem.
<br>In this case, we defined the coordinate change by hand. But if instead we tried systematically searching for different possible coordinate changes, and used as feedback
the percentage of points being correctly classified, then we would be doing machine
learning.
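
The systematic search described above can be sketched as follows. The points, the choice of rotations as the set of candidate coordinate changes, and the one-degree step are all assumptions made for illustration; the feedback signal is the fraction of correctly classified points, as in the text.

```python
import math

# Hypothetical 2-D points. In the original coordinates the rule
# "black points are such that x > 0" gets two of them wrong.
black = [(-1.0, 3.0), (3.0, -1.0), (1.0, 1.0)]
white = [(1.0, -3.0), (-3.0, 1.0), (-1.0, -1.0)]


def new_x(point, theta):
    """x-coordinate of `point` after rotating the axes by `theta` radians."""
    x, y = point
    return x * math.cos(theta) + y * math.sin(theta)


def fraction_correct(theta):
    """Feedback signal: share of points classified correctly by the rule
    'black if the new x > 0, white if the new x < 0'."""
    hits = sum(new_x(p, theta) > 0 for p in black)
    hits += sum(new_x(p, theta) < 0 for p in white)
    return hits / (len(black) + len(white))


# Candidate coordinate changes: rotations in whole degrees.
# max() keeps the first angle with the best feedback score.
best_deg = max(range(360), key=lambda d: fraction_correct(math.radians(d)))
print(best_deg, fraction_correct(math.radians(best_deg)))
```

With `theta = 0` (no change of coordinates) two of the six points are misclassified, whereas the search returns a rotation under which every point is classified correctly. Note that the search can only find rotations, because those are the only transformations it was given.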

>Machine-learning algorithms aren’t usually creative in finding these transformations; they’re merely searching through a predefined set of
operations, called a hypothesis space.<br>
So that’s what machine learning is, technically: searching for useful representations of some input data, within a predefined space of possibilities, using guidance
from a feedback signal. 

#### Deep Learning

Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly
meaningful representations. The *deep* in *deep learning* isn’t a reference to any kind of
deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations. How many layers contribute to a model of the data is
called the *depth* of the model. Other appropriate names for the field could have been
*layered representations learning* and *hierarchical representations learning*.

In deep learning, these layered representations are (almost always) learned via
models called *neural networks*, structured in literal layers stacked on top of each other.
The term *neural network* is a reference to neurobiology, but although some of the central concepts in deep learning were developed in part by drawing inspiration from our
understanding of the brain, deep-learning models are **not** models of the brain.
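
The idea of successive layers can be sketched as a pipeline of functions, each consuming the previous layer's output. The three layers below are hand-written, hypothetical transformations rather than learned ones; the point is only that a model is a stack of layers, each producing a new representation, and that its depth is the number of layers in the stack.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward(layers: List[Layer], x: Vector) -> List[Vector]:
    """Feed x through each layer in turn, keeping every intermediate
    representation (the input itself is representation 0)."""
    representations = [x]
    for layer in layers:
        x = layer(x)
        representations.append(x)
    return representations


# Three hypothetical, hand-written layers standing in for learned ones.
layers: List[Layer] = [
    lambda v: [a - b for a, b in zip(v, v[1:])],  # differences between neighbours
    lambda v: [max(0.0, a) for a in v],           # keep positive values only
    lambda v: [sum(v)],                           # summarize into one number
]

reps = forward(layers, [1.0, 3.0, 2.0, 5.0])
depth = len(layers)
for i, r in enumerate(reps):
    print(i, r)
print("depth:", depth)
```

Each printed line is one representation of the same input; a real network would learn the transformations instead of having them written by hand.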

<figure>
<figcaption><h4>Deep representations learned by a digit-classification model</h4></figcaption>
<img src = "img/01_04.png">
</figure>

As you can see in the figure above, the network transforms the digit image into representations that are increasingly different from the original image and increasingly informative about the final result. You can think of a deep network as a multistage
information-distillation operation, where information goes through successive filters
and comes out increasingly *purified*.

### How Does Deep Learning Work?

#### 1. Weights

The specification of what a layer does to its input data is stored in the layer’s
*weights*, which in essence are a bunch of numbers. In technical terms, we’d say that the
transformation implemented by a layer is *parameterized* by its weights.
(Weights are also sometimes called the *parameters* of a layer.) In this context, learning
means finding a set of values for the weights of all layers in a network, such that the
network will correctly map example inputs to their associated targets.
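
Here is a minimal sketch of a layer whose behavior lives entirely in its weights. It assumes a fully connected layer computing `relu(W·x + b)`, a common layer type chosen here for illustration. The `Dense` class is a plain-Python stand-in, not a library API, and its weights start out random, as in an untrained network.

```python
import random
from typing import List


class Dense:
    """Fully connected layer: output_j = max(0, sum_i W[j][i] * x[i] + b[j]).

    The transformation is parameterized by W and b: change the numbers
    and the layer computes a different function.
    """

    def __init__(self, n_in: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random starting values, as in a network that has not been trained yet.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: List[float]) -> List[float]:
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(self.W, self.b)
        ]

    def num_params(self) -> int:
        """Number of weights: entries of W plus entries of b."""
        return sum(len(row) for row in self.W) + len(self.b)


network = [Dense(4, 3, seed=1), Dense(3, 2, seed=2)]

x = [0.5, -1.0, 2.0, 0.0]
for layer in network:
    x = layer(x)

total = sum(layer.num_params() for layer in network)
print(x, total)  # total = (4*3 + 3) + (3*2 + 2) = 23
```

In this picture, learning means searching for values of these 23 numbers that make the network map example inputs to their targets.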

<figure>
<figcaption><h4>A neural network is
parameterized by its weights</h4></figcaption>
<img src = "img/01_05.png">
</figure>

#### 2. Loss function

To control the output of
a neural network, you need to be able to measure how far this output is from what you
expected. This is the job of the *loss function* of the network, also called the *objective
function*. The loss function takes the predictions of the network and the true target
(what you wanted the network to output) and computes a distance score, capturing
how well the network has done on this specific example.
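
The text does not fix a particular loss, so two common choices are sketched below as an assumption. Mean squared error suits networks that predict numbers, and categorical cross-entropy suits networks that output one probability per class. In both cases a lower score means the prediction is closer to the target.

```python
import math
from typing import List


def mean_squared_error(predictions: List[float], targets: List[float]) -> float:
    """Average squared distance between each prediction and its target."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def categorical_crossentropy(probabilities: List[float], true_class: int) -> float:
    """Negative log of the probability the network gave to the true class."""
    return -math.log(probabilities[true_class])


print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))  # 0.25

probs = [0.1, 0.8, 0.1]                     # network output for one example
good = categorical_crossentropy(probs, 1)  # true class got 0.8: low loss
bad = categorical_crossentropy(probs, 0)   # true class got 0.1: high loss
print(good < bad)  # True
```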

<figure>
<figcaption><h4>A loss function measures
the quality of the network’s output</h4></figcaption>
<img src = "img/01_06.png">
</figure>

#### 3. Optimizer

The fundamental trick in deep learning is to use this score as a feedback signal to
adjust the value of the weights a little, in a direction that will lower the loss score for
the current example. This adjustment is the job of the *optimizer*, which relies on the
*backpropagation* algorithm, the central algorithm in deep learning, to compute how the loss changes with respect to each weight.
Initially, the weights of the network are assigned random values, so the network
merely implements a series of random transformations. Naturally, its output is far
from what it should ideally be, and the loss score is accordingly very high. But with
every example the network processes, the weights are adjusted a little in the correct
direction, and the loss score decreases.
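
The loop can be illustrated on the smallest possible "network": a single linear unit `w * x + b` trained with a squared-error loss. This is a hedged sketch, assuming plain stochastic gradient descent with a hand-picked learning rate of `0.05`. With a single unit the gradients can be written out by hand; in a deep network, backpropagation computes them layer by layer. The training data is hypothetical, generated from `y = 2x + 1`.

```python
import random

# Hypothetical training data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

rng = random.Random(42)
w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random initial weights
learning_rate = 0.05  # assumed step size


def loss(w: float, b: float) -> float:
    """Mean squared error of the model w * x + b over the whole dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


history = [loss(w, b)]
for epoch in range(200):
    for x, y in data:
        error = (w * x + b) - y  # prediction minus target
        # Gradients of error ** 2 with respect to w and b.
        grad_w = 2.0 * error * x
        grad_b = 2.0 * error
        # Nudge each weight a little in the direction that lowers the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    history.append(loss(w, b))

print(round(w, 3), round(b, 3), history[0], history[-1])
```

Because the weights start at random values, `history[0]` is well above zero; after many small steps `w` and `b` approach `2` and `1`, and the loss approaches zero.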

<figure>
<figcaption><h4>The loss score is used as a
feedback signal to adjust the weights</h4></figcaption>
<img src = "img/01_07.png">
</figure>

### Will Deep Learning Be Short-Lived?

Twice in the past, AI went through a cycle of intense
optimism followed by disappointment and skepticism, with a dearth of funding as a
result. It started with symbolic AI in the 1960s. In those early days, projections about AI
were flying high. In the 1960s and early 1970s, several experts believed
human-level AI to be right around the corner. A few years later, as these
high expectations failed to materialize, researchers and government funding turned
away from the field, marking the start of the first AI winter.

In the 1980s, a new take on symbolic AI, expert systems,
started gathering steam among large companies. A few initial success stories triggered
a wave of investment, with corporations around the world starting their own in-house
AI departments to develop expert systems. Around 1985, companies were spending
over \$1 billion each year on the technology; but by the early 1990s, these systems had
proven expensive to maintain, difficult to scale, and limited in scope, and interest
died down. Thus began the second AI winter.

This time, however, things are different. Several developments suggest that the current wave of AI is going to be a long-term success:
* Hardware
* Datasets and benchmarks
* Algorithmic advances

>Deep learning has reached a level of public attention and industry investment never
before seen in the history of AI, but it isn’t the first successful form of machine learning. It’s safe to say that most of the machine-learning algorithms used in the industry
today aren’t deep-learning algorithms. Deep learning isn’t always the right tool for the
job: sometimes there isn’t enough data for deep learning to be applicable, and sometimes the problem is better solved by a different algorithm.

>Naive Bayes is a type of machine-learning classifier based on applying Bayes’ theorem while assuming that the features in the input data are all independent (a strong,
or “naive” assumption, which is where the name comes from). This form of data analysis predates computers and was applied by hand decades before its first computer
implementation (most likely dating back to the 1950s). Bayes’ theorem and the foundations of statistics date back to the eighteenth century, and these are all you need to
start using Naive Bayes classifiers.
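
To show how little machinery the method needs, here is a minimal categorical Naive Bayes classifier in plain Python. It is a sketch under stated assumptions: the weather-style dataset and the labels `"play"` and `"stay"` are hypothetical, and the add-one (Laplace) smoothing, which keeps unseen feature values from producing zero probabilities, is an addition not mentioned in the text. Prediction picks the class $c$ maximizing $\log P(c) + \sum_i \log P(x_i \mid c)$, which is Bayes' theorem combined with the independence assumption.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels):
    """Count how often each class occurs and, per class, how often
    each value of each feature occurs."""
    class_counts = Counter(labels)
    # feature_counts[label][i][value]: occurrences of `value` in feature i for `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    seen_values = defaultdict(set)  # distinct values observed for feature i
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            seen_values[i].add(value)
    return class_counts, feature_counts, seen_values


def predict(model, features):
    """Return the class with the highest log P(class) + sum of log P(feature | class)."""
    class_counts, feature_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # prior
        for i, value in enumerate(features):
            # Features are treated as independent given the class (the "naive" part).
            count = feature_counts[label][i][value]
            score += math.log((count + 1) / (n + len(seen_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


samples = [("sunny", "calm"), ("sunny", "windy"), ("rainy", "windy"),
           ("rainy", "windy"), ("sunny", "calm"), ("rainy", "calm")]
labels = ["play", "play", "stay", "stay", "play", "stay"]

model = train(samples, labels)
print(predict(model, ("sunny", "calm")))   # play
print(predict(model, ("rainy", "windy")))  # stay
```

Training is nothing more than counting, which is why the method could be carried out by hand long before computers.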