## 1. What is ML?
### Consider the traditional manner of building apps, as represented in the following diagram:

You express rules in a programming language. They act on data, and your program provides answers. In the case of activity detection, the rules (the code you wrote to define activity types) acted upon the data (the person's movement speed) to produce an answer: the return value from the function for determining the activity status of the user (whether they were walking, running, biking, or doing something else).
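A minimal sketch of that rules-based approach might look like the following. The function name `detect_activity` and the speed thresholds are hypothetical, chosen only to illustrate rules acting on data to produce an answer.

```python
def detect_activity(speed: float) -> str:
    """Return an activity label using hand-written rules.

    The thresholds (4 and 12, in arbitrary speed units) are assumptions
    made for this illustration, not values from a real detector.
    """
    if speed < 4:
        return "walking"
    elif speed < 12:
        return "running"
    else:
        return "biking"


# The rules (the if/elif/else chain) act on the data (speed) to give answers.
for speed in (3.0, 8.0, 20.0):
    print(speed, "->", detect_activity(speed))
```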

The process for detecting that activity status via ML is very similar; only the axes are different.

Instead of trying to define the rules and express them in a programming language, you provide the answers (typically called labels) along with the data, and the machine infers the rules that determine the relationship between the answers and data. For example, your activity detection scenario might look like this in an ML context:

You gather lots of data and label it to effectively say, "This is what walking looks like," or "This is what running looks like." Then, the computer can infer from the data the rules that determine which distinct patterns denote a particular activity.

Beyond being an alternative way to program that scenario, that approach also lets you open up new scenarios, such as the golfing one, that may not have been possible under the rules-based traditional programming approach.

In traditional programming, your code compiles into a binary that is typically called a program. In ML, the item that you create from the data and labels is called a model.

You pass the model some data and the model uses the rules that it inferred from the training to make a prediction, such as, "That data looks like walking," or "That data looks like biking."
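The data-plus-labels idea can be sketched in plain Python. This is a deliberately tiny stand-in for a real model, not a neural network: it assumes a single feature (movement speed), uses made-up sample values, and "learns" only the average speed per label. The names `labeled_data`, `train`, and `predict` are hypothetical.

```python
from collections import defaultdict

# Hypothetical labeled data: (movement speed, label) pairs.
labeled_data = [
    (2.5, "walking"), (3.0, "walking"), (3.5, "walking"),
    (8.0, "running"), (9.0, "running"), (10.0, "running"),
    (18.0, "biking"), (20.0, "biking"), (22.0, "biking"),
]


def train(samples):
    """Infer one simple 'rule' per label: the mean speed seen for that label."""
    speeds_by_label = defaultdict(list)
    for speed, label in samples:
        speeds_by_label[label].append(speed)
    return {label: sum(v) / len(v) for label, v in speeds_by_label.items()}


def predict(model, speed):
    """Predict the label whose learned mean speed is closest to the input."""
    return min(model, key=lambda label: abs(model[label] - speed))


# Answers (labels) and data go in; a model comes out.
model = train(labeled_data)

# The model then makes predictions on data it has not seen before.
print(predict(model, 4.0))
print(predict(model, 15.0))
```

Nobody wrote a speed threshold by hand here; the boundaries between activities fall out of the labeled examples.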

## 3. Create your first ML model
### Consider the following sets of numbers. Can you see the relationship between them?

As you look at them, you might notice that the value of X is increasing by 1 as you read left to right and the corresponding value of Y is increasing by 3. You probably think that Y equals 3X plus or minus something. Then, you'd probably look at the 0 on X and see that Y is 1, and you'd come up with the relationship Y=3X+1.
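That line of reasoning can be written out as a short sketch. The values below are illustrative ones chosen to match the pattern described above (X rising by 1, Y rising by 3, and Y equal to 1 where X is 0); the variable names are assumptions.

```python
# Illustrative values that follow the pattern described in the text.
xs = [-1, 0, 1, 2, 3, 4]
ys = [-2, 1, 4, 7, 10, 13]

# Y grows by the same amount every time X grows by 1: that amount is the slope.
steps = {ys[i + 1] - ys[i] for i in range(len(ys) - 1)}
assert len(steps) == 1, "Y does not change by a constant amount"
slope = steps.pop()

# The value of Y where X is 0 gives the constant term.
intercept = ys[xs.index(0)]

print(f"Y = {slope}X + {intercept}")
```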

That's almost exactly how you would use code to train a model to spot the patterns in the data!

Now, look at the code to do it.

How would you train a neural network to do the equivalent task? Using data! By feeding it with a set of X's and a set of Y's, it should be able to figure out the relationship between them.

Start with your imports. Here, you're importing TensorFlow and calling it tf for ease of use.

Next, import a library called NumPy (imported as np), which lets you represent your data as arrays easily and quickly.

The framework for defining a neural network as a set of sequential layers is called keras, so import that, too.

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

#### Define and compile the neural network
Next, create the simplest possible neural network. It has one layer, that layer has one neuron, and the input shape to it is only one value.

In [2]:
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

Next, write the code to compile your neural network. When you do so, you need to specify two functions: a loss and an optimizer.

In this example, you know that the relationship between the numbers is Y=3X+1.

While the computer is trying to learn that, it makes a guess, maybe Y=10X+10. The loss function compares the guessed answers with the known correct answers and measures how well or badly the guess did.

Next, the model uses the optimizer function to make another guess. Based on the loss function's result, it tries to minimize the loss. At this point, maybe it will come up with something like Y=5X+5. While that's still pretty bad, it's closer to the correct result (the loss is lower).

The model repeats that for the number of epochs, which you'll see shortly.
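To make the guess, measure, improve loop concrete, here is a pure-Python sketch. It is not what TensorFlow runs internally: it uses plain full-batch gradient descent (a simplification of stochastic gradient descent), illustrative data generated from Y=3X+1, and a hypothetical learning rate and epoch count.

```python
# Illustrative data that follows Y = 3X + 1.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]


def mean_squared_error(w, b):
    """Loss: average squared gap between the guesses w*x + b and the known answers."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


w, b = 10.0, 10.0      # first guess: Y = 10X + 10
learning_rate = 0.01   # hypothetical step size
epochs = 2000          # hypothetical number of guess-and-improve rounds
losses = []

for epoch in range(epochs):
    # Measure how good the current guess is.
    losses.append(mean_squared_error(w, b))
    # Gradients of the loss with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Optimizer step: nudge the guess in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"first loss: {losses[0]:.2f}, last loss: {losses[-1]:.6f}")
print(f"learned: Y = {w:.3f}X + {b:.3f}")
```

Each pass through the loop plays the role of one epoch: the loss scores the current guess, and the update step produces a slightly better one.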

First, here's how to tell it to use mean_squared_error for the loss and stochastic gradient descent (sgd) for the optimizer. You don't need to understand the math for those yet, but you can see that they work!

Over time, you'll learn the different and appropriate loss and optimizer functions for different scenarios.