# Introduction to Machine Learning for Software Engineers

In this notebook we will get acquainted with [machine learning](https://en.wikipedia.org/wiki/Machine_learning) in relation to the practice of [software engineering](https://en.wikipedia.org/wiki/Software_engineer). For the sake of simplicity, all code examples in this notebook will be written in Python. We will be using TensorFlow (2.3) for practical machine learning demonstrations.

This notebook is written in collaboration with [Yassine Yousfi](https://yassineyousfi.github.io/).

## 1. Introduction to [machine learning](https://en.wikipedia.org/wiki/Machine_learning)

> Machine learning is the study of computer [algorithms](https://en.wikipedia.org/wiki/Algorithm) that improve automatically through experience. It is seen as a part of [artificial intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence). Machine learning algorithms build a model based on sample data, known as '[training data](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets#training_set)', in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as [email filtering](https://en.wikipedia.org/wiki/Email_filtering) and [computer vision](https://en.wikipedia.org/wiki/Computer_vision), where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

What is the goal of machine learning? Typically we are looking to train an algorithm so that it produces useful output from a given input. Take traditional programming for instance. The following input: `4`, passed to the following Python algorithm:

```py
def cube(x):
    return x**3
```
Produces the following useful output: `64`.

Likewise, one machine learning goal might be to take an input equal to an image of a horse, pass it to a function equal to a machine learning algorithm, and get an output equal to the same horse, but transformed into a zebra.

Another goal might be to take an input equal to historical electricity consumption, pass it to a function equal to a machine learning algorithm, and get an output equal to the future year's electricity consumption.

Yet another goal might be to take an input equal to a physical structure's attributes (say, data about a given house: size, location, damage, etc.), pass it to a function equal to a machine learning algorithm, and get an output equal to the value of the structure.

Finally, another goal might be to take an input equal to a set of geo-locations representing residences in a city, pass this set to a function which is equal to a machine learning algorithm, and get an output equal to the bounded geo-locations grouped according to proximity.

What do all these goals have in common? 

For one, they can all be represented by the same formula: `f(x) = y`, where `x` and `y` are the input and output data respectively.

So what about the function `f`?

Consider again our traditional programming example. What would represent `f` in that case? Yes, the `cube` algorithm! In the same way, in machine learning, we are searching for the _algorithm `f`_. Of course, in traditional software development, we can simply design an algorithm ourselves. But how does this process work programmatically in machine learning?

In the next section, we will consider two [types of machine learning algorithms](https://en.wikipedia.org/wiki/Machine_learning#Types_of_learning_algorithms) which are designed to accomplish just that: [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) and [unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning). Afterward, we can look at some practical examples of _supervised learning_.

## 2. Machine learning algorithms

> The types of machine learning algorithms differ in their approach, the type of data they input and output, and the type of task or problem that they are intended to solve.

There are plenty of algorithm types to discover. Some of the common ones include the above-mentioned supervised and unsupervised learning algorithms, as well as [reinforcement learning algorithms](https://en.wikipedia.org/wiki/Reinforcement_learning). As brought out in the linked material, one type of algorithm can differ drastically from another type.

We cannot cover _all_ the different types of algorithms in this notebook, unfortunately. However, you are strongly encouraged to read more about them by visiting the links above. As previously mentioned, we will only cover _two_ types of machine learning algorithms in this notebook, only _one_ of which will be demonstrated in detail.

With that, let's consider supervised and unsupervised learning in the cells below.

### 2.1 Supervised learning

> Supervised learning is the [machine learning](https://en.wikipedia.org/wiki/Machine_learning) task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled [training data](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets#training_set) consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see [inductive bias](https://en.wikipedia.org/wiki/Inductive_bias)).

In supervised learning, we have a set of 'correct answers' (often called `labels`) corresponding to true outputs: `y`. 

Of course, these `labels` do not necessarily correspond to all the possible outputs ever. Rather, they correspond to a decent set of outputs that allow an algorithm to generalize to a problem domain.

Consider our previous formula: `f(x) = y`. If `y` is known during the training process, then we are dealing with a supervised learning problem.

The algorithm, `f`, is learned by training on input-output data pairs. Those inputs (often called `features`), therefore, must correspond to `x`, representing the input values.

So what do we mean by _**training**_ anyway? We can again simplify the idea by considering traditional programming. Say you were tasked with building an algorithm that would calculate the _mean_ of a list:

```py
def calculate_mean(data):
    """
    Function to calculate the mean of the data set.

    Args: 
        data: list. A list of int.

    Returns: 
        mean: float. Mean of data.
    """
    pass
```

How would you go about building such an algorithm? Ideally, if you already knew how to compute the mean of a list, you would simply implement the solution immediately, possibly as so:

```py
def calculate_mean(data):
    """
    Function to calculate the mean of the data set.

    Args: 
        data: list. A list of int.

    Returns: 
        mean: float. Mean of data.
    """
    return sum(data) / len(data)
```

But say you did not know how to compute the mean of a list. Then what would you do? And what if you were in a void, with no access to outside material? You'd be in a tough situation!

But say someone gave you one thousand input-output pairs, each demonstrating an input represented by a list of integers, and an output represented by the true mean of that list. Now what would you do? Indeed, trial and error. Possibly, you could start with a simple algorithm, your best guess, and test the output of your algorithm against your set of true means. When your outputs are far off from the truth, you make large changes to your algorithm, and when your outputs are close to the truth, you make smaller changes, with each iteration drawing you closer and closer to the true solution.

In the same way, machine learning algorithms are learned by iterating over many input-output pairs with the goal of training the algorithm to output values equal to the true outputs.
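
To make the trial-and-error idea above concrete, here is a minimal sketch in plain Python. It assumes every list holds exactly five integers and that our "best guess" algorithm is `w * sum(data)`, where `w` is a single adjustable number. The names `guess_mean`, `w`, and `learning_rate`, as well as the randomly generated data, are hypothetical and chosen only for illustration. The update rule is a simple gradient-descent-style step, which is just one of many possible ways to change the algorithm in proportion to its error.

```python
import random

random.seed(0)

# Hypothetical training set: 1,000 lists of five integers, each paired with its true mean.
LIST_LENGTH = 5
inputs = [[random.randint(0, 9) for _ in range(LIST_LENGTH)] for _ in range(1000)]
labels = [sum(data) / len(data) for data in inputs]


def guess_mean(data, w):
    """Candidate algorithm: scale the sum of the list by a learnable weight w."""
    return w * sum(data)


w = 0.0                  # our initial (poor) best guess
learning_rate = 0.0005   # assumed step size, small enough to keep updates stable

for epoch in range(5):
    for data, true_mean in zip(inputs, labels):
        error = guess_mean(data, w) - true_mean
        # Large errors cause large changes to w, small errors cause small changes.
        w -= learning_rate * error * sum(data)

print(round(w, 3))  # approximately 0.2
```

Since the true mean of a five-element list is its sum divided by five, `w` settles near `0.2` without anyone ever telling the loop how a mean is computed.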

Does this process differ in unsupervised learning?

### 2.2 Unsupervised learning

> Unsupervised learning (UL) is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, the machine is forced to build a compact internal representation of its world. In contrast to Supervised Learning (SL) where data is tagged by a human, eg. as "car" or "fish" etc, UL exhibits self-organization that captures patterns as neuronal predilections or probability densities.

We can observe that, while sharing certain attributes, supervised and unsupervised learning are not equal. For one, unsupervised learning has _no true outputs_.

When we consider our formula, `f(x) = y`, it is evident that in unsupervised learning problems, while `y` exists, it cannot be derived from input-output pairs. In other words, while we have `features`, we _do not_ have `labels`.

Therefore, the way in which the algorithm `f` is learned is no longer by training on `x` and `y` pairs as in supervised learning. Rather, we are now attempting to learn an optimal _pattern_ directly from the input `x` itself.

This is akin to our geo-location problem above, in which no _true_ boundaries ever actually exist. Rather, we can only infer that an _optimal pattern may exist_ and we must rely on the training process to discover such a pattern.
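
As a rough illustration of discovering a pattern without labels, the sketch below groups a handful of made-up coordinates using k-means clustering. k-means is our choice for this illustration and is not otherwise covered in this notebook. The `points`, the decision to look for two groups, and the starting centroids are all assumptions.

```python
# Hypothetical residences as (x, y) coordinates. Note that there are no labels.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]


def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(point, centroids):
    """Return the index of the centroid closest to point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


# Start from two guesses (the first and last points) and refine them repeatedly.
centroids = [points[0], points[-1]]
for _ in range(10):
    groups = [[] for _ in centroids]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    # Move each centroid to the mean of its group; keep it in place if the group is empty.
    centroids = [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
        for g, c in zip(groups, centroids)
    ]

assignments = [nearest(p, centroids) for p in points]
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No "true" grouping was ever supplied. The two groups emerge only from the proximity of the inputs to one another.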

You'll find plenty of practical applications of both supervised and unsupervised learning. However, it seems that, of the two, supervised learning relates more so to the popular applications of machine learning in the workplace than unsupervised learning, at least for now. Further, other types of machine learning algorithms, such as reinforcement learning, are too complex to tackle without first having a solid foundation in supervised learning.

Therefore, as this notebook serves only as an introduction to machine learning, we will concentrate primarily on _supervised machine learning_ since it is the most relevant of the two, and possibly the most relevant in general. You are encouraged to read more about unsupervised learning in your own time.

## 3. Machine learning models

In short, the goal of machine learning is to _fit an algorithm_ (that is, the process of causing an algorithm to be learned) to data. Once the algorithm is fit, we can [derive a _statistical model_ from the rules, numbers, and any other algorithm-specific data structures required to make _predictions_](https://machinelearningmastery.com/difference-between-algorithm-and-model-in-machine-learning/).

With that model, we can then generate new outputs (_predictions_) for seen and unseen input.

As we are already familiar with types of algorithms, let's take this concept one step further and consider [_types of models_](https://en.wikipedia.org/wiki/Machine_learning#Models).

### 3.1 Types of machine learning models

> Performing machine learning involves creating a [model](https://en.wikipedia.org/wiki/Statistical_model), which is trained on some training data and then can process additional data to make predictions. Various types of models have been used and researched for machine learning systems.

As with [machine learning algorithms](#2.-Machine-learning-algorithms), machine learning _models_ also vary in their approach, the type of data they input and output, and the type of task or problem that they are intended to solve. Even within the context of the _same type of algorithm_, models can differ considerably from one to another.

Models include [decision trees](https://en.wikipedia.org/wiki/Decision_tree_learning), [support vector machines](https://en.wikipedia.org/wiki/Support-vector_machine), [regression models](https://en.wikipedia.org/wiki/Regression_analysis), and more.

In this notebook we will focus primarily on just one type of model, possibly the most popular: the [_artificial neural network_](https://en.wikipedia.org/wiki/Artificial_neural_network).

#### 3.1.1 Artificial neural networks

> Artificial neural networks (ANNs), usually simply called neural networks (NNs), are computing systems vaguely inspired by the [biological neural networks](https://en.wikipedia.org/wiki/Neural_circuit) that constitute animal brains.

> An ANN is based on a collection of connected units or nodes called [artificial neurons](https://en.wikipedia.org/wiki/Artificial_neuron), which loosely model the [neurons](https://en.wikipedia.org/wiki/Neuron) in a biological brain. Each connection, like the [synapses](https://en.wikipedia.org/wiki/Synapse) in a biological brain, can transmit a signal to other neurons. An artificial neuron that receives a signal then processes it and can signal neurons connected to it. The "signal" at a connection is a [real number](https://en.wikipedia.org/wiki/Real_number), and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a [weight](https://en.wikipedia.org/wiki/Weighting) that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.

If you are not already familiar with the _concept of artificial neural networks_, [**see this tutorial**](http://jalammar.github.io/visual-interactive-guide-basics-neural-networks/) for a comprehensive, visual introduction.
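
The description of an artificial neuron quoted above can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid is one common choice of non-linear function, and the names `sigmoid` and `neuron` along with the input, weight, and bias values are assumptions made for the example.

```python
import math


def sigmoid(z):
    """A common non-linear function that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals plus a bias, passed through a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical values chosen so that the weighted sum is exactly zero.
output = neuron(inputs=[1.0, 2.0, 3.0], weights=[0.5, -1.0, 0.5], bias=0.0)
print(output)  # 0.5
```

Learning, in this picture, amounts to adjusting `weights` and `bias` so that the neuron's output moves closer to the desired value.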

Now that we have learned the basic ideas behind machine learning, we can move on to the topic of practical machine learning with Python.

## 4. Machine learning with Python

[Python is currently the most popular programming language for machine learning engineers](https://www.eduplusnow.com/blog/the-top-10-machine-learning-languages-to-know-in-2019). With libraries such as [NumPy](https://numpy.org/), [SciPy](https://www.scipy.org/), [Matplotlib](https://matplotlib.org/), [Pandas](https://pandas.pydata.org/), [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), and more, it is hard to deny that Python is one of the most mature languages in the machine learning space at this time.

You are likely already familiar with at least some of these tools.

Some of them drastically simplify the process of designing and fitting [machine learning models and algorithms](https://en.wikipedia.org/wiki/Machine_learning#Models). Other libraries, such as NumPy, add invaluable functionality to Python by means of modules, classes, and functions that make it easier to work with data.

In this notebook, we will be using some of these very libraries to start building machine learning models today. 

To begin, let's get acquainted with machine learning in Python using [TensorFlow](https://www.tensorflow.org/).

### 4.1 TensorFlow

> TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

This section is designed to introduce you to basic machine learning with TensorFlow. It is adapted from [these tutorials](https://www.tensorflow.org/tutorials/). If you would like to work more with TensorFlow, you are strongly encouraged to go through more of their tutorials in your own time.

Since we will now consider practical code examples, you should ensure that you have properly [installed TensorFlow](https://www.tensorflow.org/install), [NumPy](https://numpy.org/install/), and [Matplotlib](https://matplotlib.org/users/installing.html#installing-an-official-release) in your workspace.

You may also optionally install [`pydot`](https://pypi.org/project/pydot/), [`pydotplus`](https://pypi.org/project/pydotplus/), and [Graphviz](https://graphviz.gitlab.io/download/). Some cells will not work correctly if you do not install these tools; however, you will still be able to complete this notebook without them.

Once you've installed the required libraries you should be able to run the following cell without issue:

In [None]:
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

print(tf.__version__)

Now that we have our environment prepared, we can start our journey by exploring the namesake of TensorFlow: the `tensor` object.

#### 4.1.1 [Tensors](https://en.wikipedia.org/wiki/Tensor)

In TensorFlow, data isn’t stored as integers, floats, or strings. Rather, these values are encapsulated in a `tensor` object.

Consider the tensors below.

In [None]:
i = tf.constant(1)
print(i)

x = tf.constant([1, 2, 3])
print(x)

y = x**2
print(y)

x = x + y
print(x)

Above, we observe that tensors behave much like we would expect them to. For example, we can perform vector operations on vector-like tensors as we would with NumPy arrays. Further, we observe that tensors have underlying shapes and data types (`dtype`).

In Python, it is normal to cast one type to another, for instance we could cast an `int` to a `float` as so: `float(10)`. What about a TensorFlow `tensor`?

Consider the cell below.

In [None]:
x = tf.cast(x, dtype=tf.float32)
print(x)

While syntactically different from the type casting we might be familiar with, we can observe above that it is trivial to cast a `tensor` of one `dtype` to a tensor of another `dtype`.

Say, though, that we don't just want a `tensor` of a different `dtype`, but a different type of object altogether, like a list or a `numpy` `int32`. This can be accomplished just as easily.

Consider the cell below.

In [None]:
print(x.numpy().tolist())
print(i.numpy())

Additionally, like NumPy arrays, tensors can also be generated and reshaped as demonstrated below.

In [None]:
j = tf.ones((10,10))
k = tf.zeros((10,10))
j, k

In [None]:
print(tf.reshape(j ,shape=(1,-1)))
print(k[0::2,0::2])

As you've observed, tensors can have zero or more dimensions, and behave as expected according to their `dtype` and `shape`.
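
Two details in the cells above may be worth spelling out: the `-1` in `shape=(1,-1)` and the `0::2` slices. The plain-Python sketch below mimics both ideas without TensorFlow. `resolve_shape` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def resolve_shape(num_elements, shape):
    """Fill in a single -1 entry so that the shape holds exactly num_elements values."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    return tuple(num_elements // known if dim == -1 else dim for dim in shape)


# A 10 x 10 tensor holds 100 values, so (1, -1) resolves to (1, 100).
print(resolve_shape(100, (1, -1)))  # (1, 100)

# The slice 0::2 keeps every second index, so a 10 x 10 grid shrinks to 5 x 5.
grid = [[0.0] * 10 for _ in range(10)]
sliced = [row[0::2] for row in grid[0::2]]
print(len(sliced), len(sliced[0]))  # 5 5
```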

One idea to consider: if a `tensor` can already be described by other data types (such as a NumPy `array`), _why do we need tensors in the first place?_ [This resource may help](https://www.tensorflow.org/guide/tensor). In summary, tensors are the type that TensorFlow's own operations are built around: they are immutable, and TensorFlow can place them on accelerators such as GPUs.

[`tf.constant`](https://www.tensorflow.org/api_docs/python/tf/constant), [`tf.cast`](https://www.tensorflow.org/api_docs/python/tf/cast), [`tf.ones`](https://www.tensorflow.org/api_docs/python/tf/ones), [`tf.zeros`](https://www.tensorflow.org/api_docs/python/tf/zeros), and [`tf.reshape`](https://www.tensorflow.org/api_docs/python/tf/reshape) are but a few of the many functions offered by TensorFlow.

In the next sections you will see _quite a few more_. If one is not explained in enough detail, you can do your own research in the [TensorFlow documentation here](https://www.tensorflow.org/api_docs/python/tf) as you follow along in this notebook.

Excellent! Now that we are familiar with the `tensor` type and a few TensorFlow functions, we can dive into beginner-level machine learning with TensorFlow: creating and training [neural networks](https://en.wikipedia.org/wiki/Artificial_neural_network).

#### 4.1.2 Neural networks with TensorFlow

With a simple dataset, TensorFlow makes neural network modeling fairly straightforward. We can observe some simple steps in the cells below.

We will begin by [loading](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/mnist/load_data) and normalizing the [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist). MNIST is available through `tf.keras.datasets`, and we can normalize it by simply dividing the features (corresponding to pixel values) by 255. [We want all the pixels to be in a range between zero and one](https://en.wikipedia.org/wiki/Feature_scaling), since unscaled features make fitting an algorithm more difficult.

If you are not familiar with the MNIST dataset, please see the relevant link above.

This is a [classification problem](https://en.wikipedia.org/wiki/Statistical_classification), which we are tackling using a supervised learning algorithm that will produce an artificial neural network model. Our goal in this section is to train our artificial neural network so that the derived model is able to correctly classify MNIST images.

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
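
The normalization in the cell above can be sketched without TensorFlow or NumPy. The tiny `image` below is made up for illustration; real MNIST images are 28 by 28, but the arithmetic is the same.

```python
# A hypothetical 2 x 3 grayscale "image" with 8-bit pixel values (0 to 255).
image = [[0, 128, 255],
         [64, 32, 16]]

# Dividing every pixel by 255 rescales the values into the range [0, 1].
scaled = [[pixel / 255.0 for pixel in row] for row in image]

flat = [pixel for row in scaled for pixel in row]
print(min(flat), max(flat))  # 0.0 1.0
```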

This MNIST dataset features 70,000 examples of handwritten digits, each 28 pixels by 28 pixels. The digits have been size-normalized and centered. MNIST is generally a good starting point for an introduction to machine learning.

We have effectively split this data into two sets, a 'train set' consisting of 60,000 images, and a 'test set' consisting of 10,000 images.

Below, we should find the shape of our test set, which will be 10,000 (images), by 28 (height in pixels), by 28 (width in pixels).

In [None]:
x_test.shape

If that was successful, we can then view the data of the first `test` image, `x_test[0]`, and observe that the pixel values are indeed between zero and one.

In [None]:
print(x_test[0], x_test[0].shape)

We can also print out a reconstructed visualization of the first image using [`plt.imshow`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.imshow.html) along with its associated label.

In [None]:
plt.imshow(x_test[0], cmap="gray")
print(y_test[0])

Above, we have confirmed that our features, which correspond to the pixel values of our MNIST images, are indeed scaled to values between zero and one, and that, when rendered as an image, they portray what our data represents: handwritten digits.

Further, we see that our labels (`y`, remember `f(x) = y`) correspond to the expected output of our input data.

We also see that we now have a few sets of data derived from MNIST: `x_train`, `y_train`, `x_test`, and `y_test`. These represent the input-output (`x`-`y`) pairs as previously described in our section on [supervised learning](#2.1-Supervised-learning).

Now we can move on to constructing TensorFlow models. TensorFlow allows construction of models both by means of a [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) API as well as by means of a [`Functional`](https://www.tensorflow.org/guide/keras/functional) API. We can begin by constructing a model sequentially, and then do the same functionally.

##### 4.1.2.1 Building sequential models with TensorFlow

To define our model, we will use the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) API. The building blocks are available in [`tf.keras.layers`](https://www.tensorflow.org/api_docs/python/tf/keras/layers).

Namely, we will only be using the [`Dense` layer](https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/layers/Dense) preceded by a [`Flatten` step](https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/layers/Flatten).

Our output layer has `10` units, corresponding to the number of possible output classes: the handwritten digits zero through nine.

The steps taken here are quite straightforward.

We flatten our image, resulting in a vector of length `784` (`28 * 28`). Then we create subsequent layers in our neural network, the first containing `256` units, the next containing `128` units, and the final containing our above mentioned `10` output units.
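
As a small sketch of what flattening means, the plain-Python snippet below turns a 28 by 28 grid into a single vector, row by row. The pixel values here are just position counters chosen for illustration so that the ordering is easy to see.

```python
height, width = 28, 28

# Hypothetical image in which each pixel stores its own position, counted row by row.
image = [[row * width + col for col in range(width)] for row in range(height)]

# Flattening concatenates the rows into one long vector.
flattened = [pixel for row in image for pixel in row]

print(len(flattened))               # 784
print(flattened[:3], flattened[-1])  # [0, 1, 2] 783
```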

Each unit is of course a neuron as described previously in [3.1.1 Artificial neural networks](#3.1.1-Artificial-neural-networks).

One idea that may be new is the [`activation` function](https://en.wikipedia.org/wiki/Activation_function). Below we tell TensorFlow to use the [rectified linear unit activation function (ReLU)](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) and [softmax activation function](https://en.wikipedia.org/wiki/Softmax_function). An activation function is just a mathematical function that operates on the output of our units. Activation functions are critical in machine learning. You should definitely read more about activation functions if you want to dive deeper into machine learning. For this lesson though, it is enough to just have a general understanding of what they do.
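
To give a general feel for the two activation functions named above, here is a minimal plain-Python sketch of each. The helper names `relu` and `softmax` and the sample values are our own, and TensorFlow's implementations operate on tensors rather than lists.

```python
import math


def relu(values):
    """ReLU: negative values become zero, positive values pass through unchanged."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Softmax: turn arbitrary scores into positive values that sum to one."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-2.0, 0.0, 3.0]))    # [0.0, 0.0, 3.0]

probs = softmax([1.0, 2.0, 3.0])
print(round(sum(probs), 6))      # 1.0
print(probs.index(max(probs)))   # 2
```

Because softmax outputs behave like probabilities over the classes, the index of the largest value can be read as the predicted class.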

In [None]:
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                                    tf.keras.layers.Dense(256, activation='relu'),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')])

Now that we have our model, we can pass values into it, like so: `f(x)`, to generate an output `y`.

Of course, this model is not trained at all, so we can expect meaningless output. Our input below is the same test image that we sampled above.

In [None]:
predictions = model(x_test[:1])
print(predictions)
print(predictions.numpy().argmax())

Finally, we can visualize our model using [`tf.keras.utils.plot_model`](https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/utils/plot_model) and print a summary of it using [`model.summary`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#summary).

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True)

In [None]:
model.summary()
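
The parameter counts reported by the summary can be checked by hand. In a `Dense` layer, every unit has one weight per input plus one bias, so the arithmetic can be sketched as follows. `dense_params` is a hypothetical helper written for this illustration.

```python
def dense_params(num_inputs, num_units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return num_inputs * num_units + num_units


# Flattened input size followed by the sizes of the three Dense layers.
layer_sizes = [28 * 28, 256, 128, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [200960, 32896, 1290]
print(sum(counts))  # 235146
```

The `Flatten` step contributes no parameters, since it only rearranges the input.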

##### 4.1.2.2 Training sequential models with TensorFlow

Obviously, we're going to want to _train_ our model. Otherwise it will not produce _useful output_. Before our model is ready for training, however, we need to set a few [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)). These are added during the model's [`compile`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile) step:

We are going to add a:

- [Loss function](https://en.wikipedia.org/wiki/Loss_function): [SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy).
- [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers): [Adam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam).
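- [Metric](https://www.tensorflow.org/api_docs/python/tf/keras/metrics): accuracy, requested with the string `'accuracy'`. With integer labels like ours, Keras resolves that string to [SparseCategoricalAccuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalAccuracy) rather than the plain [Accuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Accuracy) class.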

We can observe how this is done in the cell below. Also note that in our new model we no longer use the `softmax` activation on the output layer: the loss function is created with `from_logits=True`, so it applies softmax internally to the raw outputs (logits).

In [None]:
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape=(28, 28)),
                                    tf.keras.layers.Dense(256, activation='relu'),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10)])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer='adam', # OR tf.keras.optimizers.Adam()
              loss=loss_fn,
              metrics=['accuracy'])
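
To see why the output layer can skip `softmax` when the loss is given raw logits, here is a rough plain-Python sketch of the per-example computation. `cross_entropy_from_logits` is a hypothetical helper that mirrors the idea of the loss and is not TensorFlow's implementation. The loss for one example is the negative log of the softmax probability assigned to the true class.

```python
import math


def cross_entropy_from_logits(logits, label):
    """Return -log(softmax(logits)[label]), computed in a numerically stable way."""
    highest = max(logits)
    log_sum_exp = highest + math.log(sum(math.exp(v - highest) for v in logits))
    return log_sum_exp - logits[label]


# An untrained model that scores all ten digits equally is "ten ways unsure".
uniform_loss = cross_entropy_from_logits([0.0] * 10, label=3)
print(round(uniform_loss, 4))  # 2.3026

# A confident and correct prediction yields a loss close to zero.
confident_loss = cross_entropy_from_logits([0.0, 0.0, 10.0], label=2)
print(confident_loss < 0.001)  # True
```

The label is a plain integer class index, which is what "sparse" refers to in the name of the loss.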

Finally, we will now [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) our model in the cell below.

In [None]:
# Training the model
history = model.fit(x_train, y_train, validation_split=0.1, epochs=5)
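
The `validation_split=0.1` argument holds back part of the training data so that the model can be checked against examples it is not being fitted on. According to the Keras documentation for `fit`, these samples are taken from the end of the provided arrays, before shuffling. The sketch below imitates that behavior on plain lists. `split_for_validation` is a hypothetical helper, and the integers stand in for images.

```python
def split_for_validation(samples, validation_split):
    """Hold back the last fraction of the samples for validation."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


# Stand-in for the 60,000 MNIST training images.
samples = list(range(60000))
train_part, val_part = split_for_validation(samples, validation_split=0.1)

print(len(train_part), len(val_part))  # 54000 6000
print(val_part[0])                     # 54000
```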

How has our model changed over time?

Run the following cells to observe the model metrics.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10,4))

axes[0].plot(history.epoch, history.history['loss'], label='train')
axes[0].plot(history.epoch, history.history['val_loss'], label='val')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[1].plot(history.epoch, history.history['accuracy'], label='train')
axes[1].plot(history.epoch, history.history['val_accuracy'], label='val')
axes[1].set_ylabel('Accuracy')
axes[1].legend()

plt.show()

In [None]:
eval_metrics = model.evaluate(x_test,  y_test, verbose=2)
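
As a sketch of what the accuracy figure measures, the plain-Python snippet below counts how often the highest-scoring class matches the label. The `accuracy` helper, the logits, and the labels are all made up for illustration.

```python
def accuracy(logits_batch, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        predicted = logits.index(max(logits))
        correct += predicted == label
    return correct / len(labels)


# Four hypothetical predictions over three classes; the third one is wrong.
logits_batch = [[2.0, 0.1, 0.3],
                [0.2, 0.1, 1.5],
                [0.9, 1.1, 0.4],
                [0.3, 2.2, 0.1]]
labels = [0, 2, 0, 1]

print(accuracy(logits_batch, labels))  # 0.75
```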

As we can see, we have successfully trained our model. It now produces useful results!

Below, you can see how a model can be trained using the functional approach.

The steps are mostly the same, so feel free to reference the **above** explanations when reading the **below** code.

##### 4.1.2.3 Building and training functional models with TensorFlow

In [None]:
inputs = tf.keras.layers.Input(shape=(28,28)) # This is a symbolic input, a "placeholder"
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(10)(x)

model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

history = model.fit(x_train, y_train, validation_split=0.1, epochs=5)

In [None]:
model.summary()

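In the cell below, we demonstrate one thing the [Functional API](https://www.tensorflow.org/guide/keras/functional) makes possible that a linear `Sequential` stack does not: two parallel `Dense` branches (one using ReLU, one using swish) read the same input, and their outputs are added together before the final layer.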

In [None]:
inputs = tf.keras.layers.Input(shape=(28,28))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dense(128, activation='relu')(x) + tf.keras.layers.Dense(128, activation='swish')(x) # Notice here
outputs = tf.keras.layers.Dense(10)(x)

model =  tf.keras.models.Model(inputs=inputs, outputs=outputs)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

history = model.fit(x_train, y_train, validation_split=0.1, epochs=5)

In [None]:
model.summary()

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True)

#### 4.1.3 Loading data with TensorFlow

In this section, we look at loading our own data into TensorFlow.

We will start by using [`tf.keras.preprocessing.image_dataset_from_directory`](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory). Then we will explore the data.

In [None]:
import pathlib
import PIL

dataset_url ="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file(origin=dataset_url, 
                                   fname='flower_photos', 
                                   untar=True,
                                   cache_dir='/content/')

data_dir = pathlib.Path(data_dir)

#pathlib.Path('/content/datasets/flower_photos.tar.gz').unlink() # remove the tgz file

In [None]:
# pathlib is a great way to deal with paths in python
data_dir/'folder'/'folder'/'file.jpg'

In [None]:
data_dir

In [None]:
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
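
If the `/` operator and the `glob` patterns above are unfamiliar, the self-contained sketch below reproduces them on a temporary directory. The folder names and file counts are invented for illustration and do not reflect the real flower dataset.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Build a hypothetical layout: one subdirectory per class, holding empty .jpg files.
    for class_name, count in [("roses", 2), ("tulips", 3)]:
        (root / class_name).mkdir()
        for i in range(count):
            (root / class_name / f"{i}.jpg").touch()

    # The / operator joins path segments into a new Path object.
    example = root / "roses" / "0.jpg"
    print(example.name, example.parent.name)  # 0.jpg roses

    # '*/*.jpg' matches every .jpg file exactly one directory below root.
    image_count = len(list(root.glob("*/*.jpg")))
    rose_count = len(list(root.glob("roses/*.jpg")))
    print(image_count, rose_count)  # 5 2
```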

In [None]:
roses = list(data_dir.glob('roses/*.jpg'))
PIL.Image.open(str(roses[0]))

Next, let's try to produce datasets from our data, as we did with MNIST. First we will again set some basic hyperparameters. Afterward, we will use the above-mentioned `image_dataset_from_directory` to build two datasets: [one for training, and one for validation](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).

In [None]:
batch_size = 32
img_height = 28
img_width = 28

In [None]:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
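
The `batch_size` argument means the datasets above yield images in groups rather than one at a time. A rough plain-Python sketch of the idea follows. `make_batches` and the file names are hypothetical, and the real dataset objects also handle decoding, resizing, and shuffling.

```python
def make_batches(items, batch_size):
    """Split items into consecutive groups of at most batch_size elements."""
    return [items[start:start + batch_size] for start in range(0, len(items), batch_size)]


# One hundred made-up file names, grouped into batches of 32.
files = [f"image_{i}.jpg" for i in range(100)]
batches = make_batches(files, batch_size=32)

print([len(batch) for batch in batches])  # [32, 32, 32, 4]
```

Note that the final batch is smaller whenever the number of items is not a multiple of the batch size.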

Now that we have our datasets, we can move on to the training process! So what are we predicting?

Below, we observe that, just as with MNIST, our custom dataset has a number of classes which we want to predict.

Therefore, our goal is very much the same. Once you have observed the classes, go ahead and start training.

In [None]:
class_names = train_ds.class_names
print(class_names)
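
The class names printed above come from the directory structure itself: each subdirectory is treated as one class. As we understand the documentation, the names are ordered alphanumerically, and each image's integer label is the position of its folder in that ordering. The sketch below imitates this on a temporary directory. The folder names are made up, and `inferred_class_names` and `label_of` are hypothetical names for illustration.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Hypothetical layout: one subdirectory per class, each holding one empty image file.
    for name in ["tulips", "daisy", "roses"]:
        (root / name).mkdir()
        (root / name / "example.jpg").touch()

    # Sort the subdirectory names, then number them to obtain integer labels.
    inferred_class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: index for index, name in enumerate(inferred_class_names)}

    # Pair every image with the label of the folder that contains it.
    samples = [(path.parent.name, label_of[path.parent.name])
               for path in sorted(root.glob("*/*.jpg"))]

print(inferred_class_names)  # ['daisy', 'roses', 'tulips']
print(samples)               # [('daisy', 0), ('roses', 1), ('tulips', 2)]
```

These integer labels are what a sparse categorical loss expects, which is why the same loss function from the MNIST example can be reused here.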

In [None]:
def get_model():
    inputs = tf.keras.layers.Input(shape=(28,28,3))
    
    x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputs)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation='relu')(x)
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    
    outputs = tf.keras.layers.Dense(5)(x)
    model =  tf.keras.models.Model(inputs=inputs, outputs=outputs)
    
    return model

In [None]:
model = get_model()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

history = model.fit(x=train_ds, validation_data=val_ds, epochs=5)

In the above cells, we effectively used `tf.keras.preprocessing.image_dataset_from_directory` together with pathlib to quickly create a dataset from a directory structure. 

Note that this simplicity restricts many aspects of the data loading:
- How to read the files
- How to resize them
- They must be in a subdirectory structure
- Sampling is uniform 
- ...

However, for a beginner, this is more than enough.

With that, we can conclude this lesson on an introduction to machine learning for software engineers. What a journey!

You now have a strong foundation on which to continue growing as an engineer in the machine learning domain!

Of course, you still have much to learn; we have really only scratched the surface. I strongly encourage you to keep experimenting with the concepts you learned about in this notebook. [More learning resources can be found here](https://www.tensorflow.org/learn).