# Chapter 6 The Universal Workflow of Machine Learning

The workflow of machine learning is broadly structured in three parts:

1. Define the task—Understand the problem domain and the business logic underlying what the customer asked for. Collect a dataset, understand what the data
represents, and choose how you will measure success on the task.

2. Develop a model—Prepare your data so that it can be processed by a machine
learning model, select a model evaluation protocol and a simple baseline to
beat, train a first model that has generalization power and that can overfit, and
then regularize and tune your model until you achieve the best possible generalization performance.

3. Deploy the model—Present your work to stakeholders, ship the model to a web
server, a mobile app, a web page, or an embedded device, monitor the model’s performance in the wild, and start collecting the data you’ll need to build the
next-generation model.

## 6.1 Define the task

### 6.1.1 Frame the problem

Some questions that should be on the top of your mind:

+ What will your input data be? What are you trying to predict? You can only
learn to predict something if you have training data available.

+ What type of machine learning task are you facing?

+ What do existing solutions look like?

+ Are there particular constraints you will need to deal with?

Once you’ve done your research, you should know what your inputs will be, what your
targets will be, and what broad type of machine learning task the problem maps to. Be
aware of the hypotheses you’re making at this stage:

1. You hypothesize that your targets can be predicted given your inputs.

2. You hypothesize that the data that’s available (or that you will soon collect) is
sufficiently informative to learn the relationship between inputs and targets.

### 6.1.2 Collect the dataset

The quality of a dataset depends on the number of data points you have, the reliability of your labels, and the quality of your features.

If you’re doing supervised learning, then once you’ve collected inputs (such as
 images) you’re going to need annotations for them (such as tags for those images)—
 the targets you will train your model to predict.

#### INVESTING IN DATA ANNOTATION INFRASTRUCTURE

Your data annotation process will determine the quality of your targets, which in turn
determine the quality of your model. Carefully consider the options you have available:
1. Should you annotate the data yourself?

2. Should you use a crowdsourcing platform like Mechanical Turk to collect labels?

3. Should you use the services of a specialized data-labeling company?

To pick the best option, consider the constraints you’re working with:
1. Do the data labelers need to be subject matter experts, or could anyone annotate the data? Annotating CT scans of bone fractures pretty much requires a medical degree.


2. If annotating the data requires specialized knowledge, can you train people to
do it? If not, how can you get access to relevant experts?


3.  Do you, yourself, understand the way experts come up with the annotations? If
you don’t, you will have to treat your dataset as a black box, and you won’t be able
to perform manual feature engineering—this isn’t critical, but it can be limiting.

#### BEWARE OF NON-REPRESENTATIVE DATA

It’s critical that the data used for training be representative of the production data.

If possible, collect data directly from the environment where your model will be used.

A related phenomenon you should be aware of is concept drift. You’ll encounter
concept drift in almost all real-world problems, especially those that deal with user-generated data.

 Concept drift occurs when the properties of the production data
 change over time, causing model accuracy to gradually decay.
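One rough way to watch for such changes, sketched here under assumptions that are not in the original text, is to compare a summary statistic of a feature between the training data and a recent window of production data. The function name `mean_shift_in_std_units` and the toy values are hypothetical, and this check only looks at input statistics, not at the relationship between inputs and targets.

```python
import numpy as np

def mean_shift_in_std_units(train_values, production_values):
    """Distance between the production mean and the training mean,
    measured in training standard deviations."""
    train_values = np.asarray(train_values, dtype="float64")
    production_values = np.asarray(production_values, dtype="float64")
    train_std = train_values.std()
    if train_std == 0:
        raise ValueError("training feature is constant; shift is undefined")
    return abs(production_values.mean() - train_values.mean()) / train_std

# Hypothetical values of one numerical feature.
train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
recent = np.array([4.0, 5.0, 6.0, 7.0, 8.0])

shift = mean_shift_in_std_units(train, recent)
print(f"mean shifted by {shift:.2f} training standard deviations")
```

A large shift does not prove that accuracy has decayed, but it may be a useful signal that the production data no longer looks like the training data.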

Keep in mind that machine learning can only be used to memorize patterns that
are present in your training data. You can only recognize what you’ve seen before.

Using a model trained on past data to predict the future assumes that the future
will behave like the past. That often isn’t the case.

### 6.1.3 Understand your data

1. If your data includes images or natural language text, take a look at a few samples (and their labels) directly.

2. If your data contains numerical features, it’s a good idea to plot the histogram
of feature values to get a feel for the range of values taken and the frequency of
different values.

3. If your data includes location information, plot it on a map. Do any clear patterns emerge?

4. Are some samples missing values for some features? If so, you’ll need to deal
with this when you prepare the data (see “Handling missing values” in section 6.2.1).

5. If your task is a classification problem, print the number of instances of each
class in your data. Are the classes roughly equally represented? If not, you will
need to account for this imbalance.

6. Check for target leaking: the presence of features in your data that provide information about the targets and which may not be available in production. If
you’re training a model on medical records to predict whether someone will be
treated for cancer in the future, and the records include the feature “this person has been diagnosed with cancer,” then your targets are being artificially
leaked into your data. 

Always ask yourself, is every feature in your data something that will be available in the same form in production?
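Two of the checks above, counting instances per class and counting missing values per feature, could be sketched as follows. The toy dataset and the use of `NaN` to mark missing entries are assumptions made for illustration.

```python
from collections import Counter

import numpy as np

# Hypothetical toy dataset: 6 samples, 2 numerical features.
# NaN marks a missing value.
features = np.array([
    [0.5, 10.0],
    [np.nan, 12.0],
    [0.7, np.nan],
    [0.2, 11.0],
    [np.nan, 9.0],
    [0.9, 13.0],
])
labels = ["spam", "ham", "ham", "ham", "spam", "ham"]

# Number of instances of each class: reveals class imbalance.
class_counts = Counter(labels)

# Number of missing values in each feature (column).
missing_per_feature = np.isnan(features).sum(axis=0)

print(class_counts)
print(missing_per_feature)
```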

### 6.1.4 Choose a measure of success

To achieve success on a project, you must first define what you mean by success. Accuracy? Precision and recall? Customer retention rate?

Your metric for success will guide all of the technical choices you make throughout the project.

 For balanced classification problems, where every class is equally likely, accuracy
and the area under a receiver operating characteristic (ROC) curve, abbreviated as ROC
AUC, are common metrics. 

For class-imbalanced problems, ranking problems, or
multilabel classification, you can use precision and recall, as well as a weighted form of
accuracy or ROC AUC.
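To see why accuracy alone can mislead on imbalanced data, here is a minimal sketch that computes accuracy, precision, and recall by hand. The labels and predictions are made up for illustration, and the helper name `precision_recall` is hypothetical.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    true_positives = np.sum(y_true & y_pred)
    false_positives = np.sum(~y_true & y_pred)
    false_negatives = np.sum(y_true & ~y_pred)
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return float(precision), float(recall)

# Hypothetical imbalanced problem: only 2 positives out of 10 samples.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

accuracy = float(np.mean(np.array(y_true) == np.array(y_pred)))
precision, recall = precision_recall(y_true, y_pred)
print(accuracy, precision, recall)
```

In this toy case the accuracy looks high, while precision and recall show that the model finds only half of the positives and that half of its positive predictions are wrong.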

## 6.2 Develop a model

### 6.2.1 Prepare the data

#### Vectorization

All inputs and targets in a neural network must typically be tensors of floating-point
data (or, in specific cases, tensors of integers or strings).

#### Value Normalization

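In the MNIST digit-classification example, the image data was encoded as integers
in the 0–255 range, representing grayscale values. Before we fed this
data into our network, we had to cast it to float32 and divide by 255 so we’d end up
with floating-point values in the 0–1 range.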

In general, it isn’t safe to feed into a neural network data that takes relatively
large values (for example, multi-digit integers, which are much larger than the initial values taken by the weights of a network) or data that is heterogeneous (for
example, data where one feature is in the range 0–1 and another is in the range
100–200).

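To make learning easier for your network, your data should have the following characteristics:

+ Take small values: typically, most values should be in the 0–1 range.
+ Be homogeneous: all features should take values in roughly the same range.

A stricter normalization practice is also common and can help, although it isn’t always necessary:

+ Normalize each feature independently to have a mean of 0.
+ Normalize each feature independently to have a standard deviation of 1.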
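Feature-wise normalization to a mean of 0 and a standard deviation of 1 can be sketched with NumPy as follows. The arrays are hypothetical, and the sketch assumes no feature is constant (a zero standard deviation would cause a division by zero).

```python
import numpy as np

# Hypothetical data: one feature in the 0-1 range, another in the 100-200 range.
train = np.array([[0.1, 100.0], [0.5, 150.0], [0.9, 200.0]])
test = np.array([[0.3, 120.0]])

# Statistics are computed per feature (axis=0), on the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_normalized = (train - mean) / std
# The test data reuses the training statistics.
test_normalized = (test - mean) / std
```

Reusing the training statistics for the test data is a deliberate choice: any quantity computed on the test data would leak information from it into your workflow.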

#### HANDLING MISSING VALUES

+ If the feature is categorical, it’s safe to create a new category that means “the
value is missing.” The model will automatically learn what this implies with
respect to the targets.

+ If the feature is numerical, avoid inputting an arbitrary value like 0, because
it may create a discontinuity in the latent space formed by your features, making it harder for a model trained on it to generalize. Instead, consider replacing the missing value with the average or median value for the feature in the
dataset. You could also train a model to predict the feature value given the values of other features.

Note that if you’re expecting missing categorical features in the test data, but the network
was trained on data without any missing values, the network won’t have learned to
ignore missing values!
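Both strategies could be sketched as follows. The helper names, the `"MISSING"` token, and the use of `NaN` and `None` to mark missing entries are assumptions made for illustration.

```python
import numpy as np

def impute_with_median(column):
    """Replace NaN entries of a numerical feature with the median of the observed values."""
    column = np.asarray(column, dtype="float64")
    median = np.nanmedian(column)  # median that ignores NaN entries
    return np.where(np.isnan(column), median, column)

def fill_missing_category(values, missing_token="MISSING"):
    """Map missing entries (None) of a categorical feature to a dedicated category."""
    return [missing_token if value is None else value for value in values]

ages = impute_with_median([20.0, np.nan, 40.0, 30.0])
colors = fill_missing_category(["red", None, "blue"])
print(ages, colors)
```

In practice the median would be computed on the training data and reused when preparing validation, test, and production data.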

### 6.2.2 Choose an evaluation protocol

The goal of your validation protocol is to accurately estimate what your success metric of choice (such as accuracy) will be on actual production data. Three common evaluation protocols are:

1. Maintaining a holdout validation set—This is the way to go when you have plenty
of data.
2. Doing K-fold cross-validation—This is the right choice when you have too few samples for holdout validation to be reliable.
3. Doing iterated K-fold validation—This is for performing highly accurate model
evaluation when little data is available.
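The index bookkeeping behind K-fold cross-validation can be sketched as below. The generator name `k_fold_indices` is hypothetical, and the model training and scoring that would happen inside the loop are left out.

```python
import numpy as np

def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)  # shuffle before splitting
    folds = np.array_split(indices, k)      # k folds of nearly equal size
    for i in range(k):
        validation_indices = folds[i]
        train_indices = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_indices, validation_indices

splits = list(k_fold_indices(num_samples=10, k=3))
for train_indices, validation_indices in splits:
    # A model would be trained on train_indices and scored on validation_indices;
    # the final estimate is the average of the k validation scores.
    print(len(train_indices), len(validation_indices))
```

Iterated K-fold validation would repeat this loop several times with a different shuffle (here, a different `seed`) and average all the scores.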

### 6.2.3 Beat a Baseline

At this stage, these are the three most important things you should focus on:

1. Feature engineering—Filter out uninformative features (feature selection) and use
your knowledge of the problem to develop new features that are likely to be useful.

2. Selecting the correct architecture priors—What type of model architecture will you
use? A densely connected network, a convnet, a recurrent neural network, a
Transformer? Is deep learning even a good approach for the task, or should you
use something else?

3. Selecting a good-enough training configuration—What loss function should you use?
What batch size and learning rate?

Choosing the right last-layer activation and loss function for your problem type:

| Problem type | Last-layer activation | Loss function |
| --- | --- | --- |
| Binary classification | sigmoid | binary_crossentropy |
| Multiclass, single-label classification | softmax | categorical_crossentropy |
| Multiclass, multilabel classification | sigmoid | binary_crossentropy |
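To make the pairings listed above concrete, here is a plain NumPy sketch of the two activations and the two losses. These are simplified re-implementations written for illustration, not the Keras functions themselves; the `eps` clipping value is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def categorical_crossentropy(y_true_one_hot, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true_one_hot * np.log(y_prob), axis=-1)))

logits = np.array([[2.0, 0.0, -2.0]])  # raw outputs for one sample, three classes

# Single-label: softmax makes the classes compete, so probabilities sum to 1.
single_label_probs = softmax(logits)
loss_single = categorical_crossentropy(np.array([[1.0, 0.0, 0.0]]), single_label_probs)

# Multilabel: one independent sigmoid per class, so several can be "on" at once.
multi_label_probs = sigmoid(logits)
loss_multi = binary_crossentropy(np.array([[1.0, 1.0, 0.0]]), multi_label_probs)
```

This is why multilabel classification reuses the binary setup: each class is treated as its own yes/no question.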

Note that it’s not always possible to achieve statistical power. 

If you can’t beat a simple baseline after trying multiple reasonable architectures, it may be that the answer
to the question you’re asking isn’t present in the input data. 

Remember that you’re making two hypotheses:
+ You hypothesize that your outputs can be predicted given your inputs.
+ You hypothesize that the available data is sufficiently informative to learn the
relationship between inputs and outputs.
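For a classification task, one possible simple baseline is to always predict the most frequent class. The sketch below assumes that choice of baseline; the function name and the labels are hypothetical.

```python
from collections import Counter

def majority_class_baseline(train_labels, validation_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in validation_labels if label == most_common_label)
    return most_common_label, correct / len(validation_labels)

train_labels = ["ham", "ham", "spam", "ham", "spam", "ham"]
validation_labels = ["ham", "spam", "ham", "ham"]

label, baseline_accuracy = majority_class_baseline(train_labels, validation_labels)
print(label, baseline_accuracy)
```

A model whose validation accuracy does not exceed this number has not yet demonstrated statistical power.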

### 6.2.4 Scale up: develop a model that overfits

Once you’ve obtained a model that has statistical power, the question becomes, is your
model sufficiently powerful? Does it have enough layers and parameters to properly
model the problem at hand?

Remember that
the universal tension in machine learning is between optimization and generalization.

The ideal model is one that stands right at the border between underfitting and overfitting, between undercapacity and overcapacity.

To figure out how big a model you’ll need, you must develop a model that overfits.
This is fairly easy, as you learned in chapter 5:
1. Add layers.

2. Make the layers bigger.

3. Train for more epochs.

Always monitor the training loss and validation loss, as well as the training and validation values for any metrics you care about. 

When you see that the model’s performance on the validation data begins to degrade, you’ve achieved overfitting.
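Given a list of per-epoch validation losses, the point where performance starts to degrade could be located with a small helper like the one below. The function name, the `patience` parameter, and the loss values are assumptions made for illustration.

```python
def epoch_where_validation_degrades(val_losses, patience=2):
    """Return the 1-based epoch with the best validation loss, once the loss
    has failed to improve for `patience` consecutive epochs.
    Return None if that never happens (no overfitting observed yet)."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch
    return None

# Hypothetical validation losses: improving, then degrading.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
best = epoch_where_validation_degrades(val_losses)
print(best)
```

If the helper returns `None`, the model has not overfit yet, which suggests adding capacity or training for longer.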

### 6.2.5 Regularize and tune your model

Once you’ve achieved statistical power and you’re able to overfit, you know you’re on the
right path. At this point, your goal becomes to maximize generalization performance.


This phase will take the most time: you’ll repeatedly modify your model, train it,
evaluate on your validation data (not the test data at this point), modify it again, and
repeat, until the model is as good as it can get. Here are some things you should try:

1. Try different architectures; add or remove layers.

2. Add dropout.

3. If your model is small, add L1 or L2 regularization.

4. Try different hyperparameters (such as the number of units per layer or the
learning rate of the optimizer) to find the optimal configuration.

5. Optionally, iterate on data curation or feature engineering: collect and annotate more data, develop better features, or remove features that don’t seem to be informative.
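Two of the techniques listed above, dropout and L2 regularization, can be sketched in NumPy to show what they compute. This is a simplified illustration rather than the Keras implementation; the function names and values are hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the activations during training and
    scale the survivors by 1 / (1 - rate) so the expected sum stays the same."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

def l2_penalty(weights, factor):
    """Extra loss term proportional to the sum of the squared weights."""
    return factor * float(np.sum(weights ** 2))

rng = np.random.default_rng(0)
activations = np.ones((4, 5))

dropped = dropout(activations, rate=0.5, rng=rng)       # entries are 0.0 or 2.0
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
penalty = l2_penalty(np.array([1.0, -2.0, 3.0]), factor=0.01)
```

Dropout is only active during training; at inference time the activations pass through unchanged.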

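Be mindful of the following: every time you use feedback from your validation process to tune your model, you leak information about the validation process into the
model. Done systematically over many iterations, this will eventually cause your model to overfit to the validation process, which makes the evaluation less reliable.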

Once you’ve developed a satisfactory model configuration, you can train your
final production model on all the available data (training and validation) and evaluate it one last time on the test set.

## 6.3 Deploy the model

### 6.3.1 Explain your work to stakeholders and set expectations

The expectations of non-specialists towards AI systems are often unrealistic.

You should clearly convey model performance expectations. 

Avoid using abstract statements
like “The model has 98% accuracy” (which most people mentally round up to 100%), and prefer talking, for instance, about false negative rates and false positive rates.

You should also make sure to discuss with stakeholders the choice of key launch parameters. Such decisions involve trade-offs that can only be handled with a deep understanding of the business context.
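As one illustration of such a trade-off, the sketch below assumes that the launch parameter under discussion is a decision threshold on the model’s output score (an assumption, not something stated above). It reports the false negative and false positive rates at a few thresholds; the function name, labels, and scores are made up.

```python
import numpy as np

def error_rates_at_threshold(y_true, scores, threshold):
    """False negative rate and false positive rate when every sample with
    a score >= threshold is flagged as positive."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(scores) >= threshold
    false_negatives = np.sum(y_true & ~flagged)
    false_positives = np.sum(~y_true & flagged)
    false_negative_rate = false_negatives / np.sum(y_true)
    false_positive_rate = false_positives / np.sum(~y_true)
    return float(false_negative_rate), float(false_positive_rate)

# Hypothetical validation labels (1 = positive) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

for threshold in (0.25, 0.5, 0.75):
    fnr, fpr = error_rates_at_threshold(y_true, scores, threshold)
    print(f"threshold={threshold}: false negative rate={fnr:.0%}, "
          f"false positive rate={fpr:.0%}")
```

Raising the threshold lowers the false positive rate at the cost of a higher false negative rate; which side to favor is a business decision.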

### 6.3.2 Ship an inference model

First, you may want to export your model to something other than Python:

+ Your production environment may not support Python at all—for instance, if
it’s a mobile app or an embedded system.

+ If the rest of the app isn’t in Python (it could be in JavaScript, C++, etc.), the use
of Python to serve a model may induce significant overhead.

Second, since your production model will only be used to output predictions (a phase
called inference), rather than for training, you have room to perform various optimizations that can make the model faster and reduce its memory footprint.

#### Deploying a model as a REST API

This is perhaps the most common way to turn a model into a product: install TensorFlow on
a server or cloud instance, and query the model’s predictions via a REST API.


You should use this deployment setup when:

1. The application that will consume the model’s prediction will have reliable
access to the internet (obviously). For instance, if your application is a mobile
app, serving predictions from a remote API means that the application won’t be
usable in airplane mode or in a low-connectivity environment.

2. The application does not have strict latency requirements: the request, inference, and answer round trip will typically take around 500 ms.

3. The input data sent for inference is not highly sensitive: the data will need to
be available on the server in a decrypted form, since it will need to be seen by
the model (but note that you should use SSL encryption for the HTTP request
and answer).

An important question when deploying a model as a REST API is whether you
want to host the code on your own, or whether you want to use a fully managed third-party cloud service.
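To give a feel for the self-hosted option, here is a minimal sketch of an HTTP prediction endpoint that uses only the Python standard library. Everything in it is an assumption made for illustration: the `/predict` path, the JSON shape, the port, and above all the `predict` function, which is a stand-in for a real trained model. A production service would also need SSL, authentication, and input validation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)

def handle_predict(body: bytes) -> bytes:
    """Decode a JSON request body, run the model, and encode a JSON response."""
    payload = json.loads(body)
    prediction = predict(payload["features"])
    return json.dumps({"prediction": prediction}).encode("utf-8")

class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = handle_predict(self.rfile.read(length))
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            self.send_error(400, "invalid request body")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)

def serve(host="127.0.0.1", port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), PredictionHandler).serve_forever()
```

Calling `serve()` starts the server; a client would then send a POST request to `/predict` with a body such as `{"features": [1.0, 2.0, 3.0]}`. Keeping the request handling in `handle_predict` makes it testable without starting a server.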

#### DEPLOYING A MODEL ON A DEVICE

You should use this setup when:

+ Your model has strict latency constraints or needs to run in a low-connectivity
environment. If you’re building an immersive augmented reality application,
querying a remote server is not a viable option.

+ Your model can be made sufficiently small that it can run under the memory and
power constraints of the target device. You can use the TensorFlow Model Optimization Toolkit to help with this (www.tensorflow.org/model_optimization).

+ Getting the highest possible accuracy isn’t mission critical for your task. There
is always a trade-off between runtime efficiency and accuracy, so memory and
power constraints often require you to ship a model that isn’t quite as good as
the best model you could run on a large GPU.

+ The input data is strictly sensitive and thus shouldn’t be decryptable on a
remote server.

 To deploy a Keras model on a smartphone or embedded device, your go-to solution
is TensorFlow Lite (www.tensorflow.org/lite).

#### DEPLOYING A MODEL IN THE BROWSER

While it is usually possible to have the application query a remote model via a REST
API, there can be key advantages in having the model run directly in the browser, on
the user’s computer (utilizing GPU resources if they’re available).
Use this setup when:

+ You want to offload compute to the end user, which can dramatically reduce
server costs.

+ The input data needs to stay on the end user’s computer or phone. For
instance, in our spam detection project, the web version and the desktop version of the chat app (implemented as a cross-platform app written in JavaScript) should use a locally run model.

+ Your application has strict latency constraints. While a model running on the end
user’s laptop or smartphone is likely to be slower than one running on a large
GPU on your own server, you don’t have the extra 100 ms of network round trip.

+ You need your app to keep working without connectivity, after the model has
been downloaded and cached.

You should only go with this option if your model is small enough that it won’t hog the
CPU, GPU, or RAM of your user’s laptop or smartphone. In addition, since the entire
model will be downloaded to the user’s device, you should make sure that nothing
about the model needs to stay confidential.

To deploy a model in JavaScript, the TensorFlow ecosystem includes TensorFlow.js
 (www.tensorflow.org/js), a JavaScript library for deep learning that implements
 almost all of the Keras API (originally developed under the working name WebKeras)
 as well as many lower-level TensorFlow APIs.

#### INFERENCE MODEL OPTIMIZATION

You should always seek to optimize your model before importing it into TensorFlow.js or exporting
it to TensorFlow Lite. There are two popular optimization techniques you can apply:


1. Weight pruning—Not every coefficient in a weight tensor contributes equally to
the predictions. It’s possible to considerably lower the number of parameters
in the layers of your model by only keeping the most significant ones. This
reduces the memory and compute footprint of your model, at a small cost in
performance metrics. By deciding how much pruning you want to apply, you
are in control of the trade-off between size and accuracy.


2. Weight quantization—Deep learning models are trained with single-precision
floating-point (float32) weights. However, it’s possible to quantize weights to
8-bit signed integers (int8) to get an inference-only model that’s a quarter the
size but remains near the accuracy of the original model.
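The ideas behind both techniques can be sketched on a single weight tensor with NumPy. This is a simplified illustration, not what the TensorFlow tooling does internally: the function names are hypothetical, pruning here is a one-shot magnitude cut, and quantization uses a single symmetric scale for the whole tensor.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights).astype(weights.dtype)

def quantize_to_int8(weights):
    """Map float32 weights to int8 using one scale for the whole tensor.
    Assumes the tensor is not all zeros."""
    scale = np.max(np.abs(weights)) / 127.0
    quantized = np.round(weights / scale).astype(np.int8)
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return quantized.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)  # hypothetical layer weights

pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized, scale = quantize_to_int8(weights)
restored = dequantize(quantized, scale)

print("fraction of zeros after pruning:", np.mean(pruned == 0))
print("bytes before and after quantization:", weights.nbytes, quantized.nbytes)
print("largest quantization error:", np.max(np.abs(restored - weights)))
```

The int8 tensor takes a quarter of the memory of the float32 one, and each restored weight differs from the original by at most about half a quantization step.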

### 6.3.3 Monitor your model in the wild

Even this is not the end. Once you’ve deployed a model, you need to keep monitoring its behavior, its performance on new data, its interaction with the rest of the application, and its eventual impact on business metrics.


+ Is user engagement in your online radio up or down after deploying the new
music recommender system? Has the average ad click-through rate increased
after switching to the new click-through-rate prediction model? Consider using
randomized A/B testing to isolate the impact of the model itself from other
changes: a subset of cases should go through the new model, while another
control subset should stick to the old process. Once sufficiently many cases have
been processed, the difference in outcomes between the two is likely attributable to the model.


+ If possible, do a regular manual audit of the model’s predictions on production
data. It’s generally possible to reuse the same infrastructure as for data annotation:
send some fraction of the production data to be manually annotated, and compare the model’s predictions to the new annotations. For instance, you should
definitely do this for the image search engine and the bad-cookie flagging system.


+ When manual audits are impossible, consider alternative evaluation avenues
such as user surveys (for example, in the case of the spam and offensive-content
flagging system).
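For the A/B test mentioned above, each case has to be assigned to the new model or to the control process. One possible way to do this, sketched here as an assumption rather than a prescribed method, is a deterministic pseudo-random assignment based on a hash of a case identifier, so the same case always lands in the same group. The function name and the identifier format are hypothetical.

```python
import hashlib

def assign_variant(case_id: str, treatment_fraction: float = 0.5) -> str:
    """Assign a case to "new_model" or "control" based on a hash of its ID."""
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "new_model" if bucket < treatment_fraction else "control"

assignments = [assign_variant(f"user-{i}") for i in range(1000)]
share_new = assignments.count("new_model") / len(assignments)
print(f"share routed to the new model: {share_new:.1%}")
```

Outcomes would then be logged per group and compared once enough cases have been processed.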

### 6.3.4 Maintain your model

As soon as your model has launched, you should be getting ready to train the next
generation that will replace it. As such:

+  Watch out for changes in the production data. Are new features becoming available? Should you expand or otherwise edit the label set?

+  Keep collecting and annotating data, and keep improving your annotation
pipeline over time. In particular, you should pay special attention to collecting
samples that seem to be difficult for your current model to classify—such samples are the most likely to help improve performance.
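One possible proxy for "difficult to classify", assumed here for illustration, is low confidence: samples whose highest predicted class probability is small. The sketch below picks those samples as candidates for annotation; the function name and the probabilities are hypothetical.

```python
import numpy as np

def least_confident_indices(probabilities, num_samples):
    """Indices of the samples whose top predicted probability is lowest."""
    confidence = np.max(probabilities, axis=1)
    return np.argsort(confidence)[:num_samples]

# Hypothetical predicted class probabilities for 4 production samples.
probabilities = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
])

to_annotate = least_confident_indices(probabilities, num_samples=2)
print(to_annotate)
```

Low confidence is only a heuristic: a model can also be confidently wrong, which is one reason regular manual audits remain useful.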