ML CATEGORIES
Machine learning incorporates several hundred statistics-based algorithms,
and choosing the right algorithm or combination of algorithms for the job is a
constant challenge for anyone working in this field. But before we examine
specific algorithms, it is important to understand the three overarching
categories of machine learning. These three categories are supervised,
unsupervised, and reinforcement.

Supervised Learning
As the first branch of machine learning, supervised learning concentrates on
learning patterns by relating input variables to known outcomes, which means
working with labeled datasets.
Supervised learning works by feeding the machine sample data with various
features (represented as “X”) and the correct value output of the data
(represented as “y”). The fact that the output and feature values are known
qualifies the dataset as “labeled.” The algorithm then deciphers patterns that
exist in the data and creates a model that can reproduce the same underlying
rules with new data.
For instance, to predict the market rate for the purchase of a used car, a
supervised algorithm can formulate predictions by analyzing the relationship
between car attributes (including the year of make, car brand, mileage, etc.)
and the selling price of other cars sold based on historical data. Given that the
supervised algorithm knows the final price of other cars sold, it can then
work backward to determine the relationship between the characteristics of
the car and its value.
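To make this concrete, here is a minimal Python sketch of the idea. It assumes a single feature (mileage) and a straight-line relationship between mileage and price. The sales figures, the names fit_line and predict_price, and the choice of ordinary least squares are all illustrative assumptions, not the method behind the model in Figure 1.

```python
# Hypothetical historical sales (all figures invented for illustration).
mileage = [20, 40, 60, 80, 100]               # X: mileage in thousands of km
price = [18200, 15900, 14100, 11800, 10000]   # y: known selling price in dollars

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Work backward" from known prices to the relationship between X and y.
slope, intercept = fit_line(mileage, price)

def predict_price(mileage_thousand_km):
    """Apply the learned relationship to a car whose price is unknown."""
    return intercept + slope * mileage_thousand_km

print(slope)               # price change per extra 1,000 km
print(predict_price(50))   # estimated price of a car with 50,000 km
```

A real model would draw on several features at once (year, brand, mileage, and so on), but the principle is the same: known outcomes are used to learn a rule that can then be applied to new cars.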

Figure 1: Car value prediction model

After the machine deciphers the rules and patterns of the data, it creates what
is known as a model: an algorithmic equation for producing an outcome with
new data based on the rules derived from the training data. Once the model is
prepared, it can be applied to new data and tested for accuracy. After the
model has passed both the training and test data stages, it is ready to be
applied and used in the real world.
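The train-then-test routine can be sketched as follows. This is an illustration only: the mileage and price figures are invented, the fixed split (first six records for training, last two for testing) is an assumption made to keep the example reproducible, and mean absolute error is just one of several ways to measure accuracy.

```python
# Hypothetical labeled records: (mileage in thousands of km, selling price).
records = [(10, 19000), (30, 17100), (50, 14900), (70, 13000),
           (90, 11100), (110, 8900), (40, 16200), (80, 11800)]

# Hold back some labeled data so the model is judged on cars it has not seen.
train, test = records[:6], records[6:]

def fit_line(pairs):
    """Least-squares line through (x, y) pairs: returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Training stage: learn the rule from the training data only.
slope, intercept = fit_line(train)

# Test stage: compare predictions with the known prices of the held-out cars.
errors = [abs((intercept + slope * x) - y) for x, y in test]
mean_absolute_error = sum(errors) / len(errors)

print(mean_absolute_error)   # average prediction error in dollars
```

If the error on the test data is acceptably small, the model is a candidate for real-world use; if not, the features, the data, or the algorithm need revisiting.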
In Chapter 13, we will create a model for predicting house values where y is
the actual house price and X are the variables that impact y, such as land size,
location, and the number of rooms. Through supervised learning, we will
create a rule to predict y (house value) based on the given values of various
variables (X).
Examples of supervised learning algorithms include regression analysis,
decision trees, k-nearest neighbors, neural networks, and support vector
machines. Each of these techniques will be introduced later in the book.

Unsupervised Learning
In the case of unsupervised learning, not all variables and data patterns are
classified. Instead, the machine must uncover hidden patterns and create
labels through the use of unsupervised learning algorithms. The k-means
clustering algorithm is a popular example of unsupervised learning. This
simple algorithm groups data points that are found to possess similar features
as shown in Figure 2.

Figure 2: Example of k-means clustering, a popular unsupervised learning technique

If you group data points based on the purchasing behavior of SME (Small
and Medium-sized Enterprises) and large enterprise customers, for example,
you are likely to see two clusters emerge. This is because SMEs and large
enterprises tend to have disparate buying habits. When it comes to purchasing
cloud infrastructure, for instance, basic cloud hosting resources and a Content
Delivery Network (CDN) may prove sufficient for most SME customers.
Large enterprise customers, though, are more likely to purchase a wider array
of cloud products and entire solutions that include advanced security and
networking products like WAF (Web Application Firewall), a dedicated
private connection, and VPC (Virtual Private Cloud). By analyzing customer
purchasing habits, unsupervised learning is capable of identifying these two
groups of customers without specific labels that classify the company as
small, medium or large.
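The grouping described above can be sketched with a bare-bones version of k-means. The customer figures (number of distinct products purchased, monthly spend in thousands of dollars) are invented, and the starting centroids are fixed by hand so the result is reproducible; practical implementations typically choose starting points at random. Note that no customer carries a size label.

```python
# Hypothetical customers: (distinct cloud products bought, monthly spend in $1,000s).
customers = [(1, 0.5), (2, 0.8), (2, 1.2), (1, 0.3),
             (8, 20.0), (10, 25.0), (9, 22.0), (7, 18.0)]

def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)   # keep an empty cluster's centroid in place
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

# Assumption: start from the first and last customer as the two initial centroids.
centroids, labels = kmeans(customers, [customers[0], customers[-1]])

print(labels)      # cluster index assigned to each customer
print(centroids)   # the "typical" customer at the center of each cluster
```

The algorithm only reports that two groups exist and which customers belong to each. Interpreting one group as SMEs and the other as large enterprises is left to the analyst.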
The advantage of unsupervised learning is that it enables you to discover patterns
in the data that you were unaware existed—such as the presence of two major
customer types. Clustering techniques such as k-means clustering can also
provide the springboard for conducting further analysis after discrete groups
have been discovered.
In industry, unsupervised learning is particularly powerful in fraud detection
—where the most dangerous attacks are often those yet to be classified. One
real-world example is DataVisor, who essentially built their business model
based on unsupervised learning.
Founded in 2013 in California, DataVisor protects customers from fraudulent
online activities, including spam, fake reviews, fake app installs, and
fraudulent transactions. Whereas traditional fraud protection services draw on
supervised learning models and rule engines, DataVisor uses unsupervised
learning which enables them to detect unclassified categories of attacks in
their early stages.
On their website, DataVisor explains that "to detect attacks, existing solutions
rely on human experience to create rules or labeled training data to tune
models. This means they are unable to detect new attacks that haven’t already
been identified by humans or labeled in training data."
This means that traditional solutions analyze the chain of activity for a
particular attack and then create rules to predict a repeat attack. Under this
scenario, the dependent variable (y) is the event of an attack and the
independent variables (X) are the common predictor variables of an attack.
Examples of independent variables could be:
a) A sudden large order from an unknown user. For example, established customers
generally spend less than $100 per order, but a new user spends $8,000 in one
order immediately upon registering their account.
b) A sudden surge of user ratings. For example, as a typical author and bookseller
on Amazon.com, it’s uncommon for my first published work to receive more
than one book review within the space of one to two days. In general,
approximately 1 in 200 Amazon readers leave a book review and most books
go weeks or months without a review. However, I commonly see competitors
in this category (data science) attracting 20-50 reviews in one day!
(Unsurprisingly, I also see Amazon removing these suspicious reviews weeks
or months later.)
c) Identical or similar user reviews from different users. Following the
same Amazon analogy, I often see user reviews of my book appear on other
books several months later (sometimes with a reference to my name as the
author still included in the review!). Again, Amazon eventually removes
these fake reviews and suspends these accounts for breaking their terms of
service.
d) A suspicious shipping address. For example, for small businesses that routinely ship
products to local customers, an order from a distant location (where they
don't advertise their products) can in rare cases be an indicator of fraudulent
or malicious activity.
Standalone activities such as a sudden large order or a distant shipping
address may provide too little information to predict sophisticated
cybercriminal activity and are more likely to produce false positives. But a
model that monitors combinations of independent variables, such as a sudden
large purchase order from the other side of the globe or a landslide of book
reviews that reuse existing content, will generally lead to more accurate
predictions. A supervised learning-based model could deconstruct and
classify what these common independent variables are and design a detection
system to identify and prevent repeat offenses.
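The value of combining independent variables can be illustrated with a small sketch. The rules below are written by hand and stand in for what a supervised model might learn from labeled attacks; the field names, the dollar and distance thresholds, and the "at least two signals" rule are all hypothetical.

```python
# Hypothetical thresholds; a supervised model would learn these from labeled data.
LARGE_ORDER_USD = 1000
DISTANT_KM = 5000

def fraud_signals(order):
    """Return the names of the warning signs present in one order."""
    signals = []
    if order["account_age_days"] < 1 and order["amount_usd"] > LARGE_ORDER_USD:
        signals.append("large_order_from_new_user")
    if order["shipping_distance_km"] > DISTANT_KM:
        signals.append("distant_shipping_address")
    return signals

def is_suspicious(order, min_signals=2):
    """Flag an order only when several warning signs occur together."""
    return len(fraud_signals(order)) >= min_signals

# Three invented orders: one signal, two signals combined, one signal.
local_big_order = {"account_age_days": 0, "amount_usd": 8000, "shipping_distance_km": 12}
distant_big_order = {"account_age_days": 0, "amount_usd": 8000, "shipping_distance_km": 9500}
loyal_customer = {"account_age_days": 400, "amount_usd": 80, "shipping_distance_km": 9500}

flags = [is_suspicious(o) for o in (local_big_order, distant_big_order, loyal_customer)]
print(flags)   # only the order with both warning signs is flagged
```

Requiring signals to coincide is what keeps the loyal customer with a distant address, and the new customer with a large local order, from being flagged as false positives.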
Sophisticated cybercriminals, though, learn to evade classification-based rule
engines by modifying their tactics. In addition, leading up to an attack,
attackers often register and operate single or multiple accounts and incubate
these accounts with activities that mimic legitimate users. They then utilize
their established account history to evade detection systems, which are
trigger-heavy against recently registered accounts. Supervised learning-based
solutions struggle to detect sleeper cells until the actual damage has been
done, especially when it comes to new categories of attacks.
DataVisor and other anti-fraud solution providers therefore leverage
unsupervised learning to address the limitations of supervised learning by
analyzing patterns across hundreds of millions of accounts and identifying
suspicious connections between users—without knowing the actual category
of future attacks. By grouping malicious actors and analyzing their
connections to other accounts, they are able to prevent new types of attacks
whose independent variables are still unlabeled and unclassified. Sleeper cells
in their incubation stage (mimicking legitimate users) are also identified
through their association to malicious accounts. Clustering algorithms such as
k-means clustering can generate these groupings without a full training
dataset in the form of independent variables that clearly label indications of
an attack, such as the four examples listed earlier. Knowledge of the
dependent variable (known attackers) is generally the key to identifying other
attackers before the next attack occurs. Another advantage of unsupervised
learning is that companies like DataVisor can uncover entire criminal rings by
identifying subtle correlations across users.
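As a rough illustration of identifying accounts through their connections, the sketch below links accounts that share an attribute and then looks up everyone connected to a known attacker. This is not a description of DataVisor's system. It swaps in a simple graph technique (connected components via union-find) rather than k-means, and the account names, device IDs, and IP labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical accounts and the attributes observed for each (devices, IPs).
accounts = {
    "acct_a": {"device_1", "ip_9"},
    "acct_b": {"device_1"},
    "acct_c": {"ip_9", "device_7"},
    "acct_d": {"device_4"},
}
known_attackers = {"acct_a"}

def find(parent, x):
    """Follow parent links to the representative of x's group."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def group_accounts(accounts):
    """Merge accounts into groups whenever they share any attribute."""
    parent = {account: account for account in accounts}
    first_owner = {}
    for account, attributes in accounts.items():
        for attribute in attributes:
            if attribute in first_owner:
                root_a = find(parent, account)
                root_b = find(parent, first_owner[attribute])
                parent[root_a] = root_b
            else:
                first_owner[attribute] = account
    groups = defaultdict(set)
    for account in accounts:
        groups[find(parent, account)].add(account)
    return list(groups.values())

# Accounts tied to a known attacker become suspects, whatever their own history.
suspects = set()
for group in group_accounts(accounts):
    if group & known_attackers:
        suspects |= group - known_attackers

print(sorted(suspects))
```

No rule describing what an attack looks like is needed here; the only inputs are the connections between accounts and the identity of one known attacker.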
We will cover unsupervised learning later in this book, with a focus on
clustering analysis. Other examples of unsupervised learning include association
analysis, social network analysis, and dimensionality reduction algorithms.

Reinforcement Learning
Reinforcement learning is the third and most advanced algorithm category in
machine learning. Unlike supervised and unsupervised learning, which both
reach an endpoint once a model is formulated from the training and test data
segments, reinforcement learning continuously improves its model by
leveraging feedback from previous iterations.
Reinforcement learning can be complicated and is probably best explained
through an analogy to a video game. As a player progresses through the
virtual space of a game, they learn the value of various actions under different
conditions and become more familiar with the field of play. Those learned
values then inform and influence a player’s subsequent behavior and their
performance immediately improves based on their learning and past
experience.
Reinforcement learning works in a similar way: algorithms train the model
through continuous learning. A standard reinforcement learning model has
measurable performance criteria where outputs are not tagged; instead, they
are graded. In the case of self-driving vehicles, avoiding a crash earns a
positive score, and in the case of chess, avoiding defeat likewise earns a
positive score.
A specific algorithmic example of reinforcement learning is Q-learning. In Q-
learning, you start with a set environment of states, represented by the
symbol ‘S’. In the game Pac-Man, states could be the challenges, obstacles or
pathways that exist in the game. There may exist a wall to the left, a ghost to
the right, and a power pill above—each representing different states.
The set of possible actions to respond to these states is referred to as “A.” In
the case of Pac-Man, actions are limited to left, right, up, and down
movements, as well as multiple combinations thereof.
The third important symbol is “Q.” Q is the value attached to a given
state/action, and it starts with an initial value of “0.”
As Pac-Man explores the space inside the game, two main things will
happen:
1) Q drops as negative things occur after a given state/action
2) Q increases as positive things occur after a given state/action
In Q-learning, the machine will learn to match the action for a given state that
generates or maintains the highest level of Q. It will learn initially through the
process of random movements (actions) under different conditions (states).
The machine will record its results (rewards and penalties) and how they
impact its Q level and store those values to inform and optimize its future
actions.
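For readers who want to see the moving parts, here is a heavily simplified sketch of Q-learning. Instead of the full Pac-Man game, it assumes a corridor of five cells with a ghost at one end and a power pill at the other. The reward values, the learning rate (ALPHA), and the discount factor (GAMMA) are assumed for illustration, and the agent learns purely through random movements; practical implementations usually mix random exploration with choosing the best-known action.

```python
import random

ACTIONS = {"left": -1, "right": +1}     # A: the possible actions
GHOST, PILL, START = 0, 4, 2            # S: corridor cells 0..4
REWARDS = {GHOST: -1.0, PILL: 1.0}      # hypothetical penalty and reward
ALPHA, GAMMA = 0.5, 0.9                 # learning rate and discount (assumed)

def train(episodes=500, seed=0):
    rng = random.Random(seed)           # seeded so the run is repeatable
    # Q starts at 0 for every state/action pair.
    q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        while state not in REWARDS:     # an episode ends at the ghost or pill
            action = rng.choice(list(ACTIONS))          # random movement
            next_state = state + ACTIONS[action]
            reward = REWARDS.get(next_state, 0.0)
            if next_state in REWARDS:
                best_next = 0.0         # nothing more to gain after the end
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge Q toward the reward plus the discounted best future value.
            target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

def best_action(q, state):
    """The action with the highest Q for this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

q = train()
policy = {s: best_action(q, s) for s in (1, 2, 3)}
print(policy)
```

After training, Q for stepping into the ghost has dropped below zero, while Q for moves that lead toward the power pill has risen, so the learned policy heads right from every cell.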
While this sounds simple enough, implementation is a much more difficult
task and beyond the scope of an absolute beginner’s introduction to machine
learning. Reinforcement learning algorithms aren’t covered in this book;
however, I will leave you with a link to a more comprehensive explanation of
reinforcement learning and Q-learning following the Pac-Man scenario.