Beata Sirowy

# Machine learning: basics
Based on:
- Géron, A. (2023) _Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow_
- Liu, Y. (2020) _Python Machine Learning By Example_

__Machine learning__ is the process of teaching computers to learn and make decisions from data without being explicitly programmed. 

It is a subset of artificial intelligence in which computers are trained to learn from data and make decisions based on the patterns they identify, without needing explicit programming for each specific task. 
- It involves using algorithms to parse data, learn from it, and make predictions or decisions. 
- This process improves over time as the system is exposed to more data, enhancing its accuracy and performance in various applications.

![image.png](attachment:image.png)

A machine learning system is fed with input data—this can be numerical,
textual, visual, or audiovisual. 

The system usually has an output—this can
be a floating-point number, for instance, the acceleration of a self-driving
car, or it can be an integer representing a category (also called a class), for
example, a cat or tiger from image recognition.

The main task of machine learning is to explore and construct algorithms
that can learn from historical data and make predictions on new input data.

For a data-driven solution, we need to define (or have an algorithm define) an
evaluation function called a loss or cost function, which measures how well
the model is learning. Learning then becomes an optimization problem: find
the model that minimizes this loss as efficiently and effectively as
possible.
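
As a minimal sketch of what such a loss function can look like, the snippet below computes the mean squared error, one common choice for numeric outputs. The function name `mean_squared_error` and the toy values are illustrative assumptions, not taken from the cited books.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed toy targets and two candidate sets of predictions.
y_true = [3.0, 5.0, 7.0]
close_predictions = [3.0, 5.0, 6.0]
poor_predictions = [0.0, 0.0, 0.0]

loss_close = mean_squared_error(y_true, close_predictions)
loss_poor = mean_squared_error(y_true, poor_predictions)
print(loss_close, loss_poor)
```

A lower loss means the predictions are closer to the targets, so an optimizer would prefer the model that produced `close_predictions`.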

Depending on the nature of the learning data, machine learning tasks can be
broadly classified into the following three categories:

#### __Unsupervised learning:__ 
When the learning data only contains
indicative signals without any description attached, it's up to us to find
the structure of the data underneath, to discover hidden information, or
to determine how to describe the data. 
- This kind of learning data is
called __unlabeled data.__ 
- Unsupervised learning can be used to __detect
anomalies__, such as fraud or defective equipment, or to __group
customers__ with similar online behaviors for a marketing campaign.
- __Data visualization__ that makes data more digestible, and __dimensionality
reduction__ that distills relevant information from noisy data, are also in
the family of unsupervised learning.
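
The anomaly-detection use case can be sketched without any labels at all. The example below is only one simple approach (flagging values whose z-score exceeds a threshold); the function name `find_anomalies`, the threshold of 2.0, and the transaction amounts are assumptions made for illustration.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Assumed unlabeled data: transaction amounts with one unusual entry.
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 18.5, 250.0]
anomalies = find_anomalies(amounts)
print(anomalies)
```

No sample was ever marked as "fraud" or "normal"; the structure of the data alone singles out the unusual value.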

#### __Supervised learning:__ 
When learning data comes with a description,
targets, or desired output besides indicative signals, the learning goal is
to find a general rule that maps input to output. 
- This kind of learning
data is called __labeled data__. 
- The learned rule is then used to label new
data with unknown output. 
- The labels are usually provided by event logging systems or assigned by human experts. 
- Where feasible, they may also be produced by human raters, for instance
through crowdsourcing. 
- Supervised learning is commonly used in everyday
applications, such as __face and speech recognition, product or movie
recommendations, sales forecasting, and spam email detection__.

We can further subdivide supervised learning into:

- __Regression__ trains on and predicts continuous-valued responses, for example, predicting house prices.

- __Classification__ attempts to find the appropriate class label, such as
analyzing a positive/negative sentiment and predicting a loan default.
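
The two kinds of supervised task can be contrasted with a small sketch. It assumes a one-feature least-squares line for regression and a one-nearest-neighbor rule for classification; both are stand-ins chosen for brevity, and the data, `fit_line`, and `nearest_neighbor_label` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbor_label(x, labeled_points):
    """Return the label of the labeled point whose feature is closest to x."""
    return min(labeled_points, key=lambda pair: abs(pair[0] - x))[1]


# Regression: labeled data are (area, price) pairs; the output is continuous.
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90.0 + intercept

# Classification: labeled data are (sentiment score, class) pairs; the output is a class.
reviews = [(-0.9, "negative"), (-0.4, "negative"), (0.5, "positive"), (0.8, "positive")]
predicted_label = nearest_neighbor_label(0.6, reviews)
print(predicted_price, predicted_label)
```

In both cases a rule is learned from labeled examples and then applied to an input whose output is unknown; only the type of output differs.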

If some learning samples are labeled but not all, we have __semi-supervised learning__. It uses unlabeled data (typically a large amount) for training alongside a small amount of labeled data. Semi-supervised learning is applied where acquiring a fully labeled dataset is expensive and labeling a small subset is more practical.

#### __Reinforcement learning:__

Learning data provides feedback so that the
system adapts to dynamic conditions in order to achieve a certain goal
in the end. 
- The system evaluates its performance based on the
feedback responses and reacts accordingly. 
- Examples
include __robotics for industrial automation, self-driving cars, and
AlphaGo, the Go-playing program__. 
- The key difference between reinforcement
learning and supervised learning is the __interaction with the
environment.__
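
To illustrate learning from feedback, here is a minimal sketch of an epsilon-greedy agent on a two-action problem. The environment (two actions with hidden reward probabilities), the function `run_bandit`, and all parameter values are assumptions for illustration; real reinforcement learning systems are far more elaborate.

```python
import random


def run_bandit(reward_probs, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent interacting with a simulated environment."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)       # how often each action was tried
    estimates = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
        # Feedback from the environment: reward 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

The agent is never told which action is correct. It only observes rewards after acting, which is the interaction with the environment that separates this setting from supervised learning.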


![image.png](attachment:image.png)

#### __ML algorithms__

Machine learning algorithms have evolved over time. We can roughly categorize them
into four main approaches: 
- __logic-based learning__: uses basic
rules specified by human experts; with these rules, systems try to
reason using formal logic, background knowledge, and hypotheses
- __statistical learning__: attempts to find a function that formalizes the
relationships between variables
- __artificial neural networks (ANNs)__: loosely imitate animal brains and
consist of interconnected artificial neurons modeled on biological
neurons. They try to model complex relationships between input and output
values and to capture patterns in data
- __genetic algorithms (GAs)__: popular in the 1990s, they mimic the
biological process of evolution and
try to find optimal solutions using methods such as mutation and
crossover

We are currently seeing a revolution in __deep learning__, which we might
consider __a rebranding of neural networks__. The term deep learning
became popular around 2006 and refers to deep neural networks with many layers.
The breakthrough in deep learning was made possible by
graphics processing units (GPUs), which massively speed
up computation.
- Some believe that deep learning
resembles the way humans learn and that it may therefore be able to deliver on the
promise of sentient machines.

#### __Moore's law:__
It is an empirical observation
claiming that __computer hardware improves exponentially with time__. The
law was first formulated by Gordon Moore, a co-founder of Intel, in 1965. 
- The consensus seems to be that Moore's law should continue to be valid for
a couple of decades. This gives some credibility to Ray Kurzweil's
predictions of achieving true machine intelligence by 2029.

![image.png](attachment:image.png)

#### __Overfitting, underfitting, and the bias-variance trade-off__
__Overfitting:__  
- A model overfits when it fits the existing
observations too well but fails to predict new observations. 
- This can occur
when we extract too much information from the training sets and
make the model work well only with them, which is called __low bias__ in
machine learning.
- __Bias__ is the difference between the average prediction and the true value.
- As a result, the model performs poorly on datasets it hasn't seen before. We call
this situation __high variance__ in machine learning. 
- __Variance__ measures the spread of the predictions, that is, their
variability.


![image.png](attachment:image.png)

- Overfitting occurs when we try to describe the learning rules using too
many parameters relative to the small number of observations, instead of
capturing the underlying relationship. 
- Overfitting also takes place when we make the model so
complex that it fits every training sample, much like memorizing the
answers to all exam questions.
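
The "memorizing the answers" analogy can be made concrete with a small sketch. It compares a model that memorizes the training set (it returns the target of the nearest training point) with a plain straight-line fit. The data are assumed toy values: the targets follow y = 2x with alternating noise of +1.5 and -1.5, and the unseen points follow the noise-free trend to keep the arithmetic readable.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def memorizer_predict(x, train_x, train_y):
    """Return the target of the closest training point (pure memorization)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.5, 0.5, 5.5, 4.5, 9.5, 8.5]   # 2x plus alternating noise
test_x = [0.25, 1.25, 2.25, 3.25, 4.25]    # unseen inputs
test_y = [0.5, 2.5, 4.5, 6.5, 8.5]         # underlying trend 2x

slope, intercept = fit_line(train_x, train_y)
line_train = mse(train_y, [slope * x + intercept for x in train_x])
line_test = mse(test_y, [slope * x + intercept for x in test_x])
memo_train = mse(train_y, [memorizer_predict(x, train_x, train_y) for x in train_x])
memo_test = mse(test_y, [memorizer_predict(x, train_x, train_y) for x in test_x])
print(line_train, line_test, memo_train, memo_test)
```

Under these assumptions the memorizer reaches zero error on the training set but a larger error than the straight line on the unseen points, which is the signature of overfitting.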

__Underfitting:__

When a model is underfit, it doesn't
perform well on the training sets and won't do so on the testing sets either, which
means it fails to capture the underlying trend of the data. 
- Underfitting may
occur if we aren't using enough data to train the model.
- It may also happen if we're
trying to fit the wrong kind of model to the data.
- We call any of these situations __high bias__ in machine learning.
- The __variance is low__, however, because performance on the training and test sets is
consistent, albeit consistently poor.

![image.png](attachment:image.png)

__The bias-variance trade-off__

- Bias is the error stemming from incorrect assumptions in the learning
algorithm; high bias results in underfitting. 
- Variance measures how sensitive the model prediction is to variations in the datasets; high variance results in overfitting. 
- Hence, we want to avoid cases where either bias or variance gets high. 
- In practice, there is an explicit trade-off
between them, where decreasing one increases the other. 
This is the so-called bias-variance trade-off.

The more complex the learning model ŷ(x) is, the lower the bias becomes, since the model can follow the training points more closely. 
However, such a model also shifts more in order to fit the particular data points it was trained on.
As a result, its variance rises.
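
For squared-error loss, the trade-off can be written out explicitly. As a sketch under the standard assumption (not stated in the source) that the targets are generated as $y = f(x) + \varepsilon$, where $f$ is the unknown true function and $\varepsilon$ is zero-mean noise with variance $\sigma^2$, the expected prediction error of the model ŷ(x) decomposes as:

$$
\mathbb{E}\big[(y - \hat{y}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{y}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{y}(x) - \mathbb{E}[\hat{y}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

The expectation is taken over the noise and over the training sets the model could have been fitted on. The bias term matches the definition given earlier (average prediction minus true value), and the variance term is the spread of the predictions around their average.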

We usually employ the cross-validation technique as well as regularization
and feature reduction to find the optimal model balancing bias and variance
and to diminish overfitting.

#### __Avoiding overfitting with cross-validation__

In machine learning, the
validation procedure helps to evaluate how the models will generalize to
independent or unseen datasets in a simulated setting. 

In a conventional validation setting: 
- the original data is partitioned into three subsets, usually
__60% for the training set__, __20% for the validation set__, and the rest __(20%) for
the testing set__. 
- This setting suffices if we have enough training samples
after partitioning and we only need a __rough estimate__ of simulated
performance. 
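
The conventional 60/20/20 partition can be sketched in a few lines. The helper `train_val_test_split` and its parameters are hypothetical names introduced here; the samples are shuffled first so that the three subsets are random.

```python
import random


def train_val_test_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the samples and cut them into training, validation, and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (20% here)
    return train, val, test


data = list(range(100))  # assumed stand-in for 100 samples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

Each sample ends up in exactly one subset, so performance measured on the validation and testing sets comes from data the model did not train on.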

Otherwise, __cross-validation__ is preferable: 

- In one round of cross-validation, __the original data is divided into two
subsets__, for training and testing (or validation), respectively. 
- The testing
performance is recorded. 
- Similarly, __multiple rounds of cross-validation are
performed__ under different partitions. 
- Testing results from all rounds are
finally averaged to generate a more reliable estimate of model prediction
performance. 
- Cross-validation helps to reduce variability and, therefore,
limit overfitting.

_When the training size is very large, it's often sufficient to split it into
training, validation, and testing (three subsets) and conduct a performance
check on the latter two. Cross-validation is less preferable in this case since
it's computationally costly to train a model for each single round. But if you
can afford it, there's no reason not to use cross-validation. When the size
isn't so large, cross-validation is definitely a good choice._

There are mainly two cross-validation schemes in use: 

In __the exhaustive scheme__, we leave out a fixed number of
observations in each round as testing (or validation) samples and use the
remaining observations as training samples. 
- This process is repeated until
all possible different subsets of samples are used for testing once. 
- For
instance, we can apply __Leave-One-Out Cross-Validation (LOOCV)__,
which lets each sample be in the testing set once. For a dataset of size n,
LOOCV requires n rounds of cross-validation, which can be slow
when n gets large. The following diagram presents the workflow of
LOOCV:

![image.png](attachment:image.png)
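
The LOOCV workflow can also be sketched in plain Python. The generator `leave_one_out_splits` is a hypothetical helper, and the "model" is a deliberately trivial stand-in that predicts the mean of the training targets; any real model would be trained in its place.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, one round per sample."""
    for test_index in range(n_samples):
        train_indices = [i for i in range(n_samples) if i != test_index]
        yield train_indices, test_index


targets = [2.0, 4.0, 6.0, 8.0]  # assumed toy targets
squared_errors = []
for train_indices, test_index in leave_one_out_splits(len(targets)):
    # Stand-in model: predict the mean of the training targets.
    prediction = sum(targets[i] for i in train_indices) / len(train_indices)
    squared_errors.append((targets[test_index] - prediction) ** 2)

# Average the n per-round results into one performance estimate.
loocv_mse = sum(squared_errors) / len(squared_errors)
print(len(squared_errors), loocv_mse)
```

With n samples the loop trains n times, which is why LOOCV becomes slow for large datasets.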

The __non-exhaustive scheme__, as the name implies, doesn't
try out all possible partitions. The most widely used type of this scheme
is __k-fold cross-validation__. 
- We first randomly split the original data into k
equal-sized folds. 
- In each trial, one of these folds becomes the testing set,
and the rest of the data becomes the training set.
- We repeat this process k times, with each fold being the designated testing
set once. 
- Finally, we average the k sets of test results for the purpose of
evaluation. Common values for k are 3, 5, and 10. The following table
illustrates the setup for five-fold cross-validation:

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

- K-fold cross-validation often has a lower variance compared to LOOCV,
since we're using a chunk of samples instead of a single one for validation.
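
The k-fold procedure described above can be sketched as follows. The generator `k_fold_splits` is a hypothetical helper written for illustration, and the mean predictor again stands in for a real model. Scikit-Learn offers ready-made equivalents in `sklearn.model_selection` (for example `KFold` and `cross_val_score`), which are not used here so that the sketch stays dependency-free.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random assignment to folds
    folds = [indices[i::k] for i in range(k)]   # k folds of (nearly) equal size
    for i, test_indices in enumerate(folds):
        train_indices = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_indices, test_indices


targets = [float(v) for v in range(1, 11)]      # assumed toy targets 1.0 .. 10.0
splits = list(k_fold_splits(len(targets), k=5))

fold_mse = []
for train_indices, test_indices in splits:
    # Stand-in model: always predict the mean training target.
    prediction = sum(targets[i] for i in train_indices) / len(train_indices)
    errors = [(targets[i] - prediction) ** 2 for i in test_indices]
    fold_mse.append(sum(errors) / len(errors))

cv_mse = sum(fold_mse) / len(fold_mse)          # average over the k rounds
print(fold_mse, cv_mse)
```

Every sample is used for testing exactly once and for training in the other k - 1 rounds.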

__The holdout method__

We can also randomly split the data into training and testing sets numerous
times. This is formally called the holdout method. Its drawback is that
some samples may never end up in the testing set, while
others may be selected for it multiple times.
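
The uneven coverage of repeated random splits can be checked with a short sketch that counts how often each sample lands in a test set. The function `repeated_holdout_counts` and the sizes used (20 samples, test sets of 4, 5 rounds) are assumptions chosen for illustration.

```python
import random
from collections import Counter


def repeated_holdout_counts(n_samples, test_size, n_rounds, seed=0):
    """Count how often each sample index is drawn into a random test set."""
    rng = random.Random(seed)
    counts = Counter({i: 0 for i in range(n_samples)})
    for _ in range(n_rounds):
        test_indices = rng.sample(range(n_samples), test_size)  # one random split
        counts.update(test_indices)
    return counts


counts = repeated_holdout_counts(n_samples=20, test_size=4, n_rounds=5)
never_tested = [i for i, c in counts.items() if c == 0]
tested_more_than_once = [i for i, c in counts.items() if c > 1]
print(never_tested, tested_more_than_once)
```

With these sizes, 20 test slots are spread over 20 samples, so unless every sample is drawn exactly once, some samples are never tested while others are tested repeatedly. K-fold cross-validation avoids this by construction.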

__In summary__

Cross-validation derives a more accurate assessment of model
performance by combining measures of prediction performance on different
subsets of data. This technique not only reduces variance and helps avoid
overfitting, but also gives an insight into how the model will generally
perform in practice.

#### __The main production-ready Python ML frameworks:__

#### Scikit-Learn 
It is very easy to use, yet it implements many machine
learning algorithms efficiently, so it makes for a great entry point to
learning machine learning. 
- It was created by David Cournapeau in
2007, and is now led by a team of researchers at the French Institute
for Research in Computer Science and Automation (Inria).

#### TensorFlow 
It is a more complex library for distributed numerical
computation. 
- It makes it possible to train and run very large neural
networks efficiently by distributing the computations across potentially hundreds of multi-GPU (graphics processing unit) servers. 
- TensorFlow
(TF) was created at Google and supports many of its large-scale
machine learning applications. 
- It was open sourced in November 2015.
- Ideal for large-scale projects and production environments that require high-performance and scalable models.

#### Keras

It is a high-level deep learning API that makes it very simple to
train and run neural networks. 
- Keras comes bundled with TensorFlow,
and it relies on TensorFlow for all the intensive computations.


#### PyTorch 
It is used for applications such as computer vision and natural language processing.
- Developed by Meta AI (formerly Facebook AI Research Lab).
- Its initial release in 2016 quickly garnered attention due to its flexibility, ease of use, and dynamic computation graph.
- Ideal for research and small-scale projects that prioritize flexibility, experimentation, and quick iteration on models.