Beata Sirowy

# Machine learning: basics
Based on Géron, A. (2023) _Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow_

Liu, Y. (2020) _Python Machine Learning By Example_

![image.png](attachment:image.png)

A machine learning system is fed with input data, which can be numerical, textual, visual, or audiovisual.

The system usually produces an output. This can be a floating-point number (for instance, the acceleration of a self-driving car) or an integer representing a category, also called a class (for example, cat or tiger in image recognition).

The main task of machine learning is to explore and construct algorithms
that can learn from historical data and make predictions on new input data.
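A minimal sketch of this idea, assuming scikit-learn is installed: part of a dataset plays the role of "historical" data the model learns from, and the held-out part plays the role of new input data. The Iris dataset (bundled with scikit-learn) and the k-nearest-neighbors classifier are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small labeled dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Treat 75% of the samples as "historical" data and hold back 25% as "new" data.
X_hist, X_new, y_hist, y_new = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Learn from the historical data...
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_hist, y_hist)

# ...then predict on samples the model has never seen.
predictions = model.predict(X_new)
accuracy = model.score(X_new, y_new)
print(f"accuracy on unseen samples: {accuracy:.2f}")
```

Evaluating on data held out from training is what tells us whether the model generalizes rather than just memorizing the historical examples.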

For a data-driven solution, we need to define (or have it defined by an algorithm) an evaluation function, called the loss or cost function, which measures how well the model is learning, typically by how far its predictions are from the desired outputs. This turns learning into an optimization problem: find the model parameters that minimize the loss, as efficiently and effectively as possible.
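To make the loss and optimization idea concrete, here is a small NumPy sketch. It assumes a one-feature linear model `y ≈ w*x + b`, mean squared error (MSE) as the loss, and plain gradient descent as the optimizer; the data is synthetic, generated from a rule (`y = 2x + 1` plus noise) chosen purely for illustration.

```python
import numpy as np

# Synthetic "historical" data; the hidden rule y = 2x + 1 (plus noise) is an assumption.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=200)

def mse_loss(w, b):
    """Mean squared error: the average squared gap between predictions and targets."""
    return np.mean((w * x + b - y) ** 2)

# Gradient descent: repeatedly nudge the parameters against the gradient of the loss.
w, b = 0.0, 0.0
learning_rate = 0.01
for step in range(2000):
    error = w * x + b - y
    grad_w = 2 * np.mean(error * x)   # d(loss)/dw
    grad_b = 2 * np.mean(error)       # d(loss)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.2f}, b={b:.2f}, loss={mse_loss(w, b):.3f}")
```

The learned `w` and `b` end up close to the values used to generate the data, and the loss drops to roughly the variance of the added noise.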

Depending on the nature of the learning data, machine learning tasks can be
broadly classified into the following three categories:

#### __Unsupervised learning:__
When the learning data contains only input signals (features), without any description or label attached, it's up to us to find the underlying structure of the data, to discover hidden information, or to determine how to describe the data.
- This kind of learning data is
called __unlabeled data.__ 
- Unsupervised learning can be used to __detect
anomalies__, such as fraud or defective equipment, or to __group
customers__ with similar online behaviors for a marketing campaign.
- __Data visualization__ that makes data more digestible, and __dimensionality
reduction__ that distills relevant information from noisy data, are also in
the family of unsupervised learning.
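A minimal scikit-learn sketch of two of these uses on synthetic data: k-means clustering to group "customers" with similar behavior, and principal component analysis (PCA) to reduce three features to two. The feature names, the data, and the choice of two clusters are assumptions made for illustration; note that no labels are passed to either model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical customer data: each row is one customer, columns are
# [visits per month, average basket size, minutes on site]. No labels.
rng = np.random.default_rng(42)
casual = rng.normal(loc=[2, 20, 5], scale=1.0, size=(50, 3))
frequent = rng.normal(loc=[15, 80, 40], scale=1.0, size=(50, 3))
X = np.vstack([casual, frequent])

# Clustering: group customers with similar behavior (k=2 is our assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
segments = kmeans.fit_predict(X)

# Dimensionality reduction: project the 3 features onto 2 principal components,
# for example to plot the customers in 2-D.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(np.bincount(segments))   # number of customers in each discovered group
print(X_2d.shape)              # (100, 2)
```

The cluster numbers themselves (0 or 1) carry no meaning; it is up to us to interpret each group, for instance as "casual" versus "frequent" shoppers.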

#### __Supervised learning:__ 
When the learning data comes with a description, targets, or desired outputs in addition to the input signals, the learning goal is to find a general rule that maps inputs to outputs.
- This kind of learning
data is called __labeled data__. 
- The learned rule is then used to label new
data with unknown output. 
- The labels are usually provided by event logging systems or evaluated by human experts. 
- If it's feasible, labels may also be produced by human raters, for instance through crowdsourcing.
- Supervised learning is commonly used in daily applications, such as __face and speech recognition, product or movie recommendations, sales forecasting, and spam email detection__.

We can further subdivide supervised learning into:

- __Regression__ trains on and predicts continuous-valued responses, for example, predicting house prices.

- __Classification__ attempts to find the appropriate class label, such as analyzing positive/negative sentiment or predicting loan default.
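A short sketch contrasting the two, assuming scikit-learn is installed. The house-size and debt-to-income data below are synthetic and hypothetical, and linear/logistic regression are simply convenient example models: the regressor returns a floating-point number, while the classifier returns an integer class label.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Regression: hypothetical house sizes (m^2) -> prices (a continuous target).
size = rng.uniform(40, 200, size=(100, 1))
price = 3000 * size[:, 0] + 50_000 + rng.normal(0, 10_000, size=100)
reg = LinearRegression().fit(size, price)
price_estimate = reg.predict([[120]])[0]   # a floating-point number

# Classification: hypothetical debt-to-income ratio -> default (1) or no default (0).
ratio = rng.uniform(0, 1, size=(200, 1))
defaulted = (ratio[:, 0] > 0.6).astype(int)
clf = LogisticRegression().fit(ratio, defaulted)
labels = clf.predict([[0.2], [0.9]])       # integer class labels

print(f"estimated price for 120 m^2: {price_estimate:,.0f}")
print(f"predicted classes: {labels}")
```

Despite its name, logistic regression is a classification model: it estimates class probabilities and `predict` returns the most likely class.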

If some, but not all, learning samples are labeled, we have __semi-supervised learning__. It uses unlabeled data (typically a large amount) for training, in addition to a small amount of labeled data. Semi-supervised learning is applied when acquiring a fully labeled dataset is expensive and labeling a small subset is more practical.
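A minimal semi-supervised sketch, assuming scikit-learn's `LabelSpreading` as the technique (one of several options). The data is synthetic, only 10 of 200 samples keep their labels, and the rest are marked with `-1`, which is how scikit-learn denotes unlabeled samples.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

# Synthetic data: 200 points in two well-separated groups with true labels 0 and 1.
X, y_true = make_blobs(n_samples=200, centers=[[0, 0], [10, 10]],
                       cluster_std=1.0, random_state=0)

# Pretend labeling is expensive: keep only 5 labels per class and mark
# every other sample as unlabeled with -1 (scikit-learn's convention).
y_partial = np.full_like(y_true, -1)
for cls in (0, 1):
    y_partial[np.flatnonzero(y_true == cls)[:5]] = cls

# Label spreading propagates the few known labels to similar, unlabeled points.
model = LabelSpreading(kernel="rbf", gamma=0.5)
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training sample.
accuracy = (model.transduction_ == y_true).mean()
print(f"labeled samples: {(y_partial != -1).sum()}, "
      f"accuracy on all {len(y_true)}: {accuracy:.2f}")
```

This works well here because the two groups are far apart; on real data, how much the unlabeled samples help depends on whether similar inputs really do share labels.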

#### __Reinforcement learning:__

Learning data provides feedback so that the
system adapts to dynamic conditions in order to achieve a certain goal
in the end. 
- The system evaluates its performance based on the
feedback responses and reacts accordingly. 
- Examples include __robotics for industrial automation, self-driving cars, and the Go-playing program AlphaGo__.
- The key difference between reinforcement
learning and supervised learning is the __interaction with the
environment.__
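The feedback loop can be sketched with tabular Q-learning, one classic reinforcement learning algorithm (the choice of algorithm is ours, not the source's). The environment is a hypothetical five-cell corridor: the agent starts at cell 0, can step left or right, and receives a reward only when it reaches cell 4. No labeled examples are given; the agent learns purely from the rewards its own actions produce.

```python
import random

N_STATES = 5        # hypothetical corridor: positions 0..4, the goal is at 4
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right

def step(state, action):
    """Toy environment: apply a move and return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # reaching the goal is rewarded
    return next_state, 0.0, False

random.seed(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: estimated value of each action per state
alpha, gamma, epsilon = 0.5, 0.9, 0.2       # learning rate, discount factor, exploration rate

for episode in range(300):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(2)                 # explore
        else:
            best = max(q[state])                         # exploit, breaking ties randomly
            action = random.choice([a for a in range(2) if q[state][a] == best])
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate toward the feedback just observed.
        target = reward if done else reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy in every non-goal cell is to step right, a behavior the agent discovered by interacting with the environment rather than by being shown correct answers.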


![image.png](attachment:image.png)

We'll focus on production-ready Python frameworks:

#### Scikit-Learn 
It is very easy to use, yet it implements many machine
learning algorithms efficiently, so it makes for a great entry point to
learning machine learning. 
- It was created by David Cournapeau in
2007, and is now led by a team of researchers at the French Institute
for Research in Computer Science and Automation (Inria).
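Part of what makes Scikit-Learn easy to use is its consistent estimator interface: models expose `fit`, `predict`, and `score`, and they compose into pipelines. A small sketch, assuming scikit-learn is installed; the breast cancer dataset (bundled with the library), feature scaling, and logistic regression are example choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# A pipeline chains preprocessing and a model into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: train on 4/5 of the data, evaluate on the remaining 1/5, 5 times.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean 5-fold accuracy: {scores.mean():.3f}")
```

Swapping in a different algorithm usually means changing only the last step of the pipeline, because every estimator follows the same interface.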

#### TensorFlow 
It is a more complex library for distributed numerical
computation. 
- It makes it possible to train and run very large neural
networks efficiently by distributing the computations across potentially hundreds of multi-GPU (graphics processing unit) servers. 
- TensorFlow
(TF) was created at Google and supports many of its large-scale
machine learning applications. 
- It was open sourced in November 2015.
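A small sketch of TensorFlow's two core ingredients, assuming TensorFlow 2.x is installed: tensor operations, and automatic differentiation with `tf.GradientTape`, which is what training neural networks relies on. Distributing work across many GPUs or servers is handled by `tf.distribute` strategies, which this sketch does not cover.

```python
import tensorflow as tf

# Tensors are TensorFlow's n-dimensional arrays; operations on them run on
# whatever device is available (CPU, or a GPU if one is visible).
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
product = tf.matmul(a, b)            # [[3.], [7.]]

# Automatic differentiation: record operations, then ask for the gradient.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x
grad = tape.gradient(y, x)           # dy/dx = 2x + 2 = 8.0 at x = 3

print(product.numpy().ravel(), grad.numpy())
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
```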

#### Keras

It is a high-level deep learning API that makes it very simple to
train and run neural networks. 
- Keras comes bundled with TensorFlow (as `tf.keras`), and in that setup it relies on TensorFlow for all the intensive computations. Since Keras 3, it can also run on JAX or PyTorch backends.
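A minimal Keras sketch, assuming TensorFlow 2.x with its bundled Keras and the default TensorFlow backend. The dataset is synthetic (label 1 when `x0 + x1 > 0`) and the network size and training settings are arbitrary illustrative choices; the point is the define, compile, fit, evaluate workflow.

```python
import numpy as np
from tensorflow import keras

# Synthetic binary task (an assumption for illustration): label 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

keras.utils.set_random_seed(0)   # make weight initialization repeatable

# Define the network layer by layer...
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
# ...choose an optimizer, a loss, and metrics to report...
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# ...then train; the backend performs the underlying tensor computations.
model.fit(X, y, epochs=100, batch_size=32, verbose=0)

loss, accuracy = model.evaluate(X, y, verbose=0)
print(f"training accuracy: {accuracy:.2f}")
```

Gradient computation, weight updates, and batching are all handled by `compile` and `fit`, which is what makes Keras a high-level API.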
