# What is machine learning?

## Artificial Intelligence (AI) vs. Machine Learning

What is learning?

“*The activity or process of gaining knowledge or skill by studying,
practicing, being taught, or experiencing something.*”

(Merriam-Webster dictionary)

Artificial intelligence (AI) and machine learning are often used interchangeably, but `machine learning (ML) is a subset of` the broader category `of AI`.

1. **AI** refers to the `general ability of computers to emulate human thought and perform tasks` in real-world environments.

2. **ML** refers to the `technologies and algorithms` that enable systems to
* identify patterns,
* make decisions,
* and improve themselves through experience and data.

ML (as a subcategory of AI) uses algorithms to 
* automatically learn insights and recognize patterns from data, 
* applying that learning to make increasingly better decisions.

**Machine learning approach**: program an algorithm to
automatically learn from data, or from experience

Arthur Samuel (~1950-60), a computer scientist who pioneered the study of artificial intelligence, said that machine learning is 

"*the study that gives computers the ability to learn without being explicitly programmed*."

A popular quote from computer scientist *Tom Mitchell* defines machine learning more
formally:

“*A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E.*”

Why might you want to use a learning algorithm?

* For many problems, it’s *difficult to program* the correct behavior
by hand (e.g. recognizing people and objects, understanding human speech)
* system needs to *adapt to a changing environment* (e.g. spam
detection)
* want the system to perform *better* than the human programmers
* privacy/fairness (e.g. ranking search results)

It’s similar to statistics...
* Both fields try to uncover patterns in data
* Both fields draw heavily on calculus, probability, and linear algebra,
and share many of the same core algorithms

ML and statistics are both data analysis fields, but they have different goals, approaches, and types of models:
* Goal
    * Statistics is used to make inferences about a population based on a sample, while ML is used to make repeatable predictions from data
* Approach
    * Statistical models define mathematical relationships between variables, while ML models learn from data without explicit programming.
* Data
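    * ML typically relies on large amounts of data, usually split into multiple subsets (for example training and test sets), while classical statistical analysis often works with a single, smaller sample.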

The major difference between machine learning and statistics is their purpose. 
* Machine learning models are designed to make the most accurate predictions possible. 
* Statistical models are designed for inference about the relationships between variables

## Machine Learning Methods

First, we will discuss types of experience

Tasks (& experience) are generally classified into broad categories.

These categories are based on 
* how learning is received 
* or how feedback on the learning is given to the system developed.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

Types of machine learning:
1. **Supervised learning**
*  a program predicts an output for an input by learning from pairs of labeled inputs and outputs;
* that is, the program learns from examples of the right answers

![image.png](attachment:image.png)


* algorithm needs to be able to “learn” by comparing its actual output with the “taught” outputs to find errors, and modify the model accordingly (this process is referred to as *Training* or *Fitting*).
* examples:
    * use historical stock market information to anticipate upcoming fluctuations. 
    * be employed to filter out spam emails. 
    * tagged photos of dogs can be used as input data to classify untagged photos of dogs.

2. **Unsupervised learning**
* no labeled examples – instead, a program attempts to discover “interesting” and hidden patterns in the data

![image-2.png](attachment:image-2.png)

* it may also have a goal of feature learning, which allows the computational machine to
automatically discover the `representations` that are needed to classify raw data

* `Representation`: 
    * How you (and your model) see the data. Basically, the mathematical space information resides in. (Example: encoding)
* example:
    * Assume that you have collected data describing the heights and weights of people. 
    * An example of an unsupervised learning problem is dividing the data points into groups. 
    * A program might produce groups that correspond to men and women, or children and adults.

Now assume that the data is also labeled with the person's sex. An example of a
supervised learning problem is inducing a rule to predict whether a person is male
or female based on their height and weight.

It is important to keep in mind that *validating the output variables* still calls for some level of human involvement. 

For instance, an unsupervised learning model can determine that customers who shop online tend to purchase multiple items from the same category at the same time. However, a human analyst would need to check that it makes sense for a recommendation engine to pair Item X with Item Y. 

3. Some types of problems, called **semi-supervised learning problems**, make use of both labeled (supervised) and unlabeled (unsupervised) data

![image.png](attachment:image.png)


![image-3.png](attachment:image-3.png)

![image-2.png](attachment:image-2.png)

* `Reinforcement learning` is sometimes grouped with semi-supervised learning, although it is usually treated as a separate paradigm. In it, a program receives feedback for its decisions, but the feedback may not be
associated with a single decision
* For example, 
    a reinforcement learning program that learns to play a side-scrolling video game such as Super Mario Bros. may receive a reward when it completes a level or exceeds a certain score, and a punishment when it loses a life. However, this supervised feedback is not associated with specific decisions to run or pick up fire flowers

# Supervised Machine Learning & ML Terminology


## ML Terminology

In supervised learning, we are given a `training set` consisting of `inputs` and corresponding `labels`, e.g.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

A supervised learning program learns from labeled examples of the `outputs` that should be produced for an `input`

There are many names for the `output` of a ML program: we will refer to the output as the **response variable** (other name - `label`)

Similarly, the `input` variables have several names - we will refer to the input variables as **features**

The collection of examples that comprise supervised experience is called a **training set**. 

A collection of examples that is used to assess the performance of a program
is called a **test set**.

The response variable can be thought of as the answer to the question posed by the explanatory variables (features). 

Supervised learning problems learn from a collection of answers to different questions; that is, supervised learning
programs are provided with the correct answers and must learn to respond correctly to unseen, but similar, questions.
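The idea of learning from correct answers, by comparing predicted outputs with the taught outputs and adjusting the model when they differ, can be sketched with a perceptron. This is only an illustrative sketch: the perceptron is chosen here as one of the simplest supervised learners, and the toy data, the function name `train_perceptron`, and the learning rate are assumptions rather than part of these notes.

```python
import numpy as np

def train_perceptron(X, y, learning_rate=1.0, max_epochs=100):
    """Fit a linear classifier by correcting it whenever a prediction is wrong."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(X, y):
            prediction = 1 if features @ weights + bias > 0 else 0
            error = label - prediction          # 0 when correct, +1 or -1 when wrong
            if error != 0:
                weights += learning_rate * error * features
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:                        # every training example is classified correctly
            break
    return weights, bias

# Hypothetical toy training set: two features per example, response 1 or 0.
X_train = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 3.0],
                    [-1.0, -2.0], [-2.0, -1.0], [-3.0, -3.0]])
y_train = np.array([1, 1, 1, 0, 0, 0])

weights, bias = train_perceptron(X_train, y_train)
predictions = (X_train @ weights + bias > 0).astype(int)
print(predictions)
```

The loop stops as soon as a full pass over the training set produces no mistakes, which is only guaranteed to happen when the two classes can be separated by a straight line.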

## Supervised ML tasks

In this relatively formal definition of the word `task`, the process of *learning itself is not the task*.

`Learning` is our means of achieving `the ability to perform the task`.

For example,
* if we want a robot to be able to walk, then walking is the task.
* We could program the robot to learn to walk,
* or we could attempt to directly write a program that specifies how to walk manually.

**ML tasks** are usually described in terms of *how the machine learning system should process an* `example`. 

An **example** is a *collection of features that have been quantitatively measured from some object or event that we want the ML system to process*.

We typically represent an example as a vector:

![image.png](attachment:image.png)

where each entry $x_i$ of the vector is a `feature`.
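As a rough illustration of this representation, an example can be stored as a one-dimensional NumPy array, and a dataset as a two-dimensional array with one row per example. The feature names and values here are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical feature names and values, chosen only for illustration.
feature_names = ["lot_size_m2", "bedrooms", "bathrooms"]

# One example = one feature vector x with entries x_1, ..., x_n.
x = np.array([450.0, 3.0, 2.0])

# A dataset stacks examples row by row: one row per example, one column per feature.
X = np.array([
    [450.0, 3.0, 2.0],
    [620.0, 4.0, 3.0],
    [300.0, 2.0, 1.0],
])

print(x.shape)   # (3,)
print(X.shape)   # (3, 3)
print(X[1, 0])   # feature "lot_size_m2" of the second example
```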

There are two types of supervised learning "tasks" (algorithms):

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Classification

**Classification** is a type of supervised machine learning where *algorithms learn from the data to predict an outcome or event in the future*:
* the model tries to predict the correct label for a given input.
* in classification tasks the program must learn to predict **discrete values** (the most probable category, class, or label) for the response variable of new observations.

Example 1:

For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam):

![image.png](attachment:image.png)

Example 2

A bank may have a customer dataset containing credit history, loans, investment details, etc. and they may want to know if any customer will default. 

In the historical data, we will have Features and Target. 
* Features will be attributes of a customer such as credit history, loans, investments, etc.
* Target will represent whether a particular customer has defaulted in the past (normally represented by 1 or 0 / True or False / Yes or No).
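A classifier for this kind of Features/Target data could look roughly like the sketch below. The notes do not name an algorithm, so k-nearest neighbors is used here purely as an assumption because it is short to write down; the customer features, their values, and the function name `knn_predict` are made up for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the majority label among the k training examples closest to x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Made-up, already-scaled customer features: [debt-to-income ratio, share of late payments].
X_train = np.array([[0.10, 0.00], [0.20, 0.05], [0.15, 0.10],
                    [0.70, 0.60], [0.80, 0.50], [0.90, 0.70]])
# Target: 1 = defaulted in the past, 0 = did not default.
y_train = np.array([0, 0, 0, 1, 1, 1])

low_risk = knn_predict(X_train, y_train, np.array([0.12, 0.02]))
high_risk = knn_predict(X_train, y_train, np.array([0.85, 0.65]))
print(low_risk, high_risk)
```

The output is a discrete value (0 or 1), which is what distinguishes this from a regression task.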

### Regression

**Regression** is a type of supervised ML where algorithms learn from the data to *predict **continuous** (numerical) values* such as sales, salary, weight, or temperature

Example:

Given a dataset containing features of a house such as lot size, number of bedrooms, number of baths, neighborhood, etc. together with the price of the house, a regression algorithm can be trained to learn the relationship between the features and the price.

In regression tasks, we use `linear` and `non-linear` models to build our predictive models. 

* `Linear models` have a basic assumption that there exists *a linear relationship between the input and output* variables, 

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

Relation between the prediction y and inputs x is linear in both cases.

* `non-linear models`, by contrast, do not rely on any such assumption.

The goal of linear regression is to find the best fit line for our data whereas in non-linear regression, we try to identify complex relationships within our dataset.
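Finding a best-fit line can be sketched with NumPy's least-squares solver. The data is made up so that it lies exactly on a line, which makes the fitted slope and intercept easy to check; real data would contain noise and the fit would only be approximate.

```python
import numpy as np

# Made-up data that follows price = 2 * size + 1 exactly (arbitrary units).
size = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
price = 2.0 * size + 1.0

# Design matrix with a column of ones so the model can learn an intercept.
design = np.column_stack([size, np.ones_like(size)])

# Least-squares fit: find the slope and intercept that minimise the squared error.
(slope, intercept), *_ = np.linalg.lstsq(design, price, rcond=None)

predicted = slope * 6.0 + intercept   # prediction for an unseen input
print(round(slope, 6), round(intercept, 6), round(predicted, 6))
```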

[How Linear regression works](https://images.datacamp.com/image/upload/v1661171231/Linear_regression_dff716e828.gif)

# Unsupervised Learning

In unsupervised learning, data is unlabeled, so the learning algorithm is
left to find `commonalities among its input data`. 

As unlabeled data are more abundant than labeled data, ML methods that facilitate unsupervised learning are particularly valuable.

![image.png](attachment:image.png)

Unsupervised learning techniques are a valuable set of tools for *exploratory analysis*.

They `bring out patterns and structure within datasets`, which yield information that
may be informative in itself or serve as a guide to further analysis. 

It's critical to
have a solid set of unsupervised learning tools that you can apply to help break up
unfamiliar or complex datasets into actionable information.


### Unsupervised ML tasks

#### Clustering
* is a commonly used ML task in which data points are grouped into *clusters* (groups of closely related data points).

![image.png](attachment:image.png)

* does not require labeled data
* can be used to identify patterns or similarities within a dataset. 
* has many applications ranging from customer segmentation, market segmentation, image segmentation, document classification, and more. 
* at its core, clustering is a process of partitioning a set of objects into distinct groups such that the elements in each group are similar to each other while those belonging to different groups are very dissimilar.

Examples:
* Given a collection of movie reviews, a clustering algorithm might discover sets of positive and negative
reviews. The system will not be able to label the clusters as "positive" or "negative"; without supervision, it will only have knowledge that the grouped observations are similar to each other by some measure.
* Discovering segments of customers within a market for a product. By understanding what attributes are common to particular groups of customers, marketers can decide what aspects of their campaigns need to be emphasized.
* Image segmentation: partitioning an image into different groups of pixels, for example to isolate objects in an image so that each object can be analyzed individually.
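The partitioning idea can be sketched with k-means, one widely used clustering algorithm. The notes do not prescribe a specific algorithm, so this choice is an assumption, as are the made-up height and weight values and the simple choice of starting centers.

```python
import numpy as np

def k_means(points, initial_centers, n_iterations=10):
    """Alternate between assigning points to the nearest center and moving the centers."""
    centers = initial_centers.astype(float).copy()
    for _ in range(n_iterations):
        # Distance from every point to every center, shape (n_points, n_clusters).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        for cluster in range(len(centers)):
            members = points[labels == cluster]
            if len(members) > 0:                 # leave an empty cluster's center unchanged
                centers[cluster] = members.mean(axis=0)
    return labels, centers

# Made-up heights (cm) and weights (kg); no labels are given to the algorithm.
points = np.array([[110.0, 20.0], [115.0, 22.0], [120.0, 25.0],
                   [170.0, 68.0], [175.0, 72.0], [180.0, 80.0]])

# Assumption: start from the first and last point; real implementations choose starts more carefully.
labels, centers = k_means(points, initial_centers=points[[0, -1]])
print(labels)
print(centers)
```

The cluster indices 0 and 1 carry no meaning by themselves. As with the movie review example, a human still has to interpret what each group represents.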

![image.png](attachment:image.png)

#### Dimensionality reduction

Some problems may contain `thousands or even millions of explanatory variables `(features), which
can be computationally costly to work with.

Additionally, the program's ability to generalize may be reduced if some of the `explanatory variables capture noise` or are irrelevant to the underlying relationship.

**Dimensionality reduction**
* is the process of discovering the explanatory variables that account for the greatest changes in the
response variable.

Popular algorithms used for dimensionality reduction include *principal component analysis (PCA)* and *Singular Value Decomposition (SVD)* 

* These algorithms seek to `transform data from high-dimensional spaces to low-dimensional spaces without compromising meaningful properties` in the original data. 

* These techniques are typically deployed during exploratory data analysis (EDA) or data processing to prepare the data for modeling.
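A minimal sketch of PCA computed through the SVD is shown here, assuming NumPy only. The function name `pca` and the data are hypothetical; the two made-up features are almost perfectly correlated, so a single component keeps nearly all of the variance.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using the SVD of the centered data."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    projected = centered @ vt[:n_components].T
    return projected, explained_ratio[:n_components]

# Made-up data: two features that are almost perfectly correlated.
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]])

X_reduced, explained = pca(X, n_components=1)
print(X_reduced.shape)      # (5, 1): five examples, one feature each
print(explained)            # share of the total variance kept by the component
```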

#### Association Rule Mining

**Association rule mining**
* a rule-based approach to `discovering interesting relationships between features` in a given dataset. 
* it works by `using a measure of interest to identify strong rules` found within a dataset. 

Example 1:
* consider a dataset of transactions at a grocery store. 
* association rule mining could be used to identify relationships between items that are `frequently purchased together`. 
* For example, the rule "*If a customer buys bread, they are also likely to buy milk*" is an association rule that could be mined from this data set. 

We can use such rules to inform decisions about store layout, product placement, and marketing efforts.
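Two commonly used measures of interest for a rule such as "bread -> milk" are support and confidence. The sketch here computes both on a handful of made-up transactions; the helper names are hypothetical, and a real miner would also search over candidate rules instead of checking a single one.

```python
# Made-up grocery transactions; each set holds the items in one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def count_containing(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Among baskets containing the antecedent, the fraction that also contain the consequent."""
    return count_containing(antecedent | consequent) / count_containing(antecedent)

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence)   # 0.6 0.75
```

Here bread and milk appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets with bread also contain milk (confidence 0.75).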

Example 2 - Customer Segmentation:
* to discover that customers who purchase certain types of products are more likely to be younger
* similarly, they could learn that customers who purchase certain combinations of products are more likely to be located in specific geographic regions.


Imagine you have 10M customers, and you want to develop customized or focused marketing campaigns. It is unlikely that you will develop 10M marketing campaigns, so what do we do? We could use clustering to group 10M customers into 25 clusters and then design 25 marketing campaigns instead of 10M.

![image.png](attachment:image.png)

Example 3 - Fraud Detection
* a credit card company might use association rule mining to identify patterns of suspicious transactions, such as multiple purchases from the same merchant within a short period of time. 

Example 4 - Social network analysis
* an analysis of X (Twitter) data might reveal that users who write about a particular topic are also likely to write about other related topics, which could inform the identification of groups or communities within the network.

# Most Common Machine Learning Tasks

Following are the key machine learning tasks:
1. Regression
2. Classification
3. Clustering
4. Transcription
    * involves converting audio or video recordings or images having text into written text
        * For example, in optical character recognition, the computer program is shown a photograph containing an image of text and is asked to return this text in the form of a sequence of characters ([Google Street View processes address numbers in this way](https://en.wikipedia.org/wiki/Google_Street_View))

        ![image.png](attachment:image.png)

        
5. Machine translation
     * the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language

    ![image-2.png](attachment:image-2.png)

6. Anomaly detection
    * identifying unusual patterns in data that do not conform to expected behavior (detecting fraudulent activity in financial data, detecting malicious behavior in network traffic data, etc.)


![image-3.png](attachment:image-3.png)

7. Synthesis & sampling
    * generate new data from existing data or to select a representative subset of data for further analysis. 
    * Synthesis and sampling are often used together, in order to create a more diverse and representative dataset.


    ![image-4.png](attachment:image-4.png)
    
8. Similarity matching:
    *  to match items based on their similarity (natural language processing, image recognition, recommendation systems, search engine optimization)
9. Co-occurrence grouping (aka frequent itemset mining, association rule discovery, and market-basket analysis tasks)
10. Causal modeling:
    * to infer the causes and effects of certain conditions or variables
        * data is used to make inferences about the relationships between variables. 
        * the goal is to identify which variables are causing certain outcomes and how they are related
11. Link prediction:
    * identifying potential connections between entities that are not yet connected (to predict relationships between entities, such as customers, products, authors, and more)
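To make one of the tasks above concrete, anomaly detection (item 6) can be sketched with a robust z-score based on the median and the median absolute deviation (MAD). This is one simple approach among many and is an assumption here; the transaction amounts are made up, and the threshold of 3.5 is a common rule of thumb, not a value taken from these notes.

```python
import numpy as np

def find_anomalies(values, threshold=3.5):
    """Return indices whose robust z-score (based on median and MAD) exceeds the threshold."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))     # median absolute deviation
    # The factor 0.6745 makes the score comparable to a z-score for normally distributed data.
    robust_z = 0.6745 * (values - median) / mad
    return np.flatnonzero(np.abs(robust_z) > threshold)

# Made-up transaction amounts with one unusually large value.
amounts = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0, 9.7])
anomalies = find_anomalies(amounts)
print(anomalies)   # [6]
```

The sketch assumes the MAD is not zero, which would happen if more than half of the values were identical.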

# Implementing ML systems

## General ML Workflow sketch

1. Should I use ML on this problem?
    * Is there a pattern to detect?
    * Can I solve it analytically?
    * Do I have data?
2. Gather and organize data.
    * Preprocessing, cleaning, visualizing.
3. Establishing a baseline.
4. Choosing a model, loss, regularization, ...
5. Optimization (could be simple, could be a PhD-level...).
6. Hyperparameter search.
7. Analyze performance & mistakes, and iterate back to step 4 (or 2).
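Step 3 (establishing a baseline) can be as simple as always predicting the most frequent class. The sketch uses made-up spam/ham labels, and the function name `majority_class_baseline` is hypothetical. Any model chosen in step 4 should beat this number to be worth its complexity.

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Always predict the most frequent training label and report accuracy on the test labels."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, correct / len(test_labels)

# Made-up labels for a spam filter.
train_labels = ["ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
test_labels = ["ham", "spam", "ham", "ham"]

baseline_label, baseline_accuracy = majority_class_baseline(train_labels, test_labels)
print(baseline_label, baseline_accuracy)   # ham 0.75
```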

**Predictive modeling workflow**

![image.png](attachment:image.png)

![](attachment:image.png)

![image.png](attachment:image.png)

[ Machine Learning Cheat Sheet](https://github.com/Hanna2110/MLintro/blob/main/data/ML%2BCheat%2BSheet_2.pdf)

![image.png](attachment:image.png)

[Data cleaning checkList](https://github.com/Hanna2110/MLintro/blob/main/data/Data_Cleaning_Checklist.pdf)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Machine learning is about adapting models to `data`.
* Data is the "raw material" for machine learning. 
* It learns from data. 

### Data processing basics

What an image looks like to the computer:

![image.png](attachment:image.png)

ML algorithms need to handle lots of types of data:
* images, text, audio waveforms, credit card transactions, etc.

Common strategy: represent the input as an input vector in $\mathbb{R}^d$
* Representation = mapping to another space that’s easy to manipulate
* *Vectors* are a great representation since we can do linear algebra

Can use raw pixels:

![image.png](attachment:image.png)

Can do much better if you compute a *vector of meaningful features*.

Data processing -> Array processing ->**NumPy**

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops:
* *vectorize* computations (express them in terms of matrix/vector operations) to exploit hardware efficiency
* makes your code cleaner and more readable
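The difference between a loop and a vectorized expression can be seen in a small sketch like the following, where both versions compute the same weighted sum. The values are arbitrary.

```python
import numpy as np

weights = np.array([0.5, -1.0, 2.0])
features = np.array([4.0, 3.0, 1.0])

# Loop version: multiply and accumulate one element at a time.
total = 0.0
for w, x in zip(weights, features):
    total += w * x

# Vectorized version: one array expression computes the same dot product.
vectorized_total = weights @ features

print(total, vectorized_total)   # 1.0 1.0
```

On arrays this small the two are equally fast; the benefit of the vectorized form shows up on large arrays, where the work is done in compiled code instead of the Python interpreter.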

NumPy, short for Numerical Python, is the fundamental package required for high performance scientific computing and data analysis.

Here are some of the things it provides:
* ndarray, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities
* Standard mathematical functions for fast operations on entire arrays of data without having to write loops
* Tools for reading / writing array data to disk and working with memory-mapped files
* Linear algebra, random number generation, and Fourier transform capabilities

![image.png](attachment:image.png)

One of the more common problems is solving a matrix-vector equation. 

Here is an example. We seek the vector `x` that solves the equation
![image.png](attachment:image.png)

![image.png](attachment:image.png)

```python
import numpy as np
```

```python
# We start by constructing the arrays for A and b.
A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
A
```

```
array([[ 2,  1, -2],
       [ 3,  0,  1],
       [ 1,  1, -1]])
```

```python
# b is built as a row vector and transposed into a column vector.
b = np.transpose(np.array([[-3, 5, -2]]))
b
```

```
array([[-3],
       [ 5],
       [-2]])
```

```python
# To solve the system A x = b we do
x = np.linalg.solve(A, b)
x
```

```
array([[ 1.],
       [-1.],
       [ 2.]])
```

```python
# To do a matrix multiplication or a matrix-vector multiplication we use np.dot().
A = np.array([[1, -1, 2], [3, 2, 0]])

# A column vector can be written out entry by entry...
v = np.array([[2], [1], [3]])

# ...but a more convenient approach is to transpose the corresponding row vector.
v = np.transpose(np.array([[2, 1, 3]]))
w = np.dot(A, v)
w
```

```
array([[7],
       [8]])
```

### ML libraries & Neural net frameworks



ML libraries are collections of pre-built components and utilities that help developers and data scientists `build and implement ML models`

General-purpose libraries
* Provide algorithms and utilities for common ML tasks like classification, regression, and clustering

Task-specific libraries
* Provide support for specific tasks like data analysis, data visualization, natural language processing (NLP), computer vision, and deep learning

**Core ML and Deep Learning Frameworks**
* form the backbone of modern machine learning, providing tools to build and train a wide range of models from simple algorithms to complex neural networks

    * TensorFlow: Google’s open-source library for deep learning and neural networks.
    * PyTorch: Facebook’s flexible deep learning platform known for its dynamic computational graphs.
    * scikit-learn: A versatile library for classical machine learning algorithms and data mining.
    * Keras: High-level neural networks API, now integrated with TensorFlow.

**Data Manipulation and Numerical Computing**
* are essential for preparing and processing data, as well as performing the mathematical operations that underpin machine learning algorithms
    * NumPy: The fundamental package for scientific computing with Python.
    * Pandas: Powerful data manipulation and analysis library.

**Visualization and Plotting**
* tools are vital for exploratory data analysis, understanding model performance, and communicating results effectively
    * Matplotlib: Comprehensive library for creating static, animated, and interactive visualizations.
    * Also Widely Used: Seaborn, Plotly

**Natural Language Processing and Specialized Tools**
* cater to specific domains within machine learning, such as text processing, and provide utilities for optimizing model performance
    * Hugging Face Transformers: State-of-the-art natural language processing models and tools.
    * NLTK: Comprehensive suite of libraries and programs for symbolic and statistical natural language processing.
    * spaCy: Industrial-strength natural language processing library.

**Why study  ML  if these frameworks do so much for you?**
* So you know what to do if something goes wrong!
* Debugging learning algorithms requires sophisticated detective
work, which requires understanding what goes on beneath the hood.
* That’s why we derive things by hand

# Some ML fundamentals

## Training data and test data

The Figure below depicts an example of the dataset (the Iris dataset, which is a classic example in the field of machine
learning  https://archive.ics.uci.edu/ml/datasets/iris). 

The Iris dataset contains the measurements of 150 Iris flowers from three different species—Setosa,
Versicolor, and Virginica.

Each flower example represents one row in our dataset, and the flower measurements in centimeters are stored as columns, which we also call the features of the dataset

![image.png](attachment:image.png)

We will use a `matrix notation` to refer to our data.

The Iris dataset, consisting of 150 examples and four features, can then be written as a 150×4 matrix

![image.png](attachment:image.png)

Similarly, we can represent the **target** variables (here, class labels) as a 150-dimensional
column `vector`:

![image.png](attachment:image.png)
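Written out, the notation could look as follows. The convention of a superscript $(i)$ for the $i$-th example and a subscript $j$ for the $j$-th feature is an assumption made here for illustration; it is one common choice, and the figures above may use a different one.

$$
\mathbf{X} =
\begin{bmatrix}
x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\
x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\
\vdots & \vdots & \vdots & \vdots \\
x_1^{(150)} & x_2^{(150)} & x_3^{(150)} & x_4^{(150)}
\end{bmatrix}
\in \mathbb{R}^{150 \times 4},
\qquad
\mathbf{y} =
\begin{bmatrix}
y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(150)}
\end{bmatrix},
\quad
y^{(i)} \in \{\text{Setosa}, \text{Versicolor}, \text{Virginica}\}
$$

Each row of $\mathbf{X}$ is one flower, each column is one measurement, and $y^{(i)}$ is the class label of the flower in row $i$.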

1. Split the Data

`Split the data` set into 2 pieces — a **training** set and a **testing** set.

The observations in the **training set** comprise the `experience` that the ML algorithm uses
to learn. 

In *supervised learning* problems, each observation consists of an *observed
response variable* and one or more *observed explanatory variables*.

The **test set** is a similar collection of observations that is used `to evaluate the
performance of the model` using some performance metric. 

* It is important that no observations from the training set are included in the test set.
* If the test set does contain examples from the training set, it will be difficult to assess whether the algorithm has learned to generalize from the training set or has simply memorized it.

`Train-test split` is a **model validation procedure** that allows you to simulate *how a model would perform on new/unseen data*. 

![image.png](attachment:image.png)

This consists of `randomly sampling`, without replacement, about 75 percent of the rows (you can vary this) and putting them into your training set.

The remaining 25 percent is put into your test set.
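A 75/25 split by random sampling without replacement can be sketched with NumPy as follows. The function here is a hand-written illustration, not a library API, and the arrays are placeholders with the Iris shape, not the real measurements. In practice, scikit-learn's `sklearn.model_selection.train_test_split` offers the same functionality.

```python
import numpy as np

def train_test_split(X, y, train_fraction=0.75, seed=0):
    """Shuffle row indices and split them into disjoint training and test parts."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))           # each row index appears exactly once
    n_train = int(train_fraction * len(X))
    train_idx, test_idx = shuffled[:n_train], shuffled[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder arrays with the Iris shape (150 examples, 4 features); the values are not real measurements.
X = np.arange(150 * 4, dtype=float).reshape(150, 4)
y = np.repeat([0, 1, 2], 50)

X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, X_test.shape)   # (112, 4) (38, 4)
```

Because the indices come from a single permutation, no row can end up in both sets.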

**Consequences of Not Using Train Test Split**

***Generalization*** refers to how well the concepts learned by a machine learning model apply to specific examples not seen by the model when it was learning.

A program that **generalizes** well will be able to `effectively perform a task with new data`. 

In contrast, a program that **memorizes** the training data by learning an overly complex model could
`predict` the values of the response variable `for the training set accurately`, but will `fail
to predict `the value of the response variable `for new examples`.

Memorizing the training set is called **over-fitting**.

## Overfitting in ML

**Overfitting** refers to a model that models the training data too well.

Overfitting happens *when a model learns the detail and noise in the training data* to the extent that it negatively impacts the performance of the model on new data. 

This means that the noise or random fluctuations in the training data is picked up and learned as concepts by the model. 

The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.

To simulate overfitting, you could try not using a train-test split and instead train and test the model on the same data.

![image.png](attachment:image.png)

Example of overfitted training data.
The green line best follows the training data. 

![image.png](attachment:image.png)

The causes of overfitting can be numerous:

* Complex models. Using an overly complex model for a simple task can lead to overfitting. For instance, using a high-degree polynomial regression for data that's linear in nature.
* Insufficient data. If there's not enough data, the model might find patterns that don't really exist.
* Noisy data. If the training data contains errors or random fluctuations, an overfitted model will treat these as patterns.

The impact of overfitting is significant. While an overfitted model will have high accuracy on its training data, it will perform poorly on new, unseen data because it's not generalized enough.
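The high-degree polynomial case can be reproduced in a few lines. In this sketch the underlying relationship is linear, the "noise" is a fixed alternating disturbance so that the result is reproducible, and a degree-1 fit is compared with a degree-9 fit. All values are made up for illustration.

```python
import numpy as np

def true_function(x):
    return 2.0 * x + 1.0                          # the underlying relationship is linear

# Training data: the linear signal plus a fixed, alternating disturbance standing in for noise.
x_train = np.linspace(-1.0, 1.0, 10)
noise = 0.2 * (-1.0) ** np.arange(10)
y_train = true_function(x_train) + noise

# Unseen test inputs lie halfway between the training inputs.
x_test = (x_train[:-1] + x_train[1:]) / 2.0
y_test = true_function(x_test)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 9):
    coefficients = np.polyfit(x_train, y_train, degree)
    train_error = mse(y_train, np.polyval(coefficients, x_train))
    test_error = mse(y_test, np.polyval(coefficients, x_test))
    results[degree] = (train_error, test_error)
    print(f"degree {degree}: train MSE = {train_error:.4f}, test MSE = {test_error:.4f}")
```

The degree-9 polynomial passes through every training point, so its training error is close to zero, but it oscillates between the points and does worse than the straight line on the unseen inputs.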

Detecting overfitting is a crucial step in the machine learning process.


## Underfitting

While overfitting is a model's excessive adaptation to training data, **underfitting** is the opposite. 

An underfitted model fails to capture even the basic patterns in the training data.

There are several reasons why underfitting can occur, including:
* Model complexity: If the model is too simple, it may not have enough capacity to learn the patterns in the data. This can happen when the model has too few parameters or features.
* Insufficient training: If the model is not trained for long enough or with enough data, it may not be able to capture the underlying patterns in the data.
* Inappropriate model selection: If the model is not appropriate for the type of data being used, it may not be able to capture the underlying patterns in the data.
* Incorrect preprocessing: If the data is not preprocessed correctly, it may contain noise or irrelevant features that can confuse the model, leading to underfitting.

## A Good Fit in ML

Ideally, you want to select a model at the sweet spot between underfitting and overfitting.

This is the goal, but is very difficult to do in practice.

To understand this goal, we can look at the `performance of a machine learning algorithm over time` as it learns from the training data.

We can `plot both the skill on the training data and the skill on a test dataset` we have held back from the training process.

Over time, as the algorithm learns, the *error for the model* on the training data goes down and so does the error on the test dataset. 

* If we train for too long, the error on the training dataset may continue to decrease because the model is overfitting and learning the irrelevant detail and noise in the training dataset.

* At the same time the error for the test set starts to rise again as the model’s ability to generalize decreases.

## Performance measures, bias, and variance

The goal of any supervised ML algorithm is to best estimate the `mapping function (f)` for the output variable (Y) given the input data (X). 

*  to discover a **`mapping function`** that will `map an input variable onto an output variable`
* model aims to train itself on the input variables(X) in such a way that the predicted values(Y) are as close to the actual values as possible

The `mapping function` is often called the *target function* because it is `the function that a given supervised ML algorithm aims to approximate`.

This difference between the actual values and predicted values is the **error** and it is used to evaluate the model. 
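Two common ways of summarizing this error over a set of examples are the mean squared error and the mean absolute error. The values in this sketch are made up.

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 3.0, 8.0])

errors = actual - predicted                       # per-example difference
mean_squared_error = float(np.mean(errors ** 2))
mean_absolute_error = float(np.mean(np.abs(errors)))

print(mean_squared_error, mean_absolute_error)   # 0.5625 0.625
```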

The error for any supervised Machine Learning algorithm comprises three parts:

* `Bias` error
* `Variance` error
* The `noise`

While the *noise* is the irreducible error that we cannot eliminate, the other two i.e. `Bias` and `Variance` are reducible errors that we can attempt to minimize as much as possible.
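For squared-error loss this split can be written as a formula. The sketch assumes that the data follow $Y = f(X) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, that $\hat{f}$ denotes the model learned from a training set, and that the expectation is taken over training sets and noise at a fixed input $X$. These symbols are introduced here for illustration.

$$
\underbrace{\mathbb{E}\big[(Y - \hat{f}(X))^2\big]}_{\text{expected error}}
=
\underbrace{\big(\mathbb{E}[\hat{f}(X)] - f(X)\big)^2}_{\text{Bias}^2}
+
\underbrace{\mathbb{E}\big[(\hat{f}(X) - \mathbb{E}[\hat{f}(X)])^2\big]}_{\text{Variance}}
+
\underbrace{\sigma^2}_{\text{noise}}
$$

The first two terms depend on the learning algorithm and can be reduced; the last term does not depend on the model at all.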

![image.png](attachment:image.png)

### Bias Error

In the simplest terms, **`Bias`** `is the difference between the model's average prediction and the actual (expected) value`.
* in general, an ML model `analyses the data, finds patterns in it and makes predictions`.
    * While training, the model learns these patterns in the dataset (makes certain *assumptions*) and applies them to test data for prediction.
    * While making predictions, a difference occurs between the values predicted by the model and the actual (expected) values.


 **Bias** can be defined as *an inability of ML algorithms to capture the true relationship* between the data points.

Each algorithm begins with some amount of bias.

A model has either:
* `Low Bias`: A low bias model will make fewer assumptions about the form of the target function.
* `High Bias`: A model with a high bias makes more assumptions, and the model becomes unable to capture the important features of our dataset. 

**!!!** A **high bias model** also cannot perform well on new data -> **underfitting**

![image.png](attachment:image.png)

### Variance Error

**Variance** is *the amount that the estimate of the target function will change if different training data were used* - **the amount of variation in the prediction**
* variance tells us how much a `random variable differs from its expected value`

The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to have some variance. 

Ideally, it should not change too much from one training dataset to the next, meaning that the algorithm is good at picking out the hidden underlying mapping between the inputs and the output variables.

* **Low variance** means there is a small variation in the prediction of the target function with changes in the training data set. 
* **High variance** means there is a large variation in the prediction of the target function with changes in the training dataset.

ML algorithms that have a **high variance** are strongly *influenced by the specifics of the training data* -> this leads to **overfitting**.

### Different Combinations of Bias-Variance Using a Bulls-Eye Diagram

![image.png](attachment:image.png)

* **Low-Bias, Low-Variance**:
    * the *ideal machine learning model*. However, it is not achievable in practice.
* **Low-Bias, High-Variance**:
    * model *predictions are inconsistent* but accurate on average. This case occurs when the model has a large number of parameters, and it leads to `overfitting`.
* **High-Bias, Low-Variance**:
    * *predictions are consistent but inaccurate* on average. This case occurs when a model does not learn well from the training dataset or uses too few parameters. It leads to `underfitting`.
* **High-Bias, High-Variance**:
    * predictions are *inconsistent* and also *inaccurate* on average.

Ideally, a model will have both low bias and low variance, but efforts to decrease one will frequently increase the other.

This is known as the **bias-variance trade-off**.
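The trade-off can be observed in a small simulation. The sketch below is only an illustration under assumptions that are not part of the text above: the target function (a sine wave), the noise level, and the two toy models (a constant predictor and a 1-nearest-neighbour predictor) are all chosen arbitrarily. Each model is retrained on many freshly sampled training sets, and the squared bias and variance of its predictions are estimated at fixed test points.

```python
import math
import random
import statistics


def true_f(x):
    """Target function, chosen here purely for illustration."""
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n=30, noise_sd=0.3):
    """Draw n noisy observations (x, y) with y = true_f(x) + noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise_sd)) for x in xs]


def fit_constant(train):
    """Strong assumption: always predict the mean training label."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_nearest_neighbour(train):
    """Weak assumption: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def bias2_and_variance(fit, n_datasets=200, seed=0):
    """Estimate squared bias and variance, averaged over fixed test points."""
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(1, 20)]
    # One trained model per independently sampled training set.
    models = [fit(sample_training_set(rng)) for _ in range(n_datasets)]
    bias2, variance = [], []
    for x in test_xs:
        preds = [m(x) for m in models]
        bias2.append((statistics.fmean(preds) - true_f(x)) ** 2)
        variance.append(statistics.pvariance(preds))
    return statistics.fmean(bias2), statistics.fmean(variance)


for name, fit in [("constant", fit_constant), ("1-NN", fit_nearest_neighbour)]:
    b2, var = bias2_and_variance(fit)
    print(f"{name:>8}: bias^2 = {b2:.3f}, variance = {var:.3f}")
```

In this setup the constant model should show the larger squared bias and the nearest-neighbour model the larger variance, which mirrors the high-bias/low-variance and low-bias/high-variance cases described above.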

**How to identify High variance or High Bias?**

![image.png](attachment:image.png)

**High variance** can be identified if the model has:
* Low training error and high test error.

**High Bias** can be identified if the model has:
* High training error, with a test error close to the training error.
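These two rules can be turned into a rough check. The function below is a hypothetical helper, and both thresholds (`acceptable_error` and `gap_tolerance`) are arbitrary assumptions that would need to be chosen per problem; it is a sketch of the reasoning, not a standard diagnostic.

```python
def diagnose(train_error, test_error, acceptable_error=0.10, gap_tolerance=0.05):
    """Label a fit from its training and test error rates.

    Both thresholds are hypothetical defaults, not recommended values.
    """
    high_train = train_error > acceptable_error
    large_gap = (test_error - train_error) > gap_tolerance
    if high_train and not large_gap:
        return "high bias (underfitting)"
    if not high_train and large_gap:
        return "high variance (overfitting)"
    if high_train and large_gap:
        return "high bias and high variance"
    return "reasonable fit"


print(diagnose(train_error=0.02, test_error=0.25))  # high variance (overfitting)
print(diagnose(train_error=0.30, test_error=0.32))  # high bias (underfitting)
```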

### Accuracy, Precision, Recall or F1?

Most **performance measures** can only be calculated for a `specific type of task`.

ML systems should be evaluated using *performance measures* that represent `the costs associated with making errors in the real world`.

**The confusion matrix**
* is a tool used to evaluate the performance of a model and is visually represented as a table

The basic structure of the confusion matrix:

![image.png](attachment:image.png)

* True Positive (**TP**) - Your model correctly predicted the positive class. 
    * For example, identifying a spam email as spam.
* True Negative (**TN**) - Your model correctly predicted the negative class. 
    * For example, identifying a regular email as not spam.
* False Positive (**FP**) - Your model incorrectly predicted the positive class. 
    * For example, identifying a regular email as spam.
* False Negative (**FN**) - Your model incorrectly predicted the negative class. 
    * For example, identifying a spam email as a regular email.
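The four cells can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch using the spam example; the function name `confusion_counts`, the label strings, and the toy data are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, TN, FP and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # spam identified as spam
            else:
                fp += 1  # regular email identified as spam
        else:
            if actual == positive:
                fn += 1  # spam identified as a regular email
            else:
                tn += 1  # regular email identified as not spam
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Toy labels, invented for this example.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```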

Let’s define **important metrics**.

**Accuracy**: 
* is a measure of the overall correctness of the model. 
* It is the ratio of correctly predicted instances to the total instances.

![image.png](attachment:image.png)

**Recall/Sensitivity:**
* is the ratio of correctly predicted positive observations to all observations in the actual positive class
* focuses on how many actual positives were correctly predicted and is important when the cost of false negatives is high


![image.png](attachment:image.png)

**Precision:**
* is the ratio of correctly predicted positive observations to the total predicted positives
* focuses on the accuracy of positive predictions and is useful when the cost of false positives is high

![image.png](attachment:image.png)
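In terms of the confusion matrix cells, precision is TP / (TP + FP) and recall is TP / (TP + FN). A minimal sketch with made-up counts follows; returning `0.0` when a denominator is zero is a convention assumed here, not something the definitions prescribe.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were predicted as positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 50 positive predictions, 60 actual positives.
tp, fp, fn = 40, 10, 20

print(f"precision = {precision(tp, fp):.3f}")  # 0.800
print(f"recall    = {recall(tp, fn):.3f}")     # 0.667
```

The same model can score quite differently on the two metrics, which is why the choice depends on whether false positives or false negatives are more costly.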

**Specificity:**
* is the total number of true negatives divided by the total number of actual negatives

![image.png](attachment:image.png)

**Sensitivity VS Specificity**

* `Sensitivity` measures the proportion of actual positive cases that were predicted correctly,
* while `Specificity` measures the proportion of actual negative cases that were predicted correctly.

Example:

Let's understand this with a model that predicts whether a person is suffering from a disease.

Sensitivity is the proportion of people suffering from the disease who were correctly predicted as suffering from it.

Specificity is the proportion of people not suffering from the disease who were correctly predicted as not suffering from it.

In other words, specificity is the share of healthy people who were actually predicted as healthy.
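The disease example can be worked through with numbers. All counts below are invented for illustration (a screening of 1000 people, 100 of whom have the disease); the function names are likewise only a sketch.

```python
def sensitivity(tp, fn):
    """Share of people with the disease who were predicted as sick."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of healthy people who were predicted as healthy."""
    return tn / (tn + fp)


# Hypothetical screening results for 1000 people.
tp, fn = 90, 10    # sick people: detected / missed
tn, fp = 855, 45   # healthy people: cleared / wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.95
```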

**F1 Score:**
* is the harmonic mean of precision and recall, so it balances the two

**!!!** The F1 score is useful when there is an uneven class distribution

![image.png](attachment:image.png)
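The effect of an uneven class distribution shows up clearly on a small example. The counts below are made up (10 positives among 1000 samples), and returning `0.0` for undefined ratios is an assumed convention; the point of the sketch is only that accuracy can look excellent while the F1 score exposes poor performance on the rare class.

```python
def accuracy(tp, tn, fp, fn):
    """Ratio of correctly predicted instances to all instances."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced dataset: 10 positives, 990 negatives.
tp, fn = 2, 8      # only 2 of the 10 positives are found
tn, fp = 987, 3    # almost all negatives are classified correctly

print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")  # 0.989
print(f"F1 score = {f1_score(tp, fp, fn):.3f}")      # 0.267
```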