## Types of Machine Learning Problems

- **Classification**: data is labeled with a class, like "spam/non-spam". The model learns how to assign labels to new, unlabelled data. This is a discrimination problem: modelling the differences or similarities between groups.
- **Regression**: data is labeled with a real value (think floating point) rather than a class, like the price of a stock over time. The model learns what value to predict for new, unseen data (a future stock price).
- **Clustering**: data is NOT labeled, but can be divided into groups based on similarity and other measures of natural structure in the data. An example is organising pictures by face without knowing whose faces they are (no labels).
- **Rule extraction**: finding patterns in data. It's when a computer looks at lots of information and figures out simple "if this, then that" rules. For example, it might notice that in a store, when people buy diapers, they often also buy beer. These rules aren't about guessing the future; they're just noticing patterns that happen a lot.

## Learning styles in machine learning algorithms

- **Supervised learning**

Input data is called training data and has a known label or result such as spam/not-spam, or a stock price at a time. The model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data. Example problems are classification and regression.

- **Unsupervised learning**

Input data is not labeled and does not have a known result. A model is prepared by deducing structures present in the input data, for example by extracting general rules. This might be done through a mathematical process that systematically reduces redundancy, or by organizing data by similarity. Example problems include clustering, dimensionality reduction and association rule learning.

- **Semi-Supervised learning**

A mixture of labeled and unlabelled examples, where there is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions. Example problems are classification and regression.
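
One common semi-supervised strategy is self-training (pseudo-labelling): fit a model on the few labeled examples, label the unlabelled point it is most confident about, add that point to the training set, and repeat. The sketch below is a minimal illustration under stated assumptions: the data is one-dimensional, the base model is a simple nearest-centroid classifier, and the function names are hypothetical. In practice, wrong pseudo-labels can reinforce themselves, so real systems usually add a confidence threshold.

```python
def nearest_centroid_fit(xs, labels):
    """Map each class label to the mean of its (1-D) training values."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def predict_with_margin(centroids, x):
    """Return (label, margin). The margin is the gap between the distance to the
    runner-up centroid and the nearest one: larger means more confident."""
    ranked = sorted((abs(x - c), label) for label, c in centroids.items())
    (d1, label), (d2, _) = ranked[0], ranked[1]
    return label, d2 - d1


def self_train(labeled_x, labeled_y, unlabeled_x):
    """Hypothetical self-training loop: one pseudo-label is added per round."""
    xs, ys = list(labeled_x), list(labeled_y)
    if len(set(ys)) < 2:
        raise ValueError("need labeled examples from at least two classes")
    pool = list(unlabeled_x)
    while pool:
        centroids = nearest_centroid_fit(xs, ys)
        # Pick the unlabelled point the current model is most confident about.
        best = max(pool, key=lambda x: predict_with_margin(centroids, x)[1])
        label, _ = predict_with_margin(centroids, best)
        pool.remove(best)
        # The pseudo-label is treated as if it were a real label from now on.
        xs.append(best)
        ys.append(label)
    return nearest_centroid_fit(xs, ys)


if __name__ == "__main__":
    # Two labeled points and six unlabelled ones (made-up values).
    model = self_train([1.0, 9.0], ["low", "high"], [2.0, 3.0, 7.0, 8.0, 1.5, 8.5])
    print(model)
```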

## Algorithms grouped by similarity

- **Regression algorithms**

Predict a continuous outcome variable (dependent variable) based on the value of one or more predictor variables (independent variables). Their objective is to establish a mathematical relationship between the predictors and the outcome, which can be used to estimate the value of the outcome for new data. These algorithms are fundamental in data analysis and are used extensively across different fields for trend forecasting, exploring cause-and-effect relationships, and making predictions.

    - Ordinary Least Squares Regression (OLSR)
    - Linear Regression
    - Logistic Regression
    - Stepwise Regression
    - Multivariate Adaptive Regression Splines (MARS)
    - Locally Estimated Scatterplot Smoothing (LOESS)
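
Ordinary least squares, the first method in the list, has a closed-form solution when there is a single predictor: the slope is the covariance of x and y divided by the variance of x, and the fitted line passes through the point of means. The sketch below assumes one numeric predictor and no missing values; `fit_ols` and `predict` are hypothetical names, and real projects would more likely use a library such as scikit-learn or statsmodels.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The OLS line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    return intercept + slope * x


if __name__ == "__main__":
    # Made-up, noise-free data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    b0, b1 = fit_ols(xs, ys)
    print(b0, b1, predict(b0, b1, 5.0))  # 1.0 2.0 11.0
```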

- **Instance-based Algorithms**

Make predictions based on how closely new instances resemble specific examples in the training data. Rather than learning a general rule or equation from the training data, these models essentially "remember" the training instances and use them for comparison when making decisions. When new data comes in, the model looks for the most similar example(s) in its memory to make a prediction. This approach relies heavily on how similarity is defined and measured between instances. These models are particularly useful when the relationship between data points is complex and not easily captured by a traditional model.

    - k-Nearest Neighbor (kNN)
    - Learning Vector Quantization (LVQ)
    - Self-Organizing Map (SOM)
    - Locally Weighted Learning (LWL)
    - Support Vector Machines (SVM)
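
k-Nearest Neighbor is the clearest example of "remembering" the training data: nothing is learned up front, and each prediction is a majority vote among the k stored examples closest to the query. This minimal sketch assumes numeric feature tuples and Euclidean distance as the similarity measure; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance is the assumed similarity measure here.
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Counter orders equal counts by first occurrence, so a tied vote
    # goes to the label of the closer neighbour.
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(train, labels, (1.8, 1.8)))  # a
```

The choice of distance function is the main design decision: scaling features differently, or swapping Euclidean for another metric, can change which neighbours are "closest".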

- **Regularization Algorithms**

Improve the performance of a model by preventing it from becoming too complex and overfitting the training data. Overfitting happens when a model learns the training data too well, including the noise and outliers, which makes it perform poorly on new, unseen data. Regularization adds a penalty on large coefficients to the model's objective, effectively simplifying the model. This helps to ensure that the model not only fits the training data well but also generalizes better to new data. It's like guiding the model to focus on the most important patterns and not get distracted by minor details. Regularization is commonly applied to various types of models, including regression, to keep them simple and more robust.

Regularization is not a standalone model; rather, it's a technique applied during the model's learning process. Think of it as a guiding principle or a modification to existing algorithms like linear regression. It's incorporated into the learning algorithm to influence how it learns from the data, steering it towards simpler explanations for the data rather than complex ones.

    - Ridge Regression
    - Least Absolute Shrinkage and Selection Operator (LASSO)
    - Elastic Net
    - Least-Angle Regression (LARS)
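
Ridge regression adds a squared (L2) penalty $\lambda \beta_1^2$ to the least-squares objective. With a single predictor and an unpenalised intercept, the penalty simply adds $\lambda$ to the denominator of the OLS slope, so larger $\lambda$ shrinks the slope toward zero. The sketch below assumes that one-predictor setting; `fit_ridge_1d` and `lam` are hypothetical names. LASSO replaces the squared penalty with an absolute-value penalty, which can drive coefficients exactly to zero.

```python
def fit_ridge_1d(xs, ys, lam):
    """Ridge regression with one predictor: minimise
    sum((y - b0 - b1 * x) ** 2) + lam * b1 ** 2.
    The intercept b0 is not penalised (the usual convention)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # lam = 0 gives the ordinary least-squares slope; larger lam shrinks it.
    slope = sxy / (sxx + lam)
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    for lam in (0, 10, 100):
        _, slope = fit_ridge_1d(xs, ys, lam)
        print(lam, round(slope, 3))  # 0 2.0, then 10 1.0, then 100 0.182
```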

- **Decision Tree Algorithms**


Decision tree methods are like playing a game of "20 Questions" with your data. They involve breaking down a dataset into smaller subsets while at the same time developing an associated decision tree. The tree is made up of "nodes" that represent questions or decisions about the data, and "branches" that represent the possible answers or outcomes.

Starting at the top, each node in the tree looks at an attribute of the data and splits into branches based on the value of that attribute. This process continues, creating a tree-like structure of decisions, until the algorithm arrives at a leaf node with a prediction for the target variable (like classifying an email as spam or not spam, or predicting the price of a house).

Decision trees can handle both tasks that classify data into categories and those that predict a number (like the price of something). They are popular because they're easy to understand, quick to run, and versatile, working well for many problems in machine learning.

    - Classification and Regression Tree (CART)
    - Iterative Dichotomiser 3 (ID3)
    - C4.5 and C5.0 (successors to ID3)
    - Chi-squared Automatic Interaction Detection (CHAID)
    - Decision Stump
    - M5
    - Conditional Decision Trees
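
The "question" at each node can be made concrete with a Decision Stump, a tree with a single split. The sketch below tries every threshold on one numeric feature and keeps the one that minimises the weighted Gini impurity of the two children, the criterion used by CART; ID3 and C4.5 use entropy-based criteria instead. A full tree would repeat this search recursively on each child. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, labels):
    """Return (weighted impurity, threshold) for the best split on one feature."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[0]:
            best = (impurity, threshold)
    return best


def fit_stump(xs, labels):
    """Return (threshold, left_label, right_label) for a one-level tree."""
    _, threshold = best_split(xs, labels)
    if threshold is None:
        raise ValueError("need at least two distinct feature values")
    left = [lab for x, lab in zip(xs, labels) if x <= threshold]
    right = [lab for x, lab in zip(xs, labels) if x > threshold]
    # Each leaf predicts the majority class of the training points reaching it.
    return (threshold,
            Counter(left).most_common(1)[0][0],
            Counter(right).most_common(1)[0][0])


if __name__ == "__main__":
    hours_studied = [1, 2, 3, 4, 6, 7, 8, 9]
    outcome = ["fail"] * 4 + ["pass"] * 4
    print(fit_stump(hours_studied, outcome))  # (5.0, 'fail', 'pass')
```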

- **Bayesian Algorithms**

Use Bayes’ Theorem to update predictions with new data. They start with prior knowledge and adjust it as more information is collected. This approach is practical for classifying data (like deciding if an email is spam) and for predicting values (like forecasting sales). It’s especially useful when you have some initial insight and expect to refine your predictions over time as you gather more data.

    - Naive Bayes
    - Gaussian Naive Bayes
    - Multinomial Naive Bayes
    - Averaged One-Dependence Estimators (AODE)
    - Bayesian Belief Network (BBN)
    - Bayesian Network (BN)
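
The "start with prior knowledge and adjust it" idea is just Bayes' Theorem applied repeatedly: today's posterior becomes tomorrow's prior. The sketch below updates the probability that an email is spam after seeing two words. All probabilities are made up for illustration, and chaining the two updates assumes the words are independent given the class, which is exactly the Naive Bayes assumption. `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' theorem for a binary hypothesis H given evidence E:
    P(H|E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    return numerator / evidence


if __name__ == "__main__":
    p_spam = 0.2  # hypothetical prior: 20% of mail is spam
    # Hypothetical: "free" appears in 60% of spam and 5% of non-spam.
    p_spam = posterior(p_spam, 0.6, 0.05)
    print(round(p_spam, 3))  # 0.75
    # Hypothetical: "winner" appears in 40% of spam and 2% of non-spam.
    p_spam = posterior(p_spam, 0.4, 0.02)
    print(round(p_spam, 3))  # 0.984
```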

- **Clustering Algorithms**


Clustering in machine learning is the process of grouping data points so that those within a cluster have more in common with each other than with those in other clusters. It's about finding hidden patterns in data without having pre-labeled categories.

There are different strategies for clustering. Centroid-based clustering, for instance, identifies the central point of each cluster and then groups data points based on which center they are closest to. Hierarchical clustering doesn't work with a single central point; instead, it builds a hierarchy of clusters either by starting with individual data points and merging them into larger clusters (agglomerative approach) or by starting with the entire data set and dividing it into smaller clusters (divisive approach).

    - k-Medians
    - Expectation Maximisation (EM)
    - Hierarchical Clustering
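
Centroid-based clustering is easiest to see in k-means (Lloyd's algorithm), which is not in the list above but is the most common member of the family; k-Medians follows the same loop but replaces the mean in the update step with the median. The sketch alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. It assumes numeric tuples, Euclidean distance, and random initialisation from the data; `kmeans` and `assign` are hypothetical names.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iterations=100, seed=0):
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise with k randomly chosen data points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Update step: the mean of the members along each dimension.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
    print(kmeans(pts, 2))
```

Like most centroid-based methods, the result can depend on the initial centroids, which is why libraries typically run several initialisations and keep the best.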

- **Association Rule Learning Algorithms**

Association rule learning is a technique in machine learning that finds interesting relationships or 'associations' between different variables in large datasets. These rules are often used to uncover specific patterns and connections that are not immediately obvious.

The classic example of association rule learning is market basket analysis, where retailers use it to discover combinations of products that frequently co-occur in transactions. For instance, if customers often buy bread and butter together, an association rule would highlight this relationship.

    - Apriori algorithm
    - Eclat algorithm
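
Apriori and Eclat differ in how they search for frequent itemsets, but both rely on support (how often an itemset appears), and rules such as bread → butter are usually ranked by confidence (how often the consequent appears when the antecedent does). The sketch below computes both measures for the bread-and-butter example on a handful of made-up baskets; `support` and `confidence` are hypothetical helper names, not a library API.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


if __name__ == "__main__":
    baskets = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "jam"},
        {"milk", "eggs"},
        {"bread", "butter", "eggs"},
    ]
    print(support(baskets, {"bread", "butter"}))  # 0.6
    print(round(confidence(baskets, {"bread"}, {"butter"}), 2))  # 0.75
```

Apriori's key observation is that an itemset cannot be frequent if any of its subsets is infrequent, so candidates below a minimum support can be pruned early instead of being counted.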

- **Artificial Neural Network Algorithms**

Artificial Neural Networks (ANNs) are computational models inspired by the human brain. They consist of interconnected units or 'neurons' which process data by simulating the way biological neural networks work.

ANNs are part of the broader field of machine learning and are primarily used for recognizing patterns, which makes them well-suited for tasks like classification (labeling things) and regression (predicting numerical values). While there are hundreds of algorithms and variations within this subfield, classical neural networks typically refer to simpler structures without the deep architecture that characterizes deep learning.

    - Perceptron
    - Multilayer Perceptrons (MLP)
    - Back-Propagation
    - Stochastic Gradient Descent
    - Hopfield Network
    - Radial Basis Function Network (RBFN)
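
The Perceptron, the simplest network in the list, is a single neuron with a step activation. It learns in the supervised "predict, then get corrected" style: its weights change only when it misclassifies a training example. The sketch below trains it on the logical AND function; names, epoch count and learning rate are illustrative assumptions. A single perceptron can only learn linearly separable functions (it cannot learn XOR), which is one motivation for Multilayer Perceptrons trained with back-propagation.

```python
def predict_perceptron(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Perceptron learning rule for labels in {0, 1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            # update is 0 when the prediction is right, +/- learning_rate otherwise.
            update = learning_rate * (target - predict_perceptron(weights, bias, x))
            if update != 0:
                errors += 1
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
        if errors == 0:
            break  # every training example is classified correctly
    return weights, bias


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [0, 0, 0, 1]  # logical AND
    w, b = train_perceptron(X, y)
    print([predict_perceptron(w, b, x) for x in X])  # [0, 0, 0, 1]
```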

- **Deep Learning Algorithms**

Deep Learning is an advanced branch of machine learning that involves larger and more complex neural networks, often referred to as "deep" because of their many layers. These networks, loosely inspired by the brain, find patterns in and interpret data such as images, text, and sound.

What sets deep learning apart is its ability to process and learn from enormous amounts of data. This capability is powered by today's cheaper and more powerful computational resources. By using large datasets, deep learning models can perform high-level, intricate tasks, such as recognizing speech, identifying images, or making decisions for self-driving cars.

Deep learning represents a significant evolution in the field of artificial neural networks, offering vast improvements in accuracy and effectiveness for tasks that require human-like senses, such as vision and hearing.

    - Convolutional Neural Network (CNN)
    - Recurrent Neural Networks (RNNs)
    - Long Short-Term Memory Networks (LSTMs)
    - Stacked Auto-Encoders
    - Deep Boltzmann Machine (DBM)
    - Deep Belief Networks (DBN)

- **Dimensionality Reduction Algorithms**


Dimensionality reduction is a process in machine learning where the goal is to reduce the number of input variables under consideration, making the data easier to explore and visualize. It also helps improve the performance of machine learning models by reducing the computational burden and the risk of overfitting.

Unlike clustering, which groups similar data points, dimensionality reduction focuses on eliminating redundancy and capturing the essence of the data in fewer dimensions. Many of these methods are unsupervised: they do not use labels and rely solely on the intrinsic structure of the data. Some in the list below, such as Linear Discriminant Analysis and Partial Least Squares Regression, do use labels or a target variable to choose the reduced dimensions.

    - Principal Component Analysis (PCA)
    - Principal Component Regression (PCR)
    - Partial Least Squares Regression (PLSR)
    - Sammon Mapping
    - Multidimensional Scaling (MDS)
    - Projection Pursuit
    - Linear Discriminant Analysis (LDA)
    - Mixture Discriminant Analysis (MDA)
    - Quadratic Discriminant Analysis (QDA)
    - Flexible Discriminant Analysis (FDA)
    - t-distributed Stochastic Neighbor Embedding (t-SNE)
    - Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP)
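
Principal Component Analysis finds the direction along which the data varies most (the top eigenvector of the covariance matrix) and describes each point by its coordinate along that direction, turning two or more numbers into one. The sketch below finds the first component by power iteration in plain Python; library implementations usually use a singular value decomposition instead. It assumes numeric tuples, that the starting vector of all ones is not orthogonal to the top eigenvector, and that the top eigenvalue is clearly larger than the second. Function names are hypothetical.

```python
import math


def first_principal_component(points, iterations=200):
    """Return (mean, unit vector of greatest variance) via power iteration."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: multiply by cov and renormalise; the vector converges
    # to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # the data has no variance along v
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, direction):
    """Reduce each point to one number: its coordinate along `direction`."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(mean))) for p in points]


if __name__ == "__main__":
    data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.0)]  # made-up, roughly y = x
    mean, component = first_principal_component(data)
    print(component)
    print(project(data, mean, component))
```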

- **Ensemble Algorithms**

Ensemble methods are a powerful approach in machine learning that combine several 'weaker' models to form a stronger predictive model. The individual models may be trained independently (as in bagging) or sequentially, each one focusing on the errors of the previous ones (as in boosting), and they contribute collectively to the final prediction. The combination can be done in various ways, such as voting for classification or averaging for regression.

The key to ensemble methods is diversity: each weak learner should offer a different perspective on the data. This diversity helps to cover different aspects of the problem, reducing the likelihood that all models make the same mistake.

The choice of which weak learners to use and how to combine them is critical, as it can significantly impact the performance of the ensemble. By effectively merging these models, ensemble methods often achieve higher accuracy and generalize better to new data than individual models, making them a popular choice in both academic and applied settings.

    - Boosting
    - Bootstrapped Aggregation (Bagging)
    - AdaBoost
    - Weighted Average (Blending)
    - Stacked Generalization (Stacking)
    - Gradient Boosting Machines (GBM)
    - Gradient Boosted Regression Trees (GBRT)
    - Random Forest
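
Bootstrapped Aggregation (Bagging) gets its diversity from the data: each weak learner is trained on a bootstrap sample (n rows drawn with replacement), and the ensemble predicts by majority vote. The sketch below bags one-feature decision stumps; the stump, the number of models and the seed are illustrative assumptions, and the function names are hypothetical. Random Forest builds on the same idea with full decision trees, additionally choosing a random subset of features at each split.

```python
import random
from collections import Counter


def fit_error_stump(xs, labels):
    """One-feature stump: pick the midpoint threshold with the fewest training
    errors, predicting the majority label on each side."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in candidates:
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t] or left
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum((left_label if x <= t else right_label) != lab
                     for x, lab in zip(xs, labels))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]  # (threshold, left_label, right_label)


def fit_bagging(xs, labels, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_error_stump([xs[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over all stumps in the ensemble."""
    votes = [left if x <= t else right for t, left, right in models]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 6, 7, 8, 9]
    ys = ["fail"] * 4 + ["pass"] * 4
    models = fit_bagging(xs, ys)
    print(predict_bagging(models, 2), predict_bagging(models, 8))
```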

- **Other Machine Learning Algorithms**

Many algorithms were not covered, such as algorithms for specialty tasks:

    - Feature selection algorithms
    - Algorithm accuracy evaluation
    - Performance measures
    - Optimization algorithms

Nor are algorithms from specialty subfields of machine learning covered, such as:

    - Computational intelligence (evolutionary algorithms, etc.)
    - Computer Vision (CV)
    - Natural Language Processing (NLP)
    - Recommender Systems
    - Reinforcement Learning
    - Graphical Models
    - And more...
