# Supervised Learning

![image.png](https://www.mathworks.com/help/stats/machinelearning_supervisedunsupervised.png)

A supervised learning algorithm takes a known set of input data and the known responses (labels) to that data, and trains a model to generate reasonable predictions for the response to new data.

Supervised learning uses classification and regression techniques to develop predictive models.

### What is unsupervised learning?

In unsupervised learning, the algorithm is given unlabeled data as a training set. Unlike supervised learning, there are no correct output values; the algorithm determines the patterns and similarities within the data, as opposed to relating it to some external measurement.

Note that the main difference between the two lies in the training stage, when the model learns from the data: hence "learning".
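To make the contrast concrete, here is a minimal sketch of an unsupervised method, assuming a plain k-means on one-dimensional data. k-means is only one illustrative choice and is not prescribed by the text above; the function name `kmeans_1d`, the sample values, and the starting centers are all hypothetical. Notice that no labels are passed in at any point.

```python
def kmeans_1d(values, centers, n_iter=10):
    """Group 1-D values around the given starting centers (plain k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled measurements: only inputs, no target values.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(values, centers=[0.0, 6.0])
print(centers, clusters)
```

A supervised method would additionally receive one known response per value during training; here the grouping comes from the data alone.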

![image](https://images.datacamp.com/image/upload/v1661171231/Supervised_vs_Unsupervised_aa9a08ac32.png)

## Regression vs Classification

### Regression models predict a continuous response. 

Typical applications include algorithmic trading, electricity load forecasting (predicting fluctuations in power demand), and sales forecasting.
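As a rough illustration of predicting a continuous response, the sketch below fits a straight line by ordinary least squares in plain Python. The data (advertising spend versus sales) and the helper name `fit_line` are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data, generated from sales = 2 * spend + 1.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)
predicted = slope * 5.0 + intercept  # prediction for an unseen input
print(slope, intercept, predicted)
```

The output of the model is a number on a continuous scale, which is what distinguishes regression from classification.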

### Classification models classify input data into categories.

Typical applications include medical imaging, image and speech recognition, credit scoring and spam detection.
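A classifier, by contrast, returns a category. The sketch below assumes a 1-nearest-neighbour rule, chosen only because it is short; the two numeric features and the "spam"/"ham" labels are hypothetical.

```python
import math


def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (1-nearest-neighbour)."""
    distances = [math.dist(row, x) for row in train_X]
    return train_y[distances.index(min(distances))]


# Hypothetical features per message: (number of links, number of exclamation marks).
train_X = [(0, 1), (1, 0), (8, 9), (9, 7)]
train_y = ["ham", "ham", "spam", "spam"]

label_a = predict_1nn(train_X, train_y, (7, 8))
label_b = predict_1nn(train_X, train_y, (1, 1))
print(label_a, label_b)
```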


NOTE: For reference, unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from datasets that consist of input data without labeled responses. A typical application is exploratory data analysis, which looks for groupings in the data.

![image](https://images.datacamp.com/image/upload/v1661171231/Regression_77f74a45c6.png)

## Supervised Learning Steps

#### 1: Data Collection

Gather labeled data consisting of input features (independent variables) and their corresponding target labels or output variables (dependent variables). This labeled data is essential for training supervised learning models.

#### 2: Data Exploration and Preparation

Perform exploratory data analysis (EDA) to gain insights into the dataset by calculating summary statistics, visualizing the data, and identifying missing values and outliers. Then, prepare the data by handling missing values, removing outliers, encoding categorical variables, and scaling/normalizing numerical features.
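Some of these preparation steps can be sketched with the standard library alone. The example below assumes mean imputation, standardization to zero mean and unit variance, and one-hot encoding; these are common choices rather than the only ones, and the helper names and sample columns are hypothetical.

```python
import statistics


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def one_hot(column):
    """Encode a categorical column as 0/1 indicators, one slot per category."""
    categories = sorted(set(column))
    return [[1 if v == c else 0 for c in categories] for v in column]


ages = [20.0, None, 40.0, 30.0]          # hypothetical numeric feature
ages_filled = impute_mean(ages)           # the missing entry becomes 30.0
ages_scaled = standardize(ages_filled)
colors = one_hot(["red", "blue", "red"])  # slots are ordered (blue, red)
print(ages_filled, ages_scaled, colors)
```

In a full pipeline, the imputation mean and the scaling statistics would normally be computed from the training portion only and then reused on the test portion, so that no information from the test data leaks into training.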

#### 3: Split Data into Training and Test Sets

Divide the prepared data into training and test sets, where the training set is used to train the model, and the test set is used to evaluate its performance.
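A minimal sketch of such a holdout split is shown below, assuming a random shuffle with a fixed seed and a 25% test fraction. The function name `holdout_split` and the toy rows are hypothetical.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled rows of the form (input, label).
rows = [(x, 2 * x + 1) for x in range(20)]
train, test = holdout_split(rows, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

Every row lands in exactly one of the two sets. In practice a library routine such as scikit-learn's `train_test_split` does the same job with more options.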

#### 4: Choose a Supervised Learning Algorithm

Select an appropriate supervised learning algorithm based on the problem type (classification or regression), data characteristics, interpretability requirements, training time, and other practical considerations.

#### 5: Train the Model

Use the training data to train the chosen supervised learning algorithm, which involves optimizing the model's parameters or weights to minimize the loss function.
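The idea of optimizing parameters to minimize a loss can be sketched as follows, assuming a one-feature linear model, mean squared error as the loss, and batch gradient descent as the optimizer. The learning rate, epoch count, and data are illustrative assumptions.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, epochs=2000):
    """Fit w and b by batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

loss_before = mse(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
loss_after = mse(w, b, xs, ys)
print(w, b, loss_before, loss_after)
```

The parameters start at zero and move toward the values that generated the data, while the loss shrinks toward zero.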

#### 6: Evaluate Model Performance

Assess the trained model's performance on the test set using appropriate evaluation metrics, such as accuracy, precision, recall, and F1-score for classification problems, or mean squared error for regression problems.
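These metrics can be computed directly from true and predicted values. The sketch below assumes binary labels with `1` as the positive class; the label vectors are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
regression_error = mean_squared_error([3.0, 5.0], [2.0, 7.0])
print(metrics, regression_error)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall both come out to 0.75.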

#### 7: Model Tuning and Selection

If the model's performance is not satisfactory, tune its hyperparameters or try different algorithms, and select the best-performing model based on evaluation metrics and practical considerations.
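One simple way to tune a hyperparameter is to try a small grid of values and keep the one that scores best on held-out data. The sketch below assumes a k-nearest-neighbours classifier with `k` as the hyperparameter and a separate validation set, which is an assumption beyond the train/test split described above; all data points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# Hypothetical training data; the last point is a mislabeled (noisy) example.
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b"),
         ((1, 1), "b")]
validation = [((0.8, 0.8), "a"), ((5.5, 5.5), "b"), ((0.2, 0.1), "a")]

# Grid search: score every candidate value of k, then keep the best.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With `k=1` the noisy point misleads one prediction, while larger values of `k` outvote it. Tuning on a validation set rather than the test set keeps the final test score an honest estimate.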



## Table of Contents

1. Data Collection
   * 1.1\. Data Sources
   * 1.2\. Data Collection Considerations
2. Data Exploration and Preparation
   * 2.1\. Data Exploration
   * 2.2\. Data Preparation/Cleaning
3. Split Data into Training and Test Sets
   * 3.1\. Holdout Method
   * 3.2\. Cross Validation
   * 3.3\. Data Leakage
   * 3.4\. Best Practices
4. Choose a Supervised Learning Algorithm
   * 4.1\. Consider algorithm categories
   * 4.2\. Evaluate algorithm characteristics
   * 4.3\. Try multiple algorithms
5. Train the Model
   * 5.1\. Objective Function (Loss/Cost Function)
   * 5.2\. Optimization Algorithms
   * 5.3\. Overfitting and Underfitting
6. Evaluate Model Performance
   * 6.1\. Performance Metrics for Regression Models
   * 6.2\. Performance Metrics for Classification Models
7. Model Tuning and Selection
   * 7.1\. Hyperparameter Tuning
   * 7.2\. Ensemble Methods