# Types of Supervised Learning Algorithms

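Supervised learning algorithms learn from labeled data, meaning examples whose correct output is already known, and use what they learn to make predictions on unseen data. They address two main kinds of problems: classification, where the target is a discrete label, and regression, where the target is a continuous value.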

In this notebook, we will cover the most commonly used supervised learning algorithms, including:

1. Linear Regression
2. Logistic Regression
3. Decision Trees
4. Random Forests
5. Support Vector Machines (SVM)
6. k-Nearest Neighbors (k-NN)
7. Naive Bayes

---

## Table of Contents

1. [Linear Regression](#1-linear-regression)
2. [Logistic Regression](#2-logistic-regression)
3. [Decision Trees](#3-decision-trees)
4. [Random Forests](#4-random-forests)
5. [Support Vector Machines (SVM)](#5-support-vector-machines-svm)
6. [k-Nearest Neighbors (k-NN)](#6-k-nearest-neighbors-k-nn)
7. [Naive Bayes](#7-naive-bayes)

---

## 1. Linear Regression
Linear Regression predicts a continuous target variable. It assumes the target is approximately a linear combination of the input features plus an intercept, and it fits the coefficients by minimizing the sum of squared errors (ordinary least squares). It is commonly used to predict prices, quantities, and other numeric outcomes.

### Usage Areas:
- House price prediction
- Stock price prediction
- Sales forecasting


```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression

# Tiny synthetic dataset: 10 samples, 1 feature, Gaussian noise with std 10
X, y = make_regression(n_samples=10, n_features=1, noise=10, random_state=42)

# Hold out 30% of the samples (3 of 10) for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

y_pred = linear_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)
```

Output:

```
MSE: 84.38555896440674
```
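`LinearRegression` solves an ordinary least-squares problem. As a minimal sketch, assuming the same toy data as the cell above, the same intercept and slope can be recovered with NumPy's least-squares solver by prepending a column of ones so the intercept is learned like any other weight. The names `X_design`, `intercept_ols`, and `slope_ols` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Same toy data settings as the cell above (an assumption; any small dataset works)
X, y = make_regression(n_samples=10, n_features=1, noise=10, random_state=42)

# Prepend a column of ones so the intercept becomes an ordinary weight
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: minimize ||X_design @ w - y||^2
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept_ols, slope_ols = w[0], w[1]

# scikit-learn fits the same least-squares problem
sk_model = LinearRegression().fit(X, y)

print("OLS:    ", intercept_ols, slope_ols)
print("sklearn:", sk_model.intercept_, sk_model.coef_[0])
```

Both lines should print the same coefficients up to floating-point rounding.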


## 2. Logistic Regression
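Logistic Regression is a classification algorithm used when the target variable is categorical. Despite its name, it is not a regression method in the usual sense: it passes a linear combination of the features through the logistic (sigmoid) function to model the probability that an input belongs to a given class. It is most commonly used for binary classification, and it extends to multiclass problems.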

### Usage Areas:
- Medical diagnoses (e.g., predicting whether a tumor is benign or malignant)
- Fraud detection (e.g., predicting fraudulent transactions)
- Marketing (e.g., customer conversion prediction)

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic binary classification data (defaults: 100 samples, 20 features)
X, y = make_classification(n_classes=2, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

y_pred = log_reg.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 1.0
```
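To make the sigmoid step concrete, here is a small sketch, assuming a binary problem with labels 0 and 1, that recomputes the positive-class probability by hand from the fitted `coef_` and `intercept_` and compares it with `predict_proba`. The helper `sigmoid` is a hypothetical name introduced for illustration; the model is fit on all of `X` only to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_classes=2, random_state=42)
log_reg = LogisticRegression().fit(X, y)


def sigmoid(z):
    # Logistic function: squashes any real-valued score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


# Linear score w·x + b for every sample
scores = X @ log_reg.coef_.ravel() + log_reg.intercept_[0]

# Probability of the positive class, P(y = 1 | x)
p_manual = sigmoid(scores)
p_sklearn = log_reg.predict_proba(X)[:, 1]

# Class 1 is predicted when the score is positive, i.e. when p > 0.5
y_manual = (scores > 0).astype(int)

print("max |p_manual - p_sklearn|:", np.max(np.abs(p_manual - p_sklearn)))
print("predictions agree:", np.array_equal(y_manual, log_reg.predict(X)))
```

The 0.5 probability threshold corresponds exactly to a score of zero, which is why the decision boundary of logistic regression is a hyperplane.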


## 3. Decision Trees
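Decision Trees recursively split the data on feature thresholds, choosing at each node the split that most reduces an impurity measure (such as Gini impurity or entropy for classification). The result is a tree of if/else rules. They can be used for both classification and regression, and small trees are easy to interpret.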

### Usage Areas:
- Customer segmentation
- Credit risk assessment
- Recommendation systems

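```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# No random_state is set, so tie-breaking between equally good splits
# (and therefore the accuracy) may vary slightly between runs
tree_clf = DecisionTreeClassifier()
tree_clf.fit(X_train, y_train)

y_pred = tree_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.9333333333333333
```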
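The following from-scratch sketch shows how a single split can be chosen, assuming Gini impurity (scikit-learn's default `criterion="gini"`). The helpers `gini` and `best_split` are hypothetical names: they brute-force every midpoint threshold on every feature for the root node, and the result is compared with the weighted child impurity of a depth-1 `DecisionTreeClassifier`. If two features tie, the two searches may pick different features with the same impurity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(x, y):
    # Candidate thresholds: midpoints between consecutive distinct values
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2
    best_t, best_imp = None, np.inf
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        # Child impurities weighted by how many samples each child receives
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


X, y = make_classification(n_classes=2, random_state=42)

# Brute-force search over every feature for the root split
results = [best_split(X[:, j], y) for j in range(X.shape[1])]
best_feature = int(np.argmin([imp for _, imp in results]))
best_threshold, best_imp = results[best_feature]

# A depth-1 tree ("stump") searches the same space of splits
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
n = stump.tree_.n_node_samples   # node 0 = root, 1 = left child, 2 = right child
node_imp = stump.tree_.impurity
sk_weighted = (n[1] * node_imp[1] + n[2] * node_imp[2]) / n[0]

print("root Gini:", gini(y))
print("manual:  feature", best_feature, "threshold", best_threshold, "impurity", best_imp)
print("sklearn: feature", stump.tree_.feature[0], "threshold", stump.tree_.threshold[0],
      "impurity", sk_weighted)
```

A full tree simply repeats this search inside each child until a stopping rule (such as `max_depth` or pure leaves) is reached.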


## 4. Random Forests
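Random Forest is an ensemble learning algorithm that combines many decision trees. Each tree is trained on a bootstrap sample of the data and considers only a random subset of features at each split, and the forest aggregates the trees' predictions (scikit-learn averages their predicted class probabilities). This typically reduces overfitting compared with a single decision tree and often improves accuracy on both classification and regression tasks.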

### Usage Areas:
- Stock market prediction
- Medical diagnoses
- Fraud detection

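```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Reuses X_train, X_test, y_train, y_test from the Decision Tree cell above.
# No random_state is set, so the accuracy can vary from run to run.
rf_clf = RandomForestClassifier(n_estimators=100)
rf_clf.fit(X_train, y_train)

y_pred = rf_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.9
```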
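The sketch below illustrates the two ingredients described above. The first part is a simplified, hand-rolled bagging loop (25 trees, bootstrap rows, `max_features="sqrt"`, hard majority vote); the tree count and seed are arbitrary assumptions, and scikit-learn's own implementation differs in details. The second part checks that a fitted `RandomForestClassifier`'s `predict_proba` equals the mean of its individual trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# --- Hand-rolled bagging (illustrative; 25 trees and seed 0 are arbitrary) ---
rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrap: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split consider a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Hard majority vote over 0/1 labels (an odd number of trees avoids ties)
votes = np.stack([t.predict(X_test) for t in trees])
y_vote = (votes.mean(axis=0) > 0.5).astype(int)
print("bagged-trees accuracy:", np.mean(y_vote == y_test))

# --- scikit-learn: forest probability = mean of the trees' probabilities ---
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
mean_tree_proba = np.mean([est.predict_proba(X_test) for est in rf.estimators_], axis=0)
print("matches rf.predict_proba:", np.allclose(mean_tree_proba, rf.predict_proba(X_test)))
```

Averaging probabilities (soft voting) rather than counting hard votes lets confident trees carry more weight, which is the aggregation scikit-learn uses.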


## 5. Support Vector Machines (SVM)
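SVM is a powerful classification algorithm that finds the hyperplane separating the classes with the largest possible margin, that is, the largest distance to the nearest training points (the support vectors). With kernel functions it can also learn non-linear boundaries, and a regression variant (SVR) exists as well.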

### Usage Areas:
- Image recognition
- Text classification
- Bioinformatics (e.g., protein classification)

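```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Reuses the train/test split from the Decision Tree cell above.
# random_state only affects SVC when probability=True.
svm_clf = SVC(kernel="linear", random_state=42)
svm_clf.fit(X_train, y_train)

y_pred = svm_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.9666666666666667
```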
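For a linear kernel the learned hyperplane is available directly. This sketch, assuming labels 0/1 and a model fit on all of `X` for brevity, rebuilds `decision_function` from `coef_` and `intercept_`, computes the margin width 2/‖w‖, and checks the soft-margin condition that every support vector satisfies y_i (w·x_i + b) ≤ 1 up to solver tolerance (equal to 1 on the margin, smaller inside it or when misclassified). The 1e-2 tolerance in the check is an assumption about solver precision.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_classes=2, random_state=42)
svm_clf = SVC(kernel="linear").fit(X, y)

# With a linear kernel the separating hyperplane is w·x + b = 0
w = svm_clf.coef_.ravel()      # normal vector (coef_ exists only for kernel="linear")
b = svm_clf.intercept_[0]

scores = X @ w + b             # proportional to the signed distance to the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # gap between the planes w·x + b = +1 and -1

# Soft-margin check on the support vectors, with labels mapped {0, 1} -> {-1, +1}
y_signed = 2 * y - 1
sv = svm_clf.support_
functional_margins = y_signed[sv] * scores[sv]

print("margin width:", margin_width)
print("number of support vectors:", len(sv))
print("max y_i * f(x_i) over support vectors:", functional_margins.max())
```

Only the support vectors determine `w`; moving any other training point without crossing the margin leaves the hyperplane unchanged.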


## 6. k-Nearest Neighbors (k-NN)
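k-NN is a simple classification algorithm that assigns a label to a data point based on the majority class among its k nearest training points, typically measured by Euclidean distance. It is non-parametric and has no real training phase: fitting only stores the training data (optionally in a search structure such as a KD-tree), and the work happens at prediction time. It can also be used for regression by averaging the neighbors' target values.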

### Usage Areas:
- Pattern recognition
- Recommender systems
- Medical diagnostics

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Reuses the train/test split from the Decision Tree cell above
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X_train, y_train)

y_pred = knn_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.8666666666666667
```
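Because k-NN is so simple, a brute-force version fits in a few lines. This sketch, with the hypothetical helper `knn_predict`, assumes non-negative integer labels, Euclidean distance, and uniform voting, which match `KNeighborsClassifier`'s defaults, and checks that both produce the same predictions on the split used above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k nearest training points for each query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote among the neighbors' integer labels
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


X, y = make_classification(n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

y_manual = knn_predict(X_train, y_train, X_test, k=3)
y_sklearn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

print("manual accuracy: ", np.mean(y_manual == y_test))
print("same predictions:", np.array_equal(y_manual, y_sklearn))
```

Computing every distance costs O(n_query × n_train × n_features), which is why libraries offer tree-based neighbor search for larger datasets.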


## 7. Naive Bayes
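Naive Bayes is a probabilistic classifier based on Bayes' theorem. It assumes that the features are conditionally independent of each other given the class. Although this assumption rarely holds exactly, Naive Bayes performs well in many real-world applications, especially high-dimensional ones such as text classification, and it is very fast to train. `GaussianNB`, used below, additionally assumes that each feature is normally distributed within each class.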

### Usage Areas:
- Spam detection
- Sentiment analysis
- Document classification

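```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Reuses the train/test split from the Decision Tree cell above
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)

y_pred = nb_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.9333333333333333
```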
