# Decision Boundary


## Objective
In this lab, you will:
- learn what a decision boundary is and how to visualize it for a logistic regression classifier


## Library

In [None]:
import numpy as np
%matplotlib widget
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

## Theory

In machine learning, particularly in classification, a decision boundary is a hypersurface that partitions the feature space into regions, one for each class. In the two-class case, the classifier assigns every point on one side of the boundary to one class and every point on the other side to the other class. With more than two classes, the boundaries are the borders between neighboring class regions.
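For a binary linear classifier, the boundary can be written down explicitly. The sketch below is an illustration rather than part of the lab itself: it assumes a two-class subset of iris (setosa vs. versicolor, first two features), and the names `X_bin`, `y_bin`, and `clf` are hypothetical. A fitted logistic regression predicts a probability of 0.5 exactly where `w1*x1 + w2*x2 + b = 0`, so points on that line lie on the decision boundary.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Assumption for illustration: keep only classes 0 and 1 and the first two features.
iris = datasets.load_iris()
mask = iris.target < 2
X_bin = iris.data[mask, :2]
y_bin = iris.target[mask]

clf = LogisticRegression(max_iter=1000).fit(X_bin, y_bin)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# Solve w1*x1 + w2*x2 + b = 0 for x2 to get points on the boundary line.
x1 = np.linspace(X_bin[:, 0].min(), X_bin[:, 0].max(), 5)
x2 = -(w1 * x1 + b) / w2
boundary_points = np.column_stack([x1, x2])

# On the boundary, the predicted probability of class 1 should be 0.5.
proba = clf.predict_proba(boundary_points)[:, 1]
print(proba)
```

Moving a point off this line in either direction pushes the probability above or below 0.5, which is what flips the predicted class.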

Factors Affecting the Decision Boundary:
- Nature of the Classifier: A linear classifier such as logistic regression produces a linear boundary between any two classes: a straight line in two dimensions and a hyperplane in general. Non-linear classifiers such as decision trees and SVMs with non-linear kernels can produce more complex boundaries.

- Data Distribution: If data from different classes overlap significantly, finding an accurate decision boundary becomes challenging. Outliers can also influence the shape and position of the decision boundary.

- Feature Engineering: The way features are extracted and used can change the decision boundary. For instance, adding polynomial features lets a linear classifier produce a boundary that is non-linear in the original features, even though it remains linear in the expanded feature space.
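The feature engineering point can be illustrated with a small sketch. It is not part of the lab: it assumes a synthetic two-ring dataset from `make_circles`, and the names `X_c`, `y_c`, `linear_clf`, and `poly_clf` are hypothetical. A plain logistic regression cannot separate an inner ring from an outer ring with a straight line, while the same model on degree-2 polynomial features can.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumption for illustration: two concentric rings, one per class.
X_c, y_c = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# Linear boundary in the original two features.
linear_clf = LogisticRegression(max_iter=1000).fit(X_c, y_c)

# Same classifier on degree-2 features (1, x1, x2, x1^2, x1*x2, x2^2).
poly_clf = make_pipeline(
    PolynomialFeatures(degree=2),
    LogisticRegression(max_iter=1000),
).fit(X_c, y_c)

linear_acc = linear_clf.score(X_c, y_c)
poly_acc = poly_clf.score(X_c, y_c)
print(f"linear features:   training accuracy = {linear_acc:.2f}")
print(f"degree-2 features: training accuracy = {poly_acc:.2f}")
```

The scores here are training accuracy only, which is enough to show the change in boundary shape but says nothing about generalization.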

In [None]:
# Load iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features for visualization
y = iris.target

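# Create a logistic regression classifier and fit it to the data.
# A large C means weak regularization; with the lbfgs solver, a multiclass
# target is fit as a single multinomial (softmax) model.
logreg = LogisticRegression(C=1e5, solver='lbfgs', max_iter=1000)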
logreg.fit(X, y)

# Predict the class at every point of a dense grid covering the data.
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Decision Boundary with Logistic Regression')
plt.show()


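In this example, each shaded region is the decision region that the logistic regression classifier assigns to one class of the iris dataset. The decision boundaries are the borders where the shading changes color. The scatter points are the training samples, colored by their true class.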
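One way to see where these borders come from is to look at the per-class scores. The sketch below is an illustration under stated assumptions: it refits a three-class model on the same two iris features with default regularization, and the names `X2` and `clf3` are hypothetical. Each class gets a linear score, and the predicted class is the one with the highest score, so a border between two regions is where their two scores tie.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X2 = iris.data[:, :2]  # sepal length and sepal width

clf3 = LogisticRegression(max_iter=1000).fit(X2, iris.target)

# One linear score per class for every sample: shape (n_samples, n_classes).
scores = clf3.decision_function(X2)

# Picking the class with the largest score reproduces predict().
manual_pred = clf3.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual_pred, clf3.predict(X2)))
```

Because every score is linear in the features, each border between two regions is a straight line segment in this two-feature plot.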