<h1>Supervised Learning methods: Part 1</h1>

<p>
Supervised learning is the task of learning to predict a numerical or categorical output for a given input sample.</br>

This technique works on a special type of data called <i><b>Labeled Data</b></i>. Each training sample comes with a label (its known correct output), so the model can learn to map the input values to their respective outputs.</br>

This chapter covers the following supervised learning algorithms:

<ol>
<li>Linear Regression</li>
<li>Logistic Regression</li>
<li>Decision Trees</li>
</ol>

Depending on the type of the target (output), supervised learning works with two types of data:

- Data for classification
- Data for regression


<p>
Classification targets are categorical: either nominal or ordinal data.

Regression targets are numerical: either ratio or interval data.


In short, classification works on categorical data and regression works on numerical data.
</p>



---

<h2>Linear Regression</h2>

<p>
Linear regression is a supervised learning algorithm that models the relationship between a dependent variable and one or more independent variables. It does this by constructing a linear function of the inputs that outputs the predicted value.
</p>

<b>Finding the Regression Line</b>

<p>
The model is a function that takes the value of an independent variable (usually 'x') as input and returns the value of the dependent variable (usually 'y') as output.</br>
It is based on the equation of a line:</br>

<b>y = mx + c</b>

</br>
where:

- m is the slope of the line
- c is the y-intercept, the point where the line meets the y-axis.

</br>
Any line on a 2D plane can be defined by these two parameters.
</br>

The error is defined as the difference between the actual value of the dependent variable and the value predicted by our regression line.</br>
We want to find the slope 'm' and the y-intercept 'c' that minimize the total cost, given by the average of the squared errors (the mean squared error).
</br>
Usually, the data will have more than one column (components of x), referred to as
x1, x2, x3, …, xn. This leads to a line (strictly, a hyperplane) that has slopes m1, m2, m3, …, mn along
the n axes. Thus, the number of parameters you learn will be (n + 1), where n is the number
of columns, or dimensions, of the data. For simplicity, we will continue the explanation for the
case where there is only one independent column, x.</br>
The slope along each axis is given by:

![Image](./Media/linear_reg_formula.png)

And the y-intercept is:

![Y_intercept](./Media/Y_intercept_linear.png)

Given the y-intercept b0 and one or more slopes bk, the final equation of the line can be written as:

![Final_eq](./Media/Final_eq_linear.png)
</p>
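As a minimal sketch, assuming the images above show the standard least-squares formulas for a single feature, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, the slope and intercept can be computed directly with NumPy. The helper names `fit_simple_ols` and `mse_cost` are illustrative, and the data are the marks/salary values used in the Python example that follows. The cost is the average of the squared errors, so any other line should give a higher cost.

```python
import numpy as np

# Same marks (x) and salary (y) values as the notebook's self-made dataset
x = np.array([34, 95, 64, 88, 99, 51], dtype=float)
y = np.array([3400, 2900, 4250, 5000, 5100, 5600], dtype=float)

def fit_simple_ols(x, y):
    """Closed-form least-squares intercept (b0) and slope (b1) for one feature."""
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    return b0, b1

def mse_cost(b0, b1, x, y):
    """Average of the squared errors between actual y and the line's predictions."""
    errors = y - (b0 + b1 * x)
    return np.mean(errors ** 2)

b0, b1 = fit_simple_ols(x, y)
print("intercept b0:", b0, "slope b1:", b1)
print("cost at the fitted line:", mse_cost(b0, b1, x, y))
# Moving away from the fitted parameters increases the cost
print("cost with slope + 1:", mse_cost(b0, b1 + 1, x, y))
```

Because the cost is minimized at these values, the result should agree with `np.polyfit(x, y, 1)` (which returns `[slope, intercept]`) and with scikit-learn's `LinearRegression`.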

<b>Linear Regression Using Python</b>

<p>
Here we will learn how to implement a linear regression model in Python on a small, self-made dataset.
</p>
</p>

In [None]:
import numpy as np
import pandas as pd
data = pd.DataFrame({"marks":[34,95,64,88,99,51], "salary":[3400, 2900, 4250, 5000, 5100, 5600]})

In [None]:
X = data[['marks']].values
y = data['salary'].values

In [None]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X,y)

In [None]:
reg.predict([[70]])

In [None]:
reg.predict([[100],[50],[80]])

Here we inspect the learned slope and intercept, and then visualize what the model has learned.

In [None]:
print (reg.coef_)
print (reg.intercept_)

In [None]:
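import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(X, y, label='data')
# Draw the fitted line through (0, intercept) with the learned slope
# (ax.axline requires matplotlib 3.3 or later; coef_ is an array, so take its only element)
ax.axline((0, reg.intercept_), slope=reg.coef_[0], label='regression line')
ax.legend()
plt.show()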

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
fig.set_size_inches(15, 7)
ax.scatter(X, y, label='data')
# Draw the fitted line through (0, intercept) with the learned slope
ax.axline((0, reg.intercept_), slope=reg.coef_[0], label='regression line')
ax.legend()
ax.set_xlim(0, 110)
ax.set_ylim(1000, 10000)
# Annotate each point with its (marks, salary) coordinates
for (marks,), salary in zip(X, y):
    ax.text(marks, salary + 5, f"({marks},{salary})")
plt.show()

In [None]:
results_table = pd.DataFrame(data=X, columns=['Marks'])
results_table['Predicted Salary'] = reg.predict(X)
results_table['Actual Salary'] = y
results_table['Error'] = results_table['Actual Salary']-results_table['Predicted Salary']
results_table['Error Squared'] = results_table['Error']*results_table['Error']

<h2>Evaluating Linear Regression</h2>

<p>
Evaluation measures how well the trained model performs.

</br>
Common metrics are:

- Mean Absolute Error
- Mean Squared Error
- Root Mean Squared Error
- R² Score
</p>

In [None]:
import math
import numpy as np
# Compute the metrics by hand from the results table
# (short names avoid clashing with the scikit-learn functions imported below)
mae = np.abs(results_table['Error']).mean()
mse = results_table['Error Squared'].mean()
rmse = math.sqrt(mse)
print(mae)
print(mse)
print(rmse)

In [None]:
# The same metrics computed with scikit-learn; they should match the values above
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
print(mean_squared_error(results_table['Actual Salary'], results_table['Predicted Salary']))
print(math.sqrt(mean_squared_error(results_table['Actual Salary'], results_table['Predicted Salary'])))
print(mean_absolute_error(results_table['Actual Salary'], results_table['Predicted Salary']))

In [None]:
print ("R Squared: %.2f" % r2_score(y, reg.predict(X)))
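The R² score can also be computed by hand as 1 − SS_res / SS_tot, where SS_res is the sum of squared errors of the fitted line and SS_tot is the sum of squared errors of a baseline that always predicts the mean of y. This sketch is self-contained: it refits the same marks/salary data with `np.polyfit` (standing in for the scikit-learn model) so that it can run on its own.

```python
import numpy as np

x = np.array([34, 95, 64, 88, 99, 51], dtype=float)
y = np.array([3400, 2900, 4250, 5000, 5100, 5600], dtype=float)

# Fit the least-squares line (np.polyfit returns [slope, intercept] for degree 1)
slope, intercept = np.polyfit(x, y, 1)
y_pred = intercept + slope * x

ss_res = np.sum((y - y_pred) ** 2)     # squared errors left after the fit
ss_tot = np.sum((y - y.mean()) ** 2)   # squared errors of always predicting the mean
r2 = 1 - ss_res / ss_tot
print("R Squared: %.2f" % r2)
```

An R² close to 1 means the line explains most of the variation in y; a value near 0 means it does little better than predicting the mean. For simple linear regression with an intercept, R² also equals the squared correlation between x and y.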

<h3>This completes the Linear Regression section</h3>

---

<h2>Logistic Regression</h2>

<p>
Logistic regression is a classification model that models the <strong>probability of a data item belonging to one of the categories</strong>.</br>
</p>

<b>Line vs. Curve for Expressing Probability</b>

<p>
Assume that we have data in a single dimension and two classes. We could try to capture this relationship with a linear regression line that gives the probability of a point belonging to a certain class.</br>
The target values are either 0 or 1, where: 

- 0 represents the negative class
- 1 represents the positive class

Because data like this is hard to capture with a straight line (a line's output is unbounded, so it can fall below 0 or rise above 1), we use the <b>sigmoid or logistic curve</b>, which captures the pattern in which most of the predicted values lie close to either

- y = 0
- y = 1

with some values in between. Because the output always lies between 0 and 1, it can be treated as the probability that the point belongs to the positive class.</br>

The sigmoid or logistic function is given as:


![Sigmoid_function](./Media/sigmoid_1.png) </br>
or </br></br>
![Sigmoid_function](./Media/sigmoid_2.png)

where:

- x is the input to the function in the first figure
- θ0, θ1, … represent the parameters (there is more than one slope parameter in the case of data with more
than one column).

The S-shaped curve is well suited to this use case. The aim of the learning process is to find the θ values that give the minimum possible error when predicting the probability. However, the training data contains only the actual classes (0 or 1), not true probabilities, so the cost (or error) of the model is computed by comparing the predicted probabilities with the actual classes.
</p>
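A short NumPy sketch of the sigmoid, assuming the second figure shows the common form σ(θ0 + θ1·x) for one input column. It demonstrates the two properties the text relies on: every output lies between 0 and 1, and the curve crosses 0.5 where its input is 0. The parameter values θ0 = −5 and θ1 = 2 are hypothetical, chosen only so that the 0.5 point falls at x = 2.5.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, theta0, theta1):
    """Probability of the positive class for a 1D input x, assuming the form sigmoid(theta0 + theta1 * x)."""
    return sigmoid(theta0 + theta1 * x)

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))   # close to 0 on the left, exactly 0.5 at z = 0, close to 1 on the right

# Hypothetical parameters: theta0 = -5, theta1 = 2 put the 0.5 point at x = 2.5
print(predict_proba(np.array([0.0, 2.5, 5.0]), -5.0, 2.0))
```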


<b>Learning the Parameters</b>

<p>
We use a simple iterative method to learn the parameters. Any change in the values of the parameters shifts the linear decision boundary. We begin with random initial values for the parameters and, by observing the error, update them step by step to slightly reduce the error cost. This method is called <b>Gradient Descent</b>.
</p>
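The iterative procedure can be sketched for one input column. This is an illustration under stated assumptions, not how scikit-learn trains `LogisticRegression` (its default solver is L-BFGS): it uses plain batch gradient descent on the average log loss (cross-entropy), which compares the predicted probabilities with the actual 0/1 classes. The learning rate, the number of iterations, the function name `fit_logistic_gd`, and the toy data are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(x, y, lr=0.4, n_iters=10000, seed=0):
    """Learn theta0 (intercept) and theta1 (slope) for 1D data by batch
    gradient descent on the average log loss (cross-entropy)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=2)          # small random starting values
    losses = []
    for _ in range(n_iters):
        p = sigmoid(theta[0] + theta[1] * x)        # predicted probabilities
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)       # keep log() finite when recording the loss
        losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        # Gradient of the average log loss with respect to theta0 and theta1
        grad = np.array([np.mean(p - y), np.mean((p - y) * x)])
        theta -= lr * grad                          # small step that lowers the cost
    return theta, losses

# Made-up 1D data: small x -> class 0, large x -> class 1
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
theta, losses = fit_logistic_gd(x, y)
print("first loss:", losses[0], "last loss:", losses[-1])

# The 1D decision boundary is where theta0 + theta1 * x = 0, i.e. p = 0.5
boundary = -theta[0] / theta[1]
print("decision boundary at x =", boundary)
pred = (sigmoid(theta[0] + theta[1] * x) >= 0.5).astype(int)
print("predictions:", pred)
```

Each step moves the parameters against the gradient, so the cost shrinks and the decision boundary (the point x = −θ0/θ1 where the probability is 0.5) shifts until it separates the two classes.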

<b>Implementation using Python</b>

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
# Load iris into a DataFrame and replace the numeric targets with class names
iris = load_iris()
iris_data = pd.DataFrame(iris["data"], columns=iris['feature_names'])
iris_data['target'] = iris['target']
iris_data['target'] = iris_data['target'].apply(lambda x: iris['target_names'][x])
# Keep only two classes and plot them on the two petal features
df = iris_data.query('target == "setosa" | target == "versicolor"')
sns.FacetGrid(df, hue='target', height=5).map(plt.scatter, "petal length (cm)", "petal width (cm)").add_legend()

In [None]:
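from sklearn.linear_model import LogisticRegression
# max_iter is raised from the default (100) so the solver can converge on the unscaled features
lr = LogisticRegression(max_iter=1000)
# This first model uses all four features and all three iris classes
x = iris_data.drop(columns=['target'])
y = iris_data['target']
lr.fit(x, y)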

In [None]:
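# Wrap the sample in a DataFrame with the training column names to avoid a feature-name warning
x_test = pd.DataFrame([[5.6, 2.4, 3.8, 1.2]], columns=x.columns)
lr.predict(x_test)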

<b>Visualizing the Decision Boundary</b>
<p>
We will retrain the model using only two features, sepal length and sepal width, so that the decision boundary can be plotted on a 2D chart.
</p>

In [None]:
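# Keep two classes and only the two sepal features
df = iris_data.query("target == 'setosa' | target == 'versicolor'")[['sepal length (cm)', 'sepal width (cm)', 'target']]
x = df.drop(columns=['target']).values
y = df['target'].values
# Encode the labels as 1 for setosa and 0 for versicolor
y = [1 if label == 'setosa' else 0 for label in y]
lr.fit(x, y)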

In [None]:
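# Define the plotting area with a margin of 1 around the data
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
# Build a fine grid of points covering the plotting area
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
# Predict the class of every grid point and reshape back to the grid
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.rcParams['figure.figsize'] = (10, 10)
plt.figure()
# Shade the region assigned to each class, then overlay the data points
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(x[:, 0], x[:, 1], c=y, cmap='Blues')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.show()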
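Because the model passes a linear function of the two features through the sigmoid, the shaded regions meet along a straight line: the probability is 0.5 exactly where θ0 + θ1·x1 + θ2·x2 = 0, that is, x2 = −(θ0 + θ1·x1) / θ2. The sketch below checks this with hypothetical parameter values; for the fitted scikit-learn model, θ0 would be `lr.intercept_[0]` and θ1, θ2 the entries of `lr.coef_[0]`. The helper name `boundary_x2` is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boundary_x2(x1, theta0, theta1, theta2):
    """Second-feature value where theta0 + theta1*x1 + theta2*x2 = 0, i.e. p = 0.5."""
    return -(theta0 + theta1 * x1) / theta2

# Hypothetical parameters, NOT the values learned by the notebook's model
theta0, theta1, theta2 = 1.0, -2.0, 3.0
x1 = np.linspace(4.0, 7.0, 5)
x2 = boundary_x2(x1, theta0, theta1, theta2)
p = sigmoid(theta0 + theta1 * x1 + theta2 * x2)
print(np.round(p, 6))   # every point on the line has probability 0.5
```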

This completes the Logistic Regression section.

---

<h2><b>Decision Tree</b></h2>

<p>
A decision tree is a machine learning model that makes a prediction through a sequence of decisions, whose rules can be written down as a flowchart.
</br>
Its diagram resembles the tree data structure: the internal nodes contain the rules (conditions) used to split the data, and the leaf nodes contain the predictions.
</p>


<h3><b>Building a Decision Tree</b></h3>

<p>
Learning a decision tree is a recursive process: each recursion adds a new condition that splits the training data further, so that a sample reaches its prediction by passing through a path of several conditions. In this chapter it is used for classification.
</p>


<h3><b>Picking the Splitting Attribute</b></h3>

- Decision tree learning is a **recursive process**

- At each recursion, the algorithm tries to find the **best possible split**

- A split is made if there's enough data with enough variation of target classes

<b>Leaf Nodes</b>

- A **leaf node** is created when:
   - Too few training samples remain, OR
   - All samples belong to the same target class
- The leaf node is assigned the **majority class label**
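The recursive procedure and the leaf-node rules can be put together in a compact pure-Python sketch. It is a simplified illustration, not scikit-learn's implementation: it assumes numeric features, tries binary threshold splits (`value <= t`), and scores each split by information gain (the entropy criterion covered under Splitting Criterion). The `min_samples` parameter, the function names, and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) with the highest information gain,
    or None if no split reduces entropy."""
    n = len(labels)
    parent = entropy(labels)
    best, best_gain = None, 0.0
    for f in range(len(rows[0])):
        # Skip the largest value: it would leave the right side empty
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            gain = parent - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
            if gain > best_gain:
                best, best_gain = (f, t), gain
    return best

def build_tree(rows, labels, min_samples=2):
    """Recursively grow the tree; a leaf is just the majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf node: too little data, or every sample has the same class
    if len(labels) < min_samples or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:             # no split gives any information gain
        return majority
    f, t = split
    left_idx = [i for i, row in enumerate(rows) if row[f] <= t]
    right_idx = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx], min_samples),
        "right": build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx], min_samples),
    }

def predict(node, row):
    """Walk from the root, following the condition at each node, until a leaf."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Made-up data: [petal length, petal width] and a class label
rows = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [4.7, 1.4], [4.5, 1.5], [4.9, 1.5]]
labels = ["setosa"] * 3 + ["versicolor"] * 3
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [1.2, 0.3]), predict(tree, [4.6, 1.3]))
```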

<b>Splitting Criterion</b>

- **Picking the splitting attribute is the core of the algorithm**

<b>Entropy</b>

- **Entropy** measures the amount of randomness or uncertainty in data
   - Formula: `H(X) = -Σ(p(x) × log₂(p(x)))`
   - Entropy = 0 when sample is completely homogeneous (same class)
   - Entropy is at its maximum when the sample is equally divided among all classes (1 for two classes, log₂(k) for k classes)

- Goal is to **minimize entropy** (lower randomness) and create **purer nodes**
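The entropy formula translates directly into a few lines of Python. This sketch (the function name `entropy` is illustrative) also shows that the maximum is 1 only for two equally sized classes; with three equally sized classes it is log₂(3).

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum p(x) * log2(p(x)) over the classes that appear in `labels`."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)

print(entropy(["setosa"] * 4))                                     # 0.0, pure node
print(entropy(["setosa", "setosa", "versicolor", "versicolor"]))   # 1.0, 50/50 split
print(entropy(["setosa", "versicolor", "virginica"]))              # about 1.585, log2(3)
```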

<b>Information Gain</b>

- **Information gain** measures how much splitting on a feature reduces entropy: `IG = H(parent) - Σ(|child| / |parent| × H(child))`
   - Select the attribute with **highest information gain**
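Information gain compares the entropy before a split with the size-weighted entropy of the groups after it. The sketch below assumes a categorical feature that is split into one child per distinct value; the function name `information_gain`, the feature values, and the labels are made up.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the children created
    by splitting on each distinct value of a categorical feature."""
    n = len(labels)
    children = {}
    for value, label in zip(feature_values, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(child) / n * entropy(child) for child in children.values())
    return entropy(labels) - weighted

# Made-up example: feature A separates the classes perfectly, feature B not at all
labels    = ["yes", "yes", "no", "no"]
feature_a = ["sunny", "sunny", "rainy", "rainy"]
feature_b = ["hot", "cold", "hot", "cold"]
print(information_gain(labels, feature_a))   # 1.0
print(information_gain(labels, feature_b))   # 0.0
```

Here feature A would be chosen as the splitting attribute, because it has the highest information gain.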

<b>Gini Index</b>

- **Gini index** is another impurity measure, often used when attributes are continuous
   - Measures impurity: `G = 1 - Σ(p(x)²)`
   - Lower Gini = higher homogeneity (a pure node has Gini = 0)
   - Used by **CART** (Classification and Regression Tree) to create binary splits
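A sketch of the Gini impurity and of a CART-style binary split on one continuous attribute. The threshold search (`value <= t` versus `value > t`), the helper names, and the petal-length-like values are assumptions for illustration, not scikit-learn's internals.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(values, labels):
    """Try each threshold t on a continuous feature, splitting into value <= t
    and value > t, and keep the one with the lowest size-weighted Gini."""
    n = len(labels)
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:   # the largest value would leave the right side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

print(gini(["a", "a", "a", "a"]))   # 0.0, pure node
print(gini(["a", "a", "b", "b"]))   # 0.5, maximally mixed for two classes
# Made-up continuous feature with two classes
values = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
labels = ["setosa"] * 3 + ["versicolor"] * 3
print(best_binary_split(values, labels))   # (1.5, 0.0)
```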

<b>Implementation</b>

- In Scikit-learn, use the **criterion hyperparameter** to select splitting criteria


<h3><b>Decision Tree Implementation</b></h3>

In [None]:
import pandas as pd
from sklearn.datasets import load_iris
# Load the dataset (this cell no longer depends on the earlier logistic regression cells)
iris = load_iris()
iris_data = pd.DataFrame(iris['data'], columns=iris['feature_names'])
iris_data['target'] = iris['target']
iris_data['target'] = iris_data['target'].apply(lambda x: iris['target_names'][x])
print(iris_data.shape)

In [None]:
x = iris_data[['sepal length (cm)','sepal width (cm)','petal length (cm)','petal width (cm)']]
y = iris_data['target']
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, random_state=42, test_size=0.2)

In [None]:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(criterion='gini', max_depth=10)
dt.fit(x_train, y_train)

In [None]:
y_pred = dt.predict(x_test)
print(y_pred)

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix
print(accuracy_score(y_test,y_pred))
confusion_matrix(y_test,y_pred)

<h2>This completes the Decision Tree section and the chapter!</h2>

--- 