## Machine Learning Evaluation Metrics

Machine learning evaluation metrics are used to assess the performance and quality of machine learning models. These metrics help us understand how well our models are performing and provide insights into their strengths and weaknesses. Let's explore some commonly used evaluation metrics along with their formulas and examples:

1. **Accuracy**: Accuracy measures the proportion of correct predictions compared to the total number of predictions. It is suitable for balanced datasets where the classes are equally represented. However, it can be misleading when dealing with imbalanced datasets.
- Formula: $$\frac{TP + TN} {TP + TN + FP + FN}$$

  where TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives.

- Example:

| Actual   | Predicted |
|----------|-----------|
| Yes      | Yes       |
| No       | No        |
| Yes      | No        |
| Yes      | Yes       |
| Yes      | Yes       |

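Counting "Yes" as the positive class, the table gives TP = 3, TN = 1, FP = 0 and FN = 1:

$$Accuracy = \frac{3 + 1}{3 + 1 + 0 + 1} = 0.8$$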
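The accuracy calculation above can be reproduced in a few lines of plain Python. This is a minimal sketch; the list names `actual` and `predicted` are illustrative and simply hold the five rows of the table.

```python
# Rows of the accuracy example table, in order
actual = ["Yes", "No", "Yes", "Yes", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]

# A prediction is correct when it matches the actual label
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

print(accuracy)  # 0.8
```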


2. **Precision and Recall**: Precision and recall are metrics used in binary classification problems, particularly when the classes are imbalanced. Precision measures the proportion of correctly predicted positive instances out of all predicted positives. Recall, also known as sensitivity or true positive rate, measures the proportion of correctly predicted positive instances out of all actual positives.

**Precision Formula**: $$\frac{TP}{TP + FP}$$

**Recall Formula**: $$\frac{TP}{TP + FN}$$

**Example**:

| Actual | Predicted |
|--------|-----------|
| Yes    | Yes       |
| No     | Yes       |
| Yes    | No        |
| Yes    | Yes       |
| No     | No        |

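Counting "Yes" as the positive class, the table gives TP = 2, FP = 1, FN = 1 and TN = 1:

$$Precision = \frac{2}{2 + 1} \approx 0.67$$

$$Recall = \frac{2}{2 + 1} \approx 0.67$$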
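To check the counts by hand, here is a small plain-Python sketch that tallies true positives, false positives and false negatives for the five rows of the example table. The variable names are illustrative, and "Yes" is assumed to be the positive class.

```python
# Rows of the precision/recall example table; "Yes" is the positive class
actual = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "Yes", "No", "Yes", "No"]

pairs = list(zip(actual, predicted))
tp = sum(a == "Yes" and p == "Yes" for a, p in pairs)  # true positives
fp = sum(a == "No" and p == "Yes" for a, p in pairs)   # false positives
fn = sum(a == "Yes" and p == "No" for a, p in pairs)   # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(round(precision, 2), round(recall, 2))  # 0.67 0.67
```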


3. **F1 Score**: The F1 score is the harmonic mean of precision and recall. It provides a balance between precision and recall and is useful when we want to consider both metrics together.

**Formula**: $$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

**Example**: Using the precision and recall values from the previous example:

$$F1 = \frac{2 \times 0.67 \times 0.67}{0.67 + 0.67} = 0.67$$
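The harmonic mean is what makes F1 sensitive to an imbalance between the two metrics. The sketch below uses a hypothetical helper, `compute_f1`, to show this: equal precision and recall give back the same value, while a model with precision 0.9 and recall 0.1 scores far below the arithmetic mean of 0.5.

```python
def compute_f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(compute_f1(2 / 3, 2 / 3), 2))  # 0.67
print(round(compute_f1(0.9, 0.1), 2))      # 0.18
```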

4. **Mean Squared Error (MSE)**: MSE is commonly used for regression problems. It measures the average squared difference between the predicted and true values. A lower MSE indicates better performance.

**Formula**: $$\frac{1}{n} \sum_{i=1}^{n} (y_{true,i} - y_{pred,i})^2$$

**Example**:

| y_true | y_pred |
|--------|--------|
| 2      | 1.5    |
| 3      | 2.0    |
| 5      | 4.5    |
| 4      | 3.5    |
| 6      | 5.5    |

$$MSE = \frac{1}{5} \times ((2 - 1.5)^2 + (3 - 2.0)^2 + (5 - 4.5)^2 + (4 - 3.5)^2 + (6 - 5.5)^2) = \frac{2.0}{5} = 0.4$$

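The same calculation in plain Python, as a minimal sketch over the five rows of the table. Printing the individual squared errors makes it easy to see which row contributes most to the average.

```python
y_true = [2, 3, 5, 4, 6]
y_pred = [1.5, 2.0, 4.5, 3.5, 5.5]

# Squared difference for each row, then the average
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)  # [0.25, 1.0, 0.25, 0.25, 0.25]
print(mse)             # 0.4
```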
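5. **R-squared (Coefficient of Determination)**: R-squared measures the proportion of the variance in the dependent variable that the model explains. A value of 1 indicates a perfect fit, and 0 means the model does no better than always predicting the mean of the target. For a linear regression with an intercept, evaluated on its own training data, it lies between 0 and 1. On held-out data it can be negative when the model performs worse than the mean predictor.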

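**Formula**: $$R^2 = 1 - \frac{SSR}{SST}$$

*SSR: Sum of Squared Residuals, SST: Total Sum of Squares*

**Example**: For the five observations in the MSE example, the mean of y_true is 4, so SST = 4 + 1 + 1 + 0 + 4 = 10, and SSR = 0.25 + 1 + 0.25 + 0.25 + 0.25 = 2:

$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{2}{10} = 0.8$$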
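As a cross-check, the sketch below computes SSR, SST and R-squared directly for the five observations in the MSE table. The variable names are illustrative.

```python
y_true = [2, 3, 5, 4, 6]
y_pred = [1.5, 2.0, 4.5, 3.5, 5.5]

mean_y = sum(y_true) / len(y_true)
ssr = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # sum of squared residuals
sst = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
r_squared = 1 - ssr / sst

print(ssr, sst, r_squared)  # 2.0 10.0 0.8
```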

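6. **ROC-AUC**: Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) are used for binary classification problems. The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies, and the AUC summarizes the model's ability to distinguish between the classes in a single number. An AUC of 1 indicates perfect separation, while 0.5 corresponds to random guessing.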

*ROC Curve*: Plots True Positive Rate (TPR) against False Positive Rate (FPR)

*AUC*: Area Under the ROC Curve
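One way to build intuition for AUC is its ranking interpretation: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below uses that equivalence instead of tracing the curve. The helper name `pairwise_auc` and the labels and scores are made up for illustration.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted scores, for illustration only
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
print(round(pairwise_auc(labels, scores), 3))  # 0.667
```

Here 4 of the 6 positive/negative pairs are ordered correctly, which gives 0.667.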

7. **Log Loss (Logistic Loss)**: Log loss is commonly used for probabilistic classification problems. It is the negative average log-likelihood of the true labels under the predicted probabilities, so confident wrong predictions are penalized heavily. A lower log loss indicates that the predicted probabilities are closer to the true labels.

**Formula**: $$-\frac{1}{n} \sum_{i=1}^{n} \left( y_{true,i} \log(y_{pred,i}) + (1 - y_{true,i}) \log(1 - y_{pred,i}) \right)$$
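The formula can be sketched in plain Python as follows. The helper name `binary_log_loss`, the clipping constant `eps` and the example probabilities are assumptions made for illustration; clipping is a common safeguard against taking the logarithm of zero.

```python
import math

def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Average negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Same labels, two sets of made-up predicted probabilities
confident = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
hesitant = binary_log_loss([1, 0, 1], [0.6, 0.4, 0.6])
print(confident < hesitant)  # True
```

Both sets of predictions fall on the correct side of 0.5, but the more confident ones earn the lower loss.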

8. **Confusion Matrix**: A confusion matrix provides a comprehensive summary of the model's performance, especially in multi-class classification problems. It displays the counts of true positives, true negatives, false positives, and false negatives.

**Example**: For the precision and recall example above (positive class "Yes"):

|                | Predicted Yes | Predicted No |
|----------------|---------------|--------------|
| **Actual Yes** | TP = 2        | FN = 1       |
| **Actual No**  | FP = 1        | TN = 1       |
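A confusion matrix is just a tally of (actual, predicted) pairs, so it can be sketched with `collections.Counter`. The example below reuses the five rows from the precision and recall table. The row and column ordering (`labels`) is a choice made here for illustration, and the same approach extends to more than two classes.

```python
from collections import Counter

actual = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "Yes", "No", "Yes", "No"]

# Count how often each (actual, predicted) combination occurs
counts = Counter(zip(actual, predicted))

# Rows are actual classes, columns are predicted classes
labels = ["Yes", "No"]
matrix = [[counts[(a, p)] for p in labels] for a in labels]
print(matrix)  # [[2, 1], [1, 1]]
```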

It's important to choose the appropriate evaluation metric based on the specific problem and requirements. Different metrics provide different insights into the model's performance, and it's often recommended to consider multiple metrics together to gain a comprehensive understanding.

## Train-Test Split in Machine Learning

Train-test split is a common technique used in machine learning to evaluate the performance of a model on unseen data. It involves splitting the available dataset into two subsets: one for training the model (the training set) and the other for evaluating the model's performance (the test set). The train-test split helps assess how well the model generalizes to new, unseen data.

We'll use the `train_test_split` function from scikit-learn's `model_selection` module.


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [3]:
# Load the dataset
df = pd.read_csv("Cleaned House Data.csv")

In [4]:
df.head()

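|   | price | crime_rate | resid_area | air_qual | room_num | age | teachers | poor_prop | airport | n_hos_beds | n_hot_rooms | rainfall | waterbody_Encoded | avg_dist |
|---|-------|------------|------------|----------|----------|-----|----------|-----------|---------|------------|-------------|----------|-------------------|----------|
| 0 | 24.0 | 0.0063 | 32.31 | 0.538 | 6.575 | 65.2 | 24.7 | 4.98 | 1 | 5.48 | 11.192 | 23 | 1.0 | 4.0875 |
| 1 | 21.6 | 0.026944 | 37.07 | 0.469 | 6.421 | 78.9 | 22.2 | 9.14 | 0 | 7.332 | 12.1728 | 42 | 2.0 | 4.9675 |
| 2 | 34.7 | 0.026924 | 37.07 | 0.469 | 7.185 | 61.1 | 22.2 | 4.03 | 0 | 7.394 | 46.19856 | 38 | 0.0 | 4.9675 |
| 3 | 33.4 | 0.031857 | 32.18 | 0.458 | 6.998 | 45.8 | 21.3 | 2.94 | 1 | 9.268 | 11.2672 | 45 | 2.0 | 6.065 |
| 4 | 36.2 | 0.06677 | 32.18 | 0.458 | 7.147 | 54.2 | 21.3 | 5.33 | 0 | 8.824 | 11.2896 | 55 | 2.0 | 6.0625 |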


In [5]:
X = df.drop("price", axis=1)
y = df['price']

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In this example, we are splitting the data into a training set (80% of the data) and a test set (20% of the data). The random_state parameter ensures reproducibility of the split.

In [7]:
X_train.shape

(404, 13)

In [8]:
X_test.shape

(102, 13)
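The shapes above add up to 506 rows: 404 for training and 102 for testing. The sketch below shows the idea behind a shuffled split in plain Python. It is not scikit-learn's implementation: `simple_split` is a hypothetical helper, the shuffled order will differ from `train_test_split`, and rounding the test share up is an assumption chosen because it reproduces the 404/102 sizes.

```python
import math
import random

def simple_split(n_rows, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) after a seeded shuffle."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)    # same seed gives the same order
    n_test = math.ceil(test_size * n_rows)  # assumption: round the test share up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = simple_split(506)
print(len(train_idx), len(test_idx))  # 404 102
```

Because the shuffle is seeded, calling `simple_split(506)` again returns exactly the same indices, which is the role `random_state` plays in the cell above.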

Train-test split is crucial for assessing the model's performance on unseen data. The model is trained on the training set and then evaluated on the test set to measure its ability to generalize. The split does not prevent overfitting by itself, but comparing training and test performance helps detect issues such as underfitting or overfitting.

It's important to note that the train-test split should be representative of the original data to ensure reliable evaluation. The choice of the test set size depends on the dataset size, available data, and specific requirements of the problem at hand.


In [9]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

In [11]:
lr.fit(X_train,y_train)

LinearRegression()

In [12]:
# Predict on the training data (used for the training R²)
y_pred = lr.predict(X_train)

In [13]:
from sklearn.metrics import r2_score
# R² on the training set
r2_score(y_train, y_pred)

0.729439440716198

## Overfitting and Underfitting in Machine Learning

In machine learning, overfitting and underfitting are two common problems that occur when training models. Understanding these concepts is crucial for building robust and accurate machine learning models. Let's take a closer look at each:

### 1. Overfitting:

Overfitting occurs when a model performs exceptionally well on the training data but fails to generalize well on new, unseen data. It happens when the model becomes too complex or overly flexible, effectively "memorizing" the training data instead of learning the underlying patterns. Key characteristics of overfitting include:

- The model shows low training error but high test error.
- The model captures noise and irrelevant details in the training data.
- The model may have excessive complexity, such as having too many features or high polynomial degrees.

Remedies for overfitting:

- Increase the size of the training dataset.
- Reduce the complexity of the model, such as by decreasing the number of features or applying feature selection techniques.
- Regularize the model using techniques like L1 or L2 regularization.
- Use cross-validation for hyperparameter tuning.

### 2. Underfitting:

Underfitting occurs when a model is too simple or inflexible to capture the underlying patterns in the training data. It fails to learn the relationships between the features and the target variable, resulting in poor performance on both the training and test data. Key characteristics of underfitting include:

- The model shows high training error and high test error.
- The model is too simplistic to capture the complexity of the data.
- The model may have insufficient features or inadequate training time.

Remedies for underfitting:

- Increase the complexity of the model, such as by adding more features or increasing the model's capacity.
- Perform feature engineering to extract more meaningful features from the data.
- Increase the training time or adjust the learning rate for iterative algorithms.

Balancing between overfitting and underfitting is a crucial aspect of model training. The goal is to find the right level of complexity that allows the model to generalize well to new, unseen data. Techniques like cross-validation, regularization, and hyperparameter tuning play important roles in mitigating overfitting and underfitting.

It's essential to monitor the model's performance on both the training and test data and make adjustments accordingly. Regular evaluation and fine-tuning of the model help strike the right balance between complexity and generalization, leading to more accurate and reliable predictions.
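To make the two failure modes concrete, here is a small self-contained sketch on synthetic data (noisy samples of y = 2x, generated only for illustration). A constant model that ignores the input stands in for underfitting, and a model that memorizes the training set through a nearest-neighbour lookup stands in for overfitting. All names are hypothetical.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Noisy samples of y = 2x (synthetic data for illustration)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 2)) for x in xs]

train, test = make_data(30), make_data(30)

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

# Underfit: ignores x and always predicts the training mean
mean_y = sum(y for _, y in train) / len(train)
def constant_model(x):
    return mean_y

# Overfit: memorizes the training set (1-nearest-neighbour lookup)
def memorizing_model(x):
    return min(train, key=lambda point: abs(point[0] - x))[1]

for name, model in [("constant", constant_model), ("memorizing", memorizing_model)]:
    print(name, round(mse(model, train), 2), round(mse(model, test), 2))
```

The memorizing model reaches zero error on the training set but not on the test set, which is the signature of overfitting. The constant model should show a large error on both sets, which is the signature of underfitting.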

In [14]:
y_test_pred = lr.predict(X_test)

In [15]:
# Training R² again, for comparison with the test R² in the next cell
r2_score(y_train, y_pred)

0.729439440716198

In [16]:
r2_score(y_test, y_test_pred)

0.6571081693323386

## Feature Selection
Feature selection is an important step in machine learning and data analysis that involves selecting a subset of relevant features from the original set of available features. It aims to improve model performance, reduce computational complexity, and enhance interpretability by focusing on the most informative and influential features. Feature selection helps mitigate the curse of dimensionality and can lead to more accurate and efficient models. 
Subset and shrinkage methods are two common approaches to feature selection. Let's take a closer look at each:

### 1. Subset Selection Methods:

Subset selection methods search for the subset of features that maximizes the model's performance. Two common stepwise variants are:

- Forward Selection: This method starts with an empty set of features and iteratively adds the feature that improves the model's performance the most, until a stopping criterion is met.
- Backward Elimination: This method starts with the full set of features and iteratively removes the feature whose removal hurts the model's performance the least, until a stopping criterion is met.

Subset selection methods evaluate the performance of different feature subsets using metrics like cross-validation error or information criteria. They can be computationally expensive, especially for large feature sets. Because forward selection and backward elimination are greedy, they return a good subset under the chosen evaluation criterion but are not guaranteed to find the best one.
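The forward selection loop described above can be sketched as a generic greedy procedure. In this sketch `forward_selection` and `toy_score` are hypothetical names, and the scoring function uses made-up per-feature gains with a small complexity penalty. In practice the score would come from something like a cross-validated model evaluation.

```python
def forward_selection(candidates, score_fn):
    """Greedy forward selection: add the best feature until the score stops improving."""
    selected = []
    best_score = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score <= best_score:  # stopping criterion: no improvement
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected

# Hypothetical scoring function: made-up per-feature gains minus a
# complexity penalty of 0.5 per selected feature
gains = {"a": 3.0, "b": 2.0, "c": 0.1, "d": 0.05}
def toy_score(subset):
    return sum(gains[f] for f in subset) - 0.5 * len(subset)

print(forward_selection(gains, toy_score))  # ['a', 'b']
```

Features `c` and `d` are left out because their gain does not cover the penalty, which mirrors how a stopping criterion ends the search.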

### 2. Shrinkage Methods:

Shrinkage methods, also known as regularization methods, add a penalty term to the objective function during model training. The penalty shrinks the coefficients of less important features towards zero, and an L1 penalty can additionally produce sparse solutions. Two popular shrinkage methods are:

- Lasso Regression: Lasso regression applies L1 regularization, resulting in sparse solutions where some feature coefficients become exactly zero. It automatically performs feature selection by effectively excluding irrelevant features from the model.
- Ridge Regression: Ridge regression applies L2 regularization, which reduces the magnitude of feature coefficients without excluding any feature entirely. While ridge regression does not perform explicit feature selection, it can still reduce the impact of less important features.

Shrinkage methods provide a trade-off between model complexity and overfitting. They can effectively handle high-dimensional datasets and multicollinearity by reducing the impact of less relevant features.
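The difference between the two penalties is easiest to see in a simplified textbook setting. Assuming orthonormal features, the penalized coefficient has a closed form in terms of the ordinary least-squares coefficient `beta`: ridge rescales it, while lasso applies soft-thresholding. The sketch below relies on that assumption; the function names are hypothetical, and the exact scaling of `lam` depends on how the objective is written.

```python
def ridge_shrink(beta, lam):
    """L2 penalty: scale the coefficient toward zero (never exactly zero for beta != 0)."""
    return beta / (1 + lam)

def lasso_shrink(beta, lam):
    """L1 penalty: soft-thresholding; coefficients with |beta| <= lam become exactly zero."""
    magnitude = max(abs(beta) - lam, 0.0)
    return magnitude if beta >= 0 else -magnitude

# Made-up least-squares coefficients, shrunk with lam = 0.5
for beta in [4.0, -1.2, 0.3]:
    print(beta, ridge_shrink(beta, 0.5), lasso_shrink(beta, 0.5))
```

The smallest coefficient, 0.3, is only reduced by ridge but is set to exactly zero by lasso, which is why lasso acts as a feature selector.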



In [20]:
# Get the feature names from the training DataFrame
feature_names = X_train.columns

# Get the coefficients
coefficients = lr.coef_

# Print feature names and coefficients
for feature, coef in zip(feature_names, coefficients):
    print(f"{feature}:\t\t\t {coef}")

crime_rate:			 -0.4188348647708363
resid_area:			 -0.009670598602540215
air_qual:			 -14.799264505460645
room_num:			 4.4945488635581645
age:			 -0.008730671784292014
teachers:			 0.9318093769750563
poor_prop:			 -0.5684201491712669
airport:			 1.1024770184686619
n_hos_beds:			 0.2943603909476938
n_hot_rooms:			 0.09245739022506491
rainfall:			 0.02620869785341541
waterbody_Encoded:			 -0.03835395999776788
avg_dist:			 -1.2631951142871494


In [25]:
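# Keep features whose coefficient magnitude exceeds 0.5.
# Note: coefficient size depends on each feature's scale, so on unscaled data this threshold is only a rough heuristic.
selected_features = [feature for feature, coef in zip(feature_names, coefficients) if abs(coef) > 0.5]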

In [26]:
selected_features

['air_qual', 'room_num', 'teachers', 'poor_prop', 'airport', 'avg_dist']

In [27]:
X_selected = X_train[selected_features]


In [28]:
reg_model_selected = LinearRegression()

In [29]:
reg_model_selected.fit(X_selected, y_train)

LinearRegression()

In [31]:
y_selected_pred = reg_model_selected.predict(X_selected)

In [33]:
r2_score(y_train, y_pred)

0.729439440716198

In [34]:
r2_score(y_train, y_selected_pred)

0.7246273222575257

In [35]:
y_test_selected_pred = reg_model_selected.predict(X_test[selected_features])

In [37]:
r2_score(y_test, y_test_pred)

0.6571081693323386

In [40]:
r2_score(y_test, y_test_selected_pred)

0.662103466590333

Subset selection and shrinkage methods offer different approaches to feature selection, each with its own advantages and considerations. The choice between them depends on the specific problem, the size of the feature set, and the desired balance between model complexity and interpretability. Experimentation and evaluation of different feature selection methods are important to identify the most relevant subset of features for a given problem.

## Other Regression Models

Linear regression is a widely used regression model, but there are several other regression models that can be applied to different types of data and scenarios. Here are a few examples:

### 1. Ridge Regression:

Ridge regression is a regularization technique that adds an L2 penalty term to the linear regression objective function. It helps reduce overfitting by shrinking the coefficients towards zero. Ridge regression is particularly useful when dealing with multicollinearity in the data.

### 2. Lasso Regression:

Lasso regression is another regularization technique that adds an L1 penalty term to the linear regression objective function. It encourages sparsity by setting some coefficients to exactly zero, effectively performing feature selection. Lasso regression is helpful when you want to identify the most relevant features for prediction.

### 3. ElasticNet Regression:

ElasticNet regression combines both L1 and L2 regularization terms in the linear regression objective function. It offers a balance between the feature selection capability of Lasso regression and the coefficient shrinkage of Ridge regression. ElasticNet regression is useful when dealing with high-dimensional datasets and multicollinearity.

### 4. Decision Tree Regression:

Decision tree regression builds a regression model by recursively partitioning the data based on feature values. It predicts the target variable as the average target value of the training instances within each leaf node. Decision tree regression can capture complex relationships and handle both numerical and categorical features.

### 5. Random Forest Regression:

Random forest regression is an ensemble method that combines multiple decision trees to make predictions. It improves predictive accuracy and reduces overfitting by averaging the predictions of multiple trees. Random forest regression is robust, can handle high-dimensional data, and provides feature importance rankings.


These are just a few examples of regression models beyond linear regression. Each model has its own strengths and assumptions, and the choice of model depends on the specific problem, the nature of the data, and the desired trade-offs between accuracy, interpretability, and computational complexity. It's important to understand the characteristics of different regression models and experiment with them to find the most suitable one for your specific task.
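To illustrate the partitioning idea behind decision tree regression, here is a sketch of the smallest possible tree: a single split (often called a stump) on one feature. It tries every split point, keeps the one with the lowest squared error, and predicts the mean target on each side. The function names and data are made up for illustration, and a real tree would repeat this step recursively within each partition.

```python
def fit_stump(xs, ys):
    """Find the single split on x that minimizes squared error.

    Returns (threshold, left_mean, right_mean).
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    points = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Made-up data with a jump between x = 3 and x = 4
stump = fit_stump([1, 2, 3, 4, 5, 6], [10, 11, 9, 20, 21, 19])
print(stump)                    # (3.5, 10.0, 20.0)
print(predict_stump(stump, 2))  # 10.0
```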




# Ridge Regression:

In [42]:
from sklearn.linear_model import Ridge

# Train the ridge regression model
ridge_reg = Ridge(alpha=0.5)  # Specify the regularization strength (alpha)
ridge_reg.fit(X_train, y_train)


Ridge(alpha=0.5)

In [43]:
y_ridge = ridge_reg.predict(X_train)
y_test_ridge = ridge_reg.predict(X_test)


In [44]:
r2_score(y_train, y_ridge)

0.7287998343826629

In [45]:
r2_score(y_test,y_test_ridge)

0.6545725618645957
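To see what `alpha` does, it helps to look at the simplest case: one feature and no intercept, where minimizing the squared error plus `alpha` times the squared coefficient has a closed-form solution. The sketch below uses that special case with made-up data. `ridge_coefficient` is a hypothetical helper, not part of scikit-learn.

```python
def ridge_coefficient(xs, ys, alpha):
    """Closed-form ridge slope for one feature and no intercept.

    Minimizes sum((y - w * x) ** 2) + alpha * w ** 2.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data with true slope 2
for alpha in [0.0, 0.5, 5.0, 50.0]:
    print(alpha, round(ridge_coefficient(xs, ys, alpha), 3))
```

With `alpha=0` the result is the ordinary least-squares slope, and larger values pull the coefficient further towards zero. This is consistent with the ridge model above scoring slightly lower on the training data than plain linear regression.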

# Lasso Regression:

In [46]:
from sklearn.linear_model import Lasso

# Train the lasso regression model
lasso_reg = Lasso(alpha=0.1)  # Specify the regularization strength (alpha)
lasso_reg.fit(X_train, y_train)


Lasso(alpha=0.1)

In [52]:
y_lasso = lasso_reg.predict(X_train)
y_test_lasso = lasso_reg.predict(X_test)


In [53]:
r2_score(y_train, y_lasso)

0.7204680771169034

In [54]:
r2_score(y_test, y_test_lasso)

0.6493647009418789

# ElasticNet Regression:

In [51]:
from sklearn.linear_model import ElasticNet

# Train the elastic net regression model on the training split
# alpha sets the overall penalty strength; l1_ratio sets the mix between L1 and L2
elastic_net = ElasticNet(alpha=0.5, l1_ratio=0.5)
elastic_net.fit(X_train, y_train)

ElasticNet(alpha=0.5)

In [55]:
y_elast = elastic_net.predict(X_train)
y_test_elast = elastic_net.predict(X_test)

In [56]:
r2_score(y_train, y_elast)

0.6932751810792401

In [57]:
r2_score(y_test, y_test_elast)


0.6648641852012285
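The roles of `alpha` and `l1_ratio` become clearer when the penalty term is written out. As scikit-learn's documentation describes it, the ElasticNet penalty is `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The sketch below evaluates that expression for a made-up coefficient vector; `elastic_net_penalty` is a hypothetical helper written for illustration.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Penalty term added to the squared-error loss for given coefficients."""
    l1 = sum(abs(c) for c in coefs)
    l2_squared = sum(c * c for c in coefs)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_squared

coefs = [1.0, -2.0]  # made-up coefficients
print(elastic_net_penalty(coefs, alpha=0.5, l1_ratio=0.5))  # 1.375
print(elastic_net_penalty(coefs, alpha=0.5, l1_ratio=1.0))  # 1.5  (pure L1)
print(elastic_net_penalty(coefs, alpha=0.5, l1_ratio=0.0))  # 1.25 (pure L2)
```

Setting `l1_ratio=1.0` leaves only the L1 part, as in Lasso, and `l1_ratio=0.0` leaves only the L2 part, so the parameter controls where the model sits between the two.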

# Decision Tree Regression:

In [58]:
from sklearn.tree import DecisionTreeRegressor

# Train the decision tree regression model on the training split
tree_reg = DecisionTreeRegressor()
tree_reg.fit(X_train, y_train)

DecisionTreeRegressor()