## **Machine Learning Algorithm**

## **Type**: Supervised Learning 

## **Regression + Classification**

## **Day 3**: ElasticNet Regression + Random Forest Algorithm

## **Student**: Muhammad Shafiq

-------------------------------------------

# **Random Forest Algorithm:**

Random Forest is a machine learning algorithm that uses many decision trees to make better predictions. Each tree looks at different random parts of the data and their results are combined by voting for classification or averaging for regression. This helps in improving accuracy and reducing errors.

## **Random Forest Applications**

- **Customer churn prediction**: Businesses can use random forests to predict which customers are likely to churn (cancel their service) so that they can take steps to retain them. For example, a telecom company might use a random forest model to identify customers who are using their phone less frequently or who have a history of late payments.

- **Fraud detection**: Random forests can identify fraudulent transactions in real-time. For instance, a bank might employ a random forest model to spot transactions made from unusual locations or involving unusually large amounts of money.

- **Stock price prediction**: Random forests can be used to predict future stock prices. However, stock price prediction is a very difficult task, and no model is ever going to be perfectly accurate.

- **Medical diagnosis**: Random forests can help doctors diagnose diseases. For example, a doctor might use a random forest model to help assess whether a patient has cancer.

- **Image recognition**: Random forests can recognize objects in images. For example, a self-driving car might use a random forest model to identify pedestrians and other vehicles on the road.

## **Real-Life Analogy of Random Forest**

Let’s use a real-life analogy to understand this concept further. A student named X wants to choose a course after his 10+2 and can’t decide which course fits his skill set. So he consults various people: his cousins, teachers, parents, degree students, and working professionals. He asks them varied questions, such as why he should choose a particular course, what job opportunities it offers, and what the course fee is. Finally, after consulting all of them, he takes the course suggested by most people. Each adviser plays the role of one decision tree, and the final choice is the majority vote.

## **Working of Random Forest Algorithm**
Before understanding the working of the random forest algorithm in machine learning, we must look into the ensemble learning technique. Ensemble simply means combining multiple models. Thus, a collection of models is used to make predictions rather than an individual model.

Ensemble uses two types of methods:

- **Bagging**
- **Boosting**

Random Forest works on the Bagging principle, so let’s look at bagging in detail.



## **Bagging**





Bagging, also known as Bootstrap Aggregation, serves as the ensemble technique in the Random Forest algorithm. Here are the steps involved in Bagging:

- **Selection of Subset**: Bagging starts by drawing a random sample (subset) of rows from the entire dataset.

- **Bootstrap Sampling**: Each model is trained on one of these samples, called bootstrap samples, which are drawn from the original data with replacement. This process is also known as row sampling.

- **Bootstrapping**: The step of row sampling with replacement is referred to as bootstrapping.

- **Independent Model Training**: Each model is trained independently on its own bootstrap sample and produces its own result.

- **Majority Voting**: The final output is obtained by combining the results of all models through majority voting: the most commonly predicted outcome among the models is selected.

- **Aggregation**: Combining all the results to generate the final output (by majority voting for classification, or by averaging for regression) is called aggregation.
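
The bagging steps above can be sketched by hand. This is a minimal illustration, not how any library implements it internally: it assumes plain `DecisionTreeClassifier` models as the base learners, the breast cancer dataset used later in these notes, and an arbitrary choice of 25 models (odd, so a binary vote cannot tie).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
n_models = 25  # illustrative choice, not a recommended value
models = []

for _ in range(n_models):
    # Bootstrapping: draw row indices with replacement (row sampling)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Independent model training: each tree sees only its own bootstrap sample
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Aggregation: collect one vote per model for every test row
votes = np.stack([m.predict(X_test) for m in models])  # shape (n_models, n_test)
# Majority voting for 0/1 labels: predict 1 when more than half the models say 1
y_pred = (votes.mean(axis=0) > 0.5).astype(int)

bagging_accuracy = (y_pred == y_test).mean()
print("Bagged trees accuracy:", bagging_accuracy)
```

A random forest adds one more source of randomness on top of this: each tree also works with a random subset of the features.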

## **Steps in Random Forest:**


- **Step 1**: In this model, we select a subset of data points and a subset of features to construct each decision tree. Simply put, we take n random records and m features from a dataset containing k records.
- **Step 2**: We construct individual decision trees for each sample.
- **Step 3**: Each decision tree will generate an output.
- **Step 4**: We consider the final output based on Majority Voting for classification and Averaging for regression, respectively.
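
The averaging in Step 4 can be checked directly for regression. The sketch below assumes a synthetic dataset from `make_regression` and arbitrary settings (50 trees, 3 features per split); it compares the mean of the individual tree outputs with the forest's own prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used here only for illustration
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=50, max_features=3, random_state=0)
forest.fit(X, y)

# Steps 2-3: every fitted tree produces its own output
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])

# Step 4: for regression, the forest output is the mean of the tree outputs
manual_average = per_tree.mean(axis=0)
forest_output = forest.predict(X[:5])

print(per_tree.shape)
print(np.allclose(manual_average, forest_output))
```

For classification, scikit-learn's `RandomForestClassifier` averages the class probabilities predicted by the trees rather than counting hard votes, which usually gives the same answer as majority voting.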

## **Important Features of Random Forest**


Random Forest is distinguished by several key features that contribute to its effectiveness and versatility:

- **Diversity**: Each decision tree in the Random Forest is built from a different subset of data and features. This diversity helps in reducing overfitting and improving the model’s generalization capability.
- **Robustness**: By averaging the results from multiple trees, Random Forest reduces the variance and improves the robustness of the predictions.
- **Handling of Missing Values**: Some implementations can handle missing values internally, for example by using surrogate splits. Support varies by library and version, so check the documentation or impute missing values before training.
- **Feature Importance**: It provides insights into the importance of each feature in the prediction process. This can be particularly useful for feature selection and understanding the underlying data patterns.
- **Scalability**: Random Forest can be parallelized because each tree is built independently of the others. This makes it scalable to large datasets and high-dimensional data.
- **Versatility**: It can be used for both classification and regression tasks. The algorithm is also effective for tasks involving categorical and continuous variables.
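
As a small illustration of the feature importance point, a fitted scikit-learn forest exposes impurity-based importances through its `feature_importances_` attribute. The dataset and the choice to show the top five features are assumptions made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# One importance value per feature; the values sum to 1
importances = model.feature_importances_

# Indices of the five most important features, largest first
top = np.argsort(importances)[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} {importances[i]:.3f}")
```

Impurity-based importances can favour features with many distinct values, so they are best read as a rough ranking rather than an exact measure.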

## **Important Hyperparameters in Random Forest**


Random forests use hyperparameters to enhance model performance and predictive power or to increase the model’s speed.

**Increase the Predictive Power**

- `n_estimators`: Number of trees the algorithm builds before taking the majority vote or averaging the predictions.

- `max_features`: Maximum number of features the random forest considers when splitting a node.

- `min_samples_leaf`: Minimum number of samples required to be at a leaf node.

- `criterion`: The function used to measure the quality of a split in each tree (Gini impurity, entropy, or log loss for classification).

- `max_leaf_nodes`: Maximum number of leaf nodes in each tree.

**Increase the Speed**

- `n_jobs`: Tells the engine how many processors it is allowed to use. A value of 1 uses only one processor; a value of -1 uses all available processors.

- `random_state`: Controls the randomness of the sampling. The model always produces the same results when it is given a fixed `random_state`, the same hyperparameters, and the same training data.

- `oob_score`: OOB means out-of-bag. It is a built-in validation method for random forests. Roughly one-third of the rows are left out of each tree's bootstrap sample; these out-of-bag samples are not used to train that tree and are used instead to evaluate its performance.
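
The hyperparameters above can be tried together in a short sketch. The specific values (200 trees, `max_features="sqrt"`) are illustrative assumptions, not tuned settings. The snippet also computes the expected out-of-bag fraction, `(1 - 1/n)^n`, which is where the "roughly one-third" figure comes from.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Probability that a given row is never drawn in a bootstrap sample of size n
n = len(X)
frac_oob = (1 - 1 / n) ** n
print("Expected out-of-bag fraction per tree:", round(frac_oob, 3))

oob_model = RandomForestClassifier(
    n_estimators=200,      # number of trees
    max_features="sqrt",   # features considered at each split
    min_samples_leaf=1,    # minimum samples at a leaf node
    oob_score=True,        # evaluate on out-of-bag rows (needs bootstrap=True, the default)
    n_jobs=-1,             # use all available processors
    random_state=42,       # reproducible sampling
)
oob_model.fit(X, y)
print("OOB accuracy estimate:", oob_model.oob_score_)
```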

## **Coding in Python – Random Forest Classifier**

Now let’s implement Random Forest in scikit-learn.

#### **1. Let’s import the libraries**.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
```



#### **Load Data & Splitting**

```python
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```



#### **Train Random Forest**

```python
# Train Random Forest
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)
```




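Parameters of the fitted model, as displayed by the notebook:

| Parameter | Value |
|---|---|
| n_estimators | 100 |
| criterion | 'gini' |
| max_depth | 5 |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 'sqrt' |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |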


#### **Prediction and Evaluation**

```python
# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```text
Accuracy: 0.965034965034965
              precision    recall  f1-score   support

           0       0.96      0.94      0.95        54
           1       0.97      0.98      0.97        89

    accuracy                           0.97       143
   macro avg       0.96      0.96      0.96       143
weighted avg       0.97      0.97      0.96       143
```



## **Advantages and Disadvantages of Random Forest Algorithm**

## Advantages

- You can use random forest for classification and regression problems.
- It reduces overfitting because the output is based on majority voting or averaging over many trees.
- It can perform well even if the data contains null/missing values, provided the implementation supports them or the values are imputed first.
- Each decision tree is created independently of the others, so training can be parallelized.
- It maintains high stability by taking the average answers from a large number of trees.
- It maintains diversity because each decision tree does not consider all the attributes, although this is not true in all cases.
- It is less affected by the curse of dimensionality. Since each tree ignores some attributes, the feature space it works with is reduced.
- It provides a built-in validation estimate: roughly one-third of the rows are left out of each tree's bootstrap sample (the out-of-bag samples) and can be used to evaluate the model. A separate test set is still advisable for a final check.

## Disadvantages

- Random forest is more complex and harder to interpret than a single decision tree, where you can make a decision by following the path of the tree.
- Training takes longer than for a single decision tree because many trees must be built. Prediction is also slower, since each decision tree has to generate an output for the given input data.

---------------------------------


# **Elastic Net Regression**

Elastic-Net Regression is a modification of Linear Regression that shares the same hypothesis function for prediction. The cost function of Linear Regression is represented by J:

$$J = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right)^2$$

Here:

- $m$ is the total number of training examples in the dataset.
- $h(x^{(i)})$ is the hypothesis function's prediction for the $i$-th training example.
- $y^{(i)}$ is the value of the target variable for the $i$-th training example.

Linear Regression can suffer from overfitting and struggles with collinear data. When a dataset has many features and some of them are not relevant to the prediction, the model becomes more complex and predicts poorly on the test set (overfitting). Such a high-variance model does not generalize to new data. To deal with these issues, Elastic-Net includes both L2 and L1 norm regularization to get the benefits of Ridge and Lasso at the same time. The resulting model often has better predictive power than Lasso, particularly when features are correlated. It performs feature selection and also makes the hypothesis simpler. The modified cost function for Elastic-Net Regression is given below:

$$J = \frac{1}{m} \left[ \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right)^2 + \lambda_1 \sum_{j=1}^{n} |w_j| + \lambda_2 \sum_{j=1}^{n} w_j^2 \right]$$

Here, $w_j$ is the weight of the $j$-th feature, $n$ is the number of features, $\lambda_1$ (lambda1) controls the L1 (Lasso) penalty, and $\lambda_2$ (lambda2) controls the L2 (Ridge) penalty.


## **Mathematical Intuition**:

During gradient descent optimization of the cost function, the added L2 penalty term shrinks the weights of the model towards zero. Because the weights are penalized, the hypothesis becomes simpler, more general, and less prone to overfitting. The added L1 penalty shrinks weights close to zero or exactly to zero. Weights that are shrunk to zero eliminate the corresponding features from the hypothesis function, so irrelevant features do not participate in the predictive model. This penalization encourages sparsity (a model with few parameters).

Different cases for tuning the values of lambda1 and lambda2:

- If lambda1 and lambda2 are set to be 0, Elastic-Net Regression equals Linear Regression.
- If lambda1 is set to be 0, Elastic-Net Regression equals Ridge Regression.
- If lambda2 is set to be 0, Elastic-Net Regression equals Lasso Regression.
- If lambda1 and lambda2 are set to be infinity, all weights are shrunk to zero.

So, we should set lambda1 and lambda2 somewhere between 0 and infinity.
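
Some of these cases can be checked in scikit-learn, with one caveat: `ElasticNet` does not take lambda1 and lambda2 directly. It uses `alpha` (overall penalty strength) and `l1_ratio` (share of the L1 penalty), so lambda1 corresponds to `alpha * l1_ratio` and lambda2 to `alpha * (1 - l1_ratio)`, up to constant scaling factors. The sketch below assumes synthetic data and arbitrary penalty values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression

# Synthetic data, used here only for illustration
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# lambda2 = 0 (pure L1 penalty): ElasticNet with l1_ratio=1 matches Lasso
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print("Matches Lasso:", np.allclose(enet_l1.coef_, lasso.coef_))

# Both penalties close to 0: coefficients approach ordinary Linear Regression
enet_tiny = ElasticNet(alpha=1e-6, l1_ratio=0.5, tol=1e-10, max_iter=100000).fit(X, y)
ols = LinearRegression().fit(X, y)
max_diff = np.abs(enet_tiny.coef_ - ols.coef_).max()
print("Largest gap to Linear Regression coefficients:", max_diff)

# Very large penalties: every weight is shrunk to zero
enet_huge = ElasticNet(alpha=1e6, l1_ratio=0.5).fit(X, y)
print("Non-zero weights with a huge penalty:", np.count_nonzero(enet_huge.coef_))
```

The Ridge case is left out because scikit-learn's `Ridge` scales its penalty differently from `ElasticNet`, so the same `alpha` does not give the same coefficients.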

###  **Load libraries and dataset**

```python
from sklearn.datasets import fetch_california_housing
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load data
data = fetch_california_housing(as_frame=True)
df = data.frame

# Separate features and target
X = df.drop("MedHouseVal", axis=1)
y = df["MedHouseVal"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (for ElasticNet only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```


### **Train Model and Evaluation**

```python
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

# Create ElasticNet model
elastic_model = ElasticNet(alpha=1.0, l1_ratio=0.5, random_state=42)
elastic_model.fit(X_train_scaled, y_train)

# Predict
y_pred_elastic = elastic_model.predict(X_test_scaled)

# Evaluate
print("ElasticNet R2:", r2_score(y_test, y_pred_elastic))
print("ElasticNet MSE:", mean_squared_error(y_test, y_pred_elastic))
print("Non-zero Coefficients:", (elastic_model.coef_ != 0).sum())
```

```text
ElasticNet R2: 0.2031259919367321
ElasticNet MSE: 1.0442308546929173
Non-zero Coefficients: 1
```


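- The model explains only about 20.31% of the variation in the target variable (R² ≈ 0.203).
- The mean squared error between predicted and actual values is about 1.04, measured in squared units of the target.
- Only 1 feature out of all inputs kept a non-zero coefficient; the rest were shrunk to zero. With `alpha=1.0` the penalty is strong relative to the scale of this target, which likely explains both the sparsity and the low R².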
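
To see how the penalty strength drives the number of surviving features, the sketch below sweeps a few `alpha` values. It is an illustration under stated assumptions: it swaps in scikit-learn's bundled diabetes dataset (so it runs without a download) instead of the California housing data, and the three `alpha` values are arbitrary.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Different dataset from the example above, chosen because it ships with scikit-learn
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

results = {}
for alpha in [0.01, 1.0, 1000.0]:
    model = ElasticNet(alpha=alpha, l1_ratio=0.5, max_iter=10000)
    model.fit(X_train_scaled, y_train)
    n_nonzero = int((model.coef_ != 0).sum())   # features kept by the model
    r2 = model.score(X_test_scaled, y_test)     # R2 on the held-out test set
    results[alpha] = (n_nonzero, r2)
    print(f"alpha={alpha:<7} non-zero={n_nonzero:<3} R2={r2:.3f}")
```

In practice, `alpha` and `l1_ratio` are usually chosen by cross-validation, for example with `ElasticNetCV`, rather than fixed by hand.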