## **Machine Learning Algorithm**

## **Type**: Supervised Learning 

## **Regression + Classification**

## **Day 3**: Lasso Regression + Decision Tree Algorithm

## **Student**: Muhammad Shafiq

-------------------------------------------

## **Lasso Regression:**

Lasso Regression is a regression method based on **Least Absolute Shrinkage and Selection Operator** and is used in regression analysis for variable selection and regularization. It helps remove irrelevant data features and prevents overfitting. This allows features with weak influence to be clearly identified as the coefficients of less important variables are shrunk toward zero.

Lasso Regression is a regularization technique used to prevent overfitting. It improves linear regression by adding a penalty term to the standard regression objective. Ordinary linear regression fits a line by minimizing the sum of squared differences between the observed and predicted values; Lasso minimizes that same sum plus a penalty on the size of the coefficients.

In real-world datasets, features are often strongly correlated with each other, a situation known as multicollinearity. Lasso Regression can help here by keeping one feature from a correlated group and shrinking the others toward zero.

- **For example**: Suppose we are predicting house prices from features such as location, square footage and number of bedrooms. Lasso Regression can identify the most important features. It might determine that location and square footage are the key factors influencing price while the number of bedrooms has less impact. By setting the coefficient for the bedroom feature to zero, it simplifies the model and can improve how well it generalizes.

## **Understanding Lasso Regression Working**

Lasso Regression is an extension of linear regression. While traditional linear regression minimizes the sum of squared differences between the observed and predicted values to find the best-fit line, it doesn’t handle the complexity of real-world data well when many factors are involved.

 1. **Ordinary Least Squares (OLS) Regression**

Lasso builds on the Ordinary Least Squares (OLS) Regression method by adding a penalty term. The basic objective for OLS is:

$$\min \; \text{RSS} = \sum_{i} (y_i - \hat{y}_i)^2$$

Where:

- $y_i$ is the observed value for each data point
- $\hat{y}_i$ is the predicted value for each data point

 2. **Penalty Term for Lasso Regression**

In Lasso regression a penalty term is added to the OLS objective. The penalty is the sum of the absolute values of the coefficients. The updated cost function becomes:

$$\text{RSS} + \lambda \sum_{i} |\beta_i|$$

Where:

- $\beta_i$ represents the coefficients of the predictors
- $\lambda$ is the tuning parameter that controls the strength of the penalty. As $\lambda$ increases, more coefficients are pushed towards zero
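The cost function above can be written out directly. This is a minimal sketch in plain Python; `lasso_cost` and the toy numbers are made up for illustration and are not part of any library.

```python
def lasso_cost(y_true, y_pred, coefficients, lam):
    """Return RSS + lam * sum(|beta_i|) for one set of predictions."""
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    l1_penalty = sum(abs(b) for b in coefficients)
    return rss + lam * l1_penalty


# Toy values, made up for illustration only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]
coefficients = [2.0, -0.5, 0.0]

print(lasso_cost(y_true, y_pred, coefficients, lam=0.0))  # plain RSS
print(lasso_cost(y_true, y_pred, coefficients, lam=1.0))  # RSS plus L1 penalty
```

With `lam=0.0` the function returns the plain RSS, so the model reduces to OLS. Note that scikit-learn's `Lasso` scales the squared-error term by `1 / (2 * n_samples)`, so its `alpha` is not numerically identical to the λ in this formula.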

 3. **Shrinking Coefficients:**

A key feature of Lasso is its ability to shrink the coefficients of less important features to exactly zero. This removes irrelevant features from the model, which makes it useful for high-dimensional data with many predictors relative to the number of observations.
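To see why coefficients can land on exactly zero, consider the simplest case: a single feature with no intercept. Under that assumption the Lasso minimizer has a closed form based on the soft-thresholding operator. The sketch below uses hypothetical helper names (`soft_threshold`, `lasso_single_feature`) and toy data; it is an illustration of the mechanism, not how a full solver is implemented.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_single_feature(x, y, lam):
    """Minimise sum((y_i - beta * x_i)^2) + lam * |beta| for one feature, no intercept."""
    xy = sum(xi * yi for xi, yi in zip(x, y))
    xx = sum(xi * xi for xi in x)
    return soft_threshold(xy, lam / 2) / xx


x = [1.0, 2.0, 3.0]
y = [0.5, 1.0, 1.5]  # toy data where y = 0.5 * x exactly

# The coefficient shrinks as lam grows and becomes exactly zero once lam is large enough
for lam in [0.0, 4.0, 14.0, 20.0]:
    print(lam, lasso_single_feature(x, y, lam))
```

A squared (Ridge-style) penalty would only scale the coefficient down, whereas the absolute-value penalty produces a flat region where the best coefficient is exactly zero.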

4. **Selecting the optimal λ**

Selecting the right λ value is important. Cross-validation is used to find the value that best balances model complexity and predictive performance.

The primary objective of Lasso regression is to minimize the residual sum of squares (RSS) plus a penalty term: λ multiplied by the sum of the absolute values of the coefficients.
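The cross-validated choice of λ described above can be sketched with scikit-learn's `LassoCV`, which tries a grid of `alpha` values (scikit-learn's name for λ) and keeps the one with the best cross-validated error. The dataset, the 5-fold setting and the random seed are assumptions chosen for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The scaler sits inside the pipeline, so it is fitted on the training split only
pipeline = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=42))
pipeline.fit(X_train, y_train)

lasso = pipeline.named_steps["lassocv"]
print("Chosen alpha:", lasso.alpha_)
print("Non-zero coefficients:", (lasso.coef_ != 0).sum())
print("Test R2:", pipeline.score(X_test, y_test))
```

Keeping the scaler inside the pipeline means the test split never influences the scaling statistics.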

## **When to use Lasso Regression**

Lasso Regression is useful in the following situations:

- **Feature Selection**: It automatically selects the most important features by reducing the coefficients of less significant features to zero.
- **Collinearity**: When there is multicollinearity, it can help by shrinking the coefficients of correlated variables and keeping only one of them.
- **Regularization**: It helps prevent overfitting by penalizing large coefficients, which is useful when the number of predictors is large.
- **Interpretability**: Compared to traditional linear regression models that keep all features, Lasso regression generates a model with fewer non-zero coefficients, making the model simpler to understand.

## **Advantages of Lasso Regression**

- **Feature Selection**: It removes the need to manually select the most important features, so the resulting regression model is simpler and more explainable.
- **Regularization**: It constrains large coefficients, trading a small amount of bias for lower variance, so the resulting model is more robust and generalizes better.
- **Interpretability**: It produces sparse models that are simpler to understand and explain, which is important in fields like healthcare and finance.
- **Handles Large Feature Spaces**: It is effective in handling high-dimensional data such as images and videos.

## **Disadvantages**

- **Selection Bias**: Lasso may arbitrarily select one variable from a group of highly correlated variables, which can lead to a biased model.
- **Sensitive to Scale**: It is sensitive to features with different scales, because the penalty treats all coefficients alike; unscaled features can distort the regularization and reduce the model's accuracy.
- **Impact of Outliers**: It can be easily affected by outliers in the data, which can distort the coefficient estimates.
- **Model Instability**: It can be unstable when there are many correlated variables, which causes it to select different features after small changes in the data.
- **Tuning Parameter Selection**: Choosing a good value for λ (called `alpha` in scikit-learn) can be difficult, but cross-validation addresses this.

**Read From Here For Full Details of Lasso Regression**

[Machine Learning Lasso Regression](https://www.geeksforgeeks.org/machine-learning/what-is-lasso-regression/)

[Machine Learning Lasso Regression](https://www.mygreatlearning.com/blog/understanding-of-lasso-regression/)


```python
from sklearn.linear_model import Lasso
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

# Load and scale data
X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train Lasso model
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("R2 Score:", r2_score(y_test, y_pred))
print("Non-zero coefficients:", (model.coef_ != 0).sum())
```

```
R2 Score: 0.48658970856215966
Non-zero coefficients: 9
```


-----------------------------

## **Decision Tree:**

A Decision Tree helps us make decisions by showing different options and how they are related. It has a tree-like structure that starts with one main question called the root node which represents the entire dataset. From there, the tree branches out into different possibilities based on features in the data.

- **Root Node**: Starting point representing the whole dataset.
- **Branches**: Lines connecting nodes showing the flow from one decision to another.
- **Internal Nodes**: Points where decisions are made based on data features.
- **Leaf Nodes**: End points of the tree where the final decision or prediction is made.
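The four parts listed above map naturally onto a small data structure. The following is a minimal sketch in plain Python; the `Node` class, the feature names and the thresholds are hypothetical and hand-written, not learned from data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure for illustration)."""
    feature: Optional[str] = None      # feature tested at an internal node
    threshold: Optional[float] = None  # go left if value <= threshold, else right
    left: Optional["Node"] = None      # branch taken when the test is true
    right: Optional["Node"] = None     # branch taken when the test is false
    prediction: Optional[str] = None   # set only on leaf nodes


def predict(node, sample):
    """Follow branches from the root until a leaf node is reached."""
    while node.prediction is None:
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hand-built tree with made-up thresholds: the root tests income,
# one internal node tests credit score, and three leaves hold the decisions
root = Node(
    feature="income",
    threshold=30000,
    left=Node(prediction="reject"),
    right=Node(
        feature="credit_score",
        threshold=650,
        left=Node(prediction="reject"),
        right=Node(prediction="approve"),
    ),
)

print(predict(root, {"income": 50000, "credit_score": 700}))
print(predict(root, {"income": 20000, "credit_score": 800}))
```

A real learning algorithm builds this structure automatically by choosing the feature and threshold at each internal node from the training data.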

There are mainly two types of Decision Trees based on the target variable:

- **Classification Trees**: Used for predicting categorical outcomes like spam or not spam. These trees split the data based on features to classify data into predefined categories.

- **Regression Trees**: Used for predicting continuous outcomes like predicting house prices. Instead of assigning categories, it provides numerical predictions based on the input features.
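Both tree types are available in scikit-learn as `DecisionTreeClassifier` and `DecisionTreeRegressor`. The sketch below fits one of each; the datasets, `max_depth=3` and the random seed are arbitrary choices for illustration, not tuned settings.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: the target is a category (iris species)
X_cls, y_cls = load_iris(return_X_y=True)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_cls, y_cls, random_state=42)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(Xc_train, yc_train)
print("Classification accuracy:", clf.score(Xc_test, yc_test))

# Regression tree: the target is a continuous number (disease progression)
X_reg, y_reg = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=42)
reg = DecisionTreeRegressor(max_depth=3, random_state=42).fit(Xr_train, yr_train)
print("Regression R2:", reg.score(Xr_test, yr_test))
```

For the classifier `score` reports accuracy, while for the regressor it reports R2, which reflects the different kinds of target each tree predicts.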

## **Splitting Criteria in Decision Trees**

In a Decision Tree, the process of splitting data at each node is important. The splitting criterion determines which feature, and which value of that feature, gives the best split. Common splitting criteria include Gini Impurity and Entropy.

- **Gini Impurity**: This criterion measures how "impure" a node is. The lower the Gini Impurity the better the feature splits the data into distinct categories.
- **Entropy**: This measures the amount of uncertainty or disorder in the data. The tree tries to reduce the entropy by splitting the data on features that provide the most information about the target variable.

These criteria help decide which features are useful for making the best split at each decision point in the tree.
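Both criteria are short formulas over the class proportions in a node. This is a minimal sketch in plain Python; the function names and the toy labels are made up for illustration.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_impurity(left, right, measure):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


parent = ["spam", "spam", "spam", "ham", "ham", "ham"]
print(gini_impurity(parent), entropy(parent))

# A split that separates the classes perfectly has zero impurity
print(weighted_impurity(parent[:3], parent[3:], gini_impurity))
# A split that leaves both children mixed removes less impurity
print(weighted_impurity(parent[:2] + parent[3:4], parent[2:3] + parent[4:], gini_impurity))
```

At each node the tree compares candidate splits and keeps the one with the lowest weighted impurity, which is the same as the largest reduction from the parent's impurity.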

## **Pruning in Decision Trees**

Pruning is an important technique used to prevent overfitting in Decision Trees. Overfitting occurs when a tree becomes too deep and starts to memorize the training data, including its noise, rather than learning general patterns. This leads to poor performance on new, unseen data.

Pruning reduces the complexity of the tree by removing branches that have little predictive power. It improves model performance by helping the tree generalize better to new data, and it also makes the model simpler and faster to deploy.
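One way to prune in scikit-learn is cost-complexity pruning, controlled by the `ccp_alpha` parameter of `DecisionTreeClassifier`. The sketch below compares an unpruned tree with a pruned one; the dataset and the value `ccp_alpha=0.01` are arbitrary assumptions for illustration, and in practice the value would be chosen by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Unpruned tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pruned tree: ccp_alpha=0.01 is an arbitrary illustrative value, not a tuned one
pruned_tree = DecisionTreeClassifier(random_state=42, ccp_alpha=0.01).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(name, "leaves:", tree.get_n_leaves(), "depth:", tree.get_depth(),
          "test accuracy:", tree.score(X_test, y_test))
```

Limiting growth up front with parameters such as `max_depth` or `min_samples_leaf` is a simpler alternative that also keeps the tree small.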

## **Advantages of Decision Trees**

- **Easy to Understand**: Decision Trees are visual which makes it easy to follow the decision-making process.
- **Versatility**: Can be used for both classification and regression problems.
- **No Need for Feature Scaling**: Unlike many machine learning models, they don't require us to scale or normalize our data.
- **Handles Non-linear Relationships**: They capture complex, non-linear relationships between features and outcomes effectively.
- **Interpretability**: The tree structure is easy to interpret, allowing users to understand the reasoning behind each decision.
- **Handles Missing Data**: Some implementations can handle missing values by using strategies like assigning the most common value or ignoring missing data during splits.

## **Disadvantages of Decision Trees**

- **Overfitting**: They can overfit the training data if they are too deep, which means they memorize the data instead of learning general patterns. This leads to poor performance on unseen data.
- **Instability**: They can be unstable, which means that small changes in the data may lead to significant differences in the tree structure and predictions.
- **Bias towards Features with Many Categories**: They can become biased toward features with many distinct values, focusing too much on them and potentially missing other important features, which can reduce prediction accuracy.
- **Difficulty in Capturing Complex Interactions**: Decision Trees may struggle to capture some complex interactions between features, which makes them less effective for certain types of data.
- **Computationally Expensive for Large Datasets**: For large datasets, building and pruning a Decision Tree can be computationally intensive, especially as the tree depth increases.

## **Applications of Decision Trees**

Decision Trees are used across various fields due to their simplicity, interpretability and versatility. Let's see some key applications:

- **Loan Approval in Banking**: Banks use Decision Trees to assess whether a loan application should be approved. The decision is based on factors like credit score, income, employment status and loan history. This helps predict approval or rejection, enabling quick and consistent decisions.
- **Medical Diagnosis**: In healthcare they assist in diagnosing diseases. For example, they can predict whether a patient has diabetes based on clinical data like glucose levels, BMI and blood pressure. This helps classify patients into diabetic or non-diabetic categories, supporting early diagnosis and treatment.
- **Predicting Exam Results in Education**: Educational institutions use them to predict whether a student will pass or fail based on factors like attendance, study time and past grades. This helps teachers identify at-risk students and offer targeted support.
- **Customer Churn Prediction**: Companies use Decision Trees to predict whether a customer will leave or stay based on behavior patterns, purchase history, and interactions. This allows businesses to take proactive steps to retain customers.
- **Fraud Detection**: In finance, Decision Trees are used to detect fraudulent activities, such as credit card fraud. By analyzing past transaction data and patterns, Decision Trees can identify suspicious activities and flag them for further investigation.

**Read From Here For Full Details of Decision Tree Algorithm**

[Machine Learning Decision Tree](https://www.geeksforgeeks.org/machine-learning/decision-tree/)