# Decision Tree
A Decision Tree is a flowchart-like, tree-shaped model used in machine learning and decision analysis. It is easy to understand, which makes it a good starting point for beginners, and it can be used for both classification and regression tasks.

The tree is built by recursively splitting the data into smaller subsets based on the values of the features. Splitting stops when a stopping condition is met, for example when a maximum depth is reached or a subset cannot be split any further. Each resulting leaf node holds the final prediction for the samples that reach it.

Imagine you are trying to build a model to predict whether a customer will churn (cancel their subscription) or not. You have a dataset of customers with their churn status and other features, such as their age, gender, and subscription plan.

You could start by building a simple decision tree with a single split. The root node of the tree would ask the question "Is the customer's age greater than 30?". If the answer is yes, the tree predicts that the customer is likely to churn. If the answer is no, the tree predicts that the customer is unlikely to churn.

You could then expand the tree by adding more nodes. For example, you could add a node for customers who are male and have a monthly subscription plan. If the customer is male and has a monthly subscription plan, the tree could predict that the customer is very likely to churn.
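
The hand-built tree described above can be written as nested conditionals. This is only a sketch: the function name `predict_churn`, the string encodings of the features, and the placement of the gender/plan check under the "age > 30" branch are assumptions made for illustration, not rules learned from data.

```python
def predict_churn(age, gender, plan):
    """Toy decision tree for the churn example (hypothetical rules, not learned from data)."""
    if age > 30:                                     # root node
        if gender == "male" and plan == "monthly":   # added internal node
            return "very likely to churn"            # leaf
        return "likely to churn"                     # leaf
    return "unlikely to churn"                       # leaf


print(predict_churn(45, "male", "monthly"))
print(predict_churn(45, "female", "annual"))
print(predict_churn(22, "male", "monthly"))
```

A learned tree has the same shape; the difference is that the questions and thresholds are chosen by the training algorithm rather than by hand.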

You could continue to expand the tree by adding more nodes for different combinations of features. A more complex tree fits the training data more closely, but that does not guarantee better predictions. It is important to avoid overfitting, which occurs when the tree learns the training data too well, including its noise, and does not generalize to new data.
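
Overfitting can be made visible by comparing training and test accuracy for an unrestricted tree and a depth-limited one. The sketch below uses synthetic data from `make_classification` as a stand-in for a churn dataset; the sample sizes, noise level, and `max_depth=3` are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data: flip_y assigns random labels to 20% of the samples.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# No depth limit: the tree keeps splitting until every leaf is pure.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth limit: a much simpler tree.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("unrestricted", deep), ("max_depth=3", shallow)]:
    print(name,
          "train:", model.score(X_train, y_train),
          "test:", model.score(X_test, y_test))
```

The unrestricted tree memorizes the training split, noise included, so its training accuracy is perfect while its test accuracy is lower. That gap is the symptom of overfitting.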

### Objective:
Imagine you have to make a series of decisions, like whether to go for a walk or stay indoors based on the weather, your mood, and other factors. A Decision Tree helps you make these decisions systematically.

### Explanation:

1. Tree Structure: A Decision Tree looks like an upside-down tree with branches. At the top is the "root node," representing the initial decision you need to make. Each branch represents a choice or decision based on specific criteria.

2. Nodes and Leaves: Along the branches, you have "nodes" (decision points) and "leaves" (endpoints or final decisions). Think of nodes as questions, and leaves as answers or outcomes.

3. Splitting Criteria: To make decisions, a Decision Tree uses various factors or features (e.g., weather, mood) at each node. These factors split the data into smaller groups based on their characteristics.
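
To make the idea of a splitting criterion concrete, the sketch below scores candidate thresholds on one numeric feature using Gini impurity, one commonly used measure for classification trees. The text above does not name a specific measure, so the choice of Gini, the helper names, and the temperature data are all assumptions for illustration.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Try each midpoint between sorted distinct values.

    Returns (threshold, weighted child impurity); threshold is None
    if no split improves on the impurity of the unsplit node.
    """
    distinct = sorted(set(values))
    best = (None, gini(labels))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: temperature in degrees Celsius and whether someone went for a walk.
temperature = [5, 8, 12, 18, 21, 25]
went_walking = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(temperature, went_walking))  # (15.0, 0.0)
```

A threshold of 15.0 separates the two outcomes perfectly here, so both child groups have zero impurity. A tree-building algorithm repeats this kind of search over every feature at every node.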

## Advantages
1. Easy to Understand: Decision Trees are like simple flowcharts, making them very intuitive and easy to interpret. You can see exactly how a decision is made step by step.
2. No Assumptions: Decision Trees don't require complex assumptions about data distribution, making them versatile for various types of data.
3. Feature Importance: They can show which features (questions) are most important in making a decision.
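4. Versatile: They can be used for both classification and regression tasks.
5. Mixed Feature Types: They can handle both categorical and numerical features (although scikit-learn's implementation expects categorical features to be encoded as numbers).
6. Robust to Outliers: Splits depend on the ordering of feature values rather than their magnitude, so extreme feature values have little effect.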
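
The interpretability claim can be checked by printing a fitted tree as text. The sketch below uses scikit-learn's `export_text` on the Iris dataset; the dataset and `max_depth=2` are assumptions chosen only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Render the learned rules as an indented, flowchart-like text tree.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one question (a feature compared with a threshold), and each `class:` line is a leaf, so a prediction can be traced by reading from top to bottom.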

## Disadvantages
1. Overfitting: Decision Trees can be prone to overfitting, which means they may create very complex trees that fit the training data perfectly but don't generalize well to new data.
2. Instability: Small changes in the data can lead to significant changes in the tree's structure, making them unstable.
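3. Not Suitable for Some Relationships: Each split tests a single feature against a threshold, so smooth or linear relationships are approximated by a staircase of splits and may not be captured as effectively as with some other models.
4. Training Cost: Training can be computationally expensive for large datasets.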
5. Biased Splits: Split selection can be biased towards features with many distinct values, because these offer more candidate split points.

## Application/Uses

1. Classification: For tasks like spam email detection, sentiment analysis, and medical diagnosis.
2. Regression: To predict numerical values, like predicting house prices based on features.
3. Recommendation Systems: In e-commerce and content recommendation, helping users find products or content based on their preferences.
4. Risk Assessment: In finance and insurance, assessing risk factors for loans or insurance policies.

In summary, a Decision Tree is like a visual guide that helps you make decisions by following a series of questions and criteria. It is simple and transparent, but it is prone to overfitting and may not handle complex relationships as well as some other models. Decision Trees find applications in various fields where decision-making based on data is essential.

In [16]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score
# Note: load_boston was removed in scikit-learn 1.2, so this notebook needs an older release.
from sklearn.datasets import load_boston

In [4]:
boston = load_boston()
df = pd.DataFrame(boston.data)

In [6]:
df.columns = boston.feature_names
df['MEDV'] = boston.target

In [7]:
df.head()

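|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|------|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |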


In [8]:
X = df.iloc[:, 0:13]  # features: the first 13 columns
y = df.iloc[:, 13]    # target: MEDV (median home value)

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,random_state=42)

In [11]:
# 'mse' was renamed to 'squared_error' in scikit-learn 1.0; the old name was removed in 1.2.
rt = DecisionTreeRegressor(criterion='mse', max_depth=5)

In [12]:
rt.fit(X_train,y_train)

DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=5,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=None, splitter='best')
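
With `criterion='mse'`, a candidate split is scored by how much it lowers the mean squared error of the targets around their node mean. The pure-Python sketch below works this out for one split; the helper names and the tiny rooms/price dataset are made up for illustration and are not taken from the Boston data.

```python
def mse(values):
    """Mean squared error of values around their own mean (the node's impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mse_after_split(feature, target, threshold):
    """Weighted MSE of the two child nodes produced by `feature <= threshold`."""
    left = [t for f, t in zip(feature, target) if f <= threshold]
    right = [t for f, t in zip(feature, target) if f > threshold]
    return (len(left) * mse(left) + len(right) * mse(right)) / len(target)


# Hypothetical data: number of rooms and house price (in $1000s).
rooms = [4, 5, 6, 7, 8]
price = [15.0, 17.0, 22.0, 34.0, 36.0]

before = mse(price)                          # impurity of the unsplit node
after = mse_after_split(rooms, price, 6.5)   # impurity after splitting at rooms <= 6.5
print("before:", before)
print("after:", after)
print("reduction:", before - after)
```

The regressor evaluates many such candidate thresholds across all features and keeps the one with the largest reduction.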

In [13]:
y_pred = rt.predict(X_test)

In [14]:
r2_score(y_test,y_pred)

0.8833565347917997
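
The value returned by `r2_score` is the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below computes it by hand; the function name `r2` and the numbers are made up for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total variation
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [20.0, 25.0, 30.0, 35.0]
y_pred = [22.0, 24.0, 29.0, 36.0]
print(r2(y_true, y_pred))
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of `y_true`.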

## Hyperparameter Tuning

In [17]:
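# 'mse' and 'mae' were renamed to 'squared_error' and 'absolute_error' in scikit-learn 1.0.
# Float values for max_features and min_samples_split are fractions of the features / samples.
param_grid = {
    'max_depth': [2, 4, 8, 10, None],
    'criterion': ['mse', 'mae'],
    'max_features': [0.25, 0.5, 1.0],
    'min_samples_split': [0.25, 0.5, 1.0]
}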

In [19]:
reg = GridSearchCV(DecisionTreeRegressor(),param_grid=param_grid)

In [20]:
reg.fit(X_train,y_train)

GridSearchCV(cv=None, error_score=nan,
             estimator=DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse',
                                             max_depth=None, max_features=None,
                                             max_leaf_nodes=None,
                                             min_impurity_decrease=0.0,
                                             min_impurity_split=None,
                                             min_samples_leaf=1,
                                             min_samples_split=2,
                                             min_weight_fraction_leaf=0.0,
                                             presort='deprecated',
                                             random_state=None,
                                             splitter='best'),
             iid='deprecated', n_jobs=None,
             param_grid={'criterion': ['mse', 'mae'],
                         'max_depth': [2, 4, 8, 10, None],
                         'max_features'

In [21]:
reg.best_score_

0.6452352174104019

In [22]:
reg.best_params_

{'criterion': 'mse',
 'max_depth': None,
 'max_features': 0.5,
 'min_samples_split': 0.25}
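
Note that `best_score_` is the mean cross-validated score on the training folds, so it is not directly comparable with an R² measured on a single held-out test split. To evaluate the tuned model on test data, the refitted `best_estimator_` can be scored directly. The sketch below shows the pattern on synthetic data; the dataset, the parameter grid, and `cv=5` are assumptions and differ from the grid used above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the notebook's own train/test split could be used instead.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
search.fit(X_train, y_train)

print("mean CV R^2 of best setting:", search.best_score_)
print("best parameters:", search.best_params_)
# refit=True (the default) retrains the best setting on all of X_train.
print("held-out test R^2:", search.best_estimator_.score(X_test, y_test))
```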

## Feature Importance

In [15]:
# Rank features by their importance in the fitted tree, most important first.
for importance, name in sorted(zip(rt.feature_importances_, X_train.columns), reverse=True):
    print(name, importance)

RM 0.6344993240692652
LSTAT 0.19426427075925173
CRIM 0.07395590730917082
DIS 0.06744514557703153
B 0.011905660139828182
AGE 0.006176126174365511
PTRATIO 0.004391097507128497
NOX 0.0035610403857026535
INDUS 0.002627468726682041
RAD 0.0011739593515739223
ZN 0.0
TAX 0.0
CHAS 0.0
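
In scikit-learn, a feature's importance is the normalized total reduction of the split criterion contributed by that feature, so the values sum to 1 and a feature that is never used in a split (such as ZN, TAX, and CHAS above) gets 0.0. The sketch below illustrates this on synthetic data where only the first column drives the target; the data and `max_depth=3` are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
# Only column 0 drives the target; columns 1 and 2 are pure noise.
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
importances = tree.feature_importances_

print(importances)        # one value per feature
print(importances.sum())  # normalized to sum to 1
```

Because the importances are tied to one particular fitted tree, they can shift noticeably when the training data or the tree's hyperparameters change.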
