# What is a Decision Tree?


- A decision tree is a supervised machine learning algorithm used for classification and regression.
- It represents decisions in a tree structure, where:
  - Each internal node represents a condition on a feature.
  - Each branch represents the outcome of that condition.
  - Each leaf node gives the final prediction.
- The model works by recursively splitting the data so that samples with similar target values fall into the same group.
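
The structure described above can be sketched as a small hand-built tree. This is only an illustration: the feature names are borrowed from the house-price data used later in this notebook, while the thresholds and leaf values are made-up assumptions, not a fitted model.

```python
# A hand-built tree stored as nested dicts.
# Thresholds and leaf values are hypothetical, chosen only for illustration.
tree = {
    "feature": "house_size_sqft", "threshold": 1000,   # root node (internal)
    "left": {"value": 45.0},                           # leaf node
    "right": {                                         # internal node
        "feature": "age_years", "threshold": 10,
        "left": {"value": 150.0},                      # leaf node
        "right": {"value": 110.0},                     # leaf node
    },
}

def predict(node, sample):
    """Follow one branch per condition, from the root down to a leaf."""
    while "value" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]

print(predict(tree, {"house_size_sqft": 800, "age_years": 20}))   # 45.0
print(predict(tree, {"house_size_sqft": 1500, "age_years": 5}))   # 150.0
```

Each dict with a `feature` key plays the role of an internal node, the `left`/`right` keys are the branches, and each dict with a `value` key is a leaf.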

## Key components of a Decision Tree
- Root Node: The starting point of the tree, representing the entire dataset.
- Internal Decision Node: A node that represents a test or decision on a specific attribute or feature.
- Branch/Subtree: The outcome of a decision, connecting a parent node to a child node.
- Leaf Node: A terminal node that does not split further and provides the final prediction or class label.
- Splitting: The process of dividing a node into multiple sub-nodes based on a certain condition.

    


### Types of Decision Trees
- Decision Tree Classifier.
- Decision Tree Regression.

## Decision Tree Classifier
- A classification decision tree is used when the target variable is categorical (discrete classes).
#### Examples of target values:
- Yes/No
- Spam/Not spam
- Pass/Fail
#### How it works:
- The tree splits the dataset in such a way that each resulting node becomes as pure as possible, meaning most data points in that node belong to the same class.
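
One common way to measure purity is Gini impurity, which is the default criterion of scikit-learn's `DecisionTreeClassifier`. The text above does not name a specific measure, so treat the choice of Gini here as an assumption. A minimal sketch, with made-up labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical labels for six emails.
parent = ["Spam", "Spam", "Spam", "Not spam", "Not spam", "Not spam"]
print(gini(parent))  # 0.5 (evenly mixed, two classes)

# A split that separates the classes perfectly gives pure children.
print(weighted_gini(["Spam"] * 3, ["Not spam"] * 3))  # 0.0

# A split that leaves both children mixed scores worse (about 0.44).
print(weighted_gini(["Spam", "Spam", "Not spam"],
                    ["Spam", "Not spam", "Not spam"]))
```

Among the candidate splits, the tree prefers the one with the lowest weighted impurity, which here is the split that separates the two classes completely.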

## Decision Tree Regression
- A regression decision tree is used when the target variable is continuous (numerical values).
#### Examples of target values:
- House Price
- Salary
- Temperature
#### How it works:
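- Instead of class purity, the goal is to minimize prediction error: each split is chosen to reduce the error (for example, the mean squared error) of the target values within the resulting nodes, and each leaf predicts the mean of the training targets that reach it.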
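
The idea can be sketched as a search over candidate thresholds on a single feature, using the first five rows of the house-price data shown later in this notebook. This is a simplified illustration assuming mean squared error as the measure; it is not scikit-learn's actual implementation.

```python
def mse(values):
    """Mean squared error of a node that predicts the mean of its values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(xs, ys):
    """Try midpoints between sorted distinct feature values.

    Returns (threshold, weighted MSE); threshold is None if no split helps.
    """
    best = (None, mse(ys))
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

sizes = [450, 500, 550, 600, 650]        # house_size_sqft
prices = [38.5, 42.0, 45.2, 50.1, 53.0]  # price_lakhs

threshold, score = best_split(sizes, prices)
print(threshold)  # 575.0
print(score)      # weighted MSE after the split, about 5.33
```

A full regression tree repeats this search over every feature, then recurses into each child node until a stopping rule such as `max_depth` is reached.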

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [5]:
data = pd.read_csv(r"C:\Users\HP\Downloads\house_price_decision_tree.csv")
data

Unnamed: 0,house_size_sqft,bedrooms,age_years,price_lakhs
0,450,1,25,38.5
1,500,1,22,42.0
2,550,1,20,45.2
3,600,2,18,50.1
4,650,2,17,53.0
...,...,...,...,...
86,4750,5,1,348.0
87,4800,5,1,351.6
88,4850,5,1,355.2
89,4900,5,1,358.8


In [6]:
x = data[['house_size_sqft', 'bedrooms', 'age_years']]
y = data['price_lakhs']

In [7]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [8]:
model = DecisionTreeRegressor(criterion='squared_error', max_depth=5, random_state=42)

In [9]:
model.fit(x_train,y_train)

In [10]:
y_pred_train = model.predict(x_train)
y_pred_test = model.predict(x_test)

train_mse = mean_squared_error(y_train, y_pred_train)
test_mse = mean_squared_error(y_test, y_pred_test)

In [14]:
train_mse = mean_squared_error(y_train, y_pred_train)
test_mse = mean_squared_error(y_test, y_pred_test)

print("Train MSE:", train_mse)
print("Test MSE:", test_mse)

Train MSE: 6.263796296296292
Test MSE: 36.81209064327487


In [15]:
train_r2 = r2_score(y_train, y_pred_train)
test_r2 = r2_score(y_test, y_pred_test)

print("Train R2:", train_r2)
print("Test R2:", test_r2)

Train R2: 0.9993019896137146
Test R2: 0.9956923642606307


In [23]:
new_house = pd.DataFrame({
    'house_size_sqft': [1500],
    'bedrooms': [3],
    'age_years': [5]
})
predicted_price = model.predict(new_house)

print("Predicted House Price:", predicted_price[0])

Predicted House Price: 124.33333333333333
