
# Machine Learning Model: Decision Tree Regressor
A Decision Tree regression model is a supervised machine learning algorithm used for predicting continuous outcomes, making it suitable for regression tasks. It is a tree-like structure where each node represents a decision or a test on an attribute, each branch represents the outcome of the test, and each leaf node holds the predicted continuous value.

**Key Characteristics of Decision Tree Regression:**

1. **Tree Structure:** The decision tree is structured like a flowchart, where each internal node represents a decision based on a feature, and each leaf node represents the predicted outcome.

2. **Recursive Partitioning:** The tree is built through a process called recursive partitioning. At each step, the algorithm selects the feature and the threshold that best split the data into subsets. For regression, "best" usually means the split that most reduces the spread (for example, the variance or squared error) of the target variable within each subset.

3. **Predictive Power:** Decision trees are powerful for capturing complex relationships in the data. They can handle non-linear patterns and interactions between features.

4. **Interpretability:** Decision trees are inherently interpretable, as the decision logic can be easily visualized. This makes them useful for understanding the factors influencing predictions.

5. **Overfitting Concerns:** Without proper constraints, decision trees can become overly complex and fit the training data too closely, leading to overfitting. Techniques like pruning or setting a minimum number of samples required to split a node can help mitigate overfitting.
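
The interpretability and overfitting points above can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are installed and uses synthetic data (a noisy sine curve) rather than any real dataset. The names `X_demo`, `y_demo`, `full_tree`, and `small_tree` are illustrative only, and the parameter values are arbitrary starting points, not recommendations.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 6, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=200)

# Unconstrained tree: by default it keeps splitting as long as it can.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# Constrained tree: limited depth and a minimum number of samples per leaf.
small_tree = DecisionTreeRegressor(
    max_depth=3, min_samples_leaf=10, random_state=0
).fit(X_demo, y_demo)

print("Leaves (unconstrained):", full_tree.get_n_leaves())
print("Leaves (constrained):  ", small_tree.get_n_leaves())

# The constrained tree is small enough to read as plain-text rules.
print(export_text(small_tree, feature_names=["x"]))
```

A depth limit of 3 caps the tree at eight leaves, so the printed rules stay readable, while the unconstrained tree grows far more leaves in order to follow the noise in the training data.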

**How Decision Tree Regression Works:**

1. **Root Node:** The algorithm starts with the entire dataset and selects the feature and threshold that best split the data into two subsets.

2. **Internal Nodes:** The process is repeated recursively for each subset, creating additional internal nodes until a stopping criterion is met (e.g., a maximum depth is reached, or a node contains a minimum number of samples).

3. **Leaf Nodes:** The final nodes, or leaves, contain the predicted continuous values. The prediction is typically the average (or another summary statistic) of the target variable within that leaf.

4. **Prediction:** To make a prediction for a new data point, the model traverses the tree from the root to a leaf, following the decision rule at each node, and returns the value stored in that leaf.
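
The steps above can be sketched for the simplest possible case: a tree with one split on one feature. This is a hand-rolled illustration using only the Python standard library, not how scikit-learn implements it. The helper names (`sse`, `best_split`, `predict`) and the toy numbers are hypothetical, and the sketch assumes the feature has at least two distinct values.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total SSE."""
    best = None
    unique_xs = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(unique_xs, unique_xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    return best[1:]

# Toy data (hypothetical): land size in square metres vs. price in thousands.
land = [100, 150, 200, 600, 650, 700]
price = [500, 520, 510, 900, 950, 920]

threshold, left_mean, right_mean = best_split(land, price)

def predict(x):
    # A depth-1 tree: one test at the root, two leaves holding means.
    return left_mean if x <= threshold else right_mean

print(threshold, left_mean, right_mean)
print(predict(180), predict(640))
```

A full tree would apply `best_split` again to the left and right subsets, over every feature, until a stopping criterion is met.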

**Use Cases:**

Decision Tree regression is well-suited for tasks where the relationship between input features and the target variable is not linear and involves complex interactions. It has been applied in various domains, including finance (e.g., predicting stock prices), environmental science (e.g., predicting temperature), and marketing (e.g., predicting sales).

While decision trees have their strengths, it's important to be mindful of their limitations, such as sensitivity to small variations in the data and the potential for overfitting. Ensemble methods like Random Forests or Gradient Boosted Trees are often employed to address these challenges and improve predictive performance.







# Data Preprocessing

## Loading Data

In [1]:
import pandas as pd

## Reviewing The Data

In [2]:

# Provide the correct file path or URL for the CSV file
file_path = "melb_data.csv"

# Read the data and store it in a DataFrame titled melbourne_data
melbourne_data = pd.read_csv(file_path)

# Print summary statistics for the numeric columns of melbourne_data
print(melbourne_data.describe())

              Rooms         Price      Distance      Postcode      Bedroom2  \
count  13580.000000  1.358000e+04  13580.000000  13580.000000  13580.000000   
mean       2.937997  1.075684e+06     10.137776   3105.301915      2.914728   
std        0.955748  6.393107e+05      5.868725     90.676964      0.965921   
min        1.000000  8.500000e+04      0.000000   3000.000000      0.000000   
25%        2.000000  6.500000e+05      6.100000   3044.000000      2.000000   
50%        3.000000  9.030000e+05      9.200000   3084.000000      3.000000   
75%        3.000000  1.330000e+06     13.000000   3148.000000      3.000000   
max       10.000000  9.000000e+06     48.100000   3977.000000     20.000000   

           Bathroom           Car       Landsize  BuildingArea    YearBuilt  \
count  13580.000000  13518.000000   13580.000000   7130.000000  8205.000000   
mean       1.534242      1.610075     558.416127    151.967650  1964.684217   
std        0.691712      0.962634    3990.669241   

## Identifying Variables

In [3]:
# List the available columns (variables)
melbourne_data.columns

Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')

In [4]:
# Remove rows with missing values
# dropna(axis=0) drops every row that contains at least one missing value ("na" stands for "not available")
melbourne_data = melbourne_data.dropna(axis=0)
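
Dropping every incomplete row is simple but can be costly: the summaries in this notebook show 13580 rows before this step and 6196 afterwards. The sketch below shows, on a made-up miniature DataFrame, how to count missing values per column and how the `subset` argument of `dropna` limits the check to chosen columns. The frame `toy` and its values are hypothetical, and whether a narrower `subset` suits the Melbourne data is a judgment call this notebook does not make.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; the values are made up for illustration.
toy = pd.DataFrame({
    "Rooms": [2, 3, 4, 3],
    "Price": [500000.0, 650000.0, np.nan, 800000.0],
    "BuildingArea": [80.0, np.nan, 140.0, np.nan],
})

# How many values are missing in each column?
print(toy.isnull().sum())

# dropna(axis=0) removes every row with at least one missing value.
all_complete = toy.dropna(axis=0)

# subset= only considers the listed columns when deciding what to drop.
needed_complete = toy.dropna(axis=0, subset=["Rooms", "Price"])

print(len(toy), len(all_complete), len(needed_complete))
```
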

In [5]:
# Select the prediction target
y = melbourne_data.Price

In [6]:
# Choose the features
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]


In [7]:
X.describe()

          Rooms  Bathroom    Landsize   Lattitude  Longtitude
count    6196.0    6196.0      6196.0      6196.0      6196.0
mean   2.931407   1.57634   471.00694  -37.807904  144.990201
std    0.971079  0.711362  897.449881     0.07585    0.099165
min         1.0       1.0         0.0   -38.16492   144.54237
25%         2.0       1.0       152.0  -37.855438  144.926198
50%         3.0       1.0       373.0   -37.80225    144.9958
75%         4.0       2.0       628.0    -37.7582    145.0527
max         8.0       8.0     37000.0   -37.45709   145.52635


In [8]:
X.head()

   Rooms  Bathroom  Landsize  Lattitude  Longtitude
1      2         1       156   -37.8079    144.9934
2      3         2       134   -37.8093    144.9944
4      4         1       120   -37.8072    144.9941
6      3         2       245   -37.8024    144.9993
7      2         1       256   -37.8060    144.9954


# Decision Tree Regressor Model: Model Building
Building a model involves four steps:
- Define: Choose the type of model and its parameters.
- Fit: Capture patterns from the training data.
- Predict: Use the fitted model to generate predictions.
- Evaluate: Check the accuracy of the model's predictions.

In [12]:
from sklearn.tree import DecisionTreeRegressor

# define model
melbourne_model = DecisionTreeRegressor(random_state=1)
# Fit model
melbourne_model.fit(X, y)

## Predictions

In [13]:
print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))

Making predictions for the following 5 houses:
   Rooms  Bathroom  Landsize  Lattitude  Longtitude
1      2         1       156   -37.8079    144.9934
2      3         2       134   -37.8093    144.9944
4      4         1       120   -37.8072    144.9941
6      3         2       245   -37.8024    144.9993
7      2         1       256   -37.8060    144.9954
The predictions are
[1035000. 1465000. 1600000. 1876000. 1636000.]
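
The five houses above are rows the model was trained on, so these predictions say little about how it would perform on unseen data. A common way to carry out the Evaluate step is to hold out part of the data and compare errors. The sketch below assumes scikit-learn and NumPy are installed and uses synthetic data in place of the Melbourne `X` and `y`. The names `X_demo`, `y_demo`, `train_mae`, and `val_mae` are illustrative, and mean absolute error is one reasonable metric among several.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for X and y (an assumption; not the Melbourne data).
rng = np.random.RandomState(1)
X_demo = rng.uniform(0, 10, size=(300, 2))
y_demo = 3 * X_demo[:, 0] + rng.normal(scale=2.0, size=300)

# Hold out a quarter of the rows for validation.
train_X, val_X, train_y, val_y = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1
)

model = DecisionTreeRegressor(random_state=1)
model.fit(train_X, train_y)

# In-sample error is near zero because the tree memorizes training rows.
train_mae = mean_absolute_error(train_y, model.predict(train_X))
# Error on unseen rows is the more honest estimate.
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print("Training MAE:  ", train_mae)
print("Validation MAE:", val_mae)
```

The same pattern should carry over to the notebook's data by passing `X` and `y` to `train_test_split` in place of the synthetic arrays.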
