<a href="https://colab.research.google.com/github/bilalsarimeseli/Linear_Regression_Cost_Function/blob/main/Regression_Trees.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# **Regression Trees**


In this notebook, I demonstrate how to build regression trees with the **scikit-learn** library. I also explain the most important parameters, train a regression tree, and evaluate its accuracy.

## Objectives


* Training a **Regression Tree**
* Evaluating a **Regression Tree's Performance**


----


## Setup


The following cells install and import the libraries we need.


In [1]:
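# Install libraries using pip (uncomment if they are not already installed)
#!pip install pandas==1.3.4
# The criterion names used below ("squared_error", "absolute_error") require scikit-learn 1.0 or later
#!pip install "scikit-learn>=1.0"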

In [2]:
# Pandas will allow us to create a dataframe of the data so it can be used and manipulated
import pandas as pd
# Regression Tree Algorithm
from sklearn.tree import DecisionTreeRegressor
# Split our data into training and testing sets
from sklearn.model_selection import train_test_split

## About Our Dataset


Picture yourself as a data scientist employed by a real estate firm looking to make investments in the _Boston real estate market_. You've gathered data regarding different neighborhoods in Boston and have been assigned the responsibility of developing a predictive model for estimating the _median house prices_ within those neighborhoods. This model will be instrumental in formulating offers for potential properties.

The dataset contains information related to neighborhoods or towns, rather than individual houses. The features include:

**CRIM**: Per capita crime rate by town

**ZN**: Proportion of residential land zoned for lots over 25,000 sq.ft.

**INDUS**: Proportion of non-retail business acres per town

**CHAS**: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

**NOX**: Nitric oxides concentration (parts per 10 million)

**RM**: Average number of rooms per dwelling

**AGE**: Proportion of owner-occupied units built prior to 1940

**DIS**: Weighted distances to five Boston employment centers

**RAD**: Index of accessibility to radial highways

**TAX**: Full-value property-tax rate per $10,000

**PTRATIO**: Pupil-teacher ratio by town

**LSTAT**: Percent lower status of the population

**MEDV**: Median value of owner-occupied homes in $1000s


## Read the Data


Let's read the data into a DataFrame:


In [3]:
data = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/data/real_estate_data.csv")

In [4]:
data.head()

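| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | NaN | 36.2 |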


Now let's learn about the size of our data:


In [5]:
data.shape

(506, 13)

The majority of the data is valid; however, there are some rows that contain missing values, and we will address these during the **pre-processing phase**.

In [6]:
data.isna().sum()

CRIM       20
ZN         20
INDUS      20
CHAS       20
NOX         0
RM          0
AGE        20
DIS         0
RAD         0
TAX         0
PTRATIO     0
LSTAT      20
MEDV        0
dtype: int64

## Data Pre-Processing


First, we drop the rows with missing values, since the dataset is large enough to spare them.


In [7]:
data.dropna(inplace=True)
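Before dropping rows, it can help to check how many would be lost. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration and are not the housing data); the same two calls should work on `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `data`; the values are made up.
toy = pd.DataFrame({
    "CRIM": [0.1, np.nan, 0.3, 0.4],
    "RM": [6.5, 6.4, 7.1, 6.9],
    "LSTAT": [4.9, 9.1, np.nan, 2.9],
    "MEDV": [24.0, 21.6, 34.7, 33.4],
})

# Rows with at least one missing value are the ones dropna() removes.
rows_with_missing = int(toy.isna().any(axis=1).sum())

# Without inplace=True, dropna() returns a new frame and leaves `toy` untouched.
cleaned = toy.dropna()

print(f"{rows_with_missing} of {len(toy)} rows contain a missing value")
print(f"{len(cleaned)} rows remain after dropna()")
```

Note that the per-column counts from `isna().sum()` cannot simply be added up, because one row may have missing values in several columns.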

Now we can confirm that our dataset has no missing values:


In [8]:
data.isna().sum()

CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
LSTAT      0
MEDV       0
dtype: int64

Let's split the dataset into our features and the target we are predicting:


In [9]:
X = data.drop(columns=["MEDV"])
Y = data["MEDV"]

In [10]:
X.head()

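| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 |
| 5 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 5.21 |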


In [11]:
Y.head()

0    24.0
1    21.6
2    34.7
3    33.4
5    28.7
Name: MEDV, dtype: float64

Finally, I split the data into training and testing sets using `train_test_split` from `sklearn.model_selection`, holding out 20% of the rows for testing.


In [12]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.2, random_state=1)
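To make the roles of `test_size` and `random_state` concrete, here is a conceptual sketch of a seeded shuffle-and-split in plain Python. The helper `shuffle_split` is hypothetical and is not scikit-learn's implementation, so it will not select the same rows as `train_test_split`; it only illustrates that a fixed seed yields the same partition on every run.

```python
import math
import random

def shuffle_split(rows, test_size=0.2, seed=1):
    """Shuffle row positions with a seeded generator, then cut off a test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(rows) * test_size)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

# Ten placeholder rows; in the notebook these would be the rows of X and Y.
rows = list(range(10))
train_rows, test_rows = shuffle_split(rows, test_size=0.2, seed=1)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

Calling the helper again with the same `seed` returns the same split, which is what makes results reproducible.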

## Create Regression Tree


Regression Trees are implemented using `DecisionTreeRegressor` from `sklearn.tree`

The important parameters of `DecisionTreeRegressor` are:

**`criterion`**: {'friedman_mse', 'squared_error', 'poisson', 'absolute_error'} - The function used to measure the quality of a split

**`max_depth`** - The maximum depth of the tree

**`min_samples_split`** - The minimum number of samples required to split a node

**`min_samples_leaf`** - The minimum number of samples that a leaf can contain

**`max_features`**: {"auto", "sqrt", "log2"} - The number of features to consider when looking for the best split; lowering it can speed up training (`"auto"` is deprecated in newer scikit-learn releases)
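As a rough illustration of how these parameters constrain a tree, the sketch below fits one unconstrained and one constrained tree on synthetic data. The data, the names `X_demo` and `y_demo`, and the chosen parameter values are assumptions made for this example, not part of the housing analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration, not the housing dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 3 * X_demo[:, 0] + rng.normal(0, 1, size=200)

# With default parameters the tree keeps splitting until its leaves are pure.
unconstrained = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# max_depth and min_samples_leaf limit how far the tree can grow.
constrained = DecisionTreeRegressor(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X_demo, y_demo)

for name, tree in [("unconstrained", unconstrained), ("constrained", constrained)]:
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```

A smaller tree is usually easier to interpret and less prone to overfitting, at the cost of a coarser fit to the training data.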


First, let's create a `DecisionTreeRegressor` object, setting the `criterion` parameter to **`friedman_mse`**, which uses mean squared error with Friedman's improvement score to rate potential splits.


In [17]:
regression_tree = DecisionTreeRegressor(criterion = "friedman_mse")
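To give an intuition for what a squared-error criterion measures, here is a minimal sketch that searches one feature for the threshold minimizing the total squared error of the two resulting groups. The helpers `sse` and `best_split` and the sample values are hypothetical; this is not scikit-learn's implementation and it omits Friedman's improvement score.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) for the best single split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Made-up values: average rooms per dwelling vs. median value in $1000s.
rooms = [5.0, 5.5, 6.0, 7.0, 7.5, 8.0]
prices = [15.0, 16.0, 17.0, 33.0, 35.0, 37.0]
threshold, error = best_split(rooms, prices)
print("best threshold:", threshold, "total squared error:", error)
```

A tree applies this kind of search to every feature at every node and then recurses into the two groups until a stopping rule such as `max_depth` is reached.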

## Training


Now let's train our model by calling the **`fit`** method of the `DecisionTreeRegressor` object with our training data.


In [18]:
regression_tree.fit(X_train, Y_train)

## Evaluation


To evaluate our model, we use the `score` method of the `DecisionTreeRegressor` object with our testing data. It returns the $R^2$ value, also known as the coefficient of determination.


In [19]:
regression_tree.score(X_test, Y_test)

0.8482082956031893
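The $R^2$ value compares the model's squared error with the squared error of always predicting the mean of the actual values. A minimal sketch of that calculation follows; the function name `r_squared` and the sample numbers are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical median values (in $1000s) and predictions.
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]
print(round(r_squared(actual, predicted), 4))
```

A value of 1 means perfect predictions, 0 means the model does no better than predicting the mean, and negative values mean it does worse.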

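We can also compute the mean absolute error on the testing set, that is, the average absolute difference between the predicted and actual median home values. Because `MEDV` is expressed in thousands of dollars, we multiply by 1000 to report the error in dollars.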


In [20]:
prediction = regression_tree.predict(X_test)

print("$",(prediction - Y_test).abs().mean()*1000)

$ 2841.7721518987337
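For reference, the quantity printed above is the mean absolute error (MAE), scaled from thousands of dollars to dollars. A sketch of the formula, where $n$, $y_i$, and $\hat{y}_i$ are notation introduced here for the number of test rows, the actual `MEDV` values, and the predictions:

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
$$

For example, absolute errors of 2, 1, 1, and 2 (in thousands) give $\text{MAE} = (2 + 1 + 1 + 2)/4 = 1.5$, which the `*1000` factor turns into 1,500 dollars.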


### Let's try different criterion types


We can train a regression tree using the `squared_error` criterion, then report its $R^2$ value and mean absolute error.


In [22]:
regression_tree = DecisionTreeRegressor(criterion = "squared_error")

regression_tree.fit(X_train, Y_train)

print(regression_tree.score(X_test, Y_test))

prediction = regression_tree.predict(X_test)

print("$",(prediction - Y_test).abs().mean()*1000)

0.8334581957665081
$ 2825.3164556962024


In [25]:
# Now let's try the absolute_error criterion and report its R^2 value and mean absolute error
regression_tree = DecisionTreeRegressor(criterion = "absolute_error")

regression_tree.fit(X_train, Y_train)

print(regression_tree.score(X_test, Y_test))

prediction = regression_tree.predict(X_test)

print("$",(prediction - Y_test).abs().mean()*1000)

0.8581428757797558
$ 2699.9999999999986


In [26]:
# Lastly, let's try the poisson criterion and report its R^2 value and mean absolute error
regression_tree = DecisionTreeRegressor(criterion = "poisson")

regression_tree.fit(X_train, Y_train)

print(regression_tree.score(X_test, Y_test))

prediction = regression_tree.predict(X_test)

print("$",(prediction - Y_test).abs().mean()*1000)

0.821557692099833
$ 3093.67088607595


### CONCLUSION

As we can see, all four criteria gave similar results. On this train/test split, `absolute_error` performed best, with an $R^2$ value of about 0.858 and the lowest mean absolute error (about 2,700 dollars).

The higher **$R^2$** value indicates that the tree trained with **`absolute_error`** explains slightly more of the variance in the test-set median values than the trees trained with the other criteria. The differences are small, though, and because the trees were fit without a fixed `random_state` and evaluated on a single split, the ranking may change from run to run.







Copyright © 2020 IBM Corporation. All rights reserved.
