# Exercises
## Step 1: Specify Prediction Target
Select the target variable, which corresponds to the sales price. Save this to a new variable called <b>y</b>. <br> 
You'll need to print a list of the columns to find the name of the column you need.
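One way to list the column names is through the DataFrame's `columns` attribute. The snippet below is a minimal sketch on a tiny stand-in frame (`toy` and its three-column subset are assumptions for illustration, not the full dataset); with the real data the same attribute would be read from `housing`.

```python
import pandas as pd

# Tiny stand-in for the housing data (illustrative subset, not the real file)
toy = pd.DataFrame({
    "LotArea": [8450, 9600],
    "YearBuilt": [2003, 1976],
    "SalePrice": [208500, 181500],
})

# .columns holds the column labels; tolist() turns them into a plain Python list
column_names = toy.columns.tolist()
print(column_names)

# Selecting a single column by name returns a pandas Series
target = toy["SalePrice"]
print(type(target).__name__)
```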

In [3]:
import numpy as np
import pandas as pd
import warnings 
warnings.filterwarnings('ignore')

In [4]:
housing = pd.read_csv('datasets\\housing_train.csv')

In [5]:
housing

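| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 12 | 2008 | WD | Normal | 250000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1455 | 1456 | 60 | RL | 62.0 | 7917 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 8 | 2007 | WD | Normal | 175000 |
| 1456 | 1457 | 20 | RL | 85.0 | 13175 | Pave | | Reg | Lvl | AllPub | ... | 0 | | MnPrv | | 0 | 2 | 2010 | WD | Normal | 210000 |
| 1457 | 1458 | 70 | RL | 66.0 | 9042 | Pave | | Reg | Lvl | AllPub | ... | 0 | | GdPrv | Shed | 2500 | 5 | 2010 | WD | Normal | 266500 |
| 1458 | 1459 | 20 | RL | 68.0 | 9717 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 4 | 2010 | WD | Normal | 142125 |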


In [6]:
y = housing['SalePrice']

## Step 2: Create Feature Matrix (X)

In this step, you will create a DataFrame called `X` to hold the features used for prediction.

Since you only need a specific set of columns from the original dataset, begin by creating a list that contains the names of these selected columns. The columns you'll include in the list are:

- 'LotArea'
- 'YearBuilt'
- '1stFlrSF'
- '2ndFlrSF'
- 'FullBath'
- 'BedroomAbvGr'
- 'TotRmsAbvGrd'

Once you have the list of feature names, use it to create a DataFrame `X` that will be used for training the model.

In [7]:
feature_names = [
    'LotArea',
    'YearBuilt',
    '1stFlrSF',
    '2ndFlrSF',
    'FullBath',
    'BedroomAbvGr',
    'TotRmsAbvGrd'
]

X = housing[feature_names]
X.head()

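| | LotArea | YearBuilt | 1stFlrSF | 2ndFlrSF | FullBath | BedroomAbvGr | TotRmsAbvGrd |
|---|---|---|---|---|---|---|---|
| 0 | 8450 | 2003 | 856 | 854 | 2 | 3 | 8 |
| 1 | 9600 | 1976 | 1262 | 0 | 2 | 3 | 6 |
| 2 | 11250 | 2001 | 920 | 866 | 2 | 3 | 6 |
| 3 | 9550 | 1915 | 961 | 756 | 1 | 3 | 7 |
| 4 | 14260 | 2000 | 1145 | 1053 | 2 | 4 | 9 |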


<strong>Check:</strong> After updating the starter code, use <code>check()</code> to verify if your code is correct. 
Make sure to update the code that defines the variable <b>X</b>.

### Review Data
- Before building a model, take a quick look at `X` to verify that it looks sensible.
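A quick review could look roughly like the sketch below. It uses a small stand-in frame (`toy_X`, built from three of the rows displayed earlier; the name is an assumption for illustration) and pandas' `describe()` and `isna()` methods; on the real data the same calls would be made on `X`.

```python
import pandas as pd

# Small stand-in for X (three rows, three of the seven features)
toy_X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
    "FullBath": [2, 2, 2],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column
summary = toy_X.describe()
print(summary)

# Count missing values per column; gaps would need handling before fitting
missing = toy_X.isna().sum()
print(missing)
```

Implausible minimums or maximums (for example a `YearBuilt` of 0) and non-zero missing counts are the kind of thing this check is meant to surface.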

## Step 3: Specify and Fit Model

- Create a `DecisionTreeRegressor` and save it to `iowa_model`. Make sure you have done the relevant import from `sklearn` before running this command.

Then fit the model you just created using the data in X and y that you saved above.

<hr>
<strong>What is a Decision Tree Regressor?</strong>
<p>A decision tree regressor is a machine learning model that uses a tree-like structure to predict a continuous target variable. It works by recursively partitioning the data into smaller and smaller subsets based on the features, ultimately fitting a simple model (such as the mean or median of the target variable) to each subset.</p>
<img src= 'https://miro.medium.com/1*62Am0QdlxCq5Vmt5siR-7Q.png' />
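The partition-then-average idea can be seen on a toy example. The sketch below is an illustration under stated assumptions: the data (`X_toy`, `y_toy`) is made up, and `max_depth=1` is chosen so the tree makes a single split and therefore has exactly two leaves. With scikit-learn's default squared-error criterion, each leaf predicts the mean target of the training samples that fall into it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (made up): one feature with two clearly separated groups
X_toy = np.array([[1], [2], [3], [10], [11], [12]])
y_toy = np.array([10.0, 12.0, 14.0, 100.0, 110.0, 120.0])

# max_depth=1 allows a single split, so the tree has exactly two leaves
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_toy, y_toy)

# Each prediction is the mean target of the leaf the sample falls into
preds = stump.predict(np.array([[2], [11]]))
print(preds)

# The group means, for comparison with the predictions above
print(y_toy[:3].mean(), y_toy[3:].mean())
```

A deeper tree keeps splitting each group in the same way, which is why an unconstrained tree can reproduce its training targets almost exactly.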

## Decision Tree

In [10]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# Define model
iowa_model = DecisionTreeRegressor(random_state=42)
iowa_model.fit(X_train, y_train)
predictions = iowa_model.predict(X_valid)
print(predictions[:10])

[132000. 412500. 119000. 115000. 256300.  80000. 190000. 148500.  81000.
  97500.]


## Linear Regression

In [7]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_valid)
print(predictions[:10])

[129669.72820543 311654.58707548 104228.55188975 162899.24250664
 247425.06635474 104429.94077235 195690.22234682 169144.5466029
 102837.18360575 158759.81847982]


## Step 4: Make Predictions
Make predictions with the model's predict command using X as the data. Save the results to a variable called predictions.

In [8]:
# Make predictions
predictions = iowa_model.predict(X)

# Show the predictions; note that [:-5] drops the last five values
# (use predictions[:5] to see only the first five)
print(predictions[:-5])

[208500. 181500. 223500. ... 146500.  84500. 185000.]



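## Step 5: Testing the Accuracy of Predictions
Since sale prices are continuous values, this is a regression problem, and the R² score is one suitable way to measure how well the predictions match the actual prices. Note that the `predictions` scored here come from `iowa_model.predict(X)`, which includes the rows the model was trained on, so the resulting score is optimistic.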



In [10]:
from sklearn.metrics import r2_score

r2 = r2_score(y, predictions)
print("R² Score:", r2)

R² Score: 0.9229415499187086
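R² is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below is an illustration on synthetic data, not the housing set: the names `X_demo`, `y_demo`, and `r2_manual` are assumptions made for this example. It computes R² by hand, checks it against `r2_score`, and compares the score on training rows with the score on held-out rows.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): a noisy linear relationship, not the housing set
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 2.0, size=200)

X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, random_state=0)
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)


def r2_manual(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Score on rows the tree has seen versus rows it has not
r2_train = r2_score(y_tr, tree.predict(X_tr))
r2_valid = r2_score(y_va, tree.predict(X_va))
print("train R2:", r2_train)
print("valid R2:", r2_valid)
print("manual valid R2:", r2_manual(y_va, tree.predict(X_va)))
```

An unconstrained tree fits its training rows almost perfectly, so the held-out score is usually the more honest estimate; in this notebook that would mean scoring `iowa_model.predict(X_valid)` against `y_valid`.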
