**Goal**

It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. 

**Metric**

Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)

**Submission File Format**

The file should contain a header and have the following format:

In [1]:
# import the basic libraries 
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import xgboost as xgb

In [2]:
# load data
data_train = pd.read_csv('/content/drive/MyDrive/Kaggle challenges/House prices/train (1).csv')
data_test = pd.read_csv('/content/drive/MyDrive/Kaggle challenges/House prices/test (1).csv')
sample_sub = pd.read_csv('/content/drive/MyDrive/Kaggle challenges/House prices/sample_submission.csv')

In [3]:
# join the sets
data = pd.concat([data_train,data_test],axis=0).reset_index(drop=True)

In [4]:
# Convert categorical variables to numeric.
data = pd.get_dummies(data, drop_first = True)

In [6]:
# fill null or missing values
data = data.fillna(method = 'bfill')

In [8]:
# cut the data set to fit
data = data.head(1459)

In [9]:
# select dependent/independent variables
X = data.drop('SalePrice',axis=1)
y = data['SalePrice']

In [10]:
# create model 
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state =1) # 75 - 25 split   (test_size=0.25)
model = XGBRegressor()
model.fit(X_train, y_train)



XGBRegressor()

In [11]:
# predictions 
predictions = model.predict(X_test)

In [12]:
# evaluation y_train 
from sklearn.metrics import mean_squared_error, r2_score
import math

pred_train = model.predict(X_train)

print("R^2 train: ", r2_score(y_train, pred_train))  ## model.score
print("RMSE train: ", math.sqrt(mean_squared_error(y_train, pred_train)))  

R^2 train:  0.9698667111790383
RMSE train:  13938.192189901656


In [13]:
# evaluation y_test 

print("R^2 test: ", r2_score(y_test, predictions))  ## model.score
print("RMSE test: ", math.sqrt(mean_squared_error(y_test, predictions)))  

R^2 test:  0.8999092171747091
RMSE test:  24264.678633133237


In [14]:
p = model.predict(X)
sample_sub['SalePrice'] = p
sample_sub.head()

Unnamed: 0,Id,SalePrice
0,1461,204487.484375
1,1462,170691.609375
2,1463,210564.234375
3,1464,176462.0625
4,1465,283104.375
