# Training model

First we'll train an XGBoost model to predict housing prices using all the provided features.

In [None]:
library(tidyverse)
library(caret)
library(xgboost)

training_data <- read_csv("../input/train.csv")

set.seed(1234)

In [None]:
# row indices for splitting the data into training & testing sets
trainIndex <- createDataPartition(training_data$SalePrice, p = .8, 
                                  list = FALSE, 
                                  times = 1)

# convert training data to matrix
training_data_matrix <- training_data %>%
    select(-SalePrice) %>%
    mutate_if(is.character, as.factor) %>% # convert strings to factors
    mutate_if(is.factor, as.numeric) %>% # label encode categorical factors
    select_if(is.numeric) %>%
    as.matrix()

# train XGBoost model with training data
xgboost_model <- xgboost(data = training_data_matrix[trainIndex,],
        label = training_data$SalePrice[trainIndex],
        nrounds = 50, eval_metric="mae") # using same metric as comp.

Our trained model is performing pretty well on our training set, but let's check it against the testing data we set aside to make sure we're not overfitting. (Much worse performance on the testing data than on the training data would suggest overfitting.)

In [None]:
# generate predictions for our held-out testing data
pred <- predict(xgboost_model, training_data_matrix[-trainIndex, ])

# get & print the mean absolute error (MAE) on the held-out data
err <- mean(abs(pred - training_data$SalePrice[-trainIndex]))
print(paste("test-mae=", err))

If the error on the held-out data is close to the training error, we're probably not overfitting much. Now let's make our final predictions.
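
To make the overfitting check concrete, one option is to compute the same metric on the training rows and on the held-out rows and compare the two numbers. Below is a minimal sketch using a hypothetical helper `mae()` (not part of the original notebook) and made-up toy numbers. It also shows why the absolute value matters: without it, positive and negative errors cancel out and the result can look far better than it is.

```r
# Mean absolute error: the average size of the prediction errors.
# (Hypothetical helper, introduced here for illustration only.)
mae <- function(predicted, actual) {
  mean(abs(predicted - actual))
}

# Toy numbers, made up for illustration.
actual    <- c(100, 200, 300, 400)
predicted <- c(150, 150, 350, 350)

signed_mean_error <- mean(predicted - actual)  # +50 and -50 errors cancel out
mean_abs_error    <- mae(predicted, actual)    # every prediction is off by 50

print(paste("signed mean error =", signed_mean_error))
print(paste("MAE =", mean_abs_error))

# With the objects from this notebook, the comparison might look like:
# mae(predict(xgboost_model, training_data_matrix[trainIndex, ]),
#     training_data$SalePrice[trainIndex])
# mae(predict(xgboost_model, training_data_matrix[-trainIndex, ]),
#     training_data$SalePrice[-trainIndex])
```

A held-out MAE that is much larger than the training MAE would be the warning sign to look for.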

# Saving predictions

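Here we're reading in the competition test data, predicting a sale price for each row, and saving the predictions to a CSV file.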

In [None]:
# read in test data (won't have labels)
test_data <- read_csv("../input/test.csv")

# convert testing data to matrix (using same techniques
# we used for our training data)
test_data_matrix <- test_data %>%
    # convert strings to factors
    mutate_if(is.character, as.factor) %>% 
    # label encode categorical factors
    mutate_if(is.factor, as.numeric) %>% 
    select_if(is.numeric) %>%
    as.matrix()

# predict prices
submission_predictions <- predict(xgboost_model, 
                                  test_data_matrix)

# format csv as specified in competition description
prediction_file <- tibble(Id = test_data$Id, 
                         SalePrice = submission_predictions)

# save file
write_csv(prediction_file, "submission.csv")
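
One caveat about the label encoding used above: `as.factor()` followed by `as.numeric()` assigns codes according to the sorted labels present in each file. Encoding the training and test files separately can therefore give the same category different numbers when a label appears in only one of them. Whether that happens in this data set is not checked here. The sketch below uses made-up toy columns (not the competition data) to show the problem and one possible fix: building the factor from a shared set of levels.

```r
# Toy data (made up): the same categorical column in two separate files.
train_col <- c("Brick", "Wood", "Stone", "Wood")
test_col  <- c("Wood", "Stone", "Wood")  # "Brick" never appears here

# Encoding each file on its own gives the same label different codes.
separate_train <- as.numeric(as.factor(train_col))  # Brick=1, Stone=2, Wood=3
separate_test  <- as.numeric(as.factor(test_col))   # Stone=1, Wood=2

# Using one shared set of levels keeps the codes aligned across files.
shared_levels <- sort(unique(c(train_col, test_col)))
shared_train  <- as.numeric(factor(train_col, levels = shared_levels))
shared_test   <- as.numeric(factor(test_col, levels = shared_levels))

print(separate_test)  # "Wood" is coded 2 here but 3 in the training encoding
print(shared_test)    # "Wood" is coded 3, matching the training encoding
```

In the notebook, the same idea could be applied by binding the training and test rows together before encoding, then splitting them apart again.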

# Submitting our predictions

Now that we've written the code to save our predictions, there are just a few things left to do:

1.  **Commit** our notebook. This will run all our code top to bottom and create the output file we need to submit. 
2.  **Submit** the output file. Once our notebook is done committing, we can go to the committed version by clicking the "Open Version" button in the commit window. Then scroll down to the output file and click the "Submit to Competition" button. 
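
Before submitting, it may be worth a quick check that the output file has the expected shape. The sketch below is one way to do that, assuming the file has exactly the two columns written earlier, `Id` and `SalePrice`. The helper `check_submission()` is hypothetical, and the demo rows are made up so the example runs on its own.

```r
# Hypothetical helper (not part of the original notebook): basic checks on a
# submission file with the two columns used above, Id and SalePrice.
check_submission <- function(path) {
  submission <- read.csv(path)
  stopifnot(
    identical(names(submission), c("Id", "SalePrice")),  # expected columns, in order
    !anyNA(submission$SalePrice),                        # no missing predictions
    !any(duplicated(submission$Id))                      # one row per Id
  )
  nrow(submission)  # number of predictions in the file
}

# Self-contained demo with made-up rows written to a temporary file.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Id = c(1, 2), SalePrice = c(125000.5, 158000)),
          demo_path, row.names = FALSE)
n_rows <- check_submission(demo_path)
print(n_rows)
```

In the notebook this could be called as `check_submission("submission.csv")` once the file has been written.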

And that's all you need to do to make your first Kaggle competition submission.