# Introduction
**This will be your workspace for the [Machine Learning course](https://www.kaggle.com/learn/machine-learning).**

You will need to adapt the concepts from the course to the data in this notebook, the Iowa housing data. Each page of the Machine Learning course includes instructions for the code to write at that step.

# Write Your Code Below

```python
"""By Aryan Deorah
Predicts the sale prices of houses in Iowa"""

# Import pandas for loading and manipulating the data
import pandas as pd

# Load the training and test sets as `data` and `test`
data = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

# The target is SalePrice; the predictors are every other column except Id
data_target = data.SalePrice
data_predictors = data.drop(['Id', 'SalePrice'], axis=1)
test_predictors = test.drop(['Id'], axis=1)

# Keep categorical (object-dtype) columns with fewer than 10 unique values,
# so that one-hot encoding does not create too many new columns
low_cardinality_cols = [cname for cname in data_predictors.columns if
                        data_predictors[cname].nunique() < 10 and
                        data_predictors[cname].dtype == "object"]

# Keep the numeric columns
numeric_cols = [cname for cname in data_predictors.columns if
                data_predictors[cname].dtype in ['int64', 'float64']]

# Restrict both the training and the test predictors to the selected columns
my_cols = low_cardinality_cols + numeric_cols
train_predictors = data_predictors[my_cols]
test_predictors = test_predictors[my_cols]

# One-hot encode the categorical columns, then align the test columns to the
# training columns: dummy columns missing from the test set are added and
# filled with 0, and dummy columns that appear only in the test set are dropped
one_hot_encoded_training_predictors = pd.get_dummies(train_predictors)
one_hot_encoded_test_predictors = pd.get_dummies(test_predictors)
final_train, final_test = one_hot_encoded_training_predictors.align(one_hot_encoded_test_predictors,
                                                                    join='left',
                                                                    axis=1,
                                                                    fill_value=0)

# Import the model, the pipeline helper and the imputer
# (sklearn.preprocessing.Imputer was removed in scikit-learn 0.22;
# sklearn.impute.SimpleImputer replaces it)
from xgboost import XGBRegressor
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer

# Build a pipeline that imputes missing values (column means by default) and
# then fits an XGBRegressor; train it on the full training set and predict
# sale prices for the test set
my_pipeline = make_pipeline(SimpleImputer(), XGBRegressor())
my_pipeline.fit(final_train, data_target)
predictions = my_pipeline.predict(final_test)

# Estimate accuracy with 5-fold cross-validation on the training data;
# scikit-learn reports the negated MAE, so scores closer to 0 are better
from sklearn.model_selection import cross_val_score
scores = cross_val_score(my_pipeline, final_train, data_target,
                         scoring='neg_mean_absolute_error', cv=5)
print(scores)

# Write the submission file: one row per test house with its Id and predicted SalePrice
my_submission = pd.DataFrame({'Id': test.Id, 'SalePrice': predictions})
my_submission.to_csv('submission.csv', index=False)
```



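The printed cross-validation scores (negated mean absolute error in dollars, one value per fold):

```
[-16083.90097924 -17366.88709332 -16598.72043557 -15189.39886558
 -17291.8169146 ]
```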
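The one-hot encoding and alignment step is easiest to see on a tiny example. The sketch below uses hypothetical toy frames (`toy_train`, `toy_test`) whose column names are modeled on the Iowa data, with toy values chosen so that one category (`Grvl`) appears only in the training set. Aligning with `join='left'` and `fill_value=0` gives the test set exactly the training set's dummy columns.

```python
import pandas as pd

# Hypothetical toy data standing in for the training and test predictors
toy_train = pd.DataFrame({
    'Street': ['Pave', 'Grvl', 'Pave'],   # categorical, 2 unique values
    'LotArea': [8450, 9600, 11250],       # numeric
})
toy_test = pd.DataFrame({
    'Street': ['Pave', 'Pave'],           # 'Grvl' never appears in the test set
    'LotArea': [11622, 14267],
})

# get_dummies keeps numeric columns and replaces 'Street' with one column per category
toy_train_dummies = pd.get_dummies(toy_train)
toy_test_dummies = pd.get_dummies(toy_test)
print(list(toy_train_dummies.columns))  # ['LotArea', 'Street_Grvl', 'Street_Pave']
print(list(toy_test_dummies.columns))   # ['LotArea', 'Street_Pave']

# Align the test columns to the training columns; the missing Street_Grvl
# column is added to the test set and filled with 0
toy_final_train, toy_final_test = toy_train_dummies.align(
    toy_test_dummies, join='left', axis=1, fill_value=0)
print(list(toy_final_test.columns))     # ['LotArea', 'Street_Grvl', 'Street_Pave']
```

Without `fill_value=0`, the added `Street_Grvl` column would be NaN in the test set, and a mean-imputing `SimpleImputer` would fill it with the training-set proportion of `Grvl` rather than with 0.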

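The pipeline's first step, `SimpleImputer`, learns a statistic (the column mean by default) from the data it is fitted on and reuses it on new data. The sketch below shows this on hypothetical one-column toy arrays. To keep it small, it swaps scikit-learn's `LinearRegression` in for `XGBRegressor`. That swap is an illustration choice, not what the notebook uses.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one feature with a missing value in the training set
X_toy = np.array([[1.0], [np.nan], [3.0]])
y_toy = np.array([10.0, 20.0, 30.0])

# On its own, SimpleImputer (default strategy='mean') learns the mean of the
# non-missing training values and reuses it when transforming new data
imputer = SimpleImputer()
imputer.fit(X_toy)
print(imputer.statistics_)                   # mean of 1.0 and 3.0, i.e. 2.0
print(imputer.transform([[np.nan], [5.0]]))  # NaN becomes 2.0; 5.0 is unchanged

# Inside a pipeline, the imputer and the model are fitted together, and
# predict() applies the training-set mean before calling the model
toy_pipeline = make_pipeline(SimpleImputer(), LinearRegression())
toy_pipeline.fit(X_toy, y_toy)
print(toy_pipeline.predict([[np.nan]]))      # NaN is imputed as 2.0, giving about 20.0
```

Because the imputer is part of the pipeline, `cross_val_score` refits it on each training fold, so the held-out fold's values never influence the imputed means.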

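Because `scoring='neg_mean_absolute_error'` returns the negated MAE, the five printed scores can be turned into one error estimate by flipping the sign and averaging. The sketch below hard-codes the values printed above under the hypothetical name `reported_scores`. In the notebook itself, `-scores.mean()` gives the same number.

```python
import numpy as np

# The five fold scores printed above (negated mean absolute error)
reported_scores = np.array([-16083.90097924, -17366.88709332, -16598.72043557,
                            -15189.39886558, -17291.8169146])

# Flip the sign to get the MAE of each fold in dollars, then summarize
mae_per_fold = -reported_scores
print(mae_per_fold.mean())  # about 16506.14
print(mae_per_fold.std())   # spread of the error across folds
```

On these numbers, the model's predictions are off by about $16,500 on average, and the per-fold MAE stays between roughly $15,200 and $17,400.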

**If you have any questions or hit any problems, come to the [Learn Discussion](https://www.kaggle.com/learn-forum) for help.**