# Using Pipelines to Create a More Efficient Model

"Pipelines are a simple way to keep your data processing and modeling code organized. Specifically, a pipeline bundles preprocessing and modeling steps so you can use the whole bundle as if it were a single step." (DanB, Pipelines)

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
import warnings

# Silence deprecation and future warnings so they do not clutter the output
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

# DATA PREPROCESSING

# Loading in Iowa housing data
data = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
data.dropna(axis=0, subset=['SalePrice'], inplace=True)  # drop rows with a missing SalePrice

# Set up the target (y) and the numeric-only features (X)
y = data.SalePrice
X = data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])

print('Setup Complete...')
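
To make the preprocessing above concrete, here is a minimal sketch on a made-up three-row table. The column values and the names `toy`, `toy_X`, and `toy_y` are assumptions for illustration only; the text column is given an explicit `object` dtype so the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the housing table (values are made up).
toy = pd.DataFrame({
    'LotArea': [8450, 9600, 11250],
    'Street': pd.Series(['Pave', 'Pave', 'Grvl'], dtype=object),  # text column
    'SalePrice': [208500.0, np.nan, 223500.0],
})

# Drop rows whose target is missing, then separate target and features.
toy = toy.dropna(axis=0, subset=['SalePrice'])
toy_y = toy['SalePrice']
toy_X = toy.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])

print(list(toy_X.columns))  # only the numeric feature column remains
print(len(toy_X))           # the row with a missing SalePrice is gone
```

Dropping the text columns keeps the example simple, but it also discards any information they carry; encoding them would be a separate preprocessing step.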

Now we will build our pipeline, using an imputer to fill in missing values and an XGBoost regressor (`XGBRegressor`, a gradient-boosted tree model rather than a random forest) to make our predictions.

In [None]:
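from xgboost import XGBRegressor
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer  # replaces sklearn.preprocessing.Imputer, removed in scikit-learn 0.22

# Impute missing values (column mean by default), then fit the XGBoost regressor
my_pipeline = make_pipeline(SimpleImputer(), XGBRegressor())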
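
To show what the two steps do, the sketch below runs mean imputation on its own and then checks that a pipeline gives the same predictions as doing both steps by hand. The toy arrays are made up, and `LinearRegression` stands in for the XGBoost model purely to keep the example small; neither is part of this notebook's housing model.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data (made up for illustration); NaN marks a missing value.
X_toy = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, np.nan],
                  [4.0, 40.0]])
y_toy = np.array([2.0, 4.0, 6.0, 8.0])

# Step 1 on its own: fill each gap with the mean of its column.
imputer = SimpleImputer(strategy='mean')
X_filled = imputer.fit_transform(X_toy)
print(X_filled)

# Manual version: fit the model on the imputed data, then predict.
manual_pred = LinearRegression().fit(X_filled, y_toy).predict(X_filled)

# Pipeline version: the same two steps behind a single fit/predict interface.
toy_pipeline = make_pipeline(SimpleImputer(strategy='mean'), LinearRegression())
pipeline_pred = toy_pipeline.fit(X_toy, y_toy).predict(X_toy)

print(np.allclose(manual_pred, pipeline_pred))
```

The two sets of predictions should agree, since the pipeline simply chains the same operations. The practical benefit is that there is one object to fit, cross-validate, and reuse on new data.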

Now we will estimate the model's error with 5-fold cross-validation and then fit the pipeline on all of the training data.

In [None]:
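# scikit-learn reports MAE as a negative score (neg_mean_absolute_error), so we flip the sign below
scores = cross_val_score(my_pipeline, X, y, scoring='neg_mean_absolute_error', cv=5)

# Fit on the full training set; fit() returns the fitted pipeline, not predictions
my_pipeline.fit(X, y)

print('Mean Absolute Error: %.2f' % (-1 * scores.mean()))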
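
The sign flip in the print statement follows from scikit-learn's convention that higher scores are better, so error metrics are returned as negatives. A minimal sketch of that behaviour, assuming made-up data and a `DummyRegressor` baseline (both are illustrative choices, not part of this notebook):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_score

# Toy data (made up); DummyRegressor always predicts the training-fold mean.
X_toy = np.arange(20, dtype=float).reshape(-1, 1)
y_toy = 3.0 * np.arange(20, dtype=float)

toy_scores = cross_val_score(DummyRegressor(strategy='mean'), X_toy, y_toy,
                             scoring='neg_mean_absolute_error', cv=5)

# One score per fold, each reported as a negative error.
print(toy_scores)

# Flip the sign to get an ordinary, non-negative mean absolute error.
toy_mae = -toy_scores.mean()
print('Mean Absolute Error: %.2f' % toy_mae)
```

Passing a whole pipeline to `cross_val_score` has a further benefit: the imputer is refitted on each training fold, so statistics from the held-out fold do not leak into preprocessing.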

Now we will make and store predictions for our test data set:

In [None]:
test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')
test_X = test.select_dtypes(exclude=['object'])  # same numeric-only selection as for training
predicted_prices = my_pipeline.predict(test_X)
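
Prediction relies on the test features having the same columns, in the same order, as the features the pipeline was fitted on. Selecting numeric columns separately from each file assumes that both files yield the same set. A more defensive sketch, using made-up tables and the hypothetical names `train_features`, `test_raw`, and `aligned_test`, selects the training columns by name:

```python
import pandas as pd

# Hypothetical train and test feature tables (values are made up).
train_features = pd.DataFrame({'Id': [1, 2], 'LotArea': [8450, 9600]})
test_raw = pd.DataFrame({'LotArea': [11622, 14267],
                         'Id': [1461, 1462],
                         'Extra': [0.5, 0.7]})

# Select the training columns by name so the set and order match exactly.
# A column missing from test_raw raises a KeyError instead of passing silently.
aligned_test = test_raw[train_features.columns]

print(list(aligned_test.columns))
```

Whether this extra step is needed depends on the data; it mainly guards against a column being read with a different dtype in one of the two files.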

# Making our submission

In [None]:
my_submission = pd.DataFrame({'Id': test.Id, 'SalePrice': predicted_prices})
my_submission.to_csv('submission.csv', index=False)
print('Submission file saved!')
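
Before uploading, it can help to read the file back and confirm its shape. The sketch below does this with made-up predictions written to a temporary directory; the values and the names `toy_submission` and `check` are assumptions for illustration.

```python
import os
import tempfile
import pandas as pd

# Hypothetical predictions (made up) standing in for the real model output.
toy_submission = pd.DataFrame({'Id': [1461, 1462],
                               'SalePrice': [120000.0, 155000.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'submission.csv')
    toy_submission.to_csv(path, index=False)
    check = pd.read_csv(path)  # read the file back as a sanity check

print(list(check.columns))              # expect only Id and SalePrice
print(check['SalePrice'].isna().any())  # expect no missing predictions
```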