# Predict House Value using Watson Studio

In this notebook we will walk through a simple machine learning example in order to explore ways to leverage Watson Studio to simplify building and collaborating on AI and data science applications.

## Learning Goals
1. Understand what Watson Studio is and the value it brings
1. Build three simple models
1. Understand the deployment pipeline with Watson Machine Learning

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Import the data
In the cell below, import the CSV into our notebook from Watson Studio.

In [None]:
## IMPORT DATA FRAME HERE 

## IMPORTANT
Set the value of `my_df` to the variable that Watson Studio assigned to the dataframe (in this example, `df_data_2`).

In [None]:
my_df = df_data_2 # CHANGE df_data_X TO YOUR VARIABLE FROM ABOVE

In [None]:
my_df.head()

## Initial Observations

Let's first take a visual look at our data.

In [None]:
my_df.hist(bins=50, figsize=(30, 25))
plt.show()

# Clean the data

Before we do anything further, we want to make sure the data is usable. In this case, that means filling in missing values in the non-numeric columns and folding rare categories into a single `NA` value.

In [None]:
threshold = 15 # Any category that occurs N times or fewer is replaced with 'NA'.

# We only want to replace values in non-numeric columns
for col in my_df.select_dtypes(include=['object']).columns:
    # Assign back to the column instead of using inplace=True on a column
    # selection, which may not update my_df in recent pandas versions
    my_df[col] = my_df[col].fillna('NA')
    value_counts = my_df[col].value_counts() # Counts for this column
    to_remove = value_counts[value_counts <= threshold].index
    my_df.loc[my_df[col].isin(to_remove), col] = 'NA'

In [None]:
my_df.head()
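
To make the effect of the threshold concrete, here is a minimal sketch of the same idea on a toy column. The data and the threshold of 2 are made up for illustration and are not part of the house dataset.

```python
import pandas as pd

# Toy data standing in for one categorical column; the values are made up.
toy = pd.DataFrame({"ROOFSTYLE": ["Gable"] * 5 + ["Hip"] * 3 + ["Shed", None]})

toy_threshold = 2  # categories seen this many times or fewer are collapsed

col = toy["ROOFSTYLE"].fillna("NA")              # missing value becomes 'NA'
counts = col.value_counts()                      # Gable: 5, Hip: 3, Shed: 1, NA: 1
rare = counts[counts <= toy_threshold].index     # 'Shed' and 'NA'
toy["ROOFSTYLE"] = col.where(~col.isin(rare), "NA")

print(toy["ROOFSTYLE"].value_counts().to_dict())
```

After the replacement only `Gable`, `Hip`, and `NA` remain, with the single `Shed` row folded into `NA`.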

# Splitting the data

Now that the data is cleaned, we are ready to begin building some models! In this example, we will be building a model to predict `SALEPRICE` based on the other features in the data. First, we'll split our data into two sets, training and test.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

In [None]:
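# The variable we want to predict
y = my_df['SALEPRICE']

# Drop the target and the row identifier so neither is used as a feature
my_df = my_df.drop(['SALEPRICE', 'ID'], axis=1)

# Label-encode every column, mapping each distinct value to an integer code
le = LabelEncoder()
X_2 = my_df.apply(le.fit_transform)

# One-hot encode the integer codes to inspect the expanded shape only;
# the models below are trained on the label-encoded features in X_2
enc = OneHotEncoder(handle_unknown='ignore')
onehotlabels = enc.fit_transform(X_2).toarray()
print(onehotlabels.shape)

x = X_2

x_train, x_test, y_train, y_test = train_test_split(x, y)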
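
Called without extra arguments, `train_test_split` holds out 25% of the rows and shuffles them differently on every run, so the metrics further down will vary between runs. A minimal sketch of a reproducible split, assuming synthetic data and an arbitrary seed of 42, could look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and target (not the house data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)
print("Same test rows on both calls:", np.array_equal(split_a[1], split_b[1]))
```

Passing the same `random_state` in the notebook cell would make the reported errors comparable from one run to the next.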

# Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

regressor = LinearRegression(fit_intercept=False)
linear_regression_model = regressor.fit(x_train, y_train)

In [None]:
y_pred = regressor.predict(x_test)

In [None]:
# scikit-learn metrics take the true values first, then the predictions
lin_mse = mean_squared_error(y_test, y_pred)
lin_rmse = np.sqrt(lin_mse)

print('Linear Regression RMSE: $%.4f' % lin_rmse)

lin_mae = mean_absolute_error(y_test, y_pred)
print('Linear Regression MAE: $%.4f' % lin_mae)
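
To see what the two numbers mean, here is a small worked example that computes RMSE and MAE by hand and checks them against scikit-learn. The prices are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up sale prices in dollars, for illustration only.
y_true = np.array([200000.0, 150000.0, 320000.0, 275000.0])
y_hat = np.array([210000.0, 140000.0, 300000.0, 280000.0])

errors = y_hat - y_true  # 10000, -10000, -20000, 5000

# RMSE squares each error before averaging, so large misses weigh more.
rmse_manual = np.sqrt(np.mean(errors ** 2))
# MAE averages the absolute errors, so every dollar of error counts equally.
mae_manual = np.mean(np.abs(errors))

rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
mae_sklearn = mean_absolute_error(y_true, y_hat)

print('RMSE: $%.2f (manual) vs $%.2f (sklearn)' % (rmse_manual, rmse_sklearn))
print('MAE:  $%.2f (manual) vs $%.2f (sklearn)' % (mae_manual, mae_sklearn))
```

Here the RMSE is $12,500 and the MAE is $11,250. The RMSE is larger because the single $20,000 miss is penalized more heavily once it is squared.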

# Random Forest

In [None]:
from sklearn.ensemble import RandomForestRegressor
forest_reg = RandomForestRegressor(random_state=42, n_estimators=100)
forest_reg.fit(x_train, y_train)

In [None]:
print('Random Forest R squared: %.4f' % forest_reg.score(x_test, y_test))

In [None]:
y_pred = forest_reg.predict(x_test)
forest_mse = mean_squared_error(y_test, y_pred)
forest_rmse = np.sqrt(forest_mse)

print('Random Forest RMSE: $%.4f' % forest_rmse)

# Gradient Boosting

In [None]:
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor()
model.fit(x_train, y_train)

In [None]:
print('Gradient Boosting R squared: %.4f' % model.score(x_test, y_test))

In [None]:
y_pred = model.predict(x_test)
model_mse = mean_squared_error(y_test, y_pred)
model_rmse = np.sqrt(model_mse)

print('Gradient Boosting RMSE: $%.4f' % model_rmse)
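
The three models above are each scored in their own cells. One way to compare them side by side is a small helper that fits a model and returns all three metrics at once. This is only a sketch: the `evaluate` function is a hypothetical helper, and it runs on synthetic data from `make_regression` rather than the house data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model and return RMSE, MAE, and R squared on the test set."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }


# Synthetic regression problem standing in for the house data.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
# train_test_split returns X_train, X_test, y_train, y_test in that order.
splits = train_test_split(X_demo, y_demo, random_state=0)

candidates = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {name: evaluate(m, *splits) for name, m in candidates.items()}
for name, metrics in results.items():
    print('%s: RMSE=%.2f  MAE=%.2f  R squared=%.4f'
          % (name, metrics["rmse"], metrics["mae"], metrics["r2"]))
```

In the notebook, the same helper could be called with `x_train, x_test, y_train, y_test` so that every model is judged on identical rows.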

# Deploying a Model with Watson Machine Learning

One of the benefits of using a service like Watson Machine Learning is that it allows data scientists and researchers to focus on building the best possible models, without having to worry about the infrastructure needed to make those models usable by others. Here we will build a simple deployment pipeline to deploy one of our models from above and make it accessible through a private API.

In this example we will choose to deploy the **Random Forest** model.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer

x_train, x_test, y_train, y_test = train_test_split(my_df, y)

categorical_feature_mask = (my_df.dtypes==object)
numerical_features = ~categorical_feature_mask

In [None]:
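preprocess = make_column_transformer(
    # Ignore categories that were not seen during fitting instead of raising
    # an error when the deployed model scores new data
    (OneHotEncoder(handle_unknown='ignore'), categorical_feature_mask),
    # Fill missing numeric values with the column mean, then standardize
    (make_pipeline(SimpleImputer(), StandardScaler()), numerical_features)
)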

In [None]:
model = make_pipeline(preprocess, RandomForestRegressor(n_estimators=100))

In [None]:
model.fit(x_train, y_train)

In [None]:
model.predict(x_test)
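
Because the preprocessing now lives inside the pipeline, the deployed model can accept raw rows, including text columns and missing numbers. The sketch below shows this on a tiny made-up dataset with two of the column names from the house data. The values, the `demo_model` name, and the unseen `TwnhsE` category are assumptions for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up training set with one missing numeric value.
train = pd.DataFrame({
    "LOTAREA": [9000.0, 7500.0, float("nan"), 12000.0, 8000.0, 10000.0],
    "BLDGTYPE": ["1Fam", "1Fam", "Duplex", "1Fam", "Duplex", "1Fam"],
})
prices = [200000, 150000, 120000, 310000, 130000, 250000]

demo_preprocess = make_column_transformer(
    # Unseen categories are encoded as all zeros instead of raising an error.
    (OneHotEncoder(handle_unknown="ignore"), ["BLDGTYPE"]),
    # Missing numbers are filled with the column mean, then standardized.
    (make_pipeline(SimpleImputer(), StandardScaler()), ["LOTAREA"]),
)
demo_model = make_pipeline(
    demo_preprocess, RandomForestRegressor(n_estimators=10, random_state=0)
)
demo_model.fit(train, prices)

# A raw row with a building type the model never saw during training.
new_house = pd.DataFrame({"LOTAREA": [9500.0], "BLDGTYPE": ["TwnhsE"]})
prediction = demo_model.predict(new_house)
print(prediction)
```

The caller never has to encode or scale anything, which is what makes a single pipeline object convenient to deploy.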

# Watson Machine Learning
In the cell below, fill in the credentials for your Watson Machine Learning instance. 

In [None]:
# Replace the placeholders with the credentials from your Watson Machine Learning service
from watson_machine_learning_client import WatsonMachineLearningAPIClient
wml_credentials = {
    "apikey": "<API KEY>",
    "instance_id": "<INSTANCE ID>",
    "url": "<URL>"
}

client = WatsonMachineLearningAPIClient(wml_credentials)
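
If the notebook is shared with collaborators, it may be safer to keep the credentials out of the cell itself. A minimal sketch, assuming the values are stored in environment variables: the `load_wml_credentials` helper and the variable names `WML_APIKEY`, `WML_INSTANCE_ID`, and `WML_URL` are hypothetical and not part of the Watson Machine Learning client.

```python
import os


def load_wml_credentials(env=os.environ):
    """Build the credentials dict from environment variables.

    The variable names below are hypothetical; use whatever names
    suit your environment.
    """
    names = {
        "apikey": "WML_APIKEY",
        "instance_id": "WML_INSTANCE_ID",
        "url": "WML_URL",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "WML_APIKEY": "<API KEY>",
    "WML_INSTANCE_ID": "<INSTANCE ID>",
    "WML_URL": "<URL>",
}
demo_credentials = load_wml_credentials(demo_env)
print(sorted(demo_credentials))
```

With the variables set, `wml_credentials = load_wml_credentials()` would produce the same dictionary shape that the cell above passes to the client.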

In [None]:
runtimes_meta = {
    client.runtimes.ConfigurationMetaNames.NAME: "House_Value_Model", 
    client.runtimes.ConfigurationMetaNames.DESCRIPTION: "House Value Model", 
    client.runtimes.ConfigurationMetaNames.PLATFORM: { "name": "python", "version": "3.6" }, 
}
runtime_details = client.runtimes.store(runtimes_meta)
runtime_url = client.runtimes.get_url(runtime_details)
runtime_uid = client.runtimes.get_uid(runtime_details)
print("Runtimes URL: " + runtime_url)
print("Runtimes UID: " + runtime_uid)

In [None]:
model_props = {client.repository.ModelMetaNames.NAME: "House Value Model",
               client.repository.ModelMetaNames.RUNTIME_UID: runtime_uid
              }
published_model = client.repository.store_model(model=model, meta_props=model_props)
import json
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))

In [None]:
created_deployment = client.deployments.create(published_model_uid, name="House_Value_Model")

In [None]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
print(scoring_endpoint)
x_train.iloc[0].values

# Using the deployed model
Now that we've deployed the model, we can send a request to the API endpoint with a data payload to get a house price prediction.

In [None]:
scoring_payload = {'fields': ['LOTAREA', 'BLDGTYPE', 'HOUSESTYLE', 'OVERALLCOND', 'YEARBUILT',
       'ROOFSTYLE', 'EXTERCOND', 'FOUNDATION', 'BSMTCOND', 'HEATING',
       'HEATINGQC', 'CENTRALAIR', 'ELECTRICAL', 'FULLBATH', 'HALFBATH',
       'BEDROOMABVGR', 'KITCHENABVGR', 'KITCHENQUAL', 'TOTRMSABVGRD',
       'FIREPLACES', 'FIREPLACEQU', 'GARAGETYPE', 'GARAGEFINISH', 'GARAGECARS',
       'GARAGECOND', 'POOLAREA', 'POOLQC', 'FENCE', 'MOSOLD', 'YRSOLD' ], 
                   'values': [[9000, '1Fam', '2Story', 9, 1920, 'Hip', 'Gd', 'PConc', 'TA',
       'GasA', 'Ex', 'Y', 'SBrkr', 1, 0, 3, 1, 'TA', 7, 0, 'NA', 'Detchd',
       'Unf', 2, 'TA', 0, 'NA', 'NA', 7, 2009]]}
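
Typing the payload by hand is error-prone, because the order of `values` has to match `fields` exactly. A minimal sketch of building the same structure from dataframe rows follows. The `build_scoring_payload` helper is hypothetical, and the sample row uses only three of the columns for brevity.

```python
import json

import pandas as pd


def build_scoring_payload(rows):
    """Convert dataframe rows into the {'fields': ..., 'values': ...} shape."""
    # Round-trip through JSON so NumPy scalars become plain Python values.
    values = json.loads(rows.to_json(orient="values"))
    return {"fields": list(rows.columns), "values": values}


# Made-up sample row with a subset of the house columns.
sample = pd.DataFrame({"LOTAREA": [9000], "BLDGTYPE": ["1Fam"], "YEARBUILT": [1920]})
payload = build_scoring_payload(sample)
print(json.dumps(payload))
```

In the notebook, `build_scoring_payload(x_test.iloc[[0]])` would produce a payload for the first test row with every column in the order the model was trained on.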

In [None]:
predictions = client.deployments.score(scoring_endpoint, scoring_payload)

In [None]:
print(json.dumps(predictions, indent=2))