**For this part of the exam, you will create a Streamlit app that lets users predict the price of a home. The app takes features of the home as inputs and produces a predicted price.**

Load the provided ML model (part2-model-pipeline.joblib) and training data (part2-training-data.joblib) to determine which features were used and the range of values in each feature. If you receive an error about no module named dill, run "!pip install dill" in your notebook before calling joblib.load.

In [2]:
pip install dill

Collecting dill
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Installing collected packages: dill
Successfully installed dill-0.3.7
Note: you may need to restart the kernel to use updated packages.


In [54]:
import joblib

X_train, y_train = joblib.load('Models/part2-training-data.joblib')

In [55]:
X_train.head()

id,bedrooms,bathrooms,sqft_living
2473372170,4,3.25,2820
597000195,3,1.75,1460
4346300010,3,2.5,1560
629650030,4,2.5,2233
713500030,5,3.5,4800


In [56]:
y_train.head()

id
2473372170     432000.0
597000195      527200.0
4346300010     545500.0
629650030      317500.0
713500030     1350000.0
Name: price, dtype: float64

In [57]:
model_pipeline = joblib.load('Models/part2-model-pipeline.joblib')

In [58]:
model_pipeline

### Load the filepaths.json file from the config folder

In [59]:
import joblib, json
with open('config/filepaths.json') as f:
    FPATHS = json.load(f)
FPATHS

{'data': {'ml': {'train': 'data/part2-training-data.joblib',
   'test': 'data/part2-test-data.joblib'}},
 'models': {'linear_regression': 'models/part2-model-pipeline.joblib'}}
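
The config file itself is not shown in this notebook. As a rough sketch, a file with the structure above could be written and read back as follows. The temporary directory and the `write_filepaths` and `read_filepaths` helper names are assumptions for illustration, not part of the exam materials.

```python
import json
import os
import tempfile

# Same structure as the FPATHS dictionary displayed above
FPATHS_EXAMPLE = {
    "data": {
        "ml": {
            "train": "data/part2-training-data.joblib",
            "test": "data/part2-test-data.joblib",
        }
    },
    "models": {"linear_regression": "models/part2-model-pipeline.joblib"},
}


def write_filepaths(fpaths, config_path):
    """Write the filepaths dictionary to a JSON file (hypothetical helper)."""
    os.makedirs(os.path.dirname(config_path), exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(fpaths, f, indent=2)


def read_filepaths(config_path):
    """Read the filepaths dictionary back from JSON (hypothetical helper)."""
    with open(config_path) as f:
        return json.load(f)


# A temporary directory stands in for the project folder in this sketch
with tempfile.TemporaryDirectory() as project_dir:
    config_path = os.path.join(project_dir, "config", "filepaths.json")
    write_filepaths(FPATHS_EXAMPLE, config_path)
    loaded = read_filepaths(config_path)

print(loaded["models"]["linear_regression"])
```

Keeping every path in one JSON file means the notebook and the Streamlit script can both look up the same locations instead of hard-coding them separately.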

### Use the filepaths dictionary to load in the provided ML model and training data (X_train, y_train) to determine which features were used and the range of values included in each feature.

In [60]:
# Look up each filepath in the FPATHS dictionary
# Load the training data
traindata = FPATHS['data']['ml']['train']
X_train, y_train = joblib.load(traindata)

# Load the test data
testdata = FPATHS['data']['ml']['test']
X_test, y_test = joblib.load(testdata)

# Preview the training features (X_test is previewed in the next cell)
X_train.head()


id,bedrooms,bathrooms,sqft_living
2473372170,4,3.25,2820
597000195,3,1.75,1460
4346300010,3,2.5,1560
629650030,4,2.5,2233
713500030,5,3.5,4800


In [61]:
X_test.head()

id,bedrooms,bathrooms,sqft_living
5416500660,4,2.5,2960
259801030,4,2.0,1610
7577700185,4,1.0,1440
1939000030,4,2.5,2540
7524950870,4,2.25,2110


In [62]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15905 entries, 2473372170 to 7806450190
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   bedrooms     15905 non-null  int64  
 1   bathrooms    15905 non-null  float64
 2   sqft_living  15905 non-null  int64  
dtypes: float64(1), int64(2)
memory usage: 497.0 KB


In [63]:
# Load the model using the filepath from the dictionary
linreg_path = FPATHS['models']['linear_regression']
linreg_model = joblib.load(linreg_path)

In [64]:
linreg_model

### Explore the features


In [65]:
X_train.describe()

,bedrooms,bathrooms,sqft_living
count,15905.0,15905.0,15905.0
mean,3.34901,2.088086,2030.448287
std,0.925197,0.736655,837.357259
min,0.0,0.0,290.0
25%,3.0,1.5,1420.0
50%,3.0,2.25,1890.0
75%,4.0,2.5,2500.0
max,33.0,7.5,7480.0


In [66]:
import streamlit as st
# Bedrooms: cast the NumPy bounds to built-in int so every slider argument shares one type
bedrooms = st.sidebar.slider('Bedrooms',
                            min_value=int(X_train['bedrooms'].min()),
                            max_value=int(X_train['bedrooms'].max()),
                            step=1, value=3)


In [67]:
bedrooms

3

In [68]:
# Bathrooms: cast the NumPy bounds to built-in float so every slider argument shares one type
bathrooms = st.sidebar.slider('Bathrooms',
                             min_value=float(X_train['bathrooms'].min()),
                             max_value=float(X_train['bathrooms'].max()),
                             step=0.25, value=2.5)

In [69]:
bathrooms

2.5

In [70]:
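# Living area in square feet: both bounds come from the training data as built-in ints
sqft_living = st.sidebar.number_input('Sqft Living Area',
                                     min_value=int(X_train['sqft_living'].min()),
                                     max_value=int(X_train['sqft_living'].max()),
                                     step=150, value=2500)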

In [71]:
# Define function to convert widget values to a one-row DataFrame.
# The column names must match the feature names the model was trained on.
import pandas as pd
def get_X_to_predict():
    X_to_predict = pd.DataFrame({'bedrooms': bedrooms,
                                 'bathrooms': bathrooms,
                                 'sqft_living': sqft_living},
                             index=['House'])
    return X_to_predict

In [72]:
def get_prediction(model, X_to_predict):
    # predict returns an array; take the single predicted price
    return model.predict(X_to_predict)[0]

In [73]:
X_to_pred = get_X_to_predict()
X_to_pred

,bedrooms,bathrooms,sqft_living
House,3,2.5,2500


In [78]:
# #predict one value
# linreg_model.predict(X_to_pred)[0]
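
Before calling predict, it can help to confirm that the one-row input uses the same column names and order as the training features. The sketch below shows one way to do that. `make_input_row` is a hypothetical helper, and the feature list is taken from the columns reported by X_train.info().

```python
import pandas as pd

# Feature names in training order, as listed by X_train.info()
TRAIN_COLUMNS = ["bedrooms", "bathrooms", "sqft_living"]


def make_input_row(values, columns=TRAIN_COLUMNS, label="House"):
    """Build a one-row DataFrame whose columns match the training features.

    Hypothetical helper: raises ValueError if a feature is missing or unexpected.
    """
    missing = [c for c in columns if c not in values]
    extra = [c for c in values if c not in columns]
    if missing or extra:
        raise ValueError(f"missing features: {missing}; unexpected features: {extra}")
    # Selecting with `columns` puts the features in training order
    return pd.DataFrame(values, index=[label])[columns]


row = make_input_row({"sqft_living": 2500, "bedrooms": 3, "bathrooms": 2.5})
print(list(row.columns))
```

A frame built this way could then be passed to get_prediction together with linreg_model.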

In [80]:
X_train['bedrooms'].max()

33