# <u>Part 2 Deployment</u>
- For this part of the exam, you will create a Streamlit app that lets users predict the price of a home. The app will include inputs for features of the home and display the predicted price.

- ## Use the filepaths dictionary to load the provided ML model and training data (X_train, y_train), then determine which features were used and the range of values in each feature. If you receive an error about no module named dill, run `!pip install dill` in your notebook before using joblib.load.
    - ### `Tip:` Explore the original X_train values to determine what the features are and which widgets/components and value ranges are appropriate for each feature.

In [1]:
pip install dill

Collecting dill
  Obtaining dependency information for dill from https://files.pythonhosted.org/packages/c9/7a/cef76fd8438a42f96db64ddaa85280485a9c395e7df3db8158cfec1eee34/dill-0.3.8-py3-none-any.whl.metadata
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Installing collected packages: dill
Successfully installed dill-0.3.8
Note: you may need to restart the kernel to use updated packages.


## `Load in the filepaths.json file from the config folder.`

In [2]:
# Import joblib and json, then read the filepaths config
import joblib, json
with open('config/filepaths.json') as f:
    FPATHS = json.load(f)
FPATHS

{'data': {'ml': {'train': 'data/part2-training-data.joblib',
   'test': 'data/part2-test-data.joblib'}},
 'models': {'linear_regression': 'models/part2-model-pipeline.joblib'}}

In [3]:
import streamlit as st
@st.cache_data
def load_Xy_data(fpath):
    train_path = fpath['data']['ml']['train']
    X_train, y_train =  joblib.load(train_path)
    test_path = fpath['data']['ml']['test']
    X_test, y_test = joblib.load(test_path)
    return X_train, y_train, X_test, y_test
 
@st.cache_resource
def load_model_ml(fpath):
    model_path = fpath['models']['linear_regression']
    linreg = joblib.load(model_path)
    return linreg



In [4]:
X_train, y_train, X_test, y_test = load_Xy_data(FPATHS)

2024-04-15 14:19:04.908 
  command:

    streamlit run C:\Users\Valde\anaconda3\envs\dojo-env\lib\site-packages\ipykernel_launcher.py [ARGUMENTS]
2024-04-15 14:19:04.909 No runtime found, using MemoryCacheStorageManager


In [5]:
X_test.head()

id,bedrooms,bathrooms,sqft_living
5416500660,4,2.5,2960
259801030,4,2.0,1610
7577700185,4,1.0,1440
1939000030,4,2.5,2540
7524950870,4,2.25,2110


In [6]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15905 entries, 2473372170 to 7806450190
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   bedrooms     15905 non-null  int64  
 1   bathrooms    15905 non-null  float64
 2   sqft_living  15905 non-null  int64  
dtypes: float64(1), int64(2)
memory usage: 497.0 KB
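
The `info()` output lists the three feature names the model was trained on. As a minimal sketch, a small guard can confirm that any input built later uses exactly those names. The helper `check_feature_names` and the constant `EXPECTED_FEATURES` are hypothetical names introduced here for illustration; they are not part of the provided files.

```python
# Feature names copied from the X_train.info() output above
EXPECTED_FEATURES = ["bedrooms", "bathrooms", "sqft_living"]


def check_feature_names(provided, expected=EXPECTED_FEATURES):
    """Raise ValueError if the provided column names differ from the expected ones.

    Hypothetical helper for illustration only.
    """
    provided = list(provided)
    missing = [name for name in expected if name not in provided]
    unexpected = [name for name in provided if name not in expected]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return True


# Matching names pass silently
names_ok = check_feature_names(["bedrooms", "bathrooms", "sqft_living"])

# A renamed column is reported instead of reaching the model
try:
    check_feature_names(["beds", "bathrooms", "sqft_living"])
    mismatch_message = ""
except ValueError as err:
    mismatch_message = str(err)
```

With a DataFrame, the same check could be called as `check_feature_names(df.columns)`.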


In [7]:
linreg_model = load_model_ml(FPATHS)

In [8]:
linreg_model

In [9]:
X_train.describe()

,bedrooms,bathrooms,sqft_living
count,15905.0,15905.0,15905.0
mean,3.34901,2.088086,2030.448287
std,0.925197,0.736655,837.357259
min,0.0,0.0,290.0
25%,3.0,1.5,1420.0
50%,3.0,2.25,1890.0
75%,4.0,2.5,2500.0
max,33.0,7.5,7480.0
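
The summary shows that the `bedrooms` maximum (33) sits far above the 75th percentile (4), so a slider spanning the full range would mostly cover values the model rarely saw. One possible approach, sketched below, is to derive widget bounds from the column and cap the upper bound at a high quantile. The helper `slider_bounds`, the quantile cap, and the toy data are assumptions of this sketch, not something the exam prescribes.

```python
import pandas as pd


def slider_bounds(series: pd.Series, upper_quantile: float = 0.99) -> dict:
    """Suggest slider settings from a training column (hypothetical helper).

    The upper bound is capped at a quantile so a single extreme row
    does not stretch the widget range.
    """
    # Cast NumPy scalars to plain Python numbers of a single type
    cast = int if pd.api.types.is_integer_dtype(series) else float
    return {
        "min_value": cast(series.min()),
        "max_value": cast(series.quantile(upper_quantile)),
        "value": cast(series.median()),
    }


# Toy stand-in for X_train['bedrooms']; the real column comes from joblib.load
toy_bedrooms = pd.Series([0, 2, 3, 3, 3, 4, 4, 5, 6, 33])
bounds = slider_bounds(toy_bedrooms, upper_quantile=0.9)
print(bounds)
```

The returned dictionary could be unpacked into a widget call, for example `st.sidebar.slider('Bedrooms', step=1, **bounds)`. Whether to cap the range at all is a design choice: capping keeps the slider usable, while the full range lets users enter every value present in the training data.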


In [10]:
import streamlit as st
# bedrooms: cast NumPy integers to plain int so every slider argument has the same type
bedrooms = st.sidebar.slider('Bedrooms',
                            min_value = int(X_train['bedrooms'].min()),
                            max_value = int(X_train['bedrooms'].max()),
                            step = 1, value = 3)

In [11]:
bedrooms

3

In [12]:
# bathrooms: cast NumPy floats to plain float so every slider argument has the same type
bathrooms = st.sidebar.slider('Bathrooms',
                             min_value = float(X_train['bathrooms'].min()),
                             max_value = float(X_train['bathrooms'].max()),
                             step = .25, value = 2.5)

In [13]:
bathrooms

2.5

In [14]:
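# sqft_living: take both bounds from the training data instead of hard-coding the minimum
sqft_living = st.sidebar.number_input('Sqft Living Area',
                                     min_value=int(X_train['sqft_living'].min()),
                                     max_value=int(X_train['sqft_living'].max()),
                                     step=150, value=2500)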

In [15]:
# Define function to convert widget values to a DataFrame.
# Column names must match the features the model was trained on (see X_train.info()).
import pandas as pd
def get_X_to_predict():
    X_to_predict = pd.DataFrame({'bedrooms': bedrooms,
                                 'bathrooms': bathrooms,
                                 'sqft_living': sqft_living},
                             index=['House'])
    return X_to_predict

In [16]:
def get_prediction(model,X_to_predict):
    # Return the single predicted price for the one-row DataFrame
    return  model.predict(X_to_predict)[0]

In [17]:
X_to_pred = get_X_to_predict()
X_to_pred

,bedrooms,bathrooms,sqft_living
House,3,2.5,2500


In [18]:
type(X_to_pred)

pandas.core.frame.DataFrame
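
With the input row in hand, the remaining step is to pass it to the model and show the result. The sketch below illustrates that flow under stated assumptions: `StubModel` is a hypothetical stand-in that only mimics `.predict()` and returns a made-up price, and `format_price` is an illustrative helper. In the app, the loaded pipeline would take the stub's place.

```python
class StubModel:
    """Hypothetical stand-in for the loaded pipeline; only mimics .predict()."""

    def predict(self, X):
        # Return one made-up price per input row (not a real model output)
        return [450000.0 for _ in X]


def get_prediction(model, X_to_predict):
    # Same shape as the notebook's helper: first element of the prediction array
    return model.predict(X_to_predict)[0]


def format_price(price: float) -> str:
    # Thousands separators, no cents
    return f"${price:,.0f}"


# A plain list of rows stands in for the one-row DataFrame
rows = [{"bedrooms": 3, "bathrooms": 2.5, "sqft_living": 2500}]
message = f"Predicted price: {format_price(get_prediction(StubModel(), rows))}"
print(message)
```

In a Streamlit script, the resulting string could be displayed with `st.markdown(message)`, optionally behind an `st.button` so the prediction only runs on request.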

In [19]:
X_train['bedrooms'].max()

33