IBM/predicting-house-prices-using-netezza

Build a machine learning web application for predicting house rental prices with Netezza in-database analytics and Streamlit

Project Description:

  • In this repository, you will learn how to use Netezza Python in-database analytics (nzpyida) and Streamlit to quickly build and deploy in-database machine learning applications. nzpyida lets users push custom ML into the Netezza database, while Streamlit lets them build web applications around those ML models. We use a US housing rental prediction use case to illustrate each step in detail: connecting to the Netezza Performance Server, performing data analysis, creating the machine learning model, and integrating the model with a web application.
  • The publicly available housing dataset (housing_data) is a collection of rental records for houses in cities and towns across the United States. The target variable, the rent of the house (price), is predicted from the independent variables in the dataset (housing type, bds, baths, etc.). Because the price of a one-bedroom apartment in San Francisco differs greatly from that of a one-bedroom apartment in Idaho, it can help to build a separate ML model for each location rather than a single model for all locations. Netezza, with its MPP (Massively Parallel Processing) architecture, is well suited to such scenarios.
  • Our goal is to perform the following steps: transform the data (impute columns by assigning default values to nulls), build ML models per location (a gradient-boosting regressor on the transformed data), and then build a web application for interacting with the predictions (using Streamlit). The web application can easily be downloaded and replicated on your local machine.
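
The transform-then-model-per-location idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the repository's code: the column names `region`, `beds`, and `price`, the default of one bedroom, and all function names are hypothetical, and a tiny single-feature gradient-boosting routine built from decision stumps stands in for the in-database regressor that nzpyida would run inside Netezza.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical column names and default; the real housing_data schema may differ.
DEFAULT_BEDS = 1


def impute(rows):
    """Replace a null `beds` value with a default so every row is usable."""
    return [{**r, "beds": DEFAULT_BEDS if r["beds"] is None else r["beds"]} for r in rows]


def fit_stump(xs, residuals):
    """Find the one-threshold split of xs that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # only one distinct x value: use the mean residual on both sides
        m = mean(residuals)
        return xs[0], m, m
    return best[1:]


def fit_boosted(xs, ys, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, stumps, lr


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, lr = model
    return base + lr * sum(lm if x <= t else rm for t, lm, rm in stumps)


def fit_per_location(rows):
    """Train one independent model per region on the imputed rows."""
    groups = defaultdict(list)
    for r in impute(rows):
        groups[r["region"]].append(r)
    return {
        region: fit_boosted([g["beds"] for g in grp], [g["price"] for g in grp])
        for region, grp in groups.items()
    }
```

Calling `fit_per_location(rows)` on a list of row dictionaries returns one model per region, and `predict(models[region], beds)` gives that region's estimate. Keeping the models separate means a San Francisco one-bedroom and an Idaho one-bedroom are never averaged together, which is the motivation given above for per-location training.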

Steps for installing and running the repository:

  1. Make sure your local machine has Python installed.

python --version

  2. Use venv to set up a virtual environment in which your application will reside.

python3 -m venv /the-folder-your-application-will-reside-in

  3. Download the code in this repository, either as a zip file or by cloning the repository.

git clone https://github.ibm.com/pratik-joseph-dabre/housing_model.git

  4. Navigate to the folder where the repository resides and install the dependencies required to run the application.

pip3 install -r requirements.txt

  5. After all the dependencies have installed successfully, start the Streamlit application.

cd streamlit

streamlit run main.py
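
If the application fails to start, a small preflight check can help narrow down which installation step went wrong. The sketch below uses only the Python standard library. The module names in `REQUIRED_MODULES` are assumptions based on the libraries this README mentions (requirements.txt remains the authoritative list), and the function names are illustrative.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the app's dependencies; requirements.txt is authoritative.
REQUIRED_MODULES = ["streamlit", "st_aggrid", "nzpyida"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from the base)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "| venv active:", in_virtualenv())
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running the script from inside the virtual environment reports the interpreter version, whether a venv is active, and any of the assumed modules that are not importable.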

Output in the browser:

1. Model training:

  • Instead of running a series of commands to train the model, you can trigger model training from the data application with a single click.
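
A single-click trigger of this kind could look roughly like the following in Streamlit. This is a hedged sketch rather than the code in main.py: `train_all`, `render_training_page`, and the `train_one` callback are hypothetical names, and the actual training call (which would go through nzpyida) is left to the caller.

```python
def train_all(locations, train_one):
    """Call train_one(location) for each location and collect a status per location."""
    status = {}
    for loc in locations:
        try:
            train_one(loc)
            status[loc] = "ok"
        except Exception as exc:  # report the failure instead of aborting the whole run
            status[loc] = f"failed: {exc}"
    return status


def render_training_page(locations, train_one):
    """Draw a single button that triggers training for every location."""
    import streamlit as st  # imported here so train_all stays usable without Streamlit

    if st.button("Train models"):
        with st.spinner("Training..."):
            status = train_all(locations, train_one)
        ok = sum(v == "ok" for v in status.values())
        st.success(f"Trained {ok} of {len(status)} models")
        st.write(status)
```

Separating `train_all` from the page code keeps the training loop testable on its own, while `render_training_page` only handles the button, the spinner, and the status display.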

[Screenshots: model training in the web application]

2. Selecting the attributes to find houses for:

[Screenshot: attribute selection]

3. Visualizing the results:

  • After you select the parameters, all the houses matching your description are returned in a table.

  • streamlit-aggrid renders an interactive, customizable grid in the browser, on which you can perform multiple operations.

[Screenshot: results grid]

  • You can filter the rows on various parameters to suit your needs.

  • After filtering the records, you can perform operations on them.
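
In the application the filtering happens interactively in the grid, but the underlying idea can be shown with a few lines of plain Python. The sketch below is an assumption-laden illustration: the field names `region`, `beds`, and `price` and the function `filter_houses` are hypothetical and do not come from the repository.

```python
def filter_houses(houses, region=None, min_beds=None, max_price=None):
    """Keep the houses matching every criterion that is set; None means 'any'."""

    def matches(h):
        return (
            (region is None or h["region"] == region)
            and (min_beds is None or h["beds"] >= min_beds)
            and (max_price is None or h["price"] <= max_price)
        )

    return [h for h in houses if matches(h)]
```

Leaving a criterion as `None` disables it, so the same function covers "all houses in a region" as well as narrower queries such as "at least two bedrooms under a given rent".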

[Screenshot: operations on the filtered records]

(For example, the user can visualize all the selected records on a map.)

[Screenshot: selected records on a map]