## Introduction

Haven-Kings Property Management oversees a large portfolio of houses in King County, a dynamic and competitive real estate market. Seattle and, by extension, King County are widely regarded as strong locations for acquiring and managing rental assets. A single property in this market can be profitable, but investors generally expand their holdings to amplify profits and secure long-term financial stability.

However, a poor strategy can leave an investor holding properties that fail to generate revenue. A successful real estate investment is not just an assortment of properties; it is a deliberately built portfolio that yields income and capitalizes on tax benefits. Otherwise, these so-called "investments" can quickly become costly burdens. Property managers, real estate agents, and individual owners need to set prices that maximize returns while staying competitive. The challenge is to balance high rental or sale prices against market demand, and this is where data-driven insights can make a significant difference.

## Executive Summary

Our client, Haven-Kings Property Management, oversees a large portfolio of houses in King County and aims to optimize its rental pricing strategy. Haven-Kings has so far relied on traditional methods to set rents and prices for its investors. The first is Comparative Market Analysis (CMA): comparing similar properties in the same area to gauge a reasonable price. The second is the "1% Rule": the monthly rent should be approximately 1% of the property's total value. In addition, prices are raised each year by the rate of inflation or by a fixed percentage.
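The arithmetic behind the "1% Rule" and a fixed yearly increase can be sketched in a few lines. This is only an illustration: the function names, the $500,000 property value, and the 3% annual increase are assumptions made for the example, not figures from Haven-Kings.

```python
def one_percent_rule_rent(property_value: float, rate: float = 0.01) -> float:
    """Monthly rent as a fixed fraction of property value (the "1% Rule")."""
    return property_value * rate


def escalate_rent(monthly_rent: float, annual_increase: float, years: int) -> float:
    """Compound a fixed annual percentage increase over a number of years."""
    return monthly_rent * (1 + annual_increase) ** years


# Hypothetical inputs: a $500,000 house and an assumed 3% yearly increase.
base_rent = one_percent_rule_rent(500_000)
rent_after_3_years = escalate_rent(base_rent, annual_increase=0.03, years=3)

print(f"Base rent: ${base_rent:,.2f}")
print(f"Rent after 3 years: ${rent_after_3_years:,.2f}")
```

Neither rule looks at the features of the individual house, which is the gap a regression-based model is meant to close.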

The aim of this project is to modernize Haven-Kings Property Management's approach to house pricing by developing a predictive model, using linear regression and related machine learning techniques, that provides dynamic pricing recommendations based on property features. The goal is to increase revenue and maintain a competitive edge in Seattle's changing real estate market, and to position Haven-Kings as an early adopter of data-driven pricing strategies.

## Business Problem 

Haven-Kings Property Management manages a portfolio of houses in the King County area and seeks data-driven solutions to optimize property management and investment decisions. The business problem is to develop predictive models that use the King County House Sales dataset to help optimize house pricing, inform decisions about property renovations and investments, and provide dynamic pricing recommendations for the company's rental properties.

## Objective 

The primary objective is to create predictive models using multiple linear regression that Haven-Kings Property Management can use for the following purposes:

1. **Optimize House Pricing:** Provide Haven-Kings Property Management with a tool to determine optimal pricing for the houses in its King County portfolio. The tool should consider house characteristics, location, and market conditions, so that the company can maximize property value while remaining competitive in the local real estate market.

2. **Dynamic Pricing Recommendations:** Develop dynamic pricing recommendations for the company's rental properties, using data analytics and machine learning techniques, particularly linear regression, to adjust rental rates based on property features and market conditions, with the aim of increasing revenue.

## Research Questions 

To address the business problem and achieve the objectives, the following research questions can guide the analysis:

1. **House Pricing:**
   - What are the key factors that most strongly influence house prices in the King County area?
   - How do house characteristics (e.g., size, number of bedrooms, amenities) and location impact property values in this specific market?
   - Can a predictive model accurately estimate house prices for the Haven-Kings Property Management portfolio?

2. **Dynamic Pricing Recommendations:**
   - How can dynamic pricing recommendations be generated for Haven-Kings Property Management's rental properties using linear regression and machine learning?
   - What data-driven factors should be considered when adjusting rental rates based on property features and market conditions?
   - How will the implementation of dynamic pricing impact the company's revenue and competitiveness in the real estate market?

Together, the business problem, objective, and research questions provide a framework for addressing the challenges and opportunities Haven-Kings Property Management faces in a competitive real estate market.
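As a first pass at the question of which factors most strongly influence price, one common approach is to rank numeric features by the absolute value of their correlation with price. The sketch below uses synthetic stand-in data, not the King County dataset; the coefficients and the `demo` frame are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the King County dataset).
rng = np.random.default_rng(0)
n = 1_000
demo = pd.DataFrame(
    {
        "sqft_living": rng.uniform(500, 4000, n),
        "bedrooms": rng.integers(1, 6, n),
        "sqft_lot": rng.uniform(2_000, 20_000, n),
    }
)
# Assumed relationship: price driven mostly by living area, plus noise.
demo["price"] = (
    200 * demo["sqft_living"]
    + 15_000 * demo["bedrooms"]
    + rng.normal(0, 50_000, n)
)

# Rank features by absolute Pearson correlation with price.
ranking = demo.corr()["price"].drop("price").abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures pairwise linear association, so it serves as a screening step before fitting a regression model rather than a replacement for one.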


### **Data Overview: King County House Sales**

**Objective**: Predict the sale price of houses in King County, Washington (which includes Seattle).

**Time Frame**: Homes sold between May 2014 and May 2015.

**Structure**:
- **Observations**: 21,597 (as reported by `df.shape` for the loaded file)
- **Features**: 20 (excluding target variable)
- **Target Variable**: Price

**Key Features**:
- **Size & Structure**: `bedrooms`, `bathrooms`, `sqft_living`, `sqft_lot`, `floors`, `sqft_above`, `sqft_basement`
- **Location & View**: `waterfront`, `view`, `zipcode`, `lat`, `long`
- **Quality & Condition**: `condition`, `grade`
- **Age & Renovation**: `yr_built`, `yr_renovated`
- **Neighborhood Context**: `sqft_living15`, `sqft_lot15` (living area and lot size of the 15 nearest neighbors)

**Insights**: 
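- Features such as `sqft_living`, `bedrooms`, and the house's location are expected to be strong drivers of price; the regression analyses test this.
- Some columns contain missing values (for example, `waterfront` and `yr_renovated` show blanks in the first five rows), so these must be handled before modeling.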
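A per-column count makes the extent of missing data explicit. The sketch below builds a small frame from three columns of the first five rows shown in the `df.head()` preview, where `waterfront` and `yr_renovated` each have one blank entry. The name `sample` is introduced here for illustration; in the notebook, the same calls would run on the full `df`.

```python
import numpy as np
import pandas as pd

# Three columns from the first five rows of the df.head() preview.
sample = pd.DataFrame(
    {
        "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0],
        "waterfront": [np.nan, "NO", "NO", "NO", "NO"],
        "yr_renovated": [0.0, 1991.0, np.nan, 0.0, 0.0],
    }
)

# Count and share of missing values per column.
missing = sample.isna().sum()
missing_share = sample.isna().mean()

print(pd.DataFrame({"missing": missing, "share": missing_share}))
```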

**Analysis Steps**:
1. Import necessary libraries.
2. Load the dataset.
3. Explore data structure, types, and basic statistics.
4. Visualize data for insights.
5. Perform regression analyses: simple, multiple, and polynomial.
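The three regression variants in step 5 can be compared with a sketch like the following. It runs on synthetic stand-in data, not the King County dataset: the coefficients, noise level, and degree-2 polynomial are assumptions chosen for illustration, and each model is scored by R² on a held-out test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (NOT the King County dataset).
rng = np.random.default_rng(42)
n = 500
sqft_living = rng.uniform(500, 4000, n)
bedrooms = rng.integers(1, 6, n)
price = (
    50_000
    + 150 * sqft_living
    + 0.03 * sqft_living**2
    + 25_000 * bedrooms
    + rng.normal(0, 20_000, n)
)

X = np.column_stack([sqft_living, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

# Each entry: (estimator, indices of the feature columns it uses).
models = {
    "simple": (LinearRegression(), [0]),
    "multiple": (LinearRegression(), [0, 1]),
    "polynomial": (
        make_pipeline(
            PolynomialFeatures(degree=2, include_bias=False),
            LinearRegression(),
        ),
        [0, 1],
    ),
}

scores = {}
for name, (model, cols) in models.items():
    model.fit(X_train[:, cols], y_train)
    scores[name] = r2_score(y_test, model.predict(X_test[:, cols]))
    print(f"{name:<10} test R^2 = {scores[name]:.3f}")
```

Scoring on held-out data rather than the training set guards against rewarding the polynomial model merely for having more terms.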

In [1]:
# libraries for numerical work and data handling
import pandas as pd
import numpy as np

# libraries for visualization
import matplotlib.pyplot as plt
import seaborn as sns

# date libraries
import datetime

# libraries for machine learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import metrics
import statsmodels.api as sm
import scipy.stats as stats

import warnings
warnings.filterwarnings('ignore')
# to plot the diagrams within the cells
%matplotlib inline

In [2]:
# Load the kc housing dataset
df = pd.read_csv("data/kc_house_data.csv")
df.head()

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 10/13/2014 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | NaN | NONE | ... | 7 Average | 1180 | 0.0 | 1955 | 0.0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 12/9/2014 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | NO | NONE | ... | 7 Average | 2170 | 400.0 | 1951 | 1991.0 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 2/25/2015 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | NO | NONE | ... | 6 Low Average | 770 | 0.0 | 1933 | NaN | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 12/9/2014 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | NO | NONE | ... | 7 Average | 1050 | 910.0 | 1965 | 0.0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 2/18/2015 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | NO | NONE | ... | 8 Good | 1680 | 0.0 | 1987 | 0.0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


In [3]:
# Find the number of rows and columns

df.shape

(21597, 21)
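After the shape, the next exploration step is to look at column types. The preview shows `date` as text in month/day/year form and `grade` as a label such as `7 Average`, so both would need conversion before they can be used numerically. The sketch below uses values copied from the five preview rows; the `sample` frame and the `grade_num` column are names introduced here for illustration, and treating the leading number of `grade` as the numeric grade is an assumption.

```python
import pandas as pd

# Values copied from the first five rows of the df.head() preview.
sample = pd.DataFrame(
    {
        "date": ["10/13/2014", "12/9/2014", "2/25/2015", "12/9/2014", "2/18/2015"],
        "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0],
        "grade": ["7 Average", "7 Average", "6 Low Average", "7 Average", "8 Good"],
    }
)

print(sample.dtypes)  # date and grade are read in as generic objects

# Parse sale dates (month/day/year) into datetimes.
sample["date"] = pd.to_datetime(sample["date"], format="%m/%d/%Y")

# Keep the leading number of the grade label as an integer.
sample["grade_num"] = sample["grade"].str.split().str[0].astype(int)

print(sample.dtypes)
print(sample[["price", "grade_num"]].describe())
```

On the full dataset, `df.info()` and `df.describe()` give the same overview of types and basic statistics in one call each.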