# The Impact of Location and Property Characteristics on House Sale Prices: An Inferential Analysis

## 1. Business Understanding

### (a) Introduction

Real estate is property consisting of land and the buildings on it, along with its natural resources. The history of real estate can be traced back to ancient times, when land was acquired by conquest, purchase or inheritance. In the United States, real estate brokers began presenting houses for sale around 1900. In 1908, the National Association of Real Estate Exchanges was founded to bring brokers and agents together to facilitate the selling of homes.

Real estate agencies are business organizations that generally represent either the buyer or the seller in home transactions. They work as a collective group of licensed agents and/or brokers who operate in a given geographical area. Real estate agents are hired to market and sell properties on behalf of home sellers. They vet potential buyers, lead viewings, and help negotiate the final selling price. They usually earn a base annual salary and may earn commission on house sales.

Agents who work for the seller, also known as listing agents, advise clients on how to price the property and prepare for a sale, including providing tips on last-minute improvements that can help boost the price or encourage speedy offers. Seller agents market the property through listing services, networking and advertisements. Agents who work for buyers, on the other hand, search for available properties that match the buyer's price range and wish list. These agents often look at past sales data on comparable properties to help prospective buyers come up with a fair bid.

Generally, real estate agencies act as intermediaries between property buyers, sellers, landlords and tenants. They represent their clients' interests and work to achieve their goals in real estate transactions. This representation may involve marketing properties, identifying potential buyers or tenants, negotiating deals and handling paperwork.

This project uses linear regression to analyze the relationship between a house's location, its characteristics and its sale price. The model takes into account intrinsic characteristics of a property, such as the number of bedrooms, the number of bathrooms, the square footage of the living space, the level of craftsmanship used to build the house and the square footage of the lot, as well as extrinsic factors such as the location of the property. By analyzing these factors, the model can guide Azizi Realtors when advising their clientele on property valuations. This offers a more quantitative approach to real estate valuation than traditional methods, which can lean towards the qualitative side. The resulting insights will help the agency's clientele make informed decisions, which in turn benefits the agency by strengthening the service it provides.




### (b) Problem Statement

Azizi Realtors want to provide effective advice to their clientele on how location and house characteristics may increase the estimated value of a house. To do this effectively, the agency needs a deep understanding of the factors that influence property values. Our goal is to develop a linear regression model that uses data on past sales to accurately capture the relationship between a house's location, its characteristics and its sale price. The model allows us to estimate how sale price changes as the independent variables change. With this information, Azizi Realtors can effectively advise their clients on buying, selling and investing in properties. In doing so, we aim to increase the business value of the agency by enabling it to provide accurate and informed advice to its clientele, leading to increased customer flow, satisfaction and loyalty.

### (c) Defining a metric for success

### (d) Main Objective

To develop a multiple linear regression model that establishes how a house's location and characteristics affect its sale price.
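
The form of model this objective implies can be sketched as follows. The symbols are notation introduced here for illustration: $y_i$ is the sale price of house $i$, $x_{ij}$ is its $j$-th predictor (for example square footage, or a dummy variable encoding location), $p$ is the number of predictors, and $\varepsilon_i$ is the error term.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i
```

Under this form, each coefficient $\beta_j$ is read as the expected change in sale price for a one-unit increase in predictor $j$, holding the other predictors fixed.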

### (e) Specific Objectives

- Testing the assumptions of multiple linear regression
- Analyzing data to identify the most important factors that affect house prices
- Building a linear regression model that evaluates a house's price and how it is impacted by location and house characteristics.
- Evaluating the model's statistical significance, MSE and coefficients to produce interpretable results

### (f) Recording the Experimental Design

- **Reading and checking data**
This stage involves examining the data and making sense of the column names and their meanings.

- **Data Wrangling**
This stage involves handling missing and placeholder values, removing outliers and encoding categorical data so that it can be used in the model.

- **Modelling**
This stage involves fitting a linear regression model with the sale price as the dependent variable and examining how it changes as the independent variables change.

- **Regression Results**
This stage involves interpreting the model's coefficients, the R-squared and the MSE to come up with meaningful insights.

- **Conclusions and Recommendations**
This stage involves using the results of the analysis to come up with insights and recommendations for Azizi Realtors in order for them to provide effective advice to their clientele.
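
The data wrangling stage described above can be sketched on a toy table. This is an illustration under stated assumptions, not the notebook's actual cleaning code: the `"?"` placeholder, the extreme price row, the `wrangle` helper, and the specific choices (filling a missing `waterfront` with `"NO"`, median imputation, a 1.5 × IQR outlier rule) are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows that mimic a few columns of the housing data.
# The "?" placeholder and the extreme price are invented for illustration.
toy = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0, 9_000_000.0],
    "sqft_living": [1180, 2570, 770, 1960, 1680, 2000],
    "waterfront": [np.nan, "NO", "NO", "NO", "YES", "NO"],
    "sqft_basement": ["0.0", "400.0", "?", "910.0", "0.0", "0.0"],
    "grade": ["7 Average", "7 Average", "6 Low Average",
              "7 Average", "8 Good", "8 Good"],
})


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean a copy of df for use in a regression."""
    out = df.copy()
    # Missing values: treat an unrecorded waterfront flag as "NO" (an assumption).
    out["waterfront"] = out["waterfront"].fillna("NO")
    # Placeholder values: turn "?" into NaN, then fill with the column median.
    basement = pd.to_numeric(out["sqft_basement"], errors="coerce")
    out["sqft_basement"] = basement.fillna(basement.median())
    # Outliers: keep prices within 1.5 * IQR of the quartiles.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out = out.loc[in_range].copy()
    # Categorical data: keep the numeric part of grade, one-hot encode waterfront.
    out["grade"] = out["grade"].str.split().str[0].astype(int)
    out = pd.get_dummies(out, columns=["waterfront"], drop_first=True, dtype=int)
    return out.reset_index(drop=True)


clean = wrangle(toy)
print(clean)
```

Dropping the first dummy level (`drop_first=True`) avoids perfect collinearity with the intercept once the encoded columns are passed to a linear model.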

### (g) Data Relevance

This analysis uses the King County housing dataset, which has 21597 rows and 21 columns. The dataset includes information such as the number of bedrooms and bathrooms, the square footage of the house and its location. Using this data, we can develop a multiple linear regression model that relates a house's location and characteristics to its price, and gain insights into how these factors affect house prices in King County, Washington. This can help Azizi Realtors make future predictions on house prices and therefore effectively advise their clients.

## 2. Reading and Checking Data

In [3]:
# importing the necessary libraries

import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt

# plot styling and inline rendering for the notebook
sns.set_style("darkgrid")
%matplotlib inline





In [5]:
# reading in the data and previewing the first rows

housing_df = pd.read_csv("data/kc_house_data.csv")
housing_df.head()

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 10/13/2014 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | NaN | NONE | ... | 7 Average | 1180 | 0.0 | 1955 | 0.0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 12/9/2014 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | NO | NONE | ... | 7 Average | 2170 | 400.0 | 1951 | 1991.0 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 2/25/2015 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | NO | NONE | ... | 6 Low Average | 770 | 0.0 | 1933 | NaN | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 12/9/2014 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | NO | NONE | ... | 7 Average | 1050 | 910.0 | 1965 | 0.0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 2/18/2015 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | NO | NONE | ... | 8 Good | 1680 | 0.0 | 1987 | 0.0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


In [6]:
# checking the last rows of the dataset

housing_df.tail()

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21592 | 263000018 | 5/21/2014 | 360000.0 | 3 | 2.5 | 1530 | 1131 | 3.0 | NO | NONE | ... | 8 Good | 1530 | 0.0 | 2009 | 0.0 | 98103 | 47.6993 | -122.346 | 1530 | 1509 |
| 21593 | 6600060120 | 2/23/2015 | 400000.0 | 4 | 2.5 | 2310 | 5813 | 2.0 | NO | NONE | ... | 8 Good | 2310 | 0.0 | 2014 | 0.0 | 98146 | 47.5107 | -122.362 | 1830 | 7200 |
| 21594 | 1523300141 | 6/23/2014 | 402101.0 | 2 | 0.75 | 1020 | 1350 | 2.0 | NO | NONE | ... | 7 Average | 1020 | 0.0 | 2009 | 0.0 | 98144 | 47.5944 | -122.299 | 1020 | 2007 |
| 21595 | 291310100 | 1/16/2015 | 400000.0 | 3 | 2.5 | 1600 | 2388 | 2.0 | NaN | NONE | ... | 8 Good | 1600 | 0.0 | 2004 | 0.0 | 98027 | 47.5345 | -122.069 | 1410 | 1287 |
| 21596 | 1523300157 | 10/15/2014 | 325000.0 | 2 | 0.75 | 1020 | 1076 | 2.0 | NO | NONE | ... | 7 Average | 1020 | 0.0 | 2008 | 0.0 | 98144 | 47.5941 | -122.299 | 1020 | 1357 |
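
Beyond looking at the first and last rows, a natural next check is the table's shape, column types and missing-value counts. The sketch below illustrates this on a handful of rows and columns copied from the first preview above. The names `sample_csv` and `sample_df` are introduced here for illustration, and columns hidden behind the `...` in the display are left out rather than guessed. On the full data the same calls would be applied to `housing_df`.

```python
import io

import pandas as pd

# A few rows and columns copied from the preview of the first rows.
# Empty fields stand for the missing values seen in that preview.
sample_csv = """id,date,price,bedrooms,bathrooms,waterfront,yr_renovated
7129300520,10/13/2014,221900.0,3,1.0,,0.0
6414100192,12/9/2014,538000.0,3,2.25,NO,1991.0
5631500400,2/25/2015,180000.0,2,1.0,NO,
2487200875,12/9/2014,604000.0,4,3.0,NO,0.0
1954400510,2/18/2015,510000.0,3,2.0,NO,0.0
"""
sample_df = pd.read_csv(io.StringIO(sample_csv))

# Dates arrive as text; convert them using the month/day/year layout shown.
sample_df["date"] = pd.to_datetime(sample_df["date"], format="%m/%d/%Y")

# Shape, column types, and the number of missing values per column.
n_rows, n_cols = sample_df.shape
missing_counts = sample_df.isna().sum()

print(f"{n_rows} rows x {n_cols} columns")
print(sample_df.dtypes)
print(missing_counts[missing_counts > 0])
```

Columns that report missing values here, such as `waterfront` and `yr_renovated`, are the ones the data wrangling stage needs to address.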
