## Predicting House Prices in King County: Evaluating the Impact of Home Features and Renovations Using a Comprehensive Analysis of Multiple Linear Regression Models

### INTRODUCTION
The real estate market in King County is dynamic and competitive, with various factors influencing property values. Homeowners and real estate agencies are particularly interested in understanding how different features of a house, as well as renovations, can impact its market value. Accurate and data-driven insights into these factors can significantly enhance decision-making processes for buying, selling, and renovating homes.

### KEY OBJECTIVES
1. Create a Home Price Predictive Model:

Build and refine a regression model to accurately forecast King County home prices based on a range of property characteristics and renovation factors.
Make sure that the model is reliable, robust, and has high predictive accuracy.

2. List the Main Factors Affecting Home Prices:

Analyze the dataset to determine which characteristics, such as location, number of bedrooms, and square footage, have the greatest effect on home prices.
Analyze the impact of particular improvements on home values, such as kitchen remodels and bathroom additions.

3. Give Homeowners Useful Information:

Utilize the model's output to provide homeowners with useful guidance on how to increase the market value of their property through well-chosen upgrades.
Determine which upgrades are most cost-effective and provide the best return on investment.

4. Facilitate Decision-Making in Real Estate Agencies:

Equip the real estate agency with data-driven insights so that it can give clients better-informed buying and selling advice.
Strengthen the agency's ability to advise clients on the types of home upgrades most likely to raise their property's value.

5. Improve Knowledge and Application of Data:

Find patterns, correlations, and trends in the dataset by doing in-depth exploratory data analysis.
To enhance model performance and data quality, apply feature engineering.

6. Ensure Reproducible and Transparent Analysis:

Fully document the data processing, modeling, and evaluation procedures.
Make sure stakeholders and other data scientists can replicate and understand the analysis.

7. Effectively Communicate Findings:

Write a thorough report that succinctly and clearly summarizes the approach, conclusions, and suggestions.
Make presentations and graphics to effectively communicate ideas and support stakeholder decision-making.

### OUTLINE
1. Data Loading and Cleaning: Loaded the King County House Sales dataset and handled outliers and missing values.
2. Exploratory Data Analysis (EDA): Explored the relationships between features and price and visualized the distribution of home prices.
3. Key Features: Identified the features most strongly correlated with home values.
4. Model Development: Built and assessed four progressively more sophisticated linear regression models.
5. Model Evaluation: Used R-squared (R²) and Mean Absolute Error (MAE) to evaluate model performance.
6. Recommendations: Gave homeowners and real estate agents practical advice on increasing house values, emphasizing living space optimization and quality improvements.
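
The evaluation step in the outline can be illustrated with a minimal sketch. It fits an ordinary least-squares model on synthetic data and reports R² and MAE on a held-out split. The synthetic features, coefficients, noise level, and `random_state` values are assumptions chosen for illustration; they are not the King County data or the models built in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): price driven by living area and grade.
rng = np.random.default_rng(42)
n = 500
sqft_living = rng.uniform(500, 4000, n)
grade = rng.integers(4, 12, n)
noise = rng.normal(0, 25_000, n)
price = 150 * sqft_living + 40_000 * grade + noise

# Hold out 20% of the rows so the metrics reflect unseen data.
X = np.column_stack([sqft_living, grade])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R² is the share of price variance explained; MAE is the average absolute error in dollars.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R² = {r2:.3f}, MAE = {mae:,.0f}")
```

Reporting both metrics is useful because R² is unitless while MAE is expressed in the same units as price, which makes it easier to explain to homeowners and agents.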


In [2]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error


### Data Loading

In [5]:
df = pd.read_csv("../data/kc_house_data.csv")
df.head()

| | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 10/13/2014 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | NaN | 0.0 | ... | 7 | 1180 | 0.0 | 1955 | 0.0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 12/9/2014 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | 0.0 | 0.0 | ... | 7 | 2170 | 400.0 | 1951 | 1991.0 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 2/25/2015 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | 0.0 | 0.0 | ... | 6 | 770 | 0.0 | 1933 | NaN | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 12/9/2014 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | 0.0 | 0.0 | ... | 7 | 1050 | 910.0 | 1965 | 0.0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 2/18/2015 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | 0.0 | 0.0 | ... | 8 | 1680 | 0.0 | 1987 | 0.0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |
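
The preview shows missing entries in `waterfront` and `yr_renovated`, so a per-column count of missing values is a natural next check. The sketch below rebuilds a few columns of the five preview rows by hand so that it runs without the CSV. The `renovated` flag, and the choice to treat both `0.0` and a missing `yr_renovated` as "not renovated", are assumptions for illustration rather than steps taken from this notebook.

```python
import numpy as np
import pandas as pd

# A few columns from the five rows shown in the preview above.
sample = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0],
    "waterfront": [np.nan, 0.0, 0.0, 0.0, 0.0],
    "yr_built": [1955, 1951, 1933, 1965, 1987],
    "yr_renovated": [0.0, 1991.0, np.nan, 0.0, 0.0],
})

# Count missing values per column.
missing_counts = sample.isna().sum()
print(missing_counts)

# Hypothetical feature (assumption): 1 if a renovation year is recorded, else 0.
# Both 0.0 and NaN are treated as "not renovated" here.
sample["renovated"] = (sample["yr_renovated"].fillna(0) > 0).astype(int)
print(sample[["yr_renovated", "renovated"]])
```

On the full dataset the same `isna().sum()` call would be applied to `df` directly. How to treat missing renovation years is a modeling decision that should be stated explicitly, since it affects any estimate of renovation impact.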
