# Northwestern County Housing Renovation Analysis

Author: Armun Shakeri

# Overview

This project analyzes housing renovations and how they might increase the value of a homeowner's property.

# Business Problem

Property values have been rising throughout the United States in recent years. For homeowners who want to sell, renovations may be one way to increase a property's value. This project analyzes these renovations and explores whether they actually raise a house's value.

# Data Understanding

The data is imported from the KC house dataset (`data/kc_house_data.csv`) and describes each home sold, including the sale date, price, number of bedrooms, and more. Several of these columns are not needed for this project and will be removed.

In [3]:
```python
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3
%matplotlib inline
```

In [4]:
```python
# Load the KC house data into a DataFrame
kc = pd.read_csv('data/kc_house_data.csv')
```

In [5]:
```python
# Show each column's name, non-null count, and dtype. Some of these columns will be removed.
kc.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21597 entries, 0 to 21596
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21597 non-null  int64  
 1   date           21597 non-null  object 
 2   price          21597 non-null  float64
 3   bedrooms       21597 non-null  int64  
 4   bathrooms      21597 non-null  float64
 5   sqft_living    21597 non-null  int64  
 6   sqft_lot       21597 non-null  int64  
 7   floors         21597 non-null  float64
 8   waterfront     19221 non-null  object 
 9   view           21534 non-null  object 
 10  condition      21597 non-null  object 
 11  grade          21597 non-null  object 
 12  sqft_above     21597 non-null  int64  
 13  sqft_basement  21597 non-null  object 
 14  yr_built       21597 non-null  int64  
 15  yr_renovated   17755 non-null  float64
 16  zipcode        21597 non-null  int64  
 17  lat            21597 non-null  float64
 18  long  
```
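
`kc.info()` reports non-null counts; a per-column count of *missing* values can make the gaps easier to read. A minimal sketch on a small hypothetical frame (the values are made up for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for kc; values are illustrative only.
sample = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0],
    "waterfront": [np.nan, "NO", "NO", "NO"],
    "yr_renovated": [0.0, 2005.0, np.nan, 0.0],
})

# Count missing values per column and express them as a share of all rows.
missing = sample.isna().sum()
missing_share = missing / len(sample)
print(pd.DataFrame({"missing": missing, "share": missing_share}))
```

Applied to the full frame, the `info()` output above implies 21597 − 19221 = 2,376 missing `waterfront` values, 63 missing `view` values, and 3,842 missing `yr_renovated` values.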

In [6]:
```python
# Drop columns that are not relevant to the renovation analysis
kc = kc.drop(['date', 'view', 'sqft_above', 'sqft_basement', 'yr_renovated', 'zipcode',
              'lat', 'long', 'sqft_living15', 'sqft_lot15'], axis=1)
kc.head()
```

| | id | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | condition | grade | yr_built |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | | Average | 7 Average | 1955 |
| 1 | 6414100192 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | NO | Average | 7 Average | 1951 |
| 2 | 5631500400 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | NO | Average | 6 Low Average | 1933 |
| 3 | 2487200875 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | NO | Very Good | 7 Average | 1965 |
| 4 | 1954400510 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | NO | Average | 8 Good | 1987 |
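
The cell above drops `yr_renovated`, the column that records renovations. If renovation status is needed later, one option is to derive a flag before the drop. A sketch, assuming that a value of 0 means the house was never renovated and that a missing value means the status is unknown (both assumptions should be confirmed against the data); `renovated` is a hypothetical column name:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; in the notebook this would be applied to kc before the drop.
sample = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "yr_renovated": [0.0, 1991.0, np.nan],
})

# Assumption: 0 = never renovated, a year = renovated, NaN = unknown.
# The nullable "boolean" dtype keeps unknowns as <NA> instead of forcing False.
sample["renovated"] = (sample["yr_renovated"] > 0).astype("boolean")
sample.loc[sample["yr_renovated"].isna(), "renovated"] = pd.NA

# The year column can then be dropped as in the original cell.
sample = sample.drop(columns=["yr_renovated"])
print(sample)
```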


In [7]:
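```python
# Preview the data with all rows containing NA values dropped. dropna() returns
# a new DataFrame, so kc itself keeps these rows unless the result is reassigned.
kc.dropna()
```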

| | id | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | condition | grade | yr_built |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6414100192 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | NO | Average | 7 Average | 1951 |
| 2 | 5631500400 | 180000.0 | 2 | 1.00 | 770 | 10000 | 1.0 | NO | Average | 6 Low Average | 1933 |
| 3 | 2487200875 | 604000.0 | 4 | 3.00 | 1960 | 5000 | 1.0 | NO | Very Good | 7 Average | 1965 |
| 4 | 1954400510 | 510000.0 | 3 | 2.00 | 1680 | 8080 | 1.0 | NO | Average | 8 Good | 1987 |
| 5 | 7237550310 | 1230000.0 | 4 | 4.50 | 5420 | 101930 | 1.0 | NO | Average | 11 Excellent | 2001 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21591 | 2997800021 | 475000.0 | 3 | 2.50 | 1310 | 1294 | 2.0 | NO | Average | 8 Good | 2008 |
| 21592 | 263000018 | 360000.0 | 3 | 2.50 | 1530 | 1131 | 3.0 | NO | Average | 8 Good | 2009 |
| 21593 | 6600060120 | 400000.0 | 4 | 2.50 | 2310 | 5813 | 2.0 | NO | Average | 8 Good | 2014 |
| 21594 | 1523300141 | 402101.0 | 2 | 0.75 | 1020 | 1350 | 2.0 | NO | Average | 7 Average | 2009 |
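
`DataFrame.dropna()` returns a new DataFrame and leaves the original untouched unless the result is assigned back. Among the columns kept after the drop, the `info()` output shows that only `waterfront` has missing values, so `dropna()` here removes exactly the 2,376 rows where `waterfront` is missing, leaving 19,221. A small sketch on hypothetical rows contrasting a discarded call, a `subset=` call, and a reassignment (`kc_demo` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking kc after the column drop.
kc_demo = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "waterfront": [np.nan, "NO", "NO"],
    "grade": ["7 Average", "7 Average", "6 Low Average"],
})

kc_demo.dropna()                 # result is discarded; kc_demo still has 3 rows
print(len(kc_demo))              # 3

# Only the listed columns are checked, so the row missing waterfront is kept.
print(len(kc_demo.dropna(subset=["price", "grade"])))  # 3

# Reassign to actually remove rows with any missing value.
kc_demo = kc_demo.dropna()
print(len(kc_demo))              # 2
```

Restricting the check with `subset=` is one way to avoid discarding rows over a column the analysis may not need.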


In [9]:
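```python
# Sort by grade in descending order and show the first 30 rows. grade is stored
# as text (e.g. '9 Better'), so the order is alphabetical, not numeric.
kc.sort_values('grade', ascending=False).head(30)
```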

| | id | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | condition | grade | yr_built |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 19724 | 3630240140 | 585000.0 | 4 | 3.0 | 2110 | 1286 | 2.0 | NO | Average | 9 Better | 2007 |
| 7907 | 6352600490 | 820000.0 | 4 | 3.5 | 2770 | 8049 | 2.0 | NO | Average | 9 Better | 2002 |
| 13921 | 4379400580 | 698000.0 | 3 | 2.5 | 2580 | 4636 | 2.0 | NO | Average | 9 Better | 2006 |
| 10193 | 3530200160 | 654950.0 | 4 | 2.5 | 2790 | 45902 | 2.0 | NO | Average | 9 Better | 1987 |
| 18480 | 7732410220 | 808000.0 | 4 | 2.25 | 2500 | 8866 | 2.0 | NO | Good | 9 Better | 1987 |
| 2151 | 2880100160 | 1010000.0 | 4 | 3.5 | 3350 | 3752 | 2.0 | NO | Average | 9 Better | 2007 |
| 20059 | 1085623640 | 428900.0 | 4 | 2.5 | 2598 | 5553 | 2.0 | NO | Average | 9 Better | 2014 |
| 6188 | 5469502780 | 350000.0 | 4 | 2.5 | 2260 | 13755 | 1.0 | NO | Good | 9 Better | 1975 |
| 4253 | 3629920600 | 619500.0 | 3 | 2.5 | 2170 | 5000 | 2.0 | | Average | 9 Better | 2003 |
| 4251 | 913000315 | 1300000.0 | 6 | 4.5 | 3902 | 3880 | 3.0 | NO | Good | 9 Better | 1977 |
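
Because `grade` is stored as text, `sort_values` compares it character by character, so `'9 Better'` sorts above `'11 Excellent'` in descending order. That matches the output above, where every top row is grade 9. A sketch that sorts on the leading integer instead, using the `key=` argument of `sort_values` (available in pandas 1.1 and later). It assumes every grade label starts with a number followed by a space, as in the rows shown; `grade_number` is a hypothetical helper:

```python
import pandas as pd

# Hypothetical rows using grade labels that appear in the outputs above.
grades = pd.DataFrame({
    "price": [585000.0, 1230000.0, 510000.0, 221900.0],
    "grade": ["9 Better", "11 Excellent", "8 Good", "7 Average"],
})

# Text sort: '9 Better' > '8 Good' > '7 Average' > '11 Excellent'.
text_order = grades.sort_values("grade", ascending=False)["grade"].tolist()

def grade_number(s: pd.Series) -> pd.Series:
    """Extract the leading integer from labels such as '11 Excellent'."""
    return s.str.split().str[0].astype(int)

# Numeric sort on the extracted integer.
numeric_order = grades.sort_values(
    "grade", ascending=False, key=grade_number
)["grade"].tolist()

print(text_order)     # ['9 Better', '8 Good', '7 Average', '11 Excellent']
print(numeric_order)  # ['11 Excellent', '9 Better', '8 Good', '7 Average']
```

Storing the extracted number in its own column would also make grade usable in `describe()` and in numeric models.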


# Data Modeling

In [6]:
```python
# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns of kc
kc.describe()
```

| | id | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | yr_built |
|---|---|---|---|---|---|---|---|---|
| count | 21597.0 | 21597.0 | 21597.0 | 21597.0 | 21597.0 | 21597.0 | 21597.0 | 21597.0 |
| mean | 4580474000.0 | 540296.6 | 3.3732 | 2.115826 | 2080.32185 | 15099.41 | 1.494096 | 1970.999676 |
| std | 2876736000.0 | 367368.1 | 0.926299 | 0.768984 | 918.106125 | 41412.64 | 0.539683 | 29.375234 |
| min | 1000102.0 | 78000.0 | 1.0 | 0.5 | 370.0 | 520.0 | 1.0 | 1900.0 |
| 25% | 2123049000.0 | 322000.0 | 3.0 | 1.75 | 1430.0 | 5040.0 | 1.0 | 1951.0 |
| 50% | 3904930000.0 | 450000.0 | 3.0 | 2.25 | 1910.0 | 7618.0 | 1.5 | 1975.0 |
| 75% | 7308900000.0 | 645000.0 | 4.0 | 2.5 | 2550.0 | 10685.0 | 2.0 | 1997.0 |
| max | 9900000000.0 | 7700000.0 | 33.0 | 8.0 | 13540.0 | 1651359.0 | 3.5 | 2015.0 |
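
The summary shows long right tails: a maximum of 33 bedrooms against a 75th percentile of 4, and a maximum price of 7,700,000 against a 75th percentile of 645,000. One common way to flag such values is Tukey's 1.5 × IQR rule. A sketch that applies it to the quartiles in the table above; the 1.5 multiplier is a convention, not something this analysis specifies, and `iqr_fences` is a hypothetical helper:

```python
from typing import Tuple

# Quartiles (25%, 75%) copied from the describe() output above.
quartiles = {
    "price": (322000.0, 645000.0),
    "bedrooms": (3.0, 4.0),
    "sqft_living": (1430.0, 2550.0),
}

def iqr_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

for column, (q1, q3) in quartiles.items():
    low, high = iqr_fences(q1, q3)
    print(f"{column}: values outside [{low:,.1f}, {high:,.1f}] would be flagged")
```

For price this gives an upper fence of 645,000 + 1.5 × 323,000 = 1,129,500. On the full frame, the quartiles could instead be computed with `kc['price'].quantile([0.25, 0.75])` and the fences applied as a boolean mask.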
