# Project Title

Building a Model for Home Sellers.

# Overview

We use multiple linear regression to analyze house sales in King County, a county in the northwestern United States.

# Business and Data Understanding

The data used for this project describes homes sold between 5/2/2014 and 5/27/2015. It includes the sale price, which is the target variable, so each feature's relationship with the target can be compared.

The dataset contains the following columns:
* `id` - Unique identifier for a house
* `date` - Date house was sold
* `price` - Sale price (prediction target)
* `bedrooms` - Number of bedrooms
* `bathrooms` - Number of bathrooms
* `sqft_living` - Square footage of living space in the home
* `sqft_lot` - Square footage of the lot
* `floors` - Number of floors (levels) in house
* `waterfront` - Whether the house is on a waterfront
  * Includes Duwamish, Elliott Bay, Puget Sound, Lake Union, Ship Canal, Lake Washington, Lake Sammamish, other lake, and river/slough waterfronts
* `view` - Quality of view from house
  * Includes views of Mt. Rainier, Olympics, Cascades, Territorial, Seattle Skyline, Puget Sound, Lake Washington, Lake Sammamish, small lake / river / creek, and other
* `condition` - How good the overall condition of the house is. Related to maintenance of house.
    * 1 = Poor: Worn out. Repair and overhaul needed on painted surfaces, roofing, plumbing, heating and numerous functional inadequacies. Excessive deferred maintenance and abuse, limited value-in-use, approaching abandonment or major reconstruction.
    * 2 = Fair: Badly worn. Much repair needed. Many items need refinishing or overhauling, deferred maintenance obvious, inadequate building utility and systems all shortening the life expectancy and increasing the effective age.
    * 3 = Average: Some evidence of deferred maintenance and normal obsolescence with age in that a few minor repairs are needed, along with some refinishing. All major components still functional and contributing toward an extended life expectancy. Effective age and utility is standard for like properties of its class and usage.
    * 4 = Good: No obvious maintenance required but neither is everything new. Appearance and utility are above the standard and the overall effective age will be lower than the typical property.
    * 5 = Very Good: All items well maintained, many having been overhauled and repaired as they have shown signs of wear, increasing the life expectancy and lowering the effective age with little deterioration or obsolescence evident with a high degree of utility.
* `grade` - Overall grade of the house. Related to the construction and design of the house.
    * Represents the construction quality of improvements. Grades run from grade 1 to 13. Generally defined as:
        * 1-3 Falls short of minimum building standards. Normally cabin or inferior structure.
        * 4 Generally older, low quality construction. Does not meet code.
        * 5 Low construction costs and workmanship. Small, simple design.
        * 6 Lowest grade currently meeting building code. Low quality materials and simple designs.
        * 7 Average grade of construction and design. Commonly seen in plats and older sub-divisions.
        * 8 Just above average in construction and design. Usually better materials in both the exterior and interior finish work.
        * 9 Better architectural design with extra interior and exterior design and quality.
        * 10 Homes of this quality generally have high quality features. Finish work is better and more design quality is seen in the floor plans. Generally have a larger square footage.
        * 11 Custom design and higher quality finish work with added amenities of solid woods, bathroom fixtures and more luxurious options.
        * 12 Custom design and excellent builders. All materials are of the highest quality and all conveniences are present.
        * 13 Generally custom designed and built. Mansion level. Large amount of highest quality cabinet work, wood trim, marble, entry ways etc.
* `sqft_above` - Square footage of house apart from basement
* `sqft_basement` - Square footage of the basement
* `yr_built` - Year when house was built
* `yr_renovated` - Year when house was renovated
* `zipcode` - ZIP Code used by the United States Postal Service
* `lat` - Latitude coordinate
* `long` - Longitude coordinate
* `sqft_living15` - The square footage of interior housing living space for the nearest 15 neighbors
* `sqft_lot15` - The square footage of the land lots of the nearest 15 neighbors

# Methodology

Exploratory analysis and model building were done with ordinary least squares (OLS) regression from the statsmodels library.
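As a rough illustration of what an OLS fit with statsmodels looks like, here is a minimal sketch. It runs on synthetic data generated in the snippet itself, not on the project's dataset. The choice of predictor columns, the coefficients used to simulate `price`, and the use of the formula API are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the housing data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "grade": rng.integers(4, 12, size=n),
    "bathrooms": rng.choice([1.0, 1.5, 2.0, 2.5, 3.0], size=n),
    "sqft_living": rng.uniform(600, 4000, size=n),
})

# Made-up price relationship plus noise, for illustration only.
df["price"] = (
    50_000 * df["grade"]
    + 20_000 * df["bathrooms"]
    + 150 * df["sqft_living"]
    + rng.normal(0, 100_000, size=n)
)

# Fit price on the chosen features with ordinary least squares.
model = smf.ols("price ~ grade + bathrooms + sqft_living", data=df).fit()

print(model.params)    # intercept and one coefficient per feature
print(model.rsquared)  # share of variance in price explained by the fit
```

Calling `model.summary()` prints the full regression table, including coefficients, standard errors, and p-values.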

# Modelling

The modelling process worked with the raw data, then scaled and unscaled versions of it, and finally a log transformation. At each step we explored the relationship between the target and its correlated features in order to build a model with a high R-squared score.
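The two transformations mentioned above can be sketched as follows. The values are made up for illustration, and z-score standardization is an assumption: this README does not say which scaling method the notebooks use.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample (NOT the real data); column names follow the schema above.
df = pd.DataFrame({
    "price": [200_000.0, 350_000.0, 500_000.0, 800_000.0, 1_500_000.0],
    "sqft_living": [900.0, 1500.0, 2000.0, 2800.0, 4500.0],
})

# Log transform: compresses large values; requires strictly positive inputs.
df["log_price"] = np.log(df["price"])

# Scaling (assumed here to be z-score standardization): mean 0, std 1.
mean = df["sqft_living"].mean()
std = df["sqft_living"].std(ddof=0)
df["sqft_living_scaled"] = (df["sqft_living"] - mean) / std

# Values on the log scale are mapped back to dollars with np.exp.
recovered_price = np.exp(df["log_price"])
```

When the target is log-transformed, coefficients describe approximate percentage changes in price rather than dollar changes, so predictions must be exponentiated before being reported in dollars.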

# Regression Results

After analysis and exploration, we arrived at a model with three features that home sellers can control:
* `grade`
* `bathrooms`
* `sqft_living`

These features yield an R-squared of 0.52, meaning the model explains about 52% of the variance in the target.
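For readers unfamiliar with the metric, R-squared compares the model's squared errors with those of a baseline that always predicts the mean. The helper below is a hypothetical illustration of that definition on made-up numbers, not code from the project notebooks.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot, the coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the fit
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of the mean baseline
    return 1.0 - ss_res / ss_tot

# Made-up numbers for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y_true, y_true)       # predictions match exactly -> 1.0
baseline = r_squared(y_true, [6.0] * 4)   # always predict the mean   -> 0.0
```

A score of 0.52 sits roughly halfway between these two extremes: the three features remove about half of the squared error of the mean-only baseline.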

# Conclusion

Based on the regression results, we conclude that home sellers should pay attention to grade and the number of bathrooms when building a house, while also considering the square footage of the living area.

# Authors

* Crystal Gould Perrott
* Titilayo Amuwo
* Michael Ajayi

# Repository Navigation



> ## Data



* External King County housing data from Kaggle
* Processed data: final datasets used for analysis
* Transformed dataset
* Scaled dataset

> ## Notebooks

* Final notebook
* ipynb_checkpoints folders
* EDA and modelling ipynb notebook
* Train Data ipynb notebook
* Data Evaluation ipynb notebook
* Baseline ipynb notebook

> ### README.md