# Research The Domain
Step one is to research the project domain: look for other solutions and ideas, and find out what works and what doesn't. Start with large players that already have solutions, such as Zillow and the county assessor.

## King County Assessor
King County is where our dataset is from, so it makes sense to start there and look for clues about what the data contains and the context of the dataset.

We find a little information about the dataset there, but a lot is not covered in the appendix.

For example, it explains 'grade' and 'condition' but leaves out important features such as 'view'.

We collect the information we found into the tables below.

### Column Names and Descriptions for the King County Data Set
Information about the data, taken from multiple online sources, including the King County government website.

| Name | Description | 
|---|---|
|**id**| Unique identifier for a house |
|**date**| Date house was sold |
|**price**| The prediction target |
|**bedrooms**| Number of bedrooms per house |
|**bathrooms**| Number of bathrooms |
|**sqft_living**| Total finished square footage of the home |
|**sqft_lot**| Total square footage of the lot |
|**floors**| Total floors (levels) in the house |
|**waterfront**| Whether or not the house has waterfront access |
|**view**| Rating of the view quality |
|**condition**| Relative to age and grade. Coded 1-5. See condition below |
|**grade**| Represents the construction quality of improvements. Grades run from grade 1 to 13. See grade below |
|**sqft_above**| Square footage of the house apart from basement |
|**sqft_basement**| Square footage of the basement |
|**yr_built**| Year the house was built |
|**yr_renovated**| Year the house was renovated | 
|**zipcode**| ZIP code of the house |
|**lat**| Latitude coordinate |
|**long**| Longitude coordinate |
|**sqft_living15**| The square footage of interior housing living space for the nearest 15 neighbors |
|**sqft_lot15**| The square footage of the land lots of the nearest 15 neighbors |

### Condition:
From <a href="https://blue.kingcounty.com/Assessor/eRealProperty/ResidentialGlossary.aspx?idx=viewall&Parcel=7960900070&AreaReport=http://www.KingCounty.gov/depts/Assessor/Reports/area-reports/2019/residential-northeast/033.aspx#BuildingGradeKing">King County Assessor Website</a>.


| Index | Rating | Description |
|---|---|---|
| 1 | Poor | Repair and overhaul needed on painted surfaces, roofing, plumbing, heating and numerous functional inadequacies. Excessive deferred maintenance and abuse, limited value-in-use, approaching abandonment or major reconstruction; reuse or change in occupancy is imminent. Effective age is near the end of the scale regardless of the actual chronological age.|
| 2 | Fair | Much repair needed. Many items need refinishing or overhauling, deferred maintenance obvious, inadequate building utility and systems all shortening the life expectancy and increasing the effective age.|
| 3 | Average | Some evidence of deferred maintenance and normal obsolescence with age in that a few minor repairs are needed, along with some refinishing. All major components still functional and contributing toward an extended life expectancy. Effective age and utility is standard for like properties of its class and usage.|
| 4 | Good | No obvious maintenance required but neither is everything new. Appearance and utility are above the standard and the overall effective age will be lower than the typical property.|
| 5 | Very Good | All items well maintained, many having been overhauled and repaired as they have shown signs of wear, increasing the life expectancy and lowering the effective age with little deterioration or obsolescence evident with a high degree of utility.|

### Grade:
Also from <a href="https://blue.kingcounty.com/Assessor/eRealProperty/ResidentialGlossary.aspx?idx=viewall&Parcel=7960900070&AreaReport=http://www.KingCounty.gov/depts/Assessor/Reports/area-reports/2019/residential-northeast/033.aspx#BuildingGradeKing">King County Assessor Website</a>.

| Index | Description |
|---|---|
| 1-3 | Falls short of minimum building standards. Normally cabin or inferior structure.|
| 4 | Generally older, low quality construction. Does not meet code.|
| 5 | Low construction costs and workmanship. Small, simple design.|
| 6 | Lowest grade currently meeting building code. Low quality materials and simple designs.|
| 7 | Average grade of construction and design. Commonly seen in plats and older sub-divisions.|
| 8 | Just above average in construction and design. Usually better materials in both the exterior and interior finish work.|
| 9 | Better architectural design with extra interior and exterior design and quality.|
| 10 | Homes of this quality generally have high quality features. Finish work is better and more design quality is seen in the floor plans. Generally have a larger square footage.|
| 11 | Custom design and higher quality finish work with added amenities of solid woods, bathroom fixtures and more luxurious options.| 
| 12 | Custom design and excellent builders. All materials are of the highest quality and all conveniences are present.|
| 13 | Generally custom designed and built. Mansion level. Large amount of highest quality cabinet work, wood trim, marble, entry ways etc.|

## Zillow
Zillow is the leader in home price predictions. They use a proprietary algorithm that makes predictions based on many different inputs, including features such as location, neighborhood, proximity to good schools, walk score, crime, a house's purchase history, and others that are not disclosed.

Some possible features that are easily obtainable and can be used for this project are: walk score, proximity to good schools, and crime.

### Walk Score
Walk Score is an easy attribute to add to our dataset, since there is a public API that we can query with an address and lat/long. Since we do not have any addresses, though, we have to reverse look up the address from the lat and long.
This is possible through the Bing API, and probably any other map/search provider, though Bing is the most easily available option.
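
The reverse lookup can be sketched as follows. This is a minimal illustration, not the script used in the project: it assumes the Bing Maps REST Locations endpoint and its usual `resourceSets` / `resources` / `address.formattedAddress` response layout, and the function names `reverse_geocode_url` and `first_address` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: Bing Maps REST "find a location by point" service.
BING_LOCATIONS_URL = "https://dev.virtualearth.net/REST/v1/Locations"


def reverse_geocode_url(lat, long, api_key):
    """Build the request URL asking Bing for the address at (lat, long)."""
    point = f"{lat},{long}"
    return f"{BING_LOCATIONS_URL}/{point}?{urlencode({'key': api_key})}"


def first_address(response_json):
    """Return the first formatted address in a parsed response, or None.

    The nesting used here is the assumed response layout; adjust it if the
    provider returns something different.
    """
    for resource_set in response_json.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            address = resource.get("address", {}).get("formattedAddress")
            if address:
                return address
    return None
```

The URL can then be fetched with any HTTP client, and the parsed JSON passed to `first_address` to get one address string per house.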

Once we had the address, we were ready to query the Walk Score API for each house in our dataset, and save the scores in new columns: Walk Score, Drive Score, Transit Score.
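
A query for one house might look like the sketch below. It assumes the public Walk Score endpoint with its `wsapikey`, `address`, `lat`, `lon`, and `transit` parameters, and a JSON response with a `walkscore` field and a nested `transit.score` field. The helper names are hypothetical, and only the walk and transit scores are extracted, since the response field behind the Drive Score column is not something this sketch can confirm.

```python
from urllib.parse import urlencode

# Assumed endpoint for the public Walk Score API.
WALKSCORE_URL = "https://api.walkscore.com/score"


def walkscore_url(address, lat, long, api_key):
    """Build the Walk Score request URL for one house."""
    params = {
        "format": "json",
        "address": address,
        "lat": lat,
        "lon": long,
        "transit": 1,  # also request the transit score
        "wsapikey": api_key,
    }
    return f"{WALKSCORE_URL}?{urlencode(params)}"


def extract_scores(response_json):
    """Pick out the scores to store as new columns (None when missing)."""
    transit = response_json.get("transit") or {}
    return {
        "walk_score": response_json.get("walkscore"),
        "transit_score": transit.get("score"),
    }
```

Returning `None` for a missing score keeps the row in the dataset, so the gaps can be handled later during cleaning.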

### Proximity to School (GreatSchools.com)
For the proximity to schools, we need a list of all the highly rated schools in King County, and then the driving distance and time from each house to each school.

This was a little bit more difficult, because the GreatSchools API did not appear to offer a simple route for obtaining the data I was looking for, which is a list of best schools and their respective ratings and addresses.

Instead I wrote a scraper script that searched each city in the dataset and sorted the results by rating, highest first. From each city's sorted results page I scraped the schools rated 9 and 10 and saved their information in a new file.

After I had a list of all the 9 and 10 rated schools in King County, I was ready to use the Bing API again, this time to find the drive time and driving distance from each house to the nearest schools. Because I had a limited number of API calls, I could not get the drive time to each of the 50+ schools in King County, so I used the Haversine distance to narrow the list down to the 10 closest schools, and then got the drive time and distance to those 10 schools only.
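
The pre-filtering step can be sketched like this. It is an illustration under stated assumptions, not the project's actual code: houses and schools are represented as dictionaries with `lat` and `long` keys, and `haversine_km` and `closest_schools` are hypothetical names.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in kilometers between two lat/long points."""
    lat1, long1, lat2, long2 = map(radians, (lat1, long1, lat2, long2))
    dlat = lat2 - lat1
    dlong = long2 - long1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlong / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_schools(house, schools, k=10):
    """Return the k schools nearest to the house by straight-line distance."""
    def distance(school):
        return haversine_km(house["lat"], house["long"],
                            school["lat"], school["long"])
    return sorted(schools, key=distance)[:k]
```

Only the schools returned by `closest_schools` are then sent to the routing API, which caps the number of calls at `k` per house. Straight-line distance is only a proxy for drive time, so the closest school by road is not guaranteed to be in the shortlist, but it keeps the number of API calls manageable.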

I saved the new data as two separate columns, 'school_kms' and 'school_mins', which hold the driving distance in kilometers and the driving time in minutes to the nearest school.

### Crime
If I had more time, I would use the King County crime rate information to create a new column based on the crime rate of the neighborhood each house is in.