# Jacksonville Starter Home Predictor

House sales have soared since the onset of the pandemic. Demand surged as homes became home offices and home schools, and as many apartment renters sought more space and more control over their environment. Buyers also benefited from low interest rates and stimulus funds.

Inventory has not kept pace, however, because homeowners who didn't have to sell hunkered down. As a result, prices have climbed sharply, pricing many first-time buyers out of the market. That gap creates a business opportunity for homebuilders.

With that in mind, I will build a recommender system that helps developers and builders understand the home characteristics first-time homebuyers are looking for and where in Jacksonville they are looking.

## 1. Data

The Duval County Property Appraiser maintains a database of Jacksonville property parcels, including each property's total square feet, heated square feet, market value, tax exemptions, property use, zoning rights, sale history and more.

The data is available for download as text (.txt) files <a href="https://www.coj.net/departments/property-appraiser/information-offerings" >here</a>.

## 2. Method

I will employ two modeling methods:
- __Random Forest:__ A supervised learning algorithm that combines many decision trees. A single decision tree splits the data branch by branch until each group is mostly homogeneous, but it fits its training data so closely that small changes in that data can produce a very different tree and poor predictions on new data. A random forest reduces this variance by training each tree on a bootstrap sample of the data (considering a random subset of features at each split) and combining the trees' predictions by majority vote or averaging.
- __K-Nearest Neighbors:__ A supervised learning algorithm that classifies a new data point by finding the k labeled training points closest to it (by a distance measure such as Euclidean distance) and assigning the majority label among those neighbors, where k is a number chosen in advance.
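To make both methods concrete, here is a minimal, standard-library-only sketch on a toy dataset. K-nearest neighbors labels a new point by majority vote among its k closest labeled points; the forest trains many small trees on bootstrap samples and lets them vote. The features (heated square feet in thousands, age in decades) and the `"starter"`/`"move-up"` labels are hypothetical. The forest is deliberately simplified: each tree is a one-split "stump" on a single randomly chosen feature, whereas a full random forest (for example scikit-learn's `RandomForestClassifier`) grows deeper trees and samples features at every split.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: (heated sq ft in thousands, age in decades) -> label.
TRAIN = [
    ((1.2, 4.0), "starter"),
    ((1.4, 3.5), "starter"),
    ((1.1, 5.0), "starter"),
    ((3.0, 1.0), "move-up"),
    ((2.8, 0.5), "move-up"),
    ((3.4, 1.5), "move-up"),
]


def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_stump(rows, feature):
    """Fit a one-split decision tree on one feature, minimizing misclassified rows."""
    values = sorted({x[feature] for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # split halfway between neighboring values
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # only one distinct value: fall back to the majority label
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return lambda x: majority
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x[feature] <= threshold else right_label


def random_forest(train, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each on one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(train[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(train) for _ in train]  # bootstrap: sample with replacement
        trees.append(fit_stump(sample, rng.randrange(n_features)))
    return trees


def forest_predict(trees, point):
    """Combine the trees' predictions by majority vote."""
    return Counter(tree(point) for tree in trees).most_common(1)[0][0]


forest = random_forest(TRAIN)
print(knn_predict(TRAIN, (1.3, 4.2)), forest_predict(forest, (1.3, 4.2)))
```

Both predictors should label a small, older house such as `(1.3, 4.2)` as `"starter"`. Individual stumps can disagree because each sees a different bootstrap sample; the majority vote is what smooths those differences out.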

## 3. Data Cleaning

In gathering the data, I first had to combine a file of sales data and a file of parcel data. To do this, I loaded both into a SQLite database. I also added a table of property appraiser qualification codes, which indicate whether a sale was an arm's-length transaction, involved multiple parcels, resulted from a bankruptcy, etc.

- <a href= "https://github.com/WillRobinson152/JacksonvilleHousingAnalysis/blob/main/1.GetData.ipynb" >Data acquisition</a>
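A minimal sketch of loading delimited text exports into SQLite with Python's built-in `csv` and `sqlite3` modules. The table names, column names, sample rows and the pipe delimiter are all assumptions for illustration; the appraiser's actual export layout may differ, and in practice the downloaded files would be opened instead of the inline strings used here.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for the appraiser's text exports (layout assumed).
PARCEL_TXT = """RE|PropertyUse|HeatedSqFt|MarketValue
0001|0100|1450|210000
0002|0100|2600|385000
0003|1100|5200|900000
"""
SALES_TXT = """RE|SaleDate|SalePrice|QualCode
0001|2021-03-15|235000|01
0002|2021-06-02|400000|11
"""
QUAL_TXT = """QualCode|Description
01|Qualified arm's-length sale
11|Unqualified: multiple parcels
"""


def load_table(conn, name, text, delimiter="|"):
    """Create table `name` from delimited text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{col}"' for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)


conn = sqlite3.connect(":memory:")
for table, text in [("parcels", PARCEL_TXT), ("sales", SALES_TXT), ("qual_codes", QUAL_TXT)]:
    load_table(conn, table, text)
conn.commit()
```

Values are stored as text here, which keeps leading zeros in codes such as `01`; numeric columns would need casting before analysis.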

I then ran queries to limit the results to single-family detached homes.

- <a href="https://github.com/WillRobinson152/JacksonvilleHousingAnalysis/blob/main/2.MergeData.ipynb" >Data merging</a>
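The merge-and-filter step can be sketched as one SQL query. The schema is hypothetical, and `'0100'` is a placeholder rather than necessarily the appraiser's code for single-family detached homes. Each sale is joined to its parcel, a `LEFT JOIN` attaches a readable qualification description (keeping sales whose code is missing from that table), and the `WHERE` clause keeps only the assumed single-family use code.

```python
import sqlite3

# Hypothetical schema and sample rows; real field names and codes may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcels (re TEXT PRIMARY KEY, property_use TEXT, heated_sqft INTEGER);
CREATE TABLE sales (re TEXT, sale_date TEXT, sale_price INTEGER, qual_code TEXT);
CREATE TABLE qual_codes (qual_code TEXT PRIMARY KEY, description TEXT);

INSERT INTO parcels VALUES ('0001', '0100', 1450), ('0002', '0100', 2600), ('0003', '1100', 5200);
INSERT INTO sales VALUES
  ('0001', '2021-03-15', 235000, '01'),
  ('0002', '2021-06-02', 400000, '11'),
  ('0003', '2021-07-20', 950000, '01');
INSERT INTO qual_codes VALUES
  ('01', 'Qualified arm''s-length sale'),
  ('11', 'Unqualified: multiple parcels');
""")

SINGLE_FAMILY_USE = "0100"  # ASSUMPTION: placeholder single-family use code

rows = conn.execute(
    """
    SELECT s.re, s.sale_date, s.sale_price, p.heated_sqft, q.description
    FROM sales AS s
    JOIN parcels AS p ON p.re = s.re
    LEFT JOIN qual_codes AS q ON q.qual_code = s.qual_code
    WHERE p.property_use = ?
    ORDER BY s.sale_date
    """,
    (SINGLE_FAMILY_USE,),
).fetchall()

for row in rows:
    print(row)
```

Passing the use code as a `?` parameter rather than formatting it into the SQL string keeps the query safe and easy to reuse for other property types.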

The merged data needed several cleaning steps. One used string comparison to standardize the text of mailing addresses, cities, states, ZIP codes and owner names. This step aided exploratory data analysis, but many of these features were not used by the models.
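One way to standardize messy text by string comparison is fuzzy matching against a list of known values. This sketch uses Python's `difflib.get_close_matches`; the canonical city list and the 0.8 similarity cutoff are assumptions, and the project's notebook may use a different method.

```python
import difflib

# Hypothetical reference list of canonical city spellings.
CANONICAL_CITIES = ["JACKSONVILLE", "JACKSONVILLE BEACH", "ATLANTIC BEACH", "NEPTUNE BEACH", "BALDWIN"]


def standardize(value, choices, cutoff=0.8):
    """Map a messy string to its closest canonical value, or keep it if nothing is close."""
    # Normalize case, drop periods and collapse repeated whitespace first.
    cleaned = " ".join(value.upper().replace(".", " ").split())
    matches = difflib.get_close_matches(cleaned, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


print(standardize("Jacksonvile", CANONICAL_CITIES))   # JACKSONVILLE
print(standardize("Neptune Bch.", CANONICAL_CITIES))  # NEPTUNE BEACH
print(standardize("Miami", CANONICAL_CITIES))         # MIAMI (no close match; kept)
```

The cutoff trades off recall against false merges: set it too low and distinct places such as `JACKSONVILLE` and `JACKSONVILLE BEACH` start collapsing into one another.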

Another step created an adjusted valuation that assigns a distinct value to each house on parcels that contain more than one house. Finally, I condensed the dataset by creating binary columns indicating whether each property has each of the 10 most common home features, the 10 most common home subfeatures, the 10 most common property characteristics and the 10 most common zoning rights. This allowed me to create one row per property and to limit the total number of columns by eliminating hundreds of rare land uses, features and other attributes.
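The README does not spell out how the adjusted valuation is computed. Purely as an illustration, this sketch assumes a parcel's value is split among its houses in proportion to each house's heated square feet, with an even split as a fallback; the function name and dictionary keys are hypothetical.

```python
from collections import defaultdict


def adjusted_values(buildings):
    """Give each house its own value by splitting its parcel's value.

    ASSUMPTION: the split is proportional to heated square feet. `buildings`
    is a list of dicts with hypothetical keys 'parcel', 'building',
    'heated_sqft' and 'parcel_value'. A single-house parcel keeps its full value."""
    total_sqft = defaultdict(int)
    house_count = defaultdict(int)
    for b in buildings:
        total_sqft[b["parcel"]] += b["heated_sqft"]
        house_count[b["parcel"]] += 1

    values = {}
    for b in buildings:
        total = total_sqft[b["parcel"]]
        # Fall back to an even split if no heated square footage is recorded.
        share = b["heated_sqft"] / total if total else 1 / house_count[b["parcel"]]
        values[(b["parcel"], b["building"])] = round(b["parcel_value"] * share)
    return values


example = [
    {"parcel": "A", "building": 1, "heated_sqft": 1200, "parcel_value": 300000},
    {"parcel": "A", "building": 2, "heated_sqft": 800, "parcel_value": 300000},
    {"parcel": "B", "building": 1, "heated_sqft": 1500, "parcel_value": 250000},
]
print(adjusted_values(example))
# {('A', 1): 180000, ('A', 2): 120000, ('B', 1): 250000}
```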

- <a href="https://github.com/WillRobinson152/JacksonvilleHousingAnalysis/blob/main/3.CleanData.ipynb" >Data cleaning</a>
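The condensing step amounts to pivoting a long list of (property, feature) pairs into one wide row of 0/1 flags per property, keeping only the most common features as columns. A minimal sketch with hypothetical feature names; under this approach the same function would be applied separately to features, subfeatures, property characteristics and zoning rights, each with `n=10` (the example uses `n=2` to stay short).

```python
from collections import Counter


def top_n_flags(records, n=10):
    """Pivot (property_id, feature) pairs into one row of 0/1 flags per property.

    Only the n most common features become columns; rarer ones are dropped,
    but every property still gets a row (possibly all zeros)."""
    unique = list(dict.fromkeys(records))  # drop duplicate pairs, keep first-seen order
    counts = Counter(feature for _, feature in unique)
    top = [feature for feature, _ in counts.most_common(n)]
    rows = {}
    for prop_id, feature in unique:
        row = rows.setdefault(prop_id, dict.fromkeys(top, 0))
        if feature in row:
            row[feature] = 1
    return top, rows


records = [
    ("A", "FIREPLACE"), ("A", "POOL"),
    ("B", "FIREPLACE"), ("B", "SCREEN PORCH"),
    ("C", "FIREPLACE"), ("C", "SCREEN PORCH"), ("C", "SAUNA"),
]
columns, flags = top_n_flags(records, n=2)
print(columns)  # ['FIREPLACE', 'SCREEN PORCH']
for prop_id, row in flags.items():
    print(prop_id, row)
```

Capping each category at its top `n` values is what keeps the column count fixed no matter how many rare features appear in the raw data.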



## 4. Exploratory Data Analysis