# Housing Price Trends and Factors in Current Day America

Table of Contents:
    
1. Rising Concerns for Housing
2. A Preliminary Look at the Data
3. *Later Thing*
4. *Later Thing*


By Eric Chi and Fox Davenport

# Rising Concerns for Housing

## Goal of Our Project

This project addresses a growing concern among Generation Z and Millennials: homeownership is becoming harder to attain as housing prices and rental costs rise. Our analysis aims to identify and quantify the factors that influence housing prices and rent. The goal is to build a multiple linear regression (MLR) model that can reasonably predict housing and rent prices, so that consumers can understand current housing trends and, with this newfound knowledge, make more informed financial decisions. We will start by fitting an MLR with all potential factors in our data. Using a combination of partial F-tests and ANOVAs, we will determine the significant predictors. Then we will use a machine learning workflow to create and train candidate MLRs before performing an F-test to determine the best model. Finally, we will check that the MLR model assumptions are satisfied before drawing conclusions.
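
To make the partial F-test step concrete, here is a minimal sketch on synthetic data. The variable names (`area`, `quality`, `noise_feature`), the simulated prices, and the helper functions are all assumptions made up for illustration; they are not taken from the Ames data and are not necessarily how the final analysis is implemented. The test compares a full model with a reduced model that drops one predictor, using $F = \frac{(RSS_{reduced} - RSS_{full})/q}{RSS_{full}/(n - p_{full})}$.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (X already holds an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f_statistic(X_reduced, X_full, y):
    """F statistic for testing whether the columns dropped from X_full matter."""
    n = len(y)
    p_full = X_full.shape[1]            # parameters in the full model
    q = p_full - X_reduced.shape[1]     # number of predictors being tested
    rss_reduced = rss(X_reduced, y)
    rss_full = rss(X_full, y)
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - p_full))

# Synthetic (made-up) data: price depends on area and quality, not on noise_feature.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(1500, 300, n)
quality = rng.integers(1, 11, n).astype(float)
noise_feature = rng.normal(size=n)
price = 50_000 + 80 * area + 10_000 * quality + rng.normal(0, 20_000, n)

ones = np.ones(n)
X_full = np.column_stack([ones, area, quality, noise_feature])
X_without_quality = np.column_stack([ones, area, noise_feature])
X_without_noise = np.column_stack([ones, area, quality])

f_quality = partial_f_statistic(X_without_quality, X_full, price)
f_noise = partial_f_statistic(X_without_noise, X_full, price)
print(f"F for dropping quality:       {f_quality:.2f}")
print(f"F for dropping noise_feature: {f_noise:.2f}")
```

A large F statistic suggests the dropped predictor carries real information, while a value near 1 suggests it can be removed. If SciPy is available, a p-value can be obtained with `scipy.stats.f.sf(F, q, n - p_full)`.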

## Why Housing?

Day after day, the news reports on the struggles that Generation Z and Millennials face. There is a growing shared sentiment among both generations that the world is filled with dread and gloom: it is becoming harder to survive and live in, which causes worries about the future. Housing is one of these issues, with many people exclaiming that rent and housing prices only seem to go up. There are countless stories of people paying outrageous prices for poor living situations that would have been cheaper in the past.

As students looking to work in the data science field, we can use this project to practice our abilities in pattern recognition and data management on a topic that concerns us and our colleagues. I, Eric, am a math/economics major, so a topic such as the housing market and the variables that make it fluctuate ties directly into my studies in business and mathematical modeling. I, Fox, am a financial actuarial math major, so analyzing the factors that contribute to an economic trend will be part of my daily work life in the future. One of our greatest personal fears is not being able to secure a comfortable and proper living after college. Housing is crucial for that lifestyle, and understanding the general trends that affect housing prices and rental costs will give us an advantage in decision making when we enter the housing market.

# A Preliminary Look at the Data

Our data was taken from Kaggle and is an extensive housing dataset for the Ames, Iowa region. We will take precautions to prevent overfitting and to make the model as generalizable as possible to the rest of the USA. The dataset already comes split into training and testing data.

https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
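
One common precaution against overfitting is k-fold cross-validation, which scores a model only on rows it was not fitted on. The sketch below is an assumption about what such a check could look like, not a description of the exact precautions used in this project. It uses made-up synthetic data and hypothetical helper names (`kfold_rmse`, `in_sample_rmse`) to show how adding pure-noise predictors improves the in-sample fit while worsening the held-out error.

```python
import numpy as np

def in_sample_rmse(X, y):
    """RMSE of an OLS fit evaluated on the same rows it was fitted on."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((y - X @ beta) ** 2)))

def kfold_rmse(X, y, k=5, seed=0):
    """Average held-out RMSE of an OLS fit over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        resid = y[test_idx] - X[test_idx] @ beta
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))

# Synthetic (made-up) data: y depends on a single predictor x.
rng = np.random.default_rng(1)
n = 120
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

X_simple = np.column_stack([np.ones(n), x])
# Deliberately over-parameterized design: 40 extra columns of pure noise.
X_bloated = np.column_stack([X_simple, rng.normal(size=(n, 40))])

fit_simple, fit_bloated = in_sample_rmse(X_simple, y), in_sample_rmse(X_bloated, y)
cv_simple, cv_bloated = kfold_rmse(X_simple, y), kfold_rmse(X_bloated, y)
print(f"in-sample RMSE: simple={fit_simple:.3f}, bloated={fit_bloated:.3f}")
print(f"5-fold RMSE:    simple={cv_simple:.3f}, bloated={cv_bloated:.3f}")
```

The bloated model always looks at least as good in-sample because it nests the simple one, so only the held-out score reveals that the extra predictors hurt.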

In [93]:
# import packages
import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import tensorflow as tf
from tensorflow.keras import layers, initializers, utils
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

In [112]:
# Read in our data. Kaggle already provides separate training and test sets
train_data = pd.read_csv('https://github.com/FoxDavenport/PIC16BFinalProject/blob/main/train.csv?raw=true')

test_data = pd.read_csv('https://github.com/FoxDavenport/PIC16BFinalProject/blob/main/test.csv?raw=true')

## Data Exploration and Processing 

Let's observe what information our dataset contains, how it's structured, and what it looks like. We will only do this for the training data.

In [126]:
train_data.shape

(1460, 81)

The training data has 1460 houses and 81 columns for each house (the `Id` identifier, the `SalePrice` target, and 79 candidate features). We will use all entries from the training data, as the size is small enough that crashes shouldn't occur.
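
Beyond the shape, it can help to summarize each column's data type, number of missing values, and number of distinct values before choosing predictors. The sketch below runs on a tiny hand-made frame: the column names are borrowed from the Ames data, but the values are invented for illustration and `toy` is a hypothetical stand-in for `train_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in for train_data (values are made up).
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, 80.0, np.nan, 60.0],
    "Alley": [None, None, "Grvl", None],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor"],
})

# One row per column: data type, missing-value count, distinct-value count.
summary = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "n_missing": toy.isna().sum(),
    "n_unique": toy.nunique(),
})
print(summary)

# Split columns by type; categorical ones need encoding before entering an MLR.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

The same three lines that build `summary` should work unchanged on `train_data`, since they rely only on standard pandas methods.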

Let us examine which features are potential factors for our housing price.

In [136]:
# Gets column names
train_data.columns

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive