# Project Report

**Team Members**  
- Purva Tandel  
- Sushanth Thota  
- Armaan Ashfaque  
- Vamsi Dath  
- Abhiram Vasudeva  

**GitHub Repository**  
[GitHub Repository Link](https://github.com/cs418-fa24/project-check-in-team.git)

## Introduction

Our project aims to provide forward-looking forecasts for properties available for trade across various cities in the United States. The objective is to enable users to make well-informed decisions regarding their real estate investments by assessing potential risks and opportunities based on the historical appreciation trends of properties. 

The dataset we have utilized for this analysis comes from the **Zillow website**, specifically from the **Zillow Home Value Index (ZHVI)**  
[Zillow Home Value Index Link](https://www.zillow.com/research/data/).  
The ZHVI serves as a critical metric, representing the typical home value and tracking market changes for homes within the 35th to 65th percentile range. This data is available in both a smoothed, seasonally adjusted format and a raw measure, allowing for different types of analysis. Our dataset includes house prices on a monthly basis for every zip code in the United States, providing comprehensive coverage at various levels of geographical granularity—county, city, state, and zip code.

We focus on answering several key questions that are crucial for real estate investors and stakeholders:
1. The influence of macroeconomic indicators, such as mortgage rates, on the average percentage change in property values across the United States. This will help us understand the broader economic forces affecting real estate markets.
2. Seasonal trend analysis of changes in house prices over the past two decades, to identify recurring patterns or shifts in the housing market at different times of the year.
3. The impact of election results on housing prices, exploring whether political cycles and outcomes have a noticeable effect on the real estate market.

Additionally, we have developed a predictive model to forecast house prices over different time horizons: the next month, quarter, and year. This model compares our forecasts to Zillow's own estimates for future home values, allowing us to assess the accuracy and reliability of our predictions.

## Changes

Initially, our project was designed to use a different dataset that was unfortunately made unavailable on the website. As a result, we transitioned to using Zillow’s dataset, which was conveniently available in CSV format. This dataset provided us with the necessary information to carry forward our analysis on real estate trends across the United States.

In addition to this, we also incorporated the **Mortgage Rate dataset**, specifically the **MORTGAGE30US.csv**, which we acquired from the Federal Reserve Economic Data (FRED) website.  
[Mortgage Rate Data Link](https://fred.stlouisfed.org/series/MORTGAGE30US)  
This dataset allowed us to explore the influence of macroeconomic factors, particularly mortgage rates, on the fluctuations in house prices, helping us examine how changes in mortgage rates might impact the housing market.

## Data Cleaning

### Zillow CSV Files
- **Removal of Redundant Metadata and Columns**: The initial step involved eliminating unnecessary metadata and irrelevant columns from the Zillow CSV files. This ensured that only the essential data related to property values and relevant time periods were retained for analysis.
- **Handling Missing Values (NaN)**: To address any missing data (NaN values), we employed a forward fill method. This technique ensured that any missing entries were filled with the most recent valid data point, maintaining the integrity of the time series.
- **Converting Datetime Columns to Numeric Values**: In order to calculate the percentage change in house prices on a monthly basis, we converted the datetime columns to numeric values, specifically using Unix timestamps. This transformation enabled us to perform efficient time-based calculations and facilitate percentage change calculations.
- **Calculating Monthly Percentage Change in Property Values**: Rather than working with absolute property price values, we calculated the monthly percentage change in property prices for each ZIP code. This transformation allowed for a more standardized comparison of price movements across different regions and time periods.
- **Yearly Average Percentage Change**: To get a broader view of market trends, we calculated the yearly average percentage change in property values across various states in the U.S. This was achieved by grouping the data by state and averaging the percentage changes in property values over the course of each year.
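
The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not our exact notebook code: the toy table, the column names (`RegionName`, `State`, `Date`, `Price`, `PctChange`, `Year`), and all values are assumptions made for the example, and it keeps dates as pandas datetimes rather than Unix timestamps for readability.

```python
import pandas as pd

# Toy stand-in for a cleaned Zillow table: one row per ZIP code, one column
# per month. All names and values here are made up for illustration.
wide = pd.DataFrame(
    {
        "RegionName": ["60601", "60602", "10001"],
        "State": ["IL", "IL", "NY"],
        "2023-11-30": [100.0, 200.0, 400.0],
        "2023-12-31": [float("nan"), 210.0, 404.0],
        "2024-01-31": [110.0, 210.0, 408.04],
    }
)

# Reshape to long format: one row per (ZIP code, month).
long = wide.melt(id_vars=["RegionName", "State"], var_name="Date", value_name="Price")
long["Date"] = pd.to_datetime(long["Date"])
long = long.sort_values(["RegionName", "Date"])

# Forward fill within each ZIP code so a gap takes the last known value.
long["Price"] = long.groupby("RegionName")["Price"].ffill()

# Month-over-month percentage change, computed separately per ZIP code.
long["PctChange"] = long.groupby("RegionName")["Price"].pct_change() * 100

# Yearly average percentage change per state.
long["Year"] = long["Date"].dt.year
yearly = (
    long.dropna(subset=["PctChange"])  # first month of each ZIP has no change
    .groupby(["State", "Year"])["PctChange"]
    .mean()
    .reset_index()
)
print(yearly)
```

Grouping by ZIP code before the forward fill and the percentage change keeps values from one ZIP code from leaking into the next.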

### Mortgage30US CSV File
- **Extracted Mortgage Rate Data**: We filtered the mortgage rate data to include only the years that align with the Zillow dataset, specifically from 2000 to 2024. This ensured consistency and allowed us to compare the mortgage rate trends directly with the housing price data over the same time period.
- **Adjusting Mortgage Rate Data (Flipping Across the Horizontal Axis)**: Because mortgage rates tend to be inversely related to property appreciation (as mortgage rates increase, property prices generally decrease), we flipped the mortgage rate data across the horizontal axis. This transformation allowed a more direct comparison of how fluctuations in mortgage rates correlate with changes in property values, making their relationship easier to read from the plots.
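
A minimal sketch of these two mortgage-rate steps is shown below. It assumes the FRED file exposes a `DATE` column and a `MORTGAGE30US` column, and it assumes the flip is a simple sign negation; the rows are illustrative values, not real observations, and `InvertedRate` is a name introduced only for this example.

```python
import pandas as pd

# Toy stand-in for MORTGAGE30US.csv. Column names are assumed and the
# values are illustrative, not real FRED observations.
rates = pd.DataFrame(
    {
        "DATE": ["1999-12-30", "2000-01-07", "2012-06-14", "2024-09-26", "2025-01-02"],
        "MORTGAGE30US": [8.0, 8.1, 3.7, 6.1, 6.9],
    }
)
rates["DATE"] = pd.to_datetime(rates["DATE"])

# Keep only the years that overlap with the Zillow data (2000 to 2024 inclusive).
in_range = rates["DATE"].dt.year.between(2000, 2024)
rates = rates.loc[in_range].copy()

# Flip the series (assumed here to mean negating it) so that falling rates
# plot as a rising line next to the house price changes.
rates["InvertedRate"] = -rates["MORTGAGE30US"]
print(rates)
```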

## Exploratory Data Analysis and Visualization

### Influence of Macroeconomic Indicators
**Observation**: The median percentage change in property values (from ZHVI, the Zillow Home Value Index) across states, computed from ZIP-code-level data, stays strongly positive from the end of 2020 through 2021. This was observed from the interactive choropleth map.

![EDA1_1](.\Images\EDA1_1.png)
    
**Assumption**: Mortgage rates reached record lows during the latter half of 2020 and into early 2021. This coincides with the COVID-19 pandemic, when the demand for housing kept increasing. Macroeconomic indicators may therefore explain this trend.

**Worked out**:
- Averaged out the percentage change in property values across states to get a single trend representing the whole of the U.S.
- Examined macroeconomic indicators such as mortgage rates, GDP, and the federal interest rate over the period, looking for a factor that closely explains the trend.
                                                                                                                                                 
![EDA1_2](.\Images\EDA1_2.png)

![EDA1_3](.\Images\EDA1_3.png)

**GDP(2000-2024)**

![EDA1_4](.\Images\EDA1_4.png)

**Mortgage Rate(2000-2024)**

![EDA1_5](.\Images\EDA1_5.png)
                                                                                                                                                 
**Finding**:
- The inverted mortgage rate curve closely fits the trend in the average percentage increase in property values across the U.S.

![EDA1_6](.\Images\EDA1_6.png)

![EDA1_7](.\Images\EDA1_7.png)

**Verifying the Claim**:

![EDA1_8](.\Images\EDA1_8.png)
    
**Conclusion**: 
- The "Mortgage Rate data" is inversely related to the "Percentage change in property values across the U.S."
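
One way to quantify an inverse relationship like this is a Pearson correlation coefficient, which is close to -1 when one series falls as the other rises. The sketch below is an illustration under assumptions: the helper `pearson` and both toy series are hypothetical, and it is not necessarily the check behind the figure above.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up yearly values: rates rise while appreciation slows.
mortgage_rate = [3.0, 3.5, 4.0, 4.5, 5.0]
avg_pct_change = [1.2, 1.0, 0.8, 0.6, 0.4]

r = pearson(mortgage_rate, avg_pct_change)
print(f"Pearson r = {r:.3f}")  # strongly negative for this toy data
```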

### Seasonal Trend Analysis
**Introduction**: In this analysis we investigate potential seasonal trends in average house prices over the years. Using raw data with a datetime column, we categorized each entry into seasons (Fall, Winter, Spring, Summer) based on the month. This approach allows us to analyze any fluctuations in house prices across seasons over a long period.
    
**Methodology**: 
1. Data Preparation: The raw data contained datetime columns, which we converted into rows (one row per date). We then added a new column, "Season," categorizing each month into Fall, Winter, Spring, or Summer.
2. Graph Analysis: After adding the seasonal column, we plotted a line graph to compare the average house prices for each season over the years. The graph below illustrates these seasonal trends.
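
The season labelling and per-season averaging can be sketched as below. The month-to-season mapping (December to February as Winter, and so on) is an assumption, since the exact cut-offs are not stated above, and the records are made-up values.

```python
from collections import defaultdict
from datetime import date

# Assumed meteorological seasons; the report does not state the exact mapping.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def season_of(d: date) -> str:
    """Map a date to its season label based on the month."""
    return SEASON_BY_MONTH[d.month]


# Made-up (date, average price) records in long format.
records = [
    (date(2020, 1, 31), 200.0),
    (date(2020, 2, 29), 210.0),
    (date(2020, 7, 31), 230.0),
    (date(2020, 10, 31), 240.0),
]

# Accumulate [sum, count] per (year, season), then divide to get the average.
totals = defaultdict(lambda: [0.0, 0])
for d, price in records:
    key = (d.year, season_of(d))
    totals[key][0] += price
    totals[key][1] += 1

seasonal_avg = {key: total / count for key, (total, count) in totals.items()}
print(seasonal_avg)
```

Note that this simple keying places December in the same calendar-year Winter bucket as the preceding January and February.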

**Observations**:
- The chart displays the average house prices over the years from around 2000 to 2025.
- Each line represents house prices for a different season (Fall, Spring, Summer, Winter).
- There’s a general upward trend in average house prices over time, with notable dips and peaks.
                                                                                                                                                 
![EDA2_1](.\Images\EDA2_1.png)
                                                                                                                                                 
**Assumptions**:
- The dataset used for this chart includes sufficient historical data on seasonal average house prices, possibly sourced from a reliable real estate database.
- The chart aims to show whether there’s a seasonal effect on house prices, looking for distinct price differences between seasons over the years.

**Findings**:
- Similar Trends Across Seasons: All four seasons display similar patterns in price increases and decreases over time, with minimal differences between them, especially after 2015.
- Peaks and Troughs: There’s a peak in prices around 2005, a dip after 2007, a steady low around 2010, and then significant growth from around 2015 onward.
- Minor Seasonal Differences: Seasonal variation appears minor, with all seasonal lines staying close to each other across the years.

**Conclusion**:
- The average house prices appear to follow a consistent pattern across seasons, with no significant seasonal effect observed in recent years. Prices largely vary based on broader economic cycles rather than seasonal factors, suggesting that, at least in this dataset, seasonality doesn’t have a strong impact on average house prices over the years.

### Elections vs House Prices

- In the years 2000-2008, during the Republican administration under George W. Bush, the economy was considered to be in a period of expansion. Low interest rates, easy availability of credit, and minimal regulation of mortgages led to a housing boom that encouraged investment in real estate. The American Dream Downpayment Act made it easier for people to own homes.
- During the global financial crisis in 2008, however, home prices declined steeply across the U.S. as foreclosures surged and housing demand dropped, contributing to the economic downturn.
- In 2008-2016, during the Democratic administration under Obama, the ARRA (American Recovery and Reinvestment Act) of 2009 aimed to stimulate economic growth and support recovery. It sought to stabilize the situation, which can be seen in the graph, and the recovery began to restore confidence in the economy. Priority was also given to affordable housing and assistance, though these measures had a limited effect on curbing the overall rise in house prices because of high ongoing demand and slow supply growth.
- During the Republican administration of 2016-2020 under Trump, a major policy change, the Tax Cuts and Jobs Act, lowered taxes for some individuals. This boosted disposable income for many and made the housing market somewhat more welcoming, which pushed house prices up, as the upward trend in the graph shows.
- During 2020-2021, mortgage rates were at an all-time low, and COVID-19 increased the demand for housing, which in turn raised property prices.

**Prediction based on recent elections:**
- In fall 2024, the Federal Reserve cut interest rates for the first time since 2020, and in the November 2024 elections the Republican Party came to power, replacing the Democrats. Extrapolating from previous trends, we project that house prices will keep increasing: lower Fed rates lead to lower mortgage rates, which increases housing demand and raises house prices.


![EDA3_1](.\Images\EDA3_1.png)

## ML Analysis

- In this analysis, we use the ARIMA* model to forecast housing prices for specific zip codes based on monthly price data from January 2000 to September 2024. Our aim is to predict prices for the next month, the next quarter, and the next year. From these forecasts, we calculate the percentage change relative to the most recent price.
- To evaluate the model's performance, we compare these forecasted percentage changes to those provided by Zillow for the same future dates. We measure the absolute difference in percentage change for each period and check whether both ARIMA and Zillow predict the same direction (i.e., increase or decrease) in prices. This allows us to assess whether ARIMA captures both the magnitude and direction of price changes.
- The following graphs show three forecast lines, each representing predictions for the next month, quarter, and year. The lines illustrate ARIMA’s performance when trained on different data ranges: the last year, last 5 years, last 10 years, and the full dataset from 2000 to 2024.

\*ARIMA, or AutoRegressive Integrated Moving Average, is a statistical model used for time series forecasting, especially effective for data with trends or seasonal patterns. It's widely applied in economics, finance, and other fields for trend prediction and short-term forecasting.

![ML1](.\Images\ML1.png)

![ML2](.\Images\ML2.png)
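
The comparison against Zillow's estimates described above can be sketched as follows. It assumes the ARIMA price forecasts have already been produced; the function names, the horizon labels, and every number are hypothetical and chosen only to show the two metrics (absolute difference in percentage change, and agreement on direction).

```python
def pct_change(last_price, forecast_price):
    """Percentage change of a forecast relative to the most recent price."""
    return (forecast_price - last_price) / last_price * 100.0


def compare(model_pct, zillow_pct):
    """Return (absolute difference in percentage points, same direction?)."""
    abs_diff = abs(model_pct - zillow_pct)
    # A change of exactly zero is treated as "not an increase" here.
    same_direction = (model_pct > 0) == (zillow_pct > 0)
    return abs_diff, same_direction


# Made-up inputs for a single ZIP code.
last_price = 300_000.0
arima_forecasts = {"month": 301_500.0, "quarter": 298_500.0, "year": 309_000.0}
zillow_pct = {"month": 0.2, "quarter": 0.4, "year": 2.5}

results = {
    horizon: compare(pct_change(last_price, price), zillow_pct[horizon])
    for horizon, price in arima_forecasts.items()
}

# Share of horizons where both forecasts agree on increase vs. decrease.
directional_accuracy = sum(same for _, same in results.values()) / len(results)

for horizon, (abs_diff, same) in results.items():
    print(f"{horizon}: abs diff = {abs_diff:.2f} pp, same direction = {same}")
print(f"directional accuracy = {directional_accuracy:.2f}")
```

In practice the same comparison would be repeated across many ZIP codes and the directional accuracy averaged over all of them.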

**Interpretation**:
1. Although ARIMA captures the direction of price changes at times, its directional accuracy is only about 50%.
2. Quarterly predictions perform worst in terms of directional accuracy.
3. Predictions one year ahead show the largest errors in magnitude, highlighting ARIMA’s limitations for longer-term forecasting.
4. The model's magnitude of error is lowest when using only the last year's data. However, directional accuracy improves when using all available years from 2000 onward.

**Conclusion**:
- ARIMA's performance is below expectations, suggesting that using only raw price data is insufficient for reliable future forecasts. To improve accuracy, additional features such as macroeconomic indicators or seasonal data may be necessary.

**Note**:
- In addition to ARIMA, we fit a linear regression model to predict prices for the same time periods using the full dataset from 2000 to 2024. This model achieved a directional accuracy of 70%. 
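
As a rough illustration of a linear regression baseline of this kind, the sketch below fits a straight line to prices against a month index and extrapolates 1, 3, and 12 months ahead. The choice of the month index as the only feature is an assumption made for this example, `fit_line` is a hypothetical helper, and the price series is made up.

```python
def fit_line(ys):
    """Ordinary least squares fit of ys against the index 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly prices for one ZIP code.
prices = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
slope, intercept = fit_line(prices)

# Extrapolate beyond the last observed month.
last_index = len(prices) - 1
horizons = {"month": 1, "quarter": 3, "year": 12}
forecasts = {
    name: intercept + slope * (last_index + steps)
    for name, steps in horizons.items()
}
print(forecasts)
```

A single straight line can only predict one direction for every horizon, which is worth keeping in mind when reading its directional accuracy.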




## Reflection

1. **What is the hardest part of the project that you’ve encountered so far?**
   - The hardest part of the project so far has been **Exploratory Data Analysis (EDA)**. The primary difficulty arose from the complexity and scale of the datasets. With both the Zillow and Mortgage30US datasets containing large volumes of data over multiple years, it was challenging to clean, transform, and analyze it efficiently. Additionally, EDA required us to interpret and adjust for the potential relationships between various factors, such as mortgage rates and property values, and to identify regional variations in housing price trends. Balancing the exploration of these complex relationships while ensuring data integrity has been a challenging but necessary process.

2. **What are your initial insights?**
   - Early insights indicate that there is indeed a noticeable inverse relationship between mortgage rates and house prices. When mortgage rates rise, we see a deceleration in property price appreciation, and vice versa.
   - Additionally, seasonal trends do not appear to affect house prices, with all the months showing the same pattern. There was just a minor dip in house prices during the Fall season between the years 2007-2009 as compared to the other seasons in the same time period.
   - Another key insight is that property appreciation rates differ significantly by state, highlighting the importance of localized analysis in understanding the broader national trends.

3. **Are there any concrete results you can show at this point? If not, why not?**
   - As of now, we are still in the data preprocessing and exploration phase. Yet, we’ve extracted useful insights and have run a predictive model to generate results. The initial data analysis has provided some foundational knowledge as well that we have summarized above. 

4. **Going forward, what are the current biggest problems you’re facing?**
   - The biggest problem we’re facing is selecting and training the right model for house price prediction. There are various algorithms to choose from, and given the complexity of the data (e.g., multiple variables like mortgage rates, property values, regional differences), it’s important to identify the most effective approach. We’re also working on fine-tuning hyperparameters to ensure optimal model performance.

5. **Do you think you are on track with your project? If not, what parts do you need to dedicate more time to?**
   - Yes, we are generally on track with our project, though we need to dedicate more time to model selection and optimization. Our data preparation and exploration have taken longer than expected, but now that we’re nearing the end of this stage, we need to focus on running and evaluating different machine learning models for the price prediction task. We also need to allocate more time to evaluation metrics to validate our model's accuracy.

6. **Given your initial exploration of the data, is it worth proceeding with your project, why? If not, how are you going to change your project and why do you think it’s better than your current results?**
   - Yes, it is definitely worth proceeding with the project. Our initial exploration has shown that the data contains valuable trends and relationships, such as the inverse correlation between mortgage rates and house prices, and seasonal fluctuations in property values. These insights provide a solid foundation for making predictions. Additionally, by incorporating macroeconomic indicators like mortgage rates,
                                                                                                      
## Roles and Coordination 
                                                                               
1. **Finding Data Sources:**
    - Everyone contributed to that.

2. **Data Cleaning and Pre-processing:**
    - Purva Tandel, Abhiram Vasudeva

3. **Exploratory Data Analysis and Visualisation:**
    - Sushanth Thota, Vamsi Dath 

4. **Machine Learning Modelling and Analysis:**
    - Armaan Ashfaque, Niyati Malik

**On some days, we did pair programming as well.**

## Next Steps
                            
1. **Enhance Model Features**:
- We plan to identify and incorporate additional features that could improve our model’s predictive capability. By expanding the feature set, we aim to capture a more comprehensive view of the variables impacting house prices.

2. **Increase Accuracy to Approach Zillow's Estimators**:
- Our goal is to narrow the accuracy gap between our predictions and Zillow’s estimators. By refining our data preprocessing and fine-tuning our model’s parameters, we aim to improve predictive accuracy and reliability, ultimately producing results that approach Zillow’s benchmark for home value estimates.

3. **Experiment with Additional Models**:
- To find the model that best aligns with our use case, we’ll implement and assess a range of machine learning models. We’ll evaluate each model based on different metrics, selecting the one that balances accuracy with interpretability and scalability.

4. **Model Evaluation**:
- To better understand the differences in performance between the Linear Regression and ARIMA models, we can explore several aspects contributing to their respective accuracy scores. Specifically, we'll analyze why one model outperforms the other and what factors may influence each model's ability to capture housing price trends effectively.

### Evaluation Criteria

**We’ll evaluate our progress based on how well our model meets the goals outlined:**
- Improved prediction accuracy.
- The effectiveness of new features in capturing price trends.
- Insights gained from trying different model architectures and performance results.
                                                                               
**Our focus is to iteratively refine the model, ensuring it not only fits the data well but also generalizes effectively across diverse real estate scenarios.**
