<hr/>
# **United States Real Estate Market Trends Visualization**
<span id="0"></span>
[**Maede Maftouni**](https://www.kaggle.com/maedemaftouni)
<hr/>
<font color=green>
1. [Overview](#1)
1. [Importing Modules, Reading the Dataset](#2)
1. [Data Cleaning](#3)
1. [Grouping the Records based on Price Change](#4)
1. [Market Trend Visualizations](#5)
    * [Mean monthly price across the US since 2016](#6)
    * [States housing price range comparison](#7) 
    * [Plotting states housing data on the US map](#8) 
    * [States housing price comparison on the map, averaged over 2016 to 2021](#9) 
    * [States housing total listing comparison on the map, averaged over 2016 to 2021](#10) 
    * [Average percentage of 2021 price change from last year](#11) 
    * [Price per square foot distribution in Utah through the years](#12) 
1. [Price and Number of Listings on the Market Correlation](#13)
</font>

# <span id="1"></span> Overview
<hr/>
Welcome to my Kernel! In this kernel, I visualize the Realtor.com data from 2016 through 2021 to gain better insight into housing market trends. This notebook uses the data broken down at the state level. I will use the zip-code-level data in a separate notebook.

If you have any questions or feedback, do not hesitate to write in the comments, and if you like this kernel, please <b><font color="green">do not forget to UPVOTE </font></b> 🙂 

<br/>
<img src="https://homespropertyguide.com/wp-content/uploads/2021/07/14012216_G-683x375.jpg" title="source: imgur.com" />

# <span id="2"></span> Importing Modules, Reading the Dataset
#### [Return Contents](#0)
<hr/>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
import datetime
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point, Polygon
%matplotlib inline
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('/kaggle/input/real-estate-market-trends/RDC_Inventory_Core_Metrics_State_History.csv')
print(f'Dataset has {df.shape[0]} records and {df.shape[1]} columns')
print(' ')
df.head() # head shows the first 5 rows by default

### List of columns and the count and type of data in each column

In [None]:
df.info()

### Basic statistics of each column

In [None]:
df.describe()

### Correcting the format of the Date column 

In [None]:
df['month_date_yyyymm'] = pd.to_datetime(df['month_date_yyyymm'],format = '%Y%m')
df.rename(columns={'month_date_yyyymm':'Date'}, inplace=True)
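
To make the date conversion above easier to follow, here is a minimal sketch on a toy frame. The three values are made up for illustration, and I cast to string explicitly so the `%Y%m` format is applied to text such as `"201607"`.

```python
import pandas as pd

# Toy stand-in for the month_date_yyyymm column (made-up values, not from the dataset)
toy = pd.DataFrame({"month_date_yyyymm": [201607, 202012, 202106]})

# Parse YYYYMM: four digits of year followed by two digits of month
toy["Date"] = pd.to_datetime(toy["month_date_yyyymm"].astype(str), format="%Y%m")

parsed_years = toy["Date"].dt.year.tolist()
parsed_months = toy["Date"].dt.month.tolist()
print(toy)
```

Each parsed timestamp falls on the first day of its month, since the source value carries no day component.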

# <span id="3"></span> Data Cleaning 
#### [Return Contents](#0)
<hr/>

### Extracting the Year and Month from dates

In [None]:
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
df['Date'] = df['Date'].dt.date
df.head()


### Sorting records by State and Date

In [None]:
df = df.sort_values(["state", "Date"], ascending = (True, True)).reset_index(drop = True) # Default is ascending
df

# <span id="4"></span> Grouping the Records based on Price Change
#### [Return Contents](#0)
<hr/>

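### Classifying the records (state-months) into one of three categories: (1) listing price down from last year, (2) listing price down from last month, (3) increase in prices. A record counts as "down" only when both the average and the median listing price changes are negative; every other record falls into the third category.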

In [None]:
filters = [
   (df['average_listing_price_yy'] < 0) & (df['median_listing_price_yy'] < 0),
   (df['average_listing_price_mm'] < 0) & (df['median_listing_price_mm'] < 0),
]
values = ["Down from last year", "Down from last month"]

df["category"] = np.select(filters, values, default="Increase in prices" )
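
The order of the filters matters here, because `np.select` assigns the first condition that matches. Below is a small sketch with four made-up rows (the numbers are illustrative only) that shows the precedence and the default category.

```python
import numpy as np
import pandas as pd

# Four hypothetical state-month records (values invented for illustration)
toy = pd.DataFrame({
    "average_listing_price_yy": [-0.02, 0.05, 0.03, -0.01],
    "median_listing_price_yy":  [-0.01, 0.04, 0.02, 0.02],
    "average_listing_price_mm": [-0.01, -0.02, 0.01, -0.03],
    "median_listing_price_mm":  [-0.02, -0.01, 0.01, 0.01],
})

filters = [
    (toy["average_listing_price_yy"] < 0) & (toy["median_listing_price_yy"] < 0),
    (toy["average_listing_price_mm"] < 0) & (toy["median_listing_price_mm"] < 0),
]
values = ["Down from last year", "Down from last month"]

# Row 0 matches both filters, so the first one wins.
# Row 3 has mixed signs, matches neither filter, and gets the default.
toy["category"] = np.select(filters, values, default="Increase in prices")
print(toy["category"].tolist())
```

Note that the default label also covers flat or mixed records, not only strict increases.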

### The majority of the records (73.7%) show an increase in the listing price compared to the last year and month

In [None]:
print('Price ratios for all the records:')
df['category'].value_counts()/len(df['category'])*100

### An even bigger portion of the records from 2020 and 2021, 76% and 79.8% respectively, show an increase in the listing price compared to the last year and month

#### Note that only the first half of 2021 is included in the dataset. This explains the extreme ratios for 2021, since the first half of the year is usually a seller's market. The current data also supports this claim, as displayed in the following "Mean monthly price across the US" graph

In [None]:
print('2020 Price ratios:')
print(df.loc[df['Year']== 2020,'category'].value_counts()/len(df[df['Year']== 2020])*100)
print(' ')
print('2021 Price ratios:')
df.loc[df['Year']== 2021,'category'].value_counts()/len(df[df['Year']== 2021])*100
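
The per-year ratios can also be computed for all years in one step. This is only a sketch of an alternative, assuming the same `Year` and `category` columns; the six rows below are made up, and `pd.crosstab` with `normalize="index"` is my substitution for the `value_counts` division used above.

```python
import pandas as pd

# Hypothetical records: four from 2020 and two from 2021
toy = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020, 2021, 2021],
    "category": [
        "Increase in prices", "Increase in prices", "Increase in prices",
        "Down from last year",
        "Increase in prices", "Down from last month",
    ],
})

# Each row of the table sums to 100: the share of every category within a year
shares = pd.crosstab(toy["Year"], toy["category"], normalize="index") * 100
print(shares)
```

Categories that never occur in a given year show up as 0 instead of being dropped, which makes years easier to compare side by side.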

# <span id="5"></span> Market Trend Visualizations
#### [Return Contents](#0)
<hr/>

## <span id="6"></span> Mean monthly price across the US since 2016 

In [None]:
 
ax = df.groupby(['Date'])['average_listing_price'].mean().plot(kind = 'bar', figsize = (20,10))
ax.set_ylabel("Average Listing Price")
plt.show()
plt.close()

In [None]:
down = df.loc[df['category'].isin(["Down from last year"])]

print("The states experiencing at least one month of decrease in price in 2020 compared to 2019 are:")
print(down.loc[down['Year'] == 2020]['state'].unique())
print('  ')
print("The states which experienced at least one month of decrease in price in 2021 compared to 2020 are:")
print(down.loc[down['Year'] == 2021]['state'].unique())

## <span id="7"></span> States housing price range comparison

In [None]:
# Box plot of the average listing price for each state
plt.figure(figsize=(10, 8), dpi=80)
box_plot = sns.boxplot(x = 'state',y = 'average_listing_price',data = df.sort_values('average_listing_price'))
plt.ylabel('Price')
plt.xlabel('State')

# Rotate the state names so they do not overlap
box_plot.axes.tick_params(axis='x', rotation=90)
box_plot.figure.tight_layout()

## <span id="8"></span> Plotting states housing data on the US map

In [None]:
# Reading the geodataframe
usa = gpd.read_file("/kaggle/input/real-estate-market-trends/states_21basic/states.shp")
usa['STATE_NAME'] = usa['STATE_NAME'].str.lower()

# joining the geodataframe with the aggregated housing data
# numeric_only=True skips the non-numeric columns (Date, category), which newer pandas versions refuse to average
df_agg = df.groupby('state').mean(numeric_only=True).reset_index()
merged = usa.set_index('STATE_NAME').join(df_agg.set_index('state'))
merged = merged.reset_index()
merged_filtered = merged[~merged['STATE_NAME'].isin(['alaska','hawaii'])]
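
A join on state names silently leaves missing values wherever the two spellings differ, so it can be worth checking which shapes found no housing record. The sketch below uses plain pandas frames with made-up names and prices; `merge(..., indicator=True)` is my substitution for the index-based `join` above, chosen only because it labels unmatched rows.

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile attributes and the aggregated housing data
shapes = pd.DataFrame({"STATE_NAME": ["Utah", "New York", "District of Columbia"]})
shapes["STATE_NAME"] = shapes["STATE_NAME"].str.lower()

prices = pd.DataFrame({
    "state": ["utah", "new york"],
    "average_listing_price": [500000.0, 700000.0],
})

# how="left" keeps every shape; the _merge column says whether a price row was found
checked = shapes.merge(prices, left_on="STATE_NAME", right_on="state",
                       how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "STATE_NAME"].tolist()
print(unmatched)
```

An empty list means every shape has data; anything else would appear as a blank region on the map.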

## <span id="9"></span> States housing price comparison on the map, averaged over 2016 to 2021


#### Alaska and Hawaii are excluded to fit the map on the screen

In [None]:
gdf = gpd.GeoDataFrame(merged_filtered)
variable = 'average_listing_price'
fig = plt.figure(1, figsize=(25,15)) 
ax = fig.add_subplot()
gdf.apply(lambda x: ax.annotate(text=x.STATE_ABBR, xy=x.geometry.centroid.coords[0], ha='center', fontsize=10),axis=1);
gdf.boundary.plot(ax=ax, color='Black', linewidth=.4)

gdf.plot(ax =ax ,column=variable, cmap='Reds', figsize=(30,20))

plt.axis('off')
plt.show()
plt.close()


## <span id="10"></span> States housing total listing comparison on the map, averaged over 2016 to 2021

In [None]:
variable = 'total_listing_count'
fig = plt.figure(1, figsize=(25,15)) 
ax = fig.add_subplot()
gdf.apply(lambda x: ax.annotate(text=x.STATE_ABBR, xy=x.geometry.centroid.coords[0], ha='center', fontsize=10),axis=1);
gdf.boundary.plot(ax=ax, color='Black', linewidth=.4)

gdf.plot(ax =ax ,column=variable, cmap='Reds', figsize=(30,20))

#plt.legend(loc=2, bbox_to_anchor=(0.5, 0., 0.5, 0.5))
plt.axis('off')
plt.show()
plt.close()

In [None]:
# joining the geodataframe with the aggregated housing data of year 2021
# numeric_only=True skips the non-numeric columns (Date, category), which newer pandas versions refuse to average
df_agg_2021 = df[df['Year'] == 2021].groupby('state').mean(numeric_only=True).reset_index()
merged = usa.set_index('STATE_NAME').join(df_agg_2021.set_index('state'))
merged = merged.reset_index()
merged_filtered = merged[~merged['STATE_NAME'].isin(['alaska','hawaii'])]

## <span id="11"></span> Average percentage of 2021 price change from last year

#### The darker the red, the higher the increase!

In [None]:
gdf = gpd.GeoDataFrame(merged_filtered)
variable = 'average_listing_price_yy'
fig = plt.figure(1, figsize=(25,15)) 
ax = fig.add_subplot()
gdf.apply(lambda x: ax.annotate(text=x.STATE_ABBR, xy=x.geometry.centroid.coords[0], ha='center', fontsize=10),axis=1);
gdf.boundary.plot(ax=ax, color='Black', linewidth=.4)

gdf.plot(ax =ax ,column=variable, cmap='Reds', figsize=(30,20),legend =True)

plt.axis('off')
plt.show()
plt.close()

### As you can see on the map, Utah's housing market is sizzling hot at the moment!
#### Utah had a 37.8% increase in listing prices in April 2021 compared to April 2020, the maximum in the US

In [None]:
df_2021 = df[df['Year'] == 2021]
# Build the mask from df_2021 itself so that it lines up with the rows being filtered
df_2021.loc[df_2021['average_listing_price_yy'] == df_2021['average_listing_price_yy'].max()]
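
Another way to pull out the record with the largest yearly change is `idxmax`, which returns the index label of the maximum. This is a small sketch on made-up numbers, not the notebook's data; note that `idxmax` returns only the first row if several share the maximum.

```python
import pandas as pd

# Hypothetical records: the 2020 row for idaho is the overall maximum but must be ignored
toy = pd.DataFrame({
    "state": ["utah", "idaho", "utah", "idaho"],
    "Year": [2020, 2020, 2021, 2021],
    "average_listing_price_yy": [0.10, 0.50, 0.35, 0.30],
})

toy_2021 = toy[toy["Year"] == 2021]

# idxmax gives the index label of the largest value within the 2021 subset
top_row = toy_2021.loc[toy_2021["average_listing_price_yy"].idxmax()]
print(top_row)
```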

## <span id="12"></span> Price per square foot distribution in Utah through the years

#### The price has doubled since 2016!

In [None]:
sns.displot( 
    data = df[df['state']=='utah'],
    x = "median_listing_price_per_square_foot",
    hue = "Year",
    kind = "hist",
    aspect = 1.5,
    log_scale = 10,
    bins = 20
             )

### Price trend comparison of select states

In [None]:
df.loc[df['state'].isin(["utah","california","connecticut"])].pivot_table(index='state', values='average_listing_price', aggfunc='mean',columns='Date')\
.plot(kind="bar",figsize=(15, 10),color = sns.color_palette("vlag", 12))
plt.ylabel('Mean Price')
plt.title('Average Listing Price Trend of Select States')
plt.legend(bbox_to_anchor=(1.01, 1), loc='upper left')
plt.show()
plt.close()

## <span id="13"></span> Price and Number of Listings on the Market Correlation

### High demand (because of low mortgage rates and the rise of remote workers) and relatively low supply are the contributing factors behind the booming home sales. 
#### Price increases are negatively correlated with supply. Therefore, if the number of houses for sale goes up to meet the demand, the housing market will become stable. 

In [None]:
df_reduced = df[['average_listing_price','median_listing_price','median_listing_price_per_square_foot','price_reduced_count','active_listing_count','new_listing_count','median_days_on_market','Year']]
sns.heatmap(df_reduced.corr(), annot = True)
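
For readers wondering what each cell of the heatmap means, `DataFrame.corr()` computes the Pearson correlation coefficient by default. Here is a minimal sketch that reproduces one cell by hand on four made-up rows; the numbers are illustrative and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical months: prices go up while the number of active listings goes down
toy = pd.DataFrame({
    "average_listing_price": [300.0, 320.0, 350.0, 400.0],
    "active_listing_count": [1000.0, 900.0, 850.0, 600.0],
})

# The value pandas would place in the heatmap cell for this pair of columns
r_pandas = toy.corr().loc["average_listing_price", "active_listing_count"]

# Pearson's r by hand: sum of products of deviations over the product of their norms
x = toy["average_listing_price"] - toy["average_listing_price"].mean()
y = toy["active_listing_count"] - toy["active_listing_count"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(r_pandas, r_manual)
```

A value close to -1 means the two columns move in opposite directions, which is the pattern described above for price and supply. Correlation alone does not show which one drives the other.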