## Project Description

This website is part of the Capstone Project of the **Masters of Applied Data Science** Program of the **University of Michigan**. The team is composed by **Alex Moura** and **Ryan Thoma**s and has as primary objective to build a real time machine learning powered affordability prediction tool to help investors and renters/travelers to assess the potential impact that their decision to invest or stay in a short term rental may have in the overall housing affordability of the local community.

## Context and Motivation

Although the original concept of Airbnb started as a clever way to rent available idle space in people’s homes to travelers and event goes generating additional income to hosts and affordable lodging to travelers, the company created a new industry that has become a popular real estate investment opportunity. This is sparked a broad discussion about the topic and in 2019 the Economic Policy Institute conducted a study that concluded “that the costs of Airbnb outweighs the benefits” citing the impacting on housing affordability as one of the main negative externalities driving this costs which also included loss of taxation revenue and impact on the local lodging labor market. The economic costs and benefits of Airbnb: No reason for local policymakers to let Airbnb bypass tax or regulatory obligations. (n.d.). Economic Policy Institute. (https://www.epi.org/publication/the-economic-costs-and-benefits-of-airbnb-no-reason-for-local-policymakers-to-let-airbnb-bypass-tax-or-regulatory-obligations)

With the proliferation of short-term rentals in major metropolitan areas offering new opportunities for income to property owners, we are looking to provide a rental price prediction dashboard that also provides insights into neighborhood rental affordability to assist potential owners and renters to better understand the neighborhood composition in order to make conscientious investment decisions. We aim to help investors be aware of the potential impact that their decision to purchase a property may have in the overall affordability of homes in the local communities. We also aim to inform potential short term renters of the same impact.  We hope that our tool can help investor and renters better align their investment and travel lodging decisions to their values.


## Our Methodology
[NEED WRITE UP FOR THE METHODOLOGY]



## Our data
To investigate the impact of Airbnb on housing affordability, we rely on data from InsideAirbnb, an independent project that collects and analyzes Airbnb listing data in cities across the globe. InsideAirbnb provides us with a comprehensive dataset that allows us to examine the extent to which Airbnb listings have become commercialized and its implications for housing availability and affordability.

## Project Navigation
Our capstone project is structured to explore and address the intricate relationship between Airbnb rentals and housing affordability:
1.	Affordability Prediction Model
o	On our first page, we present a machine learning model that predicts housing prices in selected cities. 
2.	Affordability Index Visualization
o	The second page showcases a visualization of our affordability index.This index offers a visual representation of how Airbnb rentals might be affecting housing affordability in different regions.
3.	Technology Stack
o	Our third page delves into the technology stack we employed for this project. We utilized Python libraries for data analysis, machine learning, and visualization, along with Streamlit for creating interactive presentations, and Git for version control and collaboration.

Join us on this journey to uncover valuable insights, contribute to data transparency, and foster an informed Airbnb community.


In [66]:
import pandas as pd


In [10]:
zip_trimmed=pd.read_csv('app/uszip_augmented_filtered.csv')
zip_trimmed = zip_trimmed [[
     'population', 
     'housing_units', 
     'rent_median', 
     'rent_burden', 
     'neighborhood', 
     'city_file']]


In [68]:
city_neighboorhood=pd.read_csv('app/trimmed_listings.csv')[['neighbourhood_cleansed','city']].drop_duplicates()

In [14]:
# Rename columns to match for merging
zip_trimmed_renamed = zip_trimmed.rename(columns={'city_file': 'city', 'neighborhood': 'neighbourhood_cleansed'})

# Merge
merged_df = zip_trimmed_renamed.merge(city_neighboorhood , on=['city', 'neighbourhood_cleansed'], how='right', indicator=True)




In [21]:
import numpy as np

zip_trimmed=pd.read_csv('app/uszip_augmented_filtered.csv')
zip_trimmed = zip_trimmed [[
     'population', 
     'housing_units', 
     'rent_median', 
     'rent_burden', 
     'neighborhood', 
     'city_file']]
city_neighboorhood=pd.read_csv('app/trimmed_listings.csv')[['neighbourhood_cleansed','city']].drop_duplicates()


# Rename columns to match for merging
zip_trimmed_renamed = zip_trimmed.rename(columns={'city_file': 'city', 'neighborhood': 'neighbourhood_cleansed'})

# Merge
merged_df = zip_trimmed_renamed.merge(city_neighboorhood , on=['city', 'neighbourhood_cleansed'], how='right', indicator=True)





# Define a function for weighted average
def weighted_average(group, avg_name, weight_name):
    d = group[avg_name]
    w = group[weight_name]
    try:
        return (d * w).sum() / w.sum()
    except ZeroDivisionError:
        return np.nan

# Fill NAs in rent_median and rent_burden with the weighted average of the city
merged_df['rent_median'] = merged_df.groupby('city')['rent_median'].transform(lambda x: x.fillna(weighted_average(merged_df, 'rent_median', 'population')))
merged_df['rent_burden'] = merged_df.groupby('city')['rent_burden'].transform(lambda x: x.fillna(weighted_average(merged_df, 'rent_burden', 'population')))

#fill NA in hte population and housing units with a 1
merged_df['population']= merged_df['population'].fillna(1)
merged_df['housing_units']= merged_df['housing_units'].fillna(1)


# Group by city and neighborhood and aggregate
aggregations = {
    'population': 'sum',
    'housing_units': 'sum',
    'rent_median': lambda x: weighted_average(merged_df.loc[x.index], 'rent_median', 'population'),
    'rent_burden': lambda x: weighted_average(merged_df.loc[x.index], 'rent_burden', 'population')
}
cleaned_data = merged_df.groupby(['city', 'neighbourhood_cleansed']).agg(aggregations).reset_index()

cleaned_data.to_csv('app/uszip_stats.csv', index=False) 

  # Remove the CWD from sys.path while we load stuff.


Unnamed: 0,city,neighbourhood_cleansed,population,housing_units,rent_median,rent_burden
0,asheville,28704,21609.0,7800.0,1182.019146,34.399968
1,asheville,28715,27886.0,10235.0,1035.020108,27.500223
2,asheville,28732,18275.0,7084.0,1130.000000,26.600000
3,asheville,28801,14396.0,6545.0,966.043744,44.099278
4,asheville,28803,34038.0,14258.0,1246.020550,33.700000
...,...,...,...,...,...,...
1448,washington-dc,"Twining, Fairlawn, Randle Highlands, Penn Bran...",1.0,1.0,1595.732564,33.706966
1449,washington-dc,"Union Station, Stanton Park, Kingman Park",7.0,7.0,1595.732564,33.706966
1450,washington-dc,"West End, Foggy Bottom, GWU",4665.0,649.0,1633.940215,41.196726
1451,washington-dc,"Woodland/Fort Stanton, Garfield Heights, Knox ...",1.0,1.0,1595.732564,33.706966


In [18]:
merged_df

Unnamed: 0,population,housing_units,rent_median,rent_burden,neighbourhood_cleansed,city,_merge
0,36365.0,14802.0,1922.000000,21.900000,Loyal Heights,seattle,both
1,28956.0,10001.0,1627.000000,28.500000,Eagle Rock,los-angeles,both
2,1.0,1.0,1595.732564,33.706966,West Highland,denver,right_only
3,23473.0,7272.0,1810.000000,32.900000,Lahaina,hawaii,both
4,1.0,1.0,1595.732564,33.706966,Lahaina,hawaii,both
...,...,...,...,...,...,...,...
3049,1.0,1.0,1595.732564,33.706966,Rosebank,new-york-city,right_only
3050,1.0,1.0,1595.732564,33.706966,Edison Park,chicago,right_only
3051,44509.0,18608.0,1095.000000,28.200000,Co-op City,new-york-city,both
3052,1.0,1.0,1595.732564,33.706966,New Brighton,new-york-city,right_only


In [83]:
# Group by city and neighborhood and aggregate
aggregations = {
    'population': 'sum',
    'housing_units': 'sum',
    'rent_median': lambda x: weighted_average(data.loc[x.index], 'rent_median', 'population'),
    'rent_burden': lambda x: weighted_average(data.loc[x.index], 'rent_burden', 'population')
}
cleaned_data = data.groupby(['city', 'neighbourhood_cleansed']).agg(aggregations).reset_index()

# Display the cleaned data
cleaned_data_info = cleaned_data.info()
cleaned_data.head(), initial_info, cleaned_data_info

  from ipykernel import kernelapp as app


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1453 entries, 0 to 1452
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   city                    1453 non-null   object 
 1   neighbourhood_cleansed  1453 non-null   object 
 2   population              1453 non-null   float64
 3   housing_units           1453 non-null   float64
 4   rent_median             1449 non-null   float64
 5   rent_burden             1449 non-null   float64
dtypes: float64(4), object(2)
memory usage: 68.2+ KB


(        city neighbourhood_cleansed  population  housing_units  rent_median  \
 0  asheville                  28704     21609.0         7800.0  1182.019146   
 1  asheville                  28715     27886.0        10235.0  1035.020108   
 2  asheville                  28732     18275.0         7084.0  1130.000000   
 3  asheville                  28801     14396.0         6545.0   966.043744   
 4  asheville                  28803     34038.0        14258.0  1246.020550   
 
    rent_burden  
 0    34.399968  
 1    27.500223  
 2    26.600000  
 3    44.099278  
 4    33.700000  ,
 None,
 None)

In [84]:
cleaned_data





Unnamed: 0,city,neighbourhood_cleansed,population,housing_units,rent_median,rent_burden
0,asheville,28704,21609.0,7800.0,1182.019146,34.399968
1,asheville,28715,27886.0,10235.0,1035.020108,27.500223
2,asheville,28732,18275.0,7084.0,1130.000000,26.600000
3,asheville,28801,14396.0,6545.0,966.043744,44.099278
4,asheville,28803,34038.0,14258.0,1246.020550,33.700000
...,...,...,...,...,...,...
1448,washington-dc,"Twining, Fairlawn, Randle Highlands, Penn Bran...",1.0,1.0,1595.732564,33.706966
1449,washington-dc,"Union Station, Stanton Park, Kingman Park",7.0,7.0,1595.732564,33.706966
1450,washington-dc,"West End, Foggy Bottom, GWU",4665.0,649.0,1633.940215,41.196726
1451,washington-dc,"Woodland/Fort Stanton, Garfield Heights, Knox ...",1.0,1.0,1595.732564,33.706966


In [6]:
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'rent': [1000, 1500, 2000, 2500, 3000],
    'rent_actual': [2100, 1400, 2100, 2300, 3200]
})

# Base chart for the data points using emoji
points = alt.Chart(data).mark_text(
    text='🏠',  # Unicode for house emoji
    fontSize=20,
    align='center'
).encode(
    x='rent_actual:Q',
    y='rent:Q'
)

# Chart for the 45-degree line

df = pd.DataFrame({'x': [0, data['rent_actual'].max()], 'y': data['rent_actual'].max()})
line = alt.Chart(df).mark_line(
    color='black'
).encode(
    x='x:Q',
    y='x:Q'
)

# Area under the line
area = alt.Chart(pd.DataFrame({'x': [data['rent_actual'].min(), data['rent_actual'].max()]})).mark_area(
    color='lightred', opacity=1
).encode(
    x='x:Q',
    y='x:Q',
    y2=alt.value(data['rent'].min())  # Base of the shaded area
)

# Combine the charts
chart = area + line + points

chart


In [7]:
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'rent': [1000, 1500, 2000, 2500, 3000],
    'rent_actual': [2100, 1400, 2100, 2300, 3200]
})

# Base chart for the data points using emoji
points = alt.Chart(data).mark_text(
    text='🏠',  # Unicode for house emoji
    fontSize=20,
    align='center'
).encode(
    x='rent_actual:Q',
    y='rent:Q'
)

# Define the maximum value for x and y to create the 45-degree line and area
max_val = max(data['rent'].max(), data['rent_actual'].max())

# Chart for the 45-degree line
line_data = pd.DataFrame({'x': [0, max_val], 'y': [0, max_val]})
line = alt.Chart(line_data).mark_line(color='black').encode(x='x', y='y')

# Area under the line
area_data = pd.DataFrame({'x': [0, max_val], 'y1': [0, 0], 'y2': [0, max_val]})
area = alt.Chart(area_data).mark_area(opacity=0.3, color='lightred').encode(
    x='x',
    y='y1',
    y2='y2'
)

# Combine the charts
chart = area + line + points
chart


In [11]:
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'rent': [1000, 1500, 2000, 2500, 3000],
    'rent_actual': [2100, 1400, 2100, 2300, 3200]
})

# Base chart for the data points using emoji
points = alt.Chart(data).mark_text(
    text='🏠',  # Unicode for house emoji
    fontSize=20,
    align='center'
).encode(
    x='rent_actual:Q',
    y='rent:Q'
)

# Define the maximum value for x and y to create the 45-degree line and area
max_val = max(data['rent'].max(), data['rent_actual'].max())

# Chart for the 45-degree line
line_data = pd.DataFrame({'x': [0, max_val], 'y': [0, max_val]})
line = alt.Chart(line_data).mark_area(color="lightblue",
    interpolate='step-after',
    line=True).encode(x='x', y='y')

# Area under the line
area_data = pd.DataFrame({'x': [0, max_val], 'y1': [0, 0], 'y2': [0, max_val]})
area = alt.Chart(area_data).mark_area(opacity=0.3, color='lightred').encode(
    x='x',
    y='y1',
    y2='y2'
)

# Combine the charts and remove grid lines
chart = (area + line + points).configure_axis(
    grid=False
)

chart


In [22]:
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'rent': [1000, 1500, 2000, 2500, 3000],
    'rent_actual': [2100, 1400, 2100, 2300, 3200]
})

# Base chart for the data points using emoji
points = alt.Chart(data).mark_text(
    text='🏠',  # Unicode for house emoji
    fontSize=30,
    align='center'
).encode(
    x='rent_actual:Q',
    y='rent:Q'
)

# Define the maximum value for x and y to create the area under the 45-degree line
max_val = max(data['rent'].max(), data['rent_actual'].max())

# Area under the 45-degree line
area_data = pd.DataFrame({'x': [0, max_val], 'y1': [0, 0], 'y2': [0, max_val]})
area = alt.Chart(area_data).mark_area(opacity=0.3, color='lightred').encode(
    x='x',
    y='y1',
    y2='y2'
)

# Combine the charts and remove grid lines
chart = (area + points).configure_axis(
    grid=False
)

chart


In [9]:




import altair as alt
import pandas as pd
import numpy as np


def affordability_pressure_chart(predicted_rent, median_rent, max_val):
    
    max_val=max(predicted_rent, median_rent, max_val)

    # Create Data Frame with Points
    data = pd.DataFrame({
        'Predicted AirbnB Income': [predicted_rent],
        'Neigboorhood Median Rent': [median_rent]
    })

    # Base chart 
    points = alt.Chart(data).mark_text(
        text='🏠', 
        fontSize=30,
        align='center'
    ).encode(
        x='Predicted AirbnB Income:Q',
        y='Neigboorhood Median Rent:Q'
    )


    # Create a DataFrame for the 45-degree line
    line_data = pd.DataFrame({'x': np.linspace(0, max_val, 100)})
    line_data['y'] = line_data['x']

    # Create the area chart
    area_chart = alt.Chart(line_data).mark_area(
        line={'color':'darkred'},
        color=alt.Gradient(
            gradient='linear',
            stops=[alt.GradientStop(color='white', offset=0),
                   alt.GradientStop(color='red', offset=1)],
            x1=0,
            x2=1,
            y1=0,
            y2=1
        )
    ).encode(
    x=alt.X('x', axis=alt.Axis(title='Predicted AirbnB Income')),  # Removing x-axis title
    y=alt.Y('y', axis=alt.Axis(title='Neigboorhood Median Rent'))  # Setting y-axis title
)


    text = alt.Chart(pd.DataFrame({'x': [max_val * 0.95], 'y': [max_val * 0.05], 'text': ['Affordability Pressure Region']})).mark_text(
        align='right',
        baseline='bottom',
        fontSize=12,
        color='white'
    ).encode(
        x='x:Q',
        y='y:Q',
        text='text:N'
    )


    return_chart = (area_chart + points + text).configure_axis(
        grid=False
    ).properties(
    width=250,
    height=250
)



    return return_chart 

affordability_pressure_chart(2000, 344, 500)


In [1]:
import altair as alt

# Print the version of Altair
print("Altair version:", alt.__version__)

Altair version: 5.0.1


In [3]:
!pip list



Package                                           Version
------------------------------------------------- -------------------
absl-py                                           0.9.0
accessible-pygments                               0.0.4
aiofiles                                          0.5.0
aiohttp                                           3.6.2
alabaster                                         0.7.12
alembic                                           1.4.2
algoliasearch                                     2.3.0
altair                                            5.0.1
altair-data-server                                0.4.1
altair-viewer                                     0.4.0
anaconda-client                                   1.7.2
anaconda-navigator                                1.9.12
anaconda-project                                  0.8.3
anyio                                             3.6.2
AnyQt                                             0.0.11
appdirs     

You should consider upgrading via the '/Users/alexmoura/opt/anaconda3/bin/python -m pip install --upgrade pip' command.[0m
