# Predicting Changing Neighborhoods

## Introduction

### Background

Rising rents in major cities are an ever-present topic of discussion.  The presence, or the anticipated arrival, of certain amenities in a neighborhood could herald rent increases there.  Using publicly available rental data and data from the Foursquare API, I will explore this intersection of rents and amenities.  By the end of this paper, I hope readers will understand what drives rent increases in specific neighborhoods and what the possible precursors of those increases are.  The paper uses three cities to drive this discussion: Philadelphia, USA; Munich, Germany; and Toronto, Canada.

### Problem


Using Foursquare data and publicly available rental data, I will explore the intersection of rents and amenities.  I will provide a discussion of rents and amenities driven by exploratory data analysis and attempt to build several predictive models that could be used to predict rent increases.  The simplest models will explore the relationship between neighborhood rents and amenities, while more advanced models will seek to predict the likelihood of rent increases based on neighborhood amenities.  I am particularly interested in whether the presence of certain amenities, or a particular sequence of amenities, heralds increased rents.

### Interest

#### General Interest
Readers of this paper will gain an understanding of which neighborhood amenities have the greatest influence on rents.  Those involved in discussions around gentrification or neighborhood transition could gain greater insight into how amenities drive the evolution of a neighborhood.

#### Personal Interest
I will make use of the book *Regression: Models, Methods and Applications* by Fahrmeir, L., Kneib, Th., Lang, S., and Marx, B., which explores parametric and non-parametric tools within the regression framework.  Each method the authors present is paired with a real-world example.  One that I have always enjoyed is the Munich rent dataset.  I have never been to Munich, but for some reason I felt more interested in the discussion of, say, **Generalized Linear Mixed Models** when this dataset was referenced.  This book was also the cornerstone of a regression class I took in school, and I would like to revisit some of its methods to rekindle some of my dormant statistics knowledge.  This book and the course notes for the Data Science Specialization will drive my exploration of the data and models for this paper.

## Data Acquisition and Cleaning

I will need to collect data from two sources.  The first is rental data for Philadelphia, Munich and Toronto.  The second is Foursquare venue data for the same three cities.  I will merge the two in a sensible way and examine which Foursquare features correlate strongly with higher rents in the respective neighborhoods.  I expect some feature selection will be needed.  I will then build several models, yet to be determined, that predict neighborhood rent from Foursquare data.  For example, a simple model could quantify that the presence of one dog park in a neighborhood is associated with an X percent increase in rent.  A more advanced model could predict the likelihood of rent increases based on the sequence of amenities that are established or are about to be established.
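One way to make the "X percent" statement concrete is a log-linear regression: regress the logarithm of rent on the amenity count, so that a fitted slope b corresponds to a change of (e^b - 1) x 100 percent in rent per additional amenity.  The sketch below is only an illustration of that arithmetic.  The function name `percent_effect_per_unit` is hypothetical, and the dog-park counts and rents are synthetic numbers constructed for the example, not results from the rental data.

```python
import math

def percent_effect_per_unit(counts, rents):
    """Fit log(rent) = a + b * count by ordinary least squares and
    return the implied percent change in rent per additional unit."""
    log_rents = [math.log(r) for r in rents]
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(log_rents) / n
    # Slope of the least-squares line: covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, log_rents))
    sxx = sum((x - mean_x) ** 2 for x in counts)
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0

# Synthetic neighborhoods (assumption): rents grow by exactly 5% per dog park
dog_parks = [0, 1, 2, 3, 4]
median_rent = [1000 * 1.05 ** k for k in dog_parks]

effect = percent_effect_per_unit(dog_parks, median_rent)
print(f"Each additional dog park is associated with a {effect:.1f}% higher rent")
```

The function needs at least two distinct counts, otherwise the variance in the denominator is zero.  An association of this kind describes correlation in the sample and does not by itself show that the amenity causes the rent change.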

### Rental Data

Rental data is difficult to come by.  I will use publicly available data for the neighborhoods of each city and look for the most current data available.  I will focus on a few trendy neighborhoods, a few solidly middle- or working-class neighborhoods, and a few low-income neighborhoods.

### Foursquare Data

The collection of the Foursquare data will proceed as it did in the class.  I use the geopy library to find the coordinates of Toronto, Canada; Philadelphia, USA; and Munich, Germany.

In [None]:
import import_ipynb
import libraries_import # Bring in libraries
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
import requests # library to handle requests
from pandas import json_normalize # transform a JSON structure into a pandas dataframe (pandas >= 1.0)

In [None]:
# Geocode an address, print its coordinates, and return a folium map centered on it
def find_loc(address):
    geolocator = Nominatim(user_agent="to_explorer")
    location = geolocator.geocode(address)
    if location is None:  # geocode returns None when nothing matches
        raise ValueError(f'Could not geocode {address!r}')
    latitude = location.latitude
    longitude = location.longitude
    print(address, 'is located at', latitude, longitude)
    map_city = folium.Map(location=[latitude, longitude], zoom_start=12)
    return map_city

In [None]:
find_loc('Toronto, Ontario, Canada')

In [51]:
find_loc('Philadelphia, Pennsylvania, United States')

Philadelphia, Pennsylvania, United States is located at 39.9527237 -75.1635262


In [None]:
find_loc('Munich, Bavaria, Germany')
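The coordinates printed by `find_loc` are later typed by hand into the Foursquare calls.  A small helper that collects them into a dictionary could avoid copying numbers around.  The sketch below is one possible shape for such a helper: `collect_coordinates` and `fake_geocode` are hypothetical names, and the geocoder is passed in as a parameter so the example runs offline with a stand-in that returns the Philadelphia value printed above.

```python
from types import SimpleNamespace

def collect_coordinates(addresses, geocode):
    """Return {address: (latitude, longitude)} for every address that resolves.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    coords = {}
    for address in addresses:
        location = geocode(address)
        if location is None:
            continue  # skip addresses the geocoder could not resolve
        coords[address] = (location.latitude, location.longitude)
    return coords

# Offline stand-in for a real geocoder (assumption, for illustration only)
def fake_geocode(address):
    known = {
        'Philadelphia, Pennsylvania, United States': SimpleNamespace(
            latitude=39.9527237, longitude=-75.1635262),
    }
    return known.get(address)

city_coords = collect_coordinates(
    ['Philadelphia, Pennsylvania, United States', 'Nowhere, Nowhere'],
    fake_geocode)
print(city_coords)
```

With geopy available, passing `Nominatim(user_agent="to_explorer").geocode` as the `geocode` argument should perform live lookups in place of the stand-in.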

In [48]:
import os

# Credentials are read from environment variables rather than hard-coded in the notebook;
# the variable names used here are assumptions, so substitute your own.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
ACCESS_TOKEN = os.environ.get('FOURSQUARE_ACCESS_TOKEN') # your Foursquare Access Token (not used below)
VERSION = '20180604' # Foursquare API version date
LIMIT = 1000 # requested maximum number of venues returned by the Foursquare API
radius = 5000 # search radius in meters

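def get_4sqr_data(lat, long):
    # Query the Foursquare venues/explore endpoint around one coordinate pair
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, long),
        'radius': radius,
        'limit': LIMIT,
    }
    response = requests.get(url, params=params)
    response.raise_for_status() # fail loudly on HTTP errors
    return response.json() # parsed JSON (a dict); not printed because it is very large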

# Fetch venue data for each city center; each result is the parsed JSON response (a dict)
toronto_4sqr = get_4sqr_data(43.6534817, -79.3839347)
philly_4sqr = get_4sqr_data(39.9527237, -75.1635262)
munich_4sqr = get_4sqr_data(48.1371079, 11.5753822)
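Each call above searches within `radius` meters of a single point per city, so the venues returned cluster around the city center.  When venues are later assigned to neighborhoods, a great-circle distance is one way to check how far a venue lies from a reference point.  The sketch below uses the haversine formula; the function name `haversine_m` is hypothetical and the mean Earth radius of 6,371 km is an approximation.  The venue coordinates are those of Reading Terminal Market from the Philadelphia venue output in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters; an approximation

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Philadelphia center used in the query, and one venue from the results
center = (39.9527237, -75.1635262)
venue = (39.953341, -75.159306)

distance = haversine_m(*center, *venue)
print(f"{distance:.0f} m from the city center; within radius: {distance <= 5000}")
```

The same function could be used to attach each venue to its nearest neighborhood centroid, assuming centroid coordinates are available from the rental data source.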

In [50]:
def get_category_type(row):
    # Return the name of a venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

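def get_pretty(results):
    # Flatten the venue items of a Foursquare explore response into a tidy dataframe
    venues = results['response']['groups'][0]['items']
    nearby_venues = json_normalize(venues) # flatten JSON

    # keep only the columns of interest
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    # reduce each row's category list to the name of its first category
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    # strip the 'venue.' and 'venue.location.' prefixes from the column names
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    return nearby_venues

# Preview the first five Philadelphia venues
get_pretty(philly_4sqr).head(5)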




|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Dilworth Park | Park | 39.952772 | -75.164723 |
| 1 | La Colombe Coffee Roasters | Coffee Shop | 39.951659 | -75.165238 |
| 2 | Philadelphia Film Center | Movie Theater | 39.950835 | -75.164683 |
| 3 | Reading Terminal Market | Market | 39.953341 | -75.159306 |
| 4 | Blick Art Materials | Arts & Crafts Store | 39.950621 | -75.163159 |
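To relate amenities to rents, the venue rows eventually need to become one feature row per area, with a count for each venue category.  The sketch below shows one possible way to do that with `pandas.crosstab`, using the five Philadelphia venues from the output above.  The `city` column is added here for illustration; with a neighborhood label per venue in its place, the same call would give per-neighborhood counts.

```python
import pandas as pd

# The five Philadelphia venues from the output above
venues = pd.DataFrame({
    'name': ['Dilworth Park', 'La Colombe Coffee Roasters', 'Philadelphia Film Center',
             'Reading Terminal Market', 'Blick Art Materials'],
    'categories': ['Park', 'Coffee Shop', 'Movie Theater', 'Market', 'Arts & Crafts Store'],
})
venues['city'] = 'Philadelphia'  # grouping key added for illustration

# One row per city, one column per venue category, cells hold venue counts
category_counts = pd.crosstab(venues['city'], venues['categories'])
print(category_counts)
```

A table in this shape can be joined to rental data on the grouping key, which is the merge step described in the data acquisition plan.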


In [None]:
toronto_4sqr # raw Foursquare response for Toronto

The following sections will be developed within the paper:

## Exploratory Data Analysis
## Predictive Modeling
## Conclusions
## Future Work