DistrictDataLabs/03-gentrifuge

##Data Exploration for DDL Incubator Group 3

##Team

Saran Ahluwalia - McAwesome

Karpura Suryadevara - Wannabe Predictor

Tom Caputo: Visualization - The Italian Job

Dave Clare - Beast

##Problem Statement

Distill Zillow API, Yelp and Census tract data, combined with user input about current location, to estimate the likelihood of a Starbucks being available within a future time range (yet to be parameterized).

###Research Questions

Can we predict whether a Starbucks will be available in the future in DC (or any other large city)?

Can we express this probability in terms of user preferences?

Does a Starbucks indicate a certain income threshold or budget breakdown for different locations (zip code, neighborhood), depending on a person's gross household income and/or household size?

Can we recommend or predict where the next Starbucks will be available, and how far it will be from the individual, based on their life preferences and other information?

What are the strongest predictor variables (e.g., costs for housing, groceries or insurance, or perhaps walkability score) that feed into the availability of a Starbucks (as an indicator of gentrification)?

Can we predict if 'making it' will change over time due to inflation or other changes in consumer prices?

###Hypothesis

Using open and user-provided data sources together with widely accepted probability models, we can infer the percent chance that a Starbucks will be available within a certain time frame.

###Value Proposition

Provide small businesses and enterprises information on the transitions of certain regional areas within cities that may indicate a growing market need for both goods and services

Provide individuals with an indicator that may inform their choice to relocate, invest in a property or invest their leisure time in a specific location

Provide non-profits and government agencies with information on the changing demographics of cities so they can direct human capital, funding and social services (for those who may be displaced)

###Strategy for Analysis

For the DC Market:

Segment - Segment publicly available housing data and Yelp API data into demographic groups (singles under 25, singles 25 and over, married without kids, etc.)

Location - Break down all data in DC by ward, zip code and/or neighborhood

Machine Learning Applications - Combine a Naive Bayes bag-of-words model, cluster analysis (k-means, etc.) and a Gradient Boosting Classifier
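To make the bag-of-words piece of this strategy concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The training texts, the `changing` / `stable` labels, and the function names are all hypothetical stand-ins for illustration. They are not project data or project code, and a real run would likely use Yelp review text and a library implementation instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Bag of words: lowercase and split on whitespace, ignoring word order.
    return text.lower().split()


def train(docs):
    """docs is a list of (text, label) pairs. Returns the fitted counts."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_proba(model, text):
    """Return {label: probability} for a new piece of text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exps.values())
    return {label: value / norm for label, value in exps.items()}


# Hypothetical toy training set, for illustration only.
TOY_DOCS = [
    ("artisanal espresso and craft cocktails", "changing"),
    ("new condo lofts and yoga studio", "changing"),
    ("family diner open since 1970", "stable"),
    ("corner store and laundromat", "stable"),
]

model = train(TOY_DOCS)
probs = predict_proba(model, "new espresso bar near the lofts")
```

On this toy set the `changing` label scores higher for the sample text. In the combined approach described above, such a probability could serve as one input feature alongside cluster assignments, though how the models are actually combined is left open here.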

Output

Attempt to create a model that provides an indicator of gentrification (a current or future Starbucks location) by location in DC and other cities

Can we complement this model with additional information to give users a more realistic understanding of the ?

##Data Sources

Seed Data

Census Tract Data

Can we find more data out there on this?

Complementary Data

Craigslist Apartment Listings

Zillow API

###Craigslist

Craigslist Apartment Crawler

craigsuck: A Craigslist RSS poller
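As a rough illustration of what an RSS poller does, the sketch below parses a feed and returns only the listings it has not seen before. It is not craigsuck's implementation. The sample feed, its URLs, and the `new_listings` function are hypothetical, and the sketch assumes a plain RSS 2.0 layout with no XML namespaces, which a real feed may not follow.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real poller would fetch this over HTTP.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Hypothetical apartment listings</title>
<item><title>1BR near Metro</title><link>http://example.com/apt/1</link></item>
<item><title>Studio with parking</title><link>http://example.com/apt/2</link></item>
</channel></rss>"""


def new_listings(feed_xml, seen_links):
    """Return (title, link) pairs not yet in seen_links, and record them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


seen = set()
first_poll = new_listings(SAMPLE_FEED, seen)   # both items are new
second_poll = new_listings(SAMPLE_FEED, seen)  # nothing new the second time
```

Keeping the set of seen links between polls is what lets repeated polling collect each listing once.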

###Zillow API

PyZillow and a "homegrown" thin client
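A thin client in this sense is usually little more than a URL builder plus a response parser. The sketch below shows that shape under stated assumptions: the base URL, the query parameter names, and the XML response layout are hypothetical placeholders, not the Zillow API or the PyZillow interface.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; replace with the real service URL.
BASE_URL = "https://api.example.com/search"


def build_url(api_key, address, zip_code):
    """Build a request URL. The parameter names are assumptions."""
    query = urlencode({"key": api_key, "address": address, "zip": zip_code})
    return BASE_URL + "?" + query


def parse_response(xml_text):
    """Parse a hypothetical XML response into a list of dicts."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("result"):
        results.append({
            "address": node.findtext("address"),
            "estimate": float(node.findtext("estimate")),
        })
    return results


url = build_url("KEY", "1600 Example St", "20001")

# Hypothetical response body, standing in for a real HTTP reply.
SAMPLE_RESPONSE = (
    "<results><result>"
    "<address>1600 Example St</address><estimate>450000</estimate>"
    "</result></results>"
)
parsed = parse_response(SAMPLE_RESPONSE)
```

Separating URL construction from parsing keeps both halves testable without network access.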