##Data Exploration for DDL Incubator Group 3
##Team
Saran Ahluwalia - McAwesome
Karpura Suryadevara - Wannabe Predictor
Tom Caputo: Visualization - The Italian Job
Dave Clare - Beast
##Problem Statement
Distill Zillow API, Yelp, and Census tract data, combined with user input about current location, to estimate the likelihood of a Starbucks being available within a future time range (yet to be parameterized)
###Research Questions
Can we predict whether a Starbucks will be available in the future in DC (or any other large city)?
Can we define this probability in terms of user preferences?
Does a Starbucks indicate a certain income threshold or budget breakdown for different locations (zip code, neighborhood), depending on a person's gross household income and/or household size?
Can we recommend or predict where the next Starbucks will be available, and how far it will be from the individual, based on their life preferences and other information?
What are the strongest predictor variables (e.g., costs for housing, groceries, or insurance, or perhaps walkability score) that feed into the availability of a Starbucks (as an indicator of gentrification)?
Can we predict if 'making it' will change over time due to inflation or other changes in consumer prices?
###Hypothesis
Using data sources (open and user-given) and widely accepted probability models, we can infer the percent chance that a Starbucks will be available within a certain time frame
###Value Proposition
Provide small businesses and enterprises information on the transitions of certain regional areas within cities that may indicate a growing market need for both goods and services
Provide individuals with an indicator that may inform their choice to relocate, invest in a property or invest their leisure time in a specific location
Provide non-profits and government agencies with information on the changing demographics of cities to guide the investment of human capital, funding, and social services (for those who may be displaced)
###Strategy for Analysis
For the DC Market:
Segment - Divide publicly available housing data and Yelp API data into demographic groups (young singles under 25 and 25+, married without kids, etc.)
Location - Break down all data in DC by ward, zip code and/or neighborhood
Machine Learning Applications - Combine a Naive Bayes bag-of-words model, cluster analysis (k-means, etc.) and a Gradient Boosting Classifier
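
To make the Naive Bayes bag-of-words step concrete, here is a minimal sketch written with only the Python standard library. It is not the team's implementation: the class name `BagOfWordsNaiveBayes`, the labels `starbucks_nearby` / `no_starbucks`, and the four toy "business description" strings are all hypothetical stand-ins for text that might come from Yelp. In practice a library such as scikit-learn would likely replace this hand-rolled version.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        """Return a dict mapping each class label to its posterior probability."""
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Add-one smoothing keeps unseen (class, word) pairs non-zero.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Normalize in a numerically stable way (subtract the max log score).
        peak = max(log_scores.values())
        exp_scores = {lbl: math.exp(s - peak) for lbl, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {lbl: value / norm for lbl, value in exp_scores.items()}


# Hypothetical toy data: short descriptions of businesses in an area.
texts = [
    "espresso bar yoga studio",
    "craft coffee coworking loft",
    "auto repair hardware store",
    "laundromat hardware supply",
]
labels = ["starbucks_nearby", "starbucks_nearby", "no_starbucks", "no_starbucks"]

model = BagOfWordsNaiveBayes().fit(texts, labels)
probs = model.predict_proba("new espresso loft")
print(probs)
```

Words never seen in training (such as "new" above) are simply ignored. The resulting class probabilities could then serve as one input feature alongside cluster assignments when training a downstream classifier.
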
Output
Attempt to create a model that provides an indicator for gentrification (a future or current Starbucks location) and location in DC and other cities
Can we complement this model with additional information to give users a more realistic understanding of the ?
##Data Sources
Seed Data
Census Tract Data
Can we find more data out there on this?
Complementary Data
Craigslist Apartment Listings
###Craigslist
Craigslist Apartment Crawler
craigsuck: A Craigslist RSS poller
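
As a rough illustration of the parsing step an RSS poller performs, the sketch below extracts listing titles, links, and prices from a feed using only the standard library. It does not reflect how craigsuck or the crawler above work internally. The feed content, the `example.org` links, the function name `parse_listings`, and the assumption of a plain RSS 2.0 layout with the price embedded in the title are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real poller would fetch this over HTTP.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>hypothetical apartment listings</title>
    <item>
      <title>$1800 / 1br - Sunny apartment (Columbia Heights)</title>
      <link>https://example.org/listing/1</link>
    </item>
    <item>
      <title>Room available near metro (Petworth)</title>
      <link>https://example.org/listing/2</link>
    </item>
  </channel>
</rss>
""".strip()

# Matches a dollar amount such as "$1800" or "$1,800".
PRICE_RE = re.compile(r"\$(\d[\d,]*)")


def parse_listings(feed_xml):
    """Return a list of dicts with title, link, and price (or None) per item."""
    root = ET.fromstring(feed_xml)
    listings = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        match = PRICE_RE.search(title)
        price = int(match.group(1).replace(",", "")) if match else None
        listings.append({"title": title, "link": link, "price": price})
    return listings


listings = parse_listings(SAMPLE_FEED)
for listing in listings:
    print(listing["price"], listing["title"])
```

Listings without a recognizable price keep `price` as `None`, so they can be filtered out or handled separately before being joined with other data sources.
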
###Zillow API
[PyZillow] and "Homegrown" Thin Client
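
A "homegrown" thin client can be as small as a function that builds the request URL. The sketch below is an assumption-heavy illustration, not the project's client: the endpoint and the `zws-id`, `address`, and `citystatezip` parameters follow Zillow's legacy `GetSearchResults` web service as we understand it, which may no longer be served, and the function name, API key, and address are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; verify against current Zillow documentation.
ZILLOW_SEARCH_URL = "http://www.zillow.com/webservice/GetSearchResults.htm"


def build_search_url(zws_id, address, citystatezip):
    """Build a GetSearchResults request URL from an API key and an address."""
    query = urlencode(
        {
            "zws-id": zws_id,              # API key issued by Zillow
            "address": address,            # street address of the property
            "citystatezip": citystatezip,  # city and state, or ZIP code
        }
    )
    return f"{ZILLOW_SEARCH_URL}?{query}"


# Hypothetical key and address, for illustration only.
url = build_search_url("X1-HYPOTHETICAL", "1600 Pennsylvania Ave NW", "Washington, DC")
print(url)
```

Keeping URL construction separate from the network call makes the client easy to test without hitting the API; the XML response could then be fetched and parsed in a second step.
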