# Capstone Project - The Battle of the Neighborhoods

## Applied Data Science Capstone by IBM/Coursera

## Table of contents

- Introduction
- Data
- Methodology
- Analysis
- Results and Discussion
- Conclusion

## 1. Introduction

Houston, Texas is the largest city in the state, comprising 88 super neighborhoods with an estimated population of 2,325,502. Houston is commonly known for its food, job availability, nightlife, and NASA, and it is classified as one of the fastest-growing cities in the nation. With this ever-increasing population comes a growing demand to address the needs of the residents who make Houston home.

One of the largest problems facing the city is food deserts. According to the CDC, food deserts are “areas that lack access to affordable fruits, vegetables, whole grains, low-fat milk, and other foods that make up the full range of a healthy diet.” Individuals living in food deserts have been shown to be at increased risk of diabetes, cardiovascular disease, and obesity. While food deserts are typically associated with low-income areas, they can occur anywhere residents lack easy access to a grocery store. Overall, about 23.5 million Americans, and a reported 250,000 individuals living in Houston, TX, lack easy access to grocery stores, according to Rice University.

While living in a food desert has been shown to harm one's health, these areas also present lower competition for new grocery stores. They give business owners an opportunity to open stores that not only help alleviate public health concerns but also serve the demand for fresh food among residents of the affected areas, which improves the chances of a successful store.

### 1.1 Objective

The objective of this project is to identify which neighborhoods in Houston, TX could be labeled as food deserts and to select the best locations for a new grocery store: locations that both alleviate the food desert problem and, because of the lack of competition, offer the best probability of success.
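As a rough illustration of this objective, the sketch below ranks neighborhoods by how few grocery stores they have. The function name `rank_candidates`, the `max_stores` threshold, and the example counts are all hypothetical and are not results from this project; a real analysis would use venue counts gathered for each neighborhood.

```python
from typing import Dict, List, Tuple


def rank_candidates(store_counts: Dict[str, int], max_stores: int = 0) -> List[Tuple[str, int]]:
    """Return neighborhoods with at most `max_stores` grocery stores, fewest first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    candidates = [(name, n) for name, n in store_counts.items() if n <= max_stores]
    return sorted(candidates, key=lambda item: (item[1], item[0]))


# Hypothetical counts, for illustration only (not measured data).
example_counts = {"Neighborhood A": 0, "Neighborhood B": 4, "Neighborhood C": 1}

print(rank_candidates(example_counts, max_stores=1))
# → [('Neighborhood A', 0), ('Neighborhood C', 1)]
```

A raw store count is only one possible proxy for competition; store counts per resident or distance to the nearest store would be reasonable alternatives.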

### 1.2 Interest

This project is useful to a variety of groups: public health officials looking for possible food deserts within the Houston community, people looking to move into a neighborhood with easy access to a grocery store, and business owners or developers looking to open a new grocery store in a part of Houston, TX with lower competition. The project is relevant because food deserts are not just a public health issue. They are also an untapped opportunity for businesses and developers who want to open a store in an area with low competition and a high probability of success.
 

## 2. Data

### 2.1 Source of data

- The data for the project comes from the page (http://www.houstontx.gov/planning/Demographics/demograph_docs/income_avgs.htm), which lists the neighborhoods within Houston and their location relative to Downtown Houston, 88 super neighborhoods in total.
- Geocoder 
- Foursquare API 
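The sketch below illustrates how the last two sources might be used. It assumes geopy's `Nominatim` geocoder and the legacy Foursquare v2 `venues/explore` endpoint; the helper names, the `", Houston, TX"` suffix, the version date, and the default radius and limit are illustrative assumptions, not values confirmed by this project.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple
from urllib.parse import urlencode


def geocode_neighborhoods(
    names: Iterable[str],
    geocode: Callable[[str], object],
    suffix: str = ", Houston, TX",  # assumed qualifier to disambiguate names
) -> Dict[str, Optional[Tuple[float, float]]]:
    """Map each name to (latitude, longitude), or None if the lookup fails."""
    coords = {}
    for name in names:
        location = geocode(name + suffix)
        if location is None:
            coords[name] = None
        else:
            coords[name] = (location.latitude, location.longitude)
    return coords


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a request URL for the legacy Foursquare v2 venues/explore endpoint."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (assumed value)
        "ll": f"{lat},{lng}",  # latitude,longitude of the search center
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues to return
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Usage with geopy (requires network access, so it is left commented out):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="houston_explorer")  # hypothetical agent name
# coords = geocode_neighborhoods(["Willowbrook"], geolocator.geocode)
```

Passing the geocoding function in as an argument keeps the helper easy to test offline and makes it simple to swap in a different geocoder.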

### 2.2 Data Preparation

Importing the libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; older versions use pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Load the list of Houston super neighborhoods into a dataframe

In [13]:
df=pd.read_html("https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods",header=0)[0]
df.head()

|   | # | Name | Location relative to Downtown Houston | Approximate boundaries |
|---|---|---|---|---|
| 0 | 1 | Willowbrook | Northwest | Along Texas State Highway 249 northwest of Bel... |
| 1 | 2 | Greater Greenspoint | North | Around the junction of Beltway 8 and Interstat... |
| 2 | 3 | Carverdale | Northwest | South of the junction of Beltway 8 and U.S. Ro... |
| 3 | 4 | Fairbanks / Northwest Crossing | Northwest | Along U.S. Route 290 between Interstate 610 an... |
| 4 | 5 | Greater Inwood | Northwest | North of Fairbanks / Northwest Crossing and ea... |


In [14]:
df.drop(["Approximate boundaries"],axis=1,inplace=True)

In [18]:
df.rename(columns={"#":"Number"},inplace=True)
df.head()

|   | Number | Name | Location relative to Downtown Houston |
|---|---|---|---|
| 0 | 1 | Willowbrook | Northwest |
| 1 | 2 | Greater Greenspoint | North |
| 2 | 3 | Carverdale | Northwest |
| 3 | 4 | Fairbanks / Northwest Crossing | Northwest |
| 4 | 5 | Greater Inwood | Northwest |
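The two cleanup cells above modify `df` in place, so re-running them raises an error once the column is gone. As an alternative sketch, the same steps can be written as one chained expression that returns a new dataframe. The function name `tidy_neighborhoods` is hypothetical, and the small frame below reuses the first three rows shown above with a placeholder in the boundaries column.

```python
import pandas as pd


def tidy_neighborhoods(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop the boundaries column and rename '#' without mutating `raw`."""
    return (
        raw.drop(columns=["Approximate boundaries"])
           .rename(columns={"#": "Number"})
    )


# First three rows from the output above; the boundaries text is a placeholder.
raw = pd.DataFrame({
    "#": [1, 2, 3],
    "Name": ["Willowbrook", "Greater Greenspoint", "Carverdale"],
    "Location relative to Downtown Houston": ["Northwest", "North", "Northwest"],
    "Approximate boundaries": ["(omitted)", "(omitted)", "(omitted)"],
})

tidy = tidy_neighborhoods(raw)
print(list(tidy.columns))
# → ['Number', 'Name', 'Location relative to Downtown Houston']
```

Because `raw` is left untouched, the cell can be re-run safely and the original table stays available for later steps.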


In [20]:
df.shape

(88, 3)
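The shape matches the 88 super neighborhoods mentioned earlier. A few more sanity checks could be added at this point; the sketch below is one possible version, with the function name `check_neighborhoods` and the specific checks chosen for illustration.

```python
import pandas as pd


def check_neighborhoods(df: pd.DataFrame, expected_rows: int = 88) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    if df["Name"].isna().any():
        problems.append("missing neighborhood names")
    if df["Name"].duplicated().any():
        problems.append("duplicate neighborhood names")
    return problems


# Small illustrative frame built from the first two rows shown earlier.
sample = pd.DataFrame({
    "Number": [1, 2],
    "Name": ["Willowbrook", "Greater Greenspoint"],
    "Location relative to Downtown Houston": ["Northwest", "North"],
})

print(check_neighborhoods(sample, expected_rows=2))
# → []
```

On the full dataframe, calling `check_neighborhoods(df)` with the default `expected_rows=88` would flag a changed or partially loaded source table.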