**The Battle of the Neighborhoods - Week 1**

__Introduction & Business Problem__

Imagine a company called "Butterscotch Pancakes" from Russia. It is a chain of restaurants serving Russian cuisine, specializing in pancakes with different toppings. The company's CEO has a dream: to enter the US market. In the cold winter of 2019 he decided to pursue that dream. To open a restaurant, one needs to choose a city populous enough for a restaurant to succeed. Our CEO has chosen New York because of its large population of ex-Soviet and Russian immigrants.

The food market in NY is highly competitive, so a thorough analysis of the business environment is needed to form a strategy. This will help reduce the risk of the restaurant failing.

Our goal is to find an optimal location for the restaurant.

There is a huge variety of food on the streets of New York:
- Fast food (hot dogs, bagels, ice cream, burgers etc.)
- Italian restaurants
- Asian restaurants (Thai, Chinese, Indian etc.)
- Coffee shops
- Middle Eastern restaurants.

Various factors need to be studied in order to decide on the location, such as:

- New York Population
- New York City Demographics
- Sources of ingredients
- Popular attractions nearby
- Competitors
- Segmentation of neighborhoods (Boroughs)
and so on...


The objective of this project is to recommend which neighborhood of NY would be the best choice for the restaurant.

This project can be used by anyone who is looking to open a restaurant in any city.

**Data**

1 - New York City Neighborhood names (https://geo.nyu.edu/catalog/nyu_2451_34572)

2 - The list of farmers' markets in NY (https://data.cityofnewyork.us/dataset/DOHMH-Farmers-Markets/8vwk-6iz2)

3 - Location data for Fresh Food Box sites. The Fresh Food Box Program is a food access initiative that enables under-served communities to purchase fresh, healthy, and primarily regionally grown produce well below traditional retail prices (https://www.grownyc.org/greenmarketco/foodbox)

4 - Wikipedia (data on population, economy, demographics, cuisine etc.)

5 - Foursquare API.

**The Battle of the Neighborhoods - Week 2**

*New York City geographical coordinates data*

New York has a total of 5 boroughs and 306 neighborhoods. We need the latitude and longitude coordinates of each neighborhood.

We will get this data from the following dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

In [106]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import csv
import json
import requests
from pandas.io.json import json_normalize # transform a JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

In [118]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

Solving environment: done



# All requested packages already installed.



In [116]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

Solving environment: done



# All requested packages already installed.



In [107]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


**Exploring the data**

In [108]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [109]:
neighborhoods_data = newyork_data['features']

neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
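
GeoJSON stores point coordinates as `[longitude, latitude]`, so the longitude comes first when a feature is unpacked. As a minimal sketch (the helper name `feature_to_row` is hypothetical and not part of the original notebook), one feature can be flattened and range-checked like this:

```python
def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a flat dict (hypothetical helper)."""
    props = feature["properties"]
    lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order: longitude, latitude
    # Guard against swapped or corrupt coordinates
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range: {}, {}".format(lat, lon))
    return {
        "Borough": props["borough"],
        "Neighborhood": props["name"],
        "Latitude": lat,
        "Longitude": lon,
    }

# Sample record reduced from the first feature shown above
sample = {
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [-73.84720052054902, 40.89470517661]},
    "properties": {"name": "Wakefield", "borough": "Bronx"},
}

row = feature_to_row(sample)
print(row)
```

The range check is cheap insurance: a latitude outside -90..90 usually means the two values were read in the wrong order.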

In [110]:
# next we transfer data into pandas dataframe by creating a table and populating it with our data
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [111]:
# populate the neighborhoods table, one row per GeoJSON feature
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [114]:
neighborhoods.to_csv('NYC_GEO.csv',index=False)
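
The dataset is described earlier as covering 5 boroughs and 306 neighborhoods. One way to check figures like these is to count records per borough. The sketch below uses a few hand-written illustrative rows in place of the real table, and the helper name `count_by_borough` is an assumption made for this example.

```python
from collections import Counter

def count_by_borough(rows):
    """Count how many neighborhood records fall in each borough (hypothetical helper)."""
    return Counter(row["Borough"] for row in rows)

# Illustrative rows only, not the full dataset
rows = [
    {"Borough": "Bronx", "Neighborhood": "Wakefield"},
    {"Borough": "Bronx", "Neighborhood": "Co-op City"},
    {"Borough": "Manhattan", "Neighborhood": "Chinatown"},
]

counts = count_by_borough(rows)
print(len(counts), "boroughs,", sum(counts.values()), "neighborhoods")
```

With the real table, `neighborhoods['Borough'].value_counts()` should give the same per-borough breakdown.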

In [119]:
# let's get the geographical coordinates of New York City
address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [120]:
# create a map of NY and demonstrate boroughs
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

**Web scraping of population and demographics data for New York City from Wikipedia**

Population data is scraped from the Wikipedia page https://en.wikipedia.org/wiki/Demographics_of_New_York_City

In [13]:
pip install beautifulsoup4

Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/3b/c8/a55eb6ea11cd7e5ac4bacdf92bac4693b90d3ba79268be16527555e186f0/beautifulsoup4-4.8.1-py3-none-any.whl (101kB)
Collecting soupsieve>=1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.8.1 soupsieve-1.9.5
Note: you may need to restart the kernel to use updated packages.


In [122]:
from bs4 import BeautifulSoup

In [2]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [125]:
# download the page and pick the first table with class "wikitable sortable"
page_html = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(page_html, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})

# header names come from every <th> cell in the table
headers = [header.text for header in table.find_all('th')]

# each data row is the text of its <td> cells
rows = []
for table_row in table.find_all('tr'):
    cells = table_row.find_all('td')
    rows.append([cell.text for cell in cells])

# write the header and the non-empty rows to CSV
with open('POPULATION1.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
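
Because `headers` gathers every `th` cell in the table while each data row is built only from its `td` cells, the header list can end up longer or shorter than the rows. A small check before writing the CSV makes such a mismatch visible. This is a sketch with made-up sample values; `row_width_report` is a hypothetical helper, not part of the notebook.

```python
def row_width_report(headers, rows):
    """Return (row_index, width) for each non-empty row whose width differs from the header."""
    expected = len(headers)
    return [(i, len(row)) for i, row in enumerate(rows)
            if row and len(row) != expected]

headers = ["Borough", "County", "Population"]
rows = [
    [],                                   # header row: it has no <td> cells
    ["The Bronx", "Bronx", "1,471,160"],  # matches the header width
    ["City of New York", "8,622,698"],    # one cell short
]

mismatches = row_width_report(headers, rows)
print(mismatches)  # → [(2, 2)]
```

An empty report suggests the CSV columns will line up; any entry points at a row that needs realigning after loading.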

In [126]:
Pop_data=pd.read_csv('POPULATION1.csv')
Pop_data.drop(Pop_data.columns[[7,8,9,10,11]], axis=1,inplace=True)
print('Data downloaded!')

Data downloaded!


In [127]:
Pop_data.columns = Pop_data.columns.str.replace(' ', '')
Pop_data.columns = Pop_data.columns.str.replace('\'','')
Pop_data.rename(columns={'Borough':'persons_sq_mi','County':'persons_sq_km'}, inplace=True)
Pop_data

Unnamed: 0,NewYorkCitysfiveboroughsvte\n,Jurisdiction\n,Population\n,GrossDomesticProduct\n,Landarea\n,Density\n,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx\n,\n Bronx\n,"1,471,160\n",28.787\n,"19,570\n",42.10\n,109.04\n,,,
1,Brooklyn\n,\n Kings\n,"2,648,771\n",63.303\n,"23,900\n",70.82\n,183.42\n,,,
2,Manhattan\n,\n New York\n,"1,664,727\n",629.682\n,"378,250\n",22.83\n,59.13\n,,,
3,Queens\n,\n Queens\n,"2,358,582\n",73.842\n,"31,310\n",108.53\n,281.09\n,,,
4,Staten Island\n,\n Richmond\n,"479,458\n",11.249\n,"23,460\n",58.37\n,151.18\n,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [128]:
Pop_data.rename(columns = {'NewYorkCitysfiveboroughsvte\n' : 'Borough',
                   'Jurisdiction\n':'County',
                   'Population\n':'Estimate_2017', 
                   'Landarea\n':'square_miles',
                    'Density\n':'square_km'}, inplace=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\n,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx\n,\n Bronx\n,"1,471,160\n",28.787\n,"19,570\n",42.10\n,109.04\n,,,
1,Brooklyn\n,\n Kings\n,"2,648,771\n",63.303\n,"23,900\n",70.82\n,183.42\n,,,
2,Manhattan\n,\n New York\n,"1,664,727\n",629.682\n,"378,250\n",22.83\n,59.13\n,,,
3,Queens\n,\n Queens\n,"2,358,582\n",73.842\n,"31,310\n",108.53\n,281.09\n,,,
4,Staten Island\n,\n Richmond\n,"479,458\n",11.249\n,"23,460\n",58.37\n,151.18\n,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [129]:
Pop_data['Borough']=Pop_data['Borough'].replace(to_replace='\n', value='', regex=True)
Pop_data['County']=Pop_data['County'].replace(to_replace='\n', value='', regex=True)
Pop_data['Estimate_2017']=Pop_data['Estimate_2017'].replace(to_replace='\n', value='', regex=True)

Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\n,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx,Bronx,1471160.0,28.787\n,"19,570\n",42.10\n,109.04\n,,,
1,Brooklyn,Kings,2648771.0,63.303\n,"23,900\n",70.82\n,183.42\n,,,
2,Manhattan,New York,1664727.0,629.682\n,"378,250\n",22.83\n,59.13\n,,,
3,Queens,Queens,2358582.0,73.842\n,"31,310\n",108.53\n,281.09\n,,,
4,Staten Island,Richmond,479458.0,11.249\n,"23,460\n",58.37\n,151.18\n,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articles,,,,,,,,,


In [130]:
Pop_data['square_miles']=Pop_data['square_miles'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_km']=Pop_data['square_km'].replace(to_replace='\n', value='', regex=True)

Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\n,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04\n,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42\n,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13\n,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09\n,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18\n,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,,,
6,State of New York,19849399,1547.116,78354,47214.0,122284.0,416.4,,,
7,Sources:[14] and see individual borough articles,,,,,,,,,
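
The cells above remove the trailing newline one column at a time. An alternative is to normalise each scraped cell in a single pass, stripping whitespace and thousands separators before converting to a number. The helper `parse_number` below is a hypothetical sketch, not the notebook's own method; it returns `None` for cells that are empty or not numeric.

```python
def parse_number(cell):
    """Convert a scraped table cell such as '1,471,160\\n' to a float, or None."""
    if cell is None:
        return None
    # Drop surrounding whitespace (including '\n') and thousands separators
    text = str(cell).strip().replace(",", "")
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None  # footnotes and other non-numeric text

print(parse_number("1,471,160\n"))    # → 1471160.0
print(parse_number("42.10\n"))        # → 42.1
print(parse_number("Sources:[14]"))   # → None
```

Applied to a column, this could look like `Pop_data['Estimate_2017'].map(parse_number)`.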


In [131]:

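# shift the values of rows 5 and 6 (city and state totals) one column to the right,
# pair by pair, starting from the right-hand columns
Pop_data.loc[5:,['square_km','persons_sq_mi']] = Pop_data.loc[2:,['square_km','persons_sq_mi']].shift(1,axis=1)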
Pop_data.loc[5:,['square_miles','square_km']] = Pop_data.loc[2:,['square_miles','square_km']].shift(1,axis=1)
Pop_data.loc[5:,['Estimate_2017','square_miles']] = Pop_data.loc[2:,['Estimate_2017','square_miles']].shift(1,axis=1)
Pop_data.loc[5:,['County','Estimate_2017']] = Pop_data.loc[2:,['County','Estimate_2017']].shift(1,axis=1)
Pop_data.loc[5:,['Borough','County']] = Pop_data.loc[2:,['Borough','County']].shift(1,axis=1)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\n,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04\n,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42\n,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13\n,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09\n,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18\n,,,
5,,City of New York,8622698.0,93574,806.863,302.64,783.83,,,
6,,State of New York,19849399.0,78354,1547.116,47214.0,122284,,,
7,,Sources:[14] and see individual borough articles,,,,,,,,


In [132]:
# replace missing values (NaN) with empty strings
Pop_data = Pop_data.fillna('')
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\n,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04\n,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42\n,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13\n,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09\n,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18\n,,,
5,,City of New York,8622698.0,93574,806.863,302.64,783.83,,,
6,,State of New York,19849399.0,78354,1547.116,47214.0,122284,,,
7,,Sources:[14] and see individual borough articles,,,,,,,,


In [133]:
# drop the footnote row ("Sources: ...") that was scraped along with the table;
# match on the prefix and assign the result, since drop() returns a new frame
footnote_rows = Pop_data[Pop_data['County'].str.startswith('Sources', na=False)].index
Pop_data = Pop_data.drop(footnote_rows)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\n,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2\n
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04\n,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42\n,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13\n,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09\n,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18\n,,,
5,,City of New York,8622698.0,93574,806.863,302.64,783.83,,,
6,,State of New York,19849399.0,78354,1547.116,47214.0,122284,,,


In [134]:
# our dataframe is ready. Let's save it as CSV file
Pop_data.to_csv('POPULATION.csv',index=False)

**DEMOGRAPHICS DATA**

We will get demographics data from the Wikipedia page https://en.wikipedia.org/wiki/New_York_City, save it as an Excel file and import it into Python.


In [81]:
pip install xlrd

Collecting xlrd
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
Installing collected packages: xlrd
Successfully installed xlrd-1.2.0
Note: you may need to restart the kernel to use updated packages.


In [91]:
Demo = pd.read_excel(r'NY Demo.xlsx')
print(Demo)

   Unnamed: 0                Racial composition  2010[237]  1990[239]  \
0           0                             White      0.440      0.523   
1           1                     —Non-Hispanic      0.333      0.432   
2           2         Black or African American      0.255      0.287   
3           3  Hispanic or Latino (of any race)      0.286      0.244   
4           4                             Asian      0.127      0.070   

   1970[239] 1940[239]  
0      0.766     0.936  
1      0.629      0.92  
2      0.211     0.061  
3      0.162     0.016  
4      0.012         −  


In [93]:
Demo.columns

Index(['Unnamed: 0', 'Racial composition', '2010[237]', '1990[239]',
       '1970[239]', '1940[239]'],
      dtype='object')

In [101]:
Demo.rename(columns = {'2010[237]' : '2010',
                   '1990[239]':'1990',
                   '1970[239]':'1970', 
                   '1940[239]':'1940',
                    }, inplace=True)
Demo

Unnamed: 0,Racial composition,2010,1990,1970,1940
0,White,0.44,0.523,0.766,0.936
1,—Non-Hispanic,0.333,0.432,0.629,0.92
2,Black or African American,0.255,0.287,0.211,0.061
3,Hispanic or Latino (of any race),0.286,0.244,0.162,0.016
4,Asian,0.127,0.07,0.012,−
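
In the table above, the 1940 value for the Asian row is a '−' placeholder rather than a number, so that column cannot be treated as numeric without further handling. A minimal sketch of one way to deal with it, assuming that a dash means "no data" (the helper name `to_share` is hypothetical):

```python
# Characters that may stand in for a missing value:
# hyphen, minus sign, en dash, em dash
DASHES = {"-", "\u2212", "\u2013", "\u2014"}

def to_share(value):
    """Convert a scraped share to float, or None for a dash placeholder or blank."""
    text = str(value).strip()
    if not text or text in DASHES:
        return None
    return float(text)

# Values of the 1940 column as shown above, with the placeholder in the last row
column_1940 = [0.936, "0.92", 0.061, 0.016, "\u2212"]
print([to_share(v) for v in column_1940])
# → [0.936, 0.92, 0.061, 0.016, None]
```

Returning `None` rather than 0 keeps "not recorded" distinct from a true zero share.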


In [104]:
Demo.to_csv('DEMOGRAPHICS.csv',index=False)