## The Battles of The Neighborhoods - Week 2

<strong>New York City</strong> has 5 boroughs and at least 154 neighborhoods. To explore and compare the neighborhoods, we need a dataset that contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood.

<i>The link to the dataset I will be using:</i> https://geo.nyu.edu/catalog/nyu_2451_34572

<mark>I will now install and import all the dependencies that I'll need.</mark>

In [1]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes 
import folium 

import csv

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.21.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

<mark>The JSON file is hosted on a server, so I will run a wget command to download it and access the data.</mark>

In [2]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


<mark>Now, I will load and explore the data.</mark>

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

<mark>All the relevant data is in the features key, which is a list of all the neighborhoods. Now I will define a new variable that holds this data.</mark>

In [4]:
neighborhoods_data = newyork_data['features']

<mark>Let's take a look at the first item in the list.</mark>

In [5]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
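
Each feature keeps the names under `properties` and the position under `geometry`, with coordinates in GeoJSON order (longitude first, then latitude). As a minimal sketch of pulling out the four fields this notebook needs, assuming a hypothetical helper named `feature_to_row` and a trimmed copy of the feature shown above:

```python
def feature_to_row(feature):
    """Flatten one GeoJSON Point feature into a flat dict (hypothetical helper)."""
    props = feature['properties']
    # GeoJSON stores a point as [longitude, latitude]
    lon, lat = feature['geometry']['coordinates']
    return {
        'Borough': props['borough'],
        'Neighborhood': props['name'],
        'Latitude': lat,
        'Longitude': lon,
    }


# Trimmed copy of the first feature, keeping only the keys used here
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

row = feature_to_row(sample_feature)
print(row)
```

Unpacking the coordinate pair by name makes it harder to swap latitude and longitude by accident.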

<mark>I will now transform the data into a pandas dataframe.</mark>

In [6]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


<mark>Now I'll loop through the data, collect one row per neighborhood, and build the dataframe from those rows.</mark>

In [8]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# Build the dataframe in one step (DataFrame.append is not available in pandas 2.0+)
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [9]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


<mark>Now I'll make sure the dataset has all 5 boroughs and at least 154 neighborhoods.</mark>

In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
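
The total row count alone does not show how the rows are spread across boroughs, or whether a neighborhood name appears more than once. A minimal sketch of both checks, run on a toy dataframe rather than the real `neighborhoods` data (the name `Example Hill` is made up for illustration):

```python
import pandas as pd

# Toy rows for illustration only, not the real dataset
sample = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Queens', 'Manhattan'],
    'Neighborhood': ['Wakefield', 'Riverdale', 'Example Hill', 'Example Hill'],
})

# Number of rows per borough
per_borough = sample.groupby('Borough').size()

# Neighborhood names that occur in more than one row
is_repeated = sample['Neighborhood'].duplicated(keep=False)
repeated = sample.loc[is_repeated, 'Neighborhood'].unique()

print(per_borough)
print(repeated)
```

The same two expressions could be applied to `neighborhoods` directly, since it uses the same column names.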


In [11]:
neighborhoods.to_csv('BON1_NYC_GEO.csv',index=False)

<mark>I will now use the geopy library to get the latitude and longitude values of New York City.</mark>

In [12]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


<mark>Now it's time to create a map of New York with the neighborhoods placed on top.</mark>

In [13]:
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)


for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

<center>Note: This map may not be visible on GitHub.</center>

## Population Data

<i>The population data I will be using is from the following Wikipedia page:</i> https://en.wikipedia.org/wiki/New_York_City

<mark>I will start off by installing and importing all the dependencies that are needed.</mark>

In [14]:
import numpy as np

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

import requests 
from pandas.io.json import json_normalize 


import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

from bs4 import BeautifulSoup 

import csv 

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [35]:
# Fetch the page HTML and locate the first sortable wikitable
page_html = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(page_html, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})

# Header cells are <th> elements; data cells are <td> elements
headers = [header.text for header in table.find_all('th')]

rows = []
for table_row in table.find_all('tr'):
    cells = [cell.text for cell in table_row.find_all('td')]
    rows.append(cells)

# newline='' lets the csv module control the line endings
with open('BON2_POPULATION1.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
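
The cell above separates header cells (`th`) from data cells (`td`) and skips rows that contain no data cells. As a rough sketch of the same idea that runs without network access, the example below parses a small inline HTML snippet with the standard library's `html.parser` instead of BeautifulSoup. The `TableParser` class and `SAMPLE_HTML` snippet are made up for illustration; this is not the parser the notebook uses.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> text as headers and <td> text as rows (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._row = None    # data cells of the <tr> being read
        self._cell = None   # tag name of the cell being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td'):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a th/td cell
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell == tag:
            text = ''.join(self._text).strip()
            if tag == 'th':
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == 'tr':
            # Rows without data cells (such as the header row) are skipped
            if self._row:
                self.rows.append(self._row)
            self._row = None


SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Borough</th><th>Population</th></tr>
  <tr><td>The Bronx</td><td>1,432,132</td></tr>
  <tr><td>Brooklyn</td><td>2,582,830</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
print(parser.headers)
print(parser.rows)
```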

<mark>Now I'll load the data from the CSV file.</mark>

In [16]:
Pop_data=pd.read_csv('BON2_POPULATION1.csv')
Pop_data.drop(Pop_data.columns[[7,8,9,10,11]], axis=1,inplace=True)
print('Data downloaded!')

Data downloaded!


In [17]:
Pop_data.columns = Pop_data.columns.str.replace(' ', '')
Pop_data.columns = Pop_data.columns.str.replace('\'','')
Pop_data.rename(columns={'Borough':'persons_sq_mi','County':'persons_sq_km'}, inplace=True)
Pop_data

Unnamed: 0,NewYorkCitysfiveboroughsvte,Jurisdiction,Population,GrossDomesticProduct,Landarea,Density,persons_sq_mi,squarekm,persons/sq.mi,persons/km2
0,The Bronx\n,\n Bronx\n,"1,432,132\n",42.695\n,"29,200\n",42.10\n,109.04\n,,,
1,Brooklyn\n,\n Kings\n,"2,582,830\n",91.559\n,"34,600\n",70.82\n,183.42\n,,,
2,Manhattan\n,\n New York\n,"1,628,701\n",600.244\n,"360,600\n",22.83\n,59.13\n,,,
3,Queens\n,\n Queens\n,"2,278,906\n",93.310\n,"39,600\n",108.53\n,281.09\n,,,
4,Staten Island\n,\n Richmond\n,"476,179\n",14.514\n,"30,300\n",58.37\n,151.18\n,,,
5,City of New York,8398748,842.343,97700,302.64,783.83,28188,,,
6,State of New York,19745289,1701.399,85700,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [18]:
Pop_data.rename(columns = {'NewYorkCitysfiveboroughsvte\n' : 'Borough',
                   'Jurisdiction\n':'County',
                   'Population\n':'Estimate_2017', 
                   'Landarea\n':'square_miles',
                    'Density\n':'square_km'}, inplace=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2
0,The Bronx\n,\n Bronx\n,"1,432,132\n",42.695\n,"29,200\n",42.10\n,109.04\n,,,
1,Brooklyn\n,\n Kings\n,"2,582,830\n",91.559\n,"34,600\n",70.82\n,183.42\n,,,
2,Manhattan\n,\n New York\n,"1,628,701\n",600.244\n,"360,600\n",22.83\n,59.13\n,,,
3,Queens\n,\n Queens\n,"2,278,906\n",93.310\n,"39,600\n",108.53\n,281.09\n,,,
4,Staten Island\n,\n Richmond\n,"476,179\n",14.514\n,"30,300\n",58.37\n,151.18\n,,,
5,City of New York,8398748,842.343,97700,302.64,783.83,28188,,,
6,State of New York,19745289,1701.399,85700,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,
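
Many cells above still carry the trailing newlines and thousands separators that came with the scraped HTML. A minimal sketch of one way to clean such values, assuming a hypothetical helper named `clean_cell` (it is not part of this notebook):

```python
def clean_cell(text):
    """Strip whitespace and thousands separators; return a number when possible."""
    stripped = text.strip()
    candidate = stripped.replace(',', '')
    try:
        return int(candidate)
    except ValueError:
        pass
    try:
        return float(candidate)
    except ValueError:
        # Not numeric: keep the stripped text as-is
        return stripped


# Raw values copied from the first row of the table above
raw_row = ['The Bronx\n', '\n Bronx\n', '1,432,132\n', '42.695\n']
cleaned = [clean_cell(cell) for cell in raw_row]
print(cleaned)
```

On a dataframe, a function like this could be applied column by column, for example with `Series.map`.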


In [21]:
Pop_data = Pop_data.fillna('')
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2
0,The Bronx,Bronx,1432132.0,42.695\n,29200.0,42.1,109.04,,,
1,Brooklyn,Kings,2582830.0,91.559\n,34600.0,70.82,183.42,,,
2,Manhattan,New York,1628701.0,600.244\n,360600.0,22.83,59.13,,,
3,Queens,Queens,2278906.0,93.310\n,39600.0,108.53,281.09,,,
4,Staten Island,Richmond,476179.0,14.514\n,30300.0,58.37,151.18,,,
5,City of New York,8398748,842.343,97700,302.64,783.83,28188.0,,,
6,State of New York,19745289,1701.399,85700,47214.0,122284.0,416.4,,,
7,Sources:[14] and see individual borough articles,,,,,,,,,


In [22]:
# Drop the footnote row; drop() returns a new dataframe, so assign the result
footnote = Pop_data[Pop_data['Borough'].str.startswith('Sources:')].index
Pop_data = Pop_data.drop(footnote)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/km2
0,The Bronx,Bronx,1432132.0,42.695\n,29200.0,42.1,109.04,,,
1,Brooklyn,Kings,2582830.0,91.559\n,34600.0,70.82,183.42,,,
2,Manhattan,New York,1628701.0,600.244\n,360600.0,22.83,59.13,,,
3,Queens,Queens,2278906.0,93.310\n,39600.0,108.53,281.09,,,
4,Staten Island,Richmond,476179.0,14.514\n,30300.0,58.37,151.18,,,
5,City of New York,8398748,842.343,97700,302.64,783.83,28188.0,,,
6,State of New York,19745289,1701.399,85700,47214.0,122284.0,416.4,,,


<mark>Time to save the dataframe as a CSV file.</mark>

In [25]:
Pop_data.to_csv('BON2_POPULATION.csv',index=False)

## New York City Cuisine Dataset

<mark>I will be getting the data from the following Wikipedia page:</mark> https://en.wikipedia.org/wiki/Cuisine_of_New_York_City

In [31]:
import numpy as np 
import pandas as pd 
from PIL import Image 

%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') 

print ('Matplotlib version: ', mpl.__version__) 

!conda install -c conda-forge wordcloud==1.4.1 --yes

from wordcloud import WordCloud, STOPWORDS

print ('Wordcloud is installed and imported!')

Matplotlib version:  3.0.2
Solving environment: done

# All requested packages already installed.

Wordcloud is installed and imported!


my_file = project.get_file("BON3_NYC_CUISINE.csv")

my_file.seek(0)
import pandas as pd
NYC_CUISINE=pd.read_csv(my_file)
NYC_CUISINE.drop(NYC_CUISINE.columns[[3,4,5,6,7]], axis=1,inplace=True) 
NYC_CUISINE.head()

<img src="https://i.imgur.com/1le1aAm.png">

NYC_CUISINE['Borough'].value_counts().to_frame()

 <img src="https://imgur.com/ZGoberX">

CUISINE_WC = NYC_CUISINE[['Cuisine']]
CUISINE_WC

<img src="https://imgur.com/ci2RVxO">

In [None]:
CUISINE_WC.to_csv('CUISINE_WC.txt', sep=',', index=False)

In [None]:
CUISINE_WC1 = open('CUISINE_WC.txt', 'r').read()

In [None]:
stopwords = set(STOPWORDS)

In [None]:
NYC_CUISINE_WC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

NYC_CUISINE_WC.generate(CUISINE_WC1)

<wordcloud.wordcloud.WordCloud at 0x7f562c250c18>

In [None]:
# Create and size the figure first so the word cloud is drawn on it
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(NYC_CUISINE_WC, interpolation='bilinear')
plt.axis('off')

plt.show()

<img src="https://imgur.com/JoPpwFA">

Most preferred food in New York City:

1. Italian
2. Puerto Rican
3. Mexican
4. Jewish
5. Indian
6. Pakistani
7. Dominican
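
A word cloud conveys frequency only through font size, and because it works on individual words, a multi-word label such as "Puerto Rican" may be split. As a rough cross-check, whole labels can be counted directly. The sketch below uses a made-up list of labels, not the real `Cuisine` column:

```python
from collections import Counter

# Made-up labels for illustration only
cuisines = ['Italian', 'Puerto Rican', 'Italian', 'Mexican', 'Puerto Rican', 'Italian']

# Count whole labels so multi-word cuisines stay intact
counts = Counter(label.strip() for label in cuisines)
top_three = counts.most_common(3)

for label, n in top_three:
    print(label, n)
```

With the real data, the list could be replaced by `NYC_CUISINE['Cuisine']`, assuming each cell holds a single cuisine label.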

## Brooklyn

In [None]:
Brooklyn_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'Brooklyn'].reset_index(drop=True)
Brooklyn_data.head()

<img src="https://imgur.com/JfZrO7J">

In [None]:
Brooklyn_data[['Cuisine']].to_csv('BR_CUISINE.txt', sep=',', index=False)

In [None]:
BR_CUISINE_WC = open('BR_CUISINE.txt', 'r').read()

In [None]:
stopwords = set(STOPWORDS)

In [None]:
BR_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)


BR_CUISINE_NYC.generate(BR_CUISINE_WC)

<wordcloud.wordcloud.WordCloud at 0x7f562c182828>

In [None]:
# Create and size the figure first so the word cloud is drawn on it
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(BR_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')

plt.show()

<img src="https://imgur.com/WpPM4Lo">

Most preferred food in Brooklyn:

1. Italian
2. Puerto Rican
3. Mexican

## Queens

In [None]:
Queens_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'Queens'].reset_index(drop=True)
Queens_data.head()

<img src="https://imgur.com/xyM8z9S">

In [None]:
Queens_data[['Cuisine']].to_csv('Q_CUISINE.txt', sep=',', index=False)

In [None]:
Q_CUISINE_WC = open('Q_CUISINE.txt', 'r').read()

In [None]:
stopwords = set(STOPWORDS)

In [None]:
Q_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

Q_CUISINE_NYC.generate(Q_CUISINE_WC)

<wordcloud.wordcloud.WordCloud at 0x7f562c178cc0>

In [None]:
# Create and size the figure first so the word cloud is drawn on it
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(Q_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')

plt.show()

<img src="https://imgur.com/uzYSLfj">

Most preferred food in Queens:

1. Indian
2. Irish
3. Pakistani
4. Mexican

## Manhattan

In [None]:
Manhattan_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'Manhattan'].reset_index(drop=True)
Manhattan_data.head()

<img src="https://imgur.com/ynWnPOK">

In [None]:
Manhattan_data[['Cuisine']].to_csv('MN_CUISINE.txt', sep=',', index=False)

In [None]:
MN_CUISINE_WC = open('MN_CUISINE.txt', 'r').read()

In [None]:
stopwords = set(STOPWORDS)

In [None]:
MN_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

MN_CUISINE_NYC.generate(MN_CUISINE_WC)

<wordcloud.wordcloud.WordCloud at 0x7f562c126c50>

In [None]:
# Create and size the figure first so the word cloud is drawn on it
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(MN_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')

plt.show()

<img src="https://imgur.com/UI7gDhg">

Most preferred food in Manhattan:

1. Italian
2. American
3. Puerto Rican
4. Indian

## The Bronx

In [None]:
Bronx_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'The Bronx'].reset_index(drop=True)
Bronx_data.head()

<img src="https://imgur.com/VYnANxN">

In [None]:
Bronx_data[['Cuisine']].to_csv('BX_CUISINE.txt', sep=',', index=False)

In [None]:
BX_CUISINE_WC = open('BX_CUISINE.txt', 'r').read()

In [None]:
stopwords = set(STOPWORDS)

In [None]:
BX_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

BX_CUISINE_NYC.generate(BX_CUISINE_WC)

<wordcloud.wordcloud.WordCloud at 0x7f562c149438>

In [None]:
# Create and size the figure first so the word cloud is drawn on it
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(BX_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')

plt.show()

<img src="https://imgur.com/kOdybT6">

Most preferred food in The Bronx:

1. Italian
2. Puerto Rican
3. Albanian
4. Dominican
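
The four borough sections repeat the same steps: filter by borough, write the `Cuisine` column to a text file, and read it back as one string. A minimal sketch of a helper that builds that string in memory, assuming a hypothetical function named `cuisine_text` and a toy dataframe with the same `Borough` and `Cuisine` columns:

```python
import pandas as pd


def cuisine_text(df, borough):
    """Join the Cuisine labels of one borough into a single string (hypothetical helper)."""
    labels = df.loc[df['Borough'] == borough, 'Cuisine'].dropna().astype(str)
    return ' '.join(labels)


# Toy rows for illustration only, not the real dataset
sample = pd.DataFrame({
    'Borough': ['Brooklyn', 'Queens', 'Brooklyn'],
    'Cuisine': ['Italian', 'Indian', 'Mexican'],
})

brooklyn_text = cuisine_text(sample, 'Brooklyn')
print(brooklyn_text)
```

The returned string could then be passed to `WordCloud.generate`, which takes the text as its argument, so one loop over the borough names would cover all four sections.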