
#Part 1 Download and Explore New York City geographical coordinates dataset

New York City has a total of 5 boroughs and 306 neighborhoods. To segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset exists for free on the web. Link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

First, let's download all the dependencies that we will need.

In [91]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # needs conda; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes # needs conda; comment this line out if folium is already installed
import folium # map rendering library

import csv # implements classes to read and write tabular data in CSV form

print('Libraries imported.')

/bin/bash: conda: command not found
/bin/bash: conda: command not found
Libraries imported.


In [92]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded')

Data downloaded


## Load and explore the data

In [0]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [0]:
neighborhoods_data = newyork_data['features']

In [95]:
neighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}


##Transform the data into a pandas dataframe

The next task is to transform this data of nested Python dictionaries into a pandas dataframe. Start by creating an empty dataframe.

In [0]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [97]:

neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [0]:
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and build the dataframe in one step
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
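
One alternative to an explicit loop over the features is `pd.json_normalize`, which is available as a top-level function in pandas 1.0 and later. The sketch below is only an illustration: it uses one hand-written feature in the same shape as the entries of `neighborhoods_data`, and the names `sample_features`, `flat` and `sample_neighborhoods` are made up for this example.

```python
import pandas as pd

# One hand-written feature shaped like the entries of neighborhoods_data
sample_features = [
    {
        'geometry': {'coordinates': [-73.84720052054902, 40.89470517661], 'type': 'Point'},
        'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
        'type': 'Feature',
    }
]

# Flatten nested keys into dotted column names such as 'properties.borough'
flat = pd.json_normalize(sample_features)

sample_neighborhoods = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    # GeoJSON order is [longitude, latitude]
    'Latitude': flat['geometry.coordinates'].map(lambda xy: xy[1]),
    'Longitude': flat['geometry.coordinates'].map(lambda xy: xy[0]),
})
print(sample_neighborhoods)
```

The same four columns come out as in the loop version, so either approach can feed the rest of the notebook.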

In [99]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [100]:

print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [0]:
neighborhoods.to_csv('BON1_NYC_GEO.csv',index=False)

##Use geopy library to get the latitude and longitude values of New York City

In [102]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
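
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. A minimal sketch of a guard is shown below. The helper name `geocode_or_default` and the two stand-in geocoders are hypothetical and exist only so the example runs without a network call; in the notebook you would pass `geolocator.geocode` as the first argument.

```python
from types import SimpleNamespace

def geocode_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default when nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Stand-in geocoders for illustration only
def found(address):
    return SimpleNamespace(latitude=40.7127281, longitude=-74.0060152)

def not_found(address):
    return None

print(geocode_or_default(found, 'New York City, NY', (0.0, 0.0)))
print(geocode_or_default(not_found, 'Nowhere', (0.0, 0.0)))
```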


##Create a map of New York with neighborhoods superimposed on top.

Folium is a great visualization library. We can zoom into the map below and click on each circle marker to reveal the name of the neighborhood and its borough.

In [103]:

# create map of New York using latitude and longitude values
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

#Part 2 Web scraping of Population and Demographics data of New York City from Wikipedia

##A : POPULATION DATA

Web scraping of population data from the Wikipedia page - https://en.wikipedia.org/wiki/Demographics_of_New_York_City

Import all the dependencies that are needed.

In [104]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # needs conda; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# conda install -c anaconda beautiful-soup --yes
from bs4 import BeautifulSoup # package for parsing HTML and XML documents

import csv # implements classes to read and write tabular data in CSV form

print('Libraries imported.')

/bin/bash: conda: command not found
Libraries imported.


##Web scraping of population data from the Wikipedia page using BeautifulSoup

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as non-closed tags (it is named after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.

In [0]:
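# download the page HTML and locate the first sortable wikitable
page_html = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(page_html,'lxml')
table = soup.find('table',{'class':'wikitable sortable'})
#print(soup.prettify())

# collect the text of every header cell in the table (sub-header cells included)
headers = [header.text for header in table.find_all('th')]

# collect the text of the data cells, one list per table row
table_rows = table.find_all('tr')
rows = []
for table_row in table_rows:
   cells = table_row.find_all('td')
   rows.append([cell.text for cell in cells])

# write the header and the non-empty rows to a CSV file
with open('BON2_POPULATION1.csv', 'w', newline='') as f:
   writer = csv.writer(f)
   writer.writerow(headers)
   writer.writerows(row for row in rows if row)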

In [106]:
Pop_data=pd.read_csv('BON2_POPULATION1.csv')
Pop_data.drop(Pop_data.columns[[7,8,9,10,11]], axis=1,inplace=True)
print('Data downloaded!')

Data downloaded!


In [107]:

Pop_data.columns = Pop_data.columns.str.replace(' ', '')
Pop_data.columns = Pop_data.columns.str.replace('\'','')
Pop_data.rename(columns={'Borough':'persons_sq_mi','County':'persons_sq_km'}, inplace=True)
Pop_data


Unnamed: 0,NewYorkCitysfiveboroughsvte,Jurisdiction,Population,GrossDomesticProduct,Landarea,Density,persons_sq_mi,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx\n,\n Bronx\n,"1,471,160\n",28.787\n,"19,570\n",42.10\n,109.04\n,,,
1,Brooklyn\n,\n Kings\n,"2,648,771\n",63.303\n,"23,900\n",70.82\n,183.42\n,,,
2,Manhattan\n,\n New York\n,"1,664,727\n",629.682\n,"378,250\n",22.83\n,59.13\n,,,
3,Queens\n,\n Queens\n,"2,358,582\n",73.842\n,"31,310\n",108.53\n,281.09\n,,,
4,Staten Island\n,\n Richmond\n,"479,458\n",11.249\n,"23,460\n",58.37\n,151.18\n,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [108]:
Pop_data.rename(columns = {'NewYorkCitysfiveboroughsvte\n' : 'Borough',
                   'Jurisdiction\n':'County',
                   'Population\n':'Estimate_2017', 
                   'Landarea\n':'square_miles',
                    'Density\n':'square_km'}, inplace=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx\n,\n Bronx\n,"1,471,160\n",28.787\n,"19,570\n",42.10\n,109.04\n,,,
1,Brooklyn\n,\n Kings\n,"2,648,771\n",63.303\n,"23,900\n",70.82\n,183.42\n,,,
2,Manhattan\n,\n New York\n,"1,664,727\n",629.682\n,"378,250\n",22.83\n,59.13\n,,,
3,Queens\n,\n Queens\n,"2,358,582\n",73.842\n,"31,310\n",108.53\n,281.09\n,,,
4,Staten Island\n,\n Richmond\n,"479,458\n",11.249\n,"23,460\n",58.37\n,151.18\n,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [109]:
Pop_data['Borough']=Pop_data['Borough'].replace(to_replace='\n', value='', regex=True)
Pop_data['County']=Pop_data['County'].replace(to_replace='\n', value='', regex=True)
Pop_data['Estimate_2017']=Pop_data['Estimate_2017'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_miles']=Pop_data['square_miles'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_km']=Pop_data['square_km'].replace(to_replace='\n', value='', regex=True)
Pop_data['persons_sq_mi']=Pop_data['persons_sq_mi'].replace(to_replace='\n', value='', regex=True)
#Pop_data['persons_sq_km']=Pop_data['persons_sq_km'].replace(to_replace='\n', value='', regex=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188.0,,,
6,State of New York,19849399,1547.116,78354,47214.0,122284.0,416.4,,,
7,Sources:[14] and see individual borough articles,,,,,,,,,
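
The scraped cells carry trailing newlines and thousands separators (for example `"1,471,160\n"`). As a minimal sketch, a small helper can normalize one cell at a time. The name `to_number` is hypothetical and not part of the notebook; it could be applied to a column with `Series.map`.

```python
def to_number(cell):
    """Convert a scraped cell such as '1,471,160\\n' to a float, or None if it is not numeric."""
    text = str(cell).strip().replace(',', '')
    try:
        return float(text)
    except ValueError:
        return None

print(to_number('1,471,160\n'))
print(to_number('42.10\n'))
print(to_number('Sources:[14]'))
```

Returning `None` for non-numeric text keeps footnote rows such as the "Sources" line easy to spot and filter out.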


In [110]:
#Pop_data.loc[5:,['persons_sq_mi','persons_sq_km']] = Pop_data.loc[2:,['persons_sq_mi','persons_sq_km']].shift(1,axis=1)
Pop_data.loc[5:,['square_km','persons_sq_mi']] = Pop_data.loc[2:,['square_km','persons_sq_mi']].shift(1,axis=1)
Pop_data.loc[5:,['square_miles','square_km']] = Pop_data.loc[2:,['square_miles','square_km']].shift(1,axis=1)
Pop_data.loc[5:,['Estimate_2017','square_miles']] = Pop_data.loc[2:,['Estimate_2017','square_miles']].shift(1,axis=1)
Pop_data.loc[5:,['County','Estimate_2017']] = Pop_data.loc[2:,['County','Estimate_2017']].shift(1,axis=1)
Pop_data.loc[5:,['Borough','County']] = Pop_data.loc[2:,['Borough','County']].shift(1,axis=1)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18,,,
5,,City of New York,8622698.0,93574,806.863,302.64,783.83,,,
6,,State of New York,19849399.0,78354,1547.116,47214.0,122284.0,,,
7,,Sources:[14] and see individual borough articles,,,,,,,,
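
The realignment above relies on `DataFrame.shift(1, axis=1)`, which moves each value one column to the right. The sketch below shows the effect on a single hand-written row; the frame `demo` is a simplified stand-in for rows 5 and 6, not the notebook's actual data.

```python
import pandas as pd

# A row whose values sit one column too far to the left
demo = pd.DataFrame(
    {'Borough': ['City of New York'], 'County': ['8622698'], 'Estimate_2017': [None]},
    dtype=object,
)

# shift(1, axis=1) moves every value one column to the right;
# the first column is left empty and the last value falls off the end
shifted = demo.shift(1, axis=1)
print(shifted)
```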


In [111]:
Pop_data = Pop_data.fillna('')
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18,,,
5,,City of New York,8622698.0,93574,806.863,302.64,783.83,,,
6,,State of New York,19849399.0,78354,1547.116,47214.0,122284.0,,,
7,,Sources:[14] and see individual borough articles,,,,,,,,


In [112]:
# Drop the last row, which holds the table's "Sources" footnote rather than data

i = Pop_data[Pop_data.County.astype(str).str.startswith('Sources')].index
Pop_data = Pop_data.drop(i)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct,square_miles,square_km,persons_sq_mi,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx,Bronx,1471160.0,28.787\n,19570.0,42.1,109.04,,,
1,Brooklyn,Kings,2648771.0,63.303\n,23900.0,70.82,183.42,,,
2,Manhattan,New York,1664727.0,629.682\n,378250.0,22.83,59.13,,,
3,Queens,Queens,2358582.0,73.842\n,31310.0,108.53,281.09,,,
4,Staten Island,Richmond,479458.0,11.249\n,23460.0,58.37,151.18,,,
5,,City of New York,8622698.0,93574,806.863,302.64,783.83,,,
6,,State of New York,19849399.0,78354,1547.116,47214.0,122284.0,,,


In [0]:
# save the dataframe as csv

Pop_data.to_csv('BON2_POPULATION.csv',index=False)

## B : DEMOGRAPHICS DATA

We will scrape demographics data from the Wikipedia page - https://en.wikipedia.org/wiki/New_York_City

The scraping again uses BeautifulSoup, as in section A.

In [0]:
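# download the page HTML and locate the first wikitable
page_html = requests.get('https://en.wikipedia.org/wiki/New_York_City').text
soup = BeautifulSoup(page_html,'lxml')
table = soup.find('table',{'class':'wikitable'})
#print(soup.prettify())

# collect the text of every header cell in the table (sub-header cells included)
headers = [header.text for header in table.find_all('th')]

# collect the text of the data cells, one list per table row
table_rows = table.find_all('tr')
rows = []
for table_row in table_rows:
   cells = table_row.find_all('td')
   rows.append([cell.text for cell in cells])

# write the header and the non-empty rows to a CSV file
with open('NYC_DEMO.csv', 'w', newline='') as f:
   writer = csv.writer(f)
   writer.writerow(headers)
   writer.writerows(row for row in rows if row)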

In [115]:
Demo_data=pd.read_csv('NYC_DEMO.csv')
print('Data downloaded!')

Data downloaded!


In [116]:
Demo_data

Unnamed: 0,New York City's five boroughsvte,Jurisdiction,Population,Gross Domestic Product,Land area,Density,Borough,County,Estimate (2017)[207],billions(US$)[208],per capita(US$),square miles,squarekm,persons / sq. mi,persons /sq. km
0,The Bronx\n,\n Bronx\n,"1,471,160\n",28.787\n,"19,570\n",42.10\n,109.04\n,"34,653\n","13,231\n",,,,,,
1,Brooklyn\n,\n Kings\n,"2,648,771\n",63.303\n,"23,900\n",70.82\n,183.42\n,"37,137\n","14,649\n",,,,,,
2,Manhattan\n,\n New York\n,"1,664,727\n",629.682\n,"378,250\n",22.83\n,59.13\n,"72,033\n","27,826\n",,,,,,
3,Queens\n,\n Queens\n,"2,358,582\n",73.842\n,"31,310\n",108.53\n,281.09\n,"21,460\n","8,354\n",,,,,,
4,Staten Island\n,\n Richmond\n,"479,458\n",11.249\n,"23,460\n",58.37\n,151.18\n,"8,112\n","3,132\n",,,,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,"10,947\n",,,,,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,159\n,,,,,,,
7,Sources:[209] and see individual borough artic...,,,,,,,,,,,,,,


In [117]:
# Inspect the column names before cleaning them

Demo_data.columns

Index(['New York City's five boroughsvte\n', 'Jurisdiction\n', 'Population\n',
       'Gross Domestic Product\n', 'Land area\n', 'Density\n', 'Borough',
       'County', 'Estimate (2017)[207]', 'billions(US$)[208]',
       'per capita(US$)', 'square miles', 'squarekm', 'persons / sq. mi',
       'persons /sq. km\n'],
      dtype='object')

In [118]:
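# NOTE: none of these keys match a column name in Demo_data (see the column
# list above), so this rename leaves the dataframe unchanged
Demo_data.rename(columns = {'2010[237]' : '2010',
                   '1990[239]':'1990',
                   '1970[239]':'1970', 
                   '1940[239]\n':'1940',
                    }, inplace=True)
Demo_data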

Unnamed: 0,New York City's five boroughsvte,Jurisdiction,Population,Gross Domestic Product,Land area,Density,Borough,County,Estimate (2017)[207],billions(US$)[208],per capita(US$),square miles,squarekm,persons / sq. mi,persons /sq. km
0,The Bronx\n,\n Bronx\n,"1,471,160\n",28.787\n,"19,570\n",42.10\n,109.04\n,"34,653\n","13,231\n",,,,,,
1,Brooklyn\n,\n Kings\n,"2,648,771\n",63.303\n,"23,900\n",70.82\n,183.42\n,"37,137\n","14,649\n",,,,,,
2,Manhattan\n,\n New York\n,"1,664,727\n",629.682\n,"378,250\n",22.83\n,59.13\n,"72,033\n","27,826\n",,,,,,
3,Queens\n,\n Queens\n,"2,358,582\n",73.842\n,"31,310\n",108.53\n,281.09\n,"21,460\n","8,354\n",,,,,,
4,Staten Island\n,\n Richmond\n,"479,458\n",11.249\n,"23,460\n",58.37\n,151.18\n,"8,112\n","3,132\n",,,,,,
5,City of New York,8622698,806.863,93574,302.64,783.83,28188,"10,947\n",,,,,,,
6,State of New York,19849399,1547.116,78354,47214,122284,416.4,159\n,,,,,,,
7,Sources:[209] and see individual borough artic...,,,,,,,,,,,,,,


In [119]:
Demo_data.columns

Index(['New York City's five boroughsvte\n', 'Jurisdiction\n', 'Population\n',
       'Gross Domestic Product\n', 'Land area\n', 'Density\n', 'Borough',
       'County', 'Estimate (2017)[207]', 'billions(US$)[208]',
       'per capita(US$)', 'square miles', 'squarekm', 'persons / sq. mi',
       'persons /sq. km\n'],
      dtype='object')

In [0]:

Demo_data.columns = Demo_data.columns.str.replace(' ', '')

In [121]:
# Replace the newline characters ('\n') in every string cell with a space

Demo_data= Demo_data.replace('\n',' ', regex=True)
Demo_data

Unnamed: 0,NewYorkCity'sfiveboroughsvte,Jurisdiction,Population,GrossDomesticProduct,Landarea,Density,Borough,County,Estimate(2017)[207],billions(US$)[208],percapita(US$),squaremiles,squarekm,persons/sq.mi,persons/sq.km
0,The Bronx,Bronx,1471160.0,28.787,19570.0,42.1,109.04,34653.0,13231.0,,,,,,
1,Brooklyn,Kings,2648771.0,63.303,23900.0,70.82,183.42,37137.0,14649.0,,,,,,
2,Manhattan,New York,1664727.0,629.682,378250.0,22.83,59.13,72033.0,27826.0,,,,,,
3,Queens,Queens,2358582.0,73.842,31310.0,108.53,281.09,21460.0,8354.0,,,,,,
4,Staten Island,Richmond,479458.0,11.249,23460.0,58.37,151.18,8112.0,3132.0,,,,,,
5,City of New York,8622698,806.863,93574.0,302.64,783.83,28188.0,10947.0,,,,,,,
6,State of New York,19849399,1547.116,78354.0,47214.0,122284.0,416.4,159.0,,,,,,,
7,Sources:[209] and see individual borough artic...,,,,,,,,,,,,,,


In [0]:
# Strip '[240]' from third column - 1970

#Demo_data['1970'] = Demo_data['1970'].str.rstrip('[240]')
#Demo_data

In [0]:
# Save dataframe as csv file

Demo_data.to_csv('BON2_DEMOGRAPHICS.csv',index=False)

## Part 3 Download and Explore New York City and its Boroughs Cuisine dataset
Import all the dependencies that are needed.

In [124]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
from PIL import Image # converting images into arrays

%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot') # optional: for ggplot-like style

# check for latest version of Matplotlib
print ('Matplotlib version: ', mpl.__version__) # >= 2.0.0

# install wordcloud
!conda install -c conda-forge wordcloud==1.4.1 --yes

# import package and its set of stopwords
from wordcloud import WordCloud, STOPWORDS

print ('Wordcloud is installed and imported!')


Matplotlib version:  3.0.3
/bin/bash: conda: command not found
Wordcloud is installed and imported!


In [125]:
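# NOTE: as the error below shows, this cell fails with an AttributeError, most likely
# because soup.find() returns None when the page has no table with class 'wikitable sortable'
page_html = requests.get('https://en.wikipedia.org/wiki/Cuisine_of_New_York_City').text
soup = BeautifulSoup(page_html,'lxml')
table = soup.find('table',{'class':'wikitable sortable'})
#print(soup.prettify())

# collect the text of every header cell in the table
headers = [header.text for header in table.find_all('th')]

# collect the text of the data cells, one list per table row
table_rows = table.find_all('tr')
rows = []
for table_row in table_rows:
   cells = table_row.find_all('td')
   rows.append([cell.text for cell in cells])

# NOTE: this filename is the one used in Part 2, so a successful run would overwrite the population CSV
with open('BON2_POPULATION1.csv', 'w', newline='') as f:
   writer = csv.writer(f)
   writer.writerow(headers)
   writer.writerows(row for row in rows if row)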

AttributeError: ignored

In [126]:
import pandas as pd

# Read the table
# The table headers are in row 0
table = pd.read_html('https://data.cityofnewyork.us/dataset/DOHMH-Farmers-Markets/8vwk-6iz2', header=0)

# Create the initial dataframe from the table
df = pd.DataFrame(data = table[0])

# Print the shape
print('The shape of the raw initial dataframe is: ', df.shape)

# Output the Head of the Table
df.head()


ValueError: ignored

In [0]:

# Read the cuisine CSV data file into a pandas DataFrame
# (assumes BON3_NYC_CUISINE.csv has been placed in the working directory)
import pandas as pd
NYC_CUISINE=pd.read_csv('BON3_NYC_CUISINE.csv')
NYC_CUISINE.drop(NYC_CUISINE.columns[[3,4,5,6,7]], axis=1,inplace=True) 
NYC_CUISINE.head()

In [127]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # needs conda; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

!conda install -c conda-forge folium=0.5.0 --yes # needs conda; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

/bin/bash: conda: command not found
/bin/bash: conda: command not found
Libraries imported.


In [128]:
NYC_Geo=pd.read_csv('BON1_NYC_GEO.csv')
print('Data downloaded!')

Data downloaded!


In [129]:
NYC_Geo.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [130]:
NYC_Geo['Borough'].value_counts().to_frame()

Unnamed: 0,Borough
Queens,81
Brooklyn,70
Staten Island,63
Bronx,52
Manhattan,40


In [131]:

NYC_Geo.shape

(306, 4)

In [132]:
print(NYC_Geo.Borough.unique())

['Bronx' 'Manhattan' 'Brooklyn' 'Queens' 'Staten Island']


In [133]:
NYC_Geo.isnull().sum()

Borough         0
Neighborhood    0
Latitude        0
Longitude       0
dtype: int64

In [134]:

BM_Geo = NYC_Geo.loc[NYC_Geo['Borough'].isin(['Brooklyn', 'Manhattan'])]
BM_Geo = BM_Geo.reset_index(drop=True)
BM_Geo.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Brooklyn,Bay Ridge,40.625801,-74.030621
2,Brooklyn,Bensonhurst,40.611009,-73.99518
3,Brooklyn,Sunset Park,40.645103,-74.010316
4,Brooklyn,Greenpoint,40.730201,-73.954241


In [135]:

BM_Geo.shape

(110, 4)

In [136]:
import time
start_time = time.time()

address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

print("--- %s seconds ---" % round((time.time() - start_time), 2))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
--- 0.4 seconds ---


In [137]:

# create map of New York using latitude and longitude values
map_BM = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(BM_Geo['Latitude'], BM_Geo['Longitude'], BM_Geo['Borough'], BM_Geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BM)  
    
map_BM

In [138]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20190824' # Foursquare API version

print('Credentials loaded.')

Credentials loaded.


In [0]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT=200, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
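
`getNearbyVenues` indexes straight into `response -> groups -> items`, so a single payload without those keys raises a `KeyError` and discards everything collected so far. A minimal sketch of a more tolerant parsing step follows. The helper name `extract_venues` is hypothetical, and both payloads are hand-written for illustration rather than real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Return one tuple per venue in an explore-style payload, or [] if it has none."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        # e.g. an error payload without 'groups': skip this neighborhood
        return []
    venues = []
    for item in groups[0].get('items', []):
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        venues.append((name, lat, lng, venue['name'],
                       venue['location']['lat'], venue['location']['lng'], category))
    return venues

# Hand-written payloads for illustration only
ok_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 40.87, 'lng': -73.91},
               'categories': [{'name': 'Coffee Shop'}]}}]}]}}
error_payload = {'meta': {'code': 429}, 'response': {}}

print(extract_venues(ok_payload, 'Marble Hill', 40.876551, -73.91066))
print(extract_venues(error_payload, 'Murray Hill', 40.748, -73.978))
```

Inside the loop of `getNearbyVenues`, the parsed JSON could be passed to such a helper so that a failed request yields an empty list for that neighborhood instead of aborting the whole run.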

In [140]:
BM_venues = getNearbyVenues(names=BM_Geo['Neighborhood'],
                                  latitudes=BM_Geo['Latitude'],
                                  longitudes=BM_Geo['Longitude'],
                                  LIMIT=200)

print('The "BM_venues" dataframe has {} venues and {} unique venue types.'.format(
      len(BM_venues['Venue Category']),
      len(BM_venues['Venue Category'].unique())))

BM_venues.to_csv('BM_venues.csv', sep=',', encoding='UTF8')
BM_venues.head()

Marble Hill
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill


KeyError: ignored

In [141]:
colnames = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
BM_venues = pd.read_csv('BM_venues.csv', skiprows=1, names=colnames)
BM_venues.columns = BM_venues.columns.str.replace(' ', '')
BM_venues.head()

FileNotFoundError: ignored