# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

### BY AHMED ELREFAeY

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a **Chinese restaurant** in the **Bronx**, NY, USA.

Since there are lots of restaurants in the Bronx, we will try to detect **locations that are not already crowded with restaurants**. We are particularly interested in **areas with few or no Chinese restaurants in the vicinity**.

We will use data science techniques to identify the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Our data sources are:

* **New York dataset**: obtained from its official source; it lists the city's boroughs and their neighborhoods with latitude and longitude.
* **Foursquare location data**: retrieved through the Foursquare API with `requests`, so that we can explore each neighborhood and search for existing Chinese restaurants.

## Importing needed libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


In [2]:
import folium

<a id='item1'></a>

## Download and Explore Dataset

In [3]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [4]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

#### Transform the data into a *pandas* dataframe

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [7]:
neighborhoods

| Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|


Then let's loop through the data, collect one row per neighborhood, and fill the dataframe.

In [8]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)
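
The loop above reads index 1 of the coordinates as the latitude, which matches the GeoJSON convention of storing point coordinates as `[longitude, latitude]`. A minimal sketch of the same extraction on a single hand-written feature is shown here. The `sample_feature` dict and the `feature_to_row` helper are hypothetical; only the keys used by the loop (`properties.borough`, `properties.name`, `geometry.coordinates`) are taken from the notebook.

```python
# Hypothetical feature mimicking the structure read by the loop above
sample_feature = {
    "properties": {"borough": "Bronx", "name": "Wakefield"},
    "geometry": {"coordinates": [-73.847201, 40.894705]},  # [longitude, latitude]
}

def feature_to_row(feature):
    """Flatten one GeoJSON-style feature into a dataframe-ready dict."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }

row = feature_to_row(sample_feature)
print(row)
```

A list of such dicts can be passed directly to `pd.DataFrame`, which avoids growing the dataframe inside the loop.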

Quickly examine the resulting dataframe.

In [9]:
neighborhoods.head()

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Use geopy library to get the latitude and longitude values of New York City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [11]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Create a map of New York with neighborhoods superimposed on top.

In [12]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of the Bronx that have low restaurant density, particularly those with a low number of Chinese restaurants. We will limit our analysis to neighborhoods in the Bronx.

We will use Foursquare data to get the locations and number of Chinese restaurants in each neighborhood. We will then merge that data with the New York dataset to get the full view and pass the result to the machine learning algorithm.

In [13]:
neighborhoods['Borough'].unique()

array(['Bronx', 'Manhattan', 'Brooklyn', 'Queens', 'Staten Island'],
      dtype=object)

In [14]:
neighborhoods.groupby(neighborhoods['Borough']).count()

| Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| Bronx | 52 | 52 | 52 |
| Brooklyn | 70 | 70 | 70 |
| Manhattan | 40 | 40 | 40 |
| Queens | 81 | 81 | 81 |
| Staten Island | 63 | 63 | 63 |


In [17]:
bronx_data = neighborhoods[neighborhoods['Borough'] == 'Bronx'].reset_index(drop=True)
bronx_data.head()

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


Let's get the geographical coordinates of the Bronx.

In [15]:

address = 'Bronx, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Bronx are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the Bronx are 40.8466508, -73.8785937.


As we did with all of New York City, let's visualize the Bronx and the neighborhoods in it.

In [18]:
# create map of the Bronx using latitude and longitude values
map_Bronx = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(bronx_data['Latitude'], bronx_data['Longitude'], bronx_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Bronx)  
    
map_Bronx

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [19]:
CLIENT_ID = '<your Foursquare client ID>' # your Foursquare ID (redacted; do not publish real credentials)
CLIENT_SECRET = '<your Foursquare client secret>' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <your Foursquare client ID>
CLIENT_SECRET: <your Foursquare client secret>


## Explore Neighborhoods in the Bronx

#### Let's create a function to repeat the same process for all the neighborhoods in the Bronx

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT=100
    Query='Chinese restaurant'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            Query)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
            v['venue']['location']['distance'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Average Distance',           
                  'Venue Category']
    
    return(nearby_venues)
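
The request URL above is assembled with string formatting, which relies on the HTTP client to escape the space in the query, and the response is indexed without checking that any group was returned. As a hedged sketch, the two steps can be separated into small helpers. `build_explore_url` and `extract_rows` are hypothetical names, the credentials are placeholders, and the payload below is hand-written to mirror only the fields the function reads; it is not real Foursquare output.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # endpoint used in the notebook

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100, query="Chinese restaurant"):
    """Build the explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "query": query,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_rows(name, lat, lng, payload):
    """Return one tuple per venue, or an empty list if no group came back."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            venue["location"].get("distance"),
            categories[0].get("name"),
        ))
    return rows

url = build_explore_url("ID", "SECRET", "20180605", 40.894705, -73.847201)
print(parse_qs(urlsplit(url).query)["query"])  # the decoded query parameter

# Hand-written payload with a made-up venue, for illustration only
fake_payload = {"response": {"groups": [{"items": [{"venue": {
    "name": "Example Kitchen",
    "location": {"lat": 40.895, "lng": -73.848, "distance": 120},
    "categories": [{"name": "Chinese Restaurant"}],
}}]}]}}
rows = extract_rows("Wakefield", 40.894705, -73.847201, fake_payload)
print(rows)
```

Returning an empty list for a neighborhood with no results keeps the final flattening step unchanged, since empty lists simply contribute no rows.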

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *bronox_venues*.

In [21]:
bronox_venues = getNearbyVenues(names=bronx_data['Neighborhood'],
                                   latitudes=bronx_data['Latitude'],
                                   longitudes=bronx_data['Longitude']
                                  )



Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Claremont Village
Concourse Village
Mount Eden
Mount Hope
Bronxdale
Allerton
Kingsbridge Heights



#### Let's check the size of the resulting dataframe

In [22]:
print(bronox_venues.shape)


(193, 8)


Let's check how many venues were returned for each neighborhood

In [23]:
bronox_grouped=bronox_venues.groupby('Neighborhood').count()

#### Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} unique categories.'.format(len(bronox_venues['Venue Category'].unique())))

There are 2 unique categories.


<a id='item3'></a>

## Analysis <a name="analysis"></a>

In [30]:
# one hot encoding
bronox_onehot = pd.get_dummies(bronox_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bronox_onehot['Neighborhood'] = bronox_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bronox_onehot.columns[-1]] + list(bronox_onehot.columns[:-1])
bronox_onehot = bronox_onehot[fixed_columns]
bronox_onehot.drop(columns=['Dim Sum Restaurant'],inplace=True)


And let's examine the new dataframe size.

In [31]:
print(bronox_onehot.shape)

print(bronox_venues.shape)


(193, 2)
(193, 8)


In [32]:
bronox_1=pd.merge(bronox_venues['Neighborhood'],bronox_onehot,on='Neighborhood')
bronox_1.head()

|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Co-op City | 1 |
| 1 | Eastchester | 1 |
| 2 | Kingsbridge | 1 |
| 3 | Kingsbridge | 1 |
| 4 | Kingsbridge | 1 |


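#### Next, let's group rows by neighborhood and count the rows for each neighborhood

Note that the merge above joins two tables on a non-unique `Neighborhood` key, so a neighborhood with *n* returned venues contributes *n*² rows to `bronox_1`. The counts below are therefore the squares of the number of venues returned for each neighborhood (for example, 100 corresponds to 10 venues). Squaring preserves order, so the ranking of neighborhoods is unaffected.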

In [33]:
bronox_grouped = bronox_1.groupby('Neighborhood').count().reset_index()
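
To illustrate how merging on a non-unique key inflates the counts, here is a minimal sketch on hypothetical data (two made-up neighborhoods, `A` and `B`). It reproduces the merge-then-count pattern and compares it with summing the one-hot indicator directly, which is one possible alternative and not the notebook's method.

```python
import pandas as pd

# Hypothetical venue rows: 3 venues near "A", 2 near "B"
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Chinese Restaurant', 'Chinese Restaurant', 'Dim Sum Restaurant',
                       'Chinese Restaurant', 'Chinese Restaurant'],
})

# One-hot encode the category and keep only the Chinese Restaurant indicator
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
onehot['Neighborhood'] = venues['Neighborhood']
onehot = onehot[['Neighborhood', 'Chinese Restaurant']]

# Merge on a non-unique key: every row pairs with every row of the same neighborhood
merged = pd.merge(venues['Neighborhood'], onehot, on='Neighborhood')
squared_counts = merged.groupby('Neighborhood')['Chinese Restaurant'].count()

# Direct alternative: sum the indicator column per neighborhood
direct_counts = onehot.groupby('Neighborhood')['Chinese Restaurant'].sum()

print(squared_counts)
print(direct_counts)
```

With 3 and 2 venue rows, the merged counts come out as 9 and 4, while the direct sums give the number of Chinese restaurants in each neighborhood.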



In [34]:
bronox_grouped.head()

|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Allerton | 16 |
| 1 | Baychester | 9 |
| 2 | Bedford Park | 100 |
| 3 | Belmont | 4 |
| 4 | Bronxdale | 25 |


## Cluster Neighborhoods in the Bronx
This is the machine learning step used to select the best neighborhoods.

Run *k*-means to cluster the neighborhoods into 7 clusters.

In [36]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 7

# keep only the numeric feature; recent pandas versions require `axis` as a keyword
bronox_grouped_clustering = bronox_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bronox_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 3, 5, 6, 3, 2, 0, 3], dtype=int32)
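
Because the clustering input here has a single feature (the per-neighborhood count), k-means effectively groups neighborhoods with similar counts. A small sketch on made-up counts follows, assuming scikit-learn is installed. The values, `n_clusters=3` and `n_init=10` are illustrative choices, not the notebook's data or settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up counts for six hypothetical neighborhoods (one feature per row)
counts = np.array([[1], [1], [4], [4], [100], [100]])

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(counts)
labels = kmeans.labels_

# Label numbers are arbitrary; compare cluster centers to interpret them
for label, center in enumerate(kmeans.cluster_centers_.ravel()):
    print("cluster", label, "center", center)
```

Since the label numbers carry no ordering, the cluster with the fewest restaurants has to be identified from the centers or from the grouped tables, not from the label value.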

Let's create a new dataframe that includes the cluster label as well as the borough and coordinates of each neighborhood.

In [37]:
# add clustering labels
bronox_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

bronox_final = bronox_grouped
# join bronox_grouped with bronx_data to add borough and latitude/longitude for each neighborhood
bronox_final = bronox_final.join(bronx_data.set_index('Neighborhood'), on='Neighborhood')

bronox_final.head() # check the last columns!

|  | Cluster Labels | Neighborhood | Chinese Restaurant | Borough | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 0 | Allerton | 16 | Bronx | 40.865788 | -73.859319 |
| 1 | 0 | Baychester | 9 | Bronx | 40.866858 | -73.835798 |
| 2 | 1 | Bedford Park | 100 | Bronx | 40.870185 | -73.885512 |
| 3 | 3 | Belmont | 4 | Bronx | 40.857277 | -73.888452 |
| 4 | 5 | Bronxdale | 25 | Bronx | 40.852723 | -73.861726 |


## Finally, let's visualize the resulting clusters

In [39]:
# create map
import numpy as np 
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bronox_final['Latitude'], bronox_final['Longitude'], bronox_final['Neighborhood'], bronox_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## Examine Clusters and Choose the Best One

#### Cluster 1

In [40]:
print(bronox_final.loc[bronox_final['Cluster Labels'] == 0].shape)
bronox_final.loc[bronox_final['Cluster Labels'] == 0, bronox_final.columns[[1] +  [2]]].reset_index(drop=True)

(13, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Allerton | 16 |
| 1 | Baychester | 9 |
| 2 | Concourse Village | 9 |
| 3 | Edgewater Park | 16 |
| 4 | Longwood | 9 |
| 5 | Melrose | 16 |
| 6 | Morris Heights | 16 |
| 7 | Morris Park | 9 |
| 8 | Morrisania | 16 |
| 9 | Olinville | 9 |


#### Cluster 2

In [41]:
print(bronox_final.loc[bronox_final['Cluster Labels'] == 1].shape)

bronox_final.loc[bronox_final['Cluster Labels'] == 1, bronox_final.columns[[1] + [2]]].reset_index(drop=True)

(2, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Bedford Park | 100 |
| 1 | Fordham | 100 |


#### Cluster 3

In [42]:
print(bronox_final.loc[bronox_final['Cluster Labels'] ==2].shape)

bronox_final.loc[bronox_final['Cluster Labels'] == 2, bronox_final.columns[[1] + [2]]].reset_index(drop=True)


(4, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Concourse | 64 |
| 1 | Kingsbridge | 64 |
| 2 | Mount Hope | 64 |
| 3 | Norwood | 64 |


#### Cluster 4

In [43]:
print(bronox_final.loc[bronox_final['Cluster Labels'] == 3].shape)

bronox_final.loc[bronox_final['Cluster Labels'] == 3, bronox_final.columns[[1] +  [2]]].reset_index(drop=True)

(12, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Belmont | 4 |
| 1 | Co-op City | 1 |
| 2 | Country Club | 4 |
| 3 | Eastchester | 1 |
| 4 | Edenwald | 1 |
| 5 | North Riverdale | 4 |
| 6 | Pelham Gardens | 4 |
| 7 | Pelham Parkway | 4 |
| 8 | Port Morris | 1 |
| 9 | Spuyten Duyvil | 1 |


#### Cluster 5

In [44]:
print(bronox_final.loc[bronox_final['Cluster Labels'] == 4].shape)
bronox_final.loc[bronox_final['Cluster Labels'] == 4, bronox_final.columns[[1] +  [2]]].reset_index(drop=True)

(6, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | East Tremont | 36 |
| 1 | High Bridge | 36 |
| 2 | Mott Haven | 36 |
| 3 | Mount Eden | 36 |
| 4 | Parkchester | 36 |
| 5 | Unionport | 36 |


#### Cluster 6

In [46]:
print(bronox_final.loc[bronox_final['Cluster Labels'] == 5].shape)
bronox_final.loc[bronox_final['Cluster Labels'] == 5, bronox_final.columns[[1] +  [2]]].reset_index(drop=True)

(4, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Bronxdale | 25 |
| 1 | Kingsbridge Heights | 25 |
| 2 | Schuylerville | 25 |
| 3 | West Farms | 25 |
3,West Farms,25


#### Cluster 7

In [47]:
print(bronox_final.loc[bronox_final['Cluster Labels'] == 6].shape)
bronox_final.loc[bronox_final['Cluster Labels'] == 6, bronox_final.columns[[1] +  [2]]].reset_index(drop=True)

(3, 6)


|  | Neighborhood | Chinese Restaurant |
|---|---|---|
| 0 | Claremont Village | 49 |
| 1 | Pelham Bay | 49 |
| 2 | University Heights | 49 |
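
Reading the tables above, the cluster with the smallest counts can also be picked programmatically instead of by eye. A minimal sketch on hypothetical rows is given here. The column names follow the notebook, while the neighborhood names `N1` to `N5`, the values, and the use of the per-cluster mean as the selection rule are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical clustered counts; column names follow the notebook
clustered = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 3, 3],
    'Neighborhood': ['N1', 'N2', 'N3', 'N4', 'N5'],
    'Chinese Restaurant': [16, 9, 100, 4, 1],
})

# Mean count per cluster, then the label with the smallest mean
cluster_means = clustered.groupby('Cluster Labels')['Chinese Restaurant'].mean()
best_label = cluster_means.idxmin()

# Neighborhoods that belong to the selected cluster
candidates = clustered.loc[clustered['Cluster Labels'] == best_label, 'Neighborhood'].tolist()
print(best_label, candidates)
```

Note that the headings above number clusters from 1 while the labels start at 0, so label 3 corresponds to the section titled Cluster 4.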


## Conclusion <a name="conclusion"></a>

The goal of this project was to find the best neighborhoods for opening a new Chinese restaurant, taking into consideration the number of Chinese restaurants in each neighborhood and its location. This was a significant problem for our investors, so we used machine learning to support the decision. We fed the algorithm with data from the official New York dataset and from Foursquare, ran k-means, and divided the neighborhoods into 7 clusters. We chose the best one, which was **Cluster 4**, so the potential neighborhoods are:

* Belmont
* Co-op City
* Country Club
* Eastchester
* Edenwald
* North Riverdale
* Pelham Gardens
* Pelham Parkway
* Port Morris
* Spuyten Duyvil
* Throgs Neck
* Woodlawn


### Thank you for your effort. I hope you get what you want after completing all these courses.
## Always keep it up!
