# Capstone Project: The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

## Introduction: Business Problem <a name="introduction"></a>

In this project we try to identify the most livable locations in a city.
Here, "livable" refers to the number of facilities and venues available in the vicinity of a location. This report specifically targets stakeholders who are interested in finding a livable location in **Pune city, India**.

A stakeholder would also prefer a location close to their workplace. Hence, we categorize locations on the basis of their distance to **the most prominent workplaces/industrial zones in the city** and the **venues near those locations**.

We cluster the recommended, livable regions that have the largest number of amenities/venues in the locality. A user/stakeholder can then choose the location nearest to his/her office/campus.

For example, Hinjewadi and Magarpatta are two IT company zones situated on opposite sides of the city. If someone lives near Hinjewadi and his/her company moves to Magarpatta, which location near Magarpatta offers facilities/venues similar to those of his/her current location?

We identify the areas with the most promising characteristics and clearly describe their advantages, so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

We have taken the latitude and longitude of the most prominent Pune locations from the official ___[PMC Open Data Store](http://opendata.punecorporation.org/Citizen/CitizenDatasets/Index)___ website and a __[Kaggle Dataset](https://www.kaggle.com/dynamic22/pune-property-prices)__. The data derived from the Kaggle dataset does not include latitude and longitude information, so we used the geopy library to fetch these values for such locations.

In [3]:
# Read the location data (name, latitude, longitude) from the Excel file
import pandas as pd
location_df = pd.read_excel('dataset.xlsx', sheet_name='Sheet2', header=0, nrows=199)


In [4]:
import requests
location_df

Unnamed: 0,Location,Latitude,Longitude
0,Bund Garden,18.539848,73.885117
1,Shivajinagar,18.510099,73.817398
2,Aundh,18.563162,73.809555
3,Kondhwa,18.478436,73.890213
4,Chinchwad,18.636131,73.796143
5,Satara Road,18.488499,73.857956
6,Kothrud,18.508699,73.812500
7,Senapati Bapat Road,18.534451,73.837349
8,Kalyani Nagar,18.548101,73.900070
9,Hinjewadi Phase1,18.586555,73.734741


Let's import folium and locate these points on a map of Pune.

In [9]:
import folium
from geopy.geocoders import Nominatim
print('libraries imported')

libraries imported


#### Use the geopy library to get latitude and longitude values in Pune city, India

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>pune_explorer</em>, as shown below.

The code below fetches the missing latitude and longitude values and stores the completed table in an Excel file.

In [6]:
import math

# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="pune_explorer")

for index in range(66, 121):
    # Only geocode rows whose latitude is still missing
    if math.isnan(location_df.loc[index, 'Latitude']):
        address = location_df.loc[index, 'Location'] + ', Pune IN'
        location = geolocator.geocode(address)
        if location is not None:
            location_df.loc[index, 'Latitude'] = location.latitude
            location_df.loc[index, 'Longitude'] = location.longitude
            print('The geographical coordinates of {} are {}, {}.'.format(
                address, location.latitude, location.longitude))

# Save the completed table
location_df.to_excel('dataset1.xlsx')
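
The fill-in step above can also be written as a small function that receives the geocoder as an argument, which makes it possible to try the logic without network access. The following is only a sketch under stated assumptions: the rows are plain dictionaries (for example from `location_df.to_dict('records')`), `fill_missing_coordinates` and `fake_geocode` are hypothetical names, and "Nowhere Chowk" is a made-up address used to show the unresolved case.

```python
import math
from types import SimpleNamespace


def fill_missing_coordinates(rows, geocode, suffix=', Pune IN'):
    """Fill missing Latitude/Longitude values in a list of row dicts.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    Returns the names of the locations that could not be resolved.
    """
    unresolved = []
    for row in rows:
        lat = row.get('Latitude')
        if lat is not None and not math.isnan(lat):
            continue  # coordinates already present, nothing to do
        location = geocode(row['Location'] + suffix)
        if location is None:
            unresolved.append(row['Location'])
            continue
        row['Latitude'] = location.latitude
        row['Longitude'] = location.longitude
    return unresolved


# Stand-in geocoder (assumption) that knows one address, so the sketch runs offline.
def fake_geocode(address):
    known = {
        'Aundh, Pune IN': SimpleNamespace(latitude=18.563162, longitude=73.809555),
    }
    return known.get(address)


rows = [
    {'Location': 'Aundh', 'Latitude': float('nan'), 'Longitude': float('nan')},
    {'Location': 'Kothrud', 'Latitude': 18.508699, 'Longitude': 73.812500},
    # Hypothetical address that the stand-in geocoder cannot resolve
    {'Location': 'Nowhere Chowk', 'Latitude': float('nan'), 'Longitude': float('nan')},
]
unresolved = fill_missing_coordinates(rows, fake_geocode)
print(unresolved)
```

With the real service, `geolocator.geocode` could be passed in place of `fake_geocode`. Wrapping it in geopy's `RateLimiter` (from `geopy.extra.rate_limiter`) is one way to space out the requests.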

We will also use the locations of 10 industrial areas in Pune. This data is also taken from the official ___[PMC Open Data Store](http://opendata.punecorporation.org/Citizen/CitizenDatasets/Index)___ website. The latitude and longitude of these locations were likewise fetched using the geopy library and stored in an Excel file. Let's read that file.

In [7]:
industries_df=pd.read_excel('Major Industries.xlsx', sheet_name='Sheet1',  header=0, nrows=10)
industries_df

Unnamed: 0,Industries,Latitude,Longitude
0,Pimpri Chinchwad MIDC,18.627929,73.800983
1,Rajiv Gandhi InfoTech Park Hinjewadi Phase I,18.591684,73.734782
2,Rajiv Gandhi InfoTech Park Hinjewadi Phase II,18.598255,73.706207
3,Rajiv Gandhi InfoTech Park Hinjewadi Phase III...,18.59177,73.733895
4,Magarpatta City,18.522141,73.93174
5,Kharadi Knowledge Park,18.550518,73.942494
6,Talawade InfoTech Park,18.739658,73.806857
7,Talegaon Floriculture Park,18.729488,73.654067
8,Ranjangaon Industrial Area,18.753635,74.244579
9,Chakan Industrial Area,18.762311,73.862545
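
The introduction proposes ranking locations by their distance to the prominent workplaces. The notebook does not show how that distance is computed, so the following is only a sketch of one possible approach: the haversine great-circle formula applied to a few coordinates copied from the two tables above. The function names `haversine_km` and `nearest_zone` are hypothetical, and the Earth radius is the usual mean-value approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_zone(lat, lon, zones):
    """Return (zone_name, distance_km) for the zone closest to (lat, lon)."""
    return min(
        ((name, haversine_km(lat, lon, zlat, zlon))
         for name, (zlat, zlon) in zones.items()),
        key=lambda pair: pair[1],
    )


# A subset of the industrial areas listed above
zones = {
    'Rajiv Gandhi InfoTech Park Hinjewadi Phase I': (18.591684, 73.734782),
    'Magarpatta City': (18.522141, 73.93174),
    'Kharadi Knowledge Park': (18.550518, 73.942494),
}

# A subset of the residential locations listed earlier
locations = {
    'Hinjewadi Phase1': (18.586555, 73.734741),
    'Kondhwa': (18.478436, 73.890213),
}

for name, (lat, lon) in locations.items():
    zone, dist = nearest_zone(lat, lon, zones)
    print('{} -> {} ({:.1f} km)'.format(name, zone, dist))
```

Straight-line distance ignores roads and traffic, so it is at best a rough proxy for commute effort.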


Now, let's create a map of Pune city with the locations superimposed on top.

In [20]:
# create map of Pune using latitude and longitude values
address = 'Pune, IN'
geolocator = Nominatim(user_agent="pune_explorer")
location = geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
map_pune = folium.Map(location=[latitude, longitude], zoom_start=12)

# add one circle marker per location, with the location name as popup
for lat, lng, name in zip(location_df['Latitude'], location_df['Longitude'], location_df['Location']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_pune)

map_pune

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [12]:
CLIENT_ID = 'TR4OJXMM340Z5YTVDZ1QGAG1YHPBOJ4NEPK4V52K10Y0RYPY' # your Foursquare ID
CLIENT_SECRET = 'DZFZHGTHTBEWNOM25GNI24LBCGGUTLVPQMALSZQBQQWSIZMT' # your Foursquare Secret
VERSION = '20190324' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TR4OJXMM340Z5YTVDZ1QGAG1YHPBOJ4NEPK4V52K10Y0RYPY
CLIENT_SECRET:DZFZHGTHTBEWNOM25GNI24LBCGGUTLVPQMALSZQBQQWSIZMT
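
Hardcoding and printing credentials in a shared notebook exposes them to every reader. One common alternative, sketched here under assumptions, is to read them from environment variables and to mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical, and the demo values are placeholders.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client id and secret from environment variables.

    The variable names are hypothetical. Raises RuntimeError if any is unset.
    """
    names = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return env[names[0]], env[names[1]]


def mask(secret, visible=4):
    """Keep only the first few characters of a secret for display."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demo with placeholder values instead of real credentials
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-client-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-client-secret',
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read from the real process environment.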


## Explore Neighborhoods in Pune

Let's borrow the **get_category_type** function from the Foursquare lab.

In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### Let's create a function to retrieve nearby venues in Pune

We will use a radius of 1 km this time.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    LIMIT=100
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
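
The row-building step in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A more defensive variant of that step is sketched below. It is an illustration, not the notebook's code: `rows_from_items` is a hypothetical helper, the sample items are hand-written to mimic the response shape used above, and the second venue is made up.

```python
def rows_from_items(name, lat, lng, items):
    """Turn the `items` list of one explore response into flat row tuples.

    Venues without any category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written sample items; the second venue is hypothetical and has no category
sample_items = [
    {'venue': {'name': 'La Pizzeria',
               'location': {'lat': 18.539621, 'lng': 73.883401},
               'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Unnamed Stall',
               'location': {'lat': 18.5399, 'lng': 73.8850},
               'categories': []}},
]

rows = rows_from_items('Bund Garden', 18.539848, 73.885117, sample_items)
for row in rows:
    print(row)
```

Keeping the parsing separate from the HTTP request also makes it easy to check the parsing against saved responses.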

#### Now let's write the code to run the above function on each neighborhood and create a new dataframe.

In [19]:
pune_venues = getNearbyVenues(names=location_df['Location'],
                              latitudes=location_df['Latitude'],
                              longitudes=location_df['Longitude'])

Bund Garden
Shivajinagar
Aundh
Kondhwa
Chinchwad
Satara Road
Kothrud
Senapati Bapat Road
Kalyani Nagar
Hinjewadi Phase1
Hinjewadi Phase2
Magarpatta City
VimanNagar
Baner
Hinjewadi
Kirkee
Fatima Nagar
Pimpri
Model Colony - Wealth Branch
Pimple Saudagar
Sinhagad Road
Tilak Road
Bavdhan
Katraj
Aundh - Nagardas Road
Koregaon Park - Wealth Branch
Kharadi
Nigdi
Thermax Chowk
Mayur Colony
Sus Pashan Road
WTC-Kharadi
Navi Peth
Warje
Murti - Baramati
Karanjepul
Kamthadi
Kikvi
Malad Patas
Deulgaon Raje
Pimpalgaon - Daund
Pargaon
B T Kawade
Balewadi, Maharashtra
Karve Nagar
Nanded City, Maharashtra
Bhigwan
Sahakar Nagar
Bhandarkar Road
Raviwar Peth
Sadashiv Peth
Erandavana
Camp
Paud Road
Ghole Road
Blueridge Hinjewadi
Ravet
New Sanghvi
Pirangut
Narhe
Wagholi
Mohammadwadi
Bhosle Nagar
Undri Pisoli
E Square University Road
Vishrantwadi
Nana Peth
Fursungi
Salunke Vihar Road
Manjari Road Hadapsar
Phoenix Mall
Shivar Garden Chowk
Baner-D-Mart Complex
Null Stop-Karve Road
Market Yard
Pimple Nilakh
Prab

In [21]:
print(pune_venues.shape)
pune_venues.head()

(5383, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bund Garden,18.539848,73.885117,La Pizzeria,18.539621,73.883401,Italian Restaurant
1,Bund Garden,18.539848,73.885117,Hidden Place - The Hangout,18.539651,73.887023,Pub
2,Bund Garden,18.539848,73.885117,Savya Rasa,18.538874,73.886561,South Indian Restaurant
3,Bund Garden,18.539848,73.885117,Starbucks Coffee: A Tata Alliance,18.539341,73.886602,Coffee Shop
4,Bund Garden,18.539848,73.885117,Little Italy,18.539598,73.883464,Italian Restaurant


Let's check how many venues were returned for each neighborhood

In [22]:
pune_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Adarsh Nagar,100,100,100,100,100,100
Akurdi,20,20,20,20,20,20
Alandi,4,4,4,4,4,4
Alandi Road,4,4,4,4,4,4
Ambedkar Nagar,100,100,100,100,100,100
Anand Nagar,15,15,15,15,15,15
Anand Park Nagar,15,15,15,15,15,15
Ashok Nagar,10,10,10,10,10,10
Aundh,61,61,61,61,61,61
Aundh - Nagardas Road,52,52,52,52,52,52


#### Let's find out how many unique categories can be curated from all the returned venues

In [23]:
print('There are {} unique categories.'.format(len(pune_venues['Venue Category'].unique())))

There are 233 unique categories.


We will use these three datasets (locations, industrial areas, and Foursquare venues) to determine the similarity between various locations in Pune city.
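
The notebook does not say which similarity measure will be used. As a sketch of one possibility only, each neighborhood can be described by the relative frequency of its venue categories, and two neighborhoods can be compared with cosine similarity. The Bund Garden categories below come from the table above. The second neighborhood's categories are invented for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def category_profile(categories):
    """Relative frequency of each venue category in one neighborhood."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse category profiles (dicts)."""
    dot = sum(p[c] * q[c] for c in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_p * norm_q)


# Categories of the five Bund Garden venues shown earlier
bund_garden = category_profile([
    'Italian Restaurant', 'Pub', 'South Indian Restaurant',
    'Coffee Shop', 'Italian Restaurant',
])

# Invented categories for a hypothetical second neighborhood
other = category_profile(['Coffee Shop', 'Pub', 'Pub'])

similarity = cosine_similarity(bund_garden, other)
print('Similarity: {:.3f}'.format(similarity))
```

A value near 1 means the two neighborhoods have a similar mix of venue categories, and a value of 0 means they share none.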