# <center>Coursera_Capstone</center>
    
## <center>Title: Analysing Suitable Places for Opening a New Shopping Mall</center><center>Place: Pune, India</center>

### Work done:


1. Build a dataframe of neighborhoods in Pune, India that fall under the Pune Municipal Corporation by scraping data from a Wikipedia page
2. Get the geographical coordinates of the neighborhoods
3. Obtain the venue data for the neighborhoods from the Foursquare API
4. Explore and cluster the neighborhoods
5. Select the best cluster to open a new shopping mall



##### 1. Importing all required libraries

In [5]:
# Import necessary libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

##### 2. Scraping data from Wikipedia to get the names of neighborhoods that fall under the Pune Municipal Corporation

In [6]:
data = requests.get("https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Pune").text

In [7]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [8]:
neighborhoodList = []

In [9]:
# append the data into the list
for row in soup.find_all("ul")[1].find_all("li"):
    neighborhoodList.append(row.text)

In [10]:
# create a new DataFrame from the list
Pune_df = pd.DataFrame({"Neighborhood": neighborhoodList})

Pune_df

Unnamed: 0,Neighborhood
0,Ambegaon
1,Aundh
2,Baner
3,Bavdhan Khurd
4,Bavdhan Budruk
5,Balewadi
6,Bhamburde (now called Shivajinagar)
7,Bibvewadi
8,Bhugaon
9,Bhukum


In [11]:
# print the number of rows of the dataframe
Pune_df.shape


(47, 1)
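
Scraped list items can carry notes that are not part of the place name, such as row 6 above, `Bhamburde (now called Shivajinagar)`. The sketch below shows one way to strip such notes before the names are used elsewhere. The helper `strip_parenthetical` is hypothetical and not part of the notebook, and whether the shorter name geocodes any better has not been tested here.

```python
import re

def strip_parenthetical(name):
    """Drop any '(...)' note and collapse leftover whitespace."""
    without_note = re.sub(r"\s*\([^)]*\)", "", name)
    return " ".join(without_note.split())

# Two names from the scraped table plus one with stray whitespace.
names = ["Aundh", "Bhamburde (now called Shivajinagar)", "  Baner\n"]
cleaned = [strip_parenthetical(n) for n in names]
print(cleaned)
```

Keeping the original column and adding the cleaned names as a second column would preserve the note for display while still giving a plain name for lookups.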

##### 3. Get coordinates for all neighborhoods using the geocoder library, then merge the two datasets so that neighborhoods and coordinates are together

In [12]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Pune, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
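
The `while` loop above never exits if a name cannot be resolved. A minimal sketch of a bounded variant follows, assuming the caller is happy to receive `None` for an unresolved name. The function `get_latlng_bounded`, its `lookup` parameter, and `fake_lookup` are hypothetical names introduced for illustration; the stand-in lookup replaces the real network call so the example runs offline.

```python
import time

def get_latlng_bounded(neighborhood, lookup, max_attempts=5, pause_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    query = '{}, Pune, India'.format(neighborhood)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # brief pause before retrying
    return None  # the caller decides how to handle an unresolved name


# Stand-in lookup used only for illustration: fails twice, then succeeds.
calls = {"count": 0}

def fake_lookup(query):
    calls["count"] += 1
    return [18.56345, 73.81227] if calls["count"] >= 3 else None

print(get_latlng_bounded("Aundh", fake_lookup))          # resolves on the third call
print(get_latlng_bounded("Nowhere", lambda q: None, 2))  # gives up and returns None
```

With the geocoder library from this notebook, the lookup could be passed as `lambda q: geocoder.arcgis(q).latlng`, which mirrors the call made inside `get_latlng`.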

In [13]:
import geocoder

In [14]:
coords = [ get_latlng(neighborhood) for neighborhood in Pune_df["Neighborhood"].tolist() ]

In [15]:
coords

[[19.00496000000004, 73.94583000000006],
 [18.563450000000046, 73.81227000000007],
 [18.548200000000065, 73.77316000000008],
 [18.511100000000056, 73.77773000000008],
 [18.51827000000003, 73.76557000000008],
 [18.576020000000028, 73.77983000000006],
 [18.537230000000022, 73.83808000000005],
 [18.471870000000024, 73.86336000000006],
 [18.499220000000037, 73.75316000000004],
 [18.495100000000036, 73.72124000000008],
 [18.46628000000004, 73.85326000000003],
 [18.57856000000004, 73.89264000000003],
 [18.447020000000066, 73.80757000000006],
 [18.509650000000022, 73.83124000000004],
 [18.473650000000077, 73.97473000000008],
 [18.522320000000036, 73.89712000000003],
 [18.502530000000036, 73.92706000000004],
 [18.479790000000037, 73.83075000000008],
 [18.49150000000003, 73.82172000000008],
 [18.578450000000032, 73.87489000000005],
 [18.447320000000047, 73.86405000000008],
 [18.561140000000023, 73.85300000000007],
 [18.544620000000066, 73.93922000000003],
 [18.43825000000004, 73.89895000000007]

In [16]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [17]:
# merge the coordinates into the original dataframe
Pune_df['Latitude'] = df_coords['Latitude']
Pune_df['Longitude'] = df_coords['Longitude']

In [18]:
print(Pune_df.shape)
Pune_df

(47, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Ambegaon,19.00496,73.94583
1,Aundh,18.56345,73.81227
2,Baner,18.5482,73.77316
3,Bavdhan Khurd,18.5111,73.77773
4,Bavdhan Budruk,18.51827,73.76557
5,Balewadi,18.57602,73.77983
6,Bhamburde (now called Shivajinagar),18.53723,73.83808
7,Bibvewadi,18.47187,73.86336
8,Bhugaon,18.49922,73.75316
9,Bhukum,18.4951,73.72124


In [19]:
# export data to csv for further use
Pune_df.to_csv(r'Pune_Cord.csv',index=False)

##### 4. Plot the map of neighborhoods from the coordinates obtained

In [20]:
# get the coordinates of Pune
address = 'Pune, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Pune, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Pune, India are 18.521428, 73.8544541.


In [21]:
# create map of Pune using latitude and longitude values
map_PU = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(Pune_df['Latitude'], Pune_df['Longitude'], Pune_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_PU)  
    
map_PU

##### 5. Get up to 100 venues within a 5000-meter radius of each neighborhood using the Foursquare API

In [69]:
# define Foursquare Credentials and Version
CLIENT_ID = 'XXXXXXXXX'  # your Foursquare ID
CLIENT_SECRET = 'XXXXXXXXX'  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version

In [23]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(Pune_df['Latitude'], Pune_df['Longitude'], Pune_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
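
The loop above builds the request URL by string formatting and assumes every venue has at least one category. The sketch below shows a slightly more defensive shape under two assumptions: query parameters are escaped with the standard library's `urlencode`, and a venue without categories should yield `None` rather than raise `IndexError`. The helpers `build_explore_url` and `venue_row` are hypothetical, and `sample_item` is hand-written to mimic only the fields the loop reads; it is not real API output.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def venue_row(neighborhood, lat, lng, item):
    """Flatten one result item; the category falls back to None if absent."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (neighborhood, lat, lng, venue["name"],
            venue["location"]["lat"], venue["location"]["lng"], category)

url = build_explore_url("ID", "SECRET", "20180605", 18.56345, 73.81227, 5000, 100)
print(url)

# Hand-written item with an empty category list, for illustration only.
sample_item = {"venue": {"name": "Example Cafe",
                         "location": {"lat": 18.56, "lng": 73.81},
                         "categories": []}}
print(venue_row("Aundh", 18.56345, 73.81227, sample_item))
```

Rows whose category is `None` can be dropped or labelled before one-hot encoding, depending on how uncategorised venues should be treated.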

In [24]:
# convert the venues list into a new DataFrame
venues = pd.DataFrame(venues)

# define the column names
venues.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues.shape)
venues.head()

(3345, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ambegaon,19.00496,73.94583,Go cheese world,18.995747,73.944337,Museum
1,Ambegaon,19.00496,73.94583,Axis Bank ATM,19.00798,73.927422,ATM
2,Ambegaon,19.00496,73.94583,Axis Bank ATM,19.00798,73.927422,ATM
3,Ambegaon,19.00496,73.94583,Indraprasta,19.026795,73.9513,Fast Food Restaurant
4,Ambegaon,19.00496,73.94583,Hotel Sarja,19.033881,73.955288,Indian Restaurant


In [26]:
venues.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Ambegaon,7,7,7,7,7,7
Aundh,100,100,100,100,100,100
Balewadi,100,100,100,100,100,100
Baner,100,100,100,100,100,100
Bavdhan Budruk,74,74,74,74,74,74
Bavdhan Khurd,73,73,73,73,73,73
Bhamburde (now called Shivajinagar),100,100,100,100,100,100
Bhugaon,42,42,42,42,42,42
Bhukum,9,9,9,9,9,9
Bibvewadi,100,100,100,100,100,100


In [28]:
print('There are {} unique categories.'.format(len(venues['VenueCategory'].unique())))

There are 135 unique categories.


##### 6. List the unique categories (first 50 shown)

In [30]:
# print out the list of categories
venues['VenueCategory'].unique()[:50]

array(['Museum', 'ATM', 'Fast Food Restaurant', 'Indian Restaurant',
       'Vegetarian / Vegan Restaurant', 'Asian Restaurant', 'Bookstore',
       'Shopping Mall', 'English Restaurant', 'Dessert Shop',
       'Coffee Shop', 'Donut Shop', 'Ice Cream Shop', 'Gym', 'Lounge',
       'Clothing Store', 'Chinese Restaurant', 'Bakery',
       'Mexican Restaurant', 'Chocolate Shop', 'Hotel', 'Brewery',
       'Italian Restaurant', 'Malay Restaurant', 'Multiplex',
       'South Indian Restaurant', 'Jewelry Store', 'BBQ Joint',
       'Snack Place', 'Breakfast Spot', 'Bistro',
       'Molecular Gastronomy Restaurant', 'Restaurant', 'Nightclub',
       'Trail', 'Café', 'Theme Park', 'Other Great Outdoors',
       'Motorcycle Shop', 'Sandwich Place', 'French Restaurant', 'Bar',
       'Punjabi Restaurant', 'Pizza Place', 'Seafood Restaurant',
       'Beer Garden', 'Golf Course', 'Department Store', 'Stadium',
       'Food Court'], dtype=object)

##### 7. Analyze neighborhoods using one-hot encoding

In [33]:
# one hot encoding
PU_onehot = pd.get_dummies(venues[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
PU_onehot['Neighborhoods'] = venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [PU_onehot.columns[-1]] + list(PU_onehot.columns[:-1])
PU_onehot = PU_onehot[fixed_columns]

print(PU_onehot.shape)
PU_onehot.head()

(3345, 136)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport Service,American Restaurant,Arcade,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach Bar,Beer Garden,Bistro,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Burrito Place,Bus Station,Café,Chaat Place,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Falafel Restaurant,Fast Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Lounge,Maharashtrian Restaurant,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Nightclub,North Indian Restaurant,Organic Grocery,Other Great Outdoors,Other Nightlife,Park,Pizza Place,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Resort,Restaurant,River,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Trail,Tunnel,Vegetarian / Vegan Restaurant,Warehouse Store,Yoga Studio,Zoo
0,Ambegaon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ambegaon,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ambegaon,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ambegaon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ambegaon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [34]:
PU_grouped = PU_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(PU_grouped.shape)
PU_grouped

(46, 136)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport Service,American Restaurant,Arcade,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach Bar,Beer Garden,Bistro,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Burrito Place,Bus Station,Café,Chaat Place,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Falafel Restaurant,Fast Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Italian Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Lounge,Maharashtrian Restaurant,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Molecular Gastronomy Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Nightclub,North Indian Restaurant,Organic Grocery,Other Great Outdoors,Other Nightlife,Park,Pizza Place,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Resort,Restaurant,River,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Trail,Tunnel,Vegetarian / Vegan Restaurant,Warehouse Store,Yoga Studio,Zoo
0,Ambegaon,0.285714,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0
1,Aundh,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.04,0.01,0.0,0.01,0.03,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.07,0.01,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.13,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.08,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0
2,Balewadi,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.03,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.06,0.0,0.14,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
3,Baner,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.05,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.05,0.01,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.15,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bavdhan Budruk,0.0,0.0,0.0,0.0,0.0,0.027027,0.013514,0.040541,0.013514,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.148649,0.0,0.0,0.027027,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.040541,0.0,0.0,0.0,0.013514,0.0,0.162162,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.027027,0.013514,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bavdhan Khurd,0.0,0.0,0.0,0.0,0.0,0.027397,0.013699,0.041096,0.013699,0.0,0.013699,0.013699,0.0,0.0,0.0,0.013699,0.0,0.013699,0.013699,0.0,0.0,0.150685,0.0,0.0,0.041096,0.0,0.0,0.0,0.041096,0.0,0.0,0.013699,0.0,0.0,0.027397,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041096,0.0,0.027397,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.013699,0.027397,0.0,0.0,0.041096,0.0,0.0,0.0,0.027397,0.0,0.136986,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,0.0,0.0,0.013699,0.013699,0.027397,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.0
6,Bhamburde (now called Shivajinagar),0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.06,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.03,0.01,0.0,0.02,0.0,0.1,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.02,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.02,0.0,0.06,0.0,0.0,0.0
7,Bhugaon,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.02381,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bhukum,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bibvewadi,0.0,0.0,0.0,0.01,0.01,0.03,0.01,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.0,0.0,0.08,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.06,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.07,0.0,0.1,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.01
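
Each value in the grouped table is the mean of a 0/1 indicator column, which is simply the share of a neighborhood's venues that fall in that category. The sketch below reproduces the Ambegaon row with the standard library instead of pandas. The first five categories come from the venue table shown earlier; the last two are inferred from the non-zero entries of the grouped row and are an assumption.

```python
from collections import Counter

# Ambegaon's seven venue categories. The first five appear in the venue table;
# the last two are inferred from the non-zero entries of the grouped row.
ambegaon_categories = [
    "Museum", "ATM", "ATM", "Fast Food Restaurant", "Indian Restaurant",
    "Vegetarian / Vegan Restaurant", "Asian Restaurant",
]

counts = Counter(ambegaon_categories)
total = len(ambegaon_categories)

# Mean of a 0/1 indicator column = occurrences / number of rows.
frequencies = {category: round(n / total, 6) for category, n in counts.items()}

for category, freq in sorted(frequencies.items()):
    print(category, freq)
```

With only seven venues, a single venue moves a frequency by about 0.14, so values for sparsely covered neighborhoods such as Ambegaon and Bhukum are much coarser than those for neighborhoods that reached the 100-venue limit.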


In [35]:
len(PU_grouped[PU_grouped["Shopping Mall"] > 0])

34

In [36]:
PU_mall = PU_grouped[["Neighborhoods","Shopping Mall"]]

In [37]:
PU_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Ambegaon,0.0
1,Aundh,0.02
2,Balewadi,0.01
3,Baner,0.01
4,Bavdhan Budruk,0.0


##### 8. Implement k-means clustering with 3 clusters

In [59]:
# set number of clusters
kclusters = 3

PU_clustering = PU_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(PU_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 0, 1, 1, 0, 1, 1, 0])
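
Because the clustering input is a single column, the shopping-mall frequency, k-means here amounts to cutting a number line into three bands. The sketch below illustrates that with a plain Lloyd's-algorithm loop on nine frequencies copied from the tables in this notebook. It is not scikit-learn's implementation: the deterministic initialisation is an assumption made for reproducibility, the helper `kmeans_1d` is hypothetical, and because only a subset of the data is used the resulting groups need not match the notebook's labels.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalars (k >= 2); returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Shopping-mall frequencies for Ambegaon, Aundh, Balewadi, Baner, Bavdhan Budruk,
# Kondhwa, Nanded, Dhayari and Kharadi, as listed in the notebook's tables.
mall_freq = [0.0, 0.02, 0.01, 0.01, 0.0, 0.052632, 0.03125, 0.035714, 0.03]
labels, centers = kmeans_1d(mall_freq, k=3)
print(labels)
print(centers)
```

The label numbers themselves carry no meaning; only the grouping does, which is why the cluster tables later in the notebook have to be inspected to see which label corresponds to low or high mall frequency.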

In [60]:
# create a new dataframe that includes the cluster label as well as the shopping mall frequency for each neighborhood
PU_merged = PU_mall.copy()

# add clustering labels
PU_merged["Cluster Labels"] = kmeans.labels_

In [61]:
PU_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
PU_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Ambegaon,0.0,1
1,Aundh,0.02,0
2,Balewadi,0.01,0
3,Baner,0.01,0
4,Bavdhan Budruk,0.0,1


In [62]:
# merge PU_merged with Pune_df to add latitude/longitude for each neighborhood
PU_merged = PU_merged.join(Pune_df.set_index("Neighborhood"), on="Neighborhood")

print(PU_merged.shape)
PU_merged.head() # check the last columns!

(46, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Ambegaon,0.0,1,19.00496,73.94583
1,Aundh,0.02,0,18.56345,73.81227
2,Balewadi,0.01,0,18.57602,73.77983
3,Baner,0.01,0,18.5482,73.77316
4,Bavdhan Budruk,0.0,1,18.51827,73.76557


In [63]:
# sort the results by Cluster Labels
print(PU_merged.shape)
PU_merged.sort_values(["Cluster Labels"], inplace=True)
PU_merged

(46, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
45,Yerwada,0.01,0,18.544836,73.884677
29,Mundhwa,0.02,0,18.53017,73.92125
28,Mohammed Wadi,0.02,0,18.47867,73.91594
35,Shivane,0.011765,0,18.46781,73.78897
26,Manjri,0.01,0,18.48194,73.865618
25,Kothrud,0.01,0,18.50517,73.80245
24,Koregaon Park,0.01,0,18.53533,73.89382
44,Warje,0.018868,0,18.47211,73.80213
21,Khadki,0.02,0,18.56114,73.853
37,Undri,0.017857,0,18.45427,73.91788


In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(PU_merged['Latitude'], PU_merged['Longitude'], PU_merged['Neighborhood'], PU_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
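
The map code above indexes the palette with `rainbow[cluster-1]`, so label 0 picks the last color through Python's negative indexing. That works, but indexing by the label itself is easier to follow. The sketch below builds a small palette with the standard library's `colorsys` rather than matplotlib's `cm.rainbow`, purely so it runs without extra dependencies; `cluster_palette` and its saturation and brightness values are assumptions.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return palette

palette = cluster_palette(3)
for cluster in range(3):
    # Index directly by label so cluster 0 gets palette[0], and so on.
    print(cluster, palette[cluster])
```

The resulting hex strings can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in the same way as the `rainbow` list.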

##### 9. Examine Clusters

In [65]:
PU_merged.loc[PU_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
45,Yerwada,0.01,0,18.544836,73.884677
29,Mundhwa,0.02,0,18.53017,73.92125
28,Mohammed Wadi,0.02,0,18.47867,73.91594
35,Shivane,0.011765,0,18.46781,73.78897
26,Manjri,0.01,0,18.48194,73.865618
25,Kothrud,0.01,0,18.50517,73.80245
24,Koregaon Park,0.01,0,18.53533,73.89382
44,Warje,0.018868,0,18.47211,73.80213
21,Khadki,0.02,0,18.56114,73.853
37,Undri,0.017857,0,18.45427,73.91788


In [66]:
PU_merged.loc[PU_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
34,Pirangut,0.0,1,18.51123,73.68317
42,Wagholi,0.0,1,18.57953,73.98529
36,Sus,0.0,1,18.5467,73.75113
0,Ambegaon,0.0,1,19.00496,73.94583
31,Panmala,0.0,1,18.87647,73.89708
27,Markal,0.0,1,18.66757,73.95257
14,Fursungi,0.0,1,18.47365,73.97473
8,Bhukum,0.0,1,18.4951,73.72124
7,Bhugaon,0.0,1,18.49922,73.75316
5,Bavdhan Khurd,0.0,1,18.5111,73.77773


In [67]:
PU_merged.loc[PU_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
30,Nanded,0.03125,2,18.45642,73.792
23,Kondhwa,0.052632,2,18.43825,73.89895
20,Katraj,0.03125,2,18.44732,73.86405
12,Dhayari,0.035714,2,18.44702,73.80757
22,Kharadi,0.03,2,18.54462,73.93922


##### 10. Observations

* The cluster analysis shows that cluster 2 has a higher shopping-mall frequency than cluster 0 and cluster 1.
* Cluster 1 has no shopping malls at all, unlike clusters 0 and 2.
* From these two observations, cluster 1 is the best candidate for opening a new shopping mall: a new mall there would face no competition from existing malls and could therefore earn more revenue.
* There is one notable observation about cluster 2: its neighborhoods have renowned IT parks close to the shopping malls.
* Developers could therefore also consider proposing new IT parks in the neighborhoods of clusters 0 and 1. A separate analysis would be needed to select suitable locations, but the idea appears promising.
* The main takeaway for developers is that no new proposals should be made in the neighborhoods of cluster 2, as there is already enough competition among the malls present there.
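
The ordering of the clusters can be checked by averaging the shopping-mall frequency per cluster. The sketch below does this with the standard library for the rows displayed in the cluster tables above. Those tables show only part of clusters 0 and 1, so the means are illustrative rather than the full-data values.

```python
from statistics import mean

# (cluster label, shopping-mall frequency) pairs copied from the rows shown above.
rows = [
    (0, 0.01), (0, 0.02), (0, 0.02), (0, 0.011765), (0, 0.01),
    (0, 0.01), (0, 0.01), (0, 0.018868), (0, 0.02), (0, 0.017857),
    (1, 0.0), (1, 0.0), (1, 0.0), (1, 0.0), (1, 0.0),
    (1, 0.0), (1, 0.0), (1, 0.0), (1, 0.0), (1, 0.0),
    (2, 0.03125), (2, 0.052632), (2, 0.03125), (2, 0.035714), (2, 0.03),
]

# Group the frequencies by cluster label.
by_cluster = {}
for label, freq in rows:
    by_cluster.setdefault(label, []).append(freq)

cluster_means = {label: mean(freqs) for label, freqs in by_cluster.items()}

for label in sorted(cluster_means):
    print('Cluster {}: mean shopping-mall frequency {:.4f}'.format(label, cluster_means[label]))
```

On the full dataframe, the equivalent would be a `groupby` on `Cluster Labels` followed by the mean of the `Shopping Mall` column.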