## Detailed Description

### Introduction
**Ahmedabad** is the largest city in the Indian state of **Gujarat** and has emerged as an important economic and industrial hub in India. In 2010, *Forbes* ranked Ahmedabad third in its list of the fastest-growing cities of the decade. There are around **<a href='https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad'>81 Neighborhoods</a>** in Ahmedabad. Each Neighborhood may have different popular venues, so finding similar Neighborhoods and their popular venue categories may help businesses make investment decisions.

The target audience for this project is entrepreneurs, investors and organizations that are planning to start or expand a business.

<img src = 'https://i.ibb.co/3vrJyWq/Marketing-Business-Corporate-Start-up-Facebook-Cover.png'>

### Problem Statement
The main goal of this project is to find the common venues in each Neighborhood of Ahmedabad and to cluster the Neighborhoods based on the similarity of their venue categories. This will help us identify similar Neighborhoods, which could help potential investors choose an appropriate Neighborhood for the kind of venue they want to open. For example, if we find that **Anand Nagar** is known for *Fast Food Restaurants*, this information could be useful for an organization or investor looking to invest in a fast food restaurant.

### Data
First, we need the names of all the Neighborhoods of Ahmedabad, which we will scrape from <a href='https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad'>this</a> Wikipedia category page. Next, we need the location data (coordinates) of each Neighborhood, which we will get using the Geopy library. Lastly, we will pass these coordinates to the Foursquare API to fetch details of the venues near each Neighborhood.

### Required Libraries
- Pandas
- Numpy
- Json
- Requests
- Geopy
- Matplotlib
- Folium
- Plotly
- Sklearn (scikit-learn)
- BeautifulSoup

## Implementation

### Importing required Libraries

In [2]:
```python
import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json  # library to handle JSON files
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
import requests  # library to handle requests
# transform JSON into a pandas DataFrame (pandas.io.json.json_normalize was deprecated in pandas 1.0 and later removed)
from pandas import json_normalize
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import plotly.express as px  # bar charts of venue categories per cluster
# k-means for the clustering stage
from sklearn.cluster import KMeans
import folium  # map rendering library
from bs4 import BeautifulSoup  # for web scraping
from urllib.request import urlopen
```

### Web Scraping to get required data
Web scraping Wikipedia page to get names of all the Neighborhoods of Ahmedabad

In [3]:
```python
data = pd.DataFrame(columns=['Neighborhood', 'Latitude', 'Longitude'])
```

In [4]:
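```python
url = 'https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
scrap = soup.find_all('li')
# Assumes the first 81 <li> elements on the page are the neighborhood names
rows = [{'Neighborhood': scrap[i].text} for i in range(81)]
# DataFrame.append was removed in pandas 2.0, so build all rows at once;
# Latitude and Longitude start empty (NaN) and are filled in by the geocoding step
data = pd.DataFrame(rows, columns=['Neighborhood', 'Latitude', 'Longitude'])
```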

In [5]:
```python
data.head()
```

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Agol | | |
| 1 | Ahmedabad Cantonment | | |
| 2 | Alam Roza | | |
| 3 | Ambawadi | | |
| 4 | Amraiwadi | | |
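Hard-coding `range(81)` assumes that the first 81 `<li>` elements on the page are neighborhood names, which breaks as soon as Wikipedia changes its navigation or the category gains members. A slightly sturdier sketch, assuming the page keeps MediaWiki's usual layout where a category's member articles sit inside `<div id="mw-pages">`, reads only the list items in that container. `category_members` and the inline `SAMPLE_HTML` are hypothetical stand-ins, so the example runs without network access.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a MediaWiki category page: member articles live inside
# <div id="mw-pages">, while navigation and footer links are other <li> elements.
SAMPLE_HTML = """
<ul id="nav"><li>Main page</li></ul>
<div id="mw-pages">
  <ul>
    <li><a href="/wiki/Agol">Agol</a></li>
    <li><a href="/wiki/Ambawadi">Ambawadi</a></li>
    <li><a href="/wiki/Bopal">Bopal</a></li>
  </ul>
</div>
<ul id="footer"><li>Privacy policy</li></ul>
"""

def category_members(html):
    """Return the text of every <li> inside the category's mw-pages block."""
    soup = BeautifulSoup(html, 'html.parser')
    pages = soup.find('div', id='mw-pages')
    if pages is None:
        return []  # layout not recognised: return nothing rather than guess
    return [li.get_text(strip=True) for li in pages.find_all('li')]

print(category_members(SAMPLE_HTML))  # ['Agol', 'Ambawadi', 'Bopal']
```

On the live page, `category_members(urlopen(url).read())` could replace the `range(81)` loop; subcategories, which MediaWiki lists in a separate container, are skipped.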


### Getting coordinates of all the Neighborhoods using GEOPY
We require coordinates to find nearby venues from Foursquare API.

In [6]:
```python
geolocator = Nominatim(user_agent="abd_explorer", timeout=5)
for i in range(len(data['Neighborhood'])):
    address = data.at[i, 'Neighborhood'] + ', Ahmedabad, Gujarat, India'
    location = geolocator.geocode(address)
    if location is None:
        # Nominatim could not resolve the address, so drop the neighborhood
        data.drop(i, inplace=True)
        continue
    data.at[i, 'Latitude'] = location.latitude
    data.at[i, 'Longitude'] = location.longitude
    print(address + ' Latitude: {}, Longitude: {}'.format(location.latitude, location.longitude))
```

```
Agol, Ahmedabad, Gujarat, India Latitude: 23.1357599, Longitude: 72.2528448
Ahmedabad Cantonment, Ahmedabad, Gujarat, India Latitude: 23.0216238, Longitude: 72.5797068
Alam Roza, Ahmedabad, Gujarat, India Latitude: 22.9940973, Longitude: 72.5895788
Ambawadi, Ahmedabad, Gujarat, India Latitude: 23.0226117, Longitude: 72.5490834
Amraiwadi, Ahmedabad, Gujarat, India Latitude: 23.0057574, Longitude: 72.6269822
Anand Nagar (Ahmedabad), Ahmedabad, Gujarat, India Latitude: 23.0110669, Longitude: 72.5137432
Asarwa, Ahmedabad, Gujarat, India Latitude: 23.0472488, Longitude: 72.6088046
Bapunagar, Ahmedabad, Gujarat, India Latitude: 23.0362408, Longitude: 72.630339
Behrampura, Ahmedabad, Gujarat, India Latitude: 23.0035894, Longitude: 72.58384
Bhairavnath Road, Ahmedabad, Gujarat, India Latitude: 22.995457, Longitude: 72.5990646
Bopal, Ahmedabad, Gujarat, India Latitude: 23.02969935, Longitude: 72.46557868354866
Chandkheda, Ahmedabad, Gujarat, India Latitude: 23.1100643, Longitude: 72.5811205
Cha
```
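The loop above sends one Nominatim request per neighborhood back to back. The public Nominatim service's usage policy asks for at most one request per second, and geopy ships `geopy.extra.rate_limiter.RateLimiter` for this purpose. The sketch below spells out the same idea in plain Python: a hypothetical `geocode_all` helper that spaces out calls to any geocoding function and skips addresses that come back as `None` (which is how `geolocator.geocode` reports a miss). A dictionary lookup stands in for Nominatim so the example runs offline.

```python
import time
from types import SimpleNamespace

def geocode_all(addresses, geocode, min_delay_seconds=1.0):
    """Return {address: (latitude, longitude)} for every address `geocode` resolves.

    `geocode` is any callable returning an object with .latitude/.longitude,
    or None when the address cannot be found.
    """
    coords = {}
    last_call = None
    for address in addresses:
        # keep consecutive requests at least min_delay_seconds apart
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        location = geocode(address)
        if location is None:
            continue  # unresolved address: leave it out
        coords[address] = (location.latitude, location.longitude)
    return coords

# Offline stub standing in for geolocator.geocode (coordinates copied from the output above)
KNOWN = {
    'Ambawadi, Ahmedabad, Gujarat, India': SimpleNamespace(latitude=23.0226117, longitude=72.5490834),
}
result = geocode_all(['Ambawadi, Ahmedabad, Gujarat, India', 'Nowhere, Ahmedabad, Gujarat, India'],
                     KNOWN.get, min_delay_seconds=0)
print(result)  # {'Ambawadi, Ahmedabad, Gujarat, India': (23.0226117, 72.5490834)}
```

Against the real service, the call would be `geocode_all(addresses, geolocator.geocode)` with the default one-second spacing.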

In [7]:
```python
data.head()
```

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Agol | 23.1358 | 72.2528 |
| 1 | Ahmedabad Cantonment | 23.0216 | 72.5797 |
| 2 | Alam Roza | 22.9941 | 72.5896 |
| 3 | Ambawadi | 23.0226 | 72.5491 |
| 4 | Amraiwadi | 23.0058 | 72.627 |


### Creating the map of Ahmedabad displaying all the Neighborhoods 

In [8]:
```python
# Find the coordinates of Ahmedabad itself (used to center the maps)
address_amd = 'Ahmedabad, Gujarat, India'
location = geolocator.geocode(address_amd)
latitude = location.latitude
longitude = location.longitude
```

In [9]:
```python
# create map of Ahmedabad using Folium
map_ahmedabad = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each neighborhood
for lat, lng, label in zip(data['Latitude'], data['Longitude'], data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ahmedabad)

map_ahmedabad
```

**As the map above shows, 'Virochannagar', 'Detroj' and 'Agol' are too far from Ahmedabad city, so we will remove them.**

In [10]:
```python
# Drop the neighborhoods that are too far from Ahmedabad city
outliers = ['Virochannagar', 'Detroj', 'Agol']
# equivalently: data.drop(data[data['Neighborhood'].isin(outliers)].index)
data = data[~data['Neighborhood'].isin(outliers)].reset_index(drop=True)
```
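Picking outliers by looking at the map works for a one-off analysis, but the same decision can be made reproducible by measuring each neighborhood's great-circle distance from the city center with the haversine formula and keeping only those within a cut-off. The sketch below assumes a hypothetical 15 km cut-off and uses the Ahmedabad Cantonment coordinates from the geocoding output as a stand-in for the city center; both are illustrative choices, not values from this analysis. `haversine_km` and `drop_far_neighborhoods` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works element-wise on scalars, arrays or Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def drop_far_neighborhoods(df, center_lat, center_lon, max_km):
    """Keep rows whose (Latitude, Longitude) lie within max_km of the center."""
    dist = haversine_km(df['Latitude'].astype(float), df['Longitude'].astype(float),
                        center_lat, center_lon)
    return df[dist <= max_km].reset_index(drop=True)

# Coordinates copied from the geocoding output above
sample = pd.DataFrame({
    'Neighborhood': ['Agol', 'Ahmedabad Cantonment', 'Ambawadi'],
    'Latitude': [23.1357599, 23.0216238, 23.0226117],
    'Longitude': [72.2528448, 72.5797068, 72.5490834],
})
kept = drop_far_neighborhoods(sample, 23.0216238, 72.5797068, max_km=15)
print(list(kept['Neighborhood']))  # ['Ahmedabad Cantonment', 'Ambawadi']
```

Agol lies roughly 36 km from the stand-in center, so it falls outside the cut-off, while Ambawadi is about 3 km away.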

In [11]:
```python
# create map of Ahmedabad
map_ahmedabad = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each remaining neighborhood
for lat, lng, label in zip(data['Latitude'], data['Longitude'], data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ahmedabad)

map_ahmedabad
```

### Exploring a Neighborhood
Let's first explore a single Neighborhood to demonstrate and understand the process. We will explore 'Ahmedabad Cantonment'; since we already have its coordinates, we will use them to fetch nearby venues from the Foursquare API. To learn more about how the Foursquare API works, I recommend reading <a href='https://developer.foursquare.com/docs/places-api'>this</a> documentation.
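The explore request is an ordinary HTTPS GET whose parameters travel in the query string. The hypothetical `build_explore_url` helper below, using placeholder credentials, shows the URL that results from the parameters used in this notebook. `urlencode` percent-encodes each value (the comma in `ll` becomes `%2C`), which is also what `requests` does when it is given a `params` dict instead of a hand-formatted URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180605', radius=500, limit=100):
    """Return the explore URL for venues within `radius` metres of (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" of the search center
        'radius': radius,                # search radius in metres
        'limit': limit,                  # maximum number of venues to return
    }
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', 23.0216238, 72.5797068)
print(url)
# decoding the query string recovers the original values
print(parse_qs(urlsplit(url).query)['ll'])  # ['23.0216238,72.5797068']
```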

In [12]:
```python
# Look at the first neighborhood in the list
neighborhood_latitude = data.loc[0, 'Latitude']  # neighborhood latitude value
neighborhood_longitude = data.loc[0, 'Longitude']  # neighborhood longitude value
neighborhood_name = data.loc[0, 'Neighborhood']  # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name,
                                                               neighborhood_latitude,
                                                               neighborhood_longitude))
```

Latitude and longitude values of Ahmedabad Cantonment are 23.0216238, 72.5797068.


In [13]:
```python
url = 'https://api.foursquare.com/v2/venues/explore'

# Replace the placeholders with your own Foursquare credentials; never publish real secrets
params = dict(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    v='20180323',
    ll=str(neighborhood_latitude) + ',' + str(neighborhood_longitude),
    limit=100,
    radius=500
)
resp = requests.get(url=url, params=params)
data_venues = resp.json()  # parse the JSON response body
```

In [14]:
```python
data_venues
```

```
{'meta': {'code': 200, 'requestId': '5e8ef1d1dd0f850028bca29e'},
 'response': {'headerLocation': 'Ahmedabad',
  'headerFullLocation': 'Ahmedabad',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 23.026123804500006,
    'lng': 72.58458708112815},
   'sw': {'lat': 23.017123795499995, 'lng': 72.57482651887184}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f03db6c9911be1fbce68d8b',
       'name': 'Victoria Garden',
       'contact': {},
       'location': {'address': 'Near Ellis Bridge',
        'lat': 23.022117010866076,
        'lng': 72.57916816070612,
        'labeledLatLngs': [{'label': 'display',
          'lat': 23.022117010866076,
          'lng': 72.57916816070612}],
        'distance': 77,
        'cc': 'IN',
```

**The response above contains details of the venues near the 'Ahmedabad Cantonment' Neighborhood. We will now extract each venue's name and category.**

In [15]:
```python
# function that extracts the category of the venue
def get_category_type(row):
    # raw items have a 'categories' key; rows flattened by json_normalize have 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```

In [16]:
```python
venues = data_venues['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()
```

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Victoria Garden | Park | 23.022117 | 72.579168 |
| 1 | Sabarmati River | River | 23.022164 | 72.579867 |
| 2 | Bhadra Fort | Castle | 23.023141 | 72.581765 |
| 3 | Jai Shankar Sundri Hall | Art Gallery | 23.019782 | 72.583598 |


**Above you can see each venue's name and category.**
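`json_normalize` is what produces the dotted column names (`venue.name`, `venue.location.lat`, ...) that the cell above filters and then shortens. It leaves list-valued fields such as `venue.categories` untouched, which is why a separate function has to pick out the first category. The miniature below uses two made-up items shaped like `response['groups'][0]['items']`, including one venue with an empty category list, to show both behaviors; `first_category` is a hypothetical simplification of `get_category_type`.

```python
import pandas as pd

# Two made-up items with only the fields this notebook uses
items = [
    {'venue': {'name': 'Victoria Garden',
               'categories': [{'name': 'Park'}],
               'location': {'lat': 23.022117, 'lng': 72.579168}}},
    {'venue': {'name': 'Unnamed Spot',
               'categories': [],
               'location': {'lat': 23.0, 'lng': 72.5}}},
]

flat = pd.json_normalize(items)
# nested dict keys become dotted column names; lists stay inside a single cell
print(sorted(flat.columns))
# ['venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.name']

def first_category(categories):
    """Name of the first category, or None when the venue has none."""
    return categories[0]['name'] if categories else None

flat['category'] = flat['venue.categories'].apply(first_category)
print(flat[['venue.name', 'category']])
```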

### Finding Venues and their Category for every Neighborhood
We will now repeat, for every Neighborhood, the process we applied above to 'Ahmedabad Cantonment'.

In [17]:
```python
# Foursquare credentials: replace the placeholders with your own; never publish real secrets
client_id = 'YOUR_CLIENT_ID'
client_secret = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'
LIMIT = 100
```

In [18]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # let requests build and URL-encode the query string
        params = dict(
            client_id=client_id,
            client_secret=client_secret,
            v=VERSION,
            ll='{},{}'.format(lat, lng),
            radius=radius,
            limit=LIMIT)

        # make the GET request
        results = requests.get('https://api.foursquare.com/v2/venues/explore',
                               params=params).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # a venue without categories gets None instead of raising IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # passing the column names to the constructor also works when no venues were found
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood',
                                          'Neighborhood Latitude',
                                          'Neighborhood Longitude',
                                          'Venue',
                                          'Venue Latitude',
                                          'Venue Longitude',
                                          'Venue Category'])
    return nearby_venues
```

In [19]:
```python
ahmedabad_venues = getNearbyVenues(names=data['Neighborhood'],
                                   latitudes=data['Latitude'],
                                   longitudes=data['Longitude'])
```


```
Ahmedabad Cantonment
Alam Roza
Ambawadi
Amraiwadi
Anand Nagar (Ahmedabad)
Asarwa
Bapunagar
Behrampura
Bhairavnath Road
Bopal
Chandkheda
Chandlodiya
Dariapur (Ahmedabad)
Ghatlodiya
Ghodasar
Girdharnagar
Gita Mandir Road
Godhavi
Gomtipur
Gota, Gujarat
Isanpur
Jamalpur, Gujarat
Jawahar Chowk
Jodhpur, Gujarat
Juhapura
Kalupur
Khadia, Ahmedabad
Khokhra
Lambha
Makarba
Maninagar
Memnagar
Motera
Naranpura
Naroda
Nava Vadaj
Navarangpura
Odhav
Paldi
Rajpur Gomtipur
Ramol
Ranip
Sabarmati (area)
Sardarnagar
Sarkhej
Shahibaug
Shastrinagar
Subhash Bridge
Sukhrampura
Thaltej
Usmanpura
Vastral
Vastrapur
Vejalpur
```


In [20]:
```python
print(ahmedabad_venues.shape)
ahmedabad_venues.head()
```

(240, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ahmedabad Cantonment | 23.021624 | 72.579707 | Victoria Garden | 23.022117 | 72.579168 | Park |
| 1 | Ahmedabad Cantonment | 23.021624 | 72.579707 | Sabarmati River | 23.022164 | 72.579867 | River |
| 2 | Ahmedabad Cantonment | 23.021624 | 72.579707 | Bhadra Fort | 23.023141 | 72.581765 | Castle |
| 3 | Ahmedabad Cantonment | 23.021624 | 72.579707 | Jai Shankar Sundri Hall | 23.019782 | 72.583598 | Art Gallery |
| 4 | Alam Roza | 22.994097 | 72.589579 | Food Gallery Restora | 22.996895 | 72.588965 | Asian Restaurant |


In [21]:
```python
print('There are {} unique categories.'.format(len(ahmedabad_venues['Venue Category'].unique())))
```

There are 79 unique categories.


### Finding top common venues for each Neighborhood
Now we will find the 10 most common venue categories in each Neighborhood. First, we one-hot encode the venue categories; then we group the rows by Neighborhood and take the mean of each category column. The result is the frequency of each venue category in each Neighborhood, which tells us which categories are the most common.
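On a toy example with two hypothetical neighborhoods, 'A' and 'B', the one-hot-then-mean step turns raw venue rows into per-neighborhood category shares, and each row of the result sums to 1. The neighborhood names and venues here are made up for illustration.

```python
import pandas as pd

# Hypothetical venue rows for two neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'ATM', 'ATM', 'Café'],
})

# one indicator column per category: 1 where the venue has that category, else 0
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# the mean of each indicator column within a neighborhood is that category's share of its venues
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

For 'A', Park makes up 2 of 4 venues (0.5) while ATM and Café are 0.25 each; for 'B', ATM and Café are 0.5 each and Park is 0.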

In [47]:
```python
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.x would otherwise return booleans)
ahmedabad_onehot = pd.get_dummies(ahmedabad_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
ahmedabad_onehot['Neighborhood'] = ahmedabad_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [ahmedabad_onehot.columns[-1]] + list(ahmedabad_onehot.columns[:-1])
ahmedabad_onehot = ahmedabad_onehot[fixed_columns]

ahmedabad_onehot.head()
```

Unnamed: 0,Neighborhood,ATM,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Big Box Store,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Business Service,Café,Castle,Child Care Service,Chinese Restaurant,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Food Court,Food Truck,Furniture / Home Store,Garden,Gas Station,Gourmet Shop,Gym,Historic Site,History Museum,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Motel,Moving Target,Multiplex,Office,Optical Shop,Park,Pharmacy,Pier,Pizza Place,Platform,Rental Car Location,Rest Area,Restaurant,River,Sandwich Place,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Sporting Goods Shop,Tea Room,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Yoga Studio
0,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ahmedabad Cantonment,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alam Roza,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [49]:
```python
ahmedabad_grouped = ahmedabad_onehot.groupby('Neighborhood').mean().reset_index()
ahmedabad_grouped.head()
```

Unnamed: 0,Neighborhood,ATM,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Big Box Store,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Business Service,Café,Castle,Child Care Service,Chinese Restaurant,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Food Court,Food Truck,Furniture / Home Store,Garden,Gas Station,Gourmet Shop,Gym,Historic Site,History Museum,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Motel,Moving Target,Multiplex,Office,Optical Shop,Park,Pharmacy,Pier,Pizza Place,Platform,Rental Car Location,Rest Area,Restaurant,River,Sandwich Place,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Sporting Goods Shop,Tea Room,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Yoga Studio
0,Ahmedabad Cantonment,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alam Roza,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ambawadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
3,Amraiwadi,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Anand Nagar (Ahmedabad),0.0,0.0,0.0,0.0,0.0,0.117647,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.058824


In [51]:
```python
ahmedabad_grouped.shape
```

(47, 80)

In [52]:
```python
num_top_venues = 5

for hood in ahmedabad_grouped['Neighborhood']:
    print("----" + hood + "----")
    temp = ahmedabad_grouped[ahmedabad_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```
----Ahmedabad Cantonment----
         venue  freq
0  Art Gallery  0.25
1         Park  0.25
2        River  0.25
3       Castle  0.25
4          ATM  0.00


----Alam Roza----
                venue  freq
0      Ice Cream Shop  0.33
1    Asian Restaurant  0.33
2                Lake  0.33
3                 ATM  0.00
4  Mexican Restaurant  0.00


----Ambawadi----
                 venue  freq
0    Indian Restaurant  0.33
1  Sporting Goods Shop  0.33
2                 Café  0.33
3                  ATM  0.00
4   Mexican Restaurant  0.00


----Amraiwadi----
               venue  freq
0                ATM  0.75
1  Indian Restaurant  0.25
2        Men's Store  0.00
3               Park  0.00
4       Optical Shop  0.00


----Anand Nagar (Ahmedabad)----
                  venue  freq
0  Fast Food Restaurant  0.18
1     Indian Restaurant  0.12
2             BBQ Joint  0.12
3           Yoga Studio  0.06
4    Mexican Restaurant  0.06


----Bapunagar----
               venue  freq
0                ATM
...
                venue  freq
0            Platform   0.5
1  Light Rail Station   0.5
2                 ATM   0.0
3  Mexican Restaurant   0.0
4                Park   0.0
```
The output above lists the five most frequent venue categories in each Neighborhood. Some entries have a frequency of 0 because Foursquare returned fewer than five distinct venue categories for that Neighborhood.

In [53]:
```python
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```
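`return_most_common_venues` always returns `num_top_venues` labels, so when a neighborhood has fewer distinct categories than that, the remaining slots are filled with zero-frequency categories in arbitrary order; that is why entries such as Cupcake Shop and Department Store keep appearing in the tables below. A possible variant, sketched here as a hypothetical `top_nonzero_venues`, keeps only categories with a positive frequency and pads the rest with `None`. The sample row reuses the Alam Roza frequencies shown above, trimmed to a few columns.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Top categories by frequency, ignoring zero-frequency ones.

    Returns exactly num_top_venues items, padded with None, so the result can
    still fill the '1st ... 10th Most Common Venue' columns.
    """
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood column
    top = freqs[freqs > 0].sort_values(ascending=False).index[:num_top_venues].tolist()
    return top + [None] * (num_top_venues - len(top))

row = pd.Series({'Neighborhood': 'Alam Roza', 'ATM': 0.0, 'Asian Restaurant': 1 / 3,
                 'Ice Cream Shop': 1 / 3, 'Lake': 1 / 3, 'Park': 0.0})
print(top_nonzero_venues(row, 5))
```

`value_counts()` skips missing values by default, so per-cluster counts built from these columns would no longer include filler. Note that the `dropna(inplace = True)` applied after the merge would then drop most neighborhoods; it would need `subset=['Cluster Labels']` instead.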

In [123]:
```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues: 1st, 2nd, 3rd, then 4th ... 10th
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = ahmedabad_grouped['Neighborhood']

for ind in np.arange(ahmedabad_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ahmedabad_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ahmedabad Cantonment | Park | Art Gallery | Castle | River | Donut Shop | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Dessert Shop |
| 1 | Alam Roza | Ice Cream Shop | Asian Restaurant | Lake | Yoga Studio | Farmers Market | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop |
| 2 | Ambawadi | Indian Restaurant | Sporting Goods Shop | Café | Yoga Studio | Electronics Store | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner |
| 3 | Amraiwadi | ATM | Indian Restaurant | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 4 | Anand Nagar (Ahmedabad) | Fast Food Restaurant | BBQ Joint | Indian Restaurant | Yoga Studio | Diner | Vegetarian / Vegan Restaurant | Clothing Store | Mexican Restaurant | Tea Room | Bakery |


**Above are the 10 most common venue categories for each Neighborhood. Next, we will cluster the Neighborhoods based on the similarity of their venues.**

### Clustering

In [124]:
```python
# set number of clusters
kclusters = 3

ahmedabad_grouped_clustering = ahmedabad_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 was scikit-learn's default before version 1.4)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(ahmedabad_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
```

array([0, 0, 1, 2, 1, 2, 2, 0, 1, 0])
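`kclusters = 3` is a fixed choice. One common way to compare candidate values is the mean silhouette coefficient (closer to 1 means tighter, better-separated clusters), available as `sklearn.metrics.silhouette_score`. The sketch below fits k-means for several values of k on synthetic data with two obvious groups; in this notebook, `X` would be `ahmedabad_grouped_clustering`. Silhouette scores on sparse frequency vectors like these may be low, so treat the result as one signal among several rather than a definitive answer. `silhouette_by_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette coefficient}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic data: two tight, well-separated groups of 3-dimensional points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.05, size=(20, 3)),
               rng.normal(1.0, 0.05, size=(20, 3))])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(best_k)  # 2 for this synthetic data
```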

In [125]:
```python
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
```

In [126]:
```python
ahmedabad_merged = data

# join neighborhoods_venues_sorted onto data to add latitude/longitude for each neighborhood
ahmedabad_merged = ahmedabad_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# drop neighborhoods for which Foursquare returned no venues
ahmedabad_merged.dropna(inplace=True)
ahmedabad_merged['Cluster Labels'] = ahmedabad_merged['Cluster Labels'].astype('int32')
ahmedabad_merged.head()  # check the last columns!
```

| | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ahmedabad Cantonment | 23.0216 | 72.5797 | 0 | Park | Art Gallery | Castle | River | Donut Shop | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Dessert Shop |
| 1 | Alam Roza | 22.9941 | 72.5896 | 0 | Ice Cream Shop | Asian Restaurant | Lake | Yoga Studio | Farmers Market | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop |
| 2 | Ambawadi | 23.0226 | 72.5491 | 1 | Indian Restaurant | Sporting Goods Shop | Café | Yoga Studio | Electronics Store | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner |
| 3 | Amraiwadi | 23.0058 | 72.627 | 2 | ATM | Indian Restaurant | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 4 | Anand Nagar (Ahmedabad) | 23.0111 | 72.5137 | 1 | Fast Food Restaurant | BBQ Joint | Indian Restaurant | Yoga Studio | Diner | Vegetarian / Vegan Restaurant | Clothing Store | Mexican Restaurant | Tea Room | Bakery |


In [127]:
```python
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(ahmedabad_merged['Latitude'], ahmedabad_merged['Longitude'],
                                  ahmedabad_merged['Neighborhood'], ahmedabad_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # KMeans labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

### Examining Each Cluster
There are 3 Clusters; now we will explore each of them.

In [129]:
```python
# column 0 is Neighborhood; columns 4 onwards are the ten most common venues
cluster1 = ahmedabad_merged.loc[ahmedabad_merged['Cluster Labels'] == 0, ahmedabad_merged.columns[[0] + list(range(4, ahmedabad_merged.shape[1]))]]
cluster1
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ahmedabad Cantonment | Park | Art Gallery | Castle | River | Donut Shop | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Dessert Shop |
| 1 | Alam Roza | Ice Cream Shop | Asian Restaurant | Lake | Yoga Studio | Farmers Market | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop |
| 9 | Bopal | Park | Historic Site | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 11 | Chandlodiya | ATM | Train Station | Platform | Rental Car Location | Donut Shop | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Dessert Shop |
| 12 | Dariapur (Ahmedabad) | Pier | Pizza Place | Dessert Shop | Yoga Studio | Donut Shop | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Diner |
| 13 | Ghatlodiya | Pharmacy | Department Store | Ice Cream Shop | Lake | Farmers Market | Convenience Store | Cupcake Shop | Dessert Shop | Diner | Donut Shop |
| 14 | Ghodasar | ATM | Liquor Store | Convenience Store | Rest Area | Smoke Shop | Optical Shop | Construction & Landscaping | Cupcake Shop | Department Store | Dessert Shop |
| 15 | Girdharnagar | Yoga Studio | History Museum | Vegetarian / Vegan Restaurant | Electronics Store | Construction & Landscaping | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner |
| 16 | Gita Mandir Road | Multiplex | Bus Station | Shopping Mall | Clothing Store | Tea Room | Fast Food Restaurant | Donut Shop | Convenience Store | Cupcake Shop | Department Store |
| 19 | Gota, Gujarat | Tea Room | Yoga Studio | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |


In [130]:
```python
# Count how often each venue category appears across the top-10 columns of Cluster 1
temp_c = cluster1.drop(columns='Neighborhood')
# Series.append was removed in pandas 2.0, so concatenate the per-column counts instead
ser = pd.concat([temp_c[c].value_counts() for c in temp_c.columns])
bar_cluster1 = ser.groupby(level=0).sum().rename_axis('Venue').reset_index(name='Counts')
fig = px.bar(bar_cluster1, x='Venue', y='Counts', labels={'Venue': 'Venue Category'})
fig.show()
```

In [131]:
```python
cluster2 = ahmedabad_merged.loc[ahmedabad_merged['Cluster Labels'] == 1, ahmedabad_merged.columns[[0] + list(range(4, ahmedabad_merged.shape[1]))]]
cluster2
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Ambawadi | Indian Restaurant | Sporting Goods Shop | Café | Yoga Studio | Electronics Store | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner |
| 4 | Anand Nagar (Ahmedabad) | Fast Food Restaurant | BBQ Joint | Indian Restaurant | Yoga Studio | Diner | Vegetarian / Vegan Restaurant | Clothing Store | Mexican Restaurant | Tea Room | Bakery |
| 10 | Chandkheda | Ice Cream Shop | Pizza Place | Indian Restaurant | Farmers Market | Yoga Studio | Electronics Store | Convenience Store | Cupcake Shop | Department Store | Dessert Shop |
| 20 | Isanpur | Dessert Shop | Indian Restaurant | Business Service | Yoga Studio | Farmers Market | Cupcake Shop | Department Store | Diner | Donut Shop | Electronics Store |
| 25 | Kalupur | Indian Restaurant | Clothing Store | Train Station | Asian Restaurant | Men's Store | Farmers Market | Cupcake Shop | Department Store | Dessert Shop | Diner |
| 27 | Khokhra | IT Services | Indian Restaurant | Yoga Studio | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop |
| 30 | Maninagar | Fast Food Restaurant | Ice Cream Shop | Pizza Place | Electronics Store | Indian Restaurant | Clothing Store | Furniture / Home Store | Food Truck | Food Court | Flea Market |
| 31 | Memnagar | Fast Food Restaurant | Indian Restaurant | Electronics Store | Hotel | Pizza Place | Snack Place | Convenience Store | Cupcake Shop | Department Store | Dessert Shop |
| 33 | Naranpura | History Museum | Indian Restaurant | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 34 | Naroda | Breakfast Spot | Multiplex | Indian Restaurant | Food Court | Diner | Fast Food Restaurant | Yoga Studio | Cupcake Shop | Department Store | Dessert Shop |


In [132]:
```python
# Count how often each venue category appears across the top-10 columns of Cluster 2
temp_c = cluster2.drop(columns='Neighborhood')
# Series.append was removed in pandas 2.0, so concatenate the per-column counts instead
ser = pd.concat([temp_c[c].value_counts() for c in temp_c.columns])
bar_cluster2 = ser.groupby(level=0).sum().rename_axis('Venue').reset_index(name='Counts')
fig = px.bar(bar_cluster2, x='Venue', y='Counts', labels={'Venue': 'Venue Category'})
fig.show()
```

In [133]:
```python
cluster3 = ahmedabad_merged.loc[ahmedabad_merged['Cluster Labels'] == 2, ahmedabad_merged.columns[[0] + list(range(4, ahmedabad_merged.shape[1]))]]
cluster3
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Amraiwadi | ATM | Indian Restaurant | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 6 | Bapunagar | ATM | Indian Restaurant | Bakery | Fast Food Restaurant | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 8 | Bhairavnath Road | ATM | Indian Restaurant | Pizza Place | Breakfast Spot | Electronics Store | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner |
| 17 | Godhavi | ATM | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store | Fast Food Restaurant |
| 40 | Ramol | ATM | Moving Target | Historic Site | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store |
| 48 | Sukhrampura | ATM | Farmers Market | Convenience Store | Cupcake Shop | Department Store | Dessert Shop | Diner | Donut Shop | Electronics Store | Fast Food Restaurant |


In [137]:
```python
# Count how often each venue category appears across the top-10 columns of Cluster 3
temp_c = cluster3.drop(columns='Neighborhood')
# Series.append was removed in pandas 2.0, so concatenate the per-column counts instead
ser = pd.concat([temp_c[c].value_counts() for c in temp_c.columns])
bar_cluster3 = ser.groupby(level=0).sum().rename_axis('Venue').reset_index(name='Counts')
fig = px.bar(bar_cluster3, x='Venue', y='Counts', labels={'Venue': 'Venue Category'})
fig.show()
```
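The bar charts count every appearance of a category in the top-10 columns, including slots filled by zero-frequency categories. An alternative summary, sketched below as a hypothetical `cluster_profiles` helper, averages the category frequencies of all neighborhoods in a cluster and lists only categories with a positive average. With this notebook's objects it would be called as `cluster_profiles(ahmedabad_grouped, kmeans.labels_)`; the toy frame here is made up and only shows the mechanics.

```python
import pandas as pd

def cluster_profiles(grouped, labels, top_n=5):
    """Return {cluster label: top categories by mean frequency} for each cluster.

    grouped: one row per neighborhood with a 'Neighborhood' column plus one
             frequency column per venue category (shaped like ahmedabad_grouped).
    labels:  one cluster label per row, in the same order (like kmeans.labels_).
    """
    freqs = grouped.drop(columns='Neighborhood')
    cluster = pd.Series(labels, index=freqs.index, name='Cluster')
    means = freqs.groupby(cluster).mean()  # average frequency of each category per cluster
    profiles = {}
    for label, row in means.iterrows():
        # ignore categories that never occur in this cluster
        nonzero = row[row > 0].sort_values(ascending=False)
        profiles[label] = nonzero.head(top_n).index.tolist()
    return profiles

toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'ATM': [0.75, 0.5, 0.0],
    'Park': [0.25, 0.5, 0.0],
    'Café': [0.0, 0.0, 1.0],
})
print(cluster_profiles(toy, [0, 0, 1]))
```

In the toy frame, cluster 0 averages 0.625 for ATM and 0.375 for Park, while cluster 1 contains only Café, so zero-frequency categories never show up in either profile.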

### Conclusion
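In the end, we formed 3 Clusters and identified the 10 most common venues in each Neighborhood. The Bar Charts show that Cluster-1 contains many Department Stores, Cupcake Shops and Yoga Studios. In Cluster-2, Indian Restaurant appears to be the most common venue. Cluster-3 is fairly mixed but, unlike the other clusters, ATM is its most common venue. Apart from the common venues, each Cluster also contains distinct venues, for example Construction & Landscaping in Cluster-1, and Fast Food Restaurant and Farmers Market in Cluster-2. These counts should be read with care: the Bar Charts count every appearance in the top-10 columns, and Neighborhoods with fewer than 10 venue categories fill their remaining slots with zero-frequency categories, so entries such as Cupcake Shop, Department Store and Farmers Market, which mostly appear in the lower ranks, may overstate how common those venues actually are.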