# The Battle Of Neighbourhoods


# Introduction
According to https://www.globenewswire.com/: The "India Multiplex Market, By Screen Type (Classic Vs. Premium), By Region, By Major States, Competition, Forecast & Opportunities, 2014 - 2024" report has been added to ResearchAndMarkets.com's offering.

The Indian Multiplex Market stood at 2950 screens in 2018 and is projected to grow at a CAGR of over 7% to surpass 4500 screens by 2024.

Growth in the Indian Multiplex Market can be attributed to an increasing youth population and growing urbanization, which are raising demand for better infrastructure and enhanced cinema facilities across the country and driving the development of the multiplex market in India.

Additionally, increasing disposable income of middle-class urban population, especially in tier-I and tier-II cities has led to a shift in consumption pattern from savings to spending owing to which people are willing to pay extra for privacy and comfort. This factor is further pushing the market for multiplex in India.

Some of the leading players in the Indian Multiplex Market are PVR Cinemas, INOX Leisure Limited, Carnival Cinemas, Cinepolis India, and SRS Cinemas.

In India, the city of Visakhapatnam, also known as the Jewel of the East Coast, was recently named the Executive Capital of Andhra Pradesh. Many government plans are in progress for its further development. An increasing youth population and growing urbanization are raising demand for better infrastructure and enhanced cinema facilities across Visakhapatnam, which supports the development of its multiplex market.

In this report we will find the best cluster of neighbourhoods for opening a new multiplex in Visakhapatnam, India.


In [1]:
import numpy as np # library to handle data in a vectorized manner
# !pip install geocoder
import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
# !pip install folium
import json # library to handle JSON files
# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
!pip install geocoder
import geocoder # to get coordinates
# !pip install bs4
import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents



In [2]:
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
!pip install folium
import folium # map rendering library

print("Libraries imported.")

Libraries imported.


# Getting the Data

1. Build a dataframe of neighbourhoods in Visakhapatnam, India, by scraping the Wikipedia category page.
2. Get the geographical coordinates of the neighbourhoods with the Python Geocoder package.
3. Obtain venue data for the neighbourhoods from the Foursquare API.
4. Explore and cluster the neighbourhoods.
5. Select the best cluster in which to open a new multiplex.

# Business Problem

This project focuses on geospatial analysis of Visakhapatnam to identify the best place to open a new multiplex.

In [3]:
#the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Visakhapatnam").text

In [4]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [5]:
#list to store neighbourhood data
nList = []

In [6]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    nList.append(row.text)
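
A Wikipedia category listing can contain entries that are not articles, such as template pages, and these would then be geocoded as if they were neighbourhoods. A minimal sketch of a filter, assuming such entries can be recognised by a namespace prefix like `Template:`; the helper name `drop_non_articles` and the prefix list are illustrative and not part of the original notebook:

```python
# Hypothetical helper: remove category entries that are not neighbourhood articles.
NAMESPACE_PREFIXES = ("Template:", "Category:", "File:")  # assumed prefixes

def drop_non_articles(names):
    """Return the names with namespace-prefixed entries removed and whitespace stripped."""
    cleaned = []
    for name in names:
        name = name.strip()
        if name and not name.startswith(NAMESPACE_PREFIXES):
            cleaned.append(name)
    return cleaned

sample = ["Abidnagar", "Template:Neighbourhoods of Visakhapatnam", " Adarsh Nagar "]
print(drop_non_articles(sample))
```

Applying such a filter to the scraped list before building the dataframe would change the row count, so downstream totals would need to be recomputed.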

In [7]:
# a DataFrame from the list
Vn_df = pd.DataFrame({"Neighbourhood": nList})
Vn_df.head()

Unnamed: 0,Neighbourhood
0,Abidnagar
1,Adarsh Nagar
2,Adavivaram
3,Aganampudi
4,Akkayyapalem


In [8]:
# print the number of rows of the dataframe
Vn_df.shape

(125, 1)

In [9]:
# a function to get coordinates
def ltlg(neighbourhood):
    g = geocoder.arcgis('{}, Visakhapatnam, India'.format(neighbourhood))
    lt_lg_cds = g.latlng
    return lt_lg_cds
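
A geocoding lookup may come back empty when a place name cannot be resolved, which would break the later step that builds a two-column dataframe. A minimal sketch of a guard, written against an injected `lookup` callable so it runs offline; `safe_latlng` and `fake_lookup` are hypothetical names, and the real call would be the `geocoder.arcgis(...).latlng` lookup used above:

```python
import math

def safe_latlng(place, lookup):
    """Call lookup(place); return (lat, lng), or (nan, nan) if nothing usable came back."""
    result = lookup(place)
    if not result or len(result) != 2:
        return (math.nan, math.nan)
    return (float(result[0]), float(result[1]))

# Stand-in for a real geocoding call so that the sketch needs no network access.
def fake_lookup(place):
    known = {"Abidnagar, Visakhapatnam, India": [17.73786, 83.29888]}
    return known.get(place)

print(safe_latlng("Abidnagar, Visakhapatnam, India", fake_lookup))
print(safe_latlng("Nowhere, Visakhapatnam, India", fake_lookup))
```

Rows that come back as NaN can then be inspected or dropped explicitly instead of failing later.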

In [10]:
# calling function to get coordinates
cds = [ ltlg(neighbourhood) for neighbourhood in Vn_df["Neighbourhood"].tolist() ]

In [11]:
#temporary dataframe to store coordinates
df_cds = pd.DataFrame(cds, columns=['latitude', 'longitude'])

In [12]:
# merging coordinates into the original dataframe
Vn_df['latitude'] = df_cds['latitude']
Vn_df['longitude'] = df_cds['longitude']
print(Vn_df.shape)
Vn_df

(125, 3)


Unnamed: 0,Neighbourhood,latitude,longitude
0,Abidnagar,17.73786,83.29888
1,Adarsh Nagar,17.76391,83.33169
2,Adavivaram,17.78583,83.25242
3,Aganampudi,17.68904,83.13988
4,Akkayyapalem,17.73421,83.29713
5,Akkireddypalem,17.70872,83.20904
6,Allipuram,17.72027,83.29758
7,Anakapalle,17.68984,83.00175
8,Anandapuram,17.87772,83.30459
9,Appikonda,17.59629,83.20243


In [13]:
# getting the coordinates of Visakhapatnam
address = 'Visakhapatnam, India'

geolocator = Nominatim(user_agent="V-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Visakhapatnam, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Visakhapatnam, India are 17.7231276, 83.3012842.


In [14]:
# creating map of Visakhapatnam
V_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers
for lt, lg, n in zip(Vn_df['latitude'], Vn_df['longitude'], Vn_df['Neighbourhood']):
    label = '{}'.format(n)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lt, lg],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(V_map)  
    
V_map

# Use the foursquare API to explore the neighbourhoods

In [15]:
CLIENT_ID = 'YVRA1IT05SEME4EOMARWW5I1HY4S1CRZABBXN2KMWUGZ2CTY' # your Foursquare ID
CLIENT_SECRET = 'S5QE0R5FMGNUFM5CDQJ3RYLWI4FFJY0RBQ3DT03FXJ41G0E3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [16]:
radius = 2000
limit = 100

venues = []

for lt, lg, n in zip(Vn_df['latitude'], Vn_df['longitude'], Vn_df['Neighbourhood']):
    
    # creating API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lt,
        lg,
        radius, 
        limit)
    
    # making GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # appending only relevant information for each venue
    for venue in results:
        venues.append((
            n,
            lt, 
            lg, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
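
The loop above assumes that every venue has at least one category; indexing `categories[0]` raises an `IndexError` otherwise. A minimal sketch of a more defensive extraction step, assuming the same `items` structure as above; the helper `extract_rows`, the fallback label `'Uncategorised'`, and the second sample venue are illustrative only:

```python
def extract_rows(items, neighbourhood, lat, lng, fallback="Uncategorised"):
    """Flatten venue items into tuples, tolerating missing fields and empty category lists."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else fallback
        rows.append((neighbourhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Made-up items mimicking the shape of the parsed response.
items = [
    {"venue": {"name": "Pizza Hut",
               "location": {"lat": 17.72665, "lng": 83.305531},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Unlabelled venue",
               "location": {"lat": 17.7, "lng": 83.3},
               "categories": []}},
]
rows = extract_rows(items, "Abidnagar", 17.73786, 83.29888)
print(rows)
```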

In [17]:
# convert the venues list into a new DataFrame
vv_df = pd.DataFrame(venues)

# define the column names
vv_df.columns = ['Neighbourhood', 'latitude', 'longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(vv_df.shape)
vv_df.head()

(2411, 7)


Unnamed: 0,Neighbourhood,latitude,longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abidnagar,17.73786,83.29888,Sangam Sarat Theatre,17.725508,83.302463,Indie Movie Theater
1,Abidnagar,17.73786,83.29888,Pizza Hut,17.72665,83.305531,Pizza Place
2,Abidnagar,17.73786,83.29888,Sai Ram Parlour,17.726339,83.303465,Indian Restaurant
3,Abidnagar,17.73786,83.29888,Shoppers Stop,17.729061,83.314433,Fabric Shop
4,Abidnagar,17.73786,83.29888,Deepak Punjabi Dhaba,17.723782,83.309922,Indian Restaurant


In [18]:
j=vv_df.groupby(["Neighbourhood"]).count().reset_index()
print(j.shape)
j.head()

(111, 7)


Unnamed: 0,Neighbourhood,latitude,longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abidnagar,16,16,16,16,16,16
1,Adarsh Nagar,6,6,6,6,6,6
2,Adavivaram,1,1,1,1,1,1
3,Aganampudi,4,4,4,4,4,4
4,Akkayyapalem,14,14,14,14,14,14



#### The table above shows only 111 of the 125 neighbourhoods of Visakhapatnam, which implies that the remaining 14 have no popular venues associated with them in the Foursquare database. Because these neighbourhoods are still part of the city and matter when choosing a suitable neighbourhood for a multiplex, they are added back to the Visakhapatnam venues dataframe with the placeholder category 'no popular venues'.

In [19]:
vv_df=Vn_df.join(vv_df.set_index('Neighbourhood'),on='Neighbourhood', lsuffix='', rsuffix='x')
vv_df.drop(['latitudex','longitudex'],axis=1,inplace=True)
vv_df['VenueName']=vv_df['VenueName'].replace(np.nan,'None')
vv_df[['VenueLatitude','VenueLongitude']]=vv_df[['VenueLatitude','VenueLongitude']].fillna(0)
vv_df[['VenueCategory']]=vv_df[['VenueCategory']].fillna('no popular venues')
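
The same left-join idea can be sketched on made-up data with `DataFrame.merge`, whose `indicator=True` option also shows which neighbourhoods had no match. The toy frames and the names `neighbourhoods`, `venues`, and `missing` are assumptions for illustration:

```python
import pandas as pd

neighbourhoods = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "C"],
    "VenueCategory": ["Park", "Multiplex", "Park"],
})

# A left join keeps every neighbourhood; "_merge" marks rows with no venue match.
merged = neighbourhoods.merge(venues, on="Neighbourhood", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "Neighbourhood"].tolist()

# Give unmatched neighbourhoods an explicit placeholder category.
merged["VenueCategory"] = merged["VenueCategory"].fillna("no popular venues")
print(missing)
print(merged[["Neighbourhood", "VenueCategory"]])
```

Listing the unmatched neighbourhoods explicitly makes it easy to check that the number added back equals the number missing from the venue table.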


# Now let's check how many venues were returned for each neighbourhood

In [20]:
t=vv_df.groupby(["Neighbourhood"]).count().reset_index() 
print(t.shape)
t.head()

(125, 7)


Unnamed: 0,Neighbourhood,latitude,longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abidnagar,16,16,16,16,16,16
1,Adarsh Nagar,6,6,6,6,6,6
2,Adavivaram,1,1,1,1,1,1
3,Aganampudi,4,4,4,4,4,4
4,Akkayyapalem,14,14,14,14,14,14


# Let's find out how many unique categories can be curated from all the returned venues

In [21]:
print('There are {} unique categories.'.format(len(vv_df['VenueCategory'].unique())))

There are 109 unique categories.


In [22]:
vv_df['VenueCategory'].unique() #displays all the category names

array(['Indie Movie Theater', 'Pizza Place', 'Indian Restaurant',
       'Fabric Shop', 'Café', 'Shopping Mall',
       'Vegetarian / Vegan Restaurant', 'Park', 'Multiplex',
       'Cricket Ground', 'Platform', 'Stadium', 'Volleyball Court',
       'Pharmacy', 'Moving Target', 'Bus Station', 'Coffee Shop',
       'Mountain', 'Historic Site', 'ATM', 'IT Services', 'Train Station',
       'Bookstore', 'Food', 'Dessert Shop', 'Drive-in Theater',
       'Ice Cream Shop', 'Hotel', 'Italian Restaurant', 'Clothing Store',
       'Fast Food Restaurant', 'Restaurant', 'Bakery', 'Tea Room',
       'Movie Theater', 'Lake', 'no popular venues', 'Golf Course',
       'Snack Place', 'Beach', 'Multicuisine Indian Restaurant',
       'Breakfast Spot', 'Food Court', 'Steakhouse', 'Juice Bar',
       'Department Store', 'Sandwich Place', 'Gift Shop',
       'Convenience Store', 'Bowling Alley', 'Spa', 'Andhra Restaurant',
       'Dhaba', 'Health Food Store', 'Harbor / Marina',
       'Sporting Goods Sho

In [23]:
# check if the results contain "Multiplex"
"Multiplex" in vv_df['VenueCategory'].unique()

True

In [24]:
# p=pd.DataFrame({'Neighbourhood':vv_df['Neighbourhood'],'catergory':vv_df['VenueCategory']})
# p=p[p['Neighbourhood']=='Abidnagar']
# p.groupby(['catergory']).count()


In [25]:
#count of each category
s=vv_df.pivot_table(index=['VenueCategory'], aggfunc='size')
s=s.to_frame().reset_index()
s.columns=['category','count']
s.sort_values(['count'],ascending=False)
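# 'Multiplex' appears 111 times; a venue within 2000 m of several neighbourhoods is counted once for each, so this may overstate the number of distinct multiplexes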

Unnamed: 0,category,count
58,Indian Restaurant,273
55,Hotel,173
20,Café,171
85,Restaurant,115
73,Multiplex,111
92,Shopping Mall,95
59,Indie Movie Theater,92
38,Fast Food Restaurant,89
57,Ice Cream Shop,86
40,Food Court,63
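
Because each neighbourhood is queried with a 2000 m radius, nearby neighbourhoods can return the same venues, so the counts above include a venue once for every neighbourhood it appears under. A minimal sketch of counting distinct venues instead, assuming a venue is identified by its name and coordinates; the toy data is made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighbourhood": ["A", "B", "B", "C"],
    "VenueName": ["Cine One", "Cine One", "Cine Two", "Cine One"],
    "VenueLatitude": [17.72, 17.72, 17.73, 17.72],
    "VenueLongitude": [83.31, 83.31, 83.30, 83.31],
    "VenueCategory": ["Multiplex"] * 4,
})

multiplex_rows = toy[toy["VenueCategory"] == "Multiplex"]
# Keep one row per physical venue, assuming name plus coordinates identify it.
distinct = multiplex_rows.drop_duplicates(
    subset=["VenueName", "VenueLatitude", "VenueLongitude"]
)
print(len(multiplex_rows), len(distinct))
```

The per-neighbourhood counts remain useful as a measure of how many multiplexes are within reach of each neighbourhood; the deduplicated count answers the separate question of how many multiplexes exist.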


# Analyze each neighbourhood

In [26]:
# one hot encoding
vv_onehot = pd.get_dummies(vv_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vv_onehot['Neighbourhoods'] = vv_df['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [vv_onehot.columns[-1]] + list(vv_onehot.columns[:-1])
vv_onehot = vv_onehot[fixed_columns]

print(vv_onehot.shape)
vv_onehot.head(20)

(2425, 110)


Unnamed: 0,Neighbourhoods,ATM,African Restaurant,Airport,American Restaurant,Andhra Restaurant,Antique Shop,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Campground,Candy Store,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Drive-in Theater,Electronics Store,Fabric Shop,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Truck,Gaming Cafe,Garden Center,Gastropub,Gift Shop,Go Kart Track,Golf Course,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Health Food Store,Historic Site,Hockey Arena,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mattress Store,Men's Store,Motel,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Pub,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Shopping Mall,Smoke Shop,Smoothie Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Stadium,Steakhouse,Tea Room,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Volleyball Court,Women's Store,no popular venues
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


Next, let's group the rows by neighbourhood and sum the occurrences of each category, giving a count of venues per category for each neighbourhood.

In [27]:
vv_grouped = vv_onehot.groupby(["Neighbourhoods"]).sum().reset_index()
print(vv_grouped.shape)
vv_grouped

(125, 110)


Unnamed: 0,Neighbourhoods,ATM,African Restaurant,Airport,American Restaurant,Andhra Restaurant,Antique Shop,Asian Restaurant,Athletics & Sports,Bakery,Bar,Beach,Beer Garden,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Campground,Candy Store,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Drive-in Theater,Electronics Store,Fabric Shop,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Truck,Gaming Cafe,Garden Center,Gastropub,Gift Shop,Go Kart Track,Golf Course,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Health Food Store,Historic Site,Hockey Arena,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Light Rail Station,Lounge,Market,Mattress Store,Men's Store,Motel,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Pub,Resort,Rest Area,Restaurant,River,Salad Place,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Shopping Mall,Smoke Shop,Smoothie Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Stadium,Steakhouse,Tea Room,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Volleyball Court,Women's Store,no popular venues
0,Abidnagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0
1,Adarsh Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Adavivaram,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aganampudi,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,Akkayyapalem,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0
5,Akkireddypalem,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Allipuram,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,1,5,4,2,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,2,3,0,0,0,1,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,2,1,0,0,0
7,Anakapalle,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
8,Anandapuram,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Appikonda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
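
One-hot encoding followed by a grouped sum yields a count of venues per category for each neighbourhood. As a small illustration on made-up data, `pd.crosstab` produces the same counts in one step; the toy frame and variable names are assumptions:

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "B", "A"],
    "VenueCategory": ["Park", "Multiplex", "Park", "Park"],
})

# Route 1: one-hot encode the category, then sum per neighbourhood.
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="", dtype=int)
onehot["Neighbourhood"] = toy["Neighbourhood"]
grouped = onehot.groupby("Neighbourhood").sum()

# Route 2: cross-tabulate neighbourhood against category directly.
counts = pd.crosstab(toy["Neighbourhood"], toy["VenueCategory"])

print(grouped)
print(counts)
```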


In [28]:
vv_grouped['Multiplex'].sum() # 111 Multiplex entries in total across all neighbourhoods

111

In [29]:
len(vv_grouped[vv_grouped["Multiplex"] > 0]) # these entries fall in just 41 of the 125 neighbourhoods of Visakhapatnam

41

# Create a dataframe for Multiplex data only

In [30]:
vm_df = vv_grouped[["Neighbourhoods","Multiplex"]]

In [31]:
# t=vm_df.sort_values(['Multiplex'],ascending=False).reset_index()
# qw=pd.DataFrame({"No. of multiplexes":t['Multiplex'],"count":t['Multiplex']})
# qw.groupby(["No. of multiplexes"]).count() 

# Now cluster the neighbourhoods

Run k-means to cluster the neighborhoods in Visakhapatnam into 4 clusters.

In [32]:
# set number of clusters
kclusters = 4

vm_clustering = vm_df.drop(columns=["Neighbourhoods"])  # keyword form; the positional axis argument is not accepted by newer pandas

# run k-means clustering
kmeans = KMeans(init="k-means++", n_clusters=kclusters, n_init=12).fit(vm_clustering)

# check cluster labels generated for each row in the dataframe
qw=pd.DataFrame({"labels":kmeans.labels_,"count":kmeans.labels_})
qw.groupby(["labels"]).count()

labels,count
0,23
1,84
2,8
3,10
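
K-means numbers its clusters arbitrarily, so label 0 need not be the cluster with the fewest multiplexes, which makes clusters easy to mislabel in later text. A minimal sketch that renumbers clusters in order of their mean multiplex count; the toy values and the column name `Ordered Cluster` are assumptions for illustration:

```python
import pandas as pd

# Made-up counts with arbitrary cluster labels, as k-means might assign them.
df = pd.DataFrame({
    "Total multiplexes": [1, 0, 0, 5, 3, 6, 2],
    "Cluster Labels":    [0, 1, 1, 2, 3, 2, 0],
})

# Sort clusters by mean count, then map old label -> rank (0 = fewest multiplexes).
means = df.groupby("Cluster Labels")["Total multiplexes"].mean().sort_values()
remap = {old: new for new, old in enumerate(means.index)}
df["Ordered Cluster"] = df["Cluster Labels"].map(remap)
print(df)
```

With ordered labels, a higher cluster number always means more multiplexes, regardless of how a particular k-means run happened to number the clusters.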


In [33]:
# create a new dataframe that includes the cluster as well 
vm_merged = vm_df.copy()

# add clustering labels
vm_merged["Cluster Labels"] = kmeans.labels_

In [34]:
vm_merged.rename(columns={"Neighbourhoods": "Neighbourhood","Multiplex":"Total multiplexes"}, inplace=True)
vm_merged.head()

Unnamed: 0,Neighbourhood,Total multiplexes,Cluster Labels
0,Abidnagar,1,0
1,Adarsh Nagar,0,1
2,Adavivaram,0,1
3,Aganampudi,0,1
4,Akkayyapalem,0,1


In [35]:
#Add latitude and longitude values by using the join operation(the new dataframe with the old dataframe containing the latitude and longitude values)
vm=Vn_df.join(vm_merged.set_index('Neighbourhood'), on='Neighbourhood')

In [36]:
# sorting the results by Cluster Labels
print(vm.shape)
vm.sort_values(["Cluster Labels"], inplace=True)
vm

(125, 5)


Unnamed: 0,Neighbourhood,latitude,longitude,Total multiplexes,Cluster Labels
0,Abidnagar,17.73786,83.29888,1,0
79,Peda Waltair,17.731494,83.334313,2,0
32,Dwaraka Nagar,17.73579,83.30378,2,0
77,"Pandurangapuram, Visakhapatnam",17.71797,83.32847,1,0
42,HB Colony,17.74527,83.32324,2,0
43,Isukathota,17.74201,83.32677,2,0
95,Resapuvanipalem,17.73369,83.31589,2,0
22,Chinna Waltair,17.72672,83.33061,2,0
100,Sankara Matam Road,17.731642,83.301744,2,0
19,Chengal Rao Peta,17.69335,83.29211,1,0


In [37]:
vm['Total multiplexes'].max()

6

# Now we visualize the resulting clusters

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vm['latitude'], vm['longitude'], vm['Neighbourhood'], vm['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examine clusters

### Cluster 0 - Neighbourhoods with a total of 1-2 multiplexes

In [39]:
s=vm.loc[vm['Cluster Labels'] == 0]
print(s.shape)
s
# 23 neighbourhoods in this cluster 0

(23, 5)


Unnamed: 0,Neighbourhood,latitude,longitude,Total multiplexes,Cluster Labels
0,Abidnagar,17.73786,83.29888,1,0
79,Peda Waltair,17.731494,83.334313,2,0
32,Dwaraka Nagar,17.73579,83.30378,2,0
77,"Pandurangapuram, Visakhapatnam",17.71797,83.32847,1,0
42,HB Colony,17.74527,83.32324,2,0
43,Isukathota,17.74201,83.32677,2,0
95,Resapuvanipalem,17.73369,83.31589,2,0
22,Chinna Waltair,17.72672,83.33061,2,0
100,Sankara Matam Road,17.731642,83.301744,2,0
19,Chengal Rao Peta,17.69335,83.29211,1,0


### Cluster 1 - Neighbourhoods with a total of 0 multiplexes

In [40]:
s=vm.loc[vm['Cluster Labels'] == 1]
print(s.shape)
s
# 84 neighbourhoods in this cluster 1

(84, 5)


Unnamed: 0,Neighbourhood,latitude,longitude,Total multiplexes,Cluster Labels
80,Pedagantyada,17.6668,83.21039,0,1
64,Mulagada,17.69859,83.22464,0,1
65,Muralinagar,17.74794,83.26313,0,1
78,Parawada,17.62929,83.0798,0,1
75,One Town (Visakhapatnam),17.71984,83.26278,0,1
74,Nidigattu,17.87796,83.37371,0,1
73,Template:Neighbourhoods of Visakhapatnam,17.71984,83.26278,0,1
67,NAD X Road,17.71984,83.26278,0,1
72,Nathayyapalem,17.71099,83.20239,0,1
68,Nadupuru,17.67263,83.19407,0,1


### Cluster 2 - Neighbourhoods with a total of 5-6 multiplexes

In [41]:
s=vm.loc[vm['Cluster Labels'] == 2] 
print(s.shape)
s
# 8 neighbourhoods in this cluster 2

(8, 5)


Unnamed: 0,Neighbourhood,latitude,longitude,Total multiplexes,Cluster Labels
18,CBM Compound,17.72456,83.31024,5,2
26,Daspalla Hills,17.718191,83.317069,5,2
106,"Siripuram, Visakhapatnam",17.72121,83.31686,5,2
121,Waltair Main Road,17.721832,83.316298,5,2
11,Asilmetta,17.72276,83.31078,6,2
91,"Ramnagar, Visakhapatnam",17.72119,83.30907,5,2
90,Rama Talkies Road,17.723706,83.308206,6,2
122,Waltair Uplands,17.722217,83.314525,5,2


### Cluster 3 - Neighbourhoods with a total of 3 multiplexes

In [42]:
s=vm.loc[vm['Cluster Labels'] == 3] 
print(s.shape)
s
# 10 neighbourhoods in this cluster 3

(10, 5)


Unnamed: 0,Neighbourhood,latitude,longitude,Total multiplexes,Cluster Labels
59,Maharanipeta,17.70829,83.31033,3,3
87,Prakashraopeta,17.716197,83.306925,3,3
110,Suryabagh,17.71191,83.29994,3,3
44,Jagadamba Centre,17.71584,83.3075,3,3
25,Daba Gardens,17.71853,83.30433,3,3
94,Relli Veedhi,17.70327,83.30316,3,3
84,Poorna Market,17.70683,83.29814,3,3
30,Dondaparthy,17.72661,83.29744,3,3
116,Velampeta,17.705771,83.297902,3,3
45,Jalari Peta,17.70027,83.30373,3,3


# Final observation

### Multiplexes are concentrated in the central area of Visakhapatnam, with the highest counts in cluster 2 (5-6 per neighbourhood) and a moderate count in cluster 3 (3 per neighbourhood). Neighbourhoods in cluster 1 have no multiplexes and those in cluster 0 have only 1-2, so these areas face little to no competition from existing multiplexes and offer the greatest potential for a new one. Multiplexes in cluster 2, by contrast, are likely to face intense competition because of the high concentration of existing venues. This project therefore recommends that property stakeholders consider neighbourhoods in cluster 1 first, where there is no competition. Stakeholders with a unique selling proposition that stands out from the competition could also open new multiplexes in clusters 0 and 3, which have little and moderate competition respectively. Lastly, property developers are advised to avoid neighbourhoods in cluster 2, which already have a high concentration of multiplexes.

### The 111 Multiplex entries are confined to just 41 of the 125 neighbourhoods of Visakhapatnam. This is an important insight: about 67% of the neighbourhoods (84 of 125) have no multiplex infrastructure.
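
The share of neighbourhoods without a multiplex follows directly from the counts reported earlier (41 of 125 neighbourhoods have at least one). A short worked calculation, with variable names chosen here for illustration:

```python
total_neighbourhoods = 125   # rows in the neighbourhood dataframe
with_multiplex = 41          # neighbourhoods whose Multiplex count is above zero

without_multiplex = total_neighbourhoods - with_multiplex
share_without = without_multiplex / total_neighbourhoods

print(f"{without_multiplex} of {total_neighbourhoods} neighbourhoods "
      f"({share_without:.1%}) have no multiplex")
# prints: 84 of 125 neighbourhoods (67.2%) have no multiplex
```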


