# The Battle of Neighborhoods: A location recommendation for opening a new Vegan Restaurant in Bengaluru city

This notebook contains 2 sections:

1. Introduction/Business Problem
2. Data Anatomization

## Introduction

Restaurants have become a highly profitable business these days. A metropolitan city like Bengaluru, in particular, always ranks near the top for breakfast items such as Idly and Dosa. Any Bengalurian knows how crowded the restaurants get for morning coffee at 6 a.m. A vegetarian restaurant therefore looks like a sound, profitable business.

## Business Problem

Someone wishes to open a new vegetarian restaurant in Bengaluru city and is unsure which location would make the investment profitable. The city already has many such restaurants, and in a locality crowded with them the profit is unlikely to meet expectations. The best way to resolve this is therefore to look for a location/neighborhood that has few restaurants serving similar cuisine.

## Data Anatomization

To tackle the above-mentioned problem, we need a dataset that contains:
1. All the neighbourhoods of Bengaluru city.
2. The latitude and longitude of each neighbourhood in the city.

The page https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Bangalore is the main data source for the neighbourhoods of Bengaluru. We use the beautifulsoup4 package, a Python module for scraping information from web pages, to extract the list of neighbourhoods from this Wikipedia category page and convert it into a pandas data frame. The data is then cleaned by removing the unwanted rows, and Python’s geopy package is used to obtain the latitude and longitude of every neighbourhood in the data frame. The obtained coordinates are then merged into the main data frame using a list comprehension.
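
The geocoding and merge step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's exact code: the helper name `geocode_neighborhoods`, the `city` suffix and the stub geocoder are all introduced here for illustration. With geopy, the `geocode` argument would typically be `Nominatim(user_agent=...).geocode`, which returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found.

```python
from collections import namedtuple


def geocode_neighborhoods(names, geocode, city="Bengaluru, India"):
    """Return one (latitude, longitude) tuple per neighbourhood name.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    def lookup(name):
        location = geocode(f"{name}, {city}")
        if location is None:
            # Keep a placeholder so the result stays aligned with `names`.
            return (None, None)
        return (location.latitude, location.longitude)

    # The list comprehension yields one coordinate pair per neighbourhood.
    return [lookup(name) for name in names]


# Offline stub standing in for a real geocoder (hypothetical). The
# coordinates are dummy values for illustration, not real locations.
StubLocation = namedtuple("StubLocation", ["latitude", "longitude"])


def stub_geocode(address):
    known = {"Adugodi, Bengaluru, India": StubLocation(1.0, 2.0)}
    return known.get(address)


coords = geocode_neighborhoods(["Adugodi", "Nowhere"], stub_geocode)

# Split the pairs into two lists that can be attached as data frame columns.
latitudes = [lat for lat, _ in coords]
longitudes = [lon for _, lon in coords]
print(coords)
```

With a real data frame, the two lists could be attached as columns, for example `df_blr["Latitude"] = latitudes`. Public Nominatim servers limit request rates, so a real run over all neighbourhoods usually needs some throttling between calls.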

Once the coordinates are obtained, we use the Foursquare API to extract the venues around each neighbourhood, passing the client credentials along with the version, radius and limit values. We then cluster the neighbourhoods with k-means based on the venues found, and use a folium map to display the clusters geographically.
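
As a rough illustration of how the credentials, version, radius and limit values fit together, the sketch below builds a request URL for one location. It assumes the legacy Foursquare v2 `venues/explore` endpoint, which may not be available on current Foursquare plans. The function name `build_explore_url`, the placeholder credentials and the example version date, radius and limit are hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint is the target.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venue-search URL around one latitude/longitude pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, formatted YYYYMMDD
        "ll": f"{lat},{lng}",    # centre of the search
        "radius": radius,        # search radius in metres
        "limit": limit,          # maximum number of venues to return
    }
    # urlencode escapes special characters such as the comma in "ll".
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials and illustrative values, not the notebook's own.
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    12.9716, 77.5946, radius=2000, limit=100,
)
print(example_url)
```

The resulting URL could then be fetched with `requests.get(example_url).json()`, once per neighbourhood, using the coordinates obtained from geocoding.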


#### Importing required libraries and modules

In [1]:
import numpy as np


import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from bs4 import BeautifulSoup # this module helps with web scraping


import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!pip install geocoder
import geocoder

import requests  # this module helps us to download a web page

from pandas import json_normalize # transform JSON data into a pandas dataframe (the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
#!conda install -c conda-forge folium=0.5.0 --yes 
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans



print("Successfully imported libraries!")

Successfully imported libraries!


#### Web Scraping using BeautifulSoup to extract the data from wiki page

In [2]:
url = "https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Bangalore"
data = requests.get(url).text

In [3]:
#create a soup object of the class BeautifulSoup

soup = BeautifulSoup(data, 'html5lib')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [6]:
# create a new DataFrame from the list
df_blr = pd.DataFrame({"Neighborhood": neighborhoodList})
df_blr.head(10)

| | Neighborhood |
|---|---|
| 0 | List of areas in Bangalore Cantonment |
| 1 | List of areas in Bengaluru Pete |
| 2 | List of neighbourhoods in Bangalore |
| 3 | Adugodi |
| 4 | Agara, Bangalore |
| 5 | Ananthnagar |
| 6 | Anjanapura |
| 7 | Arekere |
| 8 | Austin Town |
| 9 | Babusapalya |


#### Cleaning the data frame by removing unwanted data from cells

In [7]:
# Drop the first 3 rows, which are list pages rather than neighborhoods, then reset the index
df_blr = df_blr.iloc[3:].reset_index(drop=True)
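
Dropping rows by position assumes the three list pages always come first. As a hedged alternative, the non-neighbourhood entries could be filtered by their text before the data frame is built. The sketch below assumes that such entries always start with "List of", which holds for the rows shown above but is not guaranteed for the live page. `sample` and `is_neighborhood` are illustrative names.

```python
def is_neighborhood(entry):
    """Treat category entries starting with 'List of' as index pages."""
    return not entry.strip().startswith("List of")


# A few entries copied from the scraped output shown earlier.
sample = [
    "List of areas in Bangalore Cantonment",
    "List of areas in Bengaluru Pete",
    "List of neighbourhoods in Bangalore",
    "Adugodi",
    "Agara, Bangalore",
]

# Keep only the entries that look like actual neighbourhoods.
cleaned = [entry for entry in sample if is_neighborhood(entry)]
print(cleaned)
```

The filtered list can be passed to `pd.DataFrame({"Neighborhood": cleaned})` in the same way as `neighborhoodList`.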

In [8]:
df_blr.head(10)

| | Neighborhood |
|---|---|
| 0 | Adugodi |
| 1 | Agara, Bangalore |
| 2 | Ananthnagar |
| 3 | Anjanapura |
| 4 | Arekere |
| 5 | Austin Town |
| 6 | Babusapalya |
| 7 | Bagalur, Bangalore Urban |
| 8 | Bahubalinagar |
| 9 | Baiyyappanahalli |


#### The total number of neighborhoods in Bengaluru

In [9]:
# count the rows to get the total number of neighborhoods
print("The total number of neighborhoods in Bangalore is {}".format(df_blr.shape[0]))

The total number of neighborhoods in Bangalore is 140
