# Capstone Project - The Battle of the Neighborhoods (Week 1)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem  <a name="introduction"></a>##

#### Prospects of starting a Restaurant cum Catering service by inspecting the Zones of Chennai ####

Chennai, one of India's metropolitan areas, is also one of the country's growing IT hubs. With a population of 8.7 million (86,96,010) in an area of 426 $km^2$, the city is home to many leading industries, including automobile, textile, petrochemical, and hardware manufacturing. All of this makes it a promising place to start a new business.

When looking for a place to open a business, we need to select the busiest zones in Chennai, where a steady crowd can be expected. In a city like Chennai, competition among businesses is intense, so the surroundings of the selected zones should not already contain many businesses similar to ours. In zones with many offices, we expect to find plenty of restaurants, but catering services are comparatively unexplored in these areas. Opening a catering service that also operates as a restaurant is therefore an idea worth trying.

The Business Problem can be stated as:

#### What is the best place to open a Restaurant-cum-Catering Service in Chennai? ####

<img src='http://www.hcmadras.tn.nic.in/image/mhc-twilight-view.jpg'/>

#### Target Audience: ####

* The primary target audience is entrepreneurs who want to open a new business
* Investors who want to invest in good business ideas
* Offices near the business that may be interested in a contract-based catering service, and employees who may want to place a catering order
* Students who are exploring Data Science and learning the art of telling a story by training on, analyzing, and learning from data

## Data: Requirements and collection   <a name="data"></a>##
To open a business in an area, one needs to analyze the area based on average land prices, housing prices, most frequent venues, target audience, competition, and many other factors. 
In this project, the data requirements and collection methods are as follows:

### Zones Data (along with Coordinates) ###


*    **Requirement:** There are 15 zones in Chennai with a total of 200 wards. The basic data required to start this project is the names of all these zones along with their coordinates.
*    **Collection:** Web scrape the data on the zones of Chennai using **‘BeautifulSoup’**. Use a Python geocoder (geopy's Nominatim) to get the latitude and longitude values of these zones.

### Professional Venue Data ###


*   **Requirement:** From these 15 zones, we need to find out which have the most professional venues such as offices, hospitals, industries, and factories. In other words, we need to know in which zones we will have a constant flow of people (customers).
*   **Collection:** Query **‘Foursquare’** with a specific category ID to find the professional venues in these 15 zones.

### Nearby Venues Data ###


*   **Requirement:** We need to have an idea of the competition before we open a business, so we need data about the most frequent venues near each selected zone.
*   **Collection:** Explore the zones using **‘Foursquare’**

### Pricing Data ###


*    **Requirement:** Pricing data will help us in two ways:
     * By giving us an estimate of the price of buying or renting land for the business
     * By giving us an idea of what kind of resident customers we are dealing with
*    **Collection:** Websites have pricing data for all zones of Chennai. (It is generally difficult to find accurate pricing data.)

## Let's collect the data step by step. ##

#### Zones Data (along with Coordinates) ####

Data about the zones of Chennai can be obtained [<b>from this website</b>](https://en.wikipedia.org/wiki/List_of_Chennai_Corporation_zones) using web scraping. [<b>BeautifulSoup</b>](https://beautiful-soup-4.readthedocs.io/en/latest/) is a Python library used to scrape data from HTML and XML files. It works along with a parser (the lxml parser is used here).

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import folium

##### Get the data and parse it with lxml #####

In [2]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_Chennai_Corporation_zones').text
soup = BeautifulSoup(source, 'lxml')

##### The 'find' method can be used to find the required table with class = 'wikitable sortable'. Using the inspect feature of the browser, we can see that the rows of this table are under 'tr' tags. To get all the rows in the table, we use the 'find_all' method. #####

In [3]:
table = soup.find('table', class_ = 'wikitable sortable')
rows = table.find_all('tr')

##### In each row, the text of each column is under a 'td' tag, so we strip the text out of the columns as follows. #####

In [4]:
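locations = []
for row in rows:
    # collect the stripped text of every 'td' cell in this row
    col = row.find_all('td')
    col = [x.text.strip() for x in col]
    try:
        # the zone name is in the second column
        locations.append(col[1])
    except IndexError:
        # rows without a second 'td' cell (such as the header row) get a placeholder
        locations.append(0)
# drop the placeholder added for the header row
del locations[0]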
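
The row-and-cell logic above can be tried offline on a small table. The following is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`, so it runs without extra packages. The `SAMPLE_HTML` snippet, its column headers, and the `CellCollector` class are hypothetical stand-ins for illustration; only the two zone names come from the scraped data. It shows why a header row, which contains `th` cells and no `td` cells, produces no zone name.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while we are inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


# Hypothetical miniature of the zones table (not the real page markup)
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Zone No.</th><th>Zone name</th></tr>
  <tr><td>I</td><td> Thiruvottiyur </td></tr>
  <tr><td>II</td><td>Manali</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(SAMPLE_HTML)

# Keep the second cell of every row that has one; the header row has no <td> cells
zone_names = [cells[1].strip() for cells in collector.rows if len(cells) > 1]
print(zone_names)
```

Filtering on the number of cells, as in the last step, is one alternative to appending a placeholder and deleting it afterwards.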

##### Create an empty DataFrame and add the list generated above as a column #####

In [5]:
df_chennai = pd.DataFrame()
df_chennai['Location'] = locations
df_chennai.head()

| | Location |
|---|---|
| 0 | Thiruvottiyur |
| 1 | Manali |
| 2 | Madhavaram |
| 3 | Tondiarpet |
| 4 | Royapuram |


##### We now have all the zone names. In order to get the coordinates of these zones, we will use 'Geocoders'. #####

In [6]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Chennai_explorer")
# geocode each zone name once and reuse the results for both coordinates
geocoded = df_chennai['Location'].apply(geolocator.geocode)
df_chennai['Latitude'] = geocoded.apply(lambda x: x.latitude)

In [7]:
df_chennai['Longitude'] = geocoded.apply(lambda x: x.longitude)
df_chennai.head()

| | Location | Latitude | Longitude |
|---|---|---|---|
| 0 | Thiruvottiyur | 13.172222 | 80.304585 |
| 1 | Manali | 32.245461 | 77.187293 |
| 2 | Madhavaram | 13.142931 | 80.232517 |
| 3 | Tondiarpet | 13.127767 | 80.289585 |
| 4 | Royapuram | 13.114619 | 80.294028 |
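
In the output above, the Manali row has a latitude of about 32.2, while every other zone sits near 13.1. This suggests the geocoder matched a different place with the same name. The sketch below is one possible way to catch and reduce such mismatches, not part of the original notebook. It flags zones that lie far from the city centre used later for the map (13.0827, 80.2707) and adds city context to the query. The 50 km threshold, the context string, and the helper names `haversine_km`, `flag_outliers`, and `geocode_with_context` are all assumptions made for illustration.

```python
import math

CHENNAI_CENTRE = (13.0827, 80.2707)  # city-centre coordinates used for the map in this notebook


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def flag_outliers(zones, centre=CHENNAI_CENTRE, max_km=50.0):
    """Return the names of zones lying more than max_km (assumed threshold) from centre."""
    return [name for name, lat, lon in zones
            if haversine_km(centre[0], centre[1], lat, lon) > max_km]


def geocode_with_context(zone, geocode, context=", Chennai, Tamil Nadu, India"):
    """Geocode 'zone + context'; return (lat, lon), or (None, None) if nothing is found.

    `geocode` is any callable that returns an object with .latitude and .longitude
    attributes (or None), for example geolocator.geocode from the cell above.
    """
    location = geocode(zone + context)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


# The five rows shown in the output above
zones = [
    ("Thiruvottiyur", 13.172222, 80.304585),
    ("Manali", 32.245461, 77.187293),
    ("Madhavaram", 13.142931, 80.232517),
    ("Tondiarpet", 13.127767, 80.289585),
    ("Royapuram", 13.114619, 80.294028),
]
suspect = flag_outliers(zones)
print(suspect)  # ['Manali']
```

A flagged zone could then be retried with `geocode_with_context("Manali", geolocator.geocode)`. Whether the added context yields the intended match depends on the geocoding service, so the result is still worth checking against the threshold.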


#### Professional Venue Data ####
Let's use Foursquare to find out which areas have the most professional venues. 
To do this, let's get all the professional venues within a radius of 1 km of each zone by using the Professional venues [<b>category ID</b>](https://developer.foursquare.com/docs/resources/categories). 

##### To access data about the venues in the neighbourhoods, we query Foursquare using our client credentials. #####

In [8]:
CLIENT_ID = 'Your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'Your Foursquare Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [9]:
Professional_category = '4d4b7105d754a06375d81259'
RADIUS = 1000
LIMIT = 100

def getNearbyProfVenues(names, latitudes, longitudes, radius=RADIUS):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            Professional_category,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        try:
            # make the GET request
            results = requests.get(url).json()["response"]['venues']

            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],  
                v['categories'][0]['name']) for v in results])
        except (KeyError, IndexError, ValueError, requests.RequestException):
            # skip this zone if the request fails or the response lacks the expected fields
            continue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zone(Location)', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue_Lat', 
                  'Venue_Long', 
                  'Venue_Category']
        
    return(nearby_venues)
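
The request URL in `getNearbyProfVenues` is assembled with string formatting. As a minimal sketch of an alternative, the same query string can be built with the standard library's `urllib.parse.urlencode`, which percent-encodes each value. The endpoint and parameter names are taken from the function above. The helper name `build_search_url` and the placeholder credentials are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"  # endpoint used above


def build_search_url(lat, lng, category_id, client_id, client_secret,
                     version="20180605", radius=1000, limit=100):
    """Build the venue-search URL with percent-encoded query parameters."""
    params = {
        "categoryId": category_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; the coordinates are those of Thiruvottiyur
url = build_search_url(13.172222, 80.304585,
                       category_id="4d4b7105d754a06375d81259",
                       client_id="YOUR_ID",
                       client_secret="YOUR_SECRET")

# Decode the query string again to confirm that every parameter round-trips
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping the parameters in a dictionary also makes it harder to mix up their order than with a long positional `format` call.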

In [10]:
Chennai_Zones_ProfVenues = getNearbyProfVenues(names=df_chennai['Location'],
                                   latitudes=df_chennai['Latitude'],
                                   longitudes=df_chennai['Longitude']
                                  )

In [11]:
Chennai_Zones_ProfVenues.head()

| | Zone(Location) | Latitude | Longitude | Venue | Venue_Lat | Venue_Long | Venue_Category |
|---|---|---|---|---|---|---|---|
| 0 | Thiruvottiyur | 13.172222 | 80.304585 | Aakash Hospital | 13.171589 | 80.304123 | Hospital |
| 1 | Thiruvottiyur | 13.172222 | 80.304585 | Sri Anjaneyar Temple | 13.168089 | 80.310348 | Temple |
| 2 | Thiruvottiyur | 13.172222 | 80.304585 | Sri Pattinathar Temple | 13.163456 | 80.307246 | Temple |
| 3 | Thiruvottiyur | 13.172222 | 80.304585 | Royal Enfield | 13.173354 | 80.30779 | Factory |
| 4 | Thiruvottiyur | 13.172222 | 80.304585 | Angel Broking | 13.167497 | 80.302412 | Office |


In [12]:
Chennai_Zones_ProfVenues.shape

(498, 7)
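
Since the goal of this step is to see which zones have the most professional venues, a natural next move is to count venues per zone. The sketch below illustrates the counting on only the five rows shown in the `head()` output, using the standard library's `Counter`, so the totals are not the real totals for the 498 venues. On the full DataFrame, a call such as `Chennai_Zones_ProfVenues['Zone(Location)'].value_counts()` would presumably give the per-zone ranking directly.

```python
from collections import Counter

# The five rows displayed by head(): (zone, venue, category)
sample_rows = [
    ("Thiruvottiyur", "Aakash Hospital", "Hospital"),
    ("Thiruvottiyur", "Sri Anjaneyar Temple", "Temple"),
    ("Thiruvottiyur", "Sri Pattinathar Temple", "Temple"),
    ("Thiruvottiyur", "Royal Enfield", "Factory"),
    ("Thiruvottiyur", "Angel Broking", "Office"),
]

# Count how many venues fall in each zone and in each category
venues_per_zone = Counter(zone for zone, _, _ in sample_rows)
venues_per_category = Counter(category for _, _, category in sample_rows)

# most_common() lists the entries from the highest count to the lowest
print(venues_per_zone.most_common())
print(venues_per_category.most_common())
```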

#### Let's visualize all these zones on a map using 'Folium' ####

In [13]:
# Chennai latitude and longitude
Chennai_centre =[13.0827, 80.2707]

Chennai_Zones = folium.Map(location=Chennai_centre, zoom_start=12)
folium.Marker(Chennai_centre, popup='Chennai').add_to(Chennai_Zones)
# add markers to map
for lat, lng, label in zip(df_chennai['Latitude'], df_chennai['Longitude'], 
                           df_chennai['Location']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='blue',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7).add_to(Chennai_Zones)  
    
Chennai_Zones