# The Battle of Neighborhoods

## Week 1

## 1. A Description of the Problem and A Discussion of the Background

### 1.1 Description of the Problem

London is the capital and largest city of England and the United Kingdom. It is a historic city that preserves an invaluable record of the history of the British Empire. London is distinctive in many ways, including, but not limited to, its museums, universities, culture, history and British monarchs, football and cricket. The city is divided into a number of neighborhoods, and it also offers a remarkable range of international food and restaurants.

Because there are so many restaurants, it is difficult to identify the ones that serve authentic cuisine from a particular culture. Among London's many Mediterranean restaurants, for example, it is often hard to find those that offer the best food and culinary experience.

### 1.2 Discussion of the Background

This capstone project is built around a real-life scenario based on the problem described above. My client wants to start a new business serving authentic Mediterranean cuisine. The goal is to identify a few locations in London that are profitable enough to sustain his business. He wants places that maximize his return while also allowing the restaurant to become an iconic, authentic Mediterranean restaurant in the city of London.

## 2. A Description of the Data and How It Will be Used to Solve the Problem

### 2.1 Source of the Data 

Throughout this project, we will use London data available from Wikipedia and Foursquare. Data acquisition and processing follow the lab materials provided in this Coursera course.
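As a rough sketch of how Foursquare venue data could be requested for one neighborhood, the snippet below only builds the request URL and makes no network call. It assumes the legacy Foursquare v2 `venues/explore` endpoint used in the Coursera labs; `CLIENT_ID`, `CLIENT_SECRET`, the version date and the radius/limit values are placeholders, and because Foursquare has since moved to its newer Places API, this endpoint may no longer accept requests.

```python
from urllib.parse import urlencode

# Assumed legacy Foursquare v2 endpoint; credentials below are placeholders
FOURSQUARE_EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Return a venues/explore URL for venues within `radius` metres of (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date, YYYYMMDD
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius,
        'limit': limit,
    }
    return FOURSQUARE_EXPLORE_URL + '?' + urlencode(params)


# Example: the coordinates geocoded for Abbey Wood (SE2) later in this notebook
url = build_explore_url(51.49245, 0.12127, 'CLIENT_ID', 'CLIENT_SECRET')
print(url)
```

The actual request would then be a `requests.get(url)` call, with the venues read from the returned JSON.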

### 2.2 A Close Look into the Data

```python
# Importing libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup
```

```python
!pip -q install geocoder
import geocoder
```

```python
# Download the "List of areas of London" page and locate its sortable table
link_url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
link_page = requests.get(link_url, headers=headers)
link_page.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(link_page.content, 'html.parser')
table = soup.find('table', {'class': 'wikitable sortable'}).tbody
```

```python
rows = table.find_all('tr')
# Column names come from the header row (newlines removed)
columns = [th.text.replace('\n', '') for th in rows[0].find_all('th')]

records = []
for row in rows[1:]:
    # Clean every cell the same way: drop newlines and non-breaking spaces
    values = [td.text.replace('\n', '').replace('\xa0', '')
              for td in row.find_all('td')]
    # Skip any row whose cell count does not match the header
    if len(values) == len(columns):
        records.append(values)

# Build the frame once (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(records, columns=columns)
```
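A subtle pitfall when chaining string clean-ups: in `text.replace('\n', ''.replace('\xa0', ''))` the inner `''.replace(...)` runs first and simply returns `''`, so only newlines are removed and non-breaking spaces survive. The minimal sketch below contrasts that form with properly chained calls; `buggy_clean`, `clean_cell` and the sample string are hypothetical.

```python
def buggy_clean(text):
    # Inner call runs first: ''.replace('\xa0', '') == '', so only '\n' is removed
    return text.replace('\n', ''.replace('\xa0', ''))


def clean_cell(text):
    # Chained calls: remove newlines, then non-breaking spaces
    return text.replace('\n', '').replace('\xa0', '')


raw = 'Croydon\xa0[8]\n'           # made-up cell text for illustration
print(repr(buggy_clean(raw)))  # 'Croydon\xa0[8]'
print(repr(clean_cell(raw)))   # 'Croydon[8]'
```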

The unprocessed, raw data looks as follows:

```python
df.head(20)
```

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |
| 5 | Aldborough Hatch | Redbridge[9] | ILFORD | IG2 | 20 | TQ455895 |
| 6 | Aldgate | City[10] | LONDON | EC3 | 20 | TQ334813 |
| 7 | Aldwych | Westminster[10] | LONDON | WC2 | 20 | TQ307810 |
| 8 | Alperton | Brent[11] | WEMBLEY | HA0 | 20 | TQ185835 |
| 9 | Anerley | Bromley[11] | LONDON | SE20 | 20 | TQ345695 |


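```python
# Constructing the dataframe: normalise non-breaking spaces in the headers,
# then give the columns short names
df.columns = [c.replace('\xa0', ' ').strip() for c in df.columns]
df = df.rename(columns={'London borough': 'Borough', 'Post town': 'Post-town',
                        'Postcode district': 'Postcode', 'Dial code': 'Dial-code',
                        'OS grid ref': 'OSGridRef'})
# Drop footnote markers such as '[7]' and the whitespace they leave behind
df['Borough'] = df['Borough'].str.replace(r'\[\d+\]', '', regex=True).str.strip()
```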
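Why a regular expression is more robust than a character-based `rstrip` chain such as `rstrip(']').rstrip('0123456789').rstrip('[')`: the chain leaves a trailing space when a space precedes the marker and only handles a single trailing marker. The pure-Python sketch below compares both on sample borough strings; `strip_footnotes`, `rstrip_chain` and the `'Croydon[8][9]'` input are illustrative assumptions.

```python
import re

# Matches bracketed numeric footnote markers such as '[7]' or '[10]'
FOOTNOTE = re.compile(r'\[\d+\]')


def strip_footnotes(name):
    """Remove every footnote marker, then trim the whitespace left behind."""
    return FOOTNOTE.sub('', name).strip()


def rstrip_chain(name):
    """Character-based stripping that only handles one trailing marker."""
    return name.rstrip(']').rstrip('0123456789').rstrip('[')


for raw in ['Bexley, Greenwich [7]', 'Croydon[8]', 'Croydon[8][9]']:
    print(repr(rstrip_chain(raw)), repr(strip_footnotes(raw)))
```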

```python
# Data processing: split multi-district cells such as 'W3, W4'
# into one row per postcode district
df_join = (df.assign(Postcode=df['Postcode'].str.split(','))
             .explode('Postcode')
             .reset_index(drop=True))
df_join['Postcode'] = df_join['Postcode'].str.strip()  # ' W4' -> 'W4'
# Dataframe containing 'Location', 'Borough', 'Postcode', 'Post-town'
df1 = df_join[['Location', 'Borough', 'Postcode', 'Post-town']]
```
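This step turns a row that lists several postcode districts into one row per district. The same reshaping in plain Python, on two rows copied from the raw table above, makes the intent explicit. Note the `.strip()`: splitting `'W3, W4'` on `','` alone yields `' W4'` with a leading space. `explode_postcodes` is a hypothetical helper, not part of pandas.

```python
def explode_postcodes(rows):
    """Yield one (location, borough, postcode, post_town) tuple per postcode district."""
    for location, borough, postcodes, post_town in rows:
        for code in postcodes.split(','):
            yield location, borough, code.strip(), post_town


rows = [
    ('Acton', 'Ealing, Hammersmith and Fulham', 'W3, W4', 'LONDON'),
    ('Albany Park', 'Bexley', 'DA5, DA14', 'BEXLEY, SIDCUP'),
]
for record in explode_postcodes(rows):
    print(record)
```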

```python
# Keep only rows whose post town includes LONDON
df2 = df1[df1['Post-town'].str.contains('LONDON')]
df2 = df2[['Location', 'Borough', 'Postcode']].reset_index(drop=True)
df2.head(20)
```

| | Location | Borough | Postcode |
|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 |
| 3 | Angel | Islington | EC1 |
| 4 | Angel | Islington | N1 |
| 5 | Church End | Brent | NW10 |
| 6 | Church End | Barnet | N3 |
| 7 | Clapham | Lambeth, Wandsworth | SW4 |
| 8 | Clerkenwell | Islington | EC1 |
| 9 | Colindale | Barnet | NW9 |


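```python
import time

def get_latlng(postal_code, max_retries=5):
    """Return [lat, lng] for a London postcode district, or [None, None] on failure."""
    for _ in range(max_retries):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        if g.latlng is not None:
            return g.latlng
        time.sleep(1)  # brief pause before retrying
    return [None, None]
```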
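A geocoding loop written as `while coords is None:` never terminates for a postcode the service cannot resolve. A bounded retry avoids that. The sketch below takes the geocoding call as a parameter so it can be exercised offline; `latlng_with_retries`, `fake_geocode`, `max_retries` and `delay` are hypothetical names, not part of the `geocoder` library.

```python
import time


def latlng_with_retries(query, geocode, max_retries=3, delay=0.0):
    """Call geocode(query) up to max_retries times; return [lat, lng] or [None, None]."""
    for attempt in range(max_retries):
        result = geocode(query)
        if result is not None:
            return result
        if attempt < max_retries - 1:
            time.sleep(delay)  # back off before the next attempt
    return [None, None]


# Hypothetical stand-in for the real service: fails twice, then succeeds
calls = []


def fake_geocode(query):
    calls.append(query)
    return [51.49245, 0.12127] if len(calls) >= 3 else None


print(latlng_with_retries('SE2, London, United Kingdom', fake_geocode))
```

In the notebook, the `geocode` argument could be `lambda q: geocoder.arcgis(q).latlng`, with a non-zero `delay` to be polite to the service. Returning `[None, None]` rather than `None` keeps every entry two columns wide, so the list still converts cleanly to a `Latitude`/`Longitude` frame.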

```python
postal_codes = df2['Postcode']
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
```

```python
# London dataframe with location coordinates (a copy, so df2 stays unchanged)
df_london = df2.copy()
# Attach the geocoded latitude and longitude; both frames share a 0..n-1 index
df_london_coordinates = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])
df_london['Latitude'] = df_london_coordinates['Latitude']
df_london['Longitude'] = df_london_coordinates['Longitude']
```

```python
df_london.head(20)
```

| | Location | Borough | Postcode | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | 51.49245 | 0.12127 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | 51.51324 | -0.26746 |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | 51.48944 | -0.26194 |
| 3 | Angel | Islington | EC1 | 51.52361 | -0.09877 |
| 4 | Angel | Islington | N1 | 51.53792 | -0.09983 |
| 5 | Church End | Brent | NW10 | 51.53916 | -0.25123 |
| 6 | Church End | Barnet | N3 | 51.60104 | -0.19401 |
| 7 | Clapham | Lambeth, Wandsworth | SW4 | 51.46095 | -0.13922 |
| 8 | Clerkenwell | Islington | EC1 | 51.52361 | -0.09877 |
| 9 | Colindale | Barnet | NW9 | 51.58486 | -0.24881 |
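Because coordinates are looked up per postcode district, every row that shares a district gets the same point (Angel and Clerkenwell both map to EC1 in the table above). A quick sanity check is to compute each point's great-circle distance from central London and look for outliers. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the reference point (roughly Charing Cross, 51.5074, -0.1278) and any cut-off applied to the distances are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
CENTRAL_LONDON = (51.5074, -0.1278)  # approximate, assumed reference point


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1 = map(radians, p)
    lat2, lng2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


points = {
    'Abbey Wood (SE2)': (51.49245, 0.12127),
    'Angel (EC1)': (51.52361, -0.09877),
    'Clerkenwell (EC1)': (51.52361, -0.09877),
}
for name, point in points.items():
    print(name, round(haversine_km(point, CENTRAL_LONDON), 1), 'km')
```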


### 2.3 How Will the Data Be Used?

Combined with Foursquare venue data for each neighborhood, this dataset provides insight into how restaurants are distributed across London's neighborhoods. By studying it carefully, we can understand which types of restaurant are popular in each neighborhood. We can then use this information to build a machine learning model that makes useful predictions for my client, for example, assessing the competition between different cuisines in London and finding a suitable location for his restaurant that maximizes profit.
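One way to see "which types of restaurant are popular" is to turn each neighborhood's venue categories into a frequency vector (the share of each category) and to rank neighborhoods by the share of Mediterranean-style venues. The sketch below does this in plain Python on made-up data: the neighborhood names, venue lists and the set of categories counted as Mediterranean are all assumptions, not Foursquare output. Vectors like these are also the kind of features a clustering model (for example k-means, again an assumption) could group.

```python
from collections import Counter

# Hypothetical venue categories per neighborhood (made-up sample)
venues = {
    'Neighborhood A': ['Italian Restaurant', 'Pub', 'Mediterranean Restaurant', 'Coffee Shop'],
    'Neighborhood B': ['Pub', 'Pub', 'Coffee Shop', 'Greek Restaurant'],
    'Neighborhood C': ['Mediterranean Restaurant', 'Turkish Restaurant',
                       'Greek Restaurant', 'Coffee Shop'],
    'Neighborhood D': ['Coffee Shop', 'Fast Food Restaurant', 'Pub', 'Fast Food Restaurant'],
}
# Which categories count as "Mediterranean" is an assumption
MEDITERRANEAN = {'Mediterranean Restaurant', 'Greek Restaurant',
                 'Turkish Restaurant', 'Italian Restaurant'}

categories = sorted({c for cats in venues.values() for c in cats})


def frequency_vector(cats):
    """Share of each category among one neighborhood's venues (same order as `categories`)."""
    counts = Counter(cats)
    return [counts[c] / len(cats) for c in categories]


vectors = {name: frequency_vector(cats) for name, cats in venues.items()}

# Share of Mediterranean-style venues per neighborhood, highest first
med_share = {name: sum(c in MEDITERRANEAN for c in cats) / len(cats)
             for name, cats in venues.items()}
for name, share in sorted(med_share.items(), key=lambda kv: -kv[1]):
    print(name, share)
```

A high share can be read either as proven demand or as stiff competition; deciding which matters for the client is part of the analysis, not something this sketch settles.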