# The Battle of Neighbourhoods

# Business problem

The problem I am going to solve, using data from Foursquare and Wikipedia, is which neighbourhood in New York City is most suitable as the first location for a Swedish pizza restaurant chain. The (fictional) chain is called SwePizz. It has an established brand in Sweden and is now ready to expand internationally. Its management is confident that the brand will be strong in all areas of New York City, so for their first entry into the city they want a neighbourhood with a large population but few pizza restaurants in their own price tier (tier 2). I will achieve this through two objectives:

1. Find the ten most populous neighbourhoods in New York City
2. Find which of these has the fewest tier 2 pizza restaurants per capita

The idea behind this is that demand is assumed to be high in all areas, since SwePizz's pizzas have historically appealed to all categories of customers. The focus is therefore on finding an area with low supply, where the customer base can grow efficiently. In price tier 2, customers are also assumed to have low price sensitivity (i.e. it does not really matter whether a pizza costs 8 or 12 USD), which makes the socioeconomic status of the neighbourhood less relevant.
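The two objectives can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `rank_by_supply`, the neighbourhood names, and all figures below are made up, and restaurants per 10,000 residents is just one possible way to express "per capita".

```python
def rank_by_supply(populations, tier2_pizza_counts, top_n=10):
    """Rank the top_n most populous areas by tier 2 pizza restaurants per 10,000 residents.

    populations and tier2_pizza_counts are dicts keyed by area name.
    Returns a list of (area, restaurants_per_10k) sorted from lowest supply to highest.
    """
    # Objective 1: keep only the top_n most populous areas.
    top_areas = sorted(populations, key=populations.get, reverse=True)[:top_n]

    # Objective 2: compute restaurants per 10,000 residents for those areas.
    per_10k = {
        area: tier2_pizza_counts.get(area, 0) * 10_000 / populations[area]
        for area in top_areas
    }

    # Lowest supply first: the first entry is the candidate for market entry.
    return sorted(per_10k.items(), key=lambda item: item[1])


# Hypothetical figures for illustration only, not real data.
sample_populations = {"Area A": 200_000, "Area B": 150_000, "Area C": 50_000}
sample_counts = {"Area A": 40, "Area B": 15, "Area C": 2}

ranking = rank_by_supply(sample_populations, sample_counts, top_n=2)
print(ranking)
```

With `top_n=2`, "Area C" is dropped in the first step even though it has the fewest restaurants, because the population filter is applied before the supply comparison.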

# Target audience

This research will be valuable to the owner and managing director of SwePizz, who need to make an effective entry into their next market. Entering the right neighbourhood will help them gain traction and start building their brand in New York City, which makes this an important long-term strategic decision.

# Data

The data I will use for this project are:
    
- New York City demographics from Wikipedia
- Venue menus, price tiers, and areas from Foursquare

The demographics data will show the ten most populated neighbourhoods in New York City.

The venue menu and price data will show how many restaurants in price tier 2 serve pizza in the top ten populated areas.

In [1]:
!pip install folium

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import os
from sklearn.cluster import KMeans
import folium as fol
from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors



In [34]:
List_url = "https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City#Neighborhoods_by_borough"
source = requests.get(List_url).text
s = BeautifulSoup(source, 'html.parser')  # the Wikipedia page is HTML, so parse it as HTML rather than XML
table=s.find('table')

In [35]:
column_names = ['Community Board (CB)', 'Area', 'Pop. Census 2010', 'Pop.','Neighbourhoods']
df = pd.DataFrame(columns = column_names)

In [32]:
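# Walk through every table row and collect the text of its data cells
for tr_cell in table.find_all('tr'):
    row_data = []
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    # Keep only rows that fill every column; header rows have no <td> cells and are skipped
    if len(row_data) == len(column_names):
        df.loc[len(df)] = row_data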

In [33]:
df.head()

Unnamed: 0,Community Board (CB),Area,Pop. Census 2010,Pop.,Neighbourhoods
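Because the result depends on the live Wikipedia page, it can help to check the row-extraction loop offline against a small hand-written table. The sketch below is an assumption-laden illustration: the sample HTML and its values are invented, the helper `extract_rows` is hypothetical, the `html.parser` backend is my choice, and comparing the cell count to the number of column names assumes the real table has one data cell per column.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table; every value is made up for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Community Board (CB)</th><th>Area</th><th>Pop. Census 2010</th><th>Pop.</th><th>Neighbourhoods</th></tr>
  <tr><td>Sample CB 1</td><td>10.0</td><td>12,345</td><td>1,234.5</td><td>Alpha, Beta</td></tr>
  <tr><td>Sample CB 2</td><td>5.0</td><td>6,789</td><td>1,357.8</td><td>Gamma</td></tr>
  <tr><td colspan="5">Summary row that should be skipped</td></tr>
</table>
"""

column_names = ['Community Board (CB)', 'Area', 'Pop. Census 2010', 'Pop.', 'Neighbourhoods']


def extract_rows(html, n_columns):
    """Return the text of every table row that has exactly n_columns <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr_cell in table.find_all("tr"):
        cells = [td_cell.text.strip() for td_cell in tr_cell.find_all("td")]
        # Header rows (only <th> cells) and merged rows do not match and are dropped.
        if len(cells) == n_columns:
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML, len(column_names))

# Population strings contain thousands separators, so strip them before converting.
populations = [int(row[2].replace(",", "")) for row in rows]
print(rows)
print(populations)
```

If the same helper returns an empty list for the real page, the likely causes are a mismatch between the expected and actual number of cells per row, or `find('table')` picking up a different table than intended.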
