# Assignment Week 4 - part 1 -  Introduction / Business Problem

Clearly define a problem or an idea of your choice.

### Background to business problem choice

I've spent a number of years working with startups and hence have spent a large amount of time in WeWork offices around the world. I'm very interested in their success and the business model they have adopted. On that basis I've decided to focus on WeWork for this capstone project.

### Business problem

[WeWork](https://www.wework.com) is an American company, founded in 2010 and headquartered in New York City. Essentially they provide shared workspaces and have been very popular with technology startups, but also wider. They currently have 651 locations in 114 cities [around the world](https://www.wework.com/en-GB/locations).

The question we aim to answer here is the following: 

>On the assumption that the current 47 London WeWork offices are successful, what links the areas in which they are situated and could inform future choices for office locations in London?

The target audience for this work is WeWork themselves. Using this data WeWork could start to explore potential future locations for their offices based on previous successes in certain neighbourhood types. 

Note - a key assumption here is that the current WeWork offices are "successful". Success would need to be defined by WeWork i.e. profitable / high occupancy rates etc.

# Assignment Week 4 - part 2 - Data

_Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data._

## Data exploration

In this section we will explore the data to be used to investigate the question proposed in the introduction, namely a clustering by neighbourhood characteristics of the locations of current WeWork locations in London. 

In [12]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from urllib.request import urlopen
from bs4 import BeautifulSoup

from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from tqdm import tqdm
tqdm.pandas()

import requests
import folium

### Where are WeWork locations today - worldwide?

Firstly, let us consider the locations of WeWork offices around the world. Using data from the [WeWork office space locations](https://www.wework.com/en-GB/locations) website, we will do the following:

* Scrape city names from the webpage above (using Beautiful Soup)
* Find the latitude and longitude of all locations (using Geopy)
* Plot a worldwide map of all locations (using Folium)

See the following for details of this process and the final map produced.

In [13]:
url = "https://www.wework.com/en-GB/locations"
html = urlopen(url)
soup = BeautifulSoup(html, 'html')
# soup.contents

In [14]:
countries = soup.find_all('a', 'marketLink__countryList__F4CBD')

cities_array = []
cities_array_cs = []

for name in list(countries):
    if '(coming soon)' in name.text:
        cities_array_cs.append(name.text)
    else:
        cities_array.append(name.text)
        
print('Number of cities with offices today: ',len(cities_array))
print('Number of cities with offices coming soon: ',len(cities_array_cs))

Number of cities with offices today:  0
Number of cities with offices coming soon:  0


Note that a number of cities are listed as "Coming soon", these have been removed from the list of cities to be visualised.

We next create a DataFrame to hold the city names and use Geopy to find the locations of each and insert into the main DataFrame.

In [15]:
df_cities = pd.DataFrame()
df_cities['Name']=cities_array

In [16]:
geolocator = Nominatim(user_agent="office_locations")

In [17]:
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# df_cities['location'] = df_cities['Name'].apply(geocode)
df_cities['location'] = df_cities['Name'].progress_apply(geocode) # As above but shows progress bar
df_cities['latlong'] = df_cities['location'].apply(lambda x: tuple(x.point) if x else None)

0it [00:00, ?it/s]


In [18]:
df_cities.to_csv('cities.csv') # Save the data to a local CSV file if required since Geopy can be intermittent!

In [19]:
map_world = folium.Map(location=[30, 0], zoom_start=1.5)

for latlng, city in zip(df_cities['latlong'], 
                          df_cities['Name']):
    label = '{}'.format(city)
    label = folium.Popup(label, parse_html=True)
    if latlng != None:
            folium.CircleMarker(
            [latlng[0], latlng[1]],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_world)  
    
map_world

### Where are WeWork locations today - London specific view?

Secondly, let us consider the locations of WeWork offices in London specifically. Using data from the [WeWork London office space locations webpage](https://www.wework.com/en-GB/l/london), we will do the following:

* Scrape location postcodes from the webpage above (using Beautiful Soup)
* Find the latitude and longitude of all locations (using Geopy)
* Plot a map of all London locations (using Folium)

See the following for details of this process and the final map produced.

In [20]:
url_london = "https://www.wework.com/en-GB/l/london"
html_london = urlopen(url_london)
soup_london = BeautifulSoup(html_london, 'html')

In [21]:
london_locations = soup_london.find_all("div", class_="sc-bxivhb dHqnfT")

london_array = []

for name in list(london_locations):
    london_array.append(name.text)
        
print('Number of locations in London: ',len(london_array))

Number of locations in London:  0


The details of each office need to be split into the main street address, which we will use as the name, and the postcode.

We next create a DataFrame to hold the office names and use Geopy to find the locations of each and insert into the main DataFrame.

In [22]:
london_postcodes = []
london_addresses = []
for office in london_array:
    london_postcodes.append(office.split('London')[-1].replace('UK','').strip(' ,.'))
    london_addresses.append(office.split('London')[0].strip(''))

In [23]:
df_london = pd.DataFrame()
df_london['Name']=london_addresses
df_london['Postcode']=london_postcodes
df_london.head()

Unnamed: 0,Name,Postcode


In [24]:
geocode2 = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df_london['location'] = df_london['Postcode'].progress_apply(geocode2) # Shows progress bar
df_london['latlong'] = df_london['location'].apply(lambda x: tuple(x.point) if x else None)

0it [00:00, ?it/s]


In [25]:
df_london.head()

Unnamed: 0,Name,Postcode,location,latlong


Note that a number of postcodes aren't found by Geopy (repeat searches gave the same result). So we needed to get the locations of those missing using the street address.

In [26]:
df_london_missing = df_london[df_london['latlong'].isna()].drop(columns=['location','latlong'])
df_london_missing

Unnamed: 0,Name,Postcode


In [27]:
df_london_missing['location'] = df_london_missing['Name'].progress_apply(geocode2) # As above but shows progress bar
df_london_missing['latlong'] = df_london_missing['location'].apply(lambda x: tuple(x.point) if x else None)

0it [00:00, ?it/s]


In [28]:
df_london_missing

Unnamed: 0,Name,Postcode,location,latlong


We then combine all the data together for all the London offices to give a final DataFrame. This is then visualised using Folium.

In [29]:
df_london_all = df_london.dropna()
df_london_all = df_london_all.append(df_london_missing).reindex()
df_london_all.head()

Unnamed: 0,Name,Postcode,location,latlong


In [30]:
df_london_all.to_csv('london.csv') # Save the data to a local CSV file if required since Geopy can be intermittent!

In [31]:
london_lat = 51.5074
london_long = -0.1278

In [32]:
map_london = folium.Map(location=[london_lat, london_long], zoom_start=12)

for latlng, office in zip(df_london_all['latlong'], 
                          df_london_all['Name']):
    label = '{}'.format(office)
    label = folium.Popup(label, parse_html=True)
    if latlng != None:
            folium.CircleMarker(
            [latlng[0], latlng[1]],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_london)  
    
map_london

### Foursquare data for each office

The final data set to be gathered is using Foursquare to provide data for the immediate location for each of the 47 offices of interest in London. 

In this section we find up 100 venues in the 500m closest to the offices listed in London. The final output is a DataFrame containing these details which can then be used for clustering in the next step of data analysis (out of scope for this week's assignment.

The steps followed are the following:

* Using the latitude and longitude found in the previous section, query Foursquare using the `explore` [API](https://developer.foursquare.com/docs/api/venues/explore) to find up to 100 local popular venues.
* Group the data produced by office name and shape using one hot encoding.
* Find the 10 most common venues for each office and create a final DataFrame with this data for analysis.

The final DataFrame can be found at the bottom of this section.

In [311]:
CLIENT_ID = 'REMOVED' # your Foursquare ID
CLIENT_SECRET = 'REMOVED' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: REMOVED
CLIENT_SECRET:REMOVED


In [254]:
# function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [294]:
# function that gets nearby venues

def getNearbyVenues(names, latlong, radius=500):
    LIMIT = 100
    
    venues_list=[]
    for name, latlng in zip(names, latlong):
        print(name, list(latlng)[0], list(latlng)[1])
        lat = list(latlng)[0]
        lng = list(latlng)[1]
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Office Name', 
                  'Office Latitude', 
                  'Office Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [273]:
df_london_all.head()

Unnamed: 0,Name,Postcode,location,latlong
0,5 Merchant Square,W2 1NH,"(City of Westminster, W2 1NH, UK, (51.5180379,...","(51.5180379, -0.1713545, 0.0)"
1,2 Minster Court,EC3R 7BB,"(London, EC3R 7BB, UK, (51.5113373470318, -0.0...","(51.5113373470318, -0.0802445243401305, 0.0)"
3,97 Hackney Rd,E2 8ET,"(London Borough of Hackney, E2 8ET, UK, (51.52...","(51.529409776996, -0.0753808705265654, 0.0)"
4,12 Moorgate,EC2R 6DA,"(London, EC2R 6DA, UK, (51.5152573476507, -0.0...","(51.5152573476507, -0.0892781789885309, 0.0)"
5,77 Leadenhall Street,EC3A 3DE,"(London, EC3A 3DE, UK, (51.5133463830723, -0.0...","(51.5133463830723, -0.0779604206301277, 0.0)"


In [295]:
london_venues = getNearbyVenues(names=df_london_all['Name'],
                                latlong=df_london_all['latlong']
                                )
london_venues.head()

5 Merchant Square  51.5180379 -0.1713545
2 Minster Court  51.5113373470318 -0.0802445243401305
97 Hackney Rd  51.529409776996 -0.0753808705265654
12 Moorgate  51.5152573476507 -0.0892781789885309
77 Leadenhall Street  51.5133463830723 -0.0779604206301277
The Bard, Shoreditch  51.5229407264196 -0.0784646120558583
21 Soho Square  51.5155972418882 -0.131553517097588
41 Blackfriars Road  51.4966461 -0.0995819
120 Moorgate  51.5191215976723 -0.0885812010133903
The Hewitt, Shoreditch  51.5220564 -0.081771
10 East Road  51.5272802404136 -0.0875806033120571
123 Buckingham Palace Road  51.493078125 -0.14681935
1 Waterhouse Square  51.5187245333333 -0.109628533333333
Aldgate Tower, 2 Leman Street,  51.5148743358659 -0.0719950469255574
115 Mare Street  51.5376453848606 -0.0572825791259911
16 Great Chapel St,  51.5151727 -0.1342188
14 Gray's Inn Road  51.5189251408874 -0.111382048329653
12 Hammersmith Grove  51.4942183091413 -0.225692027308601
1 St. Katharine's Way  51.5079079207978 -0.07356876987

Unnamed: 0,Office Name,Office Latitude,Office Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,5 Merchant Square,51.518038,-0.171354,Virgin Active,51.518988,-0.173483,Gym / Fitness Center
1,5 Merchant Square,51.518038,-0.171354,Frontline Club,51.516932,-0.172523,Bar
2,5 Merchant Square,51.518038,-0.171354,Kioskafé,51.516914,-0.172626,Bookstore
3,5 Merchant Square,51.518038,-0.171354,BrewDog Paddington,51.518948,-0.170546,Beer Bar
4,5 Merchant Square,51.518038,-0.171354,Java U Paddington,51.516217,-0.174553,Coffee Shop


In [298]:
print(london_venues.shape)
print('There are {} unique categories within this data for London.'.format(len(london_venues['Venue Category'].unique())))

(4212, 7)
There are 249 unique categories within this data for London.


In [300]:
london_venues.groupby('Office Name').count()

Unnamed: 0_level_0,Office Latitude,Office Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Office Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1 Fore Street Avenue,83,83,83,83,83,83
1 Mark Square,100,100,100,100,100,100
1 Poultry,100,100,100,100,100,100
"1 Primrose Street,",100,100,100,100,100,100
1 St. Katharine's Way,76,76,76,76,76,76
1 Waterhouse Square,96,96,96,96,96,96
10 Devonshire Square,100,100,100,100,100,100
10 East Road,98,98,98,98,98,98
115 Mare Street,95,95,95,95,95,95
119 Marylebone Road,74,74,74,74,74,74


In [303]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add office name column back to dataframe
london_onehot['Office Name'] = london_venues['Office Name'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

print('Shape is:', london_onehot.shape)
london_onehot.head()

Shape is: (4212, 250)


Unnamed: 0,Office Name,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brasserie,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Camera Store,Canal,Canal Lock,Cantonese Restaurant,Caribbean Restaurant,Casino,Castle,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Churrascaria,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,College Gym,Colombian Restaurant,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Lebanese Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Museum,Music Store,Music Venue,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Office,Okonomiyaki Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Portuguese Restaurant,Print Shop,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Road,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Scottish Restaurant,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Snack Place,Soba Restaurant,Social Club,Soup Place,South American Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Squash Court,Sri Lankan Restaurant,Stables,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,5 Merchant Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,5 Merchant Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,5 Merchant Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,5 Merchant Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,5 Merchant Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [309]:
london_grouped = london_onehot.groupby('Office Name').mean().reset_index()
print('Shape is:', london_grouped.shape)
london_grouped.head()

Shape is: (47, 250)


Unnamed: 0,Office Name,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Boarding House,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brasserie,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Camera Store,Canal,Canal Lock,Cantonese Restaurant,Caribbean Restaurant,Casino,Castle,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Churrascaria,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,College Gym,Colombian Restaurant,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Lebanese Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Museum,Music Store,Music Venue,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Office,Okonomiyaki Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Portuguese Restaurant,Print Shop,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Road,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Scottish Restaurant,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Snack Place,Soba Restaurant,Social Club,Soup Place,South American Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Squash Court,Sri Lankan Restaurant,Stables,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,1 Fore Street Avenue,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.036145,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.012048,0.0,0.144578,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.048193,0.0,0.0,0.012048,0.012048,0.036145,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.024096,0.012048,0.0,0.0,0.024096,0.012048,0.0,0.0,0.012048,0.0,0.012048,0.0,0.0,0.072289,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.012048,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.0
1,1 Mark Square,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0
2,1 Poultry,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.05,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0
3,"1 Primrose Street,",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.04,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.03,0.02,0.01,0.02,0.0,0.0
4,1 St. Katharine's Way,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.013158,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.013158,0.105263,0.026316,0.0,0.026316,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.039474,0.0,0.0,0.0,0.039474,0.013158,0.0,0.0,0.0,0.026316,0.0,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [305]:
#  Function to return most common venues

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [310]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Office Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Office Name'] = london_grouped['Office Name']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Office Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1 Fore Street Avenue,Coffee Shop,Sandwich Place,Hotel,Art Gallery,Italian Restaurant,Café,Plaza,Steakhouse,Boxing Gym,Vietnamese Restaurant
1,1 Mark Square,Coffee Shop,Hotel,Restaurant,Bar,Cocktail Bar,Italian Restaurant,Pub,Café,Pizza Place,Gym / Fitness Center
2,1 Poultry,Coffee Shop,Pub,Italian Restaurant,Gym / Fitness Center,Seafood Restaurant,French Restaurant,Vietnamese Restaurant,Restaurant,Hotel,Asian Restaurant
3,"1 Primrose Street,",Coffee Shop,Hotel,Pub,Food Truck,Pizza Place,Gym / Fitness Center,Sushi Restaurant,Wine Bar,Lounge,Flea Market
4,1 St. Katharine's Way,Hotel,Coffee Shop,French Restaurant,Pub,Café,Italian Restaurant,Scenic Lookout,Restaurant,Castle,Cocktail Bar
5,1 Waterhouse Square,Coffee Shop,Pub,Sandwich Place,Italian Restaurant,Salad Place,French Restaurant,Vietnamese Restaurant,Bar,Food Truck,Burrito Place
6,10 Devonshire Square,Coffee Shop,Salad Place,Hotel,Pub,Restaurant,Gym / Fitness Center,Pizza Place,Sushi Restaurant,Cocktail Bar,Mediterranean Restaurant
7,10 East Road,Coffee Shop,Pub,Restaurant,Hotel,Cocktail Bar,Italian Restaurant,Thai Restaurant,Yoga Studio,Beer Bar,Café
8,115 Mare Street,Café,Pub,Coffee Shop,Bakery,Grocery Store,Pizza Place,Cocktail Bar,Bookstore,Restaurant,Art Gallery
9,119 Marylebone Road,Coffee Shop,Pub,Gastropub,Pizza Place,Hotel,Japanese Restaurant,Platform,Movie Theater,Chinese Restaurant,Thai Restaurant


The next step is to start clustering this data - this will be in next week's assignment.