# The Battle of Neighborhoods

## Introduction 
This notebook accompanies the official report.

#### Business problem - Recap
As mentioned in the report, John asked me to determine which neighborhood in Toronto is the best place to live if you want as many of the following spots as possible nearby:
- Breakfast spot 
- Bakery 
- Sushi Restaurant
- Sandwich Place
- Pizza Place 

## Step 1
In the first step we import all the libraries we need and scrape the Wikipedia page that lists the postal codes of Toronto. We then put the scraped data into a dataframe and finally add the latitude and longitude values to it.

#### Import libraries 


In [18]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON into a pandas dataframe (on pandas >= 1.0 use: from pandas import json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

from bs4 import BeautifulSoup # import recommended html parsing library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


#### Scrape the Wikipedia page of Toronto

In [35]:
# Download the HTML of the Wikipedia page that lists the postal codes of Toronto
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

# Read the source code and create soup object
soup = BeautifulSoup(source,'lxml')

# Find the postal code table in the HTML
article = soup.find('table', class_='wikitable sortable')

# Collect the text of every table cell, without newlines, in one flat list
table_list = []
for cell in article.find_all('td'):
    text = cell.text
    text = text.replace('\n', '')
    table_list.append(text)

#### Put all the data that we've collected into a pandas dataframe

In [36]:
# Define the dataframe columns
column_names = ['PostalCode', 'Borough', 'Neighborhood']

# The cells arrive as one flat list with three cells per table row,
# so every third element belongs to the same column
df = pd.DataFrame({
    'PostalCode': table_list[0::3],
    'Borough': table_list[1::3],
    'Neighborhood': table_list[2::3],
}, columns=column_names)

# Only process the cells that have an assigned borough:
# mark "Not assigned" as missing, then drop the rows without a borough
df.replace("Not assigned", np.nan, inplace=True)
df.dropna(subset=["Borough"], axis=0, inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
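
The stride slicing used above can be checked on a tiny list. The sketch below uses made-up cells in the order a row-by-row scrape would return them; the variable names are illustrative only and are not part of the notebook.

```python
# Made-up cells: three per table row (postal code, borough, neighborhood)
cells = [
    'M1A', 'Not assigned', 'Not assigned',
    'M3A', 'North York', 'Parkwoods',
    'M4A', 'North York', 'Victoria Village',
]

# Every third element, starting at offset 0, 1 and 2, forms one column
postal_codes = cells[0::3]
boroughs = cells[1::3]
neighborhoods = cells[2::3]

# Keep only the rows whose borough is assigned
rows = [
    row for row in zip(postal_codes, boroughs, neighborhoods)
    if row[1] != 'Not assigned'
]
print(rows)
```

The slicing only lines up if the list length is a multiple of three, so an extra or missing cell in the scraped table would shift every following row.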


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

In [37]:
# Give every row without an assigned neighborhood the name of its own borough
df['Neighborhood'] = df['Neighborhood'].fillna(df['Borough'])
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
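
As a quick check of the rule above, here is a minimal sketch on a two-row toy dataframe (the data is made up). It fills a missing neighborhood from the borough in the same row, so other rows are not affected.

```python
import numpy as np
import pandas as pd

# Toy data: the second row has a borough but no neighborhood
demo = pd.DataFrame({
    'Borough': ['North York', "Queen's Park"],
    'Neighborhood': ['Parkwoods', np.nan],
})

# fillna with a Series aligns on the index, so each gap gets its own borough
demo['Neighborhood'] = demo['Neighborhood'].fillna(demo['Borough'])
print(demo)
```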


More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. We combine such rows into one row, with the neighborhoods separated by a comma.

In [38]:
# Combine the rows with the same postal code into one row and separate the neighborhoods with a comma
df['Neighborhood'] = df[['PostalCode','Borough','Neighborhood']].groupby(['PostalCode','Borough'])['Neighborhood'].transform(lambda x: ','.join(x)) 
df.drop_duplicates(inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
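
The same combination can also be written as a single aggregation. This is only a sketch on made-up rows, not the notebook's own method; `sort=False` is used so that the postal codes keep their original order.

```python
import pandas as pd

# Toy data: M5A appears twice with two different neighborhoods
demo = pd.DataFrame({
    'PostalCode': ['M3A', 'M5A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Harbourfront', 'Regent Park'],
})

# One row per (postal code, borough), neighborhoods joined with a comma
combined = (
    demo.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
        .agg(','.join)
        .reset_index()
)
print(combined)
```

Compared with `transform` followed by `drop_duplicates`, this returns one row per group directly.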


#### Add latitude and longitude to the dataframe

In this part we use a csv file that contains the geographical coordinates of each postal code. We add the data from the csv file to the dataframe we've created, so that it also shows the latitude and longitude of each postal code.

In [39]:
# Refer to the existing dataframe as toronto_data (same object, not a copy)
toronto_data = df

# Load the csv file with the coordinates into a second dataframe
toronto_coordinates = pd.read_csv("http://cocl.us/Geospatial_data")

# Copy the coordinates into the first dataframe. This aligns rows by index position,
# so it is only correct if both dataframes list the postal codes in the same order
toronto_data[['Latitude', 'Longitude']] = toronto_coordinates[['Latitude', 'Longitude']]

# Let's check the result
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.806686,-79.194353
1,M4A,North York,Victoria Village,43.784535,-79.160497
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.763573,-79.188711
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.770992,-79.216917
4,M7A,Queen's Park,Queen's Park,43.773136,-79.239476
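
Copying the coordinate columns over pairs rows by position. If the csv file is sorted differently from the scraped table, postal codes are silently paired with the wrong coordinates. A safer option is to join on the postal code itself. The sketch below uses toy data, and the csv column name `'Postal Code'` is an assumption that should be checked against the real file.

```python
import pandas as pd

# Toy stand-ins for the two dataframes; coordinates are made up
toronto_data_demo = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M1B'],
    'Borough': ['North York', 'North York', 'Scarborough'],
})
coordinates_demo = pd.DataFrame({
    'Postal Code': ['M1B', 'M3A', 'M4A'],  # assumed column name, different row order
    'Latitude': [1.0, 3.0, 4.0],
    'Longitude': [-1.0, -3.0, -4.0],
})

# A left merge keeps every row (and the order) of the scraped dataframe
merged = toronto_data_demo.merge(
    coordinates_demo,
    left_on='PostalCode',
    right_on='Postal Code',
    how='left',
).drop(columns='Postal Code')
print(merged)
```

With a left merge, a postal code that is missing from the csv file ends up with empty coordinates instead of borrowing those of another row.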


## Step 2

In the second step we are going to explore and cluster the neighborhoods in Toronto.

We use the geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We call it 'toronto_explorer'.

In [40]:
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


We create a map of Toronto with the neighborhoods superimposed on top.

In [41]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Step 3
In this step we start using the Foursquare API. We retrieve all the data we need for this assignment, clean it, and finally store it in a csv file so that we no longer need to call the Foursquare API.

In [42]:
#Define Foursquare Credentials and Version 
CLIENT_ID = 'RNOZQCLVFPDRIZTKLDYFOPDRHJDZD0WAXRIYSECCTPOP2VMF' # your Foursquare ID
CLIENT_SECRET = 'XB3BVV05SCJF3LYM5YWXZFJAKUOSKSEYT44IF31H11SWCVQN' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: RNOZQCLVFPDRIZTKLDYFOPDRHJDZD0WAXRIYSECCTPOP2VMF
CLIENT_SECRET:XB3BVV05SCJF3LYM5YWXZFJAKUOSKSEYT44IF31H11SWCVQN


Let's first explore the neighborhoods in our dataframe. 
Get the first neighborhood's name. 

In [43]:
toronto_data.loc[0, 'Neighborhood']

'Parkwoods'

Get the neighborhood's latitude and longitude values.

In [44]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.806686299999996, -79.19435340000001.


Let's get the top 100 venues within a radius of 500 meters. 
First, let's create the GET request URL. 

In [45]:
radius = 500 # define radius
limit = 100 #define limit
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    limit)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=RNOZQCLVFPDRIZTKLDYFOPDRHJDZD0WAXRIYSECCTPOP2VMF&client_secret=XB3BVV05SCJF3LYM5YWXZFJAKUOSKSEYT44IF31H11SWCVQN&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100'
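
The same URL can be assembled from a dictionary of parameters, which keeps each value next to its name. This is a minimal sketch using only the standard library. The helper name `build_explore_url` and the `DEMO_` credentials are placeholders introduced here, not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode joins the pairs with '&' and percent-encodes special characters
    return BASE_URL + '?' + urlencode(params)


demo_url = build_explore_url('DEMO_ID', 'DEMO_SECRET', '20180605', 43.65, -79.38, 500, 100)
print(demo_url)
```

The comma inside `ll` comes out percent-encoded as `%2C`, which is the standard encoding for a comma in a query string.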

Next we send the GET request, clean the JSON response and structure it into a pandas dataframe. 

In [46]:
# Send the GET request
results = requests.get(url).json()

In [47]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is called 'categories' or 'venue.categories', depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [48]:
#Clean the json and structure it into a dataframe
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056
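
To make the cleaning step concrete, the sketch below flattens a made-up response item by hand. The nesting mirrors the column names used above (`venue.name`, `venue.categories`, `venue.location.lat`, `venue.location.lng`). The sample values and the helper name `flatten_item` are assumptions for illustration only.

```python
def flatten_item(item):
    """Pick name, first category and coordinates out of one response item."""
    venue = item['venue']
    categories = venue.get('categories', [])
    return {
        'name': venue['name'],
        # a venue without categories gets None, as in get_category_type
        'categories': categories[0]['name'] if categories else None,
        'lat': venue['location']['lat'],
        'lng': venue['location']['lng'],
    }


# Made-up items in the assumed shape of results['response']['groups'][0]['items']
sample_items = [
    {'venue': {'name': 'Example Diner',
               'categories': [{'name': 'Breakfast Spot'}],
               'location': {'lat': 43.80, 'lng': -79.19}}},
    {'venue': {'name': 'Unlabelled Place',
               'categories': [],
               'location': {'lat': 43.81, 'lng': -79.20}}},
]

flat_venues = [flatten_item(item) for item in sample_items]
print(flat_venues)
```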


Now we create a function that repeats the same process for all neighborhoods. Note that its default radius is 1000 meters instead of the 500 meters used above. 

In [62]:
def foursquare_crawler(postal_code_list, neighborhood_list, lat_list, lng_list, limit=100, radius=1000):
    """Query Foursquare around every postal code and return the raw results as a list of dicts."""
    # The original signature had an unused LIMIT=500 parameter; the request actually used
    # the global limit of 100, which is now the explicit default.
    result_ds = []
    for counter, (postal_code, neighborhood, lat, lng) in enumerate(
            zip(postal_code_list, neighborhood_list, lat_list, lng_list), start=1):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)

        # make the GET request and keep only the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # store the raw results together with the location they belong to
        result_ds.append({
            'PostalCode': postal_code,
            'Neighborhood': neighborhood,
            'Latitude': lat,
            'Longitude': lng,
            'Crawling_result': results,
        })
        print('{}.'.format(counter))
        print('Data is Obtained, for the Postal Code {} (and Neighborhoods {}) SUCCESSFULLY.'.format(postal_code, neighborhood))
    return result_ds

In [64]:
print('Crawling different neighborhoods of Toronto')
toronto_foursquare_dataset = foursquare_crawler(list(toronto_data['PostalCode']),
                                                   list(toronto_data['Neighborhood']),
                                                   list(toronto_data['Latitude']),
                                                   list(toronto_data['Longitude']),)

Crawling different neighborhoods of Toronto
1.
Data is Obtained, for the Postal Code M3A (and Neighborhoods Parkwoods) SUCCESSFULLY.
2.
Data is Obtained, for the Postal Code M4A (and Neighborhoods Victoria Village) SUCCESSFULLY.
3.
Data is Obtained, for the Postal Code M5A (and Neighborhoods Harbourfront,Regent Park) SUCCESSFULLY.
4.
Data is Obtained, for the Postal Code M6A (and Neighborhoods Lawrence Heights,Lawrence Manor) SUCCESSFULLY.
5.
Data is Obtained, for the Postal Code M7A (and Neighborhoods Queen's Park) SUCCESSFULLY.
6.
Data is Obtained, for the Postal Code M9A (and Neighborhoods Islington Avenue) SUCCESSFULLY.
7.
Data is Obtained, for the Postal Code M1B (and Neighborhoods Rouge,Malvern) SUCCESSFULLY.
8.
Data is Obtained, for the Postal Code M3B (and Neighborhoods Don Mills North) SUCCESSFULLY.
9.
Data is Obtained, for the Postal Code M4B (and Neighborhoods Woodbine Gardens,Parkview Hill) SUCCESSFULLY.
10.
Data is Obtained, for the Postal Code M5B (and Neighborhoods Ryers

79.
Data is Obtained, for the Postal Code M1S (and Neighborhoods Agincourt) SUCCESSFULLY.
80.
Data is Obtained, for the Postal Code M4S (and Neighborhoods Davisville) SUCCESSFULLY.
81.
Data is Obtained, for the Postal Code M5S (and Neighborhoods Harbord,University of Toronto) SUCCESSFULLY.
82.
Data is Obtained, for the Postal Code M6S (and Neighborhoods Runnymede,Swansea) SUCCESSFULLY.
83.
Data is Obtained, for the Postal Code M1T (and Neighborhoods Clarks Corners,Sullivan,Tam O'Shanter) SUCCESSFULLY.
84.
Data is Obtained, for the Postal Code M4T (and Neighborhoods Moore Park,Summerhill East) SUCCESSFULLY.
85.
Data is Obtained, for the Postal Code M5T (and Neighborhoods Chinatown,Grange Park,Kensington Market) SUCCESSFULLY.
86.
Data is Obtained, for the Postal Code M1V (and Neighborhoods Agincourt North,L'Amoreaux East,Milliken,Steeles East) SUCCESSFULLY.
87.
Data is Obtained, for the Postal Code M4V (and Neighborhoods Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West) SUCC

Let's save the results of Foursquare so we don't need to connect each time. 

In [65]:
import pickle
with open("toronto_foursquare_dataset.txt", "wb") as fp:   #Pickling
    pickle.dump(toronto_foursquare_dataset, fp)
print('All received data from foursquare is saved to the computer')

All received data from foursquare is saved to the computer


In [66]:
with open("toronto_foursquare_dataset.txt", "rb") as fp:   # Unpickling
    toronto_foursquare_dataset = pickle.load(fp)
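
Loading a pickle file can execute arbitrary code, so pickle is best kept for files you created yourself. The crawl result contains only dictionaries, lists, strings and numbers, so JSON is a possible alternative. The sketch below round-trips a made-up record through a temporary file. The file name and the sample data are illustrative.

```python
import json
import os
import tempfile

# Made-up record in the same shape as one entry of the crawl result
demo_dataset = [{
    'PostalCode': 'M3A',
    'Neighborhood': 'Parkwoods',
    'Latitude': 43.0,
    'Longitude': -79.0,
    'Crawling_result': [{'venue': {'name': 'Example Cafe'}}],
}]

path = os.path.join(tempfile.mkdtemp(), 'toronto_foursquare_dataset.json')

# Save the data as human-readable JSON
with open(path, 'w', encoding='utf-8') as fp:
    json.dump(demo_dataset, fp)

# Load it back
with open(path, 'r', encoding='utf-8') as fp:
    restored = json.load(fp)

print(restored == demo_dataset)
```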

Next, we can clean all the raw data we received from the Foursquare database. 

In [68]:
# Go through the saved list and extract every venue for each neighborhood.

def get_venue_dataset(foursquare_dataset):
    columns = ['Postal Code', 'Neighborhood',
               'Neighborhood Latitude', 'Neighborhood Longitude',
               'Venue', 'Venue Summary', 'Venue Category', 'Distance']
    # Collect plain dicts first and build the dataframe once at the end;
    # DataFrame.append was slow inside loops and was removed in pandas 2.0
    rows = []

    for neigh_dict in foursquare_dataset:
        postal_code = neigh_dict['PostalCode']
        neigh = neigh_dict['Neighborhood']
        lat = neigh_dict['Latitude']
        lng = neigh_dict['Longitude']
        print('Number of Venues in Coordination "{}" Postal Code and "{}" Neighborhood(s) is:'.format(postal_code, neigh))
        print(len(neigh_dict['Crawling_result']))

        for venue_dict in neigh_dict['Crawling_result']:
            rows.append({
                'Postal Code': postal_code, 'Neighborhood': neigh,
                'Neighborhood Latitude': lat, 'Neighborhood Longitude': lng,
                'Venue': venue_dict['venue']['name'],
                'Venue Summary': venue_dict['reasons']['items'][0]['summary'],
                'Venue Category': venue_dict['venue']['categories'][0]['name'],
                'Distance': venue_dict['venue']['location']['distance']})

    return pd.DataFrame(rows, columns=columns)

In [69]:
toronto_venues = get_venue_dataset(toronto_foursquare_dataset)

Number of Venues in Coordination "M3A" Postal Code and "Parkwoods" Neighborhood(s) is:
17
Number of Venues in Coordination "M4A" Postal Code and "Victoria Village" Neighborhood(s) is:
4
Number of Venues in Coordination "M5A" Postal Code and "Harbourfront,Regent Park" Neighborhood(s) is:
25
Number of Venues in Coordination "M6A" Postal Code and "Lawrence Heights,Lawrence Manor" Neighborhood(s) is:
8
Number of Venues in Coordination "M7A" Postal Code and "Queen's Park" Neighborhood(s) is:
26
Number of Venues in Coordination "M9A" Postal Code and "Islington Avenue" Neighborhood(s) is:
11
Number of Venues in Coordination "M1B" Postal Code and "Rouge,Malvern" Neighborhood(s) is:
23
Number of Venues in Coordination "M3B" Postal Code and "Don Mills North" Neighborhood(s) is:
28
Number of Venues in Coordination "M4B" Postal Code and "Woodbine Gardens,Parkview Hill" Neighborhood(s) is:
13
Number of Venues in Coordination "M5B" Postal Code and "Ryerson,Garden District" Neighborhood(s) is:
13
Num

Number of Venues in Coordination "M1S" Postal Code and "Agincourt" Neighborhood(s) is:
100
Number of Venues in Coordination "M4S" Postal Code and "Davisville" Neighborhood(s) is:
12
Number of Venues in Coordination "M5S" Postal Code and "Harbord,University of Toronto" Neighborhood(s) is:
19
Number of Venues in Coordination "M6S" Postal Code and "Runnymede,Swansea" Neighborhood(s) is:
44
Number of Venues in Coordination "M1T" Postal Code and "Clarks Corners,Sullivan,Tam O'Shanter" Neighborhood(s) is:
100
Number of Venues in Coordination "M4T" Postal Code and "Moore Park,Summerhill East" Neighborhood(s) is:
100
Number of Venues in Coordination "M5T" Postal Code and "Chinatown,Grange Park,Kensington Market" Neighborhood(s) is:
72
Number of Venues in Coordination "M1V" Postal Code and "Agincourt North,L'Amoreaux East,Milliken,Steeles East" Neighborhood(s) is:
100
Number of Venues in Coordination "M4V" Postal Code and "Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West" Neighborh

We can now look at the first venues in the resulting dataframe.

In [71]:
toronto_venues.head()

Unnamed: 0,Postal Code,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
0,M3A,Parkwoods,43.806686,-79.194353,Images Salon & Spa,This spot is popular,Spa,595
1,M3A,Parkwoods,43.806686,-79.194353,Caribbean Wave,This spot is popular,Caribbean Restaurant,912
2,M3A,Parkwoods,43.806686,-79.194353,Wendy's,This spot is popular,Fast Food Restaurant,600
3,M3A,Parkwoods,43.806686,-79.194353,Harvey's,This spot is popular,Fast Food Restaurant,796
4,M3A,Parkwoods,43.806686,-79.194353,Wendy's,This spot is popular,Fast Food Restaurant,387


This looks good! Let's save the cleaned dataframe in a csv file. 

In [75]:
toronto_venues.to_csv('toronto_venues.csv')
print('All data is stored in csv format')

All data is stored in csv format


Next we will load this csv file so we can use it for further analysis. 

In [78]:
toronto_venues = pd.read_csv('toronto_venues.csv')
print('All data from the csv is loaded into a dataframe')

All data from the csv is loaded into a dataframe
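
Writing a dataframe with `to_csv` stores the row index as an unnamed first column by default, and that column comes back as `Unnamed: 0` when the file is read again. A minimal sketch with made-up data and in-memory buffers instead of files shows the difference that `index=False` makes.

```python
import io

import pandas as pd

demo = pd.DataFrame({'Neighborhood': ['Parkwoods'], 'Venue': ["Wendy's"]})

# Default: the index is written as an extra, unnamed column
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the real columns are written
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```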


## Step 4
In this step we analyze the data we have collected so far to see how it can help us with the assignment. 

Let's first count all the unique neighborhoods in Toronto. 

In [168]:
neigh_list = list(toronto_venues['Neighborhood'].unique())
print('Number of Neighborhoods inside Toronto:')
print(len(neigh_list))

Number of Neighborhoods inside Toronto:
102


Let's summarize all the information about the neighborhoods in Toronto. 

In [169]:
neigh_venue_summary = toronto_venues.groupby('Neighborhood').count()
neigh_venue_summary.drop(columns = ['Unnamed: 0']).head()

Unnamed: 0_level_0,Postal Code,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
"Adelaide,King,Richmond",18,18,18,18,18,18,18
Agincourt,100,100,100,100,100,100,100
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",100,100,100,100,100,100,100
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",24,24,24,24,24,24,24
"Alderwood,Long Branch",13,13,13,13,13,13,13


Let's find out how many unique venue categories there are in our dataset. 

In [170]:
print('In this dataset there are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

In this dataset there are 335 unique categories.


And let's make a list of all the different categories. 

In [171]:
list(toronto_venues['Venue Category'].unique())

['Spa',
 'Caribbean Restaurant',
 'Fast Food Restaurant',
 'Coffee Shop',
 'Paper / Office Supplies Store',
 'Hobby Shop',
 'Bus Station',
 'African Restaurant',
 'Chinese Restaurant',
 'Greek Restaurant',
 'Fruit & Vegetable Store',
 'Gym',
 'Sandwich Place',
 'Park',
 'Italian Restaurant',
 'Burger Joint',
 'Breakfast Spot',
 'Playground',
 'Fried Chicken Joint',
 'Food & Drink Shop',
 'Liquor Store',
 'Pizza Place',
 'Smoothie Shop',
 'Discount Store',
 'Beer Store',
 'Pharmacy',
 'Bank',
 'Sports Bar',
 'Medical Center',
 'Supermarket',
 'Filipino Restaurant',
 'Clothing Store',
 'Indian Restaurant',
 'Electronics Store',
 'Hakka Restaurant',
 'Thai Restaurant',
 'Music Store',
 'Athletics & Sports',
 'Bakery',
 'Wings Joint',
 'Yoga Studio',
 'Grocery Store',
 'Asian Restaurant',
 'Rental Car Location',
 'Restaurant',
 'Convenience Store',
 'Train Station',
 'Japanese Restaurant',
 'Bowling Alley',
 'Department Store',
 'Bus Line',
 'Light Rail Station',
 'Metro Station',
 'Inters

Let's see which venues are in which neighborhoods. 

In [191]:
# We use one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']  

# Move neighborhood column to the first column
cols = list(toronto_onehot.columns)
cols = ['Neighborhood'] + cols[:-1]
toronto_onehot = toronto_onehot[cols]
toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Lab,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood.1,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Lab,Piano Bar,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tree,Tunnel,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Step 5
Now that we have all the data we need, we can create a dataframe with only the required information and segment it to find the answer to our business problem. 

We start by making a list of the requirements. The owner of the company wants to live in the neighborhood that has the most:
- Bakeries
- Breakfast spots
- Coffee shops
- Sushi restaurants
- Sandwich places
- Pizza places

In [194]:
# Create a list of the required venues
required_venues = [
    'Neighborhood', 
    'Bakery', 
    'Breakfast Spot', 
    'Coffee Shop', 
    'Sushi Restaurant', 
    'Sandwich Place', 
    'Pizza Place']
required_venues

['Neighborhood',
 'Bakery',
 'Breakfast Spot',
 'Coffee Shop',
 'Sushi Restaurant',
 'Sandwich Place',
 'Pizza Place']

Let's update our dataframe so that it only shows the required venues and group the data by neighborhoods.  

In [195]:
toronto_onehot = toronto_onehot[required_venues]
toronto_onehot.head()

Unnamed: 0,Neighborhood,Neighborhood.1,Bakery,Breakfast Spot,Coffee Shop,Sushi Restaurant,Sandwich Place,Pizza Place
0,Parkwoods,Parkwoods,0,0,0,0,0,0
1,Parkwoods,Parkwoods,0,0,0,0,0,0
2,Parkwoods,Parkwoods,0,0,0,0,0,0
3,Parkwoods,Parkwoods,0,0,0,0,0,0
4,Parkwoods,Parkwoods,0,0,0,0,0,0


Let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [185]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

ValueError: Grouper for 'Neighborhood' not 1-dimensional
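
pandas raises this error when the grouping key matches more than one column. The table headers above list both `Neighborhood` and `Neighborhood.1`, which suggests that the one-hot table holds two columns with that name: the label column and a venue category that is itself called "Neighborhood". The sketch below reproduces the situation on made-up data and shows one possible way around it. Dropping the clashing category column is an assumption that fits here because it is not one of the required venues.

```python
import pandas as pd

# Toy venues; one of them has the category 'Neighborhood'
venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Agincourt', 'Agincourt'],
    'Venue Category': ['Bakery', 'Neighborhood', 'Pizza Place', 'Bakery'],
})
required = ['Bakery', 'Breakfast Spot', 'Coffee Shop',
            'Sushi Restaurant', 'Sandwich Place', 'Pizza Place']

# One-hot encode the categories (dtype=int gives 0/1 instead of booleans)
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)

# Remove the category that clashes with the label column, if it exists
onehot = onehot.drop(columns=['Neighborhood'], errors='ignore')

# Put the neighborhood label in front; the name is now unique
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Keep only the required categories that actually occur in the data
present = [col for col in required if col in onehot.columns]
grouped = onehot.groupby('Neighborhood')[present].mean().reset_index()
print(grouped)
```

Each value in `grouped` is the share of a neighborhood's venues that fall into that category, so neighborhoods can then be compared or ranked on the required columns.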