# Capstone Project: The Battle of Neighbourhoods

### New York City - Venue Recommendations for Theater Visitors

#### Import necessary libraries

In [1]:
import numpy as np #library to handle data in a vectorized manner

import pandas as pd #library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Download and Explore New York Theatres Dataset

In [2]:
#Download data from a remote url to a csv file
!wget -O 'theaters.csv' https://data.cityofnewyork.us/api/views/2hzz-95k8/rows.csv
print('Data downloaded!')

--2019-05-23 22:01:02--  https://data.cityofnewyork.us/api/views/2hzz-95k8/rows.csv
Resolving data.cityofnewyork.us (data.cityofnewyork.us)... 52.206.68.26, 52.206.140.205, 52.206.140.199
Connecting to data.cityofnewyork.us (data.cityofnewyork.us)|52.206.68.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘theaters.csv’

    [ <=>                                   ] 18,705      --.-K/s   in 0s      

2019-05-23 22:01:03 (131 MB/s) - ‘theaters.csv’ saved [18705]

Data downloaded!


#### Load and explore the data

In [3]:
df = pd.read_csv('theaters.csv') # Read data from csv to a pandas dataframe
# take a look at the dataset
df.head()

| | the_geom | NAME | TEL | URL | ADDRESS1 | ADDRES2 | CITY | ZIP |
|---|---|---|---|---|---|---|---|---|
| 0 | POINT (-73.99061840882582 40.75985115447559) | 45th Street Theater | (212) 352-3101 | http://www.theatermania.com/new-york/theaters/... | 354 West 45th Street | | New York | 10036 |
| 1 | POINT (-73.9881059525377 40.76047123447081) | 47th Street Theater | (800) 775-1617 | http://www.bestofoffbroadway.com/theaters/47st... | 304 West 47th Street | | New York | 10036 |
| 2 | POINT (-73.97038450260143 40.76339942774153) | 59E59 | (212) 753-5959 | http://www.59e59.org/ | 59 East 59th Street | | New York | 10022 |
| 3 | POINT (-73.99332384622063 40.7585366821068) | Acorn Theater | (212) 279-4200 | http://www.theatrerow.org/theacorn.htm | 410 West 42nd Street | | New York | 10036 |
| 4 | POINT (-73.9892143340222 40.75926091219353) | Al Hirschfeld Theater | (212) 239-6200 | http://www.newyorkcitytheatre.com/theaters/alh... | 302 W 45th Street | | New York | 10036 |


In [4]:
# Check the number of rows and columns of the dataframe
df.shape

(117, 8)

## 1. Preprocessing

Data cleanup and regrouping.
The retrieved table contains some unwanted entries and needs cleanup. The following tasks will be performed:
<ul>
<li>Drop unneeded columns, including those with missing data.</li>
<li>Rename columns.</li>
<li>Split latitudes and longitudes into separate columns.</li>
<li>Fix data types.</li>
</ul>

In [5]:
# Copy the geometry column, which holds both coordinates as 'POINT (longitude latitude)'
df['Longitudes'] = df['the_geom']
df['Longitudes'].head()

0    POINT (-73.99061840882582 40.75985115447559)
1     POINT (-73.9881059525377 40.76047123447081)
2    POINT (-73.97038450260143 40.76339942774153)
3     POINT (-73.99332384622063 40.7585366821068)
4     POINT (-73.9892143340222 40.75926091219353)
Name: Longitudes, dtype: object

In [6]:
# split geometry into strings
df['Longitudes'] = df.Longitudes.str.split(' ')
df['Longitudes'].head()

0    [POINT, (-73.99061840882582, 40.75985115447559)]
1     [POINT, (-73.9881059525377, 40.76047123447081)]
2    [POINT, (-73.97038450260143, 40.76339942774153)]
3     [POINT, (-73.99332384622063, 40.7585366821068)]
4     [POINT, (-73.9892143340222, 40.75926091219353)]
Name: Longitudes, dtype: object

In [7]:
# Take the second token, e.g. '(-73.99061840882582', and strip the opening parenthesis.
# The vectorized .str accessor avoids the chained assignment that triggers SettingWithCopyWarning.
df['Longitudes'] = df['Longitudes'].str[1].str.lstrip('(')
df['Longitudes'].head()

0    -73.99061840882582
1     -73.9881059525377
2    -73.97038450260143
3    -73.99332384622063
4     -73.9892143340222
Name: Longitudes, dtype: object

In [8]:
# Copy the geometry column again, this time to extract the latitude
df['Latitudes'] = df['the_geom']
df['Latitudes'].head()

0    POINT (-73.99061840882582 40.75985115447559)
1     POINT (-73.9881059525377 40.76047123447081)
2    POINT (-73.97038450260143 40.76339942774153)
3     POINT (-73.99332384622063 40.7585366821068)
4     POINT (-73.9892143340222 40.75926091219353)
Name: Latitudes, dtype: object

In [9]:
# split geometry into strings
df['Latitudes'] = df.Latitudes.str.split(' ')
df['Latitudes'].head()

0    [POINT, (-73.99061840882582, 40.75985115447559)]
1     [POINT, (-73.9881059525377, 40.76047123447081)]
2    [POINT, (-73.97038450260143, 40.76339942774153)]
3     [POINT, (-73.99332384622063, 40.7585366821068)]
4     [POINT, (-73.9892143340222, 40.75926091219353)]
Name: Latitudes, dtype: object

In [10]:
# Take the third token, e.g. '40.75985115447559)', and strip the closing parenthesis.
# The vectorized .str accessor avoids the chained assignment that triggers SettingWithCopyWarning.
df['Latitudes'] = df['Latitudes'].str[2].str.rstrip(')')
df['Latitudes'].head()

0    40.75985115447559
1    40.76047123447081
2    40.76339942774153
3     40.7585366821068
4    40.75926091219353
Name: Latitudes, dtype: object

In [11]:
# Convert Latitudes and Longitudes columns from python objects to float
df[['Longitudes']] = df[['Longitudes']].astype(float, copy=False)
df[['Latitudes']] = df[['Latitudes']].astype(float, copy=False)
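
The split-and-convert steps above can also be collapsed into a single vectorized pass. The sketch below is one possible alternative, not the method used in this notebook: it assumes every `the_geom` value has the form `POINT (longitude latitude)` and relies on a regular expression with `Series.str.extract`. The two-row `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample in the same 'POINT (lng lat)' format as the_geom
sample = pd.DataFrame({
    'the_geom': [
        'POINT (-73.99061840882582 40.75985115447559)',
        'POINT (-73.9881059525377 40.76047123447081)',
    ]
})

# Capture longitude and latitude as named groups; the group names become column names
pattern = r'POINT \((?P<Longitudes>-?\d+\.?\d*) (?P<Latitudes>-?\d+\.?\d*)\)'
coords = sample['the_geom'].str.extract(pattern).astype(float)

# Attach the two float columns to the original frame
sample = sample.join(coords)
print(sample[['Longitudes', 'Latitudes']])
```

Rows that do not match the pattern would come back as `NaN`, which makes malformed geometry values easy to spot before the type conversion.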

In [12]:
#Drop unwanted columns
df = df.drop(['the_geom', 'ADDRES2', 'TEL', 'URL', 'ZIP'], axis=1)

In [13]:
# Rename columns to understandable names
df.rename(columns={'ADDRESS1':'Address', 'CITY':'City', 'NAME':'Name'}, inplace = True)

In [14]:
df.head() # display processed dataframe

| | Name | Address | City | Longitudes | Latitudes |
|---|---|---|---|---|---|
| 0 | 45th Street Theater | 354 West 45th Street | New York | -73.990618 | 40.759851 |
| 1 | 47th Street Theater | 304 West 47th Street | New York | -73.988106 | 40.760471 |
| 2 | 59E59 | 59 East 59th Street | New York | -73.970385 | 40.763399 |
| 3 | Acorn Theater | 410 West 42nd Street | New York | -73.993324 | 40.758537 |
| 4 | Al Hirschfeld Theater | 302 W 45th Street | New York | -73.989214 | 40.759261 |


In [15]:

address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [16]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, address, name in zip(df['Latitudes'], df['Longitudes'], df['Address'], df['Name']):
    label = '{}, {}'.format(name, address)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  

map_newyork

#### Segmenting and Clustering Theaters in New York

#### Define Foursquare Credentials and Version

In [17]:
CLIENT_ID = 'YTWBBZM5CULLSWJRSPQWD4E2IFSERT1RS2HZQXWAAY5OBOJZ' # your Foursquare ID
CLIENT_SECRET = 'QYFF1XIY1RRP0QFW40QC0PNCYOGRTTINLRJMA0Y5WVZHZ0W0' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YTWBBZM5CULLSWJRSPQWD4E2IFSERT1RS2HZQXWAAY5OBOJZ
CLIENT_SECRET:QYFF1XIY1RRP0QFW40QC0PNCYOGRTTINLRJMA0Y5WVZHZ0W0
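
Hard-coding and printing API credentials exposes them to anyone who reads the notebook. A minimal sketch of an alternative is shown below, assuming the credentials are stored in environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical and not part of the original notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        # Hypothetical variable names chosen for this example
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

# Example with an explicit mapping standing in for the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)

# Confirm that something was loaded without echoing the secret itself
print('Credentials loaded:', bool(client_id and client_secret))
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read from the real process environment.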


## 2. Exploring Theaters

#### Let's explore the first theater in our dataframe

Get the theater's name

In [18]:
df.loc[0, 'Name']

'45th Street Theater'

Get the theater's latitude and longitude values.

In [19]:
theater_latitude = df.loc[0, 'Latitudes'] # theater latitude value
theater_longitude = df.loc[0, 'Longitudes'] # theater longitude value

theater_name = df.loc[0, 'Name'] # theater name

print('Latitude and longitude values of {} are {}, {}.'.format(theater_name, 
                                                               theater_latitude, 
                                                               theater_longitude))

Latitude and longitude values of 45th Street Theater are 40.75985115447559, -73.99061840882582.


#### Now, let's get the top 20 venues that are around 45th Street Theater within a radius of 100 meters.

First, create the GET request URL. Name your URL **url**.

In [20]:
# Build the Foursquare "explore" request URL for the first theater
LIMIT = 20
radius = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    theater_latitude, 
    theater_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=YTWBBZM5CULLSWJRSPQWD4E2IFSERT1RS2HZQXWAAY5OBOJZ&client_secret=QYFF1XIY1RRP0QFW40QC0PNCYOGRTTINLRJMA0Y5WVZHZ0W0&v=20180604&ll=40.75985115447559,-73.99061840882582&radius=100&limit=20'
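
As a hedged alternative to formatting the query string by hand, the same request URL can be assembled from a dictionary of parameters with the standard library. The helper name `build_explore_url` and the placeholder credentials `'ID'` and `'SECRET'` are assumptions made for this sketch; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble a Foursquare explore URL from its query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # urlencode percent-encodes the comma as %2C
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; the coordinates are those of the first theater
url = build_explore_url('ID', 'SECRET', '20180604',
                        40.75985115447559, -73.99061840882582, 100, 20)
print(url)
```

Keeping the parameters in a dictionary makes it harder to swap two positional values by accident and takes care of escaping.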

Send the GET request and examine the results.

In [21]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ce7182b0719020025805b45'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-46a9bed8f964a52054491fe3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/performingarts_dancestudio_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d134941735',
         'name': 'Dance Studio',
         'pluralName': 'Dance Studios',
         'primary': True,
         'shortName': 'Dance Studio'}],
       'id': '46a9bed8f964a52054491fe3',
       'location': {'address': '322 W 45th St',
        'cc': 'US',
        'city': 'New York',
        'country': 'United States',
        'crossStreet': 'btwn 8th & 9th Ave',
        'distance': 97,
        'formattedAddress': ['322 W 45th St (btwn 8th & 9th Ave)',
         'New York, NY 100

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened frames use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [23]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Broadway Dance Center | Dance Studio | 40.75949 | -73.989562 |
| 1 | Schmackary's | Bakery | 40.76007 | -73.990815 |
| 2 | City Sandwich | Sandwich Place | 40.760606 | -73.99123 |
| 3 | Hold Fast Kitchen and Spirits | American Restaurant | 40.760678 | -73.990252 |
| 4 | Bareburger | Burger Joint | 40.760612 | -73.990326 |


In [24]:
# Number of venues returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

20 venues were returned by Foursquare.


#### A function to repeat the same process for all the theaters in New York

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Theater', 
                  'Theater Latitude', 
                  'Theater Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
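
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for any venue returned without a category. The sketch below shows one way to make that flattening step tolerant of empty category lists. The helper `extract_venue_rows` and the `fake_items` payload are invented for illustration; the payload only mimics the shape of `results['response']['groups'][0]['items']` seen earlier.

```python
def extract_venue_rows(theater_name, theater_lat, theater_lng, items):
    """Flatten Foursquare 'items' into tuples, tolerating venues without categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when the venue has no category at all
        category = categories[0]['name'] if categories else None
        rows.append((
            theater_name, theater_lat, theater_lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows

# Made-up payload: the second venue has an empty category list
fake_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 40.0, 'lng': -73.0},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 40.1, 'lng': -73.1},
               'categories': []}},
]
rows = extract_venue_rows('Demo Theater', 40.05, -73.05, fake_items)
print(rows)
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues` in the function above.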

#### Run the above function on each theater to create a new dataframe called *new_york_venues*

In [26]:
new_york_venues = getNearbyVenues(names=df['Name'],
                                   latitudes=df['Latitudes'],
                                   longitudes=df['Longitudes']
                                  )

45th Street Theater
47th Street Theater
59E59
Acorn Theater
Al Hirschfeld Theater
Ambassador Theatre
American Airlines Theatre
Apollo Theater
Arclight Theatre
Astor Place Theatre
Atlantic Theatre
August Wilson Theatre
Barrow Street Theatre
Beacon Theatre
Belasco Theatre
Bernard B. Jacobs Theatre
Bleecker Street Theater
Booth Theatre
Broadhurst Theatre
Broadway Theatre
Brooks Atkinson Theatre
Castillo Theater
Century Center For the Performing Arts
Cherry Lane Theatre
Circle In the Square Theatre
Connelly Theater
Cort Theatre
Daryl Roth Theatre
Duke Theatre
Ed Sullivan Theater
Ethel Barrymore Theatre
Eugene O'Neill Theater
Gene Frankel Theatre Workshop
Gerald Schoenfeld Theatre
Gershwin Theatre
Gramercy Arts Theatre
Greenwich Street Theater
Harold Clurman Theater
Helen Hayes Theater
Henry Miller Theatre
Here Theater
Hilton Theatre
Imperial Theatre
Jane Street Theatre
John Golden Theatre
Julia Miles Theater
Kraine Theater
La Mama Experimental Theatre
Lamb's Theatre
Laura Pels Theater
Lion

#### Size of the resulting dataframe

In [27]:
print(new_york_venues.shape)
new_york_venues.head()

(2334, 7)


| | Theater | Theater Latitude | Theater Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 45th Street Theater | 40.759851 | -73.990618 | Kinky Boots at the Al Hirschfeld Theatre | 40.759384 | -73.989173 | Theater |
| 1 | 45th Street Theater | 40.759851 | -73.990618 | Broadway Dance Center | 40.75949 | -73.989562 | Dance Studio |
| 2 | 45th Street Theater | 40.759851 | -73.990618 | Birdland | 40.758947 | -73.989677 | Jazz Club |
| 3 | 45th Street Theater | 40.759851 | -73.990618 | Amy's Bread | 40.761323 | -73.990414 | Bakery |
| 4 | 45th Street Theater | 40.759851 | -73.990618 | Gyu-Kaku Japanese BBQ | 40.759042 | -73.990043 | Japanese Restaurant |


Number of venues returned for each theater

In [28]:
new_york_venues.groupby('Theater').count()

| Theater | Theater Latitude | Theater Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| 45th Street Theater | 20 | 20 | 20 | 20 | 20 | 20 |
| 47th Street Theater | 20 | 20 | 20 | 20 | 20 | 20 |
| 59E59 | 20 | 20 | 20 | 20 | 20 | 20 |
| Acorn Theater | 20 | 20 | 20 | 20 | 20 | 20 |
| Al Hirschfeld Theater | 20 | 20 | 20 | 20 | 20 | 20 |
| Ambassador Theatre | 20 | 20 | 20 | 20 | 20 | 20 |
| American Airlines Theatre | 20 | 20 | 20 | 20 | 20 | 20 |
| Apollo Theater | 20 | 20 | 20 | 20 | 20 | 20 |
| Arclight Theatre | 20 | 20 | 20 | 20 | 20 | 20 |
| Astor Place Theatre | 20 | 20 | 20 | 20 | 20 | 20 |


#### Number of unique categories that can be curated from all the returned venues

In [29]:
print('There are {} unique categories.'.format(len(new_york_venues['Venue Category'].unique())))

There are 207 unique categories.


## 3. Analyze Each Theater

In [30]:
# one hot encoding
new_york_onehot = pd.get_dummies(new_york_venues[['Venue Category']], prefix="", prefix_sep="")

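# add the theater name back to the dataframe
# (note: 'Theater' is also a venue category, so this assignment overwrites that dummy column in place instead of appending a new one)
new_york_onehot['Theater'] = new_york_venues['Theater'] 

# move the last column to the first position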
fixed_columns = [new_york_onehot.columns[-1]] + list(new_york_onehot.columns[:-1])
new_york_onehot = new_york_onehot[fixed_columns]

new_york_onehot.head()

Unnamed: 0,Zoo,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beer Bar,Beer Store,Bistro,Board Shop,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Theater,Comedy Club,Comic Shop,Concert Hall,Cosmetics Shop,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Dive Bar,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Library,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Opera 
House,Optical Shop,Organic Grocery,Other Event,Other Great Outdoors,Outdoor Sculpture,Outdoor Supply Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Pub,Ramen Restaurant,Resort,Restaurant,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Skate Park,Soba Restaurant,Soup Place,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Synagogue,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Track,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,45th Street Theater,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,45th Street Theater,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,45th Street Theater,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,45th Street Theater,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,45th Street Theater,0,0,0,0,0,0,0,0,0,0,0,0,0


Examine size of new dataframe

In [31]:
new_york_onehot.shape

(2334, 207)
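
The shape above has 207 columns, the same as the number of unique categories, even though a `Theater` label column was added. The reason is that "Theater" is itself a venue category, so assigning `new_york_onehot['Theater']` replaces that dummy column rather than adding a new one. A minimal sketch of one way to avoid the clash is shown below, using made-up data and a hypothetical label column name, `Theater Name`.

```python
import pandas as pd

# Made-up sample; note that 'Theater' appears as a venue category
venues = pd.DataFrame({
    'Theater': ['T1', 'T1', 'T2'],
    'Venue Category': ['Bakery', 'Theater', 'Bakery'],
})

# One dummy column per category, stored as integers
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)

# Insert the label as the first column under a name that no category uses
onehot.insert(0, 'Theater Name', venues['Theater'])

print(onehot.columns.tolist())
print(onehot.shape)
```

With this approach the frame has one column per category plus the label, and the `Theater` category is preserved for the later frequency analysis.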

#### Group rows by theater, taking the mean of the frequency of occurrence of each category

In [32]:
new_york_grouped = new_york_onehot.groupby('Theater').mean().reset_index()
new_york_grouped

Unnamed: 0,Theater,Zoo,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Stadium,Beer Bar,Beer Store,Bistro,Board Shop,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Theater,Comedy Club,Comic Shop,Concert Hall,Cosmetics Shop,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Dive Bar,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Library,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,New American 
Restaurant,Nightclub,Opera House,Optical Shop,Organic Grocery,Other Event,Other Great Outdoors,Outdoor Sculpture,Outdoor Supply Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Pub,Ramen Restaurant,Resort,Restaurant,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Skate Park,Soba Restaurant,Soup Place,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Synagogue,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theme Restaurant,Toy / Game Store,Track,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,45th Street Theater,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,47th Street Theater,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,59E59,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Acorn Theater,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0
4,Al Hirschfeld Theater,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Ambassador Theatre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,American Airlines Theatre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Apollo Theater,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05
8,Arclight Theatre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.05
9,Astor Place Theatre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0


In [33]:
# Confirm the new size
new_york_grouped.shape

(117, 207)

#### Print each theater along with the top 5 most common venues

In [34]:
num_top_venues = 5

for eachtheater in new_york_grouped['Theater']:
    print("----"+eachtheater+"----")
    temp = new_york_grouped[new_york_grouped['Theater'] == eachtheater].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----45th Street Theater----
                venue  freq
0        Burger Joint  0.10
1              Bakery  0.10
2              Resort  0.05
3       Indie Theater  0.05
4  Chinese Restaurant  0.05


----47th Street Theater----
                   venue  freq
0              Jazz Club  0.05
1                 Resort  0.05
2            Coffee Shop  0.05
3  Performing Arts Venue  0.05
4       Sushi Restaurant  0.05


----59E59----
               venue  freq
0              Hotel  0.10
1                Spa  0.10
2      Indie Theater  0.05
3  French Restaurant  0.05
4         Food Truck  0.05


----Acorn Theater----
                     venue  freq
0     Gym / Fitness Center  0.15
1              Pizza Place  0.05
2      Peruvian Restaurant  0.05
3  New American Restaurant  0.05
4               Steakhouse  0.05


----Al Hirschfeld Theater----
                   venue  freq
0           Burger Joint  0.05
1          Indie Theater  0.05
2  Performing Arts Venue  0.05
3                 Bakery  0.05
4

Function to sort the venues in descending order.

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
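
As a quick check of the helper above, the sketch below runs it on a single made-up row. The theater name and frequencies in `toy_row` are hypothetical and chosen to avoid ties. In the real data many categories share the same frequency (for example 0.05), and the default sort used by `sort_values` is not stable, so the order among tied categories is not guaranteed.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the theater name), then rank categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-theater row; the frequencies are made up for illustration
toy_row = pd.Series({'Theater': 'Example Theater',
                     'Bakery': 0.10, 'Hotel': 0.30, 'Spa': 0.20, 'Gym': 0.0})

top3 = return_most_common_venues(toy_row, 3)
print(list(top3))  # ['Hotel', 'Spa', 'Bakery']
```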

Create a new dataframe and display the top 10 venues for each theater.

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Theater']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
theater_venues_sorted = pd.DataFrame(columns=columns)
theater_venues_sorted['Theater'] = new_york_grouped['Theater']

for ind in np.arange(new_york_grouped.shape[0]):
    theater_venues_sorted.iloc[ind, 1:] = return_most_common_venues(new_york_grouped.iloc[ind, :], num_top_venues)

theater_venues_sorted.head()

Unnamed: 0,Theater,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,45th Street Theater,Burger Joint,Bakery,Cosmetics Shop,Sushi Restaurant,Ice Cream Shop,Hotel,Japanese Restaurant,Jazz Club,Juice Bar,Dance Studio
1,47th Street Theater,Italian Restaurant,Jazz Club,Sushi Restaurant,Resort,Coffee Shop,Vegetarian / Vegan Restaurant,Performing Arts Venue,Dance Studio,American Restaurant,Ice Cream Shop
2,59E59,Spa,Hotel,French Restaurant,Shoe Store,Food Truck,Mediterranean Restaurant,Steakhouse,Salon / Barbershop,Boutique,Liquor Store
3,Acorn Theater,Gym / Fitness Center,Pie Shop,Sandwich Place,Dive Bar,Gift Shop,French Restaurant,Steakhouse,New American Restaurant,Chinese Restaurant,Peruvian Restaurant
4,Al Hirschfeld Theater,Dance Studio,Jazz Club,Bakery,Resort,Performing Arts Venue,Indie Theater,Burger Joint,Japanese Restaurant,Ice Cream Shop,American Restaurant


## 4. Cluster Theaters

Run *k*-means to cluster the theaters into 4 clusters.

In [39]:
# set number of clusters
kclusters = 4

new_york_grouped_clustering = new_york_grouped.drop('Theater', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(new_york_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 3, 1, 3, 1, 3, 3, 3], dtype=int32)
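
Besides the first few labels, it can help to see how many rows land in each cluster. The sketch below is a minimal illustration on made-up data: `toy_features` is a hypothetical stand-in for `new_york_grouped_clustering`, and `n_init=10` is set explicitly only so the call behaves the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# made-up frequency vectors standing in for new_york_grouped_clustering
rng = np.random.RandomState(0)
toy_features = rng.rand(20, 5)

# same settings as the cell above, applied to the toy data
toy_kmeans = KMeans(n_clusters=4, random_state=0, n_init=10).fit(toy_features)

# count how many rows were assigned to each cluster label (0..3)
cluster_sizes = np.bincount(toy_kmeans.labels_, minlength=4)
print(cluster_sizes)
```

On the real data, the same `np.bincount(kmeans.labels_)` call would show whether the four clusters are of comparable size or whether one cluster dominates.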

In [40]:
df.rename(columns={'Name':'Theater'}, inplace = True)

Create a new dataframe that includes the cluster as well as the top 10 venues for each theater.

In [43]:
# add clustering labels
theater_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

new_york_merged = df

# join the sorted venues onto the New York theater data, which holds the latitude/longitude for each theater
new_york_merged = new_york_merged.join(theater_venues_sorted.set_index('Theater'), on='Theater')

new_york_merged.head() # check the last columns!

Unnamed: 0,Theater,Address,City,Longitudes,Latitudes,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,45th Street Theater,354 West 45th Street,New York,-73.990618,40.759851,1,Burger Joint,Bakery,Cosmetics Shop,Sushi Restaurant,Ice Cream Shop,Hotel,Japanese Restaurant,Jazz Club,Juice Bar,Dance Studio
1,47th Street Theater,304 West 47th Street,New York,-73.988106,40.760471,1,Italian Restaurant,Jazz Club,Sushi Restaurant,Resort,Coffee Shop,Vegetarian / Vegan Restaurant,Performing Arts Venue,Dance Studio,American Restaurant,Ice Cream Shop
2,59E59,59 East 59th Street,New York,-73.970385,40.763399,1,Spa,Hotel,French Restaurant,Shoe Store,Food Truck,Mediterranean Restaurant,Steakhouse,Salon / Barbershop,Boutique,Liquor Store
3,Acorn Theater,410 West 42nd Street,New York,-73.993324,40.758537,3,Gym / Fitness Center,Pie Shop,Sandwich Place,Dive Bar,Gift Shop,French Restaurant,Steakhouse,New American Restaurant,Chinese Restaurant,Peruvian Restaurant
4,Al Hirschfeld Theater,302 W 45th Street,New York,-73.989214,40.759261,1,Dance Studio,Jazz Club,Bakery,Resort,Performing Arts Venue,Indie Theater,Burger Joint,Japanese Restaurant,Ice Cream Shop,American Restaurant
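
One thing worth checking after a join like this: any theater in `df` that has no row in `theater_venues_sorted` (for example, because no venues were returned for it) would get `NaN` in `Cluster Labels`, and the column would become floating point. Whether that happens here is not shown in the notebook, so the sketch below uses hypothetical theaters and labels to illustrate the check and one possible cleanup.

```python
import pandas as pd

# hypothetical data: 'Theater C' returned no venues, so it has no cluster label
theaters = pd.DataFrame({'Theater': ['Theater A', 'Theater B', 'Theater C']})
labels = pd.DataFrame({'Theater': ['Theater A', 'Theater B'],
                       'Cluster Labels': [1, 3]})

merged = theaters.join(labels.set_index('Theater'), on='Theater')

# rows without a match get NaN, which also turns the label column into floats
unmatched = merged['Cluster Labels'].isna().sum()

# drop unmatched rows and restore integer labels before using them as list indices
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)

print(unmatched, cleaned['Cluster Labels'].tolist())  # 1 [1, 3]
```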


Let's visualize the resulting clusters

In [45]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(new_york_merged['Latitudes'], new_york_merged['Longitudes'], new_york_merged['Theater'], new_york_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.
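
One simple way to look for discriminating categories is to count which category appears most often as the "1st Most Common Venue" within each cluster. The sketch below is only an illustration: `toy_merged` is a hypothetical frame holding a handful of rows copied from the cluster tables in this section, not the full `new_york_merged` dataframe.

```python
import pandas as pd

# a few rows copied from the cluster tables in this section, for illustration only
toy_merged = pd.DataFrame({
    'Theater': ['Barrow Street Theatre', 'Cherry Lane Theatre', 'Connelly Theater',
                'Mazer Theatre', 'Theater For The New City'],
    'Cluster Labels': [0, 0, 0, 2, 2],
    '1st Most Common Venue': ['Italian Restaurant', 'Pizza Place', 'Italian Restaurant',
                              'Chinese Restaurant', 'Chinese Restaurant'],
})

# for each cluster, pick the category that most often ranks first
top_by_cluster = (
    toy_merged.groupby('Cluster Labels')['1st Most Common Venue']
    .agg(lambda s: s.value_counts().idxmax())
)
print(top_by_cluster.to_dict())  # {0: 'Italian Restaurant', 2: 'Chinese Restaurant'}
```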

#### Cluster 1

In [46]:
cluster1 = new_york_merged.loc[new_york_merged['Cluster Labels'] == 0, new_york_merged.columns[[0] + list(range(5, new_york_merged.shape[1]))]]
cluster1

Unnamed: 0,Theater,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Barrow Street Theatre,0,Italian Restaurant,Cosmetics Shop,Pizza Place,Cheese Shop,New American Restaurant,Candy Store,Seafood Restaurant,French Restaurant,Sushi Restaurant,Sandwich Place
23,Cherry Lane Theatre,0,Pizza Place,American Restaurant,Cheese Shop,Japanese Restaurant,Italian Restaurant,Beer Store,New American Restaurant,Sandwich Place,Seafood Restaurant,Food & Drink Shop
25,Connelly Theater,0,Italian Restaurant,Moroccan Restaurant,Gift Shop,Bookstore,Garden,Furniture / Home Store,Tea Room,Breakfast Spot,Thai Restaurant,Southern / Soul Food Restaurant
40,Here Theater,0,Italian Restaurant,Gym,Sushi Restaurant,Bakery,American Restaurant,Coffee Shop,Seafood Restaurant,Optical Shop,Grocery Store,Wine Bar
43,Jane Street Theatre,0,Italian Restaurant,Bistro,Hotel,New American Restaurant,Optical Shop,Park,Playground,Clothing Store,Roof Deck,Shoe Store
46,Kraine Theater,0,Italian Restaurant,Hotel,Japanese Restaurant,BBQ Joint,Gift Shop,Market,Flower Shop,Pizza Place,Ice Cream Shop,Coffee Shop
47,La Mama Experimental Theatre,0,Italian Restaurant,Hotel,Japanese Restaurant,Thai Restaurant,Gym,Gift Shop,Market,Flower Shop,New American Restaurant,Coffee Shop
52,Lucille Lortel Theatre,0,Italian Restaurant,American Restaurant,Chinese Restaurant,Cosmetics Shop,French Restaurant,Gourmet Shop,Candy Store,Garden,Speakeasy,Gastropub
61,Metropolitan Playhouse,0,Italian Restaurant,Southern / Soul Food Restaurant,Gift Shop,Bookstore,Garden,Furniture / Home Store,Tea Room,Juice Bar,Thai Restaurant,Breakfast Spot
88,SoHo Playhouse,0,Italian Restaurant,Sushi Restaurant,Beer Bar,Hotel,Gym,Grocery Store,French Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Optical Shop


#### Cluster 2

In [47]:
cluster2 = new_york_merged.loc[new_york_merged['Cluster Labels'] == 1, new_york_merged.columns[[0] + list(range(5, new_york_merged.shape[1]))]]
cluster2

Unnamed: 0,Theater,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,45th Street Theater,1,Burger Joint,Bakery,Cosmetics Shop,Sushi Restaurant,Ice Cream Shop,Hotel,Japanese Restaurant,Jazz Club,Juice Bar,Dance Studio
1,47th Street Theater,1,Italian Restaurant,Jazz Club,Sushi Restaurant,Resort,Coffee Shop,Vegetarian / Vegan Restaurant,Performing Arts Venue,Dance Studio,American Restaurant,Ice Cream Shop
2,59E59,1,Spa,Hotel,French Restaurant,Shoe Store,Food Truck,Mediterranean Restaurant,Steakhouse,Salon / Barbershop,Boutique,Liquor Store
4,Al Hirschfeld Theater,1,Dance Studio,Jazz Club,Bakery,Resort,Performing Arts Venue,Indie Theater,Burger Joint,Japanese Restaurant,Ice Cream Shop,American Restaurant
6,American Airlines Theatre,1,Italian Restaurant,Taco Place,Hotel,Music Store,Exhibit,Comic Shop,Cajun / Creole Restaurant,Burger Joint,Indie Theater,Asian Restaurant
10,Atlantic Theatre,1,Ice Cream Shop,American Restaurant,Cupcake Shop,Italian Restaurant,Poke Place,Burger Joint,Shoe Repair,Speakeasy,Beer Bar,Bar
14,Belasco Theatre,1,Hotel,Coffee Shop,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Grocery Store,Fast Food Restaurant,Deli / Bodega,Cuban Restaurant,Plaza
15,Bernard B. Jacobs Theatre,1,Indie Theater,Jazz Club,Exhibit,Resort,Performing Arts Venue,Taco Place,Japanese Restaurant,Dance Studio,Ice Cream Shop,Farmers Market
17,Booth Theatre,1,Indie Theater,Ice Cream Shop,Concert Hall,Exhibit,Resort,Performing Arts Venue,Taco Place,Yoga Studio,Dumpling Restaurant,Farmers Market
18,Broadhurst Theatre,1,Indie Theater,Ice Cream Shop,Jazz Club,Concert Hall,Dance Studio,Performing Arts Venue,Exhibit,Taco Place,Electronics Store,Fast Food Restaurant


#### Cluster 3

In [48]:
cluster3 = new_york_merged.loc[new_york_merged['Cluster Labels'] == 2, new_york_merged.columns[[0] + list(range(5, new_york_merged.shape[1]))]]
cluster3

Unnamed: 0,Theater,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
59,Mazer Theatre,2,Chinese Restaurant,Coffee Shop,Cocktail Bar,Organic Grocery,Mexican Restaurant,Gift Shop,Malay Restaurant,Grocery Store,French Restaurant,Gym / Fitness Center
71,Next Stage Theater,2,Italian Restaurant,Cocktail Bar,Bakery,Gastropub,Chinese Restaurant,Park,Ethiopian Restaurant,Gourmet Shop,Boutique,Mediterranean Restaurant
76,P.S.122 Performance Space,2,Korean Restaurant,Coffee Shop,Chinese Restaurant,Japanese Restaurant,Ice Cream Shop,Organic Grocery,Seafood Restaurant,Moroccan Restaurant,Food & Drink Shop,Gourmet Shop
78,Pearl Theatre,2,Ice Cream Shop,Japanese Restaurant,Chinese Restaurant,Seafood Restaurant,Beer Store,Jewelry Store,Korean Restaurant,Mexican Restaurant,Moroccan Restaurant,Organic Grocery
102,Theater For The New City,2,Chinese Restaurant,Korean Restaurant,Organic Grocery,Jewelry Store,Gourmet Shop,Beer Store,Coffee Shop,Moroccan Restaurant,Bakery,Mexican Restaurant
104,Theatre 80 St Marks,2,Ice Cream Shop,Japanese Restaurant,Chinese Restaurant,Seafood Restaurant,Beer Store,Jewelry Store,Korean Restaurant,Mexican Restaurant,Moroccan Restaurant,Organic Grocery
106,Under St. Marks,2,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant,Art Gallery,Korean Restaurant,Caribbean Restaurant,Cocktail Bar,Coffee Shop,Organic Grocery,Beer Store
108,Village Theater,2,Italian Restaurant,Café,Gourmet Shop,Sushi Restaurant,American Restaurant,Beer Bar,Caribbean Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop


#### Cluster 4

In [49]:
cluster4 = new_york_merged.loc[new_york_merged['Cluster Labels'] == 3, new_york_merged.columns[[0] + list(range(5, new_york_merged.shape[1]))]]
cluster4

Unnamed: 0,Theater,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Acorn Theater,3,Gym / Fitness Center,Pie Shop,Sandwich Place,Dive Bar,Gift Shop,French Restaurant,Steakhouse,New American Restaurant,Chinese Restaurant,Peruvian Restaurant
5,Ambassador Theatre,3,Italian Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Restaurant,Mexican Restaurant,Bar,Gym,Food Truck,Pizza Place,Hotel
7,Apollo Theater,3,Arts & Crafts Store,Yoga Studio,Gym / Fitness Center,Pizza Place,Indian Restaurant,Caribbean Restaurant,Shoe Store,Spanish Restaurant,French Restaurant,Sporting Goods Shop
8,Arclight Theatre,3,Juice Bar,Italian Restaurant,Wine Bar,Coffee Shop,Gym / Fitness Center,Spa,Bakery,Bookstore,Church,French Restaurant
9,Astor Place Theatre,3,Cycle Studio,Gym,Bagel Shop,Cosmetics Shop,Coffee Shop,Ramen Restaurant,Salad Place,Music Venue,Soba Restaurant,Miscellaneous Shop
11,August Wilson Theatre,3,Bar,Mexican Restaurant,Restaurant,Steakhouse,Grocery Store,Gym,Ramen Restaurant,Sandwich Place,Karaoke Bar,Food Truck
13,Beacon Theatre,3,Bakery,Italian Restaurant,Salad Place,Chinese Restaurant,Cocktail Bar,Coffee Shop,Concert Hall,Cultural Center,Dance Studio,Dessert Shop
16,Bleecker Street Theater,3,Italian Restaurant,Spa,Gym,American Restaurant,Bookstore,Boxing Gym,Cocktail Bar,Coffee Shop,Hotel,Ice Cream Shop
19,Broadway Theatre,3,Restaurant,Karaoke Bar,Sandwich Place,Bar,Steakhouse,Grocery Store,Gym,Cocktail Bar,Food Truck,Performing Arts Venue
21,Castillo Theater,3,Gym / Fitness Center,Wine Shop,Gym,Italian Restaurant,Peruvian Restaurant,Comedy Club,Café,Movie Theater,Building,Sporting Goods Shop


## 6. Discussion and Conclusion

This notebook presents an analysis of venue recommendations based on theater locations. Recommendations based on other user searches, such as available restaurants and recreation areas, are also available. Because New York is a metropolitan city with a host of interesting venues scattered across it, the information extracted in this notebook can supplement web-based recommendations, helping visitors find nearby venues of interest and decide where to stay or where to go during their visits.

Using the Foursquare API, we collected a good number of venue recommendations in New York City. Sourcing venue recommendations from Foursquare has its limitations: the list of venues is not an exhaustive list of all the venues available in the area. Furthermore, not all the venues found in the area have stored ratings. For this reason, the analyzed venues make up only about 60% of the venues initially collected. The results may therefore change significantly when more information is collected on the venues with missing data.

The generated clusters show that restaurants, especially Italian restaurants, are the most common venues around theaters. Results of this kind may be very interesting for travelers who are looking for a specific type of restaurant to visit.

This information may also be used by investors who are looking to put up a new theater. Knowing the most common venues around highly rated theaters might help position a theater around readily available customers.

Thank you.

Harrison J. Angonga
email: jumaharrison1@gmail.com
linkedin: https://www.linkedin.com/in/harrisonangonga/